Ensuring Reproducibility: Publishing Both Data and Code Under an Institutional Repository


Reproducible research requires sharing not only your findings but also the underlying data and analysis code. Depositing these materials in a trusted repository (institutional or open-access) with a persistent identifier (DOI) supports transparency, credit, and reuse. Below you will find a step-by-step deposit guide, DOI assignment tips, and a sample folder structure to get you started.


1. Choosing a Repository

  • Institutional Repository
    • Pros: University‐branded, often free, integrates with campus IP management.
    • Cons: May lack granular DOI support or community discoverability.
  • Open‐Access Platforms
    • Zenodo (CERN): Free, accepts up to 50 GB per record by default, mints a DOI automatically on publication.
    • Figshare: Free up to 5 GB; makes items citable via DOIs; integrates with GitHub.
    • Dryad, OSF: Discipline‐specific features, various pricing models.

2. Step-by-Step Deposit Guide

  1. Prepare Your Materials
    • Anonymized Dataset (.csv or .xlsx): Remove direct identifiers; include a README explaining variables.
    • Analysis Scripts (.R, .py, or Jupyter notebooks): Ensure they run from raw data to final tables/figures.
    • Documentation:
      • README.md with project overview, dependencies (e.g., R‐package list), and usage instructions.
      • LICENSE file (e.g., CC BY 4.0 for data; MIT for code).
  2. Log In & Create New Submission
    • For Zenodo: Sign in via ORCID/GitHub → “New Upload” → drag & drop files.
    • For Institutional: Navigate to your university archive portal → “Deposit Research Output” form.
  3. Fill Metadata Fields
    • Title: Reflects your study (e.g., “Survey Data and R Code for Hybrid Instruction Study, 2025”).
    • Authors & Affiliations: Include your ORCID iD.
    • Description/Abstract: Briefly summarize data collection, study scope, and code purpose.
    • Keywords/Subjects: Enhance discoverability (e.g., “educational research,” “R scripts,” “student surveys”).
    • License Selection: Choose an open license for maximum reuse.
  4. Review & Publish
    • Preview your submission; fix any warnings.
    • Click “Publish” (or “Submit for Review” if institutional).
    • Zenodo and Figshare mint a DOI as soon as you publish; an institutional repository typically issues one once its review is complete.
  5. Link From Your Dissertation & Manuscripts
    • In your methods or data-availability section, include:
      “The anonymized dataset and analysis scripts are available at Zenodo: DOI 10.5281/zenodo.1234567”
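
The metadata fields from step 3 can be drafted as a structured record before you open the upload form, which makes them easy to review with your supervisor or co-authors. The Python sketch below is one way to do that. The field names (`upload_type`, `creators`, `license`) follow Zenodo's deposit REST API as commonly documented, so treat them as an assumption and confirm them against the current developer documentation. The affiliation, ORCID iD, and description text are illustrative placeholders, and nothing is sent over the network.

```python
import json


def build_deposit_metadata(title, creators, description, keywords, license_id):
    """Assemble a metadata record mirroring the fields listed in step 3."""
    required = [
        ("title", title),
        ("creators", creators),
        ("description", description),
        ("license", license_id),
    ]
    missing = [name for name, value in required if not value]
    if missing:
        raise ValueError("Missing required metadata: " + ", ".join(missing))
    return {
        "metadata": {
            "title": title,
            "upload_type": "dataset",  # assumed Zenodo field name
            "creators": creators,
            "description": description,
            "keywords": list(keywords),
            "license": license_id,
        }
    }


record = build_deposit_metadata(
    title="Survey Data and R Code for Hybrid Instruction Study, 2025",
    creators=[{
        "name": "Gupta, A.",
        "affiliation": "Example University",  # hypothetical affiliation
        "orcid": "0000-0002-1825-0097",       # placeholder iD, replace with yours
    }],
    # Illustrative description only; write your own summary here.
    description="Anonymized survey responses and R analysis scripts.",
    keywords=["educational research", "R scripts", "student surveys"],
    license_id="cc-by-4.0",
)
print(json.dumps(record, indent=2))
```

Keeping the record in a file next to your scripts also gives you a single source for the title, authors, and license when you later write the citation.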

3. Assigning & Using DOIs

  • Automatic DOI Minting
    • Zenodo and Figshare provide a DOI upon publication.
  • Institutional DOI Requests
    • Contact your library or digital-scholarship office to request a DOI for your dataset. The institution's registered DOI prefix is combined with a suffix unique to your deposit.
  • Citation Format
    Gupta, A. (2025). Hybrid Instruction Survey Data & R Code [Data set]. Zenodo. https://doi.org/<your DOI>
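
A citation in this format can be generated from the details you entered as metadata, which avoids typos in the DOI. The Python sketch below shows one possible approach. The function names `doi_to_url` and `format_dataset_citation` are hypothetical, and the DOI is the example value from step 5 of the deposit guide, not a real deposit.

```python
def doi_to_url(doi):
    """Turn a bare DOI such as 10.5281/zenodo.1234567 into a resolvable link."""
    doi = doi.strip()
    # Strip a leading resolver address or "doi:" label if one is present.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"Not a DOI: {doi!r}")
    return f"https://doi.org/{doi}"


def format_dataset_citation(author, year, title, repository, doi):
    """Build a dataset citation in the author-year style shown above."""
    return f"{author} ({year}). {title} [Data set]. {repository}. {doi_to_url(doi)}"


citation = format_dataset_citation(
    author="Gupta, A.",
    year=2025,
    title="Hybrid Instruction Survey Data & R Code",
    repository="Zenodo",
    doi="10.5281/zenodo.1234567",  # example DOI only
)
print(citation)
```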

4. Sample Folder Structure

Hybrid_Instruction_Study_2025/
├── data/
│   ├── raw/
│   │   └── survey_responses_raw.csv
│   ├── processed/
│   │   └── survey_responses_anonymized.csv
│   └── README_data.md
├── code/
│   ├── 01_data_cleaning.R
│   ├── 02_analysis.R
│   └── 03_visualizations.R
├── docs/
│   ├── README.md           # Overview & instructions
│   ├── LICENSE             # Data: CC BY 4.0; Code: MIT
│   └── CITATION.cff        # Citation metadata
└── outputs/
    ├── tables/
    │   └── table1_summary.csv
    └── figures/
        └── fig1_trend.png
  • data/raw/: Untouched original files. Keep these in your private archive and leave them out of the public deposit if they contain identifiers.
  • data/processed/: Anonymized, analysis‐ready datasets.
  • code/: Modular scripts following a logical pipeline.
  • docs/: Documentation, license, and citation files.
  • outputs/: Exported tables and figures ready for publication.
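
If you are starting a new project, the layout above can be created in one step rather than by hand. The following Python sketch builds the empty folders and placeholder documentation files; the function name `create_project_skeleton` is hypothetical, and the demonstration writes into a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path

# Folders and placeholder files taken from the sample structure above.
FOLDERS = [
    "data/raw",
    "data/processed",
    "code",
    "docs",
    "outputs/tables",
    "outputs/figures",
]
PLACEHOLDER_FILES = [
    "data/README_data.md",
    "docs/README.md",
    "docs/LICENSE",
    "docs/CITATION.cff",
]


def create_project_skeleton(root):
    """Create the folder layout under root and return the relative paths."""
    root = Path(root)
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in PLACEHOLDER_FILES:
        path = root / name
        if not path.exists():  # never overwrite an existing file
            path.touch()
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    created = create_project_skeleton(Path(tmp) / "Hybrid_Instruction_Study_2025")
    print("\n".join(created))
```

To use it for a real project, call `create_project_skeleton` with the path where your study folder should live. Running it a second time is safe because existing folders and files are left untouched.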

5. Quick‐Start Checklist

  • Anonymize all personal identifiers before deposit.
  • Include clear README and license files.
  • Ensure code runs end‐to‐end on a fresh machine.
  • Choose an open license (CC BY, MIT) for maximum reuse.
  • Deposit data and code together; link with a DOI in your dissertation.
  • Announce your repository link in conference presentations and your ORCID profile.
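
Parts of this checklist can be verified automatically before you upload. The Python sketch below checks that the README and license files exist and flags column names in the processed data that look like direct identifiers. It assumes the sample folder structure from section 4; the function name `check_deposit` and the list of suspicious column names are hypothetical and should be adapted to your own data. A clean result does not prove the data are anonymized, so it complements a manual review rather than replacing it.

```python
import csv
import tempfile
from pathlib import Path

REQUIRED_FILES = ["docs/README.md", "docs/LICENSE"]
# Hypothetical column names that suggest direct identifiers.
IDENTIFIER_HINTS = {"name", "email", "phone", "student_id", "address"}


def check_deposit(root):
    """Return a list of problems found in the project folder (empty if none)."""
    root = Path(root)
    problems = []
    for rel in REQUIRED_FILES:
        if not (root / rel).is_file():
            problems.append(f"missing {rel}")
    processed = root / "data" / "processed"
    if processed.is_dir():
        for csv_path in sorted(processed.glob("*.csv")):
            # Only the header row is inspected, not the cell values.
            with csv_path.open(newline="", encoding="utf-8") as handle:
                header = next(csv.reader(handle), [])
            flagged = sorted(
                col for col in header if col.strip().lower() in IDENTIFIER_HINTS
            )
            if flagged:
                problems.append(f"{csv_path.name}: possible identifiers {flagged}")
    return problems


# Demonstration on a throwaway project with a missing license and an email column.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "README.md").write_text("Overview\n", encoding="utf-8")
    processed_dir = root / "data" / "processed"
    processed_dir.mkdir(parents=True)
    (processed_dir / "survey_responses_anonymized.csv").write_text(
        "respondent_code,email,score\nR001,a@example.org,4\n", encoding="utf-8"
    )
    for problem in check_deposit(root):
        print("FIX:", problem)
```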

By depositing your data and code in a reputable repository—complete with a DOI, comprehensive metadata, and clear folder organization—you cement the reproducibility and impact of your PhD research. Future scholars will thank you, and your work will stand on a foundation of openness and trust.

