Ensuring Reproducibility: Publishing Both Data and Code Under an Institutional Repository


Reproducible research requires that you share not only your findings but also the underlying data and analysis code. Depositing these materials in a trusted repository (institutional or open‐access) with a persistent identifier (DOI) supports transparency, credit, and reuse. Below you will find a step‐by‐step deposit guide, DOI assignment tips, and a sample folder structure to get you started.


1. Choosing a Repository

  • Institutional Repository
    • Pros: University‐branded, often free, integrates with campus IP management.
    • Cons: May not mint a DOI for each dataset, and is often less discoverable than community platforms.
  • Open‐Access Platforms
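    • Zenodo (CERN): Free, accepts up to 50 GB per record by default, mints a DOI automatically on publication.
    • Figshare: Free tier with storage and file‐size limits (check the current quota); makes items citable via DOIs; integrates with GitHub.
    • Dryad, OSF: General‐purpose alternatives. Dryad curates submissions and usually charges a publishing fee unless an institution or journal covers it; OSF is free and project‐oriented.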

2. Step‐by‐Step Deposit Guide

  1. Prepare Your Materials
    • Anonymized Dataset (.csv or .xlsx): Remove direct identifiers; include a README explaining variables.
    • Analysis Scripts (.R, .py, or Jupyter notebooks): Ensure they run from raw data to final tables/figures.
    • Documentation:
      • README.md with project overview, dependencies (e.g., R‐package list), and usage instructions.
      • LICENSE file (e.g., CC BY 4.0 for data; MIT for code).
  2. Log In & Create New Submission
    • For Zenodo: Sign in via ORCID/GitHub → “New Upload” → drag & drop files.
    • For an institutional repository: Navigate to your university archive portal → “Deposit Research Output” form.
  3. Fill Metadata Fields
    • Title: Reflects your study (e.g., “Survey Data and R Code for Hybrid Instruction Study, 2025”).
    • Authors & Affiliations: Include your ORCID iD.
    • Description/Abstract: Briefly summarize data collection, study scope, and code purpose.
    • Keywords/Subjects: Enhance discoverability (e.g., “educational research,” “R scripts,” “student surveys”).
    • License Selection: Choose an open license for maximum reuse.
  4. Review & Publish
    • Preview your submission; fix any warnings.
    • Click “Publish” (or “Submit for Review” if institutional).
    • On Zenodo and Figshare, a DOI is minted when you publish; an institutional repository typically issues one after staff review the deposit.
  5. Link From Your Dissertation & Manuscripts
    • In your methods or data-availability section, include:
The anonymized dataset and analysis scripts are available at Zenodo: DOI 10.5281/zenodo.1234567
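
The metadata fields in step 3 can be drafted as a structured record before you open the deposit form, which makes them easier to review with co‐authors. The sketch below is a minimal illustration, assuming field names modeled on Zenodo's deposit metadata (upload_type, creators, keywords, license). The helper build_metadata, the affiliation, and the ORCID value are hypothetical placeholders; check your repository's current documentation for the exact schema.

```python
import json


def build_metadata(title, authors, description, keywords, license_id):
    """Assemble a deposit metadata record as a plain dictionary.

    Field names are modeled on Zenodo's deposit metadata (an assumption);
    adapt them to whatever schema your repository expects.
    """
    if not title.strip():
        raise ValueError("title must not be empty")
    if not authors:
        raise ValueError("at least one author is required")

    creators = []
    for author in authors:
        creator = {"name": author["name"]}
        # Affiliation and ORCID are optional but help discoverability.
        for key in ("affiliation", "orcid"):
            if author.get(key):
                creator[key] = author[key]
        creators.append(creator)

    return {
        "metadata": {
            "upload_type": "dataset",
            "title": title.strip(),
            "creators": creators,
            "description": description.strip(),
            # Remove duplicate keywords while preserving their order.
            "keywords": list(dict.fromkeys(keywords)),
            "license": license_id,
        }
    }


record = build_metadata(
    title="Survey Data and R Code for Hybrid Instruction Study, 2025",
    authors=[{
        "name": "Gupta, A.",
        "affiliation": "Example University",  # placeholder
        "orcid": "0000-0000-0000-0000",        # placeholder, not a real iD
    }],
    description="Anonymized survey responses and the R scripts used to analyze them.",
    keywords=["educational research", "R scripts", "student surveys", "R scripts"],
    license_id="cc-by-4.0",
)
print(json.dumps(record, indent=2))
```

Keeping the record in a file next to your README also gives you a reviewed copy to paste from if the web form times out.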

3. Assigning & Using DOIs

  • Automatic DOI Minting
    • Zenodo and Figshare provide a DOI upon publication.
  • Institutional DOI Requests
    • Contact your library or digital‐scholarship office to request a DOI for your dataset. The prefix belongs to the institution; the suffix identifies your individual deposit.
  • Citation Format
Gupta, A. (2025). Hybrid Instruction Survey Data & R Code [Data set]. Zenodo. <Enter the link to the artifact here>
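
The citation pattern above can be generated from the same details you enter as metadata, so the reference in your dissertation and the repository record stay in sync. This is a minimal sketch: the helper names doi_to_url and format_citation are hypothetical, and the DOI is the placeholder used earlier in this guide, not a real record. It relies only on the fact that a DOI resolves through https://doi.org/.

```python
def doi_to_url(doi):
    """Turn a DOI such as '10.5281/zenodo.1234567' into a resolver URL."""
    doi = doi.strip()
    # Strip common prefixes so the function accepts several input styles.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:", "doi "):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"https://doi.org/{doi}"


def format_citation(author, year, title, repository, doi, kind="Data set"):
    """Build a citation string following the pattern shown in this section."""
    return f"{author} ({year}). {title} [{kind}]. {repository}. {doi_to_url(doi)}"


citation = format_citation(
    author="Gupta, A.",
    year=2025,
    title="Hybrid Instruction Survey Data & R Code",
    repository="Zenodo",
    doi="DOI 10.5281/zenodo.1234567",  # placeholder DOI from the example above
)
print(citation)
# Gupta, A. (2025). Hybrid Instruction Survey Data & R Code [Data set]. Zenodo. https://doi.org/10.5281/zenodo.1234567
```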

4. Sample Folder Structure

Hybrid_Instruction_Study_2025/
├── data/
│   ├── raw/
│   │   └── survey_responses_raw.csv
│   ├── processed/
│   │   └── survey_responses_anonymized.csv
│   └── README_data.md
├── code/
│   ├── 01_data_cleaning.R
│   ├── 02_analysis.R
│   └── 03_visualizations.R
├── docs/
│   ├── README.md           # Overview & instructions
│   ├── LICENSE             # Data: CC BY 4.0; Code: MIT
│   └── CITATION.cff        # Citation metadata
└── outputs/
    ├── tables/
    │   └── table1_summary.csv
    └── figures/
        └── fig1_trend.png
  • data/raw/: Untouched original files. Keep these in your private archive and exclude them from the public deposit if they contain identifiers.
  • data/processed/: Anonymized, analysis‐ready datasets.
  • code/: Modular scripts following a logical pipeline.
  • docs/: Documentation, license, and citation files.
  • outputs/: Exported tables and figures ready for publication.
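
If you start a new project from this layout often, the folders can be created with a short script instead of by hand. The sketch below uses only Python's standard library and builds the structure in a temporary directory for demonstration. The function name scaffold is hypothetical, and the documentation files it creates are empty placeholders that you would fill in yourself.

```python
import tempfile
from pathlib import Path

# Folders from the sample structure above.
LAYOUT = [
    "data/raw",
    "data/processed",
    "code",
    "docs",
    "outputs/tables",
    "outputs/figures",
]

# Empty documentation files to fill in later.
PLACEHOLDER_FILES = [
    "data/README_data.md",
    "docs/README.md",
    "docs/LICENSE",
    "docs/CITATION.cff",
]


def scaffold(root):
    """Create the project folders and empty documentation files under root."""
    root = Path(root)
    for folder in LAYOUT:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in PLACEHOLDER_FILES:
        path = root / name
        if not path.exists():  # never overwrite existing documentation
            path.touch()
    # Return every created path, relative to the project root.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "Hybrid_Instruction_Study_2025")
    for entry in created:
        print(entry)
```

To use it for real work, call scaffold with a path in your own project directory instead of the temporary one.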

5. Quick‐Start Checklist

  • Anonymize all personal identifiers before deposit.
  • Include clear README and license files.
  • Ensure code runs end‐to‐end on a fresh machine.
  • Choose an open license (e.g., CC BY 4.0 for data, MIT for code) for maximum reuse.
  • Deposit data and code together; link with a DOI in your dissertation.
  • Announce your repository link in conference presentations and your ORCID profile.
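
The first checklist item, removing personal identifiers, can be partly automated. The sketch below drops named identifier columns from a CSV file using Python's standard csv module. The column names in DIRECT_IDENTIFIERS and the sample rows are hypothetical, so replace them with the fields in your own survey export. Dropping direct identifiers is only a first step: combinations of remaining fields (for example department plus year plus age) can still identify a respondent, so review the output by hand.

```python
import csv
import io

# Hypothetical column names; list the direct identifiers in your own data.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "student_id"}


def drop_identifiers(source, target, identifiers=DIRECT_IDENTIFIERS):
    """Copy CSV rows from source to target, omitting identifier columns.

    source and target are open text file objects. Returns the kept columns.
    """
    reader = csv.DictReader(source)
    kept = [c for c in reader.fieldnames if c.strip().lower() not in identifiers]
    writer = csv.DictWriter(
        target, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not listed in kept are skipped
    return kept


# Made-up sample rows standing in for data/raw/survey_responses_raw.csv.
raw = io.StringIO(
    "name,email,year,satisfaction\n"
    "A. Student,a@example.edu,2,4\n"
    "B. Student,b@example.edu,3,5\n"
)
clean = io.StringIO()
kept_columns = drop_identifiers(raw, clean)
print(kept_columns)  # ['year', 'satisfaction']
print(clean.getvalue())
```

With real files, open the raw CSV for reading and the processed CSV for writing (both with newline="") and pass the two file objects in place of the StringIO buffers.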

By depositing your data and code in a reputable repository—complete with a DOI, comprehensive metadata, and clear folder organization—you cement the reproducibility and impact of your PhD research. Future scholars will thank you, and your work will stand on a foundation of openness and trust.



