Ensuring Reproducibility: Publishing Both Data and Code Under an Institutional Repository
Reproducible research means sharing not only your findings but also the underlying data and analysis code. Depositing these materials in a trusted repository (institutional or open-access) with a persistent identifier (DOI) supports transparency, credit, and reuse. Below you will find a step-by-step deposit guide, DOI assignment tips, and a sample folder structure to get you started.
1. Choosing a Repository
- Institutional Repository
- Pros: University‐branded, often free, integrates with campus IP management.
- Cons: May lack granular DOI support or community discoverability.
- Open‐Access Platforms
- Zenodo (CERN): Free; accepts up to 50 GB per record by default; mints a DOI automatically on publication.
- Figshare: Free up to 5 GB; makes items citable via DOIs; integrates with GitHub.
- Dryad, OSF: Discipline‐specific features, various pricing models.
2. Step-by-Step Deposit Guide
- Prepare Your Materials
- Anonymized Dataset (.csv or .xlsx): Remove direct identifiers; include a README explaining variables.
- Analysis Scripts (.R, .py, or Jupyter notebooks): Ensure they run from raw data to final tables/figures.
- Documentation: README.md with project overview, dependencies (e.g., R-package list), and usage instructions; LICENSE file (e.g., CC BY 4.0 for data; MIT for code).
- Log In & Create New Submission
- For Zenodo: Sign in via ORCID/GitHub → “New Upload” → drag & drop files.
- For Institutional: Navigate to your university archive portal → “Deposit Research Output” form.
- Fill Metadata Fields
- Title: Reflects your study (e.g., “Survey Data and R Code for Hybrid Instruction Study, 2025”).
- Authors & Affiliations: Include your ORCID iD.
- Description/Abstract: Briefly summarize data collection, study scope, and code purpose.
- Keywords/Subjects: Enhance discoverability (e.g., “educational research,” “R scripts,” “student surveys”).
- License Selection: Choose an open license for maximum reuse.
- Review & Publish
- Preview your submission; fix any warnings.
- Click “Publish” (or “Submit for Review” if institutional).
- Zenodo and Figshare mint a DOI as soon as you publish; with a university repository, you will typically receive one after the submission is reviewed.
- Link From Your Dissertation & Manuscripts
- In your methods or data-availability section, include:
The anonymized dataset and analysis scripts are available at Zenodo: DOI 10.5281/zenodo.1234567
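The "Prepare Your Materials" step can be sketched in Python. This is a minimal illustration under stated assumptions, not a complete de-identification procedure: the column names (`name`, `email`, `student_id`), the salt value, and the helper functions are all hypothetical, and indirect identifiers (free-text answers, rare combinations of attributes) still need manual review.

```python
import csv
import hashlib
import io

# Hypothetical column names; adjust them to your own survey export.
DIRECT_IDENTIFIERS = {"name", "email"}
ID_COLUMN = "student_id"


def pseudonymize(value: str, salt: str) -> str:
    """Replace an ID with a short salted SHA-256 digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def anonymize_rows(rows, salt):
    """Drop direct-identifier columns and pseudonymize the ID column."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        if ID_COLUMN in kept:
            kept[ID_COLUMN] = pseudonymize(kept[ID_COLUMN], salt)
        cleaned.append(kept)
    return cleaned


def anonymize_csv(raw_text: str, salt: str) -> str:
    """Read CSV text, anonymize it, and return the new CSV text."""
    reader = csv.DictReader(io.StringIO(raw_text))
    cleaned = anonymize_rows(list(reader), salt)
    out = io.StringIO()
    if cleaned:
        writer = csv.DictWriter(
            out, fieldnames=list(cleaned[0].keys()), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(cleaned)
    return out.getvalue()


if __name__ == "__main__":
    sample = "student_id,name,email,score\n101,Asha,asha@example.org,4\n"
    print(anonymize_csv(sample, salt="keep-this-secret"))
```

Keep the salt out of the deposit: anyone who has it can re-derive the pseudonyms from a list of known IDs.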
3. Assigning & Using DOIs
- Automatic DOI Minting
- Zenodo and Figshare provide a DOI upon publication.
- Institutional DOI Requests
- Contact your library or digital-scholarship office to request a DOI for your dataset. The prefix belongs to the institution; the office assigns the suffix that identifies your item.
- Citation Format
Gupta, A. (2025). Hybrid Instruction Survey Data & R Code [Data set]. Zenodo. <Enter the link to the artifact here>
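As a small illustration of the citation pattern above, the sketch below assembles the same string from its parts and turns a DOI into a resolvable `https://doi.org/` link. The `doi_to_url` and `format_citation` helpers and their parameter names are hypothetical, the pattern is a loose shape check rather than a full validator, and the DOI is the placeholder used earlier in this guide, not a real record.

```python
import re

# Loose shape check: "10.", a 4-9 digit registrant prefix, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the resolvable URL for a DOI, or raise if it looks malformed."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Not a DOI-shaped string: {doi!r}")
    return f"https://doi.org/{doi}"


def format_citation(author, year, title, repository, doi, kind="Data set"):
    """Assemble a dataset citation in the style shown above."""
    return f"{author} ({year}). {title} [{kind}]. {repository}. {doi_to_url(doi)}"


if __name__ == "__main__":
    print(
        format_citation(
            "Gupta, A.",
            2025,
            "Hybrid Instruction Survey Data & R Code",
            "Zenodo",
            "10.5281/zenodo.1234567",  # placeholder DOI from this guide
        )
    )
```

Citing the `https://doi.org/` form rather than the repository's landing-page URL keeps the reference valid even if the repository reorganizes its site.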
4. Sample Folder Structure
Hybrid_Instruction_Study_2025/
├── data/
│   ├── raw/
│   │   └── survey_responses_raw.csv
│   ├── processed/
│   │   └── survey_responses_anonymized.csv
│   └── README_data.md
├── code/
│   ├── 01_data_cleaning.R
│   ├── 02_analysis.R
│   └── 03_visualizations.R
├── docs/
│   ├── README.md        # Overview & instructions
│   ├── LICENSE          # Data: CC BY 4.0; Code: MIT
│   └── CITATION.cff     # Citation metadata
└── outputs/
    ├── tables/
    │   └── table1_summary.csv
    └── figures/
        └── fig1_trend.png
- data/raw/: Untouched original files (for archive only).
- data/processed/: Anonymized, analysis-ready datasets.
- code/: Modular scripts following a logical pipeline.
- docs/: Documentation, license, and citation files.
- outputs/: Exported tables and figures ready for publication.
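If you would rather generate this skeleton than create it by hand, the following Python sketch builds the folders and empty placeholder files from the sample structure. The `LAYOUT` mapping and the `scaffold` function are hypothetical names introduced for illustration; the data files and outputs are left out because your own pipeline produces them.

```python
from pathlib import Path
import tempfile

# Layout copied from the sample structure: folder -> placeholder files.
LAYOUT = {
    "data/raw": [],
    "data/processed": [],
    "data": ["README_data.md"],
    "code": ["01_data_cleaning.R", "02_analysis.R", "03_visualizations.R"],
    "docs": ["README.md", "LICENSE", "CITATION.cff"],
    "outputs/tables": [],
    "outputs/figures": [],
}


def scaffold(root: Path) -> list[Path]:
    """Create the folders and placeholder files; return the placeholder paths."""
    placeholders = []
    for folder, files in LAYOUT.items():
        target = root / folder
        target.mkdir(parents=True, exist_ok=True)
        for name in files:
            path = target / name
            path.touch(exist_ok=True)  # leaves existing content untouched
            placeholders.append(path)
    return placeholders


if __name__ == "__main__":
    # Demo in a temporary directory so nothing is written to your project.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "Hybrid_Instruction_Study_2025"
        scaffold(project)
        for p in sorted(project.rglob("*")):
            print(p.relative_to(project))
```

Because `mkdir` and `touch` are called with `exist_ok=True`, rerunning the function on an existing project adds whatever is missing without overwriting files you have already written.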
5. Quick‐Start Checklist
- Anonymize all personal identifiers before deposit.
- Include clear README and license files.
- Ensure code runs end‐to‐end on a fresh machine.
- Choose an open license (CC BY, MIT) for maximum reuse.
- Deposit data and code together; link with a DOI in your dissertation.
- Announce your repository link in conference presentations and your ORCID profile.
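Part of this checklist can be automated before you upload. The sketch below only checks that documentation and license files exist and are non-empty and that the processed-data and code folders contain something. It assumes the sample folder structure from this guide; `REQUIRED_FILES`, `REQUIRED_DIRS`, and `check_deposit` are hypothetical names. It cannot confirm that data are truly anonymized or that the code runs end-to-end, so those items still need a manual pass.

```python
from pathlib import Path
import tempfile

# Paths follow the sample folder structure in this guide; adapt as needed.
REQUIRED_FILES = ["docs/README.md", "docs/LICENSE", "data/README_data.md"]
REQUIRED_DIRS = ["data/processed", "code"]


def check_deposit(root: Path) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for rel in REQUIRED_FILES:
        path = root / rel
        if not path.is_file():
            problems.append(f"missing file: {rel}")
        elif path.stat().st_size == 0:
            problems.append(f"empty file: {rel}")
    for rel in REQUIRED_DIRS:
        folder = root / rel
        if not folder.is_dir() or not any(folder.iterdir()):
            problems.append(f"missing or empty folder: {rel}")
    return problems


if __name__ == "__main__":
    # Demo against an empty temporary folder: every check should fail.
    with tempfile.TemporaryDirectory() as tmp:
        for problem in check_deposit(Path(tmp)):
            print(problem)
```

Returning a list of problems instead of raising on the first one lets you fix everything in a single pass.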
By depositing your data and code in a reputable repository—complete with a DOI, comprehensive metadata, and clear folder organization—you cement the reproducibility and impact of your PhD research. Future scholars will thank you, and your work will stand on a foundation of openness and trust.