
How to Organize Research Data: Folder Structure & Naming Conventions

By Francesco Villasmunta

Poor data organization is one of the most common reasons experiments can't be reproduced. This guide gives you a battle-tested system for folder structure, file naming, and metadata that scales from a single experiment to a multi-year PhD project. For hands-on utilities to implement these practices, check out our Research Toolkit.

What You Will Learn

1. The Golden Rules
2. Folder Structure Template
3. File Naming Conventions
4. Version Control for Non-Coders
5. Metadata & Data Dictionaries
6. Backup Strategy

1. The Golden Rules

Never modify raw data

Treat raw data files as read-only. All transformations go to a separate processed/ folder.

Use consistent naming

Pick a naming convention on day one and follow it for every file in the project.

Document everything

Future-you is a stranger. Write README files and data dictionaries.

Automate what you can

Scripts are more reliable than memory. If a task takes 20 clicks, write a script for it instead.
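The first rule can be enforced by the file system rather than by discipline alone. A minimal sketch in Python, assuming the data/raw/ layout from the template in the next section; `lock_raw_data` is a hypothetical helper name, not part of any library:

```python
from __future__ import annotations

import stat
from pathlib import Path

# All write-permission bits (owner, group, others)
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def lock_raw_data(raw_dir: str | Path) -> list[Path]:
    """Clear the write bits on every file under raw_dir and return the locked files.

    Folders stay writable, so new raw files can still be added. On Windows,
    chmod only toggles the read-only flag, which has the same practical effect.
    """
    locked = []
    for path in sorted(Path(raw_dir).rglob("*")):
        if path.is_file():
            # S_IMODE keeps only the permission bits before we clear the write bits
            path.chmod(stat.S_IMODE(path.stat().st_mode) & ~WRITE_BITS)
            locked.append(path)
    return locked


if __name__ == "__main__":
    raw = Path("data/raw")
    if raw.is_dir():  # only act when run from a project root that has data/raw
        for f in lock_raw_data(raw):
            print(f"read-only: {f}")
```

Re-run it after adding new raw files. Editing a locked file then requires deliberately restoring write permission first, which is exactly the friction the rule asks for.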

2. Folder Structure Template

text
project-name/
├── README.md                  # Project overview, how to reproduce
├── data/
│   ├── raw/                   # Original, unmodified data files
│   │   ├── 2025-01-15_xrd-scan.csv
│   │   └── 2025-01-16_sem-images/
│   ├── processed/             # Cleaned, transformed data
│   │   └── xrd-peaks-extracted.csv
│   └── external/              # Data from external sources
│       └── reference-patterns.csv
├── figures/
│   ├── exploratory/           # Quick plots for analysis
│   └── publication/           # Final figures for paper
│       ├── figure-1-xrd.pdf
│       └── figure-1-xrd.png
├── scripts/
│   ├── 01-import-and-clean.py
│   ├── 02-analyze.py
│   └── 03-generate-figures.py
├── notebooks/                 # Jupyter notebooks for exploration
├── docs/
│   ├── data-dictionary.md     # What each column means
│   └── protocol.md            # Experiment protocol
└── results/
    └── statistical-summary.csv
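If you would rather generate this skeleton from code than by hand, a few lines of pathlib are enough. This is a sketch: `create_project` and the placeholder README text are hypothetical, and the empty `.gitkeep` files are a common convention (not a Git feature) for keeping otherwise empty folders in a repository:

```python
from __future__ import annotations

from pathlib import Path

# Sub-folders from the template; README.md is created separately below.
FOLDERS = [
    "data/raw",
    "data/processed",
    "data/external",
    "figures/exploratory",
    "figures/publication",
    "scripts",
    "notebooks",
    "docs",
    "results",
]


def create_project(root: str | Path) -> Path:
    """Create the folder skeleton under root; safe to re-run on an existing project."""
    root = Path(root)
    for folder in FOLDERS:
        path = root / folder
        path.mkdir(parents=True, exist_ok=True)
        # Git does not track empty folders; an empty .gitkeep file works around that
        (path / ".gitkeep").touch()
    readme = root / "README.md"
    if not readme.exists():  # never overwrite an existing README
        readme.write_text(
            f"# {root.name}\n\nProject overview and how to reproduce the analysis.\n",
            encoding="utf-8",
        )
    return root
```

Call `create_project("project-name")` once at the start of a project; re-running it only adds missing folders and leaves an existing README.md untouched.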

Why numbered scripts?

Prefixing scripts with numbers (01-, 02-, 03-) shows the execution order. Anyone can reproduce your analysis by running them in sequence.
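The numbering also lets a short runner execute the whole pipeline in order. A minimal sketch, assuming the scripts live in scripts/, use two-digit prefixes, and are run from the project root so their relative paths resolve; `run_pipeline` is a hypothetical name:

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def run_pipeline(scripts_dir: str | Path = "scripts") -> list[Path]:
    """Run every NN-name.py script in scripts_dir in numeric order and return them.

    check=True raises CalledProcessError at the first failing script, so later
    steps never run on incomplete outputs.
    """
    # Zero-padded prefixes (01-, 02-, ...) make plain alphabetical sorting numeric
    scripts = sorted(Path(scripts_dir).glob("[0-9][0-9]-*.py"))
    for script in scripts:
        print(f"Running {script.name}")
        # sys.executable reuses the current Python (and its virtual environment)
        subprocess.run([sys.executable, str(script)], check=True)
    return scripts


if __name__ == "__main__":
    if Path("scripts").is_dir():
        run_pipeline()
```

Because the glob only matches numbered files, helper modules in scripts/ without a numeric prefix are skipped rather than executed.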

Managing Figures

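Keep both vector (PDF/SVG) and raster (PNG/TIFF) versions of every figure in figures/publication/: vector files scale without quality loss for print, while raster files suit slides, web pages, and submission systems that require them.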

Plotivy lets you easily export your figures in both vector and raster formats directly from the visualization editor - no extra tooling needed.
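If you build figures in a script instead, a small helper can write every format from the same Figure object so the vector and raster versions never drift apart. A sketch assuming Matplotlib; `save_publication_figure`, the default output folder, and the 600 DPI default are illustrative choices, not journal requirements:

```python
from pathlib import Path


def save_publication_figure(fig, stem, out_dir="figures/publication", dpi=600):
    """Save a Matplotlib figure as PDF and SVG (vector) plus PNG (raster) under one stem."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for ext in ("pdf", "svg", "png"):
        path = out / f"{stem}.{ext}"
        # dpi sets the PNG resolution; in PDF/SVG it only affects rasterized elements
        fig.savefig(path, dpi=dpi, bbox_inches="tight")
        paths.append(path)
    return paths
```

For example, after `fig, ax = plt.subplots()` and your plotting calls, `save_publication_figure(fig, "figure-1-xrd")` writes figure-1-xrd.pdf, .svg, and .png side by side. Add "tiff" to the tuple if a journal asks for TIFF; Matplotlib writes it through Pillow, which it already depends on.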

Check out our guide on Common Visualization Mistakes to ensure your figures are publication-ready, or review our Peer Review Guide. For a comprehensive overview, see our Complete Guide to Scientific Data Visualization.

Folder Structure Generator

Create this entire project structure with one click. Download a shell script that sets up your folder hierarchy automatically.


3. File Naming Conventions

| Rule | Good | Bad |
|------|------|-----|
| Date first (ISO 8601) | 2025-01-15_xrd-scan.csv | xrd scan jan.csv |
| No spaces | cell-growth-data.csv | cell growth data.csv |
| Lowercase + hyphens | dose-response-drug-a.csv | DoseResponse_DrugA.csv |
| Descriptive names | mouse-weight-cohort-3.csv | data2_final_v3.csv |
| Zero-padded version numbers | analysis-v02.py | analysis-FINAL-FINAL.py |
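These rules are regular enough to check automatically. A sketch, assuming the convention in the table (optional ISO date plus underscore, lowercase hyphenated words, lowercase extension); the helper names are hypothetical, and the pattern only checks the date's shape, not whether it is a real calendar date:

```python
from __future__ import annotations

import re
from datetime import date

# Optional ISO-8601-shaped date + "_", then lowercase words joined by hyphens,
# then a lowercase extension. Version suffixes like "-v02" fit the word pattern.
NAME_PATTERN = re.compile(r"(?:\d{4}-\d{2}-\d{2}_)?[a-z0-9]+(?:-[a-z0-9]+)*\.[a-z0-9]+")


def make_filename(description: str, ext: str, day: date | None = None) -> str:
    """Turn free text into a convention-compliant name, e.g. 2025-01-15_xrd-scan.csv."""
    # Lowercase, then collapse every run of other characters into a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    prefix = f"{day.isoformat()}_" if day else ""
    return f"{prefix}{slug}.{ext.lower().lstrip('.')}"


def is_valid_name(name: str) -> bool:
    """True if name follows the convention (it cannot judge how descriptive it is)."""
    return NAME_PATTERN.fullmatch(name) is not None
```

Running `is_valid_name` over a folder listing flags spaces, capitals, and stray underscores, but a name like data2.csv still passes; whether a name is descriptive remains a human judgment.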

The "final" problem

If you have files named thesis-FINAL.docx, thesis-FINAL-v2.docx, and thesis-FINAL-REAL.docx, you need version control, not better naming.


4. Version Control for Non-Coders

Git + GitHub

Recommended

Best for code and scripts. Free private repos. Track every change with full history.

Google Drive versioning

Easiest

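Right-click a file (or use File > Version history in Google Docs, Sheets, and Slides) to see earlier versions. No setup required. Good for documents. Older versions of uploaded files may be deleted after 30 days or 100 versions unless you mark them "Keep forever".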

OSF (Open Science Framework)

Academic

Purpose-built for research. DOI minting, preregistration, data archiving. Free. Learn more in our FAIR Principles resources.

5. Metadata & Data Dictionaries

Data dictionary template

markdown
# data-dictionary.md

## Dataset: cell-growth-data.csv

| Column        | Type    | Unit  | Description                       |
|---------------|---------|-------|-----------------------------------|
| sample_id     | string  | -     | Unique sample identifier          |
| time_hours    | float   | hours | Time since treatment              |
| cell_count    | integer | cells | Viable cell count (trypan blue)   |
| treatment     | string  | -     | Drug name or "control"            |
| concentration | float   | uM    | Drug concentration                |
| replicate     | integer | -     | Biological replicate number (1-3) |

## Collection Notes

- Instrument: Countess II (Thermo Fisher)
- Operator: J. Smith
- Date range: 2025-01-10 to 2025-01-17

Every dataset needs a data dictionary

Columns named "conc", "val", or "x" are meaningless six months later. A data dictionary takes five minutes to write and saves hours of confusion.
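A data dictionary only helps if it stays in sync with the data. A minimal sketch that compares a CSV header with the Column entries of a dictionary written in the markdown template format; it assumes one table per file with a header cell named "Column", and the function names are hypothetical:

```python
import csv
from pathlib import Path


def dictionary_columns(md_text):
    """Column names listed in the first column of a data-dictionary markdown table."""
    columns = []
    for line in md_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # headings, notes, blank lines
        first = line.strip("|").split("|")[0].strip()
        if first == "Column" or set(first) <= set("-: "):
            continue  # header row or the |---|---| separator
        columns.append(first)
    return columns


def check_against_dictionary(csv_path, dictionary_path):
    """Compare a CSV header with its data dictionary in both directions."""
    # utf-8-sig also strips the byte-order mark some spreadsheet exports add
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), [])
    documented = dictionary_columns(Path(dictionary_path).read_text(encoding="utf-8"))
    return {
        "undocumented": [c for c in header if c not in documented],
        "missing_from_csv": [c for c in documented if c not in header],
    }
```

An empty result in both lists means every column in the file is documented and every documented column is present; anything else is a prompt to update one side before the names fade from memory.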

Data Dictionary Generator

Create a clear data dictionary for your datasets. Define variables, units, and types in minutes.


Metadata Generator

Generate standardized YAML/JSON metadata files for your research project. Fill in the form, download the file.


README Generator

Create professional README documentation for your research project with structured templates.


Learn more: Ensure your data follows open science standards with our FAIR Principles Guide, or explore FAIR Principles in the Research Toolkit.
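If you prefer to keep metadata under version control as code, a script can write the file and list the data files automatically. A sketch only: the field names below are illustrative rather than a formal standard, and `write_metadata` is a hypothetical helper:

```python
import json
from datetime import date
from pathlib import Path


def write_metadata(project_dir, title, creators, description, license_id, keywords):
    """Write metadata.json at the project root and return its path.

    The field names are illustrative, not a formal standard; map them to the
    schema your repository requires (for example DataCite when minting DOIs).
    """
    root = Path(project_dir)
    data_dir = root / "data"
    # List every file under data/ with a portable, forward-slash relative path
    data_files = (
        sorted(p.relative_to(root).as_posix() for p in data_dir.rglob("*") if p.is_file())
        if data_dir.is_dir()
        else []
    )
    metadata = {
        "title": title,
        "creators": creators,
        "description": description,
        "license": license_id,  # e.g. an SPDX identifier such as "CC-BY-4.0"
        "keywords": keywords,
        "metadata_date": date.today().isoformat(),
        "data_files": data_files,
    }
    path = root / "metadata.json"
    path.write_text(json.dumps(metadata, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return path
```

Re-run it whenever data files are added so the file list stays current; JSON keeps the output readable by both people and tools.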

6. Backup Strategy

Follow the 3-2-1 rule:

3 copies

Keep at least 3 copies of critical data.

2 media types

Local drive + cloud storage (or external drive).

1 offsite

At least one backup should be in a different physical location.

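Daily: Automatic cloud sync (Google Drive, OneDrive, Dropbox). Sync also mirrors deletions and overwrites, so it complements, rather than replaces, the monthly snapshot.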

Weekly: Verify sync status, check for conflicts

Monthly: Export a snapshot to an external drive or institutional archive

Per-milestone: Tag a Git release when you submit a paper or complete an experiment
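Backups are only useful if the copies actually match. One way to do the weekly and monthly checks is a SHA-256 manifest of each copy, sketched here with the standard library; the function names and any backup paths you pass in are assumptions, not part of a specific tool:

```python
import hashlib
from pathlib import Path


def sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large raw files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's relative path (forward slashes) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(original, backup):
    """Return files missing from the backup and files whose contents differ."""
    missing = sorted(set(original) - set(backup))
    changed = sorted(k for k in original.keys() & backup.keys() if original[k] != backup[k])
    return {"missing": missing, "changed": changed}
```

Compare `build_manifest("data")` against the same folder on your external drive or archive; any path listed under "missing" or "changed" needs attention before you trust that copy.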

Experiment Checklist

Track your daily lab protocols with customizable checklists. Never miss a critical step again.


Data Licensing Guide

Choose the right license for sharing your research data openly. Compare CC, MIT, Apache, and more.



Tags: data organization, research, naming conventions, metadata, reproducibility


Francesco Villasmunta

Experimental Physicist & Photonics Researcher

Hands-on experience in silicon photonics, semiconductor fabrication (DRIE/ICP-RIE), optical simulation, and data-driven analysis. Built Plotivy to help researchers focus on discoveries instead of data struggles.

