Spectroscopy Data Analysis in Python

Peak fitting, baseline correction, and publication-ready visualization for Raman, FTIR, UV-Vis, NMR, and fluorescence spectroscopy.

Why this guide uses Python - and why you do not have to write it

Python is the standard for spectroscopy analysis. PLOTIVY removes the implementation friction.

scipy, numpy, and matplotlib provide every step of the spectroscopy pipeline - and produce results that are:

  • Reproducible: after a review, re-run with different smoothing parameters or an additional peak component by changing one line
  • Precise: smoothing window length, number of Gaussian components, baseline polynomial degree are explicit parameters in the script
  • Auditable: every fit result, every FWHM, every baseline subtraction is a documented step - nothing hidden in a GUI
  • Publication-ready: export at 600 DPI, control axis labels and residual panels precisely, match journal formatting requirements
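As an illustration of the last point, here is a minimal sketch of a two-panel figure (fit on top, residuals below) exported at 600 DPI. The data are synthetic, and the figure size, labels, and in-memory buffer are assumptions made for the example; journals differ in their exact requirements.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (illustrative only): one Lorentzian line plus noise.
rng = np.random.default_rng(4)
x = np.linspace(400.0, 600.0, 401)
model = 1.0 / (1.0 + ((x - 500.0) / 5.0) ** 2)
data = model + rng.normal(scale=0.02, size=x.size)

# Top panel: data and model. Bottom panel: residuals on a shared x axis.
fig, (ax_fit, ax_res) = plt.subplots(
    2, 1, sharex=True, figsize=(3.5, 3.0),
    gridspec_kw={"height_ratios": [3, 1]},
)
ax_fit.plot(x, data, ".", markersize=2, label="data")
ax_fit.plot(x, model, "-", linewidth=1, label="model")
ax_fit.set_ylabel("Intensity (a.u.)")
ax_fit.legend(frameon=False)

ax_res.plot(x, data - model, linewidth=0.8)
ax_res.axhline(0.0, color="k", linewidth=0.5)
ax_res.set_xlabel(r"Raman shift (cm$^{-1}$)")
ax_res.set_ylabel("Residual")
fig.tight_layout()

# Export at 600 DPI. A BytesIO buffer keeps the sketch self-contained;
# pass a file name such as "figure.png" (hypothetical) to write to disk instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=600)
plt.close(fig)
png_bytes = buffer.getvalue()
```

Because the DPI, figure size, and labels are explicit arguments, changing a journal's formatting requirement means changing one value and re-running.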

The friction is writing the code correctly. PLOTIVY generates and executes the Python code for you in the browser - upload your spectrum, describe the analysis, and the code is generated and run instantly. Your results remain a real Python script you can inspect, copy, and run independently.

This guide covers the concepts behind the code - what each function computes, why parameter choices matter - so you can verify and defend every step of your analysis.

The Spectroscopy Data Analysis Pipeline

Spectroscopy generates some of the most information-dense data in experimental science. A single Raman spectrum may contain hundreds of peaks encoding molecular fingerprints. An FTIR time series can track chemical reactions across thousands of wavenumber channels simultaneously. UV-Vis absorbance measurements underpin quantitative assays across chemistry, biology, and materials science.

In every case, the raw instrument output requires processing before it becomes a publishable result. The standard workflow follows a consistent sequence:

  1. Import the raw spectrum
  2. Correct the baseline
  3. Reduce noise with smoothing
  4. Identify peaks
  5. Fit quantitative models to those peaks
  6. Produce a figure that communicates the result clearly
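The first step can be sketched as follows, assuming the instrument exports a two-column, comma-separated text file (x axis, intensity). The column layout, delimiter, and sample values here are hypothetical; real export formats vary by vendor.

```python
import io

import numpy as np

# Stand-in for an exported file (made-up values). With real data, pass the
# file path to np.loadtxt instead of a StringIO object.
raw_text = """# wavenumber_cm-1, intensity
1800.0, 0.12
1799.0, 0.15
1798.0, 0.11
"""

data = np.loadtxt(io.StringIO(raw_text), delimiter=",", comments="#")
wavenumber, intensity = data[:, 0], data[:, 1]

# Some instruments export in descending order; sort so the x axis is
# ascending, which later steps (filtering, peak search) can rely on.
order = np.argsort(wavenumber)
wavenumber, intensity = wavenumber[order], intensity[order]
```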

Key insight: Python is widely used for this pipeline because scipy, numpy, and matplotlib provide every component - from signal filtering to nonlinear curve fitting - in a single, reproducible script.

Noise Reduction and Signal Preprocessing

Raw spectra almost always contain noise. Detector electronics, thermal fluctuations, and photon statistics introduce random variation that obscures weak features and complicates peak fitting.

The challenge is removing noise without distorting the peaks you care about - a smoothing filter that broadens a narrow Raman line or shifts its center frequency defeats the purpose of the measurement.

Savitzky-Golay Filtering

Savitzky-Golay filtering fits a local polynomial to successive windows of the spectrum, preserving peak shape, position, and relative height while averaging out high-frequency noise. Two parameters control the behavior:

  • Window length: wider for broad FTIR absorption bands; shorter for narrow Raman lines
  • Polynomial order: higher order preserves more peak character but reduces noise suppression
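A minimal sketch of the two parameters in action, using scipy.signal.savgol_filter on a synthetic narrow line. The spectrum, noise level, and window values are assumptions chosen for illustration, not recommended settings.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic spectrum (illustrative only): a narrow Lorentzian line plus noise.
rng = np.random.default_rng(0)
x = np.linspace(400.0, 600.0, 801)                  # 0.25 cm^-1 per point
clean = 1.0 / (1.0 + ((x - 500.0) / 4.0) ** 2)      # FWHM = 8 cm^-1 (32 points)
noisy = clean + rng.normal(scale=0.05, size=x.size)

# window_length is given in points and must be larger than polyorder.
# A window narrower than the peak keeps its shape largely intact.
smoothed = savgol_filter(noisy, window_length=11, polyorder=3)

# A window much wider than the peak suppresses more noise
# but visibly flattens the line.
oversmoothed = savgol_filter(noisy, window_length=101, polyorder=3)

print("peak height, clean:       ", clean.max())
print("peak height, smoothed:    ", smoothed.max())
print("peak height, oversmoothed:", oversmoothed.max())
```

Comparing the peak heights (or overlaying the three curves) is a quick way to check whether a chosen window is distorting the features you intend to fit.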

Baseline Correction

Fluorescence background in Raman spectra, scattering contributions in UV-Vis, and detector drift in NMR all produce slowly varying baselines that shift peak intensities.

Important: Always perform baseline correction before fitting. Fitting on uncorrected spectra will produce inflated peak areas and inaccurate center positions.
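One simple approach, sketched here under stated assumptions, is to fit a low-degree polynomial to regions known to be free of peaks and subtract it. The synthetic spectrum, the hand-picked peak-free mask, and the polynomial degree are all illustrative choices; other baseline methods exist and may suit a given technique better.

```python
import numpy as np

# Synthetic spectrum (illustrative only): a curved baseline plus one line.
x = np.linspace(200.0, 1800.0, 1601)
true_baseline = 0.5 + 2e-4 * (x - 200.0) + 1e-7 * (x - 200.0) ** 2
peak = 1.0 / (1.0 + ((x - 1000.0) / 6.0) ** 2)
spectrum = true_baseline + peak

# Hypothetical choice: treat everything farther than 150 cm^-1 from the
# line as peak-free. With real data this mask comes from inspection.
peak_free = np.abs(x - 1000.0) > 150.0

# Fit the baseline polynomial to the peak-free points only.
degree = 2
baseline_poly = np.polynomial.Polynomial.fit(x[peak_free], spectrum[peak_free], deg=degree)
baseline = baseline_poly(x)

corrected = spectrum - baseline
```

The polynomial degree is the explicit parameter to watch: too low leaves residual curvature, too high starts to follow the peaks themselves.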

Peak Identification and Quantitative Fitting

Once the spectrum is clean, the next step is identifying which features are real peaks and which are residual noise. Manual inspection works for simple spectra with a handful of well-separated lines, but Raman maps with thousands of spectra demand automated detection.

The scipy.signal.find_peaks function is the standard tool. The Peak Detection technique page covers how to tune its parameters - minimum height, prominence, and minimum distance between peaks - for different spectral types.
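A minimal sketch of the three parameters named above, applied to a synthetic two-line spectrum. The thresholds are assumptions tuned to this example; note that distance is expressed in samples, not in x-axis units.

```python
import numpy as np
from scipy.signal import find_peaks

def lorentzian(x, amp, cen, gamma):
    """Lorentzian with peak height amp, center cen, half-width at half-maximum gamma."""
    return amp * gamma**2 / ((x - cen) ** 2 + gamma**2)

# Synthetic spectrum (illustrative only): two lines plus weak noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 100.0, 1001)
signal = lorentzian(x, 1.0, 30.0, 1.5) + lorentzian(x, 0.6, 60.0, 2.0)
noisy = signal + rng.normal(scale=0.01, size=x.size)

# height: absolute minimum intensity of a peak
# prominence: how far a peak stands out from its surroundings
# distance: minimum separation between neighboring peaks, in samples
idx, props = find_peaks(noisy, height=0.2, prominence=0.2, distance=20)

peak_positions = x[idx]
peak_heights = props["peak_heights"]
print(peak_positions, peak_heights)
```

Prominence is usually the most useful of the three for rejecting noise, because small wiggles on top of a real peak have high absolute height but low prominence.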

Multi-peak deconvolution - separating overlapping lines into individual components - is one of the most common tasks in Raman and fluorescence analysis. Set up composite models with multiple Gaussian or Lorentzian terms and use detected peak positions as initial guesses.
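The sketch below shows one way this could look for two overlapping Lorentzian lines, with scipy.signal.find_peaks supplying the initial centers and heights for scipy.optimize.curve_fit. The synthetic data, the initial width guess of 3.0, and the helper names are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import find_peaks

def lorentzian(x, amp, cen, gamma):
    """Lorentzian with peak height amp, center cen, half-width at half-maximum gamma."""
    return amp * gamma**2 / ((x - cen) ** 2 + gamma**2)

def two_lorentzians(x, a1, c1, g1, a2, c2, g2):
    """Composite model: sum of two Lorentzian components."""
    return lorentzian(x, a1, c1, g1) + lorentzian(x, a2, c2, g2)

# Synthetic overlapping doublet (illustrative only).
rng = np.random.default_rng(2)
x = np.linspace(30.0, 70.0, 801)
true_params = (1.0, 48.0, 2.0, 0.7, 54.0, 2.0)
y = two_lorentzians(x, *true_params) + rng.normal(scale=0.01, size=x.size)

# Detected maxima give the initial guesses for heights and centers.
idx, _ = find_peaks(y, prominence=0.1)
if len(idx) != 2:
    raise RuntimeError("expected two detected peaks for this two-component model")

width_guess = 3.0  # assumed starting half-width, in x-axis units
p0 = [y[idx[0]], x[idx[0]], width_guess, y[idx[1]], x[idx[1]], width_guess]

popt, pcov = curve_fit(two_lorentzians, x, y, p0=p0)
perr = np.sqrt(np.diag(pcov))  # one-sigma parameter uncertainties

# For this parameterization the FWHM of each component is 2 * gamma.
fwhm_1, fwhm_2 = 2.0 * abs(popt[2]), 2.0 * abs(popt[5])
print("centers:", popt[1], popt[4], "FWHM:", fwhm_1, fwhm_2)
```

When the components overlap so strongly that only one maximum is detected, the initial guesses have to come from elsewhere, for example from known line positions of the material.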

For quantitative spectroscopy, the Beer-Lambert law establishes a linear relationship between absorbance and concentration. The Linear Regression page provides the full workflow: measure standards, fit a calibration line with error bars, compute R-squared, and use the model to predict unknown concentrations with uncertainty estimates.
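A minimal calibration sketch using scipy.stats.linregress follows. The concentrations and absorbances are made-up illustrative numbers, and the uncertainty on the predicted concentration uses the standard inverse-prediction formula for a single measurement of the unknown, which is an assumption about how the unknown was measured.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical calibration standards (made-up values for illustration).
conc = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])                   # mg/L
absorbance = np.array([0.002, 0.101, 0.198, 0.305, 0.399, 0.502])

# Beer-Lambert: absorbance is linear in concentration.
fit = linregress(conc, absorbance)
r_squared = fit.rvalue**2

# Predict the concentration of an unknown from its absorbance.
a_unknown = 0.250
c_unknown = (a_unknown - fit.intercept) / fit.slope

# Standard error of the predicted concentration (one reading of the unknown).
n = conc.size
residuals = absorbance - (fit.intercept + fit.slope * conc)
s_y = np.sqrt(np.sum(residuals**2) / (n - 2))    # residual standard deviation
s_xx = np.sum((conc - conc.mean()) ** 2)
c_unknown_err = (s_y / abs(fit.slope)) * np.sqrt(
    1.0 + 1.0 / n + (a_unknown - absorbance.mean()) ** 2 / (fit.slope**2 * s_xx)
)

print(f"slope = {fit.slope:.5f} +/- {fit.stderr:.5f}, R^2 = {r_squared:.5f}")
print(f"unknown = {c_unknown:.2f} +/- {c_unknown_err:.2f} mg/L")
```

The prediction is only meaningful inside the calibrated range; absorbances above the highest standard call for dilution rather than extrapolation.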

Gaussian vs Lorentzian vs Voigt

Choosing the wrong line shape is one of the most common errors in spectral fitting. Use this table to select the correct model for your technique.

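Line Shape | Best For | Avoid For | Key Notes
Gaussian | Fluorescence emission, UV-Vis absorption bands, inhomogeneous broadening | Raman lines (Lorentzian is usually more accurate) | FWHM = 2.355σ; faster decay in the tails
Lorentzian | Raman peaks, NMR lines, homogeneous broadening | Broad fluorescence backgrounds | Heavier tails than Gaussian; FWHM = 2γ, with γ the half-width at half-maximum
Voigt | FTIR absorption bands, X-ray diffraction peaks | Simple spectra where extra parameters are not justified | Most general, but needs separate Gaussian (σ) and Lorentzian (γ) widths, one more parameter than either pure shape; scipy.special.voigt_profile evaluates it, and a pseudo-Voigt (weighted sum of a Gaussian and a Lorentzian) is a common cheaper approximation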
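The width relations for the three shapes can be checked numerically. The sketch below defines each shape, using scipy.special.voigt_profile for the Voigt, and measures the FWHM directly from the sampled curve. The helper numerical_fwhm and the width values are assumptions for the example; γ is taken as the Lorentzian half-width at half-maximum, the convention voigt_profile uses.

```python
import numpy as np
from scipy.special import voigt_profile

def gaussian(x, cen, sigma):
    """Unit-height Gaussian with standard deviation sigma."""
    return np.exp(-0.5 * ((x - cen) / sigma) ** 2)

def lorentzian(x, cen, gamma):
    """Unit-height Lorentzian with half-width at half-maximum gamma."""
    return gamma**2 / ((x - cen) ** 2 + gamma**2)

def numerical_fwhm(x, y):
    """FWHM of a single peak, by linear interpolation at half the maximum."""
    half = y.max() / 2.0
    above = np.where(y >= half)[0]
    i0, i1 = above[0], above[-1]
    x_left = np.interp(half, [y[i0 - 1], y[i0]], [x[i0 - 1], x[i0]])
    x_right = np.interp(half, [y[i1 + 1], y[i1]], [x[i1 + 1], x[i1]])
    return x_right - x_left

x = np.linspace(-50.0, 50.0, 20001)
sigma, gamma = 2.0, 2.0  # illustrative widths

fwhm_gauss = numerical_fwhm(x, gaussian(x, 0.0, sigma))        # about 2.355 * sigma
fwhm_lorentz = numerical_fwhm(x, lorentzian(x, 0.0, gamma))    # about 2 * gamma
fwhm_voigt = numerical_fwhm(x, voigt_profile(x, sigma, gamma))  # wider than either

print(fwhm_gauss, fwhm_lorentz, fwhm_voigt)
```

The Voigt width exceeds both component widths but is smaller than their sum, which is why reporting a FWHM without naming the line shape and width convention makes values hard to compare.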

Multivariate Spectral Analysis

When you have many spectra - from different samples, different positions on a surface map, or different time points - univariate peak analysis may not capture the full picture. Principal Component Analysis (PCA) treats each spectrum as a point in a high-dimensional space and finds the directions of maximum variance.

PCA is widely used in Raman mapping to distinguish chemical phases across a surface, in NIR spectroscopy for material classification, and in process analytical technology (PAT) for monitoring manufacturing.

What PCA tells you about your spectra

  • Score plots: which samples (or spatial positions) cluster together chemically
  • Loading vectors: which spectral features (wavenumbers) drive the separation
  • Explained variance: how many components are needed to describe most of the variation
  • Outliers: samples that are chemically distinct from the rest of the dataset
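Scores, loadings, and explained variance can all be obtained from a singular value decomposition of the mean-centered spectra. The sketch below uses numpy only and a synthetic two-group dataset; the group structure, noise level, and variable names are assumptions for illustration.

```python
import numpy as np

# Synthetic dataset (illustrative only): 40 spectra that are mixtures of two
# components, with 20 samples rich in component A and 20 rich in component B.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 100.0, 500)
comp_a = np.exp(-0.5 * ((x - 30.0) / 3.0) ** 2)
comp_b = np.exp(-0.5 * ((x - 70.0) / 3.0) ** 2)
frac_a = np.concatenate([rng.uniform(0.8, 1.0, 20), rng.uniform(0.0, 0.2, 20)])
spectra = (
    np.outer(frac_a, comp_a)
    + np.outer(1.0 - frac_a, comp_b)
    + rng.normal(scale=0.01, size=(40, x.size))
)

# Rows are spectra, columns are spectral channels. Center each channel.
centered = spectra - spectra.mean(axis=0)

# PCA via SVD: rows of Vt are the loading vectors,
# U * S gives the scores of each spectrum on each component.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * S
loadings = Vt
explained_variance_ratio = S**2 / np.sum(S**2)

print("variance explained by PC1:", explained_variance_ratio[0])
print("channel with largest PC1 loading:", x[np.argmax(np.abs(loadings[0]))])
```

Here the PC1 scores split the two groups by sign, and the PC1 loading peaks at the positions of the two component bands, which is the pattern to look for when reading score and loading plots together.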

Common Mistakes to Avoid

  • Smoothing too aggressively: a window that is too wide broadens peaks and shifts their centers.
  • Skipping baseline subtraction before fitting - fluorescence background in Raman inflates peak integrated areas.
  • Using a Gaussian model for Raman lines; Raman peaks are typically closer to Lorentzian, and the choice affects the reported FWHM.
  • Providing initial guesses far from the true peak center - curve_fit may converge to a local minimum or fail to converge.
  • Comparing FWHM values across instruments with different spectral resolutions without correcting for the instrument function.
  • Running PCA on raw spectra without baseline correction - the first principal component will reflect baseline variation, not chemistry.

Analyze Your Spectra Now

Upload spectroscopy data and generate publication-ready peak fits, smoothed spectra, and calibration curves with AI-powered analysis.