Deconvoluting Real GC-MS Peaks

Applying our pipeline to the Copenhagen Soft Camel Cheese dataset

Warning: spoiler alert, of course it does not work 😭. Well, at least I tried.

1. The Pipeline

We ran the full pipeline on all 24 GC-MS analyses from the Copenhagen dataset:

  1. Preprocess — Gaussian smoothing (σ=2) + per-ion AsLS baseline removal
  2. Peak picking — prominence-based detection on TIC, edge refinement, overlap resolution
  3. Component estimation — SVD features + RandomForest (trained on synthetic data)
  4. MCR-ALS — recover elution profiles and spectra
  5. Identification — cosine similarity search against 9,971 MassBank reference spectra
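To make steps 1 and 2 concrete, here is a minimal sketch of what they could look like with SciPy. It is an illustration under assumptions, not the exact code behind the results: the function names, the AsLS parameters (`lam`, `p`, `n_iter`), and the 5% prominence threshold are placeholders I picked for the example, and edge refinement and overlap resolution are left out. Only σ=2 comes from the pipeline description above.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks


def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline; lam and p are assumed values."""
    n = len(y)
    # Second-order difference operator, shape (n - 2, n).
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        W = sparse.diags(w)
        z = spsolve((W + penalty).tocsc(), w * y)
        # Points above the baseline get a small weight, points below a large one.
        w = np.where(y > z, p, 1.0 - p)
    return z


def preprocess(X, sigma=2.0):
    """X has shape (n_scans, n_ions): smooth along time, subtract per-ion baselines."""
    smoothed = gaussian_filter1d(X, sigma=sigma, axis=0)
    baselines = np.column_stack(
        [asls_baseline(smoothed[:, j]) for j in range(X.shape[1])]
    )
    return np.clip(smoothed - baselines, 0.0, None)


def pick_peaks(X, prominence_frac=0.05):
    """Prominence-based peak detection on the total ion chromatogram (TIC)."""
    tic = X.sum(axis=1)
    peaks, props = find_peaks(tic, prominence=prominence_frac * tic.max())
    return peaks, props["left_bases"], props["right_bases"]


# Synthetic check: two peaks on a drifting baseline, 5 ions.
rng = np.random.default_rng(0)
t = np.arange(400)
profiles = np.column_stack([
    100.0 * np.exp(-0.5 * ((t - 100) / 5.0) ** 2),
    60.0 * np.exp(-0.5 * ((t - 250) / 5.0) ** 2),
])
spectra = np.array([[1.0, 0.5, 0.0, 0.2, 0.8],
                    [0.1, 0.0, 1.0, 0.6, 0.3]])
X_raw = profiles @ spectra + 0.02 * t[:, None] + rng.normal(0.0, 0.5, (400, 5))
X_clean = preprocess(X_raw)
peaks, left, right = pick_peaks(X_clean)
print(peaks)
```

The bases returned by `find_peaks` are only rough peak edges, which is presumably why a separate edge refinement step is needed on real data.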

2. Large Peaks

These are the strongest peaks in the dataset. Gray lines are the individual ion traces; colored lines are the MCR-ALS component models, scaled to match the ion intensity range.
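For readers who want to see what "component models scaled to match the ion intensity range" could mean in code, here is a bare-bones sketch. Everything in it is an assumption for illustration: `mcr_als` is a minimal alternating non-negative least squares loop (not the implementation used for the plots), the time-window initialization is a simple stand-in for a real initial guess, and `scale_for_plot` is a hypothetical helper that stretches each profile so its maximum equals the largest ion intensity in the window.

```python
import numpy as np
from scipy.optimize import nnls


def mcr_als(X, n_components, n_iter=100):
    """Factor X (n_scans, n_ions) ~ C @ S.T with C, S >= 0 by alternating NNLS."""
    n_scans, n_ions = X.shape
    # Assumed initial guess: each component lives in its own time window.
    C = np.zeros((n_scans, n_components))
    for j, block in enumerate(np.array_split(np.arange(n_scans), n_components)):
        C[block, j] = 1.0
    for _ in range(n_iter):
        # Fix C, solve each ion trace for non-negative spectral loadings.
        S = np.array([nnls(C, X[:, j])[0] for j in range(n_ions)])
        # Fix S, solve each scan for non-negative elution profile values.
        C = np.array([nnls(S, X[i, :])[0] for i in range(n_scans)])
    return C, S


def scale_for_plot(C, X):
    """Rescale each profile so its maximum matches the largest ion intensity."""
    col_max = np.maximum(C.max(axis=0), 1e-12)  # guard against empty components
    return C / col_max * X.max()


# Synthetic check: two overlapping Gaussian peaks with distinct spectra.
t = np.arange(60)
C_true = np.column_stack([np.exp(-0.5 * ((t - 25) / 4.0) ** 2),
                          np.exp(-0.5 * ((t - 35) / 4.0) ** 2)])
S_true = np.array([[1.0, 0.8, 0.1, 0.0, 0.3, 0.0],
                   [0.0, 0.2, 0.9, 1.0, 0.1, 0.5]])
rng = np.random.default_rng(0)
X = C_true @ S_true + rng.normal(0.0, 0.005, (60, 6))

C, S = mcr_als(X, n_components=2)
rel_err = np.linalg.norm(X - C @ S.T) / np.linalg.norm(X)
C_plot = scale_for_plot(C, X)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Keep in mind that a low reconstruction error only says that `C @ S.T` fits the data. Different pairs of profiles and spectra can fit equally well, so a good fit does not guarantee chemically meaningful components.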

3. Medium Peaks

4. Small Peaks

The smallest peaks have noisier signals and are harder to decompose.

5. What Goes Wrong

A few things are clearly off:

6. What's Next

Nothing, I'm tired of this unsolvable problem. I'll take a break for now before I go insane.
