Deconvoluting Real GC-MS Peaks
Applying our pipeline to the Copenhagen Soft Camel Cheese dataset
Warning: Spoiler alert, of course it does not work. Well, at least I tried.
1. The Pipeline
We ran the full pipeline on all 24 GC-MS analyses from the Copenhagen dataset:
- Preprocess — Gaussian smoothing (σ=2) + per-ion AsLS baseline removal
- Peak picking — prominence-based detection on TIC, edge refinement, overlap resolution
- Component estimation — SVD features + RandomForest (trained on synthetic data)
- MCR-ALS — recover elution profiles and spectra
- Identification — cosine similarity search against 9,971 MassBank reference spectra
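The preprocessing, peak-picking, and identification steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for the results in this post: the function names (`asls_baseline`, `preprocess`, `pick_peaks`, `cosine_search`) are hypothetical, and the AsLS settings (`lam`, `p`, `n_iter`), the `prominence` threshold, the clipping to non-negative values, and `top_k` are assumptions. Only σ=2 comes from the description above. Edge refinement, overlap resolution, component estimation, and MCR-ALS are left out.

```python
import numpy as np
from scipy import sparse
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks
from scipy.sparse.linalg import spsolve


def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline for a single ion trace.

    lam, p and n_iter are assumed values, not the ones used in the post.
    """
    n = len(y)
    # Second-order difference operator, shape (n - 2, n)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        W = sparse.diags(w, 0)
        z = spsolve((W + penalty).tocsc(), w * y)
        # Points above the current baseline get the small weight p
        w = np.where(y > z, p, 1.0 - p)
    return z


def preprocess(X, sigma=2.0):
    """X has shape (n_scans, n_ions): smooth in time, subtract per-ion baseline."""
    smoothed = gaussian_filter1d(X, sigma=sigma, axis=0)
    baseline = np.column_stack([asls_baseline(col) for col in smoothed.T])
    # Clipping to non-negative values is an assumption of this sketch
    return np.clip(smoothed - baseline, 0.0, None)


def pick_peaks(X, prominence):
    """Prominence-based detection on the TIC (sum over all ions)."""
    tic = X.sum(axis=1)
    apexes, props = find_peaks(tic, prominence=prominence)
    return apexes, props["left_bases"], props["right_bases"]


def cosine_search(query, library, top_k=5):
    """Rank reference spectra (rows of library) by cosine similarity to query."""
    q = query / (np.linalg.norm(query) + 1e-12)
    refs = library / (np.linalg.norm(library, axis=1, keepdims=True) + 1e-12)
    scores = refs @ q
    order = np.argsort(scores)[::-1][:top_k]
    return order, scores[order]
```

In this sketch, `X` is the scans-by-ions intensity matrix of one analysis, and `library` would hold the reference spectra binned onto the same m/z axis as the query.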
2. Large Peaks
These are the strongest peaks in the dataset. Gray lines are the individual ion traces;
colored lines are the MCR-ALS component models, scaled to match the ion intensity range.
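One plausible way to do that scaling for plotting is sketched below. It assumes "match the ion intensity range" means rescaling each component's elution profile so its maximum equals the largest ion intensity in the peak window; the actual plots may use a different rule. The function name `scale_to_ion_range` is hypothetical.

```python
import numpy as np


def scale_to_ion_range(profiles, ion_traces):
    """Rescale component profiles for overlay on raw ion traces.

    profiles:   (n_scans, n_components) MCR-ALS elution profiles
    ion_traces: (n_scans, n_ions) intensities in the same peak window
    """
    target = ion_traces.max()
    peak = profiles.max(axis=0)
    # Leave all-zero components unscaled instead of dividing by zero
    peak = np.where(peak > 0, peak, 1.0)
    return profiles * (target / peak)
```

Because MCR-ALS profiles are only defined up to a scale factor, a rescaling like this changes how the overlay looks but not the shape of any component.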
3. Medium Peaks