GC-MS Deconvolution Project

Exploring chemometrics with synthetic data and machine learning

Disclaimer: I'm learning as I go here — I have no formal background in analytical chemistry or chemometrics. This is very much a "figure it out as you build it" project, and nothing here should be taken as state of the art. If you spot something wrong or know a better way, I'd love to hear about it!

I have to give kudos to Claude Code here. This whole project — parsing raw instrument files, extracting and clustering peak shapes, building a data generator, training a model, and writing this blog — was built in a couple of hours. The efficiency is honestly amazing and almost scary at times.

By Jonas Berdoz · Data sources: Copenhagen Soft Camel Cheese GC-MS dataset, MassBank mass spectral library