Last semester, I took a course called Metabolomics Practicum, taught by several professors who were all experts in different areas of metabolomics at The Ohio State University:
- Jessica Cooperstone (study design)
- Emmanuel Hatzakis (Nuclear Magnetic Resonance Spectroscopy [NMR])
- Matthias Klein (statistical data analysis)
- Rachel Kopec (Liquid Chromatography Mass Spectrometry [LC-MS])
- Ewy Mathe (computational data analysis)
- Devin Peterson (Gas Chromatography Mass Spectrometry [GC-MS])
The course introduced me to aspects of metabolomics data sets that I, and I think it’s safe to say most bioinformaticians, hadn’t commonly thought about. Data sources and variation factors are significant in metabolomics but are often overlooked by data scientists and statisticians, who generally assume data is “clean” before starting analysis. This post offers a highly condensed version of the course material for bioinformaticians from backgrounds similar to mine. My hope is that it will help others both understand the origins of their data and look at metabolomics papers and data sets with a critical eye.
Metabolomics Experiment Overview
Metabolomics is the study of all small molecules in a biological system. “Small” is defined differently by different sources; one definition includes only metabolites < 1 kDa. Metabolomics studies fall into two groups:
- Targeted metabolomics (designed to test a hypothesis, focuses on specific metabolites)
- Untargeted metabolomics (designed to generate a hypothesis, focuses on surveying all metabolites).
In either case, both the sample processing and experimental workflow are chosen with the scientific question in mind, e.g.
- How does the metabolomic profile of a tomato grown with fertilizer X differ from the metabolomic profile of a tomato grown with fertilizer Y?
- How does the plasma metabolome of a patient given drug X differ from that of a patient given drug Y?
Samples from relevant populations are then collected, extracted, and run through an instrument (either a mass spectrometer or an NMR instrument). For MS, the detector records the mass-to-charge (m/z) ratios and retention times of the compounds; for NMR, it records the chemical shifts of the compounds in a magnetic environment. (The retention time of a compound is the time at which it exits the column – more on that later.) For MS, the end result is a three-dimensional plot, where the dimensions are m/z, intensity (frequency of detector hits), and retention time.
MS plots are often projected onto two dimensions for interpretability. A few types of two-dimensional plots are used. One is the total ion chromatogram, which plots retention time against intensity, summed over all m/z. Here, multiple metabolites can potentially be represented by a single peak. Another is the base peak chromatogram, which shows only the maximum intensity peak at each retention time. Finally, there is the mass spectrum, which plots m/z against intensity at a given retention time.
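The three projections described above can be sketched on a toy intensity matrix. This is only an illustration: the data, the regular m/z grid, and the names `tic`, `bpc`, and `mass_spectrum` are assumptions made for the example, and real instrument files are not laid out this neatly.

```python
import numpy as np

# Hypothetical toy data: 4 scans (retention times) by 5 m/z bins.
retention_times = np.array([1.0, 1.5, 2.0, 2.5])          # minutes
mz_bins = np.array([100.0, 150.0, 200.0, 250.0, 300.0])
intensity = np.array([
    [0, 10,  0,  0, 0],
    [5, 80, 20,  0, 0],
    [0, 15, 90, 30, 0],
    [0,  0, 10,  5, 0],
], dtype=float)

# Total ion chromatogram: intensity summed over all m/z at each retention time.
tic = intensity.sum(axis=1)

# Base peak chromatogram: only the most intense m/z at each retention time.
bpc = intensity.max(axis=1)

def mass_spectrum(rt):
    """Return {m/z: intensity} for the scan closest to retention time rt."""
    scan = int(np.argmin(np.abs(retention_times - rt)))
    return dict(zip(mz_bins.tolist(), intensity[scan].tolist()))

print(tic)                  # one value per retention time
print(bpc)                  # one value per retention time
print(mass_spectrum(2.0))   # one value per m/z bin
```

Note that the scan at 2.0 minutes contributes a single point to the total ion chromatogram even though three different m/z values are present, which is why one chromatogram peak can hide several metabolites.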
Even 2D plots can be difficult to interpret: compounds don’t always separate neatly from one another, and they fragment during the ionization process used by the instrument. However, software (sometimes open source, but often proprietary) exists to separate out peaks in the chromatogram. This is called “peak picking” and is usually followed by detecting features (peak sets) believed to correspond to metabolites. Once features are detected, the researcher can look up the corresponding metabolites using mass-to-charge ratios and adducts or fragments detected by the software. NMR spectra are different, and NMR does not fragment compounds, but NMR data also require peak-picking software and spectrum matching for identification.
In any metabolomics study, most features will not be identified. This happens for multiple reasons. Sometimes, the configuration of the experiment or sample processing procedure prevents some metabolites from being detected. Sometimes, spectra are clear but cannot be matched to anything in a database or a standard (two ways of confirming metabolite identity – more on that later). And sometimes, metabolites are overly fragmented, so their spectra are unrecognizable or are misidentified. Even features that are identified often cannot be quantified exactly; all that can be determined is the abundance of a feature in one group of samples compared to another.
You as an analyst should think about how the experimental design, extraction method, components and settings of the instrument, and peak-picking procedures affect the metabolites detected. You should be especially aware of this when comparing results from different studies. Different labs were likely trying to answer slightly different questions, and they designed their experiments to answer these questions. People conducting primary research generally don’t think about secondary analysis when they design their experiments because they need to design them in the most efficient and cost-effective way for their lab.
Experimental Design Effects
There are many factors to consider in experimental design, and one of the most important precautions is to ensure that external factors are not contributing to apparent metabolomic differences between samples. Here are a few ways that researchers minimize these external factors; however, they aren’t all feasible for every study. When interpreting results, you should check whether these aspects of the design are documented in the study.
- Storage time and procedure: All samples should be stored in the same way, in the same type of container, for the same amount of time. Freezing and thawing a sample can affect its metabolite profile, and so can its contact with the container. Keeping these factors consistent is an ideal scenario and doesn’t always happen in reality.
- Processing: All samples should be extracted in the same way (and ideally by the same lab tech) and run through the same experiment all in one batch. But again, this doesn’t always happen, especially if there is a large number of samples. One practice is to flush the instrument with solvent after each batch to prevent contamination from previous samples and to randomize run order.
- Replicates: Are the replicates technical (coming from the same organism) or biological (coming from different organisms in the same group)? This can have a significant effect on the results.
- Quality Control: Quality control samples (QC) are often used for finding outliers after peaks are picked as well as for correcting for differences between runs. The QC sample is usually a pooled combination of all samples and is run along with other samples. Feature abundances can then be corrected when normalizing by QC.
- Blanks: Blanks can be run to correct for effects due to sample processing and storage. Creating blanks may involve putting water or another neutral substance through the same process as the sample, storing water in the same container for the same amount of time as the sample, or adding pH-balancing buffers to both the sample and the blank. Hypothetically, it is also possible to have a blank created by each tech, have a separate blank for each batch, etc, to correct for other effects. In reality, this is often limited by resources and time.
- Internal standards: These are pure compounds that are run through the instrument along with the samples. Because the quantity of the compound is known, they can be used to normalize and even give an absolute quantification of metabolite abundance. However, standards aren’t always available for a given experiment.
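As a rough illustration of how an internal standard can be used for normalization, the sketch below divides each feature's peak area by the area of the standard in the same sample. The feature names, peak areas, and the function `normalize_to_internal_standard` are all hypothetical, and the example assumes every sample was spiked with the same amount of standard.

```python
def normalize_to_internal_standard(peak_areas, standard_name):
    """Divide each feature's peak area by the internal standard's area in the same sample."""
    standard_area = peak_areas[standard_name]
    if standard_area <= 0:
        raise ValueError("internal standard was not detected in this sample")
    return {
        name: area / standard_area
        for name, area in peak_areas.items()
        if name != standard_name
    }

# Hypothetical peak areas for two samples spiked with the same amount of standard.
sample_a = {"feature_1": 2000.0, "feature_2": 500.0, "internal_std": 1000.0}
sample_b = {"feature_1": 1500.0, "feature_2": 750.0, "internal_std": 500.0}

norm_a = normalize_to_internal_standard(sample_a, "internal_std")
norm_b = normalize_to_internal_standard(sample_b, "internal_std")
print(norm_a)
print(norm_b)
```

In this toy case the raw area of feature_1 is lower in sample B than in sample A, but the standard's signal is also halved in sample B, so the normalized abundance is actually higher there.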
The following image shows an example of samples from an experiment using QC, blanks, and internal standards.
Before they can be run through the experiment, samples must be extracted. Extraction protocols differ widely depending on the metabolites of focus, and they also differ between instrument types. Depending on the goal of the experiment, samples may be frozen, dried, centrifuged, extracted using a solvent, and/or diluted. Some frequently-analyzed substances, such as human plasma, have established protocols. Others do not, in which case the extraction protocol must be defined ad hoc by the lab.
Aside from the type of sample, the extraction protocol is also going to depend on the instrument being used. In liquid chromatography and NMR, extraction procedures can vary and are usually dependent on the metabolites of focus. For GC-MS, the sample must be in the gaseous state, with the compounds distributed throughout the gas, so extraction options are limited. For substances that are not volatile, a chemical process called derivatization is used to make them volatile, and then the extraction procedure is performed.
One way to extract metabolites for GC-MS is by collecting the gas above the sample (called the headspace), which depends on the solubility of the compounds and the vapor pressure. It is also possible to flush nitrogen through the sample and then collect it. The efficacy of this method depends on the rate of transfer of the compounds through the nitrogen. Finally, a method called Dynamic Headspace Extraction (illustrated below) was devised to minimize these dependencies.
Another approach is solid phase microextraction (SPME), which uses a fiber to collect a heated sample. The fiber becomes saturated quickly, so the amount of sample that can be extracted is very small. This small sample becomes problematic when trying to separate true signal from noise. In addition, fibers are not always made consistently, making reproducibility difficult. In spite of these drawbacks, it is a simple approach that is commonly used.
Here, we will discuss two ways in which a sample can be analyzed after preparation and extraction are complete.
- It can be passed through a column (known as chromatography), ionized, and then detected using mass spectrometry.
- It can be analyzed using an NMR instrument.
In general, MS methods have higher sensitivity and can therefore detect more compounds. But NMR methods have much higher reproducibility and selectivity (i.e. individual compounds can be resolved and assigned very precisely). For this reason, one might use MS if compound coverage is most important for a study and NMR if precise identification is most important. We will discuss each procedure further here.
After sample preparation and extraction, the sample is passed through a column, which is used to separate out the sample’s components. This may be done using an LC column or a GC column.
Liquid Chromatography (LC)
LC is generally used for samples that are not easy to derivatize (i.e. chemically modify so that they become volatile). It uses two key components to separate out compounds for detection: a solvent (mobile phase) and a column (stationary phase).
Depending on the polarity of the compounds (i.e. how attracted they are to water), different columns and solvents will be used. This is very important for analyzing spectra. The goal is to have all molecules pass through the column and reach the detector so that their spectra can be quantified, but not all at once. Therefore, the ideal column is one to which all compounds of focus will be attracted to varying degrees, but to which no compound will be so attracted as to stick to it without passing through the column.
Different columns are optimized for different types of compounds, and compounds also react while in the column, so it is impossible to capture all types of compounds in a given column. A few key types of columns are used in LC: Normal Phase, Reverse Phase, and HILIC. Each of these columns is also used with a specific type of solvent and hydrophobicity. Normal phase columns are usually used for very hydrophobic compounds, whereas HILIC is used for very hydrophilic compounds. Reverse phase is used for compounds that fall into an intermediate category. Bearing this in mind, think about what will happen if a researcher has a solution with hydrophobic compounds but runs HILIC. Those compounds will immediately exit the column and reach the detector. They will show up as a jumbled mess at the beginning of the chromatogram, rather than being nicely separated for easy identification. So, the types of compounds identified in an experiment have a lot to do with the column used to run the experiment.
The reactivity and variable attraction of compounds while in the column is a key reason why MS methods are said to be comparative, not quantitative. Because the chemical environment is complex, it is difficult to determine how much of a compound is getting lost because of its attraction to the column or reaction with other compounds.
Liquid Chromatography (LC) Ionization
After compounds pass through the column, they are ionized and launched into a mass spectrometer for detection (called mass spectrometry). When this is done after liquid chromatography, it is called LC-MS. When it is done after gas chromatography, it is called GC-MS. Ionization is important because all mass spectrometers are designed to detect ions only, and ionization procedures differ according to the type of chromatography.
For LC-MS, a common type of ionization is electrospray ionization (ESI). This is considered a type of soft ionization, which means that little fragmentation of the compound occurs during the process. ESI sprays droplets of the ionized sample (combined with solvent) into the MS, creating a vapor. The solvent evaporates, leaving behind the compounds to travel through the MS. Ionization for gas chromatography, as we will discuss, is different.
Gas Chromatography (GC-MS)
Gas chromatography is different from liquid chromatography in that it requires the sample to be volatile, so non-volatile compounds must be derivatized (i.e. chemically modified to make them volatile) before they can pass through the column. GC columns are much larger than LC columns and are coiled. However, like LC columns, they are coated with a stationary phase and used for separating the compounds. The selectivity and effectiveness of a GC column are determined by its diameter, length, and film thickness.
Unlike LC, GC does not use soft ionization to launch compounds into the MS. It uses a type of hard ionization called Electron Impact (EI). This means that the spectra from a GC will be much more fragmented than the spectra for the same sample using LC because of the high amount of fragmentation caused by the ionization process. While this can be a factor in metabolite identification, it is noteworthy that the identification libraries for GC are more robust than those for LC, helping to mitigate this effect.
Mass Spectrometry (MS)
Once ions in the gaseous state are produced using either of the ionization techniques discussed above, the ionized compounds travel through a mass spectrometer to ultimately meet the detector. The differences in trajectory through the mass spectrometer are what make the compounds identifiable. Mass spectrometers can be of a few types, but they often contain a quadrupole component. Quadrupole components use voltage changes to filter the compounds by mass, yielding m/z ratios. They are very sensitive, but not very selective. For this reason, they are typically used in conjunction with other components; three common types are Time Of Flight (TOF), Orbitrap, and Fourier Transform Ion Cyclotron Resonance (FT-ICR).
TOF instruments allow the molecule to drift freely through a tube, and they measure m/z ratios using the time taken for a molecule to travel through the tube. Using a TOF in conjunction with a quadrupole increases selectivity, yielding better separation of molecules.
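To make the relationship between flight time and m/z concrete, here is a minimal sketch for an idealized linear TOF tube. It assumes that every ion is accelerated through the same voltage and then drifts a fixed length with no field, so that the kinetic energy z·e·U equals ½·m·v². Real instruments add reflectrons and calibration steps that are not modeled here, and the function names are made up for the example.

```python
ELEMENTARY_CHARGE = 1.602176634e-19   # coulombs
DALTON = 1.66053906660e-27            # kilograms

def flight_time(mz, voltage, length):
    """Drift time in seconds for an ion with the given m/z (Da per charge).

    Idealized model: z*e*U = 0.5*m*v**2, so t = L * sqrt(m / (2*z*e*U)).
    """
    mass_per_charge = mz * DALTON / ELEMENTARY_CHARGE   # kg per coulomb
    return length * (mass_per_charge / (2.0 * voltage)) ** 0.5

def mz_from_flight_time(t, voltage, length):
    """Invert the idealized relationship to recover m/z from a measured drift time."""
    mass_per_charge = 2.0 * voltage * (t / length) ** 2
    return mass_per_charge * ELEMENTARY_CHARGE / DALTON

# Hypothetical settings: 20 kV acceleration, 1 m drift tube.
t_light = flight_time(500.0, voltage=20000.0, length=1.0)
t_heavy = flight_time(1000.0, voltage=20000.0, length=1.0)
print(t_light, t_heavy)
print(mz_from_flight_time(t_light, voltage=20000.0, length=1.0))
```

Under this model, flight time grows with the square root of m/z, so heavier ions (at the same charge) arrive later.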
Orbitrap instruments work by using electrodes to trap ions so that the ions must move around a spindle. The m/z ratio is computed using the trajectory of each ion.
FT-ICR uses a magnet to induce fluctuations in the compound. The frequency of the fluctuations is measured using a Fourier transform. Compared to TOF and Orbitrap instruments, these have the highest selectivity. However, they are also less affordable and slower than TOF or Orbitrap.
One important point about MS methods is that they are not quantitative. A compound may be measured across two samples to determine its relative abundance within the samples, but this does not tell us the compound’s total abundance. MS/MS can be used to focus on a small portion of the MS spectra and quantify it. A second sample is needed for this process, and it requires a QTOF (a quadrupole attached to a TOF) with mass selection. The process is depicted below. Note that this can be done only when the researcher knows which mass range to consider. MS/MS is usually done as a follow-up to MS.
Nuclear Magnetic Resonance (NMR)
The mechanism used in NMR is quite different from that used in MS techniques. NMR instruments use large magnets to induce the oscillation of compounds. They do not separate compounds using a column, they do not induce hard or soft ionization, and they do not destroy the sample. The compounds are identified using spectra generated from their net magnetization. This is dependent on the molecule’s spin, a property of magnetism.
The oscillation behavior of compounds in NMR is determined primarily by key atoms in the compound, called the nuclei. In any given NMR experiment, a specific type of nucleus will be targeted by the instrument. Common nuclei are hydrogen, the carbon-13 isotope, the phosphorus-31 isotope, nitrogen-15, silicon-29, and fluorine-19. The researcher’s choice of nucleus will depend on which of these they expect to be most abundant in the sample. The sample is enclosed in a holder (probe) optimized for different nuclei of interest, and each atom of this type in a compound will have a separate signal.
The chemical shift of a nucleus (i.e. the shift in signal) is determined by the atoms surrounding it, but it may also be affected by factors like the pH of the sample, the temperature of the sample, and the solvent used. Impurities and solvents may also affect the spectrum, but there are methods for handling these. To lock the optimal magnetic field for a given sample, a process called shimming needs to be conducted with the supervision of the researcher.
J-coupling, the interaction between a nucleus and other nuclei surrounding it in the same compound, also affects the final spectra. NMR peaks often do not have a single summit, but multiple summits (called the multiplicity of the peak). This multiplicity is affected by J-coupling. An example of J-coupling is shown below.
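For the simplest case, the multiplicity caused by J-coupling can be computed directly. The sketch below assumes first-order coupling to n equivalent spin-1/2 neighbors (the textbook “n+1 rule”), where the relative intensities follow Pascal’s triangle. Real spectra often deviate from this idealization, and the function names are made up for the example.

```python
from math import comb

def multiplet_pattern(n_neighbors):
    """Relative summit intensities for coupling to n equivalent spin-1/2 neighbors."""
    return [comb(n_neighbors, k) for k in range(n_neighbors + 1)]

def multiplet_positions(center_hz, j_hz, n_neighbors):
    """Summit positions in Hz, spaced by the coupling constant J around the center."""
    return [center_hz + (k - n_neighbors / 2.0) * j_hz
            for k in range(n_neighbors + 1)]

print(multiplet_pattern(2))                # triplet
print(multiplet_pattern(3))                # quartet
print(multiplet_positions(100.0, 7.0, 2))  # triplet positions for a hypothetical J of 7 Hz
```

Counting the summits of a peak and subtracting one therefore gives an estimate of how many equivalent neighboring nuclei are coupled to it, under these assumptions.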
Not everything that needs to be known about each compound can be inferred directly from the NMR spectra described above. For instance, a compound’s fragments may include two or more that are identical; these will overlap completely and won’t be distinguishable. In addition, inference using J-coupling can reveal how many surrounding atoms of a certain type exist, but not exactly where they are in the molecule. 2-D approaches can help researchers to answer these questions.
- Homonuclear COrrelation SpectroscopY (COSY) is a technique that is used to determine which interactions lead to J-coupling. It displays not only the spectra for each molecule, but a cross-peak showing where the coupling occurs.
- TOtal Correlated SpectroscopY (TOCSY) is another technique that, in addition to showing interactions from J-coupling, also shows indirect interactions between nuclei of the same type. It is sometimes compared to running two steps of COSY. This can be used to resolve scenarios in which multiple fragments from the same compound are producing the same peak.
- Nuclear Overhauser Effect Spectroscopy (NOESY) is used to determine the distances between nuclei in bonds. Like COSY and TOCSY, both dimensions use the same type of nucleus.
- Heteronuclear Single Quantum Coherence Spectroscopy (HSQC) is used to find correlations between nuclei of different types. For instance, if used with hydrogen and carbon-13, it can find the fragments in which carbon and hydrogen nuclei are attached.
Metabolite Identification and Modeling
Metabolite identification consists of a few steps:
- Detecting peaks in the signal intensity across m/z and retention time.
- Grouping peaks to create features hypothesized to correspond to metabolites. This is often done using correlation of peaks across samples.
- Matching the spectra of features to known metabolites. This can be done using a pure standard or by querying the spectrum or maximum intensity peak of the spectrum against a database.
A peak is a region of signal with intensity high enough that it is assumed not to be an anomaly. Peak picking often uses statistical methods, which may be implemented in proprietary software or in open source software such as XCMS.
The primary peak picking algorithm in XCMS is called centWave. The user must specify several parameters:
- Peak width
- Signal to noise threshold
- Maximum distance between peaks
- Maximum change in m/z across scans for the peak
- Minimum and maximum counts of sub-peaks per peak
- Intensity noise threshold
- Function to use in approximation of peak shape.
These parameters may not be intuitive to the average researcher. In addition, many researchers perform visual inspection on the list of final peaks to remove false peaks, and this is highly subjective. Both parameters and a list of additional removed peaks should be included for the experiment to be truly replicable.
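To give a feel for what parameters like peak width and signal-to-noise threshold control, here is a deliberately simplified peak picker for a single chromatogram trace. It is not centWave and does not use the XCMS API; the function `pick_peaks`, its parameters, and the toy trace are assumptions made for illustration only.

```python
import numpy as np

def pick_peaks(intensity, snr_threshold=3.0, min_width=3):
    """Return (start, end, apex) index triples for runs of points above a noise cutoff."""
    intensity = np.asarray(intensity, dtype=float)
    baseline = np.median(intensity)
    # Robust noise estimate (median absolute deviation); fall back to 1.0 if it is zero.
    noise = np.median(np.abs(intensity - baseline)) or 1.0
    above = intensity > baseline + snr_threshold * noise

    peaks, start = [], None
    # A trailing False closes any run that reaches the end of the trace.
    for i, flag in enumerate(np.append(above, False)):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_width:   # reject runs narrower than min_width
                apex = start + int(np.argmax(intensity[start:i]))
                peaks.append((start, i - 1, apex))
            start = None
    return peaks

# Hypothetical trace: one broad peak and one single-point spike.
trace = [2, 1, 3, 2, 1, 6, 20, 50, 20, 6, 2, 3, 1, 2, 2, 1, 30, 2, 1, 2]
print(pick_peaks(trace))
print(pick_peaks(trace, min_width=1))
```

With the default minimum width, the single-point spike is discarded as noise; lowering `min_width` to 1 reports it as a second peak. This is the kind of parameter sensitivity that makes reporting the settings essential.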
For NMR, adjustments need to be made to the spectrum before peaks are picked. A Fourier transform is used to convert the signal into the frequency domain, which sometimes includes zero-filling to increase the resolution of the signal. The signal’s phase and baseline are adjusted, usually manually. Finally, an internal standard is typically used to normalize the signal. All of this, along with the peak picking, is often done using proprietary software called TopSpin on Bruker NMR instruments. These manual steps, in addition to the proprietary nature of the software, can present challenges in replication.
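The Fourier transform and zero-filling steps can be sketched with NumPy on a synthetic signal. This is only an illustration and not what TopSpin does internally: the time-domain signal (often called the free induction decay) is simulated, the function name `fid_to_spectrum` is made up, and a magnitude spectrum is used so that the phase and baseline corrections mentioned above can be left out.

```python
import numpy as np

def fid_to_spectrum(fid, dwell_time, zero_fill_factor=2):
    """Zero-fill a time-domain signal and Fourier transform it to a magnitude spectrum."""
    n_points = len(fid) * zero_fill_factor
    padded = np.zeros(n_points, dtype=complex)
    padded[:len(fid)] = fid                      # zeros are appended after the data
    spectrum = np.fft.fftshift(np.fft.fft(padded))
    freqs = np.fft.fftshift(np.fft.fftfreq(n_points, d=dwell_time))
    return freqs, np.abs(spectrum)

# Synthetic signal: a single resonance at 50 Hz that decays exponentially.
dwell_time = 0.001                               # seconds between points
t = np.arange(512) * dwell_time
fid = np.exp(2j * np.pi * 50.0 * t) * np.exp(-t / 0.1)

freqs, magnitude = fid_to_spectrum(fid, dwell_time)
peak_hz = freqs[np.argmax(magnitude)]
print(len(freqs), peak_hz)
```

Zero-filling by a factor of two doubles the number of points in the spectrum, so the frequency axis is sampled more finely even though no new information has been measured.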
Grouping Peaks and Features
Once peaks are detected for each sample, they must be grouped across samples. In XCMS, this is done using the retcor and group functions. Again, there are multiple parameters that the user must specify:
- Method for calculating signal drift
- Step size
- Method for grouping signals
- Deviation tolerance, and
- Minimum number of samples containing the feature.
A manual data cleanup is usually performed after this step. Researchers may remove peaks found in a specified number of blanks, peaks not found in a specified number of QC, or peaks with more than a specified coefficient of variation across QC.
An algorithm called CAMERA can be used as an add-on to the output from XCMS. CAMERA uses known chemical rules to group features found across samples, focusing especially on isotopes and adducts in the spectra.
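A cleanup of this kind can be sketched as a pair of filters over a feature table. The thresholds below (a maximum QC coefficient of variation of 0.3 and a minimum sample-to-blank ratio of 3) are arbitrary assumptions for the example, as are the function name `keep_features` and the toy intensities; labs choose their own cutoffs.

```python
import numpy as np

def keep_features(qc, blanks, samples, max_qc_cv=0.3, min_sample_to_blank=3.0):
    """Boolean mask over features (columns) that pass a QC-CV filter and a blank filter."""
    qc_mean = qc.mean(axis=0)
    # Coefficient of variation across QC injections; infinite if the feature is absent in QC.
    qc_cv = np.divide(qc.std(axis=0, ddof=1), qc_mean,
                      out=np.full(qc.shape[1], np.inf), where=qc_mean > 0)
    blank_mean = blanks.mean(axis=0)
    sample_mean = samples.mean(axis=0)
    return (qc_cv <= max_qc_cv) & (sample_mean >= min_sample_to_blank * blank_mean)

# Hypothetical intensities: rows are injections, columns are three features.
qc = np.array([[100.0, 50.0, 10.0],
               [110.0, 20.0, 11.0],
               [ 90.0, 80.0,  9.0]])
blanks = np.array([[1.0, 1.0, 8.0],
                   [1.0, 1.0, 8.0]])
samples = np.array([[120.0, 60.0, 12.0],
                    [ 80.0, 40.0, 10.0]])

mask = keep_features(qc, blanks, samples)
print(mask)
```

Here the second feature is dropped because it varies too much across QC injections, and the third is dropped because it is almost as intense in the blanks as in the samples.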
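One example of such a chemical rule is the spacing between a compound’s all-carbon-12 peak and the peak containing a single carbon-13, which is about 1.003355 Da divided by the charge. The sketch below is not CAMERA’s algorithm or API; it is a simplified stand-in with a hypothetical function name, hypothetical tolerances, and an added assumption that the carbon-13 peak is the less intense of the two (reasonable for small molecules).

```python
C13_C12_DIFF = 1.003355   # mass difference between carbon-13 and carbon-12, in Da

def find_isotope_pairs(features, charge=1, mz_tol=0.005, rt_tol=0.05):
    """Pair co-eluting features whose m/z values differ by one 13C spacing.

    features is a list of (mz, retention_time, intensity) tuples.
    Returns (index_of_lighter_peak, index_of_isotope_peak) pairs.
    """
    spacing = C13_C12_DIFF / charge
    pairs = []
    for i, (mz_i, rt_i, int_i) in enumerate(features):
        for j, (mz_j, rt_j, int_j) in enumerate(features):
            if (abs((mz_j - mz_i) - spacing) <= mz_tol
                    and abs(rt_j - rt_i) <= rt_tol
                    and int_j < int_i):   # assumption: isotope peak is less intense
                pairs.append((i, j))
    return pairs

# Hypothetical feature list: (m/z, retention time in minutes, intensity).
features = [
    (181.0707, 5.20, 1.0e6),
    (182.0741, 5.21, 6.5e4),   # co-elutes with the first feature, about 1.0034 higher
    (182.0741, 9.80, 5.0e4),   # same m/z but a different retention time
    (203.0526, 5.20, 3.0e5),
]
print(find_isotope_pairs(features))
```

Only the first two features are paired: the third has the right m/z spacing but elutes at a different time, so it is treated as a separate compound.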
To determine which metabolites to report as significant, researchers have many methods they may use. In these models, metabolites are used as features or independent variables, and sample class labels are used as the outcome or dependent variable.
For class discovery, a PCA or hierarchical clustering method is often used, with the sample labels overlaid. This indicates whether the sample labels are in any way related to the metabolomic driving factors differentiating the samples. For class comparison, a t-test, Wilcoxon rank-sum test, or ANOVA may be used for each metabolite across sample groups, perhaps with p-value or FDR correction. For class prediction, researchers may use a random forest or support vector machine, but it is common to see PLS-DA being used. As with previous steps, it is important to report the parameters given to these algorithms.
The results of a class discovery, class comparison, or class prediction algorithm will include a list of important metabolites, and these are generally the metabolites reported as significant. But they are still unknown! The next step is to identify them.
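Because a separate test is run for every metabolite, some form of multiple-testing correction is usually needed. As one example, the Benjamini-Hochberg procedure for FDR correction can be sketched in a few lines of NumPy. The p-values below are made up, and in practice an established statistics package would normally be used instead of a hand-written function.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    n = len(p)
    order = np.argsort(p)
    # Scale the k-th smallest p-value by n / k.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity starting from the largest p-value, then cap at 1.
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(n)
    adjusted[order] = adjusted_sorted
    return adjusted

# Hypothetical per-metabolite p-values from a class comparison test.
p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p_values)
print(adjusted)
```

A metabolite would then be reported as significant if its adjusted value falls below the chosen FDR threshold, which is itself a parameter worth reporting.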
One option for identifying significant metabolites is to infer mass measurements from the m/z ratios and query them against metabolite databases. However, the most abundant mass within a feature doesn’t always represent the true mass of the compound. The mass detected by the instrument is called the accurate mass; it is subject to the sensitivity of the instrument and may not include the mass of adducts or isotopes. The discrepancy between this and the exact mass needs to be accounted for when querying the mass in a database.
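A mass query of this kind might look like the following sketch. It assumes singly charged ions, considers only three common adducts, and uses a tiny hypothetical in-memory “database” of monoisotopic masses; the function names and the 10 ppm tolerance are also assumptions for the example. Real databases contain many compounds with near-identical masses, so a hit here is a candidate, not an identification.

```python
PROTON = 1.007276   # mass of a proton, in Da

# Mass shift added to the neutral molecule M by each adduct (singly charged only).
ADDUCT_SHIFTS = {
    "[M+H]+": PROTON,
    "[M+Na]+": 22.989218,
    "[M-H]-": -PROTON,
}

def neutral_mass(mz, adduct):
    """Infer the neutral mass from a measured m/z under an assumed adduct."""
    return mz - ADDUCT_SHIFTS[adduct]

def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def candidate_matches(mz, database, tol_ppm=10.0):
    """Return (compound, adduct) pairs whose mass agrees within tol_ppm."""
    hits = []
    for adduct in ADDUCT_SHIFTS:
        mass = neutral_mass(mz, adduct)
        for name, exact in database.items():
            if abs(ppm_error(mass, exact)) <= tol_ppm:
                hits.append((name, adduct))
    return hits

# Hypothetical two-compound database of monoisotopic masses.
database = {"glucose": 180.06339, "caffeine": 194.08038}
print(candidate_matches(203.0526, database))
print(candidate_matches(181.0707, database))
```

The two measured m/z values differ by about 22 Da, yet both point back to the same neutral compound once the adduct is taken into account, which is why adduct handling matters before a database lookup.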
In addition, there are different types of exact masses, and the researcher needs to be aware which one is contained in a given database. The nominal mass is computed using the integer masses of the most abundant isotope of each element in a compound, the monoisotopic mass is computed using the exact masses of the most abundant isotope of each element, and the average mass is computed using the masses of all isotopes of each element, weighted by the natural abundance of each isotope.
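The difference between these mass types is easy to see with a worked example. The sketch below computes all three for glucose (C6H12O6) from a small element table; only four elements are included, the values are rounded, and the function name `masses` is made up for the example.

```python
# element: (integer mass of most abundant isotope,
#           exact mass of most abundant isotope,
#           abundance-weighted average atomic mass)
ELEMENTS = {
    "C": (12, 12.000000, 12.011),
    "H": (1, 1.007825, 1.008),
    "N": (14, 14.003074, 14.007),
    "O": (16, 15.994915, 15.999),
}

def masses(formula):
    """Return (nominal, monoisotopic, average) mass for a dict of element counts."""
    nominal = sum(ELEMENTS[el][0] * n for el, n in formula.items())
    monoisotopic = sum(ELEMENTS[el][1] * n for el, n in formula.items())
    average = sum(ELEMENTS[el][2] * n for el, n in formula.items())
    return nominal, monoisotopic, average

glucose = {"C": 6, "H": 12, "O": 6}
nominal, monoisotopic, average = masses(glucose)
print(nominal, round(monoisotopic, 4), round(average, 3))
```

For glucose, the monoisotopic and average masses differ by roughly 0.09 Da, which is far larger than the mass tolerance of a high-resolution instrument, so querying a database with the wrong mass type can easily miss the correct compound.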
In spectral matching, the spectra found for metabolites of interest are matched to spectra of known compounds in a database. METLIN is a common database for this type of analysis. However, be aware that these spectra are curated from different types of studies, and some have not been verified using a standard; they may have merely been predicted.
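One simple way to score how well two spectra match is cosine similarity on binned intensities. The sketch below is a generic illustration, not the scoring used by METLIN or any particular tool; the bin width, the m/z range, the function names, and the peak lists are all assumptions made for the example.

```python
import numpy as np

def bin_spectrum(peaks, bin_width=1.0, max_mz=500.0):
    """Convert a list of (m/z, intensity) peaks into a fixed-length intensity vector."""
    vec = np.zeros(int(max_mz / bin_width) + 1)
    for mz, intensity in peaks:
        if 0 <= mz <= max_mz:
            vec[int(mz / bin_width)] += intensity
    return vec

def cosine_similarity(peaks_a, peaks_b, bin_width=1.0, max_mz=500.0):
    """Cosine of the angle between two binned spectra (1 = same pattern, 0 = no overlap)."""
    va = bin_spectrum(peaks_a, bin_width, max_mz)
    vb = bin_spectrum(peaks_b, bin_width, max_mz)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom > 0 else 0.0

# Hypothetical fragment spectra as (m/z, intensity) pairs.
query = [(85.03, 30.0), (127.04, 100.0), (163.06, 60.0)]
library_hit = [(85.03, 25.0), (127.04, 100.0), (163.06, 70.0)]
unrelated = [(91.05, 100.0), (119.09, 40.0)]

print(cosine_similarity(query, library_hit))
print(cosine_similarity(query, unrelated))
```

Fixed-width binning is crude: two peaks that fall on opposite sides of a bin edge are treated as unrelated even if their masses are nearly identical, so the bin width should be chosen with the instrument’s mass accuracy in mind.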
A researcher who wants to be absolutely certain about a metabolite’s identity will compare the metabolite’s spectra to the spectrum of a known standard for that compound on the same instrument and subject to the same experimental conditions. This not only requires a lot of extra effort, but it is not possible for many compounds.
Critical Thinking in Computational Metabolomics
In conclusion, there are many factors involved in a metabolomics study. The following are, in my opinion, the most important questions to consider when reading a metabolomics paper, and especially when comparing the results of multiple studies:
- Was this a targeted or untargeted study? In other words, was the focus on specific metabolites?
- How were the samples processed and extracted?
- Did the authors correct for variation adequately by using blanks and QC?
- If MS was used, which column was used, and which type of metabolite is most likely to be detected using this column?
- If MS was used, what is the mass accuracy of the MS instrument?
- If NMR was used, which nuclei were used? Was any 2-D analysis performed?
- How was peak picking performed? Which peak picking parameters were used?
- Which steps were used in data cleanup?
- Were isotopes and adducts resolved using CAMERA?
- How were significant metabolites found? If a parametric algorithm or statistical test was used, what were the parameters?
- At which level were the metabolites identified?
If you do not know the answers to these questions, it is best to ask the authors. Or, if you are not able to do this, at least err on the side of caution. Do not assume that the answers to these questions are likely to be the same across studies.
Open Computational Problems
Several open computational problems exist in metabolomics. You can read more about the current state of computational metabolomics here. In particular, researchers are seeking to address the following problems:
- Identification of metabolites. This is an important one, since the vast majority of features detected using peak picking software are never matched to known spectra. Good solutions will address differences between experiments. Improved integration of known mass spectra is also important.
- Biological inference for metabolite flux between phenotypes. This is often done using known biological and chemical pathways, which are assumed to be discrete (this is not always the case). Sometimes, chemical similarity or reaction networks are used instead of pathways.
- Integration of metabolomics with other -omics data. Many researchers are interested in multi-omics approaches, but combining data types is not straightforward.
- Integration of multi-instrument data. As we have seen, results returned by different types of instruments (NMR, GC-MS, and LC-MS) are expected to differ. The question of how to integrate these data types, like the question of how to integrate multi-omics data, has yet to be answered.
- Integration of lipids and polar metabolites. The typical protocols for identifying lipids and for identifying polar metabolites differ at multiple steps of their protocols. However, it would be useful to perform an integrative computational analysis of these two data types. In a recent paper, this was addressed using a novel instrumentation workflow that lends itself well to integrative analysis.
Ferreira do Nascimento (2017) Advances in Chromatographic Analysis. Avid Science.
Frainay,C. et al. (2018) Mind the Gap: Mapping Mass Spectral Databases in Genome-Scale Metabolic Networks Reveals Poorly Covered Areas. Metabolites, 8, 51.
Kuhl,C. et al. (2012) CAMERA: An Integrated Strategy for Compound Spectra Extraction and Annotation of Liquid Chromatography/Mass Spectrometry Data Sets. Anal. Chem., 84, 283–289.
Peisl,B.Y.L. et al. (2018) Dark matter in host-microbiome metabolomics: Tackling the unknowns–A review. Anal. Chim. Acta, 1037, 13–27.
Riekeberg,E. and Powers,R. (2017) New frontiers in metabolomics: from measurement to insight. F1000Research, 6, 1148.
Samuelsson,L.M. and Larsson,D.G.J. (2008) Contributions from metabolomics to fish research. Mol. Biosyst., 4, 974.
Schmidt,K. and Podmore,I. (2015) Current Challenges in Volatile Organic Compounds Analysis as Potential Biomarkers of Cancer. J. Biomarkers, 2015, 1–16.
Schwaiger,M. et al. (2019) Merging metabolomics and lipidomics into one analytical run. Analyst, 144, 220–229.
Sellers,K. (2010) Why Derivatize? Improve GC Separations with Derivatization. Available from: https://www.restek.com/pdfs/CFTS1269.pdf
St John-Williams,L. et al. (2017) Targeted metabolomics and medication classification data from participants in the ADNI1 cohort. Sci. Data, 4, 170140.
Sumner,L.W. et al. (2007) Proposed minimum reporting standards for chemical analysis Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI). Metabolomics, 3, 211–221.
Tautenhahn,R. et al. (2008) Highly sensitive feature detection for high resolution LC/MS. BMC Bioinformatics, 9, 504.
Tautenhahn,R. et al. (2012) XCMS Online: a web-based platform to process untargeted metabolomic data. Anal. Chem., 84, 5035–9.
Thermo-Fisher Scientific (2007) Plasma and Serum Preparation. Available from: https://www.thermofisher.com/us/en/home/references/protocols/cell-and-tissue-analysis/elisa-protocol/elisa-sample-preparation-protocols/plasma-and-serum-preparation.html
Wishart,D.S. et al. (2018) HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res., 46, D608–D617.
Xu,Y.-F. et al. (2015) Avoiding Misannotation of In-Source Fragmentation Products as Cellular Metabolites in Liquid Chromatography–Mass Spectrometry-Based Metabolomics. Anal. Chem., 87, 2273–2281.