Unknown compound identification is a major challenge in metabolomics. Even establishing a broad definition of an “unknown compound” is not straightforward. An unlikely source of clarification (some might say confusion) for our field came from former US Defense Secretary Donald Rumsfeld, who once said:
Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tends to be the difficult ones.
Following Rumsfeld, a “known known” is a metabolite in a reference database that can be matched with an experimental dataset. A “known unknown” is a metabolite that has been found before but is not in an accessible database. An “unknown unknown” is a metabolite that has not yet been discovered. The NIH Metabolomics Common Fund has funded five centers in the US to improve unknown metabolite identification. Our lab leads one of these centers, and our project is titled “Genetics and Quantum Chemistry as Tools for Unknown Metabolite Identification.”
We are using the model organism Caenorhabditis elegans and comparing both known mutants and natural isolates with the reference strain PD1074. We collaborate with Erik Andersen on C. elegans and Lauren McIntyre on study design and biostatistics. The conceptual steps that we use in this project are:
- Use high-resolution LC-MS to compare a mutant with PD1074. We use both a Thermo Q-Exactive HF Orbitrap and a 12 T Bruker solariX FT-ICR. We collaborate on this with Facundo Fernandez, Jon Amster, Franklin Leach, and Frank Schroeder.
- Peaks that differ in intensity between the mutant and PD1074 and do not match any database entry will be analyzed further.
- From the accurate m/z values, we can determine the molecular formula.
- From the molecular formula, we will attempt to enumerate as many candidate structures as possible using resources such as ChemSpider.
- The NMR chemical shifts of each possible structure will be computed using high-level quantum mechanical techniques developed by our collaborator, Kennie Merz.
- We will then compare computed and experimental NMR chemical shifts by adapting an approach called SUMMIT developed by the Bruschweiler lab.
- The experimental NMR data will be obtained by making a fraction library of C. elegans PD1074 reference material. Each HPLC fraction will be analyzed by both NMR and LC-MS, and we will associate unknown LC-MS peaks with NMR data and their retention time.
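To make the molecular-formula step above more concrete, here is a minimal Python sketch of how candidate formulas might be enumerated from a single accurate m/z value. It is an illustration only, not the software used in our project. It assumes a singly protonated [M+H]+ ion, restricts the search to the elements C, H, N, and O, and uses a hypothetical 3 ppm mass tolerance and hypothetical upper limits on nitrogen and oxygen counts. The function name `candidate_formulas` and its parameters are invented for this example.

```python
# Illustrative sketch only: enumerate CHNO formulas consistent with an accurate m/z.
# Assumptions (not from the project description): [M+H]+ adduct, CHNO elements only,
# a 3 ppm tolerance, and arbitrary caps on N and O counts.

MONO = {  # monoisotopic masses (u) of the most abundant isotopes
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.9949146196,
}
PROTON = 1.00727646688  # proton mass (u), added to the neutral mass for [M+H]+


def format_formula(c, h, n, o):
    """Write element counts as a formula string, e.g. (6, 12, 0, 6) -> 'C6H12O6'."""
    parts = []
    for symbol, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count == 1:
            parts.append(symbol)
        elif count > 1:
            parts.append(f"{symbol}{count}")
    return "".join(parts)


def candidate_formulas(mz, ppm=3.0, max_n=10, max_o=15):
    """Return (formula, error_ppm) pairs whose [M+H]+ m/z lies within `ppm` of `mz`."""
    neutral = mz - PROTON      # mass of the neutral molecule
    tol = mz * ppm * 1e-6      # absolute tolerance in mass units
    hits = []
    for c in range(int(neutral // MONO["C"]) + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                heavy = c * MONO["C"] + n * MONO["N"] + o * MONO["O"]
                # Fill the remaining mass with the nearest whole number of hydrogens.
                h = round((neutral - heavy) / MONO["H"])
                if h < 0:
                    break  # adding more oxygens only makes the formula heavier
                error = neutral - (heavy + h * MONO["H"])
                if abs(error) > tol:
                    continue
                # Basic chemical sanity checks for a neutral, closed-shell molecule:
                # H + N must be even, and the degree of unsaturation non-negative.
                if (h + n) % 2 != 0:
                    continue
                if c - h / 2 + n / 2 + 1 < 0:
                    continue
                hits.append((format_formula(c, h, n, o), error / mz * 1e6))
    # Best mass agreement first.
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    # 181.0707 is close to the [M+H]+ m/z of a hexose such as glucose.
    for formula, error_ppm in candidate_formulas(181.0707):
        print(f"{formula}\t{error_ppm:+.2f} ppm")
```

Even for a small ion, more than one formula can fall inside the tolerance window, and each formula typically corresponds to many possible structures. This is why the later steps rely on computed and experimental NMR chemical shifts to discriminate among candidates. A practical tool would also consider other elements (for example S and P), other adducts, and isotope patterns.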