Title: Development of comprehensive non-target bioinformatics and chemometric tools for data filtering and management of mass spectrometry datasets and the chemical compound identification in environmental omics (proteomics, metabolomics, and exposomics)

PhD student Carlos Pérez López, from the Chemometrics for Environmental Omics group, will defend his thesis on 18 November at 11:30 h at the Facultat de Ciències, Universitat de Girona.

Directors: Romà Tauler, Antoni Ginebreda and Damià Barceló

Thesis Committee: Victoria de los Ángeles Salvadó Martín, Guillermo Quintas Soriano and Antonio Checa Gómez

Abstract:
Environmental omics is a valuable source of knowledge about the chemical compounds and biomolecules present in the environment and their molecular effects on exposed biota. Mass spectrometry (MS), usually coupled to a chromatographic system that separates the analytes in a sample, is the most widely used analytical technique in environmental omics studies. However, MS instruments generate large amounts of data, some of which are unrelated to the analytes and instead arise from instrumental interferences or background noise. Filtering MS signals, selecting those that belong to analytes, and identifying the corresponding chemical compounds remain challenging tasks in analytical chemistry. These procedures are even more demanding when the sample's composition is completely unknown, a typical scenario in untargeted analysis. Over recent decades, several chemometric and bioinformatic methods have been proposed, some based on feature detection and a few others on the direct resolution of chemical compounds.

This PhD Thesis focused on the Regions of Interest Multivariate Curve Resolution (ROIMCR) chemometric method for resolving the chemical compounds present in environmental omics samples, following a direct, non-target analytical approach. Data from different sources and MS acquisition modes were analyzed with ROIMCR to validate its performance, including its strengths and limitations. Datasets involving mixtures of small molecules (exposomics and metabolomics) and large molecules (proteomics) were successfully analyzed in full-scan, data-dependent acquisition (DDA), and data-independent acquisition (DIA) modes. The DIA analyses confirmed that ROIMCR can be applied to MS data structures that include MS2 information, even when positive and negative ionization modes are considered simultaneously. Several tools were developed to overcome limitations of the ROIMCR methodology, mostly related to its lack of automation. These developments were implemented in two related MATLAB software packages: SigSel, for visualizing, filtering, and extracting information from ROIMCR results, and MSident, for annotating the resolved chemical compounds. Additionally, a rapid and cost-effective methodology called Aquasearch was proposed for profiling proteomic biomarkers in wastewater samples using matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry. The Aquasearch program developed in this Thesis is proposed as a convenient tool for routine biomarker monitoring, as an alternative to more resource-demanding and time-consuming methods.