Integrative analysis of multiple study data for the identification of common metabolic syndrome phen

 Fixed-term contract (CDD) · PhD thesis · 36 months · Master's degree (Bac+5) · Plateforme d’Exploration du Métabolisme (PFEM), INRAE Clermont-Fd/Theix · Saint Genès Champanelle (France) · 1875

 Start date: 1 September 2022


Data integration · metabolomics · phenotype prediction · metabolic syndrome



Deciphering disease-related phenotypes and defining their relationships is crucial to gain a better understanding of non-communicable complex diseases. Metabolomics, the latest of the ‘omics’ approaches, is now recognized as a powerful phenotyping tool for nutrition and health research. However, untargeted approaches generate large and intricate datasets, which creates challenges in modelling, integrating and exploiting complex metabolic profiles in the transition to precision medicine. In practice, the technical challenges of untargeted LC-MS metabolomics, which generates semi-quantitative data, have prevented large-scale comparison between studies. In the context of innovative research focused on Healthy Aging, the present project aims to address the major scientific challenge of identifying common and early phenotypes of metabolic syndrome (MetS), for a deeper understanding of the pathophysiological processes and modulators involved. Thus, the research question proposed in this PhD thesis concerns the study of the trajectories of MetS onset through the implementation of an integrative data analysis strategy.

The objectives will be to 1) develop multivariate models and visualize relationships between large groups of variables, 2) infer phenotypic sets, and 3) find underlying relationships between various datasets to reveal common trends and/or specificities. The work will be based on the use of several data sources (metabolomic, epidemiological, phenotypic) collected from different observational cohorts. The work programme will be divided into four main steps:

- standardisation, curation and harmonisation of metadata for the improved annotation of metabolites from LC-MS data, as well as for the description of the subjects, in accordance with the requirements of the metabolomics and nutritional epidemiology domains and following the FAIR principles (Findable, Accessible, Interoperable, Reusable).
- stratification of the subjects according to phenotypic data and epidemiological factors (e.g. sex and age) and structuring of the data blocks.
- optimization, within a single study, of the multi-group multi-block data analysis workflow, by selecting the most suitable methods for investigating the complex structures of metabolomic data, taking into account the longitudinal follow-up of the subjects.
- implementation of high-level data integration using metabolic network modelling to link the results of multiple studies in a biologically meaningful context. 


The workplace is located in Clermont-Ferrand, within the Human Nutrition Unit (INRAE UMR1019), which is developing research projects focused on understanding nutrition-health interactions. The PhD student will work within the MAPPING research group (for "MetAbolic phenotype, nutrItioN and modelinG"), which focuses on the study of early metabolic events in the development of chronic disorders such as metabolic syndrome, and on the identification of the risk of progression towards these diseases. This research group is part of the Metabolism Exploration Platform (PFEM), which is entirely dedicated to mass spectrometry metabolomic approaches and is a member of MetaboHUB (the French national infrastructure for metabolomics and fluxomics). The PhD student will work in close interaction with the statisticians and bioinformaticians of the platform, which generated the raw data.

The thesis will be co-supervised by the Biomedical and Metabolomic Analysis (BMA) group of the University
of Geneva. Its research activities include the implementation of efficient analytical strategies for the
monitoring of metabolites in biological matrices, and the development of innovative data mining approaches combining chemometrics, machine learning and bioinformatics to analyse large metabolomic datasets including data collected from longitudinal clinical studies.


* Recommended training: Master II in statistics for biomedical research
* Knowledge required:
- Solid grounding in applied mathematics and statistics
- Knowledge and practice of methods for data analysis in clinical research and epidemiology
- Programming and bioinformatics skills
- Good knowledge and practice of the R language would be appreciated
- English mandatory
* Skills:
- Ability to apply statistical methods in epidemiology
- Motivation to work in a multidisciplinary environment (biology, chemistry, informatics) and to interact with different actors during the PhD
- Rigour, autonomy, and organisation
- Good communication and writing skills


Procedure: If you are interested in the position and meet the requirements of the profile, we will carefully evaluate your application submitted before 15 June 2022, by e-mail to and The selection will include one or more interviews. The application must contain the following documents:
- A letter of application stating your motivation, qualifications, main scientific experience, and a brief description of your current and future scientific interests (max. 1 page).
- A complete CV (max. 2 pages), including a clear presentation of your past and current work experience, with start and end dates, employer and location of employment.
- Contact information for at least two relevant references, who may be contacted for verification.

Deadline: 15 June 2022

Contacts and

Offer published on 7 April 2022, displayed until 15 June 2022