Multi-omics approaches analyze biological samples with several kinds of ‘omics’ measurements, such as genomics, proteomics, transcriptomics and metabolomics. One of the main objectives of such approaches is to identify more efficient and robust biological signatures by simultaneously considering several sources of data obtained from the same samples. Combining these heterogeneous data should improve patient stratification and patient care. Indeed, results from international consortia have recently shown that, for instance, integrating genomic and proteomic data enabled the identification of new breast cancer subtypes that could not be found by genomic analysis alone. Methodological developments for analyzing multiple omics datasets are numerous and often rely on the multiblock analysis approach [1], [2]. The article [3] provides a list of the main methods and tools for multi-omics datasets.
The objective of the project is to develop statistical methods for multi-omics data integration based on multivariate analysis and multi-block analysis. The developments will also take into account additional biological information described in systems biology, either to define the most relevant block structure or to integrate the structure between variables into the sparsity constraints used for the feature selection task [4, 5, 6]. We will use datasets from the TCGA and CPTAC consortia to evaluate the developments. A preliminary task of the project will be dedicated to multi-omics clustering (unsupervised approaches) in order to identify subsets of data that are relevant for the signature identification task [7].
Interested applicants should have a PhD in data science or applied statistics (data analysis, machine learning, feature selection…) and an interest in multidisciplinary projects (data science and biology). Knowledge of biology would be highly appreciated.
[1] Bouveresse D. et al., Identification of significant factors by an extension of ANOVA-PCA based on multi-block analysis, Chemometrics and Intelligent Laboratory Systems, 2011
[2] Meng C. et al., moCluster: Identifying Joint Patterns Across Multiple Omics Data Sets, Journal of Proteome Research, 2016
[3] Bersanelli M. et al., Methods for the integration of multi-omics data: mathematical aspects, BMC Bioinformatics, 2016
[4] Jenatton R. et al., Structured Variable Selection with Sparsity-Inducing Norms, Journal of Machine Learning Research, 2011
[5] Safo S. et al., Integrative analysis of transcriptomic and metabolomics data via sparse canonical correlation analysis with incorporation of biological information, Biometrics, 2018
[6] Löfstedt T. et al., A general multiblock method for structured variable selection, arXiv, https://arxiv.org/pdf/1610.09490.pdf , 2016
[7] Rappoport N. et al., Multi-omic and multi-view clustering algorithms: review and cancer benchmark, Nucleic Acids Research, 2018