12-month Master 2 apprenticeship (alternance) contract (CEA Grenoble)
Apprenticeship · Other internship · 12 months · Bac+5 / Master · IRIG - CEA Grenoble · Grenoble (France)
Start date: 15 September 2023
Big Data, Data Lake House, Bioinformatics, AI, Machine/Deep Learning
Offer: 12-month Master 2 apprenticeship (alternance) contract (CEA Grenoble)
Title: Data Lake House for human genomic data from tissues and pre-clinical models
Background: Over the past 10 years, technological advances in DNA sequencers have enabled the generation of large
amounts of genomic data, which are stored in public data repositories. Research projects that use this data to
investigate human health questions must systematically go through a stage of collecting and integrating large-scale
structured and unstructured data sets. The traditional approach used by data scientists involves manipulating many
flat files and transforming them into dataframes using command-line tools. This strategy for transforming and
organizing data makes it difficult to: (i) reproduce transformations and ensure data versioning, (ii) quickly and
finely configure secure access to the different data transformation stages according to their level of sensitivity,
(iii) work collaboratively on data, and (iv) train AI predictive models at scale.
Objective: This apprenticeship will continue the development of a Data Lake House dedicated to the management and
exploration, by data analytics and machine/deep learning approaches, of human genomic data from tissues and 3D cell
cultures (organoids and tumor spheroids).
Workplan: The intern will work on the full data life cycle, from the data ingestion, cleanup, and preparation stages to the
generation of different data marts suited to downstream use cases involving data analytics and AI approaches.
Host laboratory: The intern will be hosted in the "Genetics and Chemogenomics" team of the Interdisciplinary Research
Institute of Grenoble (IRIG) at CEA Grenoble. They will be supervised by Christophe Battail, an expert in computational
analysis and modeling of genomic data, and will work in a multidisciplinary research environment composed of
bioinformaticians, AI experts and biologists.
Knowledge and skills: Big Data technologies (Delta Lake, Apache Spark, Apache Atlas, Apache Ranger, Grafana), machine/deep learning, Unix command line and Python programming, interest in and some knowledge of biology and health.
Professional aptitude: curiosity and a desire to improve one's scientific and technological skills, rigor and organization, and
the ability to work in a team and interact with other students, engineers and researchers.
Application procedure: Please send a CV, a cover letter and the name of a referee to firstname.lastname@example.org.
Deadline: 12 July 2023
Offer published on 24 May 2023, displayed until 12 July 2023