BIG DATA MANAGER AND BACK-END DEVELOPER (M/W)
Permanent contract (CDI) · Engineer (other) · Bac+5 / Master's degree · Imagine Institute · PARIS (France) · Salary negotiable depending on experience
Start date: 1 June 2021
data big data management genomic sequencing
Imagine is an interdisciplinary research center focused on rare human genetic diseases, located on the campus of the Necker Hospital for Sick Children, in the heart of Paris. Imagine is an INSERM joint research unit (Unité mixte de recherche, UMR-1163), affiliated with the Université de Paris and the Paris Public Hospitals Group (Assistance Publique-Hôpitaux de Paris). Imagine aims to address unmet basic and clinical research questions related to rare diseases, in order to increase knowledge in a major medical field that is currently insufficiently covered. Its ultimate goals are: improving clinical and genetic diagnosis, anticipating life-long outcomes, proposing personalized medical follow-up, and accelerating the development of new therapeutic strategies.
The Institute is composed of more than 550 staff members from 28 research laboratories in the fields of genetics, immunology, infectious diseases, hematology, nephrology, developmental disorders, metabolic diseases, dermatology and gastroenterology. The Institute has excellent core facilities and technology platforms in genomics, single-cell sequencing, proteomics and cell imaging, as well as in bioinformatics and data science. As a cross-cutting strategy, Imagine has recently set up an integrated care & research program (iCARP) on Computational Decision Support Systems (CDSS). The CDSS program aims to develop translational applications in the fields of bioinformatics, medical informatics, artificial intelligence and big data analysis. Fundamental and applied research focuses on the development of innovative computational methods and software, data analysis pipelines and computational interfaces to assist medical decision-making in a clinical setting. This includes deep phenotyping and deep learning on heterogeneous data sets for improved diagnosis, prognosis and therapeutic strategies.
The Imagine Institute is committed to implementing comprehensive data integration protocols and big data management systems to support the development of artificial intelligence applications. Major big data sources developed at the Institute cover genomic sequencing (whole exomes and whole genomes) and multi-omics profiling of patient samples (both at bulk and at single-cell level) as well as clinical data (including electronic health records, structured phenotypes, imaging and text). In this context, the Imagine Institute is seeking a senior big data manager to develop the associated back-end architecture, integrating diverse databases while ensuring interoperability among data portals and with front-end applications of the Institute. Under the scientific supervision of Antonio Rausell, coordinator of the iCARP on CDSS, she/he will be responsible for coordinating this work with Imagine's research laboratories and technological platforms, notably those involved in data generation, processing and analysis, such as the Genomics Platform, the Labtech Single-cell, the Proteomics platform, the Bioinformatics Platform, the Data Science Platform and Imagine's biobanks.
Responsibilities will include:
- Definition of data models, metadata, and data exchange protocols in coordination with technological data platforms.
- Implementation of FAIR standards for data management (Findability, Accessibility, Interoperability and Reusability) compliant with national and international guidelines: Institut Français de Bioinformatique, ELIXIR-Excelerate (https://bioschemas.org), and the Global Alliance for Genomics & Health (GA4GH; https://beacon-project.io).
- Implementation and management of distributed NoSQL databases (MongoDB, HBase), including the configuration of replication, transaction management and recovery strategies.
- Implementation of distributed storage formats and serialization systems for big data based on the Hadoop HDFS file system and on JSON, Parquet and Avro files.
- Design and implementation of distributed search indexing protocols such as Elasticsearch and Solr.
- Implementation of client-server communication protocols based on RESTful APIs as well as gRPC APIs, and coordination with front-end applications of the Institute.
- Definition and implementation of data governance policies including permissions systems, data encryption protocols, multi-client applications and load-balanced querying.
- Deployment, scaling, and management of containerized applications with Docker and Kubernetes in a Hadoop ecosystem.
QUALIFICATIONS AND PERSONAL SKILLS
- Senior Java 8+ developer with experience in Maven and JUnit
- Advanced proficiency in at least one SQL dialect (MySQL, PostgreSQL, Spark SQL)
- Experience with at least one NoSQL database (MongoDB, HBase)
- CI/CD deployment: e.g. GitHub, Jenkins, Docker
- Experience in distributed computing technologies: Hadoop and Spark.
- Kubernetes, ZooKeeper and cloud computing deployment
- Excellent oral and written communication skills in English.
- Permanent contract
- Salary negotiable depending on experience
Application procedure: The application letter and the CV are to be submitted under the reference Imagine_195 at firstname.lastname@example.org.
Application deadline: None
Antonio RAUSELL, coordinator of the iCARP Computational Decision Support Systems
Offer posted on 29 April 2021, displayed until 26 June 2021