Keywords
BCL format
genomic compression
high performance computing
Description
Basic Function and Scope of the Position:
Illumina is the world leader in sequencing technology, providing best-in-class high-throughput short-read sequencing instruments. As high-throughput sequencing is increasingly used in whole-exome and whole-genome sequencing applications, the volume of data it generates grows at the same pace, to the point where data storage has become a bottleneck.
Last year, Illumina acquired Enancio, a start-up based in Rennes and focused on genomic data compression. The Enancio compression technology, now called DRAGEN ORA compression, enables lossless compression at up to a 5x ratio vs. FASTQ.GZ files. While DRAGEN ORA compresses FASTQ and FASTQ.GZ files into FASTQ.ORA files, the output of Illumina sequencers is in Binary Base Call format (.BCL files).
The storage strategy of a lab varies based on the purpose of the data. Some may store FASTQ files, others .BCL files or .BAM files.
The purpose of this internship is to work closely with the bioinformatics team in Rennes to develop a new tool for lossless compression of .BCL files. The tool should meet the following requirements:
- Lossless compression
- High performance: ability to scale up to large volumes of data with reasonable memory usage and runtime
- Low-cost integration into the workflow, especially conversion to FASTQ format
- Easy to use
Note: the intern will work on the development of a part of this tool.
Offices are in Rennes, with the possibility of hybrid remote/on-site work. The intern will join a growing bioinformatics team in France. Moreover, he/she will be employed under a French Illumina employee contract with advantageous benefits.
Tasks and Responsibilities:
- Develop a good understanding of how BCLs are generated
- Develop a good understanding of BCL demultiplexing and conversion to FASTQ
- Develop a good knowledge of .BCL format specifications
- Be able to propose different software design strategies to meet requirements
- Design, develop and debug C++ software on Linux
- Be able to follow good practices in terms of software development
- Design and run benchmarks to assess runtime/compression ratio performance
All listed tasks and responsibilities are deemed essential functions of this position; however, business conditions may require reasonable accommodations for additional tasks and responsibilities.
Preferred Experience/Educational Background:
- Second year of a Master's degree (MS2) in Bioinformatics or an equivalent engineering degree
- Good level of expertise in C++
- Debugging/troubleshooting skills
- Be curious and analytical
- Experience in High Performance Computing
- Ideally, a first degree or prior experience as a developer
- Very good level of English
Application
Procedure: Please send a cover letter and resume to Jennifer Del Giudice, cc. Hector Wheatley (hwheatley@illumina.com).
All applications must be submitted in English.
Deadline: November 15, 2021
Contacts
Jennifer Del Giudice and Hector Wheatley
jdNOSPAMelgiudice@illumina.com