25, 28 rue du Dr Roux
Development of bioconvert software and benchmarking of bioinformatics format converters
The life sciences use many different data formats. Some are old or have complex syntax, and converting between them can be a challenge. Bioconvert (https://github.com/biokit/bioconvert) aims to provide a common tool and interface for converting life science data from one format to another.
Many conversion tools already exist, but they may be dispersed, focused on a few specific formats, difficult to install, or not optimised. With Bioconvert, we plan to cover a wide spectrum of format conversions; we will re-use existing tools when possible and provide facilities to compare different conversion tools or methods via benchmarking. New implementations are provided when they are considered better than existing ones.
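To give a concrete idea of what comparing conversion methods via benchmarking can look like, here is a minimal sketch in plain Python. It does not use the Bioconvert API: the function names, the `METHODS` dictionary and the sample data are hypothetical and exist only for illustration. It assumes well-formed FASTQ input with exactly four lines per record.

```python
import timeit

# Hypothetical sample data: two FASTQ records, four lines each.
SAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n"


def fastq_to_fasta_loop(fastq_text):
    """Convert FASTQ text to FASTA by stepping through the lines four at a time."""
    lines = fastq_text.splitlines()
    out = []
    for i in range(0, len(lines), 4):
        header, sequence = lines[i], lines[i + 1]
        out.append(">" + header[1:])  # replace the leading '@' with '>'
        out.append(sequence)
    return "\n".join(out) + "\n"


def fastq_to_fasta_slices(fastq_text):
    """Same conversion, written with list slicing instead of index arithmetic."""
    lines = fastq_text.splitlines()
    headers = lines[0::4]
    sequences = lines[1::4]
    return "".join(f">{h[1:]}\n{s}\n" for h, s in zip(headers, sequences))


# Candidate methods for one conversion (names are illustrative only).
METHODS = {
    "loop": fastq_to_fasta_loop,
    "slices": fastq_to_fasta_slices,
}


def benchmark(methods, data, repeats=5):
    """Return the best wall-clock time in seconds for each method on the same input."""
    timings = {}
    for name, func in methods.items():
        runs = timeit.repeat(lambda: func(data), number=1, repeat=repeats)
        timings[name] = min(runs)
    return timings


if __name__ == "__main__":
    big_input = SAMPLE * 10000
    for name, seconds in benchmark(METHODS, big_input).items():
        print(f"{name}: {seconds:.4f} s")
```

Both functions must produce identical output before their timings are worth comparing, which is why validating conversions and benchmarking them go hand in hand.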
Bioconvert is an ongoing project that was well received by the French bioinformatics community at the last JOBIM conference. A lot of code is already available, but the software is still under development. The candidate's main mission will be to work on the current version of the software so as to obtain a first stable, production-ready release by:
- Validating existing formats and conversions
- Improving the documentation
- Making sure the current distributions (a Bioconda package and a Singularity container) are still up to date
- Possibly leading a publication
This tool will be used by many bioinformatics units inside and outside the Institut Pasteur.
Depending on the progress of the project, a web version may be considered.
The candidate will be part of the Bioinformatics and Biostatistics Hub (https://research.pasteur.fr/en/team/bioinformatics-and-biostatistics-hu…).
Skills required (all levels accepted):
- Proven experience with Python
- Comfortable with Bash scripting and the Unix command line
- Knowledge of biological formats (alignment, mapping, phylogeny, etc.)
- Knowledge of Git and GitHub
- Curiosity, rigour and autonomy; strong interest in reproducible research
There are 15 contributors to Bioconvert; most of them are from the Bioinformatics and Biostatistics Hub of the Institut Pasteur. The candidate will be able to interact daily with these bioinformaticians and to draw on their help and expertise to make this project a success.
Ideally, the internship will last six months and start in February or March 2019. However, we are flexible and open to discussion.