Development of an emBASE-Galaxy bridge


Genome Biology, EMBL, Heidelberg


Charles Girardot
Next-generation sequencing (NGS) is the key technology for analysing the transcriptome, determining DNA-binding protein maps, interrogating sequence variants and sequencing the genomes of new organisms or individuals. The Genome Biology Computational Support aims to provide the whole Genome Biology unit with support in terms of NGS data storage and analysis. We use emBASE, a local branch of the open source BASE platform, to store and annotate both microarray and NGS data. Data analysis is performed on a high-performance cluster and managed using a local Galaxy instance.

In this project, we propose to improve the communication between emBASE and Galaxy. In particular, the project aims to enable emBASE to add data libraries in Galaxy, launch predefined workflows (e.g. QC analysis, demultiplexing, read mapping) and transfer results (QC reports, demultiplexed datasets) back to emBASE for long-term archiving. These functionalities will be developed using the Galaxy API and web services. The project also includes analytical activities such as creating new analysis workflows, adding or developing the necessary tools and ensuring their smooth integration with the compute cluster.

The successful candidate will have practical knowledge of Python and SQL, must be able to work on Linux servers and must not be averse to server administration. Knowledge of R and Bioconductor is a strong plus. Knowledge of other languages such as PHP, Java, Perl and JavaScript is more than welcome. The candidate will work together with software developers and bioinformaticians of the Genome Biology unit. Skills such as organisation, commitment and rigour are essential.