Development of a pipeline for analysing DAP-Seq data

Internship · M2 internship · 6 months · Bac+5 / Master's level · CIRAD · Montpellier (France)

Start date: 22 January 2023

Keywords

DAP-Seq · Workflow · Transcription factors

Description

Transcription factors are central regulators of biological processes: they bind to regulatory DNA sequences to control the expression of their target genes. The full set of regulatory interactions between transcription factors and their target genes is called a gene regulatory network (GRN). Identifying transcription factor binding sites across the genome is therefore a common approach to deciphering GRNs. DNA Affinity Purification Sequencing (DAP-Seq) has emerged as a simple, high-throughput method for the genome-wide identification of the DNA targets of transcription factors (Bartlett et al., 2017). One advantage of DAP-Seq over previously described methods (e.g. ChIP-seq) is that it does not require antibodies raised against each transcription factor, so it can potentially be applied to any plant species. In the AFEF team at the AGAP Institute, we have used DAP-Seq to identify hundreds of transcription factor target genes organized in GRNs involved in apple dormancy and budbreak (da Silveira Falavigna et al., 2021).

The AGAP Institute offers a 6-month M2 internship to develop a bioinformatics pipeline for analysing DAP-Seq data. The pipeline will be used to identify genome-wide binding sites of transcription factors from diverse plant species. Using previously published datasets from Arabidopsis and apple, the M2 student will develop a pipeline to:

  • Map sequenced Illumina reads to the reference genome
  • Pre-process the data and filter out unmapped and duplicate reads
  • Call peaks
  • Find motifs in peak regions using MEME
  • Search for transcription factor binding sites and identify target genes

The student will develop the pipeline under the supervision of the Bioinformatics platform and the AFEF team of the AGAP Institute.

We are looking for a candidate with the following profile:

  • Background in bioinformatics, with experience working in a Linux environment
  • Good command of a scripting language (e.g. Bash, Python) and of R, as well as of data analysis techniques and statistics
  • Previous experience working with NGS datasets is a plus
  • Ability to work in a team with both computational and experimental biologists
  • Bonus: first experience with a workflow management system (Snakemake or Nextflow)

References

  • Bartlett, A., O'Malley, R., Huang, Ss. et al. Mapping genome-wide transcription-factor binding sites using DAP-seq. Nat Protoc 12, 1659–1672 (2017). https://doi.org/10.1038/nprot.2017.055
  • da Silveira Falavigna V, Severing E, Lai X, Estevan J, Farrera I, Hugouvieux V, Revers LF, Zubieta C, Coupland G, Costes E, Andrés F. Unraveling the role of MADS transcription factor complexes in apple tree dormancy. New Phytol. 2021 Dec;232(5):2071-2088. https://doi.org/10.1111/nph.17710

Application

Procedure: applicants must send a cover letter, a CV, and the names and e-mail addresses of two referees to fernando.andres-lalaguna@inrae.fr and gaetan.droc@cirad.fr before 31 October 2022. The position is available from January 2023.

Deadline: 31 October 2022

Contacts

Fernando ANDRES-LALAGUNA

fernando.andres-lalaguna@cirad.fr

Offer published on 22 September 2022, displayed until 31 October 2022