Bioinformatics objectives
The available data include:
(i) public GB transcriptome datasets generated with different technologies: microarray, RNA-Seq or single-cell RNA-Seq (scRNA-Seq). These datasets contain patient samples (bulk tumors).
(ii) in-house GB transcriptome datasets generated with different technologies: microarrays (Agilent, Illumina) or RNA-Seq. These datasets contain samples from cell lines cultured in the laboratory and treated with drugs developed in the laboratory (IRE1 inhibitors), together with control conditions.
(iii) paired proteome datasets for the GB cell lines cultured in the laboratory (treated and control conditions).
(iv) molecular signatures defining GB cell states [OPC-like, NPC-like, AC-like and MES-like] or IRE1 activity [high/low]. The four GB cell states were derived from the analysis of single-cell RNA-Seq of bulk tumors, and the transcriptome signature associated with IRE1 activity (36 marker genes) was previously described by our group.
Because these molecular signatures are based on gene expression, they can be used to characterize any GB sample for which transcriptome data are available (microarray or RNA-Seq). The first objective of this proposal is therefore to use signature-scoring and deconvolution methods (e.g., the sigScores function of the scalop R package [https://rdrr.io/github/jlaffy/scalop/] and the EcoTyper machine learning framework [https://github.com/digitalcytometry/ecotyper]) to determine the cell state composition of the different GB samples. These samples will also be classified according to IRE1 activity (high/low) using the 36-gene transcriptome signature previously described by our group.
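To make the scoring idea concrete, the following is a minimal sketch of signature scoring on a toy expression matrix. It is not the sigScores or EcoTyper implementation: it simply averages gene-centered expression over each signature, and it assumes a median split to call IRE1 activity high or low. All gene names (M1, A1, I1, ...), the toy values, the reduced set of two states and the function names are hypothetical placeholders.

```python
from statistics import mean, median


def signature_scores(expr, signatures):
    """Score each sample for each signature.

    expr: {gene: [log-scale expression per sample]}
    signatures: {signature name: [gene, ...]}
    Returns {signature name: [score per sample]}.
    """
    n_samples = len(next(iter(expr.values())))
    # Center every gene across samples so that highly expressed genes
    # do not dominate the average.
    centered = {g: [v - mean(vals) for v in vals] for g, vals in expr.items()}
    scores = {}
    for name, genes in signatures.items():
        present = [g for g in genes if g in centered]
        if not present:
            raise ValueError(f"no gene of signature {name!r} found in the data")
        scores[name] = [
            mean(centered[g][i] for g in present) for i in range(n_samples)
        ]
    return scores


def dominant_state(scores):
    """Return, for each sample, the signature with the highest score."""
    names = list(scores)
    n_samples = len(scores[names[0]])
    return [max(names, key=lambda s: scores[s][i]) for i in range(n_samples)]


def high_low(values):
    """Assumed rule: 'high' if strictly above the cohort median, else 'low'."""
    cut = median(values)
    return ["high" if v > cut else "low" for v in values]


# Toy data: 4 samples, hypothetical genes (not real markers).
expr = {
    "M1": [8.0, 2.0, 5.0, 1.0], "M2": [6.0, 2.0, 4.0, 4.0],
    "A1": [1.0, 7.0, 4.0, 8.0], "A2": [3.0, 9.0, 5.0, 7.0],
    "I1": [7.0, 2.0, 6.0, 1.0], "I2": [9.0, 5.0, 8.0, 6.0],
}
state_signatures = {"MES-like": ["M1", "M2"], "AC-like": ["A1", "A2"]}

state_scores = signature_scores(expr, state_signatures)
states = dominant_state(state_scores)
ire1_score = signature_scores(expr, {"IRE1": ["I1", "I2"]})["IRE1"]
ire1_class = high_low(ire1_score)
```

In practice the four state signatures and the 36-gene IRE1 signature would replace the placeholder gene lists, and the choice of threshold for the high/low call would need to be justified per dataset.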
The second objective is to correlate cell state composition with IRE1 activity. A major question here is whether to analyze the datasets separately or to integrate them in a meta-analysis (at least for datasets based on the same technology) using an in-house pipeline developed by a former M2 bioinformatics student.
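The two options (separate analysis versus meta-analysis) can be illustrated with a small sketch. It computes a Spearman correlation between an IRE1 score and a cell state score within each dataset, then pools the per-dataset correlations on the Fisher z scale with weights n - 3. This is one common fixed-effect scheme, used here only as an assumption; it is not the in-house pipeline mentioned above, the Fisher transform is only approximate for rank correlations, and the toy values are invented.

```python
import math


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pooled_correlation(results):
    """Fixed-effect pooling of (rho, n) pairs on the Fisher z scale."""
    num = den = 0.0
    for rho, n in results:
        # Clip so that atanh stays finite for a perfect correlation.
        rho = max(min(rho, 0.999999), -0.999999)
        num += (n - 3) * math.atanh(rho)
        den += n - 3
    return math.tanh(num / den)


# Toy scores for two hypothetical datasets from the same technology.
ire1_a, mes_a = [0.1, 0.4, 0.3, 0.9, 0.7], [1.0, 2.0, 2.5, 4.0, 3.0]
ire1_b, mes_b = [0.2, 0.5, 0.1, 0.8, 0.6, 0.9], [1.5, 1.0, 2.0, 3.5, 3.0, 4.0]

rho_a = spearman(ire1_a, mes_a)  # separate analysis, dataset A
rho_b = spearman(ire1_b, mes_b)  # separate analysis, dataset B
rho_pooled = pooled_correlation([(rho_a, len(ire1_a)), (rho_b, len(ire1_b))])
```

Comparing the per-dataset values with the pooled estimate gives a first indication of whether the datasets agree well enough to justify integration.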
Finally, the third objective is to investigate whether specific protein markers associated with each cell state can be identified using machine learning approaches (caret R package) or discriminant analysis applied to the combined transcriptome and proteome datasets.
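As a simple baseline before any machine learning model, candidate protein markers can be screened one protein at a time. The sketch below ranks proteins by a one-vs-rest Welch t-statistic, using cell state labels assumed to come from the transcriptome analysis of the paired samples. It is a univariate screen, not the caret or discriminant analysis workflow named above, and the protein names (P1 to P3), labels and abundances are invented.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two groups (each needs at least 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_markers(protein, labels, state):
    """Rank proteins by how much higher they are in `state` than elsewhere.

    protein: {protein name: [abundance per sample]}
    labels: cell state assigned to each sample (same sample order)
    Returns [(protein name, t-statistic)] sorted from highest to lowest t.
    """
    ranked = []
    for name, values in protein.items():
        inside = [v for v, lab in zip(values, labels) if lab == state]
        outside = [v for v, lab in zip(values, labels) if lab != state]
        ranked.append((name, welch_t(inside, outside)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy paired data: 6 samples, 3 hypothetical proteins.
labels = ["MES-like"] * 3 + ["AC-like"] * 3
protein = {
    "P1": [9.0, 10.0, 11.0, 2.0, 3.0, 4.0],  # higher in MES-like samples
    "P2": [5.0, 6.0, 4.0, 5.5, 4.5, 6.0],    # no clear difference
    "P3": [1.0, 2.0, 3.0, 8.0, 9.0, 10.0],   # higher in AC-like samples
}

mes_ranking = rank_markers(protein, labels, "MES-like")
ac_ranking = rank_markers(protein, labels, "AC-like")
```

Proteins at the top of each ranking would then be candidates to carry forward into a multivariate classifier, with multiple-testing correction and cross-validation applied on real data.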