Knowledge Engineering for a Real-World Evidence hypergraph: metamodel, tooling, and large-scale cons

Internship · Other internship · 6 months · Bac+5 / Master's level · Inserm CESP · Paris (France)

Start date: 1 January 2026

Keywords

Knowledge engineering, hypergraph data model, real-world evidence, knowledge base, ontologies, epidemiology, LLM-based agents, automated curation

Description

Real-world evidence studies leverage routinely collected healthcare data and modern epidemiological designs to estimate associations and causal effects between exposures (e.g., drugs), outcomes (e.g., disease onset, progression, adverse events), and patient-level factors across a broad spectrum of conditions. These results are increasingly produced with causal inference workflows and are critical to accelerating evidence synthesis for applications such as drug repurposing and clinical decision support.
A large Real-World Evidence Knowledge Base (RWE-KB) has recently been developed to structure and connect this evidence at scale. It compiles findings from epidemiological studies (results, populations, exposures, comparators, outcomes, covariates, bias/limitations, metadata), links treatments to targets, and is enriched with ontological and mechanistic knowledge. The resource is naturally represented as a hypergraph, enabling n-ary relations, hierarchical node/edge types, contextualized assertions, explicit evidence levels, and end-to-end provenance. However, the current hypergraph is still sparse and heterogeneous, and scaling it to a level that supports downstream tasks, such as AI development and clinician-facing products, requires stronger validation, data quality, provenance, and robust ingestion/curation workflows.
The core objective of this internship is to grow the existing RWE-KB into a large-scale, high-trust evidence hypergraph, with explicit provenance, quality signals, and conflict-aware aggregation. The intern will drive the expansion of the hypergraph by integrating new epidemiological evidence end-to-end, from normalization to representation, while strengthening the metamodel and validation rules that keep the KB consistent. Building on the current tooling, they will harden ingestion and curation workflows to improve key performance indicators and optimize LLM-based curation agents that reconcile inconsistent sources, handle deduplication, and reduce manual burden while keeping an auditable review loop. The outcome is a substantially larger, cleaner, and more reliable knowledge base designed to power downstream AI pipelines and clinician-facing applications.
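As an illustration of the hypergraph representation described above, the following is a minimal Python sketch, not the RWE-KB's actual metamodel. It shows one n-ary assertion stored as a hyperedge with role-labelled nodes, an evidence level, and provenance, plus a simple check for conflicting findings. All class names, role names, the direction coding, the evidence scale, and the identifiers (`drug:A`, `study-1`, etc.) are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    type: str  # hypothetical node types, e.g. "drug", "outcome", "population"


@dataclass(frozen=True)
class Hyperedge:
    """One n-ary assertion: several nodes, each playing a named role."""
    roles: tuple         # ((role_name, Node), ...)
    direction: str       # hypothetical coding: "increase", "decrease" or "null"
    evidence_level: int  # hypothetical scale, higher means stronger evidence
    source: str          # provenance: identifier of the originating study

    def node(self, role):
        # Return the node that plays the given role in this assertion.
        return next(n for r, n in self.roles if r == role)


def find_conflicts(edges):
    """Group assertions by (exposure, outcome) and keep the groups
    whose sources disagree on the direction of the effect."""
    groups = defaultdict(list)
    for edge in edges:
        key = (edge.node("exposure").id, edge.node("outcome").id)
        groups[key].append(edge)
    return {
        key: group
        for key, group in groups.items()
        if len({edge.direction for edge in group}) > 1
    }


# Placeholder data only; these are not real findings.
drug = Node("drug:A", "drug")
comparator = Node("drug:B", "drug")
outcome_x = Node("outcome:X", "outcome")
outcome_y = Node("outcome:Y", "outcome")
population = Node("population:P", "population")

edges = [
    Hyperedge(
        (("exposure", drug), ("comparator", comparator),
         ("outcome", outcome_x), ("population", population)),
        "decrease", 3, "study-1",
    ),
    Hyperedge((("exposure", drug), ("outcome", outcome_x)), "increase", 2, "study-2"),
    Hyperedge((("exposure", drug), ("outcome", outcome_y)), "decrease", 2, "study-3"),
]

conflicts = find_conflicts(edges)
for (exposure_id, outcome_id), group in conflicts.items():
    print(exposure_id, outcome_id, sorted(edge.source for edge in group))
```

In this sketch, conflicting assertions are only flagged, together with their sources, so that a curator or a curation agent can review them; how conflicts are actually resolved or aggregated in the RWE-KB is not specified here.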

The full offer description is available at https://clreda.github.io/assets/offers/RWE_hypergraph_internship_proposal.pdf

How to apply
Interested candidates should apply in either English or French to reda@bio.ens.psl.eu and lamiae.grimaldi@aphp.fr with a detailed CV and a motivation letter.

Application

Deadline: 23 April 2026

Contacts

Lamiae Grimaldi
laNOSPAMmiae.grimaldi@aphp.fr

Clemence Reda
reNOSPAMda@ens.fr

Offer published on 19 December 2025, displayed until 22 October 2026