Foundational Models

Johnson & Johnson

West Lafayette, IN  

Project Description: Early-phase clinical trials often operate in a “large p, small n” regime, with many covariates and multiple endpoints but small sample sizes. Recent advances in foundation models, e.g., GPT-style or domain-specific LLMs, may offer new opportunities to extract insights through joint modeling of high-dimensional data and through embedding biomedical context into classic predictive models such as XGBoost. This project will investigate innovative strategies that bridge Generative AI and Predictive Modeling, focusing on, but not limited to, the following directions:

• LLM-guided hypothesis generation (undergraduate level): Prompt domain-specific large language models with trial and literature context to recommend clinically relevant subgroups and plausible effect modifiers for focused subgroup discovery and enrichment
• Empirical priors for endpoint modeling (graduate level): Identify and adapt joint endpoint distributions from pre-trained medical generative models, to be used as empirical priors
• Synthetic data augmentation (graduate level): Train and validate tabular diffusion models on internal data that preserve observed endpoint relationships, to augment study data with synthesized patient-level records

Keywords: Generative AI and Diffusion Models; Predictive Modeling of Multiple Endpoints; Clinical Trials; Subgroup Identification

Tools and Skills Students will Use and Learn: Implementing GANs, VAEs, or diffusion models, and training them to synthesize biologically plausible clinical patient data

Preferred Student Profile:
• Machine learning basics
• Conceptual understanding of GenAI frameworks such as GANs, VAEs, and diffusion models
• Proficiency in Python or R
