Background:
Clinical research studies rely on several important documents that define how a study is conducted and how data is collected.
Two key document types are:
- Study Protocols (Clinical Investigation Plans): These describe the goals of the study, the procedures that will be performed, and the outcomes that will be measured.
- Case Report Forms (CRFs): These are structured forms used to collect study data from investigators during the clinical trial.
Ideally, every data element collected in a CRF should be supported by information defined in the study protocol. However, clinical studies are complex and evolve over time. Protocols may be updated, study procedures may change, and data collection requirements may expand. These changes can create inconsistencies between what the protocol describes and what the CRF collects.
Templates and Standards:
To improve consistency, organizations often use templates and standards when creating study documentation.
For example:
- Protocol Templates define standard sections such as study objectives, endpoints, and assessments.
- CRF Standards define commonly used data variables and structures used across multiple studies.
These templates are intended to ensure that study design and data collection remain aligned.
This project will explore how well these templates and standards support consistent documentation across real clinical studies.
Project Goal:
The goal of this project is to evaluate how closely study protocols, CRF standards, and actual study data collection forms align with one another, and to assess how well these documents are structured for potential use with artificial intelligence tools. Students will investigate whether information described in protocols can be clearly mapped to the data fields collected in case report forms.
Example Question: If a protocol states: “Assess device success at hospital discharge.”
- Does the CRF contain a clearly defined field for that outcome?
- Or are multiple related fields used, making the relationship less clear?
Understanding these relationships helps determine whether computers, including AI systems, can easily interpret study designs.
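As a rough illustration of the example question, the sketch below ranks CRF fields against the protocol statement by word overlap (Jaccard similarity). The variable names and field labels (`DEVSUCC`, `DISCHDT`, and so on) are hypothetical and invented for this example; they do not come from the project data. Real CRF labels would likely need synonym handling that plain word overlap cannot provide.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, minus a few very common stop words."""
    stop = {"at", "the", "of", "a", "an", "and", "or", "to", "in"}
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in stop}

def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_fields(statement: str, crf_fields: dict[str, str]) -> list[tuple[str, float]]:
    """Return (variable name, score) pairs, best match first."""
    s = tokens(statement)
    scored = [(name, jaccard(s, tokens(label))) for name, label in crf_fields.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical CRF field labels, invented for illustration only.
crf_fields = {
    "DEVSUCC": "Device success at hospital discharge",
    "DISCHDT": "Hospital discharge date",
    "DEVDEPL": "Device deployed successfully",
    "AGE": "Subject age in years",
}

statement = "Assess device success at hospital discharge."
ranking = rank_fields(statement, crf_fields)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

One clear top score suggests a clearly defined field. Several fields with similar mid-range scores would point to the second case, where the relationship is spread across related fields.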
Data Provided:
Students will work with de-identified study documentation from past clinical studies. Data may include:
- Protocol templates used to design studies
- Case report form (CRF) standards and variable libraries
- Study-specific protocols
- Case report form specifications for individual studies
- Version histories showing how documents evolved during a study
NOTE: No patient-level data will be included.
Key Questions to Explore:
Students may explore questions such as:
1. Protocol to CRF Mapping
- Can study objectives, endpoints, and procedures described in protocols be clearly linked to specific CRF variables?
2. Use of Standards
- Do study-specific CRFs consistently follow the organization’s CRF standards?
- Where do deviations occur?
3. Documentation Gaps
- Are there CRF variables that do not have clear supporting language in the protocol?
- Are there protocol-defined outcomes that are difficult to locate in the CRF?
4. Document Evolution
- How do protocols and CRFs change over time during a study?
- When protocols are amended, how consistently do those changes propagate into the CRF?
5. AI Readiness
- How well structured are these documents for interpretation by automated systems?
- Students may explore ways to evaluate or score the “AI readiness” of clinical documentation.
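Once variable names have been extracted, the questions about standards use and documentation gaps reduce largely to set comparisons. The sketch below assumes that a protocol-to-CRF mapping already exists (built manually or by a tool) and is passed in as a dictionary. The function name `gap_report`, the outcome names, and the variable names are all hypothetical.

```python
def gap_report(
    protocol_map: dict[str, set[str]],
    study_crf: set[str],
    standard_library: set[str],
) -> dict[str, list[str]]:
    """Summarize three kinds of misalignment as sorted lists.

    protocol_map     -- protocol outcome -> CRF variables believed to capture it
    study_crf        -- variable names present in the study-specific CRF
    standard_library -- variable names defined by the CRF standard
    """
    # Every CRF variable that at least one protocol outcome points to.
    supported = set().union(*protocol_map.values())
    return {
        # CRF variables with no supporting protocol language.
        "crf_without_protocol_support": sorted(study_crf - supported),
        # Protocol outcomes that none of the study's CRF variables capture.
        "outcomes_without_crf_field": sorted(
            outcome
            for outcome, variables in protocol_map.items()
            if not variables & study_crf
        ),
        # Study variables that deviate from the standard library.
        "non_standard_variables": sorted(study_crf - standard_library),
    }

# Hypothetical inputs, invented for illustration only.
protocol_map = {
    "Device success at discharge": {"DEVSUCC"},
    "30-day mortality": {"DTHDT"},
    "Quality of life at 6 months": set(),
}
study_crf = {"DEVSUCC", "DISCHDT", "DTHDT", "SMOKHX"}
standard_library = {"DEVSUCC", "DISCHDT", "DTHDT", "AGE"}

report = gap_report(protocol_map, study_crf, standard_library)
for category, items in report.items():
    print(category, items)
```

Running the same report on each document version would give a simple way to track whether protocol amendments are reflected in the CRF over time.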
Possible Analytical Approaches:
Students may apply techniques such as:
- natural language processing (NLP)
- document similarity analysis
- entity extraction
- knowledge graph construction
- large language model analysis
- data visualization and mapping techniques
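Of the techniques listed, knowledge graph construction can be sketched with the standard library alone. The example below is a minimal sketch, not a recommended design: it stores objective, endpoint, and CRF variable nodes as prefixed strings and uses a breadth-first search to find which CRF variables trace back to an objective. The class `StudyGraph`, the relation names, and all node labels are assumptions made for illustration.

```python
from collections import defaultdict, deque

class StudyGraph:
    """Tiny directed graph of (source, relation, target) triples."""

    def __init__(self) -> None:
        self.edges: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, source: str, relation: str, target: str) -> None:
        self.edges[source].add((relation, target))

    def reachable(self, start: str) -> set[str]:
        """All nodes reachable from start, found by breadth-first search."""
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            # .get avoids creating empty entries for leaf nodes.
            for _, target in self.edges.get(node, ()):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        return seen

# Hypothetical study content, invented for illustration only.
graph = StudyGraph()
objective = "objective:Evaluate device safety and performance"
graph.add(objective, "measured_by", "endpoint:Device success at discharge")
graph.add(objective, "measured_by", "endpoint:30-day mortality")
graph.add("endpoint:Device success at discharge", "collected_in", "crf:DEVSUCC")
graph.add("endpoint:Device success at discharge", "collected_in", "crf:DISCHDT")

nodes = graph.reachable(objective)
crf_variables = sorted(n for n in nodes if n.startswith("crf:"))
# Endpoints with no outgoing edge to a CRF variable are candidate gaps.
unlinked_endpoints = sorted(
    n for n in nodes
    if n.startswith("endpoint:") and not graph.edges.get(n)
)
print(crf_variables)
print(unlinked_endpoints)
```

A dedicated graph library or graph database may be a better fit for larger studies; the dictionary-of-sets layout here is only meant to show the idea.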
Expected Outcomes:
Students may produce:
- tools or scripts for mapping protocol text to CRF variables
- visualizations of relationships between study design and data collection
- analysis of common documentation misalignment patterns
- a framework for evaluating how well clinical documentation supports automated analysis
Why This Matters:
Clinical research organizations are increasingly exploring how artificial intelligence can assist with generating and reviewing study documentation. Better alignment between protocols and data collection forms can help enable future workflows where AI tools assist with tasks such as:
- designing study data collection forms
- validating study documentation
- generating clinical reports
These workflows would still need to maintain appropriate human review and compliance with regulatory expectations such as 21 CFR Part 11.
Weekly Meeting Times:
- Mentor Meeting (50 mins): Thursdays, 9:30-10:20 AM Eastern (Fall and Spring)
- Student Lab / Working Session (1 hr, 50 mins): Tuesdays, 9:30-11:20 AM Eastern (Fall and Spring)
- Industry: Healthcare
- Requirements: Open to all students