Introduction
Clinical trials are no longer the sole domain of data collected on bespoke case report forms. Real-World Data (RWD) – such as electronic health records (EHRs), insurance claims databases, patient registries, and data from digital health devices – is increasingly being tapped to enhance trials, whether by identifying patients, serving as external control arms, or capturing long-term outcomes. ICH E6(R3) acknowledges this trend by making clear that GCP principles apply to such data sources, so that rigor and reliability are maintained. However, integrating RWD into a GCP-compliant trial comes with challenges in data quality, consistency, and privacy. In this blog, we discuss how sponsors and investigators can incorporate real-world data and digital health tools in trials while satisfying E6(R3) requirements, and what regulators will expect to see in terms of data governance and validation.
Opportunities for Real-World Data in Trials
Real-world data refers to health-related information collected outside the traditional trial context. Some common uses in the trial setting include:
- Augmenting Control Groups: A clinical trial might use RWD (e.g., from a disease registry or claims database) to compare outcomes of patients on standard of care with those on an investigational therapy, reducing or even replacing the need for a randomized control arm.
- Hybrid Trial Designs: A study might collect primary endpoint data via usual in-clinic visits, but gather secondary outcomes or longer follow-up through EHR extraction or patient wearables at home. This combination can provide a richer picture of patient health and treatment effects.
- Eligibility and Recruitment: Investigators can use EHR screening tools to find patients meeting trial criteria, speeding up enrollment. Also, capturing some baseline data directly from EHRs can save time and avoid transcription errors.
- Pragmatic Trials: These are trials done in the context of routine care. Often, the interventions are applied in practice and much of the data (like routine lab results, visit records) come from EHRs rather than special research forms.
The benefit of using RWD is increased generalizability and efficiency – you’re getting insights from broader patient experiences and possibly reducing what participants have to do solely for research purposes. However, RWD isn’t collected under controlled conditions and may be messy or incomplete. That’s where E6(R3) guidance comes in: to ensure that, even when RWD is used, the trial data remain trustworthy and the rights of individuals are respected.
GCP Considerations for Using EHRs and Registries
When incorporating Electronic Health Record data:
- Ensure Access and Permission: Investigators must have permission to use patients’ medical records for research. Typically, patients consent to this as part of trial consent (and any additional hospital privacy authorizations, per regulations like HIPAA in the US). E6(R3) requires patient confidentiality to be protected, so make sure extracting EHR data doesn’t expose identifiers to unauthorized parties. Often, data can be coded before analysis.
- Data Relevance and Fit-for-Purpose: E6(R3) Principle 9.2 states data should be fit for purpose. EHRs might use different definitions or timing than a protocol requires. For example, blood pressure in an EHR might be taken at various times of day, whereas the trial protocol might want it at a consistent timeframe. You must plan for this discrepancy. For instance, you might define which BP readings from the EHR to use (e.g., the one closest to the visit date). Or, if an outcome is hospitalizations, ensure your data source reliably captures hospital admissions for your patient cohort. Document these decisions in the protocol or analysis plan, showing you’ve considered how to use RWD appropriately.
- Data Quality and Completeness Checks: Unlike trial-specific CRFs that are designed to capture exactly the data needed, EHR data can be missing or inaccurate (e.g., a lab value may be missing if done at an outside lab, or a diagnosis may be coded incorrectly). As part of data management, implement quality control checks on imported EHR data. For instance, verify against source if critical data are missing or extreme. You might do a targeted source verification on a sample of EHR entries to ensure the extraction process didn’t mis-map fields. If linking data from multiple sources (say EHR plus trial CRF plus central lab), ensure they align (patient IDs correctly linked, dates consistent). A Data Transfer Agreement and Plan should outline how data flows from healthcare systems to the trial database and what validations occur on receipt.
- Audit Trails from EHRs: GCP expects traceability of data changes. EHR systems have their own audit trails, but when you extract data for a trial, maintain documentation of that extraction (e.g., a query algorithm or date of data pull). If any data is updated (like a diagnosis code revised later), note how updates will be handled (will you re-pull data periodically or lock the dataset at a certain point?). A pragmatic approach: generate a dated dataset from the EHR and keep that file archived; if you ever need to show what EHR info you used, you have the exact file. This becomes essentially “source” for the trial analysis – an inspector might not wade into the hospital’s EHR system if you have an official exported dataset that was protocol-specified.
- Use of Data Standards: When feasible, convert RWD into standard formats or terminologies. For example, use MedDRA coding for diagnoses and adverse events, WHODrug for medications, LOINC for lab tests, etc. This assists in analysis and also in showing that disparate data was harmonized. If a registry uses different units, convert them to the trial’s units, and document the conversion. Inspectors will want to see that someone took responsibility for ensuring apples-to-apples comparisons. E6(R3) doesn’t list specific standards, but the general admonition is that data should be of quality sufficient to draw conclusions as reliably as traditionally collected data.
- Ethics and Approvals: Using patient data from outside the trial context can raise ethical questions. Make sure the IRB/ethics committee is fully informed of what data you’ll use and how. Often, including a statement in the consent like “We will also collect relevant information from your medical records (such as prior test results and medical history) to avoid repeating tests unnecessarily.” covers it. If obtaining records from outside providers, patients might need to consent to that separately. From a harmonization perspective, regulators worldwide stress transparency with patients about data usage.
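The quality-control checks described above can be automated in a data management pipeline. Below is a minimal sketch of a QC pass over an extracted EHR dataset – the field names (`subject_id`, `visit_date`, `sbp_mmhg`) and the plausibility limits are purely illustrative assumptions; a real study would define these in its Data Transfer Plan.

```python
# Illustrative QC pass over an extracted EHR dataset.
# Field names and plausibility limits are hypothetical, not from the guideline.

CRITICAL_FIELDS = ["subject_id", "visit_date", "sbp_mmhg"]
SBP_RANGE = (60, 260)  # assumed plausibility limits for systolic BP (mmHg)

def qc_ehr_extract(records, enrolled_ids):
    """Flag records with missing critical fields, implausible values,
    or subject IDs that do not link to an enrolled trial participant."""
    issues = []
    for i, rec in enumerate(records):
        for field in CRITICAL_FIELDS:
            if rec.get(field) in (None, ""):
                issues.append((i, f"missing {field}"))
        sbp = rec.get("sbp_mmhg")
        if sbp is not None and not (SBP_RANGE[0] <= sbp <= SBP_RANGE[1]):
            issues.append((i, f"implausible sbp_mmhg: {sbp}"))
        if rec.get("subject_id") not in enrolled_ids:
            issues.append((i, "subject_id not in enrolled cohort"))
    return issues
```

The QC log returned here is the kind of artifact an inspector would expect to see retained alongside each dated extract.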
Using Digital Health Tools and Wearables
Wearables, apps, and home monitoring devices can continuously or intermittently collect data (steps, heart rate, glucose readings, etc.). To integrate these into GCP trials:
- Device Validation: Treat a digital tool as a piece of equipment needing qualification. Before relying on its data, you should either reference validation studies or conduct your own validation sub-study to ensure the device measures what it claims with acceptable accuracy and precision. For example, if using a blood pressure cuff at home, confirm it’s FDA-cleared or CE-marked for medical use and perhaps calibrate it for each patient at a baseline visit against a standard device. E6(R3) demands data reliability; proving device accuracy is part of that when the device supplies key endpoint data.
- Training Participants: Provide clear instructions (both written and ideally one-on-one demonstration) to participants on how to use devices properly – how to wear a fitness tracker, when to take measurements, how to synchronize data, what to do if they forget to wear it. GCP doesn’t directly mention this, but it’s analogous to training site staff – here, patients are quasi “data collectors,” so they need training too. Document that you provided this (maybe through acknowledgment in the consent or a training log if you have participants sign off on device training).
- Data Transmission and Handling: Ensure secure and successful data transmission from devices to the trial database. Use encrypted connections if data is transmitted via internet. There should be procedures if data fails to transmit (e.g., if a device hasn’t synced in a week, site staff get an alert to follow up with the patient). This kind of monitoring of data flow is crucial so you don’t end up with huge gaps. Keep logs of any technical issues/outages – inspectors could ask, “there’s a week with no glucose readings, what happened?” – you should have an answer (e.g., device malfunctioned, was replaced on X date, patient continued logging manually in interim – and indeed have those manual logs).
- Data Storage and Privacy: Data from wearables may be stored on third-party servers (e.g., the device manufacturer’s cloud). GCP and privacy laws require that patients consent to this transfer and that contracts assure data protection. In practice, execute a data processing agreement with the device vendor if they will host raw data. Also, plan how you will retrieve the raw data for analysis and archiving. Often you might get CSV files from the vendor periodically. Archive these as source. De-identify them appropriately. Regulators may particularly question if any audio/video data is collected (some trials use video for assessments or compliance confirmation). There must be consent for recording and secure storage with controlled access. E6(R3) basics of confidentiality apply: treat any personally revealing data with highest care.
- Compliance and Missing Data Management: When patients are tasked with using devices, compliance can vary. It’s akin to medication adherence. Outline in the protocol how you’ll handle missing device data – e.g., impute or exclude from certain analyses, and what steps you’ll take to improve compliance (reminders via app or calls). Document each participant’s compliance (device logs often show usage statistics). If an inspector sees lots of missing entries, they may ask what was done about it – showing a robust reminder system or mid-study re-training sessions can demonstrate you tried to ensure data completeness.
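The sync-monitoring idea above (alerting site staff when a device has gone quiet) is simple to sketch. The 7-day threshold and the shape of the sync log are assumptions for illustration, not requirements from E6(R3):

```python
# Sketch of a stale-sync check for home devices: flag participants whose
# wearable has not uploaded data within a threshold (7 days, as assumed above).
from datetime import date, timedelta

def stale_devices(last_sync, as_of, max_gap_days=7):
    """Return subject IDs whose most recent sync is older than max_gap_days.

    last_sync: dict mapping subject ID -> date of last successful upload.
    """
    cutoff = as_of - timedelta(days=max_gap_days)
    return sorted(sid for sid, synced in last_sync.items() if synced < cutoff)
```

A nightly job like this, feeding alerts to coordinators, gives you both fewer data gaps and a documented trail showing that missing data was actively followed up.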
Combining Traditional and RWD Streams: Maintaining Integrity
Often, trials will have a mix of data sources – e.g., clinic assessments plus a post-study registry follow-up. To keep overall GCP compliance:
- Consider having a data integration plan clearly mapping how data from various sources (CRF, EHR, device) come together. Who is responsible for each integration step? For instance, the data manager might be responsible for merging the trial dataset with a later registry dataset to create a combined file for analysis. That process should be documented and quality-checked (like verifying key patient identifiers match between datasets).
- Use unique subject identifiers across all data sources to avoid mismatch. Many trials assign a study ID that is used for CRF and maybe also given to any external data provider so that, say, the registry can tag which patients are part of the trial. If direct linking by patient name is needed, ensure that happens securely and then data is coded.
- Analyzing RWD critically: Real-world data often has biases (e.g., sicker patients might have more frequent records in a registry). ICH E6(R3) doesn’t cover analysis per se, but regulators will expect that you handled these issues properly if using RWD to support efficacy or safety conclusions. This is primarily a statistical concern, but it belongs in GCP planning: involve statisticians early to decide how to adjust for confounders or for differences in data collection frequency. Document these plans in the protocol or analysis plan so that it’s clear you prospectively managed the complexities.
- Regulatory Engagement: If you intend to use RWD as part of pivotal evidence, early dialogue with regulators (FDA, EMA, etc.) is wise. Many have guidance on RWE (Real-World Evidence) use for regulatory decisions, and they will want to ensure your approach with RWD satisfies their reliability criteria. While optional, such consultation can prevent GCP compliance issues – e.g., a regulator might tell you that you need to get consent from patients for certain data usage even if ethically it seemed okay, or they might advise on how to validate a novel endpoint device. Taking that advice means by the time of inspection or review, you’ve met their expectations.
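The integration step described above – merging data sources on a unique subject identifier and logging any mismatches for QC – can be sketched as follows. The field name `subject_id` and the two-source shape are illustrative assumptions:

```python
# Illustrative merge of two per-subject datasets (e.g., trial CRF and a
# registry extract) on a shared study ID, reporting unmatched IDs for the
# QC log rather than silently dropping them.

def merge_sources(crf_rows, registry_rows):
    """Return (merged_rows, ids_only_in_crf, ids_only_in_registry)."""
    crf = {r["subject_id"]: r for r in crf_rows}
    reg = {r["subject_id"]: r for r in registry_rows}
    shared = sorted(crf.keys() & reg.keys())
    merged = [{**crf[sid], **reg[sid]} for sid in shared]
    return (merged,
            sorted(crf.keys() - reg.keys()),
            sorted(reg.keys() - crf.keys()))
```

Keeping the unmatched-ID lists as part of the documented merge output is what lets the data manager show that linkage was verified, not assumed.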
Case Example
Imagine a heart failure trial where patients get a new drug or standard care. Besides regular clinic visits for primary endpoints, the study utilizes:
- Patients’ electronic health records to collect their hospitalization events and lab tests done as part of routine care.
- A wearable sensor that monitors their daily activity (as a secondary endpoint).
- A long-term registry after the 12-month trial period to follow survival for 3 more years.
To GCP-proof this:
- The sponsor obtained consent for EHR and registry follow-up in the main ICF, explaining what data will be collected and for how long.
- The sponsor partnered with hospitals to securely obtain EHR data. They wrote a procedure: every 3 months, a data extract of pre-specified fields (hospital admissions, NT-proBNP lab results, etc.) will be pulled for each participant. They tested this process on a few records to ensure correct mapping. They keep those extracts on file.
- They validated the wearable’s step count against a research-grade accelerometer in a subset of 30 patients for a week; results were acceptable (within 5% variance). They documented this in a report.
- They gave each patient a detailed wearable guide and had coordinators check device use at each clinic visit (and documented compliance in the source notes). If a patient wasn’t wearing it, they re-educated them. They also had the app send an alert to the study coordinator if data hadn’t synced in >7 days.
- The analysis plan prespecified how they’d handle missing wearable data (e.g., if <10% daily data missing, impute via last observation carried forward; if >10%, exclude from that endpoint analysis).
- They set up a small Real-World Data team: a data manager and epidemiologist overseeing the EHR and registry data integration. They performed a quality review: e.g., cross-checked that every hospitalization noted in EHR was also captured via CRF or vice versa, to ensure no double-counting or omissions.
- At study end, they had a comprehensive dataset combining CRF, EHR, and wearable data. They archived the raw data from each source separately and the combined analysis dataset, with programming code used for merging and analysis.
- During an audit, they were able to show the auditor the audit trail of the EHR query, the transfer logs, and how one patient’s unexpected hospital visit (found in the EHR) triggered protocol deviation reporting: the visit should have been reported as an SAE but initially wasn’t by the site; they caught it via EHR reconciliation, filed the SAE report late, and documented and explained the delay.
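The prespecified missing-data rule in this example (impute via last observation carried forward if under 10% of daily values are missing, otherwise exclude from that endpoint analysis) could be implemented along these lines. The boundary behavior at exactly 10% is an assumption here; a real analysis plan should state it explicitly:

```python
# Sketch of the hypothetical prespecified rule: if fewer than 10% of a
# participant's daily values are missing (None), impute by last observation
# carried forward (LOCF); otherwise exclude the participant from this
# endpoint analysis. Exactly 10% missing is treated as exclusion here.

def handle_missing(daily_values, max_missing_frac=0.10):
    """Return (imputed_series, included_flag)."""
    n_missing = sum(v is None for v in daily_values)
    if n_missing / len(daily_values) >= max_missing_frac:
        return None, False  # too sparse: exclude from this endpoint
    imputed, last = [], None
    for v in daily_values:
        if v is not None:
            last = v  # carry the most recent observed value forward
        imputed.append(last)
    return imputed, True
```

Note that LOCF is only one possible prespecified approach; the GCP point is that whichever rule you choose is written down before unblinding and applied consistently.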
This example demonstrates how careful planning and integration of RWD sources under a GCP framework can withstand scrutiny. It’s extra work, but as regulators encourage more RWE use, those who do this well can gain an edge by augmenting trials with valuable real-world insights without sacrificing quality.
Conclusion: Real-World Data with Real GCP Discipline
ICH E6(R3) makes it clear that regardless of data source, data quality and participant protections must remain top priorities. Real-world data can enrich trials, but it must be handled with the same (or greater) rigor as traditional trial data. That means validation of data sources, clear documentation, respecting privacy, and ensuring that use of these data does not introduce bias or uncertainty that could have been avoided with better planning.
For sponsors considering blending RWD into trials:
- Engage cross-functional expertise (clinical, data science, biostat, IT, ethics) early to map out how to do it right.
- Document everything – from how data will be obtained and cleaned to how devices were tested and how patients are guided in their use.
- Pilot test new methods when possible to avoid surprises mid-trial.
- Communicate with regulators for critical uses of RWD to ensure they accept the approach (e.g., using historical controls or real-world endpoints).
By following GCP principles in these innovative areas, you not only comply with E6(R3) but also strengthen the credibility of your findings. Trials augmented with real-world data, done correctly, can have the robustness of randomized trials and the relevance of real-world evidence – the best of both worlds. E6(R3) provides the guardrails to achieve that, making sure that even as we venture into new data frontiers, we carry the compass of GCP to navigate them responsibly.
For those interested in our TransCelerate BioPharma-certified training, please enroll in our ICH GCP E6(R3) courses at https://www.whitehalltraining.com/
#GCPE6R3 #ClinicalTrials #ICHGuidelines #ClinicalResearch #ICH #E6R3 #GCP #WhitehallTraining #CRO #GoodClinicalPractice
Guidance To Explore
For those wanting to dive deeper into the details:
- ICH E6 (R3) Final Guideline (Step 4, January 6, 2025) – The official reference text.
- FDA Overview of ICH E6 (R3) – A clear outline of the changes and their implications.
- EMA Step 5 Guideline – European regulatory perspective on implementation.
- TransCelerate ICH E6 Asset Library – Practical tools and frameworks to support adoption (TransCelerate).
