Lessons Learned & Resources
Since 2007 there have been three phases of the eMERGE Network. Over this time, many lessons have been garnered from the investigators, workgroups, meetings, and projects. This page is dedicated to showcasing recent lessons learned, summaries from past steering committee meetings, and packets presented to the External Scientific Panel. The Network hopes that this information will inform the best research practices and guide the science of translational medicine in the years to come.
Overall Lessons Learned
Goal: To summarize the lessons learned across all eMERGE workgroups with regard to enrolling, sequencing, and returning clinically relevant variants. To learn more about the eMERGEseq sequencing platform click here.
• Studies should outline recruitment population requirements based on study goals. Ascertainment bias, pre-existing conditions, and sufficient sample size based on disease prevalence and variant frequency should be taken into consideration.
• Harmonization across sites of study design, survey instruments, outcomes to be measured and methods to measure them, and data collection should be prioritized and finalized prior to study initiation to ensure streamlined analyses as the project progresses. In recognition that not all issues can be anticipated prior to implementation, a process to identify, discuss, and iteratively improve the study is a necessity.
• Clinical-grade cross-institutional electronic health record interface projects require significant effort, and clinical and laboratory sites are extremely heterogeneous. Gathering requirements, abilities, and needs early on allows for proper planning and timelines for the successful execution of data integration.
• Standardizing return of result information to participants can reduce variability in participant and provider responses allowing for more meaningful outcomes analyses. Context is important to understanding changes (or lack thereof) in health services delivered and long-term outcomes. If contextual factors necessitate heterogeneity in procedures, a method to study the differences using a rigorous implementation science framework can result in generalizable information to inform implementation in diverse sites.
• Clearly defined data freezes considering diversity, phenotypic data, and discovery & implementation goals should be outlined at the beginning of the network project to maximize data delivery and analysis time at the sites.
• Prioritize PGx phenotypes early, recognizing this can require complex genotyping, interpretation, and EHR extraction of drugs, outcomes, timing, especially considering the limitations of current PGx guidelines, particularly with regard to non-white populations (and foster more discovery research specifically in these populations).
• Standard protocols for phenotype and NLP phenotype validation and implementation are critical; documentation and communication between sites should be prioritized. NLP adds complexity and has extra requirements for privacy protection, technical infrastructure setup, and high-fidelity notes provision compared to standard phenotyping. ‘Off the shelf’ NLP products are not robust enough for deep phenotyping and require extensive customization. Institutional variation in note structure and content further complicates the use of NLP for phenotyping.
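The sample-size consideration in the first point above can be made concrete with simple arithmetic. The sketch below is illustrative only: the function name and all numbers (carrier frequency, penetrance, target count) are hypothetical planning values, not eMERGE estimates. It assumes an unselected cohort and asks how many participants would need to be enrolled for the expected number of affected carriers to reach a target.

```python
import math
from fractions import Fraction

def required_cohort_size(carrier_frequency, penetrance, target_affected_carriers):
    """Smallest cohort size whose expected number of affected carriers
    meets the target, assuming an unselected cohort.

    All inputs are hypothetical planning values. Fractions are used so
    the rounding up is exact rather than subject to floating-point error.
    """
    frequency = Fraction(carrier_frequency)
    pen = Fraction(penetrance)
    if not (0 < frequency <= 1 and 0 < pen <= 1):
        raise ValueError("carrier_frequency and penetrance must be in (0, 1]")
    expected_per_participant = frequency * pen
    return math.ceil(target_affected_carriers / expected_per_participant)

# Hypothetical scenario: 1 in 250 participants carries a variant of interest,
# half of carriers show the phenotype, and 20 affected carriers are wanted.
cohort_size = required_cohort_size(Fraction(1, 250), Fraction(1, 2), 20)
print(cohort_size)
```

A calculation like this only gives an expectation. Ascertainment bias and pre-existing conditions, as noted above, can move the observed numbers in either direction.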
Return of Results
Goal: To highlight each site’s return of results process from receipt of clinical reports to delivery to the participant. To discuss variability among sites in the process of returning CLIA-validated results to participants and providers.
Summary: eMERGE sites represent a spectrum of return of results processes. This variation allows for a well-rounded understanding of how differences affect overall return, and for an overarching analysis of how methods affect patient comprehension, engagement, and outcomes.
Experiments of nature
- Cohorts unselected vs selected for a particular trait.
- One site selecting participants based on genotype.
- Choice vs no choice for return of secondary findings.
- Negative results returned vs not returned.
- Variation in timing of placement of results in EHR.
- Randomization vs observational.
- Pediatric vs adult.
- Differences in the ROR process across sites, although allowing for experiments of nature, provide challenges to studying the impact of ROR.
- Studying the impact of ROR on health care providers across sites is difficult to coordinate, as sites have different processes for who returns the results and to whom results are sent.
- Coordinating the participant survey across sites is challenging, given the different populations at each site and the different site priorities regarding their focus of research.
- The ROR process is very dependent on institutional cultures and priorities, and thus it is difficult to create standard guidelines for practice.
- IRBs vary significantly in their requirements, processes, and views towards ROR from genomic sequencing.
Going forward: The ROR group will continue to collect and analyze data in collaboration with Outcomes and EHRI and publish on how differences in participant, provider, and institutional involvement affect the overall return process.
Phenotyping
Goal: To demonstrate the challenges faced and how eMERGE solved, or plans to solve, these issues. To discuss how the nature of phenotyping has evolved during the course of eMERGE. To describe what has worked well (machine learning, NLP, etc.).
Summary: The Phenotyping group catalogued issues that cause delays and difficulty during algorithm development and implementation as well as potential solutions or ‘lessons learned’ to these obstacles.
- Logic, complexity of logic, number of data elements, and modalities of data all alter complexity of the phenotype.
- The complexity of the algorithm and scientific question calls upon a select set of individuals to develop and validate it, some of whom are hard to schedule due to clinical commitments.
- Data Dictionaries can also add complexity, time, and effort.
- Strong project management is needed to keep queues organized, projects assigned, and issues resolved at both the Network and Site level.
- Algorithms presented as flowcharts are most effective; direct code does not currently port well.
- Better understanding and cataloguing the complexity of an algorithm allows for better planning.
- Local experts are needed to implement and review depending on complexity of the science.
- Adopting a common data model and common vocabularies can facilitate implementation and data transfer.
- Centralizing commonly used data elements saves programmer time.
Going forward: The Phenotyping group will continue to catalogue complexities of algorithm development and implementation as well as publish lessons learned. Moving forward, incorporation and streamlining of natural language processing and transition to the OMOP common data model will act as ‘experiments of nature’ to compare to previous implementation methods.
EHRI, CDS, At-risk family members
Goal: To discuss implementing clinical decision support. To discuss communication of risk information to family members and cascade screening.
Summary: The EHRI group discussed obstacles and lessons learned during the integration of clinical data into a variety of EHRs.
- The EHR teams at local sites have competing projects and time allocations; EHR integration of eMERGE data needed to be woven into the queue.
- Transitions to new EHRs occurred at several sites, which caused delays and even more intense competition for resources.
- Large teams with asynchronous communication and changing personnel caused setbacks.
- Compliance regulations from some states caused issues with data usage and return.
- Data standardization and harmonization are key when returning genomic test results to a variety of clinical sites. Mechanisms and tools to track, analyze, and manage data (including genetic variant reclassifications) are needed for effective integration of results in the EHR.
- Genetics-aware clinical decision support should be driven by a variant knowledge base and would require access to structured data and knowledge. This process should not be hand coded. Designing and maintaining such a knowledge base also requires tight collaboration between clinicians, laboratories, and IT professionals.
- EHR integration of genomic test results at each site requires an oversight process for what content is presented to clinicians and how, including understanding where in the healthcare setting to make data interpretations and clinical and patient guidance accessible.
- Creating a standard data flow pipeline is key to integrating genomic test results into the EHR. This pipeline will differ depending on site regulations, study design, and requirements.
It is feasible to build a unified clinical network linking heterogeneous laboratories and provider systems in the context of an NIH consortium, although it is not simple. The group will work to inform the genetic HIT standards by developing a FHIR profile that codifies all of the combined experience.
Sequencing Center Harmonization
Goal: To examine lessons learned during the harmonization of sequencing centers across a network for variant return to participants as well as data usage for research. The paper “Harmonizing Clinical Sequencing and Interpretation for the eMERGE III Network” has been posted on bioRxiv and submitted to the American Journal of Human Genetics (AJHG) as of October 2018.
Summary: The CSGs discussed issues and lessons from the harmonization process the Network conducted during the creation and development of the eMERGEseq panel.
Obstacles:
- Items to be harmonized included data from the collection sites, assay development, test validation, primary analysis, variant classification, report content, data delivery to sites, and progress reporting to the Network including detection rates for different content and tested populations.
- Harmonization during the first year or so caused the rate of sequencing and reporting to be slow. As details were agreed upon, the rate of sequencing and reporting increased rapidly.
- Creating the panel itself slowed progress as well. Sites submitted their top six requested genes and those were combined with the ACMG 56 list. The site genes and SNVs required clinical reporting criteria to be assigned. A consensus list of returnable content was agreed upon but most sites added or subtracted content based on site-specific consent and protocols for return of results.
- A process for notification of sites when a reported variant was reclassified was developed. A more systematic approach to reanalysis is now underway.
- The CSGs worked together and with the Clinical Annotation group to come up with a consensus set of clinically actionable genes and SNVs that would be reported on. Sites were allowed to request additional genes or SNVs that were included on the panel to also be reported on in site-specific reports if clinical validity was met.
- The CSGs worked together to ensure the development and validation of the eMERGEseq panel was concordant. This required two rounds of probe design and validation, resulting in 99.8% (Partners/Broad) and 99.9% (Baylor) coverage of bases.
- The CSGs communicate and share variant interpretations monthly to resolve any discordant interpretations. Classified variants are submitted to ClinVar.
- A class of non-reportable variants (VUS-leaning pathogenic) was identified as a target for follow-up in case eMERGE clinical data may be able to move a variant into the Likely Pathogenic reportable range.
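The reclassification notification process mentioned above could, in principle, be supported by comparing two snapshots of variant classifications. The following is a minimal sketch and not the CSGs' actual pipeline: the dictionary layout, variant identifiers, and site names are all hypothetical.

```python
def find_reclassified(previous, current):
    """Return {variant_id: (old_class, new_class)} for variants whose
    classification changed between two snapshots.

    Both arguments are hypothetical mappings of variant ID -> classification.
    """
    changes = {}
    for variant_id, old_class in previous.items():
        new_class = current.get(variant_id)
        if new_class is not None and new_class != old_class:
            changes[variant_id] = (old_class, new_class)
    return changes

def sites_to_notify(changes, reported_at):
    """Map each site to the reclassified variants it previously reported.

    reported_at is a hypothetical mapping of variant ID -> set of site names.
    """
    notifications = {}
    for variant_id in sorted(changes):
        for site in sorted(reported_at.get(variant_id, ())):
            notifications.setdefault(site, []).append(variant_id)
    return notifications

# Made-up snapshots for illustration only.
previous = {"var-001": "Likely Pathogenic", "var-002": "Pathogenic"}
current = {"var-001": "VUS", "var-002": "Pathogenic"}
reported_at = {"var-001": {"Site A", "Site B"}, "var-002": {"Site A"}}

changes = find_reclassified(previous, current)
notifications = sites_to_notify(changes, reported_at)
print(changes)
print(notifications)
```

Keeping the comparison separate from the notification step means the same change list could also feed a more systematic reanalysis, should that be desired.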
The CSGs will publish the harmonization paper and include data from the full eMERGEseq panel. They plan to develop harmonized structured genetic test report standards compliant with FHIR/HL7. The CSGs will also examine the triggers and frequency of reanalysis and reinterpretation issues. Establishing pipelines for return of updated results going forward is an important next step.
Outcomes
Goal: To examine lessons learned during the creation and implementation of Outcomes forms across sites after return of actionable variants in the eMERGEseq panel. The PDFs of the final forms can be found here.
Summary: The Outcomes panel discussed general lessons learned from creation of the Outcomes forms, initial findings, and considerations from a subset of adult participants (Mayo) and pediatric participants (CCHMC & CHOP). In general, to capture the breadth of outcomes across the eMERGEseq panel, deciding which data elements were critical for a given disease was a difficult task, especially across diverse populations. For the pediatric participants, the child’s preference to receive actionable results sometimes differed from that of their parents or guardians. This, in addition to having to re-consent participants who turned 18 years old during the course of the study, was a main difference compared to other cohorts in which only adults were enrolled.
- Harmonizing the Outcomes forms across the sites was difficult, as some issues did not present until data entry commenced. Early development and use of abstraction guides is an important element in determining how sites should interpret Outcomes forms questions.
- Adding in ‘penetrance’ related questions to the Outcomes forms was inefficient as the Outcomes forms were not originally designed to ascertain penetrance related data elements. This also caused a delay in launching the forms and data entry across the Network.
- Penetrance data elements were required for all actionable results; however, Outcomes forms were initially only to be completed for participants where return of results took place. Future studies may consider creating penetrance-only forms to be filled out in parallel.
- Site-hosted Outcomes forms caused too much variation in data elements and would have made data compilation very difficult during the initial Network-wide analysis. It was necessary to move the Outcomes forms to a central location, hosted by the Coordinating Center.
- Sites had differing IRB requirements when it came to limited versus de-identified data entry; date shifting was required when filling out the forms for some sites.
- The Network achieved broad coverage of applicable phenotypes with some sacrifice in the depth of outcome phenotyping.
- Context is important to understand the changes (or lack thereof) in health services delivered.
- Context is difficult to uniformly capture across a large cohort; complementary qualitative assessments may be critical.
- The eMERGE Network has the opportunity to inform other national and international efforts to (at scale) collect outcomes across a sequencing panel.
- Pediatric cohorts offer the potential for longitudinal studies in the future.
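The date-shifting requirement noted above can be illustrated with a short sketch. This is one common approach, not necessarily the method used by any eMERGE site or by REDCap: every date for a given participant is moved back by the same number of days, so intervals between events are preserved. The key, the participant identifier format, the 1 to 364 day range, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
from datetime import date, timedelta

MAX_SHIFT_DAYS = 364  # hypothetical policy: shift back by 1 to 364 days

def participant_offset(participant_id, secret_key):
    """Derive a stable per-participant shift (in days) from a keyed hash."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % MAX_SHIFT_DAYS + 1

def shift_date(original, participant_id, secret_key):
    """Shift a date backwards by the participant's fixed offset."""
    return original - timedelta(days=participant_offset(participant_id, secret_key))

# Placeholder key and made-up dates; a real key would be held securely by the site.
key = b"site-held secret"
result_returned = date(2019, 3, 1)
follow_up = date(2019, 9, 1)

shifted_a = shift_date(result_returned, "participant-0001", key)
shifted_b = shift_date(follow_up, "participant-0001", key)

# The interval between the two events is unchanged by the shift.
print((follow_up - result_returned) == (shifted_b - shifted_a))
```

Deriving the offset from a keyed hash, rather than storing a random number per participant, is a design choice in this sketch: the site can reproduce the shift without keeping a lookup table, as long as the key is retained.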
Going forward: The Outcomes group will use the Coordinating Center (CC) hosted REDCap instances of the approved and deployed Outcomes forms to complete the six-month outcomes data. Sites will enter additional penetrance data as needed for appropriate forms. An interim six-month outcomes data analysis is scheduled for October 2019, which should be more streamlined now that the forms are hosted in one REDCap instance. A general Outcomes lessons learned paper on process and an intermediate health-related outcomes framework for the BMC Journal collection is currently being developed.
Genomics
Goal: To examine lessons learned in the Genomics group specifically surrounding creation and compilation of large data sets, analyses, and timing of data release in a large consortium.
Summary: The eMERGE Network produced several large, rich datasets including array sets focused on genomic discovery and sequencing datasets focused on implementation science. However, this production requires significant time and money. Networks should clearly outline the analysis and product goals prior to compiling a large dataset, including diversity and phenotypic status of samples if that is an important component in analysis. Analysis and computation costs are significant in large datasets and this should be considered when hosting data on cloud computing services. Cloud computing can be beneficial for analysis pipelines, as it can be used for process optimization and management, providing a standard model for operations.
- Adding samples to datasets after data freezes had been released caused delays in analysis and came at a significant cost to resources.
- Though the size of eMERGE’s datasets is a strength, working with these multi-terabyte files requires time and resources, and this will become increasingly important as data analyses are moved onto the cloud computing environment.
- Early analyses were postponed due to lack of phenotypic and case/control data early on in the Network cycle, and the promise of a ‘new dataset’ to be released in the future.
- With the focus on the eMERGEseq dataset, the Network did not prioritize analyses on past datasets, like PGRNseq, that are still a rich source of discovery.
- As data were added over the course of many phases of eMERGE, naming conventions of individual site GWAS and even PGRNseq files were not consistent, which required additional time and effort to combine and clean the files.
- Demographic files should be collected prior to genetic data compilation and datasets frozen at that time.
- Adding and removing participants once a large dataset is compiled costs significant amounts of time and resources.
- Clearly defined data freezes which take into account diversity, phenotypic data, and discovery & implementation goals should be outlined at the beginning of the network to maximize data delivery and analysis time at the sites.
- Standard naming conventions are necessary when combining files from multiple sequencing centers, and will maximize efficiency and improve turnaround time.
- Cloud computing can be used to set up standard pipelines for analysis, saving time, resources, and improving consistency.
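The naming-convention lesson above lends itself to an automated check run before files are combined. The sketch below assumes a purely hypothetical convention of SITE_DATASET_YYYYMMDD followed by an extension; the pattern, the allowed extensions, and the example file names are invented for illustration and do not reflect actual eMERGE file names.

```python
import re

# Hypothetical convention: <SITE>_<DATASET>_<YYYYMMDD>.<ext>
FILENAME_PATTERN = re.compile(
    r"^(?P<site>[A-Z]{2,10})_(?P<dataset>[A-Za-z0-9]+)_(?P<freeze>\d{8})"
    r"\.(?P<ext>vcf\.gz|csv)$"
)

def check_filenames(filenames):
    """Split file names into those that follow the convention (with their
    parsed fields) and those that do not."""
    conforming = {}
    nonconforming = []
    for name in filenames:
        match = FILENAME_PATTERN.match(name)
        if match:
            conforming[name] = match.groupdict()
        else:
            nonconforming.append(name)
    return conforming, nonconforming

# Made-up file names for illustration only.
files = [
    "SITEA_GWAS_20180115.vcf.gz",
    "siteB-pgrnseq-jan2018.vcf.gz",
    "SITEC_PGRNseq_20180115.csv",
]
conforming, nonconforming = check_filenames(files)
print(nonconforming)
```

Running a check like this at submission time, rather than at compilation time, would surface inconsistent names while the submitting site can still correct them.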
Going forward: Future networks should be clear about their goals, and when decisions have to be made, they should limit the amount of time for comments in order to make a final ruling on a given decision. Likewise, the workgroups should maximize participation in decision making by having clear questions and surveys. The group will also work through how to move genomic & phenotypic data to the AnVIL platform, including data on DNA Commons.
Clinical Annotation
Summary: The Clinical Annotation workgroup has analyzed data and performed penetrance analysis for several clinical disease phenotypes. Several publications reporting the results of penetrance analyses are in progress. The workgroup found that small sample sizes, ascertainment bias, difficulties in EHR-based data collection, and rare variant prevalence in the population made it difficult to perform penetrance analysis on certain conditions.
- Phenotyping was conducted using ICD codes, algorithms, and outcomes forms. Issues arose with penetrance analysis due to differences in local site use of ICD/CPT for clinical diagnosis and procedures.
- Penetrance analysis encountered barriers due to gaps in outcomes form data. This led to a request for additional data elements to be collected across the network, extending the timeline for both outcomes and penetrance analysis.
- Small sample sizes hindered penetrance analysis for some conditions.
- Missing data in the medical records and data collection forms complicated penetrance analysis, as it was difficult to determine with certainty the disease status of the participant.
- If a participant received a positive genetic test, they may be coded for that disease in the EHR even if the participant did not have clinical symptoms.
- Studies should be designed to minimize ascertainment bias during enrollment if penetrance is to be determined.
- Penetrance should be examined across the lifespan as clinical manifestations of the disease occur at a variety of ages and not at a set point in time.
- For rare disorders and risk alleles, larger sample sizes will be required to obtain meaningful data.
- Information regarding how individuals were ascertained for enrollment to the eMERGEseq cohort at a given site was not always straightforward in relation to their preexisting phenotypes.
- In some cases a clinical diagnosis was not recorded in the EHR even though disease elements were present; further phenotyping for both variant classification and penetrance assessment may be needed in some cases.
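The effect of small sample sizes on penetrance estimates, raised in several points above, can be illustrated by attaching a confidence interval to the estimate. The sketch below uses the Wilson score interval for a binomial proportion, which is a standard choice but not necessarily the method used by the workgroup. The function name and the carrier counts are hypothetical.

```python
import math

def penetrance_with_wilson_ci(affected_carriers, total_carriers, z=1.96):
    """Point estimate of penetrance (affected / total carriers) with a
    Wilson score interval; z=1.96 gives an approximate 95% interval."""
    if total_carriers <= 0 or not 0 <= affected_carriers <= total_carriers:
        raise ValueError("need 0 <= affected_carriers <= total_carriers and total_carriers > 0")
    n = total_carriers
    p = affected_carriers / n
    denominator = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denominator
    return p, max(0.0, center - half_width), min(1.0, center + half_width)

# Hypothetical counts: the same observed penetrance with few vs many carriers.
small = penetrance_with_wilson_ci(4, 10)
large = penetrance_with_wilson_ci(400, 1000)

for label, (estimate, lower, upper) in (("10 carriers", small), ("1000 carriers", large)):
    print(f"{label}: penetrance {estimate:.2f}, interval {lower:.2f} to {upper:.2f}")
```

With the same observed proportion, the interval for the smaller group is far wider. Note that an interval of this kind reflects sampling uncertainty only; it does not correct for ascertainment bias or missing data.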
Going forward: Future networks should be clear about their goals in terms of penetrance analysis during the startup phase of the project. Analysis of the data should begin early on, so any issues with data interpretation, collection, and missingness can be addressed. Care should be taken when enrolling a prospective cohort as unclear ascertainment can unnecessarily reduce sample size in penetrance analyses.