This review discusses the current state of multicancer early detection tests, the role of machine learning in their development, and their implications for oncology practice and patient care.
THE EMERGENCE OF multicancer early detection (MCED) testing marks a major innovation in oncology, with the goal of diagnosing cancers at an early, curable stage. Unlike traditional screening tests, which focus on specific organs such as the breast, colon, and lung, MCED tests use liquid biopsies (blood samples) to analyze circulating tumor DNA (ctDNA), proteins, and other molecular biomarkers. By combining advanced molecular techniques with machine learning (ML) algorithms, these tests have the potential to detect dozens of cancer types from a single blood draw. However, before they can be integrated into clinical practice, these tests must address concerns about accuracy, sensitivity, specificity, cost, and insurance coverage. This review discusses the current state of MCED tests, the role of ML in their development, and their implications for oncology practice and patient care.
MCED tests detect molecular signals that tumors release into the bloodstream. These signals include ctDNA fragments, circulating proteins, and DNA methylation patterns, which together may provide a unique molecular fingerprint of a malignancy. The PATHFINDER clinical trial (NCT04241796), which included 6662 participants, demonstrated the feasibility of this approach by assessing the time and diagnostic procedures needed to establish a cancer diagnosis after a positive result. A cancer signal was detected in 1.4% of participants, and cancer was confirmed in 38% of those with a positive result.1
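The PATHFINDER percentages can be turned into approximate counts with simple arithmetic. The sketch below back-calculates counts from the rounded rates quoted above (1.4% positive, 38% confirmed); the function name is hypothetical, and the derived counts are approximations from rounded figures, not the trial's reported tallies.

```python
def positive_counts(n_participants: int, positive_rate: float, confirmed_fraction: float):
    """Approximate positive and confirmed-cancer counts from rounded rates."""
    # Participants in whom a cancer signal was detected.
    positives = round(n_participants * positive_rate)
    # Of those, participants in whom cancer was confirmed (true positives).
    confirmed = round(positives * confirmed_fraction)
    # Positive predictive value (PPV): share of positive results that were true cancers.
    ppv = confirmed / positives
    return positives, confirmed, ppv

positives, confirmed, ppv = positive_counts(6662, 0.014, 0.38)
print(positives, confirmed, round(ppv, 2))  # 93 35 0.38
```

The 38% figure is therefore the test's PPV in this population: roughly 6 of every 10 positive results did not lead to a cancer diagnosis.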
These tests use high-throughput sequencing to identify these biomarkers, offering a less invasive alternative to tissue biopsy. ML algorithms then analyze the resulting data to distinguish cancer signals from normal biological noise. The models are trained on large datasets containing samples from both healthy individuals and patients with cancer, which allows them to learn patterns indicative of malignancy. For example, the methylation-based sequencing used in GRAIL’s Galleri test detects changes in DNA methylation patterns associated with tumor growth and the tissue of origin.2
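Methylation-based assays commonly summarize each CpG site as the fraction of sequencing reads that are methylated (often called a beta value), and these fractions become features an ML model can consume. A minimal sketch, assuming simple per-site read counts; the site names, counts, and `min_depth` cutoff are hypothetical, and this is not GRAIL's actual feature pipeline.

```python
def methylation_fractions(read_counts, min_depth=10):
    """Convert per-CpG (methylated, unmethylated) read counts to methylation fractions.

    Sites with fewer than `min_depth` total reads are returned as None,
    because a fraction estimated from very few reads is unreliable.
    """
    fractions = {}
    for site, (methylated, unmethylated) in read_counts.items():
        depth = methylated + unmethylated
        fractions[site] = methylated / depth if depth >= min_depth else None
    return fractions

# Hypothetical read counts for three CpG sites in one cell-free DNA sample.
sample = {"cpg_1": (18, 2), "cpg_2": (1, 29), "cpg_3": (2, 3)}
print(methylation_fractions(sample))
```

Here `cpg_1` is heavily methylated (0.9), `cpg_2` is almost unmethylated, and `cpg_3` is dropped for low coverage.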
Traditional cancer screening methods, such as mammography for breast cancer and colonoscopy for colorectal cancer, are highly effective but limited in scope. According to the US Preventive Services Task Force, nearly 75% of cancer-related deaths occur in cancers for which no routine screening exists.3 MCED tests are intended to complement existing methods and aid early diagnosis, especially of pancreatic, ovarian, and esophageal cancers. If validated, MCED tests may replace some organ-specific screenings and reduce costs.
The accessibility of these tests is particularly relevant in community oncology settings, where resources may be constrained. By enabling early detection across a wide range of cancers, MCED tests have the potential to reduce the burden of late-stage diagnoses and improve outcomes, especially in underserved populations. However, their integration into routine practice requires overcoming significant logistical and financial barriers.
Machine learning transforms MCED development by uncovering patterns in genomic and proteomic data that traditional methods miss. Supervised learning models, such as random forests, gradient boosting, and support vector machines, are foundational in MCED test development. These models analyze labeled datasets to identify the biomarkers most strongly associated with cancer signals. Gradient boosting, for example, builds an ensemble of decision trees in which the ctDNA mutations and methylation sites that most improve prediction accuracy carry the most weight.4 These models are particularly effective for cancers with well-characterized biomarkers, such as colorectal and lung cancer.
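To make the gradient-boosting idea concrete, the sketch below implements a simplified version from scratch: each round fits a one-split decision tree (a stump) to the negative gradient of the log loss, and each feature's importance is the total error reduction from splits on it. This is a teaching sketch, assuming binary 0/1 labels with both classes present. It omits the per-leaf Newton step and regularization used by production libraries, and the toy features (one "informative methylation site", one noise column) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Least-squares regression stump: best single feature/threshold split of the residuals."""
    mean_all = sum(residuals) / len(residuals)
    base_sse = sum((r - mean_all) ** 2 for r in residuals)
    best = None  # (sse, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:
        raise ValueError("every feature is constant; no split is possible")
    sse, j, t, lv, rv = best
    return (j, t, lv, rv), base_sse - sse  # stump and its error reduction (gain)

def fit_gradient_boosting(X, y, n_rounds=20, learning_rate=0.3):
    """Gradient boosting for 0/1 labels with log loss and depth-1 trees."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1 - p))  # initial score = log-odds of the positive class
    scores = [f0] * len(X)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        # Negative gradient of log loss w.r.t. the score: label minus predicted probability.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        (j, t, lv, rv), gain = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        importance[j] += gain
        scores = [s + learning_rate * (lv if row[j] <= t else rv) for s, row in zip(scores, X)]
    total = sum(importance) or 1.0
    return {"f0": f0, "stumps": stumps, "lr": learning_rate,
            "importance": [g / total for g in importance]}

def predict_proba(model, x):
    score = model["f0"] + sum(model["lr"] * (lv if x[j] <= t else rv)
                              for j, t, lv, rv in model["stumps"])
    return sigmoid(score)

# Toy data: feature 0 mimics an informative methylation site, feature 1 is noise.
X = [[0.1, 0.5], [0.2, 0.1], [0.3, 0.9], [0.8, 0.4], [0.9, 0.2], [0.7, 0.6]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gradient_boosting(X, y)
print(model["importance"])  # [1.0, 0.0]
```

Because feature 0 alone separates the classes, every round splits on it, so it receives all of the importance.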
Deep learning, a subset of machine learning, has revolutionized the analysis of high-dimensional data. Convolutional neural networks and recurrent neural networks are widely used in MCED test development because they can process large volumes of genomic and proteomic data. For instance, the Galleri test employs deep-learning models trained on millions of cell-free DNA (cfDNA) methylation profiles, enabling it to detect more than 50 cancer types with high specificity.5 Deep learning can also help identify faint cancer signals in early-stage malignancies, in which biomarker concentrations are minimal.
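The core building block of a convolutional neural network is a small filter slid along the input, followed by a nonlinearity and pooling. The sketch below shows one such block on a hypothetical track of binarized methylation calls at adjacent CpG sites. In a real CNN the kernel weights are learned from data and many filters are stacked; here one kernel is hand-set so that it fires only on three consecutive methylated sites. This is an illustration of the technique, not the Galleri model.

```python
def conv1d(signal, kernel, bias=0.0):
    """'Valid' 1D convolution (cross-correlation, as in most deep-learning libraries)."""
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, signal[i:i + k])) + bias
            for i in range(len(signal) - k + 1)]

def relu(values):
    return [max(0.0, float(v)) for v in values]

def conv_block(signal, kernel, bias):
    """Convolution -> ReLU -> global max pooling: one feature per kernel."""
    return max(relu(conv1d(signal, kernel, bias)))

# Hypothetical methylation calls (1 = methylated) at adjacent CpG sites.
clustered = [0, 0, 1, 1, 1, 0, 0]
scattered = [1, 0, 1, 0, 1, 0, 1]
print(conv_block(clustered, [1, 1, 1], -2))  # 1.0
print(conv_block(scattered, [1, 1, 1], -2))  # 0.0
```

Both tracks have a similar number of methylated sites, but only the clustered pattern activates the filter, which is the kind of local structure a CNN can learn to detect.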
Reinforcement learning is another promising approach to refining MCED tests. By incorporating real-world feedback, such as confirmed diagnoses and false positives, algorithms can adjust and improve their predictions over time.6 This iterative learning process helps models remain accurate and relevant as new data emerge.
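Full reinforcement learning, in which an agent chooses actions to maximize a reward, does not fit in a short sketch. The underlying idea of updating a model as outcomes are confirmed can instead be illustrated with a simpler, related technique: online learning. Here, a logistic model takes one gradient step each time follow-up resolves a flagged case. The feature vector, starting weights, learning rate, and function names are hypothetical.

```python
import math

def predict(weights, bias, x):
    """Logistic model: probability that a sample carries a cancer signal."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def online_update(weights, bias, x, confirmed_cancer, learning_rate=0.1):
    """One stochastic-gradient step on log loss once follow-up resolves a case.

    confirmed_cancer is 1 if the diagnosis was confirmed, 0 if it was a false positive.
    """
    error = predict(weights, bias, x) - confirmed_cancer  # gradient of log loss w.r.t. z
    new_weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias

# A flagged sample later turns out to be a false positive.
weights, bias = [0.0, 0.0], 0.0
x = [1.0, 0.5]
before = predict(weights, bias, x)
weights, bias = online_update(weights, bias, x, confirmed_cancer=0)
after = predict(weights, bias, x)
print(before, after)  # the predicted probability for this profile decreases
```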
Although ML improves diagnostic performance, its predictions are often opaque to clinicians. Explainable AI (XAI) frameworks, such as SHapley Additive exPlanations (SHAP) and local interpretable model-agnostic explanations (LIME), address this issue by showing how specific features contribute to each prediction.7 By highlighting influential biomarkers, such as particular methylation patterns or protein levels, XAI tools build trust among clinicians and support regulatory approval processes.
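The quantity that SHAP estimates is the Shapley value from cooperative game theory. For a handful of features, it can be computed exactly by enumerating every subset of the other features. The sketch below does this, assuming that a feature absent from a subset is set to a baseline value. The risk-score model and its weights are hypothetical. Libraries such as SHAP use approximations because this exact enumeration grows as 2^n.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction by enumerating every feature subset.

    A feature absent from a coalition is replaced by its baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi += weight * (value(s | {i}) - value(s))  # weighted marginal contribution
        phis.append(phi)
    return phis

# Hypothetical linear risk score over three biomarkers (weights are illustrative only).
def risk_score(z):
    return 2.0 * z[0] + 0.5 * z[1] - 1.0 * z[2]

print(shapley_values(risk_score, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0]))
```

For a linear model, each value equals the weight times the feature's deviation from baseline (approximately [2.0, 1.0, -3.0] here). In every case, the values sum to model(x) minus model(baseline), which is what lets clinicians read them as a breakdown of a single prediction.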
The Galleri test by GRAIL is an MCED test that detects cancer signals based on cfDNA methylation patterns. It has been evaluated in clinical studies, including the SUMMIT study (started in 2018),8 the Circulating Cell-free Genome Atlas study,9 and the PATHFINDER study (2020).1 It reports a specificity of more than 99% and a sensitivity of 75% across multiple cancer types. A UK population screening trial has enrolled 140,000 participants.10 The test has shown promise in identifying hard-to-detect cancers, such as pancreatic and ovarian cancers. However, high cost and lack of insurance coverage limit its adoption.
CancerSEEK was developed at Johns Hopkins University, and it combines ctDNA and protein biomarkers to detect cancers. Early studies showed 99% specificity, with sensitivity varying by cancer type and stage.11 Although the test represents an important step forward, its sensitivity for early-stage cancers remains an area of active research.
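Combining analytes such as ctDNA and proteins tends to raise sensitivity at some cost to specificity. A textbook way to see this is the "either test positive" (OR) rule, assuming the two measurements make independent errors. CancerSEEK's published method is more involved than this rule. The sketch and its example inputs are hypothetical and are shown only to illustrate the trade-off.

```python
def combine_or(sens_a, spec_a, sens_b, spec_b):
    """Sensitivity and specificity of 'positive if either test is positive',
    assuming the two tests' errors are independent."""
    sensitivity = 1 - (1 - sens_a) * (1 - sens_b)  # a cancer is missed only if both miss it
    specificity = spec_a * spec_b                  # a result is negative only if both are negative
    return sensitivity, specificity

# Hypothetical marker panels: one mutation-based, one protein-based.
print(combine_or(0.50, 0.99, 0.60, 0.99))  # roughly (0.80, 0.98)
```

Sensitivity rises from 50% or 60% to about 80%, while specificity slips from 99% to about 98%. That drop in specificity matters a great deal at screening scale.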
The Shield test by Guardant Health focuses on colorectal cancer detection and recently gained FDA approval for use in average-risk adults. Priced at approximately $900, it represents a significant advance in blood-based diagnostics, with a reported sensitivity of 83% and specificity of 90%. However, its single-cancer focus limits its applicability compared with broader MCED tests.12
Exact Sciences, renowned for its Cologuard test for colorectal cancer, has been expanding its capabilities into MCED through collaborations with leading research institutions. By integrating ctDNA analysis with advanced protein biomarker detection, the company is developing tests to precisely identify high-mortality cancers.13
Natera’s Signatera test focuses on personalized detection and monitoring of residual disease. Although it is primarily used for posttreatment monitoring, its technology could contribute significantly to broader MCED applications by refining ctDNA detection and improving sensitivity for low-burden cancers.14
Balancing sensitivity and specificity remains a challenge. Tuning a test for higher sensitivity would allow more cancers to be diagnosed at earlier stages, but it typically lowers specificity and increases false-positive results, which lead to more downstream testing, cost, and patient anxiety. Because every positive result triggers a diagnostic workup, high specificity cannot be compromised.
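The sketch below computes expected true positives, false positives, and PPV for a screened population to show why specificity carries so much weight. The 1% prevalence and both sensitivity/specificity pairs are hypothetical inputs chosen for illustration, not figures for any specific test.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Expected true positives, false positives, and PPV for one screening round."""
    with_cancer = population * prevalence
    without_cancer = population - with_cancer
    true_positives = sensitivity * with_cancer
    false_positives = (1 - specificity) * without_cancer
    ppv = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, ppv

# Hypothetical operating points: a more specific setting vs. a more sensitive one.
for sens, spec in [(0.50, 0.995), (0.60, 0.98)]:
    tp, fp, ppv = screening_outcomes(100_000, 0.01, sens, spec)
    print(f"sensitivity {sens:.0%}, specificity {spec:.1%}: "
          f"{tp:.0f} cancers found, {fp:.0f} false positives, PPV {ppv:.0%}")
```

In this example, raising sensitivity from 50% to 60% finds 100 more cancers among 100,000 people screened. However, the accompanying drop in specificity from 99.5% to 98% quadruples false positives (495 to 1980) and cuts the PPV by more than half (about 50% to 23%).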
The high cost of MCED tests, ranging from $800 to $1200, is another significant barrier to widespread adoption. Most MCED tests are not covered by insurance and must currently be paid out of pocket. Follow-up diagnostic procedures after a positive result often lack coverage as well, adding to the financial burden.15
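Cost-effectiveness arguments often begin with the cost per cancer detected, which combines the test price with the cost of working up every positive result. The sketch below assumes a $1000 test price (within the range above) and reuses the 1.4% positive rate and 38% PPV quoted from PATHFINDER. The $5000 workup cost is a placeholder, not a published figure, and the function name is hypothetical.

```python
def cost_per_cancer_detected(n_screened, test_price, positive_rate, ppv, workup_cost):
    """Total testing plus follow-up cost divided by the number of cancers found."""
    n_positive = n_screened * positive_rate  # people who need a diagnostic workup
    n_cancers = n_positive * ppv             # positives confirmed as cancer
    total_cost = n_screened * test_price + n_positive * workup_cost
    return total_cost / n_cancers

# workup_cost is a hypothetical placeholder.
print(round(cost_per_cancer_detected(10_000, 1_000, 0.014, 0.38, 5_000)))
```

With these inputs, the test price dominates the total. Lowering the price, or raising the positive rate and PPV by testing higher-risk groups, has a larger effect than reducing workup costs.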
Addressing these issues requires coordinated efforts to demonstrate cost-effectiveness and to secure reimbursement for both the tests and downstream diagnostic procedures. Multiomics integration, which combines genomic, proteomic, and metabolomic data, offers a way to improve both sensitivity and specificity.16 Continued optimization of ML algorithms to handle these combined datasets is key to achieving this balance.
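One common, simple multiomics strategy is early integration. In this approach, each omics block is standardized separately, and the blocks are concatenated into a single feature vector per sample before model training. Other strategies train a model per block and combine the outputs. The sketch below assumes samples-by-features blocks in the same sample order; the block contents are hypothetical.

```python
from statistics import mean, pstdev

def zscore_columns(block):
    """Standardize each column of a samples-by-features block to mean 0, SD 1."""
    columns = list(zip(*block))
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]  # SD of 0 -> 1.0 for constant columns
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in block]

def early_fusion(*blocks):
    """Concatenate standardized omics blocks sample by sample into one feature vector."""
    standardized = [zscore_columns(b) for b in blocks]
    return [sum(rows, []) for rows in zip(*standardized)]

# Hypothetical data for two samples: methylation fractions and protein levels.
methylation = [[0.1], [0.9]]
proteins = [[10.0, 1.0], [30.0, 1.0]]
print(early_fusion(methylation, proteins))
```

Standardizing each block first keeps features measured on large scales, such as protein concentrations, from dominating features measured as fractions.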
Establishing clear guidelines for the clinical utility and approval of MCED tests is essential before widespread clinical use begins.17
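Future advances in MCED testing will hinge on several critical areas of research and development. One promising avenue is multimodal data integration, which combines liquid biopsy results with imaging and electronic health records to create a more holistic view of patient health. Natural language processing (NLP) can extract relevant information from clinical records, and federated learning allows models to be trained across institutions without sharing patient data. Together, these approaches can refine accuracy while protecting privacy.18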
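In federated learning, each site trains a model locally and shares only the model parameters. A coordinating server then merges those parameters. The sketch below shows just that server-side merge, using the federated averaging (FedAvg) rule, which weights each site's parameters by its number of training samples. Local training is omitted, and the site parameters and sample counts are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """FedAvg aggregation: average per-site parameter vectors, weighted by sample count.

    Only parameter vectors leave each site; patient-level data stay local.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(n_params)]

# Two hypothetical sites with locally trained parameter vectors.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [100, 300]))  # [2.5, 3.5]
```

The larger site pulls the shared model toward its parameters, which is one reason the text above stresses diverse participating populations.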
Collaboration is essential to expand clinical trial diversity and inclusivity. Incorporating real-world data from underrepresented populations will help ensure that MCED tests are effective across all demographic groups. Additionally, strengthening partnerships between MCED developers and community oncology practices can address logistical challenges and ensure seamless integration into existing workflows.
MCED tests have the potential to transform oncologic care. Combining advanced molecular biology with machine learning could address long-standing gaps in cancer diagnostics. By enabling earlier and more comprehensive cancer detection, these tests may significantly improve patient outcomes and reduce the global cancer burden. However, achieving this promise requires overcoming challenges related to cost, insurance coverage, and regulatory approval. In addition, no head-to-head comparisons of the different products exist. Through sustained research, collaboration, and a commitment to equitable access, MCED tests can fulfill their potential to revolutionize cancer care for patients worldwide.
Binu Malhotra, MD, is a medical oncologist at The Cancer & Hematology Centers in Flint, Michigan. Shivansh Sahni is a research intern at Oakland University in Rochester, Michigan, conducting research in computational biology.