Using AI and EHR for Early Pancreatic Cancer Detection

Published Date: 09/01/2025

Dr. Shounak Majumder, a gastroenterologist at Mayo Clinic, discusses the significance of recent studies using machine learning and natural language processing to improve the identification of pancreatic cancer risk factors through electronic health records. 

Pancreatic cancer (PC) is the third-leading cause of cancer deaths in the United States, with a grim five-year survival rate of just 13%.

According to Dr. Shounak Majumder, a gastroenterologist and pancreatologist at Mayo Clinic, this low survival rate is largely because most patients are diagnosed at an advanced stage, when the cancer has spread either locally or to distant organs.

Dr. Majumder directs the High-Risk Pancreas Clinic at Mayo Clinic in Rochester, Minnesota, which screens individuals with familial and genetic risk factors for PC.



Detecting PC at an early, asymptomatic stage can significantly improve survival rates.

However, there is currently no population-based screening strategy for this disease.

Dr. Majumder and his colleagues are actively engaged in research to address these challenges.

One of their key areas of focus is the use of natural language processing (NLP) for the automated extraction of PC risk factors from unstructured clinical notes in electronic health records (EHRs).

The results of their study were published in Pancreatology in 2024.



In another study, published in the American Journal of Gastroenterology in 2024, Dr. Majumder and his team conducted a systematic review of 30 studies to explore machine learning (ML) methods for predicting PC risk and identifying novel risk factors from EHR data.



The Importance of This Research


Currently, PC screening is recommended for individuals with a strong family history of the disease or germline variants in PC susceptibility genes.

However, identifying these high-risk individuals using EHR data and connecting them to appropriate screening programs requires significant time and expertise, which are not widely available.

Additionally, 80% to 85% of PC cases are sporadic, occurring in individuals without known familial or genetic risk factors.

This poses a significant barrier to the effectiveness of risk-based PC screening.

Therefore, there is a critical need to automate the identification of individuals with familial and genetic risk of PC and to uncover novel risk factors for sporadic cases.



AI and ML applications are transforming health data summarization and visualization capabilities.

This shift presents an opportunity to leverage advances in AI and ML to develop EHR-based applications that can accurately identify both known and novel risk factors for PC.



Significance of Recent Findings


In their recent publications, Dr. Majumder and his team developed NLP algorithms that can identify familial and genetic risk factors for PC from unstructured clinical notes within the EHR.

They found that rule-based NLP algorithms are highly sensitive for automated identification of PC risk factors.

This is a significant first step toward the automated detection of high-risk patient populations that could benefit from risk-based PC screening.
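To make the idea of rule-based extraction concrete, here is a minimal Python sketch. It is not the published algorithm: the patterns, the gene list, the negation handling, and the function names are all illustrative assumptions, and a real system would need far richer rules and clinical validation. The sketch flags mentions of family history and germline variants in free-text notes, then computes sensitivity against hand-labeled notes.

```python
import re

# Hypothetical rules for illustration only; NOT the rules used in the study.
FAMILY_TERMS = (
    r"(?:mother|father|sister|brother|parent|sibling|aunt|uncle|"
    r"grandmother|grandfather|family history)"
)
FAMILY_HISTORY = re.compile(
    FAMILY_TERMS + r"[^.]{0,60}?pancreatic (?:cancer|adenocarcinoma)",
    re.IGNORECASE,
)
# Assumed example gene list; the article does not name specific genes.
GERMLINE_VARIANT = re.compile(
    r"\b(?:BRCA1|BRCA2|PALB2|ATM|CDKN2A|STK11)\b[^.]{0,40}?"
    r"(?:mutation|variant|carrier)",
    re.IGNORECASE,
)
NEGATION = re.compile(r"\b(?:no|denies|negative for|without)\b", re.IGNORECASE)


def extract_risk_factors(note: str) -> set:
    """Return the set of risk-factor labels matched in a clinical note."""
    found = set()
    # Split into sentences so a negation only affects its own sentence.
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        if NEGATION.search(sentence):
            continue  # crude assumption: skip any sentence containing a negation cue
        if FAMILY_HISTORY.search(sentence):
            found.add("family_history")
        if GERMLINE_VARIANT.search(sentence):
            found.add("germline_variant")
    return found


def sensitivity(predicted, truth) -> float:
    """Sensitivity = true positives / (true positives + false negatives)."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    return tp / (tp + fn) if (tp + fn) else float("nan")


# Made-up example notes, not patient data.
notes = [
    "Mother diagnosed with pancreatic cancer at age 62.",
    "Patient denies family history of pancreatic cancer.",
    "Known BRCA2 pathogenic variant carrier.",
    "No family history of pancreatic cancer. Father had colon cancer.",
]
flags = [extract_risk_factors(n) for n in notes]
```

Skipping every sentence that contains a negation cue keeps the sketch short, but it would also discard true mentions such as "no symptoms; mother had pancreatic cancer". That is one reason such rules need to be checked against manually reviewed notes before use.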



In their systematic review, the team found that several groups have aimed to develop ML models using EHR data to predict PC risk, with varying degrees of success.

Most studies have relied on a curated set of known predictors to develop their models, rather than using unbiased approaches that combine structured and unstructured EHR data.

Additionally, issues such as underreported missing data and underutilized explainable-AI techniques need to be addressed.

Based on their findings, the team has summarized best practices and recommendations for future studies focusing on EHR-based AI-ML model development for PC.
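As one illustration of what reporting missing data can look like in practice, the following Python sketch summarizes the fraction of records in which each variable is missing. It is a generic example, not a procedure taken from the review: the function name, field names, and values are hypothetical.

```python
from typing import Dict, List, Optional


def missingness_report(records: List[Dict[str, Optional[float]]]) -> Dict[str, float]:
    """Return, per field, the fraction of records where it is None or absent."""
    total = len(records)
    if total == 0:
        return {}
    # Collect every field seen in any record so absent keys count as missing.
    fields = sorted({key for record in records for key in record})
    return {
        field: sum(1 for record in records if record.get(field) is None) / total
        for field in fields
    }


# Hypothetical toy cohort; the field names and values are made up.
cohort = [
    {"age": 64, "hba1c": 7.1, "bmi": None},
    {"age": 58, "hba1c": None, "bmi": 31.2},
    {"age": 71, "hba1c": None},
    {"age": 49, "hba1c": 5.6, "bmi": 24.8},
]
report = missingness_report(cohort)
```

Publishing a per-variable summary like this alongside a model lets readers judge how much of each predictor was actually observed rather than imputed.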



Future Research Directions


The performance of rule-based NLP algorithms for identifying familial and genetic risk of PC can be further improved by incorporating emerging tools like large language models and validating them in real-world primary care cohorts.

Dr. Majumder and his team are currently exploring pathways to clinical implementation of this digital risk phenotyping tool, aiming to understand its impact on both patient and healthcare professional outcomes.



Additional research will need to focus on developing EHR-based AI-ML models for identifying novel risk factors for sporadic PC within diverse real-world population cohorts, leveraging longitudinal data.

While the focus is on creating the most accurate AI-ML models for estimating PC risk, it will be equally important to minimize the risk of inaccurate or biased estimates that lack explainability.



Conclusion


AI and ML show promise for transforming pancreatic cancer risk assessment and early detection.

By leveraging EHR data and advanced computational techniques, researchers can develop more accurate and efficient tools to identify high-risk individuals and novel risk factors, ultimately improving patient outcomes and survival rates. 

Frequently Asked Questions (FAQs):

Q: Why is early detection of pancreatic cancer important?

A: Early detection of pancreatic cancer is crucial because it significantly improves the chances of survival. Most cases are diagnosed at an advanced stage, leading to a low five-year survival rate of just 13%.


Q: What are the current challenges in pancreatic cancer screening?

A: The main challenges include the lack of a population-based screening strategy, the need for time and expertise to identify high-risk individuals using EHR data, and the fact that 80% to 85% of cases are sporadic and lack known familial or genetic risk factors.


Q: How are AI and ML being used to address these challenges?

A: Researchers are developing NLP algorithms that extract risk factors from unstructured clinical notes in EHRs, and ML models that predict pancreatic cancer risk from EHR data. Most models to date rely on a curated set of known predictors rather than combining structured and unstructured data.


Q: What are the key findings from recent studies in this field?

A: Recent studies have shown that rule-based NLP algorithms are highly sensitive for identifying PC risk factors from EHRs. However, there is a need to address issues such as underreported missing data and underutilized explainable-AI techniques.


Q: What are the future research directions in this area?

A: Future research will focus on improving NLP algorithms, developing EHR-based AI-ML models for identifying novel risk factors in diverse populations, and ensuring the accuracy and explainability of these models. 
