September 1, 2016

Can Online Searches Help Identify Pancreatic Cancer at an Earlier Stage?

Researchers analyzed Bing.com search logs to identify those recently diagnosed with pancreatic cancer. Through symptom patterns expressed as searches, they were able to identify 5% to 15% of pancreatic cancer cases.

What can online searches reveal about a potential cancer diagnosis? Scientists at Microsoft are interested in finding out, and as they analyzed the data of millions of people, lead researcher Eric Horvitz had two friends at the top of his mind.

Horvitz was catching up with his friend Ronald (Ronnie) Nadel on Dec. 21, 2004, when Nadel mentioned some odd symptoms he was experiencing. Horvitz, a doctor himself, told his friend to see a physician about the symptoms. Less than 1 year later, Nadel passed away from pancreatic cancer. The following year, Horvitz lost another friend, Richard Newton, to pancreatic cancer just a few months after he was diagnosed.

“I had first experienced the challenges of diagnosis and treatment of pancreatic cancer as a medical student at Stanford University,” explained Horvitz, a technical fellow and the managing director of the Microsoft Research lab, in a recent interview. “However, the challenges of catching this devastating illness early hit home with two friends.”

Less than a decade later, Horvitz started a study—along with colleagues Ryen White, an information retrieval expert at Microsoft Research, and John Paparrizos, a graduate student at Columbia University and an intern at Microsoft at the time—aimed at identifying pancreatic cancer earlier.

Their study first analyzed the Bing.com search logs of 9.2 million individuals to identify those recently diagnosed with pancreatic cancer. The research team then looked at data from the previous 18 months to find symptom patterns, expressed as searches, and was able to identify 5% to 15% of pancreatic cancer cases.

“[My friend’s death] was a big motivation, I think, to take this disease head-on and see if we could make a dent in it,” Horvitz said. In this interview, Horvitz discusses highlights of his study, what motivates him, and what’s ahead.

Why did you want to conduct this study in pancreatic cancer?

I've done quite a bit of work with my colleague here at Microsoft Research, Ryen White. White has been a powerhouse in multiple topics in information retrieval. I’m interested in artificial intelligence and leveraging large-scale data. My interest and intuitions about challenges in healthcare go back to my experiences doing an MD and PhD.

We're curious about things like web use among people who become ill, how the web works for diagnoses—does it worry people? Does it help them? Once someone is diagnosed with a challenging illness like breast cancer, how well does web search support them in episodes of treatment, recovery, and recurrence, if that happens?


As part of all this work, we have to discriminate between experiential queries and exploratory queries. With experiential queries, there is strong evidence that someone has just been diagnosed with an illness. We looked at these types of queries and saw evidence of users pursuing help with recent diagnoses. One direction I’ve been interested in is detecting and helping people with devastating diseases. The ones that come to mind here are lung cancer and pancreatic cancer. In medical school, I learned that by the time a lung cancer case reached the hospital, it was almost always too late for surgery, and for pancreatic cancer that remains the same today.

I was talking with one of my closest friends on the phone—he saw me discussing artificial intelligence with Charlie Rose on his television show—and we were chatting about that. He mentioned as an aside that he had odd symptoms that were bothering him a bit. I asked him a few questions and told him that I didn't want to alarm him, but that he should get himself checked out. Within a few weeks, he was diagnosed with pancreatic cancer and he wasn't even 45 yet at the time.

With an illness like pancreatic cancer, especially if it shows up before it metastasizes, it’s through some nonspecific symptoms—strange back pain, abdominal pain, light-colored stools, and general itchiness, for example. Each symptom taken separately may not alarm someone enough to run to a doctor, and physicians may not react with deep concern. We thought, though, if we had evidence from thousands of patients who were diagnosed, and we could go back in time 18 months from when they were diagnosed, we might be able to see information in the order and accrual of symptoms as reported by people on the web. We thought that we may be able to use subtle clues over time to make inferences.

The answer was yes, the web can show us clues and patterns. I was surprised by how well we could discriminate searches on the web between those who were diagnosed or not, though there is still a false positive rate to deal with. It was a feasibility study and we labeled it as such, but it shows us something about the power, possibilities, and methods in this area.

Your personal connection to pancreatic cancer—how big of a motivation was it to do this study?

It was at the forefront of my mind. What I found stunning was how quickly the disease progressed from diagnosis to death. I remember talking to my friend's surgeon and he said, ‘We opened him up and looked, and he's a very unfortunate young man.’

It was a big motivation, I think, to take this disease head-on and see if we could make a dent in it. I sent a copy of the research article to his wife and his sister, both of whom I've been in touch with, and I said it was dedicated to him, to my friend whom I met in second grade.

In the study, the identities of individuals were kept anonymous. Why was this important to you and how did you do it?

We do a lot of work with anonymized logs of user data. Companies like Google, Microsoft and Apple have access to user data and abide by strict policies to keep it safe.

Our research labs have access to this anonymized data. There's no naming information—just a random identifier assigned to logs.

Our findings led to questions about how this technology might one day be fielded. We have many ideas about that. We could enable an opt-in system, so people could ask to have access to a health and wellness suite of applications. If someone did that, they'd give explicit permission to be monitored and alerted. Another compelling use would be to build classifiers or automatic systems that would then be deployed in the privacy of one's own laptop or smartphone, sharing nothing with anybody but drawing on the intelligence accrued from studies of large-scale populations of searchers.

What are some of the next steps? What do you hope to achieve going forward?

There are a couple of steps. First, we have to think about ways to deploy the technology. We also have to think about other challenging screening problems and opportunities to do screening when it would really help.

We're interested in lung cancer, in particular. It's not just about finding new ways to identify people earlier and get them to treatment earlier; it's also about finding new kinds of symptoms and demographics and other kinds of observations to extend clinical medicine. Wouldn't it be nice if the results from our studies gave clinical care new avenues for screening that wouldn't even require search engines or search logs? In a way, we would be directing screening policies.

Another direction I'm very excited about is taking this whole paradigm to the next step. Ryen White and I have been talking about working with oncologists to do a patient-centric study. With patient approval in an actual research/review setting, we can get volunteers who have just been diagnosed to fill out a form that would tell us who they are and reveal to us their web search logs. We could then look 18 months back, along with their electronic health record, and link it to how someone has been using the internet to search for information. This is in the works; we've had conversations with clinical colleagues.

What are the cost implications of this work?

It costs money to get people to come to a screening, and it costs money to actually screen individuals. The idea of web searches working as a background observer would lower the cost of screening and raise compliance. The concern, of course, is false positives. You're casting a wider net, and even with a low false positive rate, you'd potentially end up with many people who are being told to get something checked out when it's really nothing.
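The concern about false positives comes down to simple arithmetic, which can be sketched as below. Every number here (population size, prevalence, detection rate, and false positive rate) is a hypothetical value chosen only for illustration, not a figure from the study, and the function name `screening_outcomes` is likewise an assumption.

```python
def screening_outcomes(population, prevalence, sensitivity, false_positive_rate):
    """Estimate how many people a screen flags correctly and incorrectly.

    All inputs are illustrative assumptions, not results from the study.
    """
    cases = population * prevalence              # people who have the disease
    non_cases = population - cases               # people who do not
    true_positives = cases * sensitivity         # cases the screen catches
    false_positives = non_cases * false_positive_rate  # healthy people flagged
    flagged = true_positives + false_positives
    # Share of flagged people who actually have the disease
    ppv = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, ppv


# Hypothetical scenario: 1 million searchers, 1 in 10,000 has the disease,
# the screen catches 10% of cases and wrongly flags 0.1% of healthy people.
tp, fp, ppv = screening_outcomes(
    population=1_000_000,
    prevalence=0.0001,
    sensitivity=0.10,
    false_positive_rate=0.001,
)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}, PPV: {ppv:.2%}")
```

Under these assumed inputs, roughly 10 real cases are flagged alongside about 1,000 healthy people, so fewer than 1 in 100 alerts would point to an actual case. For a rare disease, even a small false positive rate can swamp the true detections, which is why the rate has to be kept very low.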

What were the limitations of the study?

A key limitation is the lack of ground truth about patients’ diagnoses. We're excited about the possibility of collaborating with clinical colleagues and working with electronic health records. We have evidence that the experiential diagnoses we see in logs are actual diagnoses, but we'd love to have ground truth and to consider the timing, background, and other details in the clinical record.

Your own motivation behind wanting to do this research is understandable. As a corporation, what is Microsoft's reason for supporting this research?

Microsoft Research is a leading computer science research and development laboratory. We're charged with looking to the future and pushing the frontiers of computer science. That's our mission and this research is part of that work.

As we develop ideas for research, we always work with the company and product teams to ask where we can go with Microsoft products and services.

For example, we'd love to have Bing search be the place for top-notch, reliable healthcare information. This work can be viewed as part of making our services better for people in the future.

How would you address worries about hacking and data loss, with regard to both electronic health records and search data?

For us, we don't deal with any identity information. In general, though, Microsoft is very serious about information security. Microsoft Azure Cloud Services are HIPAA-compliant.

There's other work going on in another group right now about how to do data analysis and machine learning with actual medical records and encrypted data.

I think we'll see many solutions coming out of our research labs that reduce the chances a patient will lose healthcare data. Data security and healthcare cybersecurity are top challenges for the whole industry.
