Language on Social Media Reveals Concerns of Patients with Ovarian Cancer
Creating a way to analyze the postings of patients and caregivers seeking peer support can give doctors and researchers insights about how to help them.
The language used on social media can reveal important clues about the perspectives, values, and needs of patients and caregivers affected by ovarian cancer, and a recent analysis of such data should be the first of many, according to a research team that presented its results at the ONS 44th Annual Congress.
In the first study of its kind, researchers from the University of Pittsburgh and the University of British Columbia used machine learning to analyze language on social media as a way to understand the concerns of this population. The goal is to develop better interventions for patients and caregivers and to focus research on their greatest needs. The approach aims to supplement survey questionnaires and interviews as ways to gather this information, said lead author Young Ji Lee, PhD, MS, RN, assistant professor in the School of Nursing at the University of Pittsburgh School of Medicine. She called the method especially relevant at a time when patient-generated health information is increasingly informing care.
The researchers analyzed the initial postings of nearly 855 patients and caregivers who commented in the Cancer Survivors Network online peer-support forum between 2006 and 2016. They applied machine learning, using simple natural language processing techniques, to build a computational model that assigned each posting to 1 or more of 12 categories of need. The categories, identified through a review of the existing literature, were physical, psychological/emotional, family-related, social, interpersonal/intimacy, practical, daily living, spiritual/existential, health information, patient-clinician communication, cognitive, and miscellaneous.
The model used bag-of-words (BOW) features, considering each word in a posting for its potential in classifying needs. The researchers identified important features for each need category using mathematical analysis and performance metrics.
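The bag-of-words idea described above can be illustrated with a minimal sketch in Python. The report does not name the classifier the researchers used, so the one-vs-rest multinomial naive Bayes below is an assumption chosen for simplicity, and the six toy postings, their labels, and all function names are hypothetical rather than taken from the study.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase a posting and count each word; word order is discarded."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


class BinaryNaiveBayes:
    """Multinomial naive Bayes for one yes/no question:
    does a posting express this need? (Assumed classifier, not the study's.)"""

    def fit(self, bags, labels):
        self.vocab = set().union(*bags)
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for bag, label in zip(bags, labels):
            self.counts[label].update(bag)
            self.n_docs[label] += 1
        return self

    def log_score(self, bag, label):
        total_docs = self.n_docs[True] + self.n_docs[False]
        # Smoothed class prior, then smoothed per-word likelihoods.
        score = math.log((self.n_docs[label] + 1) / (total_docs + 2))
        total_words = sum(self.counts[label].values())
        for word, n in bag.items():
            if word in self.vocab:  # ignore words never seen in training
                p = (self.counts[label][word] + 1) / (total_words + len(self.vocab))
                score += n * math.log(p)
        return score

    def predict(self, bag):
        return self.log_score(bag, True) > self.log_score(bag, False)


def train_one_vs_rest(posts, label_sets, categories):
    """Train one independent yes/no model per category, so a posting
    can be assigned to several categories at once."""
    bags = [bag_of_words(p) for p in posts]
    return {
        c: BinaryNaiveBayes().fit(bags, [c in labels for labels in label_sets])
        for c in categories
    }


def predict_needs(models, text):
    bag = bag_of_words(text)
    return {c for c, model in models.items() if model.predict(bag)}


# Hypothetical toy postings and labels, invented for illustration only.
toy_posts = [
    "what does my scan result mean",
    "which chemo side effects should i expect",
    "i feel anxiety and fear every night",
    "so much fear and anger since diagnosis",
    "anyone else here want to talk",
    "what chemo did you have anyone want to talk",
]
toy_labels = [
    {"health information"},
    {"health information"},
    {"psychological/emotional"},
    {"psychological/emotional"},
    {"social"},
    {"health information", "social"},
]
categories = ["health information", "psychological/emotional", "social"]
models = train_one_vs_rest(toy_posts, toy_labels, categories)

print(sorted(predict_needs(models, "fear and anxiety")))
# → ['psychological/emotional']
print(sorted(predict_needs(models, "anyone want to talk about chemo")))
# → ['health information', 'social']
```

Training a separate yes/no model for each category is one simple way to let a single posting carry more than one need, which matches the "1 or more of 12 categories" setup described in the study.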
They found that the most frequently occurring needs across postings were health information (n = 456), social (n = 307), psychological/emotional (n = 141), and physical (n = 109). These same four categories were the ones the model identified most accurately.
Less frequently occurring categories were miscellaneous (n = 74), family-related (n = 53), practical (n = 35), patient-clinician communication (n = 19), interpersonal/intimacy (n = 14), spiritual/existential (n = 10), daily living (n = 5), and cognitive (n = 4). Among these rarer categories, Lee singled out one: “We need to develop strategies that accurately predict spiritual needs,” she said.
Of all the postings, 38% described multiple needs, and of those, 40% described social and informational needs together.
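Read literally, these two figures imply that roughly 15% of all postings combined social and informational needs (0.38 × 0.40 ≈ 0.15). As a rough illustration of how such shares can be computed once every posting carries a set of labels, here is a small Python sketch. The function name and the five toy label sets are hypothetical and do not come from the study's data.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_summary(label_sets):
    """Return the share of postings with more than one need and, among
    those, the share that contains each pair of needs."""
    total = len(label_sets)
    if total == 0:
        raise ValueError("at least one posting is required")
    multi = [labels for labels in label_sets if len(labels) > 1]
    pair_counts = Counter()
    for labels in multi:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(labels), 2):
            pair_counts[pair] += 1
    return {
        "share_multi": len(multi) / total,
        "pair_share_of_multi": {
            pair: n / len(multi) for pair, n in pair_counts.items()
        },
    }


# Hypothetical label sets for five postings, invented for illustration.
toy_label_sets = [
    {"health information", "social"},
    {"social"},
    {"physical", "health information"},
    {"health information"},
    {"psychological/emotional"},
]
summary = co_occurrence_summary(toy_label_sets)
print(summary["share_multi"])  # → 0.4
print(summary["pair_share_of_multi"][("health information", "social")])  # → 0.5
```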
Words describing psychological states, such as “anger” and “anxiety,” were important features for classifying psychological/emotional and social needs, while medical terms, such as “endoscopy” and “colonoscopy,” predicted that a post would focus on physical and informational needs.
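The report does not specify which mathematical analysis the researchers used to pick out important words. One common option, shown here purely as an assumption, is to rank words by a smoothed log-odds ratio that compares how often a word appears in postings with a given need against postings without it. The function names and the four toy postings below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def top_features(posts, has_need, k=3):
    """Rank words by smoothed log-odds of appearing in postings that
    express a need versus postings that do not."""
    pos_df, neg_df = Counter(), Counter()
    n_pos = n_neg = 0
    for text, flag in zip(posts, has_need):
        words = set(tokenize(text))  # count each word once per posting
        if flag:
            pos_df.update(words)
            n_pos += 1
        else:
            neg_df.update(words)
            n_neg += 1
    scores = {}
    for word in set(pos_df) | set(neg_df):
        # Adding 0.5 / 1 keeps both proportions strictly between 0 and 1.
        p = (pos_df[word] + 0.5) / (n_pos + 1)
        q = (neg_df[word] + 0.5) / (n_neg + 1)
        scores[word] = math.log(p / (1 - p)) - math.log(q / (1 - q))
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical toy postings, invented for illustration only.
toy_posts = [
    "the anxiety before my scan feels awful",
    "anger then more anxiety all night",
    "my endoscopy got moved to monday",
    "is an endoscopy or colonoscopy needed first",
]
is_emotional = [True, True, False, False]
is_physical = [not flag for flag in is_emotional]

print(top_features(toy_posts, is_emotional, k=1))  # → ['anxiety']
print(top_features(toy_posts, is_physical, k=1))   # → ['endoscopy']
```

On a corpus this small, common function words can score as highly as meaningful terms, so a real analysis would likely need far more data and some filtering of uninformative words.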
The authors concluded that even simple word-analysis programs can detect patient and caregiver needs with a high degree of accuracy, and that the same approach can predict multiple needs at once. That makes this kind of analysis a valuable way for clinicians to understand patients, they said.
“Our results suggest the potential of using multiple language features and classification methods to develop a more sophisticated model,” they stated. “Our future work involves exploring other language features (e.g., groups of words clustered by using topic modeling techniques, taxonomies, etc.).”
Lee YJ, Jang H, Campbell G, et al. Identifying language features associated with needs of ovarian cancer patients and caregivers using social media. Presented at: Oncology Nursing Society 44th Annual Congress; Anaheim, California; April 11-14, 2019. Abstract 5674.