This blog post is made available by the American Journal of Bioethics. You can read this editorial and accompanying articles here.
by John Torous, MD, MBI; Lyle Ungar, PhD; and Ian Barnett, PhD
Social media have allowed the scope and scale of biomedical research studies to expand greatly in the last decade. Now able to reach nearly entire populations via platforms like Facebook, researchers can explore new questions and recruit thousands of participants in a matter of minutes. Social media can also help address the challenge of longitudinal retention in research by enabling researchers to identify participants years later for follow-up, as outlined in the article by Bhatia-Lin and colleagues in this issue. Bhatia-Lin and colleagues also discuss ethical and regulatory concerns with using social media platforms to locate and track research participants, and offer an innovative rubric to guide the ethical use of social media in biomedical research. We agree with their approach and here suggest expanding it to include other digital data, augmenting it with data science, and operationalizing it using computer science.
While social media are a key tool researchers can use to longitudinally identify participants, they are no longer the only one. Using digital phenotyping methods, in which researchers install an app on participants' smartphones, it is now possible to track individuals with finer geolocation and temporal resolution than any social media platform offers. For example, an ongoing research study has used an app to track a cohort with bipolar disorder for nearly 2 years at the time of this publication. This raises novel issues about the ability to track people not only today and in the future but also in the past, as it is feasible to recreate exactly where those participating in that bipolar disorder study have been for years. With Facebook using location-based ads, such data are already being collected on users, even if they are unaware of the extent of this tracking and of the historical map it has created. With law enforcement already using data from wearable fitness sensors to charge individuals with murder, the use and protection of collected data are no longer a theoretical debate but a topic necessitating standards and safeguards. As social media become more mobile (with some estimates that 80% of social media time is now spent on mobile devices), protections for social media research will have to consider the medium on which it is delivered.
As social media and smartphone sensors generate vast amounts of data, the proper use of these data is becoming as important as the improper uses frequently in the press today. By proper use we mean pairing the right data with appropriate statistical methods to answer the right clinical question. For example, recent research by Facebook to prevent suicide uses linear regression combined with its DeepText algorithm, which applies recurrent and convolutional neural networks to glean textual meaning from Facebook posts, in order to assign, via a random forest classifier, a similarity score to past posts that had been identified as expressing suicidal thoughts. This approach assumes some consensus across the population in how suicidal thoughts are expressed, and does not specifically leverage a person's own post and comment history when making a determination of suicidal intent. A better approach might use both population-level suicidal associations, as Facebook currently does, and prior personal history. Considering the core ethical principles of beneficence, in maximizing the benefits of the research, and justice, in ensuring those benefits reach potential research participants, it becomes clear that the wrong analysis of the right data may cause harm or at least minimize the benefits of the project. As social media and their concomitant sensor data become easier to access, researchers and institutional review boards (IRBs) must consider whether the data science plan supports or hinders the outcomes, and thus the ethics, of the proposed research.
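To make the suggestion above concrete, the idea of blending a population-level risk signal with a person's own history can be sketched as a simple weighted score. Everything here is an illustrative assumption — the feature choices, function names, and weights are hypothetical stand-ins, not Facebook's actual system or any validated clinical instrument.

```python
# Hypothetical sketch: blend a population-level risk signal with a
# personalized signal derived from the user's own posting history.
# All names, features, and weights are illustrative assumptions.

def population_risk(post, risk_phrases):
    """Fraction of known population-level risk phrases present in the post."""
    words = set(post.lower().split())
    return len(words & risk_phrases) / max(len(risk_phrases), 1)

def personal_deviation(post, history):
    """How far the post's length deviates from the user's historical norm
    (a crude stand-in for richer personalized features)."""
    if not history:
        return 0.0
    avg_len = sum(len(h.split()) for h in history) / len(history)
    return abs(len(post.split()) - avg_len) / max(avg_len, 1.0)

def combined_score(post, history, risk_phrases, w_pop=0.7, w_pers=0.3):
    """Weighted blend of population-level and personalized signals."""
    return (w_pop * population_risk(post, risk_phrases)
            + w_pers * min(personal_deviation(post, history), 1.0))
```

The design point is simply that the personalized term can flag a post that is unusual *for this person* even when it matches no population-level pattern, which is the gap the editorial identifies in a purely population-level classifier.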
Beyond this example of Facebook and suicide prevention, there are numerous other considerations around big data and appropriate methods. Important considerations for studies include whether multiple comparisons are controlled for, whether the study is powered to find significant effects, whether those effects are clinically meaningful, and whether the methods are reproducible. Stating that the research team employed “machine learning” or “artificial intelligence” without specific and auditable details is currently contributing to what was labeled a “replication crisis” at the 2019 Annual Meeting of the American Association for the Advancement of Science. Thankfully, this can in part be avoided by greater transparency in methods and sharing of code. One approach that aligns with this goal is automated machine learning (autoML), which takes multiple aspects of machine learning that are usually subjective, such as data preprocessing, feature extraction, method selection, and tuning-parameter selection, and automates them to improve both reproducibility and performance. Tools like Jupyter notebooks permit the online sharing of blocks of computer code that researchers can easily access and link with publications, so that both the paper and the method are available together.
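As one concrete example of the multiple-comparisons control mentioned above, the Benjamini–Hochberg procedure is a standard way to limit the false discovery rate when a study tests many hypotheses at once. The sketch below is a minimal self-contained implementation for illustration; in practice a researcher would typically use an established statistics library rather than hand-rolled code.

```python
# Minimal sketch of the Benjamini–Hochberg false-discovery-rate procedure,
# one standard correction for multiple comparisons.

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean reject/retain decision for each p-value,
    controlling the false discovery rate at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k/m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject all hypotheses at or below that rank.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            reject[idx] = True
    return reject
```

Sharing exactly this kind of auditable snippet alongside a publication — for example, in a Jupyter notebook — is what lets reviewers verify that the correction was actually applied.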
Using social media for retention in research, even if implemented using the rubric Bhatia-Lin and colleagues offer, raises implementation challenges in ensuring that observation does not become intervention or bias. Here, technology can offer solutions by automating follow-up procedures. Researcher objectivity may be impaired by exposure to intimate or superfluous details about a participant. For example, if a researcher views a participant’s Facebook profile to confirm that person’s identity based on his or her profile picture, the researcher will also view posts on the participant’s wall. This exposure can contribute to assessor bias and compromise the researcher’s objectivity. Subjective evaluations of participant information likewise allow for assessor bias. For example, even if a researcher confirms a participant’s identity from publicly available profile pictures without viewing other unrelated posts or information, the additional content and context within those pictures may still impair the researcher’s objectivity.
Automated approaches for data retrieval and analysis can alleviate these concerns. Information contained on a participant’s Facebook profile can be obtained through an API maintained by Facebook, which extracts only the data necessary for the researcher rather than requiring the researcher to view the entire profile. This approach provides an alternative to the researcher viewing the entirety of a participant’s profile, and by requiring the researcher to specify the necessary information fields of interest (e.g., mobile phone number and home address) a priori, information access becomes easier to regulate. In addition to limiting assessor bias, automated data retrieval and review have the added benefit of saving researcher time and resources that would otherwise be required for subjective review.
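The a priori field restriction described above can be enforced in code: the approved fields are fixed in the study protocol, and any request outside that set fails before it reaches the platform. The sketch below is hypothetical — the endpoint path, field names, and approved set are illustrative assumptions, not a real platform API.

```python
# Hypothetical sketch: build an API request limited to fields approved
# a priori (e.g., in the IRB protocol). Endpoint and field names are
# illustrative assumptions, not a real platform's API.

APPROVED_FIELDS = {"mobile_phone", "home_address"}  # fixed in the protocol

def build_request(participant_id, requested_fields):
    """Build a request containing only pre-approved fields; refuse any
    request that strays outside the approved set."""
    disallowed = set(requested_fields) - APPROVED_FIELDS
    if disallowed:
        raise ValueError("fields not approved a priori: %s" % sorted(disallowed))
    return {
        "endpoint": "/participants/%s" % participant_id,  # illustrative path
        "params": {"fields": ",".join(sorted(requested_fields))},
    }
```

Because the approved set is declared before data collection begins, an IRB can audit exactly what the researcher could ever retrieve, and the researcher is never exposed to wall posts or other unrelated content.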
The potential benefits that social media now offer biomedical research can no longer be denied. But those benefits must be weighed against the risks, which are becoming more apparent with increasing privacy breaches, secondary and unintended uses of nonresearch data, and unreproducible research. Bhatia-Lin and colleagues offer a useful and pragmatic rubric for ensuring these and other risks of social media research are minimized. Expanding their concepts to mobile devices, augmenting them with data science, and operationalizing them with computer science will only increase their impact and dissemination in guiding the ethical use of social media in biomedical research.