Data in healthcare: can we trust something we don’t know?

Data is critical for Artificial Intelligence (AI) and Machine Learning (ML) applications in healthcare. Its quantity and quality directly affect the accuracy and inclusiveness of AI-based solutions, determining whether predictions can be generalized to different populations, especially underrepresented groups. Developers therefore need to be aware of the limitations of their data and find ways to overcome the hurdles those limitations create. Trust and trustworthiness are foundational to the sociotechnical side of working with health data at scale. Trust comprises many elements, including purpose, governance, and ethical and legal considerations, and many recent policy documents, such as the AI Act and the Data Act, have addressed it.

Anonymized and aggregated data

However, many stakeholders, such as the social care organizations that contribute to health provision and integrated care services across Europe, are still largely unaware of how the European Health Data Space can work and how their contribution can be mutually beneficial. And how can they trust something they don’t know? Most data-sharing activities intended to maximize the use of health data for innovation do not require access to individual data, at least not at the patient level. Instead, they need anonymized and aggregated data. Privacy and security challenges may therefore be less difficult to overcome than they first appear. This does mean, however, that clearer guidance is needed to define the requirements for specific scenarios, as well as the shared professional responsibilities for anonymizing data at the organizational level, so that everything works as intended. Within the same scope, common interoperability standards play an essential role in addressing challenges around data quality, trustworthiness, and generalizability.

Not using data can harm patients

It is widely acknowledged that most citizens still need more information about how their data are stored, accessed and shared, as well as better digital skills. This requires professionals and policymakers in this area to simplify their language and, preferably, to use visual materials that make the information accessible to people across different contexts and levels of literacy and education. During the COVID-19 pandemic, digital solutions such as the COVID certificate and contact-tracing tools may have helped people better understand the potential benefits of such tools, and could thus be an enabler for the future. The pandemic also brought wider recognition that certain risks, such as cybersecurity, are extremely important to address.

Patients ready to share data

However, the main lesson is that the pandemic clearly showed that not using data can leave people without proper treatment and can even cause harm. This is perhaps the most controversial ethical debate. On one side, data sharing must meet the highest ethical and safety standards. On the other, not using data, or delaying data-driven projects, may mean missing opportunities to improve standards of care and outcomes. The answer depends on one’s point of view: patients are ready to share their data if there is a chance it could lead to better treatment or speed up research on new medicines, whereas healthy citizens may have different priorities and place privacy above the benefits of secondary use of data. Striking a balance between these positions requires action at the policy level as well as a bottom-up approach, that is, more literate and engaged citizens and organizations discussing these issues with credible information, in order to work effectively towards a healthier and happier society. Nobody doubts that healthcare needs data to become more effective, personalized, predictive and preventive. But these goals can only be achieved if citizens are willing to share their data.

This article is based on the discussion “How do you maximize the use of health data for innovation and recovery?”, organized as part of the Digital Health Society Summit (4-5 October 2022). Watch the recording from the session here.
