How Big Data Accelerate Research On Blood Cancers

Monday, June 7, 2021
The HARMONY Alliance, a project co-funded by the European Commission, has already been joined by 100 organizations from all over Europe, making it one of the leading projects of its kind on a global scale. Since January 2021, over 70,000 data sets have been provided, which are then analysed by AI algorithms on the HARMONY Big Data Platform.

Researchers focus on finding answers to four questions: How can we diagnose patients faster and more precisely? What are the best practices in clinical management that could help doctors make treatment decisions? How can we meet the unmet needs of patients suffering from blood cancers? How can we accelerate progress in the development of new drugs? The answers could be hidden in the data. However, the task is challenging: the data uploaded to the Platform come from different sources and are stored in various formats. They first have to be harmonized to make them comparable and analysable.

A new approach towards medical research

Researchers took a closer look at seven types of blood cancer, one of the most complex groups of cancers. To understand the characteristics of the disease better, they analyse data from electronic medical records and clinical trials, results of laboratory and imaging tests, and demographic data. All of these data are, of course, anonymized. This heterogeneous information, which is not directly comparable despite how it may seem at first sight, needs to be harmonized first. In data laboratories, data engineers have to analyse thousands of individual data items, assess their quality and usefulness, and then standardize them. As a result, all inputs are turned into a coherent database. Only then can AI algorithms do their job.

[Figure: The structure and processes on the HARMONY Big Data Platform.]

Harmonization is a time-consuming and technically demanding process, so researchers from the HARMONY Alliance used the most advanced techniques developed in recent years. One of them is the Observational Medical Outcomes Partnership (OMOP) Common Data Model, which relies on international standard medical terminologies, such as SNOMED and LOINC, to unify the meaning of information across sources. Another tool is Apache Spark, an analytics engine for processing data on a large scale. One of the Alliance's priorities was also to develop data security standards and make sure that procedures comply with the GDPR. Moreover, the HARMONY Alliance has created an internal code of ethics that governs responsibility and transparency in research work.
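To make the idea of harmonization more concrete, here is a minimal Python sketch. It is not the HARMONY pipeline: the field names, the two example sources, and the tiny mapping table are hypothetical and chosen only for illustration. It shows the general pattern described above: records that name the same laboratory test differently and report it in different units are mapped to one shared code and one shared unit, so that they become comparable. The shared code used here, 718-7, is the LOINC code for hemoglobin mass concentration in blood.

```python
from dataclasses import dataclass

# Hypothetical lookup from source-specific test names to one shared code.
# "718-7" is the LOINC code for hemoglobin (mass/volume) in blood.
LOCAL_NAME_TO_CODE = {
    "hgb": "718-7",
    "hemoglobin": "718-7",
    "haemoglobin": "718-7",
}

# Conversion factors from a source unit into the target unit, g/dL.
TO_G_PER_DL = {"g/dl": 1.0, "g/l": 0.1}


@dataclass(frozen=True)
class Measurement:
    """One laboratory value in the shared, harmonized format."""
    patient_id: str
    code: str
    value: float
    unit: str


def harmonize(patient_id: str, test_name: str, value, unit: str) -> Measurement:
    """Map a source-specific lab value onto the shared code and unit."""
    name_key = test_name.strip().lower()
    unit_key = unit.strip().lower()
    if name_key not in LOCAL_NAME_TO_CODE:
        raise KeyError(f"no mapping for test name: {test_name!r}")
    if unit_key not in TO_G_PER_DL:
        raise KeyError(f"no conversion for unit: {unit!r}")
    converted = round(float(value) * TO_G_PER_DL[unit_key], 2)
    return Measurement(patient_id, LOCAL_NAME_TO_CODE[name_key], converted, "g/dL")


# Two invented records describing the same kind of test in different ways.
hospital_record = {"pid": "A-001", "test": "Hgb", "result": "13.2", "unit": "g/dL"}
registry_record = {"patient": "B-017", "lab_name": "Haemoglobin", "val": 128, "units": "g/L"}

harmonized = [
    harmonize(hospital_record["pid"], hospital_record["test"],
              hospital_record["result"], hospital_record["unit"]),
    harmonize(registry_record["patient"], registry_record["lab_name"],
              registry_record["val"], registry_record["units"]),
]

for row in harmonized:
    print(row)
```

After this step both records carry the same code and unit, so their values can be compared or pooled. A real project would take its mappings from curated standard vocabularies rather than a hand-written dictionary, and would run the transformation on a distributed engine rather than in a loop.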

Cooperation and trust necessary for the technology to do its job

For research methods and technical infrastructure to be put to work in the search for new knowledge about blood cancers, one more condition had to be met: building a trusted platform for collaboration between public and private organizations from all over Europe. The HARMONY Big Data Platform is based on data shared by partners from 22 European countries representing the whole spectrum of healthcare stakeholders: pharmaceutical companies, biobanks, hospitals, organizations that run clinical trials, and others. Researchers were brought together by one goal: to better understand the characteristics of blood cancers and explore their molecular landscape. Such unprecedented projects require trust, courage, and openness to innovation; appropriate technology is not enough. Far more important is overcoming data silos and convincing organizations to share data even though, for understandable reasons, they take a rigorous approach to the processing of medical data. In Big Data analyses, even the smallest piece of the puzzle, that is, every data set, can be the missing link, the key to new treatments or to predictions about how the diseases develop. We still do not have answers to many questions about blood cancers. To meet new research challenges, we need to adopt new tools that extend classic methods.

How can we avoid wasting data?

According to Statista, 365,829 clinical studies had been registered around the world by January 2021. When they are completed, the collected data are, in most cases, archived and locked in local databases. If some of these data sets were available for reuse, science could make many groundbreaking discoveries. The approach towards secondary use of data in medicine needs to change if we want to speed up the development of new drugs and improve the standards of treatment. To do this, we need more favourable regulations. At the same time, it is necessary to raise public awareness of data sharing and ensure that citizens benefit from data processing. Opening data silos for research purposes has become a new form of blood donation: it has the power to save human lives. The HARMONY Alliance has already published its first research results. Such projects are the future of progress in life sciences. The HARMONY Big Data Platform has become a blueprint for using data in medical research and can serve as a foundation for other similar projects.