Interview: How to improve data analytics
EuroBiotech_This May, the Pistoia Alliance launched the FAIR toolkit (fairtoolkit.pistoiaalliance.org). What is this about?
Harrow_In the life sciences we have a particular problem: today's fast-moving streams of big data come in many different varieties because of the many different types of instruments we use, for instance in molecular biology, biochemistry and chemistry. This heterogeneity hampers the digitisation of life sciences data acquisition, access, management and mining. So the particular challenge is to manage all that data so that you can get value from it at scale. As data are often siloed, stored in varying formats, and difficult to retrieve or share, the key is better data management, i.e. making data Findable, Accessible, Interoperable, and Reusable – in brief: FAIR. In mid-May, the Pistoia Alliance – a nonprofit organisation of life sciences companies that aims to improve R&D productivity through pre-competitive collaboration – launched practical guidance for making data FAIR. The FAIR toolkit will help to smooth the path to better data sharing within and between industries, which is critical to future research efforts and to realising the value of technologies like deep learning, advanced analytics, and artificial intelligence.
EuroBiotech_How is the FAIRness of data linked to automated data analytics?
Harrow_What changed in 2014 was that a group of academics meeting in Holland realised they really had to improve data management and make it more graspable. So they came up with the FAIR guiding principles for data management and stewardship, published in 2016. FAIR data and metadata can be read at scale by machines, ready for analysis. And that's what feeds artificial intelligence: well-managed, high-quality data. That is really the heart of it.
EuroBiotech_Are there differences between research data from academic partnerships and clinical data?
Harrow_FAIR data management started in research labs and is now coming to the healthcare environment. Managing research data is far simpler than managing healthcare data in a hospital environment. For clinical trials data, there are already some very mature clinical standards for managing data. We are talking with those clinical standards organisations to understand how the FAIR toolkit can add value.
EuroBiotech_What was the feedback from pharma and biotech companies?
Harrow_Since we launched the project, we have had tremendous interest from global pharma as well as SMEs, demonstrating just how important a resource like this is for the entire life science industry. The FAIR toolkit will enable organisations to realise the value of their data, accomplish effective data management, and build a more collaborative research environment. Data is the connecting thread between all of our projects at the Pistoia Alliance, underpinning initiatives like the Unified Data Model, as well as the effective application of AI. We'll continue to work with our members on developing projects that deliver such tangible benefits to their organisations.
EuroBiotech_Are you expecting a big boost in knowledge from using the FAIR toolkit?
Harrow_For certain purposes, better – not even necessarily perfect – data management can provide very fast answers, particularly when machines can reuse the R&D and clinical data of approved drug candidates. A current use case is finding an existing drug that can be repurposed for the treatment of COVID-19. FAIR data management allows the rapid reuse of FDA-reviewed data packages that could otherwise be overlooked. That's big progress, because these data were obtained through very costly R&D and Phase II/III clinical trials. So making data FAIR, and thereby machine-readable, allows companies to make better use of their own data across different therapeutic areas and get extra value from it.
EuroBiotech_What is the biggest challenge in making data FAIRable?
Harrow_We are already very good at managing access to personal patient data or IP-protected data. The big challenge is making data interoperable across different ontologies, i.e. translating it into a vocabulary and metadata that different analysis platforms can read in a way that makes sense.
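To make the interoperability point concrete, here is a toy sketch in Python; it is not part of the FAIR toolkit and does not represent how any of the companies mentioned work. It assumes two hypothetical datasets that label the same organism differently and maps each free-text label onto one shared ontology identifier from the NCBI Taxonomy (NCBITaxon:9606 for human, NCBITaxon:10090 for mouse). The synonym table and the function names are illustrative assumptions; a real pipeline would rely on curated ontologies and mapping services rather than a hand-written dictionary.

```python
# Hypothetical synonym table: free-text labels -> shared NCBI Taxonomy IDs.
SPECIES_SYNONYMS = {
    "human": "NCBITaxon:9606",
    "homo sapiens": "NCBITaxon:9606",
    "h. sapiens": "NCBITaxon:9606",
    "mouse": "NCBITaxon:10090",
    "mus musculus": "NCBITaxon:10090",
}


def normalise_species(label):
    """Return the ontology ID for a free-text species label, or None if unmapped."""
    return SPECIES_SYNONYMS.get(label.strip().lower())


def annotate(records):
    """Add a 'species_id' field to each record while keeping the original label."""
    return [{**r, "species_id": normalise_species(r["species"])} for r in records]


# Two hypothetical sources describing samples with different vocabularies.
lab_a = [{"sample": "A1", "species": "Homo sapiens"}]
lab_b = [{"sample": "B7", "species": "human "}, {"sample": "B8", "species": "rat"}]

merged = annotate(lab_a + lab_b)
for rec in merged:
    print(rec["sample"], rec["species"].strip(), "->", rec["species_id"])
```

Once both sources point at the same identifier, A1 and B7 can be joined by any platform that understands the ontology. An unmapped label such as "rat" comes back as None, which flags it for curation instead of silently guessing a match.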
EuroBiotech_For which applications of the FAIR toolkit is there already initial data on the potential benefits?
Harrow_We already have five use cases, three from Big Pharma and two from technology companies, that demonstrate the benefits of FAIR data management to life sciences companies. One focuses on COVID-19 and comes from The Hyve, a Dutch company that is developing semantic models to build knowledge graphs. This means blending novel data coming in from clinical trials of COVID-19 drug candidates with previously published knowledge about the SARS and MERS coronaviruses.
There is a use case from Roche, which has pioneered a platform that makes both external and internal data FAIR by design. Another use case comes from Bayer, which has built a platform to FAIRify data coming in from partner organisations in order to create more value. A further use case from SciBite, a UK technology company, enriches the annotation of metadata in order to define the context in which a measurement was made. That's important for improving the reproducibility and machine-readability of data.
It's still early days on the journey of FAIR data management, but we can see from the use cases that improved data management will become best practice in the industry.
Ian Harrow is an independent consultant providing services in project management, bioinformatics and data analytics. He has been an active member of the Pistoia Alliance since its inception ten years ago and he is a project partner for the BioExcel Center for Computational Biomolecular Research. He worked as a senior principal scientist at Pfizer for 27 years. Prior to this, he undertook postdoctoral research in neurobiology at Columbia University following a PhD in neuropharmacology and electrophysiology at the University of Cambridge.