Cancer data: a dive into the depths
According to William – “call me Bill” – Kassler, human decisions are anything but perfect. “But machines can debug that,” stresses the Deputy Chief Health Officer & Lead Population Health Officer at IBM Watson Health in Boston. Speaking last summer at the Forum Science & Health conference in Munich, the ex-CDC policy advisor and former Medicare expert for value-based drug pricing predicted that cognitive computers would be the Next Big Thing in personalised medicine. Intelligent machines such as IBM Watson, he insisted, could learn rapidly. “Within just four years, Watson scaled down its error rate from over 40% to 5% in recognising language. And that’s just the beginning,” says Kassler. What if that power were turned to unravelling hidden patterns in tumour genomes?
Why data analytics could transform biopharma
The first applications for these kinds of learning machines and algorithms are currently surfacing in personalised medicine, particularly in the highly lucrative field of cancer. Kassler knows that the ability to identify drug responders to a growing number of cancer combination therapies and immunotherapies is still quite limited. He also knows that cash-strapped health systems want to switch to value-based pricing by limiting reimbursement of costly targeted cancer therapies or immuno-oncological treatments to patient groups that have a proven benefit. It’s no secret that current Companion Diagnostics (CDx) are not designed for cancer mutation pattern detection. But because cancer-promoting effects are dependent on surrounding genes and an individual genomic context, AI-directed genome mining is currently en vogue in the sector. In 2013, IBM began promoting Watson as a ‘best treatment’ detector. The claim was based on mining cancer genomes and corresponding treatment data stored in Electronic Medical Records (EMRs). According to Kassler, Watson’s learning algorithms – which he brands “artificial intelligence” – could assist physicians on two fronts:
- collecting and interpreting the flood of medical discoveries from papers, slides, clinical trials, and other data
- finding unrecognised patterns in real-world data, including relevant mutations in cancer genomes, prognostic patterns in EMRs, images, lifestyle data from Fitbits, etc.
Experts like Roger Schank say that Watson simply “correlates data”, and that this ability is subsequently sold as ‘AI’ for marketing purposes. He stresses that an ability to sift through a large amount of data doesn’t define intelligence. The data-mining approach, however, has attracted Big Data analytics majors like IBM, Google, its spin-out Verily, Microsoft and a growing number of high-tech SMEs. That’s because it can generate exactly the data needed by payors to prove the value of precision medicines. In a knock-on effect, major pharma developers have begun preparing for the dawn of Big Data-aided decision support by signing partnerships with the next-gen sequencing (NGS) analytics or software engineering companies that digitalise patient data. Big Pharma now expects them to have the tools to navigate through the cancer genome jungle – and link genomic patterns to outcomes stored in EMRs. The pharma and data analytics nuptials have been highlighted by a series of M&As and partnerships – most of them with Google-financed US companies like Verily, 23andMe, Flatiron Health and Foundation Medicine Inc. Not to be left behind, Novartis also recruited its very first Chief Digital Officer in September.
Providers hit European market
Ever since Germany’s statutory health insurance decided last year to begin paying for diagnostic NGS analyses, Europe has become a target market for clinico-genomic data miners. Other EU countries – including Italy, Finland, France and Lithuania – are soon to follow suit. In September, Thermo Fisher Scientific, Merck KGaA and cancer biobank expert Indivumed launched partnerships aimed at mining the cancer genome and establishing advanced companion diagnostics (see table) or better drugs. And personalised healthcare major Roche will soon sound the bell for cancer mutation profiling-based decision support in Europe. One strategic goal of the Swiss firm’s partnership with US-based NGS analytics specialist Foundation Medicine Inc. (FMI) is to monitor the evolution of tumour mutation profiles and relapse through liquid biopsy tests. Another is to develop cancer gene panels for therapy decision support and patient selection.
Later this year, the Boston-based analytics specialist will open a laboratory facility at Roche’s Penzberg site near Munich. Roche has held a majority stake in the firm since 2015. FMI offers a unique tumour sequencing service that is already well established in the US. It has developed sophisticated algorithms that analyse a patient’s tumour DNA and match the individual mutation pattern to suitable treatments.
Digitalising the tumour
“The principle is to obtain a tumour profile from Next Generation Sequencing of RNA and DNA in biopsy material from a cancer patient,” says Hagen Pfundner, Managing Director of Roche Pharma AG. “That profile will then be compared with 140,000 anonymised tumour profiles in FMI’s database, which the company has compiled in the US.” According to Pfundner, the process could prove a huge help to oncologists, pathologists and patients: “For example, if you have a patient with a tumour of unknown origin, FMI would generate a report that includes all known tumour DNA mutations, and also provide an overview of all potential therapies and ongoing clinical studies for which the patient may be eligible.”
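The matching step Pfundner describes – comparing a fresh tumour profile against a database of stored profiles – can be sketched as a simple set-overlap search. This is an illustrative toy under assumed data structures, not FMI’s proprietary algorithm; the mutation identifiers, scoring function and example entries are all invented.

```python
# Toy sketch of profile-to-database matching (NOT FMI's actual method).
# A "profile" is modelled as a set of mutation identifiers, e.g. "EGFR:L858R".

def jaccard(a: set, b: set) -> float:
    """Overlap score between two mutation sets, ranging from 0 to 1."""
    return len(a & b) / len(a | b) if a | b else 0.0

def best_matches(patient: set, database: list, top_n: int = 3) -> list:
    """Rank stored profiles by similarity to the patient's profile."""
    scored = [(jaccard(patient, entry["mutations"]), entry) for entry in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, entry["therapy"]) for score, entry in scored[:top_n] if score > 0]

# Hypothetical database entries with annotated therapy options:
database = [
    {"mutations": {"EGFR:L858R", "TP53:R175H"}, "therapy": "EGFR inhibitor"},
    {"mutations": {"BRAF:V600E"}, "therapy": "BRAF inhibitor"},
]
patient = {"EGFR:L858R", "KRAS:G12C"}
print(best_matches(patient, database))
```

A production system would of course weight mutations by pathogenicity and link each match to trial eligibility, as the FMI report described above does; the set-overlap score here only illustrates the principle of profile comparison.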
“The use of databases, which provide all diagnostically relevant data, is a great opportunity for personalised medicine,” says Ursula Redeker, Spokesperson for the Board of Roche Diagnostics. “Our long-term objective is to build up databases that also provide imaging and clinical data,” she adds. “I think the future of tumour profiling is moving towards molecular tumour boards – a sort of a digital expert system that combines data from various medical disciplines.”
In the US, FMI works with Flatiron Health, an IT company specialised in electronic medical record solutions. This collaboration may close the information gap by linking FMI’s tumour profiles with real-life data from a patient’s diagnostic and therapeutic history, as well as with treatment outcomes. “This could prove beneficial in tumour diagnosis and prognosis, but also in drug research and development – particularly when it comes to finding new targets and suitable molecule formats,” says Redeker.
“The analysis and integration of real-life data that complements clinical data offers huge benefits, and is the next logical step in personalised medicine,” says Pfundner. “It allows us to identify and recruit patients faster for clinical trials. Furthermore, it can help to identify and find the most suitable combination therapy for a patient. On the regulatory side, it has the potential to accelerate patient access to novel personalised combination therapies that have proved beneficial under real-world conditions. Compared to randomised clinical trials, that would speed up the approval of medicines.”
In mid-September, cancer tissue biobanking specialist Indivumed announced similar intentions. A collaboration with the CRO Helomics Corp. is aimed at linking analyses of human cancer biospecimens with annotated clinical data from consenting patients around the world. According to Indivumed CEO Hartmut Juhl, the company wants to establish a unique global cancer database using molecular information from tissue collected under stringent protocols: “Helomics is bringing in its research analytics expertise, CLIA service capabilities, and proprietary research platforms to develop the next stage of cancer biomarkers.”
Swiss start-up Sophia Genetics S.A. (St. Sulpice, Switzerland), which raised US$30m in September to foster the global expansion of its cancer diagnostics annotation platform SOPHiA AI, is a direct competitor for FMI. Its machine-learning technology is currently being used in 334 hospitals across Europe, and has so far interpreted genome data from over 125,000 patients. According to the company’s CEO Jurgi Camblong, the technology enables “the hundreds of institutions in the network to safely and anonymously share their findings and knowledge while ensuring patient data privacy.” Camblong’s next goal is to ramp up commercial activity from Europe to Latin America, the Asia Pacific region, Canada and the US.
In Germany, the largest genomic testing and medicines market in Europe, pathologists and oncologists have voiced concerns about R&D limitations through the proprietary database model, which has seemingly become an inherent part of the cancer genome data-miner business model. “It’s good to have different providers for high quality panel sequencing. It’s even good if providers apply their learning algorithms to genomic data,” says Christoph von Kalle, Director of the National Center for Tumor Diseases in Heidelberg. “However, if the assays or the sequencing are reimbursed by payors, the results must not be locked up in proprietary databases. Follow-up analyses of available genomic data by researchers could bring huge benefit to the patient. Vice versa, treatment outcomes recorded at hospitals could be highly relevant to drug developers. At the end of the day, we must find a way to share data that is beneficial for the patient.”
Albrecht Stenzinger, Head of the Center of Molecular Pathology in Heidelberg, which is now partnering with Thermo Fisher Scientific, agrees: “It’s not a good idea to lock away NGS results of tumour biopsies in proprietary databases, as Myriad Genetics does for BRCA diagnosis. Companies that do so often sell their technology as overly complex. They put it in a black envelope so neither oncologists nor pathologists can see what has been done technically and bioinformatically – and then they say: buy it or don’t buy it – but it’s a must-have. But neither black boxes, nor inadequate simplistic assays will help us to better understand and treat tumours. Relapses after targeted or immuno-oncologic therapies within just one year or a couple of months remind us that we are only beginning to understand tumour biology. For long-term success, it will be crucial to share other biological and clinical data and know-how in a fully transparent way and to collaborate on an equal footing to have better products and to gain knowledge.”
Even the anonymisation of patient data before mutational profiling appears not to be such a great idea. Researchers looking to find novel cancer targets prefer pseudonymisation for several reasons: “If the tumour progresses and a patient’s data were anonymised, every new disease episode would be stored as a new identity, turning one dataset into two distinct datasets, without attributing it to the person it affects,” explains von Kalle. “Furthermore, if researchers subsequently find a biomarker that can help prevent cancer, the patients whose data were anonymised wouldn’t be able to benefit from this progress because it would be impossible to identify them.” He thinks pseudonymisation makes sense when controlled by a neutral, independent organisation.
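Von Kalle’s distinction can be made concrete with a small sketch. Under an assumed scheme (not any specific provider’s implementation), anonymisation assigns each disease episode a fresh, unlinkable random ID, whereas pseudonymisation derives a stable pseudonym from a keyed hash whose key would be held only by the independent body he calls for:

```python
# Sketch of anonymisation vs pseudonymisation (assumed scheme, for illustration).
import hashlib
import hmac
import secrets

TRUSTEE_KEY = secrets.token_bytes(32)  # held only by the independent trustee

def anonymise(_patient_id: str) -> str:
    """Fresh random ID per episode: records of one patient become unlinkable."""
    return secrets.token_hex(8)

def pseudonymise(patient_id: str) -> str:
    """Keyed hash: same patient always maps to the same pseudonym,
    but re-identification requires the trustee's key."""
    return hmac.new(TRUSTEE_KEY, patient_id.encode(), hashlib.sha256).hexdigest()[:16]

# Two disease episodes of the same patient:
print(anonymise("patient-42"), anonymise("patient-42"))      # two different IDs
print(pseudonymise("patient-42"), pseudonymise("patient-42"))  # identical pseudonyms
```

This is exactly the failure mode von Kalle describes: with anonymisation, a relapse creates a second identity, while the keyed pseudonym keeps episodes attributable – and re-contactable – without exposing the patient’s name to researchers.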
An alternative – data-sharing with mutual benefits
Stenzinger’s latest collaboration is part of NGS major Thermo Fisher Scientific’s European roll-out of partnerships with leading institutes – so-called Centers of Excellence – to establish clinical-use assays and CDx that allow patient selection for targeted and cancer immune therapies. Announced at the ESMO conference in Madrid (8-12 September), the partnership offers mutual benefits without the need to build proprietary databases. In brief, Stenzinger’s center will translate its sophisticated gene panels to Thermo’s FDA-approved CDx platform. “The time of home-brew assays is over,” says Stenzinger, whose center switched almost completely to NGS in 2012 and now analyses 4,000 cases per year via DNAseq and RNAseq. “Today, you need industry partners to develop molecular assays that have clinical utility, and you must have a scalable infrastructure in place. Our industry partner benefits when we establish new gene panels. They may even be added to its commercial assay portfolio. In the future, there will be more clinically exploitable biological themes across tumour entities. One current example is defects in DNA repair genes as a predictive marker for PARP inhibition and selected chemotherapy regimens. Another is tumour mutational burden/load as a predictor of responsiveness to cancer immunotherapy. These two may even converge in the near future, and we’ll see the routine use of broad gene panels of 1 Mb and beyond in molecular pathology soon.”
In fact, the FDA has already approved microsatellite instability as the first biomarker to predict responsiveness to MSD’s checkpoint inhibitor pembrolizumab across tumour types. TRK-A is another, and at the ESMO conference, Roche presented its first promising data from a liquid biopsy test that measures tumour mutational burden to predict responsiveness to cancer immunotherapy. According to Madhushree Ghosh, Senior Director of Strategic Accounts at Thermo Fisher Scientific, “the expert/partner network enables us to not only reach researchers working in the field, but also pathologists and oncologists who will directly refer to our tests in their patient care. This enables us to develop CDx products that will enable drug submission correlation to our test, with a focus on biomarker-targeted therapies enabling better patient care.”
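Tumour mutational burden, the marker both Stenzinger and the Roche data point to, is conventionally reported as somatic (nonsynonymous) mutations per megabase of sequenced territory. A minimal sketch of that arithmetic, with invented numbers:

```python
# Minimal TMB illustration: somatic (nonsynonymous) mutation count divided by
# the sequenced panel size in megabases. All numbers here are invented.

def tmb_per_mb(nonsynonymous_mutations: int, panel_size_bp: int) -> float:
    """Tumour mutational burden in mutations per megabase."""
    return nonsynonymous_mutations / (panel_size_bp / 1_000_000)

# A 1.1 Mb panel (roughly the breadth Stenzinger anticipates) with 22 calls
# works out to about 20 mutations/Mb:
print(tmb_per_mb(22, 1_100_000))
```

The panel size in the denominator is why Stenzinger’s point about broad panels matters: on a small hotspot panel the same formula is applied to so little territory that the per-megabase estimate becomes unreliable.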
The spirit of an open-access platform breaks down barriers.
Last year, the largest public database of cancer genomes – The Cancer Genome Atlas (TCGA) – was moved into the Amazon Web Services cloud. The database stores genomic data from 11,000 patients across 33 cancer types. Is this a model for future clinico-genomic data-sharing?
Even outside the cancer diagnostics space, data-sharing appears to work (see interview p. 22). At the ESMO conference, Merck KGaA announced it was joining forces with the non-profit initiative Project Data Sphere to jointly lead the Global Oncology Big Data Alliance (GOBDA). “The ultimate goal of our alliance is to unleash the power of Big Data to bring value to cancer patients,” said Merck board member Belén Garijo. Project Data Sphere’s Big Data analytics platform will be used to accelerate discovery, development and delivery of new approaches in cancer care.
The GOBDA initiative has been formed to expand open access to de-identified patient data sets, further enhancing analytical capabilities by building on Project Data Sphere’s digital platform. The current platform contains historical clinical trial data from almost 100,000 patients. It was provided by multiple organisations, and access to the information has already led to new and potentially practice-changing findings. GOBDA says it is planning to expand this platform to include rare tumour trials, experimental approaches and real-world patient data. Leveraging these data with Big Data analytics will help optimise clinical trials, build up a data register and advance understanding of available cancer treatments.
Overselling technology hopes
IBM has promoted its cognitive system Watson as the coming AI standard in cancer analytics, while Google and Microsoft make similar claims for their AI solutions. But black-box cancer analytics tools still aren’t living up to developer promises. Hopes that Watson would play a future role in oncology were dealt a heavy blow when the M.D. Anderson Cancer Center shelved a US$62m project, started in 2013, to establish clinical decision support technology based on the cognitive system. The reason? An ongoing lack of documented results. According to Stenzinger, “in tumour biology, we’re still just scratching the surface.”
(First published in European Biotechnology, 2017)