Picture: foxanon1987/shutterstock.com

Can machines learn to discover drugs?

By boasting that its technology is a “bioscience machine brain to discover new medicines and cures for diseases”, British drug discovery start-up Benevolentai has clearly overstated its case, even in the eyes of AI advocates. But that didn’t slow investors down. The confident self-promotion was backed by a recent funding round that values Benevolentai at an estimated US$2bn. According to Bruce Booth, a partner at venture capital firm Atlas Venture, there is no question that computation plays, and will continue to play, an important role in drug R&D. But he remains sceptical that it will be as transformational as the hype claims. “Biology,” he says, “is the untamed beast.”
It’s easy to understand why the claims being made by AI developers are falling on fertile ground. Neither the pace of drug discovery nor the time it takes to bring a drug to market has improved significantly in the last 40 years. Worse still for the industry, R&D productivity, measured as return on investment, has been trending downwards. It’s a counterintuitive statistic in an age of new and exciting technologies like genome editing, high-throughput screening and structure-based drug design. “Each of these technologies provides us valuable insights,” says Andrew Hopkins, the CEO of Exscientia Ltd. His start-up is one of many British companies seeking to leverage artificial intelligence for drug discovery and development. Asked why all those shiny new molecular biology technologies still haven’t improved productivity in the biotech and pharma industries, Hopkins says they can actually make existing processes more complicated. “Greater insight plus greater complexity does not necessarily result in greater productivity.” Of course, he’s also convinced that this is where AI can make a difference.

Boom? Hype? Neither?

“What we are trying to do is improve decision-making itself, so that researchers are put into a position to tackle more complicated projects and deal with larger datasets,” says Hopkins. “It’s a true enabling technology that allows us to integrate all the other technological advances. So far, the bottleneck has been the human brain. AI helps us overcome cognitive limits.”
Hopkins isn’t the only one convinced that AI is a panacea for the industry’s woes. The technology is already driving sizeable business activity in healthcare. Besides drug discovery, AI companies are developing business models in areas like finding better biomarkers, optimising clinical trial design, matching patients with treatments, health record management, medical imaging and surgical assistance. Consultants at Accenture Strategy estimate that the market is currently growing at 40% a year: AI in healthcare was worth just US$0.6bn in 2014, but it is expected to be worth US$6.6bn by 2021. A good indication of accelerating growth is the number of healthcare-focused AI deals which, according to market analyst CB Insights, rose from fewer than 20 in 2012 to about 170 in 2017. The number of newcomers per year, as indicated by a disclosed first equity round, rose from 7 to 77 over the same period. In total, healthcare AI start-ups have raised US$4.3bn across 576 deals since 2013, more than any other industry in AI deal activity, including cybersecurity and commerce. But enough numbers for the moment. Let’s go over some basics.
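The Accenture figures can be sanity-checked with a quick compound-growth calculation. The sketch below uses a small helper, `cagr`, written purely for illustration (it is not taken from Accenture’s methodology), to derive the annual rate implied by growth from US$0.6bn in 2014 to US$6.6bn in 2021. It comes out at roughly 41% a year, in line with the quoted 40%.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes quoted in the text (US$bn): 0.6 in 2014, 6.6 expected by 2021.
rate = cagr(0.6, 6.6, 2021 - 2014)
print(f"Implied annual growth: {rate:.1%}")
```

An elevenfold increase over seven years needs an annual multiplier of 11 to the power 1/7, which is about 1.41.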

Getting the terms right

What exactly is AI? The elastic term has been applied to technologies invented as far back as the 1950s. “The basic tools of AI everyone is using have been around for several years now, but their application has improved massively in the last couple of years,” says David Williams. The CEO of British drug discovery start-up Nanna Therapeutics adds that, generally speaking, AI is a set of computer science techniques that allow software, with increasing levels of technical sophistication, to learn from experience, adapt to new inputs and complete tasks that resemble human intelligence. Today’s methods are grouped under what is called ‘narrow AI’, with the most advanced forms able to find creative solutions to specific tasks that no human could ever think of. The limits of AI are now being pushed further in an attempt to achieve what’s called ‘general AI’: an artificial intelligence able to reflect the way humans do.
In healthcare, the most important AI technology is machine learning, especially its newest iterations, which have been dubbed ‘deep machine learning’. A machine learning model is created by feeding data into a learning algorithm. So someone first has to write that algorithm, then train it with data that’s accurate and reliable. Over time, models can be retrained with newer data, increasing their effectiveness. While the data aspect is extremely important for AI in drug discovery, let’s focus first on the algorithms. Broadly, there are two types: supervised and unsupervised. Supervised learning algorithms make predictions based on a set of labelled examples. For this method, the value of interest (for instance, whether a compound is active against a target) has to be known for every training example. Because the model is fed with correct answers, predictions from supervised learning algorithms tend to be more precise than those from unsupervised learning algorithms.
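To make the idea of learning from labelled examples concrete, here is a minimal supervised-learning sketch. It uses k-nearest neighbours, one of the simplest supervised methods, chosen here only for illustration; none of the companies in this article is claimed to use it. The dataset is entirely made up: each compound has two hypothetical numeric descriptors and a known ‘active’ or ‘inactive’ label, and a new compound is labelled by a majority vote among its closest labelled neighbours.

```python
import math
from collections import Counter

# Hypothetical labelled training set: two made-up descriptors per compound,
# plus the known answer ("active" or "inactive") that supervised learning needs.
TRAINING_SET = [
    ((0.9, 1.1), "active"),
    ((1.0, 0.8), "active"),
    ((1.2, 1.0), "active"),
    ((3.0, 3.2), "inactive"),
    ((3.1, 2.9), "inactive"),
    ((2.8, 3.0), "inactive"),
]


def predict(features, training_set=TRAINING_SET, k=3):
    """Label a new compound by majority vote of its k nearest labelled neighbours."""
    nearest = sorted(training_set, key=lambda item: math.dist(features, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(predict((1.1, 0.9)))  # lies close to the compounds labelled "active"
print(predict((2.9, 3.1)))  # lies close to the compounds labelled "inactive"
```

Real pipelines use far richer molecular descriptors and more powerful models, but the principle is the same: the quality of the prediction rests on the quality of the labelled examples.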
In unsupervised learning, data points aren’t associated with labels. Instead, the goal of an unsupervised learning algorithm is to organise data in some way, or to describe its structure. This can mean grouping it into clusters, or finding different ways of looking at complex data so that it appears simpler or more organised. This form of training is less specific, not least because the people analysing the output might not even know the right answers themselves. That said, unsupervised learning can provide great benefits when an algorithm is tuned properly to fill in the blanks.
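Unsupervised learning can be illustrated with k-means clustering, a standard textbook algorithm used here purely as an example. The sketch below receives six hypothetical compounds described by the same kind of made-up descriptors, but with no labels at all, and simply sorts them into two groups. For simplicity it uses a deterministic ‘farthest point’ start rather than the randomised initialisation common in practice. What the groups actually mean is left for a human expert to decide.

```python
import math

# Hypothetical unlabelled compounds: two made-up descriptors each, no answers attached.
POINTS = [(0.9, 1.1), (1.0, 0.8), (3.0, 3.2), (1.2, 1.0), (3.1, 2.9), (2.8, 3.0)]


def kmeans(points, k, iterations=20):
    """Group points into k clusters by repeatedly moving centroids to cluster means."""
    # Deterministic start: the first point, then repeatedly the point
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stopped changing
            break
        centroids = new_centroids
    return clusters


for i, cluster in enumerate(kmeans(POINTS, k=2)):
    print(f"group {i}: {cluster}")
```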
So for supervised machine learning, it’s crucial to have a massive set of high-quality data. For unsupervised machine learning, human experts capable of putting the results into perspective are critical. “Many of our staff worked in Big Pharma before joining us,” says Exscientia’s Andrew Hopkins. People with an in-depth understanding of the challenges of drug discovery are crucial to interpretation, as there are many domain-specific hurdles to implementing AI. An algorithmic set-up might work well with protein structure data but not at all with high-content screening data.
The insight that combining AI and biology expertise might be the way to go is reflected in a whole new range of cooperation models. Drug developers are buying equity in AI start-ups, joint ventures are forming, and licensing in silico-generated drug candidate programmes has become common. “We have so far nominated four molecules as preclinical candidates, and hopefully next year we will see the first ones move into the clinic,” says Hopkins. According to him, Exscientia’s pharma partnerships with GSK and Sanofi bring more than just financial benefits: “We are learning by being exposed to new problems. Creating new algorithms to solve them enhances the system in its entirety. This way the solutions to more and more design problems are provided by algorithms.”

Small team, big value

Hopkins earned recognition for forming the indication and discovery group of US pharma major Pfizer in 2000. The UK site was one of the industry’s first drug repurposing units to use computational methods to mine the literature. Pfizer was trying to institutionalise the success of sildenafil (Viagra), which was originally developed for angina before being repurposed for erectile dysfunction. “The results were a significant increase in field-of-use patents and the initiation of Phase II trials for new indications. We realised that a small team of people could generate a lot of value by mining information correctly,” Hopkins remembers. Around 2005, many companies emerged with an IT-based approach to drug repurposing. Since then, more expansive databases have been developed in lock-step with more elaborate algorithms. Exscientia published the first version of its automated drug discovery technology in 2012 and has continued to improve it since. “It was a long journey, but the technology has now proven itself capable of making huge contributions to the discovery of new molecules,” the AI expert asserts.
That’s a positive take on the situation. First movers like E-Therapeutics (UK) had to realign themselves when their clinical candidates failed to meet expectations. But a new wave of companies, including Exscientia, Benevolentai and Insilico Medicine, seems to have a sounder foundation, both technically and financially. They all share a common goal: to build fully integrated drug development companies in the traditional mould. “Just as molecular biology was a tool adopted in many areas across the value chain 40 years ago, so is AI now,” Hopkins adds. “Companies like Amgen, Genentech and Biogen took molecular biology to the core of their business model and became the first generation of biotech. We see something analogous here. While AI is being adopted by existing companies at different speeds and with varying thoroughness, there will be a few big winners, with AI at the heart of what they do.” Needless to say, Exscientia wants to be one of them.

Generating data in-house

David Williams is also bullish about his firm’s approach. While the business model (internal and partnered programmes) and the focus on a specific disease domain (mitochondrial biology) are nothing out of the ordinary, Nanna Therapeutics does a few things differently on the tech side. “We developed a high-throughput method of synthesising medicinally-relevant tag-less small molecules that are fed into a high-content screening platform able to perform and screen billions of functional and phenotypic assays in a single day. That gives us a big advantage – we generate a lot of high-quality and highly consistent data in-house,” explains Williams.
That brings a second important aspect into the limelight: data. It dictates what the AI spits out at the other end. If the answer to a question you’ve asked just doesn’t come, it’s time to start tinkering with the algorithms. This is especially likely when working with small datasets and supervised machine learning algorithms, where a big chunk of the data has to be carved out to train the AI, leaving little over to check the results against. “If you have a big enough dataset, the answer is much easier to find,” Nanna’s CEO explains.
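The problem with small datasets becomes clearer when you look at how supervised models are commonly evaluated: part of the labelled data is used for training and the rest is held back to test the model. The sketch below assumes a typical 80/20 split and uses placeholder records of the form `(compound_id, label)`; the helper name `train_test_split` is chosen for illustration here and mirrors, but is not, the scikit-learn function of the same name.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=0):
    """Shuffle labelled records and split them into a training set and a held-out test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder records standing in for labelled compounds: (compound_id, label).
for size in (50, 50_000):
    records = [(f"cpd-{i}", i % 2) for i in range(size)]
    train, test = train_test_split(records)
    print(f"{size:>6} records: {len(train):>6} for training, {len(test):>5} for testing")
```

With 50 compounds, only ten are left to judge the model; with 50,000, the same split leaves 10,000, which is why a large in-house dataset makes answers easier to trust.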
Without revealing the nature of the firm’s in-house generated small molecules (atom number and variety), Williams says that Nanna can synthesise billions of different molecules. For comparison: enumerating all possible molecules made of up to 17 atoms of carbon, nitrogen, oxygen, sulphur and the halogens (hydrogen atoms are not counted) yields a chemical space of some 166 billion molecules. “Across the whole industry, pharmaceutical companies have less than 20 million plate-based small molecules from which to launch a programme,” Nanna’s CEO says. “Starting with 1 or 2 million compounds and using an iterative and linear process of primary, secondary and tertiary screening, each company will attempt to identify the best compounds in successive steps,” he explains. “Our goal is to completely transform this process. Accessing billions of whichever molecules we want, we do all the necessary screening in parallel on everything right at the very start – and avoid the iterative process – to identify interesting hits within a few weeks.”
Surprisingly, Williams admits that Nanna is not terribly sophisticated in the way it uses AI. The quality and quantity of the data allows the company to rely on relatively straightforward, mostly open-source software. Less surprisingly, Williams is concerned about the way some of the ‘cutting-edge’ AI technologies are being hyped at the moment: “These companies just try to squeeze out the few grains of gold left inside the data everyone has looked at dozens of times. I’ll leave it up to them, because we don’t have to look very hard to find a nugget here and there in our data.”

Small molecules ahead

Most activity in AI drug discovery centres on small molecules. A one-stop shop for AI-based biologics discovery is not an obvious business model, given the size of the molecules and the complexity of their assembly, interactions and decomposition. That said, a few companies are using AI to focus on certain aspects of biologics drug discovery. They include Antiverse and Peptone in the UK (antibodies), Pepticom in Israel (peptides), Deep Genomics in Canada (oligonucleotides) and Envisagenics in the US (RNA). Two very ambitious start-ups are Canada-based ProteinQure, which aims to design peptide-based therapeutics de novo, and US-based Resonant Therapeutics, which says it can find relevant new drug targets and corresponding therapeutic antibodies in cancer tissue samples.
Danish Evaxion Biotech is among the few start-ups using AI to discover vaccines for infectious diseases and cancer. Another low-profile area is the AI-driven discovery of proteins and peptides for the materials science, cosmetics and food industries. Start-ups active in this field include London-based Labgenius and Dublin-based Nuritas, while multinationals such as BASF and Nestlé are also pursuing it.
It certainly takes an effort to see your way through all the hype. But AI proponents say the technology will only get better.