Google and ZebiAI launch Chemome Initiative to identify ‘chemical probes’ with AI models

In a study published this week in the Journal of Medicinal Chemistry, researchers at Google, in collaboration with X-Chem Pharmaceuticals, demonstrated an AI approach for identifying biologically active molecules using a combination of physical and virtual screening processes. The work led to the creation of the Chemome Initiative, which launches today: a collaboration between Google’s Accelerated Science team and startup ZebiAI that aims to enable the discovery of many more small molecule chemical probes for biological research.

As part of the Chemome Initiative, Google says that ZebiAI will work with researchers to identify proteins of interest and source screening data, which the Accelerated Science team will use to train AI models. These models will make predictions on commercially available libraries of small molecules, searching for chemical probes: molecules that aren’t necessarily useful as drugs but that selectively inhibit or promote the function of specific proteins. The predicted molecules will be provided to researchers for activity testing to advance some programs through discovery.

Making sense of the biological networks that support life and produce disease is a complex task. One approach is to use small molecules: in a biological system (e.g., cancer cells growing in a dish), they can be added at a specific time to observe how the system responds when a protein has increased or decreased activity.

Despite how useful chemical probes are for this kind of biomedical research, only 4% of human proteins have a known chemical probe available. In an effort to isolate new ones, Google and X-Chem Pharmaceuticals turned to the field of AI and machine learning.

As the coauthors of the study explain, chemical probes are identified by scanning the space of small molecules against a target protein to distinguish “hit” molecules that can be further tested. The physical part of the process uses DNA-encoded small molecule libraries (DELs) that contain many distinct small molecules in one pool, each of which is attached to a fragment of DNA serving as a “barcode” for that molecule. To build a DEL, one first generates many chemical fragments along with a common chemical handle. The results are pooled and split into separate reactions, where a set of distinct fragments with another chemical handle is added.

The chemical fragments from the two steps react and fuse together at the common chemical handles, and their DNA barcode fragments are connected to build one continuous barcode for each molecule. Once a library has been generated, it can be used to find the small molecules that bind to the protein of interest by mixing the DEL with the protein and washing away the small molecules that don’t attach. Sequencing the remaining DNA barcodes produces millions of individual reads of DNA fragments that can then be processed to estimate which of the billions of molecules in the original DEL interact with the protein.
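To make the barcode bookkeeping concrete, here is a minimal Python sketch of a two-step library and a read-counting pass. It is an illustration, not the study's pipeline: the fragment names, barcode sequences, the no-protein control sample, and the pseudocount-smoothed enrichment ratio are all assumptions introduced here.

```python
from collections import Counter
from itertools import product

# Hypothetical fragments and barcodes for a two-step library (illustrative only).
CYCLE_1 = {"A1": "ACGT", "A2": "TGCA", "A3": "GATC"}
CYCLE_2 = {"B1": "CCAA", "B2": "GGTT"}


def build_library(cycle_1, cycle_2):
    """Map each concatenated barcode to the pair of fragments it encodes."""
    return {
        code_1 + code_2: (frag_1, frag_2)
        for (frag_1, code_1), (frag_2, code_2) in product(
            cycle_1.items(), cycle_2.items()
        )
    }


def enrichment(selected_reads, control_reads, library, pseudocount=1.0):
    """Ratio of each molecule's read frequency after selection to its
    frequency in a control sample (the control is an assumption of this sketch)."""
    selected = Counter(r for r in selected_reads if r in library)
    control = Counter(r for r in control_reads if r in library)
    n_sel = sum(selected.values()) + pseudocount * len(library)
    n_ctl = sum(control.values()) + pseudocount * len(library)
    return {
        library[code]: ((selected[code] + pseudocount) / n_sel)
        / ((control[code] + pseudocount) / n_ctl)
        for code in library
    }


library = build_library(CYCLE_1, CYCLE_2)
control_reads = list(library) * 5                    # every molecule seen equally often
selected_reads = list(library) + ["ACGTGGTT"] * 20   # barcode for (A1, B2) dominates
scores = enrichment(selected_reads, control_reads, library)
top_molecule = max(scores, key=scores.get)
print(top_molecule, scores[top_molecule])
```

In this toy run, the molecule built from fragments A1 and B2 is the only one whose enrichment exceeds 1, which is the kind of signal that would flag it as a candidate binder. A real DEL has billions of members and noisy read counts, so the statistics would need far more care.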

[Image: Google Chemome Initiative]

Above: The fraction of molecules from those tested showing various levels of activity, comparing predictions from the classifier and random forests on three protein targets.

Image Credit: Google

To predict whether an arbitrarily chosen small molecule will bind to a target protein, the researchers built a machine learning model, specifically a graph convolutional neural network, a type of model designed for graph-like inputs such as small molecules. The physical screening with the DEL provides positive and negative examples for a classifier: the small molecules remaining at the end of the screening process are positive examples, and everything else is a negative example.
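For readers unfamiliar with the idea, the Python sketch below shows what "graph-like input" and "positive and negative examples" can look like in code. It is a simplified illustration under stated assumptions, not the architecture from the study: the toy molecule, the one-hot atom features, the sum-over-neighbors update, and the identity weights standing in for learned parameters are all hypothetical.

```python
# Hypothetical molecular graph: atoms are nodes, bonds are edges.
# Heavy atoms of ethanol (C-C-O); features are one-hot over [carbon, oxygen].
ATOM_FEATURES = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
BONDS = [(0, 1), (1, 2)]


def neighbors(num_atoms, bonds):
    """Build an adjacency list from an undirected bond list."""
    adjacency = [[] for _ in range(num_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency


def graph_conv_layer(features, bonds, weights):
    """One simplified layer: add each atom's features to its neighbors',
    multiply by a weight matrix (in_dim x out_dim), then apply ReLU."""
    adjacency = neighbors(len(features), bonds)
    output = []
    for atom, own in enumerate(features):
        pooled = list(own)
        for other in adjacency[atom]:
            pooled = [p + f for p, f in zip(pooled, features[other])]
        transformed = [
            sum(p * w for p, w in zip(pooled, column)) for column in zip(*weights)
        ]
        output.append([max(0.0, value) for value in transformed])
    return output


def label_examples(library_ids, retained_ids):
    """Label molecules retained after the wash as 1 and everything else as 0."""
    retained = set(retained_ids)
    return {mol_id: int(mol_id in retained) for mol_id in library_ids}


WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]  # identity weights stand in for learned parameters
hidden = graph_conv_layer(ATOM_FEATURES, BONDS, WEIGHTS)
labels = label_examples(["m1", "m2", "m3"], ["m2"])
print(hidden)
print(labels)
```

After one layer, each atom's vector summarizes itself and its bonded neighbors; stacking layers lets information travel farther across the molecule. A trained network would learn the weights from the labeled examples rather than use fixed ones.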

The team physically screened three diverse proteins using DELs: sEH (a hydrolase), ERα (a nuclear receptor), and c-KIT (a kinase). Using the DEL-trained models, they then virtually screened large make-on-demand libraries from drug discovery platform Mcule and an internal molecule library at X-Chem to identify a set of molecules predicted to show affinity for each protein target. Lastly, they compared the results of their classifier to a random forest (RF) model, a common method for virtual screening that uses standard chemical fingerprints. They report that the classifier significantly outperformed the RF model in discovering potent candidates.
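The virtual screening and comparison steps boil down to ranking a library by predicted score, testing the top picks, and asking what fraction turn out to be active at a given potency. The Python sketch below illustrates that flow; the molecule IDs, scores, assay values, and potency thresholds are made up for illustration and are not results from the study.

```python
import heapq


def select_for_testing(scores, k):
    """Return the k molecule IDs with the highest predicted binding scores."""
    return heapq.nlargest(k, scores, key=scores.get)


def hit_fraction(measured_potency_um, threshold_um):
    """Fraction of tested molecules at least as potent as the threshold.
    Lower concentration means higher potency."""
    if not measured_potency_um:
        return 0.0
    hits = sum(1 for value in measured_potency_um.values() if value <= threshold_um)
    return hits / len(measured_potency_um)


# Made-up model scores and assay readouts, for illustration only.
predicted = {"m1": 0.91, "m2": 0.15, "m3": 0.78, "m4": 0.42}
picked = select_for_testing(predicted, k=2)
assay_um = {"m1": 0.8, "m3": 25.0}  # hypothetical potency in micromolar

for threshold in (1.0, 10.0, 30.0):
    print(threshold, hit_fraction(assay_um, threshold))
```

Running the same hit-fraction calculation on the molecules picked by two different models gives a like-for-like comparison of the kind the caption above describes.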

The team tested almost 2,000 molecules across the three targets, which it claims is the largest published prospective study of virtual screening to date.

“We’re excited to be a part of the Chemome Initiative enabled by the effective ML techniques described here and look forward to its discovery of many new chemical probes. We expect the Chemome will spur significant new biological discoveries and ultimately accelerate new therapeutic discovery for the world,” wrote Google in a blog post. “While more validation must be done to make the hit molecules useful as chemical probes, especially for specifically targeting the protein of interest and the ability to function correctly in common assays, having potent hits is a big step forward in the process.”