Biotechnology Computer Vision Research Accepted to 3DV

Marcin J. Skwark, Data Scientist at InstaDeep, is one of the lead authors of a research paper on predicting the biological function of molecules that has been accepted to the International Conference on 3D Vision (3DV) 2020.



The paper, titled “3D Deep Learning for Biological Function Prediction from Physical Fields”, was written in cooperation with Machine Learning and Computer Vision researchers from the Department of Informatics at the Technical University of Munich and the Department of Chemistry at Vanderbilt University, with professors Daniel Cremers and Jens Meiler as the senior authors of the work.

About the work, Marcin said: “The function of most biologically important molecules is dictated by their structure. Moreover, molecules with similar structure tend to share functional properties as well. With the advent of modern structure determination methods, such as cryo-EM, science is able to obtain more structural data about proteins and small drug-like molecules. However, inferring a function from structure remains an elusive task”.

Interdisciplinary collaboration

The work combines domain expertise across multiple fields, so it would not have been possible without the interdisciplinary collaboration between the paper's lead authors: Vladimir Golkov (TUM), who contributed cutting-edge computer vision methodology, and Marcin J. Skwark (Vanderbilt/InstaDeep), who contributed computational biology and data science insights.

“The paper demonstrates that 3D convolutional neural networks can be successfully applied both to function prediction of enzymes and to the discovery of small drug-like molecules with a desired activity. But contrary to prior work, there is no expert involvement necessary, as inter-molecular similarity is learned entirely from first principles, based on van der Waals surfaces of atoms. When applied to larger data sets, methods like these would allow for automatic annotation of function for unknown proteins, as well as provide a way to repurpose known drug-like molecules for novel therapeutic applications. With the rapid development of both experimental and theoretical techniques to ascertain structures of biomolecules, as well as rapidly growing computational capabilities, it is likely that such methods will become a mainstay of modern life science research”, Marcin explains.

The paper has been accepted at 3DV (International Conference on 3D Vision 2020), which takes place virtually from November 25th to 28th, 2020. The full paper is available via InstaDeep’s Research page and on arXiv. For now, please enjoy the abstract below!

Author list:

Vladimir Golkov (Technical University of Munich)
Marcin Skwark (InstaDeep)
Atanas Mirchev (Technical University of Munich)
Georgi Dikov (Technical University of Munich)
Alexander R Geanes (Vanderbilt University)
Jeffrey Mendenhall (Vanderbilt University)
Jens Meiler (Vanderbilt University)
Daniel Cremers (Technical University of Munich)


3D Deep Learning for Biological Function Prediction from Physical Fields


Predicting the biological function of molecules, be it proteins or drug-like compounds, from their atomic structure is an important and long-standing problem in biology and medicine. The electron density field and electrostatic potential field of a molecule contain the “raw fingerprint” of how this molecule can fit to binding partners. In this paper, we show that deep learning can predict biological function of molecules directly from their raw 3D approximated electron density and electrostatic potential fields. Protein function based on Enzyme Commission numbers is predicted from the approximated electron density field. In another set of experiments, the activity of small molecules is predicted with quality comparable to state-of-the-art descriptor-based methods, meaning that neural networks are able to extract the relevant information from raw physical fields, without using handcrafted descriptors. We propose several alternative computational models for the GPU with different memory and runtime requirements for different sizes of molecules and of databases. We also propose application-specific multi-channel data representations.
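To make the idea of a "raw 3D approximated electron density field" more concrete, here is a minimal Python sketch of how a molecule could be turned into a voxel grid suitable as input for a 3D convolutional network. It is an illustration only, not the method from the paper: the abstract does not say how the density is approximated, so the sum-of-Gaussians model, the function name `voxelize_density`, and the parameters `grid_size`, `extent` and `sigma` are all assumptions made for this example.

```python
import numpy as np


def voxelize_density(coords, weights, grid_size=16, extent=8.0, sigma=1.0):
    """Approximate a density field on a cubic grid as a sum of Gaussians.

    Assumption for illustration: each atom contributes one isotropic
    Gaussian of width `sigma`, scaled by a per-atom weight.

    coords  : (N, 3) atom centres, relative to the grid centre
    weights : (N,) per-atom weights (hypothetical, e.g. electron counts)
    Returns a (grid_size, grid_size, grid_size) float32 array.
    """
    coords = np.asarray(coords, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)

    # Voxel centres along one axis, spanning [-extent, extent].
    axis = np.linspace(-extent, extent, grid_size)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1)  # shape (G, G, G, 3)

    # Squared distance from every voxel to every atom: shape (G, G, G, N).
    d2 = ((grid[..., None, :] - coords) ** 2).sum(axis=-1)

    # Weighted sum of the Gaussian contributions of all atoms.
    density = (weights * np.exp(-d2 / (2.0 * sigma ** 2))).sum(axis=-1)
    return density.astype(np.float32)


# Toy two-atom "molecule"; coordinates and weights are made up.
coords = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]]
weights = [6.0, 8.0]
field = voxelize_density(coords, weights)
print(field.shape)
```

A second field computed on the same grid (for example an approximated electrostatic potential) could be stacked with this one along a new leading axis using `np.stack`, giving a multi-channel volume in the spirit of the representations the abstract mentions.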