Open Source

Creating collaborative, world-first tools for biologists with DeepChain.

GitHub Repository

DeepChain Apps for Biology Research

InstaDeep’s mission is to accelerate the transition to an AI-first world that benefits everyone. We have created DeepChain Apps to help tackle some of the toughest biology challenges with advanced AI. We’re making three repositories publicly available: Apps, Bio-Datasets and Bio-Transformers. This means anyone with a laptop and an internet connection could work with models that were previously only available to elite institutions with massive computing resources.

Our open source repositories

Apps

Trained Apps, or machine learning models, that are readily available to support protein research and design tasks.

Bio-Datasets

An open-source collection of biology datasets and pre-trained embeddings for training personalised models.

Bio-Transformers

A simple way to interact with protein sequence models to use them in your protein research and design tasks.

The DeepChain Open Source Ecosystem

DeepChain Apps are scoring metrics that leverage innovative AI, including natural language processing models. On the DeepChain Apps platform, users can create and share their scoring metrics to evaluate protein sequences. For example, an App could be a model that understands what makes a protein pathogenic (disease-causing). Such a scoring system could be used as a Classifier or Predictor.

  • As a Classifier, it could help to classify large datasets into pathogenic and non-pathogenic sequences. 
  • As a Predictor, the scorer could help predict how a mutation affects a sequence’s pathogenicity, to better understand diseases and use the findings to design low-pathogenicity proteins for medicines.
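
To make the two uses concrete, here is a minimal Python sketch. It is not the DeepChain Apps API: `toy_score`, `classify` and `mutation_effect` are hypothetical names, and the scoring rule is an arbitrary placeholder with no biological meaning. A real App would replace `toy_score` with a trained model; the 0.5 threshold is likewise an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a trained App. This is NOT a real pathogenicity
# model: it returns the fraction of residues found in an arbitrary set, so
# the example stays self-contained and deterministic.
FLAGGED_RESIDUES = set("KRH")


def toy_score(sequence: str) -> float:
    """Return a placeholder score in [0, 1] for a protein sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(res in FLAGGED_RESIDUES for res in sequence) / len(sequence)


def classify(
    sequences: List[str],
    scorer: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, chosen only for illustration
) -> Dict[str, bool]:
    """Classifier use: label each sequence as above or below the threshold."""
    return {seq: scorer(seq) >= threshold for seq in sequences}


def mutation_effect(
    sequence: str,
    position: int,
    new_residue: str,
    scorer: Callable[[str], float],
) -> float:
    """Predictor use: score change caused by one substitution (0-based position)."""
    if not 0 <= position < len(sequence):
        raise IndexError("position is outside the sequence")
    mutant = sequence[:position] + new_residue + sequence[position + 1:]
    return scorer(mutant) - scorer(sequence)


if __name__ == "__main__":
    print(classify(["KKRA", "AAAG"], toy_score))
    print(mutation_effect("AAAA", 0, "K", toy_score))
```

Both functions take the scorer as an argument, so the same scoring model can serve either role without modification.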

Bio-Datasets is a collection of publicly available, sequence-based protein datasets ready to be used to train machine learning models. In addition, we provide pre-trained contextual embeddings to speed up the fine-tuning of your personalised models.

Bio-Transformers are AI models that have been trained on hundreds of millions of protein sequences from the UniRef100 and BFD (Big Fantastic Database) datasets. Once trained, these models capture the language of proteins and how protein sequences vary.

DeepChain Hub

DeepChain Hub is a place where biological apps, AI and data converge.

Through the DeepChain App hub, users can effortlessly apply open-source, AI-powered scorers to their protein design and research. The hub lets users search for Apps and view each App’s goal, author, data sources and methodology.

The DeepChain Dataset hub allows users to search for curated protein sequence datasets and leverage the predictive power of the available pre-computed embeddings.

The hub displays all available Apps

Want to get involved?

Read more about DeepChain Apps in our post “Build powerful AI protein Apps in less than 24 hours with DeepChain open-source tools”, and visit DeepChain’s GitHub to learn more about the technical aspects of building ML models and engage with our community of experts.

The DeepChain Playground module leverages transformer algorithms and is available for free to help you analyse your protein sequences and discover variants and key regions. Create your account here and start using AI to accelerate and improve your design and discovery process.


To learn more about DeepChain, send us an email at hello@deepchain.bio

If you are a computational biologist passionate about AI, join our team! You can find our job vacancies here.