OpenFold, a non-profit artificial intelligence (AI) research consortium, has recently introduced two new tools that promise significant advances in protein structure prediction and in the modeling of protein/protein interactions. The first tool, SoloSeq, integrates a new protein Large Language Model (LLM) with OpenFold’s structure prediction software. The second tool, OpenFold-Multimer, creates higher-quality models of protein/protein complexes. These tools are crucial for designing proteins that do not occur in nature, which opens up possibilities for developing novel therapeutics.
SoloSeq, built on Amazon Web Services, is the first fully open-source AI tool that integrates a protein LLM with structure prediction. Critically, it includes training code that other organizations can use to fine-tune existing models or train new ones on their own proprietary data. With the training code and data sets accessible to the scientific community, researchers can pursue work that was not possible with closed-source models. This commitment to open science is intended to speed up the improvement of these tools.
One major advantage of SoloSeq is that it eliminates the need for a separate pre-computation step, making the calculation on average more than 10 times faster without compromising accuracy. Traditional protein structure prediction methods require a Multiple Sequence Alignment (MSA) step, which searches for similar protein sequences found in nature. SoloSeq’s LLM has already analyzed most known protein sequences, so it can rapidly summarize the evolutionary information that an MSA would otherwise provide. This makes SoloSeq well suited to large-scale screens where speed is critical. Additionally, SoloSeq handles non-natural proteins, such as those designed de novo, which MSA-based systems address poorly because few or no similar natural sequences exist to align.
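To give a feel for what a tenfold speedup means in a large-scale screen, here is a minimal back-of-the-envelope sketch in Python. The screen size and the per-sequence time for an MSA-based pipeline are hypothetical values chosen only for illustration, not measurements from OpenFold, and the "more than 10 times faster" figure is taken as exactly 10.

```python
def screen_hours(n_sequences: int, seconds_per_sequence: float) -> float:
    """Wall-clock hours to predict n_sequences structures one after another."""
    return n_sequences * seconds_per_sequence / 3600.0


# Hypothetical inputs, chosen only for illustration.
N_SEQUENCES = 10_000            # assumed size of the screen
MSA_PIPELINE_SECONDS = 600.0    # assumed time per sequence, MSA search included
SPEEDUP = 10.0                  # "more than 10 times faster", taken as exactly 10

msa_hours = screen_hours(N_SEQUENCES, MSA_PIPELINE_SECONDS)
single_seq_hours = screen_hours(N_SEQUENCES, MSA_PIPELINE_SECONDS / SPEEDUP)

print(f"MSA-based pipeline:       {msa_hours:,.0f} hours")
print(f"Single-sequence pipeline: {single_seq_hours:,.0f} hours")
```

Under these assumptions the same screen drops from roughly ten weeks of sequential compute to about one week; real figures depend on hardware, sequence length, and how the work is parallelized.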
OpenFold-Multimer, the second tool, is the first fully open-source protein/protein complex modeling toolkit with included training code. This tool enables users to predict new complex structures, retrain existing models, or fine-tune them with proprietary data. The release of OpenFold-Multimer follows DeepMind’s AF2-multimer code and model, which demonstrated that retraining a multimer-specific model improves structure accuracy.
According to Brian Weitzner, Ph.D., Director of Computational and Structural Biology at Outpace and co-founder of OpenFold, these tools are essential for curing diseases and represent a significant step forward in protein design. The open-source nature of SoloSeq and OpenFold-Multimer ensures that both industry and academia can leverage these architectures for life science innovation in fields such as pharmaceuticals and agriculture.
OpenFold’s commitment to open science is in line with its mission of developing free and open-source software tools for biology and drug discovery. OpenFold is hosted as a project of the Open Molecular Software Foundation (OMSF). The release of SoloSeq and OpenFold-Multimer marks another significant milestone in bringing deep learning capabilities to the entire life science community.
As these tools continue to evolve, they have the potential to revolutionize protein structure prediction and enhance the discovery of innovative therapeutics. By providing faster and more robust predictions, higher-quality protein/protein interaction models, and the ability to design de novo proteins, OpenFold is empowering researchers and accelerating scientific progress in the field of protein engineering.
For more information about OpenFold and its tools, visit the consortium’s website at openfold.io.
Sources:
– OpenFold Announcement: [https://www.businesswire.com/news/home/20240219658831/en/](https://www.businesswire.com/news/home/20240219658831/en/)
– OpenFold Website: [https://openfold.io/](https://openfold.io/)