What is the project about?
ADD-ON creates an open, reliable dataset of bacterial enzymes and the amino-acid building blocks they recruit to assemble nonribosomal peptides, an important class of natural products that includes drugs such as penicillin. It combines new, standardized experimental data with curated public data and uses AI to accelerate the discovery of new bioactive molecules.
What gap in the scientific community led to the creation or expansion of this benchmarking dataset?
Predicting chemical structures directly from the genome sequences of their microbial producers remains a major bottleneck in natural product discovery. For nonribosomal peptides, an important natural product class in the search for urgently needed novel chemical entities, this challenge centers on adenylation domains: the enzyme domains that select and activate the amino acids incorporated into the final compound. Existing datasets describing adenylation-domain substrate specificity are small, biased, and inconsistent, which limits how reliably models can predict chemically novel structures. ADD-ON will fill this gap by expanding and standardizing experimental data to create a reliable benchmark for AI-driven structure prediction.
What is the project’s structure — from data curation to expected outputs such as publications or competitions?
ADD-ON follows a five-step structure. First, we curate and standardize existing public data on adenylation domains and select diverse candidates for new experimental measurements. Second, we measure the substrate specificities of these candidates using our in-house high-throughput platform. Third, we integrate the new results with the curated public data into a unified, well-annotated dataset. Fourth, this dataset will underpin an open benchmarking platform with defined prediction tasks and evaluation metrics. Finally, we will engage the community through a publication and an open competition inviting AI and bioinformatics groups to test and improve their models on the dataset.
What impact does the project aim to achieve — within Helmholtz and across the broader research and industry community?
ADD-ON will provide the first open, standardized reference dataset for predicting enzyme–substrate relationships in nonribosomal peptide biosynthesis. Within Helmholtz, it will strengthen collaboration between AI, bioinformatics, and experimental biology groups, creating a model for cross-domain data-driven research. In the wider community, it will establish a reproducible benchmark that encourages fair comparison of machine learning methods and supports the development of more accurate genome-to-structure prediction tools. In the long term, this will accelerate the discovery of new natural compounds and contribute to global efforts against antimicrobial resistance and other pressing health challenges.