Pero – Unlocking ML Potential: Benchmark Datasets on Perovskite Thin Film Processing

Image: Photo: Markus Breig, KIT; illustration: Felix Laufer, KIT

What is the project about?

Scaling up perovskite photovoltaics is key to the clean energy transition, but the complexity of the fabrication process hinders industrialization. Unlock-Pero delivers a benchmark dataset that links process data with device performance. By defining AI tasks and sharing baseline models, the project accelerates the industrial adoption of perovskite solar cells.

What main scientific or societal challenge does the benchmark address?

The benchmark addresses the lack of standardized, FAIR (findable, accessible, interoperable, reusable) benchmark datasets in perovskite photovoltaics. Unlock-Pero enables reproducible AI models for efficiency prediction, material classification, and defect detection, capabilities that are critical for the industrial scaling of sustainable energy technologies.

What gap in the scientific community led to the creation or expansion of this benchmarking dataset?

Although perovskite photovoltaics has made remarkable progress, with efficiencies exceeding 26%, the field lacks open datasets that link processing data to final device performance. Without such data, AI methods for key industrial challenges such as defect detection and efficiency prediction cannot be meaningfully compared or validated. Unlock-Pero addresses this gap by providing a comprehensive, multi-dimensional benchmark dataset (spatial, temporal, and spectral imaging data) with annotated labels for three key tasks: performance prediction, material classification, and defect segmentation. In doing so, it strengthens open, collaborative research and helps build the field's trust in AI-based methods.

How does the benchmark dataset support reproducibility, robustness, and fairness in AI research?

Unlock-Pero ensures reproducibility through standardized annotation protocols across the KIT and HZB labs, cross-validation of the annotations by multiple experts, and standardized data splits for training, validation, and testing. To ensure robustness, the dataset captures real-world variability across laboratories, fabrication conditions, and equipment setups, which prevents models from overfitting to single-lab conditions. Fairness is achieved by openly releasing the dataset, annotations, and baseline machine learning models, and public access via domain-specific repositories such as NOMAD Oasis ensures broad accessibility and community-driven benchmarking. By providing clear benchmarking tasks and reference implementations, the project enables fair, transparent comparison of AI methods across the global experimental perovskite research community.

In what ways does the project foster cross-domain, cross-center, or interdisciplinary collaboration?

Unlock-Pero exemplifies Helmholtz's collaborative strength by uniting experimental photovoltaics researchers from KIT and HZB with AI experts from Helmholtz AI. This cross-center partnership creates synergies that would not be possible within a single institution. The project bridges materials science, energy research, and computer vision by combining KIT's advanced in-situ imaging capabilities with HZB's diverse fabrication methods, guided by AI best practices from Helmholtz AI consultants. The interdisciplinary team ensures that real-world challenges in perovskite fabrication are tackled with state-of-the-art AI methods, fostering a collaborative culture across Helmholtz centers and research fields.

Project partners

Karlsruhe Institute of Technology (KIT), Institute of Microstructure Technology, Next Generation Photovoltaics

Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Department Solution Processing of Hybrid Materials & Devices

Karlsruhe Institute of Technology (KIT), Scientific Computing Center

Helmholtz AI Consultants Energy

Primary Contact

Prof. Dr. Ulrich W. Paetzold, KIT, Institute of Microstructure Technology

Other projects

Image: ADD-ON

ADD-ON: Adenylation Domain Database and Online Benchmarking Platform

ADD-ON addresses the lack of reliable data for predicting how microbial enzymes assemble peptide-based natural products. By enabling accurate AI-driven structure prediction, it accelerates the discovery of new bioactive compounds and ultimately supports efforts to combat antimicrobial resistance.

GRIDMARK – Generating Reproducible Insights through Data Benchmarking for AI in Energy Systems

Transforming energy systems toward climate neutrality: Distribution grids have the potential to be catalysts for the energy transition. Unfortunately, most Distribution System Operators lack the resources to fully monitor their systems. Therefore, there is an urgent need for more high-quality data, particularly to develop and test machine learning models.

Image: Georg Kislinger, Martina Schifferer, Christian Haass & Maryam Khojasteh-Farat, DZNE (BSIC 2021 contribution)

NeuroHarmonize – A Benchmark Decentralized Data Harmonization Workflow for AI-Driven Alzheimer's Disease Management

The benchmark addresses the lack of harmonized, reproducible, and privacy-preserving multimodal datasets for Alzheimer's disease (AD). Current AI models struggle with fragmented and non-standardized data, which limits their generalizability and clinical deployment. NeuroHarmonize creates a FAIR-compliant, decentralized benchmarking framework to accelerate reliable, transparent, and collaborative AI for AD diagnosis, prognosis, and long-term monitoring.