S³-Bench: A Comprehensive Benchmark for Scientific Reasoning with MLLMs

A Comprehensive Overview of Multimodal Large Language Models in Scientific Research

Abstract

Recent advances in artificial intelligence (AI), especially large language models, have accelerated the integration of multimodal data in scientific research. Because scientific fields involve diverse data types, ranging from text and images to complex biological sequences and structures, multimodal large language models (MLLMs) have emerged as powerful tools for bridging these modalities, enabling more comprehensive data analysis and intelligent decision-making. This work, S³-Bench, provides a comprehensive overview of recent advances in MLLMs, focusing on their diverse applications across science. We systematically review the progress of MLLMs in key scientific domains, including drug discovery, molecular and protein design, materials science, and genomics, highlighting model architectures, domain-specific adaptations, benchmark datasets, and promising future directions. More importantly, we also benchmark open-source models on several key tasks, such as molecular property prediction and protein function prediction. Our work aims to serve as a valuable resource for both researchers and practitioners interested in the rapidly evolving landscape of multimodal AI for science.
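To make the benchmarking described above more concrete, the sketch below outlines how an evaluation loop for a binary molecular property prediction task could be structured. The toy SMILES split, the `query_mllm` stub, and the accuracy metric are hypothetical placeholders for illustration only and are not the S³-Bench data or evaluation code.

```python
# Minimal sketch of a molecular property prediction evaluation loop.
# All names and data here are hypothetical placeholders, not S³-Bench itself.

from typing import Callable, List, Tuple

# Toy benchmark split: (SMILES string, binary label) pairs.
# Labels stand in for a property such as blood-brain-barrier permeability.
TOY_SPLIT: List[Tuple[str, int]] = [
    ("CCO", 1),          # ethanol
    ("c1ccccc1", 1),     # benzene
    ("C(C(=O)O)N", 0),   # glycine
]

def query_mllm(smiles: str) -> int:
    """Placeholder for a call to an open-source MLLM.

    A real harness would format `smiles` into a prompt (optionally with a
    rendered 2D structure image), query the model, and parse a yes/no answer.
    Here we return a trivial heuristic so the sketch runs end to end.
    """
    return int("O" not in smiles)  # dummy rule, for illustration only

def evaluate(split: List[Tuple[str, int]],
             predict: Callable[[str], int]) -> float:
    """Compute accuracy of `predict` over a labeled split."""
    correct = sum(predict(smiles) == label for smiles, label in split)
    return correct / len(split)

if __name__ == "__main__":
    print(f"accuracy: {evaluate(TOY_SPLIT, query_mllm):.2f}")
```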

Figure 1: Overview of our S³-Bench, highlighting the four major components discussed in the paper and presenting the key modalities and their corresponding applications in this field.

Research Fields

Figure 2: Distribution of MLLMs for drug and molecule tasks, presenting each model's release date, scale, architecture, and application.
Figure 3: Distribution of MLLMs for protein tasks, presenting each model's release date, scale, architecture, and application.
Figure 4: Distribution of MLLMs for gene and materials tasks, presenting each model's release date, scale, and architecture.