
Towards Agentic AI for Science: Hypothesis Generation, Comprehension, Quantification, and Validation (ICLR, 2025)
April 27, 2025 | Singapore Expo
About the Workshop
Our mission is to foster interdisciplinary collaboration to develop fully autonomous AI systems, addressing challenges such as benchmark datasets, human-AI collaboration, robust tools and methods for validating AI outputs, and trustworthiness. By tackling these issues, we can unlock AI's transformative potential in research. In this workshop, themed Agentic AI for Science, we will explore these critical topics and welcome diverse perspectives. We will focus on integrating agentic AI systems to enhance scientific discovery while upholding rigorous standards. For AI to contribute effectively, it must generate novel hypotheses, comprehend their applications, quantify testing resources, and validate feasibility through well-designed experiments. This workshop serves as a vital forum for collaboration and knowledge-sharing aimed at redefining the landscape of scientific discovery. To propel future research, the workshop is organized around four main research thrusts (non-exclusive):
Thrust 1. Design and development of agentic AI systems for scientific discovery. The emergence of agentic AI, powered by foundation models—particularly generative models—opens up unprecedented opportunities for scientific discovery. These systems can potentially revolutionize various aspects of the scientific process, including hypothesis generation, comprehension of complex scientific phenomena, quantification, and validation. Designing and developing effective agentic AI systems for scientific discovery is both exciting and non-trivial. Pioneering work in this field has already demonstrated the promise of leveraging scientific tools, agents, and knowledge graphs. Notable examples include ChemCrow, which showcases the potential of AI in chemistry; CRISPR-GPT, which applies AI to genetic engineering; and SciAgents, which illustrates the power of multi-agent systems in scientific discovery. These groundbreaking studies highlight the transformative potential of agentic AI in accelerating scientific progress and opening new avenues for research. Key research topics in this thrust include (but are not limited to):
- Developing scientific foundation models: Tailoring general foundation models specifically for various scientific fields to enhance relevance and accuracy.
- Effective scientific tool augmentation: Enhancing existing scientific tools and methodologies with agentic AI capabilities.
- Multi-agent decomposition design: Developing frameworks for scientific hypothesis generation using multiple specialized AI agents.
- Human-in-the-loop agentic systems: Improving reliability and interpretability of AI-driven scientific discoveries through strategic human intervention.
Thrust 2. Theoretical foundation for scientific agentic AI. Developing agentic scientific AI requires methods to quantify the predictions and performance of these systems, as well as to validate the scientific hypotheses they generate. A thorough investigation of agentic scientific AI systems also demands solid theoretical foundations and tools to ensure guarantees on their behavior. To analyze and evaluate such systems, we will incorporate theoretical tools in modeling, logical reasoning, model validation and diagnosis, interpretable AI, and other general methods that can provide guarantees on agentic systems. Key topics in this area include, but are not limited to, the following:
- Theoretical foundation: Statistical models and theories of agentic scientific AI, such as theoretical studies on in-context learning, multi-agent communications, game theory, physics-informed hard and soft optimization constraints, and neural operators.
- Logical reasoning: Inductive, deductive, and abductive reasoning; Bayesian reasoning and probabilistic programming; neural-symbolic approaches.
- Model quantification, validation, diagnosis: Theory-driven metrics for quantifying AI system performance; self-evaluation of LLMs; data valuation and data-centric AI; diagnostics for data, architecture, and training processes; creation of standardized benchmarks for evaluating the validity of scientific hypothesis generation; scientific facts and hallucination.
- Interpretable AI: Approaches for explaining agentic AI system behaviors; quantifying trust, safety, and transparency; mechanistic interpretability.
Thrust 3. Practical application of scientific agentic AI. Deploying agentic AI systems in practical scientific research across diverse domains presents numerous challenges, particularly the need for domain-specific adaptation to the unique data formats and model constraints of each scientific field. Bias in training data poses a significant risk, especially in sensitive domains like medicine. Trustworthiness and explainability are essential for scientists to confidently integrate AI-generated hypotheses and solutions into their research. Furthermore, ethical considerations arise when AI systems potentially automate research decisions that may impact public health, policy, or environmental outcomes, underscoring the importance of responsible AI deployment in science. Key topics in this thrust include (but are not limited to):
- Domain-specific model adaptation: Adapting agentic AI models to handle domain-specific data formats, workflows, and tools across various scientific fields; transfer learning and data-efficient fine-tuning.
- Bias detection and mitigation: Identifying and mitigating bias in training data, model design and outputs; fairness-aware AI systems for sensitive domains like healthcare and social science.
- Robustness, trustworthiness and explainability: Methods for improving the transparency and explainability of agentic AI systems in scientific research; uncertainty interpretation and quantification.
- Ethics and governance: Ethical considerations and responsible use of agentic AI in sensitive research areas; development of AI governance models to ensure accountability and human oversight in automated scientific workflows.
Thrust 4. Open problems and challenges in scientific agentic AI. Despite the promising potential of agentic AI in scientific discovery, many open problems and challenges remain to be addressed. These may include:
- Automatic curation of domain-specific scientific knowledge and its integration into agentic AI systems.
- Advanced mechanisms of multi-agent collaboration in scientific discovery, with considerations of their scalability and computational efficiency.
- Continual evolution and learning of agentic AI systems; mechanisms for updating models and improving performance based on experimental results, new data, and discoveries.
- Validation and reproducibility of results generated by agentic AI systems.
Call for Papers
We are pleased to announce the Workshop on Agentic AI for Science: Hypothesis Generation, Comprehension, Quantification, and Validation at The International Conference on Learning Representations (ICLR) 2025, to be held at Singapore Expo, Singapore, on April 27, 2025. This workshop aims to explore the transformative potential of agentic AI in scientific discovery, focusing on hypothesis generation, validation, and other critical stages of the scientific process. By fostering interdisciplinary collaboration, the workshop seeks to address challenges and unlock new opportunities in the design and application of agentic AI systems.
Workshop Themes
We invite contributions addressing the following research thrusts:
- Design and Development of Agentic AI Systems: Exploring frameworks, tools, and human-in-the-loop systems for scientific discovery.
- Theoretical Foundations: Developing statistical models and reasoning approaches for hypothesis validation and performance assessment.
- Practical Applications: Examining domain-specific adaptations, ethical considerations, and governance frameworks for responsible deployment.
- Open Problems and Challenges: Addressing issues in knowledge integration, validation, and continual improvement of agentic AI systems.
Key Focus Areas
Submissions are encouraged in the following areas (not exhaustive):
- AI-driven hypothesis generation and validation.
- Statistical and logical reasoning approaches.
- Applications of AI in scientific experimentation.
- Ethical, reproducibility, and governance challenges in AI-driven science.
Types of Contributions
The workshop invites a diverse array of paper types, including original research, position papers, short/tiny papers, and survey articles, all aimed at advancing Agentic AI research for scientific discovery.
- Original research papers: Present groundbreaking findings, innovative methodologies, or theoretical insights.
- Position papers: Provide thought-provoking perspectives on emerging trends and challenges in the field.
- Short/Tiny Papers Track: This track encourages submissions that are innovative and exploratory, even if not yet fully developed. We invite concise, impactful contributions of up to 5 pages (excluding references), focusing on novel ideas, preliminary results, or insightful perspectives that can spark discussion and inspire further research.
- Survey articles: Offer comprehensive overviews of specific topics, illuminating current research landscapes and proposing future directions.
Important Dates and Deadlines
- Call for Papers released: January 10, 2025
- Submission period: January 10 - February 10, 2025
- Submission Deadline: February 10, 2025 (Anywhere on Earth, AoE)
- Review period: February 10 - March 02, 2025
- Notification of Acceptance: March 04, 2025
- Camera-Ready Submission: March 20, 2025
- Workshop Date: April 27, 2025
Submission Site
We will use OpenReview to manage the submissions and reviewing process. All listed authors must have an up-to-date OpenReview profile, properly attributed with current and past institutional affiliation, homepage, Google Scholar, DBLP, ORCID, LinkedIn, Semantic Scholar (wherever applicable). The OpenReview profile will be used to handle conflict of interest and paper matching. Submissions will not be made public on OpenReview during the reviewing period.
Abstracts and papers can be submitted through the OpenReview platform: OpenReview Submission Site.
Camera-Ready Submission Guidelines
Authors of accepted papers are required to submit the camera-ready version through the OpenReview platform. The final submission must incorporate all necessary elements, including implementation code and author affiliations, and must address the reviewer comments. Additionally, authors must provide a 3-5 minute video recording summarizing their paper, along with presentation slides. These materials should be uploaded to the "[Event Metadata for Authors (e.g., Poster and Slide Uploads...)]" section corresponding to their paper on the ICLR Agentic AI for Science Workshop page.
Poster Guidelines
- Workshop poster size: The maximum size allowed for printing is 61 cm x 91 cm (24 in x 36 in), width x height.
- Poster printing: You may print your poster using any service you wish, or use ICLR's optional poster printing service with delivery to the convention center. Please use the Workshop poster order form to place your order.
Submission Guidelines
Submissions must adhere to the ICLR 2025 submission format and be submitted as PDFs through the OpenReview platform. All submissions will undergo a rigorous double-blind peer review process. Accepted papers will be included in the ICLR 2025 workshop proceedings.
- Deadlines: Submission deadlines are strict, and no extensions will be granted. Placeholder/dummy abstracts are not allowed.
- Tiny Papers Track: This year, ICLR is discontinuing the separate "Tiny Papers" track and is instead requiring each workshop to accept short (3–5 pages in ICLR format, exact page length to be determined by each workshop) paper submissions, with an eye towards inclusion; see the Call For Tiny Papers for more details. Authors of these papers will be earmarked for potential funding from ICLR but need to submit a separate application for Financial Assistance that evaluates their eligibility. This application for Financial Assistance to attend ICLR 2025 will become available on the ICLR website at the beginning of February and close on March 2nd.
- Authorship: All listed authors must have contributed substantially to the work and agree to its submission. Large Language Models (LLMs) cannot be listed as authors but may be cited as tools if used appropriately.
- Anonymity: Submissions must follow the double-blind review process. Authors should ensure that no identifying information appears in the paper, supplementary material, or any external links.
- Formatting: Submissions must use the ICLR 2025 LaTeX template. Papers are limited to 9 pages of content, with unlimited space for references and an optional appendix. The main content of the paper (first 9 pages) must be self-contained, as reviewers are not required to read beyond these. The LaTeX template is available for download: ICLR 2025 LaTeX Template.
- Originality: Submissions must present original work. Papers under review at, or published in, other peer-reviewed venues cannot be submitted. Work previously presented at workshops or in non-archival formats is allowed but must be appropriately anonymized. For details, visit the ICLR 2025 Author Guide.
- Ethics: Authors are encouraged to include a section on the ethical implications of their research and data use. Submissions must comply with ICLR’s ethics policies, available at ICLR Code of Ethics.
- Code of Conduct: Authors are required to adhere to ICLR’s code of conduct, available at ICLR Code of Conduct.
Reviewing Process
Each submission will undergo a rigorous double-blind peer review process. Submissions will be evaluated on criteria such as technical merit, originality, potential impact, and ethics.
Reviewers, including organizers, will not evaluate submissions from authors with whom they:
- Have been colleagues within the same organization in the past three years.
- Have co-authored publications within the last three years.
- Currently share an institutional affiliation.
Publication and Presentation Policies
Accepted papers will be published as part of ICLR 2025 workshop proceedings. Authors of accepted papers are required to present their work at the workshop, either as oral presentations or posters, as determined by the workshop organizers. Papers must be covered by a distinct conference registration and presenters are encouraged to include a clear explanation of the contributions and implications of their work during their presentations. Pre-recorded video presentations may be requested for accessibility and archiving purposes. Virtual presentation options may be provided for authors unable to attend in person, in line with ICLR's hybrid participation policy.
Other AI for Science Workshops at ICLR 2025
If you feel that your work is not a good thematic fit for the Agentic AI for Science workshop, consider submitting to one of the other AI for Science workshops at ICLR 2025.
Program Committee Co-Chairs
- Dr. Lifu Huang - UC Davis (lfuhuang@ucdavis.edu)
- Dr. Dawei Zhou - Virginia Tech (zhoud@vt.edu)
- Dr. Adithya Kulkarni - Virginia Tech (aditkulk@vt.edu)
Workshop Schedule
The workshop is scheduled to take place during ICLR 2025 on April 27, 2025 in Singapore.
Singapore Time (GMT+8) | Event |
---|---|
08:30 - 08:40 | Opening Remarks (Dr. Lifu Huang) |
08:40 - 09:25 | Keynote (Dr. Marinka Zitnik): Empowering Biomedical Discovery with "AI Scientists" |
09:25 - 10:10 | Keynote (Dr. Chandan K Reddy): Toward Agentic AI Systems for Interpretable Scientific Equation Discovery |
10:10 - 10:55 | Keynote (Dr. Jingrui He): Exploitation vs. Exploration in Sequential Decision Making |
10:55 - 11:05 | Coffee Break |
11:05 - 11:50 | Keynote (Dr. Sanmi Koyejo): Preference-Guided Multi-Objective Optimization for Scientific Discovery |
11:50 - 12:35 | Keynote (Dr. Markus J Buehler): Physics-Aware AI: Bridging Science Through Multi-Agent Reasoning Systems |
12:35 - 13:30 | Lunch Break and Interaction Session of Speakers and Presenters with K-12 and Undergraduate Students from Underrepresented Communities in Singapore |
13:30 - 13:45 | Neural Nonmyopic Bayesian Optimization in Dynamic Cost Settings |
13:45 - 14:00 | Large Language Models Are Innate Crystal Structure Generators |
14:00 - 14:15 | LLM-Augmented Chemical Synthesis and Design Decision Programs |
14:15 - 14:30 | MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses |
14:30 - 14:45 | Large Language Models powered Neural Solvers for Generalized Vehicle Routing Problems |
14:45 - 15:00 | Agent S: An Open Agentic Framework that Uses Computers Like a Human |
15:00 - 15:15 | CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models |
15:15 - 15:25 | Coffee Break |
15:25 - 16:15 | Panel Discussion (Dr. Sanmi Koyejo, Dr. Marinka Zitnik, Dr. Yujun Yan, Dr. Beatrice Soh, Dr. Yarin Gal) |
16:15 - 16:30 | AstroAgents: A Multi-Agent AI for Hypothesis Generation from Mass Spectrometry Data |
16:30 - 16:45 | ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code |
16:45 - 17:45 | Poster Session (For remaining papers) |
17:45 - 18:00 | Closing Remarks with Awards (Dr. Yaoqing Yang, Dr. Yujun Yan, Dr. Beatrice Soh) |
Keynote Speaker | Talk Title | Abstract |
---|---|---|
Dr. Marinka Zitnik | Empowering Biomedical Discovery with "AI Scientists" | We envision "AI scientists" as collaborative systems that learn, reason, and interact across diverse biomedical tools and data modalities to accelerate scientific discovery. These agents integrate foundation models, real-time knowledge, and structured experimentation to solve complex problems in therapeutics and biology. I will present two such systems: ProCyon and TxAgent. ProCyon is a foundation model for protein phenotypes, trained across five interrelated knowledge domains: molecular function, therapeutic mechanism, disease association, protein domain, and molecular interaction. It unifies sequence, structure, and phenotypic inputs through multimodal co-training and instruction tuning, enabling zero-shot transfer and free-form phenotype generation. Evaluated on dozens of tasks, ProCyon outperforms specialized and generalist models in contextual retrieval, binding prediction, and variant effect inference. It generates candidate phenotypes for under-characterized proteins and supports discovery in complex diseases like Parkinson's. TxAgent is an AI agent for therapeutic reasoning that integrates real-time biomedical knowledge with a curated universe of 211 computational tools, including all FDA-approved drugs since 1939 and validated clinical evidence. TxAgent performs multi-step inference to assess drug interactions, contraindications, and patient-specific treatment strategies. It dynamically selects and executes tools, synthesizes information from multiple sources, and aligns recommendations with clinical guidelines. Across 3,168 drug reasoning tasks and 456 patient scenarios, TxAgent achieves 92.1% accuracy, outperforming specialized LLMs and reasoning agents (DeepSeek-R1-671B). ProCyon and TxAgent pave the way toward "AI scientists" systems that contribute to scientific understanding and therapeutic design and can eventually learn and innovate on their own. |
Dr. Chandan K Reddy | Toward Agentic AI Systems for Interpretable Scientific Equation Discovery | Equation discovery is a crucial aspect of computational scientific discovery, traditionally approached through symbolic regression (SR) methods that focus mainly on data-driven equation search. Current approaches often struggle to fully leverage the rich domain-specific knowledge that scientists typically rely on. We present LLM-SR, an agentic AI-based iterative approach that combines the power of large language models (LLMs) with evolutionary program search and data-driven optimization to discover scientific equations more effectively and efficiently while incorporating scientific prior knowledge. LLM-SR integrates several key aspects of the agentic scientific discovery pipeline, namely, scientific knowledge representation and reasoning (enabled through autonomous LLM agents using prompting and prior knowledge), hypothesis generation (via agent-driven equation skeleton proposals), data-driven evaluation and optimization, and evolutionary search for iterative refinement. Through this integration, our approach discovers interpretable and physically meaningful equations while ensuring efficient exploration of the equation search space and generalization to out-of-domain data. We will demonstrate LLM-SR’s effectiveness across various scientific domains—nonlinear oscillators, bacterial growth, and material stress behavior. This work not only improves the accuracy and interpretability of discovered equations but also enhances the autonomy and efficiency of the equation discovery process, aligning with the goals of agentic AI systems for accelerating scientific innovation. |
Dr. Jingrui He | Exploitation vs. Exploration in Sequential Decision Making | Autonomous sequential decision making is of key importance in agriculture. For example, in anticipation of an extreme weather event (e.g., flash droughts, derechos), it is critical to make proper decisions (e.g., plant date, irrigation frequency and amount) in order to avoid potentially severe impacts on the crop yield. In this talk, I will introduce some of our recently developed techniques studying the tradeoff between exploitation and exploration in sequential decision making. I will start by introducing EE-net for contextual bandits, which leverages two neural networks for learning the reward function and for adaptively learning the potential gains compared to the currently estimated reward respectively. Then I will introduce PageRank Bandits, which adapts the neural exploration strategy from EE-net to address link prediction problems on graphs. Furthermore, in the presence of adversarial attacks or corruptions on the received rewards, I will present our recent work utilizing a novel context-aware Gradient Descent training strategy to improve the robustness of contextual bandits. Towards the end, I will also share my thoughts regarding future directions. |
Dr. Sanmi Koyejo | Preference-Guided Multi-Objective Optimization for Scientific Discovery | Multi-objective optimization problems pervade science and engineering, requiring decision-makers to select Pareto-optimal solutions aligned with their preferences among competing objectives. This challenge is particularly acute in applications like drug discovery and clinical planning, where experts must manually evaluate numerous candidates based on chemical intuition across multiple properties. We propose two applications that combine preferential multi-objective optimization with intuitive constraint specification. Our approach enables domain experts to guide the selection process through both pairwise comparisons and by defining bounds on objectives, effectively capturing domain expertise while efficiently navigating vast solution spaces. For brachytherapy planning, our method yields solutions with over 3% greater utility than competing approaches. In drug discovery, our approach significantly outperforms state-of-the-art screening methods, recovering up to 43% of known drugs while examining only 6% of a 100K candidate library for EGFR and DRD2 targets. These results highlight the potential of preference-guided multi-objective optimization to dramatically improve efficiency in resource-intensive tasks by effectively leveraging domain expertise through intuitive preference specification. |
Dr. Markus J Buehler | Physics-Aware AI: Bridging Science Through Multi-Agent Reasoning Systems | To be added |
Accepted Papers
Oral Presentations
- MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses
- Large Language Models powered Neural Solvers for Generalized Vehicle Routing Problems
- Agent S: An Open Agentic Framework that Uses Computers Like a Human
- CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models
- AstroAgents: A Multi-Agent AI for Hypothesis Generation from Mass Spectrometry Data
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
- Neural Nonmyopic Bayesian Optimization in Dynamic Cost Settings
- LLM-Augmented Chemical Synthesis and Design Decision Programs
- Large Language Models Are Innate Crystal Structure Generators
Poster Presentations
- Orchestrating Tool Ecosystem of Drug Discovery with Intention-Aware LLM Agents
- A Simplified a priori Theory of Meaning; Nature Based AI 'First Principles'
- Enhancing Diversity and Novelty in Text Generation via Multi-View Embeddings
- Evaluation of a Robust Control System in Real-World Cable-Driven Parallel Robots
- Emerging Multi-AI Agent Framework for Autonomous Agentic AI Solution Optimization
- HEP-JEPA: A Foundation Model for Collider Physics
- AgenticHypothesis: A Survey on Hypothesis Generation Using LLM Systems
- Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions
- Automated Machine Learning Research via Agentic Exploration with Human Oversight
- LLM Agents for Literature to Code Conversion: Case Study of Heat Exchanger Design
- MDCROW: Automating Molecular Dynamics Workflows with Large Language Models
- Evolving RL: Discovering New Activation Functions using LLMs
- ProteinHypothesis: A Physics-Aware Chain of Multi-Agent RAG LLM for Hypothesis Generation in Protein Science
- Dynamic Knowledge Integration in Multi-Agent Systems for Content Inference
- APPA: Agentic Preformulation Pathway Assistant
- Automatic Scientific Claims Verification with Pruned Evidence Graph
For more details on the accepted papers, visit the OpenReview Workshop page or the ICLR Workshop Page.
Organizers
Publicity Chair
Workshop Sponsors
Our workshop is sponsored by AIJ and Amazon.


Contact Us
For inquiries regarding the workshop, please reach out to us at aditkulk@vt.edu.