<!DOCTYPE html> <html lang="en"> <head> <meta content="text/html; charset=utf-8" http-equiv="content-type"/> <title>Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving</title> <!--Generated on Mon Nov 11 18:46:12 2024 by LaTeXML (version 0.8.8) http://dlmf.nist.gov/LaTeXML/.--> <meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv-fonts.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/latexml_styles.css" rel="stylesheet" type="text/css"/> <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.3.3/html2canvas.min.js"></script> <script src="/static/browse/0.3.4/js/addons_new.js"></script> <script src="/static/browse/0.3.4/js/feedbackOverlay.js"></script> <base href="/html/2411.07228v1/"/></head> <body> <nav class="ltx_page_navbar"> <nav class="ltx_TOC"> <ol class="ltx_toclist"> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#S1" title="In Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">1 </span>Introduction</span></a></li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#S2" title="In Tooling or Not Tooling? 
The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">2 </span>ChemAgent</span></a></li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#S3" title="In Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3 </span>Experiments</span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#S3.SS1" title="In 3 Experiments ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.1 </span>Experimental Setup</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#S3.SS2" title="In 3 Experiments ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.2 </span>Overall Performance</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"> <a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#S3.SS3" title="In 3 Experiments ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.3 </span>Error Analysis</span></a> <ol class="ltx_toclist ltx_toclist_subsection"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#A3.SS2" title="In 3.3 Error Analysis ‣ 3 Experiments ‣ Tooling or Not Tooling? 
The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">C.2 </span>Reasoning: Information Oversight</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#A3.SS3" title="In 3.3 Error Analysis ‣ 3 Experiments ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">C.3 </span>Reasoning: Algebra Error</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#A3.SS4" title="In 3.3 Error Analysis ‣ 3 Experiments ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">C.4 </span>Grounding: Wrong Input Format</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"> <a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#A3.SS5" title="In 3.3 Error Analysis ‣ 3 Experiments ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">C.5 </span>Tool: Wrong Tool Output</span></a> <ol class="ltx_toclist ltx_toclist_subsection"> <li class="ltx_tocentry ltx_tocentry_subsection"> <a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#A3.SS6" title="In C.5 Tool: Wrong Tool Output ‣ 3.3 Error Analysis ‣ 3 Experiments ‣ Tooling or Not Tooling? 
The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">C.6 </span>Tool: Inconsistent Tool Outputs</span></a> <ol class="ltx_toclist ltx_toclist_subsection"> <li class="ltx_tocentry ltx_tocentry_appendix"><a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#A4" title="In C.6 Tool: Inconsistent Tool Outputs ‣ C.5 Tool: Wrong Tool Output ‣ 3.3 Error Analysis ‣ 3 Experiments ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">D </span>Tool Usage Analysis</span></a></li> </ol> </li> </ol> </li> </ol> </li> </ol> </li> </ol></nav> </nav> <div class="ltx_page_main"> <div class="ltx_page_content"> <article class="ltx_document ltx_authors_1line"><span class="ltx_note ltx_role_footnotetext" id="footnotex1"><sup class="ltx_note_mark">*</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">*</sup><span class="ltx_note_type">footnotetext: </span>Equal contribution. <sup class="ltx_sup" id="footnotex1.1">†</sup>Correspondence to: <span class="ltx_text ltx_font_typewriter" id="footnotex1.2">{yu.3737, sun.397}@osu.edu</span></span></span></span> <h1 class="ltx_title ltx_title_document">Tooling or Not Tooling? 
<br class="ltx_break"/>The Impact of Tools on Language Agents for Chemistry Problem Solving</h1> <div class="ltx_authors"> <span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Botao Yu<sup class="ltx_sup" id="id1.1.1">†<math alttext="\mathcal{C}" class="ltx_Math" display="inline" id="id1.1.1.m1.1"><semantics id="id1.1.1.m1.1a"><mi class="ltx_font_mathcaligraphic" id="id1.1.1.m1.1.1" xref="id1.1.1.m1.1.1.cmml">𝒞</mi><annotation-xml encoding="MathML-Content" id="id1.1.1.m1.1b"><ci id="id1.1.1.m1.1.1.cmml" xref="id1.1.1.m1.1.1">𝒞</ci></annotation-xml><annotation encoding="application/x-tex" id="id1.1.1.m1.1c">\mathcal{C}</annotation><annotation encoding="application/x-llamapun" id="id1.1.1.m1.1d">caligraphic_C</annotation></semantics></math></sup> Frazier N. Baker<sup class="ltx_sup" id="id3.3.3">*<math alttext="\mathcal{C}" class="ltx_Math" display="inline" id="id2.2.2.m1.1"><semantics id="id2.2.2.m1.1a"><mi class="ltx_font_mathcaligraphic" id="id2.2.2.m1.1.1" xref="id2.2.2.m1.1.1.cmml">𝒞</mi><annotation-xml encoding="MathML-Content" id="id2.2.2.m1.1b"><ci id="id2.2.2.m1.1.1.cmml" xref="id2.2.2.m1.1.1">𝒞</ci></annotation-xml><annotation encoding="application/x-tex" id="id2.2.2.m1.1c">\mathcal{C}</annotation><annotation encoding="application/x-llamapun" id="id2.2.2.m1.1d">caligraphic_C</annotation></semantics></math><math alttext="\mathcal{B}" class="ltx_Math" display="inline" id="id3.3.3.m2.1"><semantics id="id3.3.3.m2.1a"><mi class="ltx_font_mathcaligraphic" id="id3.3.3.m2.1.1" xref="id3.3.3.m2.1.1.cmml">ℬ</mi><annotation-xml encoding="MathML-Content" id="id3.3.3.m2.1b"><ci id="id3.3.3.m2.1.1.cmml" xref="id3.3.3.m2.1.1">ℬ</ci></annotation-xml><annotation encoding="application/x-tex" id="id3.3.3.m2.1c">\mathcal{B}</annotation><annotation encoding="application/x-llamapun" id="id3.3.3.m2.1d">caligraphic_B</annotation></semantics></math></sup> Ziru Chen<sup class="ltx_sup" id="id4.4.4">*<math alttext="\mathcal{C}" class="ltx_Math" display="inline" 
id="id4.4.4.m1.1"><semantics id="id4.4.4.m1.1a"><mi class="ltx_font_mathcaligraphic" id="id4.4.4.m1.1.1" xref="id4.4.4.m1.1.1.cmml">𝒞</mi><annotation-xml encoding="MathML-Content" id="id4.4.4.m1.1b"><ci id="id4.4.4.m1.1.1.cmml" xref="id4.4.4.m1.1.1">𝒞</ci></annotation-xml><annotation encoding="application/x-tex" id="id4.4.4.m1.1c">\mathcal{C}</annotation><annotation encoding="application/x-llamapun" id="id4.4.4.m1.1d">caligraphic_C</annotation></semantics></math></sup> Garrett Herb<sup class="ltx_sup" id="id5.5.5"><math alttext="\mathcal{C}" class="ltx_Math" display="inline" id="id5.5.5.m1.1"><semantics id="id5.5.5.m1.1a"><mi class="ltx_font_mathcaligraphic" id="id5.5.5.m1.1.1" xref="id5.5.5.m1.1.1.cmml">𝒞</mi><annotation-xml encoding="MathML-Content" id="id5.5.5.m1.1b"><ci id="id5.5.5.m1.1.1.cmml" xref="id5.5.5.m1.1.1">𝒞</ci></annotation-xml><annotation encoding="application/x-tex" id="id5.5.5.m1.1c">\mathcal{C}</annotation><annotation encoding="application/x-llamapun" id="id5.5.5.m1.1d">caligraphic_C</annotation></semantics></math></sup> Boyu Gou<sup class="ltx_sup" id="id6.6.6"><math alttext="\mathcal{C}" class="ltx_Math" display="inline" id="id6.6.6.m1.1"><semantics id="id6.6.6.m1.1a"><mi class="ltx_font_mathcaligraphic" id="id6.6.6.m1.1.1" xref="id6.6.6.m1.1.1.cmml">𝒞</mi><annotation-xml encoding="MathML-Content" id="id6.6.6.m1.1b"><ci id="id6.6.6.m1.1.1.cmml" xref="id6.6.6.m1.1.1">𝒞</ci></annotation-xml><annotation encoding="application/x-tex" id="id6.6.6.m1.1c">\mathcal{C}</annotation><annotation encoding="application/x-llamapun" id="id6.6.6.m1.1d">caligraphic_C</annotation></semantics></math></sup> <br class="ltx_break"/><span class="ltx_text ltx_font_bold" id="id9.9.9">Daniel Adu-Ampratwum<sup class="ltx_sup" id="id7.7.7.1"><math alttext="\mathcal{P}" class="ltx_Math" display="inline" id="id7.7.7.1.m1.1"><semantics id="id7.7.7.1.m1.1a"><mi class="ltx_font_mathcaligraphic" id="id7.7.7.1.m1.1.1" xref="id7.7.7.1.m1.1.1.cmml">𝒫</mi><annotation-xml 
encoding="MathML-Content" id="id7.7.7.1.m1.1b"><ci id="id7.7.7.1.m1.1.1.cmml" xref="id7.7.7.1.m1.1.1">𝒫</ci></annotation-xml><annotation encoding="application/x-tex" id="id7.7.7.1.m1.1c">\mathcal{P}</annotation><annotation encoding="application/x-llamapun" id="id7.7.7.1.m1.1d">caligraphic_P</annotation></semantics></math></sup> Xia Ning<sup class="ltx_sup" id="id8.8.8.2"><math alttext="\mathcal{BCP}" class="ltx_Math" display="inline" id="id8.8.8.2.m1.1"><semantics id="id8.8.8.2.m1.1a"><mrow id="id8.8.8.2.m1.1.1" xref="id8.8.8.2.m1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="id8.8.8.2.m1.1.1.2" xref="id8.8.8.2.m1.1.1.2.cmml">ℬ</mi><mo id="id8.8.8.2.m1.1.1.1" xref="id8.8.8.2.m1.1.1.1.cmml"></mo><mi class="ltx_font_mathcaligraphic" id="id8.8.8.2.m1.1.1.3" xref="id8.8.8.2.m1.1.1.3.cmml">𝒞</mi><mo id="id8.8.8.2.m1.1.1.1a" xref="id8.8.8.2.m1.1.1.1.cmml"></mo><mi class="ltx_font_mathcaligraphic" id="id8.8.8.2.m1.1.1.4" xref="id8.8.8.2.m1.1.1.4.cmml">𝒫</mi></mrow><annotation-xml encoding="MathML-Content" id="id8.8.8.2.m1.1b"><apply id="id8.8.8.2.m1.1.1.cmml" xref="id8.8.8.2.m1.1.1"><times id="id8.8.8.2.m1.1.1.1.cmml" xref="id8.8.8.2.m1.1.1.1"></times><ci id="id8.8.8.2.m1.1.1.2.cmml" xref="id8.8.8.2.m1.1.1.2">ℬ</ci><ci id="id8.8.8.2.m1.1.1.3.cmml" xref="id8.8.8.2.m1.1.1.3">𝒞</ci><ci id="id8.8.8.2.m1.1.1.4.cmml" xref="id8.8.8.2.m1.1.1.4">𝒫</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="id8.8.8.2.m1.1c">\mathcal{BCP}</annotation><annotation encoding="application/x-llamapun" id="id8.8.8.2.m1.1d">caligraphic_B caligraphic_C caligraphic_P</annotation></semantics></math></sup> Huan Sun<sup class="ltx_sup" id="id9.9.9.3">†<math alttext="\mathcal{C}" class="ltx_Math" display="inline" id="id9.9.9.3.m1.1"><semantics id="id9.9.9.3.m1.1a"><mi class="ltx_font_mathcaligraphic" id="id9.9.9.3.m1.1.1" xref="id9.9.9.3.m1.1.1.cmml">𝒞</mi><annotation-xml encoding="MathML-Content" id="id9.9.9.3.m1.1b"><ci id="id9.9.9.3.m1.1.1.cmml" 
xref="id9.9.9.3.m1.1.1">𝒞</ci></annotation-xml><annotation encoding="application/x-tex" id="id9.9.9.3.m1.1c">\mathcal{C}</annotation><annotation encoding="application/x-llamapun" id="id9.9.9.3.m1.1d">caligraphic_C</annotation></semantics></math></sup></span> <br class="ltx_break"/><sup class="ltx_sup" id="id10.10.10"><math alttext="\mathcal{C}" class="ltx_Math" display="inline" id="id10.10.10.m1.1"><semantics id="id10.10.10.m1.1a"><mi class="ltx_font_mathcaligraphic" id="id10.10.10.m1.1.1" xref="id10.10.10.m1.1.1.cmml">𝒞</mi><annotation-xml encoding="MathML-Content" id="id10.10.10.m1.1b"><ci id="id10.10.10.m1.1.1.cmml" xref="id10.10.10.m1.1.1">𝒞</ci></annotation-xml><annotation encoding="application/x-tex" id="id10.10.10.m1.1c">\mathcal{C}</annotation><annotation encoding="application/x-llamapun" id="id10.10.10.m1.1d">caligraphic_C</annotation></semantics></math></sup>Department of Computer Science and Engineering, OSU <br class="ltx_break"/><sup class="ltx_sup" id="id11.11.11"><math alttext="\mathcal{B}" class="ltx_Math" display="inline" id="id11.11.11.m1.1"><semantics id="id11.11.11.m1.1a"><mi class="ltx_font_mathcaligraphic" id="id11.11.11.m1.1.1" xref="id11.11.11.m1.1.1.cmml">ℬ</mi><annotation-xml encoding="MathML-Content" id="id11.11.11.m1.1b"><ci id="id11.11.11.m1.1.1.cmml" xref="id11.11.11.m1.1.1">ℬ</ci></annotation-xml><annotation encoding="application/x-tex" id="id11.11.11.m1.1c">\mathcal{B}</annotation><annotation encoding="application/x-llamapun" id="id11.11.11.m1.1d">caligraphic_B</annotation></semantics></math></sup>Department of Biomedical Informatics, OSU <sup class="ltx_sup" id="id12.12.12"><math alttext="\mathcal{P}" class="ltx_Math" display="inline" id="id12.12.12.m1.1"><semantics id="id12.12.12.m1.1a"><mi class="ltx_font_mathcaligraphic" id="id12.12.12.m1.1.1" xref="id12.12.12.m1.1.1.cmml">𝒫</mi><annotation-xml encoding="MathML-Content" id="id12.12.12.m1.1b"><ci id="id12.12.12.m1.1.1.cmml" 
xref="id12.12.12.m1.1.1">𝒫</ci></annotation-xml><annotation encoding="application/x-tex" id="id12.12.12.m1.1c">\mathcal{P}</annotation><annotation encoding="application/x-llamapun" id="id12.12.12.m1.1d">caligraphic_P</annotation></semantics></math></sup>College of Pharmacy, OSU <br class="ltx_break"/>Website: <a class="ltx_ref ltx_url ltx_font_typewriter" href="https://osu-nlp-group.github.io/ChemAgent/" title="">https://osu-nlp-group.github.io/ChemAgent/</a> </span></span> </div> <div class="ltx_abstract"> <h6 class="ltx_title ltx_title_abstract">Abstract</h6> <p class="ltx_p" id="id13.id1">To enhance large language models (LLMs) for chemistry problem solving, several LLM-based agents augmented with tools have been proposed, such as ChemCrow and Coscientist. However, their evaluations are narrow in scope, leaving a large gap in understanding the benefits of tools across diverse chemistry tasks. To bridge this gap, we develop ChemAgent, an enhanced chemistry agent over ChemCrow, and conduct a comprehensive evaluation of its performance on both specialized chemistry tasks and general chemistry questions. Surprisingly, ChemAgent does not consistently outperform its base LLMs without tools. 
Our error analysis with a chemistry expert suggests that for specialized chemistry tasks, such as synthesis prediction, we should augment agents with specialized tools; however, for general chemistry questions like those in exams, agents’ ability to reason correctly with chemistry knowledge matters more, and tool augmentation does not always help.</p> </div> <section class="ltx_section" id="S1"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">1 </span>Introduction</h2> <div class="ltx_para ltx_noindent" id="S1.p1"> <p class="ltx_p" id="S1.p1.1">Large language models (LLMs) have demonstrated impressive problem-solving capabilities in many disciplines <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">wang2024mmlupro</span>; <span class="ltx_ref ltx_missing_citation ltx_ref_self">yue2024mmmu</span>; <span class="ltx_ref ltx_missing_citation ltx_ref_self">grossmann2023ai</span>)</cite>. In chemistry, however, LLMs still face significant challenges, such as incorrect calculations, lack of domain knowledge, and inability to perform certain tasks like reaction prediction <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">guo2023eighttask</span>; <span class="ltx_ref ltx_missing_citation ltx_ref_self">mirza2024chembench</span>)</cite>. To address these limitations, LLM-based agents integrated with tools have been proposed to tackle chemistry-specific problems <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">wang2024agentsurvey</span>; <span class="ltx_ref ltx_missing_citation ltx_ref_self">ramos2024chemistryagentsurvey</span>)</cite>. 
For example, ChemCrow <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">bran2023chemcrow</span>)</cite> expands LLMs’ capabilities by incorporating 18 tools, ranging from web search to chemical reaction prediction. Similarly, Coscientist <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">boiko2023coscientist</span>)</cite> integrates the control of cloud labs to enable LLMs to automate wet lab experiments.</p> </div> <div class="ltx_para ltx_noindent" id="S1.p2"> <p class="ltx_p" id="S1.p2.1">Despite the promise of these tool-augmented agents, existing evaluations have been largely qualitative and limited in scope. For example, ChemCrow is assessed on only 14 individual tasks, mainly focusing on compound synthesis, and Coscientist’s evaluation involves merely six specific tasks. These narrow assessments leave a large gap in our understanding of how tool-augmented agents perform across diverse chemistry tasks in real-world applications.</p> </div> <div class="ltx_para ltx_noindent" id="S1.p3"> <p class="ltx_p" id="S1.p3.1">In this work, we conduct a comprehensive evaluation of LLM-based agents on different chemistry tasks to gain a deeper understanding of their potential and limitations. To explore and enhance the capabilities of agents in diverse and complex chemistry scenarios, we introduce ChemAgent, a new chemistry agent capable of handling a wide spectrum of tasks. It leverages the ReAct framework <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">yao2023react</span>)</cite> and integrates 29 tools, including a search tool for PubChem <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">kim2019pubchem</span>)</cite>, several molecular property predictors, and many practical tools from ChemCrow. 
Then, we adapt two categories of real-world chemistry problems for systematic evaluation: specialized tasks and general questions. For specialized tasks, we use SMolInstruct <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">yu2024llasmol</span>)</cite>, which contains 14 types of specialized molecule- and reaction-centric tasks. For general questions, we use MMLU-Chemistry and GPQA-Chemistry, which are chemistry-related subsets of the MMLU <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">hendryckstest2021mmlu</span>)</cite> and GPQA <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">rein2023gpqa</span>)</cite> benchmarks, containing exam-like questions ranging from high school through college to graduate level.</p> </div> <div class="ltx_para ltx_noindent" id="S1.p4"> <p class="ltx_p" id="S1.p4.1">Through comprehensive experiments, we show that while ChemAgent substantially outperforms ChemCrow on all chemistry tasks, it does not consistently outperform the base LLMs without tools. In addition, the impact of tool augmentation is highly dependent on task characteristics. For specialized chemistry tasks involving professional molecular representations (e.g., SMILES <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">weininger1988smiles</span>)</cite>) and specialized chemical operations (e.g., compound synthesis), augmenting LLMs with task-specific tools can yield substantial performance gains. By contrast, for general chemistry questions that require fundamental knowledge and extensive reasoning, ChemAgent cannot address these challenges adequately and underperforms the base LLMs. 
Further analysis with a chemistry expert shows that ChemAgent’s underperformance on general chemistry questions is primarily due to subtle mistakes at intermediate stages of its problem-solving process, such as wrong reasoning steps and information oversight. Overall, our findings indicate that tool augmentation may introduce additional complexity that hinders LLM reasoning and thus does not always help in chemistry problem-solving. Future research may improve LLM-based agents for chemistry by optimizing cognitive load and enhancing reasoning and information verification abilities.</p> </div> </section> <section class="ltx_section" id="S2"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">2 </span>ChemAgent</h2> <figure class="ltx_figure ltx_align_floatright" id="S2.F1"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="210" id="S2.F1.g1" src="x1.png" width="297"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 1: </span>Our ChemAgent framework. Upon receiving a user <span class="ltx_text ltx_font_italic" id="S2.F1.6.1">task</span>, the agent iterates through a three-step ReAct process <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">yao2023react</span>)</cite>: (1) <span class="ltx_text ltx_font_italic" id="S2.F1.7.2">Thought</span> generation, analyzing the current situation and planning subsequent steps; (2) <span class="ltx_text ltx_font_italic" id="S2.F1.8.3">Action</span> determination, selecting the appropriate tool and its input based on the generated thought; and (3) <span class="ltx_text ltx_font_italic" id="S2.F1.9.4">Observation</span> collection, executing the selected tool in the environment and obtaining its results or feedback. 
This iterative cycle continues until task completion or conclusion, and the final <span class="ltx_text ltx_font_italic" id="S2.F1.10.5">answer</span> is returned to the user.</figcaption> </figure> <div class="ltx_para ltx_noindent" id="S2.p1"> <p class="ltx_p" id="S2.p1.1">We introduce ChemAgent (<a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#S2.F1" title="In 2 ChemAgent ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_tag">Figure</span> <span class="ltx_text ltx_ref_tag">1</span></a>), a chemistry agent improved over ChemCrow <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">bran2023chemcrow</span>)</cite> and equipped with enhanced tools for a wider range of chemistry tasks. It implements two essential cognitive abilities <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">sumers2024cognitive</span>)</cite> required for chemistry problem-solving: (1) <span class="ltx_text ltx_font_bold" id="S2.p1.1.1">Reasoning</span>: This ability is required in the <span class="ltx_text ltx_font_italic" id="S2.p1.1.2">Thought</span> step for comprehending user queries and tool outputs, assessing current status, and formulating subsequent steps. (2) <span class="ltx_text ltx_font_bold" id="S2.p1.1.3">Grounding</span>: Based on the reasoning result (i.e., the “thought”), this ability determines the appropriate tool to execute and its corresponding input.</p> </div> <div class="ltx_para ltx_noindent" id="S2.p2"> <p class="ltx_p" id="S2.p2.1">To enhance ChemAgent’s capabilities, we develop an extensive set of 29 tools (<span class="ltx_ref ltx_missing_label ltx_ref_self">LABEL:app:tool_set</span>), categorized into general, molecule, and reaction tools. 
General tools provide the agent with common problem-solving abilities, such as the execution of Python code for computations and various operations via <span class="ltx_text ltx_font_typewriter" id="S2.p2.1.1">PythonREPL</span>. Molecule tools specialize in the analysis, prediction, and conversion of molecules and their properties. For example, <span class="ltx_text ltx_font_typewriter" id="S2.p2.1.2">FunctionalGroups</span> identifies functional groups within a molecule, which is crucial for analyzing molecular characteristics. Lastly, reaction tools are instrumental in predicting chemical reaction outcomes (<span class="ltx_text ltx_font_typewriter" id="S2.p2.1.3">ForwardSynthesis</span>) and suggesting synthesis paths for desired products (<span class="ltx_text ltx_font_typewriter" id="S2.p2.1.4">Retrosynthesis</span>), both of which are essential in applications like drug discovery <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">berdigaliyev2020overview</span>)</cite>.</p> </div> <div class="ltx_para ltx_noindent" id="S2.p3"> <p class="ltx_p" id="S2.p3.1">To develop the tool set, we create 16 new tools and enhance 6 existing ones from ChemCrow, which gives ChemAgent more comprehensive and robust abilities for solving chemistry problems. 
For example, we create <span class="ltx_text ltx_font_typewriter" id="S2.p3.1.1">PubchemSearchQA</span>, which leverages an LLM to retrieve and extract authorized and comprehensive compound information from PubChem <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">kim2019pubchem</span>)</cite>, and several molecular property predictors (<span class="ltx_text ltx_font_typewriter" id="S2.p3.1.2">BBBPPredictor</span>, <span class="ltx_text ltx_font_typewriter" id="S2.p3.1.3">SideEffectPredictor</span>, etc.), which employ neural networks <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">zhou2023unimol</span>)</cite> for molecular property predictions. We also enhance <span class="ltx_text ltx_font_typewriter" id="S2.p3.1.4">WebSearch</span> with an LLM-enhanced searching service to yield more comprehensive and flexible search results.</p> </div> </section> <section class="ltx_section" id="S3"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">3 </span>Experiments</h2> <section class="ltx_subsection" id="S3.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">3.1 </span>Experimental Setup</h3> <figure class="ltx_table" id="S3.T1"> <div class="ltx_inline-block ltx_align_center ltx_transformed_outer" id="S3.T1.1" style="width:318.0pt;height:53.9pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="transform:translate(-53.3pt,9.0pt) scale(0.748815615113171,0.748815615113171) ;"> <table class="ltx_tabular ltx_guessed_headers ltx_align_middle" id="S3.T1.1.1"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S3.T1.1.1.1.1"> <th class="ltx_td ltx_align_justify ltx_align_top ltx_th ltx_th_column ltx_border_tt" id="S3.T1.1.1.1.1.1"> <span class="ltx_inline-block ltx_align_top" id="S3.T1.1.1.1.1.1.1"> <span class="ltx_p" id="S3.T1.1.1.1.1.1.1.1" style="width:76.8pt;"><span class="ltx_text 
ltx_font_bold" id="S3.T1.1.1.1.1.1.1.1.1">Category</span></span> </span> </th> <th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt" id="S3.T1.1.1.1.1.2"><span class="ltx_text ltx_font_bold" id="S3.T1.1.1.1.1.2.1">Dataset</span></th> <th class="ltx_td ltx_align_right ltx_th ltx_th_column ltx_border_tt" id="S3.T1.1.1.1.1.3"><span class="ltx_text ltx_font_bold" id="S3.T1.1.1.1.1.3.1"># Sample</span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt" id="S3.T1.1.1.1.1.4"><span class="ltx_text ltx_font_bold" id="S3.T1.1.1.1.1.4.1">Specific task type</span></th> </tr> </thead> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S3.T1.1.1.2.1"> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_t" id="S3.T1.1.1.2.1.1"> <span class="ltx_inline-block ltx_align_top" id="S3.T1.1.1.2.1.1.1"> <span class="ltx_p" id="S3.T1.1.1.2.1.1.1.1" style="width:76.8pt;"><span class="ltx_text" id="S3.T1.1.1.2.1.1.1.1.1">Specialized tasks</span></span> </span> </td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T1.1.1.2.1.2">SMolInstruct</td> <td class="ltx_td ltx_align_right ltx_border_t" id="S3.T1.1.1.2.1.3">700</td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T1.1.1.2.1.4">Molecule- and reaction-centric tasks</td> </tr> <tr class="ltx_tr" id="S3.T1.1.1.3.2"> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_bb ltx_border_t" id="S3.T1.1.1.3.2.1" rowspan="2"> <span class="ltx_inline-block ltx_align_top" id="S3.T1.1.1.3.2.1.1"> <span class="ltx_p" id="S3.T1.1.1.3.2.1.1.1" style="width:76.8pt;"><span class="ltx_text" id="S3.T1.1.1.3.2.1.1.1.1">General questions</span></span> </span> </td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T1.1.1.3.2.2">MMLU-Chemistry</td> <td class="ltx_td ltx_align_right ltx_border_t" id="S3.T1.1.1.3.2.3">70</td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T1.1.1.3.2.4">High school- and college-level questions</td> </tr> <tr class="ltx_tr" id="S3.T1.1.1.4.3"> <td 
class="ltx_td ltx_align_left ltx_border_bb ltx_border_t" id="S3.T1.1.1.4.3.1">GPQA-Chemistry</td> <td class="ltx_td ltx_align_right ltx_border_bb ltx_border_t" id="S3.T1.1.1.4.3.2">93</td> <td class="ltx_td ltx_align_left ltx_border_bb ltx_border_t" id="S3.T1.1.1.4.3.3">Graduate-level questions</td> </tr> </tbody> </table> </span></div> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 1: </span>Datasets used in our experiments.</figcaption> </figure> <div class="ltx_para ltx_noindent" id="S3.SS1.p1"> <p class="ltx_p" id="S3.SS1.p1.1"><span class="ltx_text ltx_font_bold" id="S3.SS1.p1.1.1">Datasets.</span> We use three well-established datasets (listed in <a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#S3.T1" title="In 3.1 Experimental Setup ‣ 3 Experiments ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_tag">Table</span> <span class="ltx_text ltx_ref_tag">1</span></a>) to thoroughly assess tool-augmented agents on two categories of chemistry problems: (1) <span class="ltx_text ltx_font_bold" id="S3.SS1.p1.1.2">Specialized chemistry tasks</span> focus on experiment-like problems involving molecular manipulations, predictions, and representations. This category includes <span class="ltx_text ltx_font_bold" id="S3.SS1.p1.1.3">SMolInstruct</span> <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">yu2024llasmol</span>)</cite>, which contains 14 molecule- and reaction-centric tasks and requires models to understand molecular representations like SMILES <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">weininger1988smiles</span>)</cite> and perform specific chemical operations, such as predicting synthesis paths and converting chemical names (<span class="ltx_ref ltx_missing_label ltx_ref_self">LABEL:fig:smolinstruct_tasks</span>). 
(2) <span class="ltx_text ltx_font_bold" id="S3.SS1.p1.1.4">General chemistry questions</span> resemble questions appearing in exams at different levels and test a wide range of fundamental knowledge and general reasoning in chemistry. This category includes <span class="ltx_text ltx_font_bold" id="S3.SS1.p1.1.5">MMLU-Chemistry</span>, a manually verified chemistry subset of the MMLU benchmark <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">hendryckstest2021mmlu</span>)</cite> that consists of questions at the high school and college level (<span class="ltx_ref ltx_missing_label ltx_ref_self">LABEL:app:mmluc</span>), and <span class="ltx_text ltx_font_bold" id="S3.SS1.p1.1.6">GPQA-Chemistry</span>, the chemistry section of the GPQA-Diamond benchmark <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">rein2023gpqa</span>)</cite> that consists of difficult graduate-level questions.</p> </div> <div class="ltx_para ltx_noindent" id="S3.SS1.p2"> <p class="ltx_p" id="S3.SS1.p2.1"><span class="ltx_text ltx_font_bold" id="S3.SS1.p2.1.1">LLMs and Agents.</span> We compare our ChemAgent with two baselines: (1) State-of-the-art (SoTA) base LLMs, including GPT-4o <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">gpt4o</span>)</cite> and Claude-3.5-Sonnet <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">claude3.5sonnet</span>)</cite>, which have shown superior capabilities in chemistry problem-solving among existing LLMs <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">wang2024mmlupro</span>)</cite>. (2) ChemCrow <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">bran2023chemcrow</span>)</cite>, a pioneering chemistry-focused agent equipped with 18 expert-designed tools. 
For ChemCrow and ChemAgent, we utilize GPT-4o or Claude-3.5-Sonnet as the backbone language models, and refer to them as GPT and Claude, respectively.</p> </div> </section> <section class="ltx_subsection" id="S3.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">3.2 </span>Overall Performance</h3> <figure class="ltx_table" id="S3.T3"> <div class="ltx_flex_figure ltx_flex_table"> <div class="ltx_flex_cell ltx_flex_size_2"> <figure class="ltx_figure ltx_figure_panel ltx_minipage ltx_align_center ltx_align_middle" id="S3.T3.24" style="width:178.9pt;"> <div class="ltx_inline-block ltx_transformed_outer" id="S3.T3.24.24" style="width:397.5pt;height:195.8pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="transform:translate(52.6pt,-25.9pt) scale(1.35956716104917,1.35956716104917) ;"> <table class="ltx_tabular ltx_guessed_headers ltx_align_middle" id="S3.T3.24.24.24"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S3.T3.24.24.24.25.1"> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_th_row ltx_border_tt" id="S3.T3.24.24.24.25.1.1" rowspan="2"><span class="ltx_text ltx_font_bold" id="S3.T3.24.24.24.25.1.1.1">Model</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" id="S3.T3.24.24.24.25.1.2"><span class="ltx_text ltx_font_bold" id="S3.T3.24.24.24.25.1.2.1">NC-S2I</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" id="S3.T3.24.24.24.25.1.3"><span class="ltx_text ltx_font_bold" id="S3.T3.24.24.24.25.1.3.1">PP-SIDER</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" id="S3.T3.24.24.24.25.1.4"><span class="ltx_text ltx_font_bold" id="S3.T3.24.24.24.25.1.4.1">FS</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" id="S3.T3.24.24.24.25.1.5"><span class="ltx_text ltx_font_bold" id="S3.T3.24.24.24.25.1.5.1">RS</span></th> </tr> <tr class="ltx_tr" id="S3.T3.24.24.24.26.2"> <th 
class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S3.T3.24.24.24.26.2.1"><span class="ltx_text ltx_font_bold" id="S3.T3.24.24.24.26.2.1.1">EM</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S3.T3.24.24.24.26.2.2"><span class="ltx_text ltx_font_bold" id="S3.T3.24.24.24.26.2.2.1">Acc</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S3.T3.24.24.24.26.2.3"><span class="ltx_text ltx_font_bold" id="S3.T3.24.24.24.26.2.3.1">EM</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S3.T3.24.24.24.26.2.4"><span class="ltx_text ltx_font_bold" id="S3.T3.24.24.24.26.2.4.1">EM</span></th> </tr> </thead> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S3.T3.4.4.4.4"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S3.T3.4.4.4.4.5">GPT-4o</th> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.1.1.1.1.1"><math alttext="0.0" class="ltx_Math" display="inline" id="S3.T3.1.1.1.1.1.m1.1"><semantics id="S3.T3.1.1.1.1.1.m1.1a"><mn id="S3.T3.1.1.1.1.1.m1.1.1" xref="S3.T3.1.1.1.1.1.m1.1.1.cmml">0.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.1.1.1.1.1.m1.1b"><cn id="S3.T3.1.1.1.1.1.m1.1.1.cmml" type="float" xref="S3.T3.1.1.1.1.1.m1.1.1">0.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.1.1.1.1.1.m1.1c">0.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.1.1.1.1.1.m1.1d">0.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.2.2.2.2.2"><math alttext="44.0" class="ltx_Math" display="inline" id="S3.T3.2.2.2.2.2.m1.1"><semantics id="S3.T3.2.2.2.2.2.m1.1a"><mn id="S3.T3.2.2.2.2.2.m1.1.1" xref="S3.T3.2.2.2.2.2.m1.1.1.cmml">44.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.2.2.2.2.2.m1.1b"><cn id="S3.T3.2.2.2.2.2.m1.1.1.cmml" type="float" xref="S3.T3.2.2.2.2.2.m1.1.1">44.0</cn></annotation-xml><annotation encoding="application/x-tex" 
id="S3.T3.2.2.2.2.2.m1.1c">44.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.2.2.2.2.2.m1.1d">44.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.3.3.3.3.3"><math alttext="12.0" class="ltx_Math" display="inline" id="S3.T3.3.3.3.3.3.m1.1"><semantics id="S3.T3.3.3.3.3.3.m1.1a"><mn id="S3.T3.3.3.3.3.3.m1.1.1" xref="S3.T3.3.3.3.3.3.m1.1.1.cmml">12.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.3.3.3.3.3.m1.1b"><cn id="S3.T3.3.3.3.3.3.m1.1.1.cmml" type="float" xref="S3.T3.3.3.3.3.3.m1.1.1">12.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.3.3.3.3.3.m1.1c">12.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.3.3.3.3.3.m1.1d">12.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.4.4.4.4.4"><math alttext="0.0" class="ltx_Math" display="inline" id="S3.T3.4.4.4.4.4.m1.1"><semantics id="S3.T3.4.4.4.4.4.m1.1a"><mn id="S3.T3.4.4.4.4.4.m1.1.1" xref="S3.T3.4.4.4.4.4.m1.1.1.cmml">0.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.4.4.4.4.4.m1.1b"><cn id="S3.T3.4.4.4.4.4.m1.1.1.cmml" type="float" xref="S3.T3.4.4.4.4.4.m1.1.1">0.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.4.4.4.4.4.m1.1c">0.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.4.4.4.4.4.m1.1d">0.0</annotation></semantics></math></td> </tr> <tr class="ltx_tr" id="S3.T3.8.8.8.8"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S3.T3.8.8.8.8.5">Claude-3.5-Sonnet</th> <td class="ltx_td ltx_align_left" id="S3.T3.5.5.5.5.1"><math alttext="2.0" class="ltx_Math" display="inline" id="S3.T3.5.5.5.5.1.m1.1"><semantics id="S3.T3.5.5.5.5.1.m1.1a"><mn id="S3.T3.5.5.5.5.1.m1.1.1" xref="S3.T3.5.5.5.5.1.m1.1.1.cmml">2.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.5.5.5.5.1.m1.1b"><cn id="S3.T3.5.5.5.5.1.m1.1.1.cmml" type="float" 
xref="S3.T3.5.5.5.5.1.m1.1.1">2.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.5.5.5.5.1.m1.1c">2.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.5.5.5.5.1.m1.1d">2.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left" id="S3.T3.6.6.6.6.2"><math alttext="62.0" class="ltx_Math" display="inline" id="S3.T3.6.6.6.6.2.m1.1"><semantics id="S3.T3.6.6.6.6.2.m1.1a"><mn id="S3.T3.6.6.6.6.2.m1.1.1" xref="S3.T3.6.6.6.6.2.m1.1.1.cmml">62.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.6.6.6.6.2.m1.1b"><cn id="S3.T3.6.6.6.6.2.m1.1.1.cmml" type="float" xref="S3.T3.6.6.6.6.2.m1.1.1">62.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.6.6.6.6.2.m1.1c">62.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.6.6.6.6.2.m1.1d">62.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left" id="S3.T3.7.7.7.7.3"><math alttext="22.0" class="ltx_Math" display="inline" id="S3.T3.7.7.7.7.3.m1.1"><semantics id="S3.T3.7.7.7.7.3.m1.1a"><mn id="S3.T3.7.7.7.7.3.m1.1.1" xref="S3.T3.7.7.7.7.3.m1.1.1.cmml">22.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.7.7.7.7.3.m1.1b"><cn id="S3.T3.7.7.7.7.3.m1.1.1.cmml" type="float" xref="S3.T3.7.7.7.7.3.m1.1.1">22.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.7.7.7.7.3.m1.1c">22.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.7.7.7.7.3.m1.1d">22.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left" id="S3.T3.8.8.8.8.4"><math alttext="0.0" class="ltx_Math" display="inline" id="S3.T3.8.8.8.8.4.m1.1"><semantics id="S3.T3.8.8.8.8.4.m1.1a"><mn id="S3.T3.8.8.8.8.4.m1.1.1" xref="S3.T3.8.8.8.8.4.m1.1.1.cmml">0.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.8.8.8.8.4.m1.1b"><cn id="S3.T3.8.8.8.8.4.m1.1.1.cmml" type="float" xref="S3.T3.8.8.8.8.4.m1.1.1">0.0</cn></annotation-xml><annotation encoding="application/x-tex" 
id="S3.T3.8.8.8.8.4.m1.1c">0.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.8.8.8.8.4.m1.1d">0.0</annotation></semantics></math></td> </tr> <tr class="ltx_tr" id="S3.T3.12.12.12.12"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S3.T3.12.12.12.12.5">ChemCrow (GPT)</th> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.9.9.9.9.1"><math alttext="2.0" class="ltx_Math" display="inline" id="S3.T3.9.9.9.9.1.m1.1"><semantics id="S3.T3.9.9.9.9.1.m1.1a"><mn id="S3.T3.9.9.9.9.1.m1.1.1" xref="S3.T3.9.9.9.9.1.m1.1.1.cmml">2.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.9.9.9.9.1.m1.1b"><cn id="S3.T3.9.9.9.9.1.m1.1.1.cmml" type="float" xref="S3.T3.9.9.9.9.1.m1.1.1">2.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.9.9.9.9.1.m1.1c">2.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.9.9.9.9.1.m1.1d">2.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.10.10.10.10.2"><math alttext="36.0" class="ltx_Math" display="inline" id="S3.T3.10.10.10.10.2.m1.1"><semantics id="S3.T3.10.10.10.10.2.m1.1a"><mn id="S3.T3.10.10.10.10.2.m1.1.1" xref="S3.T3.10.10.10.10.2.m1.1.1.cmml">36.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.10.10.10.10.2.m1.1b"><cn id="S3.T3.10.10.10.10.2.m1.1.1.cmml" type="float" xref="S3.T3.10.10.10.10.2.m1.1.1">36.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.10.10.10.10.2.m1.1c">36.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.10.10.10.10.2.m1.1d">36.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.11.11.11.11.3"><math alttext="72.0" class="ltx_Math" display="inline" id="S3.T3.11.11.11.11.3.m1.1"><semantics id="S3.T3.11.11.11.11.3.m1.1a"><mn id="S3.T3.11.11.11.11.3.m1.1.1" xref="S3.T3.11.11.11.11.3.m1.1.1.cmml">72.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.11.11.11.11.3.m1.1b"><cn 
id="S3.T3.11.11.11.11.3.m1.1.1.cmml" type="float" xref="S3.T3.11.11.11.11.3.m1.1.1">72.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.11.11.11.11.3.m1.1c">72.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.11.11.11.11.3.m1.1d">72.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.12.12.12.12.4"><math alttext="8.0" class="ltx_Math" display="inline" id="S3.T3.12.12.12.12.4.m1.1"><semantics id="S3.T3.12.12.12.12.4.m1.1a"><mn id="S3.T3.12.12.12.12.4.m1.1.1" xref="S3.T3.12.12.12.12.4.m1.1.1.cmml">8.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.12.12.12.12.4.m1.1b"><cn id="S3.T3.12.12.12.12.4.m1.1.1.cmml" type="float" xref="S3.T3.12.12.12.12.4.m1.1.1">8.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.12.12.12.12.4.m1.1c">8.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.12.12.12.12.4.m1.1d">8.0</annotation></semantics></math></td> </tr> <tr class="ltx_tr" id="S3.T3.16.16.16.16"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S3.T3.16.16.16.16.5">ChemCrow (Claude)</th> <td class="ltx_td ltx_align_left" id="S3.T3.13.13.13.13.1"><math alttext="2.0" class="ltx_Math" display="inline" id="S3.T3.13.13.13.13.1.m1.1"><semantics id="S3.T3.13.13.13.13.1.m1.1a"><mn id="S3.T3.13.13.13.13.1.m1.1.1" xref="S3.T3.13.13.13.13.1.m1.1.1.cmml">2.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.13.13.13.13.1.m1.1b"><cn id="S3.T3.13.13.13.13.1.m1.1.1.cmml" type="float" xref="S3.T3.13.13.13.13.1.m1.1.1">2.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.13.13.13.13.1.m1.1c">2.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.13.13.13.13.1.m1.1d">2.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left" id="S3.T3.14.14.14.14.2"><math alttext="32.0" class="ltx_Math" display="inline" id="S3.T3.14.14.14.14.2.m1.1"><semantics id="S3.T3.14.14.14.14.2.m1.1a"><mn 
id="S3.T3.14.14.14.14.2.m1.1.1" xref="S3.T3.14.14.14.14.2.m1.1.1.cmml">32.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.14.14.14.14.2.m1.1b"><cn id="S3.T3.14.14.14.14.2.m1.1.1.cmml" type="float" xref="S3.T3.14.14.14.14.2.m1.1.1">32.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.14.14.14.14.2.m1.1c">32.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.14.14.14.14.2.m1.1d">32.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left" id="S3.T3.15.15.15.15.3"><math alttext="70.0" class="ltx_Math" display="inline" id="S3.T3.15.15.15.15.3.m1.1"><semantics id="S3.T3.15.15.15.15.3.m1.1a"><mn id="S3.T3.15.15.15.15.3.m1.1.1" xref="S3.T3.15.15.15.15.3.m1.1.1.cmml">70.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.15.15.15.15.3.m1.1b"><cn id="S3.T3.15.15.15.15.3.m1.1.1.cmml" type="float" xref="S3.T3.15.15.15.15.3.m1.1.1">70.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.15.15.15.15.3.m1.1c">70.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.15.15.15.15.3.m1.1d">70.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left" id="S3.T3.16.16.16.16.4"><math alttext="22.0" class="ltx_Math" display="inline" id="S3.T3.16.16.16.16.4.m1.1"><semantics id="S3.T3.16.16.16.16.4.m1.1a"><mn id="S3.T3.16.16.16.16.4.m1.1.1" xref="S3.T3.16.16.16.16.4.m1.1.1.cmml">22.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.16.16.16.16.4.m1.1b"><cn id="S3.T3.16.16.16.16.4.m1.1.1.cmml" type="float" xref="S3.T3.16.16.16.16.4.m1.1.1">22.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.16.16.16.16.4.m1.1c">22.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.16.16.16.16.4.m1.1d">22.0</annotation></semantics></math></td> </tr> <tr class="ltx_tr" id="S3.T3.20.20.20.20"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S3.T3.20.20.20.20.5">ChemAgent (GPT)</th> <td class="ltx_td ltx_align_left 
ltx_border_t" id="S3.T3.17.17.17.17.1"><math alttext="70.0" class="ltx_Math" display="inline" id="S3.T3.17.17.17.17.1.m1.1"><semantics id="S3.T3.17.17.17.17.1.m1.1a"><mn id="S3.T3.17.17.17.17.1.m1.1.1" xref="S3.T3.17.17.17.17.1.m1.1.1.cmml">70.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.17.17.17.17.1.m1.1b"><cn id="S3.T3.17.17.17.17.1.m1.1.1.cmml" type="float" xref="S3.T3.17.17.17.17.1.m1.1.1">70.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.17.17.17.17.1.m1.1c">70.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.17.17.17.17.1.m1.1d">70.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.18.18.18.18.2"><math alttext="70.0" class="ltx_Math" display="inline" id="S3.T3.18.18.18.18.2.m1.1"><semantics id="S3.T3.18.18.18.18.2.m1.1a"><mn id="S3.T3.18.18.18.18.2.m1.1.1" xref="S3.T3.18.18.18.18.2.m1.1.1.cmml">70.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.18.18.18.18.2.m1.1b"><cn id="S3.T3.18.18.18.18.2.m1.1.1.cmml" type="float" xref="S3.T3.18.18.18.18.2.m1.1.1">70.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.18.18.18.18.2.m1.1c">70.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.18.18.18.18.2.m1.1d">70.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.19.19.19.19.3"><math alttext="78.0" class="ltx_Math" display="inline" id="S3.T3.19.19.19.19.3.m1.1"><semantics id="S3.T3.19.19.19.19.3.m1.1a"><mn id="S3.T3.19.19.19.19.3.m1.1.1" xref="S3.T3.19.19.19.19.3.m1.1.1.cmml">78.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.19.19.19.19.3.m1.1b"><cn id="S3.T3.19.19.19.19.3.m1.1.1.cmml" type="float" xref="S3.T3.19.19.19.19.3.m1.1.1">78.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.19.19.19.19.3.m1.1c">78.0</annotation><annotation encoding="application/x-llamapun" 
id="S3.T3.19.19.19.19.3.m1.1d">78.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.20.20.20.20.4"><math alttext="42.0" class="ltx_Math" display="inline" id="S3.T3.20.20.20.20.4.m1.1"><semantics id="S3.T3.20.20.20.20.4.m1.1a"><mn id="S3.T3.20.20.20.20.4.m1.1.1" xref="S3.T3.20.20.20.20.4.m1.1.1.cmml">42.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.20.20.20.20.4.m1.1b"><cn id="S3.T3.20.20.20.20.4.m1.1.1.cmml" type="float" xref="S3.T3.20.20.20.20.4.m1.1.1">42.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.20.20.20.20.4.m1.1c">42.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.20.20.20.20.4.m1.1d">42.0</annotation></semantics></math></td> </tr> <tr class="ltx_tr" id="S3.T3.24.24.24.24"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb" id="S3.T3.24.24.24.24.5">ChemAgent (Claude)</th> <td class="ltx_td ltx_align_left ltx_border_bb" id="S3.T3.21.21.21.21.1"><math alttext="70.0" class="ltx_Math" display="inline" id="S3.T3.21.21.21.21.1.m1.1"><semantics id="S3.T3.21.21.21.21.1.m1.1a"><mn id="S3.T3.21.21.21.21.1.m1.1.1" xref="S3.T3.21.21.21.21.1.m1.1.1.cmml">70.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.21.21.21.21.1.m1.1b"><cn id="S3.T3.21.21.21.21.1.m1.1.1.cmml" type="float" xref="S3.T3.21.21.21.21.1.m1.1.1">70.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.21.21.21.21.1.m1.1c">70.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.21.21.21.21.1.m1.1d">70.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_bb" id="S3.T3.22.22.22.22.2"><math alttext="68.0" class="ltx_Math" display="inline" id="S3.T3.22.22.22.22.2.m1.1"><semantics id="S3.T3.22.22.22.22.2.m1.1a"><mn id="S3.T3.22.22.22.22.2.m1.1.1" xref="S3.T3.22.22.22.22.2.m1.1.1.cmml">68.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.22.22.22.22.2.m1.1b"><cn id="S3.T3.22.22.22.22.2.m1.1.1.cmml" 
type="float" xref="S3.T3.22.22.22.22.2.m1.1.1">68.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.22.22.22.22.2.m1.1c">68.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.22.22.22.22.2.m1.1d">68.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_bb" id="S3.T3.23.23.23.23.3"><math alttext="80.0" class="ltx_Math" display="inline" id="S3.T3.23.23.23.23.3.m1.1"><semantics id="S3.T3.23.23.23.23.3.m1.1a"><mn id="S3.T3.23.23.23.23.3.m1.1.1" xref="S3.T3.23.23.23.23.3.m1.1.1.cmml">80.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.23.23.23.23.3.m1.1b"><cn id="S3.T3.23.23.23.23.3.m1.1.1.cmml" type="float" xref="S3.T3.23.23.23.23.3.m1.1.1">80.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.23.23.23.23.3.m1.1c">80.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.23.23.23.23.3.m1.1d">80.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_bb" id="S3.T3.24.24.24.24.4"><math alttext="42.0" class="ltx_Math" display="inline" id="S3.T3.24.24.24.24.4.m1.1"><semantics id="S3.T3.24.24.24.24.4.m1.1a"><mn id="S3.T3.24.24.24.24.4.m1.1.1" xref="S3.T3.24.24.24.24.4.m1.1.1.cmml">42.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.24.24.24.24.4.m1.1b"><cn id="S3.T3.24.24.24.24.4.m1.1.1.cmml" type="float" xref="S3.T3.24.24.24.24.4.m1.1.1">42.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.24.24.24.24.4.m1.1c">42.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.24.24.24.24.4.m1.1d">42.0</annotation></semantics></math></td> </tr> </tbody> </table> </span></div> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">Table 2: </span>The results (%) on the SMolInstruct dataset. 
EM (exact match) and Acc (accuracy) are the metrics used.</figcaption> </figure> </div> <div class="ltx_flex_cell ltx_flex_size_2"> <figure class="ltx_figure ltx_figure_panel ltx_minipage ltx_align_center ltx_align_middle" id="S3.T3.36" style="width:210.7pt;"> <div class="ltx_inline-block ltx_transformed_outer" id="S3.T3.36.12" style="width:397.5pt;height:171.5pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="transform:translate(52.7pt,-22.7pt) scale(1.36092402237176,1.36092402237176) ;"> <table class="ltx_tabular ltx_guessed_headers ltx_align_middle" id="S3.T3.36.12.12"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S3.T3.36.12.12.13.1"> <th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_tt" id="S3.T3.36.12.12.13.1.1"><span class="ltx_text ltx_font_bold" id="S3.T3.36.12.12.13.1.1.1">Model</span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt" id="S3.T3.36.12.12.13.1.2"><span class="ltx_text ltx_font_bold" id="S3.T3.36.12.12.13.1.2.1">MMLU-Chemistry</span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_border_tt" id="S3.T3.36.12.12.13.1.3"><span class="ltx_text ltx_font_bold" id="S3.T3.36.12.12.13.1.3.1">GPQA-Chemistry</span></th> </tr> </thead> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S3.T3.26.2.2.2"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S3.T3.26.2.2.2.3">GPT-4o</th> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.25.1.1.1.1"><math alttext="80.5" class="ltx_Math" display="inline" id="S3.T3.25.1.1.1.1.m1.1"><semantics id="S3.T3.25.1.1.1.1.m1.1a"><mn id="S3.T3.25.1.1.1.1.m1.1.1" xref="S3.T3.25.1.1.1.1.m1.1.1.cmml">80.5</mn><annotation-xml encoding="MathML-Content" id="S3.T3.25.1.1.1.1.m1.1b"><cn id="S3.T3.25.1.1.1.1.m1.1.1.cmml" type="float" xref="S3.T3.25.1.1.1.1.m1.1.1">80.5</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.25.1.1.1.1.m1.1c">80.5</annotation><annotation encoding="application/x-llamapun" 
id="S3.T3.25.1.1.1.1.m1.1d">80.5</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.26.2.2.2.2"><math alttext="40.5" class="ltx_Math" display="inline" id="S3.T3.26.2.2.2.2.m1.1"><semantics id="S3.T3.26.2.2.2.2.m1.1a"><mn id="S3.T3.26.2.2.2.2.m1.1.1" xref="S3.T3.26.2.2.2.2.m1.1.1.cmml">40.5</mn><annotation-xml encoding="MathML-Content" id="S3.T3.26.2.2.2.2.m1.1b"><cn id="S3.T3.26.2.2.2.2.m1.1.1.cmml" type="float" xref="S3.T3.26.2.2.2.2.m1.1.1">40.5</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.26.2.2.2.2.m1.1c">40.5</annotation><annotation encoding="application/x-llamapun" id="S3.T3.26.2.2.2.2.m1.1d">40.5</annotation></semantics></math></td> </tr> <tr class="ltx_tr" id="S3.T3.28.4.4.4"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S3.T3.28.4.4.4.3">Claude-3.5-Sonnet</th> <td class="ltx_td ltx_align_left" id="S3.T3.27.3.3.3.1"><math alttext="76.7" class="ltx_Math" display="inline" id="S3.T3.27.3.3.3.1.m1.1"><semantics id="S3.T3.27.3.3.3.1.m1.1a"><mn id="S3.T3.27.3.3.3.1.m1.1.1" xref="S3.T3.27.3.3.3.1.m1.1.1.cmml">76.7</mn><annotation-xml encoding="MathML-Content" id="S3.T3.27.3.3.3.1.m1.1b"><cn id="S3.T3.27.3.3.3.1.m1.1.1.cmml" type="float" xref="S3.T3.27.3.3.3.1.m1.1.1">76.7</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.27.3.3.3.1.m1.1c">76.7</annotation><annotation encoding="application/x-llamapun" id="S3.T3.27.3.3.3.1.m1.1d">76.7</annotation></semantics></math></td> <td class="ltx_td ltx_align_left" id="S3.T3.28.4.4.4.2"><math alttext="52.3" class="ltx_Math" display="inline" id="S3.T3.28.4.4.4.2.m1.1"><semantics id="S3.T3.28.4.4.4.2.m1.1a"><mn id="S3.T3.28.4.4.4.2.m1.1.1" xref="S3.T3.28.4.4.4.2.m1.1.1.cmml">52.3</mn><annotation-xml encoding="MathML-Content" id="S3.T3.28.4.4.4.2.m1.1b"><cn id="S3.T3.28.4.4.4.2.m1.1.1.cmml" type="float" xref="S3.T3.28.4.4.4.2.m1.1.1">52.3</cn></annotation-xml><annotation encoding="application/x-tex" 
id="S3.T3.28.4.4.4.2.m1.1c">52.3</annotation><annotation encoding="application/x-llamapun" id="S3.T3.28.4.4.4.2.m1.1d">52.3</annotation></semantics></math></td> </tr> <tr class="ltx_tr" id="S3.T3.30.6.6.6"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S3.T3.30.6.6.6.3">ChemCrow (GPT)</th> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.29.5.5.5.1"><math alttext="43.3" class="ltx_Math" display="inline" id="S3.T3.29.5.5.5.1.m1.1"><semantics id="S3.T3.29.5.5.5.1.m1.1a"><mn id="S3.T3.29.5.5.5.1.m1.1.1" xref="S3.T3.29.5.5.5.1.m1.1.1.cmml">43.3</mn><annotation-xml encoding="MathML-Content" id="S3.T3.29.5.5.5.1.m1.1b"><cn id="S3.T3.29.5.5.5.1.m1.1.1.cmml" type="float" xref="S3.T3.29.5.5.5.1.m1.1.1">43.3</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.29.5.5.5.1.m1.1c">43.3</annotation><annotation encoding="application/x-llamapun" id="S3.T3.29.5.5.5.1.m1.1d">43.3</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.30.6.6.6.2"><math alttext="27.5" class="ltx_Math" display="inline" id="S3.T3.30.6.6.6.2.m1.1"><semantics id="S3.T3.30.6.6.6.2.m1.1a"><mn id="S3.T3.30.6.6.6.2.m1.1.1" xref="S3.T3.30.6.6.6.2.m1.1.1.cmml">27.5</mn><annotation-xml encoding="MathML-Content" id="S3.T3.30.6.6.6.2.m1.1b"><cn id="S3.T3.30.6.6.6.2.m1.1.1.cmml" type="float" xref="S3.T3.30.6.6.6.2.m1.1.1">27.5</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.30.6.6.6.2.m1.1c">27.5</annotation><annotation encoding="application/x-llamapun" id="S3.T3.30.6.6.6.2.m1.1d">27.5</annotation></semantics></math></td> </tr> <tr class="ltx_tr" id="S3.T3.32.8.8.8"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S3.T3.32.8.8.8.3">ChemCrow (Claude)</th> <td class="ltx_td ltx_align_left" id="S3.T3.31.7.7.7.1"><math alttext="68.6" class="ltx_Math" display="inline" id="S3.T3.31.7.7.7.1.m1.1"><semantics id="S3.T3.31.7.7.7.1.m1.1a"><mn id="S3.T3.31.7.7.7.1.m1.1.1" 
xref="S3.T3.31.7.7.7.1.m1.1.1.cmml">68.6</mn><annotation-xml encoding="MathML-Content" id="S3.T3.31.7.7.7.1.m1.1b"><cn id="S3.T3.31.7.7.7.1.m1.1.1.cmml" type="float" xref="S3.T3.31.7.7.7.1.m1.1.1">68.6</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.31.7.7.7.1.m1.1c">68.6</annotation><annotation encoding="application/x-llamapun" id="S3.T3.31.7.7.7.1.m1.1d">68.6</annotation></semantics></math></td> <td class="ltx_td ltx_align_left" id="S3.T3.32.8.8.8.2"><math alttext="35.2" class="ltx_Math" display="inline" id="S3.T3.32.8.8.8.2.m1.1"><semantics id="S3.T3.32.8.8.8.2.m1.1a"><mn id="S3.T3.32.8.8.8.2.m1.1.1" xref="S3.T3.32.8.8.8.2.m1.1.1.cmml">35.2</mn><annotation-xml encoding="MathML-Content" id="S3.T3.32.8.8.8.2.m1.1b"><cn id="S3.T3.32.8.8.8.2.m1.1.1.cmml" type="float" xref="S3.T3.32.8.8.8.2.m1.1.1">35.2</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.32.8.8.8.2.m1.1c">35.2</annotation><annotation encoding="application/x-llamapun" id="S3.T3.32.8.8.8.2.m1.1d">35.2</annotation></semantics></math></td> </tr> <tr class="ltx_tr" id="S3.T3.34.10.10.10"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S3.T3.34.10.10.10.3">ChemAgent (GPT)</th> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.33.9.9.9.1"><math alttext="71.0" class="ltx_Math" display="inline" id="S3.T3.33.9.9.9.1.m1.1"><semantics id="S3.T3.33.9.9.9.1.m1.1a"><mn id="S3.T3.33.9.9.9.1.m1.1.1" xref="S3.T3.33.9.9.9.1.m1.1.1.cmml">71.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.33.9.9.9.1.m1.1b"><cn id="S3.T3.33.9.9.9.1.m1.1.1.cmml" type="float" xref="S3.T3.33.9.9.9.1.m1.1.1">71.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.33.9.9.9.1.m1.1c">71.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.33.9.9.9.1.m1.1d">71.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_t" id="S3.T3.34.10.10.10.2"><math alttext="33.8" class="ltx_Math" display="inline" 
id="S3.T3.34.10.10.10.2.m1.1"><semantics id="S3.T3.34.10.10.10.2.m1.1a"><mn id="S3.T3.34.10.10.10.2.m1.1.1" xref="S3.T3.34.10.10.10.2.m1.1.1.cmml">33.8</mn><annotation-xml encoding="MathML-Content" id="S3.T3.34.10.10.10.2.m1.1b"><cn id="S3.T3.34.10.10.10.2.m1.1.1.cmml" type="float" xref="S3.T3.34.10.10.10.2.m1.1.1">33.8</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.34.10.10.10.2.m1.1c">33.8</annotation><annotation encoding="application/x-llamapun" id="S3.T3.34.10.10.10.2.m1.1d">33.8</annotation></semantics></math></td> </tr> <tr class="ltx_tr" id="S3.T3.36.12.12.12"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb" id="S3.T3.36.12.12.12.3">ChemAgent (Claude)</th> <td class="ltx_td ltx_align_left ltx_border_bb" id="S3.T3.35.11.11.11.1"><math alttext="70.0" class="ltx_Math" display="inline" id="S3.T3.35.11.11.11.1.m1.1"><semantics id="S3.T3.35.11.11.11.1.m1.1a"><mn id="S3.T3.35.11.11.11.1.m1.1.1" xref="S3.T3.35.11.11.11.1.m1.1.1.cmml">70.0</mn><annotation-xml encoding="MathML-Content" id="S3.T3.35.11.11.11.1.m1.1b"><cn id="S3.T3.35.11.11.11.1.m1.1.1.cmml" type="float" xref="S3.T3.35.11.11.11.1.m1.1.1">70.0</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.35.11.11.11.1.m1.1c">70.0</annotation><annotation encoding="application/x-llamapun" id="S3.T3.35.11.11.11.1.m1.1d">70.0</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_border_bb" id="S3.T3.36.12.12.12.2"><math alttext="45.9" class="ltx_Math" display="inline" id="S3.T3.36.12.12.12.2.m1.1"><semantics id="S3.T3.36.12.12.12.2.m1.1a"><mn id="S3.T3.36.12.12.12.2.m1.1.1" xref="S3.T3.36.12.12.12.2.m1.1.1.cmml">45.9</mn><annotation-xml encoding="MathML-Content" id="S3.T3.36.12.12.12.2.m1.1b"><cn id="S3.T3.36.12.12.12.2.m1.1.1.cmml" type="float" xref="S3.T3.36.12.12.12.2.m1.1.1">45.9</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.36.12.12.12.2.m1.1c">45.9</annotation><annotation encoding="application/x-llamapun" 
id="S3.T3.36.12.12.12.2.m1.1d">45.9</annotation></semantics></math></td> </tr> </tbody> </table> </span></div> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">Table 3: </span>The accuracy scores (%) on the MMLU-Chemistry and GPQA-Chemistry datasets, averaged over three runs.</figcaption> </figure> </div> </div> </figure> <div class="ltx_para ltx_noindent" id="S3.SS2.p1"> <p class="ltx_p" id="S3.SS2.p1.1"><span class="ltx_text ltx_font_bold" id="S3.SS2.p1.1.1">Specialized Chemistry Tasks.</span> For each task, models are evaluated on 50 randomly selected samples from the SMolInstruct test set, and the results on four selected tasks are presented in <a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#S3.T3" title="In 3.2 Overall Performance ‣ 3 Experiments ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_tag">Table</span> <span class="ltx_text ltx_ref_tag">2</span></a> (see <span class="ltx_ref ltx_missing_label ltx_ref_self">LABEL:app:detail_sm_results</span> for the full results). We observe that: (1) ChemAgent improves substantially over its base LLM counterparts (e.g., with GPT-4o as the backbone, exact match on NC-S2I rises from 0.0% to 70.0%), highlighting the critical role of domain-specific tools in augmenting LLMs’ capabilities on the specialized tasks in SMolInstruct. (2) ChemAgent outperforms ChemCrow. Our analysis suggests that the disparity is attributable to ChemCrow’s limited tool set and the potential lack of robustness in its tool implementations. For instance, ChemCrow’s apparent lack of molecular property prediction tools and its limited web search capabilities seem to hinder its performance on property prediction tasks. 
In contrast, ChemAgent’s tool set (<span class="ltx_ref ltx_missing_label ltx_ref_self">LABEL:app:tool_set</span>) is more comprehensive and robust, which lets LLMs leverage it effectively.</p> </div> <div class="ltx_para ltx_noindent" id="S3.SS2.p2"> <p class="ltx_p" id="S3.SS2.p2.1"><span class="ltx_text ltx_font_bold" id="S3.SS2.p2.1.1">General Chemistry Questions.</span> As presented in <a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#S3.T3" title="In 3.2 Overall Performance ‣ 3 Experiments ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_tag">Table</span> <span class="ltx_text ltx_ref_tag">3</span></a>, contrary to our expectations, the ChemAgent variants underperform their base LLM counterparts. This trend persists across both datasets and is also observed with ChemCrow, suggesting a common issue in tool-augmented agents for chemistry. This observation challenges the intuition that tool augmentation would invariably enhance the performance of LLMs by providing additional information <cite class="ltx_cite ltx_citemacro_citep">(<span class="ltx_ref ltx_missing_citation ltx_ref_self">qu2024tool</span>)</cite>, and shows that neither agent could fall back on its base LLM’s capabilities when tools offer no advantage.</p> </div> </section> <section class="ltx_subsection" id="S3.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">3.3 </span>Error Analysis</h3> <figure class="ltx_figure" id="S3.F2"> <div class="ltx_flex_figure"> <div class="ltx_flex_cell ltx_flex_size_2"> <figure class="ltx_figure ltx_figure_panel ltx_align_center" id="S3.F2.sf1"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_square" height="201" id="S3.F2.sf1.g1" src="x2.png" width="229"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">(a) </span>SMolInstruct.</figcaption> </figure> </div> <div class="ltx_flex_cell 
ltx_flex_size_2"> <figure class="ltx_figure ltx_figure_panel ltx_align_center" id="S3.F2.sf2"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="201" id="S3.F2.sf2.g1" src="x3.png" width="441"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">(b) </span>MMLU-Chemistry.</figcaption> </figure> </div> </div> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 2: </span>Error statistics for ChemAgent (GPT) on SMolInstruct (102 errors) and MMLU-Chemistry (64 errors).</figcaption> </figure> <div class="ltx_para ltx_noindent" id="S3.SS3.p1"> <p class="ltx_p" id="S3.SS3.p1.1">To examine the errors made by ChemAgent, we use SMolInstruct and MMLU-Chemistry as representatives of their respective categories and conduct a manual error analysis. For every sample on which ChemAgent (GPT) fails in our experiments, we engage a chemistry expert to analyze the errors, which are then classified into three types, namely <span class="ltx_text ltx_font_bold" id="S3.SS3.p1.1.1">reasoning error</span>, <span class="ltx_text ltx_font_bold" id="S3.SS3.p1.1.2">grounding error</span>, and <span class="ltx_text ltx_font_bold" id="S3.SS3.p1.1.3">tool error</span>, based on the components (the cognitive abilities and the environment) responsible for the errors. The definitions of the errors identified during our experiment are as follows:</p> </div> <div class="ltx_para ltx_noindent" id="S3.SS3.p2"> <p class="ltx_p" id="S3.SS3.p2.1"><span class="ltx_text ltx_font_bold" id="S3.SS3.p2.1.1">Reasoning errors.</span> Errors attributable to the agent’s “reasoning” ability, where the agent inaccurately assesses the situation or devises an incorrect plan for subsequent steps, such as misinterpreting tool outputs or suggesting incorrect methodologies. 
Specifically, they include the following errors:</p> <ul class="ltx_itemize" id="S3.I1"> <li class="ltx_item" id="S3.I1.i1" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I1.i1.p1"> <p class="ltx_p" id="S3.I1.i1.p1.1">Wrong knowledge/reasoning: an error where the agent makes a mistake in applying chemistry knowledge or draws a conclusion that does not logically follow from the previous information.</p> </div> </li> <li class="ltx_item" id="S3.I1.i2" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I1.i2.p1"> <p class="ltx_p" id="S3.I1.i2.p1.1">Wrong final answer: an error where the analysis process is correct but the final answer is wrong.</p> </div> </li> <li class="ltx_item" id="S3.I1.i3" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I1.i3.p1"> <p class="ltx_p" id="S3.I1.i3.p1.1">Information oversight: an error where the agent neglects to consider relevant information given in the question or the previous steps.</p> </div> </li> <li class="ltx_item" id="S3.I1.i4" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I1.i4.p1"> <p class="ltx_p" id="S3.I1.i4.p1.1">Algebra error: an error in algebraic manipulation or simplification, such as the incorrect solving of equations or misapplication of algebraic axioms.</p> </div> </li> <li class="ltx_item" id="S3.I1.i5" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para ltx_noindent" id="S3.I1.i5.p1"> <p class="ltx_p" id="S3.I1.i5.p1.1">Incomplete reasoning: an error where the reasoning process is not fully developed, such as when solving a problem but omitting necessary steps or details.</p> </div> </li> </ul> </div> <div class="ltx_para ltx_noindent" id="S3.SS3.p3"> <p class="ltx_p" id="S3.SS3.p3.1"><span class="ltx_text ltx_font_bold" id="S3.SS3.p3.1.1">Grounding 
errors.</span> These occur during tool invocation, such as selecting an inappropriate tool, using an incorrect input format, or providing erroneous inputs to a tool. Specifically:</p> <ul class="ltx_itemize" id="S3.I2"> <li class="ltx_item" id="S3.I2.i1" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para ltx_noindent" id="S3.I2.i1.p1"> <p class="ltx_p" id="S3.I2.i1.p1.1">Wrong input format: an error arising from data being provided in a format that the tool cannot process, resulting in failures or incorrect results.</p> </div> </li> </ul> </div> <div class="ltx_para ltx_noindent" id="S3.SS3.p4"> <p class="ltx_p" id="S3.SS3.p4.1"><span class="ltx_text ltx_font_bold" id="S3.SS3.p4.1.1">Tool errors.</span> These errors originate from the environment (i.e., the tools used in this study), where the tools either fail to execute properly or return inaccurate information. Specifically:</p> <ul class="ltx_itemize" id="S3.I3"> <li class="ltx_item" id="S3.I3.i1" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I3.i1.p1"> <p class="ltx_p" id="S3.I3.i1.p1.1">Wrong tool output: an error occurring when a tool produces incorrect or unexpected results, leading to faulty conclusions or actions.</p> </div> </li> <li class="ltx_item" id="S3.I3.i2" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para ltx_noindent" id="S3.I3.i2.p1"> <p class="ltx_p" id="S3.I3.i2.p1.1">Inconsistent tool outputs: an error where multiple tools return inconsistent information, leading to faulty conclusions or actions.</p> </div> </li> </ul> </div> <div class="ltx_para ltx_noindent" id="S3.SS3.p5"> <p class="ltx_p" id="S3.SS3.p5.1">As illustrated in <a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#S3.F2" title="In 3.3 Error Analysis ‣ 3 Experiments ‣ Tooling or Not Tooling? 
The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_tag">Figure</span> <span class="ltx_text ltx_ref_tag">2</span></a>, the error distributions differ sharply between the two datasets. On SMolInstruct (<a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#S3.F2.sf1" title="In Figure 2 ‣ 3.3 Error Analysis ‣ 3 Experiments ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_tag">Figure</span> <span class="ltx_text ltx_ref_tag">2(a)</span></a>), tool errors account for over <span class="ltx_text ltx_font_bold" id="S3.SS3.p5.1.1">99.0%</span> of all errors. These errors mainly stem from the neural network-based tools (e.g., ForwardSynthesis, BBBPPredictor), whose accuracy is inherently imperfect. For these specialized tasks where dedicated tools exist, the agent can easily pinpoint and correctly use the needed tools (<a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#A4" title="Appendix D Tool Usage Analysis ‣ C.6 Tool: Inconsistent Tool Outputs ‣ C.5 Tool: Wrong Tool Output ‣ 3.3 Error Analysis ‣ 3 Experiments ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_tag">Appendix</span> <span class="ltx_text ltx_ref_tag">D</span></a>), resulting in few or no reasoning and grounding errors. In contrast, on MMLU-Chemistry (<a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#S3.F2.sf2" title="In Figure 2 ‣ 3.3 Error Analysis ‣ 3 Experiments ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_tag">Figure</span> <span class="ltx_text ltx_ref_tag">2(b)</span></a>), reasoning errors constitute over <span class="ltx_text ltx_font_bold" id="S3.SS3.p5.1.2">90.0%</span> of all errors. 
This is because MMLU questions require broader knowledge and more intricate chemical reasoning and rely less on external tools. Our analysis indicates that all the observed reasoning errors manifest as subtle mistakes at intermediate stages of problem-solving, rather than incorrect overall methods. For instance, inaccurate chemistry knowledge is applied or a mistaken conclusion is drawn (the wrong knowledge/reasoning error), or a wrong final option is selected despite a correct analysis process (the wrong final answer error). We present a specific case showcasing wrong knowledge/reasoning and incomplete reasoning in <span class="ltx_ref ltx_missing_label ltx_ref_self">LABEL:ec:reasoning_wrong_knowledge_reasoning_incomplete_reasoning</span>, and the cases of other errors can be found in <span class="ltx_ref ltx_missing_label ltx_ref_self">LABEL:app:errors</span>. Compared to LLMs without tools, the tool-augmented agent appears more prone to such subtle mistakes.</p> </div> <figure class="ltx_table" id="S3.SS3.11"> <div class="ltx_flex_figure ltx_flex_table"> <div class="ltx_flex_cell ltx_flex_size_1"> <table class="ltx_tabular ltx_figure_panel ltx_align_middle" id="S3.SS3.11.11"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S3.SS3.11.11.12.1"> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_tt" id="S3.SS3.11.11.12.1.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.12.1.1.1"> <span class="ltx_p" id="S3.SS3.11.11.12.1.1.1.1" style="width:375.6pt;">If 1.0 g of rubidium and 1.0 g of bromine are reacted, what will be left in measurable amounts (more than 0.10 mg) in the reaction vessel?</span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.13.2"> <td class="ltx_td ltx_align_justify ltx_align_top" id="S3.SS3.11.11.13.2.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.13.2.1.1"> <span class="ltx_p" id="S3.SS3.11.11.13.2.1.1.1" style="width:375.6pt;">(A) RbBr only (B) RbBr, Rb, Br2, and Rb2Br (C) RbBr 
and Rb2Br only</span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.14.3"> <td class="ltx_td ltx_align_justify ltx_align_top" id="S3.SS3.11.11.14.3.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.14.3.1.1"> <span class="ltx_p" id="S3.SS3.11.11.14.3.1.1.1" style="width:375.6pt;">(D) RbBr, Rb, and Br2 (E) Rb and Br2 only (F) Br2 only</span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.15.4"> <td class="ltx_td ltx_align_justify ltx_align_top" id="S3.SS3.11.11.15.4.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.15.4.1.1"> <span class="ltx_p" id="S3.SS3.11.11.15.4.1.1.1" style="width:375.6pt;">(G) RbBr and Rb only <span class="ltx_text ltx_font_bold" id="S3.SS3.11.11.15.4.1.1.1.1">[predicted]</span> (H) Rb only</span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.16.5"> <td class="ltx_td ltx_align_justify ltx_align_top" id="S3.SS3.11.11.16.5.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.16.5.1.1"> <span class="ltx_p" id="S3.SS3.11.11.16.5.1.1.1" style="width:375.6pt;">(I) Nothing will be left in measurable amounts (J) RbBr and Br2 only <span class="ltx_text ltx_font_bold" id="S3.SS3.11.11.16.5.1.1.1.1">[Correct]</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.17.6"> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_t" id="S3.SS3.11.11.17.6.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.17.6.1.1"> <span class="ltx_p" id="S3.SS3.11.11.17.6.1.1.1" style="width:375.6pt;"> <span class="ltx_text ltx_font_bold" id="S3.SS3.11.11.17.6.1.1.1.1" style="color:#FF0000;">Step 1</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.18.7"> <td class="ltx_td ltx_nopad_r ltx_align_justify ltx_align_top" id="S3.SS3.11.11.18.7.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.18.7.1.1"> <span class="ltx_p" id="S3.SS3.11.11.18.7.1.1.1" style="width:375.6pt;"><span class="ltx_text" id="S3.SS3.11.11.18.7.1.1.1.1" 
style="color:#0000FF;">Thought</span></span> <span class="ltx_block" id="S3.SS3.11.11.18.7.1.1.2" style="background-color:#F0F0F0;padding:9pt;"> <span class="ltx_p" id="S3.SS3.11.11.18.7.1.1.2.1">To determine what substances will be left in measurable amounts, I need to conduct a stoichiometric calculation based on the reaction between rubidium (Rb) and bromine (Br2) to form rubidium bromide (RbBr). First, I’ll calculate the moles of rubidium and bromine to verify which one is the limiting reagent. Then I will determine whether any excess reactant is left after the reaction. Given:</span> <span class="ltx_itemize" id="S3.I4"> <span class="ltx_item" id="S3.I4.i1" style="list-style-type:none;"><span class="ltx_tag ltx_tag_item">•</span> <span class="ltx_para" id="S3.I4.i1.p1"> <span class="ltx_p" id="S3.I4.i1.p1.1">Atomic weight of Rubidium (Rb) = 85.47 g/mol</span> </span></span> <span class="ltx_item" id="S3.I4.i2" style="list-style-type:none;"><span class="ltx_tag ltx_tag_item">•</span> <span class="ltx_para ltx_noindent" id="S3.I4.i2.p1"> <span class="ltx_p" id="S3.I4.i2.p1.1">Molecular weight of Bromine (Br2) = 159.808 g/mol (since Br = 79.904 g/mol)</span> </span></span> </span> <span class="ltx_p" id="S3.SS3.11.11.18.7.1.1.2.2">Let’s start by calculating the moles of rubidium and Bromine.</span> </span> <span class="ltx_p" id="S3.SS3.11.11.18.7.1.1.3"><span class="ltx_text" id="S3.SS3.11.11.18.7.1.1.3.1" style="color:#0000FF;">Action</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.19.8"> <td class="ltx_td ltx_align_justify ltx_align_top" id="S3.SS3.11.11.19.8.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.19.8.1.1" style="background-color:#F0F0F0;"> <span class="ltx_p" id="S3.SS3.11.11.19.8.1.1.1" style="width:375.6pt;">(A) Square planar complexes can display optical isomerism only if all four ligands are identical.</span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.20.9"> <td class="ltx_td ltx_align_justify 
ltx_align_top" id="S3.SS3.11.11.20.9.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.20.9.1.1" style="background-color:#F0F0F0;"> <span class="ltx_p" id="S3.SS3.11.11.20.9.1.1.1" style="width:375.6pt;">(B) Tetrahedral complexes never display optical isomerism. <span class="ltx_text ltx_font_bold" id="S3.SS3.11.11.20.9.1.1.1.1">[predicted]</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.21.10"> <td class="ltx_td ltx_align_justify ltx_align_top" id="S3.SS3.11.11.21.10.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.21.10.1.1" style="background-color:#F0F0F0;"> <span class="ltx_p" id="S3.SS3.11.11.21.10.1.1.1" style="width:375.6pt;">(C) Linear complexes can display optical isomerism when both ligands are different.</span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.3.3.3"> <td class="ltx_td ltx_nopad_r ltx_align_justify ltx_align_top" id="S3.SS3.3.3.3.3"><span class="ltx_inline-logical-block ltx_align_top" id="S3.SS3.3.3.3.3.3" style="background-color:#F0F0F0;"> <span class="ltx_para" id="S3.SS3.2.2.2.2.2.p2"> <span class="ltx_p" id="S3.SS3.2.2.2.2.2.p2.1" style="width:375.6pt;">(D) Octahedral complexes of monodentate ligands can display optical isomerism only when they have at least three different ligands. 
<span class="ltx_text ltx_font_bold" id="S3.SS3.2.2.2.2.2.p2.1.1">[correct]</span></span> </span> <span class="ltx_para ltx_noindent" id="S3.SS3.1.1.1.1.1.p1"> <span class="ltx_p" id="S3.SS3.1.1.1.1.1.p1.1"><span class="ltx_text ltx_font_bold" id="S3.SS3.1.1.1.1.1.p1.1.1">Problem Solving Process of ChemAgent (GPT)</span></span> </span> <span class="ltx_para" id="S3.SS3.3.3.3.3.3.p3"> <span class="ltx_p" id="S3.SS3.3.3.3.3.3.p3.1"><span class="ltx_text ltx_font_bold" id="S3.SS3.3.3.3.3.3.p3.1.1" style="color:#FF0000;">Step 1</span></span> </span></span></td> </tr> <tr class="ltx_tr" id="S3.SS3.4.4.4"> <td class="ltx_td ltx_nopad_r ltx_align_justify ltx_align_top" id="S3.SS3.4.4.4.1"><span class="ltx_inline-sectional-block" id="S3.SS3.4.4.4.1.1" style="background-color:#F0F0F0;"> <span class="ltx_para" id="S3.SS3.4.4.4.1.1.p1"> <span class="ltx_p" id="S3.SS3.4.4.4.1.1.p1.1" style="width:375.6pt;"><span class="ltx_text" id="S3.SS3.4.4.4.1.1.p1.1.1" style="color:#0000FF;">Thought</span></span> <span class="ltx_block" id="S3.SS3.4.4.4.1.1.p1.2" style="background-color:#F0F0F0;padding:9pt;"> <span class="ltx_p" id="S3.SS3.4.4.4.1.1.p1.2.1">To answer this question, we need to understand the coordination geometries and when optical isomerism can occur in complexes. Let’s briefly consider each statement:</span> <span class="ltx_p" id="S3.SS3.4.4.4.1.1.p1.2.2">(A) Square planar complexes: Optical isomerism occurs when there is no plane of symmetry in the molecule. 
Square planar complexes typically don’t show optical isomerism unless they’re part of a larger chiral structure or have different ligands.</span> <span class="ltx_p" id="S3.SS3.4.4.4.1.1.p1.2.3">(B) Tetrahedral complexes: <span class="ltx_text" id="S3.SS3.4.4.4.1.1.p1.2.3.1" style="background-color:#FFFF00;">Tetrahedral complexes can exhibit optical isomerism</span> if they have four different ligands, forming a chiral center analogous to asymmetric carbon in organic chemistry.</span> <span class="ltx_p" id="S3.SS3.4.4.4.1.1.p1.2.4">(C) Linear complexes: Linear complexes generally don’t exhibit optical isomerism, as they require stereocenters which are not possible with only two ligands.</span> <span class="ltx_p" id="S3.SS3.4.4.4.1.1.p1.2.5">(D) Octahedral complexes: Octahedral complexes can show optical isomerism when they have a specific arrangement of different ligands that break symmetry and create chirality.</span> <span class="ltx_p" id="S3.SS3.4.4.4.1.1.p1.2.6">The only true statement regarding optical isomerism in complexes that could contain achiral ligands is (B).</span> </span> <span class="ltx_p" id="S3.SS3.4.4.4.1.1.p1.3"><span class="ltx_text" id="S3.SS3.4.4.4.1.1.p1.3.1" style="color:#0000FF;">Answer</span></span> <span class="ltx_p" id="S3.SS3.4.4.4.1.1.p1.4" style="background-color:#F0F0F0;padding:9pt;">The answer is (B).</span> </span> <section class="ltx_subsection" id="A3.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">C.2 </span>Reasoning: Information Oversight</h3> <span class="ltx_para ltx_noindent" id="A3.SS2.p1"> <span class="ltx_p" id="A3.SS2.p1.1"><span class="ltx_text ltx_font_bold" id="A3.SS2.p1.1.1">Explanation</span></span> </span> <span class="ltx_para ltx_noindent" id="A3.SS2.p2"> <span class="ltx_p" id="A3.SS2.p2.1">This is an example of a <span class="ltx_text ltx_ulem_uline" id="A3.SS2.p2.1.1">reasoning error: information oversight</span>. 
The question is from MMLU-Chemistry, and the agent answers it by using PythonREPL to calculate the result. However, the agent ignores “13C” in the question, which indicates that the relevant resonance frequency is approximately 1/4 of the spectrometer’s nominal (1H) frequency, i.e., about 150 MHz rather than 600 MHz. With that frequency, the 51.2 ppm shift difference corresponds to about 7.7 kHz rather than 30.7 kHz. This oversight leads to the wrong final answer.</span> </span> <span class="ltx_para ltx_noindent" id="A3.SS2.p3"> <span class="ltx_p" id="A3.SS2.p3.1"><span class="ltx_text ltx_font_bold" id="A3.SS2.p3.1.1">Question</span></span> </span> <span class="ltx_para" id="A3.SS2.p4"> <span class="ltx_p" id="A3.SS2.p4.1">The 13C chemical shifts of benzene and chloroform are 128.4 ppm and 77.2 ppm respectively. What is the difference in the 13C NMR frequencies of the two compounds on a 600 MHz spectrometer?</span> </span> </section></span></td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.22.11"> <td class="ltx_td ltx_align_justify ltx_align_top" id="S3.SS3.11.11.22.11.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.22.11.1.1" style="background-color:#F0F0F0;"> <span class="ltx_p" id="S3.SS3.11.11.22.11.1.1.1" style="width:375.6pt;">(A) 7.73 kHz <span class="ltx_text ltx_font_bold" id="S3.SS3.11.11.22.11.1.1.1.1">[correct]</span> (B) 60.3 kHz (C) 122 kHz (D) 500 kHz (E) 15.4 kHz</span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.7.7.7"> <td class="ltx_td ltx_nopad_r ltx_align_justify ltx_align_top" id="S3.SS3.7.7.7.3"><span class="ltx_inline-logical-block ltx_align_top" id="S3.SS3.7.7.7.3.3" style="background-color:#F0F0F0;"> <span class="ltx_para" id="S3.SS3.6.6.6.2.2.p2"> <span class="ltx_p" id="S3.SS3.6.6.6.2.2.p2.1" style="width:375.6pt;">(F) 30.7 kHz <span class="ltx_text ltx_font_bold" id="S3.SS3.6.6.6.2.2.p2.1.1">[predicted]</span> (G) 183.2 kHz (H) 367.2 kHz (I) 91.6 kHz (J) 244 kHz</span> </span> <span class="ltx_para ltx_noindent" id="S3.SS3.5.5.5.1.1.p1"> <span class="ltx_p" id="S3.SS3.5.5.5.1.1.p1.1"><span class="ltx_text ltx_font_bold" id="S3.SS3.5.5.5.1.1.p1.1.1">Problem Solving 
Process of ChemAgent (GPT)</span></span> </span> <span class="ltx_para" id="S3.SS3.7.7.7.3.3.p3"> <span class="ltx_p" id="S3.SS3.7.7.7.3.3.p3.1"><span class="ltx_text ltx_font_bold" id="S3.SS3.7.7.7.3.3.p3.1.1" style="color:#FF0000;">Step 1</span></span> </span></span></td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.23.12"> <td class="ltx_td ltx_align_justify ltx_align_top" id="S3.SS3.11.11.23.12.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.23.12.1.1" style="background-color:#F0F0F0;"> <span class="ltx_p" id="S3.SS3.11.11.23.12.1.1.1" style="width:375.6pt;"><span class="ltx_text" id="S3.SS3.11.11.23.12.1.1.1.1" style="color:#0000FF;">Thought</span></span> <span class="ltx_block" id="S3.SS3.11.11.23.12.1.1.2" style="background-color:#F0F0F0;padding:9pt;"> <span class="ltx_p" id="S3.SS3.11.11.23.12.1.1.2.1">To find the difference in frequencies, we need to calculate the frequency difference corresponding to the chemical shift difference. This can be calculated using the formula: frequency difference (in Hz) = chemical shift difference (in ppm) × spectrometer frequency (in MHz).</span> <span class="ltx_p" id="S3.SS3.11.11.23.12.1.1.2.2">The chemical shift difference between benzene and chloroform is (128.4 ppm - 77.2 ppm), and the spectrometer frequency is 600 MHz. 
Let’s calculate the frequency difference in kHz.</span> </span> <span class="ltx_p" id="S3.SS3.11.11.23.12.1.1.3"><span class="ltx_text" id="S3.SS3.11.11.23.12.1.1.3.1" style="color:#0000FF;">Action</span></span> <span class="ltx_block" id="S3.SS3.11.11.23.12.1.1.4" style="background-color:#F0F0F0;padding:9pt;"> <span class="ltx_p" id="S3.SS3.11.11.23.12.1.1.4.1">Tool: PythonREPL</span> <span class="ltx_p" id="S3.SS3.11.11.23.12.1.1.4.2">Tool input:</span> <span class="ltx_listing ltx_lst_language_Python ltx_lst_numbers_left ltx_lstlisting ltx_listing" id="S3.SS3.11.11.23.12.1.1.4.3" style="background-color:#F2F2EB;"><span class="ltx_listing_data"><a download="" href="data:text/plain;base64,Y2hlbWljYWxfc2hpZnRfZGlmZmVyZW5jZSA9IDEyOC40IC0gNzcuMgpzcGVjdHJvbWV0ZXJfZnJlcXVlbmN5X21oeiA9IDYwMAojIENhbGN1bGF0aW5nIHRoZSBmcmVxdWVuY3kgZGlmZmVyZW5jZSBpbiBrSHoKZnJlcXVlbmN5X2RpZmZlcmVuY2Vfa2h6ID0gY2hlbWljYWxfc2hpZnRfZGlmZmVyZW5jZSAqIHNwZWN0cm9tZXRlcl9mcmVxdWVuY3lfbWh6CmZyZXF1ZW5jeV9kaWZmZXJlbmNlX2toeg==">⬇</a></span> <span class="ltx_listingline" id="lstnumberx130"><span class="ltx_tag ltx_tag_listingline">1</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx130.1">chemical_shift_difference</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx130.2"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx130.3">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx130.4"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx130.5">128.4</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx130.6"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx130.7">-</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx130.8"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx130.9">77.2</span> </span> <span class="ltx_listingline" id="lstnumberx131"><span class="ltx_tag ltx_tag_listingline">2</span><span 
class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx131.1">spectrometer_frequency_mhz</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx131.2"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx131.3">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx131.4"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx131.5">600</span> </span> <span class="ltx_listingline" id="lstnumberx132"><span class="ltx_tag ltx_tag_listingline">3</span><span class="ltx_text ltx_lst_comment ltx_font_typewriter" id="lstnumberx132.1" style="color:#009900;">#<span class="ltx_text ltx_lst_space" id="lstnumberx132.1.1"> </span>Calculating<span class="ltx_text ltx_lst_space" id="lstnumberx132.1.2"> </span>the<span class="ltx_text ltx_lst_space" id="lstnumberx132.1.3"> </span>frequency<span class="ltx_text ltx_lst_space" id="lstnumberx132.1.4"> </span>difference<span class="ltx_text ltx_lst_space" id="lstnumberx132.1.5"> </span>in<span class="ltx_text ltx_lst_space" id="lstnumberx132.1.6"> </span>kHz</span> </span> <span class="ltx_listingline" id="lstnumberx133"><span class="ltx_tag ltx_tag_listingline">4</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx133.1">frequency_difference_khz</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx133.2"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx133.3">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx133.4"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx133.5">chemical_shift_difference</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx133.6"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx133.7">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx133.8"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" 
id="lstnumberx133.9">spectrometer_frequency_mhz</span> </span> <span class="ltx_listingline" id="lstnumberx134"><span class="ltx_tag ltx_tag_listingline">5</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx134.1">frequency_difference_khz</span> </span> </span> </span> <span class="ltx_p" id="S3.SS3.11.11.23.12.1.1.5"><span class="ltx_text" id="S3.SS3.11.11.23.12.1.1.5.1" style="color:#0000FF;">Observation</span></span> <span class="ltx_p" id="S3.SS3.11.11.23.12.1.1.6" style="background-color:#F0F0F0;padding:9pt;">Tool output: 30720.0</span> <span class="ltx_p" id="S3.SS3.11.11.23.12.1.1.7"><span class="ltx_text ltx_font_bold" id="S3.SS3.11.11.23.12.1.1.7.1" style="color:#FF0000;">Step 2</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.8.8.8"> <td class="ltx_td ltx_nopad_r ltx_align_justify ltx_align_top" id="S3.SS3.8.8.8.1"><span class="ltx_inline-sectional-block" id="S3.SS3.8.8.8.1.1" style="background-color:#F0F0F0;"> <span class="ltx_para" id="S3.SS3.8.8.8.1.1.p1"> <span class="ltx_p" id="S3.SS3.8.8.8.1.1.p1.1" style="width:375.6pt;"><span class="ltx_text" id="S3.SS3.8.8.8.1.1.p1.1.1" style="color:#0000FF;">Thought</span></span> <span class="ltx_p" id="S3.SS3.8.8.8.1.1.p1.2" style="background-color:#F0F0F0;padding:9pt;">The frequency difference between benzene and chloroform on a 600 MHz spectrometer is 30.72 kHz. 
This corresponds to option (F) 30.7 kHz.</span> <span class="ltx_p" id="S3.SS3.8.8.8.1.1.p1.3"><span class="ltx_text" id="S3.SS3.8.8.8.1.1.p1.3.1" style="color:#0000FF;">Answer</span></span> <span class="ltx_p" id="S3.SS3.8.8.8.1.1.p1.4" style="background-color:#F0F0F0;padding:9pt;">The answer is (F).</span> </span> <section class="ltx_subsection" id="A3.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">C.3 </span>Reasoning: Algebra Error</h3> <span class="ltx_para ltx_noindent" id="A3.SS3.p1"> <span class="ltx_p" id="A3.SS3.p1.1"><span class="ltx_text ltx_font_bold" id="A3.SS3.p1.1.1">Explanation</span></span> </span> <span class="ltx_para ltx_noindent" id="A3.SS3.p2"> <span class="ltx_p" id="A3.SS3.p2.1">This is an example of a <span class="ltx_text ltx_ulem_uline" id="A3.SS3.p2.1.1">reasoning error: algebra error</span>. The question is from MMLU-Chemistry, and the agent answers it by first using Name2SMILES to understand the molecular structures and then calculating the ratio on its own. 
Everything is correct until the <span class="ltx_text" id="A3.SS3.p2.1.2" style="background-color:#FFFF00;">highlighted</span> algebra calculation step fails to correctly cross-multiply to solve for the ratio, leading to the wrong final answer.</span> </span> <span class="ltx_para ltx_noindent" id="A3.SS3.p3"> <span class="ltx_p" id="A3.SS3.p3.1"><span class="ltx_text ltx_font_bold" id="A3.SS3.p3.1.1">Question</span></span> </span> <span class="ltx_para" id="A3.SS3.p4"> <span class="ltx_p" id="A3.SS3.p4.2">The 1H spectrum of a mixture of dimethylsulphoxide (DMSO) and acetonitrile (AN) contains lines with relative intensities <math alttext="\alpha" class="ltx_Math" display="inline" id="A3.SS3.p4.1.m1.1"><semantics id="A3.SS3.p4.1.m1.1a"><mi id="A3.SS3.p4.1.m1.1.1" mathbackground="#F0F0F0" xref="A3.SS3.p4.1.m1.1.1.cmml">α</mi><annotation-xml encoding="MathML-Content" id="A3.SS3.p4.1.m1.1b"><ci id="A3.SS3.p4.1.m1.1.1.cmml" xref="A3.SS3.p4.1.m1.1.1">𝛼</ci></annotation-xml><annotation encoding="application/x-tex" id="A3.SS3.p4.1.m1.1c">\alpha</annotation><annotation encoding="application/x-llamapun" id="A3.SS3.p4.1.m1.1d">italic_α</annotation></semantics></math> and <math alttext="3\alpha" class="ltx_Math" display="inline" id="A3.SS3.p4.2.m2.1"><semantics id="A3.SS3.p4.2.m2.1a"><mrow id="A3.SS3.p4.2.m2.1.1" xref="A3.SS3.p4.2.m2.1.1.cmml"><mn id="A3.SS3.p4.2.m2.1.1.2" mathbackground="#F0F0F0" xref="A3.SS3.p4.2.m2.1.1.2.cmml">3</mn><mo id="A3.SS3.p4.2.m2.1.1.1" xref="A3.SS3.p4.2.m2.1.1.1.cmml"></mo><mi id="A3.SS3.p4.2.m2.1.1.3" mathbackground="#F0F0F0" xref="A3.SS3.p4.2.m2.1.1.3.cmml">α</mi></mrow><annotation-xml encoding="MathML-Content" id="A3.SS3.p4.2.m2.1b"><apply id="A3.SS3.p4.2.m2.1.1.cmml" xref="A3.SS3.p4.2.m2.1.1"><times id="A3.SS3.p4.2.m2.1.1.1.cmml" xref="A3.SS3.p4.2.m2.1.1.1"></times><cn id="A3.SS3.p4.2.m2.1.1.2.cmml" type="integer" xref="A3.SS3.p4.2.m2.1.1.2">3</cn><ci id="A3.SS3.p4.2.m2.1.1.3.cmml" 
xref="A3.SS3.p4.2.m2.1.1.3">𝛼</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="A3.SS3.p4.2.m2.1c">3\alpha</annotation><annotation encoding="application/x-llamapun" id="A3.SS3.p4.2.m2.1d">3 italic_α</annotation></semantics></math>, respectively. What is the ratio of the two concentrations, [DMSO]:[AN]?</span> </span> </section></span></td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.24.13"> <td class="ltx_td ltx_align_justify ltx_align_top" id="S3.SS3.11.11.24.13.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.24.13.1.1" style="background-color:#F0F0F0;"> <span class="ltx_p" id="S3.SS3.11.11.24.13.1.1.1" style="width:375.6pt;">(A) 3:2 (B) 1:6 <span class="ltx_text ltx_font_bold" id="S3.SS3.11.11.24.13.1.1.1.1">[correct]</span> (C) 1:9 <span class="ltx_text ltx_font_bold" id="S3.SS3.11.11.24.13.1.1.1.2">[predicted]</span> (D) 1:1 (E) 2:3</span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.25.14"> <td class="ltx_td ltx_nopad_r ltx_align_justify ltx_align_top" id="S3.SS3.11.11.25.14.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.25.14.1.1" style="background-color:#F0F0F0;"> <span class="ltx_p" id="S3.SS3.11.11.25.14.1.1.1" style="width:375.6pt;">(F) 1:3 (G) 3:1 (H) 2:1 (I) 6:1 (J) 1:2</span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.26.15"> <td class="ltx_td ltx_align_justify ltx_align_top" id="S3.SS3.11.11.26.15.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.26.15.1.1" style="background-color:#F0F0F0;"> <span class="ltx_p" id="S3.SS3.11.11.26.15.1.1.1" style="width:375.6pt;"> <span class="ltx_text ltx_font_bold" id="S3.SS3.11.11.26.15.1.1.1.1">Problem Solving Process of ChemAgent (GPT)</span></span> <span class="ltx_p" id="S3.SS3.11.11.26.15.1.1.2"><span class="ltx_text ltx_font_bold" id="S3.SS3.11.11.26.15.1.1.2.1" style="color:#FF0000;">Step 1</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.10.10.10"> <td class="ltx_td ltx_nopad_r ltx_align_justify 
ltx_align_top" id="S3.SS3.10.10.10.2"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.10.10.10.2.2" style="background-color:#F0F0F0;"> <span class="ltx_p" id="S3.SS3.10.10.10.2.2.3" style="width:375.6pt;"><span class="ltx_text" id="S3.SS3.10.10.10.2.2.3.1" style="color:#0000FF;">Thought</span></span> <span class="ltx_p" id="S3.SS3.10.10.10.2.2.2.2" style="background-color:#F0F0F0;padding:9pt;">The relative intensities in the 1H NMR spectrum of the mixture of DMSO and acetonitrile are given as <math alttext="\alpha" class="ltx_Math" display="inline" id="S3.SS3.9.9.9.1.1.1.1.m1.1"><semantics id="S3.SS3.9.9.9.1.1.1.1.m1.1a"><mi id="S3.SS3.9.9.9.1.1.1.1.m1.1.1" mathbackground="#F0F0F0" xref="S3.SS3.9.9.9.1.1.1.1.m1.1.1.cmml">α</mi><annotation-xml encoding="MathML-Content" id="S3.SS3.9.9.9.1.1.1.1.m1.1b"><ci id="S3.SS3.9.9.9.1.1.1.1.m1.1.1.cmml" xref="S3.SS3.9.9.9.1.1.1.1.m1.1.1">𝛼</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.9.9.9.1.1.1.1.m1.1c">\alpha</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.9.9.9.1.1.1.1.m1.1d">italic_α</annotation></semantics></math> and <math alttext="3\alpha" class="ltx_Math" display="inline" id="S3.SS3.10.10.10.2.2.2.2.m2.1"><semantics id="S3.SS3.10.10.10.2.2.2.2.m2.1a"><mrow id="S3.SS3.10.10.10.2.2.2.2.m2.1.1" xref="S3.SS3.10.10.10.2.2.2.2.m2.1.1.cmml"><mn id="S3.SS3.10.10.10.2.2.2.2.m2.1.1.2" mathbackground="#F0F0F0" xref="S3.SS3.10.10.10.2.2.2.2.m2.1.1.2.cmml">3</mn><mo id="S3.SS3.10.10.10.2.2.2.2.m2.1.1.1" xref="S3.SS3.10.10.10.2.2.2.2.m2.1.1.1.cmml"></mo><mi id="S3.SS3.10.10.10.2.2.2.2.m2.1.1.3" mathbackground="#F0F0F0" xref="S3.SS3.10.10.10.2.2.2.2.m2.1.1.3.cmml">α</mi></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.10.10.10.2.2.2.2.m2.1b"><apply id="S3.SS3.10.10.10.2.2.2.2.m2.1.1.cmml" xref="S3.SS3.10.10.10.2.2.2.2.m2.1.1"><times id="S3.SS3.10.10.10.2.2.2.2.m2.1.1.1.cmml" xref="S3.SS3.10.10.10.2.2.2.2.m2.1.1.1"></times><cn 
id="S3.SS3.10.10.10.2.2.2.2.m2.1.1.2.cmml" type="integer" xref="S3.SS3.10.10.10.2.2.2.2.m2.1.1.2">3</cn><ci id="S3.SS3.10.10.10.2.2.2.2.m2.1.1.3.cmml" xref="S3.SS3.10.10.10.2.2.2.2.m2.1.1.3">𝛼</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.10.10.10.2.2.2.2.m2.1c">3\alpha</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.10.10.10.2.2.2.2.m2.1d">3 italic_α</annotation></semantics></math>, respectively. To find the concentration ratio [DMSO]:[AN], the relative number of nuclei contributing to these intensities must be considered. Both DMSO and acetonitrile contribute distinct numbers of protons. I will determine the number of protons contributing to the NMR spectrum for each compound.</span> <span class="ltx_p" id="S3.SS3.10.10.10.2.2.4"><span class="ltx_text" id="S3.SS3.10.10.10.2.2.4.1" style="color:#0000FF;">Action</span></span> <span class="ltx_p" id="S3.SS3.10.10.10.2.2.5" style="background-color:#F0F0F0;padding:9pt;">Tool: Name2SMILES</span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.27.16"> <td class="ltx_td ltx_align_justify ltx_align_top" id="S3.SS3.11.11.27.16.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.27.16.1.1" style="background-color:#F0F0F0;"> <span class="ltx_p" id="S3.SS3.11.11.27.16.1.1.1" style="width:375.6pt;">Tool input: Dimethylsulphoxide</span> <span class="ltx_p" id="S3.SS3.11.11.27.16.1.1.2"><span class="ltx_text" id="S3.SS3.11.11.27.16.1.1.2.1" style="color:#0000FF;">Observation</span></span> <span class="ltx_p" id="S3.SS3.11.11.27.16.1.1.3" style="background-color:#F0F0F0;padding:9pt;">Tool output: <span class="ltx_text ltx_font_typewriter" id="S3.SS3.11.11.27.16.1.1.3.1">CS(=O)C</span></span> <span class="ltx_p" id="S3.SS3.11.11.27.16.1.1.4"><span class="ltx_text ltx_font_bold" id="S3.SS3.11.11.27.16.1.1.4.1" style="color:#FF0000;">Step 2</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.28.17"> <td class="ltx_td ltx_nopad_r 
ltx_align_justify ltx_align_top" id="S3.SS3.11.11.28.17.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.28.17.1.1" style="background-color:#F0F0F0;"> <span class="ltx_p" id="S3.SS3.11.11.28.17.1.1.1" style="width:375.6pt;"><span class="ltx_text" id="S3.SS3.11.11.28.17.1.1.1.1" style="color:#0000FF;">Thought</span></span> <span class="ltx_p" id="S3.SS3.11.11.28.17.1.1.2" style="background-color:#F0F0F0;padding:9pt;">Dimethylsulphoxide (DMSO) has the SMILES representation <span class="ltx_text ltx_font_typewriter" id="S3.SS3.11.11.28.17.1.1.2.1">CS(=O)C</span>, meaning it has 6 equivalent hydrogen atoms (as CH3 groups contribute their protons). Now, I need to find out the SMILES for acetonitrile (AN) to determine its number of hydrogens.</span> <span class="ltx_p" id="S3.SS3.11.11.28.17.1.1.3"><span class="ltx_text" id="S3.SS3.11.11.28.17.1.1.3.1" style="color:#0000FF;">Action</span></span> <span class="ltx_block" id="S3.SS3.11.11.28.17.1.1.4" style="background-color:#F0F0F0;padding:9pt;"> <span class="ltx_p" id="S3.SS3.11.11.28.17.1.1.4.1">Tool: Name2SMILES</span> </span> </span> </td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.29.18"> <td class="ltx_td ltx_align_justify ltx_align_top" id="S3.SS3.11.11.29.18.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.29.18.1.1" style="background-color:#F0F0F0;"> <span class="ltx_p" id="S3.SS3.11.11.29.18.1.1.1" style="width:375.6pt;">Tool input: Acetonitrile</span> <span class="ltx_p" id="S3.SS3.11.11.29.18.1.1.2"><span class="ltx_text" id="S3.SS3.11.11.29.18.1.1.2.1" style="color:#0000FF;">Observation</span></span> <span class="ltx_p" id="S3.SS3.11.11.29.18.1.1.3" style="background-color:#F0F0F0;padding:9pt;">Tool output: <span class="ltx_text ltx_font_typewriter" id="S3.SS3.11.11.29.18.1.1.3.1">CC#N</span></span> <span class="ltx_p" id="S3.SS3.11.11.29.18.1.1.4"><span class="ltx_text ltx_font_bold" id="S3.SS3.11.11.29.18.1.1.4.1" style="color:#FF0000;">Step 3</span></span> </span> </td> 
</tr> <tr class="ltx_tr" id="S3.SS3.11.11.11"> <td class="ltx_td ltx_nopad_r ltx_align_justify ltx_align_top" id="S3.SS3.11.11.11.1"><span class="ltx_inline-sectional-block" id="S3.SS3.11.11.11.1.1" style="background-color:#F0F0F0;"> <span class="ltx_para" id="S3.SS3.11.11.11.1.1.p1"> <span class="ltx_p" id="S3.SS3.11.11.11.1.1.p1.1" style="width:375.6pt;"><span class="ltx_text" id="S3.SS3.11.11.11.1.1.p1.1.1" style="color:#0000FF;">Thought</span></span> <span class="ltx_block" id="S3.SS3.11.11.11.1.1.p1.2" style="background-color:#F0F0F0;padding:9pt;"> <span class="ltx_p" id="S3.SS3.11.11.11.1.1.p1.2.1">Acetonitrile (AN) has the SMILES representation <span class="ltx_text ltx_font_typewriter" id="S3.SS3.11.11.11.1.1.p1.2.1.1">CC#N</span>, meaning it has 3 equivalent hydrogen atoms (from the CH3 group).</span> <span class="ltx_p" id="S3.SS3.11.11.11.1.1.p1.2.2">Dimethyl sulfoxide (DMSO) thus contributes 6 protons to the NMR, whereas acetonitrile (AN) contributes 3 protons. The NMR intensity is proportional to the number of protons, thus the ratio of concentrations can be calculated considering this proportionality.</span> <span class="ltx_itemize" id="A3.I1"> <span class="ltx_item" id="A3.I1.i1" style="list-style-type:none;"><span class="ltx_tag ltx_tag_item">•</span> <span class="ltx_para" id="A3.I1.i1.p1"> <span class="ltx_p" id="A3.I1.i1.p1.2">The intensity ratio given in the NMR spectrum is <math alttext="\alpha" class="ltx_Math" display="inline" id="A3.I1.i1.p1.1.m1.1"><semantics id="A3.I1.i1.p1.1.m1.1a"><mi id="A3.I1.i1.p1.1.m1.1.1" mathbackground="#F0F0F0" xref="A3.I1.i1.p1.1.m1.1.1.cmml">α</mi><annotation-xml encoding="MathML-Content" id="A3.I1.i1.p1.1.m1.1b"><ci id="A3.I1.i1.p1.1.m1.1.1.cmml" xref="A3.I1.i1.p1.1.m1.1.1">𝛼</ci></annotation-xml><annotation encoding="application/x-tex" id="A3.I1.i1.p1.1.m1.1c">\alpha</annotation><annotation encoding="application/x-llamapun" id="A3.I1.i1.p1.1.m1.1d">italic_α</annotation></semantics></math> (for DMSO) and <math 
alttext="3\alpha" class="ltx_Math" display="inline" id="A3.I1.i1.p1.2.m2.1"><semantics id="A3.I1.i1.p1.2.m2.1a"><mrow id="A3.I1.i1.p1.2.m2.1.1" xref="A3.I1.i1.p1.2.m2.1.1.cmml"><mn id="A3.I1.i1.p1.2.m2.1.1.2" mathbackground="#F0F0F0" xref="A3.I1.i1.p1.2.m2.1.1.2.cmml">3</mn><mo id="A3.I1.i1.p1.2.m2.1.1.1" xref="A3.I1.i1.p1.2.m2.1.1.1.cmml"></mo><mi id="A3.I1.i1.p1.2.m2.1.1.3" mathbackground="#F0F0F0" xref="A3.I1.i1.p1.2.m2.1.1.3.cmml">α</mi></mrow><annotation-xml encoding="MathML-Content" id="A3.I1.i1.p1.2.m2.1b"><apply id="A3.I1.i1.p1.2.m2.1.1.cmml" xref="A3.I1.i1.p1.2.m2.1.1"><times id="A3.I1.i1.p1.2.m2.1.1.1.cmml" xref="A3.I1.i1.p1.2.m2.1.1.1"></times><cn id="A3.I1.i1.p1.2.m2.1.1.2.cmml" type="integer" xref="A3.I1.i1.p1.2.m2.1.1.2">3</cn><ci id="A3.I1.i1.p1.2.m2.1.1.3.cmml" xref="A3.I1.i1.p1.2.m2.1.1.3">𝛼</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="A3.I1.i1.p1.2.m2.1c">3\alpha</annotation><annotation encoding="application/x-llamapun" id="A3.I1.i1.p1.2.m2.1d">3 italic_α</annotation></semantics></math> (for AN).</span> </span></span> <span class="ltx_item" id="A3.I1.i2" style="list-style-type:none;"><span class="ltx_tag ltx_tag_item">•</span> <span class="ltx_para ltx_noindent" id="A3.I1.i2.p1"> <span class="ltx_p" id="A3.I1.i2.p1.1">Let the concentration of DMSO be [DMSO] and that of AN be [AN].</span> </span></span> </span> <span class="ltx_p" id="S3.SS3.11.11.11.1.1.p1.2.3">Based on proportionality:</span> <span class="ltx_equation ltx_eqn_table" id="A3.Ex1"> <span><span class="ltx_equation ltx_eqn_row ltx_align_baseline"> <span class="ltx_eqn_cell ltx_eqn_center_padleft"></span> <span class="ltx_eqn_cell ltx_align_center"><math alttext="\frac{\text{[DMSO]}\times 6}{\text{[AN]}\times 3}=\frac{\alpha}{3\alpha}" class="ltx_Math" display="block" id="A3.Ex1.m1.1"><semantics id="A3.Ex1.m1.1a"><mrow id="A3.Ex1.m1.1.1" xref="A3.Ex1.m1.1.1.cmml"><mfrac id="A3.Ex1.m1.1.1.2" xref="A3.Ex1.m1.1.1.2.cmml"><mrow id="A3.Ex1.m1.1.1.2.2" 
xref="A3.Ex1.m1.1.1.2.2.cmml"><mtext id="A3.Ex1.m1.1.1.2.2.2" mathbackground="#F0F0F0" xref="A3.Ex1.m1.1.1.2.2.2a.cmml">[DMSO]</mtext><mo id="A3.Ex1.m1.1.1.2.2.1" lspace="0.222em" mathbackground="#F0F0F0" rspace="0.222em" xref="A3.Ex1.m1.1.1.2.2.1.cmml">×</mo><mn id="A3.Ex1.m1.1.1.2.2.3" mathbackground="#F0F0F0" xref="A3.Ex1.m1.1.1.2.2.3.cmml">6</mn></mrow><mrow id="A3.Ex1.m1.1.1.2.3" xref="A3.Ex1.m1.1.1.2.3.cmml"><mtext id="A3.Ex1.m1.1.1.2.3.2" mathbackground="#F0F0F0" xref="A3.Ex1.m1.1.1.2.3.2a.cmml">[AN]</mtext><mo id="A3.Ex1.m1.1.1.2.3.1" lspace="0.222em" mathbackground="#F0F0F0" rspace="0.222em" xref="A3.Ex1.m1.1.1.2.3.1.cmml">×</mo><mn id="A3.Ex1.m1.1.1.2.3.3" mathbackground="#F0F0F0" xref="A3.Ex1.m1.1.1.2.3.3.cmml">3</mn></mrow></mfrac><mo id="A3.Ex1.m1.1.1.1" mathbackground="#F0F0F0" xref="A3.Ex1.m1.1.1.1.cmml">=</mo><mfrac id="A3.Ex1.m1.1.1.3" xref="A3.Ex1.m1.1.1.3.cmml"><mi id="A3.Ex1.m1.1.1.3.2" mathbackground="#F0F0F0" xref="A3.Ex1.m1.1.1.3.2.cmml">α</mi><mrow id="A3.Ex1.m1.1.1.3.3" xref="A3.Ex1.m1.1.1.3.3.cmml"><mn id="A3.Ex1.m1.1.1.3.3.2" mathbackground="#F0F0F0" xref="A3.Ex1.m1.1.1.3.3.2.cmml">3</mn><mo id="A3.Ex1.m1.1.1.3.3.1" xref="A3.Ex1.m1.1.1.3.3.1.cmml"></mo><mi id="A3.Ex1.m1.1.1.3.3.3" mathbackground="#F0F0F0" xref="A3.Ex1.m1.1.1.3.3.3.cmml">α</mi></mrow></mfrac></mrow><annotation-xml encoding="MathML-Content" id="A3.Ex1.m1.1b"><apply id="A3.Ex1.m1.1.1.cmml" xref="A3.Ex1.m1.1.1"><eq id="A3.Ex1.m1.1.1.1.cmml" xref="A3.Ex1.m1.1.1.1"></eq><apply id="A3.Ex1.m1.1.1.2.cmml" xref="A3.Ex1.m1.1.1.2"><divide id="A3.Ex1.m1.1.1.2.1.cmml" xref="A3.Ex1.m1.1.1.2"></divide><apply id="A3.Ex1.m1.1.1.2.2.cmml" xref="A3.Ex1.m1.1.1.2.2"><times id="A3.Ex1.m1.1.1.2.2.1.cmml" xref="A3.Ex1.m1.1.1.2.2.1"></times><ci id="A3.Ex1.m1.1.1.2.2.2a.cmml" xref="A3.Ex1.m1.1.1.2.2.2"><mtext id="A3.Ex1.m1.1.1.2.2.2.cmml" mathbackground="#F0F0F0" xref="A3.Ex1.m1.1.1.2.2.2">[DMSO]</mtext></ci><cn id="A3.Ex1.m1.1.1.2.2.3.cmml" type="integer" 
xref="A3.Ex1.m1.1.1.2.2.3">6</cn></apply><apply id="A3.Ex1.m1.1.1.2.3.cmml" xref="A3.Ex1.m1.1.1.2.3"><times id="A3.Ex1.m1.1.1.2.3.1.cmml" xref="A3.Ex1.m1.1.1.2.3.1"></times><ci id="A3.Ex1.m1.1.1.2.3.2a.cmml" xref="A3.Ex1.m1.1.1.2.3.2"><mtext id="A3.Ex1.m1.1.1.2.3.2.cmml" mathbackground="#F0F0F0" xref="A3.Ex1.m1.1.1.2.3.2">[AN]</mtext></ci><cn id="A3.Ex1.m1.1.1.2.3.3.cmml" type="integer" xref="A3.Ex1.m1.1.1.2.3.3">3</cn></apply></apply><apply id="A3.Ex1.m1.1.1.3.cmml" xref="A3.Ex1.m1.1.1.3"><divide id="A3.Ex1.m1.1.1.3.1.cmml" xref="A3.Ex1.m1.1.1.3"></divide><ci id="A3.Ex1.m1.1.1.3.2.cmml" xref="A3.Ex1.m1.1.1.3.2">𝛼</ci><apply id="A3.Ex1.m1.1.1.3.3.cmml" xref="A3.Ex1.m1.1.1.3.3"><times id="A3.Ex1.m1.1.1.3.3.1.cmml" xref="A3.Ex1.m1.1.1.3.3.1"></times><cn id="A3.Ex1.m1.1.1.3.3.2.cmml" type="integer" xref="A3.Ex1.m1.1.1.3.3.2">3</cn><ci id="A3.Ex1.m1.1.1.3.3.3.cmml" xref="A3.Ex1.m1.1.1.3.3.3">𝛼</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="A3.Ex1.m1.1c">\frac{\text{[DMSO]}\times 6}{\text{[AN]}\times 3}=\frac{\alpha}{3\alpha}</annotation><annotation encoding="application/x-llamapun" id="A3.Ex1.m1.1d">divide start_ARG [DMSO] × 6 end_ARG start_ARG [AN] × 3 end_ARG = divide start_ARG italic_α end_ARG start_ARG 3 italic_α end_ARG</annotation></semantics></math></span> <span class="ltx_eqn_cell ltx_eqn_center_padright"></span></span></span> </span> <span class="ltx_equation ltx_eqn_table" id="A3.Ex2"> <span><span class="ltx_equation ltx_eqn_row ltx_align_baseline"> <span class="ltx_eqn_cell ltx_eqn_center_padleft"></span> <span class="ltx_eqn_cell ltx_align_center"><span class="ltx_text ltx_markedasmath" id="A3.Ex2.m1.1.1.1"><span class="ltx_text" id="A3.Ex2.m1.1.1.1.1" style="background-color:#FFFF00;"> <math alttext="\displaystyle\frac{\text{[DMSO]}}{\text{[AN]}}=\frac{1}{9}" class="ltx_Math" display="inline" id="A3.Ex2.m1.1.1.1.1.m1.1"><semantics id="A3.Ex2.m1.1.1.1.1.m1.1a"><mrow id="A3.Ex2.m1.1.1.1.1.m1.1.1" 
xref="A3.Ex2.m1.1.1.1.1.m1.1.1.cmml"><mstyle displaystyle="true" id="A3.Ex2.m1.1.1.1.1.m1.1.1.2" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.2.cmml"><mfrac id="A3.Ex2.m1.1.1.1.1.m1.1.1.2a" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.2.cmml"><mtext id="A3.Ex2.m1.1.1.1.1.m1.1.1.2.2" mathbackground="#FFFF00" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.2.2a.cmml">[DMSO]</mtext><mtext id="A3.Ex2.m1.1.1.1.1.m1.1.1.2.3" mathbackground="#FFFF00" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.2.3a.cmml">[AN]</mtext></mfrac></mstyle><mo id="A3.Ex2.m1.1.1.1.1.m1.1.1.1" mathbackground="#FFFF00" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.1.cmml">=</mo><mstyle displaystyle="true" id="A3.Ex2.m1.1.1.1.1.m1.1.1.3" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.3.cmml"><mfrac id="A3.Ex2.m1.1.1.1.1.m1.1.1.3a" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.3.cmml"><mn id="A3.Ex2.m1.1.1.1.1.m1.1.1.3.2" mathbackground="#FFFF00" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.3.2.cmml">1</mn><mn id="A3.Ex2.m1.1.1.1.1.m1.1.1.3.3" mathbackground="#FFFF00" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.3.3.cmml">9</mn></mfrac></mstyle></mrow><annotation-xml encoding="MathML-Content" id="A3.Ex2.m1.1.1.1.1.m1.1b"><apply id="A3.Ex2.m1.1.1.1.1.m1.1.1.cmml" xref="A3.Ex2.m1.1.1.1.1.m1.1.1"><eq id="A3.Ex2.m1.1.1.1.1.m1.1.1.1.cmml" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.1"></eq><apply id="A3.Ex2.m1.1.1.1.1.m1.1.1.2.cmml" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.2"><divide id="A3.Ex2.m1.1.1.1.1.m1.1.1.2.1.cmml" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.2"></divide><ci id="A3.Ex2.m1.1.1.1.1.m1.1.1.2.2a.cmml" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.2.2"><mtext id="A3.Ex2.m1.1.1.1.1.m1.1.1.2.2.cmml" mathbackground="#FFFF00" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.2.2">[DMSO]</mtext></ci><ci id="A3.Ex2.m1.1.1.1.1.m1.1.1.2.3a.cmml" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.2.3"><mtext id="A3.Ex2.m1.1.1.1.1.m1.1.1.2.3.cmml" mathbackground="#FFFF00" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.2.3">[AN]</mtext></ci></apply><apply id="A3.Ex2.m1.1.1.1.1.m1.1.1.3.cmml" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.3"><divide id="A3.Ex2.m1.1.1.1.1.m1.1.1.3.1.cmml" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.3"></divide><cn 
id="A3.Ex2.m1.1.1.1.1.m1.1.1.3.2.cmml" type="integer" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.3.2">1</cn><cn id="A3.Ex2.m1.1.1.1.1.m1.1.1.3.3.cmml" type="integer" xref="A3.Ex2.m1.1.1.1.1.m1.1.1.3.3">9</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="A3.Ex2.m1.1.1.1.1.m1.1c">\displaystyle\frac{\text{[DMSO]}}{\text{[AN]}}=\frac{1}{9}</annotation><annotation encoding="application/x-llamapun" id="A3.Ex2.m1.1.1.1.1.m1.1d">divide start_ARG [DMSO] end_ARG start_ARG [AN] end_ARG = divide start_ARG 1 end_ARG start_ARG 9 end_ARG</annotation></semantics></math> </span></span></span> <span class="ltx_eqn_cell ltx_eqn_center_padright"></span></span></span> </span> <span class="ltx_p" id="S3.SS3.11.11.11.1.1.p1.2.4">The concentration ratio of DMSO to AN is therefore 1:9.</span> </span> <span class="ltx_p" id="S3.SS3.11.11.11.1.1.p1.3"><span class="ltx_text" id="S3.SS3.11.11.11.1.1.p1.3.1" style="color:#0000FF;">Answer</span></span> <span class="ltx_p" id="S3.SS3.11.11.11.1.1.p1.4" style="background-color:#F0F0F0;padding:9pt;">The answer is (C) 1:9.</span> </span> <section class="ltx_subsection" id="A3.SS4"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">C.4 </span>Grounding: Wrong Input Format</h3> <span class="ltx_para ltx_noindent" id="A3.SS4.p1"> <span class="ltx_p" id="A3.SS4.p1.1"><span class="ltx_text ltx_font_bold" id="A3.SS4.p1.1.1">Explanation</span></span> </span> <span class="ltx_para ltx_noindent" id="A3.SS4.p2"> <span class="ltx_p" id="A3.SS4.p2.1">This is an example of a <span class="ltx_text ltx_ulem_uline" id="A3.SS4.p2.1.1">grounding error: wrong input format</span>. This question is from SMolInstruct, and the agent initially calls the dedicated tool, SideEffectPredictor, to predict the potential side effect. However, it inputs an extra “SMILES:” (<span class="ltx_text" id="A3.SS4.p2.1.2" style="background-color:#FFFF00;">highlighted</span>), causing the tool to malfunction. 
The same issue occurs with the CanonicalizeSMILES tool.</span> </span> <span class="ltx_para ltx_noindent" id="A3.SS4.p3"> <span class="ltx_p" id="A3.SS4.p3.1"><span class="ltx_text ltx_font_bold" id="A3.SS4.p3.1.1">Question</span></span> </span> <span class="ltx_para" id="A3.SS4.p4"> <span class="ltx_p" id="A3.SS4.p4.1">Are there any known side effects of &lt;SMILES&gt;<span class="ltx_text ltx_font_typewriter" id="A3.SS4.p4.1.1">CC(C)(C)NC[C@H](O)COC1=NSN=C1N1CCOCC1.CCN[C@H]1CN(CCCOC)S(=O)(=O)C2=C1C=C(S(N)(=O)=O)S2</span> <span class="ltx_text" id="A3.SS4.p4.1.2">&lt;/SMILES&gt;</span> affecting the hepatobiliary system?</span> </span> </section></span></td> </tr> <tr class="ltx_tr" id="S3.SS3.11.11.30.19"> <td class="ltx_td ltx_align_justify ltx_align_top" id="S3.SS3.11.11.30.19.1"> <span class="ltx_inline-block ltx_align_top" id="S3.SS3.11.11.30.19.1.1" style="background-color:#F0F0F0;"> <span class="ltx_p" id="S3.SS3.11.11.30.19.1.1.1" style="width:375.6pt;"><span class="ltx_text ltx_font_bold" id="S3.SS3.11.11.30.19.1.1.1.1">Correct:</span> No. 
<span class="ltx_text ltx_font_bold" id="S3.SS3.11.11.30.19.1.1.1.2">Predicted:</span> Yes.</span> </span> </td> </tr> </tbody> </table> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.12"><span class="ltx_text ltx_font_bold" id="S3.SS3.11.12.1" style="background-color:#F0F0F0;">Problem Solving Process of ChemAgent (GPT)<span class="ltx_text ltx_font_medium" id="S3.SS3.11.12.1.1"></span></span></p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.13"><span class="ltx_text ltx_font_bold" id="S3.SS3.11.13.1" style="color:#FF0000;background-color:#F0F0F0;">Step 1</span><span class="ltx_text" id="S3.SS3.11.13.2" style="background-color:#F0F0F0;"> <span class="ltx_text" id="S3.SS3.11.13.2.1" style="color:#0000FF;">Thought</span></span></p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.14" style="background-color:#F0F0F0;padding:9pt;">To determine if there are any known side effects of the given compound affecting the hepatobiliary system, I will first use the SideEffectPredictor tool to obtain the probabilities of different side effects caused by the compound. 
After that, I will specifically look for any effects related to the hepatobiliary system.</p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.15"><span class="ltx_text" id="S3.SS3.11.15.1" style="color:#0000FF;background-color:#F0F0F0;">Action<span class="ltx_text" id="S3.SS3.11.15.1.1" style="color:#000000;"></span></span></p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <div class="ltx_block ltx_figure_panel" id="S3.SS3.11.16" style="background-color:#F0F0F0;padding:9pt;"> <p class="ltx_p" id="S3.SS3.11.16.1">Tool: SideEffectPredictor</p> <p class="ltx_p" id="S3.SS3.11.16.2">Tool input: <span class="ltx_text" id="S3.SS3.11.16.2.1" style="background-color:#FFFF00;">SMILES:</span> <span class="ltx_text ltx_font_typewriter" id="S3.SS3.11.16.2.2">CC(C)(C)NC[C@H](O)COC1=NSN=C1N1CCOCC1.CCN[C@H]1CN(CCCOC)S(=O)(=O)C2=C1C=C(S(N)(=O)=O)S2</span></p> </div> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.17"><span class="ltx_text" id="S3.SS3.11.17.1" style="color:#0000FF;background-color:#F0F0F0;">Observation<span class="ltx_text" id="S3.SS3.11.17.1.1" style="color:#000000;"></span></span></p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.18" style="background-color:#F0F0F0;padding:9pt;">Tool output: Error: Invalid SMILES.</p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.19"><span class="ltx_text ltx_font_bold" id="S3.SS3.11.19.1" style="color:#FF0000;background-color:#F0F0F0;">Step 2</span><span class="ltx_text" id="S3.SS3.11.19.2" style="background-color:#F0F0F0;"> <span class="ltx_text" id="S3.SS3.11.19.2.1" style="color:#0000FF;">Thought</span></span></p> </div> <div 
class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.20" style="background-color:#F0F0F0;padding:9pt;">The error indicates that the provided SMILES is invalid. I’ll canonicalize the SMILES and then re-attempt using the SideEffectPredictor tool.</p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.21"><span class="ltx_text" id="S3.SS3.11.21.1" style="color:#0000FF;background-color:#F0F0F0;">Action<span class="ltx_text" id="S3.SS3.11.21.1.1" style="color:#000000;"></span></span></p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <div class="ltx_block ltx_figure_panel" id="S3.SS3.11.22" style="background-color:#F0F0F0;padding:9pt;"> <p class="ltx_p" id="S3.SS3.11.22.1">Tool: CanonicalizeSMILES</p> <p class="ltx_p" id="S3.SS3.11.22.2">Tool input: <span class="ltx_text" id="S3.SS3.11.22.2.1" style="background-color:#FFFF00;">SMILES:</span> <span class="ltx_text ltx_font_typewriter" id="S3.SS3.11.22.2.2">CC(C)(C)NC[C@H](O)COC1=NSN=C1N1CCOCC1.CCN[C@H]1CN(CCCOC)S(=O)(=O)C2=C1C=C(S(N)(=O)=O)S2</span></p> </div> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.23"><span class="ltx_text" id="S3.SS3.11.23.1" style="color:#0000FF;background-color:#F0F0F0;">Observation<span class="ltx_text" id="S3.SS3.11.23.1.1" style="color:#000000;"></span></span></p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.24" style="background-color:#F0F0F0;padding:9pt;">Tool output: Error: Invalid SMILES.</p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.25"><span class="ltx_text ltx_font_bold" id="S3.SS3.11.25.1" 
style="color:#FF0000;background-color:#F0F0F0;">Step 3</span><span class="ltx_text" id="S3.SS3.11.25.2" style="background-color:#F0F0F0;"> <span class="ltx_text" id="S3.SS3.11.25.2.1" style="color:#0000FF;">Thought</span></span></p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.26" style="background-color:#F0F0F0;padding:9pt;">Since there’s an issue with the SMILES provided and attempts to canonicalize it failed, I’ll use the AiExpert tool to get an analysis regarding the potential side effects based on the description of the molecule.</p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.27"><span class="ltx_text" id="S3.SS3.11.27.1" style="color:#0000FF;background-color:#F0F0F0;">Action<span class="ltx_text" id="S3.SS3.11.27.1.1" style="color:#000000;"></span></span></p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <div class="ltx_block ltx_figure_panel" id="S3.SS3.11.28" style="background-color:#F0F0F0;padding:9pt;"> <p class="ltx_p" id="S3.SS3.11.28.1">Tool: AiExpert</p> <p class="ltx_p" id="S3.SS3.11.28.2">Tool input: Are there any known side effects affecting the hepatobiliary system for a compound with both isothiazolone and sulfonamide-like structures?</p> </div> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.29"><span class="ltx_text" id="S3.SS3.11.29.1" style="color:#0000FF;background-color:#F0F0F0;">Observation<span class="ltx_text" id="S3.SS3.11.29.1.1" style="color:#000000;"></span></span></p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.30" style="background-color:#F0F0F0;padding:9pt;">Tool output: Compounds that contain isothiazolone and sulfonamide-like structures can 
be complex in nature … (long analysis omitted)</p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.31"><span class="ltx_text" id="S3.SS3.11.31.1" style="background-color:#F0F0F0;">Given the potential for systemic toxicity, especially with ongoing exposure, caution is warranted, and hepatobiliary monitoring should be part of safety assessments for such a compound. Always consult and follow established guidelines and protocols for safety evaluation during drug development or chemical utilization phases.</span></p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.32"><span class="ltx_text ltx_font_bold" id="S3.SS3.11.32.1" style="color:#FF0000;background-color:#F0F0F0;">Step 4</span><span class="ltx_text" id="S3.SS3.11.32.2" style="background-color:#F0F0F0;"> <span class="ltx_text" id="S3.SS3.11.32.2.1" style="color:#0000FF;">Thought</span></span></p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.33" style="background-color:#F0F0F0;padding:9pt;">The AiExpert tool suggests there could be potential hepatobiliary side effects for compounds containing isothiazolone and sulfonamide-like structures due to possible hepatotoxicity, liver enzyme changes, or hypersensitivity reactions involving the liver. 
Therefore, there’s a likelihood of such effects occurring in the given compound.</p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.34"><span class="ltx_text" id="S3.SS3.11.34.1" style="color:#0000FF;background-color:#F0F0F0;">Answer<span class="ltx_text" id="S3.SS3.11.34.1.1" style="color:#000000;"></span></span></p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <p class="ltx_p ltx_figure_panel" id="S3.SS3.11.35" style="background-color:#F0F0F0;padding:9pt;">Yes.</p> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <section class="ltx_subsection ltx_figure_panel" id="A3.SS5" style="background-color:#F0F0F0;"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">C.5 </span>Tool: Wrong Tool Output</h3> <div class="ltx_para ltx_noindent" id="A3.SS5.p1"> <p class="ltx_p" id="A3.SS5.p1.1"><span class="ltx_text ltx_font_bold" id="A3.SS5.p1.1.1">Explanation</span></p> </div> <div class="ltx_para ltx_noindent" id="A3.SS5.p2"> <p class="ltx_p" id="A3.SS5.p2.1">This example shows the <span class="ltx_text ltx_ulem_uline" id="A3.SS5.p2.1.1">tool: wrong tool output</span> error. The question is from MMLU-Chemistry, and the agent answers it by searching the internet with the WebSearch tool. The tool provides an incorrect result (<span class="ltx_text" id="A3.SS5.p2.1.2" style="background-color:#FFFF00;">highlighted</span>), which leads to the wrong final answer.</p> </div> <div class="ltx_para ltx_noindent" id="A3.SS5.p3"> <p class="ltx_p" id="A3.SS5.p3.1"><span class="ltx_text ltx_font_bold" id="A3.SS5.p3.1.1">Question</span></p> </div> <div class="ltx_para ltx_noindent" id="A3.SS5.p4"> <p class="ltx_p" id="A3.SS5.p4.1">Of the following atoms, which has the lowest electron affinity? 
(A) F (B) Si <span class="ltx_text ltx_font_bold" id="A3.SS5.p4.1.1">[predicted]</span> (C) O (D) Ca <span class="ltx_text ltx_font_bold" id="A3.SS5.p4.1.2">[gold]</span></p> </div> <div class="ltx_para ltx_noindent" id="A3.SS5.p5"> <p class="ltx_p" id="A3.SS5.p5.1"><span class="ltx_text ltx_font_bold" id="A3.SS5.p5.1.1">Problem Solving Process of ChemAgent (GPT)</span></p> </div> <div class="ltx_para ltx_noindent" id="A3.SS5.p6"> <p class="ltx_p" id="A3.SS5.p6.1"><span class="ltx_text ltx_font_bold" id="A3.SS5.p6.1.1" style="color:#FF0000;">Step 1</span> <span class="ltx_text" id="A3.SS5.p6.1.2" style="color:#0000FF;">Thought</span></p> <p class="ltx_p" id="A3.SS5.p6.2" style="background-color:#F0F0F0;padding:9pt;">Electron affinity generally increases across a period and decreases down a group in the periodic table. Considering this trend, I need to examine the electron affinities of the given atoms: F, Si, O, and Ca. I will search for electron affinities to accurately determine which atom has the lowest value.</p> </div> <div class="ltx_para ltx_noindent" id="A3.SS5.p7"> <p class="ltx_p" id="A3.SS5.p7.1"><span class="ltx_text" id="A3.SS5.p7.1.1" style="color:#0000FF;">Action</span></p> <p class="ltx_p" id="A3.SS5.p7.2" style="background-color:#F0F0F0;padding:9pt;">Tool: WebSearch</p> <p class="ltx_p" id="A3.SS5.p7.3">Tool input: Which atom has the lowest electron affinity among F, Si, O, and Ca?</p> </div> <div class="ltx_para ltx_noindent" id="A3.SS5.p8"> <p class="ltx_p" id="A3.SS5.p8.1"><span class="ltx_text" id="A3.SS5.p8.1.1" style="color:#0000FF;">Observation</span></p> <p class="ltx_p" id="A3.SS5.p8.2" style="background-color:#F0F0F0;padding:9pt;">Tool output: <span class="ltx_text" id="A3.SS5.p8.2.1" style="background-color:#FFFF00;">The atom with the lowest electron affinity among F, Si, O, and Ca is Silicon (Si).</span> It has an electron affinity value of 134.06 kJ/mol, which is lower compared to the other atoms mentioned.</p> <p class="ltx_p" 
id="A3.SS5.p8.3"><span class="ltx_text ltx_font_bold" id="A3.SS5.p8.3.1" style="color:#FF0000;">Step 2</span> <span class="ltx_text" id="A3.SS5.p8.3.2" style="color:#0000FF;">Thought</span></p> <p class="ltx_p" id="A3.SS5.p8.4" style="background-color:#F0F0F0;padding:9pt;">Based on the electron affinity values obtained, Silicon (Si) has the lowest electron affinity among the listed options.</p> </div> <div class="ltx_para ltx_noindent" id="A3.SS5.p9"> <p class="ltx_p" id="A3.SS5.p9.1"><span class="ltx_text" id="A3.SS5.p9.1.1" style="color:#0000FF;">Answer</span></p> <p class="ltx_p" id="A3.SS5.p9.2" style="background-color:#F0F0F0;padding:9pt;">The answer is (B) Si.</p> </div> <section class="ltx_subsection" id="A3.SS6"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">C.6 </span>Tool: Inconsistent Tool Outputs</h3> <div class="ltx_para ltx_noindent" id="A3.SS6.p1"> <p class="ltx_p" id="A3.SS6.p1.1"><span class="ltx_text ltx_font_bold" id="A3.SS6.p1.1.1">Explanation</span></p> </div> <div class="ltx_para ltx_noindent" id="A3.SS6.p2"> <p class="ltx_p" id="A3.SS6.p2.1">This is an example of a <span class="ltx_text ltx_ulem_uline" id="A3.SS6.p2.1.1">tool error: inconsistent tool outputs</span>. The question is from SMolInstruct, and the agent answers it by predicting the answer with ToxicityPredictor and verifying the answer with PubchemSearchQA. 
However, the two tools disagree, and the agent trusts the latter, which leads to the wrong final answer.</p> </div> <div class="ltx_para ltx_noindent" id="A3.SS6.p3"> <p class="ltx_p" id="A3.SS6.p3.1"><span class="ltx_text ltx_font_bold" id="A3.SS6.p3.1.1">Question</span></p> </div> <div class="ltx_para ltx_noindent" id="A3.SS6.p4"> <p class="ltx_p" id="A3.SS6.p4.1">Is &lt;SMILES&gt;<span class="ltx_text ltx_font_typewriter" id="A3.SS6.p4.1.1">C[C@H](/C=C/[C@H](C)C(C)(C)O)[C@H]1CC[C@H]2/C(=C/C=C3C[C@@H](O) C[C@H](O)C3)CCC[C@]12C</span> <span class="ltx_text" id="A3.SS6.p4.1.2">&lt;/SMILES&gt;</span> classified as a hazardous substance with documented toxicity? <span class="ltx_text ltx_font_bold" id="A3.SS6.p4.1.3">Correct:</span> No. <span class="ltx_text ltx_font_bold" id="A3.SS6.p4.1.4">Predicted:</span> Yes.</p> </div> <div class="ltx_para ltx_noindent" id="A3.SS6.p5"> <p class="ltx_p" id="A3.SS6.p5.1"><span class="ltx_text ltx_font_bold" id="A3.SS6.p5.1.1">Problem Solving Process of ChemAgent (GPT)</span></p> </div> <div class="ltx_para ltx_noindent" id="A3.SS6.p6"> <p class="ltx_p" id="A3.SS6.p6.1"><span class="ltx_text ltx_font_bold" id="A3.SS6.p6.1.1" style="color:#FF0000;">Step 1</span> <span class="ltx_text" id="A3.SS6.p6.1.2" style="color:#0000FF;">Thought</span></p> <p class="ltx_p" id="A3.SS6.p6.2" style="background-color:#F0F0F0;padding:9pt;">To determine if the given SMILES represents a hazardous substance with documented toxicity, I should first predict its toxicity profile using the ToxicityPredictor tool.</p> </div> <div class="ltx_para ltx_noindent" id="A3.SS6.p7"> <p class="ltx_p" id="A3.SS6.p7.1"><span class="ltx_text" id="A3.SS6.p7.1.1" style="color:#0000FF;">Action</span></p> <div class="ltx_block" id="A3.SS6.p7.2" style="background-color:#F0F0F0;padding:9pt;"> <p class="ltx_p" id="A3.SS6.p7.2.1">Tool: ToxicityPredictor</p> <p class="ltx_p" id="A3.SS6.p7.2.2">Tool input: <span class="ltx_text ltx_font_typewriter" 
id="A3.SS6.p7.2.2.1">C[C@H](/C=C/[C@H](C)C(C)(C)O)[C@H]1CC[C@H]2/C(=C/C=C3C[C@@H](O) C[C@H](O)C3)CCC[C@]12C</span></p> </div> </div> <div class="ltx_para ltx_noindent" id="A3.SS6.p8"> <p class="ltx_p" id="A3.SS6.p8.1"><span class="ltx_text" id="A3.SS6.p8.1.1" style="color:#0000FF;">Observation</span></p> <p class="ltx_p" id="A3.SS6.p8.2" style="background-color:#F0F0F0;padding:9pt;">Tool output: The probability of the compound to be toxic is 7.85%, which means it’s unlikely to happen. Note that the result is predicted by a neural network model and may not be accurate. You may use other tools or resources to obtain more reliable results if needed.</p> </div> <div class="ltx_para ltx_noindent" id="A3.SS6.p9"> <p class="ltx_p" id="A3.SS6.p9.1"><span class="ltx_text ltx_font_bold" id="A3.SS6.p9.1.1" style="color:#FF0000;">Step 2</span> <span class="ltx_text" id="A3.SS6.p9.1.2" style="color:#0000FF;">Thought</span></p> <p class="ltx_p" id="A3.SS6.p9.2" style="background-color:#F0F0F0;padding:9pt;">The ToxicityPredictor tool suggests that the probability of this compound being toxic is relatively low (7.85%), indicating it is unlikely to be classified as a hazardous substance based on this prediction alone. 
However, considering the neural network model’s limitations, it would be prudent to verify this against more authoritative data sources like PubChem for documented toxicity.</p> </div> <div class="ltx_para ltx_noindent" id="A3.SS6.p10"> <p class="ltx_p" id="A3.SS6.p10.1"><span class="ltx_text" id="A3.SS6.p10.1.1" style="color:#0000FF;">Action</span></p> <div class="ltx_block" id="A3.SS6.p10.2" style="background-color:#F0F0F0;padding:9pt;"> <p class="ltx_p" id="A3.SS6.p10.2.1">Tool: PubchemSearchQA</p> <p class="ltx_p" id="A3.SS6.p10.2.2">Tool input: SMILES:</p> </div> <p class="ltx_p" id="A3.SS6.p10.3">Question: Is this compound classified as a hazardous substance with documented toxicity?</p> </div> <div class="ltx_para ltx_noindent" id="A3.SS6.p11"> <p class="ltx_p" id="A3.SS6.p11.1"><span class="ltx_text" id="A3.SS6.p11.1.1" style="color:#0000FF;">Observation</span></p> <p class="ltx_p" id="A3.SS6.p11.2" style="background-color:#F0F0F0;padding:9pt;">Tool output: Yes, paricalcitol is classified as a hazardous substance with documented toxicity. It has several GHS hazard classifications, including being fatal if swallowed (H300) and toxic if swallowed or in contact with skin (H301+H311). Additionally, it is suspected of causing cancer (H351) and may cause harm to breast-fed children (H362). Paricalcitol has acute toxicity both orally and dermally, and is recognized for causing damage to organs through prolonged or repeated exposure (H372)</p> <p class="ltx_p" id="A3.SS6.p11.3">Furthermore, toxicological information indicates potential drug-induced liver injury (ambiguous concern), and toxicity symptoms such as hypercalcemia can be severe. It also shows potential for interactions that could exacerbate its toxic effects. 
These factors collectively contribute to its classification as a hazardous substance with significant documented toxicity.</p> </div> <div class="ltx_para ltx_noindent" id="A3.SS6.p12"> <p class="ltx_p" id="A3.SS6.p12.1"><span class="ltx_text ltx_font_bold" id="A3.SS6.p12.1.1" style="color:#FF0000;">Step 3</span> <span class="ltx_text" id="A3.SS6.p12.1.2" style="color:#0000FF;">Thought</span></p> <p class="ltx_p" id="A3.SS6.p12.2" style="background-color:#F0F0F0;padding:9pt;">Based on the information obtained from PubChem, the compound represented by the given SMILES, identified as paricalcitol, is indeed classified as a hazardous substance with documented toxicity. It has several GHS hazard classifications and potential health risks.</p> </div> <div class="ltx_para ltx_noindent" id="A3.SS6.p13"> <p class="ltx_p" id="A3.SS6.p13.1"><span class="ltx_text" id="A3.SS6.p13.1.1" style="color:#0000FF;">Answer</span></p> <p class="ltx_p" id="A3.SS6.p13.2" style="background-color:#F0F0F0;padding:9pt;">Yes.</p> </div> <div class="ltx_pagination ltx_role_newpage"></div> <section class="ltx_appendix" id="A4"> <h2 class="ltx_title ltx_title_appendix"> <span class="ltx_tag ltx_tag_appendix">Appendix D </span>Tool Usage Analysis</h2> <figure class="ltx_figure" id="A4.F2"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="474" id="A4.F2.g1" src="x4.png" width="830"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 2: </span>The statistics of tool usage by ChemAgent (GPT). 
Each cell gives the fraction of samples in which the corresponding tool is used, out of the total number of samples; “0” indicates that the tool is never used.</figcaption> </figure> <div class="ltx_para ltx_noindent" id="A4.p1"> <p class="ltx_p" id="A4.p1.1">To examine how the agent uses its tools, we compute tool usage statistics on both SMolInstruct and MMLU-Chemistry, as depicted in <a class="ltx_ref" href="https://arxiv.org/html/2411.07228v1#A4.F2" title="In Appendix D Tool Usage Analysis ‣ C.6 Tool: Inconsistent Tool Outputs ‣ C.5 Tool: Wrong Tool Output ‣ 3.3 Error Analysis ‣ 3 Experiments ‣ Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving"><span class="ltx_text ltx_ref_tag">Figure</span> <span class="ltx_text ltx_ref_tag">2</span></a>.</p> </div> <div class="ltx_para ltx_noindent" id="A4.p2"> <p class="ltx_p" id="A4.p2.1">On SMolInstruct, the agent predominantly invokes the tools designed for each task. For instance, in the NC-I2F task, the agent commonly uses IUPAC2SMILES and SMILES2Formula to convert IUPAC names to molecular formulas, with usage ratios for these tools often near 1.0. The molecular captioning (MC) and molecular generation (MG) tasks are notable exceptions. 
Since they are more open-ended, the agent opts for a variety of tools.</p> </div> <div class="ltx_para ltx_noindent" id="A4.p3"> <p class="ltx_p" id="A4.p3.1">Conversely, on MMLU-Chemistry, the agent typically resorts to general-purpose tools (e.g., PythonREPL for calculations, WebSearch for knowledge gathering), due to the nature of the questions and the absence of task-specific tools.</p> </div> </section> </section> </section> </div> </div> </figure> </section> </section> </article> </div> <footer class="ltx_page_footer"> <div class="ltx_page_logo">Generated on Mon Nov 11 18:46:12 2024 by <a class="ltx_LaTeXML_logo" href="http://dlmf.nist.gov/LaTeXML/"><span style="letter-spacing:-0.2em; margin-right:0.1em;">L<span class="ltx_font_smallcaps" style="position:relative; bottom:2.2pt;">a</span>T<span class="ltx_font_smallcaps" style="font-size:120%;position:relative; bottom:-0.2ex;">e</span></span><span style="font-size:90%; position:relative; bottom:-0.2ex;">XML</span></a> </div></footer> </div> </body> </html>