<!DOCTYPE html> <html lang="en"> <head> <meta content="text/html; charset=utf-8" http-equiv="content-type"/> <title>Do Large Language Models Understand Performance Optimization?</title> <!--Generated on Mon Mar 17 22:26:10 2025 by LaTeXML (version 0.8.8) http://dlmf.nist.gov/LaTeXML/.--> <meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv-fonts.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/latexml_styles.css" rel="stylesheet" type="text/css"/> <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.3.3/html2canvas.min.js"></script> <script src="/static/browse/0.3.4/js/addons_new.js"></script> <script src="/static/browse/0.3.4/js/feedbackOverlay.js"></script> <meta content="Large languages models, High-performance computing, Performance benchmarking, Code optimization" lang="en" name="keywords"/> <base href="/html/2503.13772v1/"/></head> <body> <nav class="ltx_page_navbar"> <nav class="ltx_TOC"> <ol class="ltx_toclist"> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S1" title="In Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">1 </span>Introduction</span></a></li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S2" title="In Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">2 </span>Related Work</span></a> <ol class="ltx_toclist 
ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S2.SS0.SSS0.Px1" title="In 2. Related Work ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">LLMs for Code</span></a></li> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S2.SS0.SSS0.Px2" title="In 2. Related Work ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">LLM Benchmarks</span></a></li> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S2.SS0.SSS0.Px3" title="In 2. Related Work ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">LLM Agents</span></a></li> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S2.SS0.SSS0.Px4" title="In 2. Related Work ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">Traditional Performance Analysis Tools</span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S3" title="In Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3 </span>Benchmark Suite</span></a></li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4" title="In Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4 </span>Evaluation</span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.SS1" title="In 4. 
Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.1 </span>Experiments Design</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.SS2" title="In 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.2 </span>Experiment 1: Single Serial Optimization</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.SS3" title="In 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.3 </span>Experiment 2: Multiple Serial Optimizations</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.SS4" title="In 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.4 </span>Experiment 3: Parallel Optimization</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.SS5" title="In 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.5 </span>Experiment 4: Time Spent on Applying Optimizations</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.SS6" title="In 4. 
Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.6 </span>Experiment 5: Correctness</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.SS7" title="In 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.7 </span>Experiment 6: HPC Commonsense</span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5" title="In Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">5 </span>Case Studies</span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS1" title="In 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">5.1 </span>Performance Optimization Agent</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"> <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS2" title="In 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">5.2 </span>Case 1: NPB_CG</span></a> <ol class="ltx_toclist ltx_toclist_subsection"> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS2.SSS0.Px1" title="In 5.2. Case 1: NPB_CG ‣ 5. 
Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">Version 1</span></a></li> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS2.SSS0.Px2" title="In 5.2. Case 1: NPB_CG ‣ 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">Version 2</span></a></li> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS2.SSS0.Px3" title="In 5.2. Case 1: NPB_CG ‣ 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">Version 3</span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_subsection"> <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS3" title="In 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">5.3 </span>Case 2: XSBench</span></a> <ol class="ltx_toclist ltx_toclist_subsection"> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS3.SSS0.Px1" title="In 5.3. Case 2: XSBench ‣ 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">Version 1</span></a></li> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS3.SSS0.Px2" title="In 5.3. Case 2: XSBench ‣ 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">Version 2</span></a></li> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS3.SSS0.Px3" title="In 5.3. Case 2: XSBench ‣ 5. 
Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">Version 3</span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_subsection"> <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS4" title="In 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">5.4 </span>Case 3: LBM D2Q37</span></a> <ol class="ltx_toclist ltx_toclist_subsection"> <li class="ltx_tocentry ltx_tocentry_subsubsection"> <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS4.SSS1" title="In 5.4. Case 3: LBM D2Q37 ‣ 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">5.4.1 </span>Single Process</span></a> <ol class="ltx_toclist ltx_toclist_subsubsection"> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS4.SSS1.Px1" title="In 5.4.1. Single Process ‣ 5.4. Case 3: LBM D2Q37 ‣ 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">Version 1</span></a></li> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS4.SSS1.Px2" title="In 5.4.1. Single Process ‣ 5.4. Case 3: LBM D2Q37 ‣ 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">Version 2</span></a></li> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS4.SSS1.Px3" title="In 5.4.1. Single Process ‣ 5.4. Case 3: LBM D2Q37 ‣ 5. 
Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">Version 3</span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS4.SSS2" title="In 5.4. Case 3: LBM D2Q37 ‣ 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">5.4.2 </span>Multiple Processes</span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_subsection"> <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS5" title="In 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">5.5 </span>Case 4: Minisweep</span></a> <ol class="ltx_toclist ltx_toclist_subsection"> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS5.SSS0.Px1" title="In 5.5. Case 4: Minisweep ‣ 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">Version 1</span></a></li> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS5.SSS0.Px2" title="In 5.5. Case 4: Minisweep ‣ 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">Version 2</span></a></li> <li class="ltx_tocentry ltx_tocentry_paragraph"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.SS5.SSS0.Px3" title="In 5.5. Case 4: Minisweep ‣ 5. 
Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title">Version 3</span></a></li> </ol> </li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S6" title="In Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">6 </span>Discussion</span></a></li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S7" title="In Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">7 </span>Conclusions</span></a></li> </ol></nav> </nav> <div class="ltx_page_main"> <div class="ltx_page_content"> <article class="ltx_document ltx_authors_1line ltx_leqno"> <h1 class="ltx_title ltx_title_document">Do Large Language Models Understand Performance Optimization?</h1> <div class="ltx_authors"> <span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Bowen Cui<sup class="ltx_sup" id="id4.2.id1">∗</sup> </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_email"><a href="mailto:bcui2@gmu.edu">bcui2@gmu.edu</a> </span> <span class="ltx_contact ltx_role_affiliation"><span class="ltx_text ltx_affiliation_institution" id="id5.3.id1">George Mason University</span><span class="ltx_text ltx_affiliation_city" id="id6.4.id2">Fairfax</span><span class="ltx_text ltx_affiliation_state" id="id7.5.id3">VA</span><span class="ltx_text ltx_affiliation_country" id="id8.6.id4">USA</span> </span></span></span> <span class="ltx_author_before">, </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Tejas Ramesh<sup class="ltx_sup" id="id9.2.id1">∗</sup> </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_email"><a href="mailto:tramesh2@gmu.edu">tramesh2@gmu.edu</a> </span> <span class="ltx_contact 
ltx_role_affiliation"><span class="ltx_text ltx_affiliation_institution" id="id10.3.id1">George Mason University</span><span class="ltx_text ltx_affiliation_city" id="id11.4.id2">Fairfax</span><span class="ltx_text ltx_affiliation_state" id="id12.5.id3">VA</span><span class="ltx_text ltx_affiliation_country" id="id13.6.id4">USA</span> </span></span></span> <span class="ltx_author_before">, </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Oscar Hernandez </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_email"><a href="mailto:oscar@ornl.gov">oscar@ornl.gov</a> </span> <span class="ltx_contact ltx_role_affiliation"><span class="ltx_text ltx_affiliation_institution" id="id14.1.id1">Oak Ridge National Laboratory</span><span class="ltx_text ltx_affiliation_city" id="id15.2.id2">Oak Ridge</span><span class="ltx_text ltx_affiliation_state" id="id16.3.id3">TN</span><span class="ltx_text ltx_affiliation_country" id="id17.4.id4">USA</span> </span></span></span> <span class="ltx_author_before"> and </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Keren Zhou </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_email"><a href="mailto:kzhou6@gmu.edu">kzhou6@gmu.edu</a> </span> <span class="ltx_contact ltx_role_affiliation"><span class="ltx_text ltx_affiliation_institution" id="id18.1.id1">George Mason University</span><span class="ltx_text ltx_affiliation_city" id="id19.2.id2">Fairfax</span><span class="ltx_text ltx_affiliation_state" id="id20.3.id3">VA</span><span class="ltx_text ltx_affiliation_country" id="id21.4.id4">USA</span> </span></span></span> </div> <div class="ltx_abstract"> <h6 class="ltx_title ltx_title_abstract">Abstract.</h6> <p class="ltx_p" id="id22.id1">Large Language Models (LLMs) have emerged as powerful tools for software development tasks such as code completion, translation, and optimization. 
However, their ability to generate efficient and correct code, particularly in complex High-Performance Computing (HPC) contexts, has remained underexplored. To address this gap, this paper presents a comprehensive benchmark suite encompassing multiple critical HPC computational motifs to evaluate the performance of code optimized by state-of-the-art LLMs, including OpenAI o1, Claude-3.5, and Llama-3.2. In addition to analyzing basic computational kernels, we developed an agent system that integrates LLMs to assess their effectiveness in real HPC applications. Our evaluation focused on key criteria such as execution time, correctness, and understanding of HPC-specific concepts. We also compared the results with those achieved using traditional HPC optimization tools. Based on the findings, we recognized the strengths of LLMs in understanding human instructions and performing automated code transformations. However, we also identified significant limitations, including their tendency to generate incorrect code and their challenges in comprehending complex control and data flows in sophisticated HPC code.</p> </div> <div class="ltx_keywords">Large language models, High-performance computing, Performance benchmarking, Code optimization </div> <div class="ltx_acknowledgements"> <sup class="ltx_sup" id="id23.id1">∗</sup>Equal contribution. </div> <section class="ltx_section" id="S1"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">1. 
</span>Introduction</h2> <div class="ltx_para" id="S1.p1"> <p class="ltx_p" id="S1.p1.1">The rise of Large Language Models (LLMs) <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib49" title="">chatgpt-4, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib9" title="">claude, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib14" title="">gpt-3, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib2" title="">Llama3.2, </a>)</cite> has introduced advanced capabilities to a wide range of applications, such as translation tools <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib79" title="">Zhu2023MultilingualMT, </a>)</cite>, search engines <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib61" title="">Spatharioti2023ComparingTA, </a>)</cite>, and recommendation systems <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib4" title="">llm_recommendation, </a>)</cite>. 
Notably, the coding capabilities of LLMs <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib18" title="">Evauating_LLMs_Trained_on_Code, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib76" title="">codegeex, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib8" title="">codewhisperer, </a>)</cite> have garnered considerable attention from both academia and industry, particularly in tasks such as code completion <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib55" title="">codellama, </a>)</cite>, summarization <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib6" title="">Ahmed2022FewshotTL, </a>)</cite>, and repair <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib69" title="">Xia2022LessTM, </a>)</cite>. LLMs are trained on vast datasets <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib37" title="">stack, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib30" title="">pile, </a>)</cite> that cover a diverse range of programming languages and application domains, enabling them to generate code that effectively aligns with the intention conveyed through natural language inputs. 
Meanwhile, several studies have proposed benchmarks <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib11" title="">Austin2021ProgramSW, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib33" title="">Hendrycks2021MeasuringCC, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib18" title="">Evauating_LLMs_Trained_on_Code, </a>)</cite> for evaluating the capabilities of LLMs in coding.</p> </div> <div class="ltx_para" id="S1.p2"> <p class="ltx_p" id="S1.p2.1">As LLMs gain widespread popularity, the High-Performance Computing (HPC) community has been assessing and exploring their potential to address various HPC challenges <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib25" title="">HPC, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib44" title="">Nichols2024CanLL, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib16" title="">Chen2024OMPGPTAG, </a>)</cite>. Existing benchmarks predominantly examine whether LLMs can solve problems under human guidance with example solutions (i.e., multiple shots in-context learning <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib48" title="">Olsson2022IncontextLA, </a>)</cite>) and tend to evaluate general rather than domain-specific questions. There is a dearth of studies <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib47" title="">niu2024evaluatingefficiencysourcecode, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib53" title="">Qiu2024HowEI, </a>)</cite> investigating the efficiency (i.e., performance) of LLM-generated code, particularly in the context of complex real-world problems. 
Unlike simple benchmarks, HPC code is inherently more complex, requiring not only a deep understanding of hardware, algorithms, and programming languages but also the ability to navigate large code bases and leverage cross-domain knowledge. Optimizations such as vectorization, parallelization, loop transformation, and data prefetching demand intricate program analysis and an understanding of multi-level parallelism models to effectively address performance bottlenecks. Frequently, there exists no straightforward reference solution available for code optimization.</p> </div> <div class="ltx_para" id="S1.p3"> <p class="ltx_p" id="S1.p3.1">This complexity poses significant challenges for the application of LLMs in HPC tasks. Existing models often lack the specialized knowledge and program analysis capabilities necessary to handle these intricacies, making it difficult to achieve the desired level of optimization for HPC applications. Traditional performance optimization tools <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib5" title="">hpctookit, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib57" title="">Shende2006TheTP, </a>)</cite> have played a pivotal role within the HPC community. Some of these tools <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib26" title="">MAQAO, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib28" title="">KBS-MAQAO, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib20" title="">Codee, </a>)</cite> use static analysis to propose optimizations and occasionally automate code transformations, demonstrating capabilities similar to LLMs. However, traditional tools face multiple constraints as they rely on heuristic methods, are restricted to limited instances for automatic code transformations, and support only some frontend languages. 
The distinctions between traditional optimization tools and LLMs inspire our investigation into their impact on HPC benchmarks, with the goal of <span class="ltx_text ltx_font_bold" id="S1.p3.1.1">understanding the benefits and limitations of these tools and offering insight into advancing performance optimization by integrating LLMs with traditional performance tools</span>.</p> </div> <div class="ltx_para" id="S1.p4"> <p class="ltx_p" id="S1.p4.1">In this research, we curated a benchmark suite comprising 26 representative HPC codes across 11 distinct domains commonly found in HPC problems. The benchmarks are categorized into three levels based on the complexity and the size of the program. We evaluated and compared the performance of code optimized by Codee <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib20" title="">Codee, </a>)</cite> with that optimized by three leading LLMs, OpenAI o1 <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib1" title="">GPT-4, </a>)</cite>, Llama-3.2 <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib2" title="">Llama3.2, </a>)</cite>, and Claude-3.5 <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib9" title="">claude, </a>)</cite>. We selected Codee as the representative traditional performance tool due to its capability to automate performance analysis and optimization, provision of in-depth insights into HPC code optimization, and wide adoption <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib42" title="">codee_nersc, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib54" title="">richardson2024optimizing, </a>)</cite> by industry and multiple research institutions. 
Using our benchmark suite, we evaluated the capabilities of LLMs and Codee from the following perspectives:</p> <ul class="ltx_itemize" id="S1.I1"> <li class="ltx_item" id="S1.I1.i1" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S1.I1.i1.p1"> <p class="ltx_p" id="S1.I1.i1.p1.1"><span class="ltx_text ltx_font_bold" id="S1.I1.i1.p1.1.1">Speedups</span>: Assessing the improvement in execution time achieved through optimization;</p> </div> </li> <li class="ltx_item" id="S1.I1.i2" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S1.I1.i2.p1"> <p class="ltx_p" id="S1.I1.i2.p1.1"><span class="ltx_text ltx_font_bold" id="S1.I1.i2.p1.1.1">Scalability (parallelism)</span>: Measuring the effectiveness of optimizations in scaling across multiple CPU cores and compute nodes;</p> </div> </li> <li class="ltx_item" id="S1.I1.i3" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S1.I1.i3.p1"> <p class="ltx_p" id="S1.I1.i3.p1.1"><span class="ltx_text ltx_font_bold" id="S1.I1.i3.p1.1.1">Correctness</span>: Checking if optimized code outputs the same results as the original code;</p> </div> </li> <li class="ltx_item" id="S1.I1.i4" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S1.I1.i4.p1"> <p class="ltx_p" id="S1.I1.i4.p1.1"><span class="ltx_text ltx_font_bold" id="S1.I1.i4.p1.1.1">HPC Commonsense</span>: Evaluating the models’ understanding of domain-specific best practices and principles in HPC;</p> </div> </li> <li class="ltx_item" id="S1.I1.i5" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S1.I1.i5.p1"> <p class="ltx_p" id="S1.I1.i5.p1.1"><span class="ltx_text ltx_font_bold" id="S1.I1.i5.p1.1.1">Applicability to Applications</span>: Investigating the models’ ability to handle complex HPC applications</p> </div> </li> 
</ul> <p class="ltx_p" id="S1.p4.2">In contrast to existing work <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib64" title="">ValeroLara2023ComparingLA, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib44" title="">Nichols2024CanLL, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib45" title="">Nichols2024PerformanceAlignedLF, </a>)</cite>, this study is the first to investigate the differences between LLMs and state-of-the-art traditional performance optimization tools.</p> </div> <div class="ltx_para" id="S1.p5"> <p class="ltx_p" id="S1.p5.1">Moreover, we propose a performance optimization agent <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib66" title="">wang2024survey, </a>)</cite> for optimizing real HPC applications, integrating insights from both LLMs and traditional performance tools to replicate human optimization of HPC code. This process typically begins with an existing reference implementation and proceeds with the identification of performance hotspots. Subsequently, various optimizations are applied to achieve speedup and scalability. The process is repeated iteratively until the desired speedups are achieved.</p> </div> <div class="ltx_para" id="S1.p6"> <p class="ltx_p" id="S1.p6.1">The rest of the paper is organized as follows. Section <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S2" title="2. Related Work ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">2</span></a> presents related topics. Section <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S3" title="3. Benchmark Suite ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">3</span></a> describes our benchmark suite in detail. 
Section <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4" title="4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">4</span></a> introduces the evaluation workflow and discusses the evaluation results. Section <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5" title="5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">5</span></a> describes our prototype agent environment and results from applying it to four HPC applications. We summarize the key findings and rank capabilities of LLMs and Codee in Section <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S6" title="6. Discussion ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">6</span></a>. Section <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S7" title="7. Conclusions ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">7</span></a> concludes and presents future directions.</p> </div> </section> <section class="ltx_section" id="S2"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">2. </span>Related Work</h2> <div class="ltx_para" id="S2.p1"> <p class="ltx_p" id="S2.p1.1">This section reviews recent studies in related fields to provide background knowledge and highlight the significance of our research.</p> </div> <section class="ltx_paragraph" id="S2.SS0.SSS0.Px1"> <h5 class="ltx_title ltx_title_paragraph">LLMs for Code</h5> <div class="ltx_para" id="S2.SS0.SSS0.Px1.p1"> <p class="ltx_p" id="S2.SS0.SSS0.Px1.p1.1">LLMs can generate code based on human prompts and code snippets. 
Typically, code-specific LLMs are trained on general, large-scale datasets <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib35" title="">jiang2024surveylargelanguagemodels, </a>)</cite> and further fine-tuned using smaller, code-related datasets. These datasets typically consist of prompts describing the intended task, paired with example inputs and outputs, often in diff-based formats. Reinforcement Learning from Human Feedback (RLHF) <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib50" title="">ouyang2022traininglanguagemodelsfollow, </a>)</cite> enhances LLMs’ capability to produce syntactically and functionally correct code, with models such as CodeRL <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib39" title="">NEURIPS2022_8636419d, </a>)</cite> and PPOCoder <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib60" title="">shojaee2023executionbasedcodegenerationusing, </a>)</cite> demonstrating significant advancements. Furthermore, prompt engineering techniques, including Chain-of-Thought (CoT) <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib68" title="">wei2023chainofthoughtpromptingelicitsreasoning, </a>)</cite> and Self-Debugging <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib19" title="">chen2023teachinglargelanguagemodels, </a>)</cite>, facilitate iterative refinements for in-depth code analysis. 
Retrieval-augmented generation (RAG) addresses the challenges of repository-level code generation by incorporating cross-file context <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib74" title="">zhang2023repocoderrepositorylevelcodecompletion, </a>)</cite>. Our study evaluates three state-of-the-art LLMs that achieve exceptional results in existing code-related tasks <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib10" title="">lmarena, </a>)</cite>.</p> </div> </section> <section class="ltx_paragraph" id="S2.SS0.SSS0.Px2"> <h5 class="ltx_title ltx_title_paragraph">LLM Benchmarks</h5> <div class="ltx_para" id="S2.SS0.SSS0.Px2.p1"> <p class="ltx_p" id="S2.SS0.SSS0.Px2.p1.1">A range of benchmarks <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib17" title="">chen2021evaluating, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib35" title="">jiang2024surveylargelanguagemodels, </a>)</cite> has been developed to evaluate the code generation capabilities of LLMs. Most studies primarily focus on assessing the correctness of the generated code. Similarity-based metrics such as BLEU <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib51" title="">Papineni2002BleuAM, </a>)</cite>, ROUGE <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib41" title="">lin-2004-rouge, </a>)</cite>, and METEOR <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib12" title="">banerjee-lavie-2005-meteor, </a>)</cite> often fall short in capturing the syntactic and functional correctness of code. 
Consequently, execution-based metrics, such as pass@k <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib18" title="">Evauating_LLMs_Trained_on_Code, </a>)</cite>, have gained traction for their ability to directly evaluate functional validity by running unit tests on generated code. In addition, some studies capture qualitative aspects such as code style and maintainability <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib35" title="">jiang2024surveylargelanguagemodels, </a>)</cite>. Existing benchmarks <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib43" title="">nichols2024can, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib46" title="">nichols2024performance, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib47" title="">niu2024evaluatingefficiencysourcecode, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib29" title="">fang2024towards, </a>)</cite> for evaluating the efficiency of LLM-generated code often focus on narrow aspects, such as the ability to parallelize code, without thoroughly examining the models’ understanding of broader efficiency-related challenges. Furthermore, prior studies mostly analyze small computational kernels, neglecting larger applications. 
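The pass@k metric mentioned above is commonly computed with an unbiased estimator: generate n samples per problem, count the c samples that pass all unit tests, and estimate the probability that at least one of k randomly drawn samples is correct. A minimal sketch (function name is illustrative, not from any specific benchmark harness):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    i.e., the probability that a random k-subset of the n generated
    samples contains at least one of the c correct ones."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill a k-subset
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 generations per problem, 40 pass the unit tests
print(pass_at_k(200, 40, 10))  # estimated pass@10
```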
In contrast, our study provides a more comprehensive evaluation by assessing the efficiency, correctness, and HPC-specific expertise of LLMs for both small kernels and large applications.</p> </div> </section> <section class="ltx_paragraph" id="S2.SS0.SSS0.Px3"> <h5 class="ltx_title ltx_title_paragraph">LLM Agents</h5> <div class="ltx_para" id="S2.SS0.SSS0.Px3.p1"> <p class="ltx_p" id="S2.SS0.SSS0.Px3.p1.1">An LLM agent <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib66" title="">wang2024survey, </a>)</cite> is a system designed to utilize LLMs for solving problems by generating a <span class="ltx_text ltx_font_italic" id="S2.SS0.SSS0.Px3.p1.1.1">plan</span>, executing the plan through external <span class="ltx_text ltx_font_italic" id="S2.SS0.SSS0.Px3.p1.1.2">tools</span>, gathering feedback from the <span class="ltx_text ltx_font_italic" id="S2.SS0.SSS0.Px3.p1.1.3">environment</span>, and storing information in <span class="ltx_text ltx_font_italic" id="S2.SS0.SSS0.Px3.p1.1.4">memory</span> to refine subsequent steps. It extends the core functionality of LLMs by enabling dynamic interaction with external environments. Several frameworks <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib31" title="">significantgravitas2023autogpt, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib67" title="">wang2024openhands, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib24" title="">langchain, </a>)</cite> have been developed to facilitate the creation of such agents <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib7" title="">cognition2024devin, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib34" title="">cursor, </a>)</cite>. 
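The plan/tools/environment/memory cycle just described can be summarized as a simple control loop. The sketch below uses illustrative placeholder names (query_llm, TOOLS) rather than any particular framework’s API:

```python
def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call; returns the name of a tool to invoke."""
    return "profile"  # e.g., the model plans to profile the code first

# External tools the agent may execute; stubs standing in for real
# profilers, compilers, and test runners.
TOOLS = {
    "profile": lambda: "hotspot: matmul loop, 82% of runtime",
    "compile_and_run": lambda: "speedup: 1.7x, tests passed",
}

def agent_loop(goal: str, max_steps: int = 3) -> list[str]:
    memory: list[str] = []  # feedback accumulated across steps
    for _ in range(max_steps):
        prompt = f"Goal: {goal}\nMemory: {memory}\nTools: {list(TOOLS)}"
        action = query_llm(prompt)        # plan: pick the next tool
        observation = TOOLS[action]()     # act: execute via the environment
        memory.append(f"{action} -> {observation}")  # store feedback
    return memory
```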
To enhance solution accuracy, these agents <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib75" title="">zhang2024codeagent, </a>)</cite> often employ advanced reasoning strategies, such as Chain-of-Thought <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib68" title="">wei2023chainofthoughtpromptingelicitsreasoning, </a>)</cite>, Tree-of-Thought <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib71" title="">yao2024tree, </a>)</cite>, and ReAct <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib72" title="">yao2022react, </a>)</cite>, to determine the most effective invocation strategies. Additionally, recent research <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib36" title="">kim2023llm, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib59" title="">shinn2024reflexion, </a>)</cite> has explored optimizing the efficiency and accuracy of the agent workflow. Unlike previous code agents focused on software development, we developed a prototype agent system aimed at evaluating the effectiveness of using LLMs for performance optimization of large applications. </p> </div> </section> <section class="ltx_paragraph" id="S2.SS0.SSS0.Px4"> <h5 class="ltx_title ltx_title_paragraph">Traditional Performance Analysis Tools</h5> <div class="ltx_para" id="S2.SS0.SSS0.Px4.p1"> <p class="ltx_p" id="S2.SS0.SSS0.Px4.p1.1">In the field of HPC, performance analysis tools play a critical role in identifying and resolving bottlenecks to optimize application efficiency across single and multiple compute nodes. These tools are generally classified into two main categories: dynamic and static tools. 
<span class="ltx_text ltx_font_italic" id="S2.SS0.SSS0.Px4.p1.1.1">Dynamic tools</span> <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib5" title="">hpctookit, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib58" title="">shende2006tau, </a>)</cite> operate at runtime, collecting performance metrics such as execution time, memory usage, and processor utilization and attributing metrics to source files to identify hotspots. <span class="ltx_text ltx_font_italic" id="S2.SS0.SSS0.Px4.p1.1.2">Static tools</span> <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib27" title="">djoudi2005maqao, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib20" title="">Codee, </a>)</cite> analyze source code and compilation databases to identify inefficiencies without executing the program. Recent advancements <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib77" title="">zhou2021automated, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib78" title="">zhou2021gpa, </a>)</cite> have introduced hybrid tools that combine static and dynamic analysis techniques to offer deeper insights. Traditional performance analysis tools typically rely on heuristics and predefined algorithms to detect performance bottlenecks and inefficiencies. While this approach ensures precision—static tools apply rigorous algorithms, and dynamic tools capture empirical data from real execution—such methods may lack the flexibility needed to address the complex and evolving challenges in modern HPC environments. In contrast, LLMs offer enhanced adaptability by modifying code and incorporating additional information through retrieval-augmented generation (RAG). 
However, it is crucial to carefully validate the outputs of LLMs to ensure correctness and effectiveness in HPC workflows.</p> </div> <figure class="ltx_table" id="S2.T1"> <figcaption class="ltx_caption ltx_centering" style="font-size:70%;"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S2.T1.4.1.1" style="font-size:129%;">Table 1</span>. </span><span class="ltx_text" id="S2.T1.5.2" style="font-size:129%;">Classification of Benchmarks by Computational Motifs</span></figcaption> <table class="ltx_tabular ltx_centering ltx_align_middle" id="S2.T1.6"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S2.T1.6.1.1"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S2.T1.6.1.1.1"><span class="ltx_text ltx_font_bold" id="S2.T1.6.1.1.1.1" style="font-size:70%;">Computational Motif</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.1.1.2"><span class="ltx_text ltx_font_bold" id="S2.T1.6.1.1.2.1" style="font-size:70%;">Benchmark</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.1.1.3"><span class="ltx_text ltx_font_bold" id="S2.T1.6.1.1.3.1" style="font-size:70%;">Language</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.1.1.4"><span class="ltx_text ltx_font_bold" id="S2.T1.6.1.1.4.1" style="font-size:70%;">Level</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.1.1.5"><span class="ltx_text ltx_font_bold" id="S2.T1.6.1.1.5.1" style="font-size:70%;">Tokens</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r ltx_border_t" id="S2.T1.6.1.1.6"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.1.1.6.1"> <span class="ltx_p" id="S2.T1.6.1.1.6.1.1" style="width:199.2pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.6.1.1.6.1.1.1" style="font-size:70%;">Description</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.2.2"> <td class="ltx_td 
ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S2.T1.6.2.2.1" rowspan="5"><span class="ltx_text" id="S2.T1.6.2.2.1.1" style="font-size:70%;">Dense Linear Algebra</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.2.2.2"> <span class="ltx_text" id="S2.T1.6.2.2.2.1" style="font-size:70%;">Durbin </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.2.2.2.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib52" title="">PolyBenchC-4.2.1<span class="ltx_text" id="S2.T1.6.2.2.2.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.2.2.2.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.2.2.3"><span class="ltx_text" id="S2.T1.6.2.2.3.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.2.2.4"><span class="ltx_text" id="S2.T1.6.2.2.4.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.2.2.5"><span class="ltx_text" id="S2.T1.6.2.2.5.1" style="font-size:70%;">538</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r ltx_border_t" id="S2.T1.6.2.2.6"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.2.2.6.1"> <span class="ltx_p" id="S2.T1.6.2.2.6.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.2.2.6.1.1.1" style="font-size:70%;">Toeplitz system solver, used in signal processing and numerical analysis.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.3.3"> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.3.3.1"> <span class="ltx_text" id="S2.T1.6.3.3.1.1" style="font-size:70%;">Doitgen </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.3.3.1.2.1" style="font-size:70%;">(</span><a class="ltx_ref" 
href="https://arxiv.org/html/2503.13772v1#bib.bib52" title="">PolyBenchC-4.2.1<span class="ltx_text" id="S2.T1.6.3.3.1.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.3.3.1.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.3.3.2"><span class="ltx_text" id="S2.T1.6.3.3.2.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.3.3.3"><span class="ltx_text" id="S2.T1.6.3.3.3.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.3.3.4"><span class="ltx_text" id="S2.T1.6.3.3.4.1" style="font-size:70%;">667</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.3.3.5"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.3.3.5.1"> <span class="ltx_p" id="S2.T1.6.3.3.5.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.3.3.5.1.1.1" style="font-size:70%;">Multi-resolution analysis kernel for wavelet transforms.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.4.4"> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.4.4.1"> <span class="ltx_text" id="S2.T1.6.4.4.1.1" style="font-size:70%;">Cholesky </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.4.4.1.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib52" title="">PolyBenchC-4.2.1<span class="ltx_text" id="S2.T1.6.4.4.1.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.4.4.1.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.4.4.2"><span class="ltx_text" id="S2.T1.6.4.4.2.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.4.4.3"><span class="ltx_text" id="S2.T1.6.4.4.3.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center 
ltx_border_r" id="S2.T1.6.4.4.4"><span class="ltx_text" id="S2.T1.6.4.4.4.1" style="font-size:70%;">648</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.4.4.5"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.4.4.5.1"> <span class="ltx_p" id="S2.T1.6.4.4.5.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.4.4.5.1.1.1" style="font-size:70%;">Cholesky decomposition, widely used in dense matrix factorizations.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.5.5"> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.5.5.1"> <span class="ltx_text" id="S2.T1.6.5.5.1.1" style="font-size:70%;">2mm </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.5.5.1.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib52" title="">PolyBenchC-4.2.1<span class="ltx_text" id="S2.T1.6.5.5.1.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.5.5.1.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.5.5.2"><span class="ltx_text" id="S2.T1.6.5.5.2.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.5.5.3"><span class="ltx_text" id="S2.T1.6.5.5.3.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.5.5.4"><span class="ltx_text" id="S2.T1.6.5.5.4.1" style="font-size:70%;">795</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.5.5.5"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.5.5.5.1"> <span class="ltx_p" id="S2.T1.6.5.5.5.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.5.5.5.1.1.1" style="font-size:70%;">Double matrix-matrix multiplications.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.6.6"> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.6.6.1"> 
<span class="ltx_text" id="S2.T1.6.6.6.1.1" style="font-size:70%;">Correlation </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.6.6.1.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib52" title="">PolyBenchC-4.2.1<span class="ltx_text" id="S2.T1.6.6.6.1.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.6.6.1.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.6.6.2"><span class="ltx_text" id="S2.T1.6.6.6.2.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.6.6.3"><span class="ltx_text" id="S2.T1.6.6.6.3.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.6.6.4"><span class="ltx_text" id="S2.T1.6.6.6.4.1" style="font-size:70%;">729</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.6.6.5"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.6.6.5.1"> <span class="ltx_p" id="S2.T1.6.6.6.5.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.6.6.5.1.1.1" style="font-size:70%;">Mean and standard deviation computation for matrix columns.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.7.7"> <td class="ltx_td ltx_border_l ltx_border_r" id="S2.T1.6.7.7.1"></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.7.7.2"> <span class="ltx_text" id="S2.T1.6.7.7.2.1" style="font-size:70%;">MATMUL </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.7.7.2.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib22" title="">codee_performance_demos<span class="ltx_text" id="S2.T1.6.7.7.2.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.7.7.2.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td 
ltx_align_center ltx_border_r" id="S2.T1.6.7.7.3"><span class="ltx_text" id="S2.T1.6.7.7.3.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.7.7.4"><span class="ltx_text" id="S2.T1.6.7.7.4.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.7.7.5"><span class="ltx_text" id="S2.T1.6.7.7.5.1" style="font-size:70%;">764</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.7.7.6"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.7.7.6.1"> <span class="ltx_p" id="S2.T1.6.7.7.6.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.7.7.6.1.1.1" style="font-size:70%;">Basic matrix multiplication.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.8.8"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S2.T1.6.8.8.1" rowspan="4"><span class="ltx_text" id="S2.T1.6.8.8.1.1" style="font-size:70%;">Sparse Linear Algebra</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.8.8.2"> <span class="ltx_text" id="S2.T1.6.8.8.2.1" style="font-size:70%;">Trisolv </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.8.8.2.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib52" title="">PolyBenchC-4.2.1<span class="ltx_text" id="S2.T1.6.8.8.2.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.8.8.2.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.8.8.3"><span class="ltx_text" id="S2.T1.6.8.8.3.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.8.8.4"><span class="ltx_text" id="S2.T1.6.8.8.4.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.8.8.5"><span 
class="ltx_text" id="S2.T1.6.8.8.5.1" style="font-size:70%;">623</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r ltx_border_t" id="S2.T1.6.8.8.6"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.8.8.6.1"> <span class="ltx_p" id="S2.T1.6.8.8.6.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.8.8.6.1.1.1" style="font-size:70%;">Triangular solver, solving sparse triangular systems.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.9.9"> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.9.9.1"> <span class="ltx_text" id="S2.T1.6.9.9.1.1" style="font-size:70%;">Bicg </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.9.9.1.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib52" title="">PolyBenchC-4.2.1<span class="ltx_text" id="S2.T1.6.9.9.1.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.9.9.1.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.9.9.2"><span class="ltx_text" id="S2.T1.6.9.9.2.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.9.9.3"><span class="ltx_text" id="S2.T1.6.9.9.3.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.9.9.4"><span class="ltx_text" id="S2.T1.6.9.9.4.1" style="font-size:70%;">649</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.9.9.5"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.9.9.5.1"> <span class="ltx_p" id="S2.T1.6.9.9.5.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.9.9.5.1.1.1" style="font-size:70%;">BiCGStab linear solver, an iterative method for sparse systems.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.10.10"> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.10.10.1"> <span 
class="ltx_text" id="S2.T1.6.10.10.1.1" style="font-size:70%;">ATMUX </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.10.10.1.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib22" title="">codee_performance_demos<span class="ltx_text" id="S2.T1.6.10.10.1.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.10.10.1.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.10.10.2"><span class="ltx_text" id="S2.T1.6.10.10.2.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.10.10.3"><span class="ltx_text" id="S2.T1.6.10.10.3.1" style="font-size:70%;">2</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.10.10.4"><span class="ltx_text" id="S2.T1.6.10.10.4.1" style="font-size:70%;">2185</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.10.10.5"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.10.10.5.1"> <span class="ltx_p" id="S2.T1.6.10.10.5.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.10.10.5.1.1.1" style="font-size:70%;">Sparse matrix-vector multiplication, often used in physics simulations.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.11.11"> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.11.11.1"> <span class="ltx_text" id="S2.T1.6.11.11.1.1" style="font-size:70%;">NPB_CG </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.11.11.1.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib22" title="">codee_performance_demos<span class="ltx_text" id="S2.T1.6.11.11.1.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.11.11.1.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r" 
id="S2.T1.6.11.11.2"><span class="ltx_text" id="S2.T1.6.11.11.2.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.11.11.3"><span class="ltx_text" id="S2.T1.6.11.11.3.1" style="font-size:70%;">3</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.11.11.4"><span class="ltx_text" id="S2.T1.6.11.11.4.1" style="font-size:70%;">20060</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.11.11.5"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.11.11.5.1"> <span class="ltx_p" id="S2.T1.6.11.11.5.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.11.11.5.1.1.1" style="font-size:70%;">Conjugate gradient solver, an iterative sparse linear algebra algorithm from NAS benchmarks.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.12.12"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S2.T1.6.12.12.1"><span class="ltx_text" id="S2.T1.6.12.12.1.1" style="font-size:70%;">Spectral Methods</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.12.12.2"> <span class="ltx_text" id="S2.T1.6.12.12.2.1" style="font-size:70%;">Deriche </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.12.12.2.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib52" title="">PolyBenchC-4.2.1<span class="ltx_text" id="S2.T1.6.12.12.2.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.12.12.2.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.12.12.3"><span class="ltx_text" id="S2.T1.6.12.12.3.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.12.12.4"><span class="ltx_text" id="S2.T1.6.12.12.4.1" style="font-size:70%;">1</span></td> <td class="ltx_td 
ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.12.12.5"><span class="ltx_text" id="S2.T1.6.12.12.5.1" style="font-size:70%;">864</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r ltx_border_t" id="S2.T1.6.12.12.6"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.12.12.6.1"> <span class="ltx_p" id="S2.T1.6.12.12.6.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.12.12.6.1.1.1" style="font-size:70%;">Edge detection commonly used in image processing.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.13.13"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S2.T1.6.13.13.1" rowspan="2"><span class="ltx_text" id="S2.T1.6.13.13.1.1" style="font-size:70%;">Monte Carlo</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.13.13.2"> <span class="ltx_text" id="S2.T1.6.13.13.2.1" style="font-size:70%;">PI </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.13.13.2.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib22" title="">codee_performance_demos<span class="ltx_text" id="S2.T1.6.13.13.2.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.13.13.2.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.13.13.3"><span class="ltx_text" id="S2.T1.6.13.13.3.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.13.13.4"><span class="ltx_text" id="S2.T1.6.13.13.4.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.13.13.5"><span class="ltx_text" id="S2.T1.6.13.13.5.1" style="font-size:70%;">262</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r ltx_border_t" id="S2.T1.6.13.13.6"> <span class="ltx_inline-block ltx_align_top" 
id="S2.T1.6.13.13.6.1"> <span class="ltx_p" id="S2.T1.6.13.13.6.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.13.13.6.1.1.1" style="font-size:70%;">Monte Carlo integration, involves random sampling for numerical simulations.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.14.14"> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.14.14.1"> <span class="ltx_text" id="S2.T1.6.14.14.1.1" style="font-size:70%;">XSBench </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.14.14.1.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib3" title="">hpc2021<span class="ltx_text" id="S2.T1.6.14.14.1.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.14.14.1.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.14.14.2"><span class="ltx_text" id="S2.T1.6.14.14.2.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.14.14.3"><span class="ltx_text" id="S2.T1.6.14.14.3.1" style="font-size:70%;">3</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.14.14.4"><span class="ltx_text" id="S2.T1.6.14.14.4.1" style="font-size:70%;">9094</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.14.14.5"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.14.14.5.1"> <span class="ltx_p" id="S2.T1.6.14.14.5.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.14.14.5.1.1.1" style="font-size:70%;">Monte Carlo Macroscopic Cross Section Lookup Benchmark.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.15.15"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S2.T1.6.15.15.1"><span class="ltx_text" id="S2.T1.6.15.15.1.1" style="font-size:70%;">Dynamic Programming</span></td> <td class="ltx_td ltx_align_center ltx_border_r 
ltx_border_t" id="S2.T1.6.15.15.2"> <span class="ltx_text" id="S2.T1.6.15.15.2.1" style="font-size:70%;">Nussinov </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.15.15.2.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib52" title="">PolyBenchC-4.2.1<span class="ltx_text" id="S2.T1.6.15.15.2.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.15.15.2.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.15.15.3"><span class="ltx_text" id="S2.T1.6.15.15.3.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.15.15.4"><span class="ltx_text" id="S2.T1.6.15.15.4.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.15.15.5"><span class="ltx_text" id="S2.T1.6.15.15.5.1" style="font-size:70%;">2096</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r ltx_border_t" id="S2.T1.6.15.15.6"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.15.15.6.1"> <span class="ltx_p" id="S2.T1.6.15.15.6.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.15.15.6.1.1.1" style="font-size:70%;">Dynamic programming for RNA secondary structure prediction.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.16.16"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S2.T1.6.16.16.1" rowspan="4"><span class="ltx_text" id="S2.T1.6.16.16.1.1" style="font-size:70%;">Structured Grids</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.16.16.2"> <span class="ltx_text" id="S2.T1.6.16.16.2.1" style="font-size:70%;">Adi </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.16.16.2.2.1" style="font-size:70%;">(</span><a class="ltx_ref" 
href="https://arxiv.org/html/2503.13772v1#bib.bib52" title="">PolyBenchC-4.2.1<span class="ltx_text" id="S2.T1.6.16.16.2.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.16.16.2.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.16.16.3"><span class="ltx_text" id="S2.T1.6.16.16.3.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.16.16.4"><span class="ltx_text" id="S2.T1.6.16.16.4.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.16.16.5"><span class="ltx_text" id="S2.T1.6.16.16.5.1" style="font-size:70%;">760</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r ltx_border_t" id="S2.T1.6.16.16.6"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.16.16.6.1"> <span class="ltx_p" id="S2.T1.6.16.16.6.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.16.16.6.1.1.1" style="font-size:70%;">Alternating Direction Implicit method, common in structured grid-based PDEs.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.17.17"> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.17.17.1"> <span class="ltx_text" id="S2.T1.6.17.17.1.1" style="font-size:70%;">Srad </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.17.17.1.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib73" title="">srad<span class="ltx_text" id="S2.T1.6.17.17.1.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.17.17.1.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.17.17.2"><span class="ltx_text" id="S2.T1.6.17.17.2.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.17.17.3"><span 
class="ltx_text" id="S2.T1.6.17.17.3.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.17.17.4"><span class="ltx_text" id="S2.T1.6.17.17.4.1" style="font-size:70%;">669</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.17.17.5"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.17.17.5.1"> <span class="ltx_p" id="S2.T1.6.17.17.5.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.17.17.5.1.1.1" style="font-size:70%;">Speckle reducing anisotropic diffusion for image processing, utilizing structured grids.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.18.18"> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.18.18.1"> <span class="ltx_text" id="S2.T1.6.18.18.1.1" style="font-size:70%;">Hotspot </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.18.18.1.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib15" title="">che2009rodinia<span class="ltx_text" id="S2.T1.6.18.18.1.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.18.18.1.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.18.18.2"><span class="ltx_text" id="S2.T1.6.18.18.2.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.18.18.3"><span class="ltx_text" id="S2.T1.6.18.18.3.1" style="font-size:70%;">2</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.18.18.4"><span class="ltx_text" id="S2.T1.6.18.18.4.1" style="font-size:70%;">1384</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.18.18.5"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.18.18.5.1"> <span class="ltx_p" id="S2.T1.6.18.18.5.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.18.18.5.1.1.1" 
style="font-size:70%;">2D structured grid for simulating heat distribution on a chip.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.19.19"> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.19.19.1"> <span class="ltx_text" id="S2.T1.6.19.19.1.1" style="font-size:70%;">Hotspot3D </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.19.19.1.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib15" title="">che2009rodinia<span class="ltx_text" id="S2.T1.6.19.19.1.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.19.19.1.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.19.19.2"><span class="ltx_text" id="S2.T1.6.19.19.2.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.19.19.3"><span class="ltx_text" id="S2.T1.6.19.19.3.1" style="font-size:70%;">2</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.19.19.4"><span class="ltx_text" id="S2.T1.6.19.19.4.1" style="font-size:70%;">1117</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.19.19.5"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.19.19.5.1"> <span class="ltx_p" id="S2.T1.6.19.19.5.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.19.19.5.1.1.1" style="font-size:70%;">3D heat distribution simulation over a structured grid.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.20.20"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S2.T1.6.20.20.1" rowspan="3"><span class="ltx_text" id="S2.T1.6.20.20.1.1" style="font-size:70%;">N-body Methods</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.20.20.2"> <span class="ltx_text" id="S2.T1.6.20.20.2.1" style="font-size:70%;">COULOMB </span><cite class="ltx_cite 
ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.20.20.2.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib22" title="">codee_performance_demos<span class="ltx_text" id="S2.T1.6.20.20.2.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.20.20.2.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.20.20.3"><span class="ltx_text" id="S2.T1.6.20.20.3.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.20.20.4"><span class="ltx_text" id="S2.T1.6.20.20.4.1" style="font-size:70%;">2</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.20.20.5"><span class="ltx_text" id="S2.T1.6.20.20.5.1" style="font-size:70%;">1709</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r ltx_border_t" id="S2.T1.6.20.20.6"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.20.20.6.1"> <span class="ltx_p" id="S2.T1.6.20.20.6.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.20.20.6.1.1.1" style="font-size:70%;">Coulomb force simulation, typical of N-body problems in molecular dynamics.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.21.21"> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.21.21.1"> <span class="ltx_text" id="S2.T1.6.21.21.1.1" style="font-size:70%;">Particlefilter </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.21.21.1.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib15" title="">che2009rodinia<span class="ltx_text" id="S2.T1.6.21.21.1.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.21.21.1.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.21.21.2"><span class="ltx_text" 
id="S2.T1.6.21.21.2.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.21.21.3"><span class="ltx_text" id="S2.T1.6.21.21.3.1" style="font-size:70%;">2</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.21.21.4"><span class="ltx_text" id="S2.T1.6.21.21.4.1" style="font-size:70%;">2984</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.21.21.5"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.21.21.5.1"> <span class="ltx_p" id="S2.T1.6.21.21.5.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.21.21.5.1.1.1" style="font-size:70%;">Particle filter, commonly used in N-body filtering methods in robotics and control.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.22.22"> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.22.22.1"> <span class="ltx_text" id="S2.T1.6.22.22.1.1" style="font-size:70%;">HACCmk </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.22.22.1.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib22" title="">codee_performance_demos<span class="ltx_text" id="S2.T1.6.22.22.1.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.22.22.1.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.22.22.2"><span class="ltx_text" id="S2.T1.6.22.22.2.1" style="font-size:70%;">C++</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.22.22.3"><span class="ltx_text" id="S2.T1.6.22.22.3.1" style="font-size:70%;">2</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.22.22.4"><span class="ltx_text" id="S2.T1.6.22.22.4.1" style="font-size:70%;">3266</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.22.22.5"> <span class="ltx_inline-block ltx_align_top" 
id="S2.T1.6.22.22.5.1"> <span class="ltx_p" id="S2.T1.6.22.22.5.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.22.22.5.1.1.1" style="font-size:70%;">Cosmological simulation using particle-mesh methods for N-body computations.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.23.23"> <td class="ltx_td ltx_border_l ltx_border_r ltx_border_t" id="S2.T1.6.23.23.1"></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.23.23.2"> <span class="ltx_text" id="S2.T1.6.23.23.2.1" style="font-size:70%;">Jacobi-1d </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.23.23.2.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib52" title="">PolyBenchC-4.2.1<span class="ltx_text" id="S2.T1.6.23.23.2.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.23.23.2.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.23.23.3"><span class="ltx_text" id="S2.T1.6.23.23.3.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.23.23.4"><span class="ltx_text" id="S2.T1.6.23.23.4.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.6.23.23.5"><span class="ltx_text" id="S2.T1.6.23.23.5.1" style="font-size:70%;">564</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r ltx_border_t" id="S2.T1.6.23.23.6"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.23.23.6.1"> <span class="ltx_p" id="S2.T1.6.23.23.6.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.23.23.6.1.1.1" style="font-size:70%;">1-D Jacobi stencil computation.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.24.24"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r" id="S2.T1.6.24.24.1"><span class="ltx_text" 
id="S2.T1.6.24.24.1.1" style="font-size:70%;">Stencils</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.24.24.2"> <span class="ltx_text" id="S2.T1.6.24.24.2.1" style="font-size:70%;">LBM D2Q37 </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S2.T1.6.24.24.2.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib3" title="">hpc2021<span class="ltx_text" id="S2.T1.6.24.24.2.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.24.24.2.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.24.24.3"><span class="ltx_text" id="S2.T1.6.24.24.3.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.24.24.4"><span class="ltx_text" id="S2.T1.6.24.24.4.1" style="font-size:70%;">3</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S2.T1.6.24.24.5"><span class="ltx_text" id="S2.T1.6.24.24.5.1" style="font-size:70%;">27364</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_r" id="S2.T1.6.24.24.6"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.24.24.6.1"> <span class="ltx_p" id="S2.T1.6.24.24.6.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.24.24.6.1.1.1" style="font-size:70%;">Lattice Boltzmann Method (LBM) for fluid dynamics, uses stencil operations for grid updates.</span></span> </span> </td> </tr> <tr class="ltx_tr" id="S2.T1.6.25.25"> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_l ltx_border_r ltx_border_t" id="S2.T1.6.25.25.1"><span class="ltx_text" id="S2.T1.6.25.25.1.1" style="font-size:70%;">Radiation Transport</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S2.T1.6.25.25.2"> <span class="ltx_text" id="S2.T1.6.25.25.2.1" style="font-size:70%;">Minisweep </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" 
id="S2.T1.6.25.25.2.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib3" title="">hpc2021<span class="ltx_text" id="S2.T1.6.25.25.2.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S2.T1.6.25.25.2.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S2.T1.6.25.25.3"><span class="ltx_text" id="S2.T1.6.25.25.3.1" style="font-size:70%;">C</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S2.T1.6.25.25.4"><span class="ltx_text" id="S2.T1.6.25.25.4.1" style="font-size:70%;">3</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S2.T1.6.25.25.5"><span class="ltx_text" id="S2.T1.6.25.25.5.1" style="font-size:70%;">10478</span></td> <td class="ltx_td ltx_align_justify ltx_align_top ltx_border_b ltx_border_r ltx_border_t" id="S2.T1.6.25.25.6"> <span class="ltx_inline-block ltx_align_top" id="S2.T1.6.25.25.6.1"> <span class="ltx_p" id="S2.T1.6.25.25.6.1.1" style="width:199.2pt;"><span class="ltx_text" id="S2.T1.6.25.25.6.1.1.1" style="font-size:70%;">Solves transport equations for radiation, typically used in nuclear engineering.</span></span> </span> </td> </tr> </tbody> </table> </figure> </section> </section> <section class="ltx_section" id="S3"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">3. </span>Benchmark Suite</h2> <div class="ltx_para" id="S3.p1"> <p class="ltx_p" id="S3.p1.1">In this section, we outline the design rationale behind our benchmark suite, summarized in Table <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S2.T1" title="Table 1 ‣ Traditional Performance Analysis Tools ‣ 2. Related Work ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">1</span></a>. 
The suite encompasses a diverse set of motifs, including Dense and Sparse Linear Algebra, Spectral Methods, Monte Carlo, Dynamic Programming, Structured Grids, N-body Methods, Stencils, and Radiation Transport, which represent the core computational patterns frequently encountered in HPC applications.</p> </div> <div class="ltx_para" id="S3.p2"> <p class="ltx_p" id="S3.p2.1">Our benchmarks range in scale from small, representative compute kernels, such as basic matrix multiplication (<span class="ltx_text ltx_font_italic" id="S3.p2.1.1">MATMUL</span>) and chain matrix multiplication (<span class="ltx_text ltx_font_italic" id="S3.p2.1.2">2mm</span>), to large-scale applications like <span class="ltx_text ltx_font_italic" id="S3.p2.1.3">NPB_CG</span> and <span class="ltx_text ltx_font_italic" id="S3.p2.1.4">LBM D2Q37</span>. These benchmarks are categorized into three levels of increasing code size and complexity: level 1 represents the simplest benchmarks, while level 3 encompasses the most complex. This hierarchical structure enables a comprehensive evaluation of how effectively LLMs and traditional performance tools can tackle foundational computational tasks as well as domain-specific challenges requiring advanced algorithmic and hardware knowledge. For smaller benchmarks, both LLMs and traditional code analysis tools are capable of analyzing complete code files. However, for larger applications, LLMs may face output token length limitations, and traditional tools may generate many false-positive suggestions. To address these challenges, we employ profilers, such as HPCToolkit <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib5" title="">hpctookit, </a>)</cite>, to identify performance-critical hotspots, focusing optimization efforts on the most computationally intensive code sections. 
The benchmarks span multiple programming languages, primarily C and C++, emphasizing the most common coding environments in the HPC community. To prepare the benchmarks for evaluation, we eliminated any existing OpenMP parallel pragmas. Additionally, we fully expanded C/C++ macros to avoid potential misinterpretations by LLMs and limitations of traditional tools.</p> </div> <div class="ltx_para" id="S3.p3"> <p class="ltx_p" id="S3.p3.1">In summary, this multi-level, multi-domain design enables systematic evaluation of the capabilities of LLMs and traditional tools in addressing detailed optimization challenges in small benchmarks and usability issues in large-scale applications.</p> </div> </section> <section class="ltx_section" id="S4"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">4. </span>Evaluation</h2> <div class="ltx_para" id="S4.p1"> <p class="ltx_p" id="S4.p1.1">We describe our evaluation framework, illustrated in Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.F1" title="Figure 1 ‣ 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">1</span></a>, along with a series of experiments conducted with it on the level 1 and level 2 benchmarks. It begins by instructing four different tools—Codee, OpenAI o1, Llama-3.2, and Claude-3.5—to optimize existing code written in C/C++. Note that all of these tools accept static code and provide performance optimization suggestions without executing it.</p> </div> <figure class="ltx_figure" id="S4.F1"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="361" id="S4.F1.g1" src="x1.png" width="623"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S4.F1.2.1.1" style="font-size:90%;">Figure 1</span>. 
</span><span class="ltx_text" id="S4.F1.3.2" style="font-size:90%;">An Overview of our evaluation framework.</span></figcaption> </figure> <figure class="ltx_figure" id="S4.F2"> <figure class="ltx_float ltx_lstlisting ltx_align_center" id="S4.F2.tab1"> <div class="ltx_listing ltx_lst_language_Python ltx_lstlisting ltx_framed ltx_framed_rectangle ltx_listing" id="S4.F2.tab1.1" style="background-color:#F2F2EB;"> <div class="ltx_listing_data"><a download="codee_commands.tex" href="data:text/plain;base64,Y29kZWUgY2hlY2tzIC0tdmVyYm9zZSAgLS10YXJnZXQtYXJjaCBjcHUgLS0gZ2NjIC1jIG1haW4uYyAtSSBpbmNsdWRlLyAtTzMKRGF0ZTogMjAyNS0wMS0wOSBDb2RlZSB2ZXJzaW9uOiAyMDI0LjQuMiBMaWNlbnNlIHR5cGU6IEZ1bGwKQ29tcGlsZXIgaW52b2NhdGlvbjogZ2NjIC1jIG1haW4uYyAtSSBpbmNsdWRlLyAtTzMKWzEvMV0gbWFpbi5jIC4uLiBEb25lCkNIRUNLUyBSRVBPUlQKbWFpbi5jOjE2OjkgW1BXUjAzOV0gKGxldmVsOiBMMSk6IENvbnNpZGVyIGxvb3AgaW50ZXJjaGFuZ2UgdG8gaW1wcm92ZSB0aGUgbG9jYWxpdHkgb2YgcmVmZXJlbmNlIGFuZCBlbmFibGUgdmVjdG9yaXphdGlvbgogIExvb3BzIHRvIGludGVyY2hhbmdlOgogICAgMTY6ICAgICAgICAgZm9yIChzaXplX3QgaiA9IDA7IGogPCBuOyBqKyspIHsKICAgIDE3OiAgICAgICAgICAgICBmb3IgKHNpemVfdCBrID0gMDsgayA8IHA7IGsrKykgewogIFN1Z2dlc3Rpb246IEludGVyY2hhbmdlIGlubmVyIGFuZCBvdXRlciBsb29wcyBpbiB0aGUgbG9vcCBuZXN0IHRvIGltcHJvdmUgcGVyZm9ybWFuY2UKICBEb2N1bWVudGF0aW9uOiBodHRwczovL2dpdGh1Yi5jb20vY29kZWUtY29tL29wZW4tY2F0YWxvZy90cmVlL21haW4vQ2hlY2tzL1BXUjAzOQogIEF1dG9GaXg6IGNvZGVlIHJld3JpdGUgLS1tZW1vcnkgbG9vcC1pbnRlcmNoYW5nZSAtLWluLXBsYWNlIG1haW4uYzoxNjo5IC0tIGdjYyAtYyBtYWluLmMgLUkgaW5jbHVkZS8gLU8zCg==">⬇</a></div> <div class="ltx_listingline" id="lstnumberx1"> <span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx1.1" style="font-size:50%;">codee</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx1.2" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx1.3" style="font-size:50%;">checks</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx1.4" 
style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx1.5" style="font-size:50%;">--</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx1.6" style="font-size:50%;">verbose</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx1.7" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx1.8" style="font-size:50%;">--</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx1.9" style="font-size:50%;">target</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx1.10" style="font-size:50%;">-</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx1.11" style="font-size:50%;">arch</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx1.12" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx1.13" style="font-size:50%;">cpu</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx1.14" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx1.15" style="font-size:50%;">--</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx1.16" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx1.17" style="font-size:50%;">gcc</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx1.18" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx1.19" style="font-size:50%;">-</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx1.20" style="font-size:50%;">c</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx1.21" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx1.22" style="font-size:50%;">main</span><span class="ltx_text 
ltx_font_typewriter" id="lstnumberx1.23" style="font-size:50%;">.</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx1.24" style="font-size:50%;">c</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx1.25" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx1.26" style="font-size:50%;">-</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx1.27" style="font-size:50%;">I</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx1.28" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx1.29" style="font-size:50%;">include</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx1.30" style="font-size:50%;">/</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx1.31" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx1.32" style="font-size:50%;">-</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx1.33" style="font-size:50%;">O3</span> </div> <div class="ltx_listingline" id="lstnumberx2"> <span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx2.1" style="font-size:50%;">Date</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx2.2" style="font-size:50%;">:</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx2.3" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx2.4" style="font-size:50%;">2025-01-09</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx2.5" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx2.6" style="font-size:50%;">Codee</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx2.7" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier 
ltx_font_typewriter" id="lstnumberx2.8" style="font-size:50%;">version</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx2.9" style="font-size:50%;">:</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx2.10" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx2.11" style="font-size:50%;">2024.4.2</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx2.12" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx2.13" style="font-size:50%;">License</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx2.14" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_keyword ltx_lst_keywords2 ltx_font_typewriter" id="lstnumberx2.15" style="font-size:50%;color:#FF00FF;">type</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx2.16" style="font-size:50%;">:</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx2.17" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx2.18" style="font-size:50%;">Full</span> </div> <div class="ltx_listingline" id="lstnumberx3"> <span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx3.1" style="font-size:50%;">Compiler</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx3.2" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx3.3" style="font-size:50%;">invocation</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx3.4" style="font-size:50%;">:</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx3.5" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx3.6" style="font-size:50%;">gcc</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx3.7" style="font-size:50%;"> 
</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx3.8" style="font-size:50%;">-</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx3.9" style="font-size:50%;">c</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx3.10" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx3.11" style="font-size:50%;">main</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx3.12" style="font-size:50%;">.</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx3.13" style="font-size:50%;">c</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx3.14" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx3.15" style="font-size:50%;">-</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx3.16" style="font-size:50%;">I</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx3.17" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx3.18" style="font-size:50%;">include</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx3.19" style="font-size:50%;">/</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx3.20" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx3.21" style="font-size:50%;">-</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx3.22" style="font-size:50%;">O3</span> </div> <div class="ltx_listingline" id="lstnumberx4"> <span class="ltx_text ltx_font_typewriter" id="lstnumberx4.1" style="font-size:50%;">[1/1]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx4.2" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx4.3" style="font-size:50%;">main</span><span class="ltx_text 
ltx_font_typewriter" id="lstnumberx4.4" style="font-size:50%;">.</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx4.5" style="font-size:50%;">c</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx4.6" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx4.7" style="font-size:50%;">...</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx4.8" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx4.9" style="font-size:50%;">Done</span> </div> <div class="ltx_listingline" id="lstnumberx5"> <span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx5.1" style="font-size:50%;">CHECKS</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx5.2" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx5.3" style="font-size:50%;">REPORT</span> </div> <div class="ltx_listingline" id="lstnumberx6"> <span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.1" style="font-size:50%;">main</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx6.2" style="font-size:50%;">.</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.3" style="font-size:50%;">c</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx6.4" style="font-size:50%;">:16:9</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx6.5" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx6.6" style="font-size:50%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.7" style="font-size:50%;">PWR039</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx6.8" style="font-size:50%;">]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx6.9" style="font-size:50%;"> </span><span 
class="ltx_text ltx_font_typewriter" id="lstnumberx6.10" style="font-size:50%;">(</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.11" style="font-size:50%;">level</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx6.12" style="font-size:50%;">:</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx6.13" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.14" style="font-size:50%;">L1</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx6.15" style="font-size:50%;">):</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx6.16" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.17" style="font-size:50%;">Consider</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx6.18" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.19" style="font-size:50%;">loop</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx6.20" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.21" style="font-size:50%;">interchange</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx6.22" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.23" style="font-size:50%;">to</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx6.24" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.25" style="font-size:50%;">improve</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx6.26" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.27" style="font-size:50%;">the</span><span class="ltx_text 
ltx_lst_space ltx_font_typewriter" id="lstnumberx6.28" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.29" style="font-size:50%;">locality</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx6.30" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.31" style="font-size:50%;">of</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx6.32" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.33" style="font-size:50%;">reference</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx6.34" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx6.35" style="font-size:50%;color:#FF00FF;">and</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx6.36" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.37" style="font-size:50%;">enable</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx6.38" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx6.39" style="font-size:50%;">vectorization</span> </div> <div class="ltx_listingline" id="lstnumberx7"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx7.1" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx7.2" style="font-size:50%;">Loops</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx7.3" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx7.4" style="font-size:50%;">to</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx7.5" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier 
ltx_font_typewriter" id="lstnumberx7.6" style="font-size:50%;">interchange</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx7.7" style="font-size:50%;">:</span> </div> <div class="ltx_listingline" id="lstnumberx8"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx8.1" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx8.2" style="font-size:50%;">16:</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx8.3" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx8.4" style="font-size:50%;color:#FF00FF;">for</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx8.5" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx8.6" style="font-size:50%;">(</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx8.7" style="font-size:50%;">size_t</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx8.8" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx8.9" style="font-size:50%;">j</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx8.10" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx8.11" style="font-size:50%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx8.12" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx8.13" style="font-size:50%;">0;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx8.14" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx8.15" style="font-size:50%;">j</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx8.16" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx8.17" 
style="font-size:50%;"><</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx8.18" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx8.19" style="font-size:50%;">n</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx8.20" style="font-size:50%;">;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx8.21" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx8.22" style="font-size:50%;">j</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx8.23" style="font-size:50%;">++)</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx8.24" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx8.25" style="font-size:50%;">{</span> </div> <div class="ltx_listingline" id="lstnumberx9"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx9.1" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx9.2" style="font-size:50%;">17:</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx9.3" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx9.4" style="font-size:50%;color:#FF00FF;">for</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx9.5" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx9.6" style="font-size:50%;">(</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx9.7" style="font-size:50%;">size_t</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx9.8" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx9.9" style="font-size:50%;">k</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx9.10" style="font-size:50%;"> 
</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx9.11" style="font-size:50%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx9.12" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx9.13" style="font-size:50%;">0;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx9.14" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx9.15" style="font-size:50%;">k</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx9.16" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx9.17" style="font-size:50%;"><</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx9.18" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx9.19" style="font-size:50%;">p</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx9.20" style="font-size:50%;">;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx9.21" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx9.22" style="font-size:50%;">k</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx9.23" style="font-size:50%;">++)</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx9.24" style="font-size:50%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx9.25" style="font-size:50%;">{</span> </div> <div class="ltx_listingline" id="lstnumberx10"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx10.1" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx10.2" style="font-size:50%;">Suggestion</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx10.3" style="font-size:50%;">:</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" 
id="lstnumberx10.4" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx10.5" style="font-size:50%;">Interchange</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx10.6" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx10.7" style="font-size:50%;">inner</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx10.8" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx10.9" style="font-size:50%;color:#FF00FF;">and</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx10.10" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx10.11" style="font-size:50%;">outer</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx10.12" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx10.13" style="font-size:50%;">loops</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx10.14" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx10.15" style="font-size:50%;color:#FF00FF;">in</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx10.16" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx10.17" style="font-size:50%;">the</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx10.18" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx10.19" style="font-size:50%;">loop</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx10.20" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx10.21" 
style="font-size:50%;">nest</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx10.22" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx10.23" style="font-size:50%;">to</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx10.24" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx10.25" style="font-size:50%;">improve</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx10.26" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx10.27" style="font-size:50%;">performance</span> </div> <div class="ltx_listingline" id="lstnumberx11"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx11.1" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx11.2" style="font-size:50%;">Documentation</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx11.3" style="font-size:50%;">:</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx11.4" style="font-size:50%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx11.5" style="font-size:50%;">https</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx11.6" style="font-size:50%;">:</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx11.7" style="font-size:50%;"><span class="ltx_text" id="lstnumberx11.7.1" style="color:#9400D1;">github.com/codee-com/open-catalog/tree/main/Checks/PWR039</span></span> </div> <div class="ltx_listingline" id="lstnumberx12"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx12.1" style="font-size:50%;color:#9400D1;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx12.2" style="font-size:50%;color:#9400D1;">AutoFix:</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" 
id="lstnumberx12.3" style="font-size:50%;color:#9400D1;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx12.4" style="font-size:50%;color:#9400D1;">codee</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx12.5" style="font-size:50%;color:#9400D1;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx12.6" style="font-size:50%;color:#9400D1;">rewrite</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx12.7" style="font-size:50%;color:#9400D1;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx12.8" style="font-size:50%;color:#9400D1;">--memory</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx12.9" style="font-size:50%;color:#9400D1;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx12.10" style="font-size:50%;color:#9400D1;">loop-interchange</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx12.11" style="font-size:50%;color:#9400D1;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx12.12" style="font-size:50%;color:#9400D1;">--in-place</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx12.13" style="font-size:50%;color:#9400D1;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx12.14" style="font-size:50%;color:#9400D1;">main.c:16:9</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx12.15" style="font-size:50%;color:#9400D1;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx12.16" style="font-size:50%;color:#9400D1;">--</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx12.17" style="font-size:50%;color:#9400D1;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx12.18" style="font-size:50%;color:#9400D1;">gcc</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx12.19" style="font-size:50%;color:#9400D1;"> </span><span class="ltx_text 
ltx_font_typewriter" id="lstnumberx12.20" style="font-size:50%;color:#9400D1;">-c</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx12.21" style="font-size:50%;color:#9400D1;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx12.22" style="font-size:50%;color:#9400D1;">main.c</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx12.23" style="font-size:50%;color:#9400D1;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx12.24" style="font-size:50%;color:#9400D1;">-I</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx12.25" style="font-size:50%;color:#9400D1;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx12.26" style="font-size:50%;color:#9400D1;">include/</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx12.27" style="font-size:50%;color:#9400D1;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx12.28" style="font-size:50%;color:#9400D1;">-O3</span> </div> </div> </figure> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S4.F2.3.1.1" style="font-size:90%;">Figure 2</span>. </span><span class="ltx_text" id="S4.F2.4.2" style="font-size:90%;">An example of Codee suggesting optimizations for <span class="ltx_text ltx_font_italic" id="S4.F2.4.2.1">MATMUL</span></span></figcaption> </figure> <div class="ltx_para" id="S4.p2"> <p class="ltx_p" id="S4.p2.1">Codee incorporates LLVM <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib38" title="">lattner2004llvm, </a>)</cite> to apply compiler analysis and identify potential optimizations by correlating analysis results with its performance problem catalog <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib23" title="">codee_open_catalog, </a>)</cite>. 
For each problem, Codee explains why these optimizations are necessary and suggests how they might be implemented. Some optimizations may be automatically applied by Codee, while others may require manual adjustments. For example, in Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.F2" title="Figure 2 ‣ 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">2</span></a>, we show a sample of how Codee suggests an auto-fix enabled loop interchange optimization for <span class="ltx_text ltx_font_italic" id="S4.p2.1.1">MATMUL</span>.</p> </div> <div class="ltx_para" id="S4.p3"> <p class="ltx_p" id="S4.p3.1">For LLMs, we concatenate initial prompts as shown in Box <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4" title="4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">4</span></a> with the related code to guide the optimization process. Note that we do not provide LLMs with any existing optimization examples for a fair comparison with Codee.</p> </div> <div class="ltx_para" id="S4.p4"> <p class="ltx_p" id="S4.p4.1">The optimized code generated by each tool is compiled into executable binaries, which are then tested with representative inputs to assess <span class="ltx_text ltx_font_italic" id="S4.p4.1.1">Code Correctness</span> and <span class="ltx_text ltx_font_italic" id="S4.p4.1.2">Performance Speedup</span>. Moreover, we request each tool to explain the specific optimizations applied, and verify whether the optimization terminology is sensible and consistent with the modifications (i.e., <span class="ltx_text ltx_font_italic" id="S4.p4.1.3">HPC Commonsense</span>).</p> </div> <div class="ltx_para" id="S4.p5"> <p class="ltx_p" id="S4.p5.1">All experiments were performed on a machine equipped with Rocky Linux 8.5 (Green Obsidian) and a single AMD EPYC 7543 32-Core CPU. 
The software we used includes GCC/G++ v14.2.0, CLANG/CLANG++ v19.1.5, and Codee v2024.3.5. Table <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.T2" title="Table 2 ‣ 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">2</span></a> lists the details of the LLMs we used in the experiments.</p> </div> <div class="ltx_para ltx_noindent" id="S4.p6"> <svg class="ltx_picture" height="272.94" id="S4.p6.pic1" overflow="visible" version="1.1" width="600"><g fill="#000000" stroke="#000000" stroke-width="0.4pt" transform="translate(0,272.94) matrix(1 0 0 -1 0 0)"><g fill="#000000" fill-opacity="1.0"><path d="M 0 5.91 L 0 267.03 C 0 270.29 2.64 272.94 5.91 272.94 L 594.09 272.94 C 597.36 272.94 600 270.29 600 267.03 L 600 5.91 C 600 2.64 597.36 0 594.09 0 L 5.91 0 C 2.64 0 0 2.64 0 5.91 Z" style="stroke:none"></path></g><g fill="#F9F9F9" fill-opacity="1.0"><path d="M 1.97 5.91 L 1.97 251.29 L 598.03 251.29 L 598.03 5.91 C 598.03 3.73 596.27 1.97 594.09 1.97 L 5.91 1.97 C 3.73 1.97 1.97 3.73 1.97 5.91 Z" style="stroke:none"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 21.65 257.19)"><foreignobject color="#FFFFFF" height="9.84" overflow="visible" transform="matrix(1 0 0 -1 0 16.6)" width="556.69"> <span class="ltx_inline-block ltx_minipage ltx_align_bottom" id="S4.p6.pic1.1.1.1.1.1" style="width:402.3pt;"> <span class="ltx_p" id="S4.p6.pic1.1.1.1.1.1.1"><span class="ltx_text" id="S4.p6.pic1.1.1.1.1.1.1.1" style="font-size:80%;">Prompt template for EX1 & EX2 & EX3</span>.</span> </span></foreignobject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 21.65 13.78)"><foreignobject color="#000000" height="225.7" overflow="visible" transform="matrix(1 0 0 -1 0 16.6)" width="556.69"> <span class="ltx_inline-block ltx_minipage ltx_align_bottom" id="S4.p6.pic1.2.2.2.1.1" style="width:402.3pt;"> <span class="ltx_p" id="S4.p6.pic1.2.2.2.1.1.1"><span class="ltx_text ltx_font_bold"
id="S4.p6.pic1.2.2.2.1.1.1.1" style="font-size:80%;"># System Prompt (Common for all experiments and LLMs)<span class="ltx_text ltx_font_medium" id="S4.p6.pic1.2.2.2.1.1.1.1.1"> <br class="ltx_break"/>You are a code generation/optimization assistant. Given a prompt your output must only be a compilable source code. The computation environment is a Linux system (Rocky Linux 8.5 Green Obsidian) and a single AMD EPYC 7543 32-Core CPU. The C/C++ language compilers available are: GCC/G++ v14.2.0 and CLANG/CLANG++ v19.1.5 <br class="ltx_break"/></span># Prompt for EX1<span class="ltx_text ltx_font_medium" id="S4.p6.pic1.2.2.2.1.1.1.1.2"> <br class="ltx_break"/>Provide the C/C++ code with a single serial optimization without removing any of the existing functions or header files and without adding any new functions or print statements. <br class="ltx_break"/></span># Prompt for EX2<span class="ltx_text ltx_font_medium" id="S4.p6.pic1.2.2.2.1.1.1.1.3"> <br class="ltx_break"/>Propose an additional serial optimization that can be applied without removing any of the existing functions or header files and without adding any new functions or print statements. <br class="ltx_break"/></span># Prompt for EX3<span class="ltx_text ltx_font_medium" id="S4.p6.pic1.2.2.2.1.1.1.1.4"> <br class="ltx_break"/>Based on the original code, provide optimized parallel C/C++ code without removing any of the existing functions or header files and without adding any new functions or print statements.</span></span></span> </span></foreignobject></g></g></svg> </div> <figure class="ltx_table" id="S4.T2"> <figcaption class="ltx_caption ltx_centering" style="font-size:70%;"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S4.T2.6.1.1" style="font-size:129%;">Table 2</span>. 
</span><span class="ltx_text" id="S4.T2.7.2" style="font-size:129%;">Comparison of LLMs<span class="ltx_note ltx_role_footnote" id="footnote2"><sup class="ltx_note_mark">2</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">2</sup><span class="ltx_tag ltx_tag_note"><span class="ltx_text" id="footnote2.1.1.1" style="font-size:111%;">2</span></span><span class="ltx_text" id="footnote2.5" style="font-size:111%;">“Context Window” here refers to the total input + output token limit of the model.</span></span></span></span></span></figcaption> <table class="ltx_tabular ltx_centering ltx_align_middle" id="S4.T2.2"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S4.T2.2.3.1"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S4.T2.2.3.1.1"><span class="ltx_text ltx_font_bold" id="S4.T2.2.3.1.1.1" style="font-size:70%;">Model</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T2.2.3.1.2"><span class="ltx_text ltx_font_bold" id="S4.T2.2.3.1.2.1" style="font-size:70%;">Context Window</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T2.2.3.1.3"><span class="ltx_text ltx_font_bold" id="S4.T2.2.3.1.3.1" style="font-size:70%;">Max Output</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T2.2.3.1.4"><span class="ltx_text ltx_font_bold" id="S4.T2.2.3.1.4.1" style="font-size:70%;">#Parameters</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T2.2.3.1.5"><span class="ltx_text ltx_font_bold" id="S4.T2.2.3.1.5.1" style="font-size:70%;">Publish Date</span></td> </tr> <tr class="ltx_tr" id="S4.T2.1.1"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S4.T2.1.1.2"> <span class="ltx_text" id="S4.T2.1.1.2.1" style="font-size:70%;">OpenAI o1 </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S4.T2.1.1.2.2.1" style="font-size:70%;">(</span><a 
class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib1" title="">GPT-4<span class="ltx_text" id="S4.T2.1.1.2.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S4.T2.1.1.2.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T2.1.1.3"><span class="ltx_text" id="S4.T2.1.1.3.1" style="font-size:70%;">128k tokens</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T2.1.1.4"><span class="ltx_text" id="S4.T2.1.1.4.1" style="font-size:70%;">4096 tokens</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T2.1.1.1"><math alttext="\sim" class="ltx_Math" display="inline" id="S4.T2.1.1.1.m1.1"><semantics id="S4.T2.1.1.1.m1.1a"><mo id="S4.T2.1.1.1.m1.1.1" mathsize="70%" xref="S4.T2.1.1.1.m1.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T2.1.1.1.m1.1b"><csymbol cd="latexml" id="S4.T2.1.1.1.m1.1.1.cmml" xref="S4.T2.1.1.1.m1.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T2.1.1.1.m1.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T2.1.1.1.m1.1d">∼</annotation></semantics></math></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T2.1.1.5"><span class="ltx_text" id="S4.T2.1.1.5.1" style="font-size:70%;">Dec 5, 2024</span></td> </tr> <tr class="ltx_tr" id="S4.T2.2.4.2"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S4.T2.2.4.2.1"> <span class="ltx_text" id="S4.T2.2.4.2.1.1" style="font-size:70%;">Llama-3.2 </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S4.T2.2.4.2.1.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib2" title="">Llama3.2<span class="ltx_text" id="S4.T2.2.4.2.1.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S4.T2.2.4.2.1.4.3" style="font-size:70%;">)</span></cite> 
</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T2.2.4.2.2"><span class="ltx_text" id="S4.T2.2.4.2.2.1" style="font-size:70%;">128k tokens</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T2.2.4.2.3"><span class="ltx_text" id="S4.T2.2.4.2.3.1" style="font-size:70%;">4096 tokens</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T2.2.4.2.4"><span class="ltx_text" id="S4.T2.2.4.2.4.1" style="font-size:70%;">90B</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T2.2.4.2.5"><span class="ltx_text" id="S4.T2.2.4.2.5.1" style="font-size:70%;">Sep 25, 2024</span></td> </tr> <tr class="ltx_tr" id="S4.T2.2.2"> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_l ltx_border_r ltx_border_t" id="S4.T2.2.2.2"> <span class="ltx_text" id="S4.T2.2.2.2.1" style="font-size:70%;">Claude-3.5 </span><cite class="ltx_cite ltx_citemacro_citep"><span class="ltx_text" id="S4.T2.2.2.2.2.1" style="font-size:70%;">(</span><a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib9" title="">claude<span class="ltx_text" id="S4.T2.2.2.2.3.2.1.1" style="font-size:70%;">, </span></a><span class="ltx_text" id="S4.T2.2.2.2.4.3" style="font-size:70%;">)</span></cite> </td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T2.2.2.3"><span class="ltx_text" id="S4.T2.2.2.3.1" style="font-size:70%;">200k tokens</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T2.2.2.4"><span class="ltx_text" id="S4.T2.2.2.4.1" style="font-size:70%;">8192 tokens</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T2.2.2.1"><math alttext="\sim" class="ltx_Math" display="inline" id="S4.T2.2.2.1.m1.1"><semantics id="S4.T2.2.2.1.m1.1a"><mo id="S4.T2.2.2.1.m1.1.1" mathsize="70%" xref="S4.T2.2.2.1.m1.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" 
id="S4.T2.2.2.1.m1.1b"><csymbol cd="latexml" id="S4.T2.2.2.1.m1.1.1.cmml" xref="S4.T2.2.2.1.m1.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T2.2.2.1.m1.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T2.2.2.1.m1.1d">∼</annotation></semantics></math></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T2.2.2.5"><span class="ltx_text" id="S4.T2.2.2.5.1" style="font-size:70%;">June 20, 2024</span></td> </tr> </tbody> </table> </figure> <section class="ltx_subsection" id="S4.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.1. </span>Experiments Design</h3> <div class="ltx_para" id="S4.SS1.p1"> <p class="ltx_p" id="S4.SS1.p1.1">We conducted six experiments to analyze the performance optimization effects of Codee and LLMs. The details of each experiment are described as follows:</p> </div> <div class="ltx_para" id="S4.SS1.p2"> <ul class="ltx_itemize" id="S4.I1"> <li class="ltx_item" id="S4.I1.i1" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S4.I1.i1.p1"> <p class="ltx_p" id="S4.I1.i1.p1.1"><span class="ltx_text ltx_font_bold" id="S4.I1.i1.p1.1.1">Experiment 1 (EX1): Single Serial Optimization.</span> In this experiment, we apply a single serial optimization that does not involve multi-threaded parallelization. Using Codee, we prefer optimizations that can be auto-fixed or require a simple code change (e.g., modifying a single line or adding a few simple annotations). For LLMs, we always instruct them to automatically apply a single optimization. 
The goal is to assess the speedup each tool can achieve through simple optimizations using different compilers.</p> </div> </li> <li class="ltx_item" id="S4.I1.i2" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S4.I1.i2.p1"> <p class="ltx_p" id="S4.I1.i2.p1.1"><span class="ltx_text ltx_font_bold" id="S4.I1.i2.p1.1.1">Experiment 2 (EX2): Multiple Serial Optimizations.</span> In this experiment, we select and apply four additional serial optimizations recommended by Codee based on their importance <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib23" title="">codee_open_catalog, </a>)</cite>. For LLMs, we request them to implement and describe four additional optimizations incrementally. This yields four versions of the optimized code; for each compiler, we report the version that produces correct results with the highest speedup. The goal is to measure the maximum possible speedups each tool can achieve in the single-threaded setting with extensive updates.</p> </div> </li> <li class="ltx_item" id="S4.I1.i3" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S4.I1.i3.p1"> <p class="ltx_p" id="S4.I1.i3.p1.1"><span class="ltx_text ltx_font_bold" id="S4.I1.i3.p1.1.1">Experiment 3 (EX3): Parallel Optimizations.</span> We instruct Codee and LLMs to consider only parallel optimizations (typically OpenMP multi-threaded parallelism). The goal is to check each tool’s capability of writing parallel code. Note that we separate parallel and serial optimizations to develop a more thorough understanding of the effects of individual optimizations. 
Parallel optimizations can overshadow or interact with serial optimizations, making it difficult to isolate and evaluate the effects of serial optimizations independently.</p> </div> </li> <li class="ltx_item" id="S4.I1.i4" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S4.I1.i4.p1"> <p class="ltx_p" id="S4.I1.i4.p1.1"><span class="ltx_text ltx_font_bold" id="S4.I1.i4.p1.1.1">Experiment 4 (EX4): Time Spent on Applying Optimizations.</span> We summarize and compare the human time required to invoke tools, prepare prompts, compile and run code, and validate correctness in EX1, EX2, and EX3 for each tool.</p> </div> </li> <li class="ltx_item" id="S4.I1.i5" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S4.I1.i5.p1"> <p class="ltx_p" id="S4.I1.i5.p1.1"><span class="ltx_text ltx_font_bold" id="S4.I1.i5.p1.1.1">Experiment 5 (EX5): Correctness.</span> We summarize the correctness of code generated by different tools in EX1, EX2, and EX3. Correctness encompasses the following: whether the code is complete, whether it compiles without errors, and whether its outputs match those of the original code.
Note that we adopt the <span class="ltx_text ltx_font_italic" id="S4.I1.i5.p1.1.2">pass@1</span> metric <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib18" title="">Evauating_LLMs_Trained_on_Code, </a>)</cite> to measure the correctness of the LLMs, only considering the top-1 result without prompting the LLMs to produce additional answers.</p> </div> </li> <li class="ltx_item" id="S4.I1.i6" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S4.I1.i6.p1"> <p class="ltx_p" id="S4.I1.i6.p1.1"><span class="ltx_text ltx_font_bold" id="S4.I1.i6.p1.1.1">Experiment 6 (EX6): HPC Commonsense.</span> We define a new term, <span class="ltx_text ltx_font_italic" id="S4.I1.i6.p1.1.2">HPC Commonsense</span>, to understand how the suggested optimizations are relevant to each benchmark and the domain they belong to. Codee provides reasons for optimizations during code inspection. If an LLM does not provide explanations directly, we prompt it with “Give me explanations for the optimizations you made.”</p> </div> </li> </ul> </div> <div class="ltx_para" id="S4.SS1.p3"> <p class="ltx_p" id="S4.SS1.p3.1">All experiments were conducted by two graduate students with a limited background in HPC code optimization. The optimized code and benchmark results have been verified by several experienced performance engineers. We compare the execution time for each optimized code generated in EX1, EX2, and EX3 against the original code, with the time averaged over 10 iterations. For each benchmark, we utilized the built-in input by default and updated certain inputs to ensure that the execution time was sufficiently long. Note that <span class="ltx_text ltx_font_bold" id="S4.SS1.p3.1.1">NA</span> in our figures indicates that the generated code yields incorrect results, fails to compile, or encounters any runtime errors.
For every <span class="ltx_text ltx_font_bold" id="S4.SS1.p3.1.2">NA</span> instance, we revert to the original code to measure runtime. We use the following abbreviations for LLMs: O1 (OpenAI o1), L3.2 (Llama-3.2), and C3.5 (Claude-3.5).</p> </div> </section> <section class="ltx_subsection" id="S4.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.2. </span>Experiment 1: Single Serial Optimization</h3> <figure class="ltx_figure" id="S4.F3"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="316" id="S4.F3.g1" src="x2.png" width="830"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S4.F3.2.1.1" style="font-size:90%;">Figure 3</span>. </span><span class="ltx_text" id="S4.F3.3.2" style="font-size:90%;">Speedups of single- and multiple-round serial optimizations. The Y axis indicates the speedup and the horizontal lines denote a speedup of 1.0. Benchmarks that achieved a speedup ≥5.9 in multiple-round optimization across GCC/G++ and CLANG/CLANG++ compilers have been marked above the respective bars.</span></figcaption> </figure> <div class="ltx_para" id="S4.SS2.p1"> <p class="ltx_p" id="S4.SS2.p1.8">Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.F3" title="Figure 3 ‣ 4.2. Experiment 1: Single Serial Optimization ‣ 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">3</span></a> (a) and (b) present the speedups for each benchmark achieved by optimizing the original code using three different LLMs and Codee, compiled with GCC/G++ and CLANG/CLANG++, respectively.
The mean speedups achieved by Codee are 1.48<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS2.p1.1.m1.1"><semantics id="S4.SS2.p1.1.m1.1a"><mo id="S4.SS2.p1.1.m1.1.1" xref="S4.SS2.p1.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS2.p1.1.m1.1b"><times id="S4.SS2.p1.1.m1.1.1.cmml" xref="S4.SS2.p1.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p1.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p1.1.m1.1d">×</annotation></semantics></math> (GCC/G++) and 1.48<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS2.p1.2.m2.1"><semantics id="S4.SS2.p1.2.m2.1a"><mo id="S4.SS2.p1.2.m2.1.1" xref="S4.SS2.p1.2.m2.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS2.p1.2.m2.1b"><times id="S4.SS2.p1.2.m2.1.1.cmml" xref="S4.SS2.p1.2.m2.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p1.2.m2.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p1.2.m2.1d">×</annotation></semantics></math> (CLANG/CLANG++). 
In comparison, OpenAI o1 achieves mean speedups of 1.50<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS2.p1.3.m3.1"><semantics id="S4.SS2.p1.3.m3.1a"><mo id="S4.SS2.p1.3.m3.1.1" xref="S4.SS2.p1.3.m3.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS2.p1.3.m3.1b"><times id="S4.SS2.p1.3.m3.1.1.cmml" xref="S4.SS2.p1.3.m3.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p1.3.m3.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p1.3.m3.1d">×</annotation></semantics></math> and 1.44<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS2.p1.4.m4.1"><semantics id="S4.SS2.p1.4.m4.1a"><mo id="S4.SS2.p1.4.m4.1.1" xref="S4.SS2.p1.4.m4.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS2.p1.4.m4.1b"><times id="S4.SS2.p1.4.m4.1.1.cmml" xref="S4.SS2.p1.4.m4.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p1.4.m4.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p1.4.m4.1d">×</annotation></semantics></math>, Llama-3.2 achieves mean speedups of 1.24<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS2.p1.5.m5.1"><semantics id="S4.SS2.p1.5.m5.1a"><mo id="S4.SS2.p1.5.m5.1.1" xref="S4.SS2.p1.5.m5.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS2.p1.5.m5.1b"><times id="S4.SS2.p1.5.m5.1.1.cmml" xref="S4.SS2.p1.5.m5.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p1.5.m5.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p1.5.m5.1d">×</annotation></semantics></math> and 1.19<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS2.p1.6.m6.1"><semantics id="S4.SS2.p1.6.m6.1a"><mo id="S4.SS2.p1.6.m6.1.1" xref="S4.SS2.p1.6.m6.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS2.p1.6.m6.1b"><times id="S4.SS2.p1.6.m6.1.1.cmml" 
xref="S4.SS2.p1.6.m6.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p1.6.m6.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p1.6.m6.1d">×</annotation></semantics></math>, and Claude-3.5 achieves mean speedups of 1.02<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS2.p1.7.m7.1"><semantics id="S4.SS2.p1.7.m7.1a"><mo id="S4.SS2.p1.7.m7.1.1" xref="S4.SS2.p1.7.m7.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS2.p1.7.m7.1b"><times id="S4.SS2.p1.7.m7.1.1.cmml" xref="S4.SS2.p1.7.m7.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p1.7.m7.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p1.7.m7.1d">×</annotation></semantics></math> and 1.04<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS2.p1.8.m8.1"><semantics id="S4.SS2.p1.8.m8.1a"><mo id="S4.SS2.p1.8.m8.1.1" xref="S4.SS2.p1.8.m8.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS2.p1.8.m8.1b"><times id="S4.SS2.p1.8.m8.1.1.cmml" xref="S4.SS2.p1.8.m8.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p1.8.m8.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p1.8.m8.1d">×</annotation></semantics></math>, using GCC/G++ and CLANG/CLANG++, respectively. OpenAI o1 typically outperforms Llama-3.2 in both speedup and the number of correctly produced codes, whereas Claude-3.5 does not reach the same level of efficiency and accuracy.</p> </div> <div class="ltx_para" id="S4.SS2.p2"> <p class="ltx_p" id="S4.SS2.p2.2">Owing to Codee’s heuristics and compiler-based code analysis, we observed no errors in the optimized code. In contrast, OpenAI o1, Llama-3.2, and Claude-3.5 produce code that is incomplete or yields incorrect results. Codee matches or outperforms the LLMs in most of the benchmarks.
In the case of <span class="ltx_text ltx_font_italic" id="S4.SS2.p2.2.1">MATMUL</span>, we noticed that two of the three LLMs (OpenAI o1 and Llama-3.2) apply the same loop-interchange serial optimization as Codee (yielding a ~4.5<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS2.p2.1.m1.1"><semantics id="S4.SS2.p2.1.m1.1a"><mo id="S4.SS2.p2.1.m1.1.1" xref="S4.SS2.p2.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS2.p2.1.m1.1b"><times id="S4.SS2.p2.1.m1.1.1.cmml" xref="S4.SS2.p2.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p2.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p2.1.m1.1d">×</annotation></semantics></math> speedup). We believe that the LLMs’ familiarity with a widely used algorithm like <span class="ltx_text ltx_font_italic" id="S4.SS2.p2.2.2">MATMUL</span> from training results in such a speedup. In contrast, for a more HPC-specific algorithm such as <span class="ltx_text ltx_font_italic" id="S4.SS2.p2.2.3">HACCmk</span>, only OpenAI o1 among the LLMs provides significant speedups comparable to Codee (~5<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS2.p2.2.m2.1"><semantics id="S4.SS2.p2.2.m2.1a"><mo id="S4.SS2.p2.2.m2.1.1" xref="S4.SS2.p2.2.m2.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS2.p2.2.m2.1b"><times id="S4.SS2.p2.2.m2.1.1.cmml" xref="S4.SS2.p2.2.m2.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p2.2.m2.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p2.2.m2.1d">×</annotation></semantics></math> speedup), while the other LLMs cannot optimize the code well. Llama-3.2 and Claude-3.5 struggle to achieve significant optimization in level 2 benchmarks, such as <span class="ltx_text ltx_font_italic" id="S4.SS2.p2.2.4">Particlefilter</span>.
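The loop-interchange transformation mentioned above can be sketched as follows. This is an illustrative example of the technique only, not the exact code produced by Codee or any of the LLMs:

```c
#include <assert.h>

#define N 64

/* Naive i-j-k order: the innermost loop walks down a COLUMN of b,
 * so for large N nearly every access to b misses the cache. */
static void matmul_ijk(const double a[N][N], const double b[N][N],
                       double c[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Interchanged i-k-j order: the innermost loop now traverses both
 * b and c ROW-wise, improving spatial locality. This is the kind of
 * interchange credited with the serial MATMUL speedup. */
static void matmul_ikj(const double a[N][N], const double b[N][N],
                       double c[N][N]) {
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                c[i][j] += a[i][k] * b[k][j];
}
```

Both variants accumulate each c[i][j] over k in the same order, so they produce identical results; only the memory-access pattern changes.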
The results with CLANG/CLANG++ are generally consistent with the results obtained with GCC/G++.</p> </div> </section> <section class="ltx_subsection" id="S4.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.3. </span>Experiment 2: Multiple Serial Optimizations</h3> <div class="ltx_para" id="S4.SS3.p1"> <p class="ltx_p" id="S4.SS3.p1.8">Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.F3" title="Figure 3 ‣ 4.2. Experiment 1: Single Serial Optimization ‣ 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">3</span></a> (c) and (d) present the optimization results for each benchmark after applying multiple serial optimizations. Codee achieves mean speedups of 1.71<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS3.p1.1.m1.1"><semantics id="S4.SS3.p1.1.m1.1a"><mo id="S4.SS3.p1.1.m1.1.1" xref="S4.SS3.p1.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS3.p1.1.m1.1b"><times id="S4.SS3.p1.1.m1.1.1.cmml" xref="S4.SS3.p1.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS3.p1.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS3.p1.1.m1.1d">×</annotation></semantics></math> (GCC/G++) and 1.78<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS3.p1.2.m2.1"><semantics id="S4.SS3.p1.2.m2.1a"><mo id="S4.SS3.p1.2.m2.1.1" xref="S4.SS3.p1.2.m2.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS3.p1.2.m2.1b"><times id="S4.SS3.p1.2.m2.1.1.cmml" xref="S4.SS3.p1.2.m2.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS3.p1.2.m2.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS3.p1.2.m2.1d">×</annotation></semantics></math> (CLANG/CLANG++). 
OpenAI o1 achieves mean speedups of 2.07<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS3.p1.3.m3.1"><semantics id="S4.SS3.p1.3.m3.1a"><mo id="S4.SS3.p1.3.m3.1.1" xref="S4.SS3.p1.3.m3.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS3.p1.3.m3.1b"><times id="S4.SS3.p1.3.m3.1.1.cmml" xref="S4.SS3.p1.3.m3.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS3.p1.3.m3.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS3.p1.3.m3.1d">×</annotation></semantics></math> and 2.13<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS3.p1.4.m4.1"><semantics id="S4.SS3.p1.4.m4.1a"><mo id="S4.SS3.p1.4.m4.1.1" xref="S4.SS3.p1.4.m4.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS3.p1.4.m4.1b"><times id="S4.SS3.p1.4.m4.1.1.cmml" xref="S4.SS3.p1.4.m4.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS3.p1.4.m4.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS3.p1.4.m4.1d">×</annotation></semantics></math>, Llama-3.2 achieves mean speedups of 1.30<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS3.p1.5.m5.1"><semantics id="S4.SS3.p1.5.m5.1a"><mo id="S4.SS3.p1.5.m5.1.1" xref="S4.SS3.p1.5.m5.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS3.p1.5.m5.1b"><times id="S4.SS3.p1.5.m5.1.1.cmml" xref="S4.SS3.p1.5.m5.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS3.p1.5.m5.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS3.p1.5.m5.1d">×</annotation></semantics></math> and 1.32<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS3.p1.6.m6.1"><semantics id="S4.SS3.p1.6.m6.1a"><mo id="S4.SS3.p1.6.m6.1.1" xref="S4.SS3.p1.6.m6.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS3.p1.6.m6.1b"><times id="S4.SS3.p1.6.m6.1.1.cmml" xref="S4.SS3.p1.6.m6.1.1"></times></annotation-xml><annotation 
encoding="application/x-tex" id="S4.SS3.p1.6.m6.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS3.p1.6.m6.1d">×</annotation></semantics></math>, and Claude-3.5 achieves mean speedups of 1.31<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS3.p1.7.m7.1"><semantics id="S4.SS3.p1.7.m7.1a"><mo id="S4.SS3.p1.7.m7.1.1" xref="S4.SS3.p1.7.m7.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS3.p1.7.m7.1b"><times id="S4.SS3.p1.7.m7.1.1.cmml" xref="S4.SS3.p1.7.m7.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS3.p1.7.m7.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS3.p1.7.m7.1d">×</annotation></semantics></math> and 1.31<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS3.p1.8.m8.1"><semantics id="S4.SS3.p1.8.m8.1a"><mo id="S4.SS3.p1.8.m8.1.1" xref="S4.SS3.p1.8.m8.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS3.p1.8.m8.1b"><times id="S4.SS3.p1.8.m8.1.1.cmml" xref="S4.SS3.p1.8.m8.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS3.p1.8.m8.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS3.p1.8.m8.1d">×</annotation></semantics></math>. As EX2 codes are derived from the optimizations implemented in EX1, errors present in EX1 persist and those codes are still identified as <span class="ltx_text ltx_font_bold" id="S4.SS3.p1.8.1">NA</span> (e.g., <span class="ltx_text ltx_font_italic" id="S4.SS3.p1.8.2">Srad</span>).</p> </div> <div class="ltx_para" id="S4.SS3.p2"> <p class="ltx_p" id="S4.SS3.p2.2">In most cases, we observed that higher speedups are achieved after applying multiple optimizations. 
For example, in <span class="ltx_text ltx_font_italic" id="S4.SS3.p2.2.1">Hotspot3D</span>, Codee shows a significant improvement in multiple-round optimization compared to single-round optimization, with a speedup of 1.80<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS3.p2.1.m1.1"><semantics id="S4.SS3.p2.1.m1.1a"><mo id="S4.SS3.p2.1.m1.1.1" xref="S4.SS3.p2.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS3.p2.1.m1.1b"><times id="S4.SS3.p2.1.m1.1.1.cmml" xref="S4.SS3.p2.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS3.p2.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS3.p2.1.m1.1d">×</annotation></semantics></math> using CLANG/CLANG++. In an interesting scenario for <span class="ltx_text ltx_font_italic" id="S4.SS3.p2.2.2">Particlefilter</span>, apart from suggesting regular canonical optimizations, OpenAI o1 goes a step further by suggesting replacing the existing linear search with a binary search, resulting in a speedup of 9.8<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS3.p2.2.m2.1"><semantics id="S4.SS3.p2.2.m2.1a"><mo id="S4.SS3.p2.2.m2.1.1" xref="S4.SS3.p2.2.m2.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS3.p2.2.m2.1b"><times id="S4.SS3.p2.2.m2.1.1.cmml" xref="S4.SS3.p2.2.m2.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS3.p2.2.m2.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS3.p2.2.m2.1d">×</annotation></semantics></math> on both compilers. This finding highlights LLMs’ capability to offer adaptive optimizations from the perspective of algorithmic time complexity.</p> </div> </section> <section class="ltx_subsection" id="S4.SS4"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.4.
</span>Experiment 3: Parallel Optimization</h3> <figure class="ltx_figure" id="S4.F4"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_square" height="923" id="S4.F4.g1" src="x3.png" width="830"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S4.F4.2.1.1" style="font-size:90%;">Figure 4</span>. </span><span class="ltx_text" id="S4.F4.3.2" style="font-size:90%;">Results from parallel optimizations using GCC/G++ compiler. The Y axis indicates the speedup.</span></figcaption> </figure> <figure class="ltx_figure" id="S4.F5"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_square" height="917" id="S4.F5.g1" src="x4.png" width="830"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S4.F5.2.1.1" style="font-size:90%;">Figure 5</span>. </span><span class="ltx_text" id="S4.F5.3.2" style="font-size:90%;">Results from parallel optimizations using CLANG/CLANG++ compiler. The Y axis indicates the speedup.</span></figcaption> </figure> <div class="ltx_para" id="S4.SS4.p1"> <p class="ltx_p" id="S4.SS4.p1.10">Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.F4" title="Figure 4 ‣ 4.4. Experiment 3: Parallel Optimization ‣ 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">4</span></a> and Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.F5" title="Figure 5 ‣ 4.4. Experiment 3: Parallel Optimization ‣ 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">5</span></a> demonstrate the speedups achieved by parallel optimizations using each tool with 4 to 32 threads, compiled with GCC/G++ and CLANG/CLANG++ respectively. 
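The parallel optimizations evaluated in this experiment are typically OpenMP loop parallelizations of the following shape. This is an illustrative sketch over a hypothetical row-scaling kernel, not the exact output of Codee or any LLM:

```c
#include <assert.h>

#define ROWS 256
#define COLS 256

/* Hypothetical kernel for illustration. The pragma sits on the
 * OUTERMOST loop: each thread processes whole rows, so the per-thread
 * workload is large relative to the fork/join overhead. Placing the
 * pragma on the inner loop instead would fork and synchronize once
 * per row, the pattern behind the slowdowns seen for Durbin and
 * Jacobi-1d. schedule(auto) delegates the iteration-scheduling choice
 * to the compiler/runtime, as in the clause Codee favored over a
 * plain "#pragma omp parallel for". Compile with -fopenmp; without
 * it the pragma is ignored and the loop runs serially with
 * identical results. */
static void scale_rows(double grid[ROWS][COLS], double factor) {
    #pragma omp parallel for schedule(auto)
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            grid[i][j] *= factor;
}
```

Each outer-loop iteration touches a disjoint row, so no synchronization is needed beyond the implicit barrier at the end of the worksharing loop.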
For GCC/G++, the mean speedups of Codee, OpenAI o1, Llama-3.2, and Claude-3.5 across 4 to 32 threads are 4.75<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS4.p1.1.m1.1"><semantics id="S4.SS4.p1.1.m1.1a"><mo id="S4.SS4.p1.1.m1.1.1" xref="S4.SS4.p1.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS4.p1.1.m1.1b"><times id="S4.SS4.p1.1.m1.1.1.cmml" xref="S4.SS4.p1.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS4.p1.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS4.p1.1.m1.1d">×</annotation></semantics></math>, 5.01<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS4.p1.2.m2.1"><semantics id="S4.SS4.p1.2.m2.1a"><mo id="S4.SS4.p1.2.m2.1.1" xref="S4.SS4.p1.2.m2.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS4.p1.2.m2.1b"><times id="S4.SS4.p1.2.m2.1.1.cmml" xref="S4.SS4.p1.2.m2.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS4.p1.2.m2.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS4.p1.2.m2.1d">×</annotation></semantics></math>, 2.89<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS4.p1.3.m3.1"><semantics id="S4.SS4.p1.3.m3.1a"><mo id="S4.SS4.p1.3.m3.1.1" xref="S4.SS4.p1.3.m3.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS4.p1.3.m3.1b"><times id="S4.SS4.p1.3.m3.1.1.cmml" xref="S4.SS4.p1.3.m3.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS4.p1.3.m3.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS4.p1.3.m3.1d">×</annotation></semantics></math>, and 3.38<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS4.p1.4.m4.1"><semantics id="S4.SS4.p1.4.m4.1a"><mo id="S4.SS4.p1.4.m4.1.1" xref="S4.SS4.p1.4.m4.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS4.p1.4.m4.1b"><times id="S4.SS4.p1.4.m4.1.1.cmml" 
xref="S4.SS4.p1.4.m4.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS4.p1.4.m4.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS4.p1.4.m4.1d">×</annotation></semantics></math>, respectively. For CLANG/CLANG++, the mean speedups of Codee, OpenAI o1, Llama-3.2, and Claude-3.5 across 4 to 32 threads are 4.12<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS4.p1.5.m5.1"><semantics id="S4.SS4.p1.5.m5.1a"><mo id="S4.SS4.p1.5.m5.1.1" xref="S4.SS4.p1.5.m5.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS4.p1.5.m5.1b"><times id="S4.SS4.p1.5.m5.1.1.cmml" xref="S4.SS4.p1.5.m5.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS4.p1.5.m5.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS4.p1.5.m5.1d">×</annotation></semantics></math>, 5.04<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS4.p1.6.m6.1"><semantics id="S4.SS4.p1.6.m6.1a"><mo id="S4.SS4.p1.6.m6.1.1" xref="S4.SS4.p1.6.m6.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS4.p1.6.m6.1b"><times id="S4.SS4.p1.6.m6.1.1.cmml" xref="S4.SS4.p1.6.m6.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS4.p1.6.m6.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS4.p1.6.m6.1d">×</annotation></semantics></math>, 2.46<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS4.p1.7.m7.1"><semantics id="S4.SS4.p1.7.m7.1a"><mo id="S4.SS4.p1.7.m7.1.1" xref="S4.SS4.p1.7.m7.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS4.p1.7.m7.1b"><times id="S4.SS4.p1.7.m7.1.1.cmml" xref="S4.SS4.p1.7.m7.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS4.p1.7.m7.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS4.p1.7.m7.1d">×</annotation></semantics></math>, and 3.64<math alttext="\times" class="ltx_Math" display="inline" 
id="S4.SS4.p1.8.m8.1"><semantics id="S4.SS4.p1.8.m8.1a"><mo id="S4.SS4.p1.8.m8.1.1" xref="S4.SS4.p1.8.m8.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS4.p1.8.m8.1b"><times id="S4.SS4.p1.8.m8.1.1.cmml" xref="S4.SS4.p1.8.m8.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS4.p1.8.m8.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS4.p1.8.m8.1d">×</annotation></semantics></math>, respectively. OpenAI o1 demonstrates the best performance in recommending strategies for parallel optimization across different compilers. Codee does suggest good parallel optimizations as well, offering auto-fixes for most of the commonly used optimization strategies. Notably, in <span class="ltx_text ltx_font_italic" id="S4.SS4.p1.10.1">HACCmk</span> and <span class="ltx_text ltx_font_italic" id="S4.SS4.p1.10.2">Hotspot3D</span>, Codee’s performance is particularly impressive, achieving 40.00<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS4.p1.9.m9.1"><semantics id="S4.SS4.p1.9.m9.1a"><mo id="S4.SS4.p1.9.m9.1.1" xref="S4.SS4.p1.9.m9.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS4.p1.9.m9.1b"><times id="S4.SS4.p1.9.m9.1.1.cmml" xref="S4.SS4.p1.9.m9.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS4.p1.9.m9.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS4.p1.9.m9.1d">×</annotation></semantics></math> and 21.20<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS4.p1.10.m10.1"><semantics id="S4.SS4.p1.10.m10.1a"><mo id="S4.SS4.p1.10.m10.1.1" xref="S4.SS4.p1.10.m10.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS4.p1.10.m10.1b"><times id="S4.SS4.p1.10.m10.1.1.cmml" xref="S4.SS4.p1.10.m10.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS4.p1.10.m10.1c">\times</annotation><annotation encoding="application/x-llamapun" 
id="S4.SS4.p1.10.m10.1d">×</annotation></semantics></math> at 32 threads, respectively. As for the LLMs, it should be noted that Claude-3.5 fails to optimize 8 out of 20 benchmarks.</p> </div> <div class="ltx_para" id="S4.SS4.p2"> <p class="ltx_p" id="S4.SS4.p2.1">While the LLMs are good at yielding efficient optimizations for <span class="ltx_text ltx_font_italic" id="S4.SS4.p2.1.1">MATMUL</span> as discussed in previous sections, in the case of parallelization, all of them fail to suggest meaningful optimizations that yield speedups. Interestingly, while OpenAI o1 yields code with remarkable speedups for <span class="ltx_text ltx_font_italic" id="S4.SS4.p2.1.2">MATMUL</span>, it applies serial optimizations instead of using OpenMP pragmas, indicating that it does not follow the user’s instructions and breaks the strictly parallel optimization requirement of our experiment. Llama-3.2 and Claude-3.5 end up not only parallelizing the matrix multiplication function, but also parallelizing the outer loop that benchmarks the MATMUL function multiple times, resulting in complex nested parallelism and excessive memory contention. In the case of <span class="ltx_text ltx_font_italic" id="S4.SS4.p2.1.3">Hotspot3D</span>, Codee recommends using <span class="ltx_text ltx_font_typewriter" id="S4.SS4.p2.1.4">#pragma omp for schedule(auto)</span> to achieve a higher speedup than <span class="ltx_text ltx_font_typewriter" id="S4.SS4.p2.1.5">#pragma omp parallel for</span> suggested by the LLMs. A similar scenario also occurs in <span class="ltx_text ltx_font_italic" id="S4.SS4.p2.1.6">HACCmk</span>.</p> </div> <div class="ltx_para" id="S4.SS4.p3"> <p class="ltx_p" id="S4.SS4.p3.1">Interestingly, with <span class="ltx_text ltx_font_italic" id="S4.SS4.p3.1.1">Durbin</span> and <span class="ltx_text ltx_font_italic" id="S4.SS4.p3.1.2">Jacobi-1d</span>, performance declines as the thread count increases.
This occurs as a result of applying OpenMP pragmas to the inner loop instead of the outer loop of the nested loops present in the program. When parallelizing the inner loop, the workload per thread might be too small relative to the overhead of creating and synchronizing threads. Another interesting scenario occurs with <span class="ltx_text ltx_font_italic" id="S4.SS4.p3.1.3">Srad</span>, where Llama-3.2 is the only model that failed to scale using the CLANG/CLANG++ compiler, as shown in Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.F5" title="Figure 5 ‣ 4.4. Experiment 3: Parallel Optimization ‣ 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">5</span></a>. However, its GCC/G++ counterpart does achieve non-trivial speedups. Upon investigation, we found that GCC/G++ automatically fused nested loops for better scheduling. CLANG/CLANG++, on the other hand, only parallelizes the outermost loop with <span class="ltx_text ltx_font_typewriter" id="S4.SS4.p3.1.4">#pragma omp parallel for</span>.</p> </div> </section> <section class="ltx_subsection" id="S4.SS5"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.5. </span>Experiment 4: Time Spent on Applying Optimizations</h3> <figure class="ltx_table" id="S4.T3"> <figcaption class="ltx_caption ltx_centering" style="font-size:70%;"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S4.T3.19.1.1" style="font-size:129%;">Table 3</span>.
</span><span class="ltx_text" id="S4.T3.20.2" style="font-size:129%;">Time taken to apply code optimizations</span></figcaption> <table class="ltx_tabular ltx_centering ltx_align_middle" id="S4.T3.15"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S4.T3.15.16.1"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S4.T3.15.16.1.1" rowspan="2"><span class="ltx_text" id="S4.T3.15.16.1.1.1" style="font-size:70%;">Tool</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" colspan="3" id="S4.T3.15.16.1.2"><span class="ltx_text" id="S4.T3.15.16.1.2.1" style="font-size:70%;">Average Time (Per Benchmark)</span></td> </tr> <tr class="ltx_tr" id="S4.T3.15.17.2"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T3.15.17.2.1"><span class="ltx_text" id="S4.T3.15.17.2.1.1" style="font-size:70%;">EX1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T3.15.17.2.2"><span class="ltx_text" id="S4.T3.15.17.2.2.1" style="font-size:70%;">EX2</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T3.15.17.2.3"><span class="ltx_text" id="S4.T3.15.17.2.3.1" style="font-size:70%;">EX3</span></td> </tr> <tr class="ltx_tr" id="S4.T3.3.3"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S4.T3.3.3.4"><span class="ltx_text" id="S4.T3.3.3.4.1" style="font-size:70%;">Codee</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T3.1.1.1"> <math alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.1.1.1.m1.1"><semantics id="S4.T3.1.1.1.m1.1a"><mo id="S4.T3.1.1.1.m1.1.1" mathsize="70%" xref="S4.T3.1.1.1.m1.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.1.1.1.m1.1b"><csymbol cd="latexml" id="S4.T3.1.1.1.m1.1.1.cmml" xref="S4.T3.1.1.1.m1.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.1.1.1.m1.1c">\sim</annotation><annotation 
encoding="application/x-llamapun" id="S4.T3.1.1.1.m1.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.1.1.1.1" style="font-size:70%;">3 mins</span> </td> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S4.T3.2.2.2"> <math alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.2.2.2.m1.1"><semantics id="S4.T3.2.2.2.m1.1a"><mo id="S4.T3.2.2.2.m1.1.1" mathsize="70%" xref="S4.T3.2.2.2.m1.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.2.2.2.m1.1b"><csymbol cd="latexml" id="S4.T3.2.2.2.m1.1.1.cmml" xref="S4.T3.2.2.2.m1.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.2.2.2.m1.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T3.2.2.2.m1.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.2.2.2.1" style="font-size:70%;">4 mins</span> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T3.3.3.3"> <math alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.3.3.3.m1.1"><semantics id="S4.T3.3.3.3.m1.1a"><mo id="S4.T3.3.3.3.m1.1.1" mathsize="70%" xref="S4.T3.3.3.3.m1.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.3.3.3.m1.1b"><csymbol cd="latexml" id="S4.T3.3.3.3.m1.1.1.cmml" xref="S4.T3.3.3.3.m1.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.3.3.3.m1.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T3.3.3.3.m1.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.3.3.3.1" style="font-size:70%;">2 mins</span> </td> </tr> <tr class="ltx_tr" id="S4.T3.7.7"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S4.T3.7.7.5"><span class="ltx_text" id="S4.T3.7.7.5.1" style="font-size:70%;">OpenAI o1</span></td> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S4.T3.4.4.1"> <math alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.4.4.1.m1.1"><semantics 
id="S4.T3.4.4.1.m1.1a"><mo id="S4.T3.4.4.1.m1.1.1" mathsize="70%" xref="S4.T3.4.4.1.m1.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.4.4.1.m1.1b"><csymbol cd="latexml" id="S4.T3.4.4.1.m1.1.1.cmml" xref="S4.T3.4.4.1.m1.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.4.4.1.m1.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T3.4.4.1.m1.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.4.4.1.1" style="font-size:70%;">2 mins</span> </td> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S4.T3.6.6.3"> <math alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.5.5.2.m1.1"><semantics id="S4.T3.5.5.2.m1.1a"><mo id="S4.T3.5.5.2.m1.1.1" mathsize="70%" xref="S4.T3.5.5.2.m1.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.5.5.2.m1.1b"><csymbol cd="latexml" id="S4.T3.5.5.2.m1.1.1.cmml" xref="S4.T3.5.5.2.m1.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.5.5.2.m1.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T3.5.5.2.m1.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.6.6.3.1" style="font-size:70%;">6 mins (</span><math alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.6.6.3.m2.1"><semantics id="S4.T3.6.6.3.m2.1a"><mo id="S4.T3.6.6.3.m2.1.1" mathsize="70%" xref="S4.T3.6.6.3.m2.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.6.6.3.m2.1b"><csymbol cd="latexml" id="S4.T3.6.6.3.m2.1.1.cmml" xref="S4.T3.6.6.3.m2.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.6.6.3.m2.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T3.6.6.3.m2.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.6.6.3.2" style="font-size:70%;">1.5 mins/trial)</span> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T3.7.7.4"> <math 
alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.7.7.4.m1.1"><semantics id="S4.T3.7.7.4.m1.1a"><mo id="S4.T3.7.7.4.m1.1.1" mathsize="70%" xref="S4.T3.7.7.4.m1.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.7.7.4.m1.1b"><csymbol cd="latexml" id="S4.T3.7.7.4.m1.1.1.cmml" xref="S4.T3.7.7.4.m1.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.7.7.4.m1.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T3.7.7.4.m1.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.7.7.4.1" style="font-size:70%;">5 mins</span> </td> </tr> <tr class="ltx_tr" id="S4.T3.11.11"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S4.T3.11.11.5"><span class="ltx_text" id="S4.T3.11.11.5.1" style="font-size:70%;">Llama-3.2</span></td> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S4.T3.8.8.1"> <math alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.8.8.1.m1.1"><semantics id="S4.T3.8.8.1.m1.1a"><mo id="S4.T3.8.8.1.m1.1.1" mathsize="70%" xref="S4.T3.8.8.1.m1.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.8.8.1.m1.1b"><csymbol cd="latexml" id="S4.T3.8.8.1.m1.1.1.cmml" xref="S4.T3.8.8.1.m1.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.8.8.1.m1.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T3.8.8.1.m1.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.8.8.1.1" style="font-size:70%;">2 mins</span> </td> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S4.T3.10.10.3"> <math alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.9.9.2.m1.1"><semantics id="S4.T3.9.9.2.m1.1a"><mo id="S4.T3.9.9.2.m1.1.1" mathsize="70%" xref="S4.T3.9.9.2.m1.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.9.9.2.m1.1b"><csymbol cd="latexml" id="S4.T3.9.9.2.m1.1.1.cmml" 
xref="S4.T3.9.9.2.m1.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.9.9.2.m1.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T3.9.9.2.m1.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.10.10.3.1" style="font-size:70%;">9 mins (</span><math alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.10.10.3.m2.1"><semantics id="S4.T3.10.10.3.m2.1a"><mo id="S4.T3.10.10.3.m2.1.1" mathsize="70%" xref="S4.T3.10.10.3.m2.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.10.10.3.m2.1b"><csymbol cd="latexml" id="S4.T3.10.10.3.m2.1.1.cmml" xref="S4.T3.10.10.3.m2.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.10.10.3.m2.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T3.10.10.3.m2.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.10.10.3.2" style="font-size:70%;">2.25 mins/trial)</span> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T3.11.11.4"> <math alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.11.11.4.m1.1"><semantics id="S4.T3.11.11.4.m1.1a"><mo id="S4.T3.11.11.4.m1.1.1" mathsize="70%" xref="S4.T3.11.11.4.m1.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.11.11.4.m1.1b"><csymbol cd="latexml" id="S4.T3.11.11.4.m1.1.1.cmml" xref="S4.T3.11.11.4.m1.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.11.11.4.m1.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T3.11.11.4.m1.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.11.11.4.1" style="font-size:70%;">5 mins</span> </td> </tr> <tr class="ltx_tr" id="S4.T3.15.15"> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_l ltx_border_r ltx_border_t" id="S4.T3.15.15.5"><span class="ltx_text" id="S4.T3.15.15.5.1" style="font-size:70%;">Claude-3.5</span></td> <td class="ltx_td 
ltx_align_left ltx_border_b ltx_border_r ltx_border_t" id="S4.T3.12.12.1"> <math alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.12.12.1.m1.1"><semantics id="S4.T3.12.12.1.m1.1a"><mo id="S4.T3.12.12.1.m1.1.1" mathsize="70%" xref="S4.T3.12.12.1.m1.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.12.12.1.m1.1b"><csymbol cd="latexml" id="S4.T3.12.12.1.m1.1.1.cmml" xref="S4.T3.12.12.1.m1.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.12.12.1.m1.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T3.12.12.1.m1.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.12.12.1.1" style="font-size:70%;">2 mins</span> </td> <td class="ltx_td ltx_align_left ltx_border_b ltx_border_r ltx_border_t" id="S4.T3.14.14.3"> <math alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.13.13.2.m1.1"><semantics id="S4.T3.13.13.2.m1.1a"><mo id="S4.T3.13.13.2.m1.1.1" mathsize="70%" xref="S4.T3.13.13.2.m1.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.13.13.2.m1.1b"><csymbol cd="latexml" id="S4.T3.13.13.2.m1.1.1.cmml" xref="S4.T3.13.13.2.m1.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.13.13.2.m1.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T3.13.13.2.m1.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.14.14.3.1" style="font-size:70%;">6 mins (</span><math alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.14.14.3.m2.1"><semantics id="S4.T3.14.14.3.m2.1a"><mo id="S4.T3.14.14.3.m2.1.1" mathsize="70%" xref="S4.T3.14.14.3.m2.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.14.14.3.m2.1b"><csymbol cd="latexml" id="S4.T3.14.14.3.m2.1.1.cmml" xref="S4.T3.14.14.3.m2.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.14.14.3.m2.1c">\sim</annotation><annotation encoding="application/x-llamapun" 
id="S4.T3.14.14.3.m2.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.14.14.3.2" style="font-size:70%;">1.5 mins/trial)</span> </td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T3.15.15.4"> <math alttext="\sim" class="ltx_Math" display="inline" id="S4.T3.15.15.4.m1.1"><semantics id="S4.T3.15.15.4.m1.1a"><mo id="S4.T3.15.15.4.m1.1.1" mathsize="70%" xref="S4.T3.15.15.4.m1.1.1.cmml">∼</mo><annotation-xml encoding="MathML-Content" id="S4.T3.15.15.4.m1.1b"><csymbol cd="latexml" id="S4.T3.15.15.4.m1.1.1.cmml" xref="S4.T3.15.15.4.m1.1.1">similar-to</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S4.T3.15.15.4.m1.1c">\sim</annotation><annotation encoding="application/x-llamapun" id="S4.T3.15.15.4.m1.1d">∼</annotation></semantics></math><span class="ltx_text" id="S4.T3.15.15.4.1" style="font-size:70%;">4 mins</span> </td> </tr> </tbody> </table> </figure> <div class="ltx_para" id="S4.SS5.p1"> <p class="ltx_p" id="S4.SS5.p1.1">Table <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.T3" title="Table 3 ‣ 4.5. Experiment 4: Time Spent on Applying Optimizations ‣ 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">3</span></a> summarizes the time required to apply optimizations using each tool, including prompting the model, waiting for responses, implementing the necessary code changes, and validating the results. Codee includes built-in auto-fixes for specific optimizations, allowing it to transform code automatically regardless of its size. Although many optimizations still require manual code adjustments, Codee reports the file, function, and line number where each optimization should be applied. In comparison, LLMs can often automate code transformations. Nonetheless, Codee is generally faster than the LLMs in EX2 and EX3, for the following reasons. 
First, since the LLMs are invoked on remote servers and the models are large, generating responses can be slow, especially when multiple sequential optimizations are applied and the outputs become lengthy. Notably, querying the Llama-3.2 API via Together AI <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib63" title="">together_ai, </a>)</cite> introduces a significant delay compared to the other models. Furthermore, LLMs are typically limited by output token constraints (e.g., 4k tokens for OpenAI o1), which can prevent them from generating complete code, especially in EX2 with multiple serial optimizations. Moreover, some LLM outputs contain extra text or incomplete code, necessitating manual extraction of the code. Interestingly, OpenAI o1 often produces clean code despite having the longest inference time, which reduces the time required for subsequent processing.</p> </div> </section> <section class="ltx_subsection" id="S4.SS6"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.6. </span>Experiment 5: Correctness</h3> <figure class="ltx_table" id="S4.T4"> <figcaption class="ltx_caption ltx_centering" style="font-size:70%;"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S4.T4.4.1.1" style="font-size:129%;">Table 4</span>. 
</span><span class="ltx_text" id="S4.T4.5.2" style="font-size:129%;">Comparison of correctness of code optimized by Codee and LLMs in EX1, EX2, and EX3.</span></figcaption> <table class="ltx_tabular ltx_centering ltx_align_middle" id="S4.T4.6"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S4.T4.6.1.1"> <td class="ltx_td ltx_align_left ltx_border_l ltx_border_r ltx_border_t" id="S4.T4.6.1.1.1" rowspan="2"><span class="ltx_text ltx_font_bold" id="S4.T4.6.1.1.1.1" style="font-size:70%;">Outcome</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" colspan="4" id="S4.T4.6.1.1.2"><span class="ltx_text ltx_font_bold" id="S4.T4.6.1.1.2.1" style="font-size:70%;">EX1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" colspan="4" id="S4.T4.6.1.1.3"><span class="ltx_text ltx_font_bold" id="S4.T4.6.1.1.3.1" style="font-size:70%;">EX2</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" colspan="4" id="S4.T4.6.1.1.4"><span class="ltx_text ltx_font_bold" id="S4.T4.6.1.1.4.1" style="font-size:70%;">EX3</span></td> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S4.T4.6.1.1.5" rowspan="2"><span class="ltx_text ltx_font_bold" id="S4.T4.6.1.1.5.1" style="font-size:70%;">Total</span></td> </tr> <tr class="ltx_tr" id="S4.T4.6.2.2"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.2.2.1"><span class="ltx_text ltx_font_bold" id="S4.T4.6.2.2.1.1" style="font-size:70%;">Codee</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.2.2.2"><span class="ltx_text ltx_font_bold" id="S4.T4.6.2.2.2.1" style="font-size:70%;">O1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.2.2.3"><span class="ltx_text ltx_font_bold" id="S4.T4.6.2.2.3.1" style="font-size:70%;">L3.2</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.2.2.4"><span class="ltx_text ltx_font_bold" id="S4.T4.6.2.2.4.1" 
style="font-size:70%;">C3.5</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.2.2.5"><span class="ltx_text ltx_font_bold" id="S4.T4.6.2.2.5.1" style="font-size:70%;">Codee</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.2.2.6"><span class="ltx_text ltx_font_bold" id="S4.T4.6.2.2.6.1" style="font-size:70%;">O1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.2.2.7"><span class="ltx_text ltx_font_bold" id="S4.T4.6.2.2.7.1" style="font-size:70%;">L3.2</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.2.2.8"><span class="ltx_text ltx_font_bold" id="S4.T4.6.2.2.8.1" style="font-size:70%;">C3.5</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.2.2.9"><span class="ltx_text ltx_font_bold" id="S4.T4.6.2.2.9.1" style="font-size:70%;">Codee</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.2.2.10"><span class="ltx_text ltx_font_bold" id="S4.T4.6.2.2.10.1" style="font-size:70%;">O1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.2.2.11"><span class="ltx_text ltx_font_bold" id="S4.T4.6.2.2.11.1" style="font-size:70%;">L3.2</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.2.2.12"><span class="ltx_text ltx_font_bold" id="S4.T4.6.2.2.12.1" style="font-size:70%;">C3.5</span></td> </tr> <tr class="ltx_tr" id="S4.T4.6.3.3"> <td class="ltx_td ltx_align_left ltx_border_l ltx_border_r ltx_border_t" id="S4.T4.6.3.3.1"><span class="ltx_text ltx_font_bold" id="S4.T4.6.3.3.1.1" style="font-size:70%;">Incorrect</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.3.3.2"><span class="ltx_text" id="S4.T4.6.3.3.2.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.3.3.3"><span class="ltx_text" 
id="S4.T4.6.3.3.3.1" style="font-size:70%;">2</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.3.3.4"><span class="ltx_text" id="S4.T4.6.3.3.4.1" style="font-size:70%;">6</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.3.3.5"><span class="ltx_text" id="S4.T4.6.3.3.5.1" style="font-size:70%;">6</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.3.3.6"><span class="ltx_text" id="S4.T4.6.3.3.6.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.3.3.7"><span class="ltx_text" id="S4.T4.6.3.3.7.1" style="font-size:70%;">3</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.3.3.8"><span class="ltx_text" id="S4.T4.6.3.3.8.1" style="font-size:70%;">8</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.3.3.9"><span class="ltx_text" id="S4.T4.6.3.3.9.1" style="font-size:70%;">7</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.3.3.10"><span class="ltx_text" id="S4.T4.6.3.3.10.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.3.3.11"><span class="ltx_text" id="S4.T4.6.3.3.11.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.3.3.12"><span class="ltx_text" id="S4.T4.6.3.3.12.1" style="font-size:70%;">7</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.3.3.13"><span class="ltx_text" id="S4.T4.6.3.3.13.1" style="font-size:70%;">8</span></td> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S4.T4.6.3.3.14"><span class="ltx_text" id="S4.T4.6.3.3.14.1" style="font-size:70%;">48</span></td> </tr> <tr class="ltx_tr" id="S4.T4.6.4.4"> <td class="ltx_td ltx_align_left ltx_border_l ltx_border_r ltx_border_t" id="S4.T4.6.4.4.1"><span 
class="ltx_text" id="S4.T4.6.4.4.1.1" style="font-size:70%;"> Compilation errors</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.4.4.2"><span class="ltx_text" id="S4.T4.6.4.4.2.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.4.4.3"><span class="ltx_text" id="S4.T4.6.4.4.3.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.4.4.4"><span class="ltx_text" id="S4.T4.6.4.4.4.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.4.4.5"><span class="ltx_text" id="S4.T4.6.4.4.5.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.4.4.6"><span class="ltx_text" id="S4.T4.6.4.4.6.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.4.4.7"><span class="ltx_text" id="S4.T4.6.4.4.7.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.4.4.8"><span class="ltx_text" id="S4.T4.6.4.4.8.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.4.4.9"><span class="ltx_text" id="S4.T4.6.4.4.9.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.4.4.10"><span class="ltx_text" id="S4.T4.6.4.4.10.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.4.4.11"><span class="ltx_text" id="S4.T4.6.4.4.11.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.4.4.12"><span class="ltx_text" id="S4.T4.6.4.4.12.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.4.4.13"><span class="ltx_text" 
id="S4.T4.6.4.4.13.1" style="font-size:70%;">2</span></td> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S4.T4.6.4.4.14"><span class="ltx_text" id="S4.T4.6.4.4.14.1" style="font-size:70%;">3</span></td> </tr> <tr class="ltx_tr" id="S4.T4.6.5.5"> <td class="ltx_td ltx_align_left ltx_border_l ltx_border_r" id="S4.T4.6.5.5.1"><span class="ltx_text" id="S4.T4.6.5.5.1.1" style="font-size:70%;"> No LLM generated code</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.5.5.2"><span class="ltx_text" id="S4.T4.6.5.5.2.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.5.5.3"><span class="ltx_text" id="S4.T4.6.5.5.3.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.5.5.4"><span class="ltx_text" id="S4.T4.6.5.5.4.1" style="font-size:70%;">3</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.5.5.5"><span class="ltx_text" id="S4.T4.6.5.5.5.1" style="font-size:70%;">2</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.5.5.6"><span class="ltx_text" id="S4.T4.6.5.5.6.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.5.5.7"><span class="ltx_text" id="S4.T4.6.5.5.7.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.5.5.8"><span class="ltx_text" id="S4.T4.6.5.5.8.1" style="font-size:70%;">6</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.5.5.9"><span class="ltx_text" id="S4.T4.6.5.5.9.1" style="font-size:70%;">5</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.5.5.10"><span class="ltx_text" id="S4.T4.6.5.5.10.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.5.5.11"><span 
class="ltx_text" id="S4.T4.6.5.5.11.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.5.5.12"><span class="ltx_text" id="S4.T4.6.5.5.12.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.5.5.13"><span class="ltx_text" id="S4.T4.6.5.5.13.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S4.T4.6.5.5.14"><span class="ltx_text" id="S4.T4.6.5.5.14.1" style="font-size:70%;">19</span></td> </tr> <tr class="ltx_tr" id="S4.T4.6.6.6"> <td class="ltx_td ltx_align_left ltx_border_l ltx_border_r" id="S4.T4.6.6.6.1"><span class="ltx_text" id="S4.T4.6.6.6.1.1" style="font-size:70%;"> Incorrect results - Output mismatch</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.6.6.2"><span class="ltx_text" id="S4.T4.6.6.6.2.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.6.6.3"><span class="ltx_text" id="S4.T4.6.6.6.3.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.6.6.4"><span class="ltx_text" id="S4.T4.6.6.6.4.1" style="font-size:70%;">3</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.6.6.5"><span class="ltx_text" id="S4.T4.6.6.6.5.1" style="font-size:70%;">3</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.6.6.6"><span class="ltx_text" id="S4.T4.6.6.6.6.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.6.6.7"><span class="ltx_text" id="S4.T4.6.6.6.7.1" style="font-size:70%;">2</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.6.6.8"><span class="ltx_text" id="S4.T4.6.6.6.8.1" style="font-size:70%;">2</span></td> <td class="ltx_td ltx_align_center ltx_border_r 
ltx_border_t" id="S4.T4.6.6.6.9"><span class="ltx_text" id="S4.T4.6.6.6.9.1" style="font-size:70%;">2</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.6.6.10"><span class="ltx_text" id="S4.T4.6.6.6.10.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.6.6.11"><span class="ltx_text" id="S4.T4.6.6.6.11.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.6.6.12"><span class="ltx_text" id="S4.T4.6.6.6.12.1" style="font-size:70%;">5</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.6.6.13"><span class="ltx_text" id="S4.T4.6.6.6.13.1" style="font-size:70%;">6</span></td> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S4.T4.6.6.6.14"><span class="ltx_text" id="S4.T4.6.6.6.14.1" style="font-size:70%;">24</span></td> </tr> <tr class="ltx_tr" id="S4.T4.6.7.7"> <td class="ltx_td ltx_align_left ltx_border_l ltx_border_r" id="S4.T4.6.7.7.1"><span class="ltx_text" id="S4.T4.6.7.7.1.1" style="font-size:70%;"> LLM - Failed to follow instructions</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.7.7.2"><span class="ltx_text" id="S4.T4.6.7.7.2.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.7.7.3"><span class="ltx_text" id="S4.T4.6.7.7.3.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.7.7.4"><span class="ltx_text" id="S4.T4.6.7.7.4.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.7.7.5"><span class="ltx_text" id="S4.T4.6.7.7.5.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.7.7.6"><span class="ltx_text" id="S4.T4.6.7.7.6.1" style="font-size:70%;">0</span></td> <td class="ltx_td 
ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.7.7.7"><span class="ltx_text" id="S4.T4.6.7.7.7.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.7.7.8"><span class="ltx_text" id="S4.T4.6.7.7.8.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.7.7.9"><span class="ltx_text" id="S4.T4.6.7.7.9.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.7.7.10"><span class="ltx_text" id="S4.T4.6.7.7.10.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.7.7.11"><span class="ltx_text" id="S4.T4.6.7.7.11.1" style="font-size:70%;">1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.7.7.12"><span class="ltx_text" id="S4.T4.6.7.7.12.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.7.7.13"><span class="ltx_text" id="S4.T4.6.7.7.13.1" style="font-size:70%;">0</span></td> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S4.T4.6.7.7.14"><span class="ltx_text" id="S4.T4.6.7.7.14.1" style="font-size:70%;">2</span></td> </tr> <tr class="ltx_tr" id="S4.T4.6.8.8"> <td class="ltx_td ltx_align_left ltx_border_l ltx_border_r ltx_border_t" id="S4.T4.6.8.8.1"><span class="ltx_text ltx_font_bold" id="S4.T4.6.8.8.1.1" style="font-size:70%;">Correct</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.8.8.2"><span class="ltx_text" id="S4.T4.6.8.8.2.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.8.8.3"><span class="ltx_text" id="S4.T4.6.8.8.3.1" style="font-size:70%;">18</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.8.8.4"><span class="ltx_text" id="S4.T4.6.8.8.4.1" 
style="font-size:70%;">14</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.8.8.5"><span class="ltx_text" id="S4.T4.6.8.8.5.1" style="font-size:70%;">14</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.8.8.6"><span class="ltx_text" id="S4.T4.6.8.8.6.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.8.8.7"><span class="ltx_text" id="S4.T4.6.8.8.7.1" style="font-size:70%;">17</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.8.8.8"><span class="ltx_text" id="S4.T4.6.8.8.8.1" style="font-size:70%;">12</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.8.8.9"><span class="ltx_text" id="S4.T4.6.8.8.9.1" style="font-size:70%;">13</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.8.8.10"><span class="ltx_text" id="S4.T4.6.8.8.10.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.8.8.11"><span class="ltx_text" id="S4.T4.6.8.8.11.1" style="font-size:70%;">19</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.8.8.12"><span class="ltx_text" id="S4.T4.6.8.8.12.1" style="font-size:70%;">13</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T4.6.8.8.13"><span class="ltx_text" id="S4.T4.6.8.8.13.1" style="font-size:70%;">12</span></td> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S4.T4.6.8.8.14"><span class="ltx_text" id="S4.T4.6.8.8.14.1" style="font-size:70%;">192</span></td> </tr> <tr class="ltx_tr" id="S4.T4.6.9.9"> <td class="ltx_td ltx_align_left ltx_border_b ltx_border_l ltx_border_r ltx_border_t" id="S4.T4.6.9.9.1"><span class="ltx_text ltx_font_bold" id="S4.T4.6.9.9.1.1" style="font-size:70%;">Total</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" 
id="S4.T4.6.9.9.2"><span class="ltx_text" id="S4.T4.6.9.9.2.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T4.6.9.9.3"><span class="ltx_text" id="S4.T4.6.9.9.3.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T4.6.9.9.4"><span class="ltx_text" id="S4.T4.6.9.9.4.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T4.6.9.9.5"><span class="ltx_text" id="S4.T4.6.9.9.5.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T4.6.9.9.6"><span class="ltx_text" id="S4.T4.6.9.9.6.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T4.6.9.9.7"><span class="ltx_text" id="S4.T4.6.9.9.7.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T4.6.9.9.8"><span class="ltx_text" id="S4.T4.6.9.9.8.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T4.6.9.9.9"><span class="ltx_text" id="S4.T4.6.9.9.9.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T4.6.9.9.10"><span class="ltx_text" id="S4.T4.6.9.9.10.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T4.6.9.9.11"><span class="ltx_text" id="S4.T4.6.9.9.11.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T4.6.9.9.12"><span class="ltx_text" id="S4.T4.6.9.9.12.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T4.6.9.9.13"><span class="ltx_text" 
id="S4.T4.6.9.9.13.1" style="font-size:70%;">20</span></td> <td class="ltx_td ltx_align_left ltx_border_b ltx_border_r ltx_border_t" id="S4.T4.6.9.9.14"><span class="ltx_text" id="S4.T4.6.9.9.14.1" style="font-size:70%;">240</span></td> </tr> </tbody> </table> </figure> <div class="ltx_para" id="S4.SS6.p1"> <p class="ltx_p" id="S4.SS6.p1.1">Table <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.T4" title="Table 4 ‣ 4.6. Experiment 5: Correctness ‣ 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">4</span></a> shows that Codee achieves a perfect correctness rate across all benchmarks, outperforming the LLMs, which do not always produce the desired results. Using compiler-based analysis, Codee accurately optimizes code across a wide range of scenarios. Despite their broader applicability, the LLMs often struggle to analyze data flow, cross-iteration dependencies, and function invocations, particularly in the more complex benchmarks. </p> </div> <div class="ltx_para" id="S4.SS6.p2"> <p class="ltx_p" id="S4.SS6.p2.1">In EX1, all LLMs fail to produce code that matches the expected output for the <span class="ltx_text ltx_font_italic" id="S4.SS6.p2.1.1">Srad</span> benchmark. As shown in Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.F6" title="Figure 6 ‣ 4.6. Experiment 5: Correctness ‣ 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">6</span></a>, the correct logic is to first compute all derivatives from the old <span class="ltx_text ltx_font_typewriter" id="S4.SS6.p2.1.2">J</span> and then apply them in a separate loop. 
OpenAI o1 fails because it computes the directional derivatives <span class="ltx_text ltx_font_typewriter" id="S4.SS6.p2.1.3">dN</span>, <span class="ltx_text ltx_font_typewriter" id="S4.SS6.p2.1.4">dS</span>, <span class="ltx_text ltx_font_typewriter" id="S4.SS6.p2.1.5">dW</span>, and <span class="ltx_text ltx_font_typewriter" id="S4.SS6.p2.1.6">dE</span> from the image <span class="ltx_text ltx_font_typewriter" id="S4.SS6.p2.1.7">J</span> in the same loop that updates <span class="ltx_text ltx_font_typewriter" id="S4.SS6.p2.1.8">J</span>. Because <span class="ltx_text ltx_font_typewriter" id="S4.SS6.p2.1.9">J</span> is traversed in row-major order, positions with smaller indices have already been updated while the others have not, so old and new values are mixed. This violates a fundamental requirement of the <span class="ltx_text ltx_font_italic" id="S4.SS6.p2.1.10">Srad</span> algorithm: the directional derivatives for each pixel must be computed from values of the same iteration. 
Llama-3.2 and Claude-3.5 fail for similar reasons, collapsing the two nested loops into one.</p> </div> <figure class="ltx_figure" id="S4.F6"> <div class="ltx_listing ltx_lst_language_C ltx_lstlisting ltx_align_center ltx_framed ltx_framed_rectangle ltx_listing" id="S4.F6.2" style="background-color:#F2F2EB;"> <div class="ltx_listing_data"><a download="" href="data:text/plain;base64,Zm9yIChpbnQgaSA9IDA7IGkgPCByb3dzOyBpKyspCiAgICBmb3IgKGludCBqID0gMDsgaiA8IGNvbHM7IGorKykgewogICAgICAgIC8vIENhbGN1bGF0ZSBkaXJlY3Rpb25hbCBkZXJpdmF0aXZlcwogICAgICAgIGROW2tdID0gSltpTltpXSAqIGNvbHMgKyBqXSAtIEpjOwogICAgICAgIGRTW2tdID0gSltpU1tpXSAqIGNvbHMgKyBqXSAtIEpjOwogICAgICAgIGRXW2tdID0gSltpICogY29scyArIGpXW2pdXSAtIEpjOwogICAgICAgIGRFW2tdID0gSltpICogY29scyArIGpFW2pdXSAtIEpjOwogICAgfQpmb3IgKGludCBpID0gMDsgaSA8IHJvd3M7IGkrKykKICAgIGZvciAoaW50IGogPSAwOyBqIDwgY29sczsgaisrKSB7CiAgICAgICAgLy8gTG9hZCBkTiwgZFMsIGRXLCBhbmQgZEUKICAgICAgICBEID0gY04gKiBkTiArIGNTICogZFMgKyBjVyAqIGRXICsgY0UgKiBkRTsKICAgICAgICAvLyBVcGRhdGUgSgogICAgICAgIEpba10gPSBKW2tdICsgMC4yNSAqIGxhbWJkYSAqIEQ7CiAgICB9">⬇</a></div> <div class="ltx_listingline" id="lstnumberx13"> <span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx13.1" style="font-size:70%;color:#0000FF;">for</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx13.2" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx13.3" style="font-size:70%;">(</span><span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx13.4" style="font-size:70%;color:#0000FF;">int</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx13.5" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx13.6" style="font-size:70%;">i</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx13.7" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx13.8" 
style="font-size:70%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx13.9" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx13.10" style="font-size:70%;">0;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx13.11" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx13.12" style="font-size:70%;">i</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx13.13" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx13.14" style="font-size:70%;"><</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx13.15" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx13.16" style="font-size:70%;">rows</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx13.17" style="font-size:70%;">;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx13.18" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx13.19" style="font-size:70%;">i</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx13.20" style="font-size:70%;">++)</span> </div> <div class="ltx_listingline" id="lstnumberx14"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx14.1" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx14.2" style="font-size:70%;color:#0000FF;">for</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx14.3" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx14.4" style="font-size:70%;">(</span><span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx14.5" style="font-size:70%;color:#0000FF;">int</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx14.6" 
style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx14.7" style="font-size:70%;">j</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx14.8" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx14.9" style="font-size:70%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx14.10" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx14.11" style="font-size:70%;">0;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx14.12" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx14.13" style="font-size:70%;">j</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx14.14" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx14.15" style="font-size:70%;"><</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx14.16" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx14.17" style="font-size:70%;">cols</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx14.18" style="font-size:70%;">;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx14.19" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx14.20" style="font-size:70%;">j</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx14.21" style="font-size:70%;">++)</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx14.22" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx14.23" style="font-size:70%;">{</span> </div> <div class="ltx_listingline" id="lstnumberx15"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx15.1" style="font-size:70%;"> </span><span 
class="ltx_text ltx_lst_comment ltx_font_typewriter" id="lstnumberx15.2" style="font-size:70%;color:#808080;">//<span class="ltx_text ltx_lst_space" id="lstnumberx15.2.1"> </span>Calculate<span class="ltx_text ltx_lst_space" id="lstnumberx15.2.2"> </span>directional<span class="ltx_text ltx_lst_space" id="lstnumberx15.2.3"> </span>derivatives</span> </div> <div class="ltx_listingline" id="lstnumberx16"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx16.1" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx16.2" style="font-size:70%;">dN</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx16.3" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx16.4" style="font-size:70%;">k</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx16.5" style="font-size:70%;">]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx16.6" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx16.7" style="font-size:70%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx16.8" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx16.9" style="font-size:70%;">J</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx16.10" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx16.11" style="font-size:70%;">iN</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx16.12" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx16.13" style="font-size:70%;">i</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx16.14" style="font-size:70%;">]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx16.15" style="font-size:70%;"> </span><span class="ltx_text 
ltx_font_typewriter" id="lstnumberx16.16" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx16.17" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx16.18" style="font-size:70%;">cols</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx16.19" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx16.20" style="font-size:70%;">+</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx16.21" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx16.22" style="font-size:70%;">j</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx16.23" style="font-size:70%;">]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx16.24" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx16.25" style="font-size:70%;">-</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx16.26" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx16.27" style="font-size:70%;">Jc</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx16.28" style="font-size:70%;">;</span> </div> <div class="ltx_listingline" id="lstnumberx17"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx17.1" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx17.2" style="font-size:70%;">dS</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx17.3" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx17.4" style="font-size:70%;">k</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx17.5" style="font-size:70%;">]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx17.6" 
style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx17.7" style="font-size:70%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx17.8" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx17.9" style="font-size:70%;">J</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx17.10" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx17.11" style="font-size:70%;">iS</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx17.12" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx17.13" style="font-size:70%;">i</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx17.14" style="font-size:70%;">]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx17.15" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx17.16" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx17.17" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx17.18" style="font-size:70%;">cols</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx17.19" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx17.20" style="font-size:70%;">+</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx17.21" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx17.22" style="font-size:70%;">j</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx17.23" style="font-size:70%;">]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx17.24" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx17.25" 
style="font-size:70%;">-</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx17.26" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx17.27" style="font-size:70%;">Jc</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx17.28" style="font-size:70%;">;</span> </div> <div class="ltx_listingline" id="lstnumberx18"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx18.1" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx18.2" style="font-size:70%;">dW</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx18.3" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx18.4" style="font-size:70%;">k</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx18.5" style="font-size:70%;">]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx18.6" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx18.7" style="font-size:70%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx18.8" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx18.9" style="font-size:70%;">J</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx18.10" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx18.11" style="font-size:70%;">i</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx18.12" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx18.13" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx18.14" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx18.15" style="font-size:70%;">cols</span><span 
class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx18.16" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx18.17" style="font-size:70%;">+</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx18.18" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx18.19" style="font-size:70%;">jW</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx18.20" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx18.21" style="font-size:70%;">j</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx18.22" style="font-size:70%;">]]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx18.23" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx18.24" style="font-size:70%;">-</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx18.25" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx18.26" style="font-size:70%;">Jc</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx18.27" style="font-size:70%;">;</span> </div> <div class="ltx_listingline" id="lstnumberx19"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx19.1" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx19.2" style="font-size:70%;">dE</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx19.3" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx19.4" style="font-size:70%;">k</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx19.5" style="font-size:70%;">]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx19.6" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx19.7" 
style="font-size:70%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx19.8" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx19.9" style="font-size:70%;">J</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx19.10" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx19.11" style="font-size:70%;">i</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx19.12" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx19.13" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx19.14" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx19.15" style="font-size:70%;">cols</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx19.16" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx19.17" style="font-size:70%;">+</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx19.18" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx19.19" style="font-size:70%;">jE</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx19.20" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx19.21" style="font-size:70%;">j</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx19.22" style="font-size:70%;">]]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx19.23" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx19.24" style="font-size:70%;">-</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx19.25" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" 
id="lstnumberx19.26" style="font-size:70%;">Jc</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx19.27" style="font-size:70%;">;</span> </div> <div class="ltx_listingline" id="lstnumberx20"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx20.1" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx20.2" style="font-size:70%;">}</span> </div> <div class="ltx_listingline" id="lstnumberx21"> <span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx21.1" style="font-size:70%;color:#0000FF;">for</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx21.2" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx21.3" style="font-size:70%;">(</span><span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx21.4" style="font-size:70%;color:#0000FF;">int</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx21.5" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx21.6" style="font-size:70%;">i</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx21.7" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx21.8" style="font-size:70%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx21.9" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx21.10" style="font-size:70%;">0;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx21.11" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx21.12" style="font-size:70%;">i</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx21.13" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx21.14" style="font-size:70%;"><</span><span class="ltx_text 
ltx_lst_space ltx_font_typewriter" id="lstnumberx21.15" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx21.16" style="font-size:70%;">rows</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx21.17" style="font-size:70%;">;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx21.18" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx21.19" style="font-size:70%;">i</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx21.20" style="font-size:70%;">++)</span> </div> <div class="ltx_listingline" id="lstnumberx22"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx22.1" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx22.2" style="font-size:70%;color:#0000FF;">for</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx22.3" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx22.4" style="font-size:70%;">(</span><span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx22.5" style="font-size:70%;color:#0000FF;">int</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx22.6" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx22.7" style="font-size:70%;">j</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx22.8" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx22.9" style="font-size:70%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx22.10" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx22.11" style="font-size:70%;">0;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx22.12" style="font-size:70%;"> </span><span class="ltx_text 
ltx_lst_identifier ltx_font_typewriter" id="lstnumberx22.13" style="font-size:70%;">j</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx22.14" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx22.15" style="font-size:70%;"><</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx22.16" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx22.17" style="font-size:70%;">cols</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx22.18" style="font-size:70%;">;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx22.19" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx22.20" style="font-size:70%;">j</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx22.21" style="font-size:70%;">++)</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx22.22" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx22.23" style="font-size:70%;">{</span> </div> <div class="ltx_listingline" id="lstnumberx23"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx23.1" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_comment ltx_font_typewriter" id="lstnumberx23.2" style="font-size:70%;color:#808080;">//<span class="ltx_text ltx_lst_space" id="lstnumberx23.2.1"> </span>Load<span class="ltx_text ltx_lst_space" id="lstnumberx23.2.2"> </span>dN,<span class="ltx_text ltx_lst_space" id="lstnumberx23.2.3"> </span>dS,<span class="ltx_text ltx_lst_space" id="lstnumberx23.2.4"> </span>dW,<span class="ltx_text ltx_lst_space" id="lstnumberx23.2.5"> </span>and<span class="ltx_text ltx_lst_space" id="lstnumberx23.2.6"> </span>dE</span> </div> <div class="ltx_listingline" id="lstnumberx24"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.1" 
style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx24.2" style="font-size:70%;">D</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.3" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx24.4" style="font-size:70%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.5" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx24.6" style="font-size:70%;">cN</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.7" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx24.8" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.9" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx24.10" style="font-size:70%;">dN</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.11" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx24.12" style="font-size:70%;">+</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.13" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx24.14" style="font-size:70%;">cS</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.15" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx24.16" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.17" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx24.18" style="font-size:70%;">dS</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.19" style="font-size:70%;"> </span><span class="ltx_text 
ltx_font_typewriter" id="lstnumberx24.20" style="font-size:70%;">+</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.21" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx24.22" style="font-size:70%;">cW</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.23" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx24.24" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.25" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx24.26" style="font-size:70%;">dW</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.27" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx24.28" style="font-size:70%;">+</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.29" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx24.30" style="font-size:70%;">cE</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.31" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx24.32" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx24.33" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx24.34" style="font-size:70%;">dE</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx24.35" style="font-size:70%;">;</span> </div> <div class="ltx_listingline" id="lstnumberx25"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx25.1" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_comment ltx_font_typewriter" id="lstnumberx25.2" style="font-size:70%;color:#808080;">//<span class="ltx_text ltx_lst_space" 
id="lstnumberx25.2.1"> </span>Update<span class="ltx_text ltx_lst_space" id="lstnumberx25.2.2"> </span>J</span> </div> <div class="ltx_listingline" id="lstnumberx26"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx26.1" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx26.2" style="font-size:70%;">J</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx26.3" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx26.4" style="font-size:70%;">k</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx26.5" style="font-size:70%;">]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx26.6" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx26.7" style="font-size:70%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx26.8" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx26.9" style="font-size:70%;">J</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx26.10" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx26.11" style="font-size:70%;">k</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx26.12" style="font-size:70%;">]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx26.13" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx26.14" style="font-size:70%;">+</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx26.15" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx26.16" style="font-size:70%;">0.25</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx26.17" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx26.18" 
style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx26.19" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx26.20" style="font-size:70%;">lambda</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx26.21" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx26.22" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx26.23" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx26.24" style="font-size:70%;">D</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx26.25" style="font-size:70%;">;</span> </div> <div class="ltx_listingline" id="lstnumberx27"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx27.1" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx27.2" style="font-size:70%;">}</span> </div> </div> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S4.F6.4.1.1" style="font-size:90%;">Figure 6</span>. 
</span><span class="ltx_text" id="S4.F6.5.2" style="font-size:90%;">Original code snippet of <span class="ltx_text ltx_font_italic" id="S4.F6.5.2.1">Srad</span></span></figcaption> </figure> <figure class="ltx_figure" id="S4.F7"> <div class="ltx_listing ltx_lst_language_C ltx_lstlisting ltx_align_center ltx_framed ltx_framed_rectangle ltx_listing" id="S4.F7.2" style="background-color:#F2F2EB;"> <div class="ltx_listing_data"><a download="" href="data:text/plain;base64,Zm9yIChpbnQgaSA9IDA7IGkgPCByb3dzOyBpKyspIHsKICAgIGZvciAoaW50IGogPSAwOyBqIDwgY29sczsgaisrKSB7CiAgICAgICAgLy8gQ2FsY3VsYXRlcyBkaXJlY3Rpb25hbCBkZXJpdmF0aXZlcyBkTiwgZFMsIGRXLCBkRSBiYXNlZCBvbiBKCiAgICAgICAgRCA9IGNOICogZE4gKyBjUyAqIGRTICsgY1cgKiBkVyArIGNFICogZEU7CiAgICAgICAgSltrXSA9IEpba10gKyAwLjI1ICogbGFtYmRhICogRDsKICAgIH0KfQ==">⬇</a></div> <div class="ltx_listingline" id="lstnumberx28"> <span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx28.1" style="font-size:70%;color:#0000FF;">for</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx28.2" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx28.3" style="font-size:70%;">(</span><span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx28.4" style="font-size:70%;color:#0000FF;">int</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx28.5" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx28.6" style="font-size:70%;">i</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx28.7" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx28.8" style="font-size:70%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx28.9" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx28.10" style="font-size:70%;">0;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" 
id="lstnumberx28.11" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx28.12" style="font-size:70%;">i</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx28.13" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx28.14" style="font-size:70%;"><</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx28.15" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx28.16" style="font-size:70%;">rows</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx28.17" style="font-size:70%;">;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx28.18" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx28.19" style="font-size:70%;">i</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx28.20" style="font-size:70%;">++)</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx28.21" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx28.22" style="font-size:70%;">{</span> </div> <div class="ltx_listingline" id="lstnumberx29"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx29.1" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx29.2" style="font-size:70%;color:#0000FF;">for</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx29.3" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx29.4" style="font-size:70%;">(</span><span class="ltx_text ltx_lst_keyword ltx_font_typewriter" id="lstnumberx29.5" style="font-size:70%;color:#0000FF;">int</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx29.6" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier 
ltx_font_typewriter" id="lstnumberx29.7" style="font-size:70%;">j</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx29.8" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx29.9" style="font-size:70%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx29.10" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx29.11" style="font-size:70%;">0;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx29.12" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx29.13" style="font-size:70%;">j</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx29.14" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx29.15" style="font-size:70%;"><</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx29.16" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx29.17" style="font-size:70%;">cols</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx29.18" style="font-size:70%;">;</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx29.19" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx29.20" style="font-size:70%;">j</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx29.21" style="font-size:70%;">++)</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx29.22" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx29.23" style="font-size:70%;">{</span> </div> <div class="ltx_listingline" id="lstnumberx30"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx30.1" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_comment ltx_font_typewriter" id="lstnumberx30.2" 
style="font-size:70%;color:#808080;">//<span class="ltx_text ltx_lst_space" id="lstnumberx30.2.1"> </span>Calculates<span class="ltx_text ltx_lst_space" id="lstnumberx30.2.2"> </span>directional<span class="ltx_text ltx_lst_space" id="lstnumberx30.2.3"> </span>derivatives<span class="ltx_text ltx_lst_space" id="lstnumberx30.2.4"> </span>dN,<span class="ltx_text ltx_lst_space" id="lstnumberx30.2.5"> </span>dS,<span class="ltx_text ltx_lst_space" id="lstnumberx30.2.6"> </span>dW,<span class="ltx_text ltx_lst_space" id="lstnumberx30.2.7"> </span>dE<span class="ltx_text ltx_lst_space" id="lstnumberx30.2.8"> </span>based<span class="ltx_text ltx_lst_space" id="lstnumberx30.2.9"> </span>on<span class="ltx_text ltx_lst_space" id="lstnumberx30.2.10"> </span>J</span> </div> <div class="ltx_listingline" id="lstnumberx31"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.1" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx31.2" style="font-size:70%;">D</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.3" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx31.4" style="font-size:70%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.5" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx31.6" style="font-size:70%;">cN</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.7" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx31.8" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.9" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx31.10" style="font-size:70%;">dN</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.11" style="font-size:70%;"> 
</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx31.12" style="font-size:70%;">+</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.13" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx31.14" style="font-size:70%;">cS</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.15" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx31.16" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.17" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx31.18" style="font-size:70%;">dS</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.19" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx31.20" style="font-size:70%;">+</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.21" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx31.22" style="font-size:70%;">cW</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.23" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx31.24" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.25" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx31.26" style="font-size:70%;">dW</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.27" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx31.28" style="font-size:70%;">+</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.29" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" 
id="lstnumberx31.30" style="font-size:70%;">cE</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.31" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx31.32" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx31.33" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx31.34" style="font-size:70%;">dE</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx31.35" style="font-size:70%;">;</span> </div> <div class="ltx_listingline" id="lstnumberx32"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx32.1" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx32.2" style="font-size:70%;">J</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx32.3" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx32.4" style="font-size:70%;">k</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx32.5" style="font-size:70%;">]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx32.6" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx32.7" style="font-size:70%;">=</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx32.8" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx32.9" style="font-size:70%;">J</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx32.10" style="font-size:70%;">[</span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx32.11" style="font-size:70%;">k</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx32.12" style="font-size:70%;">]</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx32.13" style="font-size:70%;"> </span><span 
class="ltx_text ltx_font_typewriter" id="lstnumberx32.14" style="font-size:70%;">+</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx32.15" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx32.16" style="font-size:70%;">0.25</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx32.17" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx32.18" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx32.19" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx32.20" style="font-size:70%;">lambda</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx32.21" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx32.22" style="font-size:70%;">*</span><span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx32.23" style="font-size:70%;"> </span><span class="ltx_text ltx_lst_identifier ltx_font_typewriter" id="lstnumberx32.24" style="font-size:70%;">D</span><span class="ltx_text ltx_font_typewriter" id="lstnumberx32.25" style="font-size:70%;">;</span> </div> <div class="ltx_listingline" id="lstnumberx33"> <span class="ltx_text ltx_lst_space ltx_font_typewriter" id="lstnumberx33.1" style="font-size:70%;"> </span><span class="ltx_text ltx_font_typewriter" id="lstnumberx33.2" style="font-size:70%;">}</span> </div> <div class="ltx_listingline" id="lstnumberx34"> <span class="ltx_text ltx_font_typewriter" id="lstnumberx34.1" style="font-size:70%;">}</span> </div> </div> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S4.F7.4.1.1" style="font-size:90%;">Figure 7</span>. 
</span><span class="ltx_text" id="S4.F7.5.2" style="font-size:90%;">Optimized code of <span class="ltx_text ltx_font_italic" id="S4.F7.5.2.1">Srad</span> - OpenAI o1</span></figcaption> </figure> <div class="ltx_para" id="S4.SS6.p3"> <p class="ltx_p" id="S4.SS6.p3.1">LLMs also tend to output code that is completely irrelevant to the given context, even when given well-structured prompts. For the <span class="ltx_text ltx_font_italic" id="S4.SS6.p3.1.1">COULOMB</span> example in EX2, Llama-3.2 produces code that is entirely unrelated to the prompt, even though the prompt supplied a code snippet to be optimized, a classic case of <span class="ltx_text ltx_font_italic" id="S4.SS6.p3.1.2">Hallucination</span> <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib70" title="">xu2024hallucinationinevitableinnatelimitation, </a>)</cite>. In the case of <span class="ltx_text ltx_font_italic" id="S4.SS6.p3.1.3">MATMUL</span> in EX3, when instructed to apply parallel optimizations, OpenAI o1 instead changes the memory access patterns, which is not a parallel optimization strategy; this illustrates how LLMs can ignore instructions and produce code that is not meaningful. LLMs also lack a precise understanding of compiler features: in EX3, for <span class="ltx_text ltx_font_italic" id="S4.SS6.p3.1.4">Cholesky</span>, Claude-3.5 fails to generate code that yields correct results.</p> </div> </section> <section class="ltx_subsection" id="S4.SS7"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.7. 
</span>Experiment 6: HPC Commonsense</h3> <figure class="ltx_figure" id="S4.SS7.2"> <div class="ltx_block" id="S4.SS7.2.3"> <figure class="ltx_figure ltx_figure_panel ltx_align_center ltx_align_bottom" id="S4.F8" style="width:433.6pt;"><img alt="Refer to caption" class="ltx_graphics ltx_figure_panel ltx_img_landscape" height="406" id="S4.SS7.1.1.g1" src="x5.png" width="830"/> <br class="ltx_break ltx_break"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S4.F8.2.1.1" style="font-size:90%;">Figure 8</span>. </span><span class="ltx_text" id="S4.F8.3.2" style="font-size:90%;">Average speedups (Y-axis) achieved by the tools by domain in EX1 and EX2.</span></figcaption> </figure> <figure class="ltx_figure ltx_figure_panel ltx_align_center ltx_align_bottom" id="S4.F9" style="width:433.6pt;"><img alt="Refer to caption" class="ltx_graphics ltx_figure_panel ltx_img_landscape" height="409" id="S4.SS7.2.2.g1" src="x6.png" width="830"/> <br class="ltx_break ltx_break"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S4.F9.2.1.1" style="font-size:90%;">Figure 9</span>. </span><span class="ltx_text" id="S4.F9.3.2" style="font-size:90%;">Average speedups (Y-axis) achieved by the tools by domain in EX3.</span></figcaption> </figure> </div> </figure> <figure class="ltx_figure" id="S4.F10"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_square" height="698" id="S4.F10.g1" src="x7.png" width="664"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S4.F10.2.1.1" style="font-size:90%;">Figure 10</span>. </span><span class="ltx_text" id="S4.F10.3.2" style="font-size:90%;">Distribution of optimizations applied by each tool</span></figcaption> </figure> <div class="ltx_para" id="S4.SS7.p1"> <p class="ltx_p" id="S4.SS7.p1.1">We assess HPC commonsense of tools from two perspectives. 
First, we assess the speedup achieved by each tool across all computation motifs in Table <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S2.T1" title="Table 1 ‣ Traditional Performance Analysis Tools ‣ 2. Related Work ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">1</span></a>. Second, we analyze the distribution of optimizations applied by each tool, as well as the rationale behind the optimizations.</p> </div> <div class="ltx_para" id="S4.SS7.p2"> <p class="ltx_p" id="S4.SS7.p2.1">From Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.F8" title="Figure 8 ‣ 4.7. Experiment 6: HPC Commonsense ‣ 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">8</span></a> and Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.F9" title="Figure 9 ‣ 4.7. Experiment 6: HPC Commonsense ‣ 4. Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">9</span></a>, we observe that different domains demonstrate varying speedups when employing specific tools. Codee achieves good results in Spectral Methods and Structured Grids using serial optimization strategies, while excelling in Dense Linear Algebra and N-body Methods in a parallel setting. We believe Codee’s familiarity with the computation logic of these domains makes it a better code optimizer in such use cases. For instance, from Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.F10" title="Figure 10 ‣ 4.7. Experiment 6: HPC Commonsense ‣ 4. 
Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">10</span></a> it is evident that a large share of the optimizations suggested by Codee revolve around mathematical optimizations, including “Fused multiply-add,” “Change of precision,” and “Mathematical simplification.” Codee, on the other hand, fails to yield speedups in the Dynamic Programming domain, especially in the parallel setting. In comparison, OpenAI o1 achieves a speedup of ~7<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS7.p2.1.m1.1"><semantics id="S4.SS7.p2.1.m1.1a"><mo id="S4.SS7.p2.1.m1.1.1" xref="S4.SS7.p2.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS7.p2.1.m1.1b"><times id="S4.SS7.p2.1.m1.1.1.cmml" xref="S4.SS7.p2.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS7.p2.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS7.p2.1.m1.1d">×</annotation></semantics></math> in the Dynamic Programming domain in EX3. For Dynamic Programming problems, many state transition equations cannot be parallelized in their original form due to interdependencies between successive iterations. However, we found that o1 can change the order in which the state transition table is populated, reducing dependencies so that each column of the table can be updated by an individual thread <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib65" title="">venkatachalam2014faster, </a>)</cite>. This finding highlights the potential of LLMs to understand user code and perform adaptive optimizations.</p> </div> <div class="ltx_para" id="S4.SS7.p3"> <p class="ltx_p" id="S4.SS7.p3.1">Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S4.F10" title="Figure 10 ‣ 4.7. Experiment 6: HPC Commonsense ‣ 4. 
Evaluation ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">10</span></a> illustrates the distribution of optimizations applied by each tool. Codee’s optimizations are straightforward to summarize, as it categorizes inefficiencies using canonical terminologies such as “fused multiply-add,” “loop fission,” and “loop fusion,” following the structured taxonomy of performance issues outlined in <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib21" title="">codee2024opencatalog, </a>)</cite>. Among these, “fused multiply-add” and “OpenMP scoping” are the most frequently applied optimizations. In contrast, LLMs propose a broader spectrum of optimizations, characterized by more flexible transformations. These are often described using general terms such as “pre-computing constants” and “reducing function overhead,” which, while versatile, may lack the specificity required to precisely identify the transformations applied. A notable trend among LLMs is the frequent emphasis on memory optimizations, which constitute a significant portion of their recommendations, as reflected in the distribution chart. For scenarios involving nested loop structures, OpenAI o1, like Codee, commonly suggests “loop reordering” as an optimization strategy. However, this opportunity is not identified by other models, such as Llama-3.2 and Claude-3.5. When analyzing the code provided in the initial prompt, OpenAI o1 distinguishes itself by offering detailed logical reasoning to justify the chosen optimizations. In contrast, other LLMs tend to lack this level of explanation. Codee also generates a comprehensive report of the entire codebase, including line numbers from all dependent files, performance problems with the code, and potential optimizations. 
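As a concrete illustration of the “loop reordering” transformation mentioned above, consider the following sketch. This is an illustrative example of ours, not code taken from the benchmarks; both functions compute the same sum over a row-major matrix, but the interchanged version walks memory contiguously, which typically improves cache behavior.

```c
/* Illustrative loop-interchange sketch (names are ours).
 * Both variants sum a row-major n-by-n matrix stored in a flat array. */
double sum_strided(const double *a, int n) {
    double s = 0.0;
    for (int j = 0; j < n; j++)      /* column-outer: stride-n accesses */
        for (int i = 0; i < n; i++)
            s += a[i * n + j];
    return s;
}

double sum_contiguous(const double *a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)      /* row-outer: unit-stride accesses */
        for (int j = 0; j < n; j++)
            s += a[i * n + j];
    return s;
}
```

Because the two loop nests are independent in each iteration, interchanging them preserves the result while changing only the memory access order.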
However, it is important to note that Codee’s suggestions are based on predefined heuristics and cannot be customized to application-specific characteristics.</p> </div> </section> </section> <section class="ltx_section" id="S5"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">5. </span>Case Studies</h2> <div class="ltx_para" id="S5.p1"> <p class="ltx_p" id="S5.p1.1">We describe the evaluation of level 3 benchmarks in Table <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S2.T1" title="Table 1 ‣ Traditional Performance Analysis Tools ‣ 2. Related Work ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">1</span></a> in this section. Experiments were performed on up to eight <span class="ltx_text ltx_font_italic" id="S5.p1.1.1">AMD EPYC 7543</span> processors. We first provide an overview of the agent system we designed to conduct these studies. Subsequently, we discuss our findings on each benchmark.</p> </div> <section class="ltx_subsection" id="S5.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">5.1. </span>Performance Optimization Agent</h3> <figure class="ltx_figure" id="S5.F11"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="502" id="S5.F11.g1" src="x8.png" width="829"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S5.F11.2.1.1" style="font-size:90%;">Figure 11</span>. </span><span class="ltx_text" id="S5.F11.3.2" style="font-size:90%;">Overview of the Performance Optimization Agent</span></figcaption> </figure> <div class="ltx_para" id="S5.SS1.p1"> <p class="ltx_p" id="S5.SS1.p1.1">Developing and improving HPC applications is an intricate and iterative process that targets efficiency and scalability. 
While traditional tools provide insights through runtime or static inspection, the interpretation and application of these insights rely heavily on human expertise. This process can be time-intensive and error-prone. Advancements in AI-driven code assistants <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib56" title="">rozière2024codellamaopenfoundation, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib32" title="">guo2024deepseekcoderlargelanguagemodel, </a>; <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib40" title="">li2023starcodersourceyou, </a>)</cite> are bringing a paradigm shift to this domain. However, while LLMs excel at automatically generating small, general-purpose code, they often fail to analyze large HPC codebases due to a lack of understanding of program structure, invocation context, and performance characteristics.</p> </div> <div class="ltx_para" id="S5.SS1.p2"> <p class="ltx_p" id="S5.SS1.p2.1">To address these challenges, we developed a performance optimization agent leveraging OpenAI o1 as the “brain” of the system, as illustrated in Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S5.F11" title="Figure 11 ‣ 5.1. Performance Optimization Agent ‣ 5. Case Studies ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">11</span></a>. This agent facilitates HPC code development by integrating profiling feedback into the optimization process. It serves as a baseline for evaluating the effectiveness of LLMs in enhancing productivity in real HPC application development. 
Similar to other agent systems, our performance optimization agent comprises core components: <span class="ltx_text ltx_font_italic" id="S5.SS1.p2.1.1">Memory</span>, <span class="ltx_text ltx_font_italic" id="S5.SS1.p2.1.2">Tools</span>, <span class="ltx_text ltx_font_italic" id="S5.SS1.p2.1.3">Planning</span>, and <span class="ltx_text ltx_font_italic" id="S5.SS1.p2.1.4">User Requests</span>. The agent’s workflow begins with time-based profiling using HPCToolkit. Profiling data is then converted into JSON structures—more suitable for LLM processing—using Hatchet <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib13" title="">10.1145/3295500.3356219, </a>)</cite>. In the profiling data, we include information on individual source lines, their invocation contexts, and associated metrics such as total CPU execution time and L1 data cache miss rate. In addition to profiling data, user-provided inputs are incorporated to enrich the execution context. These inputs include environment configurations (e.g., hardware and software resources), the number of threads, ranks, and iterations for the application’s runtime. With this comprehensive input, the LLM generates optimized code and recommends additional metrics for further performance analysis. Following this, the identified hotspot function is replaced with its optimized version. The agent then recompiles the application and measures performance improvements. This optimization loop continues iteratively until a predefined threshold (e.g., three iterations) is reached or the LLM suggests no further optimizations. Importantly, the metrics and program context obtained during each iteration are kept in the agent’s memory to allow the LLM to reference prior iterations for more informed optimization suggestions.</p> </div> </section> <section class="ltx_subsection" id="S5.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">5.2. 
</span>Case 1: NPB_CG</h3> <div class="ltx_para" id="S5.SS2.p1"> <p class="ltx_p" id="S5.SS2.p1.1">The <span class="ltx_text ltx_font_italic" id="S5.SS2.p1.1.1">Conjugate Gradient (CG)</span> benchmark, part of the <span class="ltx_text ltx_font_italic" id="S5.SS2.p1.1.2">NAS Parallel Benchmarks (NPB)</span> suite, evaluates the performance of parallel supercomputers by simulating the solution of sparse linear systems using the conjugate gradient method.</p> </div> <div class="ltx_para" id="S5.SS2.p2"> <p class="ltx_p" id="S5.SS2.p2.6">The <span class="ltx_text ltx_font_italic" id="S5.SS2.p2.6.1">conj_grad</span> function was identified as the hotspot using HPCToolkit. Codee provided one L2 optimization suggestion, applying multithreading parallelism to the specific forall loop in the hotspot. The original code ran for <math alttext="25.00" class="ltx_Math" display="inline" id="S5.SS2.p2.1.m1.1"><semantics id="S5.SS2.p2.1.m1.1a"><mn id="S5.SS2.p2.1.m1.1.1" xref="S5.SS2.p2.1.m1.1.1.cmml">25.00</mn><annotation-xml encoding="MathML-Content" id="S5.SS2.p2.1.m1.1b"><cn id="S5.SS2.p2.1.m1.1.1.cmml" type="float" xref="S5.SS2.p2.1.m1.1.1">25.00</cn></annotation-xml><annotation encoding="application/x-tex" id="S5.SS2.p2.1.m1.1c">25.00</annotation><annotation encoding="application/x-llamapun" id="S5.SS2.p2.1.m1.1d">25.00</annotation></semantics></math> seconds, of which the hotspot spent <math alttext="22.00" class="ltx_Math" display="inline" id="S5.SS2.p2.2.m2.1"><semantics id="S5.SS2.p2.2.m2.1a"><mn id="S5.SS2.p2.2.m2.1.1" xref="S5.SS2.p2.2.m2.1.1.cmml">22.00</mn><annotation-xml encoding="MathML-Content" id="S5.SS2.p2.2.m2.1b"><cn id="S5.SS2.p2.2.m2.1.1.cmml" type="float" xref="S5.SS2.p2.2.m2.1.1">22.00</cn></annotation-xml><annotation encoding="application/x-tex" id="S5.SS2.p2.2.m2.1c">22.00</annotation><annotation encoding="application/x-llamapun" id="S5.SS2.p2.2.m2.1d">22.00</annotation></semantics></math> seconds (<math alttext="88.1\%" class="ltx_Math" display="inline" 
id="S5.SS2.p2.3.m3.1"><semantics id="S5.SS2.p2.3.m3.1a"><mrow id="S5.SS2.p2.3.m3.1.1" xref="S5.SS2.p2.3.m3.1.1.cmml"><mn id="S5.SS2.p2.3.m3.1.1.2" xref="S5.SS2.p2.3.m3.1.1.2.cmml">88.1</mn><mo id="S5.SS2.p2.3.m3.1.1.1" xref="S5.SS2.p2.3.m3.1.1.1.cmml">%</mo></mrow><annotation-xml encoding="MathML-Content" id="S5.SS2.p2.3.m3.1b"><apply id="S5.SS2.p2.3.m3.1.1.cmml" xref="S5.SS2.p2.3.m3.1.1"><csymbol cd="latexml" id="S5.SS2.p2.3.m3.1.1.1.cmml" xref="S5.SS2.p2.3.m3.1.1.1">percent</csymbol><cn id="S5.SS2.p2.3.m3.1.1.2.cmml" type="float" xref="S5.SS2.p2.3.m3.1.1.2">88.1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.SS2.p2.3.m3.1c">88.1\%</annotation><annotation encoding="application/x-llamapun" id="S5.SS2.p2.3.m3.1d">88.1 %</annotation></semantics></math>). The code optimized by Codee running with 16 threads took <math alttext="4.58" class="ltx_Math" display="inline" id="S5.SS2.p2.4.m4.1"><semantics id="S5.SS2.p2.4.m4.1a"><mn id="S5.SS2.p2.4.m4.1.1" xref="S5.SS2.p2.4.m4.1.1.cmml">4.58</mn><annotation-xml encoding="MathML-Content" id="S5.SS2.p2.4.m4.1b"><cn id="S5.SS2.p2.4.m4.1.1.cmml" type="float" xref="S5.SS2.p2.4.m4.1.1">4.58</cn></annotation-xml><annotation encoding="application/x-tex" id="S5.SS2.p2.4.m4.1c">4.58</annotation><annotation encoding="application/x-llamapun" id="S5.SS2.p2.4.m4.1d">4.58</annotation></semantics></math> seconds (<math alttext="5.46\times" class="ltx_math_unparsed" display="inline" id="S5.SS2.p2.5.m5.1"><semantics id="S5.SS2.p2.5.m5.1a"><mrow id="S5.SS2.p2.5.m5.1b"><mn id="S5.SS2.p2.5.m5.1.1">5.46</mn><mo id="S5.SS2.p2.5.m5.1.2" lspace="0.222em">×</mo></mrow><annotation encoding="application/x-tex" id="S5.SS2.p2.5.m5.1c">5.46\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS2.p2.5.m5.1d">5.46 ×</annotation></semantics></math>). 
In comparison, applying the performance optimization agent with multiple iterations using 16 threads yielded a speedup of up to <math alttext="8.22\times" class="ltx_math_unparsed" display="inline" id="S5.SS2.p2.6.m6.1"><semantics id="S5.SS2.p2.6.m6.1a"><mrow id="S5.SS2.p2.6.m6.1b"><mn id="S5.SS2.p2.6.m6.1.1">8.22</mn><mo id="S5.SS2.p2.6.m6.1.2" lspace="0.222em">×</mo></mrow><annotation encoding="application/x-tex" id="S5.SS2.p2.6.m6.1c">8.22\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS2.p2.6.m6.1d">8.22 ×</annotation></semantics></math> with no discrepancies in output correctness.</p> </div> <section class="ltx_paragraph" id="S5.SS2.SSS0.Px1"> <h5 class="ltx_title ltx_title_paragraph">Version 1</h5> <div class="ltx_para" id="S5.SS2.SSS0.Px1.p1"> <p class="ltx_p" id="S5.SS2.SSS0.Px1.p1.2">The agent optimized the code using <span class="ltx_text ltx_font_typewriter" id="S5.SS2.SSS0.Px1.p1.2.1">#pragma omp parallel</span> to parallelize loops, <span class="ltx_text ltx_font_typewriter" id="S5.SS2.SSS0.Px1.p1.2.2">#pragma unroll</span> to unroll loops, and <span class="ltx_text ltx_font_typewriter" id="S5.SS2.SSS0.Px1.p1.2.3">#pragma omp simd</span> to enable vectorization. Specifically, to preserve the correctness of the CG method, it used <span class="ltx_text ltx_font_typewriter" id="S5.SS2.SSS0.Px1.p1.2.4">#pragma omp parallel for reduction(+: rho)</span> or <span class="ltx_text ltx_font_typewriter" id="S5.SS2.SSS0.Px1.p1.2.5">#pragma omp parallel for reduction(+: sum)</span> on the corresponding loops to avoid data races. 
The optimized code’s runtime decreased to <math alttext="3.04" class="ltx_Math" display="inline" id="S5.SS2.SSS0.Px1.p1.1.m1.1"><semantics id="S5.SS2.SSS0.Px1.p1.1.m1.1a"><mn id="S5.SS2.SSS0.Px1.p1.1.m1.1.1" xref="S5.SS2.SSS0.Px1.p1.1.m1.1.1.cmml">3.04</mn><annotation-xml encoding="MathML-Content" id="S5.SS2.SSS0.Px1.p1.1.m1.1b"><cn id="S5.SS2.SSS0.Px1.p1.1.m1.1.1.cmml" type="float" xref="S5.SS2.SSS0.Px1.p1.1.m1.1.1">3.04</cn></annotation-xml><annotation encoding="application/x-tex" id="S5.SS2.SSS0.Px1.p1.1.m1.1c">3.04</annotation><annotation encoding="application/x-llamapun" id="S5.SS2.SSS0.Px1.p1.1.m1.1d">3.04</annotation></semantics></math> seconds (<math alttext="8.22\times" class="ltx_math_unparsed" display="inline" id="S5.SS2.SSS0.Px1.p1.2.m2.1"><semantics id="S5.SS2.SSS0.Px1.p1.2.m2.1a"><mrow id="S5.SS2.SSS0.Px1.p1.2.m2.1b"><mn id="S5.SS2.SSS0.Px1.p1.2.m2.1.1">8.22</mn><mo id="S5.SS2.SSS0.Px1.p1.2.m2.1.2" lspace="0.222em">×</mo></mrow><annotation encoding="application/x-tex" id="S5.SS2.SSS0.Px1.p1.2.m2.1c">8.22\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS2.SSS0.Px1.p1.2.m2.1d">8.22 ×</annotation></semantics></math>).</p> </div> </section> <section class="ltx_paragraph" id="S5.SS2.SSS0.Px2"> <h5 class="ltx_title ltx_title_paragraph">Version 2</h5> <div class="ltx_para" id="S5.SS2.SSS0.Px2.p1"> <p class="ltx_p" id="S5.SS2.SSS0.Px2.p1.1">In this version, the agent collected two additional performance metrics, L1 cache misses and the number of floating-point instructions, hinting that more effective data access and better utilization of the CPU vector units could lead to potential speedups. The agent then applied compiler directives (e.g., <span class="ltx_text ltx_font_typewriter" id="S5.SS2.SSS0.Px2.p1.1.1">__restrict</span>), OpenMP reductions, SIMD parallelism, and alignment hints to the specific loop. 
Furthermore, it added OpenMP parallelization and prefetch instructions to the <span class="ltx_text ltx_font_typewriter" id="S5.SS2.SSS0.Px2.p1.1.2">conj_grad</span> function, which was not parallelized in the previous version. However, following the optimizations, the total runtime increased to <math alttext="5.69" class="ltx_Math" display="inline" id="S5.SS2.SSS0.Px2.p1.1.m1.1"><semantics id="S5.SS2.SSS0.Px2.p1.1.m1.1a"><mn id="S5.SS2.SSS0.Px2.p1.1.m1.1.1" xref="S5.SS2.SSS0.Px2.p1.1.m1.1.1.cmml">5.69</mn><annotation-xml encoding="MathML-Content" id="S5.SS2.SSS0.Px2.p1.1.m1.1b"><cn id="S5.SS2.SSS0.Px2.p1.1.m1.1.1.cmml" type="float" xref="S5.SS2.SSS0.Px2.p1.1.m1.1.1">5.69</cn></annotation-xml><annotation encoding="application/x-tex" id="S5.SS2.SSS0.Px2.p1.1.m1.1c">5.69</annotation><annotation encoding="application/x-llamapun" id="S5.SS2.SSS0.Px2.p1.1.m1.1d">5.69</annotation></semantics></math> seconds compared with the previous version.</p> </div> </section> <section class="ltx_paragraph" id="S5.SS2.SSS0.Px3"> <h5 class="ltx_title ltx_title_paragraph">Version 3</h5> <div class="ltx_para" id="S5.SS2.SSS0.Px3.p1"> <p class="ltx_p" id="S5.SS2.SSS0.Px3.p1.1">Based on profiling results, L1 cache misses and floating-point instruction counts remained nearly unchanged, and the thread loads across the three indicators were highly imbalanced. To this end, the agent applied advanced optimizations for <span class="ltx_text ltx_font_typewriter" id="S5.SS2.SSS0.Px3.p1.1.1">conj_grad</span>, such as dynamic scheduling (<span class="ltx_text ltx_font_typewriter" id="S5.SS2.SSS0.Px3.p1.1.2">#pragma omp parallel schedule</span>) for threads and aligning the addresses of arrays with <span class="ltx_text ltx_font_typewriter" id="S5.SS2.SSS0.Px3.p1.1.3">alignas(64)</span>. 
The final runtime decreased to <math alttext="5.52" class="ltx_Math" display="inline" id="S5.SS2.SSS0.Px3.p1.1.m1.1"><semantics id="S5.SS2.SSS0.Px3.p1.1.m1.1a"><mn id="S5.SS2.SSS0.Px3.p1.1.m1.1.1" xref="S5.SS2.SSS0.Px3.p1.1.m1.1.1.cmml">5.52</mn><annotation-xml encoding="MathML-Content" id="S5.SS2.SSS0.Px3.p1.1.m1.1b"><cn id="S5.SS2.SSS0.Px3.p1.1.m1.1.1.cmml" type="float" xref="S5.SS2.SSS0.Px3.p1.1.m1.1.1">5.52</cn></annotation-xml><annotation encoding="application/x-tex" id="S5.SS2.SSS0.Px3.p1.1.m1.1c">5.52</annotation><annotation encoding="application/x-llamapun" id="S5.SS2.SSS0.Px3.p1.1.m1.1d">5.52</annotation></semantics></math> seconds, an improvement over version 2, but L1 cache misses remained at the same level as version 2.</p> </div> </section> </section> <section class="ltx_subsection" id="S5.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">5.3. </span>Case 2: XSBench</h3> <div class="ltx_para" id="S5.SS3.p1"> <p class="ltx_p" id="S5.SS3.p1.1"><span class="ltx_text ltx_font_italic" id="S5.SS3.p1.1.1">XSBench</span> is a compact application that simulates a crucial computation kernel of the Monte Carlo neutron transport algorithm. The hotspot is located in the <span class="ltx_text ltx_font_typewriter" id="S5.SS3.p1.1.2">calculate_macro_xs</span> function, identified using HPCToolkit. The original code was executed in 39.5 seconds, and the <span class="ltx_text ltx_font_typewriter" id="S5.SS3.p1.1.3">calculate_macro_xs</span> function took 28.3 seconds (71.5%). 
Codee did not identify any optimization opportunities for this function.</p> </div> <div class="ltx_para" id="S5.SS3.p2"> <p class="ltx_p" id="S5.SS3.p2.1">Applying the performance optimization agent with three iterations yielded a <math alttext="1.33\times" class="ltx_math_unparsed" display="inline" id="S5.SS3.p2.1.m1.1"><semantics id="S5.SS3.p2.1.m1.1a"><mrow id="S5.SS3.p2.1.m1.1b"><mn id="S5.SS3.p2.1.m1.1.1">1.33</mn><mo id="S5.SS3.p2.1.m1.1.2" lspace="0.222em">×</mo></mrow><annotation encoding="application/x-tex" id="S5.SS3.p2.1.m1.1c">1.33\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS3.p2.1.m1.1d">1.33 ×</annotation></semantics></math> speedup. Below we describe the metrics profiled and the optimizations applied by the agent.</p> </div> <section class="ltx_paragraph" id="S5.SS3.SSS0.Px1"> <h5 class="ltx_title ltx_title_paragraph">Version 1</h5> <div class="ltx_para" id="S5.SS3.SSS0.Px1.p1"> <p class="ltx_p" id="S5.SS3.SSS0.Px1.p1.1">In the first iteration, the agent applied SIMD intrinsics from <span class="ltx_text ltx_font_typewriter" id="S5.SS3.SSS0.Px1.p1.1.1"><immintrin.h></span> to optimize the hotspot function, suggesting they would help accelerate larger datasets. Then, it employed compiler hints such as <span class="ltx_text ltx_font_typewriter" id="S5.SS3.SSS0.Px1.p1.1.2">__attribute__((hot,flatten,always_inline))</span> to reduce the function invocation overhead because the hotspot function was called at the innermost level of a nested loop. In addition, the agent annotated certain conditions with <span class="ltx_text ltx_font_typewriter" id="S5.SS3.SSS0.Px1.p1.1.3">__builtin_expect</span>, aiming to improve branch prediction. 
Yet, the overall runtime slightly increased to 40.1 seconds.</p> </div> </section> <section class="ltx_paragraph" id="S5.SS3.SSS0.Px2"> <h5 class="ltx_title ltx_title_paragraph">Version 2</h5> <div class="ltx_para" id="S5.SS3.SSS0.Px2.p1"> <p class="ltx_p" id="S5.SS3.SSS0.Px2.p1.2">By investigating L1 data cache load misses as a new profiling metric, the agent pointed out that memory access optimization could lead to potential speedups. The agent reduced the L1 data cache load misses by adopting prefetch instructions (e.g., <span class="ltx_text ltx_font_typewriter" id="S5.SS3.SSS0.Px2.p1.2.1">__builtin_prefetch</span>) for frequently accessed pointers and introducing local variables for frequently accessed positions from the <span class="ltx_text ltx_font_typewriter" id="S5.SS3.SSS0.Px2.p1.2.2">high</span> and <span class="ltx_text ltx_font_typewriter" id="S5.SS3.SSS0.Px2.p1.2.3">low</span> arrays. Moreover, the agent duplicated the calculation of interpolation factors, trading recomputation for fewer memory accesses. 
After these optimizations, the overall runtime decreased to <math alttext="34.6" class="ltx_Math" display="inline" id="S5.SS3.SSS0.Px2.p1.1.m1.1"><semantics id="S5.SS3.SSS0.Px2.p1.1.m1.1a"><mn id="S5.SS3.SSS0.Px2.p1.1.m1.1.1" xref="S5.SS3.SSS0.Px2.p1.1.m1.1.1.cmml">34.6</mn><annotation-xml encoding="MathML-Content" id="S5.SS3.SSS0.Px2.p1.1.m1.1b"><cn id="S5.SS3.SSS0.Px2.p1.1.m1.1.1.cmml" type="float" xref="S5.SS3.SSS0.Px2.p1.1.m1.1.1">34.6</cn></annotation-xml><annotation encoding="application/x-tex" id="S5.SS3.SSS0.Px2.p1.1.m1.1c">34.6</annotation><annotation encoding="application/x-llamapun" id="S5.SS3.SSS0.Px2.p1.1.m1.1d">34.6</annotation></semantics></math> seconds, and the hotspot runtime decreased to <math alttext="20.2" class="ltx_Math" display="inline" id="S5.SS3.SSS0.Px2.p1.2.m2.1"><semantics id="S5.SS3.SSS0.Px2.p1.2.m2.1a"><mn id="S5.SS3.SSS0.Px2.p1.2.m2.1.1" xref="S5.SS3.SSS0.Px2.p1.2.m2.1.1.cmml">20.2</mn><annotation-xml encoding="MathML-Content" id="S5.SS3.SSS0.Px2.p1.2.m2.1b"><cn id="S5.SS3.SSS0.Px2.p1.2.m2.1.1.cmml" type="float" xref="S5.SS3.SSS0.Px2.p1.2.m2.1.1">20.2</cn></annotation-xml><annotation encoding="application/x-tex" id="S5.SS3.SSS0.Px2.p1.2.m2.1c">20.2</annotation><annotation encoding="application/x-llamapun" id="S5.SS3.SSS0.Px2.p1.2.m2.1d">20.2</annotation></semantics></math> seconds.</p> </div> </section> <section class="ltx_paragraph" id="S5.SS3.SSS0.Px3"> <h5 class="ltx_title ltx_title_paragraph">Version 3</h5> <div class="ltx_para" id="S5.SS3.SSS0.Px3.p1"> <p class="ltx_p" id="S5.SS3.SSS0.Px3.p1.1">Lastly, the agent focused on leveraging SIMD intrinsics to further reduce L1 data cache load misses and improve performance, implemented by adopting <span class="ltx_text ltx_font_typewriter" id="S5.SS3.SSS0.Px3.p1.1.1">__m256d</span>-based intrinsics for vectorization. It also unrolled the loops around the hotspot function. 
Applying the above optimizations achieved the best result, with a runtime of 29.8 seconds.</p> </div> </section> </section> <section class="ltx_subsection" id="S5.SS4"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">5.4. </span>Case 3: LBM D2Q37</h3> <div class="ltx_para" id="S5.SS4.p1"> <p class="ltx_p" id="S5.SS4.p1.1"><span class="ltx_text ltx_font_italic" id="S5.SS4.p1.1.1">LBM D2Q37</span> is a Computational Fluid Dynamics (CFD) code that simulates 2D fluids using the “Lattice Boltzmann Method” <span class="ltx_text ltx_font_italic" id="S5.SS4.p1.1.2">(LBM)</span> with 37 velocity components. The computation is performed in double precision, featuring two key kernels: the memory-bound <span class="ltx_text ltx_font_typewriter" id="S5.SS4.p1.1.3">propagate</span> kernel and the compute-bound <span class="ltx_text ltx_font_typewriter" id="S5.SS4.p1.1.4">collide</span> kernel. We applied our performance agent to optimize LBM in single and multi-process environments using the “Test” dataset.</p> </div> <section class="ltx_subsubsection" id="S5.SS4.SSS1"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection">5.4.1. </span>Single Process</h4> <div class="ltx_para" id="S5.SS4.SSS1.p1"> <p class="ltx_p" id="S5.SS4.SSS1.p1.1">The hotspot was identified in the <span class="ltx_text ltx_font_typewriter" id="S5.SS4.SSS1.p1.1.1">collide</span> kernel based on profiling data obtained using HPCToolkit. Codee suggested only one optimization opportunity for the hotspot, indicating limited potential for speedup. It suggested optimizations such as a “change of memory access pattern” from column-major to row-major order, and either “loop fission”, “loop tiling”, or “loop fusion” to avoid non-consecutive/indirect array access. 
The original code was executed in 98.4 seconds, while the code with loop tiling ran slightly longer, taking 106 seconds, achieving a speedup below 1.</p> </div> <div class="ltx_para" id="S5.SS4.SSS1.p2"> <p class="ltx_p" id="S5.SS4.SSS1.p2.1">Applying the performance optimization agent with multiple iterations also yielded no improvements in performance, with speedups of approximately 1 and no discrepancies in output correctness.</p> </div> <section class="ltx_paragraph" id="S5.SS4.SSS1.Px1"> <h5 class="ltx_title ltx_title_paragraph">Version 1</h5> <div class="ltx_para" id="S5.SS4.SSS1.Px1.p1"> <p class="ltx_p" id="S5.SS4.SSS1.Px1.p1.1">The first set of optimizations applied by the agent used OpenMP <span class="ltx_text ltx_font_typewriter" id="S5.SS4.SSS1.Px1.p1.1.1">#pragma omp simd</span> directives to enable vectorization in the innermost loops over the hotspot. These directives were added to loops that computed values such as <span class="ltx_text ltx_font_typewriter" id="S5.SS4.SSS1.Px1.p1.1.2">rho</span>, <span class="ltx_text ltx_font_typewriter" id="S5.SS4.SSS1.Px1.p1.1.3">u</span>, <span class="ltx_text ltx_font_typewriter" id="S5.SS4.SSS1.Px1.p1.1.4">v</span>, <span class="ltx_text ltx_font_typewriter" id="S5.SS4.SSS1.Px1.p1.1.5">Hermite_projections</span>, <span class="ltx_text ltx_font_typewriter" id="S5.SS4.SSS1.Px1.p1.1.6">Collisional_updates</span>, and the final projections written to <span class="ltx_text ltx_font_typewriter" id="S5.SS4.SSS1.Px1.p1.1.7">nxt</span>. The goal was to leverage data parallelism in these loops. However, the total runtime only slightly decreased to 98.3 seconds.</p> </div> </section> <section class="ltx_paragraph" id="S5.SS4.SSS1.Px2"> <h5 class="ltx_title ltx_title_paragraph">Version 2</h5> <div class="ltx_para" id="S5.SS4.SSS1.Px2.p1"> <p class="ltx_p" id="S5.SS4.SSS1.Px2.p1.1">The agent requested new profiling metrics to inspect L1 cache inefficiencies, suggesting memory access optimizations could yield potential speedups. 
Based on the new metrics, the agent refined the placement of <span class="ltx_text ltx_font_typewriter" id="S5.SS4.SSS1.Px2.p1.1.1">#pragma omp simd</span>, restricting it to specific computational loops. The runtime marginally increased to 98.6 seconds, with profiling metrics showing an increase in L1 data cache misses and stalled cycles, indicating that the earlier assumption, that the placement of SIMD instructions was causing the inefficient memory access patterns, did not hold.</p> </div> </section> <section class="ltx_paragraph" id="S5.SS4.SSS1.Px3"> <h5 class="ltx_title ltx_title_paragraph">Version 3</h5> <div class="ltx_para" id="S5.SS4.SSS1.Px3.p1"> <p class="ltx_p" id="S5.SS4.SSS1.Px3.p1.1">Finally, to address the memory bottlenecks, the agent applied more aggressive optimizations, including compiler-specific directives such as <span class="ltx_text ltx_font_typewriter" id="S5.SS4.SSS1.Px3.p1.1.1">#pragma GCC ivdep</span> to ignore assumed dependencies and loop unrolling to improve throughput. Additionally, <span class="ltx_text ltx_font_typewriter" id="S5.SS4.SSS1.Px3.p1.1.2">__builtin_prefetch</span> was used to prefetch data into the cache, aiming to reduce L1 cache miss latency. While these advanced optimizations were better aligned with profiling insights, they only achieved a runtime of 98.5 seconds, again yielding a speedup close to 1.</p> </div> </section> </section> <section class="ltx_subsubsection" id="S5.SS4.SSS2"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection">5.4.2. </span>Multiple Processes</h4> <div class="ltx_para" id="S5.SS4.SSS2.p1"> <p class="ltx_p" id="S5.SS4.SSS2.p1.1">The benchmark was profiled using eight MPI ranks. Per-rank performance metrics were obtained by the agent. Observing that all processes spent a long time in the same hotspot loop, the agent increased parallelism by using four threads per rank instead of the two in the original code. 
However, the total runtime rose from 9.7 seconds to 10.8 seconds. The hotspot loop still consumed the majority of the total compute time, though it averaged 85.7% of total runtime instead of 91.7%, indicating that some computational overhead outside of the hotspot had grown. The profiling data for each rank showed a spike in front-end stalls, often over 80%, which implied the CPU pipeline was waiting for instruction fetch or decode. This can occur when additional threads compete for the same instruction stream. Another possible factor is synchronization overhead and increased thread scheduling complexity.</p> </div> </section> </section> <section class="ltx_subsection" id="S5.SS5"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">5.5. </span>Case 4: Minisweep</h3> <div class="ltx_para" id="S5.SS5.p1"> <p class="ltx_p" id="S5.SS5.p1.1">Minisweep is a radiation transport benchmark that reproduces the computational pattern of the sweep kernel of the Denovo Sn radiation transport code <cite class="ltx_cite ltx_citemacro_citep">(<a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#bib.bib62" title="">Evans01082010, </a>)</cite>. The sweep kernel, responsible for most of Denovo’s computational cost, models radiation transport for nuclear reactors. We applied the performance agent to optimize Minisweep in a single-process environment using the “Test” dataset.</p> </div> <div class="ltx_para" id="S5.SS5.p2"> <p class="ltx_p" id="S5.SS5.p2.1">The hotspot was identified in the <span class="ltx_text ltx_font_typewriter" id="S5.SS5.p2.1.1">sweeper_kba_c_kernels.h</span> file based on profiling data obtained using HPCToolkit. Codee failed to suggest meaningful optimizations for the hotspot itself. 
Applying the performance optimization agent with multiple iterations also yielded no improvements, resulting in a runtime similar to the original code’s 3.79 seconds.</p> </div> <section class="ltx_paragraph" id="S5.SS5.SSS0.Px1"> <h5 class="ltx_title ltx_title_paragraph">Version 1</h5> <div class="ltx_para" id="S5.SS5.SSS0.Px1.p1"> <p class="ltx_p" id="S5.SS5.SSS0.Px1.p1.1">The first set of optimizations applied <span class="ltx_text ltx_font_typewriter" id="S5.SS5.SSS0.Px1.p1.1.1">__attribute__((always_inline))</span> to enable full inlining of the <span class="ltx_text ltx_font_typewriter" id="S5.SS5.SSS0.Px1.p1.1.2">Sweeper_sweep_cell</span> function and inserted OpenMP <span class="ltx_text ltx_font_typewriter" id="S5.SS5.SSS0.Px1.p1.1.3">#pragma omp simd</span> directives to enable vectorization in the innermost loops over the hotspot. Additionally, <span class="ltx_text ltx_font_typewriter" id="S5.SS5.SSS0.Px1.p1.1.4">#pragma unroll</span> and <span class="ltx_text ltx_font_typewriter" id="S5.SS5.SSS0.Px1.p1.1.5">#pragma ivdep</span> were used to improve throughput and allow for more aggressive compiler optimizations. These changes aimed to leverage data parallelism. Despite this, the runtime only slightly decreased to 3.77 seconds.</p> </div> </section> <section class="ltx_paragraph" id="S5.SS5.SSS0.Px2"> <h5 class="ltx_title ltx_title_paragraph">Version 2</h5> <div class="ltx_para" id="S5.SS5.SSS0.Px2.p1"> <p class="ltx_p" id="S5.SS5.SSS0.Px2.p1.1">In the second iteration, new profiling metrics, including measurements of frontend and backend stalled cycles, were collected. The profiling data showed that the hotspot loop accounted for 55.5% of the computation time, with 43.8% stalled frontend cycles and 39.1% stalled backend cycles. 
Based on the new metrics, the optimized code refined the placement of <span class="ltx_text ltx_font_typewriter" id="S5.SS5.SSS0.Px2.p1.1.1">#pragma GCC ivdep</span> to ignore assumed dependencies, <span class="ltx_text ltx_font_typewriter" id="S5.SS5.SSS0.Px2.p1.1.2">#pragma unroll</span> to improve throughput, and <span class="ltx_text ltx_font_typewriter" id="S5.SS5.SSS0.Px2.p1.1.3">#pragma omp simd</span> to enhance vectorization. These changes aimed to reduce frontend stalls by improving instruction flow and backend stalls by enhancing parallel execution efficiency. However, the new version resulted in a slightly increased runtime of 3.79 seconds.</p> </div> </section> <section class="ltx_paragraph" id="S5.SS5.SSS0.Px3"> <h5 class="ltx_title ltx_title_paragraph">Version 3</h5> <div class="ltx_para" id="S5.SS5.SSS0.Px3.p1"> <p class="ltx_p" id="S5.SS5.SSS0.Px3.p1.1">In the third iteration, the agent applied the same optimizations as the previous version but refined the application of the optimization directives. The compiler-specific directives were inserted around the hotspot. These changes led to a slightly increased runtime of 3.88 seconds. Profiling metrics showed a significant decrease in backend stalls, to 30.9%, indicating that while some improvements were made, the root causes of the frontend instruction dependencies persisted.</p> </div> </section> </section> </section> <section class="ltx_section" id="S6"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">6. 
</span>Discussion</h2> <figure class="ltx_table" id="S6.T5"> <table class="ltx_tabular ltx_centering ltx_align_middle" id="S6.T5.2"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S6.T5.2.1.1"> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S6.T5.2.1.1.1" style="padding:0.8pt 1.8pt;"><span class="ltx_text ltx_font_bold" id="S6.T5.2.1.1.1.1" style="font-size:80%;">Tool</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.1.1.2" style="padding:0.8pt 1.8pt;"><span class="ltx_text ltx_font_bold" id="S6.T5.2.1.1.2.1" style="font-size:80%;">Speedup</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.1.1.3" style="padding:0.8pt 1.8pt;"><span class="ltx_text ltx_font_bold" id="S6.T5.2.1.1.3.1" style="font-size:80%;">Correctness</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.1.1.4" style="padding:0.8pt 1.8pt;"><span class="ltx_text ltx_font_bold" id="S6.T5.2.1.1.4.1" style="font-size:80%;">Commonsense</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.1.1.5" style="padding:0.8pt 1.8pt;"><span class="ltx_text ltx_font_bold" id="S6.T5.2.1.1.5.1" style="font-size:80%;">Latency</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.1.1.6" style="padding:0.8pt 1.8pt;"><span class="ltx_text ltx_font_bold" id="S6.T5.2.1.1.6.1" style="font-size:80%;">Applicability</span></td> </tr> <tr class="ltx_tr" id="S6.T5.2.2.2"> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S6.T5.2.2.2.1" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.2.2.1.1" style="font-size:80%;">Codee</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" 
id="S6.T5.2.2.2.2" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.2.2.2.1" style="font-size:80%;color:#008000;">★★★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.2.2.3" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.2.2.3.1" style="font-size:80%;color:#008000;">★★★★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.2.2.4" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.2.2.4.1" style="font-size:80%;color:#008000;">★★★★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.2.2.5" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.2.2.5.1" style="font-size:80%;color:#008000;">★★★★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.2.2.6" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.2.2.6.1" style="font-size:80%;color:#008000;">★</span></td> </tr> <tr class="ltx_tr" id="S6.T5.2.3.3"> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S6.T5.2.3.3.1" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.3.3.1.1" style="font-size:80%;">OpenAI o1</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.3.3.2" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.3.3.2.1" style="font-size:80%;color:#008000;">★★★★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.3.3.3" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.3.3.3.1" style="font-size:80%;color:#008000;">★★★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.3.3.4" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.3.3.4.1" 
style="font-size:80%;color:#008000;">★★★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.3.3.5" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.3.3.5.1" style="font-size:80%;color:#008000;">★★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.3.3.6" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.3.3.6.1" style="font-size:80%;color:#008000;">★★</span></td> </tr> <tr class="ltx_tr" id="S6.T5.2.4.4"> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S6.T5.2.4.4.1" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.4.4.1.1" style="font-size:80%;">Claude-3.5</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.4.4.2" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.4.4.2.1" style="font-size:80%;color:#008000;">★★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.4.4.3" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.4.4.3.1" style="font-size:80%;color:#008000;">★★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.4.4.4" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.4.4.4.1" style="font-size:80%;color:#008000;">★★★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.4.4.5" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.4.4.5.1" style="font-size:80%;color:#008000;">★★★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T5.2.4.4.6" style="padding:0.8pt 1.8pt;"><span class="ltx_text ltx_font_bold" id="S6.T5.2.4.4.6.1" style="font-size:80%;">NA</span></td> </tr> <tr class="ltx_tr" id="S6.T5.2.5.5"> <td 
class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_l ltx_border_r ltx_border_t" id="S6.T5.2.5.5.1" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.5.5.1.1" style="font-size:80%;">Llama-3.2</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S6.T5.2.5.5.2" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.5.5.2.1" style="font-size:80%;color:#008000;">★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S6.T5.2.5.5.3" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.5.5.3.1" style="font-size:80%;color:#008000;">★★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S6.T5.2.5.5.4" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.5.5.4.1" style="font-size:80%;color:#008000;">★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S6.T5.2.5.5.5" style="padding:0.8pt 1.8pt;"><span class="ltx_text" id="S6.T5.2.5.5.5.1" style="font-size:80%;color:#008000;">★</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S6.T5.2.5.5.6" style="padding:0.8pt 1.8pt;"><span class="ltx_text ltx_font_bold" id="S6.T5.2.5.5.6.1" style="font-size:80%;">NA</span></td> </tr> </tbody> </table> <figcaption class="ltx_caption ltx_centering" style="font-size:80%;"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S6.T5.5.1.1" style="font-size:113%;">Table 5</span>. 
</span><span class="ltx_text" id="S6.T5.6.2" style="font-size:113%;">Overall tool performance ranked by the total star count</span></figcaption> </figure> <div class="ltx_para" id="S6.p1"> <p class="ltx_p" id="S6.p1.1">We assess the overall performance of the tools employed in the studies by defining and evaluating five key metrics:</p> </div> <div class="ltx_para" id="S6.p2"> <ul class="ltx_itemize" id="S6.I1"> <li class="ltx_item" id="S6.I1.i1" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S6.I1.i1.p1"> <p class="ltx_p" id="S6.I1.i1.p1.1"><span class="ltx_text ltx_font_bold" id="S6.I1.i1.p1.1.1">Speedup:</span> Measures the average speed improvement achieved by the tools across various experiments. Higher rankings indicate greater speedups.</p> </div> </li> <li class="ltx_item" id="S6.I1.i2" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S6.I1.i2.p1"> <p class="ltx_p" id="S6.I1.i2.p1.1"><span class="ltx_text ltx_font_bold" id="S6.I1.i2.p1.1.1">Correctness:</span> Evaluates the number of benchmarks for which the tools produced accurate results, with higher rankings reflecting greater reliability.</p> </div> </li> <li class="ltx_item" id="S6.I1.i3" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S6.I1.i3.p1"> <p class="ltx_p" id="S6.I1.i3.p1.1"><span class="ltx_text ltx_font_bold" id="S6.I1.i3.p1.1.1">Commonsense:</span> Assesses the tools’ effectiveness across multiple domains. A higher ranking indicates that a tool leads in more domains in terms of average speedup.</p> </div> </li> <li class="ltx_item" id="S6.I1.i4" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S6.I1.i4.p1"> <p class="ltx_p" id="S6.I1.i4.p1.1"><span class="ltx_text ltx_font_bold" id="S6.I1.i4.p1.1.1">Latency:</span> Measures the time required by each tool to apply optimizations. 
Higher rankings indicate a lower latency.</p> </div> </li> <li class="ltx_item" id="S6.I1.i5" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S6.I1.i5.p1"> <p class="ltx_p" id="S6.I1.i5.p1.1"><span class="ltx_text ltx_font_bold" id="S6.I1.i5.p1.1.1">Applicability:</span> Examines the tools’ ability to optimize the larger benchmarks featured in the case studies.</p> </div> </li> </ul> </div> <div class="ltx_para" id="S6.p3"> <p class="ltx_p" id="S6.p3.1">Each tool is rated on a scale of 1-4 stars (4 being the best and 1 being the worst) for each of these metrics, as shown in Table <a class="ltx_ref" href="https://arxiv.org/html/2503.13772v1#S6.T5" title="Table 5 ‣ 6. Discussion ‣ Do Large Language Models Understand Performance Optimization?"><span class="ltx_text ltx_ref_tag">5</span></a>. From our evaluation, Codee ranks as the best overall in optimizing HPC codes, with OpenAI o1 being a close second. Claude-3.5 ranks third, while Llama-3.2 ranks last. It is noteworthy that both Llama-3.2 and Claude-3.5 generate a large amount of incorrect code across experiments, although Claude-3.5 generally achieves a higher speedup. Codee’s strength lies in its capability to invoke reliable and accurate compiler-based analysis, exhibit low response latency, and avoid context length limitations. Nonetheless, its primary limitation is that its optimization patterns are predefined and cannot be customized to user-specific code, and it is incapable of offering algorithm-related enhancements. Although LLMs generally fall short in generating correct code, we observed that they have great potential in suggesting optimizations adapted to user code. Notably, OpenAI o1, as a reasoning model, performs quite well in optimizing code, especially in suggesting algorithm-related optimizations, yielding speedups even better than Codee. 
It is also worth noting that neither Codee nor LLMs have proven to be mature tools for automatically optimizing large-scale codebases.</p> </div> </section> <section class="ltx_section" id="S7"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">7. </span>Conclusions</h2> <div class="ltx_para" id="S7.p1"> <p class="ltx_p" id="S7.p1.1">In this study, we designed and conducted a series of experiments using HPC-focused benchmarks to compare the impact of LLMs on performance optimization against state-of-the-art compiler-based tools. Our evaluation reveals that state-of-the-art LLMs lag behind in several critical aspects, especially correctness. However, we recognize the potential for integrating LLMs with traditional tools, as traditional tools typically offer only a subset of possible transformations. Our performance-optimization agent system also demonstrates the potential of integrating LLMs with profilers to circumvent their lack of access to the execution environment. In the future, we plan to extend our performance evaluation to hardware beyond CPUs. Additionally, we aim to enhance our agent system through structured inputs, more concise prompts, and static code analysis.</p> </div> </section> <section class="ltx_bibliography" id="bib"> <h2 class="ltx_title ltx_title_bibliography">References</h2> <ul class="ltx_biblist"> <li class="ltx_bibitem" id="bib.bib1"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[1]</span> <span class="ltx_bibblock"> GPT-4 documentation. </span> <span class="ltx_bibblock"><a class="ltx_ref ltx_url ltx_font_typewriter" href="https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4" title="">https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4</a>. </span> </li> <li class="ltx_bibitem" id="bib.bib2"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[2]</span> <span class="ltx_bibblock"> Llama 3.2 documentation. 
</span> <span class="ltx_bibblock"><a class="ltx_ref ltx_url ltx_font_typewriter" href="https://github.com/meta-llama/llama-models?tab=readme-ov-file" title="">https://github.com/meta-llama/llama-models?tab=readme-ov-file</a>. </span> </li> <li class="ltx_bibitem" id="bib.bib3"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[3]</span> <span class="ltx_bibblock"> Spechpc™ 2021 benchmark suites, 2024. </span> <span class="ltx_bibblock">Accessed: 2024-10-25. </span> </li> <li class="ltx_bibitem" id="bib.bib4"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[4]</span> <span class="ltx_bibblock"> Arkadeep Acharya, Brijraj Singh, and Naoyuki Onoe. </span> <span class="ltx_bibblock">Llm based generation of item-description for recommendation system. </span> <span class="ltx_bibblock">In <span class="ltx_text ltx_font_italic" id="bib.bib4.1.1">Proceedings of the 17th ACM Conference on Recommender Systems</span>, RecSys ’23, page 1204–1207, New York, NY, USA, 2023. Association for Computing Machinery. </span> </li> <li class="ltx_bibitem" id="bib.bib5"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[5]</span> <span class="ltx_bibblock"> L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. R. Tallent. </span> <span class="ltx_bibblock">Hpctoolkit: tools for performance analysis of optimized parallel programs http://hpctoolkit.org. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib5.1.1">Concurr. Comput.: Pract. Exper.</span>, 22(6):685–701, apr 2010. </span> </li> <li class="ltx_bibitem" id="bib.bib6"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[6]</span> <span class="ltx_bibblock"> Toufique Ahmed and Prem Devanbu. </span> <span class="ltx_bibblock">Few-shot training llms for project-specific code-summarization. 
</span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib6.1.1">Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering</span>, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib7"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[7]</span> <span class="ltx_bibblock"> Cognition AI. </span> <span class="ltx_bibblock">Devin ai: World’s first ai software engineer, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib8"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[8]</span> <span class="ltx_bibblock"> Amazon. </span> <span class="ltx_bibblock">What is codewhisperer?, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib9"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[9]</span> <span class="ltx_bibblock"> Anthropic. </span> <span class="ltx_bibblock">Introducing the next generation of claude, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib10"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[10]</span> <span class="ltx_bibblock"> LM Arena. </span> <span class="ltx_bibblock">Lm arena: Benchmarking and evaluating language models. </span> <span class="ltx_bibblock"><a class="ltx_ref ltx_url ltx_font_typewriter" href="https://lmarena.ai/" title="">https://lmarena.ai/</a>, 2025. </span> <span class="ltx_bibblock">Accessed: 2025-01-07. </span> </li> <li class="ltx_bibitem" id="bib.bib11"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[11]</span> <span class="ltx_bibblock"> Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie J. Cai, Michael Terry, Quoc V. Le, and Charles Sutton. </span> <span class="ltx_bibblock">Program synthesis with large language models. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib11.1.1">ArXiv</span>, abs/2108.07732, 2021. 
</span> </li> <li class="ltx_bibitem" id="bib.bib12"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[12]</span> <span class="ltx_bibblock"> Satanjeev Banerjee and Alon Lavie. </span> <span class="ltx_bibblock">METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. </span> <span class="ltx_bibblock">In Jade Goldstein, Alon Lavie, Chin-Yew Lin, and Clare Voss, editors, <span class="ltx_text ltx_font_italic" id="bib.bib12.1.1">Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization</span>, pages 65–72, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics. </span> </li> <li class="ltx_bibitem" id="bib.bib13"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[13]</span> <span class="ltx_bibblock"> Abhinav Bhatele, Stephanie Brink, and Todd Gamblin. </span> <span class="ltx_bibblock">Hatchet: pruning the overgrowth in parallel profiles. </span> <span class="ltx_bibblock">In <span class="ltx_text ltx_font_italic" id="bib.bib13.1.1">Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis</span>, SC ’19, New York, NY, USA, 2019. Association for Computing Machinery. </span> </li> <li class="ltx_bibitem" id="bib.bib14"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[14]</span> <span class="ltx_bibblock"> Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. </span> <span class="ltx_bibblock">Language models are few-shot learners. 
</span> <span class="ltx_bibblock">In <span class="ltx_text ltx_font_italic" id="bib.bib14.1.1">Proceedings of the 34th International Conference on Neural Information Processing Systems</span>, NIPS ’20, Red Hook, NY, USA, 2020. Curran Associates Inc. </span> </li> <li class="ltx_bibitem" id="bib.bib15"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[15]</span> <span class="ltx_bibblock"> Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W Sheaffer, Sang-Ha Lee, and Kevin Skadron. </span> <span class="ltx_bibblock">Rodinia: A benchmark suite for heterogeneous computing. </span> <span class="ltx_bibblock">In <span class="ltx_text ltx_font_italic" id="bib.bib15.1.1">2009 IEEE international symposium on workload characterization (IISWC)</span>, pages 44–54. Ieee, 2009. </span> </li> <li class="ltx_bibitem" id="bib.bib16"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[16]</span> <span class="ltx_bibblock"> Le Chen, Arijit Bhattacharjee, Nesreen K. Ahmed, Niranjan Hasabnis, Gal Oren, Vy A. Vo, and Ali Jannesari. </span> <span class="ltx_bibblock">Ompgpt: A generative pre-trained transformer model for openmp. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib16.1.1">ArXiv</span>, abs/2401.16445, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib17"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[17]</span> <span class="ltx_bibblock"> Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. </span> <span class="ltx_bibblock">Evaluating large language models trained on code. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib17.1.1">arXiv preprint arXiv:2107.03374</span>, 2021. 
</span> </li> <li class="ltx_bibitem" id="bib.bib18"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[18]</span> <span class="ltx_bibblock"> Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde, Jared Kaplan, Harrison Edwards, Yura Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, David W. Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William H. Guss, Alex Nichol, Igor Babuschkin, Suchir Balaji, Shantanu Jain, Andrew Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew M. Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. </span> <span class="ltx_bibblock">Evaluating large language models trained on code. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib18.1.1">ArXiv</span>, abs/2107.03374, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib19"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[19]</span> <span class="ltx_bibblock"> Xinyun Chen, Maxwell Lin, Nathanael Schärli, and Denny Zhou. </span> <span class="ltx_bibblock">Teaching large language models to self-debug, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib20"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[20]</span> <span class="ltx_bibblock"> Codee. </span> <span class="ltx_bibblock">Codee: Automated code performance tuning for c/c++, 2024. </span> <span class="ltx_bibblock">Accessed: 2024-08-30. </span> </li> <li class="ltx_bibitem" id="bib.bib21"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[21]</span> <span class="ltx_bibblock"> Codee. </span> <span class="ltx_bibblock">Open catalog, 2024. 
</span> <span class="ltx_bibblock">Accessed: 2024-08-27. </span> </li> <li class="ltx_bibitem" id="bib.bib22"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[22]</span> <span class="ltx_bibblock"> Codee. </span> <span class="ltx_bibblock">Performance demos repository, 2024. </span> <span class="ltx_bibblock">Accessed: 2024-08-23. </span> </li> <li class="ltx_bibitem" id="bib.bib23"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[23]</span> <span class="ltx_bibblock"> Codee Community. </span> <span class="ltx_bibblock">Open catalog, 2025. </span> <span class="ltx_bibblock">Accessed: 2025-01-08. </span> </li> <li class="ltx_bibitem" id="bib.bib24"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[24]</span> <span class="ltx_bibblock"> LangChain Contributors. </span> <span class="ltx_bibblock">Langchain, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib25"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[25]</span> <span class="ltx_bibblock"> Xianzhong Ding, Le Chen, Murali Emani, Chunhua Liao, Pei-Hung Lin, Tristan Vanderbruggen, Zhen Xie, Alberto Cerpa, and Wan Du. </span> <span class="ltx_bibblock">Hpc-gpt: Integrating large language model for high-performance computing. </span> <span class="ltx_bibblock">In <span class="ltx_text ltx_font_italic" id="bib.bib25.1.1">Proceedings of the SC ’23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis</span>, SC-W 2023. ACM, November 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib26"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[26]</span> <span class="ltx_bibblock"> Lamia Djoudi, Denis Barthou, Patrick Carribault, Christophe Lemuet, Jean-Thomas Acquaviva, and William Jalby. </span> <span class="ltx_bibblock">Maqao : Modular assembler quality analyzer and optimizer for itanium 2, 2005. 
</span> </li> <li class="ltx_bibitem" id="bib.bib27"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[27]</span> <span class="ltx_bibblock"> Lamia Djoudi, Denis Barthou, Patrick Carribault, Christophe Lemuet, Jean-Thomas Acquaviva, William Jalby, et al. </span> <span class="ltx_bibblock">Maqao: Modular assembler quality analyzer and optimizer for itanium 2. </span> <span class="ltx_bibblock">In <span class="ltx_text ltx_font_italic" id="bib.bib27.1.1">The 4th Workshop on EPIC architectures and compiler technology, San Jose</span>, volume 200, 2005. </span> </li> <li class="ltx_bibitem" id="bib.bib28"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[28]</span> <span class="ltx_bibblock"> Lamia Djoudi, Vasil Khachidze, and William Jalby. </span> <span class="ltx_bibblock">Kbs-maqao: A knowledge based system for maqao tool. </span> <span class="ltx_bibblock">In <span class="ltx_text ltx_font_italic" id="bib.bib28.1.1">2009 11th IEEE International Conference on High Performance Computing and Communications</span>, pages 571–578, 2009. </span> </li> <li class="ltx_bibitem" id="bib.bib29"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[29]</span> <span class="ltx_bibblock"> Xiangxin Fang and Lev Mukhanov. </span> <span class="ltx_bibblock">Towards llm-based optimization compilers. can llms learn how to apply a single peephole optimization? reasoning is all llms need! </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib29.1.1">arXiv preprint arXiv:2412.12163</span>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib30"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[30]</span> <span class="ltx_bibblock"> Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, and Connor Leahy. </span> <span class="ltx_bibblock">The pile: An 800gb dataset of diverse text for language modeling, 2020. 
</span> </li> <li class="ltx_bibitem" id="bib.bib31"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[31]</span> <span class="ltx_bibblock"> Significant Gravitas. </span> <span class="ltx_bibblock">Autogpt, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib32"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[32]</span> <span class="ltx_bibblock"> Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y. K. Li, Fuli Luo, Yingfei Xiong, and Wenfeng Liang. </span> <span class="ltx_bibblock">Deepseek-coder: When the large language model meets programming – the rise of code intelligence, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib33"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[33]</span> <span class="ltx_bibblock"> Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Xiaodong Song, and Jacob Steinhardt. </span> <span class="ltx_bibblock">Measuring coding challenge competence with apps. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib33.1.1">ArXiv</span>, abs/2105.09938, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib34"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[34]</span> <span class="ltx_bibblock"> Anysphere Inc. </span> <span class="ltx_bibblock">Cursor: The ai code editor, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib35"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[35]</span> <span class="ltx_bibblock"> Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. </span> <span class="ltx_bibblock">A survey on large language models for code generation, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib36"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[36]</span> <span class="ltx_bibblock"> Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W Mahoney, Kurt Keutzer, and Amir Gholami. 
</span> <span class="ltx_bibblock">An llm compiler for parallel function calling. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib36.1.1">arXiv preprint arXiv:2312.04511</span>, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib37"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[37]</span> <span class="ltx_bibblock"> Denis Kocetkov, Raymond Li, Loubna Ben Allal, Jia Li, Chenghao Mou, Carlos Muñoz Ferrandis, Yacine Jernite, Margaret Mitchell, Sean Hughes, Thomas Wolf, Dzmitry Bahdanau, Leandro von Werra, and Harm de Vries. </span> <span class="ltx_bibblock">The stack: 3 tb of permissively licensed source code, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib38"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[38]</span> <span class="ltx_bibblock"> Chris Lattner and Vikram Adve. </span> <span class="ltx_bibblock">Llvm: A compilation framework for lifelong program analysis & transformation. </span> <span class="ltx_bibblock">In <span class="ltx_text ltx_font_italic" id="bib.bib38.1.1">International symposium on code generation and optimization, 2004. CGO 2004.</span>, pages 75–86. IEEE, 2004. </span> </li> <li class="ltx_bibitem" id="bib.bib39"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[39]</span> <span class="ltx_bibblock"> Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, and Steven Chu Hong Hoi. </span> <span class="ltx_bibblock">Coderl: Mastering code generation through pretrained models and deep reinforcement learning. </span> <span class="ltx_bibblock">In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, <span class="ltx_text ltx_font_italic" id="bib.bib39.1.1">Advances in Neural Information Processing Systems</span>, volume 35, pages 21314–21328. Curran Associates, Inc., 2022. 
</span> </li> <li class="ltx_bibitem" id="bib.bib40"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[40]</span> <span class="ltx_bibblock"> Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, and Harm de Vries. </span> <span class="ltx_bibblock">Starcoder: may the source be with you!, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib41"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[41]</span> <span class="ltx_bibblock"> Chin-Yew Lin. </span> <span class="ltx_bibblock">ROUGE: A package for automatic evaluation of summaries. </span> <span class="ltx_bibblock">In <span class="ltx_text ltx_font_italic" id="bib.bib41.1.1">Text Summarization Branches Out</span>, pages 74–81, Barcelona, Spain, July 2004. Association for Computational Linguistics. </span> </li> <li class="ltx_bibitem" id="bib.bib42"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[42]</span> <span class="ltx_bibblock"> NERSC. </span> <span class="ltx_bibblock">Codee documentation. 
</span> <span class="ltx_bibblock"><a class="ltx_ref ltx_url ltx_font_typewriter" href="https://docs.nersc.gov/tools/performance/codee/" title="">https://docs.nersc.gov/tools/performance/codee/</a>, 2025. </span> <span class="ltx_bibblock">Accessed: 2025-01-04. </span> </li> <li class="ltx_bibitem" id="bib.bib43"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[43]</span> <span class="ltx_bibblock"> Daniel Nichols, Joshua H Davis, Zhaojun Xie, Arjun Rajaram, and Abhinav Bhatele. </span> <span class="ltx_bibblock">Can large language models write parallel code? </span> <span class="ltx_bibblock">In <span class="ltx_text ltx_font_italic" id="bib.bib43.1.1">Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing</span>, pages 281–294, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib44"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[44]</span> <span class="ltx_bibblock"> Daniel Nichols, Joshua Hoke Davis, Zhaojun Xie, Arjun Rajaram, and Abhinav Bhatele. </span> <span class="ltx_bibblock">Can large language models write parallel code? </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib44.1.1">ArXiv</span>, abs/2401.12554, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib45"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[45]</span> <span class="ltx_bibblock"> Daniel Nichols, Pranav Polasam, Harshitha Menon, Aniruddha Marathe, Todd Gamblin, and Abhinav Bhatele. </span> <span class="ltx_bibblock">Performance-aligned llms for generating fast code. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib45.1.1">ArXiv</span>, abs/2404.18864, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib46"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[46]</span> <span class="ltx_bibblock"> Daniel Nichols, Pranav Polasam, Harshitha Menon, Aniruddha Marathe, Todd Gamblin, and Abhinav Bhatele. 
</span> <span class="ltx_bibblock">Performance-aligned llms for generating fast code. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib46.1.1">arXiv preprint arXiv:2404.18864</span>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib47"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[47]</span> <span class="ltx_bibblock"> Changan Niu, Ting Zhang, Chuanyi Li, Bin Luo, and Vincent Ng. </span> <span class="ltx_bibblock">On evaluating the efficiency of source code generated by llms, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib48"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[48]</span> <span class="ltx_bibblock"> Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, Tom Henighan, Benjamin Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, John Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Christopher Olah. </span> <span class="ltx_bibblock">In-context learning and induction heads. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib48.1.1">ArXiv</span>, abs/2209.11895, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib49"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[49]</span> <span class="ltx_bibblock"> OpenAI. </span> <span class="ltx_bibblock">Gpt-4 technical report, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib50"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[50]</span> <span class="ltx_bibblock"> Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. 
</span> <span class="ltx_bibblock">Training language models to follow instructions with human feedback, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib51"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[51]</span> <span class="ltx_bibblock"> Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. </span> <span class="ltx_bibblock">Bleu: a method for automatic evaluation of machine translation. </span> <span class="ltx_bibblock">In <span class="ltx_text ltx_font_italic" id="bib.bib51.1.1">Annual Meeting of the Association for Computational Linguistics</span>, 2002. </span> </li> <li class="ltx_bibitem" id="bib.bib52"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[52]</span> <span class="ltx_bibblock"> Louis-Noel Pouchet and Tomofumi Yuki. </span> <span class="ltx_bibblock">Polybenchc-4.2.1, 2024. </span> <span class="ltx_bibblock">Accessed: 2024-9-23. </span> </li> <li class="ltx_bibitem" id="bib.bib53"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[53]</span> <span class="ltx_bibblock"> Ruizhong Qiu, Weiliang Will Zeng, Hanghang Tong, James Ezick, and Christopher Lott. </span> <span class="ltx_bibblock">How efficient is llm-generated code? a rigorous & high-standard benchmark. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib53.1.1">ArXiv</span>, abs/2406.06647, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib54"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[54]</span> <span class="ltx_bibblock"> Brad Richardson, Koichi Sakaguchi, Manuel Arenaz, William I Gustafson Jr, Jacob Shpund, Ulises Costi Blanco, Alvaro Goldar Dieste, et al. </span> <span class="ltx_bibblock">Optimizing the weather research and forecasting model with openmp offload and codee. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib54.1.1">arXiv preprint arXiv:2409.07232</span>, 2024. 
</span> </li> <li class="ltx_bibitem" id="bib.bib55"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[55]</span> <span class="ltx_bibblock"> Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, and Gabriel Synnaeve. </span> <span class="ltx_bibblock">Code llama: Open foundation models for code, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib56"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[56]</span> <span class="ltx_bibblock"> Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, and Gabriel Synnaeve. </span> <span class="ltx_bibblock">Code llama: Open foundation models for code, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib57"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[57]</span> <span class="ltx_bibblock"> Sameer S. Shende and Allen D. Malony. </span> <span class="ltx_bibblock">The tau parallel performance system. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib57.1.1">The International Journal of High Performance Computing Applications</span>, 20:287 – 311, 2006. </span> </li> <li class="ltx_bibitem" id="bib.bib58"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[58]</span> <span class="ltx_bibblock"> Sameer S Shende and Allen D Malony. </span> <span class="ltx_bibblock">The tau parallel performance system. 
</span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib58.1.1">The International Journal of High Performance Computing Applications</span>, 20(2):287–311, 2006. </span> </li> <li class="ltx_bibitem" id="bib.bib59"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[59]</span> <span class="ltx_bibblock"> Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. </span> <span class="ltx_bibblock">Reflexion: Language agents with verbal reinforcement learning. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib59.1.1">Advances in Neural Information Processing Systems</span>, 36, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib60"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[60]</span> <span class="ltx_bibblock"> Parshin Shojaee, Aneesh Jain, Sindhu Tipirneni, and Chandan K. Reddy. </span> <span class="ltx_bibblock">Execution-based code generation using deep reinforcement learning, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib61"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[61]</span> <span class="ltx_bibblock"> Sofia Eleni Spatharioti, David M. Rothschild, Daniel G. Goldstein, and Jake M. Hofman. </span> <span class="ltx_bibblock">Comparing traditional and llm-based search for consumer choice: A randomized experiment. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib61.1.1">ArXiv</span>, abs/2307.03744, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib62"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[62]</span> <span class="ltx_bibblock"> Rachel N. Slaybaugh Thomas M. Evans, Alissa S. Stafford and Kevin T. Clarno. </span> <span class="ltx_bibblock">Denovo: A new three-dimensional parallel discrete ordinates code in scale. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib62.1.1">Nuclear Technology</span>, 171(2):171–200, 2010. 
</span> </li> <li class="ltx_bibitem" id="bib.bib63"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[63]</span> <span class="ltx_bibblock"> Together.ai. </span> <span class="ltx_bibblock">Together.ai - advancing collaborative ai development, 2025. </span> <span class="ltx_bibblock">Accessed: 2025-01-08. </span> </li> <li class="ltx_bibitem" id="bib.bib64"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[64]</span> <span class="ltx_bibblock"> Pedro Valero-Lara, Alexis Huante, Mustafa Al Lail, William F. Godoy, Keita Teranishi, Prasanna Balaprakash, and Jeffrey S. Vetter. </span> <span class="ltx_bibblock">Comparing llama-2 and gpt-3 llms for hpc kernels generation. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib64.1.1">ArXiv</span>, abs/2309.07103, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib65"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[65]</span> <span class="ltx_bibblock"> Balaji Venkatachalam, Dan Gusfield, and Yelena Frid. </span> <span class="ltx_bibblock">Faster algorithms for rna-folding using the four-russians method. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib65.1.1">Algorithms for Molecular Biology</span>, 9:1–12, 2014. </span> </li> <li class="ltx_bibitem" id="bib.bib66"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[66]</span> <span class="ltx_bibblock"> Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. </span> <span class="ltx_bibblock">A survey on large language model based autonomous agents. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib66.1.1">Frontiers of Computer Science</span>, 18(6):186345, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib67"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[67]</span> <span class="ltx_bibblock"> Xingyao Wang, Boxuan Li, Yufan Song, Frank F. 
Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, and Graham Neubig. </span> <span class="ltx_bibblock">OpenHands: An open platform for AI software developers as generalist agents, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib68"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[68]</span> <span class="ltx_bibblock"> Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. </span> <span class="ltx_bibblock">Chain-of-thought prompting elicits reasoning in large language models, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib69"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[69]</span> <span class="ltx_bibblock"> Chunqiu Steven Xia and Lingming Zhang. </span> <span class="ltx_bibblock">Less training, more repairing please: revisiting automated program repair via zero-shot learning. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib69.1.1">Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering</span>, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib70"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[70]</span> <span class="ltx_bibblock"> Ziwei Xu, Sanjay Jain, and Mohan Kankanhalli. </span> <span class="ltx_bibblock">Hallucination is inevitable: An innate limitation of large language models, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib71"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[71]</span> <span class="ltx_bibblock"> Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. </span> <span class="ltx_bibblock">Tree of thoughts: Deliberate problem solving with large language models. 
</span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib71.1.1">Advances in Neural Information Processing Systems</span>, 36, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib72"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[72]</span> <span class="ltx_bibblock"> Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. </span> <span class="ltx_bibblock">ReAct: Synergizing reasoning and acting in language models. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib72.1.1">arXiv preprint arXiv:2210.03629</span>, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib73"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[73]</span> <span class="ltx_bibblock"> Yongjian Yu and S.T. Acton. </span> <span class="ltx_bibblock">Speckle reducing anisotropic diffusion. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib73.1.1">IEEE Transactions on Image Processing</span>, 11(11):1260–1270, 2002. </span> </li> <li class="ltx_bibitem" id="bib.bib74"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[74]</span> <span class="ltx_bibblock"> Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, and Weizhu Chen. </span> <span class="ltx_bibblock">RepoCoder: Repository-level code completion through iterative retrieval and generation, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib75"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[75]</span> <span class="ltx_bibblock"> Kechi Zhang, Jia Li, Ge Li, Xianjie Shi, and Zhi Jin. </span> <span class="ltx_bibblock">CodeAgent: Enhancing code generation with tool-integrated agent systems for real-world repo-level coding challenges. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib75.1.1">arXiv preprint arXiv:2401.07339</span>, 2024. 
</span> </li> <li class="ltx_bibitem" id="bib.bib76"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[76]</span> <span class="ltx_bibblock"> Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, and Jie Tang. </span> <span class="ltx_bibblock">CodeGeeX: A pre-trained model for code generation with multilingual benchmarking on HumanEval-X, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib77"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[77]</span> <span class="ltx_bibblock"> Keren Zhou, Xiaozhu Meng, Ryuichi Sai, Dejan Grubisic, and John Mellor-Crummey. </span> <span class="ltx_bibblock">An automated tool for analysis and tuning of GPU-accelerated code in HPC applications. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib77.1.1">IEEE Transactions on Parallel and Distributed Systems</span>, 33(4):854–865, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib78"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[78]</span> <span class="ltx_bibblock"> Keren Zhou, Xiaozhu Meng, Ryuichi Sai, and John Mellor-Crummey. </span> <span class="ltx_bibblock">GPA: A GPU performance advisor based on instruction sampling. </span> <span class="ltx_bibblock">In <span class="ltx_text ltx_font_italic" id="bib.bib78.1.1">2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)</span>, pages 115–125. IEEE, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib79"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">[79]</span> <span class="ltx_bibblock"> Wenhao Zhu, Hongyi Liu, Qingxiu Dong, Jingjing Xu, Lingpeng Kong, Jiajun Chen, Lei Li, and Shujian Huang. </span> <span class="ltx_bibblock">Multilingual machine translation with large language models: Empirical results and analysis. </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib79.1.1">ArXiv</span>, abs/2304.04675, 2023. 
</span> </li> </ul> </section> </article> </div> <footer class="ltx_page_footer"> <div class="ltx_page_logo">Generated on Mon Mar 17 22:26:10 2025 by <a class="ltx_LaTeXML_logo" href="http://dlmf.nist.gov/LaTeXML/">LaTeXML</a> </div></footer> </div> </body> </html>