Berkeley Function Calling Leaderboard V3 (aka Berkeley Tool Calling Leaderboard V3)

<!DOCTYPE html> <html lang="en"> <head>  <script async src="https://www.googletagmanager.com/gtag/js?id=G-NRZJLJCSH6"></script> <script> window.dataLayer = window.dataLayer || []; function gtag() { dataLayer.push(arguments); } gtag("js", new Date()); gtag("config", "G-NRZJLJCSH6"); </script> <script src="assets/lib/chart.umd.js"></script> <script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-autocolors"></script> <script type="text/javascript"> window.PlotlyConfig = { MathJaxConfig: "local" }; </script> <script type="text/javascript" src="treemap.js"></script> <meta charset="UTF-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <meta name="description" content="Explore The Berkeley Function Calling Leaderboard (also called The Berkeley Tool Calling Leaderboard) to see the LLM's ability to call functions (aka tools) accurately." />  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/semantic-ui/dist/semantic.min.css"> <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.5.0/css/bootstrap.min.css" /> <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro" /> <link rel="stylesheet" href="assets/css/api-explorer.css" /> <link rel="stylesheet" href="assets/css/common-styles.css" /> <link rel="stylesheet" href="assets/css/Highlight-Clean-leaderboard.css" /> <link rel="stylesheet" href="assets/css/leaderboard.css" /> <link rel="stylesheet" href="assets/css/leaderboard_main.css" /> <link rel="stylesheet" href="assets/css/treemap.css" /> <link rel="stylesheet" href="assets/css/contact.css" /> <link rel="stylesheet" href="assets/css/styles.css" /> <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css"> <title> Berkeley Function Calling Leaderboard V3 (aka Berkeley Tool Calling Leaderboard V3) </title> </head> <body>  <div class="navbar" style=" position: absolute; top: 0; right: 20px; padding: 10px; z-index: 100; 
font-size: 18px; "> <a href="index.html">Home</a> <a href="blogs/13_bfcl_v3_multi_turn.html">Blog</a> <a href="#api-explorer">Try it Out!</a> <a href="#leaderboard">Leaderboard</a> </div> <div class="highlight-clean" style="padding-bottom: 10px">  <h1 class="text-center"> <img src="assets/img/Cal.png" alt="UC Berkeley Logo" class="header-image" /> Berkeley Function-Calling Leaderboard </h1> <div> <p></p> </div>   </div>  <div class="container" id="leaderboard" style="background: #e5effc"> <div class="col-md-12"> <h2>BFCL Leaderboard</h2> <p class="text-center"> The Berkeley Function Calling Leaderboard V3 (also called Berkeley Tool Calling Leaderboard V3) evaluates an LLM's ability to call functions (aka tools) accurately. This leaderboard consists of real-world data and will be updated periodically. For more information on the evaluation dataset and methodology, please refer to our blogs: <a href="blogs/8_berkeley_function_calling_leaderboard.html">BFCL-v1</a> introducing AST as an evaluation metric, <a href="blogs/12_bfcl_v2_live.html">BFCL-v2</a> introducing enterprise and OSS-contributed functions, and <a href="blogs/13_bfcl_v3_multi_turn.html">BFCL-v3</a> introducing multi-turn interactions. Check out the <a href="https://github.com/ShishirPatil/gorilla/tree/main/berkeley-function-call-leaderboard">code and data</a>. </p> <div class="mb-3"> <div class="d-flex flex-column flex-md-row justify-content-between align-items-center">  <div> <span> <b><i style="font-size: 1.0em;">Last Updated: 2024-11-17 <a href="https://github.com/ShishirPatil/gorilla/blob/main/berkeley-function-call-leaderboard/CHANGELOG.md">[Change Log]</a> </i></b> </span> </div>  <div class="d-flex ms-md-auto mt-3 mt-md-0 justify-content-end" style="gap: 5px; width: 100%; max-width: 360px;"> <input type="text" id="search-input" class="form-control flex-grow-1" placeholder="Search model names..." 
/> <button id="search-btn" class="btn btn-primary">Search</button> </div> </div> </div> <div style="margin-bottom: 15px;"> </div> <div class="table-container"> <table id="leaderboard-table"> <thead id="leaderboard-head"> </thead> <tbody></tbody> </table> </div> <p></p> <p> FC = native support for function/tool calling. </p> <p> <b>Cost</b> is the estimated cost per 1,000 function calls, in USD. <b>Latency</b> is measured in seconds. For <b>Open-Source Models</b>, the cost and latency are calculated when serving with <a href="https://github.com/vllm-project/vllm">vLLM</a> using 8 V100 GPUs. The formula can be found in the <a href="https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html#cost">blog</a>. </p> <p> <b>AST Summary</b> is the <b>unweighted</b> average of the four test categories under AST Evaluation. <b>Exec Summary</b> is the <b>unweighted</b> average of the four test categories under Exec Evaluation. <b>Overall Accuracy</b> is the <b>unweighted</b> average of all the sub-categories. </p> <p> Click on a column header to sort. If you would like to add your model or contribute test cases, please contact us via <a href="https://discord.gg/grXXvj9Whz">Discord</a>. </p> </div> </div> <div> <p></p> </div> <div class="container chart-container" style="background: white"> <div class="chart-inner-container"> <h2>Wagon Wheel</h2> <p class="text-center"> The following chart compares the models on a few metrics. You can select and deselect which models to compare. More information on each metric can be found in the <a href="blogs/8_berkeley_function_calling_leaderboard.html#benchmarking">blog</a>. 
</p> <div class="ui container"> <div class="ui form"> <div class="dropdown-container"> <label class="dropdown-label">Select Models to Compare</label> <button id="clear-all-btn" class="btn btn-primary" type="button" >Clear All</button>  </div> <div id="dataset-dropdown" class="ui fluid multiple search selection dropdown"> <input id="search-dropdown" type="hidden" name="datasets"> <i class="dropdown icon"></i> <div class="default text">Search models...</div> <div class="menu"></div> </div> </div> </div>  <div id="myChart-container"> <canvas id="myChart"></canvas> </div> </div> </div>   <div id="api-explorer" class="container" style="background: #e5effc"> <div class="col-md-12"> <h2>Function Calling Demo</h2> <p class="text-center"> In this function calling demo, you can enter a prompt and a function description and see the output. There are two outputs, each in its own box: the top one shows the actual code format, and the bottom one shows the OpenAI compatible format. Note that the OpenAI compatible format output is only available if the actual code output has valid syntax and can be parsed. We also provide a few examples you can try to get a sense of the input format and the output. 
</p>  <div id="examples"> <button id="example-btn-1">Example 1</button> <button id="example-btn-2">Example 2</button> <button id="example-btn-3">Example 3</button> </div> <div> <p></p> </div> <div class="container" id="demo-input-container"> <div class="inputs"> <div> Model: <select name="option" id="model-dropdown"> <option value="gorilla-openfunctions-v2"> Gorilla OpenFunctions-v2 </option>  </select> </div> <div> <br /> </div> <div> <label for="temperatureSlider">Temperature:</label> <input type="range" id="temperatureSlider" name="temperature" min="0" max="1" value="0.7" step="0.1" oninput="temperatureValue.value = temperatureSlider.value" /> <output id="temperatureValue">0.7</output> </div> <div> <p></p> </div> <textarea id="input-text" placeholder="Enter your prompt here" rows="3"></textarea> <textarea id="input-function" placeholder="Enter your function description here" rows="10"></textarea> <button class="api-explorer-button" id="submit-btn">Submit</button> </div> <div class="output-section"> <div class="output" id="code-output">Output will be shown here:</div> <div class="output" id="json-output" style="white-space: pre-wrap;">OpenAI compatible format output: </div> <div class="button-container"> <button class="thumbs" id="thumbs-up-btn" onclick="sendFeedbackPositive()" style="display: none;">👍</button> <button class="thumbs thumbs-down" id="thumbs-down-btn" onclick="sendFeedbackNegative()" style="display: none;">👎</button> <button id="report-issue-btn" style="display: none;">Report Issue</button> </div> </div> </div> </div> </div> <div> <p></p> </div> <div class="container shorter-container" style="background: #e5effc">  <div class="col-md-6 contact-us"> <h2>Contact Us</h2> <form class="submit-to-google-sheet" name="submit-to-google-sheet"> <input type="text" name="Name" placeholder="Your Name" required /> <input type="Email" name="Email" placeholder="Your Email" required /> <input type="Organization" name="Organization" placeholder="Your Organization" 
/> <textarea name="Message" rows="6" placeholder="Your Message"></textarea> <button type="submit" class="btn-secondary2">Submit</button> </form> <span id="msg"></span> </div>  <div class="col-md-6"> <h2>Citation</h2> <pre> <code> @misc{berkeley-function-calling-leaderboard, title={Berkeley Function Calling Leaderboard}, author={Fanjia Yan and Huanzhi Mao and Charlie Cheng-Jie Ji and Tianjun Zhang and Shishir G. Patil and Ion Stoica and Joseph E. Gonzalez}, howpublished={\url{https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html}}, year={2024}, } </code></pre> </div> </div> </body>  <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>  <script src="https://cdn.jsdelivr.net/npm/semantic-ui/dist/semantic.min.js"></script>  <script src="https://cdn.jsdelivr.net/npm/chart.js"></script> <script src="index_main.js"></script> <script type="text/javascript" src="treemap_main.js"></script> </html>
