
About data flow analysis — CodeQL

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en"> <head> <meta http-equiv="X-UA-Compatible" content="IE=Edge" /> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>About data flow analysis &#8212; CodeQL</title> <link rel="stylesheet" href="../../_static/alabaster.css" type="text/css" /> <link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <script type="text/javascript" id="documentation_options" data-url_root="../../" src="../../_static/documentation_options.js"></script> <script type="text/javascript" src="../../_static/jquery.js"></script> <script type="text/javascript" src="../../_static/underscore.js"></script> <script type="text/javascript" src="../../_static/doctools.js"></script> <script type="text/javascript" src="../../_static/language_data.js"></script> <link rel="shortcut icon" href="../../_static/favicon.ico"/> <link rel="index" title="Index" href="../../genindex/" /> <link rel="search" title="Search" href="../../search/" /> <link rel="next" title="Creating path queries" href="../creating-path-queries/" /> <link rel="prev" title="Providing locations in CodeQL queries" href="../providing-locations-in-codeql-queries/" /> <title>CodeQL docs</title> <meta name="viewport" content="width=device-width, initial-scale=1" /> <link rel="stylesheet" href="../../_static/custom.css" type="text/css" /> <link rel="stylesheet" href="../../_static/primer.css" type="text/css" /> </head><body> <header class="Header"> <div class="Header-item--full"> <a href="https://codeql.github.com/docs" class="Header-link f2 d-flex flex-items-center"> <!-- <%= octicon "mark-github", class: "mr-2", height: 32 %> --> <svg height="32" class="octicon octicon-mark-github mr-2" viewBox="0 0 16 16" version="1.1" width="32" aria-hidden="true"> <path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 
7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"> </path> </svg> <span class="hide-sm">CodeQL documentation</span> </a> </div> <div class="Header-item hide-sm hide-md"> <script src="https://addsearch.com/js/?key=93b4d287e2fc079a4089412b669785d5&categories=!0xhelp.semmle.com,0xcodeql.github.com,1xdocs,1xcodeql-standard-libraries,1xcodeql-query-help"></script> </div> <div class="Header-item"> <details class="dropdown details-reset details-overlay d-inline-block"> <summary class="btn bg-gray-dark text-white border" aria-haspopup="true"> CodeQL resources <div class="dropdown-caret"></div> </summary> <ul class="dropdown-menu dropdown-menu-se dropdown-menu-dark"> <li><a class="dropdown-item" href="https://codeql.github.com/docs/codeql-overview">CodeQL overview</a></li> <li class="dropdown-divider" role="separator"></li> <div class="dropdown-header"> CodeQL guides </div> <li><a class="dropdown-item" href="https://codeql.github.com/docs/writing-codeql-queries">Writing CodeQL queries</a></li> <li><a class="dropdown-item" href="https://codeql.github.com/docs/codeql-language-guides">CodeQL language guides</a> <li class="dropdown-divider" role="separator"></li> <div class="dropdown-header"> Reference docs </div> <li><a class="dropdown-item" href="https://codeql.github.com/docs/ql-language-reference/">QL language reference</a> <li><a class="dropdown-item" href="https://codeql.github.com/codeql-standard-libraries">CodeQL standard-libraries</a> <li><a class="dropdown-item" 
href="https://codeql.github.com/codeql-query-help">CodeQL query help</a> <li class="dropdown-divider" role="separator"></li> <div class="dropdown-header"> Source files </div> <li><a class="dropdown-item" href="https://github.com/github/codeql">CodeQL repository</a> <li class="dropdown-divider" role="separator"></li> <div class="dropdown-header"> Academic </div> <li><a class="dropdown-item" href="https://codeql.github.com/publications">QL publications</a> </ul> </details> </div> </header> <main class="bg-gray-light clearfix"> <nav class="SideNav position-sticky top-0 col-lg-3 col-md-3 float-left p-4 hide-sm hide-md overflow-y-auto"> <ul class="current"> <li class="toctree-l1"><a class="reference internal" href="../../codeql-overview/">CodeQL overview</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../">Writing CodeQL queries</a><ul class="current"> <li class="toctree-l2 current"><a class="reference internal" href="../codeql-queries/">CodeQL queries</a><ul class="current"> <li class="toctree-l3"><a class="reference internal" href="../about-codeql-queries/">About CodeQL queries</a></li> <li class="toctree-l3"><a class="reference internal" href="../metadata-for-codeql-queries/">Metadata for CodeQL queries</a></li> <li class="toctree-l3"><a class="reference internal" href="../query-help-files/">Query help files</a></li> <li class="toctree-l3"><a class="reference internal" href="../defining-the-results-of-a-query/">Defining the results of a query</a></li> <li class="toctree-l3"><a class="reference internal" href="../providing-locations-in-codeql-queries/">Providing locations in CodeQL queries</a></li> <li class="toctree-l3 current"><a class="current reference internal" href="#">About data flow analysis</a></li> <li class="toctree-l3"><a class="reference internal" href="../creating-path-queries/">Creating path queries</a></li> <li class="toctree-l3"><a class="reference internal" href="../troubleshooting-query-performance/">Troubleshooting query 
performance</a></li> <li class="toctree-l3"><a class="reference internal" href="../debugging-data-flow-queries-using-partial-flow/">Debugging data-flow queries using partial flow</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../ql-tutorials/">QL tutorials</a></li> <li class="toctree-l2"><a class="reference internal" href="../running-codeql-queries/">Running CodeQL queries</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="../../codeql-language-guides/">CodeQL language guides</a></li> <li class="toctree-l1"><a class="reference internal" href="../../ql-language-reference/">QL language reference</a></li> </ul> </nav> <div class="body col-sm-12 col-md-9 col-lg-9 float-left border-left"> <div class="hide-lg hide-xl px-4 pt-4"> <div class="related" role="navigation" aria-label="related navigation"> <ul> <li class="nav-item nav-item-0"><a href="../../contents/">CodeQL</a> &#187;</li> <li class="nav-item nav-item-1"><a href="../" >Writing CodeQL queries</a> &#187;</li> <li class="nav-item nav-item-2"><a href="../codeql-queries/" accesskey="U">CodeQL queries</a> &#187;</li> </ul> </div> </div> <article class="p-4 col-lg-10 col-md-10 col-sm-12"> <div class="section" id="about-data-flow-analysis-1"> <span id="about-data-flow-analysis"></span><h1>About data flow analysis<a class="headerlink" href="#about-data-flow-analysis-1" title="Permalink to this headline">¶</a></h1> <p>Data flow analysis is used to compute the possible values that a variable can hold at various points in a program, determining how those values propagate through the program and where they are used.</p> <div class="section" id="overview"> <h2>Overview<a class="headerlink" href="#overview" title="Permalink to this headline">¶</a></h2> <p>Many CodeQL security queries implement data flow analysis, which can highlight the fate of potentially malicious or insecure data that can cause vulnerabilities in your code base. 
These queries help you understand if data is used in an insecure way, whether dangerous arguments are passed to functions, or whether sensitive data can leak. As well as highlighting potential security issues, you can also use data flow analysis to understand other aspects of how a program behaves, by finding, for example, uses of uninitialized variables and resource leaks.</p> <p>The following sections provide a brief introduction to data flow analysis with CodeQL.</p> <p>See the following tutorials for more information about analyzing data flow in specific languages:</p> <ul class="simple"> <li>“<a class="reference internal" href="../../codeql-language-guides/analyzing-data-flow-in-cpp/#analyzing-data-flow-in-cpp"><span class="std std-ref">Analyzing data flow in C/C++</span></a>”</li> <li>“<a class="reference internal" href="../../codeql-language-guides/analyzing-data-flow-in-csharp/#analyzing-data-flow-in-csharp"><span class="std std-ref">Analyzing data flow in C#</span></a>”</li> <li>“<a class="reference internal" href="../../codeql-language-guides/analyzing-data-flow-in-java/#analyzing-data-flow-in-java"><span class="std std-ref">Analyzing data flow in Java/Kotlin</span></a>”</li> <li>“<a class="reference internal" href="../../codeql-language-guides/analyzing-data-flow-in-javascript-and-typescript/#analyzing-data-flow-in-javascript-and-typescript"><span class="std std-ref">Analyzing data flow in JavaScript/TypeScript</span></a>”</li> <li>“<a class="reference internal" href="../../codeql-language-guides/analyzing-data-flow-in-python/#analyzing-data-flow-in-python"><span class="std std-ref">Analyzing data flow in Python</span></a>”</li> <li>“<a class="reference internal" href="../../codeql-language-guides/analyzing-data-flow-in-ruby/#analyzing-data-flow-in-ruby"><span class="std std-ref">Analyzing data flow in Ruby</span></a>”</li> </ul> <blockquote class="pull-quote"> <div><p>Note</p> <p>Data flow analysis is used extensively in path queries. 
To learn more about path queries, see “<a class="reference internal" href="../creating-path-queries/"><span class="doc">Creating path queries</span></a>.”</p> </div></blockquote> </div> <div class="section" id="data-flow-graph-1"> <span id="data-flow-graph"></span><h2>Data flow graph<a class="headerlink" href="#data-flow-graph-1" title="Permalink to this headline">¶</a></h2> <p>The CodeQL data flow libraries implement data flow analysis on a program or function by modeling its data flow graph. Unlike the <a class="reference external" href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">abstract syntax tree</a>, the data flow graph does not reflect the syntactic structure of the program, but models the way data flows through the program at runtime. Nodes in the abstract syntax tree represent syntactic elements such as statements or expressions. Nodes in the data flow graph, on the other hand, represent semantic elements that carry values at runtime.</p> <p>Some AST nodes (such as expressions) have corresponding data flow nodes, but others (such as <code class="docutils literal notranslate"><span class="pre">if</span></code> statements) do not. This is because expressions are evaluated to a value at runtime, whereas <code class="docutils literal notranslate"><span class="pre">if</span></code> statements are purely a control-flow construct and do not carry values. There are also data flow nodes that do not correspond to AST nodes at all.</p> <p>Edges in the data flow graph represent the way data flows between program elements. 
For example, in the expression <code class="docutils literal notranslate"><span class="pre">x</span> <span class="pre">||</span> <span class="pre">y</span></code> there are data flow nodes corresponding to the sub-expressions <code class="docutils literal notranslate"><span class="pre">x</span></code> and <code class="docutils literal notranslate"><span class="pre">y</span></code>, as well as a data flow node corresponding to the entire expression <code class="docutils literal notranslate"><span class="pre">x</span> <span class="pre">||</span> <span class="pre">y</span></code>. There is an edge from the node corresponding to <code class="docutils literal notranslate"><span class="pre">x</span></code> to the node corresponding to <code class="docutils literal notranslate"><span class="pre">x</span> <span class="pre">||</span> <span class="pre">y</span></code>, representing the fact that data may flow from <code class="docutils literal notranslate"><span class="pre">x</span></code> to <code class="docutils literal notranslate"><span class="pre">x</span> <span class="pre">||</span> <span class="pre">y</span></code> (since the expression <code class="docutils literal notranslate"><span class="pre">x</span> <span class="pre">||</span> <span class="pre">y</span></code> may evaluate to <code class="docutils literal notranslate"><span class="pre">x</span></code>). Similarly, there is an edge from the node corresponding to <code class="docutils literal notranslate"><span class="pre">y</span></code> to the node corresponding to <code class="docutils literal notranslate"><span class="pre">x</span> <span class="pre">||</span> <span class="pre">y</span></code>.</p> <p>Local and global data flow differ in which edges they consider: local data flow only considers edges between data flow nodes belonging to the same function and ignores data flow between functions and through object properties. Global data flow, however, considers the latter as well. 
Taint tracking introduces additional edges into the data flow graph that do not precisely correspond to the flow of values, but model whether some value at runtime may be derived from another, for instance through a string-manipulating operation.</p> <p>The data flow graph is computed using <a class="reference internal" href="../../ql-language-reference/types/#classes"><span class="std std-ref">classes</span></a> to model the program elements that represent the graph’s nodes. The flow of data between the nodes is modeled using <a class="reference internal" href="../../ql-language-reference/predicates/#predicates"><span class="std std-ref">predicates</span></a> to compute the graph’s edges.</p> <p>Computing an accurate and complete data flow graph presents several challenges:</p> <ul class="simple"> <li>It isn’t possible to compute data flow through standard library functions whose source code is unavailable.</li> <li>Some behavior isn’t determined until runtime, which means that the data flow library must take extra steps to find potential call targets.</li> <li>Aliasing between variables can result in a single write changing the value that multiple pointers point to.</li> <li>The data flow graph can be very large and slow to compute.</li> </ul> <p>To overcome these potential problems, two kinds of data flow are modeled in the libraries:</p> <ul class="simple"> <li>Local data flow, concerning the data flow within a single function. When reasoning about local data flow, you only consider edges between data flow nodes belonging to the same function. Local data flow is generally fast, efficient, and precise enough for many queries, and it is usually possible to compute the local data flow for all functions in a CodeQL database.</li> <li>Global data flow, which effectively considers the data flow within an entire program by calculating data flow between functions and through object properties. 
Computing global data flow is typically more time- and energy-intensive than computing local data flow, so queries should be refined to look for more specific sources and sinks.</li> </ul> <p>Many CodeQL queries contain examples of both local and global data flow analysis. For more information, see <a class="reference external" href="https://codeql.github.com/codeql-query-help">CodeQL query help</a>.</p> </div> <div class="section" id="normal-data-flow-vs-taint-tracking"> <h2>Normal data flow vs taint tracking<a class="headerlink" href="#normal-data-flow-vs-taint-tracking" title="Permalink to this headline">¶</a></h2> <p>In the standard libraries, we make a distinction between ‘normal’ data flow and taint tracking. The normal data flow libraries are used to analyze the information flow in which data values are preserved at each step.</p> <p>For example, if you are tracking an insecure object <code class="docutils literal notranslate"><span class="pre">x</span></code> (which might be some untrusted or potentially malicious data), a step in the program may ‘change’ its value. So, in a simple process such as <code class="docutils literal notranslate"><span class="pre">y</span> <span class="pre">=</span> <span class="pre">x</span> <span class="pre">+</span> <span class="pre">1</span></code>, a normal data flow analysis will highlight the use of <code class="docutils literal notranslate"><span class="pre">x</span></code>, but not <code class="docutils literal notranslate"><span class="pre">y</span></code>. However, since <code class="docutils literal notranslate"><span class="pre">y</span></code> is derived from <code class="docutils literal notranslate"><span class="pre">x</span></code>, it is influenced by the untrusted or ‘tainted’ information, and therefore it is also tainted. 
Analyzing the flow of the taint from <code class="docutils literal notranslate"><span class="pre">x</span></code> to <code class="docutils literal notranslate"><span class="pre">y</span></code> is known as taint tracking.</p> <p>In QL, taint tracking extends data flow analysis by including steps in which the data values are not necessarily preserved, but the potentially insecure object is still propagated. These flow steps are modeled in the taint-tracking library using predicates that hold if taint is propagated between nodes.</p> </div> <div class="section" id="further-reading"> <h2>Further reading<a class="headerlink" href="#further-reading" title="Permalink to this headline">¶</a></h2> <ul class="simple"> <li><a class="reference external" href="https://docs.github.com/en/code-security/codeql-for-vs-code/getting-started-with-codeql-for-vs-code/exploring-data-flow-with-path-queries">Exploring data flow with path queries</a> in the GitHub documentation.</li> </ul> </div> </div> </article> <!-- GitHub footer, with links to terms and privacy statement --> <div class="px-3 px-md-6 f6 py-4 d-sm-flex flex-justify-between flex-row-reverse flex-items-center border-top"> <ul class="list-style-none d-flex flex-items-center mb-3 mb-sm-0 lh-condensed-ultra"> <li class="mr-3"> <a href="https://twitter.com/github" title="GitHub on Twitter" style="color: #959da5;"> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 273.5 222.3" class="d-block" height="18"> <path d="M273.5 26.3a109.77 109.77 0 0 1-32.2 8.8 56.07 56.07 0 0 0 24.7-31 113.39 113.39 0 0 1-35.7 13.6 56.1 56.1 0 0 0-97 38.4 54 54 0 0 0 1.5 12.8A159.68 159.68 0 0 1 19.1 10.3a56.12 56.12 0 0 0 17.4 74.9 56.06 56.06 0 0 1-25.4-7v.7a56.11 56.11 0 0 0 45 55 55.65 55.65 0 0 1-14.8 2 62.39 62.39 0 0 1-10.6-1 56.24 56.24 0 0 0 52.4 39 112.87 112.87 0 0 1-69.7 24 119 119 0 0 1-13.4-.8 158.83 158.83 0 0 0 86 25.2c103.2 0 159.6-85.5 159.6-159.6 0-2.4-.1-4.9-.2-7.3a114.25 114.25 0 0 0 28.1-29.1" fill="currentColor"></path> 
</svg> </a> </li> <li class="mr-3"> <a href="https://www.facebook.com/GitHub" title="GitHub on Facebook" style="color: #959da5;"> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 15.3 15.4" class="d-block" height="18"> <path d="M14.5 0H.8a.88.88 0 0 0-.8.9v13.6a.88.88 0 0 0 .8.9h7.3v-6h-2V7.1h2V5.4a2.87 2.87 0 0 1 2.5-3.1h.5a10.87 10.87 0 0 1 1.8.1v2.1h-1.3c-1 0-1.1.5-1.1 1.1v1.5h2.3l-.3 2.3h-2v5.9h3.9a.88.88 0 0 0 .9-.8V.8a.86.86 0 0 0-.8-.8z" fill="currentColor"></path> </svg> </a> </li> <li class="mr-3"> <a href="https://www.youtube.com/github" title="GitHub on YouTube" style="color: #959da5;"> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 19.17 13.6" class="d-block" height="16"> <path d="M18.77 2.13A2.4 2.4 0 0 0 17.09.42C15.59 0 9.58 0 9.58 0a57.55 57.55 0 0 0-7.5.4A2.49 2.49 0 0 0 .39 2.13 26.27 26.27 0 0 0 0 6.8a26.15 26.15 0 0 0 .39 4.67 2.43 2.43 0 0 0 1.69 1.71c1.52.42 7.5.42 7.5.42a57.69 57.69 0 0 0 7.51-.4 2.4 2.4 0 0 0 1.68-1.71 25.63 25.63 0 0 0 .4-4.67 24 24 0 0 0-.4-4.69zM7.67 9.71V3.89l5 2.91z" fill="currentColor"></path> </svg> </a> </li> <li class="mr-3 flex-self-start"> <a href="https://www.linkedin.com/company/github" title="GitHub on Linkedin" style="color: #959da5;"> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 19 18" class="d-block" height="18"> <path d="M3.94 2A2 2 0 1 1 2 0a2 2 0 0 1 1.94 2zM4 5.48H0V18h4zm6.32 0H6.34V18h3.94v-6.57c0-3.66 4.77-4 4.77 0V18H19v-7.93c0-6.17-7.06-5.94-8.72-2.91z" fill="currentColor"></path> </svg> </a> </li> <li> <a href="https://github.com/github" title="GitHub's organization" style="color: #959da5;"> <svg version="1.1" width="20" height="20" viewBox="0 0 16 16" class="octicon octicon-mark-github" aria-hidden="true"> <path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 
2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0016 8c0-4.42-3.58-8-8-8z"> </path> </svg> </a> </li> </ul> <ul class="list-style-none d-flex text-gray"> <li class="mr-3">&copy; <script type="text/javascript">document.write(new Date().getFullYear());</script> GitHub, Inc.</li> <li class="mr-3"><a href="https://docs.github.com/site-policy/github-terms/github-terms-of-service" class="link-gray">Terms </a></li> <li><a href="https://docs.github.com/site-policy/privacy-policies/github-privacy-statement" class="link-gray">Privacy </a></li> </ul> </div> </div> </main> <script type="text/javascript"> $(document).ready(function () { $(".toggle > *").hide(); $(".toggle .name").show(); $(".toggle .name").click(function () { $(this).parent().children().not(".name").toggle(400); $(this).parent().children(".name").toggleClass("open"); }) }); </script> </body> </html>
