CINXE.COM

GMP developers' corner

<!DOCTYPE HTML> <html> <head> <title>GMP developers' corner</title> <link rel="shortcut icon" href="favicon.ico"> <link rel="stylesheet" href="new.css"> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <style type="text/css"> td {padding-left:4pt; padding-right:4pt;}</style> <style type="text/css"> th {padding-left:4pt; padding-right:2pt;}</style> <style type="text/css"> img {border:0;}</style> </head> <body> <div id="top"> <table style="width:100%; background-color:#b0b0b0;"> <tr> <td style="text-align:left;"> <svg width="180px" height="60px" version="1.1" viewBox="0 0 1500 500" xmlns="https://www.w3.org/2000/svg"> <rect x="0" y="0" width="1500" height="540" fill="#b0b0b0" /> <text x="0" y="440" fill="#c00000" font-size="540" font-family="arial" font-weight="bold"> GMP </text> <text x="50" y="500" font-size="70" font-family="Verdana"> &#171;Arithmetic without limitations&#187; </text> </svg> </td> <td style="text-align:center;"> <span style="font-size:200%">GMP developers' corner </span> <br> <span style="font-size:75%">Last modified: 2023-07-30 </span> </td> </tr> </table> </div> <div id="container"> <div id="top-spacer"></div> <br><br> <table> <tr><td style="text-align:right;"> Documentation: <td> <a href="../manual/">Online</a> | <a href="../gmp-man-6.2.1.pdf">PDF</a></tr> <tr><td style="text-align:right;"> Development sources: <td> <a href="/repo/">GMP repository</a> - <a href="repo-usage">Repo usage tips</a> | <a href="//gmplib.org/download/snapshot/">Daily snapshots</a></tr> <tr><td style="text-align:right;"> Testing: <td> <a href="tm/gmp/date">Head status</a> | <a href="tm">All status</a> | <a href="lcov">Head coverage</a></tr> <tr><td style="text-align:right;"> Speed:<td> <a href="perfguard">Performance guard</a></tr> <tr><td style="text-align:right;"> Tuneup: <td> <a href="thres">Current threshold tables</a></tr> <tr><td style="text-align:right;"> Asm: <td> <a href="asm">Assembly loops</a> | <a href="anomaly">Performance anomalies</a></tr> <tr><td style="text-align:right;"> Computers: <td> <a href="testsystems">GMP developer's systems</a></tr> <tr><td style="text-align:right;"> Incompatibility: <td> <a href="incompatibility">Incompatible changes considered</a></tr> </table> <hr> <h3> GMP release update </h3> <p> GMP 6.3.0 is out 2023-07-30. </p> <hr> <h3> GMP itemised plans </h3> <p> <a href="GMPng">List of planned GMP improvements</a> </p> <p> <a href="arm">List of desirable ARM improvements</a> </p> <p> <a href="x86-64">List of desirable X86-64 improvements</a> </p> <p> <a href="sparc">List of desirable SPARC (T4-T5) improvements</a> </p> <p> <a href="sec">List of sec/cnd improvements</a> </p> <hr> <h3> Comparing sqrt and root(2) </h3> <p> We use quite different algorithms for mpz_sqrt/mpz_sqrtrem/mpn_sqrtrem and mpz_root/mpz_rootrem/mpn_rootrem. Here we compare the former group to the latter group for computing square roots. </p> <p> The <i>x</i>-axis represents input operand size in bits. The left-hand diagram is for computing just the root result, the right-hand diagram also computes the remainder. </p> <p> There are two anomalies here, one very visible and one quite subtle: </p> <ol> <li> The root functions badly underperform for operands of less than about 10000 bits; this is due to the absence of starting value tables for the Newton code and that sqrt has register-based code for the first few iterations. </li> <li> When producing the remainder, the root functions perform about 20% worse than the sqrt functions. </li> <li> Less visible is that for quite large operands the sqrt function performs around 5% worse than the more general root function when both are asked not to return the remainder. This is supposedly related to the different strategies of the sqrt and root code; the former basically computes a remainder always in order to adjust the final result, the latter computes an extra limb which usually allows it to return the accurate result without further work. </li> </ol> <table> <tr> <td> <img src="norem.png" alt=""> </td> <td> <img src="rem.png" alt=""> </td> </tr> </table> <h3> Broadwell optimisation </h3> <p> Most GMP's inner-loops run faster on Broadwell than on Haswell in GMP 6.0.0, but unfortunately some of the most critical multiply loops run quite considerably slower. </p> <p> These plots compare Haswell and Broadwell using the dev sources for plain multiplication, squaring, and Hensel remaindering. Dimension is cycles per n-limb operands, meaning that <b>lower is better</b>. </p> <table> <tr> <td> <img src="hsw-bdw-mul_basecase.png" alt=""> </td> </tr> <tr> <td> <img src="hsw-bdw-sqr_basecase.png" alt=""> </td> </tr> <tr> <td> <img src="hsw-bdw-redc_1.png" alt=""> </td> </tr> </table> <h3> Basecase performance (obsolete with 6.0.0) </h3> <p> We are working to make critical basecase functions perform near-optimally on interesting CPUs. The current status can be seen in the diagrams below. The diagrams are ordered pairwise per CPU, with linear scale to the left and log/log scale to the right. Measured values are in cycles. </p> <p> Well-formed functions should perform smoothly, where the order from slowest to fastest should be redc_1, mul_basecase, sqr_basecase, and mullo_basecase. In most cases where the order is not well-formed, assembly support is missing or inadequate. </p> <p> The diagrams are sorted fastest-CPU-first as measured by mul_basecase, except that Itanium and POWER7 are at the end. The diagrams' ranges are the same for all CPU types, to allow some vertical comparisons. </p> <table> <tr><th>Intel Haswell lin/lin</th> <th>Intel Haswell log/log</th> <tr><td> <img src="hannah.png" alt=""> </td><td> <img src="hannah-loglog.png" alt=""> </td> <tr><th>AMD K10/Thuban lin/lin</th> <th>AMD K10/Thuban log/log</th> <tr><td> <img src="shell.png" alt=""> </td><td> <img src="shell-loglog.png" alt=""> </td> <tr><th>Intel Ivy bridge lin/lin</th> <th>Intel Ivy bridge log/log</th> <tr><td> <img src="joerg.png" alt=""> </td><td> <img src="joerg-loglog.png" alt=""> </td> <tr><th>Intel Sandy bridge lin/lin</th> <th>Intel Sandy bridge log/log</th> <tr><td> <img src="tom.png" alt=""> </td><td> <img src="tom-loglog.png" alt=""> </td> <tr><th>Intel Nehalem lin/lin</th> <th>Intel Nehalem log/log</th> <tr><td> <img src="bikodeb64.png" alt=""> </td><td> <img src="bikodeb64-loglog.png" alt=""> </td> <tr><th>Intel Conroe lin/lin</th> <th>Intel Conroe log/log</th> <tr><td> <img src="repentium.png" alt=""> </td><td> <img src="repentium-loglog.png" alt=""> </td> <tr><th>AMD Piledriver lin/lin</th> <th>AMD Piledriver log/log</th> <tr><td> <img src="pile.png" alt=""> </td><td> <img src="pile-loglog.png" alt=""> </td> <tr><th>AMD Bulldozer lin/lin</th> <th>AMD Bulldozer log/log</th> <tr><td> <img src="tutu.png" alt=""> </td><td> <img src="tutu-loglog.png" alt=""> </td> <tr><th>AMD Bobcat lin/lin</th> <th>AMD Bobcat log/log</th> <tr><td> <img src="bobcat.png" alt=""> </td><td> <img src="bobcat-loglog.png" alt=""> </td> <tr><th>Intel Atom lin/lin</th> <th>Intel Atom log/log</th> <tr><td> <img src="hehe.png" alt=""> </td><td> <img src="hehe-loglog.png" alt=""> </td> <tr><th>Intel Itanium2 lin/lin</th> <th>Intel Itanium2 log/log</th> <tr><td> <img src="olympic.png" alt=""> </td><td> <img src="olympic-loglog.png" alt=""> </td> <tr><th>IBM POWER7-smt4 lin/lin</th> <th>IBM POWER7-smt4 log/log</th> <tr><td> <img src="gcc110.png" alt=""> </td><td> <img src="gcc110-loglog.png" alt=""> </td> </table> <h3> Division performance anomalies (mostly obsolete with 6.0.0) </h3> <p> These diagrams show performance for fixed quotient sizes, meaning that the difference between the dividend size and the divisor size is constant. The quotient size can be deduced from the "new-10000", "new-11365", etc. The <i>x</i>-axis is the dividend size. The red line is for the code up until GMP 5.1, the green line is new code. The new code uses more memory, but is almost always faster. Click for larger diagrams. </p> <a href="10000.png"><img src="10000-small.png" alt=""> </a> <a href="11365.png"><img src="11365-small.png" alt=""> </a> <a href="12915.png"><img src="12915-small.png" alt=""> </a> <a href="14678.png"><img src="14678-small.png" alt=""> </a> <a href="16681.png"><img src="16681-small.png" alt=""> </a> <a href="18957.png"><img src="18957-small.png" alt=""> </a> <a href="21544.png"><img src="21544-small.png" alt=""> </a> <a href="24484.png"><img src="24484-small.png" alt=""> </a> <a href="27826.png"><img src="27826-small.png" alt=""> </a> <a href="31623.png"><img src="31623-small.png" alt=""> </a> <a href="35938.png"><img src="35938-small.png" alt=""> </a> <a href="40842.png"><img src="40842-small.png" alt=""> </a> <a href="46416.png"><img src="46416-small.png" alt=""> </a> <a href="52750.png"><img src="52750-small.png" alt=""> </a> <a href="59948.png"><img src="59948-small.png" alt=""> </a> <a href="68129.png"><img src="68129-small.png" alt=""> </a> <a href="77426.png"><img src="77426-small.png" alt=""> </a> <a href="87992.png"><img src="87992-small.png" alt=""> </a> <a href="100000.png"><img src="100000-small.png" alt=""> </a> <a href="113646.png"><img src="113646-small.png" alt=""> </a> <a href="129155.png"><img src="129155-small.png" alt=""> </a> <a href="146780.png"><img src="146780-small.png" alt=""> </a> <a href="166810.png"><img src="166810-small.png" alt=""> </a> <a href="189574.png"><img src="189574-small.png" alt=""> </a> <a href="215443.png"><img src="215443-small.png" alt=""> </a> <a href="244844.png"><img src="244844-small.png" alt=""> </a> <a href="278256.png"><img src="278256-small.png" alt=""> </a> <a href="316228.png"><img src="316228-small.png" alt=""> </a> <a href="359381.png"><img src="359381-small.png" alt=""> </a> <a href="408424.png"><img src="408424-small.png" alt=""> </a> <a href="464159.png"><img src="464159-small.png" alt=""> </a> <a href="527500.png"><img src="527500-small.png" alt=""> </a> <a href="599484.png"><img src="599484-small.png" alt=""> </a> <a href="681292.png"><img src="681292-small.png" alt=""> </a> <a href="774264.png"><img src="774264-small.png" alt=""> </a> <a href="879923.png"><img src="879923-small.png" alt=""> </a> <a href="1000000.png"><img src="1000000-small.png" alt=""> </a> <a href="1136464.png"><img src="1136464-small.png" alt=""> </a> <a href="1291550.png"><img src="1291550-small.png" alt=""> </a> <a href="1467799.png"><img src="1467799-small.png" alt=""> </a> <a href="1668101.png"><img src="1668101-small.png" alt=""> </a> <a href="1895736.png"><img src="1895736-small.png" alt=""> </a> <a href="2154435.png"><img src="2154435-small.png" alt=""> </a> <a href="2448437.png"><img src="2448437-small.png" alt=""> </a> <a href="2782559.png"><img src="2782559-small.png" alt=""> </a> <a href="3162278.png"><img src="3162278-small.png" alt=""> </a> <a href="3593814.png"><img src="3593814-small.png" alt=""> </a> <a href="4084239.png"><img src="4084239-small.png" alt=""> </a> <a href="4641589.png"><img src="4641589-small.png" alt=""> </a> <a href="5274997.png"><img src="5274997-small.png" alt=""> </a> <a href="5994843.png"><img src="5994843-small.png" alt=""> </a> <a href="6812921.png"><img src="6812921-small.png" alt=""> </a> <a href="7742637.png"><img src="7742637-small.png" alt=""> </a> <a href="8799225.png"><img src="8799225-small.png" alt=""> </a> <a href="10000000.png"><img src="10000000-small.png" alt=""> </a> <a href="11364637.png"><img src="11364637-small.png" alt=""> </a> <a href="12915497.png"><img src="12915497-small.png" alt=""> </a> <a href="14677993.png"><img src="14677993-small.png" alt=""> </a> <a href="16681005.png"><img src="16681005-small.png" alt=""> </a> <hr> <h3> mpz_powm and mpz_powm_ui </h3> <p> This is a suggested new algorithm (in simple square-and-multiply form, in practice we should use k-ary sliding window): </p> <pre> // Compute r &larr; b<sup>e</sup> mod m, m odd, e &ge; 2. // Optimised for the case b &lt;&lt; m. The first loop avoids reductions, // the alternate 2nd loops performs necessary reductions. // Let &beta; be the word base and let n = &lceil;log<sub>&beta;</sub> m&rceil; r &larr; b while 2 &times; log<sub>&beta;</sub> r &lt; log<sub>&beta;</sub> m r &larr; r<sup>2</sup> if next-lower-bit(e) = 1 r &larr; r &times; b if no-more-bits(e) return r (mod m) // mod sometimes needed if few-bits-remaining-in(e) &vee; &lceil;log<sub>&beta;</sub> b&rceil; + 3 &lt; &lceil;log<sub>&beta;</sub> n&rceil; while true r &larr; r<sup>2</sup> mod m if next-lower-bit(e) = 1 r &larr; r &times; b mod m if no-more-bits(e) return r else r' &larr; &beta;^n &times; r mod m // convert to REDC residue b' &larr; &beta;^n &times; b mod m // convert to REDC residue m<sub>i</sub> &larr; m<sup>-1</sup> mod &beta;<sup>k</sup> // a suitable modular inverse while true r' &larr; REDC((r')<sup>2</sup>, m, m<sub>i</sub>) if next-lower-bit(e) = 1 r' &larr; REDC(r' &times; b', m, m<sub>i</sub>) if no-more-bits(e) return REDC(r', m, m<sub>i</sub>) endif </pre> <hr> <h3> New division interface </h3> <p> We are working in a modernised set of low-level division loops, splitting several of the current functions that have multiple loops into separate functions. Here we will try to describe the progress. </p> <p> The function naming scheme is not completely coherent, and is far from finalised. </p> <blockquote> <table rules="groups"> <colgroup><col><col> <thead> <tr> <th> symbol </th> <th> meaning </th> </tr> </thead> <tbody> <tr> <td> 1n,2n </td> <td> function expects normalised divisor, 1 and 2 limbs respectively </td> </tr> <tr> <td> 1u,2u </td> <td> function expects unnormalised divisor, 1 and 2 limbs respectively </td> </tr> <tr> <td> ; </td> <td> delimiter between output and input operands </td> </tr> <tr> <td> qp, np, dp</td> <td> pointers to quotient, numerator, and divisior</td> </tr> <tr> <td> qh</td> <td> high quotient limb </td> </tr> <tr> <td> d </td> <td> divisor limb </td> </tr> <tr> <td> r </td> <td> remainder limb </td> </tr> <tr> <td> di </td> <td> some sort of inverse of divisor </td> </tr> </tbody> </table> </blockquote> <blockquote> <table style="width:90%;"> <!-- left column euclidiam, right column hensel --> <tr><td style="width:70%;"> <table style="width:100%;"> <!-- three rows --> <tr><td> <table style="width:90%;"> <caption>Euclidian division/mod by a 1-limb divisor</caption> <thead> <tr> <th style="text-align:left;">function int</th> <th style="text-align:left;">function frac</th> </tr> </thead> <tbody> <tr> <td> r = mpn_div_qr_1n_pi1(qp,qh;np,nn,d,di) </td> <td> r = mpn_divf_qr_1_pi1(qp;fn,r,d,di) </td> </tr> <tr> <td> r = mpn_div_qr_1u_pi1(qp,qh;np,nn,d,di) </td> <td> &nbsp; </td> </tr> <tr> <td> r = mpn_div_qr_1n_pi2(qp,qh;np,nn,d,di) </td> <td> r = mpn_divf_qr_1_pi2(qp;fn,r,d,di) </td> </tr> <tr> <td> r = mpn_div_qr_1u_pi2(qp,qh;np,nn,d,di) </td> <td> &nbsp; </td> </tr> <tr> <td> r = mpn_div_r_1n_pi1(;np,nn,d,di) </td> <td> &nbsp; </td> </tr> <tr> <td> r = mpn_div_r_1u_pi1(;np,nn,d,di) </td> <td> &nbsp; </td> </tr> <tr> <td> r = mpn_div_r_1n_pi2(;np,nn,d,di) </td> <td> &nbsp; </td> </tr> <tr> <td> r = mpn_div_r_1u_pi2(;np,nn,d,di) </td> <td> &nbsp; </td> </tr> </tbody> </table> </td> <tr><td> <table style="width:90%;"> <caption>Euclidian division/mod by a 2-limb divisor</caption> <thead> <tr> <th style="text-align:left;">function int</th> <th style="text-align:left;">function frac</th> </tr> </thead> <tbody> <tr> <td> qh = mpn_div_qr_2n_pi1(qp,rp[2];np,nn,dp[2],di) </td> <td> mpn_divf_qr_2n_pi1(qp;fn,np[2],dp[2],di) </td> </tr> <tr> <td> qh = mpn_div_qr_2u_pi1(qp,rp[2];np,nn,dp[2],di) </td> <td> &nbsp; </td> </tr> <tr> <td> qh = mpn_div_qr_2n_pi2(qp,rp[2];np,nn,dp[2],di) </td> <td> mpn_divf_qr_2n_pi2(qp;fn,np[2],dp[2],di) </td> </tr> <tr> <td> qh = mpn_div_qr_2u_pi2(qp,rp[2];np,nn,dp[2],di) </td> <td> &nbsp; </td> </tr> <tr> <td> mpn_div_r_2n_pi1(rp[2];np,nn,dp[2],di) </td> <td> &nbsp; </td> </tr> <tr> <td> mpn_div_r_2u_pi1(rp[2];np,nn,dp[2],di) </td> <td> &nbsp; </td> </tr> <tr> <td> mpn_div_r_2n_pi2(rp[2];np,nn,dp[2],di) </td> <td> &nbsp; </td> </tr> <tr> <td> mpn_div_r_2u_pi2(rp[2];np,nn,dp[2],di) </td> <td> &nbsp; </td> </tr> </tbody> </table> </td> <tr><td> <table style="width:50%;"> <caption>Euclidian division/mod by an m-limb divisor</caption> <thead><tr><th style="text-align:left;">function</th> <tbody> <tr><td> mpn_div_qr </td> </tr> <tr><td> mpn_div_qr </td> </tr> <tr><td> mpn_div_q </td> </tr> <tr><td> mpn_div_q </td> </tr> <tr><td> mpn_div_r </td> </tr> <tr><td> mpn_div_r </td> </tr> </tbody> </table> </td> <tr><td> <table style="width:50%;"> <caption>Euclidian mod by a 1-limb divisor</caption> <thead><tr><th style="text-align:left;">function</th> <tbody> <tr><td> mpn_mod_1_1p </td> </tr> <tr><td> mpn_mod_1s_2p </td> </tr> <tr><td> mpn_mod_1s_3p </td> </tr> <tr><td> mpn_mod_1s_4p </td> </tr> </tbody> </table> </td> </table> </td> <td style="width:30%; vertical-align:top;"> <table style="width:100%;"> <!-- three rows --> <tr><td> <table style="width:100%;"> <caption>Hensel division/mod by a 1-limb divisor</caption> <thead><tr><th style="text-align:left;">function</th> <tbody> <tr><td> mpn_bdiv_qr_1_pi1 </td> </tr> <tr><td> mpn_bdiv_qr_1_pi2 </td> </tr> <tr><td> mpn_bdiv_r_1_pi1 </td> </tr> <tr><td> mpn_bdiv_r_1_pi2 </td> </tr> </tbody> </table> </td> <tr><td> <table style="width:100%;"> <caption>Hensel division/mod by a 2-limb divisor</caption> <thead><tr><th style="text-align:left;">function</th> <tbody> <tr><td> mpn_bdiv_qr_2_pi1 </td> </tr> <tr><td> mpn_bdiv_qr_2_pi2 </td> </tr> <tr><td> mpn_bdiv_r_2_pi1 </td> </tr> <tr><td> mpn_bdiv_r_2_pi2 </td> </tr> </tbody> </table> </td> <tr><td> <table style="width:100%;"> <caption>Hensel division/mod by an m-limb divisor</caption> <thead><tr><th style="text-align:left;">function</th> <tbody> <tr><td> mpn_bdiv_qr </td> </tr> <tr><td> mpn_bdiv_qr </td> </tr> <tr><td> mpn_bdiv_q </td> </tr> <tr><td> mpn_bdiv_q </td> </tr> <tr><td> mpn_bdiv_r </td> </tr> <tr><td> mpn_bdiv_r </td> </tr> </tbody> </table> </td> <tr><td> <table style="width:100%;"> <caption>Hensel mod/redc</caption> <thead><tr><th style="text-align:left;">function</th> <tbody> <tr><td> mpn_redc_1 </td> </tr> <tr><td> mpn_redc_2 </td> </tr> <tr><td> mpn_redc_n </td> </tr> </tbody> </table> </td> </table> </td> </table> </blockquote> <hr> <h3> New Toom multiplication code for unbalanced operands </h3> <p> We have been adding more Toom functions during the last years, mainly for better handling of unbalanced operands. </p> <p> We believe that even more Toom functions might be needed for optimal performance, probably going further than 8 evaluation points (used by toom54, toom63, toom72). This will largely depend on ongoing work on FFT multiplication; the faster that will become, the less room for higher-order Toom. The large number of functions is a bit disturbing, of course, since complexity is always bad. (But performance is GMP's main objective.) </p> <p> These pictures show the best current multiplication primitive for operand sizes 1 &le; bn &le; an &le; N limbs, where N is indicated in each diagram. The <i>x</i>-axis represents an, which is the size of the first operand; the y-axis represents bn, the size of the second operand. The first picture uses linear scale, the second uses log/log scale. For larger scale, click the pictures. </p> <p> Comments: </p> <ul> <li> GMP 4.3 uses schoolbook, toom22, toom32, toom42, toom33, toom44, and FFT. <li> GMP 5.0 also uses toom43, toom53, and toom63, and the algorithm selection code for unbalanced operands has been improved. <li> FFT takes over for larger operands for machines with faster hardware multiplication. AMD's processors have great hardware multiplication, Intel's Nehalem's multiplication has twice Athlon's latency, and the venerable Pentium 4 has really poor hardware multiplication. Basecase (a k a schoolbook) multiplication performance is direct function of hardware multiplication performance. And since the various Toom functions recurse into a large number of smaller basecase function invocations, their performance is also very linked. But FFT performs most of its work doing shifts and additions, and while these are also faster on Athlons, the other processors are closer behind. </ul> <table> <tr> <td> <a href="lin.k10.1024.png"><img src="lin.k10.550.png" alt=""></a> <td> <a href="log.k10.1024.png"><img src="log.k10.550.png" alt=""></a> </table> Multiplication diagrams for AMD Phenom, X4, 2.4 GHz, 512 Kibyte L2, 2048 Kibyte L3. To the left linear scale, to the right log/log scale. <br> <br> <br> <table> <tr> <td> <a href="lin.i7.1024.png"><img src="lin.i7.550.png" alt=""></a> <td> <a href="log.i7.1024.png"><img src="log.i7.550.png" alt=""></a> </table> Multiplication diagrams for Intel Nehalem, 2.67 GHz, 256 Kibyte L2, 8192 Kibyte L3. To the left linear scale, to the right log/log scale. An older Core 2 system, with its completely different cache and memory subsystems, give almost identical diagrams, down to the smallest artifact. <br> <br> <br> <table> <tr> <td> <a href="lin.p4.1024.png"><img src="lin.p4.550.png" alt=""></a> <td> <a href="log.p4.1024.png"><img src="log.p4.550.png" alt=""></a> </table> Multiplication diagrams for Intel Pentium 4 (64-bit), 3.2 GHz, 2048 Kibyte L2. To the left linear scale, to the right log/log scale. <br> <br> <br> <p> The ideas behind much of the Toom multiply improvements are due to Marco Bodrato and Alberto Zanoni. Please see <a href="http://bodrato.it/software/toom.html">Marco Bodrato's site</a> for more information about Toom multiplication. The algorithmic foundation was made by Anatolii Alexeevitch Karatsuba in 1962, and Andrei Toom in 1963. The GMP implementations were made by Marco Bodrato, Niels M枚ller, and Torbj枚rn Granlund. </p> <hr> <p> This tool is useful for for optimising the evaluation sequences of Toom functions. It exhaustively searches for possible sequences, given a goal function. It is similar to the <a href="https://ftp.gnu.org/pub/gnu/superopt">GNU superoptimizer</a>, except that this program optimises over a a simple algebraic structure and handles m-ary -> n-ary projections (the GNU superoptimizer optimises over an assembly language subset, and handles m-ary -> 1-ary projections). This is very immature code, but it seems to work. </p> <blockquote> <table style="width:80%;"> <tr> <th style="width:40%; text-align:left;"> Download files </th> <th style="text-align:left;"> Last changed </th> <th style="text-align:left;"> What changed? </th> </tr> <tr> <td> <a href="superopt-va.c">superopt-va.c</a> </td> <td> 2007-01-16/2 </td> <td> speedup+UI improvement </td> </tr> <tr> <td> <a href="goal-va.def">goal-va.def</a> </td> <td> 2007-01-16 </td> <td> bug fixes </td> </tr> </table> </blockquote> <br><br> <div id="footer-spacer"></div> </div> <div id="footer"> <table style="width:100%; background-color:#b0b0b0;"> <tr> <td style="text-align:center;"> <div style="font-size:80%"> Please send comments about this page to gmp-discuss<div style="display:inline;"> at </div>gmplib.org </div> </td> </tr> <tr> <td style="text-align:center;"> <div style="font-size:80%;"> Copyright 2000&#8211;2014 Free Software Foundation </div> </td> </tr> <tr> <td style="text-align:center;"> <div style="font-size:80%;"> Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved. </div> </td> </tr> </table> </div> </body> </html>

Pages: 1 2 3 4 5 6 7 8 9 10