CINXE.COM
LKML: Paul Jackson: Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>LKML: Paul Jackson: Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]</title><link href="/css/message.css" rel="stylesheet" type="text/css" /><link href="/css/wrap.css" rel="alternate stylesheet" type="text/css" title="wrap" /><link href="/css/nowrap.css" rel="stylesheet" type="text/css" title="nowrap" /><link href="/favicon.ico" rel="shortcut icon" /><script src="/js/simple-calendar.js" type="text/javascript"></script><script src="/js/styleswitcher.js" type="text/javascript"></script><link rel="alternate" type="application/rss+xml" title="lkml.org : last 100 messages" href="/rss.php" /><link rel="alternate" type="application/rss+xml" title="lkml.org : last messages by Paul Jackson" href="/groupie.php?aid=5266" /><!--Matomo--><script> var _paq = window._paq = window._paq || []; /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ _paq.push(["setDoNotTrack", true]); _paq.push(["disableCookies"]); _paq.push(['trackPageView']); _paq.push(['enableLinkTracking']); (function() { var u="//m.lkml.org/"; _paq.push(['setTrackerUrl', u+'matomo.php']); _paq.push(['setSiteId', '1']); var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); })(); </script><!--End Matomo Code--></head><body onload="es.jasper.simpleCalendar.init();" itemscope="itemscope" itemtype="http://schema.org/BlogPosting"><table border="0" cellpadding="0" cellspacing="0"><tr><td width="180" align="center"><a href="/"><img style="border:0;width:135px;height:32px" src="/images/toprowlk.gif" alt="lkml.org" /></a></td><td width="32">聽</td><td class="nb"><div><a class="nb" href="/lkml"> [lkml]</a> 聽 <a class="nb" href="/lkml/2005"> [2005]</a> 聽 <a class="nb" href="/lkml/2005/4"> [Apr]</a> 聽 <a class="nb" href="/lkml/2005/4/3"> [3]</a> 聽 <a class="nb" href="/lkml/last100"> [last100]</a> 聽 <a href="/rss.php"><img src="/images/rss-or.gif" border="0" alt="RSS Feed" /></a></div><div>Views: <a href="#" class="nowrap" onclick="setActiveStyleSheet('wrap');return false;">[wrap]</a><a href="#" class="wrap" onclick="setActiveStyleSheet('nowrap');return false;">[no wrap]</a> 聽 <a class="nb" href="/lkml/mheaders/2005/4/3/81" onclick="this.href='/lkml/headers'+'/2005/4/3/81';">[headers]</a>聽 <a href="/lkml/bounce/2005/4/3/81">[forward]</a>聽 </div></td><td width="32">聽</td></tr><tr><td valign="top"><div class="es-jasper-simpleCalendar" baseurl="/lkml/"></div><div class="threadlist">Messages in this thread</div><ul class="threadlist"><li class="root"><a href="/lkml/2005/4/1/310">First message in thread</a></li><li><a href="/lkml/2005/4/3/19">Ingo Molnar</a><ul><li><a href="/lkml/2005/4/3/28">Paul Jackson</a></li><li><a href="/lkml/2005/4/3/55">Paul Jackson</a><ul><li class="origin"><a href="/lkml/2005/4/3/86">Paul Jackson</a><ul><li><a href="/lkml/2005/4/3/86">Ingo Molnar</a><ul><li><a href="/lkml/2005/4/3/123">Paul Jackson</a></li></ul></li><li><a href="/lkml/2005/4/3/89">Ingo Molnar</a><ul><li><a href="/lkml/2005/4/3/133">Paul Jackson</a></li></ul></li></ul></li><li><a href="/lkml/2005/4/3/83">Ingo Molnar</a><ul><li><a href="/lkml/2005/4/3/136">Paul Jackson</a></li><li><a href="/lkml/2005/4/3/148">"Chen, Kenneth W"</a><ul><li><a href="/lkml/2005/4/4/26">Ingo Molnar</a></li></ul></li></ul></li><li><a href="/lkml/2005/4/4/4">Andy Lutomirski</a><ul><li><a href="/lkml/2005/4/4/7">Paul Jackson</a></li></ul></li></ul></li><li><a href="/lkml/2005/4/3/145">"Chen, Kenneth W"</a><ul><li><a href="/lkml/2005/4/4/90">Ingo Molnar</a><ul><li><a href="/lkml/2005/4/4/141">Paul Jackson</a></li><li><a href="/lkml/2005/4/4/316">"Chen, Kenneth W"</a><ul><li><a href="/lkml/2005/4/4/317">Ingo Molnar</a></li></ul></li></ul></li></ul></li></ul></li></ul><div class="threadlist">Patch in this message</div><ul class="threadlist"><li><a href="/lkml/diff/2005/4/3/81/1">Get diff 1</a></li></ul></td><td width="32" rowspan="2" class="c" valign="top"><img src="/images/icornerl.gif" width="32" height="32" alt="/" /></td><td class="c" rowspan="2" valign="top" style="padding-top: 1em"><table><tr><td><table><tr><td class="lp">Date</td><td class="rp" itemprop="datePublished">Sun, 3 Apr 2005 07:12:27 -0700</td></tr><tr><td class="lp">From</td><td class="rp" itemprop="author">Paul Jackson <></td></tr><tr><td class="lp">Subject</td><td class="rp" itemprop="name">Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]</td></tr></table></td><td></td></tr></table><pre itemprop="articleBody">Three more observations.<br /><br /> 1) The slowest measure_one() calls are, not surprisingly, those for the<br /> largest sizes. At least on my test system of the moment, the plot<br /> of cost versus size has one major maximum (a one hump camel, not two).<br /><br /> Seems like if we computed from smallest size upward, instead of largest<br /> downward, and stopped whenever two consecutive measurements were less<br /> than say 70% of the max seen so far, then we could save a nice chunk<br /> of the time.<br /><br /> Of course, if two hump systems exist, this is not reliable on them.<br /><br /> 2) Trivial warning fix for printf format mismatch:<br /><br />=================================== begin ===================================<br />--- 2.6.12-rc1-mm4.orig/kernel/sched.c 2005-04-03 06:32:34.000000000 -0700<br />+++ 2.6.12-rc1-mm4/kernel/sched.c 2005-04-03 06:34:07.000000000 -0700<br />@@ -5211,7 +5211,7 @@ void __devinit calibrate_migration_costs<br /> #ifdef CONFIG_X86<br /> cpu_khz/1000<br /> #else<br />- -1<br />+ -1L<br /> #endif<br /> );<br /> printk("---------------------\n");<br />==================================== end ====================================<br /><br /> 3) I was noticing that my test system was only showing a couple of distinct<br /> values for cpu_distance, even though it has 4 distinct distances for<br /> values of node_distance. So I coded up a variant of cpu_distance that<br /> converts the problem to a node_distance problem, and got the following<br /> cost matrix:<br /><br />=================================== begin ===================================<br />Total of 8 processors activated (15515.64 BogoMIPS).<br />---------------------<br />migration cost matrix (max_cache_size: 0, cpu: -1 MHz):<br />---------------------<br /> [00] [01] [02] [03] [04] [05] [06] [07]<br />[00]: - 4.0(0) 21.7(1) 21.7(1) 25.2(2) 25.2(2) 25.3(3) 25.3(3)<br />[01]: 4.0(0) - 21.7(1) 21.7(1) 25.2(2) 25.2(2) 25.3(3) 25.3(3)<br />[02]: 21.7(1) 21.7(1) - 4.0(0) 25.3(3) 25.3(3) 25.2(2) 25.2(2)<br />[03]: 21.7(1) 21.7(1) 4.0(0) - 25.3(3) 25.3(3) 25.2(2) 25.2(2)<br />[04]: 25.2(2) 25.2(2) 25.3(3) 25.3(3) - 4.0(0) 21.7(1) 21.7(1)<br />[05]: 25.2(2) 25.2(2) 25.3(3) 25.3(3) 4.0(0) - 21.7(1) 21.7(1)<br />[06]: 25.3(3) 25.3(3) 25.2(2) 25.2(2) 21.7(1) 21.7(1) - 4.0(0)<br />[07]: 25.3(3) 25.3(3) 25.2(2) 25.2(2) 21.7(1) 21.7(1) 4.0(0) -<br />---------------------<br />cacheflush times [4]: 4.0 (4080540) 21.7 (21781380) 25.2 (25259428) 25.3 (25372682)<br />---------------------<br />==================================== end ====================================<br /><br /> The code (below) is twice as complicated, the runtime twice as long,<br /> and it's less intuitive - sched_domains seems more appropriate as<br /> the basis for migration costs than the node distances in SLIT tables.<br /> Finally, I don't know if distinguishing between costs of 21.7 and<br /> 25.3 is worth much.<br /><br /> So the case for switching to this node_distance base is less than<br /> persuasive, to put it politely.<br /><br /> Perhaps it's only real value is in highlighting that perhaps the<br /> code to setup the sched_domain topology on our ia64 SN2 Altix systems<br /> is too coarse, given that it only found two distance values, not four.<br /><br /> If that's the case, I will have to call in someone else to examine<br /> whether it's appropriate to refine the sched_domains setup for this<br /> kind of system. I'm not competent to determine that, nor to code it.<br /><br /> Here's the code that bases cpu_distance on node_distance:<br /><br />=================================== begin ===================================<br />__init static int cmpint(const void *a, const void *b)<br />{<br /> return *(int *)a - *(int *)b;<br />}<br />/*<br /> * Estimate distance of two CPUs based on their node_distance,<br /> * mapping to sequential integers 0, 1, ... N-1, for the N<br /> * distinct values of distances (closest CPUs are distance 0,<br /> * farthest CPUs are distance N-1). If there are more than<br /> * MAX_DOMAIN_DISTANCE distinct different distance values,<br /> * collapse the larger distances to one value.<br /> */<br />__init static unsigned long cpu_distance(int cpu1, int cpu2)<br />{<br /> static int num_dist_vals;<br /> static int node_distances[MAX_DOMAIN_DISTANCE];<br /> int dist = node_distance(cpu_to_node(cpu1), cpu_to_node(cpu2));<br /> int v;<br /> if (num_dist_vals == 0) {<br /> int i, j, k;<br /> /*<br /> * For each dist not already in node_distances[], if there's<br /> * room or it's less than an existing 'luser' entry, add it.<br /> */<br /> for_each_online_node(i) {<br /> for_each_online_node(j) {<br /> int dist = node_distance(i, j);<br /> int luser = -1;<br /> for (k = 0; k < num_dist_vals; k++) {<br /> if (node_distances[k] == dist)<br /> break;<br /> if (dist < node_distances[k])<br /> luser = k;<br /> }<br /> if (node_distances[k] == dist)<br /> continue;<br /> else if (num_dist_vals < MAX_DOMAIN_DISTANCE)<br /> node_distances[num_dist_vals++] = dist;<br /> else if (luser >= 0)<br /> node_distances[luser] = dist;<br /> }<br /> }<br /> /*<br /> * Now node_distances[] has the smallest num_dist_vals<br /> * distinct node distance values. Lets sort them.<br /> */<br /> sort(node_distances, num_dist_vals, sizeof(int), cmpint, NULL);<br /> if (migration_debug) {<br /> printk("Sorted node_distances used for cpu distances\n");<br /> for(v = 0; v < num_dist_vals; v++)<br /> printk(" node_distances[%d] = %d\n",<br /> v, node_distances[v]);<br /> }<br /> }<br /> /* The "- 1" maps all larger sizes onto one */<br /> for (v = 0; v < num_dist_vals - 1; v++)<br /> if (dist == node_distances[v])<br /> break;<br /> return v;<br />}<br />==================================== end ====================================<br /><br />-- <br /> I won't rest till it's the best ...<br /> Programmer, Linux Scalability<br /> Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401<br />-<br />To unsubscribe from this list: send the line "unsubscribe linux-kernel" in<br />the body of a message to majordomo@vger.kernel.org<br />More majordomo info at <a href="http://vger.kernel.org/majordomo-info.html">http://vger.kernel.org/majordomo-info.html</a><br />Please read the FAQ at <a href="http://www.tux.org/lkml/">http://www.tux.org/lkml/</a><br /><br /></pre></td><td width="32" rowspan="2" class="c" valign="top"><img src="/images/icornerr.gif" width="32" height="32" alt="\" /></td></tr><tr><td align="right" valign="bottom"> 聽 </td></tr><tr><td align="right" valign="bottom">聽</td><td class="c" valign="bottom" style="padding-bottom: 0px"><img src="/images/bcornerl.gif" width="32" height="32" alt="\" /></td><td class="c">聽</td><td class="c" valign="bottom" style="padding-bottom: 0px"><img src="/images/bcornerr.gif" width="32" height="32" alt="/" /></td></tr><tr><td align="right" valign="top" colspan="2"> 聽 </td><td class="lm">Last update: 2005-04-06 13:31 聽聽 [from the cache]<br />漏2003-2020 <a href="http://blog.jasper.es/"><span itemprop="editor">Jasper Spaans</span></a>|hosted at <a href="https://www.digitalocean.com/?refcode=9a8e99d24cf9">Digital Ocean</a> and my Meterkast|<a href="http://blog.jasper.es/categories.html#lkml-ref">Read the blog</a></td><td>聽</td></tr></table><script language="javascript" src="/js/styleswitcher.js" type="text/javascript"></script></body></html>