
LKML: Ingo Molnar: Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]

Date:	Sun, 3 Apr 2005 16:29:59 +0200
From:	Ingo Molnar <>
Subject:	Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]

* Paul Jackson <pj@engr.sgi.com> wrote:

> Ok - that flies, or at least walks.
> It took 53 seconds to compute
> this cost matrix.

53 seconds is too much - i'm working on reducing it.

> Here's what it prints, on a small 8 CPU ia64 SN2 Altix, with
> the migration_debug prints formatted separately from the primary
> table, for ease of reading:
>
> Total of 8 processors activated (15548.60 BogoMIPS).
> ---------------------
> migration cost matrix (max_cache_size: 0, cpu: -1 MHz):
> ---------------------
>            [00]     [01]     [02]     [03]     [04]     [05]     [06]     [07]
> [00]:       -       4.0(0)  21.7(1)  21.7(1)  21.7(1)  21.7(1)  21.7(1)  21.7(1)
> [01]:      4.0(0)    -      21.7(1)  21.7(1)  21.7(1)  21.7(1)  21.7(1)  21.7(1)
> [02]:     21.7(1)  21.7(1)    -       4.0(0)  21.7(1)  21.7(1)  21.7(1)  21.7(1)
> [03]:     21.7(1)  21.7(1)   4.0(0)    -      21.7(1)  21.7(1)  21.7(1)  21.7(1)
> [04]:     21.7(1)  21.7(1)  21.7(1)  21.7(1)    -       4.0(0)  21.7(1)  21.7(1)
> [05]:     21.7(1)  21.7(1)  21.7(1)  21.7(1)   4.0(0)    -      21.7(1)  21.7(1)
> [06]:     21.7(1)  21.7(1)  21.7(1)  21.7(1)  21.7(1)  21.7(1)    -       4.0(0)
> [07]:     21.7(1)  21.7(1)  21.7(1)  21.7(1)  21.7(1)  21.7(1)   4.0(0)    -

how close are these numbers to the real worst-case migration costs on
that box? What are the cache sizes and what are their hierarchies?

i've attached another snapshot - there is no speedup yet, but i've
changed the debug printout to be separate from the matrix printout, and
i've fixed the cache_size printout. (the printout of a 68K cache was
incorrect - that was just the last iteration step)

it will be interesting to see what effect the above asymmetry in
migration costs will have on scheduling.
With a 4 msec intra-node cutoff it
should be pretty migration-happy; the inter-node 21 msec is rather high and
should avoid unnecessary migration.

is there any workload that shows the same scheduling-related performance
regressions, other than Ken's $1m+ benchmark kit?

	Ingo

--- linux/kernel/sched.c.orig
+++ linux/kernel/sched.c
@@ -47,6 +47,7 @@
 #include <linux/syscalls.h>
 #include <linux/times.h>
 #include <linux/acct.h>
+#include <linux/vmalloc.h>
 #include <asm/tlb.h>
 
 #include <asm/unistd.h>
@@ -4639,6 +4640,453 @@ void __devinit init_sched_build_groups(s
 	last->next = first;
 }
 
+/*
+ * Self-tuning task migration cost measurement between source and target CPUs.
+ *
+ * This is done by measuring the cost of manipulating buffers of varying
+ * sizes. For a given buffer-size here are the steps that are taken:
+ *
+ *  1) the source CPU reads a big buffer to flush caches
+ *  2) the source CPU reads+dirties a shared buffer
+ *  3) the target CPU reads+dirties the same shared buffer
+ *  4) the target CPU reads a big buffer to flush caches
+ *
+ * We measure how long steps #2 and #3 take (steps #1 and #4 are not
+ * measured), in the following 4 scenarios:
+ *
+ *  - source: CPU1, target: CPU2 | cost1
+ *  - source: CPU2, target: CPU1 | cost2
+ *  - source: CPU1, target: CPU1 | cost3
+ *  - source: CPU2, target: CPU2 | cost4
+ *
+ * We then calculate the cost1+cost2-cost3-cost4 difference - this is
+ * the cost of migration.
+ *
+ * We then start off from a large buffer-size and iterate down to smaller
+ * buffer sizes, in 5% steps - measuring each buffer-size separately, and
+ * do a maximum search for the cost.
+ * The maximum cost for a migration
+ * occurs when the working set is just below the effective cache size.
+ */
+
+/*
+ * Flush the cache by reading a big buffer. (We want all writeback
+ * activity to subside. Works only if the cache size is larger than
+ * 2*size, but that is good enough as the biggest migration effect
+ * is around cachesize size.)
+ */
+__init static void read_cache(void *__cache, unsigned long __size)
+{
+	unsigned long size = __size/sizeof(long);
+	unsigned long *cache = __cache;
+	volatile unsigned long data;
+	int i;
+
+	for (i = 0; i < 2*size; i += 4)
+		data = cache[i];
+}
+
+/*
+ * Dirty a big buffer in a hard-to-predict (for the L2 cache) way. This
+ * is the operation that is timed, so we try to generate unpredictable
+ * cachemisses that still end up filling the L2 cache:
+ */
+__init static void touch_cache(void *__cache, unsigned long __size)
+{
+	unsigned long size = __size/sizeof(long), chunk1 = size/3,
+			chunk2 = 2*size/3;
+	unsigned long *cache = __cache;
+	int i;
+
+	for (i = 0; i < size/6; i += 4) {
+		switch (i % 6) {
+			case 0: cache[i]++;
+			case 1: cache[size-1-i]++;
+			case 2: cache[chunk1-i]++;
+			case 3: cache[chunk1+i]++;
+			case 4: cache[chunk2-i]++;
+			case 5: cache[chunk2+i]++;
+		}
+	}
+}
+
+struct flush_data {
+	unsigned long source, target;
+	void (*fn)(void *, unsigned long);
+	void *cache;
+	unsigned long size;
+	unsigned long long delta;
+};
+
+/*
+ * Dirty L2 on the source CPU:
+ */
+__init static void source_handler(void *__data)
+{
+	struct flush_data *data = __data;
+
+	if (smp_processor_id() != data->source)
+		return;
+
+	/*
+	 * Make sure the cache is quiet on this CPU,
+	 * before starting measurement:
+	 */
+	read_cache(data->cache, data->size);
+
+	data->delta = sched_clock();
+	touch_cache(data->cache, data->size);
+}
+
+/*
+ * Dirty the L2 cache on this CPU and then access the shared
+ * buffer. (which represents the working set of the migrated task.)
+ */
+__init static void target_handler(void *__data)
+{
+	struct flush_data *data = __data;
+
+	if (smp_processor_id() != data->target)
+		return;
+
+	touch_cache(data->cache, data->size);
+	data->delta = sched_clock() - data->delta;
+	/*
+	 * Make sure the cache is quiet, so that it does not interfere
+	 * with the next run on this CPU:
+	 */
+	read_cache(data->cache, data->size);
+}
+
+/*
+ * Measure the cache-cost of one task migration. Returns in units of nsec.
+ */
+__init static unsigned long long measure_one(void *cache, unsigned long size,
+					     int source, int target)
+{
+	struct flush_data data;
+
+	data.source = source;
+	data.target = target;
+	data.size = size;
+	data.cache = cache;
+
+	if (on_each_cpu(source_handler, &data, 1, 1) != 0) {
+		printk("measure_one: timed out waiting for other CPUs\n");
+		return -1;
+	}
+	if (on_each_cpu(target_handler, &data, 1, 1) != 0) {
+		printk("measure_one: timed out waiting for other CPUs\n");
+		return -1;
+	}
+
+	return data.delta;
+}
+
+/*
+ * Maximum cache-size that the scheduler should try to measure.
+ * Architectures with larger caches should tune this up during
+ * bootup. Gets used in the domain-setup code
+ * (i.e. during SMP bootup).
+ */
+__initdata unsigned int max_cache_size;
+
+static int __init setup_max_cache_size(char *str)
+{
+	get_option(&str, &max_cache_size);
+	return 1;
+}
+
+__setup("max_cache_size=", setup_max_cache_size);
+
+/*
+ * The migration cost is a function of 'domain distance'. Domain
+ * distance is the number of steps a CPU has to iterate down its
+ * domain tree to share a domain with the other CPU. The farther
+ * two CPUs are from each other, the larger the distance gets.
+ *
+ * Note that we use the distance only to cache measurement results,
+ * the distance value is not used numerically otherwise. When two
+ * CPUs have the same distance it is assumed that the migration
+ * cost is the same. (this is a simplification but quite practical)
+ */
+#define MAX_DOMAIN_DISTANCE 32
+
+static __initdata unsigned long long migration_cost[MAX_DOMAIN_DISTANCE];
+
+/*
+ * Allow override of migration cost - in units of microseconds.
+ * E.g. migration_cost=1000,2000,3000 will set up a level-1 cost
+ * of 1 msec, a level-2 cost of 2 msecs and a level-3 cost of 3 msecs:
+ */
+static int __init migration_cost_setup(char *str)
+{
+	int ints[MAX_DOMAIN_DISTANCE+1], i;
+
+	str = get_options(str, ARRAY_SIZE(ints), ints);
+
+	printk("#ints: %d\n", ints[0]);
+	for (i = 1; i <= ints[0]; i++) {
+		migration_cost[i-1] = (unsigned long long)ints[i]*1000;
+		printk("migration_cost[%d]: %Ld\n", i-1, migration_cost[i-1]);
+	}
+	return 1;
+}
+
+__setup("migration_cost=", migration_cost_setup);
+
+/*
+ * Global multiplier (divisor) for migration-cutoff values,
+ * in percentiles. E.g.
+ * use a value of 150 to get 1.5 times
+ * longer cache-hot cutoff times.
+ *
+ * (We scale it from 100 to 128 to make long long handling easier.)
+ */
+#define MIGRATION_FACTOR_SCALE 128
+
+static __initdata unsigned int migration_factor = MIGRATION_FACTOR_SCALE;
+
+static int __init setup_migration_factor(char *str)
+{
+	get_option(&str, &migration_factor);
+	migration_factor = migration_factor * MIGRATION_FACTOR_SCALE / 100;
+	return 1;
+}
+
+__setup("migration_factor=", setup_migration_factor);
+
+static __initdata unsigned int migration_debug;
+
+static int __init setup_migration_debug(char *str)
+{
+	get_option(&str, &migration_debug);
+	return 1;
+}
+
+__setup("migration_debug=", setup_migration_debug);
+
+/*
+ * Estimated distance of two CPUs, measured via the number of domains
+ * we have to pass for the two CPUs to be in the same span:
+ */
+__init static unsigned long cpu_distance(int cpu1, int cpu2)
+{
+	unsigned long distance = 0;
+	struct sched_domain *sd;
+
+	for_each_domain(cpu1, sd) {
+		WARN_ON(!cpu_isset(cpu1, sd->span));
+		if (cpu_isset(cpu2, sd->span))
+			return distance;
+		distance++;
+	}
+	if (distance >= MAX_DOMAIN_DISTANCE) {
+		WARN_ON(1);
+		distance = MAX_DOMAIN_DISTANCE-1;
+	}
+
+	return distance;
+}
+
+/*
+ * Measure a series of task migrations and return the average
+ * result.
+ * Since this code runs early during bootup the system
+ * is 'undisturbed' and the average latency makes sense.
+ *
+ * The algorithm in essence auto-detects the relevant cache-size,
+ * so it will properly detect different cachesizes for different
+ * cache-hierarchies, depending on how the CPUs are connected.
+ *
+ * Architectures can prime the upper limit of the search range via
+ * max_cache_size, otherwise the search range defaults to 10MB...64K.
+ */
+#define SEARCH_SCOPE		2
+#define MIN_CACHE_SIZE		(64*1024U)
+#define DEFAULT_CACHE_SIZE	(5*1024*1024U)
+#define ITERATIONS		2
+
+__init static unsigned long long measure_cacheflush_time(int cpu1, int cpu2)
+{
+	unsigned long long cost = 0, cost1 = 0, cost2 = 0;
+	unsigned int size, cache_size = 0;
+	void *cache;
+	int i;
+
+	/*
+	 * Search from max_cache_size*SEARCH_SCOPE down to 64K - the
+	 * real relevant cachesize has to lie somewhere inbetween.
+	 */
+	if (max_cache_size)
+		size = max(max_cache_size * SEARCH_SCOPE, MIN_CACHE_SIZE);
+	else
+		/*
+		 * Since we have no estimate of the relevant
+		 * search range, use a default:
+		 */
+		size = DEFAULT_CACHE_SIZE * SEARCH_SCOPE;
+
+	if (!cpu_online(cpu1) || !cpu_online(cpu2)) {
+		printk("cpu %d and %d not both online!\n", cpu1, cpu2);
+		return 0;
+	}
+	/*
+	 * We allocate 2*size so that read_cache() can access a
+	 * larger buffer:
+	 */
+	cache = vmalloc(2*size);
+	if (!cache) {
+		printk("could not vmalloc %d bytes for cache!\n", 2*size);
+		return 1000000; /* return 1 msec on very small boxen */
+	}
+	memset(cache, 0, 2*size);
+
+	while (size >= MIN_CACHE_SIZE) {
+		/*
+		 * Measure the migration cost of 'size' bytes, over
+		 * 2*ITERATIONS runs:
+		 *
+		 * (We perturb the cache size by a small (0..4k)
+		 * value to compensate
+		 * for size/alignment-related artifacts.
+		 * We also subtract the cost of the operation done on
+		 * the same CPU.)
+		 */
+		cost1 = 0;
+		for (i = 0; i < ITERATIONS; i++) {
+			cost1 += measure_one(cache, size - i*1024, cpu1, cpu2);
+			cost1 += measure_one(cache, size - i*1024, cpu2, cpu1);
+		}
+
+		cost2 = 0;
+		for (i = 0; i < ITERATIONS; i++) {
+			cost2 += measure_one(cache, size - i*1024, cpu1, cpu1);
+			cost2 += measure_one(cache, size - i*1024, cpu2, cpu2);
+		}
+		if (cost1 > cost2) {
+			if (cost < cost1 - cost2) {
+				cost = cost1 - cost2;
+				cache_size = size;
+			}
+		}
+		if (migration_debug)
+			printk("-> [%d][%d][%7d] %3ld.%ld (%ld): (%8Ld %8Ld %8Ld)\n",
+				cpu1, cpu2, size,
+				(long)cost / 1000000,
+				((long)cost / 100000) % 10,
+				cpu_distance(cpu1, cpu2),
+				cost1, cost2, cost1-cost2);
+		/*
+		 * Iterate down the cachesize (in a non-power-of-2
+		 * way to avoid artifacts) in 5% decrements:
+		 */
+		size = size * 19 / 20;
+	}
+	/*
+	 * Get the per-iteration migration cost:
+	 */
+	do_div(cost, 2*ITERATIONS);
+
+	if (migration_debug)
+		printk("[%d][%d] cache size found: %d, cost: %Ld\n",
+			cpu1, cpu2, cache_size, cost);
+
+	vfree(cache);
+
+	/*
+	 * A task is considered 'cache cold' if at least 2 times
+	 * the worst-case cost of migration has passed.
+	 *
+	 * (this limit is only listened to if the load-balancing
+	 * situation is 'nice' - if there is a large imbalance we
+	 * ignore it for the sake of CPU utilization and
+	 * processing fairness.)
+	 */
+	return 2 * cost * migration_factor / MIGRATION_FACTOR_SCALE;
+}
+
+void __devinit calibrate_migration_costs(void)
+{
+	int cpu1 = -1, cpu2 = -1, cpu;
+	struct sched_domain *sd;
+	unsigned long distance, max_distance = 0;
+	unsigned long long cost;
+
+	for_each_online_cpu(cpu1)
+		printk(" [%02d]", cpu1);
+	printk("\n");
+	/*
+	 * First pass - calculate the cacheflush times:
+	 */
+	for_each_online_cpu(cpu1) {
+		for_each_online_cpu(cpu2) {
+			if (cpu1 == cpu2)
+				continue;
+			distance = cpu_distance(cpu1, cpu2);
+			max_distance = max(max_distance, distance);
+			/*
+			 * Do we have the result cached already?
+			 */
+			if (migration_cost[distance])
+				cost = migration_cost[distance];
+			else {
+				cost = measure_cacheflush_time(cpu1, cpu2);
+				migration_cost[distance] = cost;
+			}
+		}
+	}
+	/*
+	 * Second pass - update the sched domain hierarchy with
+	 * the new cache-hot-time estimations:
+	 */
+	for_each_online_cpu(cpu) {
+		distance = 0;
+		for_each_domain(cpu, sd) {
+			sd->cache_hot_time = migration_cost[distance];
+			distance++;
+		}
+	}
+	/*
+	 * Print the matrix:
+	 */
+	printk("---------------------\n");
+	printk("migration cost matrix (max_cache_size: %d, cpu: %ld MHz):\n",
+		max_cache_size,
+#ifdef CONFIG_X86
+		cpu_khz/1000
+#else
+		-1
+#endif
+	);
+	printk("---------------------\n");
+	printk(" ");
+	for_each_online_cpu(cpu1) {
+		printk("[%02d]: ", cpu1);
+		for_each_online_cpu(cpu2) {
+			if (cpu1 == cpu2) {
+				printk(" - ");
+				continue;
+			}
+			distance = cpu_distance(cpu1, cpu2);
+			max_distance = max(max_distance, distance);
+			cost = migration_cost[distance];
+			printk(" %2ld.%ld(%ld)", (long)cost / 1000000,
+				((long)cost / 100000) % 10, distance);
+		}
+		printk("\n");
+	}
+	printk("---------------------\n");
+	printk("cacheflush times [%ld]:", max_distance+1);
+	for (distance = 0; distance <= max_distance; distance++) {
+		cost = migration_cost[distance];
+		printk(" %ld.%ld (%Ld)", (long)cost / 1000000,
+			((long)cost / 100000) % 10, cost);
+	}
+	printk("\n");
+	printk("---------------------\n");
+}
+
 
 #ifdef ARCH_HAS_SCHED_DOMAIN
 extern void __devinit arch_init_sched_domains(void);
@@ -4820,6 +5268,10 @@ static void __devinit arch_init_sched_do
 #endif
 		cpu_attach_domain(sd, i);
 	}
+	/*
+	 * Tune cache-hot values:
+	 */
+	calibrate_migration_costs();
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
--- linux/arch/ia64/kernel/domain.c.orig
+++ linux/arch/ia64/kernel/domain.c
@@ -358,6 +358,10 @@ next_sg:
 #endif
 		cpu_attach_domain(sd, i);
 	}
+	/*
+	 * Tune cache-hot values:
+	 */
+	calibrate_migration_costs();
 }
 
 void __devinit arch_destroy_sched_domains(void)
--- linux/arch/i386/kernel/smpboot.c.orig
+++ linux/arch/i386/kernel/smpboot.c
@@ -873,6 +873,7 @@ static void smp_tune_scheduling (void)
 		cachesize = 16;	/* Pentiums, 2x8kB cache */
 		bandwidth = 100;
 	}
+	max_cache_size = cachesize * 1024;
 }
 
--- linux/include/asm-ia64/topology.h.orig
+++ linux/include/asm-ia64/topology.h
@@ -51,7 +51,6 @@ void build_cpu_to_node_map(void);
 	.max_interval		= 320,			\
 	.busy_factor		= 320,			\
 	.imbalance_pct		= 125,			\
-	.cache_hot_time		= (10*1000000),		\
 	.cache_nice_tries	= 1,			\
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
@@ -73,7 +72,6 @@ void build_cpu_to_node_map(void);
 	.max_interval		= 320,			\
 	.busy_factor		= 320,			\
 	.imbalance_pct		= 125,			\
-	.cache_hot_time		= (10*1000000),		\
 	.cache_nice_tries	= 1,			\
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
--- linux/include/linux/topology.h.orig
+++ linux/include/linux/topology.h
@@ -86,7 +86,6 @@
 	.max_interval		= 2,			\
 	.busy_factor		= 8,			\
 	.imbalance_pct		= 110,			\
-	.cache_hot_time		= 0,			\
 	.cache_nice_tries	= 0,			\
 	.per_cpu_gain		= 25,			\
 	.flags			= SD_LOAD_BALANCE	\
@@ -112,7 +111,6 @@
 	.max_interval		= 4,			\
 	.busy_factor		= 64,			\
 	.imbalance_pct		= 125,			\
-	.cache_hot_time		= (5*1000000/2),	\
 	.cache_nice_tries	= 1,			\
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
--- linux/include/linux/sched.h.orig
+++ linux/include/linux/sched.h
@@ -527,7 +527,17 @@ extern cpumask_t cpu_isolated_map;
 extern void init_sched_build_groups(struct sched_group groups[],
 					cpumask_t span, int (*group_fn)(int cpu));
 extern void cpu_attach_domain(struct sched_domain *sd, int cpu);
+
 #endif /* ARCH_HAS_SCHED_DOMAIN */
+
+/*
+ * Maximum cache size the migration-costs auto-tuning code will
+ * search from:
+ */
+extern unsigned int max_cache_size;
+
+extern void calibrate_migration_costs(void);
+
 #endif /* CONFIG_SMP */
 
 
--- linux/include/asm-i386/topology.h.orig
+++ linux/include/asm-i386/topology.h
@@ -75,7 +75,6 @@ static inline cpumask_t pcibus_to_cpumas
 	.max_interval		= 32,			\
 	.busy_factor		= 32,			\
 	.imbalance_pct		= 125,			\
-	.cache_hot_time		= (10*1000000),		\
 	.cache_nice_tries	= 1,			\
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
--- linux/include/asm-ppc64/topology.h.orig
+++ linux/include/asm-ppc64/topology.h
@@ -46,7 +46,6 @@ static inline int node_to_first_cpu(int 
 	.max_interval		= 32,			\
 	.busy_factor		= 32,			\
 	.imbalance_pct		= 125,			\
-	.cache_hot_time		= (10*1000000),		\
 	.cache_nice_tries	= 1,			\
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
--- linux/include/asm-x86_64/topology.h.orig
+++ linux/include/asm-x86_64/topology.h
@@ -48,7 +48,6 @@ static inline cpumask_t __pcibus_to_cpum
 	.max_interval		= 32,			\
 	.busy_factor		= 32,			\
 	.imbalance_pct		= 125,			\
-	.cache_hot_time		= (10*1000000),		\
 	.cache_nice_tries	= 1,			\
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
--- linux/include/asm-mips/mach-ip27/topology.h.orig
+++ linux/include/asm-mips/mach-ip27/topology.h
@@ -24,7 +24,6 @@ extern unsigned char __node_distances[MA
 	.max_interval		= 32,			\
 	.busy_factor		= 32,			\
 	.imbalance_pct		= 125,			\
-	.cache_hot_time		= (10*1000),		\
 	.cache_nice_tries	= 1,			\
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
