
LKML: Christoph Lameter: [PATCH] Pageset Localization V2

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>LKML: Christoph Lameter: [PATCH] Pageset Localization V2</title><link href="/css/message.css" rel="stylesheet" type="text/css" /><link href="/css/wrap.css" rel="alternate stylesheet" type="text/css" title="wrap" /><link href="/css/nowrap.css" rel="stylesheet" type="text/css" title="nowrap" /><link href="/favicon.ico" rel="shortcut icon" /><script src="/js/simple-calendar.js" type="text/javascript"></script><script src="/js/styleswitcher.js" type="text/javascript"></script><link rel="alternate" type="application/rss+xml" title="lkml.org : last 100 messages" href="/rss.php" /><link rel="alternate" type="application/rss+xml" title="lkml.org : last messages by Christoph Lameter" href="/groupie.php?aid=843" /><!--Matomo--><script> var _paq = window._paq = window._paq || []; /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ _paq.push(["setDoNotTrack", true]); _paq.push(["disableCookies"]); _paq.push(['trackPageView']); _paq.push(['enableLinkTracking']); (function() { var u="//m.lkml.org/"; _paq.push(['setTrackerUrl', u+'matomo.php']); _paq.push(['setSiteId', '1']); var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); })(); </script><!--End Matomo Code--></head><body onload="es.jasper.simpleCalendar.init();" itemscope="itemscope" itemtype="http://schema.org/BlogPosting"><table border="0" cellpadding="0" cellspacing="0"><tr><td width="180" align="center"><a href="/"><img style="border:0;width:135px;height:32px" src="/images/toprowlk.gif" alt="lkml.org" /></a></td><td width="32">&#160;</td><td class="nb"><div><a class="nb" href="/lkml"> [lkml]</a> &#160; 
<a class="nb" href="/lkml/2005"> [2005]</a> &#160; <a class="nb" href="/lkml/2005/3"> [Mar]</a> &#160; <a class="nb" href="/lkml/2005/3/30"> [30]</a> &#160; <a class="nb" href="/lkml/last100"> [last100]</a> &#160; <a href="/rss.php"><img src="/images/rss-or.gif" border="0" alt="RSS Feed" /></a></div><div>Views: <a href="#" class="nowrap" onclick="setActiveStyleSheet('wrap');return false;">[wrap]</a><a href="#" class="wrap" onclick="setActiveStyleSheet('nowrap');return false;">[no wrap]</a> &#160; <a class="nb" href="/lkml/mheaders/2005/3/30/10" onclick="this.href='/lkml/headers'+'/2005/3/30/10';">[headers]</a>&#160; <a href="/lkml/bounce/2005/3/30/10">[forward]</a>&#160; </div></td><td width="32">&#160;</td></tr><tr><td valign="top"><div class="es-jasper-simpleCalendar" baseurl="/lkml/"></div><div class="threadlist">Messages in this thread</div><ul class="threadlist"><li class="root"><a href="/lkml/2005/3/30/10">First message in thread</a></li><li class="origin"><a href="/lkml/2005/3/30/71">Christoph Lameter</a><ul><li><a href="/lkml/2005/3/30/71">Christoph Hellwig</a><ul><li><a href="/lkml/2005/3/30/88">shobhit dayal</a><ul><li><a href="/lkml/2005/3/31/148">Christoph Hellwig</a><ul><li><a href="/lkml/2005/3/31/155">Matthew Wilcox</a></li><li><a href="/lkml/2005/3/31/172">Christoph Lameter</a></li></ul></li></ul></li></ul></li><li><a href="/lkml/2005/3/30/94">Matthew Wilcox</a><ul><li><a href="/lkml/2005/3/30/326">Christoph Lameter</a></li></ul></li></ul></li></ul><div class="threadlist">Patch in this message</div><ul class="threadlist"><li><a href="/lkml/diff/2005/3/30/10/1">Get diff 1</a></li></ul></td><td width="32" rowspan="2" class="c" valign="top"><img src="/images/icornerl.gif" width="32" height="32" alt="/" /></td><td class="c" rowspan="2" valign="top" style="padding-top: 1em"><table><tr><td><table><tr><td class="lp">Date</td><td class="rp" itemprop="datePublished">Tue, 29 Mar 2005 21:51:08 -0800 (PST)</td></tr><tr><td class="lp">From</td><td class="rp" itemprop="author">Christoph Lameter 
&lt;&gt;</td></tr><tr><td class="lp">Subject</td><td class="rp" itemprop="name">[PATCH] Pageset Localization V2</td></tr></table></td><td></td></tr></table><pre itemprop="articleBody">This patch modifies the way pagesets in struct zone are allocated. It relocates<br />the pagesets contained in a zone for each cpu to the node that is nearest to<br />the cpu instead of keeping the pagesets in the (possibly remote) target zone.<br />This means that the operations to manage caches of pages on remote zones can<br />be done with information available in the local zone.<br /><br />The patch depends on the API changes to the slab allocator posted before<br />this patch.<br /><br />AIM7 benchmark on a 32 CPU SMP system:<br /><br />w/o patches:<br />Tasks jobs/min jti jobs/min/task real cpu<br /> 1 484.68 100 484.6769 12.01 1.97 Fri Mar 25 11:01:42 2005<br /> 100 27140.46 89 271.4046 21.44 148.71 Fri Mar 25 11:02:04 2005<br /> 200 30792.02 82 153.9601 37.80 296.72 Fri Mar 25 11:02:42 2005<br /> 300 32209.27 81 107.3642 54.21 451.34 Fri Mar 25 11:03:37 2005<br /> 400 34962.83 78 87.4071 66.59 588.97 Fri Mar 25 11:04:44 2005<br /> 500 31676.92 75 63.3538 91.87 742.71 Fri Mar 25 11:06:16 2005<br /> 600 36032.69 73 60.0545 96.91 885.44 Fri Mar 25 11:07:54 2005<br /> 700 35540.43 77 50.7720 114.63 1024.28 Fri Mar 25 11:09:49 2005<br /> 800 33906.70 74 42.3834 137.32 1181.65 Fri Mar 25 11:12:06 2005<br /> 900 34120.67 73 37.9119 153.51 1325.26 Fri Mar 25 11:14:41 2005<br /> 1000 34802.37 74 34.8024 167.23 1465.26 Fri Mar 25 11:17:28 2005<br /><br />with Slab API changes and pageset patch:<br /><br />Tasks jobs/min jti jobs/min/task real cpu<br /> 1 485.00 100 485.0000 12.00 1.96 Fri Mar 25 11:46:18 2005<br /> 100 28000.96 89 280.0096 20.79 150.45 Fri Mar 25 11:46:39 2005<br /> 200 32285.80 79 161.4290 36.05 293.37 Fri Mar 25 11:47:16 2005<br /> 300 40424.15 84 134.7472 43.19 438.42 Fri Mar 25 11:47:59 2005<br /> 400 39155.01 79 97.8875 59.46 590.05 Fri Mar 25 11:48:59 2005<br /> 500 
37881.25 82 75.7625 76.82 730.19 Fri Mar 25 11:50:16 2005<br /> 600 39083.14 78 65.1386 89.35 872.79 Fri Mar 25 11:51:46 2005<br /> 700 38627.83 77 55.1826 105.47 1022.46 Fri Mar 25 11:53:32 2005<br /> 800 39631.94 78 49.5399 117.48 1169.94 Fri Mar 25 11:55:30 2005<br /> 900 36903.70 79 41.0041 141.94 1310.78 Fri Mar 25 11:57:53 2005<br /> 1000 36201.23 77 36.2012 160.77 1458.31 Fri Mar 25 12:00:34 2005<br /><br />The major improvement is in the mid range when running 100-600 tasks. For 1 task<br />there is barely any improvement since most data will be locally allocated. In the high<br />range other factors seem to become important.<br /><br />Patch against 2.6.11.6-bk3<br /><br />Signed-off-by: Christoph Lameter &lt;christoph&#64;lameter.com&gt;<br />Signed-off-by: Shobhit Dayal &lt;shobhit&#64;calsoftinc.com&gt;<br />Signed-off-by: Shai Fultheim &lt;Shai&#64;Scalex86.org&gt;<br /><br />Index: linux-2.6.11/drivers/base/node.c<br />===================================================================<br />--- linux-2.6.11.orig/drivers/base/node.c 2005-03-21 13:18:06.000000000 -0800<br />+++ linux-2.6.11/drivers/base/node.c 2005-03-21 13:22:06.000000000 -0800<br />&#64;&#64; -87,7 +87,7 &#64;&#64; static ssize_t node_read_numastat(struct<br /> for (i = 0; i &lt; MAX_NR_ZONES; i++) {<br /> struct zone *z = &amp;pg-&gt;node_zones[i];<br /> for (cpu = 0; cpu &lt; NR_CPUS; cpu++) {<br />- struct per_cpu_pageset *ps = &amp;z-&gt;pageset[cpu];<br />+ struct per_cpu_pageset *ps = z-&gt;pageset[cpu];<br /> numa_hit += ps-&gt;numa_hit;<br /> numa_miss += ps-&gt;numa_miss;<br /> numa_foreign += ps-&gt;numa_foreign;<br />Index: linux-2.6.11/include/linux/mm.h<br />===================================================================<br />--- linux-2.6.11.orig/include/linux/mm.h 2005-03-21 13:18:06.000000000 -0800<br />+++ linux-2.6.11/include/linux/mm.h 2005-03-21 13:22:06.000000000 -0800<br />&#64;&#64; -691,6 +691,7 &#64;&#64; extern void mem_init(void);<br /> extern void 
show_mem(void);<br /> extern void si_meminfo(struct sysinfo * val);<br /> extern void si_meminfo_node(struct sysinfo *val, int nid);<br />+extern void setup_per_cpu_pageset(void);<br /><br /> /* prio_tree.c */<br /> void vma_prio_tree_add(struct vm_area_struct *, struct vm_area_struct *old);<br />Index: linux-2.6.11/include/linux/mmzone.h<br />===================================================================<br />--- linux-2.6.11.orig/include/linux/mmzone.h 2005-03-21 13:21:59.000000000 -0800<br />+++ linux-2.6.11/include/linux/mmzone.h 2005-03-21 13:22:06.000000000 -0800<br />&#64;&#64; -122,7 +122,7 &#64;&#64; struct zone {<br /> */<br /> unsigned long lowmem_reserve[MAX_NR_ZONES];<br /><br />- struct per_cpu_pageset pageset[NR_CPUS];<br />+ struct per_cpu_pageset *pageset[NR_CPUS];<br /><br /> /*<br /> * free areas of different sizes<br />Index: linux-2.6.11/init/main.c<br />===================================================================<br />--- linux-2.6.11.orig/init/main.c 2005-03-21 13:18:06.000000000 -0800<br />+++ linux-2.6.11/init/main.c 2005-03-21 13:22:06.000000000 -0800<br />&#64;&#64; -490,6 +490,7 &#64;&#64; asmlinkage void __init start_kernel(void<br /> vfs_caches_init_early();<br /> mem_init();<br /> kmem_cache_init();<br />+ setup_per_cpu_pageset();<br /> numa_policy_init();<br /> if (late_time_init)<br /> late_time_init();<br />Index: linux-2.6.11/mm/mempolicy.c<br />===================================================================<br />--- linux-2.6.11.orig/mm/mempolicy.c 2005-03-21 13:18:06.000000000 -0800<br />+++ linux-2.6.11/mm/mempolicy.c 2005-03-21 13:22:06.000000000 -0800<br />&#64;&#64; -721,7 +721,7 &#64;&#64; static struct page *alloc_page_interleav<br /> zl = NODE_DATA(nid)-&gt;node_zonelists + (gfp &amp; GFP_ZONEMASK);<br /> page = __alloc_pages(gfp, order, zl);<br /> if (page &amp;&amp; page_zone(page) == zl-&gt;zones[0]) {<br />- zl-&gt;zones[0]-&gt;pageset[get_cpu()].interleave_hit++;<br />+ 
zl-&gt;zones[0]-&gt;pageset[get_cpu()]-&gt;interleave_hit++;<br /> put_cpu();<br /> }<br /> return page;<br />Index: linux-2.6.11/mm/page_alloc.c<br />===================================================================<br />--- linux-2.6.11.orig/mm/page_alloc.c 2005-03-21 13:18:06.000000000 -0800<br />+++ linux-2.6.11/mm/page_alloc.c 2005-03-21 13:22:06.000000000 -0800<br />&#64;&#64; -68,6 +68,7 &#64;&#64; EXPORT_SYMBOL(nr_swap_pages);<br /> */<br /> struct zone *zone_table[1 &lt;&lt; (ZONES_SHIFT + NODES_SHIFT)];<br /> EXPORT_SYMBOL(zone_table);<br />+struct per_cpu_pageset pageset_table[MAX_NR_ZONES*MAX_NUMNODES*NR_CPUS] __initdata;<br /><br /> static char *zone_names[MAX_NR_ZONES] = { "DMA", "Normal", "HighMem" };<br /> int min_free_kbytes = 1024;<br />&#64;&#64; -518,7 +519,7 &#64;&#64; static void __drain_pages(unsigned int c<br /> for_each_zone(zone) {<br /> struct per_cpu_pageset *pset;<br /><br />- pset = &amp;zone-&gt;pageset[cpu];<br />+ pset = zone-&gt;pageset[cpu];<br /> for (i = 0; i &lt; ARRAY_SIZE(pset-&gt;pcp); i++) {<br /> struct per_cpu_pages *pcp;<br /><br />&#64;&#64; -581,12 +582,12 &#64;&#64; static void zone_statistics(struct zonel<br /><br /> local_irq_save(flags);<br /> cpu = smp_processor_id();<br />- p = &amp;z-&gt;pageset[cpu];<br />+ p = z-&gt;pageset[cpu];<br /> if (pg == orig) {<br />- z-&gt;pageset[cpu].numa_hit++;<br />+ z-&gt;pageset[cpu]-&gt;numa_hit++;<br /> } else {<br /> p-&gt;numa_miss++;<br />- zonelist-&gt;zones[0]-&gt;pageset[cpu].numa_foreign++;<br />+ zonelist-&gt;zones[0]-&gt;pageset[cpu]-&gt;numa_foreign++;<br /> }<br /> if (pg == NODE_DATA(numa_node_id()))<br /> p-&gt;local_node++;<br />&#64;&#64; -613,7 +614,7 &#64;&#64; static void fastcall free_hot_cold_page(<br /> if (PageAnon(page))<br /> page-&gt;mapping = NULL;<br /> free_pages_check(__FUNCTION__, page);<br />- pcp = &amp;zone-&gt;pageset[get_cpu()].pcp[cold];<br />+ pcp = &amp;zone-&gt;pageset[get_cpu()]-&gt;pcp[cold];<br /> local_irq_save(flags);<br /> if 
(pcp-&gt;count &gt;= pcp-&gt;high)<br /> pcp-&gt;count -= free_pages_bulk(zone, pcp-&gt;batch, &amp;pcp-&gt;list, 0);<br />&#64;&#64; -657,7 +658,7 &#64;&#64; buffered_rmqueue(struct zone *zone, int<br /> if (order == 0) {<br /> struct per_cpu_pages *pcp;<br /><br />- pcp = &amp;zone-&gt;pageset[get_cpu()].pcp[cold];<br />+ pcp = &amp;zone-&gt;pageset[get_cpu()]-&gt;pcp[cold];<br /> local_irq_save(flags);<br /> if (pcp-&gt;count &lt;= pcp-&gt;low)<br /> pcp-&gt;count += rmqueue_bulk(zone, 0,<br />&#64;&#64; -1228,7 +1229,7 &#64;&#64; void show_free_areas(void)<br /> if (!cpu_possible(cpu))<br /> continue;<br /><br />- pageset = zone-&gt;pageset + cpu;<br />+ pageset = zone-&gt;pageset[cpu];<br /><br /> for (temperature = 0; temperature &lt; 2; temperature++)<br /> printk("cpu %d %s: low %d, high %d, batch %d\n",<br />&#64;&#64; -1612,6 +1613,122 &#64;&#64; void zone_init_free_lists(struct pglist_<br /> memmap_init_zone((size), (nid), (zone), (start_pfn))<br /> #endif<br /><br />+#define MAKE_LIST(list, nlist) \<br />+ do { \<br />+ if(list_empty(&amp;list)) \<br />+ INIT_LIST_HEAD(nlist); \<br />+ else { nlist-&gt;next-&gt;prev = nlist; \<br />+ nlist-&gt;prev-&gt;next = nlist; \<br />+ } \<br />+ }while(0)<br />+<br />+/*<br />+ * Dynamicaly allocate memory for the<br />+ * per cpu pageset array in struct zone.<br />+ */<br />+static inline int __devinit process_zones(int cpu)<br />+{<br />+ struct zone *zone, *dzone;<br />+<br />+ for_each_zone(zone) {<br />+ struct per_cpu_pageset *npageset = NULL;<br />+<br />+ npageset = kmalloc_node(sizeof(struct per_cpu_pageset),<br />+ GFP_KERNEL, cpu_to_node(cpu));<br />+ if(!npageset) {<br />+ zone-&gt;pageset[cpu] = NULL;<br />+ goto bad;<br />+ }<br />+<br />+ if(zone-&gt;pageset[cpu]) {<br />+ memcpy(npageset, zone-&gt;pageset[cpu], sizeof(struct per_cpu_pageset));<br />+ MAKE_LIST(zone-&gt;pageset[cpu]-&gt;pcp[0].list, (&amp;npageset-&gt;pcp[0].list));<br />+ MAKE_LIST(zone-&gt;pageset[cpu]-&gt;pcp[1].list, 
(&amp;npageset-&gt;pcp[1].list));<br />+ }<br />+ else {<br />+ struct per_cpu_pages *pcp;<br />+ unsigned long batch;<br />+<br />+ batch = zone-&gt;present_pages / 1024;<br />+ if (batch * PAGE_SIZE &gt; 256 * 1024)<br />+ batch = (256 * 1024) / PAGE_SIZE;<br />+ batch /= 4; /* We effectively *= 4 below */<br />+ if (batch &lt; 1)<br />+ batch = 1;<br />+<br />+ pcp = &amp;npageset-&gt;pcp[0]; /* hot */<br />+ pcp-&gt;count = 0;<br />+ pcp-&gt;low = 2 * batch;<br />+ pcp-&gt;high = 6 * batch;<br />+ pcp-&gt;batch = 1 * batch;<br />+ INIT_LIST_HEAD(&amp;pcp-&gt;list);<br />+<br />+ pcp = &amp;npageset-&gt;pcp[1]; /* cold*/<br />+ pcp-&gt;count = 0;<br />+ pcp-&gt;low = 0;<br />+ pcp-&gt;high = 2 * batch;<br />+ pcp-&gt;batch = 1 * batch;<br />+ INIT_LIST_HEAD(&amp;pcp-&gt;list);<br />+ }<br />+ zone-&gt;pageset[cpu] = npageset;<br />+ }<br />+<br />+ return 0;<br />+bad:<br />+ for_each_zone(dzone) {<br />+ if(dzone == zone)<br />+ break;<br />+ kfree(dzone-&gt;pageset[cpu]);<br />+ dzone-&gt;pageset[cpu] = NULL;<br />+ }<br />+ return -ENOBUFS;<br />+}<br />+<br />+static int __devinit pageset_cpuup_callback(struct notifier_block *nfb,<br />+ unsigned long action,<br />+ void *hcpu)<br />+{<br />+ int cpu = (long)hcpu;<br />+<br />+ switch(action) {<br />+ case CPU_UP_PREPARE:<br />+ if(process_zones(cpu))<br />+ goto bad;<br />+ break;<br />+#ifdef CONFIG_HOTPLUG_CPU<br />+ case CPU_DEAD:<br />+ {<br />+ struct zone *zone;<br />+ for_each_zone(zone) {<br />+ struct per_cpu_pageset *pset;<br />+<br />+ pset = zone-&gt;pageset[cpu];<br />+ zone-&gt;pageset[cpu] = NULL;<br />+<br />+ kfree(pset);<br />+ }<br />+ }<br />+ break;<br />+#endif<br />+ default:<br />+ break;<br />+ }<br />+ return NOTIFY_OK;<br />+bad:<br />+ return NOTIFY_BAD;<br />+}<br />+struct notifier_block pageset_notifier = { &amp;pageset_cpuup_callback, NULL, 0 };<br />+<br />+void __init setup_per_cpu_pageset()<br />+{<br />+ /*Iintialize per_cpu_pageset for cpu 0.<br />+ A cpuup callback will 
do this for every cpu<br />+ as it comes online<br />+ */<br />+ BUG_ON(process_zones(smp_processor_id()));<br />+ register_cpu_notifier(&amp;pageset_notifier);<br />+}<br />+<br /> /*<br /> * Set up the zone data structures:<br /> * - mark all pages reserved<br />&#64;&#64; -1670,15 +1787,17 &#64;&#64; static void __init free_area_init_core(s<br /><br /> for (cpu = 0; cpu &lt; NR_CPUS; cpu++) {<br /> struct per_cpu_pages *pcp;<br />+ struct per_cpu_pageset *pgset = &amp;pageset_table[nid*MAX_NR_ZONES*NR_CPUS + (j * NR_CPUS) + cpu];<br /><br />- pcp = &amp;zone-&gt;pageset[cpu].pcp[0]; /* hot */<br />+ zone-&gt;pageset[cpu] = pgset;<br />+ pcp = &amp;pgset-&gt;pcp[0]; /* hot */<br /> pcp-&gt;count = 0;<br /> pcp-&gt;low = 2 * batch;<br /> pcp-&gt;high = 6 * batch;<br /> pcp-&gt;batch = 1 * batch;<br /> INIT_LIST_HEAD(&amp;pcp-&gt;list);<br /><br />- pcp = &amp;zone-&gt;pageset[cpu].pcp[1]; /* cold */<br />+ pcp = &amp;pgset-&gt;pcp[1]; /* cold */<br /> pcp-&gt;count = 0;<br /> pcp-&gt;low = 0;<br /> pcp-&gt;high = 2 * batch;<br />-<br />To unsubscribe from this list: send the line "unsubscribe linux-kernel" in<br />the body of a message to majordomo&#64;vger.kernel.org<br />More majordomo info at <a href="http://vger.kernel.org/majordomo-info.html">http://vger.kernel.org/majordomo-info.html</a><br />Please read the FAQ at <a href="http://www.tux.org/lkml/">http://www.tux.org/lkml/</a><br /></pre></td><td width="32" rowspan="2" class="c" valign="top"><img src="/images/icornerr.gif" width="32" height="32" alt="\" /></td></tr><tr><td align="right" valign="bottom"> &#160; </td></tr><tr><td align="right" valign="bottom">&#160;</td><td class="c" valign="bottom" style="padding-bottom: 0px"><img src="/images/bcornerl.gif" width="32" height="32" alt="\" /></td><td class="c">&#160;</td><td class="c" valign="bottom" style="padding-bottom: 0px"><img src="/images/bcornerr.gif" width="32" height="32" alt="/" /></td></tr><tr><td align="right" valign="top" colspan="2"> &#160; </td><td class="lm">Last 
update: 2005-04-06 13:31 &#160;&#160; [from the cache]<br />&#169;2003-2020 <a href="http://blog.jasper.es/"><span itemprop="editor">Jasper Spaans</span></a>|hosted at <a href="https://www.digitalocean.com/?refcode=9a8e99d24cf9">Digital Ocean</a> and my Meterkast|<a href="http://blog.jasper.es/categories.html#lkml-ref">Read the blog</a></td><td>&#160;</td></tr></table><script language="javascript" src="/js/styleswitcher.js" type="text/javascript"></script></body></html>
