Date: Thu, 15 Sep 2005 21:16:51 +0800
From: WU Fengguang <>
class="lp">Subject</td><td class="rp" itemprop="name">Adaptive read-ahead: benchmarks</td></tr></table></td><td></td></tr></table><pre itemprop="articleBody">I carried out the tests on a DELL PE1750 with 2G memory and a hardware<br />raid5 array built on three 73G SCSI disks. I failed to interpret some of<br />the benchmark results, any comments are greatly appreciated. <br /><br />DEBUG OUTPUT<br />=====================================================================<br /><br /># cp kernel-image-2.6.13_10.00.Custom_i386.deb /dev/null<br />-----------------------------------------------------------------<br />the current logic:<br /><br />[17179814.868000] blockable-readahead(ino=83571, ra=0+1) = 1<br />[17179814.880000] blockable-readahead(ino=83571, ra=1+4) = 4<br />[17179814.880000] blockable-readahead(ino=83571, ra=5+16) = 16<br />[17179814.892000] blockable-readahead(ino=83571, ra=21+32) = 32<br />[17179814.896000] blockable-readahead(ino=83571, ra=53+64) = 64<br />[17179814.924000] blockable-readahead(ino=83571, ra=117+128) = 128<br />[17179814.924000] blockable-readahead(ino=83571, ra=245+256) = 256<br />[17179815.024000] blockable-readahead(ino=83571, ra=501+256) = 256<br />[17179815.024000] blockable-readahead(ino=83571, ra=757+256) = 256<br />[17179815.076000] blockable-readahead(ino=83571, ra=1013+256) = 256<br />[17179815.100000] blockable-readahead(ino=83571, ra=1269+256) = 256<br />[17179815.184000] blockable-readahead(ino=83571, ra=1525+256) = 256<br />[17179815.188000] blockable-readahead(ino=83571, ra=1781+256) = 256<br />[17179815.236000] blockable-readahead(ino=83571, ra=2037+256) = 256<br />[17179815.272000] blockable-readahead(ino=83571, ra=2293+256) = 256<br />[17179815.328000] blockable-readahead(ino=83571, ra=2549+256) = 256<br />[17179815.332000] blockable-readahead(ino=83571, ra=2805+256) = 256<br />[17179815.368000] blockable-readahead(ino=83571, ra=3061+256) = 256<br />[17179815.396000] blockable-readahead(ino=83571, ra=3317+256) = 251<br />[17179815.424000] blockable-readahead(ino=83571, ra=3573+256) = 0<br /><br />the new logic with readahead-ratio set to 101:<br /><br />[17179673.064000] readahead(ino=83571, index=0-0-1, ra=0+32) = 32<br />[17179673.084000] readahead(ino=83571, index=32-32-33, ra=32+32) = 32<br />[17179673.100000] readahead(ino=83571, index=64-64-65, ra=64+64) = 64<br />[17179673.116000] readahead(ino=83571, index=128-128-129, ra=128+129) = 129<br />[17179673.144000] readahead(ino=83571, index=257-257-258, ra=257+256) = 256<br />[17179673.212000] readahead(ino=83571, index=513-513-514, ra=513+256) = 256<br />[17179673.248000] readahead(ino=83571, index=769-769-770, ra=769+256) = 256<br />[17179673.288000] readahead(ino=83571, index=1025-1025-1026, ra=1025+256) = 256<br />[17179673.332000] readahead(ino=83571, index=1281-1281-1282, ra=1281+256) = 256<br />[17179673.408000] readahead(ino=83571, index=1537-1537-1538, ra=1537+256) = 256<br />[17179673.440000] readahead(ino=83571, index=1793-1793-1794, ra=1793+256) = 256<br />[17179673.476000] readahead(ino=83571, index=2049-2049-2050, ra=2049+256) = 256<br />[17179673.512000] readahead(ino=83571, index=2305-2305-2306, ra=2305+256) = 256<br />[17179673.576000] readahead(ino=83571, index=2561-2561-2562, ra=2561+256) = 256<br />[17179673.604000] readahead(ino=83571, index=2817-2817-2818, ra=2817+256) = 256<br />[17179673.652000] readahead(ino=83571, index=3073-3073-3074, ra=3073+256) = 256<br />[17179673.704000] readahead(ino=83571, index=3329-3329-3330, ra=3329+256) = 239<br /><br />OPROFILE 
OPROFILE STATISTICS
=====================================================================
# cat oprofile.sh
#!/bin/sh

oprofile() {
        opcontrol --vmlinux=/temp/kernel/linux-2.6.13ra/vmlinux
        opcontrol --start
        opcontrol --reset
        echo $1 > /proc/sys/vm/readahead_ratio
        dd if=/temp/kernel/hugefile of=/dev/null bs=$bs
        opreport -l -o oprofile.$1.$bs /temp/kernel/linux-2.6.13ra/vmlinux
        opcontrol --stop
}

bs=$1
shift

while test -n "$1"
do
        oprofile $1
        shift
done

# ./oprofile.sh 4k 0 100
3932946432 bytes transferred in 80.383702 seconds (48927162 bytes/sec)
3932946432 bytes transferred in 109.835723 seconds (35807534 bytes/sec)

# head -n100 oprofile.0.4k
CPU: P4 / Xeon with 2 hyper-threads, speed 2388.38 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % symbol name
34745 14.0697 __copy_to_user_ll
11894 4.8164 ll_rw_block
10362 4.1960 system_call
8669 3.5104 shrink_list
7761 3.1428 radix_tree_delete
7627 3.0885 __find_get_block
6901 2.7945 delay_pmtmr
6474 2.6216 add_to_page_cache
5350 2.1664 dnotify_parent
4868 1.9713 isolate_lru_pages
4768 1.9308 inotify_dentry_parent_queue_event
4453 1.8032 do_generic_mapping_read
4421 1.7902 do_mpage_readpage
4108 1.6635 find_get_page
4047 1.6388 default_idle
4000 1.6198 unlock_buffer
3914 1.5849 __brelse
3850 1.5590 zone_watermark_ok
3777 1.5295 __wake_up_bit
3615 1.4639 __do_page_cache_readahead
3211 1.3003 unlock_page
3095 1.2533 kmap_atomic
3059 1.2387 bad_range
2894 1.1719 __mod_page_state
2665 1.0792 free_pages_bulk
2530 1.0245 __alloc_pages
2385 0.9658 __pagevec_lru_add
2273 0.9204 mpage_end_io_read
2225 0.9010 free_hot_cold_page
2092 0.8471 buffered_rmqueue
1811 0.7333 page_waitqueue
1808 0.7321 radix_tree_lookup
1780 0.7208 __rmqueue
1702 0.6892 find_busiest_group
1672 0.6771 page_referenced
1648 0.6673 mark_page_accessed
1594 0.6455 __read_lock_failed
1560 0.6317 wakeup_kswapd
1557 0.6305 __pagevec_release_nonlru
1464 0.5928 schedule
1449 0.5868 mark_offset_pmtmr
1439 0.5827 radix_tree_insert
1367 0.5536 blk_recount_segments
1306 0.5289 release_pages
1162 0.4705 current_fs_time
1159 0.4693 apic_timer_interrupt
1098 0.4446 mempool_free
1061 0.4296 vfs_read
1045 0.4232 kmem_cache_alloc
1031 0.4175 __write_lock_failed
1019 0.4126 timer_interrupt
931 0.3770 vfs_write
925 0.3746 bit_waitqueue
908 0.3677 mpage_readpages
895 0.3624 find_next_bit
895 0.3624 kmem_cache_free
892 0.3612 __bio_add_page
883 0.3576 irq_entries_start
818 0.3312 sys_write
794 0.3215 sys_read
788 0.3191 restore_nocheck
783 0.3171 __remove_from_page_cache
762 0.3086 __do_IRQ
750 0.3037 __generic_file_aio_read
749 0.3033 page_address
718 0.2907 __getblk
694 0.2810 shrink_cache
686 0.2778 blk_rq_map_sg
654 0.2648 as_completed_request
613 0.2482 shrink_slab
606 0.2454 scheduler_tick
602 0.2438 file_read_actor
589 0.2385 rmqueue_bulk
583 0.2361 generic_file_read
577 0.2337 load_balance
567 0.2296 __make_request
563 0.2280 fget_light
558 0.2260 rw_verify_area
553 0.2239 run_timer_softirq
513 0.2077 radix_tree_preload
513 0.2077 smp_apic_timer_interrupt
495 0.2004 cond_resched
493 0.1996 bio_alloc_bioset
462 0.1871 rebalance_tick
417 0.1689 update_atime
406 0.1644 generic_make_request
389 0.1575 __do_softirq
382 0.1547 cpu_idle
381 0.1543 update_process_times
377 0.1527 prep_new_page
373 0.1510 __end_that_request_first
371 0.1502 balance_pgdat
352 0.1425 as_update_iohist
349 0.1413 tasklet_action
345 0.1397 account_system_time
333 0.1348 __read_page_state
333 0.1348 sched_clock
# head -n100 oprofile.100.4k
CPU: P4 / Xeon with 2 hyper-threads, speed 2388.38 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % symbol name
36120 13.7910 __copy_to_user_ll
12537 4.7868 ll_rw_block
10416 3.9770 system_call
8790 3.3561 shrink_list
8147 3.1106 radix_tree_delete
7833 2.9907 __find_get_block
7731 2.9518 delay_pmtmr
6651 2.5394 add_to_page_cache
5346 2.0412 dnotify_parent
5174 1.9755 default_idle
5174 1.9755 isolate_lru_pages
4966 1.8961 inotify_dentry_parent_queue_event
4626 1.7663 do_generic_mapping_read
4418 1.6868 do_mpage_readpage
4054 1.5479 __brelse
4033 1.5398 __wake_up_bit
4012 1.5318 unlock_buffer
3906 1.4914 find_get_page
3859 1.4734 zone_watermark_ok
3712 1.4173 __do_page_cache_readahead
3418 1.3050 unlock_page
3194 1.2195 bad_range
3109 1.1871 kmap_atomic
3029 1.1565 __mod_page_state
2603 0.9939 __alloc_pages
2594 0.9904 free_pages_bulk
2356 0.8995 __pagevec_lru_add
2309 0.8816 find_busiest_group
2227 0.8503 free_hot_cold_page
2126 0.8117 schedule
2078 0.7934 mpage_end_io_read
2030 0.7751 buffered_rmqueue
2015 0.7694 mark_offset_pmtmr
1943 0.7419 radix_tree_lookup
1904 0.7270 page_waitqueue
1898 0.7247 __rmqueue
1774 0.6773 page_referenced
1720 0.6567 mark_page_accessed
1672 0.6384 __read_lock_failed
1646 0.6285 __pagevec_release_nonlru
1614 0.6162 wakeup_kswapd
1511 0.5769 apic_timer_interrupt
1471 0.5616 radix_tree_insert
1452 0.5544 blk_recount_segments
1437 0.5487 release_pages
1320 0.5040 timer_interrupt
1247 0.4761 current_fs_time
1149 0.4387 vfs_read
1133 0.4326 mempool_free
1129 0.4311 find_next_bit
1105 0.4219 kmem_cache_alloc
1048 0.4001 __write_lock_failed
1023 0.3906 irq_entries_start
929 0.3547 vfs_write
902 0.3444 bit_waitqueue
887 0.3387 __generic_file_aio_read
878 0.3352 kmem_cache_free
865 0.3303 __bio_add_page
861 0.3287 load_balance
853 0.3257 __do_IRQ
843 0.3219 mpage_readpages
840 0.3207 page_address
839 0.3203 __remove_from_page_cache
822 0.3138 restore_nocheck
810 0.3093 sys_read
787 0.3005 sys_write
776 0.2963 __getblk
749 0.2860 scheduler_tick
726 0.2772 shrink_cache
707 0.2699 run_timer_softirq
656 0.2505 blk_rq_map_sg
651 0.2486 as_completed_request
651 0.2486 file_read_actor
642 0.2451 smp_apic_timer_interrupt
610 0.2329 rw_verify_area
604 0.2306 shrink_slab
600 0.2291 rebalance_tick
598 0.2283 rmqueue_bulk
590 0.2253 generic_file_read
570 0.2176 fget_light
558 0.2131 __make_request
524 0.2001 update_process_times
523 0.1997 bio_alloc_bioset
522 0.1993 cond_resched
506 0.1932 radix_tree_preload
502 0.1917 __do_softirq
482 0.1840 cpu_idle
454 0.1733 sched_clock
437 0.1669 update_atime
430 0.1642 account_system_time
425 0.1623 generic_make_request
412 0.1573 __switch_to
399 0.1523 tasklet_action
396 0.1512 rt_run_flush
395 0.1508 try_to_wake_up
390 0.1489 __read_page_state
369 0.1409 balance_pgdat
two additional lines:
324 0.1237 page_cache_readahead_adaptive
46 0.0176 count_sequential_pages

Summary:
1. The real time of dd rises from 80.383702s to 109.835723s, a ratio of
1:1.37. The current logic keeps an 'ahead window' and always issues
readaheads ahead of time, which effectively prevents reads from stalling
from time to time. The new logic simply issues readaheads at times of
immediate need. So the difference is quite reasonable.

2. The new logic shows 5% or so overhead, though the direct overhead of
the two new functions is only 2%.
Where is the remaining 3%? Lock contention somewhere?
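
To make point 1 concrete, here is a minimal user-space sketch of the
pipelining that the 'ahead window' provides (the struct fields, the
promotion rule, and all names are illustrative assumptions, not the 2.6
readahead code): while the reader consumes the current window, a second
window is already under I/O, and crossing into it immediately queues the
next one.

#include <stdio.h>

struct ra_windows {
        long cur_start, cur_size;       /* window being consumed */
        long ahead_start, ahead_size;   /* window already under I/O */
};

static void submit_io(long start, long size)
{
        printf("submit readahead %ld+%ld\n", start, size);
}

/* promote the ahead window once the reader crosses into it, and
 * immediately pipeline a new ahead window behind it */
static void stock_advance(struct ra_windows *ra, long index)
{
        if (index < ra->ahead_start)
                return;                 /* no I/O needed yet */
        ra->cur_start = ra->ahead_start;
        ra->cur_size = ra->ahead_size;
        ra->ahead_start = ra->cur_start + ra->cur_size;
        submit_io(ra->ahead_start, ra->ahead_size);
}

int main(void)
{
        struct ra_windows ra = { 0, 256, 256, 256 };
        long index;

        for (index = 0; index < 2048; index += 64)
                stock_advance(&ra, index);
        return 0;
}

The new logic has no such window in flight, so a lone sequential reader
blocks on I/O at each window boundary, which matches the longer dd wall
time above.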
SEQUENTIAL READ PATTERNS
=====================================================================
# cat test_ra.sh
#!/bin/zsh

blockdev --setra 256 /dev/sda
echo deadline > /sys/block/sda/queue/scheduler

TIMEFMT="%E real %S system %U user %w+%c cs %J"
FILE=hugefile

bs=$1
shift

while test -n "$1"
do
        echo $1 > /proc/sys/vm/readahead_ratio

        cp $FILE /dev/null
        echo "readahead_ratio = $1"

        time dd if=$FILE of=/dev/null bs=$bs 2>/dev/null
        time grep -r 'asdfghjkl;' linux-2.6.13 &
        time dd if=$FILE of=/dev/null bs=$bs 2>/dev/null &
        time diff -r linux-2.6.13 linux-2.6.13ra >/dev/null &

        sleep 20
        while {pidof dd diff grep > /dev/null};
        do
                sleep 3
        done

        shift
done

# ./test_ra.sh 4k 0 100
readahead_ratio = 0
79.75s real 8.88s system 0.43s user 13927+10 cs dd if=$FILE of=/dev/null bs=$bs 2> /dev/null
142.75s real 1.37s system 24.16s user 13724+42 cs grep -r 'asdfghjkl;' linux-2.6.13
167.09s real 9.47s system 0.50s user 25065+30 cs dd if=$FILE of=/dev/null bs=$bs 2> /dev/null
173.99s real 3.70s system 1.48s user 31148+10 cs diff -r linux-2.6.13 linux-2.6.13ra > /dev/null
readahead_ratio = 100
107.51s real 8.58s system 0.48s user 28065+8 cs dd if=$FILE of=/dev/null bs=$bs 2> /dev/null
149.86s real 1.60s system 23.57s user 14372+42 cs grep -r 'asdfghjkl;' linux-2.6.13
178.63s real 9.52s system 0.51s user 32047+34 cs dd if=$FILE of=/dev/null bs=$bs 2> /dev/null
184.56s real 3.87s system 1.34s user 32399+6 cs diff -r linux-2.6.13 linux-2.6.13ra > /dev/null

Summary:
1. The blockdev command has no effect at all -- the readahead-max is
still 1M.

2. In the single-stream case, the new logic seems to get lower system
time; doesn't that conflict with the oprofile numbers? And context
switches are more than doubled -- why?

3. In parallel streams, the new logic gets more system time and lower
user time, while context switches are roughly the same.
The 'ahead window' magic should only matter for large files, yet the
new logic lost again on the small files. Maybe it is the 3 radix tree
lookups that show up in small reads?

To find out, I added a call to lock_page() at the end of
__do_page_cache_readahead(); the code can be enabled with certain
readahead_ratio values (such as 102).
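
Roughly like this (a sketch under assumptions: the helper name and the
find_get_page() lookup are illustrative; the essence is just a
lock_page() on the tail of the readahead, which sleeps until its read
I/O completes):

/* called at the end of __do_page_cache_readahead() when the magic
 * readahead_ratio value (e.g. 102) enables it */
static void wait_on_last_readahead_page(struct address_space *mapping,
                                        pgoff_t offset, unsigned long nr)
{
        struct page *page = find_get_page(mapping, offset + nr - 1);

        if (!page)
                return;
        lock_page(page);        /* sleeps until read I/O unlocks the page */
        unlock_page(page);
        page_cache_release(page);
}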
The command below runs the test with the page lock:

# ./test_ra.sh 4k 102
readahead_ratio = 102
108.66s real 9.43s system 0.48s user 10235+132 cs dd if=$FILE of=/dev/null bs=$bs 2> /dev/null
192.52s real 1.87s system 23.86s user 19073+321 cs grep -r 'asdfghjkl;' linux-2.6.13
201.40s real 9.99s system 0.46s user 17490+141 cs dd if=$FILE of=/dev/null bs=$bs 2> /dev/null
209.59s real 4.37s system 1.55s user 36987+68 cs diff -r linux-2.6.13 linux-2.6.13ra > /dev/null

It shows that context switches in dd are effectively cut down, though
system times are even higher.

MORE DATA
=====================================================================
Some more tests with 32k block size:

# ./test_ra.sh 32k 0 100 102
readahead_ratio = 0
80.05s real 7.98s system 0.07s user 14044+117 cs dd if=$FILE of=/dev/null bs=$bs 2> /dev/null
148.02s real 1.38s system 24.45s user 13532+310 cs grep -r 'asdfghjkl;' linux-2.6.13
178.84s real 8.66s system 0.08s user 26165+130 cs dd if=$FILE of=/dev/null bs=$bs 2> /dev/null
180.80s real 4.01s system 1.54s user 32994+67 cs diff -r linux-2.6.13 linux-2.6.13ra > /dev/null
readahead_ratio = 100
108.13s real 7.92s system 0.05s user 28877+97 cs dd if=$FILE of=/dev/null bs=$bs 2> /dev/null
157.50s real 1.50s system 22.35s user 14842+291 cs grep -r 'asdfghjkl;' linux-2.6.13
185.94s real 8.81s system 0.09s user 34166+117 cs dd if=$FILE of=/dev/null bs=$bs 2> /dev/null
190.27s real 3.90s system 1.60s user 35728+66 cs diff -r linux-2.6.13 linux-2.6.13ra > /dev/null
readahead_ratio = 102
112.04s real 9.22s system 0.48s user 10483+136 cs dd if=$FILE of=/dev/null bs=$bs 2> /dev/null
190.23s real 1.74s system 21.76s user 19097+279 cs grep -r 'asdfghjkl;' linux-2.6.13
199.14s real 10.10s system 0.56s user 17904+179 cs dd if=$FILE of=/dev/null bs=$bs 2> /dev/null
207.10s real 4.20s system 1.43s user 38820+63 cs diff -r linux-2.6.13 linux-2.6.13ra > /dev/null

More oprofile results:
# head -n20 oprofile.0.4k
CPU: P4 / Xeon with 2 hyper-threads, speed 2388.41 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % symbol name
34763 14.0479 __copy_to_user_ll
15171 6.1307 ll_rw_block
10795 4.3623 system_call
10254 4.1437 shrink_list
9199 3.7174 __find_get_block
7818 3.1593 delay_pmtmr
6480 2.6186 add_to_page_cache
5541 2.2391 radix_tree_delete
5478 2.2137 isolate_lru_pages
5474 2.2121 dnotify_parent
5141 2.0775 __brelse
5014 2.0262 inotify_dentry_parent_queue_event
4838 1.9551 unlock_buffer
4790 1.9357 find_get_page
4618 1.8662 do_mpage_readpage
4390 1.7740 do_generic_mapping_read
3989 1.6120 default_idle
# head -n20 oprofile.102.4k
CPU: P4 / Xeon with 2 hyper-threads, speed 2388.41 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % symbol name
34954 14.0548 __copy_to_user_ll
14888 5.9864 ll_rw_block
10895 4.3808 shrink_list
10398 4.1810 system_call
9109 3.6627 __find_get_block
8003 3.2180 delay_pmtmr
6746 2.7125 add_to_page_cache
5752 2.3128 radix_tree_delete
5644 2.2694 isolate_lru_pages
5354 2.1528 dnotify_parent
5004 2.0121 __brelse
4952 1.9912 default_idle
4900 1.9703 find_get_page
4845 1.9481 inotify_dentry_parent_queue_event
4711 1.8943 unlock_buffer
4404 1.7708 do_generic_mapping_read
4135 1.6627 do_mpage_readpage

# head -n20 oprofile.0.32k
CPU: P4 / Xeon with 2 hyper-threads, speed 2388.41 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % symbol name
36906 16.4281 __copy_to_user_ll
15199 6.7656 ll_rw_block
10248 4.5617 shrink_list
9276 4.1291 __find_get_block
7811 3.4769 delay_pmtmr
6706 2.9851 add_to_page_cache
6028 2.6833 isolate_lru_pages
5855 2.6063 radix_tree_delete
5577 2.4825 find_get_page
4959 2.2074 __brelse
4931 2.1950 unlock_buffer
4189 1.8647 do_mpage_readpage
4025 1.7917 default_idle
3869 1.7222 mpage_end_io_read
3809 1.6955 __do_page_cache_readahead
3069 1.3661 kmap_atomic
2814 1.2526 __read_lock_failed
# head -n20 oprofile.100.32k
CPU: P4 / Xeon with 2 hyper-threads, speed 2388.41 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % symbol name
37118 15.8645 __copy_to_user_ll
15602 6.6684 ll_rw_block
10664 4.5579 shrink_list
9487 4.0548 __find_get_block
8366 3.5757 delay_pmtmr
6902 2.9500 add_to_page_cache
6192 2.6465 isolate_lru_pages
5989 2.5597 radix_tree_delete
5164 2.2071 __brelse
5060 2.1627 unlock_buffer
4962 2.1208 default_idle
4668 1.9951 find_get_page
4381 1.8725 do_mpage_readpage
3957 1.6912 __do_page_cache_readahead
3223 1.3775 kmap_atomic
3185 1.3613 mpage_end_io_read
3080 1.3164 __wake_up_bit
# head -n20 oprofile.102.32k
CPU: P4 / Xeon with 2 hyper-threads, speed 2388.41 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % symbol name
34634 15.8598 __copy_to_user_ll
14628 6.6985 ll_rw_block
10239 4.6887 shrink_list
8721 3.9936 __find_get_block
7880 3.6085 delay_pmtmr
6424 2.9417 add_to_page_cache
5654 2.5891 isolate_lru_pages
5418 2.4810 radix_tree_delete
5104 2.3373 find_get_page
4860 2.2255 default_idle
4821 2.2077 __brelse
4765 2.1820 unlock_buffer
4008 1.8354 do_mpage_readpage
3322 1.5212 __do_page_cache_readahead
3114 1.4260 __read_lock_failed
2959 1.3550 kmap_atomic
2713 1.2424 __wake_up_bit

LARGE READ SIZE
=====================================================================
There is an early (broken) version that issues the next readahead when
only 1/4 of the previous readahead pages are left. This behaves like
the current logic, and the results are exciting. But I think it is not
sane to search through the page cache too much.
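
The trigger of that early version might look like the following check
(should_pipeline_next and its parameters are illustrative names, not
the actual patch); it keeps I/O pipelined like the stock ahead window,
at the cost of rescanning the page cache on reads to find out how many
readahead pages are left:

/* fire the next readahead once fewer than a quarter of the pages
 * from the previous readahead remain ahead of the reader */
static int should_pipeline_next(unsigned long index,
                                unsigned long ra_end, unsigned long ra_size)
{
        unsigned long remaining = (ra_end > index) ? ra_end - index : 0;

        return remaining < ra_size / 4;
}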
The default kernel on my PC with 128K readahead-max:
====================================================================
readahead_ratio = 0
22387+1 records in
22387+1 records out
733589504 bytes transferred in 25.293262 seconds (29003357 bytes/sec)
 0.01s user 1.56s system 6% cpu 25.310 total
22326+1 records in
22326+1 records out
731596800 bytes transferred in 103.776015 seconds (7049768 bytes/sec)
 0.01s user 1.54s system 1% cpu 1:43.85 total
 0.38s user 1.50s system 1% cpu 2:09.72 total
--------------------------------------------------------------------

An early version of the new logic with 2M readahead-max:
====================================================================
readahead_ratio = 90
22387+1 records in
22387+1 records out
733589504 bytes transferred in 21.265200 seconds (34497183 bytes/sec)
 0.01s user 1.56s system 7% cpu 21.334 total
22326+1 records in
22326+1 records out
731596800 bytes transferred in 42.916377 seconds (17047031 bytes/sec)
 0.02s user 1.51s system 3% cpu 43.126 total
 0.37s user 1.44s system 2% cpu 1:08.89 total
--------------------------------------------------------------------

It effectively shows the advantage of large readahead sizes ;)
--
WU Fengguang
Dept. of Automation
University of Science and Technology of China
Hefei, Anhui