<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>LKML: Christoph Lameter: [PATCH 2/5] Direct Migration V6: migrate_pages() extension</title><link href="/css/message.css" rel="stylesheet" type="text/css" /><link href="/css/wrap.css" rel="alternate stylesheet" type="text/css" title="wrap" /><link href="/css/nowrap.css" rel="stylesheet" type="text/css" title="nowrap" /><link href="/favicon.ico" rel="shortcut icon" /><script src="/js/simple-calendar.js" type="text/javascript"></script><script src="/js/styleswitcher.js" type="text/javascript"></script><link rel="alternate" type="application/rss+xml" title="lkml.org : last 100 messages" href="/rss.php" /><link rel="alternate" type="application/rss+xml" title="lkml.org : last messages by Christoph Lameter" href="/groupie.php?aid=843" /><!--Matomo--><script> var _paq = window._paq = window._paq || []; /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ _paq.push(["setDoNotTrack", true]); _paq.push(["disableCookies"]); _paq.push(['trackPageView']); _paq.push(['enableLinkTracking']); (function() { var u="//m.lkml.org/"; _paq.push(['setTrackerUrl', u+'matomo.php']); _paq.push(['setSiteId', '1']); var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); })(); </script><!--End Matomo Code--></head><body onload="es.jasper.simpleCalendar.init();" itemscope="itemscope" itemtype="http://schema.org/BlogPosting"><table border="0" cellpadding="0" cellspacing="0"><tr><td width="180" align="center"><a href="/"><img style="border:0;width:135px;height:32px" src="/images/toprowlk.gif" alt="lkml.org" /></a></td><td width="32">&nbsp;</td><td class="nb"><div><a class="nb" 
href="/lkml"> [lkml]</a> &nbsp; <a class="nb" href="/lkml/2005"> [2005]</a> &nbsp; <a class="nb" href="/lkml/2005/12"> [Dec]</a> &nbsp; <a class="nb" href="/lkml/2005/12/5"> [5]</a> &nbsp; <a class="nb" href="/lkml/last100"> [last100]</a> &nbsp; <a href="/rss.php"><img src="/images/rss-or.gif" border="0" alt="RSS Feed" /></a></div><div>Views: <a href="#" class="nowrap" onclick="setActiveStyleSheet('wrap');return false;">[wrap]</a><a href="#" class="wrap" onclick="setActiveStyleSheet('nowrap');return false;">[no wrap]</a> &nbsp; <a class="nb" href="/lkml/mheaders/2005/12/5/201" onclick="this.href='/lkml/headers'+'/2005/12/5/201';">[headers]</a>&nbsp; <a href="/lkml/bounce/2005/12/5/201">[forward]</a>&nbsp; </div></td><td width="32">&nbsp;</td></tr><tr><td valign="top"><div class="es-jasper-simpleCalendar" baseurl="/lkml/"></div><div class="threadlist">Messages in this thread</div><ul class="threadlist"><li class="root"><a href="/lkml/2005/12/5/199">First message in thread</a></li><li><a href="/lkml/2005/12/5/199">Christoph Lameter</a><ul><li><a href="/lkml/2005/12/5/200">Christoph Lameter</a></li><li class="origin"><a href="">Christoph Lameter</a></li><li><a href="/lkml/2005/12/5/202">Christoph Lameter</a></li><li><a href="/lkml/2005/12/5/203">Christoph Lameter</a></li><li><a href="/lkml/2005/12/5/204">Christoph Lameter</a></li></ul></li></ul><div class="threadlist">Patch in this message</div><ul class="threadlist"><li><a href="/lkml/diff/2005/12/5/201/1">Get diff 1</a></li></ul></td><td width="32" rowspan="2" class="c" valign="top"><img src="/images/icornerl.gif" width="32" height="32" alt="/" /></td><td class="c" rowspan="2" valign="top" style="padding-top: 1em"><table><tr><td><table><tr><td class="lp">Date</td><td class="rp" itemprop="datePublished">Mon, 5 Dec 2005 11:50:46 -0800 (PST)</td></tr><tr><td class="lp">From</td><td class="rp" itemprop="author">Christoph Lameter <></td></tr><tr><td class="lp">Subject</td><td class="rp" itemprop="name">[PATCH 2/5] Direct Migration V6: migrate_pages() 
extension</td></tr></table></td><td></td></tr></table><pre itemprop="articleBody">Add direct migration support with fall back to swap.<br /><br />Direct migration support on top of the swap-based page migration facility.<br /><br />This allows the direct migration of anonymous pages and the migration of<br />file backed pages by dropping the associated buffers (requires writeout).<br /><br />Fall back to swap out if necessary.<br /><br />The patch is based on lots of patches from the hotplug project but the code<br />was restructured, documented and simplified as much as possible.<br /><br />Note that an additional patch that defines the migrate_page() method<br />for filesystems is necessary in order to avoid writeback for anonymous<br />and file backed pages.<br /><br />V5->V6:<br /> - Patch against 2.6.15-rc5-mm1.<br /> - Replace one occurrence of page->mapping with page_mapping(page) in<br /> migrate_page_remove_references()<br /><br />V4->V5:<br /> - Patch against 2.6.15-rc2-mm1 + double unlock fix + consolidation patch<br /><br />V3->V4:<br />- Remove components already in the swap migration patch<br /><br />V1->V2:<br />- Change migrate_pages() so that it can return pagelist for failed and<br /> moved pages. No longer free the old pages but allow caller to dispose<br /> of them.<br />- Unmap pages before changing reverse map under tree lock. 
Take<br /> a write_lock instead of a read_lock.<br />- Add documentation<br /><br />Signed-off-by: Mike Kravetz <kravetz@us.ibm.com><br />Signed-off-by: Christoph Lameter <clameter@sgi.com><br /><br />Index: linux-2.6.15-rc5/include/linux/swap.h<br />===================================================================<br />--- linux-2.6.15-rc5.orig/include/linux/swap.h 2005-12-05 11:15:23.000000000 -0800<br />+++ linux-2.6.15-rc5/include/linux/swap.h 2005-12-05 11:32:11.000000000 -0800<br />@@ -178,6 +178,9 @@ extern int vm_swappiness;<br /> #ifdef CONFIG_MIGRATION<br /> extern int isolate_lru_page(struct page *p);<br /> extern int putback_lru_pages(struct list_head *l);<br />+extern int migrate_page(struct page *, struct page *);<br />+extern int migrate_page_remove_references(struct page *, struct page *, int);<br />+extern void migrate_page_copy(struct page *, struct page *);<br /> extern int migrate_pages(struct list_head *l, struct list_head *t,<br /> struct list_head *moved, struct list_head *failed);<br /> #endif<br />Index: linux-2.6.15-rc5/Documentation/vm/page_migration<br />===================================================================<br />--- /dev/null 1970-01-01 00:00:00.000000000 +0000<br />+++ linux-2.6.15-rc5/Documentation/vm/page_migration 2005-12-05 11:32:11.000000000 -0800<br />@@ -0,0 +1,95 @@<br />+Page migration<br />+--------------<br />+<br />+Page migration occurs in several steps. First a high-level<br />+description for those trying to use migrate_pages() and then<br />+a low-level description of how the details work.<br />+<br />+<br />+A. Use of migrate_pages()<br />+-------------------------<br />+<br />+1. Remove pages from the LRU.<br />+<br />+ Lists of pages to be migrated are generated by scanning over<br />+ pages and moving them into lists. 
This is done by<br />+ calling isolate_lru_page() or __isolate_lru_page().<br />+ Calling isolate_lru_page() increases the reference count of the page<br />+ so that it cannot vanish under us.<br />+<br />+2. Generate a list of newly allocated pages to move the contents<br />+ of the first list to.<br />+<br />+3. The migrate_pages() function is called which attempts<br />+ to do the migration. It returns the moved pages in the<br />+ list specified as the third parameter and the failed<br />+ migrations in the fourth parameter. The first parameter<br />+ will contain the pages that could still be retried.<br />+<br />+4. The leftover pages of various types are returned<br />+ to the LRU using putback_lru_pages() or otherwise<br />+ disposed of. The pages will still have the refcount as<br />+ increased by isolate_lru_page()!<br />+<br />+B. Operation of migrate_pages()<br />+--------------------------------<br />+<br />+migrate_pages() does several passes over its list of pages. A page is moved<br />+if all references to the page are removable at that time.<br />+<br />+Steps:<br />+<br />+1. Lock the page to be migrated.<br />+<br />+2. Ensure that writeback is complete.<br />+<br />+3. Make sure that the page has an assigned swap cache entry if<br />+ it is an anonymous page. The swap cache reference is necessary<br />+ to preserve the information contained in the page table maps.<br />+<br />+4. Prep the new page that we want to move to. It is locked<br />+ and set to not being uptodate so that all accesses to the new<br />+ page immediately lock while we are moving references.<br />+<br />+5. All the page table references to the page are either dropped (file backed)<br />+ or converted to swap references (anonymous pages). This should decrease the<br />+ reference count.<br />+<br />+6. The radix tree lock is taken.<br />+<br />+7. 
The refcount of the page is examined and we back out if references remain;<br />+ otherwise we know that we are the only one referencing this page.<br />+<br />+8. The radix tree is checked and if it does not contain the pointer to this<br />+ page then we back out.<br />+<br />+9. The mapping is checked. If the mapping is gone then a truncate action may<br />+ be in progress and we back out.<br />+<br />+10. The new page is prepped with some settings from the old page so that accesses<br />+ to the new page will be discovered to have the correct settings.<br />+<br />+11. The radix tree is changed to point to the new page.<br />+<br />+12. The reference count of the old page is dropped because the reference has now<br />+ been removed.<br />+<br />+13. The radix tree lock is dropped.<br />+<br />+14. The page contents are copied to the new page.<br />+<br />+15. The remaining page flags are copied to the new page.<br />+<br />+16. The old page flags are cleared to indicate that the page no<br />+ longer holds any information.<br />+<br />+17. Queued-up writeback on the new page is triggered.<br />+<br />+18. The locks are dropped from the old and new page.<br />+<br />+19. The swapcache reference is removed from the new page.<br />+<br />+20. The new page is moved to the LRU.<br />+<br />+Christoph Lameter, November 29, 2005.<br />+<br />Index: linux-2.6.15-rc5/mm/vmscan.c<br />===================================================================<br />--- linux-2.6.15-rc5.orig/mm/vmscan.c 2005-12-05 11:15:24.000000000 -0800<br />+++ linux-2.6.15-rc5/mm/vmscan.c 2005-12-05 11:32:11.000000000 -0800<br />@@ -659,6 +659,164 @@ retry:<br /> return -EAGAIN;<br /> }<br /> /*<br />+ * Page migration was first developed in the context of the memory hotplug<br />+ * project. 
The main authors of the migration code are:<br />+ *<br />+ * IWAMOTO Toshihiro <iwamoto@valinux.co.jp><br />+ * Hirokazu Takahashi <taka@valinux.co.jp><br />+ * Dave Hansen <haveblue@us.ibm.com><br />+ * Christoph Lameter <clameter@sgi.com><br />+ */<br />+<br />+/*<br />+ * Remove references for a page and establish the new page with the correct<br />+ * basic settings to be able to stop accesses to the page.<br />+ */<br />+int migrate_page_remove_references(struct page *newpage, struct page *page, int nr_refs)<br />+{<br />+ struct address_space *mapping = page_mapping(page);<br />+ struct page **radix_pointer;<br />+ int i;<br />+<br />+ /*<br />+ * Avoid doing any of the following work if the page count<br />+ * indicates that the page is in use or truncate has removed<br />+ * the page.<br />+ */<br />+ if (!mapping || page_mapcount(page) + nr_refs != page_count(page))<br />+ return 1;<br />+<br />+ /*<br />+ * Establish swap ptes for anonymous pages or destroy pte<br />+ * maps for files.<br />+ *<br />+ * In order to reestablish file backed mappings the fault handlers<br />+ * will take the radix tree_lock which may then be used to stop<br />+ * processes from accessing this page until the new page is ready.<br />+ *<br />+ * A process accessing via a swap pte (an anonymous page) will take a<br />+ * page_lock on the old page which will block the process until the<br />+ * migration attempt is complete. At that time the PageSwapCache bit<br />+ * will be examined. If the page was migrated then the PageSwapCache<br />+ * bit will be clear and the operation to retrieve the page will be<br />+ * retried which will find the new page in the radix tree. 
Then a new<br />+ * direct mapping may be generated based on the radix tree contents.<br />+ *<br />+ * If the page was not migrated then the PageSwapCache bit<br />+ * is still set and the operation may continue.<br />+ */<br />+ for(i = 0; i < 10 && page_mapped(page); i++) {<br />+ int rc = try_to_unmap(page);<br />+<br />+ if (rc == SWAP_SUCCESS)<br />+ break;<br />+ /*<br />+ * If there are other runnable processes then running<br />+ * them may make it possible to unmap the page<br />+ */<br />+ schedule();<br />+ }<br />+<br />+ /*<br />+ * Give up if we were unable to remove all mappings.<br />+ */<br />+ if (page_mapcount(page))<br />+ return 1;<br />+<br />+ write_lock_irq(&mapping->tree_lock);<br />+<br />+ radix_pointer = (struct page **)radix_tree_lookup_slot(<br />+ &mapping->page_tree,<br />+ page_index(page));<br />+<br />+ if (!page_mapping(page) ||<br />+ page_count(page) != nr_refs ||<br />+ *radix_pointer != page) {<br />+ write_unlock_irq(&mapping->tree_lock);<br />+ return 1;<br />+ }<br />+<br />+ /*<br />+ * Now we know that no one else is looking at the page.<br />+ *<br />+ * Certain minimal information about a page must be available<br />+ * in order for other subsystems to properly handle the page if they<br />+ * find it through the radix tree update before we are finished<br />+ * copying the page.<br />+ */<br />+ get_page(newpage);<br />+ newpage->index = page_index(page);<br />+ if (PageSwapCache(page)) {<br />+ SetPageSwapCache(newpage);<br />+ set_page_private(newpage, page_private(page));<br />+ } else<br />+ newpage->mapping = page->mapping;<br />+<br />+ *radix_pointer = newpage;<br />+ __put_page(page);<br />+ write_unlock_irq(&mapping->tree_lock);<br />+<br />+ return 0;<br />+}<br />+<br />+/*<br />+ * Copy the page to its new location<br />+ */<br />+void migrate_page_copy(struct page *newpage, struct page *page)<br />+{<br />+ copy_highpage(newpage, page);<br />+<br />+ if (PageError(page))<br />+ SetPageError(newpage);<br 
/>+ if (PageReferenced(page))<br />+ SetPageReferenced(newpage);<br />+ if (PageUptodate(page))<br />+ SetPageUptodate(newpage);<br />+ if (PageActive(page))<br />+ SetPageActive(newpage);<br />+ if (PageChecked(page))<br />+ SetPageChecked(newpage);<br />+ if (PageMappedToDisk(page))<br />+ SetPageMappedToDisk(newpage);<br />+<br />+ if (PageDirty(page)) {<br />+ clear_page_dirty_for_io(page);<br />+ set_page_dirty(newpage);<br />+ }<br />+<br />+ ClearPageSwapCache(page);<br />+ ClearPageActive(page);<br />+ ClearPagePrivate(page);<br />+ set_page_private(page, 0);<br />+ page->mapping = NULL;<br />+<br />+ /*<br />+ * If any waiters have accumulated on the new page then<br />+ * wake them up.<br />+ */<br />+ if (PageWriteback(newpage))<br />+ end_page_writeback(newpage);<br />+}<br />+<br />+/*<br />+ * Common logic to directly migrate a single page suitable for<br />+ * pages that do not use PagePrivate.<br />+ *<br />+ * Pages are locked upon entry and exit.<br />+ */<br />+int migrate_page(struct page *newpage, struct page *page)<br />+{<br />+ BUG_ON(PageWriteback(page)); /* Writeback must be complete */<br />+<br />+ if (migrate_page_remove_references(newpage, page, 2))<br />+ return -EAGAIN;<br />+<br />+ migrate_page_copy(newpage, page);<br />+<br />+ return 0;<br />+}<br />+<br />+/*<br /> * migrate_pages<br /> *<br /> * Two lists are passed to this function. The first list<br />@@ -671,11 +829,6 @@ retry:<br /> * are movable anymore because to has become empty<br /> * or no retryable pages exist anymore.<br /> *<br />- * SIMPLIFIED VERSION: This implementation of migrate_pages<br />- * is only swapping out pages and never touches the second<br />- * list. 
The direct migration patchset<br />- * extends this function to avoid the use of swap.<br />- *<br /> * Return: Number of pages not migrated when "to" ran empty.<br /> */<br /> int migrate_pages(struct list_head *from, struct list_head *to,<br />@@ -696,6 +849,9 @@ redo:<br /> retry = 0;<br /> <br /> list_for_each_entry_safe(page, page2, from, lru) {<br />+ struct page *newpage = NULL;<br />+ struct address_space *mapping;<br />+<br /> cond_resched();<br /> <br /> rc = 0;<br />@@ -703,6 +859,9 @@ redo:<br /> /* page was freed from under us. So we are done. */<br /> goto next;<br /> <br />+ if (to && list_empty(to))<br />+ break;<br />+<br /> /*<br /> * Skip locked pages during the first two passes to give the<br /> * functions holding the lock time to release the page. Later we<br />@@ -739,12 +898,64 @@ redo:<br /> }<br /> }<br /> <br />+ if (!to) {<br />+ rc = swap_page(page);<br />+ goto next;<br />+ }<br />+<br />+ newpage = lru_to_page(to);<br />+ lock_page(newpage);<br />+<br /> /*<br />- * Page is properly locked and writeback is complete.<br />+ * Pages are properly locked and writeback is complete.<br /> * Try to migrate the page.<br /> */<br />- rc = swap_page(page);<br />- goto next;<br />+ mapping = page_mapping(page);<br />+ if (!mapping)<br />+ goto unlock_both;<br />+<br />+ /*<br />+ * Trigger writeout if page is dirty<br />+ */<br />+ if (PageDirty(page)) {<br />+ switch (pageout(page, mapping)) {<br />+ case PAGE_KEEP:<br />+ case PAGE_ACTIVATE:<br />+ goto unlock_both;<br />+<br />+ case PAGE_SUCCESS:<br />+ unlock_page(newpage);<br />+ goto next;<br />+<br />+ case PAGE_CLEAN:<br />+ ; /* try to migrate the page below */<br />+ }<br />+ }<br />+ /*<br />+ * If we have no buffer or can release the buffer<br />+ * then do a simple migration.<br />+ */<br />+ if (!page_has_buffers(page) ||<br />+ try_to_release_page(page, GFP_KERNEL)) {<br />+ rc = migrate_page(newpage, page);<br />+ goto unlock_both;<br />+ }<br />+<br />+ /*<br />+ * On early 
passes with mapped pages simply<br />+ * retry. There may be a lock held for some<br />+ * buffers that may go away. Later<br />+ * swap them out.<br />+ */<br />+ if (pass > 4) {<br />+ unlock_page(newpage);<br />+ newpage = NULL;<br />+ rc = swap_page(page);<br />+ goto next;<br />+ }<br />+<br />+unlock_both:<br />+ unlock_page(newpage);<br /> <br /> unlock_page:<br /> unlock_page(page);<br />@@ -757,7 +968,10 @@ next:<br /> list_move(&page->lru, failed);<br /> nr_failed++;<br /> } else {<br />- /* Success */<br />+ if (newpage)<br />+ /* Successful migration. Return new page to LRU */<br />+ move_to_lru(newpage);<br />+<br /> list_move(&page->lru, moved);<br /> }<br /> }<br />-<br />To unsubscribe from this list: send the line "unsubscribe linux-kernel" in<br />the body of a message to majordomo@vger.kernel.org<br />More majordomo info at <a href="http://vger.kernel.org/majordomo-info.html">http://vger.kernel.org/majordomo-info.html</a><br />Please read the FAQ at <a href="http://www.tux.org/lkml/">http://www.tux.org/lkml/</a><br /></pre></td><td width="32" rowspan="2" class="c" valign="top"><img src="/images/icornerr.gif" width="32" height="32" alt="\" /></td></tr><tr><td align="right" valign="bottom"> &nbsp; </td></tr><tr><td align="right" valign="bottom">&nbsp;</td><td class="c" valign="bottom" style="padding-bottom: 0px"><img src="/images/bcornerl.gif" width="32" height="32" alt="\" /></td><td class="c">&nbsp;</td><td class="c" valign="bottom" style="padding-bottom: 0px"><img src="/images/bcornerr.gif" width="32" height="32" alt="/" /></td></tr><tr><td align="right" valign="top" colspan="2"> &nbsp; </td><td class="lm">Last update: 2005-12-05 20:55 &nbsp;&nbsp; [from the cache]<br />&copy;2003-2020 <a href="http://blog.jasper.es/"><span itemprop="editor">Jasper Spaans</span></a>|hosted at <a href="https://www.digitalocean.com/?refcode=9a8e99d24cf9">Digital Ocean</a> and my Meterkast|<a href="http://blog.jasper.es/categories.html#lkml-ref">Read the blog</a></td><td>&nbsp;</td></tr></table></body></html>