LKML: Linus Torvalds: Re: more git updates..

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>LKML: Linus Torvalds: Re: more git updates..</title><link href="/css/message.css" rel="stylesheet" type="text/css" /><link href="/css/wrap.css" rel="alternate stylesheet" type="text/css" title="wrap" /><link href="/css/nowrap.css" rel="stylesheet" type="text/css" title="nowrap" /><link href="/favicon.ico" rel="shortcut icon" /><script src="/js/simple-calendar.js" type="text/javascript"></script><script src="/js/styleswitcher.js" type="text/javascript"></script><link rel="alternate" type="application/rss+xml" title="lkml.org : last 100 messages" href="/rss.php" /><link rel="alternate" type="application/rss+xml" title="lkml.org : last messages by Linus Torvalds" href="/groupie.php?aid=1" /><!--Matomo--><script> var _paq = window._paq = window._paq || []; /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ _paq.push(["setDoNotTrack", true]); _paq.push(["disableCookies"]); _paq.push(['trackPageView']); _paq.push(['enableLinkTracking']); (function() { var u="//m.lkml.org/"; _paq.push(['setTrackerUrl', u+'matomo.php']); _paq.push(['setSiteId', '1']); var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); })(); </script><!--End Matomo Code--></head><body onload="es.jasper.simpleCalendar.init();" itemscope="itemscope" itemtype="http://schema.org/BlogPosting"><table border="0" cellpadding="0" cellspacing="0"><tr><td width="180" align="center"><a href="/"><img style="border:0;width:135px;height:32px" src="/images/toprowlk.gif" alt="lkml.org" /></a></td><td width="32">&#160;</td><td class="nb"><div><a class="nb" href="/lkml"> [lkml]</a> &#160; <a class="nb" 
href="/lkml/2005"> [2005]</a> &#160; <a class="nb" href="/lkml/2005/4"> [Apr]</a> &#160; <a class="nb" href="/lkml/2005/4/9"> [9]</a> &#160; <a class="nb" href="/lkml/last100"> [last100]</a> &#160; <a href="/rss.php"><img src="/images/rss-or.gif" border="0" alt="RSS Feed" /></a></div><div>Views: <a href="#" class="nowrap" onclick="setActiveStyleSheet('wrap');return false;">[wrap]</a><a href="#" class="wrap" onclick="setActiveStyleSheet('nowrap');return false;">[no wrap]</a> &#160; <a class="nb" href="/lkml/mheaders/2005/4/9/109" onclick="this.href='/lkml/headers'+'/2005/4/9/109';">[headers]</a>&#160; <a href="/lkml/bounce/2005/4/9/109">[forward]</a>&#160; </div></td><td width="32">&#160;</td></tr><tr><td valign="top"><div class="es-jasper-simpleCalendar" baseurl="/lkml/"></div><div class="threadlist">Messages in this thread</div><ul class="threadlist"><li class="root"><a href="/lkml/2005/4/9/103">First message in thread</a></li><li><a href="/lkml/2005/4/9/103">Linus Torvalds</a><ul><li><a href="/lkml/2005/4/9/104">Linus Torvalds</a></li><li><a href="/lkml/2005/4/9/105">Petr Baudis</a><ul><li class="origin"><a href="/lkml/2005/4/9/110">Linus Torvalds</a><ul><li><a href="/lkml/2005/4/9/110">Linus Torvalds</a><ul><li><a href="/lkml/2005/4/9/136">Linus Torvalds</a></li></ul></li><li><a href="/lkml/2005/4/9/150">Paul Jackson</a><ul><li><a href="/lkml/2005/4/9/152">Paul Jackson</a></li></ul></li><li><a href="/lkml/2005/4/9/151">Paul Jackson</a></li><li><a href="/lkml/2005/4/10/8">Junio C Hamano</a><ul><li><a href="/lkml/2005/4/10/13">Christopher Li</a></li><li><a href="/lkml/2005/4/10/37">Rutger Nijlunsing</a></li><li><a href="/lkml/2005/4/10/70">Linus Torvalds</a></li></ul></li><li><a href="/lkml/2005/4/10/44"> tony.luck&#64;intel ...</a><ul><li><a href="/lkml/2005/4/10/71">Linus Torvalds</a></li><li><a href="/lkml/2005/4/10/103">Paul Jackson</a></li></ul></li></ul></li></ul></li><li><a href="/lkml/2005/4/9/114">Paul Jackson</a></li><li><a href="/lkml/2005/4/9/134">Ralph Corderoy</a><ul><li><a 
href="/lkml/2005/4/9/143">Paul Jackson</a><ul><li><a href="/lkml/2005/4/9/145">Bernd Eckenfels</a><ul><li><a href="/lkml/2005/4/9/146">Paul Jackson</a></li></ul></li><li><a href="/lkml/2005/4/10/29">Ralph Corderoy</a><ul><li><a href="/lkml/2005/4/10/83">Paul Jackson</a></li></ul></li></ul></li></ul></li><li><a href="/lkml/2005/4/10/82">Rik van Riel</a><ul><li><a href="/lkml/2005/4/10/85">Ingo Molnar</a></li></ul></li><li><a href="/lkml/2005/4/11/154"> ross&#64;lug ...</a></li></ul></li></ul></td><td width="32" rowspan="2" class="c" valign="top"><img src="/images/icornerl.gif" width="32" height="32" alt="/" /></td><td class="c" rowspan="2" valign="top" style="padding-top: 1em"><table><tr><td><table><tr><td class="lp">Date</td><td class="rp" itemprop="datePublished">Sat, 9 Apr 2005 14:00:09 -0700 (PDT)</td></tr><tr><td class="lp">From</td><td class="rp" itemprop="author">Linus Torvalds &lt;&gt;</td></tr><tr><td class="lp">Subject</td><td class="rp" itemprop="name">Re: more git updates..</td></tr></table></td><td></td></tr></table><pre itemprop="articleBody"><br /><br />On Sat, 9 Apr 2005, Petr Baudis wrote:<br />&gt; <br />&gt; &gt; Also, I wrote the "diff-tree" thing I talked about: <br />&gt; ..snip..<br />&gt; <br />&gt; Hmm, I wonder, is this better done in C instead of a simple shell<br />&gt; script, like my gitdiff.sh?<br /><br />With 17,000 files in the kernel, and most commits just changing a small <br />number of them, I actually think "diff-tree" matters. You use "join" <br />(which is quite reasonable), but let's put it this way: just the list of <br />files in the current kernel is about half a megabyte of data. Ie your <br />temporary files that you use in the "ls-tree + ls-tree + join" is actually <br />going to be quite sizeable.<br /><br />My goal here is that the speed of "git" really should be almost totally<br />independent of the size of the project. 
You clearly cannot avoid _some_ <br />size-dependency: my "diff-tree" clearly also has to work through the same <br />1MB of data, but I think it's worth making the constant factor be as small <br />as humanly possible.<br /><br />I just tried checking in a kernel tree tar-file, and the initial checkin <br />(which is all the compression and the sha1 calculations for every single <br />file) took about 1:35 (minutes, not hours ;).<br /><br />Doing a commit (trivial change to the top-level Makefile) and then doing a <br />"treediff" between those two things took 0.05 seconds using my C thing. Ie <br />we're talking so fast that we really don't care.<br /><br />Doing a "show-diff" takes 0.15 secs or so (that's all the "stat" calls), <br />and now that I test it out I realize that the most expensive operation is <br />actually _writing_ the "index" file out. These are the two most expensive <br />steps:<br /><br /> torvalds&#64;ppc970:~/lx-test/linux-2.6.12-rc2&gt; time update-cache Makefile<br /><br /> real 0m0.283s<br /> user 0m0.171s<br /> sys 0m0.113s<br /><br /><br /> torvalds&#64;ppc970:~/lx-test/linux-2.6.12-rc2&gt; time write-tree<br /> 5ca21c9d808fa4bee1eb6948a59dfb9c7d73f36a<br /> <br /> real 0m0.441s<br /> user 0m0.354s<br /> sys 0m0.087s<br /><br />ie with the current infrastructure it looks like I can do a "patch + <br />commit" in less than one second on the kernel, and 0.75 secs of that is <br />because the "tree" file actually grows pretty large:<br /><br /> cat-file tree 5ca21c9d808fa4bee1eb6948a59dfb9c7d73f36a | wc -c <br /><br />says that the uncompressed tree-file is 950,874 bytes. 
Compressing it <br />means that the archival version of it is "just" 462,546 bytes, but this is <br />really the part that is going to eat _tons_ of disk-space.<br /><br />In other words, each "commit" file is very small and cheap, but since <br />almost every commit will also imply a totally new tree-file, "git" is <br />going to have an overhead of half a megabyte per commit. Oops.<br /><br />Damn, that's painful. I suspect I will have to change the format somehow.<br /><br />One option (which I haven't tested yet) is that since the tree-file is <br />already sorted, I could always write it out with the common subdirectory <br />part "collapsed", ie instead of writing<br /><br /> ...<br /> include/asm-i386/mach-default/bios_ebda.h<br /> include/asm-i386/mach-default/do_timer.h<br /> ...<br /><br />I'd write just<br /><br /> ...<br /> ///bios_ebda.h<br /> ///do_timer.h<br /> ...<br /><br />since the directory names are implied by the predecessor.<br /><br />However, that doesn't help with the 20-byte sha1 associated with each<br />file, which is also obviously uncompressible, so with 17,000+ files, we<br />have a minimum overhead of about 350kB per tree-file.<br /><br />So even if I did the pathname compression, it wouldn't help all that much. <br />I'd only be removing the only part of the file that _is_ very<br />compressible, and I'd probably end up with something that isn't all that<br />far away from the 450kB+ it is now.<br /><br />I suspect that I have to change the file format. Maybe make the "tree" <br />object a two-level thing, and have a "directory" object.<br /><br />Then a "tree" object would point to a "directory" object, which would in<br />turn point to the individual files (and other "directory" objects, of<br />course). 
That way a commit that only changes a few files will only need to<br />create a few new "directory" objects, instead of creating one huge "tree"<br />object.<br /><br />Sadly, that will make "tree-diff" potentially more expensive. On the other<br />hand, maybe not: it will also speed it _up_, since directories that are<br />totally shared will be trivially seen as such and need no further<br />operation.<br /><br />Thoughts? That would break the current repository formats, and I'd have to <br />create a converter thing (which shouldn't be that bad, of course).<br /><br />I don't have to do it right now. In fact, I'd almost prefer for the<br />current thing to become good enough that it's not painful to work with,<br />since right now I'm using it to develop itself. Then I can convert the<br />format with an automated script later, before I actually start working on<br />the kernel...<br /><br />&gt; BTW, do we care about changed modes? If so, they should probably have<br />&gt; their place in the diff-tree output.<br /><br />They're there. 
If you want to ignore them, you can just notice that the <br />sha1 matches between two lines, and then you don't even have to diff them.<br /><br /> Linus<br />-<br />To unsubscribe from this list: send the line "unsubscribe linux-kernel" in<br />the body of a message to majordomo&#64;vger.kernel.org<br />More majordomo info at <a href="http://vger.kernel.org/majordomo-info.html">http://vger.kernel.org/majordomo-info.html</a><br />Please read the FAQ at <a href="http://www.tux.org/lkml/">http://www.tux.org/lkml/</a><br /><br /></pre></td><td width="32" rowspan="2" class="c" valign="top"><img src="/images/icornerr.gif" width="32" height="32" alt="\" /></td></tr><tr><td align="right" valign="bottom"> &#160; </td></tr><tr><td align="right" valign="bottom">&#160;</td><td class="c" valign="bottom" style="padding-bottom: 0px"><img src="/images/bcornerl.gif" width="32" height="32" alt="\" /></td><td class="c">&#160;</td><td class="c" valign="bottom" style="padding-bottom: 0px"><img src="/images/bcornerr.gif" width="32" height="32" alt="/" /></td></tr><tr><td align="right" valign="top" colspan="2"> &#160; </td><td class="lm">Last update: 2009-11-18 23:46 &#160;&#160; [from the cache]<br />&#169;2003-2020 <a href="http://blog.jasper.es/"><span itemprop="editor">Jasper Spaans</span></a>|hosted at <a href="https://www.digitalocean.com/?refcode=9a8e99d24cf9">Digital Ocean</a> and my Meterkast|<a href="http://blog.jasper.es/categories.html#lkml-ref">Read the blog</a></td><td>&#160;</td></tr></table><script language="javascript" src="/js/styleswitcher.js" type="text/javascript"></script></body></html>
