
LKML: Roman Zippel: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>LKML: Roman Zippel: Re: [patch 00/21] hrtimer - High-resolution timer subsystem</title><link href="/css/message.css" rel="stylesheet" type="text/css" /><link href="/css/wrap.css" rel="alternate stylesheet" type="text/css" title="wrap" /><link href="/css/nowrap.css" rel="stylesheet" type="text/css" title="nowrap" /><link href="/favicon.ico" rel="shortcut icon" /><script src="/js/simple-calendar.js" type="text/javascript"></script><script src="/js/styleswitcher.js" type="text/javascript"></script><link rel="alternate" type="application/rss+xml" title="lkml.org : last 100 messages" href="/rss.php" /><link rel="alternate" type="application/rss+xml" title="lkml.org : last messages by Roman Zippel" href="/groupie.php?aid=23" /><!--Matomo--><script> var _paq = window._paq = window._paq || []; /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ _paq.push(["setDoNotTrack", true]); _paq.push(["disableCookies"]); _paq.push(['trackPageView']); _paq.push(['enableLinkTracking']); (function() { var u="//m.lkml.org/"; _paq.push(['setTrackerUrl', u+'matomo.php']); _paq.push(['setSiteId', '1']); var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); })(); </script><!--End Matomo Code--></head><body onload="es.jasper.simpleCalendar.init();" itemscope="itemscope" itemtype="http://schema.org/BlogPosting"><table border="0" cellpadding="0" cellspacing="0"><tr><td width="180" align="center"><a href="/"><img style="border:0;width:135px;height:32px" src="/images/toprowlk.gif" alt="lkml.org" /></a></td><td width="32">&#160;</td><td class="nb"><div><a class="nb" 
href="/lkml"> [lkml]</a> &#160; <a class="nb" href="/lkml/2005"> [2005]</a> &#160; <a class="nb" href="/lkml/2005/12"> [Dec]</a> &#160; <a class="nb" href="/lkml/2005/12/14"> [14]</a> &#160; <a class="nb" href="/lkml/last100"> [last100]</a> &#160; <a href="/rss.php"><img src="/images/rss-or.gif" border="0" alt="RSS Feed" /></a></div><div>Views: <a href="#" class="nowrap" onclick="setActiveStyleSheet('wrap');return false;">[wrap]</a><a href="#" class="wrap" onclick="setActiveStyleSheet('nowrap');return false;">[no wrap]</a> &#160; <a class="nb" href="/lkml/mheaders/2005/12/14/269" onclick="this.href='/lkml/headers'+'/2005/12/14/269';">[headers]</a>&#160; <a href="/lkml/bounce/2005/12/14/269">[forward]</a>&#160; </div></td><td width="32">&#160;</td></tr><tr><td valign="top"><div class="es-jasper-simpleCalendar" baseurl="/lkml/"></div><div class="threadlist">Messages in this thread</div><ul class="threadlist"><li class="root"><a href="/lkml/2005/12/5/313">First message in thread</a></li><li><a href="/lkml/2005/12/12/85">Roman Zippel</a><ul><li><a href="/lkml/2005/12/12/122">Thomas Gleixner</a><ul><li><a href="/lkml/2005/12/12/154">Thomas Gleixner</a></li><li><a href="/lkml/2005/12/12/278">George Anzinger</a><ul><li><a href="/lkml/2005/12/13/89">Thomas Gleixner</a></li><li><a href="/lkml/2005/12/14/406">Roman Zippel</a><ul><li><a href="/lkml/2005/12/14/426">George Anzinger</a></li></ul></li></ul></li><li class="origin"><a href="/lkml/2005/12/14/299">Roman Zippel</a><ul><li><a href="/lkml/2005/12/14/299">Thomas Gleixner</a><ul><li><a href="/lkml/2005/12/14/394">George Anzinger</a></li><li><a href="/lkml/2005/12/15/151">Steven Rostedt</a></li><li><a href="/lkml/2005/12/19/62">Roman Zippel</a></li></ul></li></ul></li></ul></li></ul></li></ul></td><td width="32" rowspan="2" class="c" valign="top"><img src="/images/icornerl.gif" width="32" height="32" alt="/" /></td><td class="c" rowspan="2" valign="top" style="padding-top: 1em"><table><tr><td><table><tr><td class="lp">Date</td><td class="rp" 
itemprop="datePublished">Wed, 14 Dec 2005 21:48:59 +0100 (CET)</td></tr><tr><td class="lp">From</td><td class="rp" itemprop="author">Roman Zippel &lt;&gt;</td></tr><tr><td class="lp">Subject</td><td class="rp" itemprop="name">Re: [patch 00/21] hrtimer - High-resolution timer subsystem</td></tr></table></td><td></td></tr></table><pre itemprop="articleBody">Hi,<br /><br />On Mon, 12 Dec 2005, Thomas Gleixner wrote:<br /><br />&gt; For the high resolution implementation we have to move the expired<br />&gt; timers to a separate list, as we do not want to do complex callback<br />&gt; functions from the event interrupt itself. But we have to reprogram the<br />&gt; next event interrupt, so we need simple access to the timer which<br />&gt; expires first.<br />&gt; <br />&gt; The initial implementation did simply move the timer from the pending<br />&gt; list to the expired list without doing the rb_tree removal inside of the<br />&gt; event interrupt handler. That way the next event for reprogramming was<br />&gt; the first event in the pending list.<br />&gt; <br />&gt; The new rebased version with the pending list removed does the rb_tree<br />&gt; removal inside the event interrupt and enqueues the timer, for which the<br />&gt; callback function has to be executed in the softirq, to the expired<br />&gt; list. One exception are simple wakeup callback functions, as they are<br />&gt; reasonably fast and we save two context switches. The next event for<br />&gt; reprogramming the event interrupt is retrieved by the pointer in the<br />&gt; base structure.<br />&gt; <br />&gt; This way the list head is only necessary for the high resolution case.<br /><br />Thanks for the explanation. If it's just for reprogramming the interrupt, <br />it should be cheaper to just check the rbtree than walk the list to find <br />the next expiration time (at least theoretically). 
This leaves only <br />optimizations for rt kernel and from the base kernel point of view I <br />prefer the immediate space savings.<br /><br />&gt; The state field is not removed because I'm not a big fan of those<br />&gt; overloaded fields and I prefer to pay the 4 byte penalty for the<br />&gt; separation.<br />&gt; Of course if there is the absolute requirement to reduce the size, I'm<br />&gt; not insisting on keeping it.<br /><br />Well, I'm not a big fan of redundant state information, e.g. the pending <br />information can be included in the rb_node (it's not quite as simple as <br />with the timer_list, but it's the same thing). The expired information <br />(together with the data field) is an optimization for simple sleeps that <br />AFAICT only makes a difference in the rt kernel (the saved context switch <br />you mentioned above). What makes me more uncomfortable is that this is a <br />special case optimization and other callbacks are probably fast as well <br />(e.g. wakeup + timer restart).<br /><br />I can understand you want to keep the difference to the rt kernel small, <br />but for me it's more about immediate benefits against uncertain long term <br />benefits.<br /><br />&gt; &gt; The rationale for example talks about "a periodic timer with an absolute <br />&gt; &gt; _initial_ expiration time", so I could also construct a valid example with <br />&gt; &gt; this expectation. The more I read the spec the more I think the current <br />&gt; &gt; behaviour is not correct, e.g. that ABS_TIME is only relevant for <br />&gt; &gt; it_value.<br />&gt; &gt; So I'm interested in specific interpretations of the spec which support <br />&gt; &gt; the current behaviour.<br />&gt; <br />&gt; Unfortunately you find just the spec all over the place. 
I fear we have<br />&gt; to find and agree on an interpretation ourselves.<br />&gt; <br />&gt; I agree, that the restriction to the initial it_value is definitely<br />&gt; something you can read out of the spec. But it does not make a lot of<br />&gt; sense for me. Also the restriction to TIMER_ABSTIME is somehow strange<br />&gt; as it converts a CLOCK_REALTIME timer to a CLOCK_MONOTONIC timer. I<br />&gt; never understood the rationale behind that.<br /><br />As George already said, it's easier to keep these clocks separate. I think <br />the spec rationale is also more clear about the intended usage. About <br />timers it says: <br /><br />"Two timer types are required for a system to support realtime <br />applications:<br /><br />1. One-shot<br />...<br /><br />2. Periodic<br />..."<br /><br />Basically you have two independent timer types. It's quite explicit <br />that only the "initial timer expiration" can be relative or absolute. It <br />doesn't say anywhere that there are relative and absolute periodic timers, <br />all references to "absolute" or "relative" are only in connection with the <br />initial expiration time and after the initial expiration, it becomes a <br />periodic timer. At every timer expiration the timer is reloaded with a <br />relative time interval.<br />I can understand that you find this behaviour useful (although other <br />people may disagree) and the spec doesn't explicitly say that you must <br />not do this, but I really can't convince myself that this is the <br />_intended_ behaviour.<br /><br />&gt; &gt; Since you don't do any rounding at all anymore, your timer may now expire <br />&gt; &gt; early with low resolution clocks (the old jiffies + 1 problem I explained <br />&gt; &gt; in my ktime_t patch).<br />&gt; <br />&gt; It does not expire early. The timer-&gt;expires field is still compared<br />&gt; against now. <br /><br />I don't think that's enough (unless I missed something). 
Steven maybe <br />explained it better than I did in<br />http://marc.theaimsgroup.com/?l=linux-kernel&amp;m=113047529313935<br /><br />Even if you set the timer resolution to 1 nsec, there is still the <br />resolution of the actual hardware clock and it has to be taken into <br />account somewhere when you start a relative timer. Even if the clock <br />resolution is usually higher than the normal latency, so the problem won't <br />be visible for most people, the general timer code should take this into <br />account. If someone doesn't care about high resolution timer, he can still <br />use it with a low resolution clock (e.g. jiffies) and then this becomes a <br />problem.<br /><br />&gt; &gt; But this is now exactly the behaviour your timer has, why is it not <br />&gt; &gt; "surprising" anymore?<br />&gt; <br />&gt; Yes, we wrote that before. After reconsidering the results we came to<br />&gt; the conclusion, that we actually don't need the rounding at all because<br />&gt; the uneven distance is equally surprising as the summing up errors<br />&gt; introduced by rounding.<br /><br />I don't think that the summing up error is surprising at all, the spec is <br />quite clear that the time values have to be rounded up to the resolution <br />of the timer and it's also the behaviour of the current timer.<br />This error is actually the expected behaviour for any timer with a <br />resolution different from 1 nsec. I don't want to say that we can't have <br />such a timer, but I'm not so sure whether this should be the default <br />behaviour. I actually prefer George's earlier suggestion of CLOCK_REALTIME <br />and CLOCK_REALTIME_HR, where one is possibly faster and the other is more <br />precise. 
Even within the kernel I would prefer to map itimer and nanosleep <br />to the first clock (maybe also based on arch/kconfig defaults).<br />OTOH if the hardware allows it, both clocks can do the same thing, but I <br />really would like to have the possibility to give higher (and thus <br />possibly more expensive) resolution only to those asking for it.<br /><br />&gt; &gt; I don't mind changing the behaviour, but I would prefer to do this in a <br />&gt; &gt; separate step and with an analysis of the possible consequences. This is <br />&gt; &gt; not just about posix-timers, but it also affects itimers, nanosleep and <br />&gt; &gt; possibly other systems in the future. Actually my main focus is not on HR <br />&gt; &gt; posix timer, my main interest is that everything else keeps working and <br />&gt; &gt; doesn't have to pay the price for it.<br />&gt; <br />&gt; While my focus is a clean merging of high resolution timers without<br />&gt; breaking the non hrt case, I still believe that we can find a solution,<br />&gt; where we can have the base implementation without any reference to<br />&gt; jiffies.<br /><br />This is what I think requires the better clock abstraction, most of it is <br />related to the clock resolution, the generic timer code currently has no <br />idea of the real resolution of the underlying clock, so I assumed a worst <br />case of TICK_NSEC everywhere.<br /><br />&gt; I try to compare and contrast the two possible solutions:<br />&gt; <br />&gt; Rounding the initial expiry time and the interval results in a summing<br />&gt; up error, which depends on the delta of the interval and the<br />&gt; resolution. <br />&gt; <br />&gt; The non rounding solution results in a summing up error for intervals<br />&gt; which are less than the resolution. 
For intervals &gt;= resolution no<br />&gt; summing up error is happening, but for intervals, which are not a<br />&gt; multiple of the resolution, an uneven interval as close as possible to<br />&gt; the timeline is delivered.<br />&gt; <br />&gt; In both cases the timers never expire early and I think both variants<br />&gt; are compliant with the specification.<br /><br />What I'd like to avoid is that we have to commit ourselves to only one <br />solution right now. I think the first solution is better suited to the low <br />resolution timer that we have right now. The second solution requires a <br />better clock framework - this includes better time keeping and <br />reprogrammable timer interrupts.<br />At this point I wouldn't like to settle on just one solution, I'd have to <br />see more of the infrastructure integrated before doing this. At this point <br />I see more advantages in having a choice (may it be as Kconfig or even a <br />runtime option).<br /><br />bye, Roman<br />-<br />To unsubscribe from this list: send the line "unsubscribe linux-kernel" in<br />the body of a message to majordomo&#64;vger.kernel.org<br />More majordomo info at <a href="http://vger.kernel.org/majordomo-info.html">http://vger.kernel.org/majordomo-info.html</a><br />Please read the FAQ at <a href="http://www.tux.org/lkml/">http://www.tux.org/lkml/</a><br /><br /></pre></td><td width="32" rowspan="2" class="c" valign="top"><img src="/images/icornerr.gif" width="32" height="32" alt="\" /></td></tr><tr><td align="right" valign="bottom"> &#160; </td></tr><tr><td align="right" valign="bottom">&#160;</td><td class="c" valign="bottom" style="padding-bottom: 0px"><img src="/images/bcornerl.gif" width="32" height="32" alt="\" /></td><td class="c">&#160;</td><td class="c" valign="bottom" style="padding-bottom: 0px"><img src="/images/bcornerr.gif" width="32" height="32" alt="/" /></td></tr><tr><td align="right" valign="top" colspan="2"> &#160; </td><td class="lm">Last update: 2005-12-14 21:51 &#160;&#160; [from 
the cache]<br />©2003-2020 <a href="http://blog.jasper.es/"><span itemprop="editor">Jasper Spaans</span></a>|hosted at <a href="https://www.digitalocean.com/?refcode=9a8e99d24cf9">Digital Ocean</a> and my Meterkast|<a href="http://blog.jasper.es/categories.html#lkml-ref">Read the blog</a></td><td>&#160;</td></tr></table><script language="javascript" src="/js/styleswitcher.js" type="text/javascript"></script></body></html>
