
[Python-Dev] Reiterability

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE> [Python-Dev] Reiterability </TITLE>
<LINK REL="Index" HREF="index.html" >
<LINK REL="made" HREF="mailto:python-dev%40python.org?Subject=%5BPython-Dev%5D%20Reiterability&In-Reply-To=200310181120.45477.aleaxit%40yahoo.com">
<META NAME="robots" CONTENT="index,nofollow">
<META http-equiv="Content-Type" content="text/html; charset=us-ascii">
<LINK REL="Previous" HREF="038955.html">
<LINK REL="Next" HREF="038975.html">
</HEAD>
<BODY BGCOLOR="#ffffff">
<H1>[Python-Dev] Reiterability</H1>
<B>Guido van Rossum</B>
<A HREF="mailto:python-dev%40python.org?Subject=%5BPython-Dev%5D%20Reiterability&In-Reply-To=200310181120.45477.aleaxit%40yahoo.com" TITLE="[Python-Dev] Reiterability">guido at python.org </A><BR>
<I>Sat Oct 18 13:17:40 EDT 2003</I>
<P><UL>
<LI>Previous message: <A HREF="038955.html">[Python-Dev] generator comprehension syntax, was: accumulator display syntax </A></li>
<LI>Next message: <A HREF="038975.html">[Python-Dev] Re: Reiterability </A></li>
<LI> <B>Messages sorted by:</B> <a href="date.html#38969">[ date ]</a> <a href="thread.html#38969">[ thread ]</a> <a href="subject.html#38969">[ subject ]</a> <a href="author.html#38969">[ author ]</a> </LI>
</UL>
<HR>
<!--beginarticle-->
<PRE>[Guido]
&gt;<i> &gt;Oh, no. Not reiterability again. How can you promise something to be
</I>&gt;<i> &gt;reiterable if you don't know whether the underlying iterator can be
</I>&gt;<i> &gt;reiterated? Keeping a hidden buffer would be a bad idea.
</I>
[Alex]
&gt;<i> I agree it would be bad to have &quot;black magic&quot; performed by every
</I>&gt;<i> iterator to fulfil a contract that may or may not be useful to
</I>&gt;<i> clients and might be costly to fulfil.
</I>&gt;<i>
</I>&gt;<i> IF &quot;reiterability&quot; is useful (and I'd need to see some use cases,
</I>&gt;<i> because I don't particularly recall pining for it in Python) it
</I>&gt;<i> should be exposed as a separate protocol that may or may not be
</I>&gt;<i> offered by any given iterator type. E.g., the presence of a special
</I>&gt;<i> method __reiter__ could indicate that this iterator IS able to
</I>&gt;<i> supply another iterator which retraces the same steps from the
</I>&gt;<i> start; and perhaps iter(xxx, reiterable=True) could strive to
</I>&gt;<i> provide a reiterable iterator for xxx, which might justify building
</I>&gt;<i> one that keeps a hidden buffer as a last resort. But first, I'd
</I>&gt;<i> like use cases...
</I>
In cases where reiterability can be implemented without much effort,
there is already an underlying object representing the sequence
(e.g. a collection object, or an object defining a numerical series).
Reiteration comes for free if you hold on to that underlying object
rather than passing an iterator over it around.

[Phillip]
&gt;<i> I think I phrased my question poorly. What I should have said was:
</I>&gt;<i>
</I>&gt;<i> &quot;Should iterator expressions preserve the reiterability of the base
</I>&gt;<i> expression?&quot;
</I>
(An iterator expression being something like (f(x) for x in S), right?)

&gt;<i> I don't want to make them guarantee reiterability, only to preserve
</I>&gt;<i> it if it already exists. Does that make more sense?
</I>&gt;<i>
</I>&gt;<i> In essence, this would be done by having an itercomp expression
</I>&gt;<i> resolve to an object whose __iter__ method calls the underlying
</I>&gt;<i> generator, returning a generator-iterator. Thus, any iteration over
</I>&gt;<i> the itercomp is equivalent to calling a no-arguments generator. The
</I>&gt;<i> result is reiterable if the base iterable is reiterable, otherwise
</I>&gt;<i> not.
</I>
OK, I think I understand what you're after.
The code for an iterator expression has to create a generator function
behind the scenes, and call it. For example:

    A = (f(x) for x in S)

could be translated into:

    def gen(seq):
        for x in seq:
            yield f(x)
    A = gen(S)

(Note that S could be an arbitrary expression and should be evaluated
only once. This translation does that correctly.)

This allows one to iterate once over A (a generator-iterator doesn't
allow reiteration). What you are asking for looks like it could be done
like this (never mind the local names):

    def gen(seq):
        for x in seq:
            yield f(x)
    class Helper:
        def __init__(self, seq):
            self.seq = seq
        def __iter__(self):
            return gen(self.seq)
    A = Helper(S)

Then every time you use iter(A), gen() will be called with the saved
value of S as argument.

&gt;<i> I suppose technically, this means the itercomp doesn't return an
</I>&gt;<i> iterator, but an iterable, which I suppose could be confusing if you
</I>&gt;<i> try to call its 'next()' method. But then, it could have a next()
</I>&gt;<i> method that raises an error saying &quot;call 'iter()' on me first&quot;.
</I>
I don't mind that so much, but I don't think all the extra machinery
is worth it; the compiler generally can't tell if it is needed so it
has to produce the reiterable code every time. If you *want* to
have an iterable instead of an iterator, it's usually easy enough to do
(especially given knowledge about the type of S).

[Alex again]
&gt;<i> There ARE other features I'd REALLY have liked to get from iterators
</I>&gt;<i> in some applications.
</I>&gt;<i>
</I>&gt;<i> A &quot;snapshot&quot; -- providing me two iterators, the original one and
</I>&gt;<i> another, which will step independently over the same sequence of
</I>&gt;<i> items -- would have been really handy at times. And a &quot;step back&quot;
</I>&gt;<i> facility (&quot;undo&quot; of the last call to next) -- sometimes one level
</I>&gt;<i> would suffice, sometimes not; often I could have provided the item
</I>&gt;<i> to be &quot;pushed back&quot; so the iterator need not retain memory of it
</I>&gt;<i> independently, but that wouldn't always be handy. Now any of these
</I>&gt;<i> can be built as a wrapper over an existing iterator, of course --
</I>&gt;<i> just like 'reiterability' could (and you could in fact easily
</I>&gt;<i> implement reiterability in terms of snapshotting, by just ensuring a
</I>&gt;<i> snapshot is taken at the start and further snapshotted but never
</I>&gt;<i> disturbed); but not knowing the abilities of the underlying iterator
</I>&gt;<i> would mean these wrappers would often duplicate functionality
</I>&gt;<i> needlessly.
</I>
I don't see how it can be done without an explicit request for such a
wrapper in the calling code. If the underlying iterator is ephemeral
(is not reiterable) the snapshotter has to save a copy of every item,
and that would defeat the purpose of iterators if it was done
automatically. Or am I misunderstanding?

&gt;<i> E.g.:
</I>&gt;<i>
</I>&gt;<i> class snapshottable_sequence_iter(object):
</I>&gt;<i>     def __init__(self, sequence, i=0):
</I>&gt;<i>         self.sequence = sequence
</I>&gt;<i>         self.i = i
</I>&gt;<i>     def __iter__(self): return self
</I>&gt;<i>     def next(self):
</I>&gt;<i>         try: result = self.sequence[self.i]
</I>&gt;<i>         except IndexError: raise StopIteration
</I>&gt;<i>         self.i += 1
</I>&gt;<i>         return result
</I>&gt;<i>     def snapshot(self):
</I>&gt;<i>         return self.__class__(self.sequence, self.i)
</I>&gt;<i>
</I>&gt;<i> Here, snapshotting is quite cheap, requiring just a new counter and
</I>&gt;<i> another reference to the same underlying sequence. So would be
</I>&gt;<i> restarting and stepping back, directly implemented. But if we need
</I>&gt;<i> to wrap a totally generic iterator to provide &quot;snapshottability&quot;, we
</I>&gt;<i> inevitably end up keeping a list (or the like) of items so far seen
</I>&gt;<i> from one but not both 'independent' iterators obtained by a snapshot
</I>&gt;<i> -- all potentially redundant storage, not to mention the possible
</I>&gt;<i> coding trickiness in maintaining that FIFO queue.
</I>
I'm not sure what you are suggesting here. Are you proposing that
*some* iterators (those which can be snapshotted cheaply) sprout a new
snapshot() method?

&gt;<i> As I said I do have use cases for all of these. Simplest is the
</I>&gt;<i> ability to push back the last item obtained by next, since a frequent
</I>&gt;<i> pattern is:
</I>&gt;<i>     for item in iterator:
</I>&gt;<i>         if isok(item): process(item)
</I>&gt;<i>         else:
</I>&gt;<i>             # need to push item back onto iterator, then
</I>&gt;<i>             break
</I>&gt;<i>     else:
</I>&gt;<i>         # all items were OK, iterator exhausted, blah blah
</I>&gt;<i>
</I>&gt;<i> ...and later...
</I>&gt;<i>
</I>&gt;<i>     for item in iterator:    # process some more items
</I>&gt;<i>
</I>&gt;<i> Of course, as long as just a few levels of pushback are enough, THIS
</I>&gt;<i> one is an easy and light-weight wrapper to write:
</I>&gt;<i>
</I>&gt;<i> class pushback_wrapper(object):
</I>&gt;<i>     def __init__(self, it):
</I>&gt;<i>         self.it = it
</I>&gt;<i>         self.pushed_back = []
</I>&gt;<i>     def __iter__(self): return self
</I>&gt;<i>     def next(self):
</I>&gt;<i>         try: return self.pushed_back.pop()
</I>&gt;<i>         except IndexError: return self.it.next()
</I>&gt;<i>     def pushback(self, item):
</I>&gt;<i>         self.pushed_back.append(item)
</I>
This definitely sounds like you'd want to create an explicit wrapper
for this; there is too much machinery here to make this a standard
feature.
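
For illustration, a minimal sketch of such an explicit wrapper, written
in Python 3 spelling (__next__ rather than the next() method quoted
above). The names PushbackIterator and take_while_ok are hypothetical;
they come neither from this thread nor from the standard library:

```python
class PushbackIterator:
    """Wrap any iterable and allow items to be pushed back onto it.

    Hypothetical helper: a Python 3 rendering of the quoted
    pushback_wrapper idea, not a standard-library class.
    """

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._pushed_back = []  # LIFO stack of items handed back to us

    def __iter__(self):
        return self

    def __next__(self):
        # Serve pushed-back items first, most recently pushed first.
        if self._pushed_back:
            return self._pushed_back.pop()
        return next(self._it)

    def pushback(self, item):
        self._pushed_back.append(item)


def take_while_ok(it, isok):
    """Collect leading items accepted by isok; push the first reject back."""
    taken = []
    for item in it:
        if isok(item):
            taken.append(item)
        else:
            it.pushback(item)  # leave the rejected item for a later loop
            break
    return taken
```

A later loop over the same PushbackIterator resumes with the rejected
item, which is the "...and later..." pattern from the quoted text. The
caller has to ask for the wrapper explicitly, so iterators that never
need pushback pay nothing for it.
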

Perhaps a snapshottable iterator could also have a backup() method
(which would decrement self.i in your first example) or a prev()
method (which would return self.sequence[self.i] and decrement
self.i).

&gt;<i> A &quot;snapshot&quot; would be useful whenever more than one pass on a
</I>&gt;<i> sequence _or part of it_ is needed (more useful than a &quot;restart&quot;
</I>&gt;<i> because of the &quot;part of it&quot; provision). And a decent wrapper for it
</I>&gt;<i> is a bear...
</I>
Such wrappers for specific container types (or maybe just one for
sequences) could be in a standard library module. Is more needed?

--Guido van Rossum (home page: <A HREF="http://www.python.org/~guido/">http://www.python.org/~guido/</A>)
</PRE>
<!--endarticle-->
<HR>
<a href="http://mail.python.org/mailman/listinfo/python-dev">More information about the Python-Dev mailing list</a><br>
</body></html>
