[Python-Dev] Reiterability
Guido van Rossum  guido at python.org
Sat Oct 18 13:17:40 EDT 2003

[Guido]
> > Oh, no.  Not reiterability again.  How can you promise something to be
> > reiterable if you don't know whether the underlying iterator can be
> > reiterated?  Keeping a hidden buffer would be a bad idea.

[Alex]
> I agree it would be bad to have "black magic" performed by every
> iterator to fulfil a contract that may or may not be useful to
> clients and might be costly to fulfil.
>
> IF "reiterability" is useful (and I'd need to see some use cases,
> because I don't particularly recall pining for it in Python) it
> should be exposed as a separate protocol that may or may not be
> offered by any given iterator type.  E.g., the presence of a special
> method __reiter__ could indicate that this iterator IS able to
> supply another iterator which retraces the same steps from the
> start; and perhaps iter(xxx, reiterable=True) could strive to
> provide a reiterable iterator for xxx, which might justify building
> one that keeps a hidden buffer as a last resort.  But first, I'd
> like use cases...

In cases where reiterability can be implemented without much effort,
there is already an underlying object representing the sequence
(e.g. a collection object, or an object defining a numerical series).
Reiteration comes for free if you hold on to that underlying object
rather than passing an iterator around.

[Phillip]
> I think I phrased my question poorly.  What I should have said was:
>
> "Should iterator expressions preserve the reiterability of the base
> expression?"

(An iterator expression being something like (f(x) for x in S), right?)

> I don't want to make them guarantee reiterability, only to preserve
> it if it already exists.  Does that make more sense?
>
> In essence, this would be done by having an itercomp expression
> resolve to an object whose __iter__ method calls the underlying
> generator, returning a generator-iterator.  Thus, any iteration over
> the itercomp is equivalent to calling a no-arguments generator.  The
> result is reiterable if the base iterable is reiterable, otherwise
> not.

OK, I think I understand what you're after.  The code for an iterator
expression has to create a generator function behind the scenes, and
call it.  For example:

    A = (f(x) for x in S)

could be translated into:

    def gen(seq):
        for x in seq:
            yield f(x)
    A = gen(S)

(Note that S could be an arbitrary expression and should be evaluated
only once.  This translation does that correctly.)

This allows one to iterate once over A (a generator-iterator doesn't
allow reiteration).  What you are asking for looks like it could be
done like this (never mind the local names):

    def gen(seq):
        for x in seq:
            yield f(x)

    class Helper:
        def __init__(self, seq):
            self.seq = seq
        def __iter__(self):
            return gen(self.seq)

    A = Helper(S)

Then every time you use iter(A), gen() will be called with the saved
value of S as argument.
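For concreteness, here is a small demo of how that Helper sketch
behaves, using the gen and Helper definitions above; the f and S
below are just made-up stand-ins (any function and any reiterable
sequence would do):

    def f(x):
        return x * x

    S = [1, 2, 3]            # a list, hence reiterable

    A = Helper(S)
    print list(A)            # [1, 4, 9]
    print list(A)            # [1, 4, 9] -- each iter(A) calls gen(S) afresh

    B = gen(S)               # a bare generator-iterator, for contrast
    print list(B)            # [1, 4, 9]
    print list(B)            # []  -- exhausted; no reiteration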
> I suppose technically, this means the itercomp doesn't return an
> iterator, but an iterable, which I suppose could be confusing if you
> try to call its 'next()' method.  But then, it could have a next()
> method that raises an error saying "call 'iter()' on me first".

I don't mind that so much, but I don't think all the extra machinery
is worth it; the compiler generally can't tell if it is needed, so it
has to produce the reiterable code every time.  If you *want* to have
an iterable instead of an iterator, it's usually easy enough to do
(especially given knowledge about the type of S).

[Alex again]
> There ARE other features I'd REALLY have liked to get from iterators
> in some applications.
>
> A "snapshot" -- providing me two iterators, the original one and
> another, which will step independently over the same sequence of
> items -- would have been really handy at times.  And a "step back"
> facility ("undo" of the last call to next) -- sometimes one level
> would suffice, sometimes not; often I could have provided the item
> to be "pushed back" so the iterator need not retain memory of it
> independently, but that wouldn't always be handy.  Now any of these
> can be built as a wrapper over an existing iterator, of course --
> just like 'reiterability' could (and you could in fact easily
> implement reiterability in terms of snapshotting, by just ensuring a
> snapshot is taken at the start and further snapshotted but never
> disturbed); but not knowing the abilities of the underlying iterator
> would mean these wrappers would often duplicate functionality
> needlessly.

I don't see how it can be done without an explicit request for such a
wrapper in the calling code.  If the underlying iterator is ephemeral
(is not reiterable), the snapshotter has to save a copy of every item,
and that would defeat the purpose of iterators if it was done
automatically.  Or am I misunderstanding?

> E.g.:
>
> class snapshottable_sequence_iter(object):
>     def __init__(self, sequence, i=0):
>         self.sequence = sequence
>         self.i = i
>     def __iter__(self):
>         return self
>     def next(self):
>         try:
>             result = self.sequence[self.i]
>         except IndexError:
>             raise StopIteration
>         self.i += 1
>         return result
>     def snapshot(self):
>         return self.__class__(self.sequence, self.i)
>
> Here, snapshotting is quite cheap, requiring just a new counter and
> another reference to the same underlying sequence.  So would be
> restarting and stepping back, directly implemented.  But if we need
> to wrap a totally generic iterator to provide "snapshottability", we
> inevitably end up keeping a list (or the like) of items so far seen
> from one but not both 'independent' iterators obtained by a snapshot
> -- all potentially redundant storage, not to mention the possible
> coding trickiness in maintaining that FIFO queue.

I'm not sure what you are suggesting here.  Are you proposing that
*some* iterators (those which can be snapshotted cheaply) sprout a
new snapshot() method?

> As I said I do have use cases for all of these.  Simplest is the
> ability to push back the last item obtained by next, since a frequent
> pattern is:
>
>     for item in iterator:
>         if isok(item):
>             process(item)
>         else:
>             # need to push item back onto iterator, then
>             break
>     else:
>         # all items were OK, iterator exhausted, blah blah
>
>     ...and later...
>
>     for item in iterator:    # process some more items
>
> Of course, as long as just a few levels of pushback are enough, THIS
> one is an easy and light-weight wrapper to write:
>
> class pushback_wrapper(object):
>     def __init__(self, it):
>         self.it = it
>         self.pushed_back = []
>     def __iter__(self):
>         return self
>     def next(self):
>         try:
>             return self.pushed_back.pop()
>         except IndexError:
>             return self.it.next()
>     def pushback(self, item):
>         self.pushed_back.append(item)

This definitely sounds like you'd want to create an explicit wrapper
for this; there is too much machinery here to make this a standard
feature.  Perhaps a snapshottable iterator could also have a backup()
method (which would decrement self.i in your first example) or a
prev() method (which would return self.sequence[self.i] and decrement
self.i).
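For concreteness, such methods could be bolted onto the
snapshottable_sequence_iter above along these lines -- the subclass
name is invented, and the boundary handling and the exact moment of
the decrement in prev() are just one guess at the intended semantics:

    class steppable_sequence_iter(snapshottable_sequence_iter):
        def backup(self):
            # undo the effect of the last next(); a no-op at the start
            if self.i > 0:
                self.i -= 1
        def prev(self):
            # step back over the most recently returned item and hand
            # it back, so the following next() returns it again
            if self.i == 0:
                raise IndexError("already at the start of the sequence")
            self.i -= 1
            return self.sequence[self.i]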
> A "snapshot" would be useful whenever more than one pass on a
> sequence _or part of it_ is needed (more useful than a "restart"
> because of the "part of it" provision).  And a decent wrapper for it
> is a bear...

Such wrappers for specific container types (or maybe just one for
sequences) could be in a standard library module.  Is more needed?

--Guido van Rossum (home page: http://www.python.org/~guido/)