[Python-Dev] Reiterability
Guido van Rossum  guido at python.org
Sat Oct 18 13:17:40 EDT 2003

[Guido]
> > Oh, no.  Not reiterability again.  How can you promise something to be
> > reiterable if you don't know whether the underlying iterator can be
> > reiterated?  Keeping a hidden buffer would be a bad idea.

[Alex]
> I agree it would be bad to have "black magic" performed by every
> iterator to fulfil a contract that may or may not be useful to
> clients and might be costly to fulfil.
>
> IF "reiterability" is useful (and I'd need to see some use cases,
> because I don't particularly recall pining for it in Python) it
> should be exposed as a separate protocol that may or may not be
> offered by any given iterator type.  E.g., the presence of a special
> method __reiter__ could indicate that this iterator IS able to
> supply another iterator which retraces the same steps from the
> start; and perhaps iter(xxx, reiterable=True) could strive to
> provide a reiterable iterator for xxx, which might justify building
> one that keeps a hidden buffer as a last resort.  But first, I'd
> like use cases...

In cases where reiterability can be implemented without much effort,
there is already an underlying object representing the sequence
(e.g. a collection object, or an object defining a numerical series).
Reiteration comes for free if you hold on to that underlying object
rather than passing an iterator around.

[Phillip]
> I think I phrased my question poorly.  What I should have said was:
>
> "Should iterator expressions preserve the reiterability of the base
> expression?"

(An iterator expression being something like (f(x) for x in S), right?)

> I don't want to make them guarantee reiterability, only to preserve
> it if it already exists.  Does that make more sense?
>
> In essence, this would be done by having an itercomp expression
> resolve to an object whose __iter__ method calls the underlying
> generator, returning a generator-iterator.  Thus, any iteration over
> the itercomp is equivalent to calling a no-arguments generator.  The
> result is reiterable if the base iterable is reiterable, otherwise
> not.

OK, I think I understand what you're after.  The code for an iterator
expression has to create a generator function behind the scenes, and
call it.  For example:

    A = (f(x) for x in S)

could be translated into:

    def gen(seq):
        for x in seq:
            yield f(x)
    A = gen(S)

(Note that S could be an arbitrary expression and should be evaluated
only once.  This translation does that correctly.)

This allows one to iterate once over A (a generator-iterator doesn't
allow reiteration).  What you are asking for looks like it could be
done like this (never mind the local names):

    def gen(seq):
        for x in seq:
            yield f(x)

    class Helper:
        def __init__(self, seq):
            self.seq = seq
        def __iter__(self):
            return gen(self.seq)

    A = Helper(S)

Then every time you use iter(A), gen() will be called with the saved
value of S as argument.
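For concreteness, here is a small demo of how that Helper sketch
behaves, using the gen and Helper definitions above; the f and S
below are just made-up stand-ins (any function and any reiterable
sequence would do):

    def f(x):
        return x * x

    S = [1, 2, 3]            # a list, hence reiterable

    A = Helper(S)
    print list(A)            # [1, 4, 9]
    print list(A)            # [1, 4, 9] -- each iter(A) calls gen(S) afresh

    B = gen(S)               # a bare generator-iterator, for contrast
    print list(B)            # [1, 4, 9]
    print list(B)            # []  -- exhausted; no reiteration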
> I suppose technically, this means the itercomp doesn't return an
> iterator, but an iterable, which I suppose could be confusing if you
> try to call its 'next()' method.  But then, it could have a next()
> method that raises an error saying "call 'iter()' on me first".

I don't mind that so much, but I don't think all the extra machinery
is worth it; the compiler generally can't tell if it is needed, so it
has to produce the reiterable code every time.  If you *want* to have
an iterable instead of an iterator, it's usually easy enough to do
(especially given knowledge about the type of S).

[Alex again]
> There ARE other features I'd REALLY have liked to get from iterators
> in some applications.
>
> A "snapshot" -- providing me two iterators, the original one and
> another, which will step independently over the same sequence of
> items -- would have been really handy at times.  And a "step back"
> facility ("undo" of the last call to next) -- sometimes one level
> would suffice, sometimes not; often I could have provided the item
> to be "pushed back" so the iterator need not retain memory of it
> independently, but that wouldn't always be handy.  Now any of these
> can be built as a wrapper over an existing iterator, of course --
> just like 'reiterability' could (and you could in fact easily
> implement reiterability in terms of snapshotting, by just ensuring a
> snapshot is taken at the start and further snapshotted but never
> disturbed); but not knowing the abilities of the underlying iterator
> would mean these wrappers would often duplicate functionality
> needlessly.

I don't see how it can be done without an explicit request for such a
wrapper in the calling code.  If the underlying iterator is ephemeral
(is not reiterable), the snapshotter has to save a copy of every item,
and that would defeat the purpose of iterators if it was done
automatically.  Or am I misunderstanding?

> E.g.:
>
> class snapshottable_sequence_iter(object):
>     def __init__(self, sequence, i=0):
>         self.sequence = sequence
>         self.i = i
>     def __iter__(self):
>         return self
>     def next(self):
>         try:
>             result = self.sequence[self.i]
>         except IndexError:
>             raise StopIteration
>         self.i += 1
>         return result
>     def snapshot(self):
>         return self.__class__(self.sequence, self.i)
>
> Here, snapshotting is quite cheap, requiring just a new counter and
> another reference to the same underlying sequence.  So would be
> restarting and stepping back, directly implemented.  But if we need
> to wrap a totally generic iterator to provide "snapshottability", we
> inevitably end up keeping a list (or the like) of items so far seen
> from one but not both 'independent' iterators obtained by a snapshot
> -- all potentially redundant storage, not to mention the possible
> coding trickiness in maintaining that FIFO queue.

I'm not sure what you are suggesting here.  Are you proposing that
*some* iterators (those which can be snapshotted cheaply) sprout a
new snapshot() method?

> As I said I do have use cases for all of these.  Simplest is the
> ability to push back the last item obtained by next, since a frequent
> pattern is:
>
>     for item in iterator:
>         if isok(item):
>             process(item)
>         else:
>             # need to push item back onto iterator, then
>             break
>     else:
>         # all items were OK, iterator exhausted, blah blah
>
>     ...and later...
>
>     for item in iterator:    # process some more items
>
> Of course, as long as just a few levels of pushback are enough, THIS
> one is an easy and light-weight wrapper to write:
>
> class pushback_wrapper(object):
>     def __init__(self, it):
>         self.it = it
>         self.pushed_back = []
>     def __iter__(self):
>         return self
>     def next(self):
>         try:
>             return self.pushed_back.pop()
>         except IndexError:
>             return self.it.next()
>     def pushback(self, item):
>         self.pushed_back.append(item)

This definitely sounds like you'd want to create an explicit wrapper
for this; there is too much machinery here to make this a standard
feature.  Perhaps a snapshottable iterator could also have a backup()
method (which would decrement self.i in your first example) or a
prev() method (which would return self.sequence[self.i] and decrement
self.i).
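For concreteness, such methods could be bolted onto the
snapshottable_sequence_iter above along these lines -- the subclass
name is invented, and the boundary handling and the exact moment of
the decrement in prev() are just one guess at the intended semantics:

    class steppable_sequence_iter(snapshottable_sequence_iter):
        def backup(self):
            # undo the effect of the last next(); a no-op at the start
            if self.i > 0:
                self.i -= 1
        def prev(self):
            # step back over the most recently returned item and hand
            # it back, so the following next() returns it again
            if self.i == 0:
                raise IndexError("already at the start of the sequence")
            self.i -= 1
            return self.sequence[self.i]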
> A "snapshot" would be useful whenever more than one pass on a
> sequence _or part of it_ is needed (more useful than a "restart"
> because of the "part of it" provision).  And a decent wrapper for it
> is a bear...

Such wrappers for specific container types (or maybe just one for
sequences) could be in a standard library module.  Is more needed?

--Guido van Rossum (home page: http://www.python.org/~guido/)