Domanda

Assumed: When using the python itertools.tee(), all duplicate iterators refer to the original iterator, and the original is cached to improve performance.

My main concern in the following inquiry is regarding MY IDEA of intended/proper caching behavior.
Edit: my idea of proper caching was based on flawed functional assumptions. Will ultimately need a little wrapper around tee (which will probably have consequences regarding caching).


Question:

Lets say I create 3 iterator clones using tee: a, b, c = itertools.tee(myiter,3). Also assume that at this point, I remove all references to the original, myiter (meaning, there is no great way for my code to refer back to the original hereafter).

At some later point in code, if I decided that I want another clone of myiter, can I just re-tee() one of my duplicates? (with proper caching back to the originally cached myiter)

In other words, at some later point, I wish I had instead used this: a, b, c, d = itertools.tee(myiter,4). But, since I have discarded all references to the original myiter, the best I can muster would be: copytee = itertools.tee(a, 1) #where 'a' is from a previous tee()

Does tee() know what I want here? (that I REALLY want to create a clone based on the original myiter, NOT the intermediate clone a (which may be partially consumed))

È stato utile?

Soluzione

There's nothing magical about tee. It's just clever ;-) At any point, tee clones the iterator passed to it. That means the cloned iterator(s) will yield the values produced by the passed-in iterator from this point on. But it's impossible for them to reproduce values that were produced before tee was invoked.

Let's show it with something much simpler than your example:

>>> it = iter(range(5))
>>> next(it)
0

0 is gone now - forever. tee() can't get it back:

>>> a, b = tee(it)
>>> next(a)
1

So a pushed it to produce its next value. It's that value that gets cached, so that other clones can reproduce it too:

>>> next(b)
1

To get that result, it wasn't touched - 1 was retrieved from the internal cache. And now that all of it, a and b have produced 1, 1 is gone forever too.

I don't know whether that answers your question - answering "Does tee() know what I want here?" seems to require telepathy ;-) That is, I don't know what you mean by "with proper caching". It would be most helpful if you gave an exact example of the input/output behavior you're hoping for.

Short of that, the Python docs give Python code that's equivalent to tee(), and perhaps studying that would answer your question:

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:             # when the local deque is empty
                newval = next(it)       # fetch a new value and
                for d in deques:        # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)

You can see from that, for example, that nothing about an iterator's internal state is cached - all that's cached is the values produced by the passed-in iterator, starting from the time tee() is called. Each clone has its own deque (FIFO list) of the passed-in iterator's values produced so far, and that's all the clones know about the passed-in iterator. So it may be too simple for whatever you're really hoping for.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top