Python itertools tee, clones and caching

Question

There's nothing magical about tee. It's just clever ;-) At any point, tee clones the iterator passed to it. That means the cloned iterator(s) will yield the values produced by the passed-in iterator from this point on. But it's impossible for them to reproduce values that were produced before tee was invoked.

Let's show it with something much simpler than your example:

>>> it = iter(range(5))
>>> next(it)
0

0 is gone now - forever. tee() can't get it back:

>>> a, b = tee(it)
>>> next(a)
1

So a pushed it to produce its next value. It's that value that gets cached, so that other clones can reproduce it too:

>>> next(b)
1

To get that result, it wasn't touched - 1 was retrieved from the internal cache. And now that all of it, a and b have produced 1, 1 is gone forever too.

I don't know whether that answers your question - answering "Does tee() know what I want here?" seems to require telepathy ;-) That is, I don't know what you mean by "with proper caching". It would be most helpful if you gave an exact example of the input/output behavior you're hoping for.

Short of that, the Python docs give Python code that's equivalent to tee(), and perhaps studying that would answer your question:

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:             # when the local deque is empty
                newval = next(it)       # fetch a new value and
                for d in deques:        # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)

You can see from that, for example, that nothing about an iterator's internal state is cached - all that's cached is the values produced by the passed-in iterator, starting from the time tee() is called. Each clone has its own deque (FIFO list) of the passed-in iterator's values produced so far, and that's all the clones know about the passed-in iterator. So it may be too simple for whatever you're really hoping for.

Python itertools tee, clones and caching

Question: