What the heck is an xrange?
Pop quiz: how would you implement Python's xrange
(known in Python 3.x as range
) in Python and without making a list or any other sequence type?
If you answered "with a generator," you'd be wrong! And it's likely because you've only ever used an xrange
in code like:
for i in xrange(10):
# do something 10 times
In this case (where the xrange
is only used as an "argument" to a for
loop), a generator would probably suffice. But, in fact, xrange
is implemented as an object, not a function or generator. Why? Let's find out.
The Sequence Protocol
The range
builtin returns a list, which (as the de facto sequence type) implements all the sequence protocol methods. xrange
was intended to replace range
(and did in Python 3.x), so it, too, must support all that a sequence supports.
So, what does a sequence support? Sequences have a finite, ordered bunch of elements, which means they have a length (they support len
), they support indexed access (with the []
operator), and most importantly, they can be iterated -- in forward order with a for
loop, and in reverse order with the help of the reversed
builtin. These behaviors are all implemented by way of "dunder" methods, and typically accessed using language syntax (as with for ... in
) or builtins (as with len
).
Sequences also support two instance methods: index
, which returns the 0-based position in the sequence of the first occurrence of the argument, and count
, which returns the number of occurrences of the argument.
Perhaps surprisingly, we can actually implement all of this functionality with constant-time and constant-space operations, using only the bounds of the xrange
.
Building an xrange
As a quick refresher, both range
and xrange
take between 1 and 3 arguments. A single argument defines the length of the sequence (with an implicit starting value of 0). Two arguments define the starting point and ending point (actually, the second argument is the next integer after the last one that will be a part of the sequence). A third argument defines a step value, the number between each of the elements of the sequence. All arguments must be integers.
With that in hand, we can begin building our implementation of xrange
:
class xrange(Sequence):
"""Pure-Python implementation of an ``xrange`` (aka ``range``
in Python 3) object. See `the CPython documentation
<http://docs.python.org/py3k/library/functions.html#range>`_
for details.
"""
def __init__(self, *args):
if len(args) == 1:
start, stop, step = 0, args[0], 1
elif len(args) == 2:
start, stop, step = args[0], args[1], 1
elif len(args) == 3:
start, stop, step = args
else:
raise TypeError('xrange() requires 1-3 int arguments')
try:
start, stop, step = int(start), int(stop), int(step)
except ValueError:
raise TypeError('an integer is required')
if step == 0:
raise ValueError('xrange() arg 3 must not be zero')
elif step < 0:
stop = min(stop, start)
else:
stop = max(stop, start)
self._start = start
self._stop = stop
self._step = step
self._len = (stop - start) // step + bool((stop - start) % step)
Update: thanks to sltkr for a correction to the _len
calculation on Hacker News.
Most of this is error checking and adjustment of values for impossible sequences (like a sequence starting at 0 and going to 10 stepping by -1; in these cases, we collapse the xrange
to a zero-length sequence with the same start value).
Importantly, since the entire sequence is "known" at initialization time, and is immutable, I precompute the length of the sequence, which is useful in the implementation of several of the methods, as we will see.
With our computed _start
, _stop
, _step
, and _len
attributes in hand, we can implement equality, representation, and length trivially:
def __repr__(self):
if self._start == 0 and self._step == 1:
return 'xrange(%d)' % self._stop
elif self._step == 1:
return 'xrange(%d, %d)' % (self._start, self._stop)
return 'xrange(%d, %d, %d)' % (self._start, self._stop, self._step)
def __eq__(self, other):
return isinstance(other, xrange) and \
self._start == other._start and \
self._stop == other._stop and \
self._step == other._step
def __len__(self):
return self._len
Nothing really surprising here.
Next we'll implement index
, as it serves as a basis for several other methods.
To find the index of an element in an immaterial sequence, we have to do a little math. First, we compute the difference between the element we're looking for, and the start value for the sequence. Next, we compute the quotient and remainder using the divmod
builtin. If the remainder is 0, then the element may be a member of the sequence (it falls on one of the values that might be produced by adding some number of _step
s to the _start
). If the quotient is between 0 (inclusive) and the sequence's length (exclusive), then this is definitely a member, and we can return its index; if not, we raise ValueError
to indicate it was not found:
def index(self, value):
"""Return the 0-based position of integer `value` in
the sequence this xrange represents."""
diff = value - self._start
quotient, remainder = divmod(diff, self._step)
if remainder == 0 and 0 <= quotient < self._len:
return abs(quotient)
raise ValueError('%r is not in range' % value)
Note that we're safe from ZeroDivisionError
in the divmod
call, since we've previously ensured that _step
is not zero.
With this in hand, we can trivially implement __contains__
(the dunder method that is called by x in foo
syntax for checking membership):
def __contains__(self, value):
"""Return ``True`` if the integer `value` occurs in
the sequence this xrange represents."""
try:
self.index(value)
return True
except:
return False
Implementing count
is trivial, as well, since an element can only occur zero or one times in an xrange
:
def count(self, value):
"""Return the number of ocurrences of integer `value`
in the sequence this xrange represents."""
return int(value in self)
Element access with the []
operator (implemented with the __getitem__
dunder method) is relatively easy, except for two complications: we have to support negative indexes, for sequence access from the end; and we have to support slice access.
The former is easy, as we can just add the length of the sequence to the index we're attempting to access, and proceed with bounds-checking as usual. Since the behavior of slices of sequences is somewhat subtle, we implement it in a separate method for clarity:
def __getitem__(self, index):
"""Return the element at position ``index`` in the sequence
this xrange represents, or raise :class:`IndexError` if the
position is out of range."""
if isinstance(index, slice):
return self.__getitem_slice(index)
if index < 0:
# negative indexes access from the end
index = self._len + index
if index < 0 or index >= self._len:
raise IndexError('xrange object index out of range')
return self._start + index * self._step
def __getitem_slice(self, slce):
"""Return an xrange which represents the requested slce
of the sequence represented by this xrange.
"""
start, stop, step = slce.start, slce.stop, slce.step
if step == 0:
raise ValueError('slice step cannot be 0')
start = start or self._start
stop = stop or self._stop
if start < 0:
start = max(0, start + self._len)
if stop < 0:
stop = max(start, stop + self._len)
if step is None or step > 0:
return xrange(start, stop, step or 1)
else:
rv = reversed(self)
rv._step = step
return rv
Finally, we come to iteration, which is (I suspect) what most people use xrange
objects for. Remember that xrange
objects act like any other sequence -- you can have multiple, independent iterators iterating the same underlying sequence, so we have to implement a separate object for the iterator. The implementation of the iterator is about as simple as you might expect: it tracks the last value it returned, and adds the _step
value to that value, then returns it for each element it iterates:
class xrangeiterator(Iterator):
"""An iterator for an :class:`xrange`.
"""
def __init__(self, xrangeobj):
self._xrange = xrangeobj
# Intialize the "last outputted value" to the value
# just before the first value; this simplifies next()
self._last = self._xrange._start - self._xrange._step
self._count = 0
def __iter__(self):
"""An iterator is already an iterator, so return ``self``.
"""
return self
def next(self):
"""Return the next element in the sequence represented
by the xrange we are iterating, or raise StopIteration
if we have passed the end of the sequence."""
self._last += self._xrange._step
self._count += 1
if self._count > self._xrange._len:
raise StopIteration()
return self._last
Put it all together
I've published the code for the xrange
object, along with a comprehensive test suite, to my GitHub.
Closing Thoughts
Python already has xrange
(or, in Python 3, range
), as a built-in, implemented efficiently in C, so re-implementing it in Python is largely a learning exercise, rather than code which you should actually use.
However, this implementation is superior to CPython's implementation in two ways:
-
I have backported support for slicing,
index
, andcount
from Python 3 to Python 2-compatible code; if these features are important and you need to support multiple versions of Python, this may be of use to you. -
CPython 2.x's implementation of
xrange
does not implement__contains__
, so the expression1000 in xrange(1001)
actually has to scan all 1000 elements in thexrange
before it can determine if 1000 is a member of the sequence. This was corrected in CPython 3, but very long lists in CPython 2.x will have__contains__
performance characteristics more similar tolist
s than to 2.7xrange
s. This implementation uses a constant-time algorithm to compute membership, so it may actually be faster than the CPython 2.x implementation for very large sequences.