Tuples
Today, David Beazley made some tweets:
At the bookstore. I'll admit I judge Python books by their tuple description. "Read only list?" Back on the shelf. Nope.
— David Beazley (@dabeaz) September 21, 2016
Usually there's a sentence nearby "sometimes you need a read only list." No. No, I haven't. Not in 20 years. Not even once. Sorry.
— David Beazley (@dabeaz) September 21, 2016
My main objection is that "read only list" is a lazy description lacking thought. A red flag for every other topic that might be covered.
— David Beazley (@dabeaz) September 21, 2016
There are quite a few good responses to these tweets, both from David and from others (and from yours truly). I recommend reading the the thread (click on the first tweet above).
Now to start off, I want to say that I respect the hell out of David Beazley. The guy literally wrote the book on Python, and he knows way more about Python than I ever will. He's also one of the most entertaining Python people you can follow on Twitter. But hey, that doesn't mean I can't disagree sometimes.
List vs. Tuple. Fight!
As you probably know, there are two "array" datatypes in Python, list
and
tuple
.1 The primary difference between the two is that lists are mutable,
that is you can change their entries and length after they are created, with
methods like .append
or +=
. Tuples, on the other hand, are immutable.
Once you create one, you cannot change it. This makes the implementation
simpler (and hence faster, although don't let anyone tell you you should use a
tuple just because it's faster). This, as
Ned Batchelder
points out, is the only technical difference between the two.
The the idea that particularly bugs me here is that tuples are primarily useful as "record" datatypes.
@AllenDowney Better than "read-only-list." ;-). Mainly looking for the tuple-as-record description. That's often what's missing.
— David Beazley (@dabeaz) September 21, 2016
Tuples are awesome for records. This is both by design—since they have a
fixed shape, the positions in a tuple can be "fixed" values, and by convention—if a
Python programmer sees parentheses instead of square brackets, he is more
likely to see the object as "record-like". The namedtuple
object in the standard library
takes the record idea further by letting you actually name the fields:
>>> from collections import namedtuple >>> person = namedtuple('Person', 'name, age') >>> person('Aaron', 26) Person(name='Aaron', age=26)
But is that really the only place you'd want to use a tuple over a list?
Consider five other places you might encounter a tuple in Python, courtesy of Allen Downey:
@dabeaz (1) tuple assignment (2) multiple return values (3) *args (4) output from zip, enumerate, etc (5) key in dictionary
— Allen Downey (@AllenDowney) September 21, 2016
In code these look like:
-
Multiple assignments:
>>> (a, b) = 1, 2
(yes, the parentheses are optional here, as they are in many places where a tuple can be used, but this is still a tuple, or at least it looks like one ;)
-
Multiple return values:
For example,
os.walk
. This is for the most part a special case of using tuples as records. -
*args
:>>> def f(*args): ... print(type(args), args) ... >>> f(1, 2, 3) <class 'tuple'> (1, 2, 3)
Arbitrary positional function arguments are always stored as a
tuple
. -
Return value from builtins
zip
,enumerate
, etc.:>>> for i in zip(range(3), 'abc'): ... print(i) ... (0, 'a') (1, 'b') (2, 'c') >>> for i in enumerate('abc'): ... print(i) ... (0, 'a') (1, 'b') (2, 'c')
This also applies to the combinatoric generators in itertools (like
product
,combinations
, etc.) -
Dictionary keys:
>>> { ... (0, 0): '.', ... (0, 1): ' ', ... (1, 0): '.', ... (1, 1): ' ', ... } {(0, 1): ' ', (1, 0): '.', (0, 0): '.', (1, 1): ' '}
This last one I find to be very important. You could arguably use a list for the first four of Allen Downey's points2 (or Python could have, if it wanted to). But it is impossible to meaningfully hash a mutable data structure in Python, and hashability is a requirement for dictionary keys.
However, be careful. Not all tuples are hashable. Tuples can contain anything, but only tuples of immutable values are hashable. Consider4
>>> t = (1, 2, [3, 4]) >>> t[2].append(5) >>> t (1, 2, [3, 4, 5])
Such tuples are not hashable, and cannot be used as dictionary keys.
>>> hash(t) Traceback (most recent call last): File "<ipython-input-39-36822ba665ca>", line 1, in <module> hash(t) TypeError: unhashable type: 'list'
Why is list
the Default?
My second gripe here is this notion that your default ordered collection
object in Python should be list
. tuples
are only to be used as "records",
or if you suspect might want to use it as a dictionary key. First off, you
never know when you'll want something to be hashable. Both dictionary keys and
sets
require hashability. Suppose you want to de-duplicate a collection of
sequences. If you represent the sequences with list
, you'll either have to
write a custom loop that checks for duplicates, or manually convert them to
tuple
and throw them in a set
. If you start with tuple
, you don't have
to worry about it (again, assuming the entries of the tuples are all hashable
as well).
Consider another usage of tuples, which I consider to be
important, namely tree structures. Say you wanted a simple representation of a
Python syntax tree. You might represent 1 - 2*(-3 + 4)
as
('-', 1, ('*', 2, ('+', ('-', 3), 4)))
This isn't really a record. The meaning of the entries in the tuples is
determined by the first value of the tuple, not position. In this example, the length
of the tuple also signifies meaning (binary vs. unary -
).
If this looks familiar to you, it's because this is how the language Lisp represents all programs. This is a common pattern. Dask graphs use tuples and dictionaries to represent computations. SymPy expression trees use tuples and Python classes to represent symbolic mathematical expressions.
But why use tuples over lists here? Suppose you had an object like the one
above, but using lists: ['-', 1, ['*', 2, ['+', ['-', 3], 4]]]
. If you
discover you need to use this as a dictionary key, or want to put it in a
set
, you would need to convert this to a hashable object. To do this you
need to write a function that recursively converts each list
to a tuple
.
See how long it takes you to write that function correctly.
Mutability is Bad
More to the point, however, mutability is bad. I counted 12 distinct methods
on list
that mutate it (how many can you remember off the top of your
head?3). Any function that gets access to a list can mutate it,
using any one of these methods. All it takes is for someone to forget that
+=
mutates a list (and that they should copy it first) for code completely
distant from the origin definition to cause issues. The hardest bug I ever
debugged had a three character
fix,
adding [:]
to copy a global list that I was (accidentally) mutating. It took
me a several hour airplane ride and some deep dark magic that I'll leave for
another blog post to discover the source of my problems (the problems I was
having appeared to be quite distant from the actual source).
A Better "Default"
I propose that Python code in general would be vastly improved if people used
tuple
as the default ordered collection, and only switched to list
if
mutation was necessary (it's less necessary than you think; you can always
copy a tuple instead of mutating it). I agree with David Beazley that you
don't "sometimes need a read only list". Rather, you "sometimes need a
writable tuple".
This makes more sense than defaulting to list
, and only switching to tuple
when hashability is needed, or when some weird "rule of thumb" applies that
says that you should use tuple
if you have a "record". Maybe there's a good
reason that *args
and almost all builtin and standard library functions
return tuples instead of lists. It's harder to accidentally break someone
else's code, or have someone else accidentally break your code, when your data
structures are immutable.
Footnotes
-
I want to avoid saying "a tuple is an immutable list", since "list" can be interpreted in two ways, as an English word meaning "ordered collection" (in which case, the statement is true), or as the Python type
list
(in which case, the statement is false—tuple
is not a subclass oflist
). ↩ -
Yes,
>>> [a, b] = 1, 2
works. ↩
- ↩
-
One of the tweets from the conversation:
@asmeurer @AllenDowney As yes:
— David Beazley (@dabeaz) September 21, 2016
t = (1,2, [3, 4])
t[2] += [5,6]
;-)This is similar to this example. But it turns out this one doesn't work:
>>> t = (1,2, [3, 4]) >>> t[2] += [5,6] Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'tuple' object does not support item assignment
I have no idea why. It seems to me that it should work.
t[2]
is a list andlist
has__iadd__
defined. It seems that Python gets kind of weird about things on the left-hand side of an assignment. EDIT: Here's why. ↩
Comments