Scientific Python antipatterns advent calendar day eleven

For today, another example involving dictionaries (they really are useful!). As a reminder, I’ll post one tiny example per day with the intention that they should only take a couple of minutes to read.

If you want to read them all but can’t be bothered checking this website each day, sign up for the mailing list:

Sign up for the mailing list

and I’ll send a single email at the end with links to them all.

Writing our own dictionary logic

Let’s try to build a dictionary that stores the number of times we see each letter in a string. One way is to use the string’s count method:

string = 'banana'

letter_counts = {
    'a': string.count('a'),
    'b': string.count('b'),
    # etc.
}

letter_counts
{'a': 3, 'b': 1}

This is going to be tedious to write though. We can do a bit better with a loop:

letter_counts = {}
for letter in ['a', 'b']: # etc.
    letter_counts[letter] = string.count(letter)
letter_counts
{'a': 3, 'b': 1}

This is more elegant. But each call to count scans the whole string, so repeatedly calling it takes a lot of time when the string gets long and there are many letters to count. Eventually we will settle on the idea of keeping a running total: looking at each letter in the string once and incrementing its count.
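To make the cost of the count approach concrete, here is a minimal sketch. The names distinct_letters and characters_examined are just for illustration, and the cost figure is a rough estimate rather than a measurement:

```python
string = 'banana'

# Loop over the distinct letters so that nothing has to be typed out by hand
distinct_letters = set(string)

letter_counts = {}
for letter in distinct_letters:
    # Each call to count scans the whole string from start to finish
    letter_counts[letter] = string.count(letter)

# Rough cost estimate: one full scan of the string per distinct letter
characters_examined = len(distinct_letters) * len(string)
```

For 'banana' that is 3 scans of 6 characters, so around 18 characters examined. A running total only needs to look at each of the 6 characters once.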

The logic is easy for a single letter:

a_count = 0
for letter in string:
    if letter == 'a':
        a_count = a_count + 1
a_count
3

but when we try to do the same thing with a dictionary, the obvious code doesn’t work:

letter_counts = {}
for letter in string:
    letter_counts[letter] = letter_counts[letter] + 1
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[7], line 3
      1 letter_counts = {}
      2 for letter in string:
----> 3     letter_counts[letter] = letter_counts[letter] + 1

KeyError: 'b'

As the error message makes clear, the problem comes when we try to look up the current count the first time we see each letter, and the key doesn’t exist.

We can try to get round this by creating the keys explicitly and setting them to zero:

letter_counts = {
    'a': 0,
    'b': 0,
    # etc.
}

but for large numbers of keys this is annoying, and uses more memory than we need (imagine if we wanted to count pairs or triplets of lowercase letters: there are 676 unique pairs and 17,576 unique triplets, most of which will not be in the string).
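As a quick check on those numbers, here is a small sketch. It assumes the alphabet is the 26 lowercase ASCII letters, and the variable names are only illustrative:

```python
from itertools import product
from string import ascii_lowercase

# Assumption: we only care about the 26 lowercase ASCII letters
alphabet = ascii_lowercase

n_pairs = len(alphabet) ** 2      # 26 * 26 = 676
n_triplets = len(alphabet) ** 3   # 26 * 26 * 26 = 17576

# Pre-filling a dictionary with every possible pair creates 676 keys...
all_pairs = {''.join(pair): 0 for pair in product(alphabet, repeat=2)}

# ...but only a handful of pairs actually occur in a short string
text = 'banana'
pairs_in_text = {text[i:i + 2] for i in range(len(text) - 1)}
```

Here all_pairs has 676 keys, while pairs_in_text contains just 'ba', 'an' and 'na'.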

So we come up with a better approach: before adding one, we check to see if the letter is already in the dictionary, and create it if not:

letter_counts = {}

for letter in string:
    if letter in letter_counts:
        letter_counts[letter] = letter_counts[letter] + 1
    else:
        letter_counts[letter] = 1
        
letter_counts
{'b': 1, 'a': 3, 'n': 2}

This works and has the nice property that it only includes the letters that actually occur in the string. But the if/else logic in the loop is not necessary; we can just use the dictionary’s get method, which returns a default value when the key is missing. We can use an extra variable:

letter_counts = {}

for letter in string:
    current_letter_count = letter_counts.get(letter, 0)
    letter_counts[letter] = current_letter_count + 1
        
letter_counts
{'b': 1, 'a': 3, 'n': 2}

or do it more concisely:

letter_counts = {}

for letter in string:
    letter_counts[letter] = letter_counts.get(letter, 0) + 1
        
letter_counts
{'b': 1, 'a': 3, 'n': 2}

We can apply similar logic to any situation where we need to update a dictionary.
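As a sketch of what that might look like, here is the same get pattern used to count pairs of letters and to add up values per key. The measurements data is made up purely for illustration:

```python
text = 'banana'

# Count overlapping pairs of letters with the same get-with-default pattern
pair_counts = {}
for i in range(len(text) - 1):
    pair = text[i:i + 2]
    pair_counts[pair] = pair_counts.get(pair, 0) + 1

# The pattern also works for running sums, not just counts
# (hypothetical data: a label and a value for each measurement)
measurements = [('a', 1.5), ('b', 2.0), ('a', 0.5)]
totals = {}
for label, value in measurements:
    totals[label] = totals.get(label, 0.0) + value
```

After running this, pair_counts is {'ba': 1, 'an': 2, 'na': 2} and totals is {'a': 2.0, 'b': 2.0}. In both cases the default passed to get is the starting value for a key we have not seen before.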

Now that the dictionary is created, let’s try to use it by looking up some letter counts. When the letter we want to count is in the string, this is easy:

letter_counts['a']
3

but when it’s not, we run into an error:

letter_counts['c']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[16], line 1
----> 1 letter_counts['c']

KeyError: 'c'

because the dictionary only contains letters that are actually in the string.

Beginners often write some logic to deal with this, checking to see if the key is in the dictionary and assigning a default if not:

my_letter = 'c'
if my_letter in letter_counts:
    my_count = letter_counts[my_letter]
else:
    my_count = 0

my_count
0

but it’s better to use get here again:

my_count = letter_counts.get(my_letter, 0)
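Two details of get are worth knowing when using it for lookups like this. The sketch below rebuilds the dictionary by hand so that it runs on its own:

```python
letter_counts = {'b': 1, 'a': 3, 'n': 2}

# With a default, a missing key gives us the default back
count_with_default = letter_counts.get('c', 0)

# Without a default, a missing key gives None rather than raising KeyError
count_without_default = letter_counts.get('c')

# Looking up a missing key with get does not add it to the dictionary
c_was_added = 'c' in letter_counts
```

Here count_with_default is 0, count_without_default is None, and c_was_added is False, so the dictionary still only contains the letters that are in the string.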
