Profiling execution time in Python

Profiling

Profiling is the process of monitoring your code as it runs in order to figure out which bits of it take the most time. This
is a vital step if we want to make a program run faster. In general, it is very hard to look at a program and guess
which parts of it take the most time. There’s no point trying to optimize a program before we know this – if a particular function only takes a tiny amount of the total run time, then we will be wasting our time trying to speed it up.

How to profile your code: first import the cProfile module

import cProfile

then write a function which executes your code, and call cProfile.run() with a string containing a call to that function as its argument.

cProfile.run('myFunction()')

Here is an example which uses the DNA -> protein translation code from the introductory course. This is a good candidate
for profiling, because the code is split up into several functions:

split_into_codons
translate_codon
join_amino_acids
reverse_complement

We could try to improve any of these. Note that in the run() function we translate the same piece of DNA 10,000 times. This is because the translation function is already very quick – we want the script to run for at least a couple of seconds in order to get accurate profiling results, otherwise the results can be biased by the overhead of starting and stopping the script.

Profiling results can vary considerably between computers, operating systems, versions of Python, etc. so it’s important to profile in an environment similar to the one where you will be running your program. This includes the input data – try to make your profiling runs as similar as possible to your real-life datasets.

import cProfile

gencode = {
'ATA':'I','ACA':'T','AAC':'N','AGC':'S','CTA':'L','CCA':'P','CAC':'H','CGA':'R','GTA':'V','GCA':'A',
'GAC':'D','GGA':'G','TCA':'S','TTC':'F','TAC':'Y','TGC':'C','ATC':'I','ACC':'T','AAT':'N','AGT':'S',
'CTC':'L','CCC':'P','CAT':'H','CGC':'R','GTC':'V','GCC':'A','GAT':'D','GGC':'G','TCC':'S','TTT':'F',
'TAT':'Y','TGT':'C','ATT':'I','ACG':'T','AAA':'K','AGA':'R','CTG':'L','CCG':'P','CAA':'Q','CGG':'R',
'GTG':'V','GCG':'A','GAA':'E','GGG':'G','TCG':'S','TTA':'L','TAA':'_','TGA':'_','ATG':'M','ACT':'T',
'AAG':'K','AGG':'R','CTT':'L','CCT':'P','CAG':'Q','CGT':'R','GTT':'V','GCT':'A','GAG':'E','GGT':'G',
'TCT':'S','TTG':'L','TAG':'_','TGG':'W'}

def translate_codon(codon):
    return gencode.get(codon.upper(), 'x')

def join_amino_acids(amino_acids):
    result = ''
    for aa in amino_acids:
        result = result + aa
    return result

def split_into_codons(dna, frame):
    codons = []
    for i in range(abs(frame) - 1, len(dna)-2, 3):
        codon = dna[i:i+3]
        codons.append(codon)
    return codons

def reverse_complement(dna):
    result_list = []
    base_complement = {'a':'t', 't':'a', 'g':'c', 'c':'g'}
    for base in dna:
        result_list.append(base_complement[base])
    result_list.reverse()
    result_string = ''
    for base in result_list:
        result_string = result_string + base
    return result_string

def translate_dna_single(dna, frame=1):
    if frame < 0:
        dna = reverse_complement(dna)
    codons = split_into_codons(dna, frame)
    amino_acids = []
    for codon in codons:
        amino_acids.append(translate_codon(codon))
    protein_string = join_amino_acids(amino_acids)
    return protein_string

def translate_dna(dna):
    all_translations = []
    for frame in range(1,4):
        all_translations.append(translate_dna_single(dna, frame))
    for frame in range(-1,-4,-1):
        all_translations.append(translate_dna_single(dna, frame))
    return all_translations

def run():
    for i in range(0,10000):
        dna = 'atgcgatcgatcgatcgatgctagctacgtagcatcgatc'
        translate_dna(dna)

cProfile.run('run()')

This produces the following output:

         5380004 function calls in 7.063 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   750000    1.378    0.000    2.257    0.000 <ipython-input-1-b95e6a946fed>:12(translate_codon)
    60000    0.121    0.000    0.121    0.000 <ipython-input-1-b95e6a946fed>:15(join_amino_acids)
    60000    0.761    0.000    1.251    0.000 <ipython-input-1-b95e6a946fed>:21(split_into_codons)
    30000    0.990    0.000    1.667    0.000 <ipython-input-1-b95e6a946fed>:28(reverse_complement)
    60000    1.158    0.000    6.875    0.000 <ipython-input-1-b95e6a946fed>:39(translate_dna_single)
    10000    0.140    0.000    7.051    0.001 <ipython-input-1-b95e6a946fed>:49(translate_dna)
        1    0.012    0.012    7.063    7.063 <ipython-input-1-b95e6a946fed>:57(run)
        1    0.000    0.000    7.063    7.063 <string>:1(<module>)
    60000    0.034    0.000    0.034    0.000 {built-in method abs}
        1    0.000    0.000    7.063    7.063 {built-in method exec}
    60000    0.035    0.000    0.035    0.000 {built-in method len}
  2760000    1.537    0.000    1.537    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   750000    0.451    0.000    0.451    0.000 {method 'get' of 'dict' objects}
    30000    0.018    0.000    0.018    0.000 {method 'reverse' of 'list' objects}
   750000    0.428    0.000    0.428    0.000 {method 'upper' of 'str' objects}

The output from cProfile tells us some stats for each function:

ncalls is the total number of times the function was called. As we expect, translate_codon is the function that is called
the most times.

tottime is the total amount of time spent in the function, excluding subfunctions. This is probably the most important stat.
Functions that take up a large total time are the best ones to try to speed up, as they will have the most effect on the overall program speed.

percall is the time it takes to run the function once, i.e. the total time divided by the number of calls.

cumtime is the cumulative time to run the function, including all subfunctions.

The second percall column is the same as the first, but including subfunctions – i.e. the cumulative time divided by the number of calls.
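To make the relationship between these columns concrete, here is a quick check using the translate_codon row from the report above (the numbers are copied straight from the table):

```python
# the first percall column is just tottime divided by ncalls;
# these figures are taken from the translate_codon row above
ncalls = 750000
tottime = 1.378
percall = tottime / ncalls

# the value is a couple of microseconds, which is why the
# report displays it as 0.000 at three decimal places
print('%.6f seconds per call' % percall)
```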

Looking at the results from the above run, we see that translate_codon has the largest total time with 1.4 seconds. However, this function is so simple that it’s not really obvious how we could speed it up! The next function is reverse_complement with just under 1 second. This is quite a complicated function, so we could experiment with different
ways of writing it in order to speed it up.

Interestingly, the program spends a long time in the list.append() method that is built-in to Python. We cannot try to speed this up (it is probably very efficient already, since it is a core part of the Python language) but maybe we could try to rewrite our code to use fewer list append operations.
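For example, one common way to cut down on append operations – not something the course code does, just a sketch – is to build the protein with str.join() over a generator expression instead of appending amino acids to a list one at a time (the gencode dictionary here is truncated to two codons for brevity):

```python
gencode = {'ATG': 'M', 'TAA': '_'}  # truncated table, for illustration only

def translate_codon(codon):
    return gencode.get(codon.upper(), 'x')

def split_into_codons(dna, frame):
    # a list comprehension replaces the explicit append loop
    return [dna[i:i+3] for i in range(abs(frame) - 1, len(dna) - 2, 3)]

def translate_dna_single_joined(dna, frame=1):
    # ''.join over a generator: no intermediate list, no append calls
    codons = split_into_codons(dna, frame)
    return ''.join(translate_codon(codon) for codon in codons)
```

Whether a change like this actually helps needs to be checked by profiling again – intuition about micro-optimizations is often wrong.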

Let’s look at another example – this time we will use the sequence similarity search program from day one. Again, we will write a run() function that carries out the same search many times, in order to get an accurate reading.

import cProfile
def get_words(sequence, wordsize):
    result = {}
    for pos in range(0,len(sequence) + 1 - wordsize):
        word = sequence[pos:pos+wordsize]
        current_list = result.get(word, [])
        current_list.append(pos)
        result[word] = current_list
    return result

def score_match(subject, query, subject_start, query_start, length):
    score = 0
    for i in range(0,length):
        subject_base = subject[subject_start + i]
        query_base = query[query_start + i]
        if subject_base == query_base:
            score = score + 1
        else:
            score = score - 1
    return score

def pretty_print_match(subject, query, subject_start, query_start, length):
    result = ''
    result += str(subject_start) + (' ' * length) + str(subject_start+length) + '\n'
    result += ' ' + subject[subject_start:subject_start+length] + '\n'
    result += ' ' + query[query_start:query_start+length] + '\n'
    result += str(query_start) + (' ' * length) + str(query_start+length) + '\n'
    result += '\n--------------------\n' + '\n'
    return result

def find_matches(subject, query, wordsize, score_threshold):
    subject_words = get_words(subject, wordsize)
    all_matches = []
    for query_pos in range(0,len(query) + 1 - wordsize):
        word = query[query_pos:query_pos+wordsize]
        for subject_pos in subject_words.get(word, []):
            all_matches.append(extend_hit(subject, query, subject_pos, query_pos, wordsize, score_threshold))
    result = ''
    for match in sorted(all_matches, key=lambda x : x['score'], reverse=True):
        result += 'Score : ' + str(match['score']) + '\n'
        result += pretty_print_match(match['subject'], match['query'], match['subject_pos'], match['query_pos'], match['length'])       
    return result

def extend_hit(subject, query, subject_pos, query_pos, length, score_threshold):
    while length < len(subject) and length < len(query):
        # if the current length is even, then extend to the right, else extend to the left
        # this just makes sure that we try left and right alternately
        if length %2 == 0:
            if len(query) <= query_pos + length + 1 or len(subject) <= subject_pos + length + 1:
                return {
                        'subject' : subject, 'query' : query, 'subject_pos' : subject_pos, 'query_pos' : query_pos, 'length' : length,
                        'score' : score_match(subject, query, subject_pos, query_pos, length)
                        }
            if score_match(subject, query, subject_pos, query_pos, length+1) < score_threshold:
                return {
                        'subject' : subject, 'query' : query, 'subject_pos' : subject_pos, 'query_pos' : query_pos, 'length' : length,
                        'score' : score_match(subject, query, subject_pos, query_pos, length)
                        }
            else:
                length = length + 1
        else:
            # if either the subject or query pos is already 0, then we cannot extend to the left
            if query_pos == 0 or subject_pos == 0:
                return {
                        'subject' : subject, 'query' : query, 'subject_pos' : subject_pos, 'query_pos' : query_pos, 'length' : length,
                        'score' : score_match(subject, query, subject_pos, query_pos, length)
                        }
            # to test the new score we need to reduce the start positions by one
            if score_match(subject, query, subject_pos-1, query_pos-1, length+1) < score_threshold:
                return {
                        'subject' : subject, 'query' : query, 'subject_pos' : subject_pos, 'query_pos' : query_pos, 'length' : length,
                        'score' : score_match(subject, query, subject_pos, query_pos, length)
                        }
            else:
            # if the extended match is OK, then make the changes
                length = length + 1
                subject_pos = subject_pos-1
                query_pos = query_pos-1

def run():
    for i in range(0,10000):
        find_matches('actgatcgtagcaatcgatcgatgcatctcgagcttcatacgacgatgctacgtacga', 'ttattcatctccagaa', 4, 2)

cProfile.run('run()')

         3200004 function calls in 5.886 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   440000    1.414    0.000    1.414    0.000 <ipython-input-2-29901b4a61d2>:12(score_match)
    10000    0.892    0.000    1.544    0.000 <ipython-input-2-29901b4a61d2>:2(get_words)
    50000    0.220    0.000    0.220    0.000 <ipython-input-2-29901b4a61d2>:24(pretty_print_match)
    10000    0.390    0.000    5.820    0.001 <ipython-input-2-29901b4a61d2>:33(find_matches)
    50000    0.031    0.000    0.031    0.000 <ipython-input-2-29901b4a61d2>:41(<lambda>)
    50000    1.356    0.000    3.447    0.000 <ipython-input-3-3de2c95a906a>:1(extend_hit)
        1    0.066    0.066    5.886    5.886 <ipython-input-3-3de2c95a906a>:38(run)
        1    0.000    0.000    5.886    5.886 <string>:1(<module>)
        1    0.000    0.000    5.886    5.886 {built-in method exec}
  1300000    0.688    0.000    0.688    0.000 {built-in method len}
    10000    0.071    0.000    0.101    0.000 {built-in method sorted}
   600000    0.353    0.000    0.353    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   680000    0.404    0.000    0.404    0.000 {method 'get' of 'dict' objects}

Here we find that two functions are using up the majority of the time – score_match and extend_hit. Let’s try to
speed up extend_hit. Look at the first few lines of the function:
def extend_hit(subject, query, subject_pos, query_pos, length, score_threshold):
    while length < len(subject) and length < len(query):
        if length % 2 == 0:
            if len(query) <= query_pos + length + 1 or len(subject) <= subject_pos + length + 1:

We are calling len(subject) and len(query) a total of four times each time round the while loop. We know that the subject and query sequences are not going to change inside this function, so let's store their lengths in variables; then we only have to calculate them once:
def extend_hit(subject, query, subject_pos, query_pos, length, score_threshold):
    subject_length = len(subject)
    query_length = len(query)
    while length < subject_length and length < query_length:
        if length % 2 == 0:
            if query_length <= query_pos + length + 1 or subject_length <= subject_pos + length + 1:

Now we only make two calls to len() in the whole function. How does this affect the profiling results?
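The effect of hoisting len() out of a loop is easy to demonstrate in isolation with timeit. This is a generic sketch (the loop bodies are made up, not the course code) – both functions do the same work, but one re-evaluates len() on every pass:

```python
import timeit

def loop_uncached(s):
    count = 0
    i = 0
    while i < len(s):  # len() is re-evaluated on every iteration
        count += 1
        i += 1
    return count

def loop_cached(s):
    count = 0
    n = len(s)  # length computed once, outside the loop
    i = 0
    while i < n:
        count += 1
        i += 1
    return count

s = 'a' * 10000
print('uncached:', timeit.timeit(lambda: loop_uncached(s), number=100))
print('cached:  ', timeit.timeit(lambda: loop_cached(s), number=100))
```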

Ordered by: standard name

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    0.000    0.000    4.551    4.551 <string>:1(<module>)
 440000    1.449    0.000    1.449    0.000 speedtest2.py:12(score_match)
  10000    0.878    0.000    1.520    0.000 speedtest2.py:2(get_words)
  50000    0.214    0.000    0.214    0.000 speedtest2.py:24(pretty_print_match)
  10000    0.366    0.000    4.501    0.000 speedtest2.py:33(find_matches)
  50000    0.030    0.000    0.030    0.000 speedtest2.py:41(<lambda>)
  50000    0.678    0.000    2.182    0.000 speedtest2.py:47(extend_hit)
      1    0.050    0.050    4.551    4.551 speedtest2.py:84(run)
      1    0.000    0.000    4.551    4.551 {built-in method exec}
 120000    0.067    0.000    0.067    0.000 {built-in method len}
  10000    0.071    0.000    0.101    0.000 {built-in method sorted}
 600000    0.341    0.000    0.341    0.000 {method 'append' of 'list' objects}
      1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
 680000    0.407    0.000    0.407    0.000 {method 'get' of 'dict' objects}

We have reduced the total length of time spent in extend_hit by almost half, and reduced the number of calls to the built-in function len() from 1.3 million to 120,000.
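For bigger programs it is often more convenient to collect the profile with a Profile object and sort the report using the pstats module, rather than reading cProfile.run()'s default alphabetical output. Here is a sketch of that workflow (the run() function here is a stand-in for the real one):

```python
import cProfile
import io
import pstats

def run():
    # stand-in workload; in practice this would call find_matches() etc.
    return sum(i * i for i in range(200000))

# collect the profile explicitly rather than via cProfile.run()
profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

# sort the report by total time and show only the top five rows
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('tottime').print_stats(5)
print(stream.getvalue())
```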
