Applied Python 4

Classes, modules and web applications

Once you have written some useful code, it’s nice to be able to share it with other people. Sometimes you can share your code just by emailing a Python script, or putting it on a website. But sometimes it’s not that simple:

  • the other person might not have Python
  • the other person might not have the right version of Python
  • the other person might not have the right modules installed
  • the script might rely on some third-party software (e.g. BLAST)
  • the script might rely on software that is hard to set up (e.g. Interproscan, database servers)
  • the script might rely on some data files (e.g. NCBI taxonomy)
  • the script might have sensitive information that you don’t want to share
  • the script might require a very powerful computer, which the other person doesn’t have
  • you might want to keep track of who is using the script
  • you might change the script, leaving the other person with an old version

In this section we will learn how to build a web interface to some of the scripts we have written. This means that we can have the code running on a computer that we control, and everybody else can access it via a web browser. First we need to learn a little bit more about two tools that you have been using: classes and modules.

Classes

We know from previous notes that there are multiple types of “things” in Python – Strings, Numbers, Files, Lists, Dicts, etc. What we really mean by “things” are “classes”. Classes are the tool that we use to define
new types of things to use in our scripts. Our classes can have methods (just like the classes we have already encountered like Strings, Lists, etc) and attributes (which behave like variables). Here is a simple class
that describes a DNA sequence:

# we start off by writing 'class' then the name of the new class
class Dna_sequence:
    # a DNA sequence has a name and a sequence
    # these are attributes of the class
    # they behave like variables
    name = ''
    sequence = ''

    # this is a method of the class
    # self refers to the object that the method is called on
    def printMe(self):
        print("my name is " + self.name + " and my sequence is " + self.sequence)

        
# now let's create some new DNA sequences
one_sequence = Dna_sequence()
# the name and sequence have not been set yet, so printMe() is not very interesting
one_sequence.printMe()
# note that we do not have to write printMe(self) - the 'self' is added automatically
one_sequence.name = 'abc123'
one_sequence.sequence = 'actgatcgat'
one_sequence.printMe()

# we can create as many Dna_sequence objects as we like
# just like we can create as many numbers, strings, lists etc as we like
# each time we say Dna_sequence() we get a new object
another_sequence = Dna_sequence()
another_sequence.name = 'xyz456'
another_sequence.sequence = 'tgtacggactg'
another_sequence.printMe()

# this does not affect the behaviour of the first object we created
# they are totally separate.
one_sequence.printMe()

# How does Python know which name to print?
# Because of the magic 'self' argument, each object knows what its own name is

my name is  and my sequence is 
my name is abc123 and my sequence is actgatcgat
my name is xyz456 and my sequence is tgtacggactg
my name is abc123 and my sequence is actgatcgat
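The comment about self can be made concrete with a small sketch. It assumes a cut-down copy of the class with a hypothetical method, describe(), which is not part of the class above and returns the text instead of printing it. Calling a method on an object is shorthand for calling the function on the class and passing the object in as self:

```python
class Dna_sequence:
    name = ''
    sequence = ''

    # describe() is a hypothetical method added for this sketch
    def describe(self):
        return "my name is " + self.name + " and my sequence is " + self.sequence


one_sequence = Dna_sequence()
one_sequence.name = 'abc123'
one_sequence.sequence = 'actgatcgat'

# the usual way: Python fills in 'self' for us
print(one_sequence.describe())

# the long-hand way: we pass the object in as 'self' ourselves
print(Dna_sequence.describe(one_sequence))
```

Both lines print the same text, because in both cases self is one_sequence.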

Classes are a useful way of structuring code for many problems, but they are a big topic and we will not talk about them any further here.

Modules

We have made extensive use of modules in this course. We have been using a mixture of built-in modules that come with Python (re, urllib) and
other ones that we have to install (Beautiful Soup).

A very nice feature of Python is that you can use an existing Python script as a module without having to change anything! Here’s an example: suppose
we want to write a new Python script that needs to be able to translate DNA sequences into protein. We already have some code that can do this (from
the introductory course). Rather than copying and pasting the existing code into our new script, we will just use it as a module. Here is the code
from the course – it includes a big dict definition that holds the translation table, and a bunch of functions that use it:

gencode = {
    'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M', 'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K', 'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L', 'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q', 'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V', 'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E', 'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S', 'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
    'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_', 'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}

# a function to translate a single codon
def translate_codon(codon):
    return gencode.get(codon.upper(), 'x')

# a function to split a sequence into codons
def split_into_codons(dna, frame):
    codons = []
    for i in range(frame - 1, len(dna)-2, 3):
        codon = dna[i:i+3]
        codons.append(codon)
    return codons

# a function to translate a dna sequence in a single frame
def translate_dna_single(dna, frame=1):
    codons = split_into_codons(dna, frame)
    amino_acids = ''
    for codon in codons:
        amino_acids = amino_acids + translate_codon(codon)
    return amino_acids

# a function to translate a dna sequence in 3 forward frames
def translate_dna(dna):
    all_translations = []
    for frame in range(1,4):
        all_translations.append(translate_dna_single(dna, frame))
    return all_translations

Notice that this code does not actually call any of the functions – it just defines them. We will save this code in a file called translator.py. Now
we can write a new script that imports our code and uses it to translate some DNA:

# the name of the module is just the file name without the .py on the end
import translator

# to use one of the functions, we write the module name, then a dot, then the function name
translation = translator.translate_dna('actgatcgtagctagctagc')

This is a good way of structuring our code – we can separate out the definitions of the functions from the scripts which use them. We can now write
several different scripts that use the translation functions, without having to copy and paste the functions into each one.
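To see what the frame argument in these functions does, here is a small sketch. It copies split_into_codons from translator.py so that it can be run on its own; the short test sequence is made up for illustration.

```python
# copied from translator.py so that this sketch runs without the module file
def split_into_codons(dna, frame):
    codons = []
    for i in range(frame - 1, len(dna) - 2, 3):
        codon = dna[i:i+3]
        codons.append(codon)
    return codons


dna = 'actgatcgtagc'
for frame in range(1, 4):
    print(frame, split_into_codons(dna, frame))

# 1 ['act', 'gat', 'cgt', 'agc']
# 2 ['ctg', 'atc', 'gta']
# 3 ['tga', 'tcg', 'tag']
```

Frames 2 and 3 skip the first one or two bases, and any incomplete codon at the end of the sequence is dropped.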

CherryPy

To make our web application, we will use a Python module called CherryPy. This module has the very nice feature that it includes a web server, so we don’t have to install a separate one.

First, a bit of terminology:

A web server is a piece of software that waits for a web browser to ask for a page, then sends an HTML page in response. The pages that are returned by a web server can be simple HTML files on disk, or they can be generated by
another program. Most web servers that you may have heard of (Apache, lighttpd) can do both. CherryPy has a web server built-in, and the pages that it returns are produced by functions that we will write. When the pages produced
by a web server are designed to take some input (e.g. from a form) and process it in some way, we will call it a web application. This is in contrast to a web server that simply returns static pages, which is just a website.
Not everybody will agree with these definitions but at least they will let us be clear what we’re talking about for the purposes of this project.

A web application is a class

CherryPy lets you create a web application by writing a class (it lets you do things many other ways too, but this is what we will
concentrate on). The rules for creating a web application class are very simple:

  • we can call the class anything we want
  • each method of the class will be a web page
  • the result of loading a page in a browser will be whatever is returned by the method
  • if we want a web page to be viewable, we need to set exposed to True
  • we start the web server by running the quickstart() method of cherrypy and giving an instance of our class as the argument.
  • there is a special method called index() which is used to return the root page

Here is a simple web application using cherrypy, that just has a root page and nothing else.

# first we import the cherrypy module, just like we did for re, Beautiful Soup, etc
import cherrypy

# it doesn't matter what the name of the class is - we can call it anything
class MyWebApp:
    
    # here is the index method; the result of the method will be the page that is returned to the web browser
    def index(self):
        return "Hello world"
    
    # this is how we tell Python that this particular page should be accessible
    index.exposed = True

# this line starts the webserver. It takes one argument, which is an instance of the class that we want to
# be the front page
cherrypy.quickstart(MyWebApp())

If we run this script, we will notice some interesting behaviour – the script does not exit! Instead it waits for a web browser to request a page. To view the page, we have to go to a slightly odd url:

localhost:8080

localhost just means the computer that the browser is running on and 8080 is the port. There are boring historical reasons for the port number which we will not go into here.

If we fire up a web browser, and type in the url above, we should see a page with the text “Hello world”.

To add a new page to our application, we just have to define a new method in our class. Here is a modified version of our webapp with two new pages.

import cherrypy

class MyWebApp:
    
    def index(self):
        return "Hello world"
    index.exposed = True

    # to add a new page, we just add a new method to our class
    # the address of the page will be the same as the method - simple!
    # remember to mark the new page as exposed so that it is accessible
    def one_page(self):
        return "page number one"
    one_page.exposed = True

    def another_page(self):
        return "page number two"
    another_page.exposed = True
    


cherrypy.quickstart(MyWebApp())

Remember that there is a simple one-to-one correspondence between methods and pages, so we now have three different URLs for our application:

localhost:8080
localhost:8080/one_page
localhost:8080/another_page
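The correspondence between URLs and methods can be sketched without CherryPy at all. The toy_dispatch function below is a made-up helper for illustration, not how CherryPy is actually implemented, and the secret method is also hypothetical. It only shows the idea: look up the method named in the URL, and refuse to serve it unless exposed is set to True.

```python
class MyWebApp:

    def index(self):
        return "Hello world"
    index.exposed = True

    def one_page(self):
        return "page number one"
    one_page.exposed = True

    # this method is not marked as exposed, so it should not be served
    def secret(self):
        return "not for browsers"


def toy_dispatch(app, path):
    # '/' maps to index, '/one_page' maps to one_page, and so on
    name = path.strip('/') or 'index'
    method = getattr(app, name, None)
    if method is None or not getattr(method, 'exposed', False):
        return "404 Not Found"
    return method()


app = MyWebApp()
print(toy_dispatch(app, '/'))          # Hello world
print(toy_dispatch(app, '/one_page'))  # page number one
print(toy_dispatch(app, '/secret'))    # 404 Not Found
```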

Dynamic pages

The web pages that we have made so far are not very application-like; they just return the same page each time. CherryPy has a very nice mechanism
for making dynamic pages; we just add parameters to the method (including defaults) which are set by including the same parameters in the URL. Here’s
a method for a page that has a firstname parameter which is used in the response:

import cherrypy

class MyWebApp:
    
    def index(self, firstname='Martin'):
        return "Hello " + firstname
    index.exposed = True    


cherrypy.quickstart(MyWebApp())

The index method now has two parameters; the second, firstname, has a default value of ‘Martin’. To pass parameters to the page, we simply have to
include them in the URL. If we run this script and then visit the URL

localhost:8080

we will see the text

Hello Martin

But if we add a firstname parameter to the url like this:

localhost:8080/?firstname=john

then the page will use the parameter to create the response:

Hello john

Note that we do not have to include any quotes in the url parameter, even though we do need quotes when we write a string in a Python script.
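Roughly speaking, the part of the URL after the question mark is turned into keyword arguments for the method. The sketch below imitates that using only the standard library (Python 3). The call_with_url helper is made up for illustration and is not part of CherryPy, and index is written as a plain function to keep things short.

```python
from urllib.parse import urlsplit, parse_qsl


def index(firstname='Martin'):
    return "Hello " + firstname


def call_with_url(page_function, url):
    # take everything after the '?' ...
    query = urlsplit(url).query
    # ... turn 'firstname=john' into {'firstname': 'john'} ...
    params = dict(parse_qsl(query))
    # ... and pass the entries as keyword arguments
    return page_function(**params)


print(call_with_url(index, 'http://localhost:8080/'))                 # Hello Martin
print(call_with_url(index, 'http://localhost:8080/?firstname=john'))  # Hello john
```

Every value arrives as a string, which is why no quotes are needed in the URL.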

We have changed the page from static (i.e. returning the same content every time) to dynamic (i.e. returning different content). Does this remind you of anything? Hint: constructing a URL to fetch a FASTA sequence from NCBI eutils, or constructing a url to fetch a taxonomic group web page from the NCBI taxonomy browser.

Forms

The method we are using to pass parameters to the page – editing the URL – is not very convenient. What we would normally like to do is have
the user fill in a form which adds in the parameters to the URL automatically. To do this we need to know a little bit of HTML. Here is a
snippet of HTML which will make the browser display a simple form, with just one field.

<form action="greet" method="get">
    Type your name:
    <input type="text" name="firstname"/>
    <input type="submit" value="Say hello!">
</form>

If you’re not familiar with HTML, here’s what you need to know:

  • the action attribute of the form element tells the browser which URL to request when the submit button is pressed (relative to the current location)
  • the input type="text" tells the browser to display a text box for the user to type into
  • the name attribute of the input tells the browser what the name of the form parameter is
  • the input type="submit" tells the browser to display a button that will submit the form when pressed

When the user presses the button, the web browser will try to load the page from the action attribute (greet) with the URL parameter
from the name attribute (“firstname”) with the value of whatever the user typed in the text box. I.e. if I load this snippet in a web browser,
type ‘martin’ into the box, and press the button, then my web browser will request the URL

localhost:8080/greet?firstname=martin
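We can imitate what the browser does when it builds this URL with the urlencode function from the standard library (Python 3). The form_url helper is made up for illustration, and the host and port are assumed to be the same localhost:8080 as above.

```python
from urllib.parse import urlencode


def form_url(action, fields):
    # the form action comes first, then '?', then the name=value pairs
    return "http://localhost:8080/" + action + "?" + urlencode(fields)


print(form_url("greet", {"firstname": "martin"}))
# http://localhost:8080/greet?firstname=martin

# characters that are not allowed in a URL, like spaces, get encoded
print(form_url("greet", {"firstname": "martin jones"}))
# http://localhost:8080/greet?firstname=martin+jones
```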

Here’s a web application that combines these last two ideas. The application will have two pages. The index page has a form for the user to type
their name. When the user presses the form submit button, it loads the second page, which uses the information from the form to display a greeting.

import cherrypy

class MyWebApp:
    
    # here is the index method that returns the form
    # we will use a triple-quoted string for convenience
    # because it lets us split the string over multiple lines
    def index(self):
        return """
            <form action="greet" method="get">
                Type your name:
                <input type="text" name="firstname"/>
                <input type="submit" value="Say hello!">
            </form>

        """
    index.exposed = True    

    # here is the greet method
    def greet(self, firstname = 'user'):
        return "Hello " + firstname
    greet.exposed = True

cherrypy.quickstart(MyWebApp())

In general, this is a good pattern to use for a web application – one page collects some information, and passes it on to a second page
which uses it to generate a response.

The URL parameter firstname is just like a variable – the name doesn’t have any special meaning, it’s just a label. As long as the name attribute
of the text field and the parameter of the method are the same, we can use anything for the name.

Here is a slightly more useful web application – a DNA translator. We will use the module from before.

import cherrypy

# import our dna translation code
import translator

class MyWebApp:
    
    # the index method just displays a form
    def index(self):
        return """
            <form action="do_translate" method="get">
                Type your DNA sequence:
                <input type="text" name="dna_sequence"/>
                <input type="submit" value="Translate!">
            </form>

        """
    index.exposed = True    

    # here is the translate method
    def do_translate(self, dna_sequence):
        # use the imported function to carry out the translation
        # repr() gives us a string representation of a list
        return repr(translator.translate_dna(dna_sequence))
    do_translate.exposed = True

cherrypy.quickstart(MyWebApp())

Notice that we have changed the names of many things:

  • the form action / name of the method (do_translate)
  • the label for the text box (Type your DNA sequence)
  • the text on the submit button (Translate!)
  • the name of the url parameter (dna_sequence)
  • the return value of the method

but the structure of the webapp is exactly the same.
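The output of repr() is not very pretty on a web page. One possible alternative, sketched here with a made-up helper called format_translations, is to build a small piece of HTML with one line per frame. The example list is what translate_dna returns for the sequence 'actgatcgt'.

```python
def format_translations(translations):
    lines = []
    # number the frames starting from 1, to match translate_dna
    for frame, protein in enumerate(translations, start=1):
        lines.append("Frame {}: {}".format(frame, protein))
    # <br/> is the HTML tag for a line break
    return "<br/>".join(lines)


print(format_translations(['TDR', 'LI', '_S']))
# Frame 1: TDR<br/>Frame 2: LI<br/>Frame 3: _S
```

In the web application, the do_translate method could return the result of this function instead of the repr() string.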


Exercises

1. Download and install the cherrypy module. Test that you can run it by copying and pasting the simple greeter
web application script from above into a new text file. Once you have that working, try running the DNA translator web application example above.
You will need to copy and paste the translation code into a new text file and save it as translator.py so that it can be found by the import statement.

Write a new web application that contains a tool for generating FASTA format sequences. Here is a reminder of the function that generates FASTA format:

def format_fasta(name, sequence):
  fasta_string = '>' + name + "\n" + sequence + "\n"
  return fasta_string

Save this function in a separate file and use import to use it in your web application. You will probably find that this function doesn’t quite
work on a web page; the new line will be missing. This is because in HTML a new line is indicated by <br/>.
You can fix it by doing something like

result = result.replace('\n', '<br/>')
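Here is a short sketch of the difference between the two versions of the string, using a made-up name and sequence. The function is repeated so that the sketch runs on its own.

```python
def format_fasta(name, sequence):
    fasta_string = '>' + name + "\n" + sequence + "\n"
    return fasta_string


result = format_fasta('abc123', 'actgatcgat')

# repr() lets us see the new line characters
print(repr(result))
# '>abc123\nactgatcgat\n'

# swap each new line character for an HTML line break
html_result = result.replace('\n', '<br/>')
print(html_result)
# >abc123<br/>actgatcgat<br/>
```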

Your webapp can use the same structure as the examples –
one page with a form, and one page with results. Your form will need to have two text boxes – one for the sequence name, and one for the sequence
dna. Remember to give them different name attributes, so that you do not accidentally use the same url parameter twice!

2. Once you have tested your web application, combine it with the DNA translator one. The easiest way to do this is to turn the index page
into a collection of links to different tools, and have each form on its own page. The new web application will have five pages:

  • index page with links to form pages
  • form page for DNA translator
  • results page for DNA translator
  • form page for FASTA formatter
  • results page for FASTA formatter

The HTML code for a link looks like this:

 <a href="url/of/page"> link text</a>

Modify the DNA translation tool so that the user can specify the frame of translation. You will need to add a new input to the form and a new
parameter to the results method. You will also need to use translate_dna_single to carry out the translation.
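One thing to watch out for: values from a form arrive as strings, so the frame will be '2' rather than 2, and it has to be converted before it is used in arithmetic. A minimal sketch, using a made-up helper called parse_frame:

```python
def parse_frame(frame_text):
    # int() raises a ValueError if the text is not a whole number
    frame = int(frame_text)
    # we only know how to translate in frames 1, 2 and 3
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    return frame


print(parse_frame('2'))
# 2
```

The result can then be passed to translate_dna_single as the frame argument.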

Add some more tools to your webapp using code from the introductory course:

  • Restriction enzyme motif checker
  • base frequency counter
  • kmer frequency counter

And from earlier this week:

  • sequence similarity search tool (you will have to tweak the function so that it returns a string rather than printing the results to the screen)
  • last common ancestor tool

For each new tool you will need to do the following:

  • create a new method to display the form
  • create a new method to process the form and display the results
  • add a link to the index page
  • add any necessary functions to a new file and import it

A taxonomy browsing web application

3. Writing web applications in this way allows us to build a website with a lot of information with very little coding.

Write a new webapp that lets you browse the NCBI taxonomy. It will have a single page (i.e. the class will have a single method) that
shows information for one taxid at a time. The page will have a single url parameter which will be the taxid you want to view. I.e. the url for
a page will look like this (assuming we use the index page):

localhost:8080/?taxid=12345

The page should
contain:

  • the name of the node you are viewing
  • a link to the parent of the node
  • a list of links to the immediate children of the node
  • a link to the NCBI page for the node
  • a link to a Google search for the name of the node
  • the total number of nodes of each rank (phylum, order, etc) under this node (difficulty level: high)

Start with a simplified version! Make a page that just displays the name, then add the other stuff in later.

To construct a link to a page, we can simply put the parameters inside the url (like we initially did above for our greeter example, before
we switched to using forms). For example, if I want to create a link to a Google search for Python, I can figure out the url:

http://www.google.com/search?q=python

and put it in a link:

<a href="http://www.google.com/search?q=python">click here to search for python</a>

If I want to do the same thing inside a python script with a variable, I can do this:

search_term = 'python'
link_string = '<a href="http://www.google.com/search?q=' + search_term + '">click here to search for ' + search_term + '</a>'

but it is difficult to read because of all the quotes. It is more readable if we do it this way:

search_term = 'python'
link_string = """
   <a href="http://www.google.com/search?q={query}">click here to search for {query}</a>
""".format(query=search_term)

Remember that using triple quotes allows us to split a string across multiple lines, which makes things easier to read when we are dealing with
HTML.

The links to parents/children will be a bit more complicated than this example, because we want the link text to be the name of a node,
and the url has to contain the taxid.
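One detail worth a sketch: names in the taxonomy often contain spaces (e.g. Homo sapiens), and spaces are not allowed in a URL. The quote_plus function from the standard library (Python 3) takes care of this. The google_link helper below is made up for illustration.

```python
from urllib.parse import quote_plus


def google_link(search_term):
    template = '<a href="http://www.google.com/search?q={query}">click here to search for {text}</a>'
    # the url gets the encoded version, the link text gets the original
    return template.format(query=quote_plus(search_term), text=search_term)


print(google_link('Homo sapiens'))
# <a href="http://www.google.com/search?q=Homo+sapiens">click here to search for Homo sapiens</a>
```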

You will have to copy & paste the taxonomy parsing code from yesterday into a new python file, save it, and import it. You will notice that it
takes longer to test your script now, because every time you run it you will have to wait for the dicts to be created.
