This week’s Bioinformatics Bite picks up where last week’s post left off: the results page you see after you hit the BLAST button.
The results page has several sections that we will go through individually:
- Search details: What you entered into your query
- Graphic Summary: A visual representation comparing your query to the results
- Descriptions: Details about what the results are and how well they match the Query
- Alignments: The actual alignment of your query with each result (Subject)
The search details are essentially a recap of the parameters you entered into your search (name of your search, molecule type, query length, database name, the algorithm/program that you ran), but this section also gives your search results a unique ID called an RID (Request ID). The RID is a temporary way to get back to these search results in case your computer crashes, you close the window, etc. You can get back to them by going to the BLAST main page and clicking on the gray “Recent Results” tab. BLAST also creates a temporary query ID. If you prefer a video tutorial, notice that there’s a link to a YouTube video about how to read the BLAST results page. This section is also where you can edit and save your search.
The Graphic Summary visualizes how the search results align with your query sequence. The colored boxes at the top are a key for the alignment scores of the search results. Higher numbers (red) are better. The thick red bar under the color key represents the Query. The thinner lines below it show how each search result aligns with the query. The first (best) matches align along the whole length of the query, while the last few are missing some base pairs at the end.
If you click on one of the red results bars, the screen will jump to the actual alignment for that result and your query, but first let’s look at the Descriptions.
The Descriptions section contains a table with several informative columns:
- Description: how the result is annotated in the database
- Max Score: alignment score for the best matched segment
- Total Score: sum of the alignment scores for all matched segments in the result
- Query Cover: how much of your query is included in the alignment
- E value: “expect value”, the number of hits this good you’d expect by chance (lower is better)
- Identity: % of nucleotides that are identical between query and result
- Accession: the unique identifier for the result sequence (with link)
The max and total scores refer to similarity scores, which are a measure of how well the query and subject match. Usually the score is calculated by adding points for all bases that match and subtracting points for mismatches and gaps. I will cover similarity scores in more depth in another post about algorithm parameters. As you can probably tell, if the subject and query match reasonably well, then the longer the sequence, the higher the score. This means that you can’t compare similarity scores between different BLAST queries. The score is only a way of ranking results within one search.
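To make the scoring idea concrete, here is a minimal Python sketch. The function name and the point values (+1 for a match, -2 for a mismatch, -2 per gap position) are assumptions chosen for illustration, not necessarily what your BLAST search used, and real BLAST scoring is more involved than a flat per-gap penalty.

```python
def alignment_score(query, subject, match=1, mismatch=-2, gap=-2):
    """Score two already-aligned strings of equal length ('-' marks a gap).

    The point values are illustrative assumptions, not BLAST's settings.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            score += gap        # a gap in either sequence
        elif q == s:
            score += match      # identical bases
        else:
            score += mismatch   # different bases
    return score


# A short perfect match versus a longer match with one mismatch and one gap.
short_perfect = alignment_score("ACGTAC", "ACGTAC")
long_imperfect = alignment_score("ACGTACGTACGTACGT", "ACGTACGAACGTAC-T")
print(short_perfect, long_imperfect)  # prints: 6 10
```

The longer alignment scores higher even though it is not a perfect match, which is why scores only make sense as a ranking within a single search.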
Query cover is the % of the query that is matched by the subject. In this case, all but the last 4 are 100% because they cover the whole query.
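As a rough sketch of the arithmetic, query cover can be thought of as the share of query positions that fall inside at least one aligned segment. The function name and the example coordinates below are made up for illustration.

```python
def query_cover(query_length, aligned_ranges):
    """Percent of query positions covered by at least one aligned segment.

    aligned_ranges holds (start, end) pairs in 1-based, inclusive query
    coordinates. Overlapping segments are only counted once.
    """
    covered = set()
    for start, end in aligned_ranges:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / query_length


print(query_cover(200, [(1, 200)]))             # prints: 100.0
print(query_cover(200, [(1, 190)]))             # prints: 95.0
print(query_cover(200, [(1, 100), (51, 150)]))  # prints: 75.0
```

A hit that stops 10 bases short of the end of a 200-base query comes out at 95%, which is the kind of drop you see in the last few results.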
The E value, aka the “expect value”, is the number of matches this good that you’d expect to get by random chance in a given database/query combo. In other words, it is the expected number of false positives. Because BLAST is meant to get at evolutionary relationships among sequences, another way of reading an E value is as a gauge of how plausible it is that you got this search result even though the subject and the query are not evolutionarily related. The closer the E value is to zero, the less likely the hit is due to chance.
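For the curious, the standard relationship behind the E value is E = m × n × 2^(-S'), where S' is the normalized “bit score” of the alignment, m is the query length, and n is the total length of the database. The sketch below uses raw lengths, which is a simplification (BLAST applies length corrections), and the function names and numbers are hypothetical.

```python
import math


def expect_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least bit_score.

    Implements E = m * n * 2**(-S') with raw lengths (a simplification).
    """
    return query_length * database_length * 2.0 ** (-bit_score)


def chance_of_at_least_one_hit(e_value):
    """Convert an E value into a probability: P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


# Same alignment (same bit score), searched against a small and a large database.
small_db = expect_value(bit_score=40, query_length=500, database_length=1_000_000)
large_db = expect_value(bit_score=40, query_length=500, database_length=1_000_000_000)
print(small_db, large_db)
```

The same hit gets a larger (worse) E value in the larger database, because there are more places for a chance match to turn up. For very small E values, the probability of at least one chance hit is nearly equal to E itself.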
Identity is the percentage of nucleotides in the aligned region that are exact matches between the subject and the query.
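Here is a small sketch of that calculation, assuming identity is counted over every column of the alignment, gaps included. The function name and the sequences are invented for the example.

```python
def percent_identity(query, subject):
    """Percent of alignment columns where query and subject have the same base.

    Both strings must already be aligned (equal length, '-' for gaps).
    Gap columns count toward the alignment length but never as matches.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return 100.0 * matches / len(query)


print(percent_identity("ACGTACGTAC", "ACGTACGAAC"))  # prints: 90.0
```

One mismatch in a 10-base alignment gives 90% identity.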
Finally, the Accession number is the unique identifier for the search result, with a link. This link maps to the entire sequence, not just the part that matches. For example, it would pull up the entire contig for a genomic hit or a whole transcript for an mRNA hit.
If you click on the description for a hit, it will move the page down to the alignment.
The alignment lines up the query (top) and the subject (bottom) base by base along the length of the match. Vertical lines indicate exact matches. Horizontal lines (dashes) indicate gaps in the sequence. The numbers on the sides of the alignment refer to the base positions in each sequence.
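If it helps to see how such a display is put together, the sketch below builds a simplified version of it from two aligned strings. The layout only approximates the BLAST page, and the function name, sequences, and start positions are made up for the example.

```python
def format_alignment(query, subject, query_start=1, subject_start=1):
    """Render two aligned strings with a match line between them.

    '|' marks identical bases and '-' in either sequence marks a gap.
    End positions count only real bases, not gap characters.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    match_line = "".join(
        "|" if q == s and q != "-" else " " for q, s in zip(query, subject)
    )
    query_end = query_start + len(query.replace("-", "")) - 1
    subject_end = subject_start + len(subject.replace("-", "")) - 1
    # Pad the start positions so the three rows line up column by column.
    width = len(str(max(query_start, subject_start)))
    rows = [
        f"Query  {query_start:<{width}}  {query}  {query_end}",
        f"       {'':<{width}}  {match_line}",
        f"Sbjct  {subject_start:<{width}}  {subject}  {subject_end}",
    ]
    return "\n".join(rows)


print(format_alignment("ACGT-ACG", "ACGTTACG", query_start=1, subject_start=101))
```

In this example the query has a gap at the fifth column, so the match line is blank there and the query ends at position 7 while the subject runs from 101 to 108.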
One of the most useful features here is the related information section on the right. This will have links to other NCBI databases like Gene where you can find more information about a given search result.
Hope this was useful. We’re not done with BLAST yet though. Upcoming posts will discuss filtering your search results, adjusting algorithm parameters, saving BLAST searches, and creating custom search databases.
-Tobin Magle, Biomedical Sciences Research support specialist.