Bioinformatics bites: Constructing a BLAST query

This week’s Bioinformatics Bite will go through the basics of how to construct a query for NCBI’s BLAST services.

A previous post discusses how to use text searching to find information about genes. BLAST is another way to search the NCBI databases.

BLAST stands for Basic Local Alignment Search Tool. This tool takes a nucleotide or protein sequence and searches a database of your choice for sequences that are similar to the one you entered. Significant similarity suggests homology, or shared ancestry, between the two sequences. We’re not going to go into the nuts and bolts of how the algorithm works. Instead, we will focus on what the user needs to know to use this tool.

You need 4 pieces of information before you begin your BLAST search.

  1. Query – What are you putting into BLAST?
  2. Subject (aka search result) – What do you want to retrieve from the NCBI databases?
  3. Algorithm – depends on the combination of Query and Subject from above
  4. Database – Can you limit what you search based on what you’re looking for?

Let’s look at these aspects one at a time.

The Query is the most obvious component of the BLAST search. Is it a nucleotide or a protein sequence?
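If you are not sure which kind of sequence you have, a rough check is easy to script. The sketch below is only a heuristic, not anything BLAST itself does: the function name `guess_sequence_type` and the 90% cutoff are assumptions made for illustration, and a very short or unusual sequence could be misclassified.

```python
# Letters that commonly appear in nucleotide sequences (DNA, RNA, and N for "unknown").
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(sequence, cutoff=0.9):
    """Guess whether a raw or FASTA-formatted sequence is 'nucleotide' or 'protein'.

    Heuristic (an assumption of this sketch): if at least `cutoff` of the
    letters are A, C, G, T, U or N, call it a nucleotide sequence.
    """
    # Drop FASTA header lines (they start with '>') and all whitespace.
    lines = (line.strip() for line in sequence.splitlines())
    residues = "".join(line for line in lines if not line.startswith(">"))
    residues = "".join(residues.split()).upper()
    if not residues:
        raise ValueError("No sequence letters found")

    nucleotide_like = sum(letter in NUCLEOTIDE_LETTERS for letter in residues)
    if nucleotide_like / len(residues) >= cutoff:
        return "nucleotide"
    return "protein"


if __name__ == "__main__":
    print(guess_sequence_type("ATGGCCATTGTAATGGGCCGC"))
    print(guess_sequence_type(">example protein\nMKTAYIAKQRQISFVKSHFSRQ"))
```

Treat the answer as a hint only. You usually know where your sequence came from, and that is a better guide than letter counting.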

The Subject is also pretty straightforward: are you looking for a protein sequence or a nucleotide sequence?

Once you know what type of Query and Subject you want, finding the Algorithm is also easy. There are many varieties of BLAST algorithms. The specific algorithm you use will depend on whether your Query (the sequence you are putting in) and Subjects (the search results you want to get) are protein or nucleotide. Here is a list of the common algorithms, and their queries and subjects.

  • Nucleotide – search for nucleotides using nucleotides
  • Protein – search for proteins using proteins
  • blastx – search for protein using a translated nucleotide
  • tblastn – search for translated nucleotides using protein
  • tblastx – search for translated nucleotides using translated nucleotides

This scheme is a bit of an oversimplification, because there are multiple algorithms for nucleotide (megablast, discontiguous megablast, blastn) and protein (blastp, PSI-BLAST, PHI-BLAST, DELTA-BLAST). For the purposes of this post, let’s consider Nucleotide BLAST to be megablast and Protein BLAST to be blastp. Future posts will focus on these other algorithms.
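The Query/Subject combinations in the list above can be written as a small lookup. This is a minimal sketch that follows this post’s simplification (Nucleotide BLAST as megablast, Protein BLAST as blastp). The function name `choose_blast_program` and the `compare_as_protein` flag are made up for illustration. The flag is needed because a nucleotide Query with a nucleotide Subject can mean either Nucleotide BLAST or tblastx.

```python
def choose_blast_program(query_type, subject_type, compare_as_protein=False):
    """Pick a BLAST program from the Query and Subject types.

    query_type and subject_type are 'nucleotide' or 'protein'.
    compare_as_protein only matters for nucleotide vs. nucleotide searches:
    True selects tblastx (both sides translated), False selects megablast.
    """
    combination = (query_type, subject_type)
    if combination == ("nucleotide", "nucleotide"):
        return "tblastx" if compare_as_protein else "megablast"
    if combination == ("protein", "protein"):
        return "blastp"
    if combination == ("nucleotide", "protein"):
        return "blastx"   # translated nucleotide Query, protein Subject
    if combination == ("protein", "nucleotide"):
        return "tblastn"  # protein Query, translated nucleotide Subject
    raise ValueError(f"Unknown Query/Subject combination: {combination}")


if __name__ == "__main__":
    print(choose_blast_program("nucleotide", "protein"))
    print(choose_blast_program("protein", "nucleotide"))
```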

There are 6 steps to running a BLAST search (see the Query Window figure):

  1. Pick what type of BLAST based on Query/Subject combination
  2. Enter your Query
  3. Name your search (to help you find it later)
  4. Choose your database (search set)
  5. Choose a specific BLAST algorithm
  6. BLAST!
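If you like to plan a search before opening the web page, the first five steps can be kept as a simple checklist. The sketch below is one hypothetical way to do that in Python. The class name `BlastSearch`, its field names, and the example values are all assumptions for illustration, and nothing here contacts NCBI. Step 6, pressing the BLAST button, is still up to you.

```python
from dataclasses import dataclass


@dataclass
class BlastSearch:
    """A plain record of the choices made in steps 1 to 5."""
    blast_type: str = ""   # step 1, e.g. "Nucleotide BLAST"
    query: str = ""        # step 2, the sequence you paste in
    job_title: str = ""    # step 3, a name to help you find the search later
    database: str = ""     # step 4, the search set
    algorithm: str = ""    # step 5, e.g. "megablast"

    def missing_steps(self):
        """Return the steps that still have no value filled in."""
        steps = [
            ("Pick what type of BLAST", self.blast_type),
            ("Enter your Query", self.query),
            ("Name your search", self.job_title),
            ("Choose your database", self.database),
            ("Choose a specific BLAST algorithm", self.algorithm),
        ]
        return [label for label, value in steps if not value.strip()]


if __name__ == "__main__":
    # Example values only; swap in your own sequence and database.
    search = BlastSearch(
        blast_type="Nucleotide BLAST",
        query="ATGGCCATTGTAATGGGCCGC",
        job_title="my first BLAST search",
        database="",
        algorithm="megablast",
    )
    for step in search.missing_steps():
        print("Still to do:", step)
```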

Future posts will cover how to interpret results and set Algorithm Parameters for the different types of BLAST.

Hope you found this to be helpful!

C. Tobin Magle, PhD, Biomedical Sciences Research Support Specialist