This bioinformatics bite covers another basic skill needed to answer patron questions: how to search the Gene database effectively. Last week, I showed you how to find an official gene symbol. This week, we’re going to look for the gene entry for human signal transducer and activator of transcription 1, or STAT1.
Let’s start at the gene database main page. Try entering STAT1 into the search box. This query returns over 800 gene results, but we’re looking for one specific gene. Not super efficient. Let’s take a second to explore the results page though.
The left-hand column contains filters. For example, if you’re only interested in protein coding genes, you can apply that filter and get rid of noncoding genes. You can see which filters are currently applied by looking at which are checked in the left column.
The only filter that is applied right now is “Current Only” under Status. This restricts the results to only gene predictions that are, well, current. NCBI is always updating Gene entries based on new experimental evidence, so some entries become obsolete, but you can still see the old entries by clicking “See also 2 discontinued or replaced items” above the results box.
Next, let’s look at the results column in more detail:
The first column is the official gene name and the Gene ID. The Gene ID is the unique identifier for this gene in the NCBI Gene database. If I wanted to pull up STAT1 and I knew its Gene ID, I could type it into the search box and it would come up right away.
The second column is the Description, which gives the spelled out name of the gene, sometimes an indication of its function, and the organism that it is found in.
The third column is the Location, or the chromosome and base pair (bp) position where the gene is found.
The fourth column contains Aliases, or other names by which this gene is known. This field is helpful if a gene name has changed and you know the old name but not the new name.
Finally, the fifth column contains the MIM identifier, which is the unique identifier for this gene in the Online Mendelian Inheritance in Man (OMIM) database, a catalog of human genes and genetic disorders. Looking the gene up in OMIM gives you information about what phenotypes can occur when there are alterations in this gene, as well as a host of information about how the gene was cloned, its structure, and its function. This is a good place to start when studying a new gene.
The right-hand column contains yet more filters. You can change what appears there by clicking “Manage Filters”.
Below filters, you can see your results divided up by taxon. Right now, it’s displaying Top Organisms, which works because we are looking for a human gene. If you need a gene from a more obscure organism, click “Tree” at the top or “More” at the bottom.
Do you really think there are over 300 STAT1 genes in humans? Probably not. Let’s click that number to see what’s going on.
Scrolling down the list, you can see that they are all indeed human genes, but only the first one appears to be called STAT1. Why are all these other genes coming up? There’s a clue in the search details box:
See where it says “stat1[All Fields]”? This means that if STAT1 comes up ANYWHERE in the gene record, it will be retrieved by this search. How do you prevent this? Instead of letting it search all fields, specify that it should search the gene name only by entering STAT1[Gene Name].
When you run this query, the number of entries goes down dramatically, and the results by taxon show that only one human gene appears. You can get to it by clicking on the number.
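If you like to see things spelled out, here is a minimal Python sketch of the difference between the all-fields search and the gene-name search. It only builds the search terms and a link to the Gene results page; the helper names (tagged_term, gene_search_url) are made up for illustration and are not part of any NCBI tool.

```python
from urllib.parse import urlencode

# Base address of the NCBI Gene search page.
GENE_SEARCH_URL = "https://www.ncbi.nlm.nih.gov/gene/"


def tagged_term(text, field):
    """Return a search term restricted to one field, e.g. STAT1[Gene Name]."""
    return f"{text}[{field}]"


def gene_search_url(term):
    """Build a link to the Gene results page for the given term (hypothetical helper)."""
    return GENE_SEARCH_URL + "?" + urlencode({"term": term})


# The broad search matches STAT1 anywhere in a record.
broad = tagged_term("STAT1", "All Fields")
# The narrow search matches STAT1 only in the gene name.
narrow = tagged_term("STAT1", "Gene Name")

print(broad)
print(narrow)
print(gene_search_url(narrow))
```

The square brackets are percent-encoded in the printed link, which is normal for URLs; pasting the plain term into the search box works just as well.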
Now that you understand a little better how the Gene database works, how would you get to the human STAT1 gene in one query? Combine the gene query with the organism query:
STAT1[Gene Name] AND human[Organism]
Type that in the search box, and it’ll take you right to the gene page.
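The same combined query can also be sent from a script rather than the search box. The sketch below assumes you want to use NCBI’s E-utilities ESearch service, which this walkthrough does not otherwise cover; the function names are hypothetical, and the network call is wrapped in a function so nothing is fetched until you call it yourself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ESearch endpoint of NCBI E-utilities.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(gene, organism):
    """Combine a gene-name query and an organism query with AND."""
    return f"{gene}[Gene Name] AND {organism}[Organism]"


def esearch_url(term):
    """Build an ESearch request against the Gene database, asking for JSON."""
    params = {"db": "gene", "term": term, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_ids(payload):
    """Pull the list of Gene IDs out of an ESearch JSON response."""
    return json.loads(payload)["esearchresult"]["idlist"]


def search_gene_ids(gene, organism):
    """Run the query over the network (not called automatically)."""
    with urlopen(esearch_url(build_term(gene, organism))) as response:
        return parse_ids(response.read().decode("utf-8"))


term = build_term("STAT1", "human")
print(term)
print(esearch_url(term))
```

Calling search_gene_ids("STAT1", "human") should return the Gene IDs that match the query. If more than one comes back, the query needs tightening, just as it did in the search box.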
Next time we’ll dig into the content on the gene page.