Bioinformatics Bite: Searching the gene database

This bioinformatics bite covers another basic skill you need to answer patron questions: how to effectively search the Gene database. Last week, I showed you how to find an official gene symbol. This week, we’re going to look for the gene entry for human signal transducer and activator of transcription 1, or STAT1.

Let’s start at the gene database main page. Try entering STAT1 into the search box. This query returns over 800 gene results, but we’re looking for one specific gene. Not super efficient. Let’s take a second to explore the results page though.

stat1gene page

The left hand column contains filters. For example, if you’re only interested in protein coding genes, you can apply that filter and get rid of noncoding genes. You can see what filters are currently applied by looking at which are checked in the left column.

The only filter that is applied right now is “Current Only” under Status. This restricts the results to only gene predictions that are, well, current. NCBI is always updating Gene entries based on new experimental evidence, so some entries become obsolete, but you can still see the old entries by clicking “See also 2 discontinued or replaced items” above the results box.

Next, let’s look at the results column in more detail:

STAT1 human

The first column contains the official gene name and the Gene ID. The Gene ID is the unique identifier for this gene in the NCBI Gene database. If I wanted to pull up STAT1 and I knew its Gene ID, I could type it into the search box and it would come up right away.

The second column is the Description, which gives the spelled out name of the gene, sometimes an indication of its function, and the organism that it is found in.

The third column is the Location, or the chromosome and base pair (bp) coordinates where the gene is found.

The fourth column contains Aliases, or other names by which this gene is known. This field is helpful if a gene name has changed and you know the old name but not the new name.

Finally, the fifth column contains the MIM identifier, which is the unique identifier for this gene in the Online Mendelian Inheritance in Man (OMIM) database, a catalog of human genes and genetic disorders. Looking the gene up in OMIM gives you information about what phenotypes can occur when there are alterations in this gene, as well as a host of information about how the gene was cloned, its structure, and its function. This is a good place to start when studying a new gene.

The right hand column contains yet more filters. You can change what appears there by clicking “Manage Filters”.

right side filters

Below filters, you can see your results divided up by taxon. Right now, it’s displaying Top Organisms, which works because we are looking for a human gene. If you need a gene from a more obscure organism, click “Tree” at the top or “More” at the bottom.

results by taxon

Do you really think there are over 300 STAT1 genes in humans? Probably not. Let’s click that number to see what’s going on.

stat 1 human results

Scrolling down the list, you can see that they are all indeed human genes, but only the first one appears to be called STAT1. Why are all these other genes coming up? There’s a clue in the search details box:

human search details

See where it says “stat1[All Fields]”? This means that if STAT1 comes up ANYWHERE in the gene record, the record will be retrieved by this search. How do you prevent this? Instead of letting it search all fields, specify that it should search the gene name only:

STAT1[Gene Name]

When you run this query, the number of entries goes down dramatically, and if you look at the results by taxon, you can see that only one human gene appears. You can get to it by clicking on the number.

Now that you understand a little better how the gene database works, how would you get to the human STAT1 gene in 1 query? Combine the gene query with the organism query:

STAT1[Gene Name] AND human[Organism]

Type that in the search box, and it’ll take you right to the gene page.
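If you would rather script this search than type it, here is a minimal Python sketch of the same idea. It builds the field-tagged query from the post and wraps it in a URL for NCBI's E-utilities ESearch service. The helper names (`tag`, `gene_query`, `esearch_url`) are made up for illustration, and the sketch only builds the URL; it does not send the request.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (searches an Entrez database by query).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tag(term: str, field: str) -> str:
    """Attach a field tag to a term, e.g. STAT1 -> STAT1[Gene Name]."""
    return f"{term}[{field}]"


def gene_query(symbol: str, organism: str) -> str:
    """Combine a gene-name query and an organism query with AND."""
    return f"{tag(symbol, 'Gene Name')} AND {tag(organism, 'Organism')}"


def esearch_url(query: str) -> str:
    """Build (but do not send) an ESearch request against the Gene database."""
    params = {"db": "gene", "term": query, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    query = gene_query("STAT1", "human")
    print(query)               # the same text you would type in the search box
    print(esearch_url(query))  # the equivalent scripted request
```

Leaving the field tag off is the scripted equivalent of searching all fields, so you would expect the same flood of loosely related records as in the first search above.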

Next time we’ll dig into the content on the gene page.

New Exhibit – Michael Keyes: Stories & Seasons in Woodcut

"Sunny Morning" woodcut print by Michael Keyes

The Health Sciences Library will be hosting an exhibit of woodcut prints by Aurora artist Michael Keyes from July 1st – August 31st. Stories & Seasons in Woodcut will be on display in the 3rd floor Gallery.

Woodcut prints are a type of relief printing. The prints are created when an artist cuts a picture into a wood block, then puts ink on the block and presses it to paper.

Michael Keyes will be hosting an Opening Reception on August 7th from 2:00 – 5:00 pm. Stop by during that time to meet Michael and ask him about woodcutting.

The Gallery is accessible during the library’s public access hours.

Bioinformatics bite: How to find the official name of a gene

Today’s bioinformatics bite will lay the groundwork for a patron question that would be WAY too long for one post. One common objective in searching NCBI is to find all the information that you can about a particular gene. To do this, you need to have 2 basic pieces of information:

1. The official gene symbol

2. The organism in which this gene is found.

Unfortunately, there is not a straightforward way to find official gene symbols in NCBI. The best practice is to go to the official genome database for your organism of interest. Here are databases for a few common organisms:

Let’s say that we found a really old paper that mentions the gene symbol GPR133 in humans. Let’s enter it into HGNC and see what happens.


This search returns 2 entries: ADGRD1 and its associated gene subfamily. Looks like the official symbol has changed. Let’s search for the official gene name in NCBI. The first step is finding the gene in the Gene database. If you’re navigating from the main NCBI page, click on the dropdown menu to the left of the search box.

Database dropdown

Then, without entering anything into the search box, click the Search button, which will bring you to the Gene database main page.

Then, we can search for this gene using its gene symbol with the [Gene Name] tag:

ADGRD1[Gene Name]

This search will return genes with that symbol from any organism:


Now you can pick the organism you’d like from the list. Next time I’ll explain how to find all the genes from a particular organism using the [Organism] tag, and go into some detail about the taxonomy database.
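For readers who want to run the [Gene Name] search from a script, here is a rough Python sketch using NCBI's E-utilities ESearch service with JSON output. The function names are hypothetical, and the parsing assumes the usual ESearch JSON layout, where the matching Gene IDs sit under `esearchresult` and `idlist`. Treat it as a starting point and not as the official way to do this.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def extract_gene_ids(esearch_json: str) -> List[str]:
    """Pull the list of Gene IDs out of an ESearch JSON response body."""
    payload = json.loads(esearch_json)
    # Assumed layout: {"esearchresult": {"idlist": ["...", "..."]}}
    return payload.get("esearchresult", {}).get("idlist", [])


def search_gene_ids(symbol: str, retmax: int = 20) -> List[str]:
    """Search the Gene database for a symbol in any organism (needs network)."""
    params = urlencode({
        "db": "gene",
        "term": f"{symbol}[Gene Name]",
        "retmode": "json",
        "retmax": retmax,
    })
    with urlopen(f"{ESEARCH_URL}?{params}", timeout=30) as response:
        return extract_gene_ids(response.read().decode("utf-8"))
```

Calling `search_gene_ids("ADGRD1")` should give back one Gene ID per organism that has a gene with that symbol, which corresponds to the list you pick from on the results page.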

Thanks for reading

-Tobin Magle, Biomedical Sciences Research Support Specialist

Cochrane Review Matchmaking Through Social Media

From the Cochrane blog:

Want to work on a Cochrane Review you have a passion for? What happens if you don’t know of any appropriate co-authors? Like many before her, Rawabi Aljadani faced this problem and has turned to social media to find a match. Below she tells us more about creating a Cochrane Review team with the help of social media.


A systematic review connects the dots and brings the understanding of different variables in an easy-to-digest format. Creating a systematic review that may assist decision makers in the healthcare field is a meaningful goal and is worth the effort it requires to see it through. Having a team that shares your same vision and determination to make the Cochrane Review happen is important to its success but finding that team may not be easy.

Your Cochrane Review team doesn’t need to sit in the same room, or even on the same continent. A virtual team that works through social media and electronic communication can connect no matter the distance. Co-authors, a statistician, and a consultant with experience on the review topic can work together to produce a Cochrane Review, one that may change practice and have an impact on care. Working with a virtual group can be challenging at times; there are language barriers and different time zones to navigate. The plus side is that you will be able to improve your teamwork, critical appraisal, and problem solving skills…and have a Cochrane Review as your finished product.

But how do you find a virtual team that shares your passion and drive? I have a few suggestions:

  • Post on Facebook; on your own page but also in relevant group pages.
  • Use Twitter to get the word out. Use hashtags that will connect you with people who are also interested in your topic. Make sure your tweet or at least your profile has your contact information.
  • Email your colleagues and contacts and ask them to share with their contacts.
  • Use relevant Listservs to post messages and let people know.
  • Contact the Cochrane Review Group your idea would fall under. They may have someone they can connect you with, and they often have their own mailing lists and social media accounts where they can help you advertise.
  • Blog about it! If you don’t have a blog, ask to guest blog on relevant sites.

Since I posted my own review idea on Twitter and Facebook a number of people have contacted me. People from different countries and with different educational backgrounds, who all share the same passion and want to be a part of this team. Connecting with people who share my excitement for creating a Cochrane Review really motivates me to keep going and to do my best. I’ve successfully used social media to get ‘matched up’ with co-authors…now if there are any statisticians that want to work with a passionate team, please email me!

From my social media posts I’ve gotten many encouraging messages and have connected with others who have been struggling to create a perfect Cochrane Review team but hadn’t thought to post on social media. I encourage everyone looking to create a Cochrane Review team to look outside of their own contacts, consider other possibilities, and never keep their vision limited to a certain place. After all, your perfect teammate may only be one social media post away!

Rawabi Aljadani


Pharmacist at NGHA-Jeddah

[Amanda Langdon via Lynne Fox]

SciENcv — Converting Profiles that Use the Old NIH Biosketch Format to the New NIH Biosketch Format

The newest release of SciENcv has a feature to help users convert biosketches from the former NIH format to the new format, which took effect January 2015. Step-by-step conversion instructions are in the latest NLM Technical Bulletin.

Dana Abbey, MLS

Bioinformatics bite: finding a list of genes in a genomic region

Today’s bioinformatics bite comes from my very first walk-in question. A researcher wanted to know how to find the set of genes located in a given genomic region. This analysis is useful if your organism of interest has a known deletion associated with a phenotype, and you want to get at the molecular mechanisms of how that phenotype occurs.

Let’s use the following genomic region as an example: Human X Chromosome, bases 151,073,054-151,383,976.

If you’re a visual person, UCSC genome browser is a good option to find this information. Here’s what the page looks like by default*.

The genomic region being shown is indicated in the upper left. If you want to change the region being shown, type the new region in the box in the middle, but make sure your syntax matches the example on the left. The example we’re using would be entered as chrX:151,073,054-151,383,976.

UCSC default
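If you handle regions in a script, it helps to break the UCSC-style position into its parts first. The sketch below is one way to do that in Python, assuming the `chrN:start-end` form used in this post, with optional commas in the numbers. The `Region` type and `parse_ucsc_position` function are names invented for this example.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


# chromosome name, a colon, then start-end; digits may contain commas
_POSITION = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")


def parse_ucsc_position(text: str) -> Region:
    """Split a position such as chrX:151,073,054-151,383,976 into its parts."""
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrN:start-end position: {text!r}")
    chrom, start, end = match.groups()
    region = Region(chrom, int(start.replace(",", "")), int(end.replace(",", "")))
    if region.start > region.end:
        raise ValueError("start must not be greater than end")
    return region


if __name__ == "__main__":
    print(parse_ucsc_position("chrX:151,073,054-151,383,976"))
```

Storing the coordinates as plain integers makes it easier to reuse the same region in tools that expect a different syntax.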

That is A LOT of information. If you want to remove the information you’re not interested in, click the “hide all” button below the image (inside red box in the image). Here’s what it looks like after:

UCSC gene track

Now let’s add some information back into the browser. Since we’re specifically interested in genes, go under the “Genes and Gene Predictions” tab and change the dropdown below your favorite gene annotation set from “hide” to “full”. I like using RefSeq genes.

UCSC genes view

Now we can see that the GABRA3 gene and 3 microRNA genes are present in this region. However, I don’t see a quick way of exporting the gene list, which can become a problem if you’re looking at a bigger region that contains more genes.

To get around this issue, you can use NCBI’s Gene database. First, click on the “Advanced” link below the search box.

NCBI gene advanced

One of the things I like most about NCBI is that you can use the dropdown menus on the search builder to select fields that make your search more specific. For example, if you select “Organism” from the dropdown menu and type human in the corresponding search box, it restricts your search to human genes**. I can also specify the chromosome I’m interested in (Chromosome field) and the base pair range (Base Position field) in the advanced search, as shown below.

NCBI gene advanced search2

Notice how the syntax is different. NCBI uses a colon (:) instead of a dash (-) to indicate a range of base pairs. More information about how to format fields can be found in the FAQ.
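To make the syntax difference concrete, here is a small Python sketch that turns the example region into a Gene query string with a colon-separated range. The bracketed tags mirror the field names in the search builder. The function name `ncbi_region_query`, dropping the `chr` prefix, and writing the numbers without commas are all assumptions, so compare the output with what the search builder generates before relying on it.

```python
def ncbi_region_query(organism: str, chromosome: str, start: int, end: int) -> str:
    """Build a Gene query for one organism, one chromosome, and a base pair range."""
    # Assumption: the Chromosome field wants "X", not the UCSC-style "chrX".
    chrom = chromosome[3:] if chromosome.lower().startswith("chr") else chromosome
    # The range uses a colon (start:end), not the dash used by the UCSC browser.
    return (
        f"{organism}[Organism] AND {chrom}[Chromosome] "
        f"AND {start}:{end}[Base Position]"
    )


if __name__ == "__main__":
    print(ncbi_region_query("human", "chrX", 151073054, 151383976))
```

For the example region this prints `human[Organism] AND X[Chromosome] AND 151073054:151383976[Base Position]`.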

But what happens when I run this search?


I got 3 genes, but not the ones I found on UCSC genome browser. Why doesn’t the GABRA3 gene come up? To find out, I searched for human GABRA3 in the gene database: (human[Organism] AND GABRA3[Gene Name])


The answer lies in the “Assembly” and Location fields of the Genomic Context box***. Compare the assembly number to the one at the top of UCSC genome browser:

UCSC assembly

The UCSC genome browser is using assembly #36 and NCBI is already on #38. And as we can see from the Location field, the gene is now at base pairs 152,166,234-152,451,359 in the newest assembly. Now when you search with coordinates from the newest assembly, you get the same genes as in the UCSC genome browser.
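A quick interval check shows why the first search missed GABRA3. The sketch below, with a hypothetical `overlaps` helper, compares the region we originally searched (coordinates from the older assembly) with the gene's location in the newest assembly, using only the numbers quoted in this post.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Return True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


# Region taken from the UCSC view, i.e. coordinates in the older assembly.
QUERIED_REGION = (151_073_054, 151_383_976)
# GABRA3 location reported by NCBI Gene for the newest assembly.
GABRA3_NEWEST_ASSEMBLY = (152_166_234, 152_451_359)

if __name__ == "__main__":
    print(overlaps(*QUERIED_REGION, *GABRA3_NEWEST_ASSEMBLY))  # False
```

The two intervals do not touch, so a search that uses old-assembly coordinates against new-assembly gene locations cannot return this gene.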


The moral of the story is, ALWAYS check the version of the data that you’re working with.

Now, if you want a version of the table that can be saved and is machine readable, click on the button next to “Send to”, select “File”, and click “Create File”.

NCBI gene send to file
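Once you have the file, a few lines of Python can pull the gene list out of it. This sketch assumes the download is tab-delimited text with a header row and a column named `Symbol`. Both are assumptions about the export, so open your file and adjust the delimiter and column name to match. The rows in the demo are placeholders, not real search results.

```python
import csv
import io
from typing import Iterable, List


def read_symbols(lines: Iterable[str], column: str = "Symbol") -> List[str]:
    """Collect one column (assumed to hold gene symbols) from a tab-delimited table."""
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header {reader.fieldnames}")
    # Skip rows where the column is empty or missing.
    return [row[column] for row in reader if row[column]]


if __name__ == "__main__":
    # Placeholder rows standing in for a downloaded file.
    demo = io.StringIO("GeneID\tSymbol\n0001\tGENE_A\n0002\tGENE_B\n")
    print(read_symbols(demo))
```

With a real download you would pass an open file handle instead, for example `with open(path, newline="") as handle: read_symbols(handle)`, where `path` is wherever you saved the file.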

Hope this helped. Ask Us if you have any questions!

-Tobin Magle, Biomedical Sciences Research Support Specialist


*If your page doesn’t look like my screen shot, it’s probably because you’ve been here before and your browser remembers what you did last time. Reset the system by clicking “default tracks” under the image.

**If you type in human and search in all fields, it would pick up the word human anywhere in the gene record, including a description that says something like “this gene is a homolog of human protein…”.

***Even though the human genome is “complete”, it is still continually being tweaked as better data becomes available. For more information on NCBI genomic assemblies, look here.