This blog series has covered how to use both GEO Datasets (which holds both curated and uncurated datasets) and GEO Profiles (which holds expression profiles for individual genes from curated datasets).
But what if you want to see expression profiles of a gene from an uncurated dataset? That’s where GEO2R comes in. Once you’ve identified a dataset by searching GEO Datasets, you can start using GEO2R in 5 easy steps:
- Pick your experiment
- Define sample groups
- Assign samples to groups
- Perform the test
- Interpret the results table
Pick your experiment:
You need an accession number to start using GEO2R. Which one do you use? Let’s use this dataset from Toxoplasma gondii as an example.
This record contains accession numbers (boxed in red) for the series, samples, and platform. GEO2R is looking for the Series accession number, GSE73177. Enter this number into the search field in GEO2R, or click the Analyze with GEO2R link (blue arrow).
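If you are juggling several accession numbers, the prefix tells you which kind of record each one points to. Here is a minimal Python sketch that classifies an accession by its prefix and checks whether it is the kind GEO2R expects. The function names are my own, made up for illustration; only the GSE/GSM/GPL/GDS prefixes come from GEO itself.

```python
import re

# Standard GEO accession prefixes and the record type each one denotes.
ACCESSION_TYPES = {
    "GSE": "series",
    "GSM": "sample",
    "GPL": "platform",
    "GDS": "curated dataset",
}


def accession_type(accession: str) -> str:
    """Return the GEO record type for an accession such as 'GSE73177'."""
    match = re.fullmatch(r"(GSE|GSM|GPL|GDS)(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"not a recognized GEO accession: {accession!r}")
    return ACCESSION_TYPES[match.group(1)]


def is_geo2r_input(accession: str) -> bool:
    """GEO2R starts from a series record, so only GSE accessions qualify."""
    return accession_type(accession) == "series"


print(accession_type("GSE73177"))  # series
print(is_geo2r_input("GSE73177"))  # True
```
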
Define sample groups:
This experiment measures expression levels in 3 groups (parent strain, knockout strain, and complemented knockout strain**). Thus, we need to create 3 sample groups. To do this, click the “Define Groups” link (green circle above). This opens a popup where you can enter free text to name the groups (red box; wt, ko, and comp in the following example).
Assign samples to groups:
Now you need to tell the program which samples belong to each group. Select the samples that you want to put into a group, then click the group you want to add them to in the Define Groups popup. In the example below, I have selected the complemented knockout samples (highlighted yellow) and will click the “comp” group to add them. After samples are added to a group, their rows take on the color of that group and the Group column is populated, as with wt and ko in the example. Repeat this process for all of your groups.
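It can help to think of this step as building a simple lookup from sample to group. The sketch below shows one way to represent that in Python and to pack it into a one-character-per-sample string, with "X" marking samples left out of every group. The sample names and the number of samples per group are placeholders, not the real GSM accessions for GSE73177, and the encoding is an illustration of the idea rather than a description of what GEO2R does internally.

```python
from collections import defaultdict

GROUPS = ["wt", "ko", "comp"]

# Placeholder sample names, in the order they would appear in the series.
# None means the sample was not assigned to any group.
assignments = {
    "sample_1": "wt",
    "sample_2": "wt",
    "sample_3": "ko",
    "sample_4": "ko",
    "sample_5": "comp",
    "sample_6": "comp",
    "sample_7": None,
}


def encode_groups(assignments, groups):
    """One character per sample: the group's index, or 'X' if unassigned."""
    return "".join(
        "X" if group is None else str(groups.index(group))
        for group in assignments.values()
    )


def samples_by_group(assignments):
    """Invert the mapping: group name -> list of samples in that group."""
    by_group = defaultdict(list)
    for sample, group in assignments.items():
        if group is not None:
            by_group[group].append(sample)
    return dict(by_group)


print(encode_groups(assignments, GROUPS))  # 001122X
print(samples_by_group(assignments)["comp"])  # ['sample_5', 'sample_6']
```
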
Perform the test:
To get a list of the top 250 differentially expressed genes using the default settings*, scroll down and click the Top 250 button under the GEO2R tab.
A table appears containing the top 250 differentially expressed probes from the platform, with columns for the probe ID, p-value, adjusted p-value, F statistic, and probe sequence.
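If you are wondering why the table has both a p-value and an adjusted p-value: when thousands of probes are tested at once, some will look significant by chance, so the raw p-values are corrected for multiple testing. The sketch below implements the Benjamini-Hochberg false discovery rate adjustment in plain Python, assuming that is the method selected in the Options tab. The p-values are made-up numbers for illustration, and this does not reproduce limma's F statistic or GEO2R's actual code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four probes.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p = {p:.3f}  adjusted = {q:.3f}")
```

Adjusted values are never smaller than the raw ones, which is why a probe can have a small p-value but a less impressive adjusted p-value.
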
Interpret the results table:
Clicking on the probe ID will show you a graph of the gene expression among the groups you have specified. You can also click Sample Values to get the numeric values represented on this graph.
In this case, it looks like the gene that is probed by 55.m10280_at is highly expressed in the knockout relative to the wild type, but doesn’t revert to wild type levels in the complemented strain.
To determine the gene name of the probe used in this experiment, visit the corresponding platform record for this series. (You can find this by searching the Series accession in GEO Datasets and using the platform filter.) Then, scroll to the bottom of the page to see the platform data table, which gives probe IDs, identifiers for genes in the Toxoplasma genome database, annotation, chromosome location, and a description of the gene function if available. Search for the probe ID in the table to find the corresponding gene. In this case, it’s a thioredoxin domain-containing protein.
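If you have many probes to look up, searching the table by hand gets tedious. Assuming you have saved the platform data table as tab-delimited text, a small Python sketch like the one below can index it by probe ID. The column name "Description" and the second row are placeholders I made up; only the probe ID 55.m10280_at and its description come from the example above, so check the real column headers on your platform record.

```python
import csv
import io

# Tiny stand-in for a saved platform data table (tab-delimited).
# Only the first data row reflects the example in this post.
PLATFORM_TABLE = (
    "ID\tDescription\n"
    "55.m10280_at\tthioredoxin domain-containing protein\n"
    "placeholder_probe_at\tplaceholder description\n"
)


def load_annotations(text):
    """Parse a tab-delimited platform table into {probe ID: row dict}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["ID"]: row for row in reader}


annotations = load_annotations(PLATFORM_TABLE)
print(annotations["55.m10280_at"]["Description"])
```
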
But what if your gene of interest is not in the top 250? You can use the Profile Graph tab to search by probe ID.
GEO2R also has basic QC tools. You can use the Value distribution tab to view the value distributions across samples and spot large-scale problems in the dataset.
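The idea behind this check is that samples in a well-behaved dataset should have broadly similar value distributions, so a sample whose values sit far above or below the rest deserves a closer look. Here is a rough Python sketch of that reasoning using per-sample quartiles. The expression values, sample names, and the tolerance of 1.0 are all made up for illustration, and this is not how GEO2R draws its plot.

```python
import statistics


def summarize(values):
    """Five-number summary of one sample's expression values."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def flag_outlier_samples(samples, tolerance=1.0):
    """Names of samples whose median is far from the median of all medians."""
    medians = {name: statistics.median(vals) for name, vals in samples.items()}
    overall = statistics.median(medians.values())
    return [name for name, m in medians.items() if abs(m - overall) > tolerance]


# Made-up expression values for three placeholder samples.
samples = {
    "sample_1": [6.8, 7.1, 7.4, 8.0, 9.2],
    "sample_2": [6.9, 7.0, 7.5, 8.1, 9.0],
    "sample_3": [9.9, 10.2, 10.6, 11.0, 12.3],
}

for name, values in samples.items():
    print(name, summarize(values))
print(flag_outlier_samples(samples))  # ['sample_3']
```
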
Finally, you can retrieve the R script for the analyses run in the R script tab.
NCBI also has a comprehensive tutorial on this tool if you’re interested.
Let me know if you have any questions!
- C. Tobin Magle, PhD, Biomedical Science Research Support Specialist
* For novice users, the default settings are a good place to start. The calculation uses the limma package, and you can view and change the default settings by clicking the Options tab.
** In the complemented strain, the knocked-out gene is added back in to account for off-target effects of the knockout.