Publishers are Requiring ORCID

Do you have an ORCID number, a researcher ID number? There’s a growing list of journals and publishers, including PLoS, Wiley, the American Chemical Society, EMBO Press and others, that require corresponding authors to have an ORCID number.  If you don’t have one yet, you can register here.

What in the world is ORCiD?

If you’re thinking that I merely misspelled the name of a beautiful flower (the orchid), you’d be mistaken.

ORCiD is a non-profit organization that provides unique identifiers for researchers.

This 16-digit identifier can be attached to any research output, such as publications, datasets, and posters. It’s essentially a bar code that you can apply to your work to link you to your accomplishments.
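One detail worth knowing if you ever handle iDs in a spreadsheet or script: the last of the 16 characters is a check character (ISO 7064 MOD 11-2), so it can be the letter X as well as a digit. Here is a minimal Python sketch of that check. It is based on ORCID’s published description of the iD structure rather than on anything in this post, and the function names are made up for illustration.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length and checksum of an iD written as 0000-0000-0000-0000."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the sample iD that ORCID uses in its own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0096"))  # False (last character altered)
```

A check like this only catches typos. It cannot tell you whether an iD has actually been registered to anyone.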

ORCiD is currently being used worldwide. In fact, many publishers and international funding agencies require ORCiD iDs on funding applications and manuscript submissions.

Why are unique identifiers for researchers important?

A major challenge in cataloging research output for individuals and institutions is matching researchers to their output. Currently, the only ways we have to distinguish researchers from one another are names and affiliations. This system is problematic for a number of reasons:

  • Names can change over the course of a career. This situation often occurs due to changes in marital status.
  • So do affiliations. Researchers almost certainly change their institution through their MD/PhD and postdoctoral training, but many also have other steps along the way.
  • Names are not unique. Many researchers, even within the same field, have the same or similar names, especially when databases only store first and middle initials.
  • Name formats are often not standardized. Researchers often publish under variations of the same names: some journals include first and middle names, some only accept first, not to mention typos.

To compound this problem, there is no one comprehensive source for all research publications. Thus, automated searches, such as the PubMed search strategy used with Colorado PROFILES, have issues with sensitivity and specificity. Faculty are asked to log in and curate their profiles, but many do not. A widely-used unique identifier for researchers that can travel with a researcher across institutions and can be integrated into many databases would solve this problem.

How does ORCiD fit in?

Because ORCiD is an independent non-profit organization, it provides an identifier that can be used for anything, anywhere. It stays with the researcher through name and affiliation changes. Even PubMed has an Author – Identifier field that uses ORCiD iDs now.

How is ORCiD different from other profile systems like ResearchGate and LinkedIn?

ORCiD isn’t meant to replace any of these systems. It’s not a professional networking platform like ResearchGate and LinkedIn. ORCiD does provide an online profile system where researchers can display their accomplishments… or not. The ORCiD iD itself is useful even if you never fill out the profile. Additionally, these platforms do not provide a unique identifier. In fact, ResearchGate includes a field for ORCiD iD!

How is ORCiD different from other unique IDs for research output like Web of Science Researcher ID and Scopus Author ID?

Researcher ID and Author ID are, as indicated by their names, unique IDs for researchers. However, their limitation is that they only link to citations within their respective databases (Web of Science and Scopus). Thus, they do not capture the whole picture. ORCiD iDs are platform agnostic and can import data from your Researcher and Author IDs, so you don’t have to start from scratch.

What about Google Scholar Citation Profile?

Google Scholar Citations is designed for you to keep track of your publications and associated citation metrics. ORCiD doesn’t do these things, but Google Scholar does not create a unique identifier.

But I already have all these things set up. Can I import things from these places?

Yes! Here is a list of tutorials:

ORCiD has not formed partnerships with ResearchGate and LinkedIn to allow direct transfer of information. ResearchGate does have a field where you can input your ORCiD iD.

How do I register for an ORCiD iD?

Individuals can get an ORCiD iD for free.

ORCiD has institutional partners that can automate this process for their faculty based on information they have on file. CU Anschutz has access to CU Boulder’s ORCiD membership. If you have questions or comments about the possibility of CU Anschutz using ORCiD, please contact tobin.magle@ucdenver.edu.

  • Tobin Magle, PhD. Bioinformationist.

 

 

Bioinformatics Bites: MedGen

This bioinformatics bite is going to be a little bit more clinically oriented:

A patient is presenting with excess blood clotting, which she thinks might be related to “something that runs in her family”. How do I find known diseases and genes (if any) that are associated with that phenotype?

A good place to start to look for information about symptoms and diseases that are related to genetics is MedGen. This database organizes information related to human medical genetics, like symptoms (clinical features), related genes, diseases, or genomic loci.

A perfectly reasonable approach would be to type “clotting” into the MedGen search box. Here’s what those results look like:

clinfeaturestag

There are 94 results, the first of which is a clotting disorder, but one that is associated with too little clotting rather than too much clotting. If you scroll down, you see records that are not actually diseases:

otherconcepts

To find out what type of record you’re looking at, look at the text after the concept ID (blue boxes). The screen captures above show a Disease or Syndrome, a Finding, and a Pharmacologic Substance. Notice that the disease has links to other databases (green circle) and the others do not.

So how do we specify that we’re looking for a patient symptom related to a genetic disease? Like all the other NCBI databases, MedGen has field tags.

Here are some useful ones:

  • Clinical Features: short stature[clinical features] – records for diseases that are associated with short stature
  • Related Genes: LMNB1[gene] – diseases associated with this gene
  • Disease name: achondroplasia[title] – this disease
  • Chromosome: 6[chromosome] – diseases associated with alterations to chromosome 6
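The same field tags should also work outside the web page, because Entrez searches can be written as URLs through NCBI’s E-utilities. Here is a rough Python sketch that only builds such a URL and does not send it. E-utilities is not covered in this post, and the helper names (`tagged`, `medgen_search_url`) are hypothetical.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term: str, field: str) -> str:
    """Attach an Entrez field tag, e.g. tagged('LMNB1', 'gene') -> 'LMNB1[gene]'."""
    return f"{term}[{field}]"


def medgen_search_url(*tagged_terms: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch URL that ANDs the tagged terms."""
    query = " AND ".join(tagged_terms)
    params = {"db": "medgen", "term": query, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# The clotting search from this post, restricted to clinical features.
url = medgen_search_url(tagged("clotting", "clinical features"))
print(url)
```

Pasting the printed URL into a browser should return the matching record IDs as XML, assuming the tag is interpreted the same way as in the search box.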

Also, if you look back at the first screenshot, you can see a link that says “See MedGen results with clotting as a clinical feature (5)”. MedGen automatically sensed that clotting was a clinical feature, or symptom, and narrowed your results down for you.

Now we’ve narrowed the MedGen results to those that have clotting listed as a clinical feature. If you read the description, you see that Factor V deficiency is the only one associated with excess clotting. The record also shows what gene is associated with this disorder (F5) and links to descriptions from other resources like GeneReviews and OMIM, as well as Professional guidelines and Recent clinical studies.

result page.png

So how do you find out if this is in fact what your patient has? Find out next week!

-C. Tobin Magle, PhD, Biomedical Sciences Research Support Specialist

 

Bioinformatics Bites: GEO2R

This blog series has covered how to use both GEO Datasets (which holds both curated and uncurated datasets) and GEO Profiles (which holds expression profiles for individual genes from curated data sets).

But what if you want to see expression profiles of a gene from an uncurated dataset? That’s where GEO2R comes in. Once you’ve identified a dataset by searching GEO Datasets, you can start using GEO2R in 5 easy steps:

  1. Pick your experiment
  2. Define sample groups
  3. Assign samples to groups
  4. Perform the test
  5. Interpret the results table

Pick your experiment:

You need an accession number to start using GEO2R. Which one do you use? Let’s use this dataset from Toxoplasma gondii as an example.

search result

This record contains accession numbers (boxed in red) for the series, samples, and platform. GEO2R is looking for the Series Accession number, GSE73177.  Enter this number into the search field in GEO2R, or click the Analyze with GEO2R link (blue arrow):

definte groups

Define sample groups:

This experiment is measuring expression levels in 3 groups (parent strain, knockout strain, and complemented knockout strain**). Thus, we need to create 3 sample groups. To do this, click the “Define Groups” link (green circle above). This action opens a popup that allows you to enter free text to name the groups (red box; wt, ko, and comp in the following example).

define groups 2.png

Assign samples to groups: 

Now you need to tell the program which samples belong to each group by selecting the samples that you want to put into a group, then clicking on the group you want to add them to in the Define Groups popup. In the example below, I have selected the complemented knockout samples (highlighted yellow) and will click the “comp” group to add them. After they are added to a group, the corresponding colors change to that of the group and the group column is populated, as in the case of wt and ko in the example. Repeat this process for all of your groups.

assign to group.png

Perform the test:

To get a list of the top 250 differentially expressed genes using the default settings*, scroll down and click the Top 250 button under the GEO2R tab.

The result is a table of the top 250 differentially expressed probes from the platform, listing the probe ID, p-value, adjusted p-value, F statistic and probe sequence.
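If you are wondering how the adjusted p-value column relates to the plain p-value column, here is a small Python sketch of the Benjamini & Hochberg (false discovery rate) adjustment. I am assuming that this is the adjustment selected by default in GEO2R, and this is my own illustration, not the code GEO2R runs. The input numbers are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values, ordered from the largest p-value to the smallest.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # rank 1 belongs to the smallest p-value
        # Scale by n / rank, then make sure adjusted values never increase
        # as we move toward smaller raw p-values.
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for four probes
print(benjamini_hochberg(raw))
```

The takeaway is that the adjusted value depends on how many probes were tested, which is why a p-value that looks small on its own can become unimpressive after adjustment.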

Interpret the results table:

Clicking on the probe ID will show you a graph of the gene expression among the groups you have specified. You can also click Sample Values to get the number values represented on this graph.

top250

In this case, it looks like the gene that is probed by 55.m10280_at is highly expressed in the knockout relative to the wild type, but doesn’t revert to wild type levels in the complemented strain.

To determine the gene name of the probe used in this experiment, visit the corresponding platform record for this series. (You can find this by searching the Series accession in GEO DataSets and using the platform filter.) Then, scroll to the bottom of the page to see the platform data table, which gives probe IDs, identifiers for genes in the Toxoplasma genome database, annotation, chromosome location and a description of the gene function if available. Search for the probe ID in the table to find the corresponding gene. In this case, it’s a thioredoxin domain-containing protein.

But what if your gene of interest is not in the top 250? You can use the Profile Graph tab to search by probe id.

GEO2R also has basic QC tools. You can see the value distributions across samples to identify large scale problems in the dataset using the value distribution tab:

value distribution.png

Finally, you can retrieve the R script for the analyses run in the R script tab.

NCBI also has a comprehensive tutorial on this tool if you’re interested.

Let me know if you have any questions!

  • C. Tobin Magle, PhD, Biomedical Science Research Support Specialist

* For novice users, the default settings are a good place to start. The calculation uses the limma package, and you can view and change the default settings by clicking the Options tab.

options tab

** knocked out gene added back in to account for off target effects of the knockout.

New Resource: Quetzal Advanced Version

Quetzal

The library now has access to the Advanced Version of Quetzal! Find it on the library’s database page, or follow the links/directions below.

Version descriptions:
Basic (Free) – Enhanced linguistic searching of PubMed documents to find relevant results.
Advanced – Includes patent, full-text searching and AHRQ Treatment Protocols.

Note: When creating a new individual account, users must be on campus, logged in with their PassportID after following the link below (if off campus), or connected by VPN.

For new users:

  1. Go to Quetzal.
  2. Scroll down to the bottom of the page.
  3. You will see the following message:

quetzal
If it doesn’t say “University of Colorado – Anschutz Medical Campus,” you are not on campus and will need to go through your Proxy Server or VPN

  4. Click the registration button at the bottom of the page:

quetzal 2

  5. Fill out the Registration Form. Our organization should be pre-selected for “University of Colorado – Anschutz Medical Campus”. If not, please select that from the drop-down list. [Note: a separate registration is required for each individual user since each user uses their personal space to save searches, get weekly alerts, save results with personal annotations, and participate in private journal clubs.]
  6. Read the Terms of Use and check the box at the bottom of the Registration page.
  7. Click the Continue button.
  8. You should see a message saying “Your user registration was successful. You may Login to Quetzal® Search now.” [IMPORTANT NOTE: If you see instead a new form that says “Choose your Subscription type”, then STOP. You have not been correctly recognized as being from University of Colorado – Anschutz Medical Campus. Be sure you are on site or logged in through your organization’s Proxy Server. Or, make sure you are signing up for Quetzal® Advanced, not Professional.]
  9. Happy Searching! You can follow along with the Guided Tour on your first visit. This will help you understand how to use Quetzal® effectively. You can also check out the information and 30-second videos shown in the Quetzal® Quick Help section on the home page (after logging in).
  10. Any questions or problems? Contact Quertle at info@quertle.com.

For users with an existing Quetzal® Basic account:

  1. Go to Quetzal.
  2. Login to your existing account.
  3. In the upper right corner of the search area, click on “Your Quetzal”.
  4. Click on the My Profile tab. In the Organization drop-down list, be sure our subscribing organization name (University of Colorado – Anschutz Medical Campus ) is chosen.  This should be the same name you see on the Quetzal welcome page in the “Happy News” message at the bottom of the page.  If it is not the same, please select the correct Organization from the list.
  5. Click on the My Account tab.  You should see a Happy News message.  [NOTE: If you do not see the Happy News message, then STOP.  You have not been correctly recognized as being from University of Colorado – Anschutz Medical Campus.  Be sure you are on campus or logged in through your organization’s Proxy Server.]
  6. Click “Change” on the Your Version line.
  7. Choose “Advanced” in the dialog that opens and then click Submit. You may need to log in again.
  8. On your My Account page, the line for “Your Version” should now display Quetzal® Advanced.
  9. Enjoy searching all the content Quetzal® has to offer and the full set of features and functions.
  10. Any questions or problems?  Contact Quertle at info@quertle.com.

Non-AMC campus members can still register for access to the Basic (free) version: https://www.quetzal-search.info/

Bioinformatics Bites: Expression of a single gene in GEO Profiles

This week’s bioinformatics bite will answer a question from the end of last week’s post:

What is the expression of ITGA2 gene in prostate cancer cells?

I’d check GEO profiles, because it is a gene centric question. Let’s start by typing in prostate cancer in the GEO profiles search box.

GEOProfiles Results

Note the link back to the GEO data set that this gene profile was derived from (green circle). You can also see the platform and the specific probe that measures this gene (orange box). Finally, you can see a cartoon of the expression level between sample and control on the right side of the record.

To narrow down your search to a specific gene, use the filters on the left to select an organism (red box) and a Gene symbol (blue box, make sure to check that you’re using the correct gene symbol). Let’s look for the ITGA2 gene in human derived samples.

GEOProfiles Results Filtered

After applying the filters (blue and red boxes), you can see the search strategy in the Search details (orange box).

To get a close up of the expression graph, click the cartoon on the right (green circle).

GEOProfiles Expression

At a glance, you can see that ITGA2 expression goes up when the microRNA miR-205 is expressed (red bars). The graph also indicates how highly this gene is expressed relative to other genes from the same sample by percentile (blue squares). The expression values from each sample are listed in a table below, along with their rank.
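To make the percentile idea concrete, here is a tiny Python sketch of one common way to compute a percentile rank within a single sample. The gene names and values are invented, and GEO may define its percentile ranks somewhat differently, so treat this as an illustration of the concept only.

```python
def percentile_rank(value, sample_values):
    """Percent of measurements in the sample that are at or below `value`."""
    if not sample_values:
        raise ValueError("sample_values must not be empty")
    at_or_below = sum(1 for v in sample_values if v <= value)
    return 100.0 * at_or_below / len(sample_values)


# Made-up expression values for five genes measured in one sample.
sample = {"geneA": 12.0, "geneB": 250.0, "geneC": 48.0, "geneD": 900.0, "geneE": 75.0}
print(percentile_rank(sample["geneB"], list(sample.values())))  # 80.0
```

A gene with a high percentile in every sample is highly expressed overall, even if its red bars barely change between conditions.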

If you need more information about how the samples were prepared, you can click on the GSM number in the table. From there, you can access general information about the experiment by clicking on the Series ID (GSE) on any sample page, or the original GEO profile record.

But what do you do if the data you’re attempting to access are in an uncurated data set? NCBI has a tool for that: GEO2R. We will discuss how to use this tool next time.

  • C. Tobin Magle, PhD, Biomedical Science Research Support Specialist

Bioinformatics Bites: the GEO Databases

Wow! It’s been a long time since I’ve been able to take the time to write a post. I apologize for the hiatus. I have been traveling, playing catch-up, and attending meetings post-travel. I hope to get back to my weekly posts now.

This week’s bioinformatics bites addresses finding gene expression information using GEO profiles.

Background: The Gene Expression Omnibus (GEO) was created by the NCBI to store gene expression data from microarray experiments. Most of the content is still microarrays, but some NGS data is also present (with the raw reads in the SRA database). Additionally, it now contains other types of high-throughput genomics data, like ChIP-chip.

GEO exists as two separate databases:

  • GEO DataSets: original submitter-supplied records and curated data sets
  • GEO Profiles: Expression profiles of individual genes from curated GEO data sets

GEO Data Sets, which are labeled with GDS numbers, have three major components submitted:

  • Series (GSE) – List of expression profiles that were conducted for the experiment (test, control, replicates)
  • Samples (GSM) – Information about the biological samples used in the experiments, including extraction procedures
  • Platforms (GPL) – What platform the samples were run on (like Affymetrix Mouse Genome 430 2.0 Array)
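Because the accession prefix tells you which kind of record you are holding, it is easy to sort a mixed list of accessions. Here is a minimal Python sketch. The function name is hypothetical, and the table covers only the four prefixes discussed in this series (GSE, GSM, GPL and GDS).

```python
GEO_PREFIXES = {
    "GSE": "Series",
    "GSM": "Sample",
    "GPL": "Platform",
    "GDS": "curated DataSet",
}


def geo_record_type(accession: str) -> str:
    """Classify a GEO accession such as 'GSE73177' by its three-letter prefix."""
    acc = accession.strip().upper()
    prefix, number = acc[:3], acc[3:]
    if prefix not in GEO_PREFIXES or not number.isdigit():
        raise ValueError(f"not a recognized GEO accession: {accession!r}")
    return GEO_PREFIXES[prefix]


print(geo_record_type("GSE73177"))  # Series
```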

All 3 of these components are necessary to make use of the study. Both samples and series link to the CEL files, and platform gives you information about the chip the samples were run on. NCBI is working to assemble all of the components from each submitted study into a curated DataSet, but there is some lag in the process. NGS studies can’t be curated at this time.

After curation, the data sets are broken out by gene instead of experiment and the data are loaded into GEO profiles.

If you do a quick search for “cancer”, here is what the results look like.

GEO DataSets results

This looks a lot like the output of a Gene search, but with different filters. You can see in the red box that there are over 1000 cancer data sets in GEO, and that the top data set has 6 samples by following the red arrow. You can filter by study type (blue box), things like tissue or strain (purple box), or Organism (orange arrow). You also get a search details box, which shows that MeSH terms are applied to your text search, just like PubMed.

Now that we have a general idea of how the databases are structured, I will answer a practical question in next week’s post:

What is the expression of ITGA2 gene in prostate cancer cells?

See you next week

  • C. Tobin Magle, Biomedical Sciences Research Support Specialist

Bioinformatics Bites: Repurposing publicly available data

I’m going in a bit of a different direction with this bioinformatics bite segment. Instead of explicitly describing how to use a database or tool, I wanted to tell you all about a little project that I’m working on with a vet student from CSU.

Just because I don’t have a lab or large amounts of research funding doesn’t mean I can’t do science! There’s a wealth of bioinformatic data publicly available online and some user friendly tools that are available. We’re using these freely available resources to ask questions about how the distribution of microbes in the environment correlates with specific landmarks.

Our research question is as follows: Are there differences in microbial populations at sites that are close to zoos in the NYC area compared to those farther away? To address this question, we are using data from the PathoMap project, which swabbed surfaces in transit stops all over New York City. (This project is also being expanded to the top 10 cities worldwide for public transit ridership by a project called MetaSub.) See this publication for more information.

PathoMap main page

We determined test and control sites by looking up how city planners determine the distance people are willing to walk to a transit stop. We are also using a tool called GenGIS 2 to analyze and visualize these data. See this publication for more information on GenGIS.

GenGIS output from their wiki

Repurposing data forces you to think differently. Our research question was designed in the context of what data and tools were publicly available. This type of research will only get easier as the mindset of creating well-designed community resources expands. Initiatives like Big Data to Knowledge (BD2K) and the Center for Open Science are driving this new trend.

If you’re curious about data repurposing, or how to find datasets and tools, please see the bioinformatics section of our new Research Support Pages or set up a consultation to discuss your questions.

  • Tobin Magle, biomedical sciences research support specialist.

Bioinformatics Bites: Creating a custom database to search with BLAST

This week’s Bioinformatics Bite will show you how to run a BLAST search in a custom database that you create using Entrez queries. Also, if you’re interested, NCBI has a YouTube video that covers this topic.

To review, there are 2 ways to search the NCBI databases:

  1. Sequence homology via BLAST
  2. Text via Entrez

Remember Entrez? This search engine allows you to apply field tags to refine your search results. These field tags vary by database, which makes sense because different data types necessitate different contextual fields. For example, you can find the field tags for the sequence databases (Nucleotide, Protein, GSS, EST) here.

Let’s go back to our Constructing a BLAST Query post from last month. We’re going to focus on

Step 4: Choose your database (search set)

Here are my parameters:

  1. Algorithm type: nucleotide BLAST
  2. Query: NM_001182936.1 (Saccharomyces cerevisiae S288c Ras family GTPase RAS2)
  3. Search name: yeast ras2
  4. Database: refseq_RNA
  5. Specific algorithm: blastn (somewhat similar sequences)

I’m starting with a pre-made database and we’ll refine from there. Enter this information into the nucleotide BLAST query page and hit BLAST.

Here is a graphic summary of the results:

ras2 alignment all

Let’s take a look at the results taxonomically by clicking Distance Tree of Results.

open distance tree

This action will open a new window.

ras2tree

The results are from 3 taxonomic groups: yeast, animals and protists. Let’s zero in on the animal group.

First, return to the original BLAST page.

Then, change the view so we only see sequences from animals by clicking Formatting options above the search summary and typing animals in the Organism field in the Limit results section. The taxid for animals will autofill as you type.

blast formatting

The top result is from Drosophila willistoni. Keep that e-value in mind.

dros eval format animals

Now, instead of filtering the search results from the refseq_RNA database, let’s create a custom search set. First, click Edit and Resubmit above the search summary.

edit and resubmit

Now, limit the search to animals by entering animals in the Organism field. This parameter reduces the number of records in the search set.

choose search set

Now, let’s look at the e-value from Drosophila willistoni again.

dros eval animals only

It changed from 2e-47 to 1e-47, but all of the other values in the table stayed the same. It changes because the e-value depends on the size of the search set. The larger the search set, the more likely you are to get a coincidental match between unrelated sequences. Hence, the e-value went down as the size of the database went down.
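For the curious, the textbook relationship is E = m × n × 2^(−S′), where m is the query length, n is the total length of the search set and S′ is the bit score. Here is a short Python sketch of that formula. All of the numbers in it are made up, and real BLAST uses “effective” lengths with edge corrections, so it will not reproduce the exact values in the screenshots.

```python
def expected_hits(bit_score: float, query_length: int, database_length: int) -> float:
    """E = m * n * 2**(-S'), where S' is the bit score of the alignment."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical numbers: the same alignment (same bit score, same query)
# scored against a full database and against a search set half that size.
full_db = expected_hits(bit_score=190.0, query_length=969, database_length=40_000_000_000)
animals_only = expected_hits(bit_score=190.0, query_length=969, database_length=20_000_000_000)

print(f"{full_db:.1e} -> {animals_only:.1e}")
```

Because the bit score does not depend on the search set, halving the search set halves the e-value while everything else in the results table stays put.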

I hope this tutorial has illustrated the importance of creating custom search sets and recording algorithm parameters.

  • Tobin Magle, Biomedical Sciences Research Support Specialist

Bioinformatics Bites: PubChem

This week’s bioinformatics bite is being outsourced to the NCBI Insights blog. Enjoy a concise explanation of how to identify chemical targets to find cross reactions and predict drug side effects:

http://ncbiinsights.ncbi.nlm.nih.gov/2015/09/04/identifying-chemical-targets-finding-potential-cross-reactions-and-predicting-side-effects/

  • Have a nice long weekend,

Tobin Magle, PhD, Biomedical sciences research support specialist

NCBI cracks 200 annotated eukaryotic genomes

The National Center for Biotechnology Information (NCBI) is the central repository for molecular data in the US. They don’t generate their own data: all of the sequences in their databases are submitted by external sources such as research labs, genomic sequencing consortia, and through the INSDC.

They do, however, provide an interface to access these data and create tools to analyze the data, which includes annotating a select set of genomes. Because NIH is primarily devoted to human health, they prioritize eukaryotic genomes, especially mammals.

The NCBI has annotated over 200 eukaryotic genomes so far. Do you want to see your favorite organism annotated by NCBI? They take requests through their Help Desk.

  • Tobin Magle, PhD- Biomedical Sciences Research Support Specialist.

Bioinformatics Bites: Primer BLAST

This week’s bioinformatics bites is going to look over the features of Primer BLAST.

Back in my day (circa 2001-2014), we designed our primers by hand! In most of the places I worked, we’d have a printout of the genomic sequence we were working on, annotated with transcripts and restriction sites. We’d eyeball a good primer site and use OligoAnalyzer to check Tm, hairpins, self-annealing and all that. One group I worked in would calculate Tm by counting up all the G’s and C’s and multiplying that by 4, then counting all the A’s and T’s and multiplying that by 2, and adding those two numbers together. There was no thought given to contaminating products in the design process. That was all trial and error in the PCR machine.

It was the dark ages.
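For the record, the hand calculation described above resembles what is usually called the Wallace rule, which counts 4 °C for every G or C and 2 °C for every A or T. Here is a quick Python sketch of it, applied to the forward primer used later in this post. The function name is made up, and the rule is only a rough estimate intended for short oligos, so expect it to disagree with the Tm that Primer BLAST reports.

```python
def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in °C: 4 per G/C plus 2 per A/T."""
    seq = primer.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4 * gc + 2 * at


# 22-mer with 11 G/C and 11 A/T bases: 4 * 11 + 2 * 11
print(wallace_tm("GGGAGCCCTATAATTGGACAAG"))  # 66
```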

Luckily, NCBI designed primer BLAST to help you all out. The primary function of this tool is primer design. First, enter a template.

PrimerBlastTemplate

First, provide a unique identifier (accession or GI) for a record that’s in GenBank (like the NM_ number for an mRNA) or paste in a nucleotide sequence in FASTA format. I’m going to use the accession number (NM_001302688.1) for the human APOE gene mRNA.

Then, specify where you want your forward (sense) and reverse (antisense) primers to be in that sequence.

Finally, specify information about your primers (Tm) and desired PCR product (length). You can also use this tool to create a matching primer for a preexisting primer. We’ll use the example  5′-GGGAGCCCTATAATTGGACAAG-3′

PrimerBlastPrimerParameters

If you used a refseq mRNA as your template, specify parameters involving introns and exons, such as whether the  primers span an exon-exon junction and how many bases have to match on either side of the junction. You can also specify whether you want the product to span an intron on the genomic DNA.

Primer blast exonintron

Finally, specify how you want the algorithm to check for specificity of your primers by selecting your organism (or possible contaminating organism), the database you want to search, target size and specific primer parameters (mismatches etc.).

PrimerBLAST specificity

For this demo, I used the template NM_001302688.1 and specified that my forward primer is 5′-GGGAGCCCTATAATTGGACAAG-3′. I also specified that I want the product to include an intron. All other values were kept as default. (Another handy feature is that the interface highlights non-default parameters in yellow.)

After clicking submit, a nice graphical summary of the results is displayed. Note that because I specified a forward primer, all the PCR products (in blue) start in the same place:

primer BLAST graphic

For each primer pair, the sequence of both primers, the product length, and the size of the intron they span are included, along with information about Tm, length, start and stop positions, and self binding for each primer.

primer blast primer pair

This is where the hack I mentioned last week comes in: because the primer that we entered into the search is also mapped onto the template, Primer BLAST effectively tells you the binding site of the primer in terms of its position on the transcript we entered as the template. Pretty cool! Changing the template to the DNA accession number that contains this sequence would give us the genomic position of the primer, but we wouldn’t be able to do the cool stuff with introns.

Finally, it displays potential contaminating products you might see with human DNA as the PCR template:

primer blast unintended targets

In this case, it looks like we’d be amplifying another transcript variant of APOE. Definitely something to look out for when doing qPCR!

I hope you find this tool useful. Please let me know if I can help you with using primer BLAST or any other NCBI databases and tools.

  • Tobin Magle, Biomedical Sciences Research Support Specialist

Bioinformatics bites: How do I find primer binding sites?

This week’s bioinformatics bite comes from another actual patron question (paraphrased):

I have all these primers that someone else designed. How do I figure out where they bind and what they amplify?

Disclaimer: this isn’t actually the answer I gave to the person seeking help, but I’ve since found a more efficient tool.

Probably the fastest way to get this information is to use a simple tool called Primer Map.

Conveniently, they have an example primer mapping loaded into the browser:

Map these Primers:

(reverse) aacagctatgaccatg,
(T3) attaaccctcactaaag,
(KS) cgaggtcgacggtatcg,
(SK) tctagaactagtggatc,
(T7) aatacgactcactatag,
(-40) gttttcccagtcacgac,
(Sp6) atttaggtgacactatag,
(M13 for) gtaaaacgacggccagt,
(M13 rev) cacacaggaaacagctatgaccat,
(BGH rev) tagaaggcacagtcgagg,
(pGEX for) ctggcaagccacgtttggtg,
(pGEX rev) ggagctgcatgtgtcagagg,
(T7-EEV) aaggctagagtacttaatacga,
(pUC/M13 Forward) gttttcccagtcacgac,
(pUC/M13 forward) cgccagggttttcccagtcacgac,
(pUC/M13 reverse) caggaaacagctatgac,
(pUC/M13 reverse) tcacacaggaaacagctatgac,
(Glprimer1) tgtatcttatggtactgtaactg,
(GLprimer2) ctttatgtttttggcgtcttcca,
(RVprimer3) ctagcaaaataggctgtccc,
(RVprimer4) gacgatagtcatgccccgcg,
(Lambda gt11 Forward) ggtggcgacgactcctggagcccg,
(Lambda gt11 Reverse) ttgacaccagaccaactggtaatg,
(Lambda gt10 Forward) cttttgagcaagttcagcctggttaag,
(lambda gt10 Reverse) gaggtggcttatgagtatttcttccagggta,
(Pinpoint Sequencing) cgtgacgcggtgcagggcg,
(pTarget Sequencing) ttacgccaagttatttaggtgaca

To this sequence:

>sample sequence
cagctggggggaggtggcgaggaagatgacgtggtcgaggtcgacggtatcgagttgtcgcggcagctgccaatacgactcactatagaggagaagtagcaagaaaaataacatgataattatcacgacaactacctggtgatgttgctagtaatattacttgttatttttctcgtcatcttcccggcgacgtcgccagcaacatctttagtgagggttaatcacctgctacttctcccgccacctccc

PrimerMapQuery

Once the template is in the top box, and the primers are in the bottom box, hit Submit. (The output gets a lot cleaner if you turn translation and restriction enzyme displays off in the settings.)

The results pop up in a new window. The first results section shows where the primers bind, with forward (sense) primers highlighted in purple and reverse (antisense) primers highlighted in orange.

PrimerMapResultsSequence

The second part of the results page shows a table with all the primers that you input, highlighting the ones that bound with the color that indicates their orientation:

PrimerMapResultsTable

The only thing that is missing is a column that indicates the position on the template at which the primers bind. I guess beggars can’t be choosers though.
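If you really need those positions, a few lines of code can recover them for exact matches. Below is a rough Python sketch using the sample template and a handful of the example primers from above. It is my own illustration and not how Primer Map works internally: it only finds perfect matches and knows nothing about mismatches or degenerate bases. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def map_primers(template: str, primers: dict) -> dict:
    """Find exact matches of each primer on either strand of the template.

    Returns {name: [(strand, start, end), ...]} with 1-based, inclusive
    coordinates given on the template as written (the forward strand).
    """
    template = template.lower()
    hits = {}
    for name, primer in primers.items():
        primer = primer.lower()
        found = []
        for strand, site in (("forward", primer), ("reverse", reverse_complement(primer))):
            start = template.find(site)
            while start != -1:
                found.append((strand, start + 1, start + len(site)))
                start = template.find(site, start + 1)
        hits[name] = found
    return hits


# Sample sequence and a few of the example primers shown above.
TEMPLATE = "cagctggggggaggtggcgaggaagatgacgtggtcgaggtcgacggtatcgagttgtcgcggcagctgccaatacgactcactatagaggagaagtagcaagaaaaataacatgataattatcacgacaactacctggtgatgttgctagtaatattacttgttatttttctcgtcatcttcccggcgacgtcgccagcaacatctttagtgagggttaatcacctgctacttctcccgccacctccc"
PRIMERS = {
    "KS": "cgaggtcgacggtatcg",
    "T7": "aatacgactcactatag",
    "T3": "attaaccctcactaaag",
    "Sp6": "atttaggtgacactatag",
}

for name, sites in map_primers(TEMPLATE, PRIMERS).items():
    print(name, sites if sites else "no exact match")
```

A primer reported on the reverse strand binds the complementary strand, so its coordinates mark where its reverse complement sits in the sequence as written.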

Next time I’ll show you how to do this using NCBI’s Primer BLAST. This algorithm is actually built for primer design, but it can be hacked for this purpose and provides better visualizations and more information.

  • Tobin Magle, PhD, Biomedical Sciences Research Support Specialist

Bioinformatics Bites: R tutorials for beginners

Today’s Bioinformatics Bite is based on a hypothetical question that I think a lot of people are afraid to ask:

I hear R is a great tool for doing bioinformatic analysis, but I have no idea how to code. How can I get started?

Well, I’d say the first step is to Install R.

This first installation installs the R coding language and a bare-bones editor to write and run code in.

If you want a nicer interface, I’d suggest installing RStudio, which has a lot of bells and whistles that will make using R a lot easier. RStudio contains a text editor with highlighting, integrated help functions, an environment window that reminds you what variables you created and a console that allows you to execute your code right from the editor.

Rstudio

But how do you even know where to begin now that you have it installed? A good place to start for R basics is Swirl. Swirl is a tutorial system that you can use INSIDE R. All you have to do to install Swirl is type

install.packages("swirl")

inside the R console, and swirl will be automatically installed!

Now to run Swirl, just type

> library("swirl")
> swirl()

in the R console to load the swirl library and run the tutorials. Then you can pick from a variety of introductory tutorials that are closely linked to courses in the Johns Hopkins University Data Science Specialization on Coursera. Now we just need to get someone to write some Swirl tutorials for Bioconductor.

If you have any questions about how to set up R, don’t hesitate to Ask.

Tobin Magle,

Biomedical Sciences Research Support Specialist

tobin.magle@ucdenver.edu