Mammalian Protein Subcellular Localization Database

Frequently Asked Questions
Q: How do I search for my favorite protein?   A: There are several ways to search for proteins:
  • protein ID from the following sources:
    PubMed (9242691)
    HGNC ID (936)
    MGI ID or Symbol (95705, B4galt1)
    UniProtKB/SwissProt/SPTrEMBL (Q6FH21, S35B2_MOUSE)
    Entrez Gene, Protein (81577, AAB36516)
    RefSeq Protein (NP_000012)
    Ensembl Peptide (ENSP00000265480, ENSMUSP00000037834)
    Note that the search is automatically a substring search - anything that matches your search term will be returned.
  • subcellular location:
    (see question below for location terms used in the database)
  • functional description:
    any word or phrase used in the functional description of the protein or gene name ("RING fin", "RING finger", "hypothetical RING finger containing protein"); phrases are used as the search term, rather than space-delimited words
  • Domain IDs or names:
    Pfam domain IDs or the descriptions of the domains (PF00018, SH3 domain)
    SCOP domain IDs or the descriptions of the domains (50978, WD40 repeat-like)
  • BLAST: (Advanced search form)
    All of the proteins stored in the database can be searched using a query sequence. Each hit is displayed with a link to its individual protein record, and the BLAST results also show the protein class and subcellular location source for each hit.
  • Batch ID retrieval: (Advanced search form)
    A comma- or space-delimited list of IDs can be used to retrieve a large number of proteins in batch.
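The substring search and batch ID retrieval described above can be sketched roughly as follows. This is a minimal illustration, not the database's actual implementation: the function names are hypothetical, and case-insensitive matching is an assumption that the FAQ does not confirm.

```python
import re


def parse_batch_ids(raw):
    """Split a comma- or space-delimited string into a list of IDs."""
    # Any run of commas and/or whitespace is treated as one delimiter.
    return [token for token in re.split(r"[,\s]+", raw.strip()) if token]


def substring_search(term, records):
    """Return every record whose text contains the search term.

    The whole phrase is used as one search term (it is not split on
    spaces). Case-insensitive matching is an assumption.
    """
    needle = term.lower()
    return [record for record in records if needle in record.lower()]


ids = parse_batch_ids("Q6FH21, NP_000012 ENSP00000265480")
hits = substring_search(
    "RING fin",
    ["hypothetical RING finger containing protein", "SH3 domain protein"],
)
print(ids)
print(hits)
```

Because the phrase is matched as a whole, "RING fin" finds "RING finger" descriptions, while a record containing only "finger" or only "RING" would not match.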
Q: Why is my favorite protein not in this database?   A: Only full-length proteins from the FANTOM3 Isoform Protein Sequence set are present in this database. The FANTOM3 data are described in:
The transcriptional landscape of the mammalian genome. (2005)
Science, 309(5740):1559-63.
PubMed Abstract
Q: How do I cite this resource?   A: Please cite the following article:
Sprenger J, Fink JL, Karunaratne S, Hanson K, Hamilton NA, Teasdale RD (2007).
LOCATE: a mammalian protein subcellular localization database.
Nucleic Acids Res, (Database issue).
PubMed Abstract

Fink JL, Aturaliya RN, Davis MJ, Zhang F, Hanson K, Teasdale MS, Kai C, Kawai J, Carninci P, Hayashizaki Y, Teasdale RD (2006).
LOCATE: a mouse protein subcellular localization database.
Nucleic Acids Res, 34(Database issue):D213-7.
PubMed Abstract
Q: How do I link to this resource?   A:

Link to an individual protein record using the URL:
where XXX is an accession number or ID from the following sources:
UniProtKB/SwissProt/SPTrEMBL, Entrez, RefSeq, or Ensembl

Q: There seems to be a color-code associated with the different protein classes. What class does each color represent?   A: cytoplasmic proteins
secreted proteins
Type I membrane proteins
Type II membrane proteins
Multipass transmembrane proteins
Q: What is the vocabulary used to describe experimental and literature-mined subcellular localization calls?   A: We have created the following hierarchical controlled vocabulary to describe experimental subcellular localization:

  • Tier 1: Nuclear (GO:0005634)
    • Tier 2: Nuclear Envelope (GO:0005635)
    • Tier 2: Nuclear Speckles (GO:0016607)
    • Tier 2: Nucleolus (GO:0005730)
  • Tier 1: Cytoplasmic (GO:0005737)
  • Tier 1: Membrane Associated (GO:0016020)
    • Tier 2: Membrane Associated Unknown
    • Tier 2: Plasma Membrane-Like (GO:0005886)
    • Tier 2: Cytoplasmic Puncta
    • Tier 2: Golgi-Like (GO:0005794)
    • Tier 2: Reticular
      • Tier 3: Mitochondrial-Like (GO:0005739)
      • Tier 3: Endoplasmic Reticulum-Like (GO:0005783)
    • Tier 2: Perinuclear Puncta
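
For anyone processing these calls programmatically, the experimental vocabulary above could be held in a nested mapping. The sketch below is illustrative only: it assumes that Reticular is a Tier 2 term with two Tier 3 children and that the other Tier 2 terms sit under the Tier 1 term listed before them. The names `HIERARCHY` and `find_path` are hypothetical.

```python
# Experimental-localization vocabulary as nested dicts: term -> child terms.
# The nesting reflects one reading of the three-tier layout (an assumption).
HIERARCHY = {
    "Nuclear": {
        "Nuclear Envelope": {},
        "Nuclear Speckles": {},
        "Nucleolus": {},
    },
    "Cytoplasmic": {},
    "Membrane Associated": {
        "Membrane Associated Unknown": {},
        "Plasma Membrane-Like": {},
        "Cytoplasmic Puncta": {},
        "Golgi-Like": {},
        "Reticular": {
            "Mitochondrial-Like": {},
            "Endoplasmic Reticulum-Like": {},
        },
        "Perinuclear Puncta": {},
    },
}


def find_path(term, tree=HIERARCHY, prefix=()):
    """Return the list of terms from Tier 1 down to `term`, or None."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == term:
            return list(path)
        found = find_path(term, children, path)
        if found is not None:
            return found
    return None


print(find_path("Endoplasmic Reticulum-Like"))
```

A lookup of this kind makes it possible to roll a Tier 3 call up to its Tier 1 parent when comparing experimental calls at different levels of detail.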

We have created the following controlled vocabulary to describe literature-mined subcellular localization:

Primary Call
Cytoplasm (GO:0005737)
Cytoplasmic Vesicles (GO:0016023)
Endoplasmic Reticulum (GO:0005783)
Endosomes (GO:0005768)
Extracellular (GO:0005576)
Golgi Apparatus (GO:0005794)
Lipid Particles (GO:0005811)
Lysosomes (GO:0005764)
Melanosome (GO:0042470)
Mitochondria (GO:0005739)
Nucleus (GO:0005634)
Peroxisome (GO:0005777)
Plasma Membrane (GO:0005886)
Synaptic Vesicles (GO:0008021)
Cellular Component Unknown (GO:0008372)
Secondary Call
Apical Plasma Membrane (GO:0016324)
Basolateral Plasma Membrane (GO:0016323)
Centrosome (GO:0005813)
Golgi Cis Cisterna (GO:0000137)
Cytoskeleton (GO:0005856)
Early Endosomes (GO:0005769)
ERGIC (GO:0005793)
Inner Mitochondrial Membrane (GO:0005743)
Late Endosomes (GO:0005770)
Medial-Golgi (GO:0005797)
Outer Mitochondrial Membrane (GO:0005741)
Secretory Granule (GO:0030141)
Golgi Trans Cisterna (GO:0000138)
Golgi Trans Face (GO:0005802)
Tight Junction (GO:0005923)
Transport Vesicle (GO:0030133)
Q: How was the data on variable membrane organisation generated?   A: For a detailed description of this analysis, refer to:
Davis MJ, Hanson KA, Clark F, Fink JL, Zhang F, Kasukawa T, Kai C, Kawai J, Carninci P, Hayashizaki Y, and Teasdale RD (2006).
Differential use of signal peptides and membrane domains is a common occurrence in the protein output of transcriptional units.
PLOS Genetics, 2(4):554-563.
PubMed Abstract
Q: What are the guidelines used in the literature-mining?  A:

  • The subcellular localization (SCL) data incorporated in this resource represent the “potential of a protein to be located in a particular subcellular compartment”. The subcellular localization described in the manuscript must be based on direct detection of the polypeptide sequence (i.e., using antibodies, epitope tags, GFP fusions, etc.). Reports of induced SCL are included and, likewise, variation across different cell lines is included.
  • Data presented in the literature are not modified or re-evaluated. If the SCL cannot be determined from the manuscript, it is not included. Likewise, if the report is clearly inaccurate, it is not included.
  • SCL requires data from individual cells rather than tissues; low resolution immunostaining of tissue is not acceptable and is not included.
  • Cell types, including platelets and red blood cells, which have differentiated and no longer maintain a typical organelle structure are not included. Haploid reproductive cell types including sperm and oocytes are also excluded.
  • To be included the protein needs to be detected directly. This excludes publications reporting the SCL of enzyme activity or via protein-protein interactions such as ligand binding to a receptor. Membrane fractionation studies are not to be included.
  • SCL needs to be identified for the full-length protein. Data for protein fusions with segments of individual proteins or mutated variants of proteins are not included.
  • The cell type that is analysed needs to be from a mammal. For example, a human protein expressed in a frog oocyte is not included.

Q: How are subcellular localization images acquired?   A:

For a detailed description of the methods used, refer to:

Aturaliya RN, Fink JL, Davis MJ, Teasdale MS, Hanson KA, Miranda KC, Forrest AR, Grimmond SM, Suzuki H, Kanamori M, Kai C, Kawai J, Carninci P, Hayashizaki Y, Teasdale RD (2006).
Subcellular localization of mammalian type II membrane proteins.
Traffic. 7(5):613-25.
PubMed Abstract

Epifluorescent microscopy is used for first-pass localization. Co-localizations are performed using confocal microscopy.

A collection of images is chosen based on analysis of the observed expression patterns with each construct. One of these images is chosen to represent the construct.

Images are captured at 60X or 100X magnification.

Q: How were the transmembrane domains and signal peptides predicted?   A:

For a detailed description of the methods used, refer to:

Davis MJ, Zhang F, Yuan Z, Teasdale RD (2006).
MemO: A consensus approach to the annotation of a protein's membrane organization
In Silico Biology 6:0037
Download article

Q: What does it mean to have a conflict in the N-terminal transmembrane domain prediction?   A:

In the membrane organization prediction pipeline, several programs are used to predict the presence of both transmembrane domains and signal peptides. Occasionally, a transmembrane domain and a signal peptide are predicted at the same position in a protein. These conflicting predictions at the N-terminus of the protein sequence are resolved using a method based on discriminant functions. If both a transmembrane domain and a signal peptide are predicted to be present, this discrimination results in the prediction of one feature or the other.
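
The control flow of this resolution step can be sketched as below. This is only an outline under stated assumptions: the `Feature` type, the overlap test, and `toy_discriminant` are hypothetical, and the toy rule is a placeholder that does not reproduce the published discriminant functions.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    kind: str   # "signal_peptide" or "transmembrane"
    start: int  # 1-based, inclusive residue position
    end: int    # 1-based, inclusive residue position


def overlaps(a, b):
    """True if the two features share at least one residue position."""
    return a.start <= b.end and b.start <= a.end


def resolve_n_terminal(sequence, signal, tm, discriminant):
    """Return the features to keep at the N-terminus.

    If a signal peptide and a transmembrane domain overlap, the
    discriminant picks exactly one; otherwise nothing is changed.
    """
    if signal is None or tm is None:
        return [f for f in (signal, tm) if f is not None]
    if not overlaps(signal, tm):
        return [signal, tm]
    winner = discriminant(sequence, signal, tm)
    return [signal] if winner == "signal_peptide" else [tm]


def toy_discriminant(sequence, signal, tm):
    # Placeholder rule for illustration only (NOT the published method):
    # favor the transmembrane call when its segment spans 20+ residues.
    length = tm.end - tm.start + 1
    return "transmembrane" if length >= 20 else "signal_peptide"


sp = Feature("signal_peptide", 1, 22)
helix = Feature("transmembrane", 5, 27)
kept = resolve_n_terminal("M" * 60, sp, helix, toy_discriminant)
print([f.kind for f in kept])
```

Passing the discriminant in as a function keeps the conflict detection separate from the decision rule, so the placeholder can be swapped for a real classifier.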

For more details about the conflict resolution method, see:
Yuan Z, Davis MJ, Zhang F, Teasdale RD (2003).
Computational differentiation of N-terminal signal peptides and transmembrane helices.
Biochem Biophys Res Commun., 312: 1278-1283.
PubMed Abstract

Q: What versions of Pfam and SCOP were used to predict motifs/domains on the LOCATE proteins?   A:

Pfam: version 21.0
SCOP: version 1.69

Q: How were the spliced isoform graphs generated?   A:

The graphs were generated using a customized version of the Python Splicing Graph Module, available from DEDB (the Drosophila melanogaster Exon Database). The module was edited to allow coloring of observed isoforms based on the predicted membrane organization of each isoform.

Q: How were the proteins in the database matched to proteins in the PDB?   A:

Each LOCATE protein was BLASTed against the PDB database. Proteins were then assigned a score between 0 and 5 based on the summed score of 3 parameters: E-value, coverage, and sequence identity. If the E-value was less than 10e-10, a hit was given 1 point; if the E-value was less than 10e-50, that hit was given 2 points. If the coverage of the alignment was 80% or greater, the hit was given 2 points. If the percent identity of the alignment was 98% or greater, the hit was given 1 point. Only hits that scored 4 points or more were mapped to a LOCATE protein.
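
The scoring rule above can be written out as a short function. This is a sketch of one reading of the text, not the pipeline's actual code: it assumes the two E-value tiers are not cumulative (a hit earns 2 points or 1 point, never 3), which is what keeps the maximum at 5, and it uses the thresholds exactly as written (`10e-10` and `10e-50`). The function names are hypothetical.

```python
def pdb_hit_score(evalue, coverage_pct, identity_pct):
    """Score a BLAST hit against the PDB on a 0-5 scale."""
    score = 0
    # E-value: 2 points below 10e-50, otherwise 1 point below 10e-10.
    # Thresholds are taken literally from the text (10e-10 equals 1e-9).
    if evalue < 10e-50:
        score += 2
    elif evalue < 10e-10:
        score += 1
    # Alignment coverage of 80% or greater: 2 points.
    if coverage_pct >= 80:
        score += 2
    # Sequence identity of 98% or greater: 1 point.
    if identity_pct >= 98:
        score += 1
    return score


def is_mapped(evalue, coverage_pct, identity_pct):
    """Only hits scoring 4 points or more are mapped to a LOCATE protein."""
    return pdb_hit_score(evalue, coverage_pct, identity_pct) >= 4


print(pdb_hit_score(1e-60, 95, 99))  # 2 + 2 + 1
print(is_mapped(1e-20, 85, 90))      # 1 + 2 + 0 is below the cutoff
```

Under this reading, a hit cannot reach the 4-point cutoff without at least 80% coverage, since the E-value and identity criteria together contribute at most 3 points.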

Q: What browsers are recommended for properly viewing this website?   A: Mozilla 1.0 and later
Netscape/Navigator 4.5 and later
Internet Explorer 3.0 and later
Opera 5.11 and later
iCab 2.8 (OS 9.x) or iCab (OS X)
Camino (OS X)
(Note: LOCATE has not been tested on all of the above browsers.)
Q: How can I compare and evaluate the results of the five different computational prediction tools?   A: A comparison study was done by Sprenger et al. For more details about the results, see:
Sprenger J, Fink JL, Teasdale RD (2006)
Evaluation and comparison of mammalian subcellular localization prediction methods.
BMC Bioinformatics, 18;7 Suppl 5:S3.
PubMed Abstract
Q: How exactly does CELLO predict the localizations?   A: For more details about the CELLO method, see:
Yu CS, Chen YC, Lu CH, Hwang JK (2006)
Prediction of protein subcellular localization.
Proteins, 64: 643-651.
PubMed Abstract
Setting used:
Organisms: Eukaryotes
Sequences: Proteins
Q: How exactly does pTarget predict the localizations?   A: For more details about the pTarget method, see:
Guda, C., Subramaniam, S. (2005)
pTARGET a new method for predicting protein subcellular localization in eukaryotes.
Bioinformatics, 1;21: 3963-9.
PubMed Abstract
Setting used:
not available
Q: How exactly does Proteome Analyst predict the localizations?   A: For more details about the Proteome Analyst method, see:
Lu, Z., Szafron, D., Greiner, R. et al. (2004)
Predicting subcellular localization of proteins using machine-learned classifiers.
Bioinformatics, 4;20: 547-556.
PubMed Abstract
Setting used:
Organism Type: Animal
Q: How exactly does WoLFPSORT predict the localizations?   A: For more details about the WoLFPSORT method, see:
Horton, P., Park, K.J., Nakai, K., Obayashi, T. (2006)
Protein Subcellular Localization Prediction with WoLFPSORT.
APBC06, 39-48.
Setting used:
Organism Type: Animal
Q: How exactly does MultiLoc predict the localizations?   A: For more details about the MultiLoc method, see:
Hoeglund, A., Doennes, P. et al. (2006)
MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs, and amino acid composition.
Bioinformatics 15;22, 1158-65.
PubMed Abstract
Setting used:
Method: MultiLoc (Animal) 9 Locations
Advanced output format

Copyright 2004, 2005, 2006, 2007 Institute of Molecular Bioscience, The University of Queensland and the ARC Centre in Bioinformatics
We gratefully acknowledge the support of the Australian Research Council.