About DEPOD

Index:
1. Introduction
2. Compilation of data
3. Annotations in DEPOD-2019
4. Data visualization and access
5. Funding
6. Feedback
7. License
8. How to cite?

1. Introduction

DEPOD, the human DEPhOsphorylation Database, is a manually curated database of human phosphatases, their experimentally verified protein and non-protein substrates, and the corresponding dephosphorylation sites. It also holds information about the phosphorylating protein kinases, protein interaction partners, and pathways involving these phosphatases and substrates. DEPOD aims to be a valuable resource for studying human phosphatases, their substrate specificities and their molecular mechanisms; for phosphatase-targeted drug discovery and development; for connecting phosphatases with kinases through their common substrates; and for completing the human phosphorylation/dephosphorylation network.

2. Compilation of data

2.1 Compilation of phosphatases from humans

A phosphatase is defined as an enzyme that hydrolyzes phosphomonoesters into a phosphate ion and a molecule with a free hydroxyl group. DEPOD focuses on human phosphatases with E.C. number 3.1.3.* (phosphoric monoester hydrolases). These include both protein phosphatases, which act on the hydroxyl groups of Serine/Threonine/Tyrosine residues in proteins, and non-protein phosphatases, which act on non-protein substrates such as phospholipids and phosphocarbohydrates. These phosphatases were initially retrieved from the Ensembl database. The set of human phosphatases was then compiled using UniProt annotations and 186 phosphatase-related GO terms.
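
As an illustration of the 3.1.3.* criterion above, the following minimal Python sketch checks whether an E.C. number falls into the phosphoric monoester hydrolase sub-subclass. The function name and the regular expression are assumptions made for this example. They are not taken from DEPOD's own pipeline, which also relies on UniProt annotations and GO terms.

```python
import re

# Hypothetical helper, not part of DEPOD: matches E.C. numbers of the
# form 3.1.3.<serial>, optionally prefixed with "EC". A dash in the last
# field (e.g. "3.1.3.-") denotes an unassigned serial number.
EC_3_1_3 = re.compile(r"^(?:EC\s*)?3\.1\.3\.(?:\d+|-)$")


def is_phosphoric_monoester_hydrolase(ec_number: str) -> bool:
    """Return True if ec_number belongs to the E.C. 3.1.3.* sub-subclass."""
    return EC_3_1_3.match(ec_number.strip()) is not None


if __name__ == "__main__":
    for ec in ["3.1.3.16", "EC 3.1.3.48", "3.1.30.1", "2.7.11.1"]:
        print(ec, is_phosphoric_monoester_hydrolase(ec))
```

Anchoring the pattern at both ends keeps neighbouring classes such as 3.1.30.* from matching by accident.
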
2.2 Compilation of phosphatase substrates

Known substrates of human phosphatases that are supported by experimental evidence were systematically collected and manually curated from several data sources:
(1) "dephosphorylation" posttranslational modification (PTM) data in the Human Protein Reference Database (HPRD);
(2) "dephosphorylation" protein-protein interaction data from 13 different databases searched using the PSICQUIC service at EMBL-EBI, UK (a portal integrating popular protein-protein interaction databases such as APID, BioGRID, IntAct, DIP, InnateDB, MPIDB, iRefIndex, MatrixDB, MINT, Interoporc, Reactome, Reactome-FIs, STRING, and BIND);
(3) substrate information from UniProt annotations;
(4) substrate information from the literature, searched with PubMed and Google.

Criterion to regard a given protein as a SUBSTRATE:
For all potential substrate data, the original literature was manually inspected. We include a protein as a substrate of a phosphatase if and only if in vitro and/or in vivo experimental evidence shows direct dephosphorylation of the entire protein. If only regions of a protein (peptides or longer fragments) were used in the dephosphorylation reaction, the parent protein containing these regions was NOT treated as a valid substrate.

Assessing RELIABILITY of a dephosphorylation reaction:
DEPOD scores dephosphorylation interactions to evaluate their reliability. The reliability score takes into account both the bioassay type (in vitro/in vivo) of the dephosphorylation experiments and the number of laboratories that performed them, as follows:
Score 1 - in vitro OR in vivo experiments performed by a single lab, or UniProt annotation;
Score 2 - in vitro OR in vivo experiments performed by multiple labs, OR in vitro AND in vivo experiments performed by a single lab;
Score 3 - in vitro AND in vivo experiments performed by multiple labs.
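
The scoring rules above can be sketched as a small Python function. This is only an illustrative reading of the three rules, not DEPOD's actual implementation. The function name and parameters are hypothetical, and the sketch assumes that an interaction with no recorded assay is supported by a UniProt annotation alone.

```python
def reliability_score(has_in_vitro: bool, has_in_vivo: bool, n_labs: int) -> int:
    """Return a reliability score (1-3) following the rules listed above.

    has_in_vitro / has_in_vivo: whether each assay type was reported.
    n_labs: number of laboratories that performed the experiments.
    """
    if n_labs < 0:
        raise ValueError("n_labs must be non-negative")

    # Assumption: no experimental assay recorded means the interaction
    # rests on a UniProt annotation only, which scores 1.
    if not (has_in_vitro or has_in_vivo):
        return 1

    both_assays = has_in_vitro and has_in_vivo
    multiple_labs = n_labs > 1

    if both_assays and multiple_labs:
        return 3  # in vitro AND in vivo, multiple labs
    if both_assays or multiple_labs:
        return 2  # one assay type in multiple labs, or both in a single lab
    return 1      # one assay type, single lab


if __name__ == "__main__":
    print(reliability_score(True, False, 1))
    print(reliability_score(True, True, 1))
    print(reliability_score(True, True, 4))
```

The rules do not say how to score a case where one lab reports only in vitro data and a different lab reports only in vivo data. The sketch counts this as both assay types with multiple labs, which is an assumption.
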
The gene/protein names of the phosphatases and substrates, and their respective source organisms, were manually checked in the original publications. For protein substrates, the dephosphorylation site positions have been corrected to the corresponding amino acid positions in the most recent reviewed entry for that protein in the latest release of UniProt (presently November 2018).

3. Annotations

DEPOD-2019 richly annotates each gene (phosphatase or protein substrate) by integrating information from several external public databases (please refer to the Table below).

1. The Basic Information tab gives an overview of a gene, e.g. gene/protein names, synonyms and EC number. DEPOD-2019 now links each gene to four genome browsers, namely Ensembl, UCSC, NCBI and 1000Genomes. In addition, we link out to 69 other databases. Annotations about evolutionary conservation have also been added. DEPOD-2019 now gives a schematic representation of the Pfam domain composition of a polypeptide. We have also integrated data from four disease databases, namely OMIM, COSMIC, DisGeNET and ClinVar. Annotations about subcellular localization, function and catalytic activity are imported from UniProt; only experimental annotations with original literature information are considered. Information from the ELM resource about short linear motifs (SLiMs) within proteins, along with their regular expressions, positions within the sequence and evolutionary conservation, has now been added. GO terms describing the biological processes, molecular functions and cellular compartments of proteins have been imported from UniProt. A show/hide toggle makes these annotations easier to browse.

2. The Phosphatases/Substrates tab lists the dephosphorylating phosphatases (in substrate entries) or the protein and non-protein substrates (in phosphatase entries), along with the reliability score of the dephosphorylation interaction, the bioassay type and a PubMed link to the original publication. This table is sorted by the reliability score we assign to each interaction. DEPOD-2019 now shows the conservation of the 10 amino acid stretch surrounding the central phosphosite.

3. The Pathway tab gives IDs, names and descriptions of the biochemical and metabolic pathways involving a given gene/protein. This information is integrated from the KEGG and Reactome databases. Using pathway IDs, users can also query which other phosphatases and substrates (both protein and non-protein) are involved in the same pathway.

4. The Interacting proteins tab integrates information from three major protein-interaction databases, namely BioGRID, IntAct and MINT. We provide all interacting protein partners of a given gene along with the reliability scores obtained from the respective databases. All protein interactions are supported by links to the respective original publications. Disclaimer: We (Koehn Group) do not vouch for the accuracy of the interaction sources or of the reliability scores provided by the respective databases.

5. The Phosphorylating kinases tab is a unique attempt by DEPOD-2019 to provide information about the bi-directional phosphoregulation of a given protein. We have integrated information about the protein kinases that phosphorylate the protein in question. This information has been compiled from three major phosphorylation databases: Phospho.ELM, PhosphoSitePlus and HPRD. For each phosphorylation interaction, we provide a link to the original publication and to the source database from which the information was taken.
As with the conservation profile of the dephosphorylation sites, we provide a conservation profile for the 10 amino acid stretch surrounding the central phosphosite. Disclaimer: We (Koehn Group) do not vouch for the accuracy of the sources of phosphorylation interactions. We rely on the three databases listed above for this information.

3.1 Data Sources, External Links and Software
4. Data Visualization and Access

Search options: DEPOD-2019 entries can be searched using UniProt accession numbers, gene names, synonyms, Entrez gene IDs and GenBank IDs. Non-protein substrates can be searched using ChEBI or KEGG IDs, SMILES or InChI notations, and molecular formulas. We also provide a quick-search utility for keyword-based searches.

Easy Downloads: Users can now download FASTA sequences of the protein substrates of any phosphatase using a link provided under the protein-substrate tab of the respective phosphatase entry. We also provide, as an Excel sheet, a list of the interacting protein partners of a given phosphatase or substrate gene along with the corresponding reliability scores and source databases. Non-protein chemical substrates of phosphatases can be downloaded in two formats: .mol and .sdf.

Interactive visualizations: DEPOD-2019 allows users to explore its features interactively. Interactive 3D-structure visualization is enabled by the Web3DMol software (Shi M et al., Nucleic Acids Res, 2017). We use the Cytoscape.js project (Franz M et al., Bioinformatics, 2016) to let users visualize protein interaction networks on the "Interacting proteins" tab, as well as Kinase-Substrate-Phosphatase networks on the "Basic Information" and "Phosphorylating Kinases" tabs. Evolutionary conservation of the entire protein, and of the phosphosites and short linear motifs (SLiMs) within it, can be visualized using the ProViz tool (Jehl P et al., Nucleic Acids Res, 2016).

5. Funding Sources

DEPOD started as a collaboration between Prof. Janet Thornton's Group (EMBL-EBI, Hinxton, UK), Prof. Maja Koehn's Group (Faculty of Biology, BIOSS and CIBSS, University of Freiburg, Germany) and Prof. Matthias Wilmanns's Group (EMBL, Hamburg, Germany).

6. Feedback

Please direct any queries and suggestions to Prof. Maja Koehn, whose research group maintains this database.
To help with the curation of the database, we encourage experts to submit their data on new substrates. For new data entries provided by external experts, we offer to credit the contribution on the webpage and in the acknowledgements of future published updates.

7. License

Please refer to our license policy here.

8. DEPOD Publications

Duan G, Li X, Köhn M. (2015). The human DEPhOsphorylation database DEPOD: a 2015 update. Nucleic Acids Research, 43(Database issue):D531-5. (PMID: 25332398)
Li X, Wilmanns M, Thornton J, Köhn M. (2013). Elucidating human phosphatase-substrate networks. Science Signaling, 6(275):rs10. (PMID: 23674824)