KnowEnG Knowledge Network
The KnowEnG Knowledge Network (KN) represents community data sets as a massive heterogeneous network composed primarily of genes (Gene Nodes), their annotations (Property Nodes), and the relationships among them (Edges). Building the KN involves identifying relevant knowledge bases (Resources), regularly downloading their current contents (Datasets), and cleaning and consolidating these disparate sources into a unified knowledge graph of multiple relationship types (Edge Types) that is ready for integration with analytic methods.
Summary information about the Knowledge Network can be found here.
Links to tools to fetch subnetworks and map node names can be found here.
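As a rough illustration of the structure described above, the following is a minimal sketch of a heterogeneous network with Gene Nodes, Property Nodes, and typed Edges. It is not the KnowEnG implementation or storage format: the class and method names (`KnowledgeNetwork`, `add_gene_gene`, `add_property_gene`, `subnetwork`), the `weight` field, and the example node names are all assumptions made for illustration. Only the edge type identifiers `STRING_experimental` and `gene_ontology` are taken from the tables on this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str     # a Gene Node or a Property Node
    target: str     # a Gene Node
    edge_type: str  # e.g. "STRING_experimental" or "gene_ontology"
    weight: float   # hypothetical score; the real KN may store something else


class KnowledgeNetwork:
    """Toy in-memory container; all names here are hypothetical."""

    def __init__(self):
        self.gene_nodes = set()
        self.property_nodes = set()
        self.edges_by_type = defaultdict(list)

    def add_gene_gene(self, gene_a, gene_b, edge_type, weight=1.0):
        # Gene-Gene relationship: both endpoints are Gene Nodes.
        self.gene_nodes.update((gene_a, gene_b))
        self.edges_by_type[edge_type].append(Edge(gene_a, gene_b, edge_type, weight))

    def add_property_gene(self, prop, gene, edge_type, weight=1.0):
        # Property-Gene relationship: an annotation attached to a gene.
        self.property_nodes.add(prop)
        self.gene_nodes.add(gene)
        self.edges_by_type[edge_type].append(Edge(prop, gene, edge_type, weight))

    def subnetwork(self, edge_type):
        # Return a copy of all edges of one Edge Type (empty if unknown).
        return list(self.edges_by_type.get(edge_type, []))


kn = KnowledgeNetwork()
kn.add_gene_gene("TP53", "MDM2", "STRING_experimental", 0.9)  # illustrative names
kn.add_property_gene("GO:0006915", "TP53", "gene_ontology")   # illustrative names
print(len(kn.gene_nodes), len(kn.property_nodes))  # prints: 2 1
```

Grouping edges by Edge Type mirrors how the tables below report contents per edge type, and it makes selecting a single-relationship subnetwork a simple lookup.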
KN Contents by Species:
Species (TaxonID) | Network Edges (millions) | Property Nodes (thousands) | Gene Nodes (thousands) | Datasets |
---|---|---|---|---|
Total | 476.8 | 178.5 | 405.4 | 137 |
Homo sapiens (9606) | 49.4 | 151.7 | 30.0 | 74 |
Gasterosteus aculeatus (69293) | 48.6 | 5.2 | 20.8 | 3 |
Danio rerio (7955) | 39.9 | 14.8 | 25.9 | 6 |
Pan troglodytes (9598) | 35.9 | 5.9 | 18.7 | 3 |
Arabidopsis thaliana (3702) | 32.9 | 10.7 | 28.7 | 5 |
Mus musculus (10090) | 31.0 | 26.0 | 25.3 | 12 |
Canis familiaris (9615) | 30.5 | 21.9 | 19.9 | 6 |
Sus scrofa (9823) | 30.1 | 20.7 | 21.6 | 6 |
Taeniopygia guttata (59729) | 27.8 | 6.5 | 17.5 | 4 |
Xenopus tropicalis (8364) | 25.6 | 6.7 | 18.4 | 4 |
Macaca mulatta (9544) | 25.2 | 5.6 | 21.1 | 3 |
Rattus norvegicus (10116) | 25.1 | 24.9 | 22.4 | 6 |
Bos taurus (9913) | 25.0 | 22.3 | 20.0 | 8 |
Aedes aegypti (7159) | 16.8 | 4.1 | 15.8 | 3 |
Drosophila melanogaster (7227) | 13.1 | 13.2 | 14.5 | 5 |
Caenorhabditis elegans (6239) | 9.2 | 10.7 | 20.7 | 5 |
Gallus gallus (9031) | 6.1 | 18.2 | 18.3 | 6 |
Saccharomyces cerevisiae (4932) | 3.9 | 6.6 | 6.8 | 3 |
Daphnia pulex (6669) | 0.5 | 4.3 | 30.0 | 2 |
Apis mellifera (7460) | 0.0 | 4.5 | 8.9 | 1 |
KN Contents by Gene-Gene Relationships:
Edge Type Collection | Human Network Edges (millions) | Human Datasets | All Network Edges (millions) | All Datasets |
---|---|---|---|---|
Total | 24.3 | 8 | 448.7 | 42 |
Text_Mining/Integrated | 9.0 | 2 | 130.6 | 19 |
Coexpression | 7.3 | 2 | 119.8 | 19 |
Experimental_Interaction | 5.4 | 4 | 108.7 | 21 |
Conservation/Proximity | 1.6 | 2 | 26.1 | 36 |
Pathway_Database | 1.1 | 3 | 63.4 | 20 |
KN Contents by Property-Gene Relationships:
Edge Type Collection | Human Network Edges (millions) | Human Property Nodes (thousands) | Human Datasets | All Network Edges (millions) | All Property Nodes (thousands) | All Datasets |
---|---|---|---|---|---|---|
Total | 25.0 | 151.7 | 67 | 28.1 | 178.5 | 96 |
Tissue_Expression | 13.7 | 25.9 | 32 | 13.7 | 25.9 | 32 |
Disease/Drug | 6.0 | 82.3 | 13 | 6.3 | 83.4 | 17 |
Regulation | 4.4 | 3.3 | 10 | 4.4 | 3.3 | 10 |
Pathways | 0.6 | 16.9 | 5 | 1.4 | 34.6 | 5 |
Ontologies | 0.3 | 17.2 | 5 | 1.8 | 23.5 | 12 |
Protein_Domains | 0.0 | 6.2 | 2 | 0.5 | 7.8 | 20 |
KN Contents by Gene-Gene Edge Type:
Edge Type (identifier) | Human Network Edges (millions) | Human Gene Nodes (thousands) | Human Datasets | All Network Edges (millions) | All Gene Nodes (thousands) | All Datasets |
---|---|---|---|---|---|---|
STRING Text Mining from Abstracts (STRING_textmining) | 8.5 | 18.1 | 1 | 130.2 | 275.7 | 18 |
STRING Co-expression (STRING_coexpression) | 7.2 | 17.7 | 1 | 119.7 | 278.8 | 18 |
STRING Experimental PPI (STRING_experimental) | 5.1 | 16.3 | 1 | 108.5 | 244.2 | 18 |
Blastp Protein Sequence Similarity (blastp_homology) | 0.9 | 22.6 | 1 | 11.8 | 377.1 | 18 |
STRING Proximal Neighborhood (STRING_neighborhood) | 0.6 | 3.8 | 1 | 14.3 | 71.9 | 18 |
STRING Functional Databases (STRING_database) | 0.6 | 10.6 | 1 | 62.9 | 183.3 | 18 |
HumanNet Integrated Network (hn_IntNet) | 0.5 | 16.0 | 1 | 0.5 | 16.0 | 1 |
PPI Physical Association (PPI_physical_association) | 0.3 | 16.6 | 3 | 0.3 | 16.6 | 3 |
HumanNet Co-Expression of Human Genes (hn_HS_CX) | 0.2 | 10.9 | 1 | 0.2 | 10.9 | 1 |
Reactome PPI Neighboring Reaction (reactome_PPI_neighbouring_reaction) | 0.2 | 6.1 | 1 | 0.2 | 6.1 | 1 |
Pathway Commons In Complex With (pathcom_in_complex_with) | 0.1 | 7.2 | 1 | 0.1 | 7.2 | 1 |
Reactome PPI Reaction Partners (reactome_PPI_reaction) | 0.1 | 6.4 | 1 | 0.1 | 6.4 | 1 |
KN Contents by Property-Gene Edge Type:
Edge Type (identifier) | Human Network Edges (millions) | Human Property Nodes (thousands) | Human Gene Nodes (thousands) | Human Datasets | All Network Edges (millions) | All Property Nodes (thousands) | All Gene Nodes (thousands) | All Datasets |
---|---|---|---|---|---|---|---|---|
Enrichr Tissue Signature (enrichr_tissue_signature) | 8.3 | 6.2 | 20.7 | 5 | 8.3 | 6.2 | 20.7 | 5 |
Enrichr ChIP Gene Sets (enrichr_ChIP_gene_set) | 4.1 | 2.0 | 21.6 | 4 | 4.1 | 2.0 | 21.6 | 4 |
GEO Expression Set (GEO_expression_set) | 3.7 | 12.2 | 22.5 | 22 | 3.7 | 12.2 | 22.5 | 22 |
Enrichr LINCS Up Sets (LINCS_up_set) | 2.3 | 33.1 | 9.1 | 1 | 2.3 | 33.1 | 9.1 | 1 |
Enrichr LINCS Down Sets (LINCS_down_set) | 2.0 | 33.1 | 9.0 | 1 | 2.0 | 33.1 | 9.0 | 1 |
Allen Brain Atlas Signatures (allen_brain_atlas_signature) | 1.3 | 4.3 | 13.7 | 2 | 1.3 | 4.3 | 13.7 | 2 |
MSigDB Immunologic Signatures (msigdb_c7_all) | 0.9 | 4.9 | 19.0 | 1 | 0.9 | 4.9 | 19.0 | 1 |
MSigDB Chemical and Genetic Perturbations (msigdb_c2_cgp) | 0.4 | 3.4 | 19.0 | 1 | 0.4 | 3.4 | 19.0 | 1 |
Gene Ontology (gene_ontology) | 0.3 | 17.2 | 19.3 | 5 | 1.8 | 23.5 | 205.5 | 12 |
PPI Complex (PPI_complex) | 0.3 | 1.8 | 9.0 | 1 | 0.3 | 1.8 | 9.0 | 1 |
GeneSigDB Gene Signature (genesigdb_gene_signature) | 0.3 | 2.1 | 18.9 | 1 | 0.3 | 2.1 | 18.9 | 1 |
HMDB Metabolite Signatures (HMDB_metabolite_signatures) | 0.2 | 3.9 | 3.3 | 1 | 0.2 | 3.9 | 3.3 | 1 |
Pathway Commons Pathways (pathcom_pathway) | 0.2 | 12.5 | 13.5 | 1 | 0.2 | 12.5 | 13.5 | 1 |
ESCAPE Stem Cell Gene Set (ESCAPE_gene_set) | 0.2 | 0.3 | 14.5 | 1 | 0.2 | 0.3 | 14.5 | 1 |
Enrichr Signatures of Cancer Cell Lines (enrichr_cell_signature) | 0.2 | 1.1 | 16.2 | 2 | 0.2 | 1.1 | 16.2 | 2 |
Enrichr Phenotype Signature (enrichr_phenotype_signature) | 0.1 | 2.4 | 7.6 | 4 | 0.3 | 3.5 | 23.8 | 8 |
Reactome Pathways Curated (reactome_annotation) | 0.1 | 1.9 | 11.6 | 1 | 0.9 | 19.6 | 95.5 | 1 |
Achilles Genetic Fitness of Cell Lines (achilles_genetic_fitness) | 0.1 | 0.4 | 4.7 | 2 | 0.1 | 0.4 | 4.7 | 2 |
KEA Kinase Signatures (KEA_kinase_signatures) | 0.0 | 0.4 | 3.0 | 1 | 0.0 | 0.4 | 3.0 | 1 |
MSigDB Cancer Gene Neighborhoods (msigdb_c4_cgn) | 0.0 | 0.4 | 4.8 | 1 | 0.0 | 0.4 | 4.8 | 1 |
MSigDB Cancer Modules (msigdb_c4_cm) | 0.0 | 0.4 | 8.3 | 1 | 0.0 | 0.4 | 8.3 | 1 |
MSigDB Oncogenic Signatures (msigdb_c6_all) | 0.0 | 0.2 | 10.9 | 1 | 0.0 | 0.2 | 10.9 | 1 |
Enrichr Pathway Membership (enrichr_pathway) | 0.0 | 0.6 | 6.5 | 2 | 0.0 | 0.6 | 6.5 | 2 |
PANTHER Classification (panther_classification) | 0.0 | 0.1 | 2.0 | 1 | 0.0 | 0.1 | 2.0 | 1 |
PFam Prot Domains (pfam_prot) | 0.0 | 6.1 | 18.7 | 1 | 0.5 | 7.7 | 301.5 | 19 |
SILAC Phosphoproteomics (SILAC_phosphoproteomics) | 0.0 | 0.1 | 4.1 | 1 | 0.0 | 0.1 | 4.1 | 1 |
TargetScan MicroRNA (TargetScan_microRNA) | 0.0 | 0.2 | 6.1 | 1 | 0.0 | 0.2 | 6.1 | 1 |
MSigDB microRNA Targets (msigdb_c3_mir) | 0.0 | 0.2 | 7.4 | 1 | 0.0 | 0.2 | 7.4 | 1 |
KN Data Resources
Resource | Reference | Source Files | License |
---|---|---|---|
STRING | Szklarczyk D, Franceschini A, Wyder S, et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43(Database issue):D447-52. Pubmed | | The dataset obtained from STRING is distributed under Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
BLAST | Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403-10. Pubmed | | NCBI itself places no restrictions on the use or distribution of the data contained therein. Nor do we accept data when the submitter has requested restrictions on reuse or redistribution. Full disclaimer can be found at https://www.ncbi.nlm.nih.gov/home/about/policies/ |
Reactome | Fabregat A, Sidiropoulos K, Garapati P, et al. The Reactome pathway Knowledgebase. Nucleic Acids Res. 2016;44(D1):D481-7. Pubmed | | The Reactome data and source code continues to be publicly accessible under the terms of a Creative Commons Attribution 3.0 Unported License. |
BioGRID | Chatr-aryamontri A, Oughtred R, Boucher L, et al. The BioGRID interaction database: 2017 update. Nucleic Acids Res. 2017;45(D1):D369-D379. Pubmed | | BioGRID interaction data are 100% freely available to both commercial and academic users. |
DIP | Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D. The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004;32(Database issue):D449-51. Pubmed | | Creative Commons Attribution-NoDerivs License. However, if you intend to distribute data from our database, you must ask us for permission first. |
IntAct | Orchard S, Ammari M, Aranda B, et al. The MIntAct project--IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2014;42(Database issue):D358-63. Pubmed | | IntAct is released monthly. All IntAct data and software is freely available to all users, academic or commercial, under the terms of the Apache License, Version 2.0. |
HumanNet | Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM. Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 2011;21(7):1109-21. Pubmed | | Contact Dr. Edward Marcotte, Email: marcotte AT icmb dot utexas dot edu |
Pathway Commons | Cerami EG, Gross BE, Demir E, et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 2011;39(Database issue):D685-90. Pubmed | | Full list of data sources are available at http://www.pathwaycommons.org/#data |
Gene Ontology | Gene Ontology Consortium: going forward. Nucleic Acids Res. 2015;43(Database issue):D1049-56. Pubmed | | Creative commons license attribution 4.0 international |
Pfam | Finn RD, Coggill P, Eberhardt RY, et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016;44(D1):D279-85. Pubmed | | Pfam is freely available under the Creative Commons Zero ("CC0") licence. |
Enrichr | Kuleshov MV, Jones MR, Rouillard AD, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44(W1):W90-7. Pubmed | | Enrichr's web-based tools and services are free for academic, non-profit use, but for commercial uses please contact MSIP for a license. |
MSigDB | Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005;102(43):15545-50. Pubmed | | MSigDB v6.0 is available under a Creative Commons style license, plus additional terms for some gene sets. The full license terms are available at http://software.broadinstitute.org/gsea/license_terms_list.jsp |