Mappings for Entities in the Knowledge Network
The KN_Mapper is a tool that maps gene names to the gene node entity identifiers used in the Knowledge Network. It is available as a Docker image on the Dockstore with a Common Workflow Language (CWL) description and example usage. The source code is also available on GitHub with example data sets. The KN_Mapper tool uses a Redis database hosted at knowredis.knoweng.org. For the 20-species Knowledge Network, the tool can map more than 14 million gene names to more than 400,000 gene entities.
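The mapping step described above can be pictured as a many-to-one lookup from gene names to stable identifiers. The following is a minimal sketch of that idea only, not the KN_Mapper implementation: the in-memory `ALIAS_TABLE` stands in for the Redis database, and the key layout, the `map_gene_names` helper, and the Ensembl-style identifiers are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical alias table: (taxon id, lowercased gene name) -> stable identifier.
# The real tool queries a Redis database; this dict only stands in for it.
ALIAS_TABLE: Dict[Tuple[str, str], str] = {
    ("9606", "tp53"): "ENSG00000141510",
    ("9606", "p53"): "ENSG00000141510",
    ("9606", "brca1"): "ENSG00000012048",
}


def map_gene_names(names: List[str], taxon: str) -> Dict[str, Optional[str]]:
    """Map each gene name to a stable identifier, or None when it is unmapped."""
    return {
        name: ALIAS_TABLE.get((taxon, name.strip().lower()))
        for name in names
    }


mapped = map_gene_names(["TP53", "p53", "BRCA1", "notagene"], "9606")
for name, gene_id in mapped.items():
    # Several names may resolve to the same gene entity.
    print(name, gene_id or "unmapped")
```

In this sketch, two different names resolve to the same identifier, which is how a large set of gene names can collapse onto a much smaller set of gene entities.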
Find out more on the Dockstore, Docker Hub, and GitHub.
Fetching Subnetworks from the Knowledge Network
The KN_Fetcher is another tool developed by KnowEnG. Given a KN Edge Type identifier and a Species taxon identifier, it fetches the corresponding subnetwork of the Knowledge Network. As with the KN_Mapper, the KN_Fetcher Docker image and CWL description are available through the Dockstore, and the source code is available on GitHub. The tool retrieves processed subnetworks of the KN and returns the data as a set of three files:
- interactions/edges of the subnetwork with references to their source
- identifiers, aliases, and descriptions for the nodes in the subnetwork
- summary and provenance metadata about the subnetwork
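The selection described above, one edge type and one species in, three pieces of data out, can be sketched as follows. This is an illustration under stated assumptions rather than the KN_Fetcher code: the `Edge` fields, the toy `NETWORK`, the `fetch_subnetwork` helper, and every identifier in it are hypothetical, and the real tool writes files rather than returning Python objects.

```python
from typing import Dict, List, NamedTuple, Tuple


class Edge(NamedTuple):
    source: str     # source node identifier
    target: str     # target node identifier
    edge_type: str  # hypothetical edge type identifier
    taxon: str      # species taxon identifier
    origin: str     # data resource the edge came from


# Toy network; every value here is made up for illustration.
NETWORK: List[Edge] = [
    Edge("geneA", "geneB", "type_x", "9606", "resource_1"),
    Edge("geneB", "geneC", "type_x", "9606", "resource_1"),
    Edge("geneA", "geneD", "type_y", "9606", "resource_2"),
    Edge("geneE", "geneF", "type_x", "10090", "resource_1"),
]


def fetch_subnetwork(
    edge_type: str, taxon: str
) -> Tuple[List[Edge], List[str], Dict[str, object]]:
    """Return (edges, nodes, metadata) for one edge type and one species."""
    edges = [e for e in NETWORK if e.edge_type == edge_type and e.taxon == taxon]
    nodes = sorted({n for e in edges for n in (e.source, e.target)})
    metadata = {
        "edge_type": edge_type,
        "taxon": taxon,
        "edge_count": len(edges),
        "node_count": len(nodes),
        "origins": sorted({e.origin for e in edges}),
    }
    return edges, nodes, metadata


edges, nodes, metadata = fetch_subnetwork("type_x", "9606")
print(len(edges), nodes, metadata["origins"])
```

The three return values loosely mirror the three files listed above: the edges with their source reference, the nodes that appear in them, and a summary of the subnetwork.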
Find out more on the Dockstore, Docker Hub, and GitHub.
Building the Knowledge Network
The Knowledge Network is created with a scalable cloud infrastructure that automatically checks the selected data resources for updates, downloads and parses relevant information and datasets, maps gene aliases to a set of stable identifiers, imports the resulting interactions and metadata into relational databases (schema), and finally extracts processed subnetworks for independent knowledge-guided analysis.
While the Knowledge Network Build pipeline can be run on a single machine, it is optimized for scalable cloud infrastructures. The pipeline is decomposed into small, discrete tasks that can be distributed as unique Docker containers on a compute cloud with a shared file system. Task scheduling is coordinated by the Mesos datacenter management system, and dependencies between the tasks are enforced by the Chronos framework.
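To illustrate how dependencies between small tasks determine what can run in parallel, here is a minimal sketch using Python's standard-library `graphlib` (Python 3.9 or later). It does not use Mesos or Chronos and is not the pipeline's actual scheduler. The task names and the two example sources `a` and `b` are hypothetical, chosen to echo the check, download, parse, map, import, and extract stages described above.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical tasks; each maps to the set of tasks it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "check_a": set(),
    "check_b": set(),
    "fetch_a": {"check_a"},
    "fetch_b": {"check_b"},
    "parse_a": {"fetch_a"},
    "parse_b": {"fetch_b"},
    "map": {"parse_a", "parse_b"},
    "import": {"map"},
    "export": {"import"},
}


def ready_batches(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into batches whose members could run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # also raises CycleError if the dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks with all dependencies done
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependents
    return batches


batches = ready_batches(DEPENDENCIES)
for step, batch in enumerate(batches, start=1):
    print(step, batch)
```

In this sketch, the per-source tasks land in the same batch and could be handed to separate containers, while the mapping, import, and export steps wait for everything upstream to finish.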
Find out more at our documentation, GitHub, and Docker Hub.