
KEGG: Kyoto Encyclopedia of Genes and Genomes

Why this mattered

KEGG mattered because it helped move genomics from lists of genes toward computable biological systems. In the late 1990s, complete genome sequences were rapidly accumulating, but sequence alone did not explain how genes worked together. KEGG’s central move was to organize gene functions in terms of pathways, molecular interactions, ortholog groups, compounds, enzymes, and genome maps. That made “function” less a property of an isolated gene and more a position in a conserved network of reactions and regulatory relations.

The practical consequence was that a newly sequenced genome could be interpreted systematically: genes could be mapped onto known pathways, missing enzymes could be inferred, organism-specific metabolic capacity could be reconstructed, and orthologous relationships could be compared across species. This changed what genome annotation could mean. Instead of annotating genes one by one, researchers could ask whether an organism possessed a pathway, how that pathway differed from another organism’s, and what biological capabilities followed from those differences.
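The mapping step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not KEGG's actual procedure: the pathway names, ortholog group identifiers (`OG1`, `OG2`, ...), gene names, and the `reconstruct` function are all invented for the example, and a pathway is treated simply as a set of required ortholog groups.

```python
# Toy reference pathways: pathway name -> set of required ortholog groups.
# Every identifier here is hypothetical; none is a real KEGG entry.
REFERENCE_PATHWAYS = {
    "toy_glycolysis": {"OG1", "OG2", "OG3", "OG4"},
    "toy_tca_cycle": {"OG5", "OG6", "OG7"},
}

# Toy annotation of a newly sequenced genome: gene -> ortholog group.
GENOME_A = {
    "geneA1": "OG1",
    "geneA2": "OG2",
    "geneA3": "OG3",
    "geneA4": "OG4",
    "geneA5": "OG5",
}


def reconstruct(genome, pathways):
    """Map a genome's ortholog assignments onto reference pathways.

    Returns, per pathway, which required groups were found, which are
    missing, and the fraction of required groups that are present.
    """
    present = set(genome.values())
    report = {}
    for name, required in pathways.items():
        found = required & present
        report[name] = {
            "found": sorted(found),
            "missing": sorted(required - present),
            "completeness": len(found) / len(required),
        }
    return report


report = reconstruct(GENOME_A, REFERENCE_PATHWAYS)
for name, entry in report.items():
    print(name, entry["completeness"], "missing:", entry["missing"])
```

In this toy case the first pathway is complete, while the second lacks two of its three groups. Those gaps are the kind of "missing enzyme" a reconstruction would flag, either as a true absence or as a gene that annotation failed to detect.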

KEGG also anticipated later systems biology and functional genomics by treating databases as active analytical infrastructure rather than static archives. Its integration of genomes, pathways, enzymes, compounds, and computational tools made it a foundation for pathway enrichment analysis, metabolic reconstruction, comparative genomics, microbiome interpretation, drug-target reasoning, and multi-omics analysis. Each of these methods relies on the same shift: converting high-throughput molecular measurements into interpretable biological networks.
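As one illustration of how pathway membership turns a gene list into an interpretable result, pathway enrichment is commonly framed as an over-representation test. The sketch below uses a one-sided hypergeometric test built on the standard library only. It is not a KEGG tool: the function names `hypergeom_sf` and `enrich`, the gene labels, and the pathway sets are hypothetical, and real analyses would also correct for multiple testing.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric.

    N: background size, K: pathway genes in the background,
    n: number of genes drawn (the hit list), k: observed overlap.
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def enrich(hits, background, pathways):
    """Compute an over-representation p-value for each pathway."""
    background = set(background)
    hits = set(hits) & background
    results = {}
    for name, members in pathways.items():
        members = set(members) & background
        overlap = len(hits & members)
        results[name] = hypergeom_sf(
            overlap, len(background), len(members), len(hits)
        )
    return results


# Hypothetical toy data: 10 background genes, two 3-gene pathways.
BACKGROUND = [f"g{i}" for i in range(10)]
PATHWAYS = {
    "toy_pathway_x": {"g0", "g1", "g2"},
    "toy_pathway_y": {"g7", "g8", "g9"},
}
HITS = {"g0", "g1", "g2", "g3"}

p_values = enrich(HITS, BACKGROUND, PATHWAYS)
for name, p in p_values.items():
    print(name, round(p, 4))
```

Here all three members of `toy_pathway_x` appear among the four hits, which is unlikely by chance in a background of ten genes, whereas `toy_pathway_y` has no overlap and receives a p-value of 1.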

Abstract

Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules. The major component of KEGG is the PATHWAY database, which consists of graphical diagrams of biochemical pathways, including most of the known metabolic pathways and some of the known regulatory pathways. The pathway information is also represented by the ortholog group tables, which summarize orthologous and paralogous gene groups among different organisms. KEGG maintains the GENES database for the gene catalogs of all organisms with complete genomes and selected organisms with partial genomes, which are continuously re-annotated, as well as the LIGAND database for chemical compounds and enzymes. Each gene catalog is associated with a graphical genome map of chromosomal locations, which is represented by a Java applet. In addition to the data collection efforts, KEGG develops and provides various computational tools, such as those for reconstructing biochemical pathways from the complete genome sequence and for predicting gene regulatory networks from gene expression profiles. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

Sources