A comprehensive update of the sequence and structure classification of kinases
© Cheek et al; licensee BioMed Central Ltd. 2005
Received: 11 January 2005
Accepted: 16 March 2005
Published: 16 March 2005
A comprehensive update of the classification of all available kinases was carried out. This survey presents a complete global picture of this large functional class of proteins and confirms the soundness of our initial kinase classification scheme.
The new survey found that the total number of kinase sequences in the protein database has increased more than three-fold (from 17,310 to 59,402), and the number of determined kinase structures has increased two-fold (from 359 to 702) in the past three years. However, the framework of the original two-tier classification scheme (in families and fold groups) remains sufficient to describe all available kinases. Overall, the kinase sequences were classified into 25 families of homologous proteins, wherein 22 families (~98.8% of all sequences) for which three-dimensional structures are known fall into 10 fold groups. These fold groups not only include some of the most widely spread protein folds, such as the Rossmann-like fold, ferredoxin-like fold, TIM-barrel fold, and antiparallel β-barrel fold, but also span all major classes (all α, all β, α+β, α/β) of protein structures. Fold predictions are made for the remaining kinase families without a close homolog with solved structure. We also highlight two novel kinase structural folds, riboflavin kinase and dihydroxyacetone kinase, which have recently been characterized. Two protein families previously annotated as kinases are removed from the classification based on new experimental data.
Structural annotations of all kinase families are now available, including fold descriptions for all globular kinases, making this the first large functional class of proteins with a comprehensive structural annotation. Potential uses for this classification include deducing the protein function, structural fold, or enzymatic mechanism of poorly studied or newly discovered kinases from proteins in the same family.
We restrict the definition of "kinase" to enzymes that catalyze the transfer of the terminal phosphate group from ATP (with a few exceptions, such as GTP, as discussed below) to a substrate containing an alcohol, nitrogenous, carboxyl, or phosphate group as the phosphoryl acceptor. The substrate of a kinase can be a small molecule, lipid, or protein. Kinases play indispensable roles in numerous cellular metabolic and signalling pathways, and they are among the best-studied enzymes at the structural, biochemical, and cellular levels. Although all kinases use the same phosphate donor (in most cases, ATP) and catalyze apparently the same phosphoryl transfer reaction, they display remarkable diversity in their structural folds and substrate recognition mechanisms. This diversity probably reflects, in large part, the extraordinarily diverse structures and properties of their substrates.
In order to investigate the relationship between structural fold and functional specificities in kinases, we carried out a comprehensive analysis of all available kinase structures and sequences three years ago. This analysis surveyed more than 17,000 kinase sequences, which were classified into 30 families of homologous proteins. Furthermore, we found that 98% of these sequences fell into seven general fold groups with known structures. We were subsequently able to use this kinase classification scheme to analyze various aspects of kinase function and evolution, such as the catalytic and nucleotide-binding mechanisms shared across different kinase families and folds.
Protein classifications of this kind have been shown to be in demand among biologists because they are useful tools for analyzing various aspects of sequence/structure/function relationships in proteins, such as structure prediction or the identification of functionally important residues. Given the rapid growth of sequence and structure databases, it is also essential that a protein classification scheme remain stable over time. Ideally, the backbone of a classification scheme should not require fundamental revisions as additional information is included. The nearly three years since the completion of the initial kinase survey have seen a large influx of sequence and structural data from large-scale projects such as genome sequencing and structural genomics initiatives, as well as from functional studies by numerous individual research laboratories. These developments warrant an update of the initial kinase survey and an evaluation of whether the families and fold groups identified in the original classification scheme still provide a comprehensive framework for all available kinase sequences.
Here, we present an updated version of this kinase classification. This update serves two important purposes: to validate the robustness of the initial kinase classification scheme and to present, for the first time, a complete structural annotation of this large functional class of proteins. The updated kinase survey now includes over 59,000 sequences that belong to 25 families of homologous proteins. Although the total number of kinase sequences has increased more than three-fold, the framework of the original classification remains sufficient for describing all available kinase sequences. The initial survey also included large-scale structure predictions for kinases with unknown structure. The structures of several of these kinases have now been solved, and the original fold group/family placements are shown to be correct. The two new kinase folds that have been characterized recently are discussed, and fold predictions were performed for those kinase families still lacking a homolog with solved structure.
Results and discussion
Overall, 59,402 sequences are classified into 25 families of homologous kinases. These kinase families are further assembled into 12 fold groups based on similarity of structural fold. Twenty-two of the 25 families belong to 10 fold groups for which the structural fold is known. One additional family (polyphosphate kinase) is now associated with a predicted structural fold and presently forms a distinct fold group. The two remaining families are both integral membrane kinases and comprise the final fold group. Within a fold group, the core of the nucleotide-binding domain of each family has the same architecture, and the topology of the protein core is either identical or related by circular permutation. Homology between families within a fold group is not implied.
Table 1. Classification of kinase activities by family and fold group, part 1.
Family and PFAM/COG members
Kinase Activity (E.C.)
Representative PDB or gi
Group 1: protein S/T-Y kinase/ atypical protein kinase/ lipid kinase/ ATP-grasp 23124 sequences
protein S/T-Y kinase/ atypical protein kinase: COG0478, COG2112, PF00069, PF00454, PF01163, PF01633 22074 sequences
18.104.22.168 Choline kinase (*)
22.214.171.124 Protein kinase
126.96.36.199 Phosphorylase kinase
188.8.131.52 Homoserine kinase
184.108.40.206 1-phosphatidylinositol 4-kinase
220.127.116.11 Streptomycin 6-kinase
18.104.22.168 Ethanolamine kinase
22.214.171.124 Streptomycin 3"-kinase
126.96.36.199 Kanamycin kinase
188.8.131.52 5-methylthioribose kinase
184.108.40.206 Viomycin kinase
220.127.116.11 [Hydroxymethylglutaryl-CoA reductase (NADPH2)] kinase
18.104.22.168 Protein-tyrosine kinase
22.214.171.124 [Isocitrate dehydrogenase (NADP+)] kinase
126.96.36.199 [Myosin light-chain] kinase
188.8.131.52 Hygromycin-B kinase
184.108.40.206 Calcium/calmodulin-dependent protein kinase
220.127.116.11 Rhodopsin kinase
18.104.22.168 [Beta-adrenergic-receptor] kinase (*)
22.214.171.124 [Myosin heavy-chain] kinase
126.96.36.199 [Tau protein] kinase (*)
188.8.131.52 Macrolide 2'-kinase
184.108.40.206 1-phosphatidylinositol 3-kinase
220.127.116.11 [RNA-polymerase]-subunit kinase
18.104.22.168 Phosphatidylinositol-4,5-bisphosphate 3-kinase
22.214.171.124 Phosphatidylinositol-4-phosphate 3-kinase
lipid kinase: PF01504 321 sequences
126.96.36.199 1-phosphatidylinositol-4-phosphate 5-kinase
188.8.131.52 1D-myo-inositol-trisphosphate 3-kinase (*)
184.108.40.206 Inositol-tetrakisphosphate 5-kinase
220.127.116.11 1-phosphatidylinositol 5-phosphate 4-kinase
18.104.22.168 1-phosphatidylinositol 3-phosphate 5-kinase
22.214.171.124 Inositol-polyphosphate multikinase
126.96.36.199 Inositol-hexakisphosphate kinase
ATP-grasp: PF01326 729 sequences
188.8.131.52 Inositol-tetrakisphosphate 1-kinase
184.108.40.206 Pyruvate, phosphate dikinase
220.127.116.11 Pyruvate, water dikinase
Group 2: Rossmann-like 17071 sequences
P-loop kinases: COG0645, COG1618, COG1663, COG1936, COG2019, PF00265, PF00406, PF00485, PF00625, PF00693, PF01121, PF01202, PF01583, PF01591, PF01712, PF02223, PF02224, PF02283 7732 sequences
18.104.22.168 Gluconokinase (*)
22.214.171.124 Thymidine kinase
126.96.36.199 Ribosylnicotinamide kinase
188.8.131.52 Dephospho-CoA kinase
184.108.40.206 Adenylylsulfate kinase
220.127.116.11 Pantothenate kinase
18.104.22.168 Protein kinase (bacterial)
22.214.171.124 Uridine kinase (*)
126.96.36.199 Shikimate kinase
188.8.131.52 Deoxycytidine kinase (*)
184.108.40.206 Deoxyadenosine kinase
220.127.116.11 Polynucleotide 5'-hydroxyl-kinase (*)
18.104.22.168 Deoxyguanosine kinase
22.214.171.124 Tetraacyldisaccharide 4'-kinase
126.96.36.199 Deoxynucleoside kinase (*)
188.8.131.52 Adenosylcobinamide kinase
184.108.40.206 Polyphosphate kinase
220.127.116.11 Phosphomevalonate kinase
18.104.22.168 Adenylate kinase
22.214.171.124 Nucleoside-phosphate kinase
126.96.36.199 Guanylate kinase
188.8.131.52 Thymidylate kinase
184.108.40.206 Nucleoside-triphosphate–adenylate kinase
220.127.116.11 (Deoxy)nucleoside-phosphate kinase
18.104.22.168 Cytidylate kinase
2.7.4.- Uridylate kinase
phosphoenolpyruvate carboxykinase: COG1493, PF01293, PF00821 815 sequences
22.214.171.124 Protein kinase (HPr kinase/phosphatase)
126.96.36.199 Phosphoenolpyruvate carboxykinase (GTP)
188.8.131.52 Phosphoenolpyruvate carboxykinase (ATP)
phosphoglycerate kinase: PF00162 1351 sequences
184.108.40.206 Phosphoglycerate kinase
220.127.116.11 Phosphoglycerate kinase (GTP)
aspartokinase: PF00696 2171 sequences
18.104.22.168 Carbamate kinase
22.214.171.124 Aspartate kinase
126.96.36.199 Acetylglutamate kinase (*)
188.8.131.52 Glutamate 5-kinase
2.7.4.- Uridylate kinase
phosphofructokinase-like: PF00365, PF00781, PF01219, PF01513 1998 sequences
184.108.40.206 NAD(+) kinase (*)
220.127.116.11 Diphosphate–fructose-6-phosphate 1-phosphotransferase (*)
18.104.22.168 Sphinganine kinase
22.214.171.124 Diacylglycerol kinase
126.96.36.199 Ceramide kinase
ribokinase-like: PF00294, PF01256, PF02110 2722 sequences
188.8.131.52 Adenosine kinase
184.108.40.206 Pyridoxal kinase (*)
220.127.116.11 Hydroxymethylpyrimidine kinase
18.104.22.168 Hydroxyethylthiazole kinase
22.214.171.124 Inosine kinase
126.96.36.199 Tagatose-6-phosphate kinase
188.8.131.52 ADP-dependent phosphofructokinase
184.108.40.206 ADP-dependent glucokinase
220.127.116.11 Phosphomethylpyrimidine kinase (*)
thiamin pyrophosphokinase 175 sequences
18.104.22.168 Thiamin pyrophosphokinase
glycerate kinase (previously Group 15) 107 sequences
22.214.171.124 Glycerate kinase (*)
Table 2. Classification of kinase activities by family and fold group, part 2.
Family and PFAM/COG members
Kinase Activity (E.C.)
Representative PDB or gi
Group 3: ferredoxin-like fold kinases 10973 sequences
nucleoside-diphosphate kinase: PF00334 923 sequences
126.96.36.199 Nucleoside-diphosphate kinase
HPPK: PF01288 609 sequences
188.8.131.52 2-amino-4-hydroxy-6-hydroxymethyldihydropteridine pyrophosphokinase
guanido kinases: PF00217 324 sequences
184.108.40.206 Guanidoacetate kinase
220.127.116.11 Creatine kinase
18.104.22.168 Arginine kinase
22.214.171.124 Lombricine kinase
histidine kinase: PF00512, COG2172 9117 sequences
126.96.36.199 Protein kinase (Histidine kinase)
188.8.131.52 [Pyruvate dehydrogenase (lipoamide)] kinase
184.108.40.206 [3-methyl-2-oxobutanoate dehydrogenase (lipoamide)] kinase
Group 4: ribonuclease H-like 2768 sequences
COG0837, PF00349, PF00370, PF00871
220.127.116.11 Erythritol kinase
18.104.22.168 Glycerol kinase
22.214.171.124 Pantothenate kinase
126.96.36.199 Allose kinase
188.8.131.52 N-acetylglucosamine kinase
184.108.40.206 N-acylmannosamine kinase
220.127.116.11 Polyphosphate-glucose phosphotransferase
18.104.22.168 Beta-glucoside kinase
22.214.171.124 Acetate kinase
126.96.36.199 Butyrate kinase
188.8.131.52 Branched-chain-fatty-acid kinase
2.7.2.- Propionate kinase
Group 5: TIM β/α-barrel kinase 1119 sequences
184.108.40.206 Pyruvate kinase
Group 6: GHMP kinase 885 sequences
COG1685, COG1907, PF00288, PF01971
220.127.116.11 Galactokinase (*)
18.104.22.168 Mevalonate kinase
22.214.171.124 Homoserine kinase
126.96.36.199 Shikimate kinase
188.8.131.52 4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol kinase (*)
184.108.40.206 Phosphomevalonate kinase
Group 7: AIR synthetase-like 1843 sequences
220.127.116.11 Thiamine-phosphate kinase
18.104.22.168 Selenide, water dikinase
Group 8: riboflavin kinase (previously Group 10) 565 sequences
22.214.171.124 Riboflavin kinase (*)
Group 9: dihydroxyacetone kinase (previously Group 17) 197 sequences
126.96.36.199 Glycerone kinase (*)
Group 10: putative glycerate kinase (previously Group 16) 148 sequences
188.8.131.52 Glycerate kinase (*)
Group 11: polyphosphate kinase (previously Group 9) 446 sequences
184.108.40.206 Polyphosphate kinase
Group 12: integral membrane kinases (previously Group 8) 263 sequences
dolichol kinase: PF01879 127 sequences
220.127.116.11 Dolichol kinase
undecaprenol kinase 136 sequences
18.104.22.168 Undecaprenol kinase
Occurrences of the same kinase activity in more than one family reflect cases of convergent evolution to the same kinase activity from different ancestors. For example, homoserine kinase entries are found in the protein kinase-like family (Group 1) and the GHMP kinase family (Group 6). These proteins each carry out the same biochemical reaction and therefore have identical EC specifications, but they belong to two unrelated protein families. Currently, an experimental structure is available only for the homoserine kinase from the GHMP kinase family.
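As a consistency check on Tables 1 and 2, the following Python sketch transcribes the per-family sequence counts, confirms that they sum to each printed fold-group total and to the overall 59,402 sequences, and recomputes the share of sequences (about 98.8%) that falls in the ten fold groups with solved structures. For single-family groups the tables print only a group total, so the family label used here is simply the group name; that labeling and the helper name `summarize` are our own illustrative choices, not part of the classification.

```python
# (printed group total, {family: sequence count}) for fold groups 1-12,
# transcribed from Tables 1 and 2. Single-family groups reuse the group name.
GROUPS = {
    1: (23124, {"protein S/T-Y kinase/atypical protein kinase": 22074,
                "lipid kinase": 321, "ATP-grasp": 729}),
    2: (17071, {"P-loop kinases": 7732, "phosphoenolpyruvate carboxykinase": 815,
                "phosphoglycerate kinase": 1351, "aspartokinase": 2171,
                "phosphofructokinase-like": 1998, "ribokinase-like": 2722,
                "thiamin pyrophosphokinase": 175, "glycerate kinase": 107}),
    3: (10973, {"nucleoside-diphosphate kinase": 923, "HPPK": 609,
                "guanido kinases": 324, "histidine kinase": 9117}),
    4: (2768, {"ribonuclease H-like": 2768}),
    5: (1119, {"TIM beta/alpha-barrel kinase": 1119}),
    6: (885, {"GHMP kinase": 885}),
    7: (1843, {"AIR synthetase-like": 1843}),
    8: (565, {"riboflavin kinase": 565}),
    9: (197, {"dihydroxyacetone kinase": 197}),
    10: (148, {"putative glycerate kinase": 148}),
    11: (446, {"polyphosphate kinase": 446}),
    12: (263, {"dolichol kinase": 127, "undecaprenol kinase": 136}),
}
KNOWN_STRUCTURE_GROUPS = range(1, 11)  # groups 1-10 have an experimentally known fold


def summarize(groups):
    """Check family sums against group totals; return overall counts."""
    for gid, (stated, families) in groups.items():
        # Family counts should add up to the total printed for the group.
        assert sum(families.values()) == stated, f"group {gid} does not add up"
    total = sum(stated for stated, _ in groups.values())
    n_families = sum(len(fams) for _, fams in groups.values())
    known = sum(groups[g][0] for g in KNOWN_STRUCTURE_GROUPS)
    known_families = sum(len(groups[g][1]) for g in KNOWN_STRUCTURE_GROUPS)
    return total, n_families, known_families, round(100 * known / total, 1)


print(summarize(GROUPS))  # (59402, 25, 22, 98.8)
```

The result matches the text: 25 families in total, 22 of them in fold groups with known structure, covering about 98.8% of all 59,402 sequences.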
Framework of the classification remains unchanged
The updated classification includes 42,092 additional sequences, 343 additional kinase structures, and 12 additional kinase specificities compared to the original classification. Although the total number of kinase sequences included in the classification has increased more than three-fold (from 17,310 to 59,402), all new kinase sequences were found to be homologous to the previously established families and are thus contained in the existing family and fold group classification. Most of the 343 kinase structures solved since the initial classification was completed correspond to kinases for which at least one representative structure was already known; for example, dozens of additional eukaryotic protein serine-threonine/tyrosine kinase structures were solved. Structures were also published for 20 kinases whose structures had not previously been determined (indicated by asterisks in Tables 1 and 2). The structural folds of 15 of these 20 kinases had been predicted by our initial kinase classification based on their homology to proteins with known structures, and all 15 predicted folds were shown to be correct by the experimentally determined structures. For example, choline kinase was expected to have a protein kinase-like fold similar to the other members of Family 1a (protein S/T-Y kinases and atypical protein kinases), and the crystal structure of choline kinase shows that this protein does indeed adopt a eukaryotic protein kinase-like fold. Likewise, pyridoxal kinase was shown to have a ribokinase-like fold, as predicted in the initial kinase classification. Thus, the placements of these kinases in the classification scheme remain unchanged.
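The growth figures quoted above follow from simple arithmetic on the counts given in the text. The short Python sketch below (the helper name `growth` is hypothetical) recomputes the absolute increase and fold change for sequences and structures; the structure count rises about 1.96-fold, which the text rounds to "two-fold".

```python
# Sequence and structure counts for the two surveys, as given in the text.
INITIAL = {"sequences": 17_310, "structures": 359}
UPDATED = {"sequences": 59_402, "structures": 702}


def growth(initial: dict[str, int], updated: dict[str, int]) -> dict[str, tuple[int, float]]:
    """Return (absolute increase, fold change) for each counted quantity."""
    return {k: (updated[k] - initial[k], updated[k] / initial[k]) for k in initial}


for key, (added, fold) in growth(INITIAL, UPDATED).items():
    print(f"{key}: +{added:,} ({fold:.2f}-fold)")
# sequences: +42,092 (3.43-fold)
# structures: +343 (1.96-fold)
```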
The five remaining kinases with recently solved structures belong to families for which the fold was previously unknown. Two of these kinases, riboflavin kinase and dihydroxyacetone kinase, represent two new unique kinase folds. One glycerate kinase family, previously listed as an independent fold group, is now placed as an additional family in the Rossmann-like fold group owing to similarities in the architecture and topology of its predicted nucleotide-binding domain. Because the nucleotide-binding domain cannot be confidently predicted for a second, distinct glycerate kinase family, these sequences tentatively remain a separate fold group. Lastly, inositol 1,4,5-trisphosphate 3-kinase is now known to be a member of the lipid kinase-like family (Family 1b).
Comparison of initial and updated kinase surveys (columns: families with known structure; fold groups with known structure).
Two new kinase folds are characterized
The second new kinase fold was revealed by the structure of the ATP-dependent dihydroxyacetone kinase from Citrobacter freundii. Dihydroxyacetone kinase sequences are also widely distributed in organisms from all three domains of life. This protein contains two regions separated by an extended linker. The N-terminal region (termed the K-domain) is homologous to the non-ATP-dependent DhaK protein of E. coli and other Gram-negative bacteria; it consists of two α/β domains and is responsible for dihydroxyacetone binding. The C-terminal region (termed the L-domain, homologous to the DhaL protein of E. coli) is the nucleotide-binding domain and consists of 8 antiparallel α-helices that form a closed barrel (Figure 1b). The α-helices are all slightly tilted away from the axis of the barrel, forming a pocket in which a phospholipid is bound. The bound ATP analog is located at the top of the α-barrel. The N-terminus of one helix (H4) points toward the γ-phosphate of ATP and, together with a glycine-rich loop between helices H3 and H4, forms the primary binding site for the ATP phosphates. Ser432 interacts with the ATP α-phosphate, while Ser431 interacts with the ATP β- and γ-phosphates. Two Mg2+ ions are coordinated by all three phosphates of ATP and by three highly conserved aspartates (Asp380, Asp385 and Asp387) located on a loop between helices H1 and H2. Additionally, the adenine ring is packed against several hydrophobic side chains (Leu435, Thr476, and Met477). Dihydroxyacetone kinase is the only kinase known to have an all-α nucleotide-binding domain. It represents another new fold group (now Group 9) in our kinase classification scheme, as its fold is unlike that of any other kinase with known structure.
Two glycerate kinase families now with solved structures
The initial kinase survey included two protein families with putative glycerate kinase activity. These proteins were placed in two separate fold groups because no significant sequence similarity was detected between the two families, despite their presumably identical biochemical activities. One family (previously Group 15) contains glycerate kinases from bacterial species, primarily of the Firmicutes and of the gamma subdivision of the Proteobacteria. The second family (previously Group 16) consists of proteins from eukaryotes and archaea in addition to several different bacterial species. Representative structures from each of these two families have recently been solved.
Putative glycerate kinase from Thermotoga maritima (PDB|1o0u) (Joint Center for Structural Genomics. Crystal Structure of Glycerate Kinase (TM1585) from Thermotoga maritima at 2.95 A Resolution. To be published.) is a member of the second glycerate kinase family (previously Group 16). The fold of this protein also consists of two dissimilar α/β domains (Figure 2b). The N-terminal α/β domain has Rossmann-like topology, with a central 6-stranded β-sheet in the order 654123. The C-terminal domain contains a 6-stranded mixed β-sheet with strand order 126345 and several helices packed on both sides of the β-sheet. The active site is likely to be in the cleft between the two α/β domains. In this structure, six highly conserved polar or charged residues have side chains pointing into the presumed active site (Figure 2b). The C-terminal domain contributes four of these highly conserved residues, while the Rossmann-like domain contributes the remaining two in addition to a glycine-rich loop. Each of these domains contains one highly conserved basic residue that could potentially interact with the triphosphate tail of bound ATP: Lys47 in the Rossmann-like domain and Arg325 in the C-terminal domain. Based on the available information, it is not possible to confidently predict which domain is responsible for nucleotide binding in this putative glycerate kinase. Therefore, this family is kept as a separate fold group until its active site is characterized. The annotation of these proteins as putative glycerate kinases is based on a gene found in a 5-kb fragment that is apparently responsible for complementation in Methylobacterium extorquens AM1 mutants lacking glycerate kinase activity. However, other family members are annotated as putative glycerate dehydrogenases/hydroxypyruvate reductases based on genetic analysis of the tartrate utilization pathway in Agrobacterium vitis.
Glycerate kinase and glycerate dehydrogenase/hydroxypyruvate reductase catalyze successive steps in the serine metabolism pathway. Therefore, the exact biochemical function of this enzyme family remains to be resolved.
Inositol 1,4,5-trisphosphate 3-kinase is a member of the lipid kinase-like family
The mode of nucleotide binding in I3P3K is also very similar to that of eukaryotic protein kinases, as each of the critical nucleotide-binding and Mg2+-binding residues in I3P3K plays the same role as a corresponding protein kinase residue. Lys209 (human I3P3K; hI3P3K) forms a hydrogen bond with the α- and β-phosphates of ATP and corresponds to the highly conserved Lys72 in protein kinase A (PKA). This lysine residue is oriented by Glu215 in hI3P3K (corresponding to Glu91 in PKA). A second highly conserved lysine residue (Lys264 in hI3P3K) interacts with the 3-OH phosphate acceptor group of the inositol 1,4,5-trisphosphate substrate and likely stabilizes the γ-phosphate during transfer, similar to the role of Lys168 in PKA. Although Lys264 (hI3P3K) and Lys168 (PKA) are contributed by different structural elements in different regions of the protein sequence, these residues are found in equivalent spatial locations and likely play the same role in catalysis. A Mn2+ ion is coordinated by Asp416, which corresponds to the conserved magnesium-binding residue Asp184 of the DFG motif in PKA. Ser399 is expected to coordinate a second divalent cation that is not modeled in the I3P3K structure, as this residue occupies the spatial location equivalent to the conserved magnesium-binding residue Asn171 in PKA. These active-site similarities also extend to members of the lipid kinase family, such as phosphatidylinositol phosphate kinase IIβ (PIPK; PDB|1bo1) (Figure 3b), although a representative structure with bound nucleotide has not yet been solved.
Although I3P3K shares similarity in overall fold as well as in the active site with both the lipid kinase and protein kinase-like families, I3P3K is more closely related to the lipid kinase family. I3P3K and the lipid kinases share conserved motifs that are not found in protein kinases, including the substrate-binding/catalytic "DLK" motif (Asp262 to Lys264 in hI3P3K) and the magnesium-binding "SSLL" motif (Ser398 to Leu401 in hI3P3K). Additionally, DALI identifies the lipid kinase representative PIPK (PDB|1bo1) as the closest structural neighbor of I3P3K. Thus, based on the similarity of structural fold and the conservation of critical nucleotide-binding, magnesium-binding, and catalytic residues, I3P3K can be assigned to the lipid kinase family in the kinase classification (Family 1b) despite the lack of significant sequence similarity.
Predicted folds for remaining kinases with unknown structures
Fold predictions (see Methods) were made for each remaining family of kinases without a solved structure, with the exception of the integral membrane kinases (dolichol kinase and undecaprenol kinase). These kinases include inositol 1,4,5-trisphosphate 3-kinase, inositol 1,3,4,5,6-pentakisphosphate 2-kinase, eukaryotic pantothenate kinase, and polyphosphate kinase.
Inositol 1,4,5-trisphosphate 3-kinase (I3P3K; previously Group 11) and inositol 1,3,4,5,6-pentakisphosphate 2-kinase (I5P2K; previously Group 12) both catalyze phosphorylation reactions in the production of inositol polyphosphate (IP) second messengers. These kinases were placed in separate fold groups in the initial survey because they lacked significant sequence similarity to each other or to any other known kinase family. Fold predictions for both I3P3K and I5P2K were carried out before the crystal structures of I3P3K were reported during the preparation of this manuscript. The results of fold predictions guided by 3D-Jury, secondary structure predictions, and the observed presence of critical conserved sequence motifs indicated that both of these IP kinases would likely adopt a structural fold similar to those of lipid kinases and eukaryotic protein kinases, which are possibly related families that share a common ATP-binding site and structural core. Furthermore, a multiple alignment of representative I3P3K, I5P2K, lipid kinase, and protein kinase sequences shows that the critical functional residues of these proteins are also conserved in the IP kinases (Figure 3b). For example, I3P3K and I5P2K each have the conserved lysine/arginine residue that is important for orienting the α- and β-phosphates of the nucleotide's triphosphate tail and the aspartate/glutamate that interacts with the ribose of ATP (residues 1 and 3 in Figure 3b, respectively). Additionally, the serine/threonine and aspartate residues involved in coordinating the two requisite Mg2+ cations are conserved in these kinases as well (residues 5 and 6 in Figure 3b, respectively). Both I3P3K and I5P2K also have a predicted glycine-rich loop in the N-terminal region of the protein. The multiple sequence alignment shows that I3P3K and I5P2K are more closely related to the lipid kinases than to protein kinases.
In addition to phosphorylating similar substrates, the IP kinases and lipid kinase family members each have a critical active-site lysine residue involved in stabilizing the γ-phosphate of ATP during transfer (residue 4 in Figure 3b) whose position has migrated in sequence and structure relative to the protein kinase-like family. The solved structures of I3P3K from human and rat, which were published during manuscript preparation and are discussed above, confirm this non-trivial fold assignment as well as the predicted functional roles of the conserved active-site residues, further increasing our confidence in the I5P2K prediction. Thus, I3P3K and I5P2K are now assigned as members of the lipid kinase-like family (Family 1b) in the kinase classification.
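The residue-by-residue comparison behind Figure 3b can be phrased as a simple check on alignment columns. The Python sketch below assumes a gapped alignment given as equal-length strings and a hypothetical mapping from the Figure 3b residue labels (1, 3, 5, 6) to column indices; the toy sequences are invented for illustration and are not real kinase sequences. Residue 4 is left out because its position differs between the IP/lipid kinases and the protein kinase-like family, so it cannot be tested with a single shared column.

```python
# Residue classes the text describes as conserved at Figure 3b positions.
EXPECTED = {
    1: set("KR"),  # Lys/Arg orienting the ATP alpha- and beta-phosphates
    3: set("DE"),  # Asp/Glu contacting the ATP ribose
    5: set("ST"),  # Ser/Thr coordinating a Mg2+ ion
    6: set("D"),   # Asp coordinating a Mg2+ ion
}


def unconserved_positions(alignment: dict[str, str],
                          columns: dict[int, int]) -> dict[str, list[int]]:
    """For each aligned sequence, list the Figure 3b labels whose residue
    falls outside the expected class (gaps count as not conserved)."""
    if len({len(row) for row in alignment.values()}) != 1:
        raise ValueError("alignment rows must have equal length")
    return {
        name: [label for label, col in columns.items() if row[col] not in EXPECTED[label]]
        for name, row in alignment.items()
    }


# Hypothetical column positions and toy aligned sequences (illustration only).
COLUMNS = {1: 2, 3: 4, 5: 6, 6: 7}
TOY = {"seqA": "GAKLEVSD", "seqB": "GGRIDATD", "seqC": "GAKLQVSN"}
print(unconserved_positions(TOY, COLUMNS))
# {'seqA': [], 'seqB': [], 'seqC': [3, 6]}
```

In a real analysis the column mapping would come from the alignment itself; here it is fixed by hand only to keep the example self-contained.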
Comprehensive structural annotation of kinases
Of the 25 kinase families, 22 currently have at least one homolog with a solved structure (Tables 1 and 2). The structural folds of each domain within one additional family (polyphosphate kinase) are predicted as discussed above. The two remaining families are integral membrane kinases. Although the tertiary structures of dolichol kinase and undecaprenol kinase have not yet been determined, secondary structure predictions indicate that both of these families adopt all-α-helical conformations. Thus, structural annotations are now available for all kinase families, including fold descriptions for all globular kinases, and the kinase fold groups listed in Tables 1 and 2 give a complete structural depiction of this entire functional class of proteins. The structural folds adopted by kinases include some of the most widely spread protein folds, such as the Rossmann-like fold, ferredoxin-like fold, and TIM β/α-barrel fold. The kinase fold repertoire also includes representatives of all major classes (all α, all β, α+β, α/β) of protein structures, demonstrating that nature has found ways to use all varieties of secondary structure combinations to carry out the kinase reaction.
Additional kinase activities in the classification
The updated classification also includes 12 additional kinase activities. However, 7 of these activities reflect changes within the EC database rather than newly characterized kinase sequences. For example, although the structure of adenosylcobinamide kinase was published in 1998, its EC number (EC 22.214.171.124) was only assigned in April 2004. The sequences and structures of this kinase were included in the initial kinase survey (e.g. PDB|1cbu) and were placed in the P-loop kinase family of the Rossmann-like fold group.
The updated kinase classification does include 5 newly annotated or characterized kinases (indicated by underlining in Tables 1 and 2). The first sequences associated with each of these activities (126.96.36.199 Fucokinase, 188.8.131.52 5-dehydro-2-deoxygluconokinase, 184.108.40.206 Ceramide kinase, and 220.127.116.11 Inositol-tetrakisphosphate 5-kinase) were identified after the initial kinase survey was completed. Sequences with [Hydroxymethylglutaryl-CoA reductase (NADPH2)] kinase activity (18.104.22.168) were included in the initial classification, although only a general kinase activity ("AMP-activated protein kinase") was assigned at the time. Thus, the specific kinase activity of this enzyme is a new addition to the kinase classification as well.
As can be seen from Tables 1 and 2, all of these newly annotated kinases (underlined in the tables) belong to existing kinase families containing members that are well characterized both biochemically and structurally. In every case, the link between these kinases and members of the existing families can be identified by BLAST with E-values below 1e-5. Therefore, the catalytic mechanisms of these newly annotated kinases may be inferred from their closely related homologs.
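A minimal sketch of this kind of homology-based family assignment, assuming BLAST+ was run with its default tabular output (-outfmt 6, whose 11th column is the E-value): each query is assigned to the family of its lowest-E-value hit, provided that E-value is below 1e-5. The `family_of` lookup, the helper name `assign_families`, and all sequence identifiers are hypothetical placeholders; this is an illustration, not the pipeline used to build the classification.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def assign_families(blast_tab: str, family_of: dict[str, str],
                    max_evalue: float = 1e-5) -> dict[str, str]:
    """Map each query to the family of its lowest-E-value hit with E < max_evalue.

    Hits against subjects missing from `family_of` are ignored, and queries
    with no qualifying hit are left unassigned.
    """
    best: dict[str, tuple[float, str]] = {}
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        family = family_of.get(hit["sseqid"])
        if family is None or evalue >= max_evalue:
            continue
        query = hit["qseqid"]
        if query not in best or evalue < best[query][0]:
            best[query] = (evalue, family)
    return {query: family for query, (_, family) in best.items()}


# Hypothetical reference-to-family table and BLAST output (illustration only).
FAMILY_OF = {"ref1": "ribokinase-like", "ref2": "P-loop kinases"}
SAMPLE = (
    "new1\tref1\t45.0\t300\t160\t5\t1\t300\t1\t300\t2e-40\t150\n"
    "new1\tref2\t25.0\t100\t70\t3\t1\t100\t1\t100\t0.5\t30\n"
    "new2\tref2\t30.0\t200\t130\t4\t1\t200\t1\t200\t3e-4\t40\n"
)
print(assign_families(SAMPLE, FAMILY_OF))  # {'new1': 'ribokinase-like'}
```

Here new2 stays unassigned because its only hit (E = 3e-4) does not pass the 1e-5 cut-off.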
Two families previously annotated as kinases are removed from the classification
Two kinase families were removed from the classification. In both cases, the sequences were annotated as kinases in the NCBI database, but further biochemical studies have indicated that they most likely do not have kinase activity. The case of the ThrH protein (previously Family 2g, L-2-haloacid dehalogenase-like family) is an unusual one. Earlier genetic studies of the thrH gene of Pseudomonas aeruginosa showed that over-expression of ThrH complements both homoserine kinase and phosphoserine phosphatase activities in vivo. The gene product of thrH was thus annotated as "bifunctional homoserine kinase/phosphoserine phosphatase isoenzyme". A more recent structural and biochemical study of ThrH has shown that this protein does not have ATP-dependent kinase activity. Instead, it possesses phosphoserine phosphatase activity and is also able to transfer a phosphate group from phosphoserine to homoserine, presumably via a phospho-enzyme intermediate. Thus, although ThrH is able to generate phosphohomoserine and complement homoserine kinase activity in vivo, it achieves this through a completely different chemical mechanism from that of a true homoserine kinase. ThrH is therefore a phosphatase and phosphotransferase but not a kinase, and it has been removed from the kinase classification.
The putative tagatose 6-phosphate kinase (T6P kinase, previously Group 13) activity was initially suggested for the agaZ gene product based on computational analysis and reconstruction of the putative N-acetylgalactosamine metabolic pathway in E. coli. However, no tagatose 6-phosphate kinase activity could be detected experimentally for either AgaZ or its homolog GatZ, and genetic studies have suggested that gatZ is associated with tagatose-1,6-bisphosphate aldolase activity [32, 33]. Results of transitive PSI-BLAST searches and fold predictions with 3D-Jury also suggest similarity between AgaZ and the tagatose- and fructose-bisphosphate aldolases, which have a TIM β/α-barrel fold. The alignment of the putative T6P kinases with the aldolase families revealed that several residues involved in substrate binding and catalysis in the aldolases are also conserved in the AgaZ/GatZ protein family (data not shown). These include two histidine residues involved in the coordination of the catalytic Zn2+ and the aspartate residue that is proposed to protonate the substrate during the aldolase reaction. Thus, based on both functional studies and structural prediction, it is likely that these proteins carry out an aldolase reaction rather than a kinase reaction. Therefore, this family has also been removed from the updated kinase classification.
More diversity of structural folds and nucleotide binding in kinases
In the original kinase survey, substrate binding and phosphoryl transfer mechanisms were analyzed across protein families and fold groups, and several distinct modes of nucleotide binding emerged. One recurring theme was that the bound nucleotide sits at the C-termini of β-strands and the N-termini of α-helices (i.e., on β-α loops). Signature motifs that interact with the nucleotide are also common; these motifs are often glycine-rich and occur on both β-α and β-β loops. One such example is the P-loop (phosphate-binding loop) formed by the Walker A motif, which is found in a variety of nucleotide-binding proteins. The P-loop, located on a β-α loop, wraps around the triphosphate tail of the bound nucleotide. Together with the positive dipole of the adjoining α-helix and some positively charged side chains, it creates a strong anion hole that binds the phosphate groups of the nucleotide.
The recently characterized riboflavin kinase (RFK) is the only known kinase with an all-β structural core. RFK contains a novel nucleotide-binding motif that comprises an arched loop (loop L1), a 3₁₀ helix, and a reverse turn followed by a short β-strand (Figure 1a). This short β-strand contains the highly conserved PTAN sequence motif, whose threonine and asparagine directly coordinate Mg²⁺ and thus contribute to MgATP binding (Figure 1a). The phosphotransfer reaction in RFK appears to proceed by direct in-line transfer of the γ-phosphate of ATP to the 5' hydroxyl group of riboflavin, which may be activated by a glutamate residue (Glu86) [7, 8]. The most distinctive features of RFK appear to be that the phosphate is transferred through a hole beneath the highly dynamic loop L1, and that proper positioning of the catalytic residues depends on substrate binding.
The second kinase with a novel fold is dihydroxyacetone kinase (DhaK), the only known kinase with an all-α nucleotide-binding domain. MgATP is bound in part through interactions with a glycine-rich loop between helices H3 and H4 and with the positive dipole at the N-terminus of helix H4. Uniquely, two Mg²⁺ ions were found to be coordinated both by the ATP phosphates and by a cluster of three highly conserved aspartate residues on the loop between helices H1 and H2. The phosphotransfer mechanism of DhaK remains unclear, because the conformation of the crystallized complex is influenced by crystal packing and does not appear to represent the active form. A reaction mechanism involving a phospho-enzyme intermediate therefore cannot yet be ruled out.
Thus, riboflavin kinase and dihydroxyacetone kinase have structures strikingly different from those previously known, and they have enriched the kinase fold repertoire, which now spans all major classes of protein structure: α/β, α+β, all-β, and all-α. Although the substrate recognition and catalytic mechanisms of these two enzymes share certain features with other kinases, such as the use of a helix dipole and of backbone amide groups in a glycine-rich loop for nucleotide binding, taken as a whole they are distinct.
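To illustrate how a glycine-rich signature such as the Walker A motif can be located in sequence data, the Python sketch below scans a protein sequence for the PROSITE pattern PS00017, [AG]-x(4)-G-K-[ST], the classical P-loop consensus. It is only a pattern search: it misses divergent P-loops that deviate from the consensus and says nothing about structure. The test sequence is made up.

```python
import re
from typing import List, Tuple

# PROSITE PS00017 (ATP/GTP-binding site motif A, P-loop): [AG]-x(4)-G-K-[ST].
# The zero-width lookahead lets the scan report overlapping matches as well.
WALKER_A = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_walker_a(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched 8-residue motif) for each Walker A match."""
    return [(m.start() + 1, m.group(1)) for m in WALKER_A.finditer(sequence.upper())]


# Made-up sequence fragment containing one P-loop-like motif.
print(find_walker_a("MKIAVLGAPGSGKSTLAK"))  # [(7, 'GAPGSGKS')]
```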
Supplementary material is available by anonymous ftp at ftp://iole.swmed.edu/pub/kinase/, which includes lists of NCBI gene identification (gi) numbers for sequences from each kinase family and a table cataloging functional residues from kinase family representatives.
We have performed an updated comprehensive survey of all available kinase sequences and structures. All experimentally characterized kinase families, with the exception of the integral membrane kinases, are now associated with a known or predicted structural fold. Kinases are therefore the first large functional class of proteins with a comprehensive structural annotation of their known members. Additionally, despite a three-fold increase in the number of kinase sequences and a two-fold increase in the number of kinase structures, the framework of our classification remains sufficient to describe all available kinases. Furthermore, none of the fold predictions made in the initial kinase survey has since been shown to be incorrect. Thus, the updated kinase survey confirms the soundness of our classification scheme, in addition to presenting the final global picture of this entire functional class. Potential uses for this classification include deduction of protein function, structural fold, or enzymatic mechanism of poorly studied or newly discovered kinases based on proteins in the same family.
Constructing updated families of homologous kinase sequences
The updated kinase classification was assembled with the same strategy used to construct our first kinase survey, with the previous classification serving as a framework. Briefly, the hmmsearch program of the HMMER2 package was used to assign sequences from the NCBI non-redundant (nr) database (July 2, 2004; 2,911,742 sequences, including environmental sequences) to a set of 57 profiles describing catalytic kinase domains (E-value cutoff 0.1). This set of kinase profiles (from Pfam version 5.4 and COG version 2) was constructed during the initial kinase survey. Because these profiles had already been grouped into families of homologous sequences in the initial classification scheme, the sequences assigned to them by hmmsearch were placed directly in the corresponding kinase families. Additionally, the GREFD program of the SEALS package was used to extract from nr all sequences whose definition line contained the pattern "kinase". For any kinase sequence not already assigned to a kinase family (either by hmmsearch or in the previous classification), three iterations of PSI-BLAST were run against nr (E-value cutoff 0.001). Any kinase producing a hit to an already assigned sequence was placed in the corresponding kinase family. The placement of the remaining unassigned kinase sequences was determined by manual inspection of multiple alignments, secondary structure predictions, and distant PSI-BLAST hits; these proteins were placed into existing kinase families based on the presence of conserved catalytic residues and other distinguishing motifs, as well as on overall sequence similarity. Sequences that were fragments, non-kinase entries (e.g., kinase inhibitors), or non-catalytic entries (e.g., regulatory subunits) were removed.
Such sequences were identified by their annotations in the nr database and by lengths too short to cover the complete protein. For non-kinase and non-catalytic entries, lack of kinase activity was confirmed either from the available literature on the sequences in question or from obvious homology to a protein of known non-kinase function. The lists of newly identified kinase sequences were appended to those of the corresponding kinase families in the initial classification.
The meaning of families and fold groups in the new version of the classification remains unaltered: families contain homologous kinase sequences, whereas fold groups imply similarity of structural fold but not homology.
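The two automatic assignment steps described above (profile hits at an E-value of 0.1 or better, then PSI-BLAST hits at 0.001 or better to already assigned sequences) amount to a simple decision procedure, sketched below in Python. The hmmsearch and PSI-BLAST outputs are assumed to be already parsed into plain tuples. All identifiers, the best-hit rule for sequences matching several profiles, and the choice to send sequences whose hits point to more than one family to manual inspection are assumptions for illustration, not details given in the text.

```python
from typing import Dict, Iterable, List, Tuple

HMM_EVALUE_CUTOFF = 0.1         # hmmsearch cutoff quoted in the text
PSIBLAST_EVALUE_CUTOFF = 0.001  # PSI-BLAST cutoff quoted in the text


def assign_by_profile(hits: Iterable[Tuple[str, str, float]],
                      profile_to_family: Dict[str, str]) -> Dict[str, str]:
    """hits: (sequence_id, profile_id, evalue) tuples from a parsed hmmsearch run.
    Assumption: a sequence matching several profiles goes to the family of its best hit."""
    best: Dict[str, Tuple[float, str]] = {}
    for seq_id, profile_id, evalue in hits:
        if evalue > HMM_EVALUE_CUTOFF or profile_id not in profile_to_family:
            continue
        if seq_id not in best or evalue < best[seq_id][0]:
            best[seq_id] = (evalue, profile_to_family[profile_id])
    return {seq_id: family for seq_id, (_, family) in best.items()}


def assign_by_psiblast(keyword_ids: List[str], assigned: Dict[str, str],
                       psiblast_hits: Dict[str, List[Tuple[str, float]]]
                       ) -> Tuple[Dict[str, str], List[str]]:
    """keyword_ids: sequences whose definition line contains "kinase".
    assigned: sequence_id -> family, from hmmsearch or the previous classification.
    psiblast_hits: query_id -> (hit_id, evalue) pairs after three iterations.
    Returns new assignments plus the IDs left for manual inspection."""
    new: Dict[str, str] = {}
    manual: List[str] = []
    for query in keyword_ids:
        if query in assigned:
            continue
        families = {assigned[hit] for hit, evalue in psiblast_hits.get(query, [])
                    if evalue <= PSIBLAST_EVALUE_CUTOFF and hit in assigned}
        if len(families) == 1:
            new[query] = families.pop()
        else:  # no qualifying hit, or hits to several families (assumed ambiguous)
            manual.append(query)
    return new, manual


# Toy usage with hypothetical identifiers.
profile_hits = [("s1", "PF_A", 1e-20), ("s1", "PF_B", 0.05), ("s2", "PF_B", 0.5)]
assigned = assign_by_profile(profile_hits, {"PF_A": "Family_1", "PF_B": "Family_2"})
assigned["old7"] = "Family_2"  # a sequence carried over from the previous classification
print(assign_by_psiblast(
    ["s1", "k1", "k2", "k3"], assigned,
    {"k1": [("s1", 1e-10)], "k2": [("s1", 1e-5), ("old7", 1e-4)], "k3": [("old7", 0.01)]},
))
```

In this toy run only k1 is assigned automatically; k2 (hits to two families) and k3 (no hit below the cutoff) fall through to the manual inspection step described in the text.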
Fold group classification
In the initial classification, fold groups were assembled based solely on structural similarity. Families in the same fold group share structurally similar nucleotide-binding domains with the same architecture and topology (or topologies related by circular permutation), at least for the core of the domain. Some recently solved kinase structures allowed certain kinase families to be merged into previously established fold groups according to these same structural similarity criteria.
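The topology criterion (the same order of core secondary structure elements, or an order related by circular permutation) can be expressed as a rotation test on ordered lists, as in the Python sketch below. It assumes that structurally equivalent core elements of the two domains have already been given matching labels, for example from a structural superposition; in practice that correspondence is the difficult part and comes from 3D comparison tools such as DALI, not from this check. The labels are hypothetical.

```python
from typing import Sequence


def same_core_topology(a: Sequence[str], b: Sequence[str]) -> bool:
    """True if two ordered lists of core secondary-structure labels are identical
    or related by circular permutation, i.e. one is a rotation of the other."""
    if len(a) != len(b):
        return False
    n = len(a)
    if n == 0:
        return True
    doubled = list(a) + list(a)
    target = list(b)
    # Every rotation of `a` appears as a length-n window of `a + a`.
    return any(doubled[i:i + n] == target for i in range(n))


core_1 = ["b1", "a1", "b2", "a2", "b3"]
print(same_core_topology(core_1, ["b2", "a2", "b3", "b1", "a1"]))  # True: circular permutation
print(same_core_topology(core_1, ["b1", "b2", "a1", "a2", "b3"]))  # False: different order
```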
To provide fold assignments for the remaining structurally uncharacterized kinase families, initial analysis was performed with standard sequence similarity search methods such as transitive PSI-BLAST, RPS-BLAST, and profile HMMs from SMART. All searches were initiated with the representative sequences of the families (selected in the initial survey). Transitive PSI-BLAST (E-value threshold 0.01) was run against the NCBI non-redundant protein sequence database until convergence. The CDD (RPS-BLAST) and SMART (profile HMM) web tools were used with default settings to detect distant homology to other conserved protein domains annotated in the SMART, Pfam and COG databases. In addition, RPS-BLAST was used to compare query sequences directly with the PDB via the GRDB system. Further analysis was carried out with the Meta Server, which combines various secondary structure prediction and fold recognition methods. The collected predictions were screened with 3D-Jury, a consensus method for fold recognition servers. The default servers used by 3D-Jury for consensus building include ORFeus, Meta-BASIC, FFAS03, mGenTHREADER, INBGU, RAPTOR, FUGUE-2, and 3D-PSSM. Final fold/template selections were based on 3D-Jury reliability scores as well as those of individual servers, correct mapping of predicted onto observed secondary structure elements, and conservation of functionally and/or structurally important residues. For inositol 1,3,4,5,6-pentakisphosphate 2-kinase, the initial fold assignment was based on functional analogy to 1-phosphatidylinositol-4-phosphate 5-kinase, which phosphorylates similar substrates.
Multiple sequence alignments for the protein families considered were prepared using PCMA followed by manual adjustment. Sequence-to-structure alignments between the analyzed kinase families and their distantly related template families were built using the consensus alignment approach and 3D assessment, based mainly on 3D-Jury results for representative kinase sequences. Sequences of distantly related proteins of known structure were first aligned on the basis of the superposition of their 3D structures. For inositol 1,3,4,5,6-pentakisphosphate 2-kinase, the sequence-to-structure alignment was prepared manually, taking into account secondary structure predictions and the preservation of functionally critical residues and of the hydrophobic core of the protein.
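The idea behind consensus screening of fold-recognition output is that a model structurally similar to many models produced by other methods is more likely to be correct. The Python sketch below ranks models by their mean pairwise similarity to all other models, ignoring weak similarities below a threshold. It is a simplified illustration in the spirit of 3D-Jury, not its actual scoring code; the similarity values (imagined here as counts of closely superimposed Cα atoms), the model names, and the threshold value are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def consensus_ranking(models: List[str],
                      similarity: Dict[Tuple[str, str], float],
                      min_score: float = 40.0) -> List[Tuple[str, float]]:
    """Rank models by mean similarity to all other models.

    similarity holds symmetric pairwise scores keyed by (model_a, model_b) in either
    order; missing pairs count as 0 and scores below min_score are ignored."""
    scores: Dict[str, float] = {}
    for model in models:
        total = 0.0
        for other in models:
            if other == model:
                continue
            s = similarity.get((model, other), similarity.get((other, model), 0.0))
            if s >= min_score:
                total += s
        scores[model] = total / max(len(models) - 1, 1)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical models from three servers and made-up pairwise scores.
pairwise = {("A", "B"): 120.0, ("A", "C"): 100.0, ("B", "C"): 30.0}
print(consensus_ranking(["A", "B", "C"], pairwise))  # [('A', 110.0), ('B', 60.0), ('C', 50.0)]
```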
Alterations within the classification
Although the framework of the classification remains essentially unchanged, the organization within the classification has been slightly modified. More specifically, the numbering of the fold groups has been adjusted so that all kinase families with unsolved structures are at the end. Furthermore, the EC (Enzyme Commission) numbers have been updated to reflect the current organization of the EC database. Therefore, the EC content of each family may differ somewhat between the initial and updated classifications, but these changes do not indicate new additions to the family unless otherwise indicated.
This work was supported by NIH grants GM67165 to NVG and GM63689 to HZ. SC was supported by NIH training grant T32 GM08297 to the University of Texas Southwestern Graduate Program in Molecular Biophysics.
- Cheek S, Zhang H, Grishin NV: Sequence and structure classification of kinases. J Mol Biol 2002, 320: 855–881. 10.1016/S0022-2836(02)00538-7
- Peisach D, Gee P, Kent C, Xu Z: The crystal structure of choline kinase reveals a eukaryotic protein kinase fold. Structure (Camb) 2003, 11: 703–713. 10.1016/S0969-2126(03)00094-7
- Li MH, Kwok F, Chang WR, Lau CK, Zhang JP, Lo SC, Jiang T, Liang DC: Crystal structure of brain pyridoxal kinase, a novel member of the ribokinase superfamily. J Biol Chem 2002, 277: 46385–46390. 10.1074/jbc.M208600200
- Gerdes SY, Scholle MD, D'Souza M, Bernal A, Baev MV, Farrell M, Kurnasov OV, Daugherty MD, Mseeh F, Polanuyer BM, Campbell JW, Anantha S, Shatalin KY, Chowdhury SA, Fonstein MY, Osterman AL: From genetic footprinting to antimicrobial drug targets: examples in cofactor biosynthetic pathways. J Bacteriol 2002, 184: 4555–4572. 10.1128/JB.184.16.4555-4572.2002
- Santos MA, Jimenez A, Revuelta JL: Molecular characterization of FMN1, the structural gene for the monofunctional flavokinase of Saccharomyces cerevisiae. J Biol Chem 2000, 275: 28618–28624. 10.1074/jbc.M004621200
- Karthikeyan S, Zhou Q, Mseeh F, Grishin NV, Osterman AL, Zhang H: Crystal structure of human riboflavin kinase reveals a β barrel fold and a novel active site arch. Structure (Camb) 2003, 11: 265–273. 10.1016/S0969-2126(03)00024-8
- Bauer S, Kemter K, Bacher A, Huber R, Fischer M, Steinbacher S: Crystal structure of Schizosaccharomyces pombe riboflavin kinase reveals a novel ATP and riboflavin-binding fold. J Mol Biol 2003, 326: 1463–1473. 10.1016/S0022-2836(03)00059-7
- Karthikeyan S, Zhou Q, Osterman AL, Zhang H: Ligand binding-induced conformational changes in riboflavin kinase: structural basis for the ordered mechanism. Biochemistry 2003, 42: 12532–12538. 10.1021/bi035450t
- Siebold C, Arnold I, Garcia-Alles LF, Baumann U, Erni B: Crystal structure of the Citrobacter freundii dihydroxyacetone kinase reveals an eight-stranded α-helical barrel ATP-binding domain. J Biol Chem 2003, 278: 48236–48244. 10.1074/jbc.M305942200
- Chistoserdova L, Lidstrom ME: Identification and mutation of a gene required for glycerate kinase activity from a facultative methylotroph, Methylobacterium extorquens AM1. J Bacteriol 1997, 179: 4946–4948.
- Crouzet P, Otten L: Sequence and mutational analysis of a tartrate utilization operon from Agrobacterium vitis. J Bacteriol 1995, 177: 6518–6526.
- Gonzalez B, Schell MJ, Letcher AJ, Veprintsev DB, Irvine RF, Williams RL: Structure of a human inositol 1,4,5-trisphosphate 3-kinase: substrate binding reveals why it is not a phosphoinositide 3-kinase. Mol Cell 2004, 15: 689–701. 10.1016/j.molcel.2004.08.004
- Miller GJ, Hurley JH: Crystal structure of the catalytic core of inositol 1,4,5-trisphosphate 3-kinase. Mol Cell 2004, 15: 703–711. 10.1016/j.molcel.2004.08.005
- Rao VD, Misra S, Boronenkov IV, Anderson RA, Hurley JH: Structure of type IIb phosphatidylinositol phosphate kinase: a protein kinase fold flattened for interfacial phosphorylation. Cell 1998, 94: 829–839. 10.1016/S0092-8674(00)81741-9
- Holm L, Sander C: Dali: a network tool for protein structure comparison. Trends Biochem Sci 1995, 20: 478–480. 10.1016/S0968-0004(00)89105-7
- Ginalski K, Elofsson A, Fischer D, Rychlewski L: 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics 2003, 19: 1015–1018. 10.1093/bioinformatics/btg124
- Grishin NV: Phosphatidylinositol phosphate kinase: a link between protein kinase and glutathione synthase folds. J Mol Biol 1999, 291: 239–247. 10.1006/jmbi.1999.2973
- Yun M, Park CG, Kim JY, Rock CO, Jackowski S, Park HW: Structural basis for the feedback regulation of Escherichia coli pantothenate kinase by coenzyme A. J Biol Chem 2000, 275: 28093–28099.
- Calder RB, Williams RS, Ramaswamy G, Rock CO, Campbell E, Unkles SE, Kinghorn JR, Jackowski S: Cloning and characterization of a eukaryotic pantothenate kinase gene (panK) from Aspergillus nidulans. J Biol Chem 1999, 274: 2014–2020. 10.1074/jbc.274.4.2014
- Locher KP, Hans M, Yeh AP, Schmid B, Buckel W, Rees DC: Crystal structure of the Acidaminococcus fermentans 2-hydroxyglutaryl-CoA dehydratase component A. J Mol Biol 2001, 307: 297–308. 10.1006/jmbi.2000.4496
- Bork P, Sander C, Valencia A: An ATPase domain common to prokaryotic cell cycle proteins, sugar kinases, actin, and hsp70 heat shock proteins. Proc Natl Acad Sci U S A 1992, 89: 7290–7294.
- Kuroda A, Kornberg A: Polyphosphate kinase as a nucleoside diphosphate kinase in Escherichia coli and Pseudomonas aeruginosa. Proc Natl Acad Sci U S A 1997, 94: 439–442. 10.1073/pnas.94.2.439
- Davies DR, Interthal H, Champoux JJ, Hol WG: The crystal structure of human tyrosyl-DNA phosphodiesterase, Tdp1. Structure (Camb) 2002, 10: 237–248. 10.1016/S0969-2126(02)00707-4
- Stuckey JA, Dixon JE: Crystal structure of a phospholipase D family member. Nat Struct Biol 1999, 6: 278–284. 10.1038/6716
- Ahn K, Kornberg A: Polyphosphate kinase from Escherichia coli. Purification and demonstration of a phosphoenzyme intermediate. J Biol Chem 1990, 265: 11734–11739.
- Kumble KD, Ahn K, Kornberg A: Phosphohistidyl active sites in polyphosphate kinase of Escherichia coli. Proc Natl Acad Sci U S A 1996, 93: 14391–14395. 10.1073/pnas.93.25.14391
- Thompson TB, Thomas MG, Escalante-Semerena JC, Rayment I: Three-dimensional structure of adenosylcobinamide kinase/adenosylcobinamide phosphate guanylyltransferase from Salmonella typhimurium determined to 2.3 Å resolution. Biochemistry 1998, 37: 7686–7695. 10.1021/bi973178f
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. 10.1093/nar/25.17.3389
- Patte JC, Clepet C, Bally M, Borne F, Mejean V, Foglino M: ThrH, a homoserine kinase isozyme with in vivo phosphoserine phosphatase activity in Pseudomonas aeruginosa. Microbiology 1999, 145: 845–853.
- Singh SK, Yang K, Karthikeyan S, Huynh T, Zhang X, Phillips MA, Zhang H: The thrH gene product of Pseudomonas aeruginosa is a dual activity enzyme with a novel phosphoserine:homoserine phosphotransferase activity. J Biol Chem 2004, 279: 13166–13173. 10.1074/jbc.M311393200
- Reizer J, Ramseier TM, Reizer A, Charbit A, Saier MHJ: Novel phosphotransferase genes revealed by bacterial genome sequencing: a gene cluster encoding a putative N-acetylgalactosamine metabolic pathway in Escherichia coli. Microbiology 1996, 142: 231–250.
- Nobelmann B, Lengeler JW: Molecular analysis of the gat genes from Escherichia coli and of their roles in galactitol transport and metabolism. J Bacteriol 1996, 178: 6790–6795.
- Brinkkotter A, Kloss H, Alpert C, Lengeler JW: Pathways for the utilization of N-acetyl-galactosamine and galactosamine in Escherichia coli. Mol Microbiol 2000, 37: 125–135. 10.1046/j.1365-2958.2000.01969.x
- Hall DR, Bond CS, Leonard GA, Watt CI, Berry A, Hunter WN: Structure of tagatose-1,6-bisphosphate aldolase. J Biol Chem 2002, 277: 22018–22024. 10.1074/jbc.M202464200
- Saraste M, Sibbald PR, Wittinghofer A: The P-loop--a common motif in ATP- and GTP-binding proteins. Trends Biochem Sci 1990, 15: 430–434. 10.1016/0968-0004(90)90281-F
- Eddy SR: Profile hidden Markov models. Bioinformatics 1998, 14: 755–763. 10.1093/bioinformatics/14.9.755
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR: The Pfam protein families database. Nucleic Acids Res 2004, 32 Database issue: D138–41. 10.1093/nar/gkh121
- Tatusov RL, Galperin MY, Natale DA, Koonin EV: The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 2000, 28: 33–36. 10.1093/nar/28.1.33
- Walker DR, Koonin EV: SEALS: a system for easy analysis of lots of sequences. Proc Conf Intell Syst Mol Biol 1997, 5: 333–339.
- Marchler-Bauer A, Anderson JB, DeWeese-Scott C, Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson JD, Jacobs AR, Lanczycki CJ, Liebert CA, Liu C, Madej T, Marchler GH, Mazumder R, Nikolskaya AN, Panchenko AR, Rao BS, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Vasudevan S, Wang Y, Yamashita RA, Yin JJ, Bryant SH: CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res 2003, 31: 383–387. 10.1093/nar/gkg087
- Letunic I, Copley RR, Schmidt S, Ciccarelli FD, Doerks T, Schultz J, Ponting CP, Bork P: SMART 4.0: towards genomic data integration. Nucleic Acids Res 2004, 32 Database issue: D142–4. 10.1093/nar/gkh088
- NCBI Conserved Domain Search [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi]
- Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003, 4: 41. 10.1186/1471-2105-4-41
- Gene Relational Data Base [http://basic.bioinfo.pl]
- Bujnicki JM, Elofsson A, Fischer D, Rychlewski L: Structure prediction meta server. Bioinformatics 2001, 17: 750–751. 10.1093/bioinformatics/17.8.750
- Ginalski K, Pas J, Wyrwicz LS, von Grotthuss M, Bujnicki JM, Rychlewski L: ORFeus: Detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res 2003, 31: 3804–3807. 10.1093/nar/gkg504
- Ginalski K, von Grotthuss M, Grishin NV, Rychlewski L: Detecting distant homology with Meta-BASIC. Nucleic Acids Res 2004, 32: W576–81.
- Rychlewski L, Jaroszewski L, Li W, Godzik A: Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci 2000, 9: 232–241.
- Jones DT: GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J Mol Biol 1999, 287: 797–815. 10.1006/jmbi.1999.2583
- Fischer D: Hybrid fold recognition: combining sequence derived properties with evolutionary information. Pac Symp Biocomput 2000, 119–130.
- Xu J, Li M, Lin G, Kim D, Xu Y: Protein threading by linear programming. Pac Symp Biocomput 2003, 264–275.
- Shi J, Blundell TL, Mizuguchi K: FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 2001, 310: 243–257. 10.1006/jmbi.2001.4762
- Kelley LA, MacCallum RM, Sternberg MJ: Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 2000, 299: 499–520. 10.1006/jmbi.2000.3741
- Pei J, Sadreyev R, Grishin NV: PCMA: fast and accurate multiple sequence alignment based on profile consistency. Bioinformatics 2003, 19: 427–428. 10.1093/bioinformatics/btg008
- Ginalski K, Rychlewski L: Protein structure prediction of CASP5 comparative modeling and fold recognition targets using consensus alignment approach and 3D assessment. Proteins 2003, 53 Suppl 6: 410–417. 10.1002/prot.10548
- Kraulis PJ: MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J of App Crystall 1991, 24: 946–950. 10.1107/S0021889891004399
- Kobe B, Heierhorst J, Feil SC, Parker MW, Benian GM, Weiss KR, Kemp BE: Giant protein kinases: domain interactions and structural basis of autoregulation. EMBO Journal 1996, 15: 6810–6821.
- Aleshin AE, Kirby C, Liu X, Bourenkov GP, Bartunik HD, Fromm HJ, Honzatko RB: Crystal structures of mutant monomeric hexokinase I reveal multiple ADP binding sites and conformational changes relevant to allosteric regulation. J Mol Biol 2000, 296: 1001–1015. 10.1006/jmbi.1999.3494