Natural glycans take part in many key biological processes, such as cell adhesion, recognition, receptor activation, and signal transduction, and they also serve major structural functions in both bacteria and plants. In addition, bacterial glycans act as virulence, osmoprotection, and desiccation protection factors. The diversity of structures within the mammalian glycome seems limited and is well described in the literature. The bacterial glycome, by contrast, exhibits greater diversity, stemming largely from the distinct cell wall architecture of these organisms.
The cell envelope of both Gram-positive and Gram-negative bacteria is based on peptidoglycan, a polymer in which polysaccharide chains are cross-linked by short peptide chains. Gram-negative bacteria possess an additional outer membrane composed of a lipopolysaccharide-protein complex. Gram-positive bacteria have no outer membrane, but their peptidoglycan wall is thicker (> 30 nm vs. 10 nm in Gram-negative bacteria) and contains polysaccharides with attached teichoic acids (carbohydrate polymers containing alditols and phosphodiester linkages).
Both Gram-positive and Gram-negative bacteria produce extracellular polysaccharides, present either as a discrete capsule covalently attached to the cell envelope or as a slime weakly bound to the cell surface. These various glycoconjugates and polysaccharides on the surface of the cell often contain the antigenic determinants that initiate an immunogenic response in a host. In addition, these surface carbohydrates provide recognition elements for pathogens such as bacteriophages.
The lipopolysaccharide of Gram-negative bacteria contains lipid A, a phosphorylated GlcN-GlcN disaccharide moiety, N- and O-acylated with fatty acids which anchor the molecule in the outer leaflet of the outer membrane. Lipid A is covalently linked to a heteropolysaccharide which interacts with the environment and consists of an inner core (commonly containing Kdo (3-deoxy-D-manno-oct-2-ulosonic acid) and manno-heptoses) and an outer O-specific chain, a complex polysaccharide which determines the serological or antigenic properties of the lipopolysaccharide [4, 5]. These so-called O-antigens are mainly heteropolymers containing a large variety of residues (mainly monosaccharides, but also alditols, amino acids, etc.). These components, together with the capsular polysaccharides (K-antigens [6, 7]), can elicit an immune response in higher organisms.
The structures of the various carbohydrate antigens are unique and are often characterized by repeating units in the polymer. All types of monosaccharides, including L-rhamnose (6-deoxy-L-mannose) and L-fucose (6-deoxy-L-galactose), are found in bacteria, together with rarer, modified sugars such as 3,6-dideoxyhexoses and Kdo. Knowledge of the structures of surface carbohydrates and their variations is required for understanding how cellular recognition, adhesion, and the immune response operate at the molecular level. This understanding provides a basis for the design of synthetic carbohydrate-based vaccines, diagnostic agents, and immunostimulators. Certain fragments of bacterial polysaccharides, in the form of appropriate glycoconjugates, are known to act as vaccines.
Carbohydrates represent the most diverse class of biopolymers, and there is growing interest in the study and analysis of this diversity and its biomedical significance. For example, vertebrate glycan variability is assumed to act as a barrier that prevents the spread of an infection within a given population. Although it is widely known that the diversity of carbohydrates is much greater in bacteria than in mammals, no systematic attempt has been made to examine the diversity of bacterial carbohydrates in detail, and the structures deposited in glycoscience databases have been evaluated only sporadically. Nevertheless, statistical structure-oriented investigations using carbohydrate databases have proven useful for immunochemical research and serotyping. Systematic analysis of all publicly available data will not only expand our general knowledge and understanding of the complexity of glycans in biological systems but will also offer a framework for the design of more comprehensive high-throughput screening methods or devices.
Comprehensive data concerning carbohydrate diversity within the entire bacterial world will be useful for the classification of bacteria according to their glycan structures and will facilitate the search for the most widespread carbohydrate markers of various bacterial taxonomic groups. These markers are critical for medical applications, and a simple ranking by abundance is a good starting point for the design of synthetic biologically active carbohydrates and for the corresponding immunological studies. In particular, the statistics of monomer composition reveal potential taxonomic markers and also simplify the creation of carbohydrate microarrays by providing candidates for spotting.
A one-enzyme-class/one-saccharide-linkage paradigm applies to almost all individual steps of glycan biosynthesis. Accordingly, complete information on the diversity of disaccharide fragments allows one to describe the diversity of the glycosyltransferases expressed in individual taxonomic groups, and these enzymes may become potential targets for antimicrobial treatment.
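As a rough illustration of this reasoning, the distinct disaccharide fragments observed in each taxonomic group can be tallied and used as a crude proxy for the number of glycosyltransferase specificities. The sketch below is not the procedure used in this study: the record layout, the function names, the group labels (`taxon_A`, `taxon_B`), and the structures themselves are all hypothetical and were invented for the example.

```python
from collections import defaultdict

# Hypothetical toy records: (taxonomic group, list of disaccharide fragments).
# Each fragment is (donor residue, linkage, acceptor residue).
# These entries are invented for illustration and are not taken from BCSDB.
RECORDS = [
    ("taxon_A", [("a-D-Glcp", "1-4", "b-D-Galp"), ("b-D-Galp", "1-3", "a-L-Rhap")]),
    ("taxon_A", [("a-D-Glcp", "1-4", "b-D-Galp")]),
    ("taxon_B", [("a-L-Rhap", "1-2", "a-L-Rhap"), ("a-D-Glcp", "1-4", "b-D-Galp")]),
]


def distinct_fragments_by_taxon(records):
    """Map each taxonomic group to the set of distinct fragments seen in it."""
    fragments = defaultdict(set)
    for taxon, dimers in records:
        fragments[taxon].update(dimers)
    return dict(fragments)


def linkage_diversity(records):
    """Count distinct fragments per group.

    Under the one-enzyme-class/one-saccharide-linkage assumption, this count
    approximates the number of glycosyltransferase specificities in the group.
    """
    by_taxon = distinct_fragments_by_taxon(records)
    return {taxon: len(frags) for taxon, frags in by_taxon.items()}


if __name__ == "__main__":
    for taxon, count in sorted(linkage_diversity(RECORDS).items()):
        print(taxon, count)
```

Sets are used so that a fragment repeated across several structures of the same group is counted once, which matches the idea of counting enzyme specificities rather than occurrences.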
For this study, we performed statistical analyses of the Bacterial Carbohydrate Structure Data Bank (BCSDB), the largest database of bacterial glycans, which contains nearly all known bacterial glycan structures published up to 2007. For comparison, the mammalian glycans documented in the GLYCOSCIENCES.de database (derived mainly from CarbBank) have also been examined. The properties analyzed include glycan size, branching, and charge density, as well as the frequency of occurrence of specific monosaccharide residues, residue pairs, and their linkage configurations. Precise definitions for the terminology used in this study can be found in the Methods section.
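To make the analyzed properties more concrete, the following minimal sketch computes size, branching, charge density, and residue frequencies for a single toy glycan. The definitions used here are assumptions chosen for illustration (size as residue count, branching as the number of residues carrying more than one substituent residue, charge density as total absolute formal charge per residue); the definitions actually used in this study are those given in the Methods section. The tree encoding and the example structure are hypothetical.

```python
from collections import Counter

# Hypothetical toy glycan encoded as a rooted tree:
# residue id -> (residue name, parent id or None for the root, formal charge).
# The structure is invented for illustration only.
GLYCAN = {
    1: ("b-D-GlcpNAc", None, 0),
    2: ("a-D-Galp", 1, 0),
    3: ("a-L-Rhap", 1, 0),
    4: ("b-D-GlcpA", 2, -1),
}


def glycan_size(glycan):
    """Size, assumed here to be the number of residues."""
    return len(glycan)


def branch_points(glycan):
    """Branching, assumed here to be the number of residues with >1 child."""
    children = Counter(
        parent for _, parent, _ in glycan.values() if parent is not None
    )
    return sum(1 for n in children.values() if n > 1)


def charge_density(glycan):
    """Charge density, assumed here to be total |charge| per residue."""
    if not glycan:
        return 0.0
    return sum(abs(charge) for _, _, charge in glycan.values()) / len(glycan)


def residue_counts(glycan):
    """Frequency of occurrence of each monosaccharide residue name."""
    return Counter(name for name, _, _ in glycan.values())


if __name__ == "__main__":
    print(glycan_size(GLYCAN), branch_points(GLYCAN), charge_density(GLYCAN))
    print(residue_counts(GLYCAN).most_common())
```

Aggregating these per-structure values over all database records, grouped by taxon, would give the kind of distribution the analysis refers to.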