Seminar room RB35 (Roslagstullsbacken 35, the SBC house)
Description
Estimates indicate that over half of all mammalian proteins are glycosylated. Of the estimated 22,216 human gene products, only 11,935 are well characterised as SwissProt entries. Of these, only 387 (3.2%) have experimentally verified glycosylation site information. To bridge this gap, prediction methods are needed.
There are many types of protein glycosylation, each defined by a) the nature of the glycan attached and b) the nature of the protein-glycan linkage. All but one take place in extracellular proteins or in the extracellular parts of membrane proteins; the remaining type is intracellular. They are often classified according to the attachment atom on the protein: N-glycosylation is glycan linkage to the side-chain nitrogen of asparagine residues, O-glycosylation to oxygen atoms of serines or threonines, and C-mannosylation to one of the carbons of the indole ring of tryptophans. This classification is somewhat misleading, since a large number of different types of O-glycosylation have been identified. Each type of glycosylation is catalyzed by one or more distinct glycosyltransferases, and the differences in recognition sequences between enzymes are often large. Therefore, we have chosen to develop glycosylation site predictors one glycosylation type at a time.
We have previously developed predictors for mucin-type O-glycosylation sites, NetOGlyc 3.0 (2005, 161 citations), and for C-mannosylation sites, NetCGlyc (2007, 4 citations). We are currently finishing the development of two proteoglycan site predictors. One is trained only on mammalian sequences. Since the recognition sequences seem to be surprisingly evolutionarily conserved, we have also developed a general predictor by adding data from C. elegans and chicken to the training set. We have also made an effort to develop a predictor of N-glycosylation sites that outperforms any simple pattern rule. In this process we have gathered a data set consisting of 1,825 experimentally verified positive sites and 18,572 negative sites. Among the negative sites, 205 follow the PROSITE N-glycosylation pattern N{P}S/T{P}. Using this data set, we have been able to verify previous findings by von Heijne et al. that N-glycosylation is less likely to take place close to a transmembrane sequence (<20 aa). On the other hand, our data do not support the conclusion that N-glycosylation is less likely to take place close to the C-terminus, as reported in a similar study by von Heijne et al.
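As an illustration of the kind of simple pattern rule mentioned above, the PROSITE pattern N{P}S/T{P} (asparagine, any residue except proline, serine or threonine, any residue except proline) can be written as a regular expression. The following is a minimal sketch only, not one of the predictors described in this talk; the function name `find_sequons` and the example sequence are made up for illustration, and the sketch assumes the input is a plain one-letter amino-acid string.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P}.
# The lookahead keeps the match one residue wide, so overlapping
# candidate sites (e.g. "NNST") are all reported.
# Assumption: every character is a one-letter amino-acid code; any
# character other than P is accepted at the {P} positions.
SEQUON = re.compile(r"N(?=[^P][ST][^P])")


def find_sequons(sequence):
    """Return 1-based positions of asparagines that match the pattern."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Hypothetical example sequence, not taken from any data set.
print(find_sequons("MNGTANPSLNNSTA"))  # prints [2, 10, 11]
```

In the example, the asparagine at position 6 is skipped because it is followed by a proline. Written this strictly, the pattern also needs three residues after the asparagine, so a candidate site within the last three positions of a sequence is never reported. A match only marks a candidate: as noted above, 205 of the experimentally negative sites in the data set follow this pattern.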
All our predictors are or will be available at http://www.cbs.dtu.dk/services