Aula 2 apresentação

•
Exatas

Lucious Rosaeae
26/02/2020
E aí, curtiu este material?
Ajude a incentivar outros estudantes a melhorar o conteúdo
Gostou desse material? Compartilhe! 🧡
Biologia

329.095 Materiais compartilhados
Baixe o app para aproveitar ainda mais
Leia os materiais offline, sem usar a internet. Além de vários outros recursos!
Prévia do material em texto
Theory
A Mechanism for the Evolution
of Phosphorylation Sites
Samuel M. Pearlman,1,2 Zach Serber,1,3 and James E. Ferrell Jr.1,*
1Department of Chemical and Systems Biology
2Biomedical Informatics Program
Stanford University School of Medicine, Stanford, CA 94305, USA
3Present address: Amyris, Inc., 5885 Hollis Street, Suite 100, Emeryville, CA 94608, USA
*Correspondence: james.ferrell@stanford.edu
DOI 10.1016/j.cell.2011.08.052
SUMMARY
Protein phosphorylation provides a mechanism for
the rapid, reversible control of protein function.
Phosphorylation adds negative charge to amino
acid side chains, and negatively charged amino
acids (Asp/Glu) can sometimes mimic the phosphor-
ylated state of a protein. Using a comparative geno-
mics approach, we show that nature also employs
this trick in reverse by evolving serine, threonine,
and tyrosine phosphorylation sites from Asp/Glu
residues. Structures of three proteins where phos-
phosites evolved from acidic residues (DNA topo-
isomerase II, enolase, and C-Raf) show that the rele-
vant acidic residues are present in salt bridges with
conserved basic residues, and that phosphorylation
has the potential to conditionally restore the salt
bridges. The evolution of phosphorylation sites
from glutamate and aspartate provides a rationale
for why phosphorylation sometimes activates pro-
teins, and helps explain the origins of this important
and complex process.
INTRODUCTION
Protein phosphorylation is a ubiquitous mechanism for the
temporal and spatial regulation of proteins involved in almost
every cellular process. Protein phosphorylation is important in
prokaryotes, where the best-characterized kinases catalyze
the phosphorylation of histidine residues (Laub and Goulian,
2007). Phosphorylation appears to be even more important
and more widespread in eukaryotic cells, but while histidine
phosphorylation does occur in at least some eukaryotes, most
eukaryotic protein phosphorylation occurs at serine, threonine,
and tyrosine residues (Manning et al., 2002a). The human
genome includes about 500 genes for protein kinases that
phosphorylate serine, threonine, and/or tyrosine residues
(Manning et al., 2002b) and about 200 genes for phosphoprotein
phosphatases that dephosphorylate them (Alonso et al., 2004).
Thus, approximately 3.5% of human genes are devoted to
proteins that directly regulate protein phosphorylation. The
substrates of these kinases and phosphatases are numerous;
934 Cell 147, 934–946, November 11, 2011 ª2011 Elsevier Inc.
one commonly cited estimate is that approximately 30% of
proteins are phosphorylated (Cohen, 2000; Holt et al., 2009).
Since many phosphoproteins are phosphorylated at multiple
sites, there are probably tens of thousands of different phos-
phorylation sites in the human proteome.
This raises the question of how these myriad phosphorylation
sites have evolved. One possibility might be that the protein
evolved first, with some structurally or functionally important
Ser, Thr, or Tyr residue present on the surface of the protein,
and that the phosphorylation of this residue evolved later. A
strong prediction of this model is that most phosphorylations
should negatively regulate protein function, since a fully func-
tional protein is assumed to be present before the phosphoryla-
tion has evolved (Figure 1A).
Phosphorylation does sometimes inactivate proteins. For
example, phosphorylation of the prokaryotic citric acid cycle
enzyme isocitrate dehydrogenase blocks the enzyme’s active
site (Hurley et al., 1990, 1989), providing a rapid and reversible
off-switch for the protein. Another example is the eukaryotic
cyclin dependent kinase CDK1, which is inactivated by the
Wee1 and Myt1 protein kinases through the phosphorylation of
two residues, Thr 14 and Tyr 15, in its catalytic cleft (Morgan,
1995).
However, many phosphorylations are activating rather than in-
activating. For example, CDK1 activity is positively regulated by
the phosphorylation of Thr 161, a residue just outside the cata-
lytic cleft (Brown et al., 1999; Morgan, 1995). It is less intuitive
how activating phosphorylationsmight evolve, since presumably
the nonphosphorylated ancestor of the phosphoprotein would
be inactive.
One possible mechanism is suggested by mutational studies
of isocitrate dehydrogenase. Thorsness and Koshland showed
that substituting an acidic aspartate residue for the serine phos-
phorylation site mimicked the phosphorylated state of the
protein (Thorsness and Koshland, 1987). The rationale behind
this observation is that acidic residues are negatively charged,
like pSer, pThr and pTyr (although Glu and Asp are singly
charged whereas pSer, pThr, and pTyr are, nominally, doubly
charged at physiological pH [Cooper et al., 1983]), and Asp
and Glu are approximately isosteric with pSer and pThr (Fig-
ure 1B). This ‘‘trick’’ of substituting Asp or Glu for pSer or pThr
turns out to be widely applicable. Although there are examples
where two acidic residues are needed to mimic one phosphory-
lation event (Strickfaden et al., 2007), as well as examples where
mailto:james.ferrell@stanford.edu
http://dx.doi.org/10.1016/j.cell.2011.08.052
Figure 1. A Possible Mechanism for the Evolution of Activating
Phosphorylation Sites from Acidic Residues
(A) Phosphorylation of a surface-exposed serine residue in an active protein
(yellow) would be expected to decrease the protein’s activity (gray).
(B) Structures of phosphoserine, phosphothreonine, aspartic acid, and glu-
tamic acid. Carbon atoms are represented in green, phosphorus atoms in
magenta, and oxygen atoms in red.
(C) Phosphorylation of a serine or threonine residue could conditionally restore
an important electrostatic interaction originally mediated by an acidic amino
acid, thereby activating the protein.
the activity of the Asp or Glu mutant is more like that of the non-
phosphorylated form than the phosphorylated form (Posada and
Cooper, 1992), often the Asp or Glu mutants do faithfully mimic
the phosphorylated state.
Perhaps nature has been using this same trick in reverse.
Imagine that a protein initially evolves with a functionally impor-
tant glutamate on its surface; for example, in a salt bridge that
stabilizes a fold that is necessary for proper function (Figure 1C).
A mutation then occurs (perhaps in a nonessential, duplicated
gene) and the negatively charged glutamate is replaced by
a neutral serine. This destabilizes the salt bridge, rendering the
protein inactive. If some kinase is able to phosphorylate the
serine, the negatively charged phosphate could reestablish
the salt bridge, stabilize the fold, and reactivate the protein. A
switchable tyrosine phosphorylation site seems less likely to
evolve by this mechanism, since phosphotyrosine is less struc-
turally similar to Glu and Asp than pSer and pThr are, but in cases
where net charge is more important than the details of the side
chain structure, evolution of a phosphorylated tyrosine from
acidic amino acids might be possible.
Here, we have examined the sequence record for evidence of
the evolution of ancestral glutamate or aspartate residues into
phosphorylation sites. Our findings support the idea that some
phosphorylation sites—perhaps �5% of the sites we have
examined—have evolved from acidic residues, and that such
sites appeared both at the divergence of eukaryotes from
prokaryotes and at subsequent times in evolution.
RESULTS
Glutamate and Aspartate Substitution Rates
at Phosphoserines versus Control Serines
If some phosphosites evolved from acidic amino acids, it might
be possible to detect those ancestral origins in homologous
proteins that have continued to exist in the unphosphorylatable,
acidic form. To test this idea, we asked whether aspartate and
glutamate are found at higher rates at the positions of known
phosphorylation sites than at sites not known to be phosphory-
lated. If so, it could mean that phosphosites evolved fromAsp/
Glu, acidic residues evolved from phosphosites, or both.
We created a database of experimentally verified phospho-
proteins collected from the literature (Ballif et al., 2004; Beauso-
leil et al., 2004; Blom et al., 1999; Ficarro et al., 2002). Because
most of the phosphosites were serines (7854 pS, 1688 pT, and
566 pY in our analysis) and we would expect different amino
acid substitution rates for serines, threonines, and tyrosines,
we initially chose to focus on the subset of phosphoproteins
containing verified phosphoserines. For each phosphoprotein
we obtained its most closely related homologs using BLAST
(Altschul et al., 1990). We then generated multiple sequence
alignments for each phosphoprotein and its homologs using
PROBCONS (Do et al., 2005), a choice made for its high degree
of accuracy rather than speed of alignment (Essoussi et al.,
2008). For some large alignments (more than 100 sequences),
memory limitations inherent to PROBCONS made it necessary
to use another alignment tool, MUSCLE (Edgar, 2004). MUSCLE
is capable of handling larger alignments and also produces high-
quality alignments.
A sample of this procedure is shown in Figure 2A. The phos-
phoprotein in this case is the S. cerevisiae enolase Eno1, and
the phosphorylation site is Ser 10. The alignment included
233 enolase homologs, some with serines at the position of Ser
10. We then determined the frequencies at which the nineteen
other amino acids were found at the position of the phospho-
serine (Figure 2A, red column). We compared these frequencies
to the frequencies at which amino acids were found at the posi-
tions of control serines—serines not known to be phosphory-
lated—in S. cerevisiae enolase (Figure 2A, black columns).
The control serine results are shown by the black bars in Fig-
ure 2B. Alanine, a common amino acid that is structurally similar
to serine (Dayhoff, 1978), was the most common replacement,
whereas tryptophan, a rare amino acid that is structurally dissim-
ilar to serine (Dayhoff, 1978), was the least common (Figure 2B).
To see which amino acids were enriched at serine residues
relative to their overall abundance in this set of proteins, we
divided the replacement percentages plotted in Figure 2B by
the abundances of the respective amino acids at all positions
in the proteins in our multiple sequence alignments (Figure 2C,
Cell 147, 934–946, November 11, 2011 ª2011 Elsevier Inc. 935
Figure 2. Enrichment of Asp and Glu at the Positions of Phospho-
serines
(A) Construction of amultiple sequence alignment for a phosphorylated protein
showing the phosphosite (red) and control serines (serines not known to be
phosphorylated, black).
(B) Overall incidences at which various amino acids were found at the position
of phosphoserines (n = 7584) and control serines (n = 257,481) in the multiple
sequence alignments. Error bars are standard deviations, obtained by boot-
strap resampling.
(C) Replacement percentages normalized to the overall abundance of each
amino acid in the multiple sequence alignments. Error bars are standard
deviations as calculated by the formula ðA=BÞ3
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ða=AÞ2 + ðb=BÞ2
q
, where A
is the replacement percentage at phosphoserines, B is the replacement per-
centage at control serines, and a and b are the bootstrapped standard devi-
ations of A and B.
(D) Enrichment of various amino acids at phosphoserines relative to control
serines. Error bars are standard deviations, calculated by bootstrapping as in
(C). P values were calculated by bootstrapping under the null hypothesis.
936 Cell 147, 934–946, November 11, 2011 ª2011 Elsevier Inc.
black bars). By this measure, Thr, Asn, and Ala were present at
the highest relative rates, and isoleucine, leucine, and trypto-
phan at the lowest (Figure 2C).
The red bars in Figures 2B and 2C show the incidence of each
amino acid at the positions of the phosphoserines in our dataset.
Thr, Glu, and Aspwere foundmore frequently at phosphoserines
than at control serines. Bootstrapping yielded low p values for
these three enrichments (Figure 2D). The large hydrophobic
amino acids, which were underrepresented at the positions of
control serines (Figure 2C), were even more underrepresented
at phosphoserines (Figure 2D). Another group has also recently
shown that Glu and Asp are overrepresented at the positions
of phosphoserines relative to control serines (Kurmangaliyev
et al., 2011).
Glu and Asp are found preferentially on the surfaces rather
than the interiors of proteins (Miller et al., 1987; Tjong et al.,
2007). Since phosphoserines might be expected to favor protein
surfaces more than control serines do, the enrichment of Asp/
Glu residues at phosphoserines versus control serines could
simply reflect this shared preference for the surface. If so, then
the other amino acids with a strong preference for surface expo-
sure (Lys, Arg, Gln, and Asn) (Miller et al., 1987; Tjong et al., 2007)
should be enriched at phosphoserines. However, Lys, Arg, and
Gln were less likely to be found at the position of phosphoserines
than at control serines, and Asn was not significantly more likely
to be found at phosphoserines than at control serines (Figure 2D).
Thus the enrichment of acidic residues at phosphosites appears
not to be due to a general enrichment of polar or charged
residues.
The conservation of phosphoserines was higher than that of
control serines (43 ± 1% versus 24 ± 0.2%), and the frequency
at which gaps were found at the positions of phosphoserines
was lower than that of control serines (13 ± 1% versus 30 ±
0.3%). This suggests that phosphoserines are more likely to be
critical to the function of a protein than control serines.
A Subset of Phosphoserines Has Very High Percentages
of Asp/Glu Replacement
We next asked whether the overrepresentation of Asp/Glu resi-
dues was evenly distributed over all of the phosphoproteins, or
Figure 3. A Subset of the Phosphosites Have Very High Asp/Glu
Replacement Percentages
(A) Percentage of phosphosites (red) and control serine sites (black) which had
various Asp/Glu replacement percentages in the alignments.
(B) Percentage of phosphosites (red) and control serine sites (black) which had
various Asn/Gln replacement percentages in the alignments.
Replacement percentages were calculated from the 234 multiple sequence
alignments that contained at least sixty homologs. Standard deviations and p
values were calculated by boostrapping. See also Table S1.
arose from a subset of phosphosites with very high replacement
frequencies. To address this question we binned our alignments
according to the combined Asp+Glu perecentages at the posi-
tion of phosphoserines or control serines.
We carried out this binning on all of the sequence alignments
with at least 60 homologs; these alignments contained 234
phosphoserines and 16,198 control serines. The fraction of the
alignments with a given Asp/Glu percentage declined approxi-
mately exponentially for the control serines. For the phosphoser-
ines (Figure 3A, red) there was a similar decline up until the 50%
bin, above which the distribution leveled off. Approximately
6.8% of phosphoserine sites had at least 50% Asp or Glu at their
position in the multiple sequence alignments, as opposed to just
1.8% for control serines (p = 4 3 10�6). Thus, there is a group of
proteins with a very high frequency of replacement of Glu or Asp
for phosphoserines, suggesting that phosphosites evolved from
acidic residues (or vice versa) in these protein families. Similar
trends were seen when cutoffs of more or less than 60 homologs
were used for inclusion in the binning. Thus, approximately 5%
more of the bona fide phosphorylation sites had Asp+Glu
replacement percentages of >50% than did the control serines.
As a further control, we looked for an indication that the high
Asp/Glureplacement percentages might be due to the structural
similarity of Asp and Glu to pSer rather than the charge similarity.
If so, we might expect more phosphosites than control serine
sites to have high percentages of Asn+Gln replacement, since
Asn and Gln are structurally similar to Asp and Glu. As seen in
Figure 3B, we do not find an overabundance of high Asn+Gln
replacements percentages at the position of phosphoserines.
In fact, only 0.46% of phosphoserines have >50% replacement
by Asn or Gln, compared to 0.65% for control serines (p =
0.69). This suggests that it is the acidic nature of Asp and Glu
that promoted their evolutionary overrepresentation at the posi-
tion of phosphoserines.
Phylogenetic Analysis
The high frequency at which Asp/Glu residues were found at the
positions of phosphoserines suggests that acidic amino acids
may have often evolved into phosphosites, or vice versa, during
evolution; however, pairwise alignments are insufficient to iden-
tify evolutionary events and therefore cannot distinguish these
hypotheses from alternatives. Phylogenetic analysis and recon-
struction of ancestral sequence states allow the timing and
direction of historical sequence changes to be inferred,
providing a direct test of our hypothesis.
We began by examining the 16 serine-phosphorylated
proteins with the highest Asp/Glu replacement percentages;
later we extended the analysis to phosphothreonine and phos-
photyrosine sites with high Asp/Glu replacement percentages
as well. Phylogenetic trees were inferred from the multiple
sequence alignments using a maximum likelihood method
implemented in the software package PhyML (Guindon andGas-
cuel, 2003). The leaves of the trees were colored to indicate
which amino acid was present in each homolog at the position
of the phosphosite. Green denoted the phosphorylatable amino
acids Ser, Thr, and Tyr; red denoted the acidic residues Asp and
Glu, and gray represented any other amino acid. We used the
maximum likelihood trees inferred by PhyML as input to a second
maximum likelihood implementation, FASTML (Pupko et al.,
2002), to infer the sequences of the hypothetical ancestral
proteins represented by the internal nodes, and colored these
nodes according to the amino acid hypothesized to be present
at the position of the phosphosite.
Most of the 16 trees displayed one of three basic patterns
of Ser/Thr/Tyr versus Asp/Glu residues. Examples of each
are shown in Figure 4, Figure 5, and Figure 6 and described
below.
Emergence of Phosphorylation Sites at the Prokaryotic/
Eukaryotic Split
Elongation Factor eEF2
The protein eEF2 is a translation factor required for the transloca-
tion step in protein synthesis (Jorgensen et al., 2006). Human
eEF2 is reported to be phosphorylated at Ser 502 (Beausoleil
et al., 2004). The functional significance of this phosphorylation
is uncertain.
The EF2 tree included eukaryotic eEF2 proteins, archaeal
aEF2 proteins, and bacterial EF-G proteins (Figure 4A). The
amino acids in the vicinity of Ser 502 were well conserved
throughout the tree (Figure 4C). The tree was rooted using the
related E. coli protein EF4 as an outgroup, which placed the
root of the tree between the archaea and the bacteria (Figure 4A),
consistent with current views of deep phylogenetics (Ciccarelli
et al., 2006; Pace, 2009).
Almost all (27/28) of the eukaryotic eEF2 homologs had aphos-
phorylatable residue at the position corresponding to Ser 502.
Most (211/224) of the bacterial, mitochondrial, and archaeal
homologs, had either Asp or Glu at that position. All of the eu-
karyotic eEF2 proteins with serine residues at the position of
Ser 502 appear to have been derived from a single Glu / Ser
change at the divergence of eukaryotes from archaea, which
has been maintained ever since (Figure 4A). The two eEF2
proteins with Thr residues at the position of Ser 502 may have
Cell 147, 934–946, November 11, 2011 ª2011 Elsevier Inc. 937
Figure 4. Evolution of Phosphorylation Sites from Acidic Residues at the Divergence of Eukaryotes from Prokaryotes
(A) Homologs of human eEF2, which is phosphorylated at S502. The tree is colored to depict the amino acids present in homologs of human eEF2 at the positions
corresponding to pS502. Human eEF2 is denoted by the star. eEF2 homologs were identified using BLAST. Homologs with E-values < 10�16 were aligned. Trees
were inferred by PhyML using maximum likelihood. The tree was rooted using E. coli EF4 as an outgroup. Leaves were colored according to the amino acid at the
relevant position. Internal nodes were colored based on the maximum likelihood amino acid inferred for those ancestral sequences using FASTML. The amino
acids inferred for the hypothetical ancestors of human eEF2 are shown.
938 Cell 147, 934–946, November 11, 2011 ª2011 Elsevier Inc.
arisen from an independent Glu/ Thr amino acid change (Table
S2A available online).
DNA Topoisomerase II
DNA topoisomerase II (Topo II) is a conserved enzyme that
catalyzes the passage of one DNA duplex through another
(Schoeffler and Berger, 2008). Eukaryotic Topo II is a homo-
dimer. Its prokaryotic homolog, gyrase, is an A2B2 tetramer,
with gyrase B corresponding to the N-terminal half of eukary-
otic Topo II and gyrase A corresponding to the C-terminal
half. It has been reported that human topoisomerase IIb is
phosphorylated at Thr 639 and Ser 640 (Figure 4B) (Wang
et al., 2010), with the functional consequences of these phos-
phorylations unknown.
The BLAST search using human Topo IIb as query yielded
sequences including vertebrate topoisomerase IIa and IIb
proteins, other eukaryotic Topo II proteins, and prokaryotic
gyrase proteins, with the position of the phosphosite present in
the gyrase B chains. The amino acids surrounding the phospho-
site were well conserved throughout the tree (Figure 4D). We
were unable to identify a satisfactory paralog to use as an out-
group; the rooting shown in Figure 4B is arbitrary.
The Topo II tree consisted of two main clades (Figure 4B). The
first included the eukaryotic Topo II proteins plus three viral Topo
II proteins whose genes were likely to have been transduced
from their eukaryotic hosts. The second included all of the bacte-
rial, archaeal, and chloroplastic/mitochondrial Topo II proteins.
All of the animal, fungal, plant, and protist Topo II homologs
had phosphorylatable Thr and Ser residues at the sites corre-
sponding to Thr 639 and Ser 640, respectively. All of the bacte-
rial, archaeal, and chloroplastic/mitochondrial Topo II homologs
had a glutamate residue at the position of Thr 639 and aMet or Ile
residue at the position of Ser 640 (Figure 4B). If one assumes that
bacteria and archaea diverged before eukaryotes and archaea
did (Ciccarelli et al., 2006; Pace, 2009), then the Glu 639/Met
640-containing Topo II proteins probably came first, with the
Thr 639/Ser 640 arising later.
Emergence of Phosphorylation Sites Later in Evolution
Enolase
Some phosphosites apparently evolved after the divergence of
eukaryotes from bacteria and archaea. One example of this is
provided by enolase, a key enzyme in glycolysis and gluconeo-
genesis. The S. cerevisiae enolases Eno1 and Eno2 have been
reported to be phosphorylated at Ser 10 (Ficarro et al., 2002),
and the region surrounding this residue is well conserved among
prokaryotes and eukaryotes (Figure 5C).
Enolase homologs with phosphorylatable residues at this
position were largely confined to the fungal clade of the phyloge-
netic tree (Figure 5A). Of the nonfungal sequences, most con-
tained either Asp or Glu at this position.
Figure 5B shows a tree inferred for 30 fungal enolases from 23
dikaryal fungal species. All seven basidiomycetes enolases had
(B) Homologs of human topoisomerase II (Topo II) b were identified and aligned, a
phosphorylated at T639 and S640. The former residue is replaced by a Glu in all of
species names and bootstrap values, is availableas Figure 1B. The asterisks de
(C and D) Amino acid sequences close to the phosphosites for a few selected e
For larger versions of the trees shown in panels (A) and (B), and control trees, se
nonacidic, nonphosphorylatable amino acids at the Ser 10 posi-
tion, with Gln being the most common. Almost all of the ascomy-
cetes enolases had phosphorylatable residues at this site, and
the remaining two (one of the two S. pombe enolase paralogs
and the K. waltii enolase) had Gln residues.
Reconstruction of the sequences of the ancestral species
suggests that the primordial enolase protein possessed a Glu
residue at the position of pSer 10, with this Glu residue persisting
in most present day prokaryotes, protists, and vertebrates
(Figures 5A and 5B). This residue apparently then became
a Gln in some eukaryotes, and persists as a Gln in invertebrates,
plants, and basidiomycetes fungi. Finally, the Gln residue
became a phosphorylatable amino acid shortly after the diver-
gence of the ascomycetes fungi from the basidiomycetes fungi.
This dates the evolution of the phosphosite to approximately 400
mya (Taylor and Berbee, 2006), the time when the ascomycetes
and basidiomycetes are thought to have diverged, and raises the
possibility that amide amino acids like Gln can act as intermedi-
aries in the evolution of phosphosites.
Emergence of Phosphosites in Particular Paralogs of
a Multiparalog Protein Family
Raf
Some of the BLAST searches yielded families of paralogous
proteins where a phosphosite was present in some modern pa-
ralogs and a Asp/Glu residue was present in others. One
example is provided by the Raf family of protein kinases. Most
vertebrates possess three Raf homologs: A-Raf, B-Raf, and
C-Raf. The C-Raf proteins have a pair of adjacent tyrosine phos-
phorylation sites (Tyr 340 and Tyr 341 in human C-Raf) just
N-terminal to its kinase domain, and these phosphorylations
are critical for C-Raf activation (Fabian et al., 1993; Marais
et al., 1995). These Tyr phosphorylation sites are present in
A-Raf as well, but the corresponding sites in B-Raf are Asp resi-
dues, and the Asp residues are critical for B-Raf function (Fabian
et al., 1993; Marais et al., 1995).
To investigate the evolutionary relationships among these Raf
proteins, we aligned the sequences of all of the Raf homologs
present in the KEGG database and inferred a maximum likeli-
hood tree. As shown in Figures 6A and 6B and Table S4A, almost
all of the C-Raf proteins have YY residues at the positions of Tyr
340 and 341. The one exception is from the horse Equus cabal-
lus, which has a duplicated C-Raf gene; one of its C-Raf proteins
has the typical YY residues, and the other has CY in their place.
All of the A-Raf sequences contain YY residues. All of the B-Raf
proteins have DD residues, and all of nonvertebrate Raf proteins
have some variation on DD (DD, ED, DE, EG, or EN). Interestingly,
the EN variation is confined to a sub-clade of the insects, and all
of those Raf proteins have not only lost the second acidic
residue, but have also gained a new acidic residue just
N-terminal to the EN site (Figures 6A and 6B). These findings
indicate that C-Raf’s activating Tyr phosphorylations evolved
nd trees were inferred and colored as described for panel A. Human Topo II is
the prokaryotic gyrase B proteins. A larger version of this figure, which includes
note three viral Topo II homologs.
EF2 and Topo II homologs.
e Table S2.
Cell 147, 934–946, November 11, 2011 ª2011 Elsevier Inc. 939
Figure 5. Evolution of Phosphorylation Sites from Acidic Residues Later in Evolution
(A and B) S. cerevisiae enolase Eno1. Homologs of the S. cerevisiae enolase Eno1 were identified and aligned, and trees were inferred as described for Figure 4.
The root of the tree was placed between the archaea and eukaryotes. An expanded view of thirty fungal enolases is shown in panel (B). The fungus Cryptococcus
neoformans was used as an outgroup to define the root of the tree.
(C) Amino acid sequences close to the phosphosite for a few selected enolase homologs.
See also Table S3.
940 Cell 147, 934–946, November 11, 2011 ª2011 Elsevier Inc.
Figure 6. Evolution of Phosphorylation Sites from Acidic Residues in Particular Eukaryotic Paralogs
(A) Homologs of human C-Raf were identified and aligned, trees were inferred as described for Figure 4. The human Ksr2 protein was used as an outgroup to
define the root of the tree.
(B) Local alignments of various Raf proteins in the vicinity of phosphosites Y340 and Y341.
(C) Homologs of bovine GRK5 were identified and aligned, and trees were inferred as described for Figure 4. The human Akt2 protein was used as an outgroup to
define the root of the tree.
(D) Local alignments of the seven bovine GRK proteins and two Drosophila GRK proteins in the vicinity of phosphosites S484 and T485.
See also Table S4.
from a primordial Raf protein with a pair of acidic residues in their
place, with phosphorylation conditionally restoring a function
formerly provided by the acidic residues.
The G-Protein-Coupled Receptor Kinases
The GRK family of protein kinases provide another example of
paralogs where some possess phosphorylation sites and others
Cell 147, 934–946, November 11, 2011 ª2011 Elsevier Inc. 941
possess acidic residues. Bovine GRK5 is reported to be auto-
phosphorylated at Ser 484 and Thr 485, two sites C-terminal to
the GRK5 kinase domain, with these phosphorylations
increasing the activity of GRK5 toward the b2-adrenergic
receptor by 15- to 20-fold (Kunapuli et al., 1994). The corre-
sponding residues are also known to be phosphorylated in
GRK1 and GRK6 (Olsen et al., 2006; Palczewski et al., 1992),
and the region around these residues is well aligned in all of
the GRK-family proteins (Figure 6E).
The GRK tree divides into two main clades. One contains the
vertebrate GRK1 and 4–7 proteins together with their closest
invertebrate relatives, and the other contains GRK2 and 3 and
their closest invertebrate relatives (Figures 6C and 6D). Most of
the GRK1, 4, 5, and 6 proteins have a Ser residue at the position
of Ser 484 and a Thr residue at the position of Thr 485 (Figures 6C
and 6D and Tables S3B and S3C). Most of the GRK2 and 3
proteins have four acidic residues (DEED) in place of Ser 484
and Thr 485. The GRK7 proteins possess a conserved serine in
the position of Ser 484 and a single acidic amino acid in the posi-
tion of Thr 485. Thus, as was the case with Raf, phosphosites
and acidic residues are confined to particular paralogs, with
both the phosphosites and acidic residues generally being main-
tained throughout the subsequent evolution of the paralogs.
Figure 6E shows the residues around the positions of Ser 484/
Thr 485 in the several GRKs. Note that different GRK paralogs
employ different combinations of fixed charges (acidic residues)
and conditional charges (phosphorylation sites) in this region.
Evolution of a Phosphosite into an Acidic Residue
The GRK trees (Figures 6C and 6D and Tables S3B and S3C)
were rooted using human Akt2, the closest human non-GRK
homolog of the GRKs, as an outgroup. With the root defined
this way, it was difficult to determine whether the DEED or ST
pattern is likely to have come first, or whether both evolved inde-
pendently from nonacidic, nonphosphorylatable residues.
However, the tree illustrates another interesting phenomenon.
It seems likely that one subgroup of the GRKs replaced Thr 485
with a glutamate residue, which is now present in the GRK7 pro-
teins (Figures 6C and 6D). Thus, the GRKs provide an example
of evolution of a phosphosite into an acidic amino acid, changing
a conditional negative charge to a fixed negative charge.
Another such example is provided by the CMGC family of
protein kinases. The primordial CMGC kinase is predicted by
maximum likelihood reconstruction to have had a T-X-Y dual
phosphorylation motif, as the present-day MAP kinase proteins
do, whereasa small clade of CDK1-like kinases appear to
have replaced the Tyr residue with a Glu (Table S4H).
Trees for Control Serine Sites
We next examined several ‘‘control trees’’—phylogenetic trees
where the leaves, branches and internal nodes were colored
based on the amino acids present at serine sites that are not
known to be phosphosites (Tables S2B and S2C, Tables S3C–
S3E, and Tables S4D–S4G). In the case of eEF2 we identified
two control serines, Ser 530 and Ser 774, that had Asp/Glu
replacement percentages that were roughly equal to those of
the phosphosite, pSer 502. The Ser 530 tree (Table S2B) was
quite similar to the phosphosite tree (Figure 4A); all of the eukary-
942 Cell 147, 934–946, November 11, 2011 ª2011 Elsevier Inc.
otic eEF2 homologs had phosphorylatable residues at this site,
whereas almost all of the prokaryotic homologs possessed
acidic residues. According to our hypothesis Ser 530 could
be a good candidate for an undiscovered phosphosite. Several
other control serine sites with high Asp/Glu replacement
percentages are shown in Table S1. It will be of interest to deter-
mine whether high Asp/Glu replacement percentages can be
used to successfully predict phosphorylation sites.
The Ser 774 tree (Table S2C) was qualitatively different, with
both acidic and phosphorylatable residues scattered throughout
the eukaryotes, bacteria, archaea, and mitochondria. Most of
the control trees constructed for the enolase and GRK families
also showed scattered distributions of Asp/Glu versus Ser/Thr/
Tyr residues (Tables S3C–S3E and Tables S4D–S4G). These
findings suggest that patterns of acidic residues versus phos-
phorylatable residues seen at phosphosites (Figure 4, Figure 5,
and Figure 6) were not adventitious; instead it appears that
both types of residues have been actively maintained.
Topo II, Enolase, and Raf Protein Structures
The original rationale for this work was that some phosphoryla-
tion sites have arisen from structurally important Asp/Glu resi-
dues (Figure 1). We were therefore interested in investigating
the structural contexts of the phosphosites and corresponding
acidic residues for the protein families shown in Figure 4, Fig-
ure 5, and Figure 6. Were the acidic residues present in salt
bridges? If so, was the hydrogen bond donor a conserved basic
residue? Once the phosphosite had evolved, was the salt bridge
lost in the dephospho-form, and might it be regained through
phosphorylation?
In the case of Topo II, crystal structures are available for
the S. cerevisiae Topo II homodimer, which possesses the phos-
phosite conserved among eukaryotes (Thr 607), and for the
E. coli gyrase A2B2 tetramer, which has an acidic residue (Glu
744) in its place. Glu 744 resides in the gyrase B chains and
participates in an interchain salt bridge with Lys 65 in gyrase A
(Figure 7A). The distance between the Lys side chain nitrogen
and the closest Glu oxygen is 3.1 Å, compatible with a salt bridge
(Vogt et al., 1997). Lys 65 is well conserved throughout the Topo
II/gyrase family.
We superimposed the S. cerevisiae Topo II homodimer struc-
ture on the E. coli gyrase tetramer structure (Figure 7A). The
conserved basic residue (Lys 720) resides close to Lys 65, and
the phosphosite (Thr 607) sits close to where the Glu residue in
gyrase resides, but the salt bridge is lost; the distance from
the side chain oxygen in Thr 607 to the closest nitrogen in Lys
720 is 7.3 Å. It is structurally plausible that the phosphorylation
of Thr 607 could restore the salt bridge present in bacterial
gyrase.
Crystal structures are also available forS. cerevisiae Eno1 in its
nonphosphorylated form, and for E. coli enolase, which has aGlu
residue at the position of Ser 10. As shown in Figure 7B, theE. coli
enolase possesses a 2.7 Å salt bridge between the Glu residue
and a conserved basic residue (Arg 57). In nonphosphorylated
yeast Eno1, a basic residue (Lys 56) is present at the position of
Arg 57, but the distance between this residue’s amino nitrogen
and the Ser 10 oxygen is 12.8 Å, (Figure 7B). Phosphorylation
of Ser 10 could plausibly restore the lost salt bridge.
Figure 7. Salt Bridges in Phosphoprotein Homologs Possessing Glu
Residues at the Positions of the Phosphosites
(A) Topoisomerase II. Yellow represents the E. coli gyrase A2B2 tetramer (PDB
ID 3NUH). Gray represents the S. cerevisiae Topo II homodimer (2RGR).
(B) Enolase. Yellow represents E. coli enolase (1E9I). Gray represents
S. cerevisiae Eno1 protein (1ONE).
(C) Raf. Yellow represents human B-Raf (1UWJ). Gray represents human
C-Raf (3OMV).
In all three panels, the structures were superimposed so as to minimize the
overall RMS deviation of the positions of the aligned residues, using UCSF
Chimera (Pettersen et al., 2004).
A third example is provided by the Raf family proteins. In
human B-Raf, one of the two Asp residues (Asp 448) forms
a 3.5 Å salt bridge with a conserved Arg residue, Arg 505. This
residue is part of the aC helix, a structural element critical for
protein kinase activation (Figure 7C). The interaction between
Asp 448 and Arg 505 has been hypothesized to stabilize the
active conformation of B-Raf (Wan et al., 2004). The salt bridge
is lost in C-Raf; the distance from the hydroxyl oxygen of the cor-
responding tyrosine (Tyr 341) to the nearest nitrogen in the cor-
responding arginine (Arg 398) is 7.7 Å. It seems plausible that
phosphorylation of Tyr 341 could again restore the salt bridge.
Thus, crystal structures provide support for the underlying
scenario (Figure 1) that motivated our hypothesis that some
phosphosites have evolved from acidic residues.
DISCUSSION
Here, we have used comparative genomic analysis to demon-
strate that some phosphorylation sites have evolved from acidic
amino acids. The acidic amino acids Asp and Glu were found
to be present at the positions of phosphoserines more often
than they were at control serines (Figure 2). Additionally, other
charged and polar amino acids did not show increased inci-
dence, indicating the importance of the negative charge in this
enrichment. We did not see a similar enrichment of Asp and
Glu at Thr or Tyr phosphorylation sites. This may have been
simply due to the smaller number of Thr and Tyr phosphorylation
sites in our database, as other evidence, discussed below,
supports the hypothesis that all three types of phosphorylation
sites (Ser, Thr, and Tyr) can evolve and have evolved from acidic
residues.
For our high-confidence set of multiple sequence align-
ments—those with at least 60 homologs—6.8%of the 234 phos-
phosites had very high (>50%) incidences of Asp or Glu at the
position of the phosphosite, while only 1.8% of the 16198 control
serine sites did (Figure 3). This suggests that �5% of the phos-
phorylation sites we examined may have evolved from acidic
residues.
Phylogenetic analysis demonstrates that the switch between
acidic and phosphorylatable residues occurred for some
proteins at the evolutionary division between eukaryotes and
prokaryotes (Figure 4). In the two examples examined in detail
here (eEF2 and Topo II), most or all of the bacterial and archaeal
species possessed a Glu residue, and most or all of the eukary-
otic species possessed a Ser or Thr residue at the position of the
phosphosite. If one assumes that the split between bacteria and
archaea occurred prior to the split between prokaryotes and
eukaryotes, the phosphorylation site probably evolved from the
acidic residue rather than vice versa. This interpretation is sup-
ported by the maximum likelihood (FASTML) reconstruction of
the ancestral sequences for the eEF2 tree. Reconstruction of
the ancestors for the Topo II tree is also consistent with this
direction of evolution (Glu / Thr), although we have less confi-
dence in the rooting of this tree. The phosphosite appears to
have evolved once (Topo II) or possibly twice (eEF2), and to
have then been maintainedover long evolutionary times.
At other times the switch between an acidic amino acid and
a phosphosite occurred later in evolution. For example, it
Cell 147, 934–946, November 11, 2011 ª2011 Elsevier Inc. 943
occurred at the divergence of ascomycetes from basidiomy-
cetes fungi (�400 mya) in the case of enolase (Figure 5A–5C).
In the GRK and Raf families, the switch occurred early in animal
evolution, with modern species retaining Asp/Glu-residues in
some paralogs (GRK2/3, B-Raf) and acquiring phosphorylatable
residues in others (GRK1, 4–7, A-Raf, and C-Raf). It appears that
Raf gene duplication has allowed a primordial DD-containing Raf
to evolve new conditional regulation. For GRKs it is not clear
whether the acidic residues or phosphosites came first.
With the GRKs, it appears that two phosphorylation sites have
replaced four acidic residues, consistent with the observation
that sometimes a pair of acidic amino acids better mimics
a phosphorylation than a single amino acid does (Strickfaden
et al., 2007). In most of the examples examined here, however,
phosphorylation sites appear to have replaced single amino
acids, in line with Thorsness and Koshland’s classic study
(Thorsness and Koshland, 1987).
Our original rationale for hypothesizing that phosphorylation
sites evolved from acidic residues provided an explanation for
the fact that phosphorylation sometimes activates proteins. In
two of the seven examples examined in detail here (Raf, GRK),
the phosphorylation is in fact known to be activating. In the other
cases the functional significance of the phosphorylations
remains untested. We predict that many of these will prove to
be activating phosphorylations.
Finally, we found examples where a phosphorylation site
appears to have become fixed as a glutamate residue. In the
GRK7 protein kinases, a glutamate residue appears to have
evolved from a threonine phosphosite that remains present in
the GRK1, 4, 5, and 6 proteins (Figure 6C). A second example
is provided by the CMGC family of protein kinases, where one
clade of the kinases (the CDK1-like kinases) appears to have
evolved a glutamate residue in the place of an activating tyrosine
phosphorylation site (Table S3H).
Phosphorylation sites are sometimes found in conserved,
structured regions of proteins, but more frequently they are
found in poorly conserved, putatively unstructured regions
(Holt et al., 2009; Moses et al., 2007). The prevailing hypothesis
is that the former type of site is more likely to be involved in
complicated allosteric regulation, and the latter type of site is
more likely to be involved in simpler types of regulation such
as bulk electrostatic effects or the generation of phosphoepi-
topes (Holt et al., 2009; Moses et al., 2007; Pawson et al.,
2001; Serber and Ferrell, 2007). Our original rationale (Figure 1)
seems more applicable to well-conserved phosphorylation sites
in structured regions. Indeed, despite the fact that �60%–90%
of phosphorylation sites are believed to be the poorly conserved
type (Holt et al., 2009; Kurmangaliyev et al., 2011; Moses et al.,
2007), all five of the phosphosites examined in Figure 4, Figure 5,
and Figure 6 are evolutionarily well-conserved and present in
structured regions of the proteins. This raises the interesting
possibility that phosphosites that evolved from Asp/Glu residues
will be more likely than average to be involved in allosteric
regulation.
The crystal structures of Topo II, enolase, and Raf show that in
the Asp/Glu-containing versions of these protein, the acidic
residue is involved in a salt bridge with a conserved basic
residue, and in the Ser/Thr/Tyr-containing versions, the salt
944 Cell 147, 934–946, November 11, 2011 ª2011 Elsevier Inc.
bridge is lost, with phosphorylation having the potential to
restore it (Figure 7). These findings provide strong support for
the rationale that motivated these studies (Figure 1).
Taken together, this work provides insight into the evolution of
phosphorylation, and in particular provides a rationale for how
well-conserved, activating phosphorylations have evolved.
EXPERIMENTAL PROCEDURES
Identifying Homologs
The list of proteins closely related to each of phosphoprotein in our database
was generated using stand-alone protein BLAST (Altschul et al., 1990) (blastp)
against the nonredundant ‘‘nr’’ protein database using default options. From
these results, only those that were closely related (E-value % 10�16) were
used for subsequent alignment and analysis.
Alignments
For each phosphoprotein and its close relatives, a multiple sequence align-
ment of the protein sequences was generated using PROBCONS 1.09
(Do et al., 2005). Default options were used with the exception that iterative
refinement was set to the maximum of 1,000 iterations using the argument—
iterative-refinement 1,000. For very large alignments MUSCLE 3.6 (Edgar,
2004) was used for its high quality alignment generation as well as reduced
memory requirements.
Binning and Bootstrapping
The bins for glutamate/aspartate replacement and their error bars were gener-
ated by bootstrap sampling, using the phosphosites for which there were at
least 60 homologs in the alignment (n = 234). Thus 234 phosphosites were
chosen with replacement from this set, and each site in the sample was placed
in the appropriate bin based on what percentage of replacement amino
acids were either glutamate or aspartate. This was repeated 1,000 times.
The percentage replacement in each bin was calculated as the mean of the
percentage of sites in each bin in the bootstrap samples. The error bars are
the standard deviation of the bootstrap sample percentages from this mean.
The same procedure was repeated for control serine sites (n = 16,198).
The p values reported in Figure 2 and Figure 3 were also calculated by boot-
strapping. The p values for the amino acid replacement ratios (Figure 2) were
obtained using the following method: All phosphoserine sites and control
serine sites were grouped together into a new, combined data set. This set
(representing the null hypothesis that amino acids are found at the same
frequency at the positions of phosphoserines and control serines) was
sampled with replacement, drawing a bootstrap phosphoserine set and
control serine set, each the same size as the original sets. This was repeated
10,000 times. The p value for each amino acid was calculated as the
percentage of bootstrap samples which resulted in a replacement ratio at least
as great as the ratio for that amino acid calculated using the true phosphoser-
ine and control sets.
For Figure 3, we pooled the phosphoserine and control serine alignments,
and randomly sampled with replacement 234 as mock phosphoserines and
16000 as mock control serine sites. For the Asp+Glu distribution, the sampling
was repeated 106 times and we calculated the fraction of the bootstrap
samples that showed at least 5% (6.8% to 1.8%) difference in Asp+Glu
replacement percentages between mock phosphoserine alignments and the
mock control serine alignments. For the Asn+Gln distribution, we calculated
the fraction of the bootstrap samples that showed at least 0.19% (0.46% to
0.65%) difference in Asn+Gln replacement perentages between mock phos-
phoserine alignments and the mock control serine alignments.
Trees
Phylogenetic trees depicting the relationship of the aligned sequences were
inferred using PhyML 3.0 (Guindon and Gascuel, 2003). Gapped regions that
were poorly conserved were first removed from each alignment. Each align-
ment was then tested using ProtTest 3.0 (Abascal et al., 2005) to determine
the most appropriate protein evolution model and associated parameters for
phylogenetic inference. The models, parameters and ungapped alignments
were then used in PhyML to infer the trees. The leaf nodes in each tree were
colored to depict the amino acid present at the position of interest for each
taxon in the alignmentfrom which the tree was inferred.
The internal nodes were colored according to the amino acid at the
predicted ancestral sequence at that node. These sequences were recon-
structed using maximum likelihood by FASTML (Pupko et al., 2002). FASTML
uses the inferred maximum likelihood tree, the alignment from which the tree
was inferred, and the protein evolution model and parameters to calculate
the most likely ancestral sequence at each internal node.
SUPPLEMENTAL INFORMATION
Supplemental Information includes four tables and can be found with this
article online at doi:10.1016/j.cell.2011.08.052.
ACKNOWLEDGMENTS
We thank Kurt Thorn, Peter Chien, Michael Reese, Arend Sidow, Serafim
Batzoglou, Rob Tibshirani, Bill Weis, the participants of the 2011 Evolution
of Protein Phosphorylation Keystone Meeting, our colleagues in the Stanford
Department of Chemical and Systems Biology, and members of the Ferrell
lab for many helpful comments. This work was supported by grants from the
National Institutes of Health (R01 GM046383 and T15 LM007033) and the
Arnold Beckman Foundation. Molecular graphics images were produced
using the UCSF Chimera package from the Resource for Biocomputing,
Visualization, and Informatics at the University of California, San Francisco
(supported by NIH P41 RR001081).
Received: February 21, 2011
Revised: June 28, 2011
Accepted: August 19, 2011
Published: November 10, 2011
REFERENCES
Abascal, F., Zardoya, R., and Posada, D. (2005). ProtTest: selection of best-fit
models of protein evolution. Bioinformatics 21, 2104–2105.
Alonso, A., Sasin, J., Bottini, N., Friedberg, I., Osterman, A., Godzik, A., Hunter,
T., Dixon, J., and Mustelin, T. (2004). Protein tyrosine phosphatases in the
human genome. Cell 117, 699–711.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic
local alignment search tool. J. Mol. Biol. 215, 403–410.
Ballif, B.A., Villen, J., Beausoleil, S.A., Schwartz, D., and Gygi, S.P. (2004).
Phosphoproteomic analysis of the developing mouse brain. Mol. Cell. Proteo-
mics 3, 1093–1101.
Beausoleil, S.A., Jedrychowski, M., Schwartz, D., Elias, J.E., Villen, J., Li, J.,
Cohn, M.A., Cantley, L.C., and Gygi, S.P. (2004). Large-scale characterization
of HeLa cell nuclear phosphoproteins. Proc. Natl. Acad. Sci. USA 101, 12130–
12135.
Blom, N., Gammeltoft, S., and Brunak, S. (1999). Sequence and structure-
based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol.
294, 1351–1362.
Brown, N.R., Noble, M.E., Endicott, J.A., and Johnson, L.N. (1999). The struc-
tural basis for specificity of substrate and recruitment peptides for cyclin-
dependent kinases. Nat. Cell Biol. 1, 438–443.
Ciccarelli, F.D., Doerks, T., von Mering, C., Creevey, C.J., Snel, B., and Bork,
P. (2006). Toward automatic reconstruction of a highly resolved tree of life.
Science 311, 1283–1287.
Cohen, P. (2000). The regulation of protein function by multisite phosphoryla-
tion–a 25 year update. Trends Biochem. Sci. 25, 596–601.
Cooper, J.A., Sefton, B.M., and Hunter, T. (1983). Detection and quantification
of phosphotyrosine in proteins. Methods Enzymol. 99, 387–402.
Dayhoff M.O., ed. (1978). Atlas of Protein Sequence and Structure (Silver
Springs M.D.: National Biomedical Research Foundation).
Do, C.B., Mahabhashyam, M.S., Brudno, M., and Batzoglou, S. (2005).
ProbCons: Probabilistic consistency-based multiple sequence alignment.
Genome Res. 15, 330–340.
Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy
and high throughput. Nucleic Acids Res. 32, 1792–1797.
Essoussi, N., Boujenfa, K., and Limam, M. (2008). A comparison of MSA tools.
Bioinformation 2, 452–455.
Fabian, J.R., Daar, I.O., and Morrison, D. (1993). Critical tyrosine residues
regulate the enzymatic and biological activity of Raf-1 kinase. Mol. Cell. Biol.
13, 7170–7179.
Ficarro, S.B., McCleland, M.L., Stukenberg, P.T., Burke, D.J., Ross, M.M.,
Shabanowitz, J., Hunt, D.F., and White, F.M. (2002). Phosphoproteome anal-
ysis by mass spectrometry and its application to Saccharomyces cerevisiae.
Nat. Biotechnol. 20, 301–305.
Guindon, S., and Gascuel, O. (2003). A simple, fast, and accurate algorithm to
estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704.
Holt, L.J., Tuch, B.B., Villen, J., Johnson, A.D., Gygi, S.P., and Morgan, D.O.
(2009). Global analysis of Cdk1 substrate phosphorylation sites provides
insights into evolution. Science 325, 1682–1686.
Hurley, J.H., Dean, A.M., Thorsness, P.E., Koshland, D.E., Jr., and Stroud,
R.M. (1990). Regulation of isocitrate dehydrogenase by phosphorylation
involves no long-range conformational change in the free enzyme. J. Biol.
Chem. 265, 3599–3602.
Hurley, J.H., Thorsness, P.E., Ramalingam, V., Helmers, N.H., Koshland, D.E.,
Jr., and Stroud, R.M. (1989). Structure of a bacterial enzyme regulated by
phosphorylation, isocitrate dehydrogenase. Proc. Natl. Acad. Sci. USA 86,
8635–8639.
Jorgensen, R., Merrill, A.R., and Andersen, G.R. (2006). The life and death of
translation elongation factor 2. Biochem. Soc. Trans. 34, 1–6.
Kunapuli, P., Gurevich, V.V., and Benovic, J.L. (1994). Phospholipid-stimu-
lated autophosphorylation activates the G protein-coupled receptor kinase
GRK5. J. Biol. Chem. 269, 10209–10212.
Kurmangaliyev, Y.Z., Goland, A., and Gelfand, M.S. (2011). Evolutionary
patterns of phosphorylated serines. Biol. Direct 6, 8.
Laub, M.T., and Goulian, M. (2007). Specificity in two-component signal trans-
duction pathways. Annu. Rev. Genet. 41, 121–145.
Manning, G., Plowman, G.D., Hunter, T., and Sudarsanam, S. (2002a). Evolu-
tion of protein kinase signaling from yeast to man. Trends Biochem. Sci. 27,
514–520.
Manning, G., Whyte, D.B., Martinez, R., Hunter, T., and Sudarsanam, S.
(2002b). The protein kinase complement of the human genome. Science
298, 1912–1934.
Marais, R., Light, Y., Paterson, H.F., and Marshall, C.J. (1995). Ras recruits
Raf-1 to the plasma membrane for activation by tyrosine phosphorylation.
EMBO J. 14, 3136–3145.
Miller, S., Janin, J., Lesk, A.M., and Chothia, C. (1987). Interior and surface of
monomeric proteins. J. Mol. Biol. 196, 641–656.
Morgan, D.O. (1995). Principles of CDK regulation. Nature 374, 131–134.
Moses, A.M., Heriche, J.K., and Durbin, R. (2007). Clustering of phosphoryla-
tion site recognition motifs can be exploited to predict the targets of cyclin-
dependent kinase. Genome Biol. 8, R23.
Olsen, J.V., Blagoev, B., Gnad, F., Macek, B., Kumar, C., Mortensen, P., and
Mann,M. (2006). Global, in vivo, and site-specific phosphorylation dynamics in
signaling networks. Cell 127, 635–648.
Pace, N.R. (2009). Mapping the tree of life: progress and prospects. Microbiol.
Mol. Biol. Rev. 73, 565–576.
Palczewski, K., Buczylko, J., Van Hooser, P., Carr, S.A., Huddleston, M.J., and
Crabb, J.W. (1992). Identification of the autophosphorylation sites in rhodopsin
kinase. J. Biol. Chem. 267, 18991–18998.
Pawson, T., Gish, G.D., and Nash, P. (2001). SH2 domains, interaction
modules and cellular wiring. Trends Cell Biol. 11, 504–511.
Cell 147, 934–946, November 11, 2011 ª2011 Elsevier Inc. 945
http://dx.doi.org/doi:10.1016/j.cell.2011.08.052
Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M.,
Meng, E.C., and Ferrin, T.E. (2004). UCSF Chimera–a visualization system
for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612.
Posada, J., and Cooper, J.A. (1992). Requirements for phosphorylation of
MAP kinase during meiosis in Xenopus oocytes. Science 255, 212–215.
Pupko, T., Pe’er, I., Hasegawa, M., Graur, D., and Friedman, N. (2002). A
branch-and-bound algorithm for the inference of ancestral amino-acid
sequences when the replacement rate varies among sites: Application to the
evolution of five gene families. Bioinformatics 18, 1116–1123.
Schoeffler, A.J., and Berger, J.M. (2008). DNA topoisomerases: harnessing
and constraining energy to govern chromosometopology. Q. Rev. Biophys.
41, 41–101.
Serber, Z., and Ferrell, J.E., Jr. (2007). Tuning bulk electrostatics to regulate
protein function. Cell 128, 441–444.
Strickfaden, S.C., Winters, M.J., Ben-Ari, G., Lamson, R.E., Tyers, M., and Pry-
ciak, P.M. (2007). A mechanism for cell-cycle regulation of MAP kinase
signaling in a yeast differentiation pathway. Cell 128, 519–531.
946 Cell 147, 934–946, November 11, 2011 ª2011 Elsevier Inc.
Taylor, J.W., and Berbee, M.L. (2006). Dating divergences in the Fungal Tree of
Life: review and new analyses. Mycologia 98, 838–849.
Thorsness, P.E., and Koshland, D.E., Jr. (1987). Inactivation of isocitrate dehy-
drogenase by phosphorylation ismediated by the negative charge of the phos-
phate. J. Biol. Chem. 262, 10422–10425.
Tjong, H., Qin, S., and Zhou, H.X. (2007). PI2PE: protein interface/interior
prediction engine. Nucleic Acids Res. 35, W357-362.
Vogt, G., Woell, S., and Argos, P. (1997). Protein thermal stability, hydrogen
bonds, and ion pairs. J. Mol. Biol. 269, 631–643.
Wan, P.T., Garnett, M.J., Roe, S.M., Lee, S., Niculescu-Duvaz, D., Good, V.M.,
Jones, C.M., Marshall, C.J., Springer, C.J., Barford, D., et al. (2004). Mecha-
nism of activation of the RAF-ERK signaling pathway by oncogenic mutations
of B-RAF. Cell 116, 855–867.
Wang, Z., Udeshi, N.D., Slawson, C., Compton, P.D., Sakabe, K., Cheung,
W.D., Shabanowitz, J., Hunt, D.F., and Hart, G.W. (2010). Extensive crosstalk
between O-GlcNAcylation and phosphorylation regulates cytokinesis. Sci.
Signal. 3, ra2.
	A Mechanism for the Evolution of Phosphorylation Sites
	Introduction
	Results
	Glutamate and Aspartate Substitution Rates at Phosphoserines versus Control Serines
	A Subset of Phosphoserines Has Very High Percentages of Asp/Glu Replacement
	Phylogenetic Analysis
	Emergence of Phosphorylation Sites at the Prokaryotic/Eukaryotic Split
	Elongation Factor eEF2
	DNA Topoisomerase II
	Emergence of Phosphorylation Sites Later in Evolution
	Enolase
	Emergence of Phosphosites in Particular Paralogs of a Multiparalog Protein Family
	Raf
	The G-Protein-Coupled Receptor Kinases
	Evolution of a Phosphosite into an Acidic Residue
	Trees for Control Serine Sites
	Topo II, Enolase, and Raf Protein Structures
	Discussion
	Experimental Procedures
	Identifying Homologs
	Alignments
	Binning and Bootstrapping
	Trees
	Supplemental Information
	Acknowledgments
	References