Protein Structure Modeling
400 for SwissProt, and hence the average protein is expected
to have two structural domains that must be examined.
  4. If no domains can be detected, one can resort to identifying
“block structures” in a multiple sequence alignment. The
multiple sequence alignment can be generated using BLAST or
PSI-BLAST from the NCBI webpage,
http://blast.ncbi.nlm.nih.gov/Blast.cgi. The alignment of a longer
protein sometimes has a “blocky” appearance, in which one part of
the sequence has numerous homologs that do not cover the other
parts. These blocks are indicative of domains, so putative
domains can be identified from the block boundaries.
  5. The online databases are quite comprehensive, but newly
sequenced proteins are, for obvious reasons, not present.
However, because all the tools presented here are available via
web services, it is possible to model these proteins too.
  6. Some proteins belong to less-studied protein families, for
which most of these techniques fail. Note that the tools
presented herein depend on knowing something about homologs
of the protein of interest.
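
The block-structure idea in Note 4 can be illustrated with a minimal sketch. It assumes the alignment has already been reduced to a list of (start, end) ranges on the query sequence, one per homolog hit (for example, the query start and end columns of a tabular BLAST report). The function names and the `min_jump_fraction` threshold are hypothetical choices made for illustration and are not part of any of the tools described here. The sketch counts how many hits cover each query position and reports positions where that coverage changes sharply, which may mark putative domain boundaries.

```python
from typing import List, Tuple


def coverage_profile(query_len: int, hits: List[Tuple[int, int]]) -> List[int]:
    """Count how many hits cover each query position.

    Hit ranges are 1-based and inclusive; the returned list has one
    entry per query position (index 0 corresponds to position 1).
    """
    diff = [0] * (query_len + 2)
    for start, end in hits:
        start = max(1, start)
        end = min(query_len, end)
        if start > end:
            continue  # hit lies entirely outside the query
        diff[start] += 1
        diff[end + 1] -= 1

    coverage = []
    running = 0
    for pos in range(1, query_len + 1):
        running += diff[pos]
        coverage.append(running)
    return coverage


def block_boundaries(coverage: List[int], min_jump_fraction: float = 0.5) -> List[int]:
    """Return 1-based positions p where coverage changes sharply between p and p + 1.

    A change counts as sharp when it is at least min_jump_fraction of the
    maximum coverage (an arbitrary, illustrative threshold).
    """
    peak = max(coverage, default=0)
    if peak == 0:
        return []
    threshold = min_jump_fraction * peak
    return [
        i + 1
        for i in range(len(coverage) - 1)
        if abs(coverage[i + 1] - coverage[i]) >= threshold
    ]


if __name__ == "__main__":
    # Toy example: ten homologs cover only residues 1-120,
    # two homologs cover the full 300-residue query.
    hits = [(1, 120)] * 10 + [(1, 300)] * 2
    profile = coverage_profile(300, hits)
    print(block_boundaries(profile))
```

Real alignments have ragged hit ends, so coverage rarely drops in a single step; in practice the profile would need smoothing, or a comparison of average coverage in windows on either side of each position, before boundaries are called. Any boundary found this way is only a hypothesis to be checked by inspecting the alignment.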