Department of Macromolecular Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai 200433
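Li Jian-feng, male, born in 1980. He received his B.S., M.S. and Ph.D. degrees from the Department of Macromolecular Science, Fudan University, between 1999 and 2010, and from 2007 to 2009 studied at McMaster University, Canada, as a government-sponsored student abroad. He was a lecturer (2012 to 2013) and then an associate professor (2013 to 2019) in the Department of Macromolecular Science, Fudan University, and has been a professor there since 2019. His research focuses on polymer entanglement theory, applications of machine learning in polymer physics, non-equilibrium thermodynamic methods, and the construction of theoretical models of the brain.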
Print publication date: 2022-06-20
Online publication date: 2022-03-17
Received: 2021-12-28
Accepted: 2022-01-28
Wang Tian-yao,Li Jian-feng.Application of Deep Learning in Protein Structure Prediction and Its Inspirations[J].ACTA POLYMERICA SINICA,2022,53(06):581-591. DOI: 10.11777/j.issn1000-3304.2021.21401.
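Protein structure prediction usually refers to inferring the three-dimensional structure of a protein from its amino acid sequence by computer simulation. Because the spatial structure determines a protein's physiological function, the structure prediction problem is particularly important. Predictions based purely on physics can handle only relatively short proteins, and with limited accuracy, whereas data-driven and bioinformatics-based methods have received increasing attention over the past decade or more. This paper reviews the application of deep learning to protein structure prediction over that period, focusing on the AlphaFold method developed by the DeepMind team. For single-domain proteins, AlphaFold reaches the accuracy of medium- to low-resolution experiments, and it has, to some extent, solved the protein structure prediction problem that has puzzled researchers for more than fifty years.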
The goal of protein structure prediction is to determine, usually by computer simulation or calculation, the three-dimensional structure of a protein from its amino acid sequence. The problem matters because the 3D structure of a protein in turn determines its biological function. Traditional physics-based prediction methods, however, can handle only short proteins effectively, and even then with low accuracy. Over the past decade, data-driven methods and bioinformatics-based methods that exploit genetic information have become popular. This review covers several important developments in deep-learning methods for protein structure prediction over the past ten years. Considering the educational background of this journal's readers, we first present a self-contained but concise introduction to the prerequisite concepts and methods, related to genetic information and to the basics of deep learning, that are needed to understand deep-learning approaches to protein structure prediction. These include the position-specific scoring matrix (PSSM), multiple sequence alignment (MSA), the contact map, the distogram, the Protein Data Bank (PDB), the Critical Assessment of protein Structure Prediction (CASP), the template modeling score (TM-score), the universal approximation theorem, and several important types of neural network closely related to protein structure prediction. We then compare the two most popular approaches to data-driven protein structure prediction: template-based methods and template-free methods. Among the deep-learning methods, the AlphaFold method from DeepMind is discussed in particular detail; it has achieved prediction accuracy comparable to medium- or low-resolution experiments, leading some to regard the protein structure prediction problem as solved to some extent. Because all of the above methods can be overwhelming for beginners, this review also introduces newcomers to the simplest structure prediction problem, the HP protein prediction problem, together with its deep-learning solution, the strongly-correlated neural network.
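The abstract lists the position-specific scoring matrix (PSSM) among the prerequisite concepts. As a rough illustration of what such a matrix holds, the sketch below turns the columns of a toy MSA into log-odds scores, log2(f_a / q_a), where f_a is the observed frequency of amino acid a in a column. The function name `pssm`, the flat pseudocount of 1 and the uniform background q_a = 1/20 are all simplifying assumptions of this example; production tools such as PSI-BLAST use sequence weighting and real background frequencies.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def pssm(msa, pseudocount=1.0):
    """Hypothetical helper: log-odds PSSM, one dict of 20 scores per MSA column.

    Assumes a uniform background q_a = 1/20 and a flat pseudocount;
    gap characters ('-') are simply not counted.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for col in range(length):
        # observed residue counts in this alignment column
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        row = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total  # smoothed frequency
            row[aa] = math.log2(freq / background)      # log-odds score
        matrix.append(row)
    return matrix
```

For the toy alignment `["ACDE", "ACDF", "SCDE"]`, the fully conserved column 1 gives cysteine a positive score and every other residue a negative one, which is the sense in which a PSSM encodes evolutionary conservation.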
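The contact map and the distogram named in the abstract are both derived from inter-residue distances. The sketch below computes them from a list of coordinates, assuming C-beta atoms and the common 8 Å contact cutoff used in CASP contact assessment. The bin edges are arbitrary illustrative values, and the helper names are hypothetical. A network such as AlphaFold predicts a probability distribution over distance bins. A known structure yields the one-hot version computed here. The diagonal (a residue with itself) is trivially marked as a contact; real evaluations also ignore pairs that are close in sequence.

```python
import bisect
import math


def pairwise_distances(coords):
    """Euclidean distance matrix for a list of (x, y, z) coordinates in angstroms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_map(dist, cutoff=8.0):
    """Binary contact map: 1 where residues i and j are closer than `cutoff`."""
    return [[1 if d < cutoff else 0 for d in row] for row in dist]


def distogram(dist, edges):
    """One-hot distance bin for every residue pair (len(edges) + 1 bins).

    Bin k collects distances with edges[k-1] <= d < edges[k]; bin 0 holds
    everything below edges[0] and the last bin everything >= edges[-1].
    """
    n_bins = len(edges) + 1
    grams = []
    for row in dist:
        out_row = []
        for d in row:
            one_hot = [0] * n_bins
            one_hot[bisect.bisect_right(edges, d)] = 1
            out_row.append(one_hot)
        grams.append(out_row)
    return grams
```

With four residues placed on a line at x = 0, 3.8, 7.6 and 20 Å, the first three are mutually in contact while the last one lies in the open-ended top bin for every partner.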
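The TM-score mentioned in the abstract is defined, for a target of length L, as TM = max[(1/L) Σ_i 1/(1 + (d_i/d0)^2)], with d0 = 1.24 (L - 15)^(1/3) - 1.8 Å, where d_i is the distance between the i-th pair of aligned residues and the maximum is taken over structural superpositions. The sketch below evaluates only the bracketed sum for one fixed superposition. It assumes the model and native C-alpha coordinates are already residue-matched and superimposed. The search over superpositions performed by the reference TM-score program is deliberately omitted, and the function name is hypothetical.

```python
import math


def tm_score_fixed(model, native):
    """TM-score term for a FIXED superposition (no optimization over rotations).

    `model` and `native` are equal-length lists of (x, y, z) C-alpha coordinates,
    already residue-matched and superimposed; normalization is by the native length.
    """
    length = len(native)
    if len(model) != length:
        raise ValueError("model and native must be residue-matched")
    if length <= 15:
        # this sketch rejects short chains, where the d0 formula breaks down
        raise ValueError("d0 formula needs more than 15 residues")
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8
    total = sum(1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
                for a, b in zip(model, native))
    return total / length
```

Identical structures score 1; shifting every residue by exactly d0 halves each term and gives 0.5. The length-dependent d0 is what makes scores comparable across proteins of different sizes.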
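In the HP protein prediction problem introduced for beginners, residues are either hydrophobic (H) or polar (P), the chain is a self-avoiding walk on a lattice, and each pair of H residues that are lattice neighbours but not chain neighbours contributes an energy ε. The predicted structure is the lowest-energy conformation. The sketch below only scores one given conformation. It is not the strongly-correlated neural network discussed in the review. It assumes a 2D square lattice, ε = -1 and a hypothetical encoding of the conformation as a string of U/D/L/R moves.

```python
def hp_energy(sequence, moves, epsilon=-1.0):
    """Energy of an HP chain on the 2D square lattice (hypothetical helper).

    `sequence` is a string of 'H'/'P'; `moves` holds len(sequence) - 1 steps
    from 'U', 'D', 'L', 'R'. Every non-bonded H-H lattice contact adds `epsilon`.
    """
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    step = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
    coords = [(0, 0)]
    for m in moves:
        dx, dy = step[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    position = {xy: i for i, xy in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in step.values():
            j = position.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbours and counts each pair once
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return epsilon * contacts
```

For example, `hp_energy("HPPH", "RUL")` folds the chain into a square so that the two terminal H residues touch, giving -1.0, whereas the extended conformation `"RRR"` has no contacts and zero energy. Exhaustively enumerating all such walks quickly becomes infeasible as the chain grows, which is what motivates learned predictors even for this toy model.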
Keywords: Protein folding; Deep learning; Neural network; Structure prediction