2013, Vol. 56, Issue (10): 1217-1228.

• Review •

Phylogenetic algorithms: maximum parsimony and its optimization

ZHENG Wei1,2,3, LUO A-Rong2, SHI Wei-Feng4, ZHENG Wei-Min1,5, ZHU Chao-Dong2,*   

  (1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China; 2. Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; 3. University of Chinese Academy of Sciences, Beijing 100049, China; 4. School of Basic Medical Sciences, Taishan Medical College, Tai’an, Shandong 271016, China; 5. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China)
  • Online: 2013-10-20 Published: 2013-10-20

Abstract: With the continuing development of biotechnology and advances in phylogenetics, researchers face growing challenges in reconstructing phylogenetic trees: 1) the number of species (or individuals) in the taxon under study keeps increasing; 2) the number of taxonomic characters (for example, molecular information) for each species (or individual) is also growing rapidly. In particular, driven by genome sequencing technology, phylogenetic reconstruction based on molecular information requires massive computation. Mathematical methods, computer technologies and other auxiliary tools therefore play key roles in improving the efficiency and accuracy of phylogenetic reconstruction. Maximum parsimony (MP) is an important method for phylogenetic reconstruction, and improving its computational efficiency requires the joint efforts of biologists and computer scientists. In this article, we describe the calculation procedure of the MP method in detail and analyze the influence of parameter selection on computational efficiency, in order to help computer researchers without detailed knowledge of phylogenetics to provide better, faster and more accurate solutions to practical phylogenetic reconstruction problems. We also explain the basic principles and computational logic of the MP method for phylogenetic researchers, to promote the continued improvement and optimization of maximum parsimony in biology.

Key words: Phylogenetics, phylogenetic reconstruction, algorithm, maximum parsimony, calculation procedure, computational efficiency, optimization