Acta Entomologica Sinica ›› 2023, Vol. 66 ›› Issue (12): 1626-1637. doi: 10.16380/j.kcxb.2023.12.009

• Research Article •

Phylogenetic relationships of termites inferred from the genome-scale data

SONG Nan, WANG Miao-Miao, LIU Xiao-Long, LIN Xing-Yu, XI Yu-Qiang, YIN Xin-Ming*   

  1.  (Henan International Laboratory for Green Pest Control, Henan Engineering Laboratory of Pest Biological Control, College of Plant Protection, Henan Agricultural University, Zhengzhou 450002, China)
  • Online: 2023-12-20 Published: 2024-01-21

Abstract: 【Aim】 This study aims to reconstruct the phylogenetic relationships among the higher-level taxa (families and subfamilies) of Termitoidae using transcriptome and low-coverage whole-genome sequencing data, providing a phylogenomic approach for studying the systematic evolution of termites. 【Methods】 We downloaded existing transcriptome (70) and low-coverage whole-genome (5) sequencing data for 67 termite species and 8 related species of Blattodea, and used BUSCO to assess these data and to screen for single-copy nuclear genes. The nucleotide and amino acid sequences of the resulting single-copy nuclear genes were aligned with MAFFT, and the alignments were trimmed with trimAl. PhyKIT was used to generate nucleotide and amino acid supermatrices of differing completeness (with 50% and 25% missing data, respectively) to investigate the effect of missing data on phylogenetic reconstruction. IQ-TREE was used to construct maximum likelihood (ML) trees from each matrix. In addition, ASTRAL was used to summarize the ML trees inferred from each marker in the amino acid dataset faa_all into a species tree. Finally, four-cluster likelihood mapping (FcLM) analysis implemented in IQ-TREE was used to test alternative topologies and to quantify the support of the different datasets for competing phylogenetic relationships. 【Results】 A total of 1 325 single-copy nuclear genes were obtained from the existing transcriptome and low-coverage whole-genome sequencing data of termites. Based on these genes, we constructed genome-scale supermatrices of nucleotide and amino acid sequence data, with nucleotide datasets ranging from 144 294 to 1 839 525 sites and amino acid datasets ranging from 48 098 to 613 175 sites. The different types of data matrices yielded similar phylogenetic relationships within Termitoidae, and the three nucleotide matrices produced identical inter-family relationships. For the amino acid data, two of the three concatenated gene datasets produced inter-family relationships largely consistent with those from the nucleotide datasets. The results support the monophyly of Termitoidae and indicate that Mastotermitidae is the sister group to all other termite families. In most analyses, Archotermopsidae and Stolotermitidae were sister groups, together forming the second-diverging branch within Termitoidae. Kalotermitidae is also a relatively ancient termite lineage, diverging after Archotermopsidae and Stolotermitidae, and is the sister group of Neoisoptera. All analyses strongly support the monophyly of Neoisoptera. Within Neoisoptera, Stylotermitidae is the sister group to the remaining lineages, and Serritermitidae is also a relatively basal lineage. Rhinotermitidae is non-monophyletic. Termitidae is monophyletic, with four of the six concatenated gene datasets and the species tree supporting the subfamily Macrotermitinae as the sister group to all other subfamilies within Termitidae. Most analyses support the subfamily Apicotermitinae as the second-diverging branch within Termitidae. The concatenated gene data matrices support the non-monophyly of Termitinae, whereas the species tree recovered Termitinae as monophyletic. 【Conclusion】 This study demonstrates the utility of transcriptome and low-coverage whole-genome sequencing data for reconstructing phylogenetic relationships within Termitoidae, yielding results largely consistent with previous studies. However, further data sampling, including both specimens and molecular markers, is needed to resolve the inter-subfamily relationships within this insect group.
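
The supermatrix step described in the Methods can be illustrated with a minimal Python sketch. It assumes that the 50% and 25% thresholds refer to the maximum fraction of taxa allowed to be absent from a gene alignment, which the abstract does not specify, and it pads absent taxa with gap characters when concatenating. The function names (`missing_fraction`, `filter_genes`, `concatenate`) and the toy alignments are hypothetical; they are not PhyKIT's interface or the authors' pipeline. The two assertions at the top only check arithmetic on the reported figures: each nucleotide site count is exactly three times the corresponding amino acid count.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence

# Arithmetic check on the site counts reported in the abstract:
# nucleotide sites = 3 x amino acid sites for both ends of the range.
assert 3 * 48_098 == 144_294
assert 3 * 613_175 == 1_839_525


def missing_fraction(alignment: Alignment, taxa: List[str]) -> float:
    """Fraction of taxa that have no sequence in this gene alignment."""
    absent = sum(1 for taxon in taxa if taxon not in alignment)
    return absent / len(taxa)


def filter_genes(genes: Dict[str, Alignment], taxa: List[str],
                 max_missing: float) -> Dict[str, Alignment]:
    """Keep genes whose fraction of missing taxa is at most max_missing."""
    return {name: aln for name, aln in genes.items()
            if missing_fraction(aln, taxa) <= max_missing}


def concatenate(genes: Dict[str, Alignment], taxa: List[str]) -> Alignment:
    """Join gene alignments in sorted gene order, padding absent taxa with gaps."""
    matrix: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for name in sorted(genes):
        aln = genes[name]
        length = len(next(iter(aln.values())))  # all rows share one length
        for taxon in taxa:
            matrix[taxon].append(aln.get(taxon, "-" * length))
    return {taxon: "".join(parts) for taxon, parts in matrix.items()}


# Toy data, invented for illustration only (not from the study).
taxa = ["A", "B", "C", "D"]
genes = {
    "gene1": {"A": "MKT", "B": "MKS", "C": "MRT", "D": "MKT"},  # 0% missing
    "gene2": {"A": "LV", "B": "LI", "C": "LV"},                 # 25% missing
    "gene3": {"A": "GGAS", "B": "GGAT"},                        # 50% missing
}

strict = concatenate(filter_genes(genes, taxa, 0.25), taxa)
relaxed = concatenate(filter_genes(genes, taxa, 0.50), taxa)
print(len(strict["A"]), len(relaxed["A"]))
```

With the toy data, the stricter 25% threshold keeps two genes (5 columns) and the 50% threshold keeps all three (9 columns), which mirrors the trade-off between matrix completeness and matrix size that the study examines.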

Key words: Termite, transcriptome, whole-genome, single-copy nuclear gene, phylogeny, monophyletic group