Acta Entomologica Sinica, 2020, Vol. 63, Issue 12: 1461-1472. doi: 10.16380/j.kcxb.2020.12.004

Research Paper

Construction and annotation of the full-length transcriptome of Nosema ceranae based on the third-generation nanopore sequencing technology

CHEN Hua-Zhi1,#, DU Yu1,#, FAN Xiao-Xue1, ZHU Zhi-Wei1, JIANG Hai-Bin1, WANG Jie1, FAN Yuan-Chan1, XIONG Cui-Ling1,2, ZHENG Yan-Zhen1, FU Zhong-Min1,2, XU Guo-Jun1, CHEN Da-Fu1,*, GUO Rui1,2,*    

(1. College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China; 2. Apitherapy Research Institute, Fujian Agriculture and Forestry University, Fuzhou 350002, China)
Online: 2020-12-20; Published: 2021-01-14

Abstract: 【Aim】 This study aims to assemble and annotate a high-quality full-length transcriptome of Nosema ceranae using Oxford Nanopore sequencing technology. 【Methods】 The transcriptome of purified N. ceranae spores was sequenced on the Nanopore PromethION platform. Full-length transcripts were identified by detecting the primer sequences at both ends of each clean read. Full-length transcripts were aligned with BLAST against the Nr, Swiss-Prot, KOG, eggNOG, Pfam, GO and KEGG databases to obtain the corresponding annotations. Long noncoding RNAs (lncRNAs) were predicted with three coding-potential tools (CPC, CNCI and CPAT) and with Pfam protein domain analysis, and transcripts in the intersection of all four predictions were retained as high-confidence lncRNAs. The expression level of each full-length transcript was calculated as CPM (counts per million). 【Results】 Nanopore PromethION sequencing yielded 6 988 795 raw reads, of which 6 953 469 clean reads remained after quality control; 5 143 999 of these clean reads were full-length. A total of 10 243 non-redundant full-length transcripts were identified, with an N50 of 1 042 bp, an average length of 894 bp and a maximum length of 4 855 bp. In addition, 9 342, 4 038, 4 283, 2 569, 4 859 and 3 450 full-length transcripts were annotated in the Nr, KOG, eggNOG, Pfam, GO and KEGG databases, respectively. The species with the largest numbers of annotated full-length transcripts were N. ceranae, Nosema apis and Nosema bombycis. In total, 87 high-confidence lncRNAs were identified, including 49 sense lncRNAs, 25 antisense lncRNAs and 13 intergenic lncRNAs. The sequencing depth in this study was sufficient to detect all expressed full-length transcripts, whose expression levels (CPM) ranged from 0.1 to more than 10 000. 【Conclusion】 This study constructed and annotated a high-quality full-length transcriptome of N. ceranae, laying a key foundation for comparative transcriptome analysis of the pathogen, analysis of alternative splicing and alternative polyadenylation of transcripts, identification of simple sequence repeat (SSR) loci, optimization of gene structures, and cloning and functional study of full-length gene sequences.
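
The summary statistics in the abstract (N50, mean and maximum length, CPM) and its two filtering rules (primers at both read ends; the four-way intersection of lncRNA predictors) follow standard definitions. The Python sketch below illustrates those definitions only; it is not the authors' pipeline. All function names, the exact-match primer test and the `window` parameter are hypothetical, and the CPM total is assumed to be the sum of counts over the transcripts supplied. Real Nanopore reads contain sequencing errors, so a production classifier would need error-tolerant primer search.

```python
from typing import Dict, Iterable, List, Set

# Complement table for uppercase DNA; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_full_length(read: str, five_prime: str, three_prime: str, window: int = 100) -> bool:
    """Hypothetical toy test: the 5' primer must occur within the first `window`
    bases and the 3' primer within the last `window` bases, on either strand.
    Exact matching only (an assumption for illustration)."""
    def has_both_ends(seq: str) -> bool:
        return five_prime in seq[:window] and three_prime in seq[-window:]

    return has_both_ends(read) or has_both_ends(revcomp(read))


def n50(lengths: Iterable[int]) -> int:
    """Length L such that transcripts of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one length")


def length_summary(lengths: List[int]) -> Dict[str, float]:
    """N50, mean and maximum length of a non-redundant transcript set."""
    return {
        "n50": n50(lengths),
        "mean": sum(lengths) / len(lengths),
        "max": max(lengths),
    }


def cpm(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: reads assigned to a transcript divided by the total
    assigned reads (assumed: summed over the transcripts given), times 10^6."""
    total = sum(read_counts.values())
    return {tx: count * 1_000_000 / total for tx, count in read_counts.items()}


def high_confidence_lncrnas(*noncoding_calls: Set[str]) -> Set[str]:
    """Keep transcripts that every predictor (e.g. CPC, CNCI, CPAT, Pfam) calls noncoding."""
    if not noncoding_calls:
        return set()
    return set.intersection(*noncoding_calls)
```

For example, `n50([100, 200, 300, 400])` returns 300, because the two longest transcripts (400 + 300 bases) already cover half of the 1 000 total bases. Taking the intersection rather than the union of the four predictors trades sensitivity for confidence, which matches the abstract's aim of reporting only high-confidence lncRNAs.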

Key words: Nosema ceranae, full-length transcriptome, long noncoding RNA, third-generation sequencing technology, nanopore sequencing