Acta Entomologica Sinica ›› 2020, Vol. 63 ›› Issue (12): 1461-1472. doi: 10.16380/j.kcxb.2020.12.004

Construction and annotation of the full-length transcriptome of Nosema ceranae based on the third-generation nanopore sequencing technology

CHEN Hua-Zhi1,#, DU Yu1,#, FAN Xiao-Xue1, ZHU Zhi-Wei1, JIANG Hai-Bin1, WANG Jie1, FAN Yuan-Chan1, XIONG Cui-Ling1,2, ZHENG Yan-Zhen1, FU Zhong-Min1,2, XU Guo-Jun1, CHEN Da-Fu1,*, GUO Rui1,2,*    

  (1. College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China; 2. Apitherapy Research Institute, Fujian Agriculture and Forestry University, Fuzhou 350002, China)
  • Online: 2020-12-20 Published: 2021-01-14

Abstract: 【Aim】 This study aims to assemble and annotate a high-quality full-length transcriptome of Nosema ceranae using Oxford Nanopore sequencing technology. 【Methods】 The transcriptome of purified N. ceranae spores was sequenced on the Nanopore PromethION system. Full-length transcripts were identified by detecting primers at both ends of each clean read. Full-length transcripts were aligned to the Nr, Swiss-Prot, KOG, eggNOG, Pfam, GO and KEGG databases to obtain the corresponding annotations. Coding potential was assessed with CPC, CNCI, CPAT and Pfam protein domain analysis to predict long noncoding RNAs (lncRNAs), and the intersection of the four predictions was taken as the set of high-reliability lncRNAs. The expression level of each full-length transcript was calculated using the CPM (counts per million) method. 【Results】 A total of 6 988 795 raw reads were obtained from the Nanopore PromethION sequencing system, and 6 953 469 clean reads remained after quality control, including 5 143 999 full-length transcripts. From these, 10 243 non-redundant full-length transcripts were identified, with an N50, mean length and maximum length of 1 042, 894 and 4 855 bp, respectively. Furthermore, 9 342, 4 038, 4 283, 2 569, 4 859 and 3 450 full-length transcripts were annotated to Nr, KOG, eggNOG, Pfam, GO and KEGG, respectively. The majority of full-length transcripts were annotated to N. ceranae, Nosema apis and Nosema bombycis. In total, 87 high-reliability lncRNAs were identified, including 49 sense lncRNAs, 25 antisense lncRNAs and 13 intergenic lncRNAs. The sequencing depth in this study was sufficient to detect all expressed full-length transcripts, whose expression levels (CPM) ranged from 0.1 to more than 10 000. 【Conclusion】 The high-quality full-length transcriptome of N. ceranae was constructed and annotated in this study, laying a key foundation for comparative transcriptome analysis, investigation of alternative splicing and alternative polyadenylation of transcripts, identification of simple sequence repeat (SSR) loci, optimization of gene structure, and full-length sequence cloning and functional study of genes.

Key words: Nosema ceranae, full-length transcriptome, long noncoding RNA, third-generation sequencing technology, nanopore sequencing
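
Note: The abstract names three computations without spelling them out: CPM normalization, the N50 length statistic, and the intersection of several lncRNA predictions. The following minimal Python sketch illustrates them under stated assumptions. It is not the authors' pipeline. CPM is assumed here to be a transcript's read count divided by the total count over all transcripts, times one million, and the function names (`cpm`, `n50`, `high_reliability`) and all inputs are hypothetical.

```python
from typing import Dict, List, Set


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: each count scaled by the total count (assumed definition)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("total read count is zero")
    return {tid: c * 1_000_000 / total for tid, c in counts.items()}


def n50(lengths: List[int]) -> int:
    """Length L such that transcripts of length >= L cover at least half of all bases."""
    if not lengths:
        raise ValueError("no transcript lengths given")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return min(lengths)  # not reached for positive lengths; kept as a guard


def high_reliability(*predictions: Set[str]) -> Set[str]:
    """Transcript IDs called noncoding by every prediction set supplied."""
    if not predictions:
        return set()
    return set.intersection(*predictions)


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not data from the study.
    print(cpm({"t1": 1, "t2": 3}))
    print(n50([2, 3, 4, 5, 6]))
    print(high_reliability({"x", "y"}, {"y", "z"}, {"y"}))
```

Taking the intersection rather than the union is the conservative choice: a transcript is kept only if no tool assigns it coding potential, which trades sensitivity for fewer false-positive lncRNA calls.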