Acta Entomologica Sinica, 2024, Vol. 67, Issue (3): 346-357. doi: 10.16380/j.kcxb.2024.03.005

• RESEARCH PAPERS •

Improvement of the sequences and functional annotations of the Apis cerana reference genome with the nanopore long-read data of the gut transcriptome of larval A. cerana cerana workers

LI Kun-Ze1,#, SONG Yu-Xuan1,#, ZANG He1, JING Xin1, FAN Xiao-Xue1, CHEN Ying1, NA Zhi-Hao1, CHEN Da-Fu1,2,3, FU Zhong-Min1,2,3,*, GUO Rui1,2,3,*

(1. College of Bee Science and Biomedicine, Fujian Agriculture and Forestry University, Fuzhou 350002, China; 2. National & Local United Engineering Laboratory of Natural Biotoxin, Fuzhou 350002, China; 3. Apicultural Research Institute of Fujian Province, Fuzhou 350002, China)
• Online: 2024-03-20  Published: 2024-04-17

Abstract: 【Aim】 To improve the sequence and functional annotations of the Apis cerana reference genome, nanopore long-read transcriptome data of Apis cerana cerana were compared with the reference genome of A. cerana to optimize the structures of the annotated genes. Unannotated novel genes and novel transcripts were identified and functionally annotated, and their SSR loci, complete ORFs, and transcription factor (TF) families and members were predicted and verified. 【Methods】 Based on high-quality nanopore transcriptome sequencing data from the guts of 4-, 5- and 6-day-old larvae of A. cerana cerana workers infected with Ascosphaera apis, the identified full-length transcripts were compared with the reference genome of A. cerana using gffcompare software to optimize the structures of the annotated genes. Novel genes and transcripts not annotated in the reference genome were also identified with gffcompare and then aligned against the Nr, KOG, eggNOG, GO and KEGG databases for functional annotation. MISA, TransDecoder v3.0.0 and AnimalTFDB 2.0 were used to predict SSR loci, complete ORFs, and TF families and members, respectively. 【Results】 A total of 4 648 annotated genes in the reference genome of A. cerana were structurally optimized: both the 5′UTR and the 3′UTR were extended for 1 336 genes, only the 5′UTR for 1 688 genes, and only the 3′UTR for 1 624 genes. A total of 2 148 novel genes were identified, of which 818, 298, 587, 359 and 333 could be annotated to the Nr, KOG, eggNOG, GO and KEGG databases, respectively. A total of 35 432 novel transcripts were identified, of which 30 974, 21 222, 29 025, 19 852 and 9 214 could be annotated to the same five databases, respectively. A total of 22 541 SSR loci were detected, of which 12 078, 7 140, 2 825 and 43 were mono-, di-, tri- and hexanucleotide repeats, respectively. The number of mixed SSRs was 2 964, and mononucleotide repeats were the most frequent type (153.37/Mb). In addition, 58 TF families with 1 611 members were predicted. A total of 28 775 complete ORFs were predicted, among which ORFs encoding 100-200 aa were the most abundant (38.99%). 【Conclusion】 These results optimize the structures of the annotated genes in the A. cerana reference genome and supplement it with novel genes, novel transcripts, SSRs, complete ORFs and TFs that were previously unannotated.

Key words: Apis cerana, A. cerana cerana, 3rd-generation sequencing technology, nanopore sequencing, full-length transcript, transcriptome, genome
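
As an illustration of how SSR loci such as those reported in the abstract are classified by repeat-unit length, the following is a minimal Python sketch. It is not MISA and does not reproduce the study's pipeline. The minimum repeat thresholds in `MIN_REPEATS`, the function names `find_ssrs` and `density_per_mb`, and the example sequence are all assumptions made for illustration; the parameters actually used in the study are not given in the abstract. Mixed SSRs (adjacent repeats merged into one locus) are not handled.

```python
import re
from collections import Counter

# Assumed thresholds (unit length -> minimum number of repeats).
# The study's actual MISA settings are not stated in the abstract.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit in sorted(min_repeats):
        # One motif of `unit` bases followed by at least (min - 1) copies.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats[unit] - 1)
        )
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated
            # (e.g. "AA" must not be counted as a dinucleotide SSR).
            if any(
                motif == motif[:k] * (unit // k)
                for k in range(1, unit)
                if unit % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


def density_per_mb(count, total_bases):
    """SSR frequency expressed as loci per megabase of sequence."""
    return count / (total_bases / 1_000_000)


# Hypothetical example sequence, not taken from the study.
example = "GATTACA" + "A" * 12 + "CG" + "TC" * 7 + "GGC" + "ATG" * 5 + "C"
ssrs = find_ssrs(example)
by_unit_length = Counter(len(motif) for _, motif, _ in ssrs)
print(ssrs)
print(dict(by_unit_length))
```

Grouping the hits by motif length, as `by_unit_length` does, gives the per-class counts (mono-, di-, trinucleotide and so on), and `density_per_mb` converts a count into the per-megabase frequency used in the abstract.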