Acta Entomologica Sinica ›› 2020, Vol. 63 ›› Issue (11): 1345-1357. doi: 10.16380/j.kcxb.2020.11.007

• RESEARCH PAPERS •

Elongation of genic untranslated regions, exploration of SSR loci and identification of unannotated genes and transcripts based on the nanopore sequencing dataset of Ascosphaera apis

DU Yu1,#, FU Zhong-Min1,#, ZHU Zhi-Wei1, WANG Jie1, FENG Rui-Rong1, WANG Xiu-Na2,3, JIANG Hai-Bin1, FAN Yuan-Chan1, FAN Xiao-Xue1, XIONG Cui-Ling1, ZHENG Yan-Zhen1, XU Guo-Jun1, CHEN Da-Fu1, GUO Rui1,*

  (1. College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002, China; 2. College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China; 3. Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Fujian Agriculture and Forestry University, Fuzhou 350002, China)
  • Online: 2020-11-20 Published: 2020-12-08

Abstract: 【Aim】 This study aims to improve the annotation of the current reference genome of Ascosphaera apis using previously obtained nanopore long-read sequencing data, and to identify and functionally annotate previously unannotated genes and transcripts. 【Methods】 Based on the previously obtained nanopore long-read sequencing data, full-length transcripts of A. apis were compared with the transcripts annotated in the reference genome using gffcompare to extend the untranslated regions (UTRs) of annotated genes. The open reading frames (ORFs) of A. apis genes and their corresponding amino acid sequences were predicted using TransDecoder. MISA was used to survey simple sequence repeat (SSR) loci in transcripts longer than 500 bp. Novel genes and novel transcripts were aligned against the Nr, KOG, eggNOG, Swiss-Prot, Pfam, GO and KEGG databases using BLAST to obtain their functional annotations. 【Results】 In total, the UTRs of 9 481 genes in A. apis were extended, of which 4 744 were extended at the 5′UTR and 4 737 at the 3′UTR. In addition, 10 492 complete ORFs were predicted; ORFs encoding proteins of 0-100 aa and 100-200 aa in length were the most abundant, accounting for 38.96% and 36.90% of the total ORFs, respectively. A total of 5 286 SSRs were identified, comprising 1 870 mononucleotide, 826 dinucleotide, 2 398 trinucleotide, 138 tetranucleotide, 43 pentanucleotide and 11 hexanucleotide repeats. Furthermore, 1 558 novel genes were identified, of which 1 556, 731, 330, 592, 1 177, 709 and 589 were annotated in the Nr, Swiss-Prot, Pfam, KOG, eggNOG, GO and KEGG databases, respectively. Additionally, 14 403 novel transcripts were identified, of which 14 376, 8 524, 7 276, 7 405, 12 035, 7 891 and 6 855 were annotated in these seven databases, respectively.
【Conclusion】 With the previously obtained nanopore long-read sequencing data, the complete ORFs of A. apis genes have been predicted, the UTRs of annotated genes in the reference genome have been extended, SSR loci have been surveyed, and a number of previously unannotated genes and transcripts have been identified and functionally annotated. These findings improve the current genome annotation of A. apis and provide a basis for further omics and molecular biology studies of this fungus.
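
The SSR survey summarized above can be illustrated with a minimal Python sketch. It is not the MISA implementation used in the study: it only detects perfect repeats of 1 to 6 nt with a regular expression. The minimum repeat counts in `MIN_REPEATS` and the function name `find_ssrs` are illustrative assumptions, since the abstract does not report the thresholds applied. Only the 500 bp transcript length cutoff comes from the text.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative only;
# the abstract does not state the thresholds used in the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq, min_len=500):
    """Return sorted (motif, repeat_count, start) tuples for perfect SSRs.

    Sequences of min_len bases or fewer are skipped, mirroring the
    'longer than 500 bp' filter described in the Methods.
    """
    seq = seq.upper()
    if len(seq) <= min_len:
        return []
    hits = []
    for k, n in MIN_REPEATS.items():
        # A k-nt unit followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((motif, len(m.group(0)) // k, m.start()))
    return sorted(hits, key=lambda h: h[2])


if __name__ == "__main__":
    # Toy sequence: a mono-, a di- and a trinucleotide repeat.
    toy = "A" * 12 + "GC" + "AT" * 6 + "TT" + "CAG" * 5
    for motif, count, start in find_ssrs(toy, min_len=0):
        print(f"{motif} x{count} at position {start}")
```

Grouping the returned tuples by motif length would give per-class counts of the kind reported in the Results (mononucleotide through hexanucleotide). The totals would depend on the thresholds chosen, and compound or imperfect SSRs are not handled here.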

Key words: Ascosphaera apis, long-read sequencing technology, full-length transcriptome, genome, honeybee, chalkbrood