Improvement of the sequences and functional annotations of the Apis cerana reference genome with the nanopore long-read data of the gut transcriptome of larval A. cerana cerana workers
LI Kun-Ze, SONG Yu-Xuan, ZANG He, JING Xin, FAN Xiao-Xue, CHEN Ying, NA Zhi-Hao, CHEN Da-Fu, FU Zhong-Min, GUO Rui
2024, 67(3): 346-357. doi:10.16380/j.kcxb.2024.03.005
Abstract
【Aim】 The nanopore long-read transcriptome data obtained from Apis cerana cerana were compared with the A. cerana reference genome to optimize the structures of annotated genes. Unannotated novel genes and novel transcripts were identified and functionally annotated. Their SSR loci, complete ORFs, and transcription factor (TF) families and members were predicted and verified, so as to improve the sequences and functional annotations of the A. cerana reference genome.
【Methods】 The study used high-quality nanopore transcriptome sequencing data from 4-, 5- and 6-day-old larvae of A. cerana cerana workers infected with Ascosphaera apis. The identified full-length transcripts were mapped to the A. cerana reference genome and compared with its annotation using gffcompare to optimize the structures of annotated genes. Novel genes and transcripts absent from the reference genome annotation were identified with gffcompare and functionally annotated against the Nr, KOG, eggNOG, GO and KEGG databases. MISA, TransDecoder v3.0.0 and AnimalTFDB 2.0 were used to predict SSR loci, complete ORFs, and TF families and members, respectively.
【Results】 A total of 4 648 annotated genes in the A. cerana reference genome were structurally optimized. Both the 5′UTR and the 3′UTR were extended for 1 336 genes, only the 5′UTR for 1 688 genes, and only the 3′UTR for 1 624 genes. A total of 2 148 novel genes were identified, of which 818, 298, 587, 359 and 333 were annotated to the Nr, KOG, eggNOG, GO and KEGG databases, respectively. A total of 35 432 novel transcripts were identified, of which 30 974, 21 222, 29 025, 19 852 and 9 214 were annotated to these five databases, respectively. A total of 22 541 SSR loci were detected, including 12 078 mononucleotide, 7 140 dinucleotide, 2 825 trinucleotide and 43 hexanucleotide repeats.
Mixed (compound) SSRs numbered 2 964, and mononucleotide repeats had the highest distribution frequency (153.37/Mb). A total of 1 611 members belonging to 58 TF families were predicted. A total of 28 775 complete ORFs were predicted, among which ORFs encoding 100-200 aa were the most abundant (38.99%).
【Conclusion】 These results optimize the structures of the annotated genes in the A. cerana reference genome. They also supplement the reference genome with novel genes, novel transcripts, SSRs, complete ORFs and TFs that it previously lacked.
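The three UTR categories in the Results add up to the 4 648 optimized genes (1 336 + 1 688 + 1 624 = 4 648), which suggests they are mutually exclusive. The following is a minimal Python sketch of such a classification. It assumes each annotated gene model and its full-length transcript can be reduced to a start/end coordinate pair on the same strand. `Span`, `classify_extension` and the example genes are hypothetical and are not the authors' pipeline.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int   # 1-based, inclusive genomic start
    end: int     # 1-based, inclusive genomic end
    strand: str  # "+" or "-"


def classify_extension(annotated: Span, full_length: Span) -> str:
    """Report which UTR ends a full-length transcript extends beyond the gene model."""
    if annotated.strand != full_length.strand:
        raise ValueError("gene model and transcript must be on the same strand")
    extends_left = full_length.start < annotated.start
    extends_right = full_length.end > annotated.end
    # On the minus strand the 5' end is the higher genomic coordinate.
    if annotated.strand == "+":
        five_prime, three_prime = extends_left, extends_right
    else:
        five_prime, three_prime = extends_right, extends_left
    if five_prime and three_prime:
        return "both"
    if five_prime:
        return "5prime_only"
    if three_prime:
        return "3prime_only"
    return "unchanged"


# Hypothetical gene models paired with a full-length transcript each.
pairs = {
    "geneA": (Span(1000, 2000, "+"), Span(900, 2100, "+")),   # both ends longer
    "geneB": (Span(1000, 2000, "+"), Span(950, 2000, "+")),   # 5' end only
    "geneC": (Span(1000, 2000, "-"), Span(1000, 2300, "-")),  # minus strand, 5' only
    "geneD": (Span(1000, 2000, "-"), Span(800, 2000, "-")),   # minus strand, 3' only
}
counts = Counter(classify_extension(ann, tx) for ann, tx in pairs.values())
print(dict(counts))

# The abstract's three disjoint categories sum to the reported total.
assert 1336 + 1688 + 1624 == 4648
```

Keeping the strand logic in one place matters because a naive left/right comparison would swap the 5′ and 3′ labels for every minus-strand gene.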
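gffcompare tags each query transcript with a class code describing how it relates to the reference annotation. Novel genes and transcripts can then be picked out by filtering on those codes. The sketch below assumes that code "u" (no overlap with any reference transcript) marks a novel gene and that code "j" (shares at least one splice junction with a reference transcript) marks a novel isoform. This is an illustrative choice, not the study's documented criterion. The GTF records and gene IDs are hypothetical.

```python
import re

# Assumption: illustrative code choices, not the authors' stated filter.
NOVEL_GENE_CODES = {"u"}      # intergenic: no reference overlap
NOVEL_ISOFORM_CODES = {"j"}   # shares at least one junction with a reference

ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field: str) -> dict:
    """Parse a GTF attribute column such as 'gene_id "G1"; class_code "u";'."""
    return dict(ATTR_RE.findall(field))


def find_novel(gtf_text: str):
    """Return (novel gene IDs, novel transcript IDs) from 'transcript' records."""
    novel_genes, novel_transcripts = set(), []
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        code = attrs.get("class_code")
        if code in NOVEL_GENE_CODES:
            # Transcripts of a novel gene are counted as novel transcripts too.
            novel_genes.add(attrs["gene_id"])
            novel_transcripts.append(attrs["transcript_id"])
        elif code in NOVEL_ISOFORM_CODES:
            novel_transcripts.append(attrs["transcript_id"])
    return novel_genes, novel_transcripts


# Hypothetical records shaped like gffcompare's annotated GTF output.
RECORDS = [
    ("chr1", 100, 900, "+", "tx1", "G1", "="),
    ("chr1", 100, 950, "+", "tx2", "G1", "j"),
    ("chr2", 5000, 6200, "-", "tx3", "G2", "u"),
    ("chr2", 5100, 6200, "-", "tx4", "G2", "u"),
]
GTF_TEXT = "\n".join(
    f'{chrom}\tnanopore\ttranscript\t{s}\t{e}\t.\t{strand}\t.\t'
    f'transcript_id "{tx}"; gene_id "{gene}"; class_code "{code}";'
    for chrom, s, e, strand, tx, gene, code in RECORDS
)

genes, transcripts = find_novel(GTF_TEXT)
print(genes, transcripts)
```

Which codes count as "novel" strongly affects the totals. The set used here is deliberately narrow.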
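MISA reports perfect microsatellites whose motif length (1-6 bp) is repeated at least a minimum number of times. It groups SSRs lying close together into compound (mixed) SSRs, and frequencies such as 153.37/Mb are loci per megabase of analysed sequence. The following simplified Python sketch illustrates these three steps. It assumes thresholds of 10/6/5/5/5/5 copies and a 100-bp compound distance, which we take to be MISA's usual defaults. The abstract does not state the study's actual settings. This is a re-implementation for illustration, not MISA itself.

```python
from dataclasses import dataclass

# Assumption: minimum copies per motif length, mirroring common MISA defaults.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


@dataclass(frozen=True)
class SSR:
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    motif: str
    copies: int


def is_primitive(motif: str) -> bool:
    """True unless the motif is a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str, min_repeats=MIN_REPEATS) -> list:
    """Scan left to right for perfect tandem repeats of 1-6 bp primitive motifs."""
    seq = seq.upper()
    hits = []
    for k, min_copies in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if "N" in motif or not is_primitive(motif):
                i += 1
                continue
            j = i + k
            while seq[j:j + k] == motif:  # extend while the motif keeps repeating
                j += k
            copies = (j - i) // k
            if copies >= min_copies:
                hits.append(SSR(i, j, motif, copies))
                i = j  # resume scanning after this repeat
            else:
                i += 1
    return sorted(hits, key=lambda h: h.start)


def compound_groups(hits: list, max_gap: int = 100) -> list:
    """Group SSRs separated by at most max_gap bases into compound SSRs."""
    groups, current = [], []
    for hit in hits:
        if current and hit.start - current[-1].end <= max_gap:
            current.append(hit)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [hit]
    if len(current) > 1:
        groups.append(current)
    return groups


def density_per_mb(count: int, total_bp: int) -> float:
    """SSR frequency as loci per megabase of analysed sequence."""
    return count / (total_bp / 1_000_000)


# Hypothetical sequence with a mono-, a di- and a trinucleotide repeat.
seq = "GG" + "A" * 12 + "CGTCT" + "AG" * 7 + "TTC" + "CAT" * 5 + "G"
hits = find_ssrs(seq)
print(hits)
print(compound_groups(hits, max_gap=4))  # small gap only because the toy sequence is short
```

Rejecting non-primitive motifs stops a poly-A run from being reported a second time as an "AA" dinucleotide repeat.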
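A "complete" ORF, in TransDecoder's terminology, has both a start codon and an in-frame stop codon. TransDecoder's default minimum length is 100 amino acids. Below is a minimal Python sketch of the underlying scan and of binning ORF lengths into 100-aa classes such as the 100-200 aa class reported above. It assumes the standard genetic code and simple half-open bins. It omits TransDecoder's coding-potential scoring, so it only approximates that tool's output. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def complete_orfs(seq: str, min_aa: int = 100, both_strands: bool = True) -> list:
    """Find ATG...stop ORFs of at least min_aa residues in all scanned frames.

    Each ORF starts at the first in-frame ATG before its stop, giving the
    longest ORF for that stop. Coordinates are 0-based, end-exclusive, and
    refer to the scanned strand (the reverse complement for '-' hits).
    """
    seq = seq.upper()
    strands = [("+", seq)]
    if both_strands:
        strands.append(("-", reverse_complement(seq)))
    orfs = []
    for strand, s in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    aa_len = (i - start) // 3  # residues, stop codon excluded
                    if aa_len >= min_aa:
                        orfs.append((strand, frame, start, i + 3, aa_len))
                    start = None
    return orfs


def length_bin(aa_len: int, width: int = 100) -> str:
    """Label an ORF length with a half-open bin such as '100-200'."""
    low = (aa_len // width) * width
    return f"{low}-{low + width}"


# Hypothetical transcript: short 5' UTR, ATG + 120 GCT codons + TAA, short 3' UTR.
transcript = "CC" + "ATG" + "GCT" * 120 + "TAA" + "GG"
orfs = complete_orfs(transcript)
print(orfs, [length_bin(o[-1]) for o in orfs])
```

Fragments lacking a start or stop codon (5′- or 3′-partial ORFs) are simply never reported by this scan, which matches the abstract's focus on complete ORFs.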