2015, Vol. 58, Issue (10): 1037-1045.

• Research Article •

Distribution patterns of microsatellites in the whole genome of the German cockroach

王晨1,#, 杜联明1,#, 李鹏1, 杨茗羽1, 李午佼1,沈咏梅2, 张修月1, 岳碧松1,*   

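  (1. Key Laboratory of Bio-resources and Eco-environment of the Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610064; 2. Sichuan Key Laboratory of Medicinal American Cockroach, Chengdu 610081)
  • Publication date: 2015-10-20 Release date: 2015-10-20
  • About the authors: WANG Chen (王晨), female, born in 1991 in Xi'an, Shaanxi, master's student, research interest: bioinformatics, E-mail: cwang@stu.scu.edu.cn; DU Lian-Ming (杜联明), male, born in 1988 in Chongqing, doctoral student, research interest: bioinformatics, E-mail: adullb@qq.com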

Distribution patterns of microsatellites in the genome of the German cockroach (Blattella germanica)

WANG Chen1,#, DU Lian-Ming1,#, LI Peng1, YANG Ming-Yu1, LI Wu-Jiao1, SHEN Yong-Mei2, ZHANG Xiu-Yue1, YUE Bi-Song1,*   

  (1. Key Laboratory of Bioresources and Ecoenvironment, Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610064, China; 2. Sichuan Key Laboratory of Medicinal Periplaneta americana, Chengdu 610081, China)
  • Online: 2015-10-20 Published: 2015-10-20

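Abstract: 【Aim】 To analyze the number and distribution of microsatellites in the whole genome of the German cockroach, Blattella germanica, and to functionally annotate the genes that contain microsatellites in their exons. 【Methods】 Microsatellite search software was used to determine the number, repeat counts and positions of all microsatellites in the B. germanica genome. Python scripts were written to locate the microsatellites, and the genes with microsatellites in their exons were functionally annotated with Blast2GO and KAAS. 【Results】 A total of 604 386 microsatellite sequences with 1-6 bp repeat motifs were found, with a total length of 15 301 255 bp, accounting for about 0.75% of the whole genome sequence (about 2.04 Gb) at a frequency of one per 3.37 kb. Microsatellite lengths fall mainly within the range of 12-60 bp. Among the motif classes, trinucleotide repeats (226 876) are the most numerous, accounting for 37.54% of all microsatellites, followed by tetranucleotide repeats (150 355), accounting for 24.88%. The remaining classes are, in order, mononucleotide (141 167), dinucleotide (60 877), pentanucleotide (21 570) and hexanucleotide (3 541) repeats, accounting for 23.36%, 10.07%, 3.57% and 0.59% of all microsatellites, respectively. The most frequent repeat motifs are ATT, AAT, A, T, AAAT, ATTT and AT, which together comprise 411 789 microsatellites, or 68.13% of the total; each of these 7 motifs occurs more than 30 000 times. A total of 2 372 microsatellites lie in exons, distributed over 1 481 genes. GO functional annotation assigned 434 entries to cellular component, 402 to molecular function and 660 to biological process. KEGG pathway analysis showed that genes related to metabolism are the most numerous (380), followed by genes related to organismal systems (276), while genes related to genetic information processing are the fewest (92). 【Conclusion】 This study lays a foundation for further systematic, in-depth analysis of microsatellite function in B. germanica and for the screening of microsatellite molecular markers.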
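The summary statistics quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the rounded genome size of 2.04 Gb given in the abstract (the exact assembly length is not stated), so the spacing it computes is about 3.38 kb rather than exactly the reported 3.37 kb. The variable names are chosen here for illustration only.

```python
# Figures quoted in the abstract.
total_ssrs = 604_386
total_ssr_length_bp = 15_301_255
genome_size_bp = 2.04e9  # assumption: rounded "about 2.04 Gb" from the abstract

# Microsatellite counts per motif length, as reported.
counts = {
    "mono": 141_167,
    "di": 60_877,
    "tri": 226_876,
    "tetra": 150_355,
    "penta": 21_570,
    "hexa": 3_541,
}

# Share of the genome occupied by microsatellites, in percent.
coverage_pct = 100 * total_ssr_length_bp / genome_size_bp

# Average genomic distance per microsatellite locus, in kb.
spacing_kb = genome_size_bp / total_ssrs / 1000

# Share of each motif class among all microsatellites, in percent.
shares = {name: round(100 * n / total_ssrs, 2) for name, n in counts.items()}

print(f"coverage: {coverage_pct:.2f}%  spacing: 1 per {spacing_kb:.2f} kb")
for name, pct in shares.items():
    print(f"{name:>5}: {pct:.2f}%")
```

The six class counts sum to the reported total of 604 386, and the computed class shares match the percentages given in the abstract.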

Keywords: German cockroach, microsatellite, bioinformatics, functional annotation, genome, exon

Abstract: 【Aim】 This study aims to analyze the number and distribution of microsatellites in the whole genome of the German cockroach, Blattella germanica, and to functionally annotate the genes that contain microsatellites in their exons. 【Methods】 The number, repeat counts and positions of microsatellites were obtained with a microsatellite search tool. The distribution of microsatellites across the genome was determined with custom Python scripts, and all genes containing microsatellites in exons were annotated with Blast2GO and KAAS. 【Results】 A total of 604 386 simple sequence repeats (SSRs) with 1-6 bp nucleotide motifs were identified, with a total length of 15 301 255 bp. SSRs thus occupy about 0.75% of the B. germanica genome (about 2.04 Gb), with one locus per 3.37 kb on average. Microsatellite lengths range mainly from 12 to 60 bp. Among the motif classes, trinucleotide microsatellites (226 876, 37.54%) are the most abundant, followed by tetranucleotide (150 355, 24.88%), mononucleotide (141 167, 23.36%), dinucleotide (60 877, 10.07%), pentanucleotide (21 570, 3.57%) and hexanucleotide (3 541, 0.59%) microsatellites. The predominant repeat motifs are ATT, AAT, A, T, AAAT, ATTT and AT, which together number 411 789 and account for 68.13% of all SSRs; each of these 7 motifs occurs more than 30 000 times. There are 2 372 microsatellites in the exons of 1 481 genes. GO annotation assigned 434 GO terms to cellular component, 402 to molecular function and 660 to biological process. In the KEGG pathway analysis, genes associated with metabolism are the most numerous (380), followed by genes related to organismal systems (276), and genes related to genetic information processing are the fewest (92).
【Conclusion】 This study lays a foundation for further in-depth analysis of microsatellite function and for the development of microsatellite markers in B. germanica.
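To illustrate the kind of search described in the Methods, the following is a minimal sketch of a regular-expression scan for perfect microsatellites with 1-6 bp motifs. It is not the authors' tool: the abstract does not name the search software or its thresholds, so the minimum repeat counts in `MIN_REPEATS`, the `SSR` record and the `find_ssrs` function are all hypothetical choices made here for illustration.

```python
import re
from typing import NamedTuple


class SSR(NamedTuple):
    motif: str    # repeat unit, e.g. "ATT"
    start: int    # 0-based start position in the sequence
    end: int      # end position (exclusive)
    repeats: int  # number of tandem copies of the motif


# Hypothetical minimum repeat counts per motif length (an assumption,
# not taken from the paper).
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_periodic(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. "ATAT")."""
    return motif in (motif + motif)[1:-1]


def find_ssrs(seq: str, min_repeats: dict = MIN_REPEATS) -> list:
    """Return perfect tandem repeats of 1-6 bp motifs, sorted by start position."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_periodic(motif):
                # Skip e.g. "AAA" or "ATAT"; these runs belong to a shorter motif class.
                continue
            hits.append(SSR(motif, m.start(), m.end(), len(m.group(0)) // motif_len))
    return sorted(hits, key=lambda h: h.start)


if __name__ == "__main__":
    example = "GGC" + "ATT" * 5 + "CCGTAGC" + "A" * 12 + "GCT"
    for ssr in find_ssrs(example):
        print(ssr)
```

Because `finditer` returns non-overlapping matches, a run skipped as periodic can in rare cases mask an adjacent repeat of the same motif length, and imperfect or compound repeats are not handled. A dedicated microsatellite search tool would treat these cases more carefully.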

Key words: Blattella germanica, microsatellite, bioinformatics, functional annotation, genome, exon