
Distribution patterns of microsatellites in the fall armyworm genome
Distribution of microsatellites in the genome of Spodoptera frugiperda
Zhang Xuelian, Wang Hongmei, Wang Lei, Tang Ruixiang, Yue Bisong, Meng Yang
DOI:10.7679/j.issn.2095-1353.2020.132
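Author affiliations: Key Laboratory of Bio-Resources and Eco-Environment of the Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610064; Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, College of Life Sciences, Sichuan University, Chengdu 610064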
Chinese keywords: fall armyworm; microsatellite; genome; chromosome; functional annotation
English keywords: Spodoptera frugiperda; microsatellite; genome; chromosome; functional annotation
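Chinese abstract:
[Objectives] To analyze the genome-wide distribution of perfect microsatellites in the fall armyworm, Spodoptera frugiperda, and to perform GO functional analysis of genes that carry microsatellites in their exons, providing a data foundation for developing microsatellite markers and for further functional studies. [Methods] Krait v0.10.2 was used to identify perfect microsatellites in the S. frugiperda genome and to analyze their diversity. A Python script was used to map the perfect microsatellites and to analyze their distribution across different genomic locations. BLAST, the Swiss-Prot protein database and the R package clusterProfiler v3.14.3 were then used for GO functional analysis of the genes with microsatellites in exons. [Results] A total of 64 025 perfect microsatellites were found. Mononucleotide microsatellites were the most numerous (28 782), followed by tetranucleotide (19 278), trinucleotide (7 685), dinucleotide (5 734), pentanucleotide (1 639) and hexanucleotide (907) microsatellites. The predominant repeat motifs of the six classes were A, AC, ATC, AAAT, AAAAT and AAAGTC, respectively. As the number of bases in the repeat motif increased, the preferred range of repeat copy numbers gradually narrowed. Except on chromosomes 23, 28 and 30, the distribution of the six classes on each chromosome was consistent with their overall distribution. Of the perfect microsatellites, 47 714 mapped to intergenic regions and 16 311 to genes, of which 16 103 were in introns and 208 in exons. The exonic perfect microsatellites were distributed among 196 genes, of which 132 received GO annotations. These yielded 357 GO terms: 122 in cellular component, 117 in molecular function and 118 in biological process. Enriched GO terms were mainly related to molecular functions such as phosphatase and kinase activity and to biological processes such as RNA polymerase II-mediated transcription. [Conclusion] This study gives a preliminary picture of the distribution of perfect microsatellites in the S. frugiperda genome. Their overall relative abundance is low, and most are located in non-coding regions of the genome. Apart from mononucleotide microsatellites, tetranucleotide and trinucleotide microsatellites have comparatively high relative abundance and can provide abundant candidate loci for developing polymorphic microsatellite markers for S. frugiperda.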
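The identification step above relies on Krait; the abstract does not give the search parameters. As a rough illustration of what "perfect microsatellite" means operationally, the following minimal sketch finds uninterrupted tandem repeats of 1 to 6 bp motifs with a regular expression. The minimum repeat counts in `MIN_REPEATS` and the function names are assumptions made for this example, not settings reported by the authors, and the sketch does not merge equivalent motifs (for example AC, CA, GT and TG) into one class as a dedicated tool may do.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative only;
# the abstract does not report the thresholds actually used).
MIN_REPEATS = {1: 12, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}


def is_reducible(motif):
    """Return True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif == motif[:d] * (n // d):
            return True
    return False


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, repeats) tuples for perfect SSRs.

    Coordinates are 0-based and half-open. A perfect SSR is an
    uninterrupted run of identical motif copies.
    """
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # One motif of length k followed by at least n - 1 exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_reducible(motif):
                continue  # counted under the shorter motif instead
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GG" + "A" * 12 + "CGT" + "AC" * 7 + "TTG" + "ATC" * 5 + "G"
    for hit in find_perfect_ssrs(demo):
        print(hit)
```

A regex scan like this is adequate for illustration on short sequences; for a whole genome a purpose-built tool such as Krait is the more practical choice.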
English abstract:
[Objectives] To analyze the distribution of perfect SSRs (P-SSRs) across the genome of Spodoptera frugiperda and the GO functions of genes containing P-SSRs in exon regions, in order to provide a data foundation for the development of microsatellite markers and for further functional studies. [Methods] Krait v0.10.2 was used to identify P-SSRs and analyze their diversity, after which a Python script was used to locate P-SSRs so that their distribution in different regions of the genome could be analyzed. Finally, BLAST, the Swiss-Prot protein database and the R package clusterProfiler v3.14.3 were combined to analyze the functions of genes containing P-SSRs in exon regions. [Results] A total of 64 025 P-SSRs were identified. Of these, mononucleotide SSRs (28 782) were the most abundant, followed by tetranucleotide SSRs (19 278), trinucleotide SSRs (7 685), dinucleotide SSRs (5 734), pentanucleotide SSRs (1 639) and hexanucleotide SSRs (907). The predominant repeat motifs in these six types of P-SSRs were A, AC, ATC, AAAT, AAAAT and AAAGTC, respectively. The preferred range of repeat numbers of P-SSRs gradually narrowed as the number of bases in the repeat motif increased. Except on chromosomes 23, 28 and 30, the distribution of the six types of P-SSRs on each chromosome was consistent with their overall distribution. Of all P-SSRs, 47 714 were located in intergenic regions and 16 311 in genes, including 16 103 in intron regions and 208 in exon regions. P-SSRs in exon regions were distributed among 196 genes, 132 of which received GO annotations. There were 357 GO terms, of which 122 were attributed to cellular component, 117 to molecular function and 118 to biological process. Enriched GO terms were mainly related to molecular functions such as phosphatase and kinase activity, and to biological processes such as RNA polymerase II-mediated transcription. [Conclusion] This study provides a preliminary analysis of the distribution of P-SSRs in the S. frugiperda genome. The total relative abundance of P-SSRs is relatively low, and most P-SSRs are located in non-coding regions of the genome. Apart from mononucleotide SSRs, tetranucleotide and trinucleotide SSRs have relatively high relative abundance and could provide abundant candidate loci for developing polymorphic microsatellite markers.
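The abstract mentions a Python script that locates P-SSRs in intergenic, intron and exon regions but does not describe it. The following is a minimal sketch of one way such a classification could work, assuming gene and exon coordinates are already loaded for a single chromosome. The data layout, the function names `classify_ssr` and `count_regions`, and the rule that any overlap with an exon counts as "exon" are all assumptions for illustration, not the authors' implementation.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def classify_ssr(ssr_start, ssr_end, genes):
    """Classify one SSR as 'exon', 'intron' or 'intergenic'.

    `genes` is a hypothetical list of dicts for one chromosome:
    {"start": int, "end": int, "exons": [(start, end), ...]}.
    Assumed rule: any overlap with an exon wins over intron.
    """
    for gene in genes:
        if overlaps(ssr_start, ssr_end, gene["start"], gene["end"]):
            for exon_start, exon_end in gene["exons"]:
                if overlaps(ssr_start, ssr_end, exon_start, exon_end):
                    return "exon"
            return "intron"
    return "intergenic"


def count_regions(ssrs, genes):
    """Tally region labels for an iterable of (start, end) SSR intervals."""
    return Counter(classify_ssr(start, end, genes) for start, end in ssrs)


if __name__ == "__main__":
    # Toy annotation: one gene with two exons and one intron.
    genes = [{"start": 100, "end": 500, "exons": [(100, 200), (400, 500)]}]
    ssrs = [(150, 165), (250, 270), (600, 620)]
    print(count_regions(ssrs, genes))
```

The linear scan over genes keeps the sketch short; with a full genome annotation, sorting the intervals and using binary search or an interval tree would be the usual way to keep lookups fast.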