Acta Entomologica Sinica, 2024, Vol. 67, Issue (9): 1251-1261. doi: 10.16380/j.kcxb.2024.09.009

Research Article

Butterfly recognition based on Local-Global-VIT fine-grained classification algorithm

LI Jian-Xiang1, LI Xiao-Lin1, WANG Rong2, ZHANG Yuan-Zi1, CHEN Shu-Wu1, ZHANG Fei-Ping2,3, HUANG Shi-Guo1,3,*

1. College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
2. College of Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002, China
3. Key Laboratory of Integrated Pest Management in Ecological Forests, Fujian Province University, Fuzhou 350002, China

Online: 2024-09-20  Published: 2024-10-22

Abstract:
【Aim】 Accurately identifying butterfly species and dynamically monitoring changes in butterfly community diversity are important for habitat quality assessment and ecological restoration. Existing butterfly recognition methods rely solely on global features and overlook local features, which limits their ability to recognize butterflies in ecological images. To address this limitation, this study aims to develop a butterfly recognition method based on the Local-Global-VIT fine-grained classification algorithm.
【Methods】 A dataset of 25 279 images of 200 butterfly species from five families was used as the recognition target, and several data augmentation techniques were applied to expand it. Using the hierarchical structure and self-attention mechanism of the vision transformer (VIT), local tokens were selected layer by layer and retained through to the final layer, so that the model learns the discriminative local parts of butterflies. High-level global tokens were aggregated to suppress interference from complex backgrounds, and a contrastive loss was used to enlarge inter-class distances and improve discriminability. In addition, an appropriate learning rate adjustment strategy and transfer learning were used to optimize model convergence, improving performance without increasing the number of parameters.
【Results】 On the large-scale, fine-grained public dataset Butterfly-200, the Local-Global-VIT algorithm achieved a recognition accuracy of 91.20%, 1.15% higher than the model before the proposed improvements. Compared with the best-performing general pest recognition algorithm, EfficientNet_b0, and fine-grained classification algorithm, TransFG, its accuracy was higher by 1.83% and 0.64%, respectively, and its F1-score by 1.89% and 0.88%, respectively.
【Conclusion】 Through fine-grained recognition, the Local-Global-VIT algorithm effectively addresses the difficulty of classifying butterflies, which show large intra-class variation and small inter-class differences. It accurately identifies butterfly species and thus supports efficient habitat quality assessment.

Key words: butterfly, image recognition, fine-grained classification, vision transformer, local token selection, global token aggregation
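The abstract states that local tokens are selected layer by layer through self-attention and retained until the last layer, but it does not give the selection rule. The Python sketch below assumes one common rule: in each layer, average the class token's attention over heads, keep the `k` highest-scoring patch tokens, and add them to a set that is carried forward to later layers. The function name `select_local_tokens`, the input layout, the fixed `k`, and the head averaging are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Set


# Hypothetical helper (not from the paper): layer-wise local token selection.
def select_local_tokens(
    cls_attn: Sequence[Sequence[Sequence[float]]], k: int
) -> List[List[int]]:
    """Return the sorted set of retained patch indices after each layer.

    cls_attn[l][h][j] is the attention weight the class token assigns to
    patch token j in head h of layer l (assumed input layout).
    """
    retained: Set[int] = set()
    history: List[List[int]] = []
    for layer in cls_attn:
        num_heads = len(layer)
        num_patches = len(layer[0])
        # Average the class-token attention over heads for this layer.
        scores = [sum(head[j] for head in layer) / num_heads for j in range(num_patches)]
        # Pick the k highest-scoring patches; ties go to the lower index.
        top_k = sorted(range(num_patches), key=lambda j: (-scores[j], j))[:k]
        # Tokens selected in earlier layers stay selected up to the last layer.
        retained.update(top_k)
        history.append(sorted(retained))
    return history


if __name__ == "__main__":
    attn = [
        [[0.1, 0.6, 0.2, 0.1], [0.2, 0.4, 0.3, 0.1]],  # layer 1: two heads, four patches
        [[0.1, 0.1, 0.1, 0.7], [0.2, 0.1, 0.1, 0.6]],  # layer 2
    ]
    print(select_local_tokens(attn, k=1))  # [[1], [1, 3]]
```

With `k = 1`, patch 1 has the highest head-averaged score in the first layer and patch 3 in the second, so the retained set grows from [1] to [1, 3]; accumulating across layers is what keeps early-selected parts available at the final layer.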
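The abstract also does not specify how the high-level global tokens are aggregated. A minimal sketch, assuming that a layer's "global token" is its class-token output and that aggregation is a plain element-wise mean over the last `last_m` layers; both choices and the name `aggregate_global_tokens` are hypothetical.

```python
from typing import List, Sequence


# Hypothetical helper (not from the paper): average the class-token vectors
# of the last `last_m` (deepest) layers into one global representation.
def aggregate_global_tokens(cls_tokens: Sequence[Sequence[float]], last_m: int) -> List[float]:
    if not 1 <= last_m <= len(cls_tokens):
        raise ValueError("last_m must be between 1 and the number of layers")
    high_level = cls_tokens[-last_m:]  # only the deepest layers are used
    dim = len(high_level[0])
    # Element-wise mean over the selected layers.
    return [sum(vec[d] for vec in high_level) / last_m for d in range(dim)]


if __name__ == "__main__":
    layers = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]  # class token of layers 1..3
    print(aggregate_global_tokens(layers, last_m=2))  # [2.0, 3.0]
```

Restricting the mean to deep layers reflects the abstract's emphasis on high-level tokens; a learned weighting would be an equally plausible alternative.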
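The contrastive loss that widens inter-class gaps is named but not written out in the abstract. The sketch below uses a pairwise cosine-similarity contrastive loss similar to the one described for TransFG: same-class pairs are penalized by 1 - cos(z_i, z_j), and different-class pairs only when their similarity exceeds a margin. Whether Local-Global-VIT uses exactly this form, and the margin of 0.4, are assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pairwise_contrastive_loss(
    features: Sequence[Sequence[float]], labels: Sequence[int], margin: float = 0.4
) -> float:
    """L = 1/N^2 * sum_i [ sum_{j: y_j = y_i} (1 - cos(z_i, z_j))
                          + sum_{j: y_j != y_i} max(cos(z_i, z_j) - margin, 0) ]

    Same-class pairs are pulled together; different-class pairs are pushed
    apart only while their similarity exceeds `margin` (assumed value 0.4).
    """
    n = len(features)
    total = 0.0
    for i in range(n):
        for j in range(n):
            sim = cosine_similarity(features[i], features[j])
            if labels[i] == labels[j]:
                total += 1.0 - sim
            else:
                total += max(sim - margin, 0.0)
    return total / (n * n)


if __name__ == "__main__":
    # Identical features with different labels: each cross pair costs 1 - 0.4.
    print(pairwise_contrastive_loss([[1.0, 0.0], [1.0, 0.0]], [0, 1]))  # about 0.3
```

The margin means different-class features that are already dissimilar (cosine below 0.4) contribute nothing, so the loss concentrates on confusable pairs, which is the situation the abstract describes for butterflies with small inter-class differences.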
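The "appropriate learning rate adjustment strategy" is not named in the abstract. Linear warm-up followed by cosine decay is a common choice when fine-tuning a pretrained ViT through transfer learning, so the sketch below assumes that schedule; the function `warmup_cosine_lr` and every parameter value are illustrative.

```python
import math


# Hypothetical schedule (not confirmed by the paper): linear warm-up, then cosine decay.
def warmup_cosine_lr(
    step: int, base_lr: float, warmup_steps: int, total_steps: int, min_lr: float = 0.0
) -> float:
    """Learning rate at a 0-based `step`."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clipped to at most 1.
    progress = min((step - warmup_steps) / max(1, total_steps - warmup_steps), 1.0)
    # Cosine curve from base_lr (progress 0) down to min_lr (progress 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 9, 10, 55, 100):
        print(s, round(warmup_cosine_lr(s, base_lr=0.01, warmup_steps=10, total_steps=100), 5))
```

Warm-up avoids large updates to pretrained weights while the new classification head is still random; the cosine tail lowers the step size smoothly as training converges.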
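The results are reported as accuracy and F1-score. The abstract does not state how F1 is averaged over the 200 classes; the sketch below assumes macro averaging (the unweighted mean of per-class F1), a common choice for multi-class benchmarks. The function names are illustrative.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean over classes of F1 = 2TP / (2TP + FP + FN) (assumed averaging)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    classes = set(y_true) | set(y_pred)
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    y_true, y_pred = [0, 0, 1, 1], [0, 1, 1, 1]
    print(accuracy(y_true, y_pred))  # 0.75
    print(macro_f1(y_true, y_pred))  # about 0.733
```

For this toy input, class 0 has F1 = 2/3 and class 1 has F1 = 4/5, so macro-F1 is (2/3 + 4/5) / 2 = 11/15. Macro averaging weights every species equally, which matters when some of the 200 classes have few images.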