Study on a random forest-based prediction model of mosquito breeding in small water-holding containers

    Abstract:
    Background Aedes albopictus is the dominant mosquito species in residential areas of Shanghai. Residential areas contain many types of small water-holding containers, which provide abundant breeding habitats for Aedes albopictus and thereby increase the risk of mosquito-borne disease transmission.
    Objective To use a random forest algorithm to predict Aedes mosquito breeding in small water-holding containers in two centrally reconstructed rural communities in Shanghai, and to understand how environmental factors in such communities influence mosquito breeding in small containers during urbanization.
    Methods Surveys of small water-holding containers were conducted in two centrally reconstructed suburban rural communities in Shanghai (Community A and Community B), and the environment surrounding each water-holding container was recorded and analyzed. A container in which mosquito eggs, larvae, or pupae were found was recorded as positive. A spatial weight matrix was constructed with the household as the spatial unit, and the global Moran's I index was used to analyze the spatial autocorrelation of water-holding containers and of positive containers in each community. A Moran's I greater than 0 indicates positive spatial autocorrelation, a value less than 0 indicates negative spatial autocorrelation, and a value close to 0 indicates a random spatial distribution; the Z and P values were used together to judge whether the observed spatial pattern departed significantly from randomness. The random forest machine-learning algorithm was then used to rank environment-related breeding factors by importance and to predict mosquito breeding in small containers, and model performance was evaluated with the receiver operating characteristic (ROC) curve and related metrics.
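A minimal Python sketch of the global Moran's I analysis described above. The abstract does not state how the household-level spatial weights were defined or how Z and P were obtained, so the binary "adjacent household" weights (hypothetical helper chain_weights), the normality-based variance, and the toy counts are all assumptions; looks_random is a hypothetical helper that applies the P > 0.1 and −1.65 < Z < 1.65 rule reported in Results. Under the null hypothesis the expected value of I is −1/(n − 1), which is only approximately 0 when n is small.

```python
import math


def chain_weights(n):
    """Hypothetical binary weights: household i neighbours households i - 1 and i + 1."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def global_morans_i(values, weights):
    """Global Moran's I with a normality-based Z score and two-sided P value.

    values  : one observation per household (e.g. number of water-holding containers)
    weights : n x n spatial weight matrix, weights[i][j] > 0 if i and j are neighbours
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    i_stat = (n / s0) * cross / sum(d * d for d in dev)

    # Moments of I under the normality assumption; E[I] = -1/(n-1), not exactly 0
    expected = -1.0 / (n - 1)
    s1 = 0.5 * sum((weights[i][j] + weights[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(weights[i]) + sum(weights[j][i] for j in range(n))) ** 2 for i in range(n))
    variance = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0) - expected ** 2
    z = (i_stat - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P from the standard normal
    return i_stat, z, p


def looks_random(z, p, alpha=0.1, z_crit=1.65):
    """Decision rule mirroring the abstract: P > 0.1 and -1.65 < Z < 1.65."""
    return p > alpha and -z_crit < z < z_crit


if __name__ == "__main__":
    w = chain_weights(8)
    for label, counts in [("clustered", [5, 5, 5, 5, 0, 0, 0, 0]),
                          ("alternating", [5, 0, 5, 0, 5, 0, 5, 0])]:
        i_stat, z, p = global_morans_i(counts, w)
        print(label, round(i_stat, 3), round(z, 2), round(p, 3), looks_random(z, p))
```

With these toy counts the clustered row of households gives a positive I and the alternating row a negative I. Permutation-based P values, offered by common spatial-statistics packages, are an alternative when the normality assumption is doubtful.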
    Results Environmental factors including the container's orientation relative to the building (χ2=23.35, P<0.001), whether the container was located in open space (χ2=8.83, P=0.003), and whether trees grew above it (χ2=11.02, P=0.001) were significantly associated with the container positivity rate. For all water-holding containers, the global Moran's I was −0.092 (Z=−1.09, P=0.274) in Community A and 0.034 (Z=0.52, P=0.602) in Community B; for positive containers, it was −0.092 (Z=−1.14, P=0.255) in Community A and 0.070 (Z=0.95, P=0.342) in Community B. Because all P values exceeded 0.1 and all Z values fell between −1.65 and 1.65, both water-holding containers and positive containers were randomly distributed at the household level, with no evidence of spatial clustering. In the random forest classification model fitted with the 10 most important features, the area under the ROC curve (AUC) was 0.95, indicating good predictive performance. The importance ranking showed that the number of water-holding containers and the number of positive containers in the container's household were the most important factors.
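The χ2 statistics above compare container positivity across the levels of each environmental factor. A minimal sketch of a Pearson χ2 test for a binary factor such as "trees above / no trees above" follows; the counts are hypothetical, no continuity correction is applied (the abstract does not say whether one was used), and the P value relies on the fact that a χ2 variable with 1 degree of freedom is the square of a standard normal variable. A multi-level factor such as orientation needs the general r × c test, for example scipy.stats.chi2_contingency, which applies Yates's correction to 2 × 2 tables by default.

```python
import math


def pearson_chi2_2x2(table):
    """Pearson chi-square test without continuity correction for a 2 x 2 table.

    table = [[positive_with_factor, negative_with_factor],
             [positive_without_factor, negative_without_factor]]
    Assumes no row or column total is zero. Returns (chi2, p) with 1 df.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 df, chi2 = Z**2, so P(chi2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


if __name__ == "__main__":
    # Hypothetical counts: rows = trees above / no trees above, columns = positive / negative
    chi2, p = pearson_chi2_2x2([[30, 20], [20, 50]])
    print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```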
    Conclusion A random forest model built on environmental indicators can be used to predict Aedes mosquito breeding in small water-holding containers and can provide a basis for the scientific prevention and control of mosquito breeding in the target area.
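The workflow behind this conclusion (rank candidate factors with a random forest, refit on the 10 most important, and score the refitted model by ROC AUC) can be sketched with scikit-learn as below. The data are synthetic, the feature names are hypothetical stand-ins for the surveyed factors, and the 70/30 split, 300 trees, and impurity-based importance are assumptions because the abstract does not report them; the sketch is not expected to reproduce the reported AUC of 0.95.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def make_synthetic_data(n=400, seed=0):
    """Hypothetical container-level survey data: one row per water-holding container."""
    rng = np.random.default_rng(seed)
    names = ["household_containers", "household_positive", "open_space",
             "tree_above", "facing_south"] + [f"other_factor_{i}" for i in range(9)]
    X = np.column_stack([
        rng.poisson(3, n),          # containers in the same household (hypothetical)
        rng.poisson(1, n),          # positive containers in the same household (hypothetical)
        rng.integers(0, 2, n),      # located in open space, 0/1
        rng.integers(0, 2, n),      # tree canopy above the container, 0/1
        rng.integers(0, 2, n),      # orientation indicator, 0/1
        rng.normal(size=(n, 9)),    # uninformative filler factors
    ]).astype(float)
    # Synthetic outcome: positivity depends on the first four factors only
    logit = -2.0 + 0.3 * X[:, 0] + 1.2 * X[:, 1] + 0.8 * X[:, 2] + 0.8 * X[:, 3]
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    return X, y, names


def fit_top_k_forest(X, y, names, k=10, seed=0):
    """Rank factors with a random forest, refit on the top k, return (top names, test AUC)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    # Step 1: fit on all candidate factors and rank them by impurity-based importance
    full = RandomForestClassifier(n_estimators=300, random_state=seed).fit(X_tr, y_tr)
    top = np.argsort(full.feature_importances_)[::-1][:k]
    # Step 2: refit on the k highest-ranked factors (ranking used training data only)
    reduced = RandomForestClassifier(n_estimators=300, random_state=seed).fit(X_tr[:, top], y_tr)
    # Step 3: ROC AUC on held-out containers, using the predicted probability of "positive"
    scores = reduced.predict_proba(X_te[:, top])[:, 1]
    return [names[i] for i in top], roc_auc_score(y_te, scores)


if __name__ == "__main__":
    X, y, names = make_synthetic_data()
    top_names, auc = fit_top_k_forest(X, y, names, k=10)
    print("Top factors:", top_names)
    print(f"Held-out AUC: {auc:.2f}")
```

Ranking factors on the training split only keeps the held-out AUC from being inflated by feature selection. Impurity-based importances can favour continuous or high-cardinality variables, so sklearn.inspection.permutation_importance on held-out data is a common cross-check when ranking factors.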

     
