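Non-targeted metallomics based on synchrotron radiation X-ray fluorescence spectroscopy and machine learning for distinguishing rice exposed to different forms of mercury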

Non-targeted metallomics based on synchrotron radiation X-ray fluorescence spectroscopy and machine learning for screening inorganic or methylmercury-exposed rice plants

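  • Abstract:
    Background Mercury is a global pollutant that seriously threatens human health. Different forms of mercury differ in toxicity, so a method that distinguishes samples exposed to different mercury forms supports targeted mercury remediation and provides a basis for reducing the risk of human mercury exposure.
    Objective To establish a non-targeted metallomics method that combines synchrotron radiation X-ray fluorescence (SRXRF) spectroscopy with machine learning to distinguish rice exposed to inorganic mercury (IHg) or methylmercury (MeHg).
    Methods Rice seeds were exposed to ultrapure water (control group) or to a 0.1 mg·L⁻¹ solution of IHg (IHg group) or MeHg (MeHg group). After germination, the seedlings were cultured for a further 21 d, and the leaves were collected, oven-dried, weighed, and pressed into pellets. The metallome content of the rice leaves in each group was measured by SRXRF. Several machine learning models, namely soft independent modeling of class analogy (SIMCA), partial least squares discriminant analysis (PLS-DA), and logistic regression (LR), were used to classify and identify the full SRXRF spectra of the leaves from the different groups, and the best-performing model was selected to distinguish rice exposed to IHg or MeHg. Characteristic elements were then used as input parameters to increase computing speed, reduce the computational load, and optimize the model.
    Results The SRXRF spectral intensities of the control, IHg, and MeHg groups differed from one another, suggesting that IHg or MeHg exposure can disturb metallome homeostasis in rice leaves. Principal component analysis (PCA) of the SRXRF spectra separated the control group well from the mercury-exposed groups but could not separate the IHg group from the MeHg group. With the PLS-DA, SIMCA, and LR models, accuracy was above 98% on the training set, above 95% on the validation set, and above 94% on the cross-validation set for all three models, and the LR model was more accurate than the PLS-DA and SIMCA models in every case. Using K, Ca, Mn, Fe, and Zn, selected by the linear LR model, as characteristic elements to distinguish the IHg group from the MeHg group gave a prediction accuracy of 92.05%. Compared with the full-spectrum model, the characteristic-spectrum model had lower prediction accuracy, but its input parameters were reduced by 99.51%, and its precision, recall, and F1 score were all above 84.48%, so it can likewise be used to distinguish rice exposed to different forms of mercury.
    Conclusion The non-targeted metallomics method based on SRXRF and machine learning can rapidly identify rice exposed to different forms of mercury and thereby reduce the risk of human mercury intake.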

     

    Abstract:
    Background Mercury, as a global heavy metal pollutant, poses a serious threat to human health. The toxicity of mercury depends on its chemical form. Distinguishing the forms of mercury in the environment is of great significance for mercury management and reducing human mercury exposure risks.
    Objective To establish a non-targeted metallomics method based on synchrotron radiation X-ray fluorescence (SRXRF) spectroscopy combined with machine learning to screen rice plants exposed to inorganic mercury (IHg) or methylmercury (MeHg).
    Methods Rice seeds were exposed to ultrapure water (control group) or to a 0.1 mg·L⁻¹ solution of IHg (IHg group) or MeHg (MeHg group). After germination, the seedlings were cultured for 21 d, and the rice leaves were collected, dried, weighed, and pressed into pellets. The metallome content of the rice leaves was determined by SRXRF. Machine learning models, including soft independent modeling of class analogy (SIMCA), partial least squares discriminant analysis (PLS-DA), and logistic regression (LR), were used to classify the full SRXRF spectra of the different groups and to find the best model for distinguishing rice exposed to IHg or MeHg. In addition, characteristic elements were selected as input parameters to optimize the model by increasing computing speed and reducing the computational load.
    Results The SRXRF spectral intensities of the control group, IHg group, and MeHg group differed, indicating that exposure to IHg or MeHg can interfere with metallome homeostasis in rice leaves. Principal component analysis (PCA) of the SRXRF spectra showed that the control group could be well distinguished from the mercury-exposed groups, but the IHg group and the MeHg group largely overlapped. The accuracy of the three models (PLS-DA, SIMCA, and LR) was higher than 98% for the training set, higher than 95% for the validation set, and higher than 94% for the cross-validation set, and the LR model was more accurate than the PLS-DA and SIMCA models. Furthermore, the accuracy was 92.05% when the characteristic elements K, Ca, Mn, Fe, and Zn selected by LR were used to distinguish the IHg group from the MeHg group. Compared with the full-spectrum model, the characteristic-spectrum model had lower prediction accuracy, but its input parameters decreased by 99.51%, and its precision, recall, and F1 score were all above 84.48%, indicating that it could still distinguish rice exposed to different mercury forms.
    Conclusion The non-targeted metallomics method based on SRXRF and machine learning can be applied to high-throughput screening of rice exposed to different forms of mercury and can thus decrease the risk of human exposure to mercury.
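
To make the evaluation terms in the abstract concrete, the following is a minimal sketch of a two-class logistic regression on five element intensities, scored with accuracy, precision, recall, and F1. It is not the authors' pipeline. The data are synthetic, the function names (`make_synthetic_samples`, `fit_logistic_regression`, `classification_metrics`, `input_reduction`, `run_demo`) are hypothetical, and the plain gradient-descent fit stands in for whatever LR implementation the study used.

```python
import math
import random

# Characteristic elements named in the abstract; all numbers below are synthetic.
ELEMENTS = ["K", "Ca", "Mn", "Fe", "Zn"]


def make_synthetic_samples(n_per_class, seed):
    """Return made-up standardized intensities; label 0 = IHg, 1 = MeHg."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            X.append([center + rng.gauss(0.0, 0.5) for _ in ELEMENTS])
            y.append(label)
    return X, y


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(X, y, learning_rate=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - learning_rate * gj / n for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b / n
    return w, b


def predict(X, w, b):
    """Assign class 1 when the predicted probability is at least 0.5."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) for xi in X]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def input_reduction(n_full, n_selected):
    """Fraction of input parameters removed by keeping n_selected of n_full."""
    return 1.0 - n_selected / n_full


def run_demo():
    """Train on one synthetic set and score on a separately seeded one."""
    X_train, y_train = make_synthetic_samples(40, seed=0)
    X_valid, y_valid = make_synthetic_samples(20, seed=1)
    w, b = fit_logistic_regression(X_train, y_train)
    return classification_metrics(y_valid, predict(X_valid, w, b))


if __name__ == "__main__":
    print(run_demo())
    # 1024 channels is a hypothetical spectrum length, not a value from the study.
    print(f"{input_reduction(1024, len(ELEMENTS)):.2%}")
```

The last line illustrates how a reduction figure of this kind is computed: with a hypothetical 1,024-channel spectrum, keeping five inputs removes about 99.51% of the parameters. The abstract does not state the actual channel count, so that number is only an assumption for the arithmetic.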

     
