Abstract:
Background Mercury, a global heavy metal pollutant, poses a serious threat to human health. Its toxicity depends on its chemical form, so distinguishing the forms of mercury in the environment is important for mercury management and for reducing the risk of human mercury exposure.
Objective To establish a non-targeted metallomics method, based on synchrotron radiation X-ray fluorescence (SRXRF) spectroscopy combined with machine learning, for screening rice plants exposed to inorganic mercury (IHg) or methylmercury (MeHg).
Methods Rice seeds were exposed to ultrapure water (control group) or to 0.1 mg·L−1 solutions of IHg (IHg group) or MeHg (MeHg group). After germination, the seedlings were cultured for 21 d; the rice leaves were then collected, dried, weighed, and pressed. The metallome of the rice leaves was determined by SRXRF. Machine learning models, namely soft independent modeling of class analogy (SIMCA), partial least squares discriminant analysis (PLS-DA), and logistic regression (LR), were used to classify the SRXRF full spectra of the different groups and to identify the best model for distinguishing rice exposed to IHg from rice exposed to MeHg. In addition, characteristic elements were selected as input features to simplify the model, increasing computing speed and reducing the computational load.
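The LR classification and cross-validation steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a simplified binary task (IHg vs. MeHg), assumes each sample has already been reduced to a short vector of element intensities (for example K, Ca, Mn, Fe, and Zn), and uses hypothetical helper names (`fit_logistic`, `cross_val_accuracy`) and untuned, illustrative hyperparameters (learning rate, epochs, L2 penalty). SIMCA and PLS-DA are not shown, and the data in the example are synthetic.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def standardize(train, test):
    # Z-score each feature using mean/std from the training fold only
    n_feat = len(train[0])
    means = [sum(r[j] for r in train) / len(train) for j in range(n_feat)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in train) / len(train)) or 1.0
            for j in range(n_feat)]

    def scale(rows):
        return [[(r[j] - means[j]) / stds[j] for j in range(n_feat)] for r in rows]

    return scale(train), scale(test)


def fit_logistic(X, y, lr=0.1, epochs=500, l2=1e-3):
    # Batch gradient descent on the L2-regularized mean log-loss
    # (hyperparameters are illustrative assumptions, not taken from the study)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, X):
    # Class 1 if the predicted probability is at least 0.5
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def cross_val_accuracy(X, y, k=5, seed=0):
    # k-fold cross-validation: train on k-1 folds, score the held-out fold
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        X_tr, X_te = standardize([X[i] for i in train_idx], [X[i] for i in fold])
        w, b = fit_logistic(X_tr, [y[i] for i in train_idx])
        preds = predict(w, b, X_te)
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(X)


# Synthetic example (NOT data from the study): 5 hypothetical element
# intensities per leaf sample, label 0 = IHg, 1 = MeHg
rng = random.Random(1)
X = [[rng.gauss(10, 1) for _ in range(5)] for _ in range(40)] + \
    [[rng.gauss(13, 1) for _ in range(5)] for _ in range(40)]
y = [0] * 40 + [1] * 40
print(f"5-fold CV accuracy: {cross_val_accuracy(X, y):.3f}")
```

Features are standardized with statistics from the training folds only, so the held-out fold does not leak into the scaling; a library implementation (for example scikit-learn) would normally replace these hand-written helpers in practice.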
Results The SRXRF spectral intensities differed among the control, IHg, and MeHg groups, indicating that exposure to IHg and MeHg can disturb metallome homeostasis in rice leaves. Principal component analysis (PCA) of the SRXRF spectra showed that the control group was well separated from the mercury-exposed groups, whereas the IHg and MeHg groups largely overlapped. All three models (PLS-DA, SIMCA, and LR) achieved accuracies above 98% on the training set, above 95% on the validation set, and above 94% in cross-validation, with the LR model outperforming the PLS-DA and SIMCA models. Using the characteristic elements K, Ca, Mn, Fe, and Zn selected by LR, the accuracy in distinguishing the IHg group from the MeHg group was 92.05%. Compared with the full-spectrum model, the characteristic-element model had lower prediction accuracy, but it reduced the number of input parameters by 99.51% while keeping precision, recall, and F1 score above 84.48%, indicating that it could still distinguish rice exposed to different mercury forms.
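The evaluation metrics reported above follow the standard definitions for a binary classifier, and the sketch below computes them from true and predicted labels. Treating MeHg (label 1) as the positive class is an assumption, since the abstract does not state which class was positive. The hypothetical `input_reduction` helper shows the arithmetic behind a percentage reduction in input parameters; the full-spectrum channel count is not given in the abstract, so the 1,024 channels used below are purely illustrative (five inputs out of 1,024 gives a reduction of about 99.51%).

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def input_reduction(n_full, n_selected):
    """Fractional reduction in model inputs when n_full features shrink to n_selected."""
    return 1.0 - n_selected / n_full


# Hypothetical labels: 1 = MeHg (assumed positive class), 0 = IHg
m = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
print({k: round(v, 4) for k, v in m.items()})
# 1024 full-spectrum channels is a hypothetical value, not from the study
print(round(input_reduction(1024, 5), 4))  # 0.9951
```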
Conclusion The non-targeted metallomics method based on SRXRF and machine learning can be applied to high-throughput screening of rice exposed to different forms of mercury, thereby helping to reduce human exposure to mercury.