Prediction of concentration immediately dangerous to life or health of benzene and its derivatives based on quantitative structure-activity relationship
With workplace exposure to hazardous chemicals increasing, along with the frequency of occupational injuries and occupational safety accidents, occupational exposure limits for hazardous chemicals are urgently needed.
To explore the application of the quantitative structure-activity relationship (QSAR) prediction method to immediately dangerous to life or health (IDLH) concentrations, so as to obtain IDLH concentrations for more workplace hazardous chemicals whose values are currently unknown, and to provide a theoretical basis and technical support for the assessment and prevention of occupational injuries.
QSAR was used to correlate the IDLH values of 50 compounds (benzene and its derivatives) with their molecular structures. First, the affinity propagation algorithm was applied to cluster the sample set. Second, Dragon 2.1 software was used to calculate and pre-screen 537 molecular descriptors. Third, a genetic algorithm was used to select six characteristic molecular descriptors as independent variables, from which a multiple linear regression (MLR) model and two nonlinear models, a support vector machine (SVM) and an artificial neural network (ANN), were constructed. Finally, model performance was evaluated by internal and external validation, and a Williams plot was drawn to determine the applicability domain of each model.
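The applicability-domain check in the final step can be illustrated with a minimal sketch. A Williams plot charts each compound's leverage against its standardized residual; the version below assumes the common conventions of a warning leverage h* = 3(p + 1)/n and a residual cutoff of ±3, neither of which is stated in the abstract. The function names are hypothetical, and the code does not reproduce the authors' software.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverages(descriptors):
    """Leverage h_i = x_i^T (X^T X)^-1 x_i, with an intercept column added."""
    X = [[1.0] + list(row) for row in descriptors]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    return [sum(xi * zi for xi, zi in zip(x, solve(xtx, x))) for x in X]


def warning_leverage(n_compounds, n_descriptors):
    """Assumed convention: h* = 3(p + 1)/n for p descriptors and n compounds."""
    return 3.0 * (n_descriptors + 1) / n_compounds


def in_domain(leverage, std_residual, h_star, residual_cutoff=3.0):
    """A compound is inside the domain if both plot coordinates are in bounds."""
    return leverage <= h_star and abs(std_residual) <= residual_cutoff
```

In practice n is usually the number of training compounds, which the abstract does not report, so any h* computed for this study would depend on that split.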
The ANN model gave $R_{\mathrm{train}}^2 = 0.8526$ and $R_{\mathrm{test}}^2 = 0.8505$, a root mean square error (RMSE) of 0.5243, a mean absolute error (MAE) of 0.4610, and internal and external validation coefficients of $Q_{\mathrm{LOO}}^2 = 0.8476$ and $Q_{\mathrm{ext}}^2 = 0.8905$, respectively. By comparison, the performance validation parameters of the ANN model were superior to those of the MLR and SVM models, and all substances were within the applicability domain.
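For readers unfamiliar with these statistics, the sketch below shows how they are commonly computed. It is an illustration, not the authors' code. The external coefficient uses the form that references the training-set mean (often written Q²F1); the abstract does not say which variant was used. The `fit` argument of `q2_loo` is a hypothetical callable that trains on the remaining compounds and returns a prediction function.

```python
from math import sqrt


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def q2_ext(y_test, y_pred, y_train_mean):
    """External Q^2 (assumed F1 form): test residuals vs. training-set mean."""
    ss_res = sum((t - p) ** 2 for t, p in zip(y_test, y_pred))
    ss_tot = sum((t - y_train_mean) ** 2 for t in y_test)
    return 1.0 - ss_res / ss_tot


def q2_loo(xs, ys, fit):
    """Leave-one-out Q^2: refit without compound i, then predict compound i."""
    press = 0.0
    for i in range(len(ys)):
        predict = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - predict(xs[i])) ** 2
    mean_y = sum(ys) / len(ys)
    return 1.0 - press / sum((y - mean_y) ** 2 for y in ys)
```

A $Q_{\mathrm{LOO}}^2$ close to $R_{\mathrm{train}}^2$, as reported here, is usually read as a sign that the fit does not hinge on any single compound.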
Of the three models, the ANN model performed best in fitting ability, stability, and predictive power, and is suitable for predicting the IDLH concentrations of benzene and its derivatives. QSAR is an effective method for predicting these IDLH concentrations and provides a theoretical basis and technical support for the development of occupational health and safety.