Prediction of concentration immediately dangerous to life or health of benzene and its derivatives based on quantitative structure-activity relationship

  • Abstract:
    Background  With exposure to hazardous chemicals in the workplace increasing, and occupational health injuries and safety accidents occurring frequently, occupational exposure limits for hazardous chemicals are urgently needed.
    Objective  To obtain immediately dangerous to life or health (IDLH) concentrations for chemicals that currently lack them by exploring the application of quantitative structure-activity relationship (QSAR) prediction to IDLH concentrations, and thereby to provide a theoretical basis and technical support for the assessment and prevention of occupational health injuries.
    Methods  QSAR was used to relate the IDLH values of 50 compounds (benzene and its derivatives) to their molecular structures and to predict those values. First, the affinity propagation clustering algorithm was applied to partition the sample set. Second, Dragon 2.1 software was used to calculate and pre-screen 537 molecular descriptors. Third, six characteristic molecular descriptors selected by a genetic algorithm were used as independent variables to build a multiple linear regression (MLR) model and two nonlinear models, a support vector machine (SVM) and an artificial neural network (ANN). Finally, model performance was evaluated by internal and external validation, and a Williams plot was drawn to determine the applicability domain of the models.
    Results  For the ANN model, the coefficients of determination for the training and test sets were $R_{\mathrm{train}}^2=0.8526$ and $R_{\mathrm{test}}^2=0.8505$, the root mean square error (RMSE) was 0.5243, the mean absolute error (MAE) was 0.4610, and the internal and external validation coefficients were $Q_{\mathrm{LOO}}^2=0.8476$ and $Q_{\mathrm{ext}}^2=0.8905$. By comparison, every performance validation parameter of the ANN model was superior to those of the MLR and SVM models, and all compounds fell within the applicability domain.
    Conclusion  At present, the ANN model shows the best fitting ability, stability, and predictive performance of the three models, and is suitable for predicting the IDLH concentrations of benzene and its derivatives. Predicting these IDLH concentrations by QSAR is an effective approach and provides a theoretical basis and technical support for the development of occupational health and safety.
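
The validation statistics reported in the Results can be illustrated with a minimal Python sketch. The abstract does not state the exact formulas used, so the definitions below are assumptions based on common QSAR practice: leave-one-out $Q^2$ is computed as 1 - PRESS/SS about the training mean, and external $Q^2$ is referenced to the training-set mean. All function names and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean square error."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def q2_ext(y_test, y_pred, y_train):
    """External Q^2, referenced to the training-set mean (assumed definition)."""
    y_bar_train = mean(y_train)
    press = sum((t - p) ** 2 for t, p in zip(y_test, y_pred))
    ss = sum((t - y_bar_train) ** 2 for t in y_test)
    return 1 - press / ss


def q2_loo(xs, ys, fit, predict):
    """Leave-one-out Q^2: refit without sample i, then predict sample i."""
    press = 0.0
    for i in range(len(ys)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - predict(model, xs[i])) ** 2
    y_bar = mean(ys)
    return 1 - press / sum((y - y_bar) ** 2 for y in ys)


def fit_line(xs, ys):
    """One-descriptor least squares, a stand-in for the MLR/SVM/ANN models."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return slope, y_bar - slope * x_bar


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Made-up numbers for illustration only; they are not data from the study.
toy_x = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_y = [1.1, 2.9, 5.2, 6.8, 9.1]
toy_fit = [predict_line(fit_line(toy_x, toy_y), x) for x in toy_x]
print(r_squared(toy_y, toy_fit), rmse(toy_y, toy_fit), mae(toy_y, toy_fit))
print(q2_loo(toy_x, toy_y, fit_line, predict_line))
```

Passing `fit` and `predict` as callables keeps `q2_loo` independent of the model type, so the same loop could in principle wrap an MLR, SVM, or ANN fit.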

     
