JP6941353B2 - Toxicity prediction method and its use

Info

Publication number: JP6941353B2
Application number: JP2017135877A
Authority: JP
Inventors: 敏彦澤田; 裕昭和佐田; 智裕橋本
Original assignee: Tokai National Higher Education and Research System NUC
Current assignee: Tokai National Higher Education and Research System NUC
Priority date: 2017-07-12
Filing date: 2017-07-12
Publication date: 2021-09-29
Anticipated expiration: 2037-07-12
Also published as: JP2019020791A

Description

The present invention relates to predicting the toxicity of compounds. More specifically, it relates to a method, a system, and a program for predicting the toxicity of a compound.

The toxicity of a compound is evaluated by in vitro and in vivo tests against various toxicity indicators (e.g., hERG inhibition, biodegradability, mutagenicity). Each indicator has its own criteria, and the presence or absence of toxicity is determined according to those criteria. Because toxicity testing takes considerable time and money, it is desirable to predict toxicity beforehand and narrow down the compounds (candidate compounds) submitted for testing. In other words, there is a need to predict the toxicity of a compound without performing actual tests. If toxicity can be predicted in advance, the number of candidate compounds falls, and with it the time and cost of testing. Moreover, the toxicity of compounds that cannot be tested, or are difficult to test, such as virtual compounds, can also be assessed. This advantage is particularly important in the development of new compounds: it raises design efficiency, improves the success rate, and reduces development costs. With the worldwide trend toward restricting animal experiments in compound development (the REACH regulation), demand for compound toxicity prediction is growing further.

Toxicity prediction systems and programs developed to date generally output a judgment that the test compound is or is not toxic, together with a prediction accuracy (see, for example, Patent Documents 1 to 3 and Non-Patent Documents 1 to 5). The concordance rate in cross-validation or external validation is often used as the prediction accuracy; the higher the concordance rate, the higher the prediction accuracy is judged to be. The concordance rate is defined as (number of compounds for which the toxicity prediction result matches the toxicity test result) / (total number of compounds for which toxicity was predicted). In cross-validation, the data are divided into a training set and a test set, a prediction method is constructed using the training set, and its prediction accuracy is verified using the test set. In external validation, the prediction accuracy of the constructed method is verified using data independent of the data used for cross-validation. Even with cross-validation or external validation, the judgment remains qualitative (toxic or not toxic), so comparing compounds (ranking them) is difficult. In addition, because the judgment is not made in relation to the structure of the compound, its reliability cannot be said to be high.
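As a minimal illustration of the concordance rate defined above, the following Python sketch counts matching predictions. The function name and the label lists are hypothetical and are not taken from the cited documents.

```python
def concordance_rate(predicted, observed):
    """Fraction of compounds whose predicted label equals the test result."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("at least one compound is required")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Hypothetical labels: True = toxic, False = non-toxic.
predicted = [True, False, True, True, False]
observed = [True, False, False, True, False]
rate = concordance_rate(predicted, observed)
print(f"concordance rate: {rate:.0%}")
```

A single figure of this kind describes the prediction method as a whole; it says nothing about how confident the method is for any one compound, which is the limitation the paragraph above points out.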

Patent Document 1: International Publication No. 2009/025045 pamphlet
Patent Document 2: International Publication No. 2009/078096 pamphlet
Patent Document 3: International Publication No. 2010/016109 pamphlet

Non-Patent Document 1: Wang S., et al., "Recent developments in computational prediction of HERG blockage", Current Topics in Medicinal Chemistry, (The United Arab Emirates), Bentham Science Publishers, 2013, vol. 13, iss. 11, p. 1317-1326, DOI: 10.2174/15680266113139990036
Non-Patent Document 2: Blay V., et al., "Biodegradability Prediction of Fragrant Molecules by Molecular Topology", ACS Sustainable Chemistry and Engineering, (The United States of America), The American Chemical Society Publications, June 2016, vol. 4, iss. 8, p. 4224-4231, DOI: 10.1021/acssuschemeng.6b00717
Non-Patent Document 3: Jolly R., et al., "An evaluation of in-house and off-the-shelf in silico models: Implications on guidance for mutagenicity assessment", Regulatory Toxicology and Pharmacology, (The Netherlands), Elsevier B.V., April 2015, vol. 71, iss. 3, p. 388-397, DOI: 10.1016/j.yrtph.2015.01.010
Non-Patent Document 4: Greene N., et al., "A practical application of two in silico systems for identification of potentially mutagenic impurities", Regulatory Toxicology and Pharmacology, (The Netherlands), Elsevier B.V., May 2015, vol. 72, iss. 2, p. 335-349, DOI: 10.1016/j.yrtph.2015.05.008
Non-Patent Document 5: Ferrari T., et al., "An open source multistep model to predict mutagenicity from statistical analysis and relevant structural alerts", Chemistry Central Journal, (The United Kingdom), Springer Open, July 2010, vol. 4, Suppl. 1, S2, DOI: 10.1186/1752-153X-4-S1-S2
Non-Patent Document 6: Lazar, in silico toxicology gmbh, website https://lazar.in-silico.ch/predict
Non-Patent Document 7: PASS, Vladimir Poroikov et al., website http://www.pharmaexpert.ru/passonline/
Non-Patent Document 8: HazardExpert Pro, CompuDrug International, Inc., website http://www.compudrug.com/hazardexpertpro
Non-Patent Document 9: CompuDrug International, Inc., website http://www.compudrug.com/faq

Methods and systems that determine the toxicity of a compound while using structural information have also been developed. One of them, the compound toxicity prediction software Lazar (Non-Patent Document 6), calculates and displays both the probability of toxicity and the probability of non-toxicity for the structural information of a compound entered by the user. However, these two probabilities do not add up to 100%, which makes the prediction result hard to evaluate; comparison between compounds is particularly difficult. Another compound toxicity prediction program, PASS (Non-Patent Document 7), has the same problem as Lazar. Software in which the probability of toxicity and the probability of non-toxicity do add up to 100% (HazardExpert Pro, Non-Patent Document 8) has also been developed. It uses the structural information of the compound entered by the user and, focusing on toxic fragment structures, calculates and displays the probability of toxicity. However, as CompuDrug International, Inc., the developer of HazardExpert Pro, itself acknowledges that "the probability of toxicity is not an accurate value" (Non-Patent Document 9), its accuracy and reliability are not high.

An object of the present invention is therefore to provide a means of predicting compound toxicity that yields prediction results that are highly accurate, highly reliable, and easy to evaluate.

To solve the above problem, the following inventions are provided.
[i] A computer-executed method for predicting the toxicity of a compound, comprising:
(1) a step of receiving structural information of a test compound entered by a user;
(2) a step of generating, based on the received structural information, a structure-optimized three-dimensional molecular structure;
(3) a step of generating, using the three-dimensional molecular structure, values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of a three-dimensional molecular descriptor, a four-dimensional molecular descriptor, and a quantum chemical molecular descriptor;
(4) a step in which a toxicity prediction model calculates, using the values of the molecular descriptors, the probability of the presence or absence of toxicity of the test compound, wherein the probability of toxicity and the probability of non-toxicity add up to 100%; and
(5) a step of outputting the calculated probability,
wherein step (4) consists of:
(4-1) a step of normalizing the values of the molecular descriptors; and
(4-2) a step of calculating the probability of the presence or absence of toxicity of the test compound using the normalized values.
[ii] The prediction method according to [i], wherein the three-dimensional molecular structure is one or more three-dimensional molecular structures selected from the group consisting of: a three-dimensional molecular structure optimized by a semi-empirical molecular orbital method; a three-dimensional molecular structure optimized by an ab initio molecular orbital method; a three-dimensional molecular structure optimized by density functional theory; a three-dimensional molecular structure obtained by conformational search using a molecular mechanics method, a semi-empirical molecular orbital method, an ab initio molecular orbital method, or density functional theory; and a three-dimensional molecular structure optimized by any combination of a molecular mechanics method, a semi-empirical molecular orbital method, an ab initio molecular orbital method, and density functional theory.
[iii] A system for predicting the toxicity of a compound, comprising:
input means for entering structural information of a test compound;
receiving means for receiving the structural information of the test compound entered by a user;
first generation means for generating, based on the received structural information, a structure-optimized three-dimensional molecular structure;
first calculation means for calculating, using the three-dimensional molecular structure, values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of a three-dimensional molecular descriptor, a four-dimensional molecular descriptor, and a quantum chemical molecular descriptor;
second calculation means by which a toxicity prediction model calculates, using the values of the molecular descriptors, the probability of the presence or absence of toxicity of the test compound, wherein the probability of toxicity and the probability of non-toxicity add up to 100%; and
output means for outputting the calculated probability,
wherein the second calculation means normalizes the values of the molecular descriptors and calculates the probability of the presence or absence of toxicity of the test compound using the normalized values.
[iv] A program for causing a computer to execute:
a process of receiving structural information of a test compound entered by a user;
a process of generating, based on the received structural information, a structure-optimized three-dimensional molecular structure;
a process of calculating, using the three-dimensional molecular structure, values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of a three-dimensional molecular descriptor, a four-dimensional molecular descriptor, and a quantum chemical molecular descriptor;
a process in which a toxicity prediction model calculates, using the values of the molecular descriptors, the probability of the presence or absence of toxicity of the test compound, wherein the probability of toxicity and the probability of non-toxicity add up to 100%, the process normalizing the values of the molecular descriptors and calculating the probability using the normalized values; and
a process of outputting the calculated probability.
The present invention can also be realized in the following forms.
[1] A method for predicting the toxicity of a compound, comprising:
(1) a step of receiving structural information of a test compound entered by a user;
(2) a step of generating, based on the received structural information, a structure-optimized three-dimensional molecular structure;
(3) a step of generating, using the three-dimensional molecular structure, values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of a three-dimensional molecular descriptor, a four-dimensional molecular descriptor, and a quantum chemical molecular descriptor;
(4) a step in which a toxicity prediction model calculates, using the values of the molecular descriptors, the probability of the presence or absence of toxicity of the test compound, wherein the probability of toxicity and the probability of non-toxicity add up to 100%; and
(5) a step of outputting the calculated probability.
[2] The prediction method according to [1], wherein step (4) consists of the following steps:
(4-1) a step of normalizing the values of the molecular descriptors; and
(4-2) a step of calculating the probability of the presence or absence of toxicity of the test compound using the normalized values.
[3] The prediction method according to [1] or [2], wherein the three-dimensional molecular structure is one or more three-dimensional molecular structures selected from the group consisting of: a three-dimensional molecular structure optimized by a semi-empirical molecular orbital method; a three-dimensional molecular structure optimized by an ab initio molecular orbital method; a three-dimensional molecular structure optimized by density functional theory; a three-dimensional molecular structure obtained by conformational search using a molecular mechanics method, a semi-empirical molecular orbital method, an ab initio molecular orbital method, or density functional theory; and a three-dimensional molecular structure optimized by any combination of a molecular mechanics method, a semi-empirical molecular orbital method, an ab initio molecular orbital method, and density functional theory.
[4] The prediction method according to [1] or [2], wherein the three-dimensional molecular structure is two or more three-dimensional molecular structures optimized by a semi-empirical molecular orbital method.
[5] The prediction method according to any one of [1] to [4], wherein the one or more molecular descriptors include one or more three-dimensional molecular descriptors and one or more quantum chemical molecular descriptors.
[6] The prediction method according to any one of [1] to [4], wherein the one or more molecular descriptors include one or more three-dimensional molecular descriptors, one or more quantum chemical molecular descriptors, one or more two-dimensional molecular descriptors, one or more one-dimensional molecular descriptors, and one or more zero-dimensional molecular descriptors.
[7] The prediction method according to any one of [1] to [6], wherein the toxicity prediction model is a toxicity prediction model constructed by machine learning using the normalized molecular descriptor values of a plurality of compounds whose toxicity or non-toxicity is known.
[8] The prediction method according to [7], wherein the machine learning is one or more selected from the group consisting of a support vector machine, a Bayesian network, a neural network, AdaBoost, a random forest, and active learning.
[9] The prediction method according to any one of [1] to [8], further comprising a step of generating a chemical formula of the test compound, wherein in step (5) the generated chemical formula is output in association with the probability.
[10] The prediction method according to any one of [1] to [9], wherein there are two or more test compounds, and in step (5) the probability is output for each test compound.
[11] The prediction method according to any one of [1] to [10], wherein in step (5) a judgment of the presence or absence of toxicity of the test compound is output together with the probability.
[12] The prediction method according to any one of [1] to [11], wherein the output of step (5) is displayed in tabular form.
[13] The prediction method according to any one of [1] to [12], wherein the toxicity is mutagenicity as determined by a bacterial reverse mutation test.
[14] A system for predicting the toxicity of a compound, comprising:
input means for entering structural information of a test compound;
receiving means for receiving the structural information of the test compound entered by a user;
first generation means for generating, based on the received structural information, a structure-optimized three-dimensional molecular structure;
first calculation means for calculating, using the three-dimensional molecular structure, values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of a three-dimensional molecular descriptor, a four-dimensional molecular descriptor, and a quantum chemical molecular descriptor;
second calculation means by which a toxicity prediction model calculates, using the values of the molecular descriptors, the probability of the presence or absence of toxicity of the test compound, wherein the probability of toxicity and the probability of non-toxicity add up to 100%; and
output means for outputting the calculated probability.
[15] The system according to [14], comprising:
an input device that functions as the input means;
an arithmetic unit that functions as the first generation means, the first calculation means, and the second calculation means;
an output device that functions as the output means;
a main memory; and
a control unit that controls the system.
[16] The system according to [15], further comprising an auxiliary storage device in which a program is stored.
[17] A program for causing a computer to execute:
a process of receiving structural information of a test compound entered by a user;
a process of generating, based on the received structural information, a structure-optimized three-dimensional molecular structure;
a process of calculating, using the three-dimensional molecular structure, values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of a three-dimensional molecular descriptor, a four-dimensional molecular descriptor, and a quantum chemical molecular descriptor;
a process in which a toxicity prediction model calculates, using the values of the molecular descriptors, the probability of the presence or absence of toxicity of the test compound, wherein the probability of toxicity and the probability of non-toxicity add up to 100%; and
a process of outputting the calculated probability.
[18] A computer-readable storage medium storing the program according to [17].

According to the present invention, subjecting the structural information of a compound entered by the user to distinctive processing yields prediction results of high accuracy and reliability. In the prediction result, the probability of toxicity and the probability of non-toxicity add up to 100%. The result is therefore easy to evaluate; that is, test compounds are easy to compare. For example, suppose the invention is used to look for a non-toxic compound, and one compound is found with a 54% probability of non-toxicity (in other words, a 46% probability of toxicity) and another with a 63% probability of non-toxicity (a 37% probability of toxicity). A simple numerical comparison then suffices to select the latter as the stronger candidate.
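The comparison in the example above can be sketched in a few lines of Python. The compound names and the function name are hypothetical; only the percentages (46% and 37% probability of toxicity) come from the text.

```python
def rank_by_non_toxicity(toxic_probability_percent):
    """Return (name, p_toxic, p_non_toxic) tuples, most likely non-toxic first."""
    rows = []
    for name, p_toxic in toxic_probability_percent.items():
        if not 0 <= p_toxic <= 100:
            raise ValueError(f"probability out of range for {name}: {p_toxic}")
        # The two probabilities are complementary and add up to 100%.
        rows.append((name, p_toxic, 100 - p_toxic))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Probability of toxicity (%) for two hypothetical candidates.
candidates = {"compound A": 46, "compound B": 37}
ranking = rank_by_non_toxicity(candidates)
for name, p_toxic, p_non_toxic in ranking:
    print(f"{name}: toxic {p_toxic}%, non-toxic {p_non_toxic}%")
```

Because each compound is described by a single complementary pair, sorting on either value gives the same order, which is what makes the direct comparison possible.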

Flowchart of the toxicity prediction method of the present invention.
Configuration example of the toxicity prediction system.
Output example of the Ames mutagenicity prediction results of Example 1.
Ames mutagenicity prediction results of Example 2.
Structural information in SMILES format for the three pesticides (anthraquinone, diquat, and chlormequat) whose Ames mutagenicity was predicted in Example 3.
Three-dimensional molecular structure of anthraquinone, shown in internal coordinates.
Three-dimensional molecular structure of diquat, shown in internal coordinates.
Three-dimensional molecular structure of chlormequat, shown in internal coordinates.
Values of some of the molecular descriptors used in Example 3.
Ames mutagenicity prediction results for the three pesticides (anthraquinone, diquat, and chlormequat).
Ames mutagenicity prediction results of Comparative Examples 1 and 2.

1. Method for predicting the toxicity of a compound
The first aspect of the present invention relates to a method for predicting the toxicity of a compound (hereinafter also called "the prediction method of the present invention"). The prediction method of the present invention can evaluate the toxicity of a test compound (the compound whose toxicity is to be evaluated) without using cells or animals. Combining the present invention with toxicity evaluation using cells or animals enables highly efficient toxicity evaluation.

In general, the "toxicity of a compound" is evaluated by indicators such as acute toxicity (oral), acute toxicity (dermal), acute toxicity (inhalation), skin corrosion, skin irritation, eye damage/irritation, genotoxicity, carcinogenicity, reproductive toxicity, neurotoxicity, respiratory sensitization, skin sensitization, germ cell mutagenicity, ecotoxicity, bioaccumulation, and biodegradability. The evaluation indicator that defines the "toxicity of a compound" in the prediction method of the present invention is not particularly limited. In a preferred embodiment, the prediction method of the present invention is applied to predicting toxicity using, as the indicator, mutagenicity as determined by the bacterial reverse mutation test (Ames test). The method and judgment rules of the Ames test are specified in OECD TG 471.

In the present invention, the following steps (1) to (5) are performed.
(1) A step of receiving structural information of a test compound entered by a user
(2) A step of generating, based on the received structural information, a structure-optimized three-dimensional molecular structure
(3) A step of calculating, using the three-dimensional molecular structure, values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of a three-dimensional molecular descriptor, a four-dimensional molecular descriptor, and a quantum chemical molecular descriptor
(4) A step in which a toxicity prediction model calculates, using the values of the molecular descriptors, the probability of the presence or absence of toxicity of the test compound, wherein the probability of toxicity and the probability of non-toxicity add up to 100%
(5) A step of outputting the calculated probability
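Steps (4) and (5) can be sketched as follows, assuming the descriptor values from step (3) are already available. This is only an illustration under stated assumptions: min-max scaling stands in for the normalization sub-step, a logistic function stands in for the trained toxicity prediction model, and every name, value, and parameter below is hypothetical rather than taken from the patent.

```python
import math


def normalize(values, minima, maxima):
    """Normalization sub-step: min-max scale each descriptor into [0, 1] (assumed scheme)."""
    return [
        0.0 if high == low else (value - low) / (high - low)
        for value, low, high in zip(values, minima, maxima)
    ]


def predict_probabilities(normalized, weights, bias):
    """Step (4): logistic stand-in for the toxicity prediction model."""
    score = bias + sum(w * x for w, x in zip(weights, normalized))
    p_toxic = 1.0 / (1.0 + math.exp(-score))
    # The two percentages are complementary, so they add up to 100.
    return 100.0 * p_toxic, 100.0 * (1.0 - p_toxic)


def report(name, descriptor_values, minima, maxima, weights, bias):
    """Step (5): build one output line for a test compound."""
    toxic, non_toxic = predict_probabilities(
        normalize(descriptor_values, minima, maxima), weights, bias
    )
    return f"{name}: toxic {toxic:.1f}%, non-toxic {non_toxic:.1f}%"


# Hypothetical descriptor values, training ranges, and model parameters.
line = report("compound X", [2.0, 5.0], [0.0, 0.0], [4.0, 10.0], [1.0, -1.0], 0.0)
print(line)
```

Computing the non-toxic value as the complement of the toxic value is one simple way to guarantee the 100% total; a model that estimates the two values independently would not have this property.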

The prediction method of the present invention is described in detail below with reference to the flowchart shown in FIG. 1. The prediction method of the present invention can be carried out by, for example, the toxicity prediction system described later.

First, the structural information of the test compound entered by the user is received (step (1)). The user prepares the structural information of the test compound in advance. Formats for the structural information include, for example, SMILES (sometimes referred to as the smi format) (References 1 and 2), MDL MOL (Reference 3), SDF (Reference 3), and CDX (a binary file created by PerkinElmer, Inc.'s software ChemDraw (registered trademark) and ChemBioDraw (registered trademark)). The number of test compounds is one, or two or more; in the latter case, structural information is entered for each test compound.
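A minimal sketch of step (1) for the text-based case is shown below. It assumes the conventional file extensions for the formats named above and a simple "SMILES, space, optional name" line layout; the function names and the two example compounds are illustrative and do not come from the patent.

```python
from pathlib import Path

# Conventional file extensions for the formats named in the text (an assumption).
FORMAT_BY_SUFFIX = {".smi": "SMILES", ".mol": "MDL MOL", ".sdf": "SDF", ".cdx": "CDX"}


def detect_format(filename):
    """Guess the structure format from the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in FORMAT_BY_SUFFIX:
        raise ValueError(f"unsupported structure file: {filename}")
    return FORMAT_BY_SUFFIX[suffix]


def read_smiles_lines(text):
    """Parse '<SMILES> [name]' lines into one record per test compound."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        smiles, _, name = line.partition(" ")
        records.append({"smiles": smiles, "name": name.strip() or smiles})
    return records


# Two illustrative test compounds, one per line.
records = read_smiles_lines("CCO ethanol\n\nc1ccccc1 benzene\n")
for record in records:
    print(record["name"], record["smiles"])
```

Keeping one record per compound matches the statement that structural information is entered for each test compound when there are two or more.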

Suitable test compounds are organic compounds with a molecular weight of 800 or less, rather than macromolecules such as proteins, antibodies, long-chain DNA or RNA, polystyrene, or polyacrylates. It is also advisable to choose test compounds other than heavy metals.
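The molecular weight criterion can be checked before any expensive calculation. The sketch below is an assumption-laden illustration: it takes a simple molecular formula without brackets, uses a small hand-written table of rounded standard atomic weights, and the helper names `molecular_weight` and `is_suitable` are hypothetical.

```python
import re

# Rounded standard atomic weights for a few common elements; extend as needed.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                 "F": 18.998, "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904}


def molecular_weight(formula: str) -> float:
    """Molecular weight of a simple formula such as 'C7H3Cl2N' (no brackets)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total


def is_suitable(formula: str, limit: float = 800.0) -> bool:
    """True when the molecular weight does not exceed the limit."""
    return molecular_weight(formula) <= limit


print(round(molecular_weight("C7H3Cl2N"), 3), is_suitable("C7H3Cl2N"))
```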

次に、受信した構造情報に基づき、構造が最適化された3次元分子構造が生成される（ステップ（２））。構造の最適化には、例えば、半経験的分子軌道法、非経験的分子軌道法、密度汎関数法を利用できる。また、分子力学法、半経験的分子軌道法、非経験的分子軌道法又は密度汎関数法によって立体配座探索することによって構造の最適化を行ってもよい。このステップでは、以上のような最適化手法の単独又は２種類以上の併用によって、構造が最適化された１個以上の3次元構造が生成されることになる。分子力学法、半経験的分子軌道法、非経験的分子軌道法及び密度汎関数法を併用して構造を最適化する場合の例を以下に示す。尚、２種類以上の構造最適化手法を併用するか否かの判断においては、処理時間の長さや各装置（演算装置、制御装置、主記憶装置等）に掛かる負荷等を考慮するとよい。
＜併用の例１＞
最初に半経験的分子軌道法で構造を最適化し、得られた3次元分子構造を非経験的分子軌道法又は密度汎関数法で更に構造を最適化する。
＜併用の例２＞
最初にハートリー−フォック法で構造を最適化し、得られた3次元分子構造を密度汎関数法、Moller-Plesset摂動法、配置間相互作用法又はクラスター展開法で更に構造を最適化する。
＜併用の例３＞
最初に、半経験的分子軌道法で構造を最適化し、得られた3次元分子構造をハートリー−フォック法で更に構造を最適化し、得られた3次元分子構造を密度汎関数法、Moller-Plesset摂動法、配置間相互作用法、又はクラスター展開法で更に構造を最適化する。
＜併用の例４＞
Quantum Mechanics/Molecular Mechanics法又はour own N-layered integrated molecular orbital and molecular mechanics法で構造を最適化する。 Next, a three-dimensional molecular structure with an optimized structure is generated based on the received structural information (step (2)). Semi-empirical molecular orbital method, ab initio molecular orbital method, and density functional theory can be used for structural optimization, for example. Further, the structure may be optimized by conformational search by the molecular mechanics method, the semi-empirical molecular orbital method, the ab initio molecular orbital method or the density functional theory. In this step, one or more three-dimensional structures whose structures have been optimized are generated by the above optimization methods alone or in combination of two or more types. An example of optimizing the structure by using the molecular mechanics method, the semi-empirical molecular orbital method, the ab initio molecular orbital method, and the density functional theory together is shown below. In determining whether or not to use two or more types of structural optimization methods together, it is advisable to consider the length of processing time, the load applied to each device (arithmetic device, control device, main memory device, etc.), and the like.
<Example 1 of combined use>
First, the structure is optimized by the semi-empirical molecular orbital method, and the obtained three-dimensional molecular structure is further optimized by the ab initio molecular orbital method or the density functional theory.
<Example 2 of combined use>
First, the structure is optimized by the Hartree-Fock method, and the resulting three-dimensional molecular structure is further optimized by density functional theory, the Moller-Plesset perturbation method, the configuration interaction method, or the coupled cluster method.
<Example 3 of combined use>
First, the structure is optimized by the semi-empirical molecular orbital method, the resulting three-dimensional molecular structure is further optimized by the Hartree-Fock method, and the structure obtained from that is optimized once more by density functional theory, the Moller-Plesset perturbation method, the configuration interaction method, or the coupled cluster method.
<Example 4 of combined use>
The structure is optimized by the Quantum Mechanics / Molecular Mechanics method or our own N-layered integrated molecular orbital and molecular mechanics method.
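The combined-use examples above share one pattern: a cheaper method produces a starting geometry that a more expensive method then refines. The toy sketch below illustrates only that chaining pattern on a single bond-length coordinate. The harmonic and Morse-type functions are stand-ins chosen for illustration, not semi-empirical or ab initio calculations, and all names and parameter values are assumptions.

```python
import math
from typing import Callable, Sequence


def minimize(energy: Callable[[float], float], x: float,
             step: float = 0.01, iters: int = 5000, h: float = 1e-6) -> float:
    """Plain gradient descent on a one-dimensional energy function."""
    for _ in range(iters):
        grad = (energy(x + h) - energy(x - h)) / (2 * h)  # central difference
        x -= step * grad
    return x


def run_stages(x0: float, stages: Sequence[Callable[[float], float]]) -> float:
    """Optimize with each energy function in turn, reusing the previous result."""
    x = x0
    for energy in stages:
        x = minimize(energy, x)
    return x


def cheap_model(r: float) -> float:
    """Stand-in for a fast, approximate method (minimum at r = 1.10)."""
    return 50.0 * (r - 1.10) ** 2


def accurate_model(r: float) -> float:
    """Stand-in for a slower, more accurate method (minimum at r = 1.00)."""
    return 10.0 * (1.0 - math.exp(-2.0 * (r - 1.00))) ** 2


r_final = run_stages(2.0, [cheap_model, accurate_model])
print(round(r_final, 3))
```

The second stage starts close to its own minimum, which is the practical reason for running the cheap stage first.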

一般に、化合物の多くは条件によって複数の安定した3次元構造を取る。構造が最適化された3次元構造を複数（即ち２個以上）生成することは、この点を反映させたものとなり、より有用な予測結果をもたらす。尚、好ましい一態様では、半経験的分子軌道法によって構造が最適化された２個以上の3次元分子構造を生成し、次のステップへ進む。 In general, many compounds have multiple stable three-dimensional structures depending on the conditions. Generating multiple (ie, two or more) three-dimensional structures with optimized structures reflects this point and provides more useful prediction results. In a preferred embodiment, two or more three-dimensional molecular structures whose structures are optimized by the semi-empirical molecular orbital method are generated, and the process proceeds to the next step.

"Structural optimization" means minimizing the energy of a molecule by changing the positions of the atoms that make up the molecule (Reference 4). The "semi-empirical molecular orbital method" calculates the energy of the electronic state of a molecule from an approximation of the Hartree-Fock equation that uses empirical parameters (Reference 5). The "ab initio molecular orbital method" calculates the energy of the electronic state of a molecule by the Hartree-Fock method, the Moller-Plesset perturbation method, the configuration interaction method, the coupled cluster method, or the like (Reference 6). "Density functional theory" calculates the energy of the electronic state of a molecule from a functional of the electron density (Reference 7). The "molecular mechanics method" calculates the potential energy of a molecule from classical-mechanical internuclear potential energy functions (Reference 8). A "conformational search" systematically generates a large number of conformations of a molecule and then optimizes the structure of each conformation (Reference 9). The Quantum Mechanics/Molecular Mechanics (QM/MM) method and the our own N-layered integrated molecular orbital and molecular mechanics (ONIOM) method calculate the energy of a molecule by combining (hybridizing) two or more of the energy calculation methods described above.
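The definition of a conformational search (generate many starting conformations systematically, then optimize each) can be illustrated with a toy one-dimensional example. The torsional energy function below is an arbitrary stand-in with three minima, not a force field or quantum chemical energy; the starting grid, step size, and function names are all assumptions made for the illustration.

```python
import math


def torsion_energy(phi: float) -> float:
    """Toy torsional energy (arbitrary units) for a dihedral angle in radians."""
    return (1.0 + math.cos(3.0 * phi)) + 0.6 * (1.0 + math.cos(phi))


def torsion_gradient(phi: float) -> float:
    """Analytical derivative of torsion_energy with respect to phi."""
    return -3.0 * math.sin(3.0 * phi) - 0.6 * math.sin(phi)


def optimize(phi: float, step: float = 0.05, iters: int = 2000) -> float:
    """Gradient descent from one starting angle to the nearest local minimum."""
    for _ in range(iters):
        phi -= step * torsion_gradient(phi)
    return phi


def conformer_search(start_degrees) -> dict:
    """Optimize every starting angle and keep one entry per distinct minimum."""
    minima = {}
    for start in start_degrees:
        phi = optimize(math.radians(start))
        key = round(math.degrees(phi) % 360.0)  # merge duplicates at 1-degree resolution
        minima[key] = torsion_energy(phi)
    return minima


conformers = conformer_search(range(15, 360, 30))
for angle, energy in sorted(conformers.items(), key=lambda item: item[1]):
    print(angle, round(energy, 3))
```

Several starting points collapse onto the same minimum, so the search returns a small set of distinct stable structures, each with its own energy.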

Structure-optimized three-dimensional molecular structures can be generated using computer software such as CORINA Classic (Reference 10), SYBYL®-X Suite (Reference 11), Open Babel (Reference 12), The Chemistry Development Kit (Reference 13), RDKit (Reference 14), Chem3D™ (Reference 15), ChemBio3D® (Reference 16), MarvinSketch (Reference 17), Balloon (Reference 18), TINKER (Reference 19), Amber (Reference 20), AmberTools (Reference 21), CHARMM (Reference 22), NAMD (Reference 23), BOSS (Reference 24), VEGA ZZ/VEGA Command line (Reference 25), GROMOS™ (Reference 26), GROMACS (Reference 27), MOPAC® (Reference 28), GAMESS (Reference 29), Firefly (Reference 30), Gaussian® (Reference 31), Spartan (Reference 32), Q-Chem (Reference 33), HyperChem (Reference 34), Molecular Operating Environment (Reference 35), BIOVIA® Discovery Studio (Reference 36), BIOVIA® Material Studio (Reference 37), ConfGen (Reference 38), LigPrep (Reference 39), Desmond Molecular Dynamics System (Reference 40), Jaguar (Reference 41), MacroModel (Reference 42), MOLGEN (Reference 43), CONFLEX® (Reference 44), OMEGA (Reference 45), VConf (Reference 46), Key3D (Reference 47), Molpro (Reference 48), Molcas (Reference 49), ADF (Reference 50), TURBOMOLE (Reference 51), PQS (Reference 52), MPQC (Reference 53), Dalton (Reference 54), LSDalton (Reference 55), COLUMBUS (Reference 56), NWChem (Reference 57), PSI4 (Reference 58), CFOUR (Reference 59), ACES (Reference 60), ORCA (Reference 61), SMASH (Reference 62), ABINIT-MP (Reference 63), NTChem (Reference 64), and PAICS (Reference 65). The structure-optimized three-dimensional molecular structure may also be generated by issuing commands to such software from outside.
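Issuing commands to such software from outside typically means launching it as a separate process. The sketch below does this for Open Babel's `obabel` command-line tool, using its `-:` option for an inline SMILES string, `-O` for the output file, and `--gen3d` to build three-dimensional coordinates. Note the assumptions: `--gen3d` produces a force-field geometry rather than a semi-empirical or ab initio one, the output file name is made up, and the wrapper functions are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_gen3d_command(smiles: str, out_path: str) -> List[str]:
    """Command line asking Open Babel to build 3D coordinates from a SMILES string."""
    return ["obabel", f"-:{smiles}", "-O", out_path, "--gen3d"]


def generate_3d(smiles: str, out_path: str) -> bool:
    """Run the command if the obabel executable is available; report success."""
    if shutil.which("obabel") is None:
        return False  # Open Babel is not installed on this machine
    result = subprocess.run(build_gen3d_command(smiles, out_path),
                            capture_output=True, text=True)
    return result.returncode == 0


print(build_gen3d_command("Clc1cccc(Cl)c1C#N", "dichlobenil.mol"))
```

Passing the arguments as a list avoids shell quoting problems with characters such as `#` and parentheses that are common in SMILES strings.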

続いて、構造が最適化された3次元分子構造を用い、１個以上の分子記述子の値が算出される（ステップ（３））。用いられる分子記述子の少なくとも一つは3次元分子記述子、4次元分子記述子又は量子化学分子記述子である。構造が最適化された3次元分子構造に基づくことから、正確性ないし信頼性の高い、分子記述子（3次元分子記述子、4次元分子記述子、量子化学分子記述子等）の値が算出され、精度の高い予測結果の出力が可能となる。 Subsequently, the values of one or more molecular descriptors are calculated using the three-dimensional molecular structure whose structure is optimized (step (3)). At least one of the molecular descriptors used is a three-dimensional molecular descriptor, a four-dimensional molecular descriptor or a quantum chemical molecular descriptor. Since the structure is based on the optimized 3D molecular structure, the value of the molecular descriptor (3D molecular descriptor, 4D molecular descriptor, quantum chemical molecular descriptor, etc.) with high accuracy or reliability is calculated. Therefore, it is possible to output highly accurate prediction results.

Preferably, the values of two or more kinds of molecular descriptors selected from the group consisting of three-dimensional molecular descriptors, four-dimensional molecular descriptors, and quantum chemical molecular descriptors are calculated (for example, three-dimensional together with four-dimensional molecular descriptors, three-dimensional together with quantum chemical molecular descriptors, or all three kinds together). For example, when three-dimensional molecular descriptors are used alone or in combination with other molecular descriptors (that is, four-dimensional and/or quantum chemical molecular descriptors), the values of preferably 20 or more, more preferably 30 or more, and even more preferably 50 or more three-dimensional molecular descriptors are calculated. There is no particular upper limit on the number of three-dimensional molecular descriptors whose values are calculated. However, in view of the processing time and the load on each device (arithmetic unit, control device, main storage device, etc.), the upper limit can be set at, for example, 3,084. The same applies to four-dimensional molecular descriptors: the values of preferably 200 or more, more preferably 30 or more, and even more preferably 50 or more are calculated (the upper limit is, for example, 6,480). The same also applies to quantum chemical molecular descriptors: the values of preferably 3 or more, more preferably 5 or more, and even more preferably 10 or more are calculated (the upper limit is, for example, 171).

In addition to three-dimensional, four-dimensional, and/or quantum chemical molecular descriptors, the values of two-dimensional, one-dimensional, zero-dimensional, and other molecular descriptors may also be calculated. In this case as well, the combination of molecular descriptors whose values are calculated is not particularly limited. Examples of such combinations are as follows.
Example 1) A combination of one or more three-dimensional molecular descriptors, one or more four-dimensional molecular descriptors, one or more quantum chemical molecular descriptors, one or more two-dimensional molecular descriptors, one or more one-dimensional molecular descriptors, and one or more zero-dimensional molecular descriptors
Example 2) A combination of one or more three-dimensional molecular descriptors, one or more quantum chemical molecular descriptors, one or more two-dimensional molecular descriptors, one or more one-dimensional molecular descriptors, and one or more zero-dimensional molecular descriptors
Example 3) A combination of one or more three-dimensional molecular descriptors, one or more two-dimensional molecular descriptors, one or more one-dimensional molecular descriptors, and one or more zero-dimensional molecular descriptors

その値が算出される分子記述子の総数は特に限定されないが、好ましくは、800個以上、更に好ましくは1,000個以上、より一層好ましくは1,500個以上の分子記述子の値が算出される。原則、分子記述子の数を多くすれば、より信頼性の高い予測結果が得られる。その一方で、分子記述子の数が多すぎると、過度の処理時間を要すること、各装置（演算装置、制御装置、主記憶装置等）に過度な負荷がかかる等の弊害があるため、分子記述子の総数を1,000個〜10,000個の範囲内にするとよい。 The total number of molecular descriptors for which the value is calculated is not particularly limited, but preferably 800 or more, more preferably 1,000 or more, and even more preferably 1,500 or more molecular descriptor values are calculated. In principle, increasing the number of molecular descriptors will give more reliable prediction results. On the other hand, if the number of molecular descriptors is too large, there are adverse effects such as excessive processing time being required and excessive load being applied to each device (arithmetic unit, control device, main memory device, etc.). The total number of descriptors should be in the range of 1,000 to 10,000.

"Three-dimensional molecular descriptors" include the Radial Distribution Function descriptor, the Weighted Holistic Invariant Molecular descriptor, the Charged Partial Surface Area descriptor, and the like (Reference 66). "Four-dimensional molecular descriptors" include Comparative Molecular Fields Analysis, GRID, conformational descriptors, four-dimensional molecular fingerprints, and the like (References 66 and 67). "Quantum chemical molecular descriptors" include the highest occupied molecular orbital energy, the lowest unoccupied molecular orbital energy, the ionization potential, the electron affinity, the dipole moment, and the like (References 66 and 68). "Two-dimensional molecular descriptors" include graph invariants such as the Walk Count Descriptor and the Path Count Descriptor, and topological descriptors such as the Topological Distance Matrix Descriptor and the Zagreb Index descriptor (Reference 66). "One-dimensional molecular descriptors" include the number of functional groups, the number of fragment structures, molecular fingerprints, and the like (Reference 66). "Zero-dimensional molecular descriptors" include the molecular weight, the number of carbon atoms, the number of freely rotatable single bonds, and the like (Reference 66).
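To make concrete why three-dimensional descriptors depend on the optimized geometry, the sketch below computes a radial distribution function style descriptor directly from atomic coordinates. It uses the commonly cited form, a sum over atom pairs of exp(-beta * (r - r_ij)^2), with unit atomic weights. The smoothing parameter `beta`, the probe radii, and the coordinates are illustrative assumptions, and the function is not taken from any of the software packages named in this document.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rdf_descriptor(coords: Sequence[Point], radii: Sequence[float],
                   beta: float = 100.0) -> List[float]:
    """Radial distribution function code with unit atomic weights.

    For each probe radius r, sums exp(-beta * (r - r_ij)**2) over all atom pairs.
    """
    distances = [math.dist(a, b) for a, b in combinations(coords, 2)]
    return [sum(math.exp(-beta * (r - d) ** 2) for d in distances) for r in radii]


# A made-up linear three-atom geometry (coordinates in angstroms).
atoms = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (2.4, 0.0, 0.0)]
print(rdf_descriptor(atoms, radii=[1.2, 1.8, 2.4]))
```

Each probe radius yields one descriptor value, so a fine grid of radii produces many descriptors from a single geometry, and changing the geometry changes every value.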

The values of the molecular descriptors can be calculated using computer software such as DRAGON (Reference 69), CODESSA PRO (Reference 70), ADAPT (Reference 71), ADMET Predictor (Reference 72), CORINA Symphony (Reference 73), Pentacle (Reference 74), VolSurf+ (Reference 75), ISIDA Fragmentor (Reference 76), JOELib (Reference 77), Molconn-Z (Reference 78), PowerMV (Reference 79), PreADMET (Reference 80), PaDEL-Descriptor (Reference 81), cinfony (Reference 82), Chemopy (Reference 83), The Chemistry Development Kit (Reference 13), RDKit (Reference 14), Open Babel (Reference 12), ToMoCoMD-CARDD (Reference 84), QuaSAR-Descriptor (Reference 85), Molecular Operating Environment (Reference 35), SYBYL®-X Suite (Reference 11), BIOVIA® Discovery Studio (Reference 36), BIOVIA® Material Studio (Reference 37), QikProp (Reference 86), Jaguar (Reference 41), MacroModel (Reference 42), VCharge (Reference 87), MarvinSketch (Reference 17), Spartan (Reference 32), MOPAC® (Reference 28), GAMESS (Reference 29), Gaussian® (Reference 31), HyperChem (Reference 34), Q-Chem (Reference 33), BOSS (Reference 24), Firefly (Reference 30), Molpro (Reference 48), Molcas (Reference 49), ADF (Reference 50), TURBOMOLE (Reference 51), PQS (Reference 52), MPQC (Reference 53), Dalton (Reference 54), LSDalton (Reference 55), COLUMBUS (Reference 56), NWChem (Reference 57), PSI4 (Reference 58), CFOUR (Reference 59), ACES (Reference 60), ORCA (Reference 61), SMASH (Reference 62), ABINIT-MP (Reference 63), NTChem (Reference 64), PAICS (Reference 65), and Mold2 (Reference 88). The values of the molecular descriptors may also be calculated by issuing commands to such software from outside.

ステップ（３）に続き、供試化合物の毒性の有無の確率が算出される（ステップ（４））。確率の算出には毒性予測モデルが用いられる。毒性予測モデルは、ステップ（３）で算出された分子記述子の値を用いて供試化合物の毒性の有無の確率を算出する。本発明の最大の特徴の１つは、毒性有りの確率と毒性無しの確率を足し合わせると100%となるように毒性の有無の確率を算出することである。尚、供試化合物が２個以上の場合には、供試化合物毎に毒性の有無の確率が算出されることになる。 Following step (3), the probability of the presence or absence of toxicity of the test compound is calculated (step (4)). A toxicity prediction model is used to calculate the probabilities. In the toxicity prediction model, the probability of the presence or absence of toxicity of the test compound is calculated using the value of the molecular descriptor calculated in step (3). One of the greatest features of the present invention is to calculate the probability of presence or absence of toxicity so that the sum of the probability of having toxicity and the probability of non-toxicity is 100%. When the number of test compounds is two or more, the probability of presence or absence of toxicity is calculated for each test compound.
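One way to obtain two probabilities that sum to 100% is to map a raw classifier score to a single probability p with a sigmoid and report p and 1 - p. The sketch below uses the sigmoid form known from Platt scaling of support vector machine outputs. This is an assumption: the text does not state how the probabilities are derived, and the coefficients `a` and `b` shown here are placeholders that would normally be fitted on held-out data.

```python
import math
from typing import Tuple


def toxicity_probabilities(score: float, a: float = -1.0, b: float = 0.0) -> Tuple[float, float]:
    """Map a raw classifier score to (toxic %, non-toxic %) with a sigmoid."""
    p_toxic = 1.0 / (1.0 + math.exp(a * score + b))
    return 100.0 * p_toxic, 100.0 * (1.0 - p_toxic)


for score in (-2.0, 0.0, 2.0):
    toxic_pct, nontoxic_pct = toxicity_probabilities(score)
    print(score, round(toxic_pct, 1), round(nontoxic_pct, 1))
```

Because the second value is defined as the complement of the first, the 100% constraint holds by construction rather than by normalization after the fact.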

As the toxicity prediction model, it is advisable to use a model constructed by machine learning from the normalized molecular descriptor values of a plurality of compounds whose toxicity or non-toxicity is known. Examples of machine learning methods are support vector machines, Bayesian networks, neural networks, AdaBoost, random forests, and active learning. Two or more of these may be used together. For machine learning, computer software such as LibSVM (Reference 89), TensorFlow™ (Reference 90), Chainer® (Reference 91), Jubatus® (Reference 92), Caffe (Reference 93), Theano (Reference 94), Torch (Reference 95), neon™ (Reference 96), MXNet (Reference 97), The Microsoft Cognitive Toolkit (Reference 98), R (Reference 99), MATLAB® (Reference 100), Mathematica® (Reference 101), SAS® (Reference 102), RapidMiner® (Reference 103), KNIME® (Reference 104), WeKa (Reference 105), shogun-toolbox/shogun (Reference 106), Orange (Reference 107), Apache Mahout™ (Reference 108), scikit-learn (Reference 109), mlpy (Reference 110), XGBoost (Reference 111), and Deeplearning4j (Reference 112) can be used. Machine learning may also be executed by issuing commands to such software from outside.

Basically, the more kinds of known compounds are used to construct the toxicity prediction model, the more reliable the model becomes. Preferably 300 or more, more preferably 3,000 or more, and even more preferably 7,500 or more known compounds are used to construct the toxicity prediction model.

As step (4), the following two steps are preferably performed.
(4-1) A step of normalizing the values of the molecular descriptors
(4-2) A step of calculating the probability of the presence or absence of toxicity of the test compound using the normalized values

Step (4-1) makes the descriptor values of the test compound comparable with the corresponding molecular descriptors of the plurality of compounds whose toxicity or non-toxicity is known. For example, the calculation (processing) used when normalizing the molecular descriptor values of those known compounds is performed. Using the normalized values obtained in this step, the probability of the presence or absence of toxicity of the test compound is calculated in step (4-2).
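A common way to make a test compound comparable with the known compounds is to reuse the statistics of the known compounds when scaling the test compound. The sketch below does this with z-score scaling. The choice of z-scores is an assumption, since the text does not name a normalization scheme, and the descriptor values are made up.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_scaler(training_rows: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-descriptor (mean, standard deviation) computed from the known compounds."""
    columns = list(zip(*training_rows))
    return [(mean(col), pstdev(col)) for col in columns]


def normalize(values: Sequence[float], scaler: Sequence[Tuple[float, float]]) -> List[float]:
    """Scale a test compound's descriptors with the known-compound statistics."""
    # A descriptor that is constant across the known compounds carries no signal.
    return [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(values, scaler)]


known = [[100.0, 1.0], [200.0, 3.0], [300.0, 5.0]]  # made-up descriptor values
scaler = fit_scaler(known)
print(normalize([200.0, 5.0], scaler))
```

The scaler is fitted once on the known compounds and then applied unchanged to every test compound, so the model sees values on the scale it was trained on.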

The probability calculated in step (4) is output in a predetermined format (step (5)). Various output formats are possible; for example, the result is displayed as a table or a graph. Preferably, the output can be read and displayed by general-purpose software such as Excel® (Reference 117), LibreOffice (Reference 118), or Apache OpenOffice™ (Reference 119). When there are two or more test compounds, the probability of the presence or absence of toxicity is output for each test compound. A typical display lists the probabilities of all the test compounds together, but the display is not limited to this.
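One simple format that general-purpose spreadsheet software can open is comma-separated values. The sketch below writes one row per test compound. The column names, compound names, and probabilities are placeholders invented for the illustration, and CSV is only one possible choice of format.

```python
import csv
import io
from typing import Iterable, Tuple


def write_results(rows: Iterable[Tuple[str, float]], handle) -> None:
    """Write one line per compound: name, toxic %, non-toxic %."""
    writer = csv.writer(handle)
    writer.writerow(["compound", "toxic_pct", "nontoxic_pct"])
    for name, toxic_pct in rows:
        writer.writerow([name, f"{toxic_pct:.1f}", f"{100.0 - toxic_pct:.1f}"])


# An in-memory buffer stands in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
write_results([("compound_A", 12.5), ("compound_B", 80.0)], buffer)
print(buffer.getvalue())
```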

Along with the probability of the presence or absence of toxicity, a determination result for the test compound (typically "toxic" or "non-toxic", or an equivalent expression) may also be output. The determination result makes it possible, for example, to select candidate compounds (promising compounds expected to have low or no toxicity) from a plurality of test compounds more efficiently.
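A determination result can be derived from the probability with a cut-off, and candidates can then be ranked. The sketch below assumes a 50% threshold, which the text does not specify, and uses made-up probabilities and hypothetical function names.

```python
from typing import Dict, List


def verdict(toxic_pct: float, threshold: float = 50.0) -> str:
    """Label a compound from its probability of toxicity (in percent)."""
    return "toxic" if toxic_pct >= threshold else "non-toxic"


def select_candidates(predictions: Dict[str, float], threshold: float = 50.0) -> List[str]:
    """Names judged non-toxic, ordered from lowest to highest probability of toxicity."""
    kept = [name for name, pct in predictions.items()
            if verdict(pct, threshold) == "non-toxic"]
    return sorted(kept, key=predictions.get)


predictions = {"compound_A": 80.0, "compound_B": 10.0, "compound_C": 35.0}
print(select_candidates(predictions))
```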

In one aspect of the present invention, a step of generating the chemical formula of the test compound (step (6)) is also performed, and in step (5) the chemical formula generated in that step and the probability calculated in step (4) are output in association with each other (for example, combined in a single table). Such output shows the relationship between the structure of a compound and its toxicity and is therefore a more informative prediction result. A chemical structural formula, a molecular formula, or the like can be used as the "chemical formula", but a chemical structural formula is preferred because it allows the user to recognize the geometric structure of the compound. The chemical formula can be generated using computer software such as The Chemistry Development Kit (Reference 13), RDKit (Reference 14), Open Babel (Reference 12), MedChem Designer™ (Reference 113), ChemBioDraw® (Reference 114), ChemDraw® (Reference 115), MarvinSketch (Reference 17), and BIOVIA® Draw (Reference 116). The chemical formula may also be generated by issuing commands to such software from outside.
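Of the two kinds of chemical formula mentioned, the molecular formula is the simpler to produce in code. The sketch below builds one from element counts using the Hill system (carbon first, hydrogen second, remaining elements alphabetically). The function name is hypothetical, and drawing a chemical structural formula would instead require one of the software packages listed above.

```python
from typing import Dict


def hill_formula(counts: Dict[str, int]) -> str:
    """Molecular formula in Hill order: C, then H, then other elements alphabetically."""
    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)  # without carbon, all elements are alphabetical
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}" for e in order)


print(hill_formula({"N": 1, "Cl": 2, "C": 7, "H": 3}))
```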

The prediction results of the present invention are typically used in the next stage of toxicity evaluation. Specifically, based on the prediction results of the present invention, the compounds to be subjected to toxicity evaluation using cells or animals are selected. Using the present invention in this way makes toxicity evaluation highly efficient.

2. Toxicity prediction system
FIG. 2 is a diagram conceptually showing a configuration example of the toxicity prediction system of the present invention. The toxicity prediction system 1 of this example is a computer system including an input device 2, an arithmetic unit 3, an output device 4, a main storage device 5, a control device 6, and an auxiliary storage device 7. The solid arrows in FIG. 2 indicate the direction of data flow, and the broken arrows indicate the direction of control signal flow. The toxicity prediction system of the present invention can also be built on any general-purpose computer.

入力装置２は、例えば、キーボード、マウス、タッチパネル等であり、ユーザーは入力装置を操作し、１個以上の供試化合物の構造情報を入力する。主記憶装置（メインメモリ）５はRAM及び／又はROMである。主記憶装置５には、補助記憶装置７に格納されたプログラム及びデータが取り込み格納される。補助記憶装置７はハードディスクドライブ、光ディスク装置、SSD等である。コンピュータが読み取り可能な記録媒体から、或いはネットワーク又はクラウド上の他のコンピュータ／サーバからプログラムがインストールされるように構成してもよい。 The input device 2 is, for example, a keyboard, a mouse, a touch panel, or the like, and the user operates the input device to input structural information of one or more test compounds. The main storage device (main memory) 5 is RAM and / or ROM. The program and data stored in the auxiliary storage device 7 are taken in and stored in the main storage device 5. The auxiliary storage device 7 is a hard disk drive, an optical disk device, an SSD, or the like. The program may be configured to be installed from a computer-readable recording medium or from another computer / server on the network or in the cloud.

制御装置６は、主記憶装置５に取り込み格納されたプログラムに従って、他の装置を制御する。補助記憶装置７には、コンピュータシステムの出力を格納することができる。出力装置４は例えばディスプレイである。ユーザーは、出力装置を介してコンピュータシステムの出力を視認することが可能である。演算装置３は、主記憶装置５に格納されたデータを取り込んで、制御装置６から送られた演算命令に基づいて演算を行い、演算結果を主記憶装置５に返す。 The control device 6 controls other devices according to a program taken in and stored in the main storage device 5. The auxiliary storage device 7 can store the output of the computer system. The output device 4 is, for example, a display. The user can visually recognize the output of the computer system via the output device. The arithmetic unit 3 takes in the data stored in the main storage device 5, performs an operation based on the arithmetic instruction sent from the control device 6, and returns the arithmetic result to the main storage device 5.

3. Program and storage medium
The present invention also provides a computer program for use in the toxicity prediction system. The computer program of the present invention causes a computer to execute the following processes (i) to (v). The computer program of the present invention is provided, for example, stored on a computer-readable storage medium such as a CD (Compact Disc)-ROM, CD-R, CD-RW, DVD (Digital Versatile Disc), DVD-RAM, BD (Blu-ray® Disc), MO (Magneto Optical disc), SSD, magnetic tape, or any of various memory cards (USB flash memory, SD memory card, etc.), or in a form downloaded from a cloud computer or the like. It is also possible to store the computer program of the present invention in the auxiliary storage device of a computer connected via a network, or to transfer it to another computer over a network.
(i) A process of receiving the structural information of a test compound input by a user
(ii) A process of generating a structure-optimized three-dimensional molecular structure based on the received structural information
(iii) A process of calculating the values of one or more molecular descriptors from the three-dimensional molecular structure
(iv) A process in which a toxicity prediction model calculates, using the values of the molecular descriptors, the probability of the presence or absence of toxicity of the test compound, the probability of toxicity and the probability of non-toxicity summing to 100%
(v) A process of outputting the calculated probabilities

<Example 1: Toxicity prediction of two pesticides>
Overview: The toxicity prediction method of the present invention (FIG. 1) was carried out using the general-purpose computer system shown in FIG. 2. Compound structures were optimized with the software GAMESS, which can run the PM3 method, one of the semi-empirical molecular orbital methods. The system was configured to calculate, from the structure-optimized three-dimensional molecular structure, the values of 35 three-dimensional molecular descriptors, 4 quantum chemical molecular descriptors, 121 zero-dimensional molecular descriptors, 907 one-dimensional molecular descriptors, and 160 two-dimensional molecular descriptors. Except for some molecular descriptors (which were calculated with reference to Reference 68 and Reference 120), the molecular descriptor values were calculated using the software PaDEL-descriptor (Reference 81 and Reference 121). Separately, a toxicity prediction model was constructed by machine learning with a support vector machine, using the normalized molecular descriptor values of about 8,000 compounds of known Ames mutagenicity.

Structural information for two pesticides (dichlobenil and teflubenzuron) was input in SMILES format (Clc1cccc(Cl)c1C#N and Fc1cccc(F)c1C(=O)NC(=O)Nc1cc(Cl)c(F)c(Cl)c1F, respectively), and the chemical structural formula of each compound was combined with its prediction result and output in tabular form. SMILES-format structural information represents the primary structure of a compound, and its syntax is well established (Reference 1 and Reference 2). SMILES-format structure files can be created easily with existing software (for example, MarvinSketch (Reference 17), ChemDraw (registered trademark) (Reference 115), BIOVIA (registered trademark) Draw (Reference 116), etc.).
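To make the two SMILES strings concrete, the sketch below counts heavy atoms by element directly from the text. It is a deliberately simplified reader, not a SMILES parser: it assumes only organic-subset atoms written outside square brackets, which holds for the two inputs above, and the function name `heavy_atom_counts` is made up for this illustration.

```python
import re
from collections import Counter

# Two-letter halogens are listed first so that "Cl" is not read as carbon "C".
_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

def heavy_atom_counts(smiles: str) -> Counter:
    """Count non-hydrogen atoms per element (aromatic lower case folded to upper case)."""
    if "[" in smiles:
        raise ValueError("bracket atoms are outside the scope of this sketch")
    return Counter(tok if len(tok) == 2 else tok.upper() for tok in _ATOM.findall(smiles))

dichlobenil = heavy_atom_counts("Clc1cccc(Cl)c1C#N")
teflubenzuron = heavy_atom_counts("Fc1cccc(F)c1C(=O)NC(=O)Nc1cc(Cl)c(F)c(Cl)c1F")
```

For these inputs the counts come out as C 7, Cl 2, N 1 for dichlobenil and C 14, F 4, Cl 2, N 2, O 2 for teflubenzuron. A production system would rely on a full cheminformatics toolkit rather than a regular expression.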

FIG. 3 shows the results of predicting, for the two pesticides, the mutagenicity that is determined by the bacterial reverse mutation test. The bacterial reverse mutation test is called the Ames test. The two pesticides used as test compounds are known to have no Ames mutagenicity. As shown in FIG. 3, the probabilities of Ames mutagenicity for the two pesticides were calculated appropriately and with high accuracy. In addition, because the probability of Ames mutagenicity being present and the probability of it being absent sum to 100%, comparison between compounds is easy.
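The text does not say how the support vector machine's output is converted into a probability. One common approach, shown here purely as an assumption, is a Platt-style sigmoid applied to the decision value. The parameters `a` and `b` below are illustrative placeholders that would normally be fitted on held-out data, and the function name is hypothetical.

```python
import math

def complementary_probabilities(decision_value: float, a: float = -1.0, b: float = 0.0):
    """Map an SVM decision value to (toxic %, non-toxic %) with a sigmoid.

    a and b are placeholder sigmoid parameters (assumed, not taken from the source).
    """
    p_toxic = 1.0 / (1.0 + math.exp(a * decision_value + b))
    toxic_pct = 100.0 * p_toxic
    # The second value is defined as the complement, so the pair always totals 100.
    return toxic_pct, 100.0 - toxic_pct

toxic_pct, non_toxic_pct = complementary_probabilities(-2.0)
```

With these placeholder parameters a negative decision value yields a toxic probability below 50%, and a decision value of zero yields an even 50/50 split.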

<Example 2: Toxicity prediction of 724 pesticides (verification of prediction accuracy)>
To verify the prediction accuracy of the system of the present invention, Ames mutagenicity was predicted for 724 pesticides whose Ames mutagenicity status (present or absent) was known. The concordance rate between the predictions of the system and the Ames test results was 657/724 × 100 = 90.7 (%), confirming the high prediction accuracy of the system of the present invention (FIG. 4). In addition, the prediction system of the present invention was able to output a prediction result for all 724 pesticides. That is, it was shown to be highly practical.
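The concordance rate quoted above is simply the share of compounds whose predicted label matches the Ames test result. A minimal sketch follows; the function name is hypothetical, and the label lists are synthetic, built only to reproduce the reported counts, since the per-compound results are not given in the text.

```python
def concordance_rate(predicted, observed):
    """Percentage of compounds whose predicted label equals the test result."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)

# Synthetic labels reproducing only the reported totals (657 matches out of 724).
observed = [False] * 724
predicted = [False] * 657 + [True] * 67
rate = round(concordance_rate(predicted, observed), 1)
```

Rounded to one decimal place, `rate` equals the 90.7% stated in the text.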

<Example 3: Comparison of prediction accuracy using three pesticides>
The Ames mutagenicity of three pesticides (anthraquinone, diquat, and chlormequat) was predicted by the method of the present invention. Points not specifically mentioned, such as the structure optimization method, are the same as in the Examples above.

FIG. 5 shows the SMILES-format structural information of the three pesticides whose Ames mutagenicity was evaluated in this Example. FIGS. 6 to 8 show the three-dimensional molecular structures of the three pesticides, generated from the structural information, expressed in internal coordinates. Internal coordinates consist of bond lengths, each defined by the positions of two atoms; bond angles, each defined by the positions of three atoms; and torsion angles, each defined by the positions of four atoms. Bond lengths are given in angstroms. Bond angles and torsion angles are given in degrees (°).
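The three kinds of internal coordinate can be computed from Cartesian atom positions with elementary vector algebra. The sketch below is a generic illustration, not the code used to produce FIGS. 6 to 8; the function names are hypothetical, positions are assumed to be (x, y, z) tuples in angstroms, and the sign of the torsion angle follows one common convention.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def bond_length(p1, p2):
    """Distance between two atoms."""
    return _norm(_sub(p2, p1))

def bond_angle(p1, p2, p3):
    """Angle at the middle atom p2, in degrees."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def torsion_angle(p1, p2, p3, p4):
    """Dihedral angle about the p2-p3 bond, in degrees."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    return math.degrees(math.atan2(_norm(b2) * _dot(b1, n2), _dot(n1, n2)))
```

Using `atan2` rather than `acos` for the torsion keeps the result well defined near 0° and 180° and preserves its sign.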

Based on the three-dimensional molecular structures, the values of 35 three-dimensional molecular descriptors, 4 quantum chemical molecular descriptors, 119 zero-dimensional molecular descriptors, 795 one-dimensional molecular descriptors, and 149 two-dimensional molecular descriptors were calculated. Some of the calculated values are shown in FIG. 9. The toxicity prediction model was constructed using the normalized molecular descriptor values of 9,719 compounds of known Ames mutagenicity.
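The text states that the descriptor values were normalized but does not name the method. As one possible reading, the sketch below applies min-max scaling, with ranges learned from the training compounds and then reused for a test compound. The choice of min-max scaling, the function names, and the toy numbers are all assumptions made for illustration.

```python
def fit_min_max(training_rows):
    """Return per-descriptor (min, max) pairs from the training matrix (rows = compounds)."""
    columns = list(zip(*training_rows))
    return [(min(col), max(col)) for col in columns]

def apply_min_max(row, ranges):
    """Scale one compound's descriptor values relative to the training-set ranges."""
    scaled = []
    for value, (lo, hi) in zip(row, ranges):
        # A constant descriptor carries no information; map it to 0.0 to avoid dividing by zero.
        scaled.append(0.0 if hi == lo else (value - lo) / (hi - lo))
    return scaled

# Toy data: three training compounds, three descriptors (the third is constant).
training = [[1.0, 200.0, 5.0], [3.0, 400.0, 5.0], [2.0, 300.0, 5.0]]
ranges = fit_min_max(training)
scaled = apply_min_max([2.0, 250.0, 5.0], ranges)
```

Reusing the training-set ranges for the test compound keeps the model's inputs on the same scale at training and prediction time.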

The output of the prediction results for the three pesticides is shown in FIG. 10. For each pesticide, the probability of Ames mutagenicity being present and the probability of it being absent (the two probabilities sum to 100%) are calculated and output. The three pesticides used as test compounds have been shown by the Ames test to have no Ames mutagenicity.

As comparative examples, prediction results were obtained with the set of generated molecular descriptors changed.
<Comparative Example 1>
The system was configured to calculate the values of 119 zero-dimensional, 795 one-dimensional, and 149 two-dimensional molecular descriptors (the 35 three-dimensional molecular descriptors and the 4 quantum chemical molecular descriptors were excluded), and Ames mutagenicity was predicted by the same processing as above. The difference from the above system (Example 3) is that the 35 three-dimensional molecular descriptors and the 4 quantum chemical molecular descriptors are not included.
<Comparative Example 2>
Based on a three-dimensional molecular structure that had not been structure-optimized, the system was configured to calculate the values of 29 three-dimensional, 119 zero-dimensional, 795 one-dimensional, and 149 two-dimensional molecular descriptors, and Ames mutagenicity was predicted by the same processing as above. The differences from the above system (Example 3) are that the three-dimensional molecular descriptor values are calculated from a three-dimensional structure that has not been structure-optimized, and that the 4 quantum chemical molecular descriptors are not included.
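The three descriptor configurations can be laid side by side as plain data. The counts in the sketch below are the ones stated in the text; the dictionary layout, the short class labels, and the helper names are illustrative choices, and the totals are derived arithmetic rather than figures quoted in the specification.

```python
# Descriptor counts per configuration, as stated in the text.
DESCRIPTOR_SETS = {
    "Example 3": {"3D": 35, "quantum chemical": 4, "0D": 119, "1D": 795, "2D": 149},
    "Comparative Example 1": {"0D": 119, "1D": 795, "2D": 149},
    "Comparative Example 2": {"3D": 29, "0D": 119, "1D": 795, "2D": 149},
}

def total_descriptors(name: str) -> int:
    """Total number of descriptors used by the named configuration."""
    return sum(DESCRIPTOR_SETS[name].values())

def missing_classes(name: str, reference: str = "Example 3"):
    """Descriptor classes present in the reference set but absent from `name`."""
    return sorted(set(DESCRIPTOR_SETS[reference]) - set(DESCRIPTOR_SETS[name]))

totals = {name: total_descriptors(name) for name in DESCRIPTOR_SETS}
```

Summing the stated counts gives 1,102 descriptors for Example 3, 1,063 for Comparative Example 1, and 1,092 for Comparative Example 2.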

The prediction results of Comparative Examples 1 and 2 are shown in FIG. 11. Compared with the output of Example 3 (FIG. 10), both Comparative Examples 1 and 2 give a higher probability of Ames mutagenicity being present for all three pesticides, showing that their prediction precision and accuracy are inferior. Notably, in Comparative Examples 1 and 2, the probability of Ames mutagenicity being present was higher than the probability of it being absent for diquat, a prediction that contradicts the Ames test result.

According to the present invention, toxicity can be predicted not only for known compounds but also for compounds to be developed in the future (including virtual compounds). It also becomes possible to predict the toxicity of compounds for which actual testing is impossible or difficult. The present invention is applicable to toxicity evaluation of chemical products, pharmaceuticals, agrochemicals, veterinary drugs (for livestock and pets), cosmetics and fragrances, detergents, dyes, inks, additives, and other materials for which the development of non-toxic or low-toxicity compounds is required.

The present invention is in no way limited to the above description of the embodiments and Examples. Various modifications that those skilled in the art can readily conceive of without departing from the scope of the claims are also included in the present invention. The contents of the papers, published patent applications, patent publications, and the like expressly cited in this specification are incorporated herein by reference in their entirety.

<References>
1. Weininger D., "SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules", Journal of Chemical Information and Modeling, (The United States of America), The American Chemical Society Publications, February 1988, vol. 28, iss.1, p. 31-36, DOI: 10.1021/ci00057a005
2. Weininger D., ET AL, "SMILES. 2. Algorithm for generation of unique SMILES notation", Journal of Chemical Information and Modeling, (The United States of America), The American Chemical Society Publications, May 1989, vol. 29, iss. 2, p. 97-101, DOI: 10.1021/ci00062a008
3. Dalby A., ET AL, "Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited", Journal of Chemical Information and Computer Sciences, (The United States of America), The American Chemical Society Publications, vol. 32, iss. 3, May 1992, p. 244-255. DOI: 10.1021/ci00007a012
4. Schlegel H. B., "Geometry optimization", Wiley Interdisciplinary Reviews: Computational Molecular Science, (The United States of America), John Wiley & Sons, Ltd., May 2011, vol. 1, iss. 5, p.790-809, DOI: 10.1002/wcms.34
5. Zerner M. C., ET AL(Eds),"Semiempirical Molecular Orbital Methods", Reviews in Computational Chemistry, (Germany), Wiley-VCH, Inc., 1991, vol. 2, p. 313-365, DOI: 10.1002/9780470125793.ch8
6. Friesner R. A., "Ab initio quantum chemistry: Methodology and applications", Proceedings of the National Academy of Sciences of the United States of America, (The United States of America), The United States National Academy of Sciences, May 2005, vol. 102, no. 19, p. 6648-6653, DOI: 10.1073/pnas.0408036102
7. Parr R. G., "Density Functional Theory", Annual Review of Physical Chemistry, (The United States of America), Annual Reviews, Inc., 1983, vol. 34, p. 631-656, https://doi.org/10.1146/annurev.pc.34.100183.003215
8. Bowen J. P. ET AL, Lipkowitz K. B. ET AL (Eds), "Molecular Mechanics: The Art and Science of Parameterization", Reviews in Computational Chemistry, (Germany), WILEY-VCH, Inc., 1991, vol. 2, p. 81-97, DOI: 10.1002/9780470125793
9. Mazzanti A., ET AL, "Recent trends in conformational analysis", Wiley Interdisciplinary Reviews: Computational Molecular Science, (The United States of America), John Wiley & Sons, Inc., November 2011, vol. 2, iss. 4, p. 613-641, DOI: 10.1002/wcms.96
10. CORINA Classic, Molecular Networks GmbH and Altamira, LLC., website https://www.mn-am.com/products/corina
11. SYBYL®-X Suite, Certara USA, Inc., website https://www.certara.com/software/molecular-modeling-and-simulation/sybyl-x-suite/
12. O'Boyle N. M., ET AL, "Open Babel: An open chemical toolbox", Journal of Cheminformatics, (The United States of America), Springer Publishing, 3:33, Oct 2011, DOI: 10.1186/1758-2946-3-33
13. Steinbeck C., ET AL, "The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics", Journal of Chemical Information and Computer Sciences, (The United States of America), The American Chemical Society Publications, vol. 43, iss. 2, February 2003, p. 493-500, DOI: 10.1021/ci025584y
14. RDKit, Landrum G., "RDKit Documentation Release 2017.03.1", Online Documentation, website http://www.rdkit.org/docs/Overview.html
15. Chem3D™, PerkinElmer, Inc., website http://www.cambridgesoft.com/Ensemble_for_Chemistry/ChemOffice/ChemOfficeProfessional/
16. ChemBio3D®, PerkinElmer, Inc., website https://www.cambridgesoft.com/Ensemble_for_Chemistry/details/Default.aspx?fid=17&pid=660
17. MarvinSketch, ChemAxon, website https://www.chemaxon.com/products/calculator-plugins/molecular-modelling/
18. Balloon, Vainio M., website http://users.abo.fi/mivainio/balloon/
19. TINKER, Ponder J., ET AL, website https://dasher.wustl.edu/tinker/
20. Amber, Case D. A., ET AL, website http://ambermd.org/
21. AmberTools, website http://ambermd.org/#AmberTools
22. CHARMM, Karplus M., ET AL, website https://www.charmm.org/charmm/
23. NAMD, Theoretical and Computational Biophysics Group, Beckman Institute, University of Illinois, website http://www.ks.uiuc.edu/Research/namd/
24. BOSS, Jorgensen W. L., ET AL, website http://zarbi.chem.yale.edu/software.html
25. VEGA ZZ/VEGA Command line, Pedretti A., ET AL, website http://www.vegazz.net/
26. GROMOS™, van Gunsteren W. F., ET AL, website http://www.igc.ethz.ch/gromos.html
27. GROMACS, van der Spoel D., ET AL, website http://www.gromacs.org/
28. Website http://openmopac.net/
29. GAMESS, Schmidt M. W., ET AL, website http://www.msg.ameslab.gov/gamess/
30. Firefly, Granovsky A. A., ET AL, website http://classic.chem.msu.su/gran/gamess/index.html
31. Gaussian®, Gaussian, Inc., website http://gaussian.com/citation/
32. Spartan, Wavefunction, Inc., website https://www.wavefun.com/products/spartan.html
33. Q-Chem, Q-Chem, Inc., website http://www.q-chem.com/
34. HyperChem, Hypercube, Inc., website http://www.hyper.com/
35. Molecular Operating Environment, Chemical Computing Group ULC, website https://www.chemcomp.com/MOE-Molecular_Operating_Environment.htm
36. BIOVIA® Discovery Studio, Dassault Systemes, website http://accelrys.com/products/collaborative-science/biovia-discovery-studio/
37. BIOVIA® Material Studio, Dassault Systemes, website http://accelrys.com/products/collaborative-science/biovia-materials-studio/
38. ConfGen, Schrodinger®, LLC, Schrodinger Release 2017-1: New York, NY, 2017, website https://www.schrodinger.com/confgen
39. LigPrep, Schrodinger®, LLC, Schrodinger Release 2017-1: New York, NY, 2017, website https://www.schrodinger.com/ligprep
40. Desmond Molecular Dynamics System, D. E. Shaw Research, New York, NY, 2017, website https://www.schrodinger.com/desmond
41. Jaguar, Schrodinger®, LLC, Schrodinger Release 2017-1: New York, NY, 2017, website https://www.schrodinger.com/jaguar
42. MacroModel, Schrodinger®, LLC, Schrodinger Release 2017-1: New York, NY, 2017, website https://www.schrodinger.com/macromodel
43. MOLGEN, Wassermann A., ET AL, website http://www.molgen.de/
44. CONFLEX®, CONFLEX Corporation, website http://www.conflex.net/
45. OMEGA, OpenEye Scientific Software, website https://www.eyesopen.com/omega
46. VConf, VeraChem, LLC, website http://www.verachem.com/products/vconf/
47. Key3D, IMMD, Inc., website http://www.immd.co.jp/en/product_2.html
48. Molpro, TTI GmbH, website https://www.molpro.net/
49. Molcas, Veryazov V., ET AL, website http://www.molcas.org/
50. ADF®, Scientific Computing & Modelling NV, website https://www.scm.com/product/adf/
51. TURBOMOLE, TURBOMOLE GmbH, website http://www.turbomole.com/
52. PQS, Parallel Quantum Solutions, website http://www.pqs-chem.com/
53. MPQC, Valeev E., ET AL, website http://www.mpqc.org/
54. Dalton, Dalton/LSDalton developers, website http://daltonprogram.org
55. LSDalton, Dalton/LSDalton developers, website http://daltonprogram.org
56. COLUMBUS, Lischka H., ET AL, website https://www.univie.ac.at/columbus/
57. NWChem, Valiev M., ET AL, website http://www.nwchem-sw.org/index.php/Main_Page
58. PSI4, Sherrill C. D., ET AL, website http://www.psicode.org/
59. CFOUR, Stanton J. F., ET AL, website http://www.cfour.de/
60. ACES, Bartlett R. J., ET AL, website http://www.qtp.ufl.edu/ACES/
61. ORCA, Neese F., ET AL, website https://orcaforum.cec.mpg.de/
62. SMASH, Ishimura K., website http://smash-qc.sourceforge.net/
63. ABINIT-MP®, Mochizuki Y., ET AL, website http://www.cenav.org/abinitmpopen1/
64. NTChem, Nakajima T., ET AL, website http://labs.aics.riken.jp/nakajimat_top/ntchem_e.html
65. PAICS, Ishikawa T., website http://www.paics.net/index_e.html
66. Gasteiger J., ET AL (Eds), "Chemoinformatics: A Textbook", WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim, 2003, ISBN 3-527-30681-1
67. Wicker J. G. P., ET AL, "Beyond Rotatable Bond Counts: Capturing 3D Conformational Flexibility in a Single Descriptor", Journal of Chemical Information and Modeling, (The United States of America), The American Chemical Society Publications, vol. 56, iss. 12, November 2016, p. 2347-2352, DOI: 10.1021/acs.jcim.6b00565
68. Karelson M., ET AL, "Quantum-Chemical Descriptors in QSAR/QSPR Studies", Chemical Reviews, (The United States of America), The American Chemical Society Publications, vol. 96, iss. 3, May 1996, p. 1027-1044, DOI: 10.1021/cr950202r
69. DRAGON, Kode s.r.l., website https://chm.kode-solutions.net/products_dragon.php
70. CODESSA PRO, CompuDrug International, Inc., website http://www.compudrug.com/codessa_pro
71. ADAPT, Jurs P. C., ET AL, website http://research.chem.psu.edu/pcjgroup/adapt.html
72. ADMET Predictor, Simulations Plus, Inc., website http://www.simulations-plus.com/software/admet-property-prediction-qsar/
73. CORINA Symphony, Molecular Networks GmbH and Altamira, LLC., website https://www.mn-am.com/products/corinasymphony
74. Pentacle, Molecular Discovery Ltd, website http://www.moldiscovery.com/software/pentacle/
75. VolSurf+, Molecular Discovery Ltd, website http://www.moldiscovery.com/software/vsplus/
76. ISIDA Fragmentor, Varnek A., ET AL, website http://infochim.u-strasbg.fr/spip.php?rubrique41
77. JOELib, Zell A., ET AL, website http://www.ra.cs.uni-tuebingen.de/software/joelib/index.html
78. Molconn-Z, eduSoft, LC, website http://www.edusoft-lc.com/molconn/
79. PowerMV, Liu J., ET AL, website https://www.niss.org/research/software/powermv
80. PreADMET, Bioinformatics & Molecular Design Research Center (BMDRC), website https://preadmet.bmdrc.kr/preadmet-pc-version-2-0/
81. PaDEL-Descriptor, Yap C. W., website http://www.yapcwsoft.com/dd/padeldescriptor/
82. Cinfony, O'Boyle N. M., ET AL, website http://cinfony.github.io/
83. Cao D.-S., ET AL, "ChemoPy: freely available python package for computational biology and chemoinformatics", Bioinformatics, (The United Kingdom), Oxford University Press, vol. 29, iss. 8, March 2013, p. 1092-1094, DOI: https://doi.org/10.1093/bioinformatics/btt105
84. ToMoCoMD-CARDD, Ponce Y. M., website http://tomocomd.com/
85. QuaSAR-Descriptor, Chemical Computing Group ULC, website https://www.chemcomp.com/journal/descr.htm
86. QikProp, Schrodinger®, LLC, Schrodinger Release 2017-1: New York, NY, 2017, website https://www.schrodinger.com/qikprop
87. VCharge, VeraChem, LLC, website http://www.verachem.com/products/vcharge/
88. Mold2, Hong H., ET AL, website https://www.fda.gov/ScienceResearch/BioinformaticsTools/Mold2/ucm144528.htm
89. LibSVM, Chang C.-C., ET AL, website https://www.csie.ntu.edu.tw/~cjlin/libsvm/
90. TensorFlow™, Google Inc., website https://www.tensorflow.org/
91. Chainer®, Preferred Networks, Inc., website http://chainer.org/
92. Jubatus®, Preferred Networks, Inc. and Nippon Telegraph and Telephone Corporation, website https://jubat.us/en/
93. Caffe, Jia Y., ET AL, website http://caffe.berkeleyvision.org/
94. Theano, Theano Development Team, website http://deeplearning.net/software/theano/
95. Torch, Ronan Collobert, ET AL, website http://torch.ch/
96. neon™, Nervana Systems, website https://www.nervanasys.com/technology/neon/
97. MXNet, website http://mxnet.io/
98. The Microsoft Cognitive Toolkit, Microsoft Corporation, website https://www.microsoft.com/en-us/cognitive-toolkit/
99. R(c), The R Foundation, website https://www.r-project.org/
100. MATLAB®, The MathWorks, Inc., website https://www.mathworks.com/products/matlab.html?s_tid=hp_products_matlab
101. Mathematica®, Wolfram Research, website http://www.wolfram.com/mathematica/
102. SAS®, SAS Institute Inc., website https://www.sas.com/en_us/home.html
103. RapidMiner®, RapidMiner, Inc., website https://rapidminer.com/
104. KNIME®, KNIME.com AG, website https://www.knime.org/
105. Witten I. H., ET AL, The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", (The United States of America), Morgan Kaufmann, Fourth Edition, 2016, ISBN13: 978-0128042915, website http://www.cs.waikato.ac.nz/ml/weka/
106. shogun-toolbox/shogun, website http://www.shogun-toolbox.org/, Sonnenburg S., ET AL, "shogun-toolbox/shogun: Shogun 6.0.0 - Baba Nobuharu", April 2017, DOI: 10.5281/zenodo.556748
107. Orange, Demsar J., ET AL, "Orange: Data Mining Toolbox in Python", Journal of Machine Learning Research, (The United States of America), JMLR, Inc. and Microtome Publishing, vol. 14, Aug 2013, p. 2349-2353, website https://orange.biolab.si/
108. Apache Mahout™, The Apache Software Foundation, website http://mahout.apache.org/
109. scikit-learn, website http://scikit-learn.org/stable/, Pedregosa F., ET AL, "Scikit-learn: Machine Learning in Python", Journal of Machine Learning Research, (The United States of America), JMLR, Inc. and Microtome Publishing, vol. 12, Oct 2011, p. 2825-2830
110. mlpy, Albanese D., ET AL, website http://mlpy.sourceforge.net/
111. Chen T., ET AL, "XGBoost: A Scalable Tree Boosting System", Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug 2016, DOI: 10.1145/2939672.2939785
112. Deeplearning4j, Skymind, Deeplearning4j Development Team, "Deeplearning4j: Open-source distributed deep learning for the JVM", Apache Software Foundation License 2.0, website https://deeplearning4j.org/
113. MedChem Designer™, Simulations Plus, Inc., website http://www.simulations-plus.com/software/medchem-designer/
114. ChemBioDraw®, PerkinElmer, Inc., website https://www.cambridgesoft.com/Ensemble_for_Chemistry/details/Default.aspx?fid=17&pid=660
115. ChemDraw®, PerkinElmer, Inc., website http://www.cambridgesoft.com/software/overview.aspx
116. BIOVIA® Draw, Dassault Systemes, website http://accelrys.com/products/collaborative-science/biovia-draw/
117. Excel®, Microsoft Corporation, website https://products.office.com/en-us/excel
118. Libre Office, The Document Foundation, website https://www.libreoffice.org/
119. Apache Open Office™, The Apache Software Foundation, website https://www.openoffice.org/
120. Sakuratani Y., ET AL, "Molecular size as a limiting characteristic for bioconcentration in fish", Journal of Environmental Biology, January 2008, vol. 29, iss. 1, p. 89-92, website http://www.jeb.co.in/journal_issues/200801_jan08/paper_15.pdf
121. Yap C. W., "PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints", Journal of Computational Chemistry, vol. 32, iss. 7, p. 1466-1474, May 2011, DOI: 10.1002/jcc.21707, Supporting Information: JCCT_21707_sm_suppinformation.xls, website http://onlinelibrary.wiley.com/doi/10.1002/jcc.21707/suppinfo
1. Weininger D., "SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules", Journal of Chemical Information and Modeling, (The United States of America), The American Chemical Society Publications, February 1988, vol. 28, iss.1, p. 31-36, DOI: 10.1021 / ci00057a005
2. Weininger D., ET AL, "SMILES. 2. Algorithm for generation of unique SMILES notation", Journal of Chemical Information and Modeling, (The United States of America), The American Chemical Society Publications, May 1989, vol. 29 , iss. 2, p. 97-101, DOI: 10.1021 / ci00062a008
3. Dalby A., ET AL, "Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited", Journal of Chemical Information and Computer Sciences, (The United States of America), The American Chemical Society Publications, vol. 32, iss. 3, May 1992, p. 244-255. DOI: 10.1021 / ci00007a012
4. Schlegel HB, "Geometry optimization", Wiley Interdisciplinary Reviews: Computational Molecular Science, (The United States of America), John Wiley & Sons, Ltd., May 2011, vol. 1, iss. 5, p.790-809 , DOI: 10.1002 / wcms.34
5. Zerner MC, ET AL (Eds), "Semiempirical Molecular Orbital Methods", Reviews in Computational Chemistry, (Germany), Wiley-VCH, Inc., 1991, vol. 2, p. 313-365, DOI: 10.1002 / 9780470125793.ch8
6. Friesner RA, "Ab initio quantum chemistry: Methodology and applications", Proceedings of the National Academy of Sciences of the United States of America, (The United States of America), The United States National Academy of Sciences, May 2005, vol . 102, no. 19, p. 6648-6653, DOI: 10.1073 / pnas.0408036102
7. Parr RG, "Density Functional Theory", Annual Review of Physical Chemistry, (The United States of America), Annual Reviews, Inc., 1983, vol. 34, p. 631-656, https://doi.org /10.1146/annurev.pc.34.100183.003215
8. Bowen JP ET AL, Lipkowitz KB ET AL (Eds), "Molecular Mechanics: The Art and Science of Parameterization", Reviews in Computational Chemistry, (Germany), WILEY-VCH, Inc., 1991, vol. 2, p . 81-97, DOI: 10.1002 / 9780470125793
9. Mazzanti A., ET AL, "Recent trends in conformational analysis", Wiley Interdisciplinary Reviews: Computational Molecular Science, (The United States of America), John Wiley & Sons, Inc., November 2011, vol. 2, iss. 4, p. 613-641, DOI: 10.1002 / wcms.96
10. CORINA Classic, Molecular Networks GmbH and Altamira, LLC., Website https://www.mn-am.com/products/corina
11. SYBYL®-X Suite, Certara USA, Inc., website https://www.certara.com/software/molecular-modeling-and-simulation/sybyl-x-suite/
12. O'Boyle NM, ET AL, "Open Babel: An open chemical toolbox", Journal of Cheminformatics, (The United States of America), Springer Publishing, 3:33, Oct 2011, DOI: 10.1186 / 1758-2946- 3-33
13. Steinbeck C., ET AL, "The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics", Journal of Chemical Information and Computer Sciences, (The United States of America), The American Chemical Society Publications, vol. 43, iss. 2, February 2003, p. 493-500, DOI: 10.1021 / ci025584y
14. RDKit, Landrum G., "RDKit Documentation Release 2017.03.1", Online Documentation, Website http://www.rdkit.org/docs/Overview.html
15. Chem3D ^TM , PerkinElmer, Inc., website http://www.cambridgesoft.com/Ensemble_for_Chemistry/ChemOffice/ChemOfficeProfessional/
16. ChemBio3D®, PerkinElmer, Inc., website https://www.cambridgesoft.com/Ensemble_for_Chemistry/details/Default.aspx?fid=17&pid=660
17. MarvinSketch, ChemAxon, website https://www.chemaxon.com/products/calculator-plugins/molecular-modelling/
18. Balloon, Vainio M., website http://users.abo.fi/mivainio/balloon/
19. TINKER, Ponter J., ET AL, Website https://dasher.wustl.edu/tinker/
20. Amber, Case DA, ET AL, Website http://ambermd.org/
21. AmberTools, Website http://ambermd.org/#AmberTools
22. CHARMM, Karplus M., ET AL, Website https://www.charmm.org/charmm/
23. NAMD, Theoretical and Computational Biophysics Group, Beckman Institute, University of Illinois, Website http://www.ks.uiuc.edu/Research/namd/
24. BOSS, Jorgensen WL, ET AL, Website http://zarbi.chem.yale.edu/software.html
25. VEGA ZZ / VEGA Command line, Pedretti A., ET AL: Website http://www.vegazz.net/
26. GROMOS ^TM , van Gunsteren WF, ET AL, website http://www.igc.ethz.ch/gromos.html
27. GROMACS, van der Spoel D., ET AL, website http://www.gromacs.org/
Website http://openmopac.net/
29. GAMESS, Schmidt MW, ET AL, website http://www.msg.ameslab.gov/gamess/
30. Firefly, Granovsky AA, ET AL, website http://classic.chem.msu.su/gran/gamess/index.html
31. Gaussian®, Gaussian, Inc., website http://gaussian.com/citation/
32. Spartan, Wavefunction, Inc., Website https://www.wavefun.com/products/spartan.html
33. Q-Chem, Q-Chem, Inc., website http://www.q-chem.com/
34. HyperChem, Hypercube, Inc., website http://www.hyper.com/
35. Molecular Operating Environment, Chemical Computing Group ULC, Website https://www.chemcomp.com/MOE-Molecular_Operating_Environment.htm
36. BIOVIA® Discovery Studio, Dassault Systemes, website http://accelrys.com/products/collaborative-science/biovia-discovery-studio/
37. BIOVIA® Material Studio, Dassault Systemes, website http://accelrys.com/products/collaborative-science/biovia-materials-studio/
38. ConfGen, Schrodinger®, LLC, Schrodinger Release 2017-1: New York, NY, 2017 Website https://www.schrodinger.com/confgen
39. LigPrep, Schrodinger®, LLC, Schrodinger Release 2017-1: New York, NY, 2017, Website https://www.schrodinger.com/ligprep
40. Desmond Molecular Dynamics System, DE Shaw Research, New York, NY, 2017. Website https://www.schrodinger.com/desmond
41. Jaguar, Schrodinger®, LLC, Schrodinger Release 2017-1: New York, NY, 2017, Website https://www.schrodinger.com/jaguar
42. MacroModel, Schrodinger®, LLC, Schrodinger Release 2017-1: New York, NY, 2017, Website https://www.schrodinger.com/macromodel
43. MOLGEN, Wassermann A., ET AL, website http://www.molgen.de/
44. CONFLEX®, CONFLEX Corporation, website http://www.conflex.net/
45. OMEGA, OpenEye Scientific Software, Website https://www.eyesopen.com/omega
46. VConf, VeraChem, LLC, website http://www.verachem.com/products/vconf/
47. Key3D, IMMD, Inc., website http://www.immd.co.jp/en/product_2.html
48. Molpro, TTI GmbH, Website https://www.molpro.net/
49. Molcas, Veryazov. V. ET AL, website http://www.molcas.org/
50. ADF®, Scientific Computing & Modeling NV, Website https://www.scm.com/product/adf/
51. TURBOMOLE, TURBOMOLE GmbH, website http://www.turbomole.com/
52. PQS, Parallel Quantum Solutions, website http://www.pqs-chem.com/
53. MPQC, Valeev E. ET AL, Website http://www.mpqc.org/
54. Dalton, Dalton / LS Dalton developers, website http://daltonprogram.org
55. LSDalton, Dalton / LS Dalton developers website http://daltonprogram.org
56. COLUMBUS, Lischka H. ET AL, website https://www.univie.ac.at/columbus/
57. NWChem, Valiev M., ET AL, Website http://www.nwchem-sw.org/index.php/Main_Page
58. PSI4, Sherrill CD, ET AL, website http://www.psicode.org/
59. CFOUR, Stanton JF, ET AL, website http://www.cfour.de/
60. ACES, Bartlett RJ, ET AL, website http://www.qtp.ufl.edu/ACES/
61. ORCA, Neese, F., ET AL, website https://orcaforum.cec.mpg.de/
62. SMASH, Ishimura K., Website http://smash-qc.sourceforge.net/
63. ABINIT-MP®, Mochizuki Y., ET AL, Website http://www.cenav.org/abinitmpopen1/
64. NTChem, Nakajima, T., ET AL, Website http://labs.aics.riken.jp/nakajimat_top/ntchem_e.html
65. PAICS, Ishikawa T., Website http://www.paics.net/index_e.html
66. Gasteiger J., ET AL (Editor), "Chemoinformatics: A Textbook", WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim, 2003, ISBN 3-527-30681-1
67. Wicker JGP, ET AL, "Beyond Rotatable Bond Counts: Capturing 3D Conformational Flexibility in a Single Descriptor", Journal of Chemical Information and Modeling, (The United States of America), The American Chemical Society Publications, vol. 56, iss. 12, November 2016, p. 2347-2352, DOI: 10.1021/acs.jcim.6b00565
68. Karelson M., ET AL, "Quantum-Chemical Descriptors in QSAR/QSPR Studies", Chemical Reviews, (The United States of America), The American Chemical Society Publications, vol. 96, iss. 3, May 1996, p. 1027-1044, DOI: 10.1021/cr950202r
69. DRAGON, Kode srl, website https://chm.kode-solutions.net/products_dragon.php
70. CODESSA PRO, CompuDrug International, Inc., website http://www.compudrug.com/codessa_pro
71. ADAPT, Jurs PC, ET AL, website http://research.chem.psu.edu/pcjgroup/adapt.html
72. ADMET Predictor, Simulations Plus, Inc., website http://www.simulations-plus.com/software/admet-property-prediction-qsar/
73. CORINA Symphony, Molecular Networks GmbH and Altamira, LLC., Website https://www.mn-am.com/products/corinasymphony
74. Pentacle, Molecular Discovery Ltd, website http://www.moldiscovery.com/software/pentacle/
75. VolSurf+, Molecular Discovery Ltd, website http://www.moldiscovery.com/software/vsplus/
76. ISIDA Fragmentor, Varnek A., ET AL, website http://infochim.u-strasbg.fr/spip.php?rubrique41
77. JOELib, Zell A., ET AL, Website http://www.ra.cs.uni-tuebingen.de/software/joelib/index.html
78. Molconn-Z, eduSoft, LC, website http://www.edusoft-lc.com/molconn/
79. PowerMV, Liu J., ET AL, Website https://www.niss.org/research/software/powermv
80. PreADMET, Bioinformatics & Molecular Design Research Center (BMDRC), Website https://preadmet.bmdrc.kr/preadmet-pc-version-2-0/
81. PaDEL-Descriptor, Yap CW, Website http://www.yapcwsoft.com/dd/padeldescriptor/
82. Cinfony, O'Boyle NM, ET AL, website http://cinfony.github.io/
83. Cao D.-S., ET AL, "ChemoPy: freely available python package for computational biology and chemoinformatics", Bioinformatics, (The United Kingdom), Oxford University Press, vol. 29, iss. 8, March 2013, p. 1092-1094, DOI: https://doi.org/10.1093/bioinformatics/btt105
84. ToMoCoMD-CARDD, Ponce YM, Website http://tomocomd.com/
85. QuaSAR-Descriptor, Chemical Computing Group ULC, website https://www.chemcomp.com/journal/descr.htm
86. QikProp, Schrodinger®, LLC, Schrodinger Release 2017-1: New York, NY, 2017, Website https://www.schrodinger.com/qikprop
87. VCharge, VeraChem, LLC, website http://www.verachem.com/products/vcharge/
88. Mold2, Hong, H., ET AL, Website https://www.fda.gov/ScienceResearch/BioinformaticsTools/Mold2/ucm144528.htm
89. LibSVM, Chang C.-C., ET AL, Website https://www.csie.ntu.edu.tw/~cjlin/libsvm/
90. TensorFlow™, Google Inc., Website https://www.tensorflow.org/
91. Chainer®, Preferred Networks, Inc., Website http://chainer.org/
92. Jubatus®, Preferred Networks, Inc. and Nippon Telegraph and Telephone Corporation, website https://jubat.us/en/
93. Caffe, Jia Y., ET AL, website http://caffe.berkeleyvision.org/
94. Theano, Theano Development Team, Website http://deeplearning.net/software/theano/
95. Torch, Ronan Collobert ET AL, website http://torch.ch/
96. neon™, Nervana Systems, website https://www.nervanasys.com/technology/neon/
97. MXNet, website http://mxnet.io/
98. The Microsoft Cognitive Toolkit, Microsoft Corporation, Website https://www.microsoft.com/en-us/cognitive-toolkit/
99. R (c), The R Foundation, Website https://www.r-project.org/
100. MATLAB®, The MathWorks, Inc., Website https://www.mathworks.com/products/matlab.html?s_tid=hp_products_matlab
101. Mathematica®, Wolfram Research, Website http://www.wolfram.com/mathematica/
102. SAS®, SAS Institute Inc., website https://www.sas.com/en_us/home.html
103. RapidMiner®, RapidMiner, Inc., website https://rapidminer.com/
104. KNIME®, KNIME.com AG, website https://www.knime.org/
105. Witten IH, ET AL, The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", (The United States of America), Morgan Kaufmann, Fourth Edition, 2016, ISBN13: 978-0128042915. Web Site http://www.cs.waikato.ac.nz/ml/weka/
106. shogun-toolbox/shogun, website http://www.shogun-toolbox.org/, Sonnenburg S., ET AL, "shogun-toolbox/shogun: Shogun 6.0.0 - Baba Nobuharu", April 2017, DOI: 10.5281/zenodo.556748
107. Orange, Demsar J., ET AL, "Orange: Data Mining Toolbox in Python", Journal of Machine Learning Research, (The United States of America), JMLR, Inc. and Microtome Publishing, vol. 14, Aug 2013, p. 2349-2353. Website https://orange.biolab.si/
108. Apache Mahout™, The Apache Software Foundation, Website http://mahout.apache.org/
109. scikit-learn, website http://scikit-learn.org/stable/. Pedregosa F., ET AL, "Scikit-learn: Machine Learning in Python", Journal of Machine Learning Research, (The United States of America), JMLR, Inc. and Microtome Publishing, vol. 12, Oct 2011, p. 2825-2830
110. mlpy, Albanese D., ET AL, website http://mlpy.sourceforge.net/
111. Chen T., ET AL, "XGBoost: A Scalable Tree Boosting System", Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug 2016, DOI: 10.1145/2939672.2939785
112. Deeplearning4j, Skymind, Deeplearning4j Development Team, "Deeplearning4j: Open-source distributed deep learning for the JVM", Apache Software Foundation License 2.0. Website https://deeplearning4j.org/
113. MedChem Designer™, Simulations Plus, Inc., Website http://www.simulations-plus.com/software/medchem-designer/
114. ChemBioDraw®, PerkinElmer, Inc., website https://www.cambridgesoft.com/Ensemble_for_Chemistry/details/Default.aspx?fid=17&pid=660
115. ChemDraw®, PerkinElmer, Inc., website http://www.cambridgesoft.com/software/overview.aspx
116. BIOVIA® Draw, Dassault Systemes, website http://accelrys.com/products/collaborative-science/biovia-draw/
117. Excel®, Microsoft Corporation, website https://products.office.com/en-us/excel
118. Libre Office, The Document Foundation, Website https://www.libreoffice.org/
119. Apache OpenOffice™, The Apache Software Foundation, Website https://www.openoffice.org/
120. Sakuratani Y, ET AL, "Molecular size as a limiting characteristic for bioconcentration in fish", Journal of Environmental Biology, January 2008, vol. 29, iss. 1, p. 89-92, Website http://www.jeb.co.in/journal_issues/200801_jan08/paper_15.pdf
121. Yap CW, "PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints", Journal of Computational Chemistry, Volume 32, Issue 7, p. 1466-1474, May 2011. DOI: 10.1002/jcc.21707. Supporting Information: JCCT_21707_sm_suppinformation.xls, website http://onlinelibrary.wiley.com/doi/10.1002/jcc.21707/suppinfo

1 Toxicity prediction system
2 Input device
3 Arithmetic device
4 Output device
5 Main memory device
6 Control device
7 Auxiliary storage device

Claims

A method of predicting the toxicity of a compound, performed by a computer, the method comprising only:
(1) a step of receiving structural information of a test compound input by a user;
(2) a step of generating a structure-optimized three-dimensional molecular structure based on the received structural information;
(3) a step of calculating, using the three-dimensional molecular structure, values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of a three-dimensional molecular descriptor, a four-dimensional molecular descriptor, and a quantum chemical molecular descriptor;
(4) a step in which a toxicity prediction model calculates the probability of the presence or absence of toxicity of the test compound using the values of the molecular descriptors, the probability of toxicity and the probability of non-toxicity summing to 100%; and
(5) a step of outputting the calculated probability,
wherein step (4) consists of:
(4-1) a step of normalizing the values of the molecular descriptors; and
(4-2) a step of calculating the probability of the presence or absence of toxicity of the test compound using the normalized values.
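
Steps (4-1) and (4-2) can be illustrated with a minimal Python sketch. It assumes z-score normalization against training-set means and standard deviations, and a single logistic unit standing in for the toxicity prediction model. Neither choice is specified by the claim, and the names `normalize`, `predict_probabilities`, `weights`, and `bias` are hypothetical.

```python
import math


def normalize(values, means, stds):
    """Step (4-1): z-score each descriptor value (assumed normalization scheme)."""
    return [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(values, means, stds)]


def predict_probabilities(normalized, weights, bias):
    """Step (4-2): return (toxic %, non-toxic %); the two always sum to 100."""
    score = bias + sum(w * x for w, x in zip(weights, normalized))
    p_toxic = 100.0 / (1.0 + math.exp(-score))
    return p_toxic, 100.0 - p_toxic


# Hypothetical descriptor values and model parameters, for illustration only.
descriptors = [2.0, 5.0]
scaled = normalize(descriptors, means=[1.0, 5.0], stds=[0.5, 2.0])
p_toxic, p_nontoxic = predict_probabilities(scaled, weights=[1.0, 1.0], bias=-2.0)
print(f"toxic: {p_toxic:.1f}%  non-toxic: {p_nontoxic:.1f}%")
```

Because the non-toxic probability is defined as the complement of the toxic probability, the 100% total holds by construction rather than by a separate check.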

The prediction method according to claim 1, wherein the three-dimensional molecular structure is one or more three-dimensional molecular structures selected from the group consisting of: a three-dimensional molecular structure optimized by the semi-empirical molecular orbital method; a three-dimensional molecular structure optimized by the ab initio molecular orbital method; a three-dimensional molecular structure optimized by the density functional method; a three-dimensional molecular structure found by a search using the molecular dynamics method, the semi-empirical molecular orbital method, the ab initio molecular orbital method, or the density functional method; and a three-dimensional molecular structure optimized by any combination of the molecular dynamics method, the semi-empirical molecular orbital method, the ab initio molecular orbital method, and the density functional method.

The prediction method according to claim 1, wherein the three-dimensional molecular structure is two or more three-dimensional molecular structures optimized by the semi-empirical molecular orbital method.

The prediction method according to any one of claims 1 to 3, wherein the one or more molecular descriptors include one or more three-dimensional molecular descriptors and one or more quantum chemical molecular descriptors.

The prediction method according to any one of claims 1 to 3, wherein the one or more molecular descriptors comprise one or more three-dimensional molecular descriptors, one or more quantum chemical molecular descriptors, one or more two-dimensional molecular descriptors, one or more one-dimensional molecular descriptors, and one or more zero-dimensional molecular descriptors.

The prediction method according to any one of claims 1 to 5, wherein the toxicity prediction model is constructed by machine learning using normalized molecular descriptor values of a plurality of compounds of known toxicity.

The prediction method according to claim 6, wherein the machine learning is one or more machine learning methods selected from the group consisting of a support vector machine, a Bayesian network, a neural network, AdaBoost, a random forest, and active learning.
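
Constructing a model from normalized descriptor values of compounds of known toxicity might look like the following Python sketch. A single logistic unit trained by stochastic gradient descent is used only as the simplest stand-in; it is not one of the learners named in the claims, and the function names, hyperparameters, and training data are all invented for illustration.

```python
import math


def fit_normalizer(rows):
    """Compute per-descriptor mean and population standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    return means, stds


def normalize(row, means, stds):
    """Z-score one compound's descriptor values (assumed normalization scheme)."""
    return [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]


def toxic_probability(x, weights, bias):
    """Probability of toxicity (0..1) from a single logistic unit."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=500, lr=0.1):
    """Fit weights and bias by stochastic gradient descent on log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = toxic_probability(x, weights, bias) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Invented descriptor values; label 1 = toxic, 0 = non-toxic.
raw = [[1.0, 10.0], [2.0, 12.0], [8.0, 30.0], [9.0, 33.0]]
labels = [0, 0, 1, 1]
means, stds = fit_normalizer(raw)
train = [normalize(r, means, stds) for r in raw]
weights, bias = train_logistic(train, labels)
for r, x in zip(raw, train):
    print(r, f"{100.0 * toxic_probability(x, weights, bias):.1f}% toxic")
```

The normalization statistics are fitted once on the training compounds and would be reused unchanged for any test compound, so that training and prediction see descriptor values on the same scale.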

The prediction method according to any one of claims 1 to 7, further comprising a step of generating the chemical formula of the test compound, wherein in step (5) the generated chemical formula is output in association with the probability.

The prediction method according to any one of claims 1 to 8, wherein there are two or more test compounds, and in step (5) the probability is output for each test compound.

The prediction method according to any one of claims 1 to 9, wherein in step (5), the determination result of the presence or absence of toxicity of the test compound is output together with the probability.

The prediction method according to any one of claims 1 to 10, wherein the output of step (5) is displayed in a tabular format.
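
The output options of step (5), namely a probability per test compound, an associated chemical formula, a determination result, and a tabular display, can be sketched together in Python as below. The 50% threshold for the determination, the column layout, the function name `format_results`, and the probability values are assumptions made for illustration; the claims do not fix any of them.

```python
def format_results(results, threshold=50.0):
    """Build a plain-text table from (name, formula, toxic %) tuples.

    The determination uses an assumed cut-off: toxic if toxic % >= threshold.
    """
    header = f"{'Compound':<12}{'Formula':<10}{'Toxic %':>9}{'Non-toxic %':>13}  Result"
    lines = [header]
    for name, formula, p_toxic in results:
        verdict = "toxic" if p_toxic >= threshold else "non-toxic"
        lines.append(
            f"{name:<12}{formula:<10}{p_toxic:>9.1f}{100.0 - p_toxic:>13.1f}  {verdict}"
        )
    return "\n".join(lines)


# Invented probabilities; these are not predictions for real compounds.
print(format_results([("compound-1", "C6H6", 72.5), ("compound-2", "C2H6O", 12.0)]))
```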

The prediction method according to any one of claims 1 to 11, wherein the toxicity is mutagenicity determined by a reverse mutation test using bacteria.

A system for predicting the toxicity of a compound, comprising only:
an input means for inputting structural information of a test compound;
a receiving means for receiving the structural information of the test compound input by a user;
a first generation means for generating a structure-optimized three-dimensional molecular structure based on the received structural information;
a first calculation means for calculating, using the three-dimensional molecular structure, values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of a three-dimensional molecular descriptor, a four-dimensional molecular descriptor, and a quantum chemical molecular descriptor;
a second calculation means by which a toxicity prediction model calculates the probability of the presence or absence of toxicity of the test compound using the values of the molecular descriptors, the probability of toxicity and the probability of non-toxicity summing to 100%; and
an output means for outputting the calculated probability,
wherein the second calculation means normalizes the values of the molecular descriptors and calculates the probability of the presence or absence of toxicity of the test compound using the normalized values.

The system according to claim 13, comprising:
an input device that functions as the input means;
an arithmetic device that functions as the first generation means, the first calculation means, and the second calculation means;
an output device that functions as the output means;
a main memory device; and
a control device that controls the system.

The system according to claim 14, further comprising an auxiliary storage device in which a program is stored.

A program causing a computer to execute:
a process of receiving structural information of a test compound input by a user;
a process of generating a structure-optimized three-dimensional molecular structure based on the received structural information;
a process of calculating, using the three-dimensional molecular structure, values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of a three-dimensional molecular descriptor, a four-dimensional molecular descriptor, and a quantum chemical molecular descriptor;
a process in which a toxicity prediction model calculates the probability of the presence or absence of toxicity of the test compound using the values of the molecular descriptors, the probability of toxicity and the probability of non-toxicity summing to 100%, the process normalizing the values of the molecular descriptors and calculating the probability using the normalized values; and
a process of outputting the calculated probability.

A computer-readable storage medium containing the program according to claim 16.