A series of baseline model implementations for the guacamol benchmark for generative chemistry.
A more in-depth explanation of the benchmarks and of the scores obtained by these baselines can be found in our paper.
To install all dependencies:
conda install rdkit -c rdkit
pip install -r requirements.txt
Some baselines require the guacamol dataset to run. To download it, run:
bash fetch_guacamol_dataset.sh
Dummy baseline that always returns random molecules from the guacamol training set.
To execute the goal-directed generation benchmarks:
python -m random_smiles_sampler.goal_directed_generation
To execute the distribution learning benchmarks:
python -m random_smiles_sampler.distribution_learning
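The idea behind this baseline fits in a few lines. Here is a minimal sketch (the function name and the file-based interface are assumptions for illustration, not the module's actual API):

```python
import random

def sample_random_smiles(smiles_file, n, seed=None):
    """Return n SMILES strings drawn uniformly (with replacement)
    from a file with one SMILES per line."""
    rng = random.Random(seed)
    with open(smiles_file) as f:
        smiles = [line.strip() for line in f if line.strip()]
    return [rng.choice(smiles) for _ in range(n)]
```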
Dummy baseline that simply returns the molecules from the guacamol
training set that best satisfy the score of a goal-directed benchmark.
There is no model and no training; its only purpose is to establish a lower bound
on the benchmark scores.
To execute the goal-directed generation benchmarks:
python -m best_from_chembl.goal_directed_generation
No distribution learning benchmark available.
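Conceptually, this baseline just scores every training-set molecule with the benchmark's scoring function and keeps the best ones. A rough sketch (function name hypothetical):

```python
import heapq

def best_from_dataset(smiles_list, scoring_function, k):
    """Score every molecule in the dataset and return the k highest-scoring ones,
    best first. No generative model is involved."""
    scored = ((scoring_function(s), s) for s in smiles_list)
    return [s for _, s in heapq.nlargest(k, scored)]
```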
Genetic algorithm on SMILES as described in: https://www.journal.csj.jp/doi/10.1246/cl.180665
Implementation adapted from: https://github.com/tsudalab/ChemGE
To execute the goal-directed generation benchmarks:
python -m smiles_ga.goal_directed_generation
No distribution learning benchmark available.
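The actual implementation mutates SMILES via a grammar and uses RDKit for validity checks; as a rough domain-agnostic illustration of the mutate-and-select loop, here is a minimal (mu+lambda)-style GA skeleton (all names are hypothetical stand-ins):

```python
import random

def genetic_algorithm(population, scoring_function, mutate,
                      generations=20, population_size=50, seed=0):
    """Minimal GA: each generation, mutate random survivors and keep the fittest."""
    rng = random.Random(seed)
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), rng) for _ in range(population_size)]
        pool = population + offspring
        pool.sort(key=scoring_function, reverse=True)  # fittest first
        population = pool[:population_size]
    return population
```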
Genetic algorithm on molecule graphs as described in: https://doi.org/10.26434/chemrxiv.7240751
Implementation adapted from: https://github.com/jensengroup/GB-GA
To execute the goal-directed generation benchmarks:
python -m graph_ga.goal_directed_generation
No distribution learning benchmark available.
Monte Carlo Tree Search on molecule graphs as described in: https://doi.org/10.26434/chemrxiv.7240751
Implementation adapted from: https://github.com/jensengroup/GB-GB
To execute the goal-directed generation benchmarks:
python -m graph_mcts.goal_directed_generation
To execute the distribution learning benchmarks:
python -m graph_mcts.distribution_learning
To re-generate the distribution statistics as pickle files:
python -m graph_mcts.analyze_dataset
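The real implementation grows molecular graphs using action statistics extracted from the dataset (hence `analyze_dataset`). As a domain-agnostic illustration of the search itself, here is a compact UCT-style Monte Carlo Tree Search sketch on an arbitrary state space (every name here is a hypothetical stand-in, not the module's API):

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct_search(root_state, get_actions, apply_action, rollout, n_iter=200, c=1.4, seed=0):
    """Return the child state of the root with the most visits after n_iter iterations."""
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # Selection: descend via the UCT formula while the node is fully expanded.
        while node.children and len(node.children) == len(get_actions(node.state)):
            node = max(node.children,
                       key=lambda ch: ch.value / (ch.visits + 1e-9)
                       + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))
        # Expansion: add one untried child, if any action remains.
        tried = {ch.state for ch in node.children}
        untried = [a for a in get_actions(node.state)
                   if apply_action(node.state, a) not in tried]
        if untried:
            child = Node(apply_action(node.state, rng.choice(untried)), node)
            node.children.append(child)
            node = child
        # Simulation + backpropagation.
        reward = rollout(node.state, rng)
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).state if root.children else root_state
```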
Long short-term memory (LSTM) model on SMILES as described in: https://arxiv.org/abs/1701.01329
This implementation optimizes using a hill-climbing algorithm.
Implementation by BenevolentAI
A pre-trained model is provided in: [smiles_lstm/pretrained_model](TODO real URL)
To execute the goal-directed generation benchmarks:
python -m smiles_lstm_hc.goal_directed_generation
To execute the distribution learning benchmark:
python -m smiles_lstm_hc.distribution_learning
To train a model from scratch:
python -m smiles_lstm_hc.train_smiles_lstm_model
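The hill-climbing idea is: sample molecules from the model, score them, and fine-tune the model on its own best samples, repeatedly. A generic sketch of that loop (the `sample` and `finetune` callables are hypothetical stand-ins for the LSTM's actual sampling and training code):

```python
def hill_climb(model, scoring_function, sample, finetune,
               n_rounds=5, n_samples=100, keep_top=20):
    """Generic hill-climbing loop: the model is repeatedly
    fine-tuned on its own highest-scoring samples."""
    for _ in range(n_rounds):
        candidates = sample(model, n_samples)
        candidates.sort(key=scoring_function, reverse=True)
        model = finetune(model, candidates[:keep_top])
    return model
```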
Long short-term memory (LSTM) model on SMILES as described in: https://arxiv.org/abs/1701.01329
This implementation optimizes using the proximal policy optimization (PPO) algorithm.
Implementation by BenevolentAI
A pre-trained model is provided in: [smiles_lstm/pretrained_model](TODO real URL)
To execute the goal-directed generation benchmarks:
python -m smiles_lstm_ppo.goal_directed_generation
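The core of PPO is the clipped surrogate objective: the probability ratio between the new and old policy is clipped so a single update cannot move the policy too far. A self-contained sketch of just that objective, in plain Python for clarity (the real implementation works on batched tensors):

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective from PPO, averaged over samples
    and negated so that lower is better (a loss to minimize)."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                      # pi_new(a|s) / pi_old(a|s)
        clipped = max(1 - eps, min(1 + eps, ratio))    # clip ratio to [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)       # pessimistic bound
    return -total / len(advantages)
```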