HLC_Predictor

Computational Chemistry Master's Project, University of Southampton.

A set of Jupyter notebooks illustrating a predictive model for Henry's law constants (HLCs), starting from a species' SMILES string.

The compilation of HLCs used in this project was created by R. Sander; the published paper is available here.

The CAS reference numbers in the compilation were used to create SMILES strings (via cirpy). These were in turn passed through either DRAGON or a series of RDKit functions to calculate molecular descriptors.
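The CAS-to-descriptor step described above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the function names `cas_to_smiles` and `rdkit_descriptors` are hypothetical, and the four descriptors shown are an arbitrary selection rather than the descriptor sets used in the project.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def cas_to_smiles(cas_number):
    """Resolve a CAS number to a SMILES string via cirpy (needs network access)."""
    import cirpy  # imported here so the descriptor step below also works offline

    # cirpy.resolve returns None when the identifier cannot be resolved.
    return cirpy.resolve(cas_number, "smiles")


def rdkit_descriptors(smiles):
    """Return a few RDKit descriptors for a SMILES string, or None if it cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # Illustrative choice of descriptors (an assumption, not the project's list).
    return {
        "MolWt": Descriptors.MolWt(mol),
        "MolLogP": Descriptors.MolLogP(mol),
        "TPSA": Descriptors.TPSA(mol),
        "NumHDonors": Descriptors.NumHDonors(mol),
    }


if __name__ == "__main__":
    # Ethanol, written directly as SMILES so this runs without a network call.
    print(rdkit_descriptors("CCO"))
```

In practice the two functions would be chained, for example `rdkit_descriptors(cas_to_smiles("64-17-5"))`, with a check for `None` after the lookup since not every CAS number resolves.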

Supervised machine learning algorithms were then trained on the calculated descriptors, labelled with each molecule's HLC, to predict the constants.

  • 7 ML algorithms
  • 4 feature selection methods
  • 6 sets of descriptors
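As a rough sketch of what training one such model on a descriptor table might look like, the example below combines a single feature selection method with a single regressor in scikit-learn. The choices of `SelectKBest` and `Ridge` are assumptions for illustration only, since the specific algorithms and selection methods are not named here, and the data are random placeholders rather than values from the HLC compilation.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: 200 "molecules" with 20 "descriptors" each (synthetic, not real HLC data).
rng = np.random.default_rng(0)
n_molecules, n_descriptors = 200, 20
X = rng.normal(size=(n_molecules, n_descriptors))
# The synthetic target depends on the first two descriptors only, plus a little noise.
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=n_molecules)

# Scale the descriptors, keep the 5 with the strongest univariate relationship
# to the target, then fit a ridge regression on the reduced set.
model = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_regression, k=5),
    Ridge(alpha=1.0),
)

# 5-fold cross-validated R^2; the selection step is refit inside each fold,
# so the held-out fold never influences which descriptors are kept.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"mean R^2 over 5 folds: {scores.mean():.3f}")
```

Wrapping the selection step and the regressor in one pipeline makes it straightforward to swap in a different algorithm or selection method and compare them under the same cross-validation.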

Dependencies

  • Jupyter notebooks, with the following Python packages installed:
    • pandas (data structures)
    • numpy (maths)
    • statsmodels.api (stats)
    • cirpy (conversion between chemical identifiers)
    • ipywidgets and IPython.display (widgets and nicer outputs)
    • RDKit (descriptors)
    • matplotlib.pyplot (visualisation)
    • scikit-learn (models, feature selection, PCA)
    • joblib (saving Python objects)
    • mpld3 (hover-over labels for plots)
  • DRAGON 6 (external software for descriptor calculation, not a Python package)
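For the joblib entry in the list above, saving and reloading a fitted model might look like the following sketch. The toy linear model and the file name `hlc_model.joblib` are hypothetical stand-ins, not objects from the notebooks.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy model fitted on exact y = 2x + 1 data (a stand-in for a trained HLC model).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

# Write the model to disk and read it back; a temporary directory keeps the sketch tidy.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "hlc_model.joblib")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should give the same predictions as the original.
same = np.allclose(model.predict(X), restored.predict(X))
print("predictions match after reload:", same)
```

Files written by joblib are pickle-based, so they are best reloaded with the same library versions that produced them, and only from sources you trust.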