From da88c22d91bec2597baab9844374fab38db58400 Mon Sep 17 00:00:00 2001 From: martin-sicho Date: Tue, 3 Sep 2024 23:50:36 +0200 Subject: [PATCH] split the data representation tutorial --- .../advanced/data/data_representation.ipynb | 1632 +++++++++ .../basics/data/data_representation.ipynb | 3084 +++++++---------- 2 files changed, 2932 insertions(+), 1784 deletions(-) create mode 100644 tutorials/advanced/data/data_representation.ipynb diff --git a/tutorials/advanced/data/data_representation.ipynb b/tutorials/advanced/data/data_representation.ipynb new file mode 100644 index 00000000..b6bd2500 --- /dev/null +++ b/tutorials/advanced/data/data_representation.ipynb @@ -0,0 +1,1632 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3f1a7a18f2217205", + "metadata": {}, + "source": [ + "## Data Representation API (`PropertyStorage` and `ChemStore`)\n", + "\n", + "### Overview\n", + "\n", + "When designing the storage API we tried to identify the most common tasks that need to be performed when working with diverse cheminformatics data sets, mainly in the context of QSPR modelling, but it can also be used to store data from molecular docking or other structure-based simulations. Therefore, QSPRpred defines a general API to register and store properties (independent variables) for arbitrary data entries in its `PropertyStorage` abstract class, which is then further extended by the `ChemStore` interface that supports more specific functionality for encoding molecules alongside their properties. If you take a look at the [API documentation](https://cddleiden.github.io/QSPRpred/docs/api/modules.html) of these classes, you can see the methods and attributes to interact with them. Therefore, anyone can implement any kind of storage system to store compound representations and their properties and as long as they adhere to the above interfaces, their storage system can be used in QSPRpred seamlessly. 
This also lets more advanced users connect other storage backends (e.g. SQL databases, NoSQL databases, online REST APIs or prohibitively large data sets) to QSPRpred. Since this is more advanced functionality, it is not yet covered in this tutorial, which focuses only on the currently available implementations that store data locally in `pandas` data frames. However, we welcome any inquiries about developing clients for custom APIs or databases. Let us know on the [issue tracker](https://github.com/CDDLeiden/QSPRpred/issues) or via [email](https://github.com/CDDLeiden/QSPRpred/blob/main/pyproject.toml).\n", + "\n", + "### `PandasDataTable` as `PropertyStorage`\n", + "\n", + "**Note: Feel free to skip this part of the tutorial and continue to the \"`TabularStorageBasic` as `ChemStore`\" section if you are more interested in the cheminformatics features of QSPRpred and are not interested in understanding `PropertyStorage` in detail.**\n", + "\n", + "Tabular data is the most common data type in QSPR modelling and `pandas` is the Python package of choice when it comes to processing it. Therefore, we built the default `PropertyStorage` implementation around it and provide a light wrapper for the `pandas.DataFrame` class called `PandasDataTable`. `PandasDataTable` objects simply manage the storage and state of a given `pandas.DataFrame` while giving it all the features of the `PropertyStorage` API. You will typically not interact with these objects directly, but we will use one here to demonstrate some of the functions provided by the `PropertyStorage` API. We will use the `A2A_LIGANDS.tsv` file from the tutorial data folder as an example data set. This file contains a list of ligands for the adenosine A2A receptor, which is a common target in drug discovery. 
The data set contains SMILES strings and some other properties relevant for QSPR modelling:" + ] + }, + { + "cell_type": "code", + "id": "fe42793b0c10c905", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:47:55.943849Z", + "start_time": "2024-09-03T21:47:55.722561Z" + } + }, + "source": [ + "import pandas as pd\n", + "\n", + "df = pd.read_csv(\"../../tutorial_data/A2A_LIGANDS.tsv\", sep=\"\\t\")\n", + "\n", + "df.head()" + ], + "outputs": [ + { + "data": { + "text/plain": [ + " SMILES pchembl_value_Mean \\\n", + "0 Cc1nn(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)c(... 8.68 \n", + "1 Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2c1cc1CCCC... 4.82 \n", + "2 O=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc1 5.65 \n", + "3 CNC(=O)C12CC1C(n1cnc3c1nc(C#CCCCCC(=O)OC)nc3NC... 5.45 \n", + "4 CCCn1c(=O)c2c(nc3cc(OC)ccn32)n(CCCNC(=O)c2ccc(... 5.20 \n", + "\n", + " Year \n", + "0 2008.0 \n", + "1 2010.0 \n", + "2 2009.0 \n", + "3 2009.0 \n", + "4 2019.0 " + ], + "text/html": [ + "
" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": 1 + }, + { + "cell_type": "markdown", + "id": "eb783d9f4ea943fb", + "metadata": {}, + "source": [ + "Wrapping this data frame in a `PandasDataTable` object is simple:" + ] + }, + { + "cell_type": "code", + "id": "6559b1618887dfbf", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:47:56.455536Z", + "start_time": "2024-09-03T21:47:55.944542Z" + } + }, + "source": [ + "from qsprpred.data.tables.pandas import PandasDataTable\n", + "import os\n", + "\n", + "random_state = 42 # for reproducibility of all random operations\n", + "os.makedirs(\"../../tutorial_output/data\",\n", + " exist_ok=True) # create the output directory if it does not exist yet\n", + "dataset = PandasDataTable(df=df, store_dir=\"../../tutorial_output/data\",\n", + " name=\"RepresentationTutorialDataset\",\n", + " random_state=random_state)\n", + "dataset.getDF()" + ], + "outputs": [ + { + "data": { + "text/plain": [ + " SMILES \\\n", + "ID \n", + "RepresentationTutorialDataset_0000 Cc1nn(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)c(... \n", + "RepresentationTutorialDataset_0001 Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2c1cc1CCCC... \n", + "RepresentationTutorialDataset_0002 O=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc1 \n", + "RepresentationTutorialDataset_0003 CNC(=O)C12CC1C(n1cnc3c1nc(C#CCCCCC(=O)OC)nc3NC... \n", + "RepresentationTutorialDataset_0004 CCCn1c(=O)c2c(nc3cc(OC)ccn32)n(CCCNC(=O)c2ccc(... \n", + "... ... 
\n", + "RepresentationTutorialDataset_4077 CNc1ncc(C(=O)NCc2ccc(OC)cc2)c2nc(-c3ccco3)nn12 \n", + "RepresentationTutorialDataset_4078 Nc1nc(-c2ccco2)c2ncn(C(=O)NCCc3ccccc3)c2n1 \n", + "RepresentationTutorialDataset_4079 Nc1nc(Nc2ccc(F)cc2)nc(CSc2nnc(N)s2)n1 \n", + "RepresentationTutorialDataset_4080 CCCOc1ccc(C=Cc2cc3c(c(=O)n(C)c(=O)n3C)n2C)cc1 \n", + "RepresentationTutorialDataset_4081 CCOC(=O)c1cnc(NCC(C)C)n2nc(-c3ccco3)nc12 \n", + "\n", + " pchembl_value_Mean Year \\\n", + "ID \n", + "RepresentationTutorialDataset_0000 8.68 2008.0 \n", + "RepresentationTutorialDataset_0001 4.82 2010.0 \n", + "RepresentationTutorialDataset_0002 5.65 2009.0 \n", + "RepresentationTutorialDataset_0003 5.45 2009.0 \n", + "RepresentationTutorialDataset_0004 5.20 2019.0 \n", + "... ... ... \n", + "RepresentationTutorialDataset_4077 7.09 2018.0 \n", + "RepresentationTutorialDataset_4078 8.22 2008.0 \n", + "RepresentationTutorialDataset_4079 4.89 2010.0 \n", + "RepresentationTutorialDataset_4080 6.51 2013.0 \n", + "RepresentationTutorialDataset_4081 7.35 2014.0 \n", + "\n", + " ID \n", + "ID \n", + "RepresentationTutorialDataset_0000 RepresentationTutorialDataset_0000 \n", + "RepresentationTutorialDataset_0001 RepresentationTutorialDataset_0001 \n", + "RepresentationTutorialDataset_0002 RepresentationTutorialDataset_0002 \n", + "RepresentationTutorialDataset_0003 RepresentationTutorialDataset_0003 \n", + "RepresentationTutorialDataset_0004 RepresentationTutorialDataset_0004 \n", + "... ... \n", + "RepresentationTutorialDataset_4077 RepresentationTutorialDataset_4077 \n", + "RepresentationTutorialDataset_4078 RepresentationTutorialDataset_4078 \n", + "RepresentationTutorialDataset_4079 RepresentationTutorialDataset_4079 \n", + "RepresentationTutorialDataset_4080 RepresentationTutorialDataset_4080 \n", + "RepresentationTutorialDataset_4081 RepresentationTutorialDataset_4081 \n", + "\n", + "[4082 rows x 4 columns]" + ], + "text/html": [ + "
" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": 2 + }, + { + "cell_type": "markdown", + "id": "6e8c0724-0803-4833-8031-6f2482d5fc71", + "metadata": {}, + "source": [ + "Since `pandas.DataFrame` is such a popular format, `PropertyStorage` requires that `getDF` exists in all implementations and that it should list all data entries and all properties in the `PropertyStorage` object. This facilitates easy data exchange between QSPRpred and any custom code that relies on `pandas`. However, we can also do a lot with `PandasDataTable` objects directly. For example, we can get the number of data entries:" + ] + }, + { + "cell_type": "code", + "id": "6e1647b440e77026", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:47:56.459819Z", + "start_time": "2024-09-03T21:47:56.456563Z" + } + }, + "source": [ + "len(dataset)" + ], + "outputs": [ + { + "data": { + "text/plain": [ + "4082" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": 3 + }, + { + "cell_type": "markdown", + "id": "5dbaf44f-b6a6-494b-8f87-2ded27686539", + "metadata": {}, + "source": [ + "Or list the saved properties / features:" + ] + }, + { + "cell_type": "code", + "id": "9e3256e4a0ba713b", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:47:56.476068Z", + "start_time": "2024-09-03T21:47:56.460604Z" + } + }, + "source": [ + "dataset.getProperties()" + ], + "outputs": [ + { + "data": { + "text/plain": [ + "['SMILES', 'pchembl_value_Mean', 'Year', 'ID']" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": 4 + }, + { + "cell_type": "markdown", + "id": "1720ac68-e052-4dc1-b28f-79a859f82dba", + "metadata": {}, + 
"source": [ + "You will also notice that `PandasDataTable` objects automatically create a unique identifier for each data entry. The name of the property that holds these identifiers is available as `idProp`. QSPRpred uses the identifiers internally to keep track of data entries and to select relevant subsets. You can access them as follows:" + ] + }, + { + "cell_type": "code", + "id": "12499b08eca90ab9", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:47:56.482927Z", + "start_time": "2024-09-03T21:47:56.477399Z" + } + }, + "source": [ + "dataset.idProp" + ], + "outputs": [ + { + "data": { + "text/plain": [ + "'ID'" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": 5 + }, + { + "cell_type": "code", + "id": "c585fbd3e0f09496", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:47:56.486604Z", + "start_time": "2024-09-03T21:47:56.483567Z" + } + }, + "source": [ + "dataset.getProperty(dataset.idProp)" + ], + "outputs": [ + { + "data": { + "text/plain": [ + "ID\n", + "RepresentationTutorialDataset_0000 RepresentationTutorialDataset_0000\n", + "RepresentationTutorialDataset_0001 RepresentationTutorialDataset_0001\n", + "RepresentationTutorialDataset_0002 RepresentationTutorialDataset_0002\n", + "RepresentationTutorialDataset_0003 RepresentationTutorialDataset_0003\n", + "RepresentationTutorialDataset_0004 RepresentationTutorialDataset_0004\n", + " ... 
\n", + "RepresentationTutorialDataset_4077 RepresentationTutorialDataset_4077\n", + "RepresentationTutorialDataset_4078 RepresentationTutorialDataset_4078\n", + "RepresentationTutorialDataset_4079 RepresentationTutorialDataset_4079\n", + "RepresentationTutorialDataset_4080 RepresentationTutorialDataset_4080\n", + "RepresentationTutorialDataset_4081 RepresentationTutorialDataset_4081\n", + "Name: ID, Length: 4082, dtype: object" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": 6 + }, + { + "cell_type": "markdown", + "id": "cd8438a2-1e46-4c1e-aa5c-926c257c103a", + "metadata": {}, + "source": [ + "Knowing\n", + "the\n", + "identifier, you\n", + "can\n", + "select\n", + "a\n", + "subset\n", + "of\n", + "the\n", + "data\n", + "set:" + ] + }, + { + "cell_type": "code", + "id": "da69f64b78ca663", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:47:56.492966Z", + "start_time": "2024-09-03T21:47:56.487125Z" + } + }, + "source": [ + "subset = dataset.getSubset([\"SMILES\", \"Year\"],\n", + " ids=[\"RepresentationTutorialDataset_0000\",\n", + " \"RepresentationTutorialDataset_0001\"])\n", + "subset.getDF()" + ], + "outputs": [ + { + "data": { + "text/plain": [ + " SMILES \\\n", + "ID \n", + "RepresentationTutorialDataset_0000 Cc1nn(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)c(... \n", + "RepresentationTutorialDataset_0001 Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2c1cc1CCCC... \n", + "\n", + " Year ID \n", + "ID \n", + "RepresentationTutorialDataset_0000 2008.0 RepresentationTutorialDataset_0000 \n", + "RepresentationTutorialDataset_0001 2010.0 RepresentationTutorialDataset_0001 " + ], + "text/html": [ + "
" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": 7 + }, + { + "cell_type": "markdown", + "id": "703015c5-addc-4aaa-af1b-935adcd84993", + "metadata": {}, + "source": [ + "Notice that the subset is also a `PandasDataTable` object, so you can perform the same operations on it as on the original data set.\n", + "\n", + "You can also get the values of a single property for certain molecules:" + ] + }, + { + "cell_type": "code", + "id": "fd9624328989e66b", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:47:56.496674Z", + "start_time": "2024-09-03T21:47:56.493480Z" + } + }, + "source": [ + "dataset.getProperty(\"pchembl_value_Mean\", ids=[\"RepresentationTutorialDataset_0000\",\n", + " \"RepresentationTutorialDataset_0001\"])" + ], + "outputs": [ + { + "data": { + "text/plain": [ + "ID\n", + "RepresentationTutorialDataset_0000 8.68\n", + "RepresentationTutorialDataset_0001 4.82\n", + "Name: pchembl_value_Mean, dtype: float64" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": 8 + }, + { + "cell_type": "markdown", + "id": "2429fb38-b739-4f13-af91-3cf555d1a522", + "metadata": {}, + "source": [ + "This functionality is extended further in this particular implementation, so we can also perform simple searches on properties:" + ] + }, + { + "cell_type": "code", + "id": "b41baa767bab88aa", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:47:56.506410Z", + "start_time": "2024-09-03T21:47:56.497326Z" + } + }, + "source": [ + "subset = dataset.searchOnProperty(\"Year\", [2009, 2010], exact=True)\n", + "subset.getDF()" + ], + "outputs": [ + { + "data": { + 
"text/plain": [ + " SMILES \\\n", + "ID \n", + "RepresentationTutorialDataset_0001 Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2c1cc1CCCC... \n", + "RepresentationTutorialDataset_0002 O=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc1 \n", + "RepresentationTutorialDataset_0003 CNC(=O)C12CC1C(n1cnc3c1nc(C#CCCCCC(=O)OC)nc3NC... \n", + "RepresentationTutorialDataset_0009 CCCn1c(=O)c2c([nH]c(-c3ccccc3)n2)n(CCCOC)c1=O \n", + "RepresentationTutorialDataset_0018 O=C(Nc1nc(-c2ccccc2)nc2nn(Cc3ccccc3)cc12)c1ccccc1 \n", + "... ... \n", + "RepresentationTutorialDataset_4049 Nc1nc(-c2ccco2)cc(C(=O)NCc2ccccc2Cl)n1 \n", + "RepresentationTutorialDataset_4050 COc1ccccc1-c1cc(C(=O)NCc2ccccn2)nc(N)n1 \n", + "RepresentationTutorialDataset_4060 N#Cc1cccc(C(=O)Nc2nc3c(ncc(C(=O)N4CCCCC4)c3)n2... \n", + "RepresentationTutorialDataset_4061 COc1ccc(CCSc2cc3nc(-c4ccco4)nn3c(N)n2)cc1 \n", + "RepresentationTutorialDataset_4079 Nc1nc(Nc2ccc(F)cc2)nc(CSc2nnc(N)s2)n1 \n", + "\n", + " pchembl_value_Mean Year \\\n", + "ID \n", + "RepresentationTutorialDataset_0001 4.82 2010.0 \n", + "RepresentationTutorialDataset_0002 5.65 2009.0 \n", + "RepresentationTutorialDataset_0003 5.45 2009.0 \n", + "RepresentationTutorialDataset_0009 6.47 2009.0 \n", + "RepresentationTutorialDataset_0018 6.74 2010.0 \n", + "... ... ... \n", + "RepresentationTutorialDataset_4049 8.59 2009.0 \n", + "RepresentationTutorialDataset_4050 7.24 2009.0 \n", + "RepresentationTutorialDataset_4060 6.75 2010.0 \n", + "RepresentationTutorialDataset_4061 8.80 2009.0 \n", + "RepresentationTutorialDataset_4079 4.89 2010.0 \n", + "\n", + " ID \n", + "ID \n", + "RepresentationTutorialDataset_0001 RepresentationTutorialDataset_0001 \n", + "RepresentationTutorialDataset_0002 RepresentationTutorialDataset_0002 \n", + "RepresentationTutorialDataset_0003 RepresentationTutorialDataset_0003 \n", + "RepresentationTutorialDataset_0009 RepresentationTutorialDataset_0009 \n", + "RepresentationTutorialDataset_0018 RepresentationTutorialDataset_0018 \n", + "... 
... \n", + "RepresentationTutorialDataset_4049 RepresentationTutorialDataset_4049 \n", + "RepresentationTutorialDataset_4050 RepresentationTutorialDataset_4050 \n", + "RepresentationTutorialDataset_4060 RepresentationTutorialDataset_4060 \n", + "RepresentationTutorialDataset_4061 RepresentationTutorialDataset_4061 \n", + "RepresentationTutorialDataset_4079 RepresentationTutorialDataset_4079 \n", + "\n", + "[804 rows x 4 columns]" + ], + "text/html": [ + "
" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": 9 + }, + { + "cell_type": "markdown", + "id": "f1b79255-5ed7-4446-9fd6-fbf5b99dd774", + "metadata": {}, + "source": [ + "You can also perform some operations on the data frame, such as shuffling it (the result is always the same thanks to the fixed random state):" + ] + }, + { + "cell_type": "code", + "id": "5120747cdaa1ac5e", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:47:56.512767Z", + "start_time": "2024-09-03T21:47:56.506923Z" + } + }, + "source": [ + "dataset.shuffle()\n", + "dataset.getDF()" + ], + "outputs": [ + { + "data": { + "text/plain": [ + " SMILES \\\n", + "ID \n", + "RepresentationTutorialDataset_0599 CCCn1c(-c2ccccc2)nc2c1ncnc2NC1CCOC1 \n", + "RepresentationTutorialDataset_0752 CCCn1c(=O)c2c([nH]c(-c3c[nH]nc3)n2)n(CCC)c1=O \n", + "RepresentationTutorialDataset_1954 COc1cccc2c1nc(N)n1nc(CN3CCN(c4ncc(F)cc4)CC3C)nc21 \n", + "RepresentationTutorialDataset_2928 COc1cccc(CCCC(=O)Nc2nc3c(cccc3)c(=O)s2)c1 \n", + "RepresentationTutorialDataset_2512 COc1c2nc(NC(=O)c3ccc(F)cc3)sc2c(N(CCO)C(C)=O)cc1 \n", + "... ... \n", + "RepresentationTutorialDataset_1130 CCNC(=O)C1OC(n2cnc3c2nc(C#CC2(O)CCCC2)nc3NCC)C... \n", + "RepresentationTutorialDataset_1294 CNC(=O)C1SC(n2cnc3c2nc(Cl)nc3NCc2cc(I)ccc2)C(O... \n", + "RepresentationTutorialDataset_0860 CCNC(=O)C1OC(n2cnc3c(N)nc(N4CCN(c5ccc(OCC(=O)O... 
\n", + "RepresentationTutorialDataset_3507 CNC(=O)C1[Se]C(n2cnc3c2ncnc3NC2CCC2)C(O)C1O \n", + "RepresentationTutorialDataset_3174 Nc1nc2c(cnn2CCCc2ccc3OCOc3c2)c2nc(-c3ccco3)nn12 \n", + "\n", + " pchembl_value_Mean Year \\\n", + "ID \n", + "RepresentationTutorialDataset_0599 5.77 2018.0 \n", + "RepresentationTutorialDataset_0752 6.64 2006.0 \n", + "RepresentationTutorialDataset_1954 7.88 2015.0 \n", + "RepresentationTutorialDataset_2928 6.94 2013.0 \n", + "RepresentationTutorialDataset_2512 7.01 2010.0 \n", + "... ... ... \n", + "RepresentationTutorialDataset_1130 6.03 2006.0 \n", + "RepresentationTutorialDataset_1294 6.65 2003.0 \n", + "RepresentationTutorialDataset_0860 7.28 2015.0 \n", + "RepresentationTutorialDataset_3507 5.97 2017.0 \n", + "RepresentationTutorialDataset_3174 8.48 1998.0 \n", + "\n", + " ID \n", + "ID \n", + "RepresentationTutorialDataset_0599 RepresentationTutorialDataset_0599 \n", + "RepresentationTutorialDataset_0752 RepresentationTutorialDataset_0752 \n", + "RepresentationTutorialDataset_1954 RepresentationTutorialDataset_1954 \n", + "RepresentationTutorialDataset_2928 RepresentationTutorialDataset_2928 \n", + "RepresentationTutorialDataset_2512 RepresentationTutorialDataset_2512 \n", + "... ... \n", + "RepresentationTutorialDataset_1130 RepresentationTutorialDataset_1130 \n", + "RepresentationTutorialDataset_1294 RepresentationTutorialDataset_1294 \n", + "RepresentationTutorialDataset_0860 RepresentationTutorialDataset_0860 \n", + "RepresentationTutorialDataset_3507 RepresentationTutorialDataset_3507 \n", + "RepresentationTutorialDataset_3174 RepresentationTutorialDataset_3174 \n", + "\n", + "[4082 rows x 4 columns]" + ], + "text/html": [ + "
" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": 10 + }, + { + "cell_type": "markdown", + "id": "50b3b5c1-a117-4ae3-8d72-e7c0f30a24e9", + "metadata": {}, + "source": [ + "We can also edit the properties:" + ] + }, + { + "cell_type": "code", + "id": "d3ebb527b8b447aa", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:47:56.520234Z", + "start_time": "2024-09-03T21:47:56.513324Z" + } + }, + "source": [ + "# get\n", + "year = dataset.getProperty(\"Year\")\n", + "display(year)\n", + "# drop\n", + "dataset.removeProperty(\"Year\")\n", + "display(dataset.getProperties())\n", + "# set\n", + "dataset.addProperty(\"Year\", year)\n", + "display(dataset.getProperties())\n", + "# set only for some ids\n", + "dataset.addProperty(\"Year\", [1990, 1990], ids=dataset.getProperty(dataset.idProp)[:2])\n", + "display(dataset.getProperty(\"Year\", ids=dataset.getProperty(dataset.idProp)[:2]))" + ], + "outputs": [ + { + "data": { + "text/plain": [ + "ID\n", + "RepresentationTutorialDataset_0599 2018.0\n", + "RepresentationTutorialDataset_0752 2006.0\n", + "RepresentationTutorialDataset_1954 2015.0\n", + "RepresentationTutorialDataset_2928 2013.0\n", + "RepresentationTutorialDataset_2512 2010.0\n", + " ... 
\n", + "RepresentationTutorialDataset_1130 2006.0\n", + "RepresentationTutorialDataset_1294 2003.0\n", + "RepresentationTutorialDataset_0860 2015.0\n", + "RepresentationTutorialDataset_3507 2017.0\n", + "RepresentationTutorialDataset_3174 1998.0\n", + "Name: Year, Length: 4082, dtype: float64" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "['SMILES', 'pchembl_value_Mean', 'ID']" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "['SMILES', 'pchembl_value_Mean', 'ID', 'Year']" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "ID\n", + "RepresentationTutorialDataset_0599 1990.0\n", + "RepresentationTutorialDataset_0752 1990.0\n", + "Name: Year, dtype: float64" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "execution_count": 11 + }, + { + "cell_type": "markdown", + "id": "83d155e8-084c-48a9-9109-aa1a9f93308e", + "metadata": {}, + "source": [ + "You can easily achieve all of the above by editing the data frame directly, but `pandas` syntax can sometimes be cumbersome, so it is nice to have more intuitive methods available. However, you can always access the underlying data frame if more complex operations are needed and then wrap it back into a `PandasDataTable` object.\n", + "\n", + "### `TabularStorageBasic` as `ChemStore`\n", + "\n", + "`PandasDataTable` is not very exciting because it does not offer much on top of the `pandas.DataFrame` class. However, it is a good starting point to understand the `PropertyStorage` API. The `ChemStore` interface is a more advanced version of `PropertyStorage` that is specifically designed for storing and managing chemical data sets. 
`TabularStorageBasic` also implements `ChemStore` using data frames managed by `PandasDataTable` under the hood, but thanks to `ChemStore` it has a few more capabilities:" + ] + }, + { + "cell_type": "code", + "id": "f7e04a3b-5904-452b-aa92-eee38796a045", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:49:05.791863Z", + "start_time": "2024-09-03T21:47:56.520743Z" + } + }, + "source": [ + "from qsprpred.data.chem.identifiers import InchiIdentifier\n", + "from qsprpred.data.chem.standardizers.papyrus import PapyrusStandardizer\n", + "from qsprpred.data.storage.tabular.basic_storage import TabularStorageBasic\n", + "\n", + "df = pd.read_csv(\"../../tutorial_data/A2A_LIGANDS.tsv\", sep=\"\\t\")\n", + "storage = TabularStorageBasic(\n", + " name=\"RepresentationTutorialChemStore\",\n", + " path=\"../../tutorial_output/data\",\n", + " df=df,\n", + " smiles_col=\"SMILES\",\n", + " standardizer=PapyrusStandardizer(), # standardizes the SMILES strings\n", + " identifier=InchiIdentifier() # generates custom identifiers\n", + ")\n", + "storage" + ], + "outputs": [ + { + "data": { + "text/plain": [ + "TabularStorageBasic (4082)" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": 12 + }, + { + "cell_type": "markdown", + "id": "d6d4e68d-065c-40d9-b5e5-8f361e826671", + "metadata": {}, + "source": [ + "As you can see, the code above took a little while to execute. That is because we also performed custom standardization and unique identification of the molecules. In this case, we already have standardized data, but in other cases it might be useful to standardize and identify molecules to find potential duplicates in your data set. In this sense, QSPRpred is also a molecule registration system that you can use to merge data sets from different sources. 
If you want to speed things up, you can tell `TabularStorageBasic` to run on multiple CPUs as well:" + ] + }, + { + "cell_type": "code", + "id": "4e21d65f-19e8-4dee-9600-08b0e1a83f5f", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:49:10.714797Z", + "start_time": "2024-09-03T21:49:05.792825Z" + } + }, + "source": [ + "df = pd.read_csv(\"../../tutorial_data/A2A_LIGANDS.tsv\", sep=\"\\t\")\n", + "storage = TabularStorageBasic(\n", + " name=\"RepresentationTutorialChemStore\",\n", + " path=\"../../tutorial_output/data\",\n", + " df=df,\n", + " smiles_col=\"SMILES\",\n", + " standardizer=PapyrusStandardizer(), # standardizes the SMILES strings\n", + " identifier=InchiIdentifier(), # generates custom identifiers\n", + " n_jobs=os.cpu_count() # use all available CPUs\n", + ")\n", + "storage" + ], + "outputs": [ + { + "data": { + "text/plain": [ + "TabularStorageBasic (4082)" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": 13 + }, + { + "cell_type": "markdown", + "id": "e399632a-87c5-40d0-b445-ee13c9281300", + "metadata": {}, + "source": [ + "If you have multiple cores available, this should have been considerably faster. 
Easy parallelization is also one feature you get for free with QSPRpred (see [this advanced tutorial to learn more](../../advanced/data/parallelization.ipynb)).\n", + "\n", + "Remember that the `TabularStorageBasic` object is also a `PropertyStorage` object, so you can use all the methods and attributes of the `PropertyStorage` API on it:" + ] + }, + { + "cell_type": "code", + "id": "4f18d963-8638-41b5-8013-3a23469c62be", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:49:10.729681Z", + "start_time": "2024-09-03T21:49:10.716096Z" + } + }, + "source": [ + "subset = storage.searchOnProperty(\"Year\", [2009, 2010], exact=True)\n", + "subset.getDF()" + ], + "outputs": [ + { + "data": { + "text/plain": [ + " SMILES \\\n", + "ID \n", + "AAEYTMMNWWKSKZ-UHFFFAOYSA-N Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2nc3c(cc12... \n", + "AAGFKZWKWAMJNP-UHFFFAOYSA-N O=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc1 \n", + "AANUKDYJZPKTKN-UHFFFAOYSA-N CNC(=O)C12CC1C(n1cnc3c(NCc4cccc(Cl)c4)nc(C#CCC... \n", + "ABIXUHSEHFCQMV-UHFFFAOYSA-N CCCn1c(=O)c2[nH]c(-c3ccccc3)nc2n(CCCOC)c1=O \n", + "ACNFYYUXBQGWQL-UHFFFAOYSA-N O=C(Nc1nc(-c2ccccc2)nc2nn(Cc3ccccc3)cc12)c1ccccc1 \n", + "... ... \n", + "ZVWNHOGZGKJOCZ-UHFFFAOYSA-N Nc1nc(C(=O)NCc2ccccc2Cl)cc(-c2ccco2)n1 \n", + "ZVYYCMRDDCYZAU-UHFFFAOYSA-N COc1ccccc1-c1cc(C(=O)NCc2ccccn2)nc(N)n1 \n", + "ZWVWCKOJGDHDIG-UHFFFAOYSA-N N#Cc1cccc(C(=O)Nc2nc3cc(C(=O)N4CCCCC4)cnc3n2C2... \n", + "ZXCVHJXQJJLILE-UHFFFAOYSA-N COc1ccc(CCSc2cc3nc(-c4ccco4)nn3c(N)n2)cc1 \n", + "ZZBZWSYDXUPJCT-UHFFFAOYSA-N Nc1nc(CSc2nnc(N)s2)nc(Nc2ccc(F)cc2)n1 \n", + "\n", + " pchembl_value_Mean Year \\\n", + "ID \n", + "AAEYTMMNWWKSKZ-UHFFFAOYSA-N 4.82 2010.0 \n", + "AAGFKZWKWAMJNP-UHFFFAOYSA-N 5.65 2009.0 \n", + "AANUKDYJZPKTKN-UHFFFAOYSA-N 5.45 2009.0 \n", + "ABIXUHSEHFCQMV-UHFFFAOYSA-N 6.47 2009.0 \n", + "ACNFYYUXBQGWQL-UHFFFAOYSA-N 6.74 2010.0 \n", + "... ... ... 
\n", + "ZVWNHOGZGKJOCZ-UHFFFAOYSA-N 8.59 2009.0 \n", + "ZVYYCMRDDCYZAU-UHFFFAOYSA-N 7.24 2009.0 \n", + "ZWVWCKOJGDHDIG-UHFFFAOYSA-N 6.75 2010.0 \n", + "ZXCVHJXQJJLILE-UHFFFAOYSA-N 8.80 2009.0 \n", + "ZZBZWSYDXUPJCT-UHFFFAOYSA-N 4.89 2010.0 \n", + "\n", + " original_smiles \\\n", + "ID \n", + "AAEYTMMNWWKSKZ-UHFFFAOYSA-N Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2nc3c(cc12... \n", + "AAGFKZWKWAMJNP-UHFFFAOYSA-N O=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc1 \n", + "AANUKDYJZPKTKN-UHFFFAOYSA-N CNC(=O)C12CC1C(n1cnc3c(NCc4cccc(Cl)c4)nc(C#CCC... \n", + "ABIXUHSEHFCQMV-UHFFFAOYSA-N CCCn1c(=O)c2[nH]c(-c3ccccc3)nc2n(CCCOC)c1=O \n", + "ACNFYYUXBQGWQL-UHFFFAOYSA-N O=C(Nc1nc(-c2ccccc2)nc2nn(Cc3ccccc3)cc12)c1ccccc1 \n", + "... ... \n", + "ZVWNHOGZGKJOCZ-UHFFFAOYSA-N Nc1nc(C(=O)NCc2ccccc2Cl)cc(-c2ccco2)n1 \n", + "ZVYYCMRDDCYZAU-UHFFFAOYSA-N COc1ccccc1-c1cc(C(=O)NCc2ccccn2)nc(N)n1 \n", + "ZWVWCKOJGDHDIG-UHFFFAOYSA-N N#Cc1cccc(C(=O)Nc2nc3cc(C(=O)N4CCCCC4)cnc3n2C2... \n", + "ZXCVHJXQJJLILE-UHFFFAOYSA-N COc1ccc(CCSc2cc3nc(-c4ccco4)nn3c(N)n2)cc1 \n", + "ZZBZWSYDXUPJCT-UHFFFAOYSA-N Nc1nc(CSc2nnc(N)s2)nc(Nc2ccc(F)cc2)n1 \n", + "\n", + " ID \\\n", + "ID \n", + "AAEYTMMNWWKSKZ-UHFFFAOYSA-N AAEYTMMNWWKSKZ-UHFFFAOYSA-N \n", + "AAGFKZWKWAMJNP-UHFFFAOYSA-N AAGFKZWKWAMJNP-UHFFFAOYSA-N \n", + "AANUKDYJZPKTKN-UHFFFAOYSA-N AANUKDYJZPKTKN-UHFFFAOYSA-N \n", + "ABIXUHSEHFCQMV-UHFFFAOYSA-N ABIXUHSEHFCQMV-UHFFFAOYSA-N \n", + "ACNFYYUXBQGWQL-UHFFFAOYSA-N ACNFYYUXBQGWQL-UHFFFAOYSA-N \n", + "... ... 
\n", + "ZVWNHOGZGKJOCZ-UHFFFAOYSA-N ZVWNHOGZGKJOCZ-UHFFFAOYSA-N \n", + "ZVYYCMRDDCYZAU-UHFFFAOYSA-N ZVYYCMRDDCYZAU-UHFFFAOYSA-N \n", + "ZWVWCKOJGDHDIG-UHFFFAOYSA-N ZWVWCKOJGDHDIG-UHFFFAOYSA-N \n", + "ZXCVHJXQJJLILE-UHFFFAOYSA-N ZXCVHJXQJJLILE-UHFFFAOYSA-N \n", + "ZZBZWSYDXUPJCT-UHFFFAOYSA-N ZZBZWSYDXUPJCT-UHFFFAOYSA-N \n", + "\n", + " ID_before_change \n", + "ID \n", + "AAEYTMMNWWKSKZ-UHFFFAOYSA-N AAEYTMMNWWKSKZ-UHFFFAOYSA-N \n", + "AAGFKZWKWAMJNP-UHFFFAOYSA-N AAGFKZWKWAMJNP-UHFFFAOYSA-N \n", + "AANUKDYJZPKTKN-UHFFFAOYSA-N AANUKDYJZPKTKN-UHFFFAOYSA-N \n", + "ABIXUHSEHFCQMV-UHFFFAOYSA-N ABIXUHSEHFCQMV-UHFFFAOYSA-N \n", + "ACNFYYUXBQGWQL-UHFFFAOYSA-N ACNFYYUXBQGWQL-UHFFFAOYSA-N \n", + "... ... \n", + "ZVWNHOGZGKJOCZ-UHFFFAOYSA-N ZVWNHOGZGKJOCZ-UHFFFAOYSA-N \n", + "ZVYYCMRDDCYZAU-UHFFFAOYSA-N ZVYYCMRDDCYZAU-UHFFFAOYSA-N \n", + "ZWVWCKOJGDHDIG-UHFFFAOYSA-N ZWVWCKOJGDHDIG-UHFFFAOYSA-N \n", + "ZXCVHJXQJJLILE-UHFFFAOYSA-N ZXCVHJXQJJLILE-UHFFFAOYSA-N \n", + "ZZBZWSYDXUPJCT-UHFFFAOYSA-N ZZBZWSYDXUPJCT-UHFFFAOYSA-N \n", + "\n", + "[804 rows x 6 columns]" + ], + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
SMILESpchembl_value_MeanYearoriginal_smilesIDID_before_change
ID
AAEYTMMNWWKSKZ-UHFFFAOYSA-NNc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2nc3c(cc12...4.822010.0Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2nc3c(cc12...AAEYTMMNWWKSKZ-UHFFFAOYSA-NAAEYTMMNWWKSKZ-UHFFFAOYSA-N
AAGFKZWKWAMJNP-UHFFFAOYSA-NO=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc15.652009.0O=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc1AAGFKZWKWAMJNP-UHFFFAOYSA-NAAGFKZWKWAMJNP-UHFFFAOYSA-N
AANUKDYJZPKTKN-UHFFFAOYSA-NCNC(=O)C12CC1C(n1cnc3c(NCc4cccc(Cl)c4)nc(C#CCC...5.452009.0CNC(=O)C12CC1C(n1cnc3c(NCc4cccc(Cl)c4)nc(C#CCC...AANUKDYJZPKTKN-UHFFFAOYSA-NAANUKDYJZPKTKN-UHFFFAOYSA-N
ABIXUHSEHFCQMV-UHFFFAOYSA-NCCCn1c(=O)c2[nH]c(-c3ccccc3)nc2n(CCCOC)c1=O6.472009.0CCCn1c(=O)c2[nH]c(-c3ccccc3)nc2n(CCCOC)c1=OABIXUHSEHFCQMV-UHFFFAOYSA-NABIXUHSEHFCQMV-UHFFFAOYSA-N
ACNFYYUXBQGWQL-UHFFFAOYSA-NO=C(Nc1nc(-c2ccccc2)nc2nn(Cc3ccccc3)cc12)c1ccccc16.742010.0O=C(Nc1nc(-c2ccccc2)nc2nn(Cc3ccccc3)cc12)c1ccccc1ACNFYYUXBQGWQL-UHFFFAOYSA-NACNFYYUXBQGWQL-UHFFFAOYSA-N
.....................
ZVWNHOGZGKJOCZ-UHFFFAOYSA-NNc1nc(C(=O)NCc2ccccc2Cl)cc(-c2ccco2)n18.592009.0Nc1nc(C(=O)NCc2ccccc2Cl)cc(-c2ccco2)n1ZVWNHOGZGKJOCZ-UHFFFAOYSA-NZVWNHOGZGKJOCZ-UHFFFAOYSA-N
ZVYYCMRDDCYZAU-UHFFFAOYSA-NCOc1ccccc1-c1cc(C(=O)NCc2ccccn2)nc(N)n17.242009.0COc1ccccc1-c1cc(C(=O)NCc2ccccn2)nc(N)n1ZVYYCMRDDCYZAU-UHFFFAOYSA-NZVYYCMRDDCYZAU-UHFFFAOYSA-N
ZWVWCKOJGDHDIG-UHFFFAOYSA-NN#Cc1cccc(C(=O)Nc2nc3cc(C(=O)N4CCCCC4)cnc3n2C2...6.752010.0N#Cc1cccc(C(=O)Nc2nc3cc(C(=O)N4CCCCC4)cnc3n2C2...ZWVWCKOJGDHDIG-UHFFFAOYSA-NZWVWCKOJGDHDIG-UHFFFAOYSA-N
ZXCVHJXQJJLILE-UHFFFAOYSA-NCOc1ccc(CCSc2cc3nc(-c4ccco4)nn3c(N)n2)cc18.802009.0COc1ccc(CCSc2cc3nc(-c4ccco4)nn3c(N)n2)cc1ZXCVHJXQJJLILE-UHFFFAOYSA-NZXCVHJXQJJLILE-UHFFFAOYSA-N
ZZBZWSYDXUPJCT-UHFFFAOYSA-NNc1nc(CSc2nnc(N)s2)nc(Nc2ccc(F)cc2)n14.892010.0Nc1nc(CSc2nnc(N)s2)nc(Nc2ccc(F)cc2)n1ZZBZWSYDXUPJCT-UHFFFAOYSA-NZZBZWSYDXUPJCT-UHFFFAOYSA-N
\n", + "

804 rows × 6 columns

\n", + "
" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": 14 + }, + { + "cell_type": "markdown", + "id": "b1ba897f-ffd5-4a8e-a0a0-3c2e4f0d32fd", + "metadata": {}, + "source": [ + "In addition to what we already explored, `ChemStore` also adds a few more cheminformatics tools that some might appreciate. You can iterate over the storage and get the molecules as `StoredMol` objects, which have their own capabilities:" + ] + }, + { + "cell_type": "code", + "id": "b91878dc-9c10-456b-b616-1a8d8067863c", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:49:10.760259Z", + "start_time": "2024-09-03T21:49:10.730211Z" + } + }, + "source": [ + "for mol in storage:\n", + " print(mol)\n", + " print(mol.as_rd_mol())\n", + " print(mol.smiles)\n", + " print(mol.props)\n", + " print(mol.representations)\n", + " break" + ], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "TabularMol(AACWUFIIMOHGSO-UHFFFAOYSA-N, Cc1cc(C)n(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)n1)\n", + "\n", + "Cc1cc(C)n(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)n1\n", + "{'ID': 'AACWUFIIMOHGSO-UHFFFAOYSA-N', 'SMILES': 'Cc1cc(C)n(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)n1', 'ID_before_change': 'RepresentationTutorialChemStore_library_0000', 'original_smiles': 'Cc1nn(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)c(C)c1', 'Year': 2008.0, 'pchembl_value_Mean': 8.68}\n", + "None\n" + ] + } + ], + "execution_count": 15 + }, + { + "cell_type": "markdown", + "id": "ff541ccf-8444-4977-8ee6-50c74ec2c117", + "metadata": {}, + "source": [ + "Therefore, we have all the information about the molecule we can get, and we can also easily turn it into an rdkit molecule object. Not that the `representations` property is currently empty for the molecules, which would be populated if we had conformers, protomers, tautomers or other representations of the molecule present in the storage. 
This feature is not implemented yet, but will be soon (feel free to inquire about the status on the [issue tracker](https://github.com/CDDLeiden/QSPRpred/issues) or via [email](https://github.com/CDDLeiden/QSPRpred/blob/main/pyproject.toml)).\n", + "\n", + "You can also iterate over the molecules in chunks:" + ] + }, + { + "cell_type": "code", + "id": "46b6608c-7a9f-4616-bc45-a197af7c1759", + "metadata": { + "ExecuteTime": { + "end_time": "2024-09-03T21:49:10.763082Z", + "start_time": "2024-09-03T21:49:10.760828Z" + } + }, + "source": [ + "for chunk in storage.iterChunks(size=2):\n", + " print(chunk)\n", + " break" + ], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[, ]\n" + ] + } + ], + "execution_count": 16 + }, + { + "cell_type": "markdown", + "id": "520a16c0-bcdd-4f42-9eaa-f5e33404840d", + "metadata": {}, + "source": [ + "This can be useful when processing large data sets one chunk at a time. With a smart implementation of `ChemStore.iterChunks`, the data set does not have to be loaded into memory all at once. The chunks can also be consumed in parallel, which can speed up processing even further (see [this advanced tutorial to learn more](../../advanced/data/parallelization.ipynb))." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tutorials/basics/data/data_representation.ipynb b/tutorials/basics/data/data_representation.ipynb index dab61d79..596f3a88 100644 --- a/tutorials/basics/data/data_representation.ipynb +++ b/tutorials/basics/data/data_representation.ipynb @@ -9,36 +9,16 @@ "In this tutorial, you will learn how data sets are represented in QSPRpred and how you can use the framework to store and prepare data sets not only for QSPR modeling, but general cheminformatics tasks as well." ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Data Representation API (`PropertyStorage` and `ChemStore`)\n", - "\n", - "### Overview\n", - "\n", - "When designing the storage API we tried to identify the most common tasks that need to be performed when working with diverse cheminformatics data sets, mainly in the context of QSPR modelling, but it can also be used to store data from molecular docking or other structure-based simulations. Therefore, QSPRpred defines a general API to register and store properties (independent variables) for arbitrary data entries in its `PropertyStorage` abstract class, which is then further extended by the `ChemStore` interface that supports more specific functionality for encoding molecules alongside their properties. If you take a look at the [API documentation](https://cddleiden.github.io/QSPRpred/docs/api/modules.html) of these classes, you can see the methods and attributes to interact with them. 
Therefore, anyone can implement any kind of storage system to store compound representations and their properties and as long as they adhere to the above interfaces, their storage system can be used in QSPRpred seamlessly. This potentially enables more advanced users to interface different storage backends (i.e. SQL databases, NoSQL databases, online REST APIs or prohibitively large data sets) with QSPRpred as well. Since this is more advanced functionality, it is not yet covered in this tutorial, which only focuses on currently available implementations that focus on storing data locally by the means of `pandas` data frames. However, we are happy for any inquiries about developing clients for custom APIs or databases. Let us know on the [issue tracker](https://github.com/CDDLeiden/QSPRpred/issues) or via [email](https://github.com/CDDLeiden/QSPRpred/blob/main/pyproject.toml).\n", - "\n", - "### `PandasDataTable` as `PropertyStorage`\n", - "\n", - "**Note: Feel free to skip this part of the tutorial and continue to the \"`TabularStorageBasic` as `ChemStore`\" section if you are more interested in the cheminformatics features of QSPRpred and are not interested in understanding `PropertyStorage` in detail.**\n", - "\n", - "Tabular data is the most common data type in QSPR modelling and `pandas` is the Python package of choice when it comes to processing it. Therefore, we decided to compose the default `PropertyStorage` implementation around it and provide a light wrapper for the `pandas.DataFrame` class called `PandasDataTable`. `PandasDataTable` objects simply manage storage and state of a given `pandas.DataFrame` and giving it all features of the `PropertyStorage` API at the same time. You will typically not interact with these objects directly, but we will now use it for the demonstration of some functions facilitated by the `PropertyStorage` API. We will use the `A2A_LIGANDS.tsv` file from the tutorial data folder as an example data set. 
This file contains a list of ligands for the adenosine A2A receptor, which is a common target in drug discovery. The data set contains SMILES strings and some other properties relevant for QSPR modelling:" - ] - }, { "cell_type": "code", "metadata": { "collapsed": false, - "execution": { - "iopub.execute_input": "2023-09-21T15:40:58.192498Z", - "iopub.status.busy": "2023-09-21T15:40:58.192284Z", - "iopub.status.idle": "2023-09-21T15:40:58.411954Z", - "shell.execute_reply": "2023-09-21T15:40:58.411046Z" + "jupyter": { + "outputs_hidden": false }, "ExecuteTime": { - "end_time": "2024-08-28T08:39:23.998036Z", - "start_time": "2024-08-28T08:39:23.718045Z" + "end_time": "2024-09-03T21:46:34.479118Z", + "start_time": "2024-09-03T21:46:34.087283Z" } }, "source": [ @@ -106,1400 +86,91 @@ " \n", " 2\n", " O=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc1\n", - " 5.65\n", - " 2009.0\n", - " \n", - " \n", - " 3\n", - " CNC(=O)C12CC1C(n1cnc3c1nc(C#CCCCCC(=O)OC)nc3NC...\n", - " 5.45\n", - " 2009.0\n", - " \n", - " \n", - " 4\n", - " CCCn1c(=O)c2c(nc3cc(OC)ccn32)n(CCCNC(=O)c2ccc(...\n", - " 5.20\n", - " 2019.0\n", - " \n", - " \n", - "\n", - "" - ] - }, - "execution_count": 1, - "metadata": {}, - "output_type": "execute_result" - } - ], - "execution_count": 1 - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Wrapping this data frame in a `PandasDataTable` object is simple:" - }, - { - "cell_type": "code", - "metadata": { - "collapsed": false, - "execution": { - "iopub.execute_input": "2023-09-21T15:40:58.442281Z", - "iopub.status.busy": "2023-09-21T15:40:58.441291Z", - "iopub.status.idle": "2023-09-21T15:40:59.022817Z", - "shell.execute_reply": "2023-09-21T15:40:59.021985Z" - }, - "ExecuteTime": { - "end_time": "2024-08-28T08:39:24.528697Z", - "start_time": "2024-08-28T08:39:24.003060Z" - } - }, - "source": [ - "from qsprpred.data.tables.pandas import PandasDataTable\n", - "import os\n", - "\n", - "random_state = 42 # for reproducibility of all random 
operations\n", - "os.makedirs(\"../../tutorial_output/data\",\n", - " exist_ok=True) # create the output directory if it does not exist yet\n", - "dataset = PandasDataTable(df=df, store_dir=\"../../tutorial_output/data\",\n", - " name=\"RepresentationTutorialDataset\",\n", - " random_state=random_state)\n", - "dataset.getDF()" - ], - "outputs": [ - { - "data": { - "text/plain": [ - " SMILES \\\n", - "ID \n", - "RepresentationTutorialDataset_0000 Cc1nn(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)c(... \n", - "RepresentationTutorialDataset_0001 Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2c1cc1CCCC... \n", - "RepresentationTutorialDataset_0002 O=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc1 \n", - "RepresentationTutorialDataset_0003 CNC(=O)C12CC1C(n1cnc3c1nc(C#CCCCCC(=O)OC)nc3NC... \n", - "RepresentationTutorialDataset_0004 CCCn1c(=O)c2c(nc3cc(OC)ccn32)n(CCCNC(=O)c2ccc(... \n", - "... ... \n", - "RepresentationTutorialDataset_4077 CNc1ncc(C(=O)NCc2ccc(OC)cc2)c2nc(-c3ccco3)nn12 \n", - "RepresentationTutorialDataset_4078 Nc1nc(-c2ccco2)c2ncn(C(=O)NCCc3ccccc3)c2n1 \n", - "RepresentationTutorialDataset_4079 Nc1nc(Nc2ccc(F)cc2)nc(CSc2nnc(N)s2)n1 \n", - "RepresentationTutorialDataset_4080 CCCOc1ccc(C=Cc2cc3c(c(=O)n(C)c(=O)n3C)n2C)cc1 \n", - "RepresentationTutorialDataset_4081 CCOC(=O)c1cnc(NCC(C)C)n2nc(-c3ccco3)nc12 \n", - "\n", - " pchembl_value_Mean Year \\\n", - "ID \n", - "RepresentationTutorialDataset_0000 8.68 2008.0 \n", - "RepresentationTutorialDataset_0001 4.82 2010.0 \n", - "RepresentationTutorialDataset_0002 5.65 2009.0 \n", - "RepresentationTutorialDataset_0003 5.45 2009.0 \n", - "RepresentationTutorialDataset_0004 5.20 2019.0 \n", - "... ... ... 
\n", - "RepresentationTutorialDataset_4077 7.09 2018.0 \n", - "RepresentationTutorialDataset_4078 8.22 2008.0 \n", - "RepresentationTutorialDataset_4079 4.89 2010.0 \n", - "RepresentationTutorialDataset_4080 6.51 2013.0 \n", - "RepresentationTutorialDataset_4081 7.35 2014.0 \n", - "\n", - " ID \n", - "ID \n", - "RepresentationTutorialDataset_0000 RepresentationTutorialDataset_0000 \n", - "RepresentationTutorialDataset_0001 RepresentationTutorialDataset_0001 \n", - "RepresentationTutorialDataset_0002 RepresentationTutorialDataset_0002 \n", - "RepresentationTutorialDataset_0003 RepresentationTutorialDataset_0003 \n", - "RepresentationTutorialDataset_0004 RepresentationTutorialDataset_0004 \n", - "... ... \n", - "RepresentationTutorialDataset_4077 RepresentationTutorialDataset_4077 \n", - "RepresentationTutorialDataset_4078 RepresentationTutorialDataset_4078 \n", - "RepresentationTutorialDataset_4079 RepresentationTutorialDataset_4079 \n", - "RepresentationTutorialDataset_4080 RepresentationTutorialDataset_4080 \n", - "RepresentationTutorialDataset_4081 RepresentationTutorialDataset_4081 \n", - "\n", - "[4082 rows x 4 columns]" - ], - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
SMILESpchembl_value_MeanYearID
ID
RepresentationTutorialDataset_0000Cc1nn(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)c(...8.682008.0RepresentationTutorialDataset_0000
RepresentationTutorialDataset_0001Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2c1cc1CCCC...4.822010.0RepresentationTutorialDataset_0001
RepresentationTutorialDataset_0002O=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc15.652009.0RepresentationTutorialDataset_0002
RepresentationTutorialDataset_0003CNC(=O)C12CC1C(n1cnc3c1nc(C#CCCCCC(=O)OC)nc3NC...5.452009.0RepresentationTutorialDataset_0003
RepresentationTutorialDataset_0004CCCn1c(=O)c2c(nc3cc(OC)ccn32)n(CCCNC(=O)c2ccc(...5.202019.0RepresentationTutorialDataset_0004
...............
RepresentationTutorialDataset_4077CNc1ncc(C(=O)NCc2ccc(OC)cc2)c2nc(-c3ccco3)nn127.092018.0RepresentationTutorialDataset_4077
RepresentationTutorialDataset_4078Nc1nc(-c2ccco2)c2ncn(C(=O)NCCc3ccccc3)c2n18.222008.0RepresentationTutorialDataset_4078
RepresentationTutorialDataset_4079Nc1nc(Nc2ccc(F)cc2)nc(CSc2nnc(N)s2)n14.892010.0RepresentationTutorialDataset_4079
RepresentationTutorialDataset_4080CCCOc1ccc(C=Cc2cc3c(c(=O)n(C)c(=O)n3C)n2C)cc16.512013.0RepresentationTutorialDataset_4080
RepresentationTutorialDataset_4081CCOC(=O)c1cnc(NCC(C)C)n2nc(-c3ccco3)nc127.352014.0RepresentationTutorialDataset_4081
\n", - "

4082 rows × 4 columns

\n", - "
" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], - "execution_count": 2 - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": "Since `pandas.DataFrame` is such a popular format, `PropertyStorage` enforces that `getDF` exists in all implementations and should list all data entries and all properties in the `PropertyStorage` object. This is to facilitate easy data exchange between QSPRpred and any custom code that relies on `pandas`. However, we can also do a lot with `PandasDataTable` objects directly:" - }, - { - "cell_type": "code", - "metadata": { - "collapsed": false, - "execution": { - "iopub.execute_input": "2023-09-21T15:40:59.026127Z", - "iopub.status.busy": "2023-09-21T15:40:59.025578Z", - "iopub.status.idle": "2023-09-21T15:40:59.031057Z", - "shell.execute_reply": "2023-09-21T15:40:59.030006Z" - }, - "ExecuteTime": { - "end_time": "2024-08-28T08:39:24.531744Z", - "start_time": "2024-08-28T08:39:24.529282Z" - } - }, - "source": [ - "len(dataset)" - ], - "outputs": [ - { - "data": { - "text/plain": [ - "4082" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "execution_count": 3 - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "the saved properties/features:" - ] - }, - { - "cell_type": "code", - "metadata": { - "collapsed": false, - "execution": { - "iopub.execute_input": "2023-09-21T15:40:59.034372Z", - "iopub.status.busy": "2023-09-21T15:40:59.033863Z", - "iopub.status.idle": "2023-09-21T15:40:59.041678Z", - "shell.execute_reply": "2023-09-21T15:40:59.040758Z" - }, - "ExecuteTime": { - "end_time": "2024-08-28T08:39:24.548198Z", - "start_time": "2024-08-28T08:39:24.532457Z" - } - }, - "source": [ - "dataset.getProperties()" - ], - "outputs": [ - { - "data": { - "text/plain": [ - "['SMILES', 'pchembl_value_Mean', 'Year', 'ID']" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - 
"execution_count": 4 - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "You will also notice that `PandasDataTable` objects also automatically create a unique identifier for each data entry. This is the `idProp` property, which is a unique identifier for each data entry. This is useful for tracking data entries and is used internally by QSPRpred to keep track of data entries and selecting relevant subsets. You can access it as follows:" - }, - { - "metadata": { - "ExecuteTime": { - "end_time": "2024-08-28T08:39:24.555362Z", - "start_time": "2024-08-28T08:39:24.548722Z" - } - }, - "cell_type": "code", - "source": "dataset.idProp", - "outputs": [ - { - "data": { - "text/plain": [ - "'ID'" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "execution_count": 5 - }, - { - "metadata": { - "ExecuteTime": { - "end_time": "2024-08-28T08:39:24.558690Z", - "start_time": "2024-08-28T08:39:24.555880Z" - } - }, - "cell_type": "code", - "source": "dataset.getProperty(dataset.idProp)", - "outputs": [ - { - "data": { - "text/plain": [ - "ID\n", - "RepresentationTutorialDataset_0000 RepresentationTutorialDataset_0000\n", - "RepresentationTutorialDataset_0001 RepresentationTutorialDataset_0001\n", - "RepresentationTutorialDataset_0002 RepresentationTutorialDataset_0002\n", - "RepresentationTutorialDataset_0003 RepresentationTutorialDataset_0003\n", - "RepresentationTutorialDataset_0004 RepresentationTutorialDataset_0004\n", - " ... 
\n", - "RepresentationTutorialDataset_4077 RepresentationTutorialDataset_4077\n", - "RepresentationTutorialDataset_4078 RepresentationTutorialDataset_4078\n", - "RepresentationTutorialDataset_4079 RepresentationTutorialDataset_4079\n", - "RepresentationTutorialDataset_4080 RepresentationTutorialDataset_4080\n", - "RepresentationTutorialDataset_4081 RepresentationTutorialDataset_4081\n", - "Name: ID, Length: 4082, dtype: object" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "execution_count": 6 - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Knowing the identifier, you can select a subset of the data set:" - }, - { - "metadata": { - "ExecuteTime": { - "end_time": "2024-08-28T08:39:24.564540Z", - "start_time": "2024-08-28T08:39:24.559198Z" - } - }, - "cell_type": "code", - "source": [ - "subset = dataset.getSubset([\"SMILES\", \"Year\"],\n", - " ids=[\"RepresentationTutorialDataset_0000\",\n", - " \"RepresentationTutorialDataset_0001\"])\n", - "subset.getDF()" - ], - "outputs": [ - { - "data": { - "text/plain": [ - " SMILES \\\n", - "ID \n", - "RepresentationTutorialDataset_0000 Cc1nn(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)c(... \n", - "RepresentationTutorialDataset_0001 Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2c1cc1CCCC... \n", - "\n", - " Year ID \n", - "ID \n", - "RepresentationTutorialDataset_0000 2008.0 RepresentationTutorialDataset_0000 \n", - "RepresentationTutorialDataset_0001 2010.0 RepresentationTutorialDataset_0001 " - ], - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
SMILESYearID
ID
RepresentationTutorialDataset_0000Cc1nn(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)c(...2008.0RepresentationTutorialDataset_0000
RepresentationTutorialDataset_0001Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2c1cc1CCCC...2010.0RepresentationTutorialDataset_0001
\n", - "
" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "execution_count": 7 - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "Notice that the subset is actually also a `PandasDataTable` object, so you can perform the same operations on it as on the original data set. \n", - "\n", - "You can also just get values of a single property for certain molecules:" - ] - }, - { - "metadata": { - "ExecuteTime": { - "end_time": "2024-08-28T08:39:24.568323Z", - "start_time": "2024-08-28T08:39:24.565086Z" - } - }, - "cell_type": "code", - "source": [ - "dataset.getProperty(\"pchembl_value_Mean\", ids=[\"RepresentationTutorialDataset_0000\",\n", - " \"RepresentationTutorialDataset_0001\"])" - ], - "outputs": [ - { - "data": { - "text/plain": [ - "ID\n", - "RepresentationTutorialDataset_0000 8.68\n", - "RepresentationTutorialDataset_0001 4.82\n", - "Name: pchembl_value_Mean, dtype: float64" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "execution_count": 8 - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "This is extended further and in this particular case we can also perform simple searches on properties:" - }, - { - "metadata": { - "ExecuteTime": { - "end_time": "2024-08-28T08:39:24.576898Z", - "start_time": "2024-08-28T08:39:24.568784Z" - } - }, - "cell_type": "code", - "source": [ - "subset = dataset.searchOnProperty(\"Year\", [2009, 2010], exact=True)\n", - "subset.getDF()" - ], - "outputs": [ - { - "data": { - "text/plain": [ - " SMILES \\\n", - "ID \n", - "RepresentationTutorialDataset_0001 Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2c1cc1CCCC... \n", - "RepresentationTutorialDataset_0002 O=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc1 \n", - "RepresentationTutorialDataset_0003 CNC(=O)C12CC1C(n1cnc3c1nc(C#CCCCCC(=O)OC)nc3NC... 
\n", - "RepresentationTutorialDataset_0009 CCCn1c(=O)c2c([nH]c(-c3ccccc3)n2)n(CCCOC)c1=O \n", - "RepresentationTutorialDataset_0018 O=C(Nc1nc(-c2ccccc2)nc2nn(Cc3ccccc3)cc12)c1ccccc1 \n", - "... ... \n", - "RepresentationTutorialDataset_4049 Nc1nc(-c2ccco2)cc(C(=O)NCc2ccccc2Cl)n1 \n", - "RepresentationTutorialDataset_4050 COc1ccccc1-c1cc(C(=O)NCc2ccccn2)nc(N)n1 \n", - "RepresentationTutorialDataset_4060 N#Cc1cccc(C(=O)Nc2nc3c(ncc(C(=O)N4CCCCC4)c3)n2... \n", - "RepresentationTutorialDataset_4061 COc1ccc(CCSc2cc3nc(-c4ccco4)nn3c(N)n2)cc1 \n", - "RepresentationTutorialDataset_4079 Nc1nc(Nc2ccc(F)cc2)nc(CSc2nnc(N)s2)n1 \n", - "\n", - " pchembl_value_Mean Year \\\n", - "ID \n", - "RepresentationTutorialDataset_0001 4.82 2010.0 \n", - "RepresentationTutorialDataset_0002 5.65 2009.0 \n", - "RepresentationTutorialDataset_0003 5.45 2009.0 \n", - "RepresentationTutorialDataset_0009 6.47 2009.0 \n", - "RepresentationTutorialDataset_0018 6.74 2010.0 \n", - "... ... ... \n", - "RepresentationTutorialDataset_4049 8.59 2009.0 \n", - "RepresentationTutorialDataset_4050 7.24 2009.0 \n", - "RepresentationTutorialDataset_4060 6.75 2010.0 \n", - "RepresentationTutorialDataset_4061 8.80 2009.0 \n", - "RepresentationTutorialDataset_4079 4.89 2010.0 \n", - "\n", - " ID \n", - "ID \n", - "RepresentationTutorialDataset_0001 RepresentationTutorialDataset_0001 \n", - "RepresentationTutorialDataset_0002 RepresentationTutorialDataset_0002 \n", - "RepresentationTutorialDataset_0003 RepresentationTutorialDataset_0003 \n", - "RepresentationTutorialDataset_0009 RepresentationTutorialDataset_0009 \n", - "RepresentationTutorialDataset_0018 RepresentationTutorialDataset_0018 \n", - "... ... 
\n", - "RepresentationTutorialDataset_4049 RepresentationTutorialDataset_4049 \n", - "RepresentationTutorialDataset_4050 RepresentationTutorialDataset_4050 \n", - "RepresentationTutorialDataset_4060 RepresentationTutorialDataset_4060 \n", - "RepresentationTutorialDataset_4061 RepresentationTutorialDataset_4061 \n", - "RepresentationTutorialDataset_4079 RepresentationTutorialDataset_4079 \n", - "\n", - "[804 rows x 4 columns]" - ], - "text/html": [ - "
" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "execution_count": 9 - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": "You can also do some operations on the data frame, like shuffle it (always the same result thanks to the fixed random state):" - }, - { - "cell_type": "code", - "metadata": { - "collapsed": false, - "execution": { - "iopub.execute_input": "2023-09-21T15:40:59.044194Z", - "iopub.status.busy": "2023-09-21T15:40:59.043961Z", - "iopub.status.idle": "2023-09-21T15:40:59.050633Z", - "shell.execute_reply": "2023-09-21T15:40:59.049533Z" - }, - "ExecuteTime": { - "end_time": "2024-08-28T08:39:24.582981Z", - "start_time": "2024-08-28T08:39:24.577969Z" - } - }, - "source": [ - "dataset.shuffle()\n", - "dataset.getDF()" - ], - "outputs": [ - { - "data": { - "text/plain": [ - " SMILES \\\n", - "ID \n", - "RepresentationTutorialDataset_0599 CCCn1c(-c2ccccc2)nc2c1ncnc2NC1CCOC1 \n", - "RepresentationTutorialDataset_0752 CCCn1c(=O)c2c([nH]c(-c3c[nH]nc3)n2)n(CCC)c1=O \n", - "RepresentationTutorialDataset_1954 COc1cccc2c1nc(N)n1nc(CN3CCN(c4ncc(F)cc4)CC3C)nc21 \n", - "RepresentationTutorialDataset_2928 COc1cccc(CCCC(=O)Nc2nc3c(cccc3)c(=O)s2)c1 \n", - "RepresentationTutorialDataset_2512 COc1c2nc(NC(=O)c3ccc(F)cc3)sc2c(N(CCO)C(C)=O)cc1 \n", - "... ... \n", - "RepresentationTutorialDataset_1130 CCNC(=O)C1OC(n2cnc3c2nc(C#CC2(O)CCCC2)nc3NCC)C... \n", - "RepresentationTutorialDataset_1294 CNC(=O)C1SC(n2cnc3c2nc(Cl)nc3NCc2cc(I)ccc2)C(O... \n", - "RepresentationTutorialDataset_0860 CCNC(=O)C1OC(n2cnc3c(N)nc(N4CCN(c5ccc(OCC(=O)O... 
\n", - "RepresentationTutorialDataset_3507 CNC(=O)C1[Se]C(n2cnc3c2ncnc3NC2CCC2)C(O)C1O \n", - "RepresentationTutorialDataset_3174 Nc1nc2c(cnn2CCCc2ccc3OCOc3c2)c2nc(-c3ccco3)nn12 \n", - "\n", - " pchembl_value_Mean Year \\\n", - "ID \n", - "RepresentationTutorialDataset_0599 5.77 2018.0 \n", - "RepresentationTutorialDataset_0752 6.64 2006.0 \n", - "RepresentationTutorialDataset_1954 7.88 2015.0 \n", - "RepresentationTutorialDataset_2928 6.94 2013.0 \n", - "RepresentationTutorialDataset_2512 7.01 2010.0 \n", - "... ... ... \n", - "RepresentationTutorialDataset_1130 6.03 2006.0 \n", - "RepresentationTutorialDataset_1294 6.65 2003.0 \n", - "RepresentationTutorialDataset_0860 7.28 2015.0 \n", - "RepresentationTutorialDataset_3507 5.97 2017.0 \n", - "RepresentationTutorialDataset_3174 8.48 1998.0 \n", - "\n", - " ID \n", - "ID \n", - "RepresentationTutorialDataset_0599 RepresentationTutorialDataset_0599 \n", - "RepresentationTutorialDataset_0752 RepresentationTutorialDataset_0752 \n", - "RepresentationTutorialDataset_1954 RepresentationTutorialDataset_1954 \n", - "RepresentationTutorialDataset_2928 RepresentationTutorialDataset_2928 \n", - "RepresentationTutorialDataset_2512 RepresentationTutorialDataset_2512 \n", - "... ... \n", - "RepresentationTutorialDataset_1130 RepresentationTutorialDataset_1130 \n", - "RepresentationTutorialDataset_1294 RepresentationTutorialDataset_1294 \n", - "RepresentationTutorialDataset_0860 RepresentationTutorialDataset_0860 \n", - "RepresentationTutorialDataset_3507 RepresentationTutorialDataset_3507 \n", - "RepresentationTutorialDataset_3174 RepresentationTutorialDataset_3174 \n", - "\n", - "[4082 rows x 4 columns]" - ], - "text/html": [ - "
" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "execution_count": 10 - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": "We can also edit the properties:" - }, - { - "cell_type": "code", - "metadata": { - "ExecuteTime": { - "end_time": "2024-08-28T08:39:24.590366Z", - "start_time": "2024-08-28T08:39:24.583423Z" - } - }, - "source": [ - "# get\n", - "year = dataset.getProperty(\"Year\")\n", - "display(year)\n", - "# drop\n", - "dataset.removeProperty(\"Year\")\n", - "display(dataset.getProperties())\n", - "# set\n", - "dataset.addProperty(\"Year\", year)\n", - "display(dataset.getProperties())\n", - "# set only for some ids\n", - "dataset.addProperty(\"Year\", [1990, 1990], ids=dataset.getProperty(dataset.idProp)[:2])\n", - "display(dataset.getProperty(\"Year\", ids=dataset.getProperty(dataset.idProp)[:2]))" - ], - "outputs": [ - { - "data": { - "text/plain": [ - "ID\n", - "RepresentationTutorialDataset_0599 2018.0\n", - "RepresentationTutorialDataset_0752 2006.0\n", - "RepresentationTutorialDataset_1954 2015.0\n", - "RepresentationTutorialDataset_2928 2013.0\n", - "RepresentationTutorialDataset_2512 2010.0\n", - " ... 
\n", - "RepresentationTutorialDataset_1130 2006.0\n", - "RepresentationTutorialDataset_1294 2003.0\n", - "RepresentationTutorialDataset_0860 2015.0\n", - "RepresentationTutorialDataset_3507 2017.0\n", - "RepresentationTutorialDataset_3174 1998.0\n", - "Name: Year, Length: 4082, dtype: float64" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "['SMILES', 'pchembl_value_Mean', 'ID']" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "['SMILES', 'pchembl_value_Mean', 'ID', 'Year']" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "ID\n", - "RepresentationTutorialDataset_0599 1990.0\n", - "RepresentationTutorialDataset_0752 1990.0\n", - "Name: Year, dtype: float64" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "execution_count": 11 - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can easily achieve all of those by editing the data frame directly, but `pandas` syntax can sometimes be cumbersome, so it is nice to have more intuitive methods available. However, you can always access the underlying data frame if more complex operations are needed and then wrap it back into a `PandasDataTable` object.\n", - "\n", - "### `TabularStorageBasic` as `ChemStore`\n", - "\n", - "`PandasDataTable` is not very exciting because it does not offer much on top of the `pandas.DataFrame` class. However, it is a good starting point to understand the `PropertyStorage` API. The `ChemStore` interface is a more advanced version of `PropertyStorage` that is specifically designed for storing and managing chemical data sets. 
`TabularStorageBasic` implements `ChemStore` using data frames managed by `PandasDataTable` under the hood as well, but thanks to `ChemStore` has a few more capabilities:" - ] - }, - { - "cell_type": "code", - "metadata": { - "collapsed": false, - "execution": { - "iopub.execute_input": "2023-09-21T15:40:59.195569Z", - "iopub.status.busy": "2023-09-21T15:40:59.195084Z", - "iopub.status.idle": "2023-09-21T15:40:59.230080Z", - "shell.execute_reply": "2023-09-21T15:40:59.229190Z" - }, - "ExecuteTime": { - "end_time": "2024-08-28T08:40:34.434507Z", - "start_time": "2024-08-28T08:39:24.590924Z" - } - }, - "source": [ - "from qsprpred.data.chem.identifiers import InchiIdentifier\n", - "from qsprpred.data.chem.standardizers.papyrus import PapyrusStandardizer\n", - "from qsprpred.data.storage.tabular.basic_storage import TabularStorageBasic\n", - "\n", - "df = pd.read_csv(\"../../tutorial_data/A2A_LIGANDS.tsv\", sep=\"\\t\")\n", - "storage = TabularStorageBasic(\n", - " name=\"RepresentationTutorialChemStore\",\n", - " path=\"../../tutorial_output/data\",\n", - " df=df,\n", - " smiles_col=\"SMILES\",\n", - " standardizer=PapyrusStandardizer(), # standardizes the SMILES strings\n", - " identifier=InchiIdentifier() # generates custom identifiers\n", - ")\n", - "storage" - ], - "outputs": [ - { - "data": { - "text/plain": [ - "TabularStorageBasic (4082)" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], - "execution_count": 12 - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "As you can see, the code above took a little while to execute. That is because we also performed custom standardization and unique identification of the molecules. In this case, we already have standardized data, but in other cases it might be useful to standardize and identify molecules to find potential duplicates in your data set. 
In this sense, QSPRpred is also a molecule registration system that you can use to merge data sets from different sources. If you want to speed things up, you can tell `TabularStorageBasic` to run on multiple CPUs as well:" - }, - { - "metadata": { - "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.378129Z", - "start_time": "2024-08-28T08:40:34.435349Z" - } - }, - "cell_type": "code", - "source": [ - "df = pd.read_csv(\"../../tutorial_data/A2A_LIGANDS.tsv\", sep=\"\\t\")\n", - "storage = TabularStorageBasic(\n", - " name=\"RepresentationTutorialChemStore\",\n", - " path=\"../../tutorial_output/data\",\n", - " df=df,\n", - " smiles_col=\"SMILES\",\n", - " standardizer=PapyrusStandardizer(), # standardizes the SMILES strings\n", - " identifier=InchiIdentifier(), # generates custom identifiers\n", - " n_jobs=os.cpu_count() # use all available CPUs\n", - ")\n", - "storage" - ], - "outputs": [ - { - "data": { - "text/plain": [ - "TabularStorageBasic (4082)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "execution_count": 13 - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "If you have multiple cores available, this should have been considerably faster. 
Easy parallelization is also one feature you get for free with QSPRpred (see [this advanced tutorial to learn more](../../advanced/data/parallelization.ipynb)).\n", - "\n", - "Remember that the `TabularStorageBasic` object is also a `PropertyStorage` object, so you can use all the methods and attributes of the `PropertyStorage` API on it:" - ] - }, - { - "metadata": { - "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.392452Z", - "start_time": "2024-08-28T08:40:39.378986Z" - } - }, - "cell_type": "code", - "source": [ - "subset = storage.searchOnProperty(\"Year\", [2009, 2010], exact=True)\n", - "subset.getDF()" - ], - "outputs": [ - { - "data": { - "text/plain": [ - " SMILES \\\n", - "ID \n", - "AAEYTMMNWWKSKZ-UHFFFAOYSA-N Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2nc3c(cc12... \n", - "AAGFKZWKWAMJNP-UHFFFAOYSA-N O=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc1 \n", - "AANUKDYJZPKTKN-UHFFFAOYSA-N CNC(=O)C12CC1C(n1cnc3c(NCc4cccc(Cl)c4)nc(C#CCC... \n", - "ABIXUHSEHFCQMV-UHFFFAOYSA-N CCCn1c(=O)c2[nH]c(-c3ccccc3)nc2n(CCCOC)c1=O \n", - "ACNFYYUXBQGWQL-UHFFFAOYSA-N O=C(Nc1nc(-c2ccccc2)nc2nn(Cc3ccccc3)cc12)c1ccccc1 \n", - "... ... \n", - "ZVWNHOGZGKJOCZ-UHFFFAOYSA-N Nc1nc(C(=O)NCc2ccccc2Cl)cc(-c2ccco2)n1 \n", - "ZVYYCMRDDCYZAU-UHFFFAOYSA-N COc1ccccc1-c1cc(C(=O)NCc2ccccn2)nc(N)n1 \n", - "ZWVWCKOJGDHDIG-UHFFFAOYSA-N N#Cc1cccc(C(=O)Nc2nc3cc(C(=O)N4CCCCC4)cnc3n2C2... \n", - "ZXCVHJXQJJLILE-UHFFFAOYSA-N COc1ccc(CCSc2cc3nc(-c4ccco4)nn3c(N)n2)cc1 \n", - "ZZBZWSYDXUPJCT-UHFFFAOYSA-N Nc1nc(CSc2nnc(N)s2)nc(Nc2ccc(F)cc2)n1 \n", - "\n", - " pchembl_value_Mean Year \\\n", - "ID \n", - "AAEYTMMNWWKSKZ-UHFFFAOYSA-N 4.82 2010.0 \n", - "AAGFKZWKWAMJNP-UHFFFAOYSA-N 5.65 2009.0 \n", - "AANUKDYJZPKTKN-UHFFFAOYSA-N 5.45 2009.0 \n", - "ABIXUHSEHFCQMV-UHFFFAOYSA-N 6.47 2009.0 \n", - "ACNFYYUXBQGWQL-UHFFFAOYSA-N 6.74 2010.0 \n", - "... ... ... 
\n", - "ZVWNHOGZGKJOCZ-UHFFFAOYSA-N 8.59 2009.0 \n", - "ZVYYCMRDDCYZAU-UHFFFAOYSA-N 7.24 2009.0 \n", - "ZWVWCKOJGDHDIG-UHFFFAOYSA-N 6.75 2010.0 \n", - "ZXCVHJXQJJLILE-UHFFFAOYSA-N 8.80 2009.0 \n", - "ZZBZWSYDXUPJCT-UHFFFAOYSA-N 4.89 2010.0 \n", - "\n", - " original_smiles \\\n", - "ID \n", - "AAEYTMMNWWKSKZ-UHFFFAOYSA-N Nc1c(C(=O)Nc2ccc([N+](=O)[O-])cc2)sc2nc3c(cc12... \n", - "AAGFKZWKWAMJNP-UHFFFAOYSA-N O=C(Nc1nc2ncccc2n2c(=O)n(-c3ccccc3)nc12)c1ccccc1 \n", - "AANUKDYJZPKTKN-UHFFFAOYSA-N CNC(=O)C12CC1C(n1cnc3c(NCc4cccc(Cl)c4)nc(C#CCC... \n", - "ABIXUHSEHFCQMV-UHFFFAOYSA-N CCCn1c(=O)c2[nH]c(-c3ccccc3)nc2n(CCCOC)c1=O \n", - "ACNFYYUXBQGWQL-UHFFFAOYSA-N O=C(Nc1nc(-c2ccccc2)nc2nn(Cc3ccccc3)cc12)c1ccccc1 \n", - "... ... \n", - "ZVWNHOGZGKJOCZ-UHFFFAOYSA-N Nc1nc(C(=O)NCc2ccccc2Cl)cc(-c2ccco2)n1 \n", - "ZVYYCMRDDCYZAU-UHFFFAOYSA-N COc1ccccc1-c1cc(C(=O)NCc2ccccn2)nc(N)n1 \n", - "ZWVWCKOJGDHDIG-UHFFFAOYSA-N N#Cc1cccc(C(=O)Nc2nc3cc(C(=O)N4CCCCC4)cnc3n2C2... \n", - "ZXCVHJXQJJLILE-UHFFFAOYSA-N COc1ccc(CCSc2cc3nc(-c4ccco4)nn3c(N)n2)cc1 \n", - "ZZBZWSYDXUPJCT-UHFFFAOYSA-N Nc1nc(CSc2nnc(N)s2)nc(Nc2ccc(F)cc2)n1 \n", - "\n", - " ID \\\n", - "ID \n", - "AAEYTMMNWWKSKZ-UHFFFAOYSA-N AAEYTMMNWWKSKZ-UHFFFAOYSA-N \n", - "AAGFKZWKWAMJNP-UHFFFAOYSA-N AAGFKZWKWAMJNP-UHFFFAOYSA-N \n", - "AANUKDYJZPKTKN-UHFFFAOYSA-N AANUKDYJZPKTKN-UHFFFAOYSA-N \n", - "ABIXUHSEHFCQMV-UHFFFAOYSA-N ABIXUHSEHFCQMV-UHFFFAOYSA-N \n", - "ACNFYYUXBQGWQL-UHFFFAOYSA-N ACNFYYUXBQGWQL-UHFFFAOYSA-N \n", - "... ... 
\n", - "ZVWNHOGZGKJOCZ-UHFFFAOYSA-N ZVWNHOGZGKJOCZ-UHFFFAOYSA-N \n", - "ZVYYCMRDDCYZAU-UHFFFAOYSA-N ZVYYCMRDDCYZAU-UHFFFAOYSA-N \n", - "ZWVWCKOJGDHDIG-UHFFFAOYSA-N ZWVWCKOJGDHDIG-UHFFFAOYSA-N \n", - "ZXCVHJXQJJLILE-UHFFFAOYSA-N ZXCVHJXQJJLILE-UHFFFAOYSA-N \n", - "ZZBZWSYDXUPJCT-UHFFFAOYSA-N ZZBZWSYDXUPJCT-UHFFFAOYSA-N \n", - "\n", - " ID_before_change \n", - "ID \n", - "AAEYTMMNWWKSKZ-UHFFFAOYSA-N AAEYTMMNWWKSKZ-UHFFFAOYSA-N \n", - "AAGFKZWKWAMJNP-UHFFFAOYSA-N AAGFKZWKWAMJNP-UHFFFAOYSA-N \n", - "AANUKDYJZPKTKN-UHFFFAOYSA-N AANUKDYJZPKTKN-UHFFFAOYSA-N \n", - "ABIXUHSEHFCQMV-UHFFFAOYSA-N ABIXUHSEHFCQMV-UHFFFAOYSA-N \n", - "ACNFYYUXBQGWQL-UHFFFAOYSA-N ACNFYYUXBQGWQL-UHFFFAOYSA-N \n", - "... ... \n", - "ZVWNHOGZGKJOCZ-UHFFFAOYSA-N ZVWNHOGZGKJOCZ-UHFFFAOYSA-N \n", - "ZVYYCMRDDCYZAU-UHFFFAOYSA-N ZVYYCMRDDCYZAU-UHFFFAOYSA-N \n", - "ZWVWCKOJGDHDIG-UHFFFAOYSA-N ZWVWCKOJGDHDIG-UHFFFAOYSA-N \n", - "ZXCVHJXQJJLILE-UHFFFAOYSA-N ZXCVHJXQJJLILE-UHFFFAOYSA-N \n", - "ZZBZWSYDXUPJCT-UHFFFAOYSA-N ZZBZWSYDXUPJCT-UHFFFAOYSA-N \n", - "\n", - "[804 rows x 6 columns]" - ], - "text/html": [ - "
" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "execution_count": 14 - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "In addition to what we already explored, `ChemStore` also adds a few more cheminformatics tools that some might appreciate. You can iterate over the storage and get the molecules as `StoredMol` objects, which have their own capabilities:" - }, - { - "metadata": { - "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.423939Z", - "start_time": "2024-08-28T08:40:39.392966Z" - } - }, - "cell_type": "code", - "source": [ - "for mol in storage:\n", - " print(mol)\n", - " print(mol.as_rd_mol())\n", - " print(mol.smiles)\n", - " print(mol.props)\n", - " print(mol.representations)\n", - " break" - ], - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "TabularMol(AACWUFIIMOHGSO-UHFFFAOYSA-N, Cc1cc(C)n(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)n1)\n", - "\n", - "Cc1cc(C)n(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)n1\n", - "{'Year': 2008.0, 'pchembl_value_Mean': 8.68, 'ID_before_change': 'RepresentationTutorialChemStore_library_0000', 'ID': 'AACWUFIIMOHGSO-UHFFFAOYSA-N', 'original_smiles': 'Cc1nn(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)c(C)c1', 'SMILES': 'Cc1cc(C)n(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)n1'}\n", - "None\n" - ] + " 5.65\n", + " 2009.0\n", + " \n", + " \n", + " 3\n", + " CNC(=O)C12CC1C(n1cnc3c1nc(C#CCCCCC(=O)OC)nc3NC...\n", + " 5.45\n", + " 2009.0\n", + " \n", + " \n", + " 4\n", + " CCCn1c(=O)c2c(nc3cc(OC)ccn32)n(CCCNC(=O)c2ccc(...\n", + " 5.20\n", + " 2019.0\n", + " \n", + " \n", + "\n", + "" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" } ], - "execution_count": 15 + "execution_count": 1 }, { - "metadata": {}, "cell_type": "markdown", + "metadata": {}, "source": [ - "Therefore, we have all the information about the molecule we can get, and we can also easily turn it into an rdkit molecule object. 
Note that the `representations` property is currently empty for the molecules, which would be populated if we had conformers, protomers, tautomers or other representations of the molecule present in the storage. This feature is not implemented yet, but will be soon (feel free to inquire about the status on the [issue tracker](https://github.com/CDDLeiden/QSPRpred/issues) or via [email](https://github.com/CDDLeiden/QSPRpred/blob/main/pyproject.toml)).\n", + "### `MoleculeTable` and `QSPRDataset`\n", "\n", - "You can also iterate over the molecules in chunks:" + "Let's take a look at the data structures you know from [the quick start](../../quick_start.ipynb) and how they are implemented. The `MoleculeTable` and `QSPRDataset` classes are specific for QSPR modelling tasks and implement a selection of interfaces for this purpose. Check out entries for `MoleculeDataSet` and `QSPRDataSet` abstract classes in the [API documentation](https://cddleiden.github.io/QSPRpred/docs/api/modules.html) to see what they offer. The main thing to remember for this tutorial, however, is that `MoleculeTable` adds the ability to add and store molecular descriptors and `QSPRDataset` is its subclass, which adds the ability to store information about target properties and modelling tasks. 
\n", + "\n", + "These two classes are actually initialized from `ChemStore` instances:" ] }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.427037Z", - "start_time": "2024-08-28T08:40:39.424486Z" + "end_time": "2024-09-03T21:46:41.391694Z", + "start_time": "2024-09-03T21:46:34.479991Z" } }, - "cell_type": "code", "source": [ - "for chunk in storage.iterChunks(size=2):\n", - " print(chunk)\n", - " break" + "from qsprpred.data.chem.identifiers import InchiIdentifier\n", + "from qsprpred.data.chem.standardizers.papyrus import PapyrusStandardizer\n", + "from qsprpred.data.storage.tabular.basic_storage import TabularStorageBasic\n", + "import os\n", + "\n", + "storage = TabularStorageBasic(\n", + " name=\"RepresentationTutorialChemStore\",\n", + " path=\"../../tutorial_output/data\",\n", + " df=df,\n", + " smiles_col=\"SMILES\",\n", + " standardizer=PapyrusStandardizer(), # standardizes the SMILES strings\n", + " identifier=InchiIdentifier(), # generates custom identifiers\n", + " n_jobs=os.cpu_count() # use all available CPUs\n", + ")\n", + "storage" ], "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "[, ]\n" - ] + "data": { + "text/plain": [ + "TabularStorageBasic (4082)" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" } ], - "execution_count": 16 - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "This can be useful when processing large data sets one chunk at a time and with a smart implementation of `ChemStore.iterChunks` the data set does not have to be loaded into memory all at once. The chunks can also be consumed in parallel, which can speed up processing even further (see [this advanced tutorial to learn more](../../advanced/data/parallelization.ipynb))." 
- }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "### `MoleculeTable` and `QSPRDataset`\n", - "\n", - "Now that we know a bit about how QSPRpred stores molecules, we can take a look at the data structures you know from [the quick start](../../quick_start.ipynb) and how they are implemented. The `MoleculeTable` and `QSPRDataset` classes are specific for QSPR modelling tasks and implement a selection of interfaces for this purpose. Check out entries for `MoleculeDataSet` and `QSPRDataSet` abstract classes in the [API documentation](https://cddleiden.github.io/QSPRpred/docs/api/modules.html) to see what they offer. The main thing to remember for this tutorial, however, is that `MoleculeTable` adds the ability to add and store molecular descriptors and `QSPRDataset` is its subclass, which adds the ability to store information about target properties and modelling tasks. We can initialize them from `ChemStore` instances quite easily:" - ] + "execution_count": 2 }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.433291Z", - "start_time": "2024-08-28T08:40:39.427775Z" + "end_time": "2024-09-03T21:46:41.399262Z", + "start_time": "2024-09-03T21:46:41.392850Z" } }, - "cell_type": "code", "source": [ "from qsprpred.data import MoleculeTable\n", "\n", @@ -1729,26 +400,28 @@ "" ] }, - "execution_count": 17, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 17 + "execution_count": 3 }, { - "metadata": {}, "cell_type": "markdown", - "source": "Again, since this is also a `PropertyStorage` object, you can use all the methods and attributes of the `PropertyStorage` API on it and it also exposes a lot of the underlying storage methods and functionality as well: " + "metadata": {}, + "source": [ + "`ChemStore` is basically a wrapper around a database or folder structure containing molecules and it supports some operations on the molecules themselves and their properties. 
You can see more in [this advanced tutorial](../../advanced/data/data_representation.ipynb). Thanks to that we can perform various operations on the molecule table we just created: " + ] }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.436105Z", - "start_time": "2024-08-28T08:40:39.433768Z" + "end_time": "2024-09-03T21:46:41.409649Z", + "start_time": "2024-09-03T21:46:41.400175Z" } }, - "cell_type": "code", "source": [ "for mol in mt:\n", " print(mol)\n", @@ -1764,29 +437,33 @@ "output_type": "stream", "text": [ "TabularMol(AACWUFIIMOHGSO-UHFFFAOYSA-N, Cc1cc(C)n(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)n1)\n", - "\n", + "\n", "Cc1cc(C)n(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)n1\n", - "{'Year': 2008.0, 'pchembl_value_Mean': 8.68, 'ID_before_change': 'RepresentationTutorialChemStore_library_0000', 'ID': 'AACWUFIIMOHGSO-UHFFFAOYSA-N', 'original_smiles': 'Cc1nn(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)c(C)c1', 'SMILES': 'Cc1cc(C)n(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)n1'}\n", + "{'pchembl_value_Mean': 8.68, 'ID_before_change': 'RepresentationTutorialChemStore_library_0000', 'ID': 'AACWUFIIMOHGSO-UHFFFAOYSA-N', 'SMILES': 'Cc1cc(C)n(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)n1', 'Year': 2008.0, 'original_smiles': 'Cc1nn(-c2cc(NC(=O)CCN(C)C)nc(-c3ccc(C)o3)n2)c(C)c1'}\n", "None\n" ] } ], - "execution_count": 18 + "execution_count": 4 }, { - "metadata": {}, "cell_type": "markdown", - "source": "Note that `ChemStore` objects are also subscriptable, which is also true for `MoleculeTable` objects:" + "metadata": {}, + "source": [ + "Note that `ChemStore` objects are also subscriptable, which is also true for `MoleculeTable` objects:" + ] }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.442211Z", - "start_time": "2024-08-28T08:40:39.436573Z" + "end_time": "2024-09-03T21:46:41.420382Z", + "start_time": "2024-09-03T21:46:41.410173Z" } }, - "cell_type": "code", - "source": 
"mt['AACWUFIIMOHGSO-UHFFFAOYSA-N'].props", + "source": [ + "mt['AACWUFIIMOHGSO-UHFFFAOYSA-N'].props" + ], "outputs": [ { "data": { @@ -1799,26 +476,28 @@ " 'ID_before_change': 'RepresentationTutorialChemStore_library_0000'}" ] }, - "execution_count": 19, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 19 + "execution_count": 5 }, { - "metadata": {}, "cell_type": "markdown", - "source": "`QSPRDataset` is a subclass of `MoleculeTable`, which requires target properties to be defined in addition to the underlying `ChemStore` object:" + "metadata": {}, + "source": [ + "`QSPRDataset` is a subclass of `MoleculeTable`, which requires target properties to be defined in addition to the underlying `ChemStore` object:" + ] }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.456617Z", - "start_time": "2024-08-28T08:40:39.442801Z" + "end_time": "2024-09-03T21:46:41.430901Z", + "start_time": "2024-09-03T21:46:41.421007Z" } }, - "cell_type": "code", "source": [ "from qsprpred import TargetTasks, TargetProperty\n", "\n", @@ -1839,26 +518,28 @@ "[TargetProperty(name=pchembl_value_Mean, task=REGRESSION)]" ] }, - "execution_count": 20, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 20 + "execution_count": 6 }, { - "metadata": {}, "cell_type": "markdown", - "source": "But you can also create it by converting from a `MoleculeTable` object:" + "metadata": {}, + "source": [ + "But you can also create it by converting from a `MoleculeTable` object:" + ] }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.473618Z", - "start_time": "2024-08-28T08:40:39.457272Z" + "end_time": "2024-09-03T21:46:41.444157Z", + "start_time": "2024-09-03T21:46:41.431412Z" } }, - "cell_type": "code", "source": [ "dataset = QSPRDataset.fromMolTable(mt, target_props=[\n", " TargetProperty(\"pchembl_value_Mean\", 
TargetTasks.REGRESSION)\n", @@ -1872,26 +553,28 @@ "[TargetProperty(name=pchembl_value_Mean, task=REGRESSION)]" ] }, - "execution_count": 21, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 21 + "execution_count": 7 }, { - "metadata": {}, "cell_type": "markdown", - "source": "But you can also go directly from a data frame, which will create the underlying `ChemStore` object for you:" + "metadata": {}, + "source": [ + "And you can also go directly from a data frame, which will create the underlying `ChemStore` object for you:" + ] }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.491558Z", - "start_time": "2024-08-28T08:40:39.474186Z" + "end_time": "2024-09-03T21:46:41.466848Z", + "start_time": "2024-09-03T21:46:41.444752Z" } }, - "cell_type": "code", "source": [ "dataset = QSPRDataset.fromDF(\n", " name=\"RepresentationTutorialDataset\",\n", @@ -1909,35 +592,37 @@ "[TargetProperty(name=pchembl_value_Mean, task=REGRESSION)]" ] }, - "execution_count": 22, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 22 + "execution_count": 8 }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.494506Z", - "start_time": "2024-08-28T08:40:39.492159Z" + "end_time": "2024-09-03T21:46:41.470035Z", + "start_time": "2024-09-03T21:46:41.467424Z" } }, - "cell_type": "code", - "source": "dataset.storage", + "source": [ + "dataset.storage" + ], "outputs": [ { "data": { "text/plain": [ - "TabularStorageBasic (3286)" + "TabularStorageBasic (4082)" ] }, - "execution_count": 23, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 23 + "execution_count": 9 }, { "cell_type": "markdown", @@ -1952,35 +637,38 @@ "cell_type": "code", "metadata": { "collapsed": false, - "execution": { - "iopub.execute_input": "2023-09-21T15:41:13.669645Z", - "iopub.status.busy": 
"2023-09-21T15:41:13.669383Z", - "iopub.status.idle": "2023-09-21T15:41:13.682232Z", - "shell.execute_reply": "2023-09-21T15:41:13.681377Z" + "jupyter": { + "outputs_hidden": false }, "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.515357Z", - "start_time": "2024-08-28T08:40:39.495334Z" + "end_time": "2024-09-03T21:46:41.493136Z", + "start_time": "2024-09-03T21:46:41.471883Z" } }, - "source": "dataset.save()", + "source": [ + "dataset.save()" + ], "outputs": [], - "execution_count": 24 + "execution_count": 10 }, { - "metadata": {}, "cell_type": "markdown", - "source": "This will save the data set into a folder we specified upon creation:" + "metadata": {}, + "source": [ + "This will save the data set into a folder we specified upon creation:" + ] }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.518484Z", - "start_time": "2024-08-28T08:40:39.516166Z" + "end_time": "2024-09-03T21:46:41.496060Z", + "start_time": "2024-09-03T21:46:41.493773Z" } }, - "cell_type": "code", - "source": "dataset.path", + "source": [ + "dataset.path" + ], "outputs": [ { "data": { @@ -1988,27 +676,31 @@ "'/home/sichom/projects/QSPRpred/tutorials/tutorial_output/data/RepresentationTutorialDataset'" ] }, - "execution_count": 25, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 25 + "execution_count": 11 }, { - "metadata": {}, "cell_type": "markdown", - "source": "It will also update or save the underlying `ChemStore` object, which also lives in the same folder:" + "metadata": {}, + "source": [ + "It will also update or save the underlying `ChemStore` object, which also lives in the same folder:" + ] }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.521141Z", - "start_time": "2024-08-28T08:40:39.519076Z" + "end_time": "2024-09-03T21:46:41.499076Z", + "start_time": "2024-09-03T21:46:41.496772Z" } }, - "cell_type": "code", - "source": 
"dataset.storage.path", + "source": [ + "dataset.storage.path" + ], "outputs": [ { "data": { @@ -2016,16 +708,16 @@ "'/home/sichom/projects/QSPRpred/tutorials/tutorial_output/data/RepresentationTutorialDataset_storage'" ] }, - "execution_count": 26, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 26 + "execution_count": 12 }, { - "metadata": {}, "cell_type": "markdown", + "metadata": {}, "source": [ "Therefore, storages and data sets can live in different folders and can be shared between projects. That means you can use the same storage for both your QSPR modelling and your docking project, for example. Both projects will have access to all data in your storage even if it changes over time, which can be useful for data management. \n", "\n", @@ -2036,15 +728,12 @@ "cell_type": "code", "metadata": { "collapsed": false, - "execution": { - "iopub.execute_input": "2023-09-21T15:41:13.685103Z", - "iopub.status.busy": "2023-09-21T15:41:13.684663Z", - "iopub.status.idle": "2023-09-21T15:41:13.718541Z", - "shell.execute_reply": "2023-09-21T15:41:13.717681Z" + "jupyter": { + "outputs_hidden": false }, "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.538396Z", - "start_time": "2024-08-28T08:40:39.521794Z" + "end_time": "2024-09-03T21:46:41.519209Z", + "start_time": "2024-09-03T21:46:41.499694Z" } }, "source": [ @@ -2060,12 +749,12 @@ "[TargetProperty(name=pchembl_value_Mean, task=REGRESSION)]" ] }, - "execution_count": 27, + "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 27 + "execution_count": 13 }, { "cell_type": "markdown", @@ -2077,41 +766,40 @@ ] }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.542525Z", - "start_time": "2024-08-28T08:40:39.540263Z" + "end_time": "2024-09-03T21:46:41.521928Z", + "start_time": "2024-09-03T21:46:41.519845Z" } }, - "cell_type": "code", - "source": "len(dataset) # original length", + "source": 
[ + "len(dataset) # original length" + ], "outputs": [ { "data": { "text/plain": [ - "3286" + "4082" ] }, - "execution_count": 28, + "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 28 + "execution_count": 14 }, { "cell_type": "code", "metadata": { "collapsed": false, - "execution": { - "iopub.execute_input": "2023-09-21T15:41:16.422240Z", - "iopub.status.busy": "2023-09-21T15:41:16.421937Z", - "iopub.status.idle": "2023-09-21T15:41:27.677052Z", - "shell.execute_reply": "2023-09-21T15:41:27.675892Z" + "jupyter": { + "outputs_hidden": false }, "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.554271Z", - "start_time": "2024-08-28T08:40:39.543135Z" + "end_time": "2024-09-03T21:46:41.717606Z", + "start_time": "2024-09-03T21:46:41.522502Z" } }, "source": [ @@ -2156,33 +844,834 @@ "len(dataset) # reduced length" ], "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "qsprpred - WARNING - Molecule refused by standardizer: COCCn1c(=O)c2c(nc(Cc3c(F)cccc3F)n2C)n(Cc2ccco2)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)OCC1OC(n2cnc3c(NCc4cccc(I)c4)nc(Cl)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)C3CCC(NS(=O)(=O)c4cccc(C(F)(F)F)c4)CC3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)n1c(Br)nc2c(N)ncnc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C12CC1C(n1cnc3c(NCc4cccc(I)c4)nc(Cl)nc31)C(O)C2O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(Oc1ccc(-c2cc3c([nH]2)c(=O)n(C)c(=O)n3C)cc1)C(=O)Nc1ccc(Br)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(NOC)nc(C#Cc4ccc(F)cc4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1COC(n2cnc3c(NCc4cccc(Br)c4)ncnc32)C1O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C1OC(n2cnc3c(NC)nc(C#Cc4ccc(F)cc4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2nn(Cc3ccc(C4(C(F)(F)F)N=N4)cc3)cc2c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCOC(=O)c1nc(NC(=O)c2ccc(F)cc2)nc2nn(CC(C)(C)C)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1cc2c(nc(NC(=O)Nc3ccc(C(F)(F)F)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: S=c1sc2c(ncn3nc(-c4ccco4)nc23)n1-c1ccccc1I. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(Cc5ccc(F)cc5)CC4)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-n2cccn2)nc(-n2cccn2)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(Br)nc2c(N)ncnc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccc(C(F)F)o2)nc2sc(Cc3ccccc3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Fc1ccc(CNc2nc(NCc3ccc(F)cc3)n3nc(-c4ccco4)nc3n2)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2ccccc2n2c(=O)c(-c3ccc(F)cc3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CN(CCN1CCN(c2ccc(F)cc2F)CC1)c1cc2nc(-c3ccco3)nn2c(N)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(COc1ccc(Nc2nc(OCCc3c[nH]c4cc(Br)ccc34)nc3c2ncn3C2OC(CO)C(O)C2O)cc1)Nc1ccccc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(Br)cc4)cc3)cc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(NC(C)=O)c2sc(NC(=O)c3ccc(F)cc3)nc12. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-n2cccn2)nc(N2CCC2)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCCn1c(NC(=O)c2cccc(F)c2)nc2cc(C(=O)NCc3cccc(OC)c3)ccc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)N3CCC(F)CC3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: N#Cc1c(-c2ccc(F)cc2)cc(-c2ccccc2)nc1N. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)O)cc3)c(Br)c2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1cc2c(nc(NC(=O)Nc3ccc(C(F)(F)F)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1nc(Cl)c2cn(CCc3ccccc3)nc2n1)c1ccc(F)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cc(OCC(=O)c4ccc(F)cc4)nn3C)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccc2c(-c3cnc4ccc(F)cn34)noc2c1)N1CCCCCC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCOc1ccc(N2CCN(CCn3cnc4c3nc(N)n3nc(-c5ccco5)nc43)CC2)c(F)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCn1cc2c(nc(NC(=O)Nc3ccc(Br)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1COC(n2cnc3c(NCc4cccc(Br)c4)nc(Cl)nc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2nnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1ncc(C(=O)c2sc(Nc3ccccc3)nc2-c2ccccc2)n1-c1ccc(F)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)C(C)(C)Oc3c(F)cccc3F)nc12. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2ccnc3ccccc23)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1ccc(-c2nc(N)c3nn(Cc4ccccc4F)cc3n2)o1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2cccc(F)c2)nc2sc(Cc3ccccc3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CSCCn1c(=O)c2[nH]c(Cc3c(F)cccc3F)nc2n(Cc2ccco2)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2cccs2)c2cnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1nc(C(F)(F)F)c(CN2CCN(c3nc(N)n4nc(-c5ccco5)nc4n3)CC2)c1Cl. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(Cc5cccc(F)c5)CC4)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(NCc2cccc(F)c2)nc2sc(-c3ccco3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4ccccc4C(F)(F)F)c3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(O)c1cnc(NCc2ccc(C(F)(F)F)cc2)n2nc(-c3ccco3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCC#Cc1nc(NCc2cccc(I)c2)c2ncn(C3SCC(O)C3O)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(C(F)(F)F)c4)c3)nc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(C(F)(F)F)c4)c3)nc2n(C)c1=O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(NC(=O)Nc2nc3nn(CCc4c(Br)cc(Br)cc4Br)cc3c3nc(-c4ccco4)nn23)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCn1c(N=Cc2ccc(F)cc2)c(C#N)sc1=S. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(cnn2CCN2CCc3ccc(Br)cc3CC2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC4CC(=O)N(c5ccc(C(F)(F)F)cc5)C4)nc3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1cc2c(nc(NC(=O)Nc3ccc(Br)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(Oc1ccccc1F)C(=O)Nc1n[nH]c(-c2ccccc2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC(c1ccccc1)c1cc(C(F)(F)F)nc2ccccc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)c4ccc(I)cc4)cn3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CN1CCc2sc3nc(C(=O)Nc4ccc(F)cc4)c(N)cc3c2C1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)c3cnn(Cc4cccc(C(F)(F)F)c4)c3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(F)cc4)cc3)c(Cl)c2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCOc1ccc(N2CCN(CCn3cnc4c3nc(N)n3nc(-c5ccco5)nc43)CC2)c(C(F)(F)F)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C1OC(n2cnc3c(NOC)nc(I)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)N3CCC(C(O)CF)CC3)nc12. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3ccccc3F)c3cnc(NC4CC4)nc32)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3c2OCCO3)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCOc1c(F)cc(N2CCN(CCn3c(=O)n(C)c4c3nc(N)n3nc(-c5ccco5)nc43)CC2)cc1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1cccs1)c1nc(C(F)(F)F)nc2ccsc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(NCc3ccc(F)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3c(F)cccc3F)c3cnc(NCc4ccncc4)nc32)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCOc1ccc(N2CCN(CCn3c(=O)n(C)c4c3nc(N)n3nc(-c5nccs5)nc43)CC2)cc1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-n2cccn2)nc(NC2CCCC2)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(=O)Nc1cc(-c2cc(O)cc(F)c2)nc(-n2nc(C)cc2C)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2c(nc3n2CCN3Cc2cccc(Br)c2)n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C1[Se]C(n2cnc3c(NCc4cccc(F)c4)ncnc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)C(C)(C)Oc3ccc(C(F)(F)F)cc3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCn1c(N=Cc2ccc(Br)cc2)c(C#N)sc1=S. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1nnc(C2OC(n3cnc4c(NCc5cccc(F)c5)nc(Cl)nc43)C(O)C2O)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)CN(C(C)=O)c1ccc(OC)c2nc(NC(=O)c3ccc(F)cc3)sc12. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1COC(n2cnc3c(NCc4cccc(I)c4)ncnc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1nc(Cl)c2cn(CCc3ccccc3)nc2n1)c1ccc(C(F)(F)F)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Fc1cccc(F)c1Nc1nc2c(N3CCCC3)ncnc2s1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ncc[nH]2)c2cnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC(c1ccccc1)c1nc(C(F)(F)F)nc2ccsc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1csc(-c2nc(N)c3cc(Cc4ccccc4F)sc3n2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3c2OCCN3)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2c(nc3n2CCN(CCCc2cccc(Br)c2)C3)n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)n(CC(C)C)c(=O)c2[nH]c(-c3cnn(Cc4cccc(C(F)(F)F)c4)c3)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2ccc(F)cc2)cc(-c2ccco2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCOC(=O)c1nc(NC(=O)c2ccc(F)cc2)nc2nn(C)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cnc(N2CCOCC2)c2sc(NC(=O)C(C)(C)Oc3c(F)cccc3F)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: C#CCn1c(=O)c2c(nc3n2CCCN3c2ccc(F)cc2)n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc2c1nc(N)n1nc(CN3CCN(c4ccccc4F)CC3C)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3c(F)cccc3F)c3cnc(NC4CC4)nc32)c1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(COc1ccccc1)Nc1n[nH]c(-c2ccc(F)cc2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(OCCc2c[nH]c3c(Br)cccc23)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)c2c(nc(Cc3c(F)cccc3F)n2C)n(Cc2cccs2)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(NCCc2ccc(Br)cc2)nc2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cnc(N2CCOCC2)c2sc(NC(=O)C(C)(C)Oc3cccc(C(F)(F)F)c3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(NCCCCNC(=O)c2ccc(S(=O)(=O)F)cc2)nc2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-n2cccn2)nc(NC2CC2)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)nc2sc(CN3CCC(F)CC3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccc(C(F)F)o2)nc2sc(CN3CCCC(F)(F)C3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1C(CCl)OC(n2cnc3c(Nc4ccc(Cl)cc4F)ncnc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc2c(C(=O)NCc3cccc(F)n3)nc(N)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCN(CCc1ccc(F)cc1)c1cc2nc(-c3ccco3)nn2c(N)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)C3CCN(S(=O)(=O)c4cccc(C(F)(F)F)c4)CC3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(=O)Nc1cc(-c2cc(F)cc(F)c2)nc(-n2nc(C)cc2C)n1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCN1C(=O)c2[nH]nc(-c3ccccc3O)c2C1c1ccc(F)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccsc1)c1nc(C(F)(F)F)nc2ccsc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccncc1)c1nc(C(F)(F)F)nc2ccsc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(Cc5cccc(C(F)(F)F)c5)CC4)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(N2CCN3CC(COc4cccc(F)c4)CCC3C2)nc2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc2n(CC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2cc3ccccc3o2)nc2sc(CN3CCC(F)CC3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2ccccc2)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cc(OCC(=O)Nc2ccc(Br)cc2)ccc1-c1cc2c([nH]1)c(=O)n(C)c(=O)n2C. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(O)C1CCC(OC2CCN(c3ccc(-c4nc5cc(F)ccc5[nH]4)cn3)CC2)CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)N3CCC(CF)CC3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(cnn2CCc2ccc(-c3ccc(OC(F)(F)F)cc3)cc2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3c2C(=O)NC3)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OCC1OC(n2cnc3c(NCc4cccc(I)c4)nc(-n4cc(-c5nccc6ccccc56)cn4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2ncn(Cc3ccc(C(F)(F)F)cc3F)c2n1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC1(O)CCN(C(=O)Nc2nc3c(OCCF)ccc(N4CCOCC4)c3s2)CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(N)nc(N4CCN(c5ccc(OCc6ccc(C(F)(F)F)cc6)cc5)CC4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCn1c(Br)nc2c(N)ncnc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1C(O)C(n2cnc3c(NCc4cccc(I)c4)ncnc32)C2CC12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(NOC)nc(I)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)N4CCN(c5ccc(C(F)(F)F)cc5)CC4)cc3)cc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCNc1nc(C#Cc2ccc(F)s2)nc2c1ncn2C1C(O)C(O)C2(C(=O)NC)CC12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1nccs1)c1nc(C(F)(F)F)nc2ccsc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(=O)Nc1cc(-c2ccccc2F)nc(-c2ccccc2F)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(Br)cc4)cc3)cc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)nc2sc(CN3CCC(F)(F)CC3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(NCc3ccc(C(F)(F)F)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(CC(=O)N4CCN(c5cccc(C(F)(F)F)c5)CC4)c3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3c(F)cccc3F)c3cnc(NCC4CC4)nc32)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1ncnc2[nH]c(Br)nc12. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-n2cccn2)nc(OCC(F)(F)F)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)C3CCC(NC(=O)c4cccc(C(F)(F)F)c4)CC3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3cccnc23)c2cc(F)ccc2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(Br)nc2c(N)ncnc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)n(CC(C)C)c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)n(C)c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(F)cc4)cc3)cc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(OCCc2c[nH]c3cccc(Br)c23)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(C)[nH]c(=NC(=N)NCCc2c[nH]c3ccc(F)cc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(C)n(CC2OC(n3cnc4c(Nc5ccc(Cl)cc5F)ncnc43)C(O)C2O)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNc1ncc2c(n1)n(-c1cccc(OC)c1)c(=O)n2Cc1c(F)cccc1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(c5ccc(I)cc5)CC4)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2c(nc3n2CCCN3Cc2cccc(Br)c2)n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1cnc(-c2ccncc2F)c(-c2ncco2)n1)C1CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(N)nc(N4CCN(c5ccc(OCc6ccc(F)cc6)cc5)CC4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C1NCCC12OC(n1cnc3c(NCc4cccc(I)c4)ncnc31)C(O)C2O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: N=C1C(=O)c2ccccc2C(=O)C1Nc1cccc(C(F)(F)F)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(Cl)c4F)c3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(-c2nnc(N)nc2-c2ccc(F)cc2)cc(C(F)(F)F)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(Cc5cccc(F)c5)CC4)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2nn(CCc3ccccc3)cc2c2nc(-c3ccc(F)cc3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: NNc1nc(OCCc2c[nH]c3cc(Br)ccc23)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CONc1nc(I)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OCC1CCCN1c1ncnc2sc(Nc3c(F)cccc3F)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(Br)cc4)cc3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(cnn2CCN2CCN(c3ccccc3F)CC2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2cc[nH]n2)c2nnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(N=Cc2ccc(F)cc2)c(C#N)sc1=S. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2nccc3ccccc23)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1cc2c(nc(NC(=O)Nc3ccc(Br)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1C(O)C(n2cnc3c(NCc4cccc(I)c4)nc(Cl)nc32)C2CC12. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1cnc(-c2ccncc2)c(-c2ccccc2F)n1)C1CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(Br)nc2c(N)nc(NCCCc3ccccc3)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3cccnc23)c2cccc(OC(F)(F)F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1cc(C(=O)N2CCCCCC2)ccc1C(=O)c1cnc2ccc(F)cn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)OCc1cccc(CNC(=O)c2nc(N)nc3c(F)cccc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C1SC(n2cnc3c(NCc4cccc(I)c4)ncnc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(CC4CC(=O)N(c5ccc(F)cc5)C4)c3)nc2n(CC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCN1C(=O)N2CCN=C2c2[nH]c(-c3ccc(Br)cc3)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNc1ncc(C(=O)NCc2ccc(C(F)(F)F)cc2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNc1nc(-n2cccn2)nc(N)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCC(C)n1c(Br)nc2c(N)ncnc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: NC1C(CO)OC(n2cnc3c(NCc4cccc(I)c4)ncnc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2ccn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2[nH]c(-c3cccc(OCC(=O)Nc4ccc(Br)cc4)c3)cc2n(C)c1=O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(Br)nc2c(NCCCc3ccccc3)ncnc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)c3ccc(F)cc3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2nnn(Cc3cc(F)ccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3c(F)cccc3F)c3cnc(NC4CCCC4)nc32)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4ccc(C(F)(F)F)cc4)c3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1nnc(C2OC(n3cnc4c(Nc5ccc(Cl)cc5F)nc(Cl)nc43)C(O)C2O)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1CSC(n2cnc3c(NCc4cccc(Br)c4)nc(Cl)nc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(N)n3nc(-c4ccc(F)cc4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCC(NC(=O)C1CCC2c3[nH]c4ccccc4c3CCN2C1)C(=O)NC(Cc1ccc(F)cc1)C(=O)N1CC(N)CC1C(N)=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3c(F)cccc3F)c3cnc(NCCN4CCOCC4)nc32)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(NC(=O)Nc3ccccc3)n3nc(-c4ccc(C(F)(F)F)cc4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(C)n(-c2nc(NCC(F)(F)F)cc(-n3cccn3)n2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCN(CCn3c(=O)n(C)c4c3nc(N)n3nc(-c5ccco5)nc43)CC2)c(F)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1ncnc2c1nc(Br)n2CCO. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2nccs2)nc(-c2nccs2)c1Br. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2nnn(Cc3ccc(N)c(F)c3)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2nnn(Cc3cccc(F)c3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1ccc(-n2nc(C)c3cnc(-c4ccc(C5CC5C(=O)O)cc4F)cc32)nc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C1OC(n2cnc3c(NCc4cccc(I)c4)ncnc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1cnc(-c2ccncc2F)c(-c2ccncc2)n1)C1CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OC(C)(C)C(=O)Nc4ccc(F)cc4)cc3)cc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccccc1)c1sc(NC2CCN(Cc3ccccc3F)CC2)nc1-c1ccco1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3c(C(=O)N4CCN(c5ccccc5)CC4)ccnc23)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccc(F)c(F)c2)nc2sc(CN3CCOCC3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cccc(CNC(=O)c2nc(N)nc3c(F)cccc23)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C12CC1C(n1cnc3c(NCc4cccc(Br)c4)nc(Cl)nc31)C(O)C2O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1nc2cc(N3CCN(CCn4c(=O)n(C)c5c4nc(N)n4nc(-c6ccco6)nc54)CC3)c(F)cc2o1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(OCCc2ccccc2F)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(NC(=O)Nc3ccccc3)n3nc(-c4ccc(Br)cc4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1cc(-c2ccc(F)cc2)c2oc(-c3ccco3)nc2c1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCC(NC(=O)CCCCCNC(=O)C1CCC2c3[nH]c4ccccc4c3CCN2C1)C(=O)NC(Cc1ccc(F)cc1)C(=O)N1CC(N)CC1C(N)=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCN(CCN1CCN(c2ccc(F)cc2F)CC1)c1cc2nc(-c3ccco3)nn2c(N)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(Br)nc2c(N)ncnc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(NC(=O)Nc3ccc(Br)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(C(F)(F)F)c4)c3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1CSC(n2cnc3c(NCc4cccc(I)c4)ncnc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(Br)cc4)cc3)c(O)c2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CONc1nc(C#Cc2ccc(F)cc2)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(O)c1cnc(NCc2ccc(F)cc2)n2nc(-c3ccco3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4ccccc4F)c3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(Cc5cc(F)cc(OC)c5)CC4)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1ncc(-c2cccc(C(F)(F)F)c2)c(-c2ccccc2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(NCCCCCNC(=O)c2ccc(S(=O)(=O)F)cc2)nc2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(N=Cc2ccc(Br)cc2)c(C#N)sc1=S. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OC(C)C(=O)Nc4ccc(F)cc4)cc3)cc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(N)nc(NCCN4CCN(c5ccc(I)cc5)CC4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCc1nc2[nH]cnc2c2nc(-c3ccc(C(F)(F)F)cc3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1ccccc1CNC(=O)c1nc(N)nc2c(F)cccc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CN(CCN1CCN(c2ccc(F)cc2F)CC1)c1cc2nc(-c3cnco3)nn2c(N)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C1OC(n2cnc3c(NCc4cccc(I)c4)nc(Cl)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C12CC1C(n1cnc3c(NCc4cc(I)ccc4OC)nc(Cl)nc31)C(O)C2O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2cnn(Cc3ccc(C(F)(F)F)cc3)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C1SC(n2cnc3c(NCc4cccc(I)c4)nc(Cl)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2c(cc(C=Cc3ccc(F)cc3)n2C)n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OCC12CC1C(n1cnc3c(NCc4cccc(I)c4)nc(Cl)nc31)C(O)C2O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-c2nc(N)c3cc(CN4CCCC(F)(F)C4)sc3n2)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1ncnc2c1nc(Br)n2C1CCCCC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: N#Cc1c(-c2cccc(OC(F)(F)F)c2)cc(-c2ccccc2)nc1N. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cnc(N2CCOCC2)c2sc(NC(=O)c3ccc(F)cc3)nc12. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-c2cn3c(nc4cc(F)c(F)cc43)c(N)n2)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)(NC(=O)COc1ccc2[nH]c(=O)c(-c3nccs3)c(CCC(F)(F)F)c2c1)c1ccccc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCOc1ccnc2c(CNC(=O)c3nc(N)nc4c(F)cccc34)cccc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1COC(n2cnc3c(NCc4cccc(I)c4)nc(Cl)nc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2ncn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1nnc(C2OC(n3cnc4c(N)nc(NC(C)Cc5ccc(Br)cc5)nc43)C(O)C2O)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2nccs2)c2nnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(COC(=O)N4CCN(c5ccc(F)cc5)CC4)cc3)cc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cnc(N2CCOCC2)c2sc(NC(=O)C(C)(C)Oc3ccc(F)c(F)c3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2cnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccccc2)c2c(n1)-c1cc(Cc3ccncc3F)ccc1C2=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)Cn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc2n(CC(C)C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCC(=O)Nc1nc(-c2ccccc2F)cc(-c2ccccc2F)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2cccc([N+](=O)[O-])c2)nc2sc(CN3CCC(F)CC3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccc(F)cc1)c1c[nH]c(C(=O)NCCCn2ccnc2)c1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3cccnc23)c2ccc(F)cc2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OCC1OC(n2cnc3c(NC4CC4c4cccc(C(F)(F)F)c4)ncnc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-n2cccn2)nc(N2CCCCC2)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(N)nc(NCCN4CCN(c5ccc(Cl)cc5F)CC4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(NCC)nc(I)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N(CC(N)=O)C(C)=O)c2sc(NC(=O)c3ccc(F)cc3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccco1)c1nc(C(F)(F)F)nc2ccsc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(c5ccccc5Br)CC4)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)N3CCC(F)(CO)CC3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(NC(=O)Nc2nc3nn(C)cc3c3nc(-c4ccc(Br)cc4)nn23)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2cccc(F)c2)nc2sc(CN3CCOCC3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C1[Se]C(n2cnc3c(NCc4cccc(I)c4)nc(Cl)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2nn(CCc3c(Br)cc(Br)cc3Br)cc2c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(cnn2CCN2CCN(c3c(F)cc(F)cc3F)CC2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2ncn(Cc3ccc(C(F)(F)F)c(F)c3)c2n1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1C(O)C(n2cnc3c(NCc4cccc(Br)c4)nc(Cl)nc32)C2CC12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)c1ccc2c(c1)-c1nc(N)nc(-c3ccc(Br)o3)c1C2. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCNc1ncc2c(n1)n(-c1cccc(OC)c1)c(=O)n2Cc1c(F)cccc1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cc(OCC4CC(=O)N(c5ccc(C(F)(F)F)cc5)C4)no3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(cnn2CCN2CCN(c3ccc(F)cc3F)CC2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(cnn2CCN2CCN(c3ccc(F)cc3)CC2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CN(C)c1nc(NC(=O)c2ccccc2)sc1C(=O)c1cnc(N)n1-c1ccc(C(F)(F)F)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2cnc3ccc(F)cn23)c2ccc(C(=O)N3CCCCCC3)cc2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(ncn2Cc2ccccc2C(F)(F)F)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(N)n3nc(-c4ccc(C(F)(F)F)cc4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCC(NC(=O)CCCCCNC(=O)C1CCC2c3[nH]c4ccccc4c3CCN2C1)C(=O)NC(Cc1ccc(F)cc1)C(=O)NC(CCCCN)C(N)=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)(C)CC(=O)Nc1c(F)cc(C(=O)Nc2nccs2)cc1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2ccccn2)cc(-c2cccc(F)c2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(Cl)c2nc(NC(=O)c3ccc(F)cc3)sc12. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: C#CCN(CCCCCNc1nc(N)n2nc(-c3ccco3)nc2n1)C(=O)c1ccc(S(=O)(=O)F)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCOc1ccc(N2CCN(CCn3cnc4c3nc(N)n3nc(-c5ccccc5C(F)(F)F)nc43)CC2)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)N4CCN(c5ccc(C(F)(F)F)cc5)CC4)cc3)cc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(Br)cc4)cc3)c(Cl)c2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(=O)c1snc(Nc2cc(Cl)ccc2F)c1N. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)N3CCCC(F)C3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1nccnc1-c1nc2cc(N(C)CCN3CCN(c4ccc(F)cc4F)CC3)nc(N)n2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(=O)Nc1cc(-c2cc(F)cc(OC3CCN(C)C3)c2)nc(-n2nc(C)cc2C)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2nnn(Cc3c(F)cccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1cccnc1)c1nc(C(F)(F)F)nc2ccsc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(C(F)(F)F)c4)c3)nc2n(CC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1ccc(C(=O)c2nc(C(F)(F)F)nc3ccsc23)o1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCOC(=O)c1nc(NC(=O)c2ccc(F)cc2)nc2nn(CCc3ccccc3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(NC(=O)Nc3ccc(F)cc3)n3nc(-c4ccc(Br)cc4)nc23)n1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)n(CCN2CCN(c3ccc(C(F)(F)F)cc3)CC2)c2nc(N)n3nc(-c4ccco4)nc3c21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2c(nc3n2CCCN3Cc2c(F)cccc2Cl)n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N(C(C)=O)C(F)F)c2sc(NC(=O)c3ccc(F)cc3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)nc2sc(CN3CCCC(F)(F)C3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1ccnc1-c1nc(N)nc2c1cnn2Cc1ccccc1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3cc[nH]c23)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2ccc3ccccc3n2)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(Oc1ccc(-c2cc3c([nH]2)c(=O)n(C)c(=O)n3C)cc1)C(=O)N1CCN(c2ccc(Br)cc2)CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cc(CCNc2nc(C#Cc3ccccc3F)nc3c2ncn3C2C(O)C(O)C3CC32)ccc1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CN(C)C(=O)c1cccc(-c2nc(N)c3cc(CN4CCC(F)CC4)sc3n2)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cc(OCC(=O)c4ccc(F)cc4)n(C)n3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1C(CF)OC(n2cnc3c(NC4CCOCC4)ncnc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCC(=O)Nc1cc(-c2ccc(F)cc2)nc(-c2ccc(F)cc2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(NCCN2CCN(c3ccc(F)cc3F)CC2)cc2nc(-c3ccco3)nn12. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(-c2cnn(-c3nc(NCc4cccc(I)c4)c4ncn(C5OC(CO)C(O)C5O)c4n3)c2)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(C(F)(F)F)c4)c3)nc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: C=CCn1c(=O)c2c(nc(Cc3c(F)cccc3F)n2C)n(Cc2ccco2)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCCc2ccccc2)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(Br)nc2c(N)nc(NCc3ccccc3)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: N#Cc1ccc(-c2nc(N)c3cc(CN4CCC(F)CC4)sc3n2)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1cnc2c(CNC(=O)c3nc(N)nc4c(F)cccc34)cccc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1nc(-c2cnc3ccc(F)cn23)c2ccc(C(=O)N3CCCCCCC3)cc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1c(C(=O)Nc2nccs2)ccc(NC(=O)CC(C)(C)C)c1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccccn2)c2cnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1ncnc2c1nc(Br)n2CCc1ccccc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)n1cc2c(Cl)nc(NC(=O)c3ccc(F)cc3)nc2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccccn1)c1nc(C(F)(F)F)nc2ccsc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)N3CCC(COCCF)CC3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(CC4CC(=O)N(c5ccc(C(F)(F)F)cc5)C4)c3)nc2n(CCC)c1=O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2ncn(Cc3ccc(C(F)(F)F)cc3)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(F)cc4)cc3)c(Br)c2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2cccc(C(F)(F)F)c2)nc2sc(CN3CC=CC3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3c(F)cccc3F)c3cnc(NCCc4cccs4)nc32)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(-c2nnc(N)nc2-c2cccc(F)c2)cc(C)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(SCCN2CCN(c3ccc(F)cc3F)CC2)cc2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(NC(=O)Nc2nc3nn(CCc4cc(Br)c(Br)cc4Br)cc3c3nc(-c4ccco4)nn23)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)NCc4ccc(F)cc4)cc3)c(Cl)c2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C1[Se]C(n2cnc3c(NCc4cccc(I)c4)ncnc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(C)n(-c2cc(NC(=O)Cc3cc(F)cc(F)c3)nc(-c3ccc(C)o3)n2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(O)C1CCC(OC2CCN(c3ccc(-c4nc5cc(C(F)(F)F)ccc5[nH]4)cn3)CC2)CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(-c2ccc(F)cc2)nc2c(NC3CCCC3)ncnc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)N3CCC(c4cccc(C(F)(F)F)c4)CC3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2c(nc3n2CCCCN3Cc2ccccc2Br)n(C)c1=O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2cc(F)cc(F)c2)nc2sc(CN3CCOCC3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Fc1ccc(CNc2nc(NC3CCCCC3)nc3nc(-c4ccco4)nn23)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(NC(=O)Cc3ccc(F)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OCC1CCCN1c1ncnc2sc(Nc3cccc(Br)c3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(NC(=O)Nc3ccc(C(F)(F)F)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1C(O)C(n2cnc3c(NCc4cccc(I)c4)nc(F)nc32)C2CC12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-c2cc(NC(C)=O)nc(-n3nc(C)cc3C)n2)c1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OCC1OC(n2cnc3c(NCc4cccc(I)c4)nc(Cl)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2nnn(Cc3ccc(F)cc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CONc1nc(C#Cc2cccc(C(F)(F)F)c2)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc2c1nc(N)n1nc(CN3CCN(c4ccc(F)cn4)CC3C)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCOc1ccnc2c(CNC(=O)c3nc(N)nc4c(F)cccc34)cccc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OC(CC)C(=O)Nc4ccc(F)cc4)cc3)cc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(O)Cn1c(Br)nc2c(N)ncnc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(N)n3nc(-c4ccc(Br)cc4)nc23)n1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cnc(N2CCOCC2)c2sc(NC(=O)C(C)(C)Oc3ccc(F)cc3F)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(Br)nc2c(N)nc(NCCc3ccccc3)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)n1cc(-c2ncc(N)nc2-c2ccc(F)cc2)ccc1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=c1[nH]c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc2c(=O)n1CC1CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)C3CCC(NCc4cccc(C(F)(F)F)c4)CC3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc2c1nc(N)n1nc(CN3CCN(c4ccc(F)cc4)CC3C)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(ncn2CCN2CCN(c3ccc(F)cc3F)CC2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccc(Br)o2)nc2sc(CN3CCOCC3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1ncnc2c1nc(Br)n2CCCO. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)N3CCC(O)(CF)CC3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C1[Se]C(n2cnc3c(NCc4cccc(Br)c4)ncnc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3ccccc23)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)C3CCN(C(=O)c4cccc(C(F)(F)F)c4)C3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCC(=O)Nc1nc(-c2ccccc2F)cc(-c2ccccc2F)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNc1nc(NCc2ccc(F)cc2)n2nc(-c3ccco3)nc2n1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2ncn(C(=O)NCc3ccc(F)cc3)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1ccc(-c2ccncc2F)c(-c2ccco2)n1)C1CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-c2cc(C(F)(F)F)nc(NCc3ccc(F)cc3)n2)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)Cn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1nnc(C2OC(n3cnc4c(N)nc(NC(C)Cc5ccc(C(F)(F)F)cc5)nc43)C(O)C2O)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCOc1ccc(N2CCN(CCn3ncc4c3nc(N)n3c(=O)n(Cc5cccc(Cl)c5)nc43)CC2)c(F)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)(C)Cn1cc2c(Cl)nc(NC(=O)c3ccc(F)cc3)nc2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)N4CCC(C(=O)c5ccc(F)cc5)CC4)cc3)cc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(cnn2CCN2CCN(c3ccc(F)c(F)c3F)CC2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(F)cc4)cc3)cc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: N#Cc1sc(=S)n(CCc2ccccc2)c1N=Cc1ccc(F)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)C3CCC(NC(=O)c4cccc(C(F)(F)F)c4)C3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(-c2nnc(N)nc2-c2ccccc2)cc(C(F)(F)F)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(-c4ccc(C(F)(F)F)nc4)c3)nc2n(CC)c1=O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(-c2nnc(N)nc2-c2ccc(F)cc2)cc(C)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C12CC1C(n1cnc3c(NC)nc(C#Cc4ccc(F)s4)nc31)C(O)C2O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1nc(Nc2ccccc2)sc1C(=O)c1cnc(N)n1-c1ccc(F)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)C(C)(C)Oc3ccc(F)cc3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(NCCN2CCN(c3ccc(F)cc3F)CC2)nc2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(Br)nc2c(NCc3ccccc3)ncnc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3cccnc23)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3c(F)cccc3F)c3cnc(NC(C)C)nc32)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(NC(=O)c2ccccc2F)c([N+](=O)[O-])c(-c2ccco2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(CCN1CCN(c2ccc(F)cc2F)CC1)c1cc2nc(-c3ccco3)nn2c(N)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)nc2sc(CN3CCC(F)C3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N(C)C(C)=O)c2sc(NC(=O)c3ccc(F)cc3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3c(F)cccc3F)c3cnc(NCc4ccccc4)nc32)c1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(C(F)(F)F)c4)c3)nc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(CC4CCN(c5ccc(OC(F)(F)F)cc5)C4=O)c3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(NCCCNC(=O)c2ccc(S(=O)(=O)F)cc2)nc2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(COC(=O)Nc4ccc(F)cc4)cc3)cc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(=O)Nc1ccc(-c2ccncc2)c(-c2cccc(F)c2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(C(F)(F)F)c4)c3)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-n2cccn2)nc(NCC2CC2)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(NN=Cc2ccc(C(F)(F)F)cc2)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNc1ncc(C(=O)OCc2cccc(F)c2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCC#Cc1nc(NCc2cccc(I)c2)c2ncn(C3C(O)C(O)C4CC43)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: N#Cc1c(N)nc2c(c1-c1ccc(F)cc1)CCC2. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(cnn2CCN2CCc3nc(C(F)(F)F)ccc3C2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1ncnc2c1nc(Br)n2C1CCC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(N2CCN(C(=O)Cc3ccc(Br)cc3)CC2)nc2nc(-c3ccco3)nn12. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(OC(F)(F)F)cc4)cc3)cc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)C3CCN(C(=O)c4cccc(C(F)(F)F)c4)CC3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(C)n(-c2cc(NC(=O)COc3cccc(F)c3)nc(-c3ccc(C)o3)n2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNc1ncc(C(=O)OCc2ccc(F)cc2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(N)nc(NCCN4CCN(c5ccc(F)c(Cl)c5)CC4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1ccc(F)cc1)Nc1ccc(Nc2nc(-c3ccccc3)nc3c2nnn3Cc2ccccc2)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cccnc1CNC(=O)c1cc(-c2ccccc2F)nc(N)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)C3CCN(Cc4cccc(C(F)(F)F)c4)CC3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C1OC(n2cnc3c(NOC)nc(C#Cc4ccc(F)cc4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COC(=O)C1OC(n2cnc3c(NCc4cccc(I)c4)nc(Cl)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(C)c4F)c3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCC2CCCCC2)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC(c1ccccc1)c1nc(C(F)(F)F)nc2ccccc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1nnc(C2OC(n3cnc4c(NCc5ccc(Cl)cc5F)ncnc43)C(O)C2O)n1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(-c4ccc(C(F)(F)F)nc4)c3)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Fc1ccc(CNc2nc(Oc3ccccc3)nc3nc(-c4ccco4)nn23)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(Cc5ccc(Br)c(F)c5)CC4)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccccc1Cn1c(=O)n(-c2cccc(OC(F)(F)F)c2)c2nc(NC3CC3)ncc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(F)cc4)cc3)cc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1ncnc2c1nc(Br)n2Cc1ccccc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1ccc(-c2nc(NC(=O)COc3ccccc3F)cc(-c3nccs3)n2)o1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCCc2ccccn2)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(N)nc(NCCN4CCN(c5ccc(CC(=O)O)cc5Br)CC4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(OCCc2c[nH]c3cc(Br)ccc23)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)N4CCN(c5ccc(F)cc5)CC4)cc3)cc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccc2c(-c3cnc4ccc(F)cn34)c[nH]c2c1)N1CCCCCC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCC1CCCN1c1nc(-n2cccn2)nc(N)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2c(nc3n2CCN3Cc2ccccc2Br)n(C)c1=O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2c(nc3n2CCCN3Cc2ccc(Cl)cc2F)n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Cc1ccccc1)Nc1nc2nn(CCc3ccccc3)cc2c2nc(-c3ccc(F)cc3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3ccc(OCC(=O)NCCCCCC4=C5C(C)=CC(C)=[N+]5[B-](F)(F)n5c(C)cc(C)c54)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(OCCN2CCN(c3ccc(F)cc3F)CC2)cc2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1nnc(C2OC(n3cnc4c(N)nc(NC(C)Cc5ccc(F)cc5)nc43)C(O)C2O)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4ccc(F)c(F)c4)c3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(NN=Cc2ccc(C(F)(F)F)cc2C(F)(F)F)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)N3CCC(OCCF)CC3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(Cc5ccc(C(F)(F)F)cc5)CC4)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(NNC(=O)c4ccc(Br)o4)ncnc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(=O)c1ccc(C(=O)c2cnc3ccc(Br)cn23)cc1)C1CCCCC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1nc(Nc2ccc(Cl)cc2)sc1C(=O)c1cnc(N)n1-c1ccc(F)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccccc1)c1sc(NCc2ccc(F)c(F)c2)nc1-c1ccco1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(C(F)(F)F)c4)c3)nc2n(CC)c1=O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)n(C)c(=O)c2[nH]c(-c3ccc(OCC(=O)N4CCN(c5ccc(C(F)(F)F)cc5)CC4)cc3)cc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Fc1ccc(CNc2nc(NCc3ccccc3)nc3nc(-c4ccco4)nn23)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2cccc(C(F)(F)F)c2)nc2sc(Cc3ccccc3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)nc2sc(CN3CCC(F)(F)C3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(Cc5cccc(Br)c5)CC4)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2cccc(Br)c2)cn2c1nc1ccccc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccccc2)c2c(n1)-c1cc(C(F)(F)c3cccnc3)ccc1C2=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc(CO)n2)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N(CCO)C(C)=O)c2sc(NC(=O)c3ccc(F)cc3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Fc1ccc(CNc2nc(-c3cccs3)cc(C(F)(F)F)n2)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1C(CCl)OC(n2cnc3c(Nc4ccc(Cl)cc4F)nc(Cl)nc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: N#Cc1ccccc1-c1nc(N)c2cc(CN3CCC(F)CC3)sc2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(I)cc4)cc3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: N#Cc1cccc(C(=O)Nc2nc3cc(C(=O)N4CCCCC4)cnc3n2Cc2ccc(F)cc2)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)c3cnn(Cc4cccc(F)c4)c3)sc12. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)Cn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(C(F)(F)F)c4)c3)nc2n(CC(C)C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(NC1CCC1)C1SC(n2cnc3c(NCc4cccc(I)c4)ncnc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccccc2O)nc(N2Cc3ccc(F)cc3C2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(Br)cn2cc(-c3ccco3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(ncn2CCN2CCN(c3ncc(F)cn3)CC2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1nnc(C2OC(n3cnc4c(NCc5cccc(F)c5)ncnc43)C(O)C2O)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3c(F)cccc3F)c3cnc(NCc4cccnc4)nc32)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(I)cc4)cc3)cc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: C=CCCn1c(Br)nc2c(N)ncnc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cnc(-c2nc(N)nc3c2nnn3Cc2ccccc2F)s1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(COc1cccc(F)c1)Nc1cc(-c2nccs2)nc(-c2ccccn2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCOC(=O)c1nc(NC(=O)c2ccc(Br)cc2)nc2nn(CCc3ccccc3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: C#CCn1c(=O)c2c(nc3n2CCCN3c2cccc(Br)c2)n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3c(F)cccc3OC)c3cnc(NC4CC4)nc32)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C12CC1C(n1cnc3c(NCc4ccc(N)c(I)c4)nc(Cl)nc31)C(O)C2O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3c(F)cccc3F)c3cnc(Nc4cccnc4)nc32)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: N#Cc1sc(=S)n(CCc2ccccc2)c1N=Cc1ccc(Br)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(NC(=O)Nc3ccc(F)cc3)n3nc(-c4ccc(Cl)cc4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1ccc(-c2nc(N)c3nn(Cc4cccc(F)c4)cc3n2)o1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2nccs2)c2cnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccc(C(F)F)o2)nc2sc(CN3CCC(F)CC3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(NC1CC1)C1SC(n2cnc3c(NCc4cccc(I)c4)ncnc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: S=c1sc2c(ncn3nc(-c4ccco4)nc23)n1-c1ccc(I)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1nc(-c2cnc3ccc(C(F)(F)F)cn23)c2ccc(C(=O)N3CCCCCC3)cc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccccc1Cn1c(=O)n(-c2cccc(F)c2)c2nc(NC3CC3)ncc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(C)cc(-c2nc(Nc3ccc(Cl)cc3F)c3ncn(C4OC(Cn5nc(C)cc5C)C(O)C4O)c3n2)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(NC(=O)NC3CCN(C(=O)NCc4ccc(F)cc4)CC3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C12CC1C(n1cnc3c(NCc4cccc(F)c4)nc(Cl)nc31)C(O)C2O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(=O)Nc1cc(-c2cccc(C(F)(F)F)c2)nc(-n2nc(C)cc2C)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1ccc2c(=O)c(C(=O)NCc3ccc(F)cc3)c[nH]c2n1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3ccc(OCC(=O)NCCCCCCCCCCC4=C5C(C)=CC(C)=[N+]5[B-](F)(F)n5c(C)cc(C)c54)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1CSC(n2cnc3c(NCc4cccc(I)c4)nc(Cl)nc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cc(-c2cc3c([nH]2)c(=O)n(C)c(=O)n3C)ccc1OCC(=O)Nc1ccc(Br)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(ncn2CCN2CCN(c3ccc(Cl)cc3F)CC2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3c2NCCC3)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2c(nc3cc(OC)ccn32)n(CCCNC(=O)c2ccc(S(=O)(=O)F)cc2)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N(CC(=O)N(C)C)C(C)=O)c2sc(NC(=O)c3ccc(F)cc3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccc2c(-c3cnc4ccc(F)cn34)nn(Cc3ccccc3)c2c1)N1CCCCCC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1nnc(C2OC(n3cnc4c(NCc5cccc(Br)c5)ncnc43)C(O)C2O)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1COC(n2cnc3c(NCc4cccc(F)c4)ncnc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: N#Cc1c(-c2ccc(F)cc2)cc(C2CC2)nc1N. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCOC(=O)c1cnc(NCc2ccc(C(F)(F)F)cc2)n2nc(-c3ccco3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cc(OCc2nc(-c3ccc(Br)cc3)no2)ccc1-c1cc2c([nH]1)c(=O)n(C)c(=O)n2C. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4cc(C(F)(F)F)ccc4Cl)c3)nc2n(CCC)c1=O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1ccc(-c2ccncc2)c(-c2ccccc2F)n1)C1CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccccc1)c1nc(C(F)(F)F)nc2ccsc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCC(=O)Nc1cc(-c2ccccc2F)nc(-c2ccccc2F)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccc(Cl)o2)nc2sc(CN3CCC(F)CC3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-c2nc(N)c3cc(CN4CCC(F)CC4)sc3n2)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: C[S+](O)CCOc1cc(N2CCN(CCn3c(=O)sc4c3nc(N)n3nc(-c5ccco5)nc43)CC2)c(F)cc1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2nn(CCc3cc(Br)c(Br)cc3Br)cc2c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCOc1ccc(N2CCN(CCn3cnc4c3nc(N)n3nc(-c5cccc(F)c5)nc43)CC2)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(F)cc4)cc3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCOC(=O)c1cnc(NCc2ccc(F)cc2)n2nc(-c3ccco3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1ccc(-c2nc(NC(=O)COc3cccc(F)c3)cc(-c3nccs3)n2)o1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)c4ccc(I)cc4)cc3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cccnc1CNC(=O)c1c(N)nc(-c2ccco2)nc1OCCF. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(N=Cc2ccc(F)cc2)c(C#N)sc1=S. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COC(=O)CN(C(C)=O)c1ccc(OC)c2nc(NC(=O)c3ccc(F)cc3)sc12. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(cnn2CCN2CCN(c3ccc(F)c(F)c3)CC2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3cccnc23)c2cc(F)c(F)cc2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)N3CCC(S(=O)(=O)c4cccc(C(F)(F)F)c4)CC3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(Cc2ccccc2F)nc2cn(-c3ccccc3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(Cc5ccc(C(F)(F)F)cc5)CC4)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2cc(-c3ccc(OCC(=O)Nc4ccc(F)cc4)cc3)[nH]c2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N(CCCO)C(C)=O)c2sc(NC(=O)c3ccc(F)cc3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(N)nc(NCCN4CCN(c5ccc(C(F)(F)F)cc5)CC4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccc(C(F)F)o2)nc2sc(CN3CCOCC3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)Cn1c(Br)nc2c(N)ncnc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCOC(=O)c1nc(NC(=O)c2ccc(C(F)(F)F)cc2)nc2nn(CCc3ccccc3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(CCc1ccc(F)cc1)c1cc2nc(-c3ccco3)nn2c(N)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(N=Cc2ccc(Br)cc2)c(C#N)sc1=S. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccc[nH]1)c1nc(C(F)(F)F)nc2ccsc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(OCCc2c[nH]c3ccc(I)cc23)nc2c1ncn2C1OC(CO)C(O)C1O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1cc2c(nc(NC(=O)Nc3ccc(F)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cncc3ccccc23)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: C=CCn1c(Br)nc2c(N)ncnc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(C)n(-c2cc(NC(=O)Cc3ccc(F)c(F)c3)nc(-c3ccc(C)o3)n2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1C(O)C(n2cnc3c(NC(C4CC4)C4CC4)nc(I)nc32)C2CC12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Cn1cc(-c2ccccc2)c2ncn(C3CC3)c(=O)c21)NCc1ccc(F)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1nc(-c2ccccc2)c(C(=O)c2ccc(F)cc2F)s1)c1ccco1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCNc1nc(C#Cc2ccc(F)s2)nc2c1ncn2C1OC(C(=O)NC)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cc(F)c(F)cc1CNc1nc(Cl)nc2c1ncn2C1C(O)C(O)C2CC21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)C(C)(C)Oc3ccc(F)cc3F)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=c1oc2c(O)c(O)ccc2cc1-c1cc(Br)cs1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1nc2nn(CCc3ccccc3)cc2c2nc(-c3ccc(F)cc3)nn12)c1ccccc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(N)nc(NCCN4CCN(c5cccc(F)c5F)CC4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2c(nc3n2CCCN3Cc2ccc(F)c(Br)c2)n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(=O)c1ccc(C(=O)c2cnc3ccc(F)cn23)cc1Cl)C1CCCCC1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCc4nc(-c5ccccc5C(F)(F)F)no4)cc3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-n2cccn2)nc(N2CCCC2)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)N3CCC(O)(CCF)CC3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1c(C(=O)c2ccccc2)oc2ccc(Br)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(NC(=O)c1nc(N)nc2c(F)cccc12)c1cccc2cccnc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCc4noc(-c5ccc(C(F)(F)F)cc5)n4)cc3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)Sc1nc(-n2cccn2)nc(N)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: C#CCn1c(Br)nc2c(N)nc(C#CCCCCCC)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: NC(=O)Cc1cc(N2CCN(CCn3c(=O)sc4c3nc(N)n3nc(-c5ccco5)nc43)CC2)c(F)cc1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(NN=Cc2ccc(Cl)c(C(F)(F)F)c2)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccccc2)c2c(n1)-c1cc(Cc3cncc(F)c3)ccc1C2=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccc(Cl)o2)nc2sc(CN3CCCC(F)(F)C3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1ccc(-c2nc(N)c3cc(Cc4ccccc4F)sc3n2)o1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C12CC1C(n1cnc3c(NCc4cccc(Cl)c4)nc(C#CCCCCc4cn(C(Br)C(=O)c5ccccc5)nn4)nc31)C(O)C2O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccc(C(F)F)o2)nc2sc(CN3CCC(F)(F)CC3)cc12. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1C(CSc2ccccc2F)OC(n2cnc3c(NC4CCCC4)ncnc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1ccc(Nc2nc(C)c(C(=O)c3cnc(N)n3-c3ccc(F)cc3)s2)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1nnc(C2OC(n3cnc4c(Nc5ccc(Cl)cc5F)ncnc43)C(O)C2O)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)n(C)c(=O)c2[nH]c(-c3ccc(OCC(=O)N4CCN(c5ccc(F)cc5)CC4)cc3)cc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)N3CCC(O)(c4cccc(C(F)(F)F)c4)CC3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(=O)Nc1cc(-c2ccc(F)cc2)nc(-c2ccc(F)cc2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCn1cc2c(nc(NC(=O)Nc3ccc(C(F)(F)F)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)Cn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(C(F)(F)F)c4)c3)nc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1nnc(C2OC(n3cnc4c(N)nc(NC(CO)Cc5ccc(Br)cc5)nc43)C(O)C2O)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1cccc(Cl)c1)Nc1nc2nn(CCc3c(Br)cc(Br)cc3Br)cc2c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc2c(C(=O)NCc3cccc(C(F)(F)F)n3)nc(N)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(Cl)nc(NC(=O)c3ccc(C(F)(F)F)cc3)nc2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(Cl)c(CN2CCCn3c2nc2c3c(=O)n(C)c(=O)n2C)c1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Fc1ccc(CNc2nc(-c3ccccc3)cc(C(F)(F)F)n2)cc1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1cnc(-c2ccncc2F)c(-c2ccccc2F)n1)C1CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2cnn(Cc3c(F)cccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(c5ccc(F)cc5)CC4)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1nnc(C2OC(n3cnc4c(NCc5ccccc5I)nc(Cl)nc43)C(O)C2O)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CN1CCN(C(=O)n2nc(-c3cnc4ccc(F)cn34)c3ccc(C(=O)N4CCCCCC4)cc32)CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1c(F)cccc1-c1cc(NC(C)=O)nc(-n2nc(C)cc2C)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OCC1OC(n2cnc3c(NCc4cccc(I)c4)ncnc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Cc1ccc(F)cc1)Nc1nc2nn(CCCc3ccccc3)cc2c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(NC(=O)Nc3ccc(F)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: NCCCCC(NC(=O)C(Cc1ccc(F)c(F)c1)NC(=O)C1CCCN1C(=O)CCCCCNC(=O)C1CCC2c3[nH]c4ccccc4c3CCN2C1)C(N)=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(Br)cc4)cc3)nc2n(CCCOC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1nc(-c2ccccc2)c(C(=O)c2ccc(F)cc2F)s1)c1ccccc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccccc1Cc1cc2c(N)nc(-c3ccc(C(F)F)o3)nc2s1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(C)n(-c2cc(NC(=O)COc3cc(F)cc(CN4CCOCC4)c3)nc(-c3ccc(C)o3)n2)n1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: N#Cc1cccc(-c2nc(N)c3cc(Cc4ccccc4F)sc3n2)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)n(CCN2CCN(c3ccc(C#N)c(F)c3)CC2)c2nc(N)n3nc(-c4ccco4)nc3c21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)CCn1cc2c(nc(NC(=O)Cc3ccc(F)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N(CC(F)(F)F)C(C)=O)c2sc(NC(=O)c3ccc(F)cc3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cnc(N2CCOCC2)c2sc(NC(=O)C(C)(C)Oc3ccccc3F)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1nc(-c2cnc3ccc(F)cn23)c2ccc(C(=O)N3CCCCCC3)cc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(NCC1CC1)C1SC(n2cnc3c(NCc4cccc(I)c4)nc(Cl)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1ncnc2c1nc(Br)n2C1CCCC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc2c(C(=O)NCc3cccc(COc4ccccc4F)n3)nc(N)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCC(=O)Nc1cc(-c2ccc(F)cc2)nc(-c2ccc(F)cc2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4cccc(F)c4)cc3)c(Cl)c2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(NC1CC1)C1SC(n2cnc3c(NCc4cccc(I)c4)nc(Cl)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)n1cc2c(Cl)nc(NC(=O)c3ccc(C(F)(F)F)cc3)nc2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc2n(CC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Oc1c(Br)cc2[nH]c3cnccc3c2c1Br. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1nnc(C2OC(n3cnc4c(NCc5cccc(I)c5)ncnc43)C(O)C2O)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccc(Br)o2)nc2sc(CN3CCCC3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3c(F)cccc3F)c3cnc(NCCc4ccccn4)nc32)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1csc(-c2nc(N)nc3c2nnn3Cc2ccccc2F)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccc2c(c1)nnn2-c1cnc2ccc(F)cn12)N1CCCCCC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2c(nc3n2CCCN3Cc2ccccc2F)n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C12CC1C(n1cnc3c(NCc4cccc(I)c4)ncnc31)C(O)C2O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N(CC(=O)O)C(C)=O)c2sc(NC(=O)c3ccc(F)cc3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(Cc5cccc(C(F)(F)F)c5)CC4)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(N)nc(NCCN4CCN(c5ccc(F)cc5)CC4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(N(CCN2CCN(c3ccc(F)cc3F)CC2)CC2CC2)cc2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)n(CC2CC2)c(=O)c2[nH]c(-c3cnn(Cc4cccc(C(F)(F)F)c4)c3)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3cc(C(=O)NCCCCCc4ccccc4)cnc23)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2nnn(Cc3cccc(C(F)(F)F)c3)c2n1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(NN=Cc2ccc(OCc3ccc(F)cc3)cc2)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1nnc(C2OC(n3cnc4c(NCc5ccc(Cl)cc5F)nc(Cl)nc43)C(O)C2O)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-n2ccnn2)c2cnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(cnn2Cc2ccc(C3(C(F)(F)F)N=N3)cc2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCC#Cc1nc(NCc2cccc(Br)c2)c2ncn(C3C(O)C(O)C4CC43)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1ccc(Br)c(-c2ccco2)n1)C1CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccc(Cl)o2)nc2sc(CN3CCC(F)(F)C3)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2cc[nH]n2)c2cnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(OCCc2c[nH]c3ccc(F)cc23)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3c(F)cccc3F)c3cnc(NCc4ccccn4)nc32)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N(CC(C)C)C(C)=O)c2sc(NC(=O)c3ccc(F)cc3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2cccs2)c2nnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCC(NC(=O)C1CCC2c3[nH]c4ccccc4c3CCN2C1)C(=O)NC(Cc1ccc(F)c(F)c1)C(=O)NC(CCCCN)C(N)=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(ncn2Cc2ccccc2Br)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OCC12CC1C(n1cnc3c(NCc4cccc(I)c4)ncnc31)C(O)C2O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccnc2c(CNC(=O)c3nc(N)nc4c(F)cccc34)cccc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(OCCc2cccc(F)c2)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-n2cncn2)c2nnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc2cc(C(=O)Nc3c(C#N)ncn3-c3ccc(F)cc3)c(=N)oc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cnc(N2CCOCC2)c2sc(NC(=O)C(C)(C)Oc3ccc(F)cc3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccoc1)c1nc(C(F)(F)F)nc2ccsc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3cccnc23)c2cccc(C(F)(F)F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: N#Cc1cccc(-c2nc(N)c3cc(CN4CCC(F)CC4)sc3n2)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2c(nc3n2CCCN3Cc2ccccc2Br)n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNc1ncc(C(=O)OCc2ccccc2Br)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Fc1ccc(-n2c(=S)sc3c2ncn2nc(-c4ccco4)nc32)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(-c2cc(-c3cccc(Br)c3)c(C#N)c(N)n2)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)n(CCN2CCN(c3ccc(F)cc3F)CC2)c2nc(N)n3nc(-c4ccco4)nc3c21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)C(C)(C)Oc3cccc(F)c3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(CC4CC(=O)N(c5ccc(C(F)(F)F)cc5)C4)c3)nc2n(CC)c1=O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-n2cccn2)nc(NC2CCC2)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2cncs2)c2nnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCSc1nc(-n2cccn2)nc(N)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3[nH]cnc23)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2ccccn2)cc(-c2ccccc2F)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(NC(=O)Cc3ccccc3)n3nc(-c4ccc(F)cc4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4cc(-c5ccc(C(F)(F)F)cc5)on4)c3)nc2n(CC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(N)nc(NCCN4CCN(c5ccccc5F)CC4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(Cc2c(F)cccc2F)nc2c1c(=O)n(CC1CC1)c(=O)n2Cc1ccco1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2c[nH]nn2)c2nnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccccc4)cc3)c(Br)c2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(nc(NC(=O)Cc3ccc(C(F)(F)F)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCOc1ccc(N2CCN(CCn3cnc4c3nc(N)n3nc(-c5ccccc5F)nc43)CC2)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(NCC1CC1)C1SC(n2cnc3c(NCc4cccc(I)c4)ncnc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(OCCc2ccc(F)cc2)nc2c1ncn2C1OC(CO)C(O)C1O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3cc(C(=O)N4CCCc5ccccc54)cnc23)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3cc(C(=O)N4CCN(c5ccccn5)CC4)cnc23)c2cccc(F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1cccc(Cl)c1)Nc1nc2nn(CCc3cc(Br)c(Br)cc3Br)cc2c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1ccc(-c2ccncc2)c(-c2ccc(F)cc2)n1)C1CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCC(=O)Nc1cc(-c2ccccc2F)nc(-c2ccccc2F)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: NCC1C(CO)OC(n2cnc3c(NCc4cccc(I)c4)ncnc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)C3CCC(NCc4cccc(C(F)(F)F)c4)C3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc(F)c2)cc(-c2ccco2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(NN=Cc2ccc(Br)cn2)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1nnc(C2OC(n3cnc4c(NCc5cccc(Br)c5)nc(Cl)nc43)C(O)C2O)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(F)cc4)cc3)nc2n(CCCOC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCOc1cc(F)c(N2CCN(CCn3c(=O)n(C)c4c3nc(N)n3nc(-c5ccco5)nc43)CC2)cc1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(N2CCN(Cc3ccc(F)cc3)CC2)nc2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1cc2c(Cl)nc(NC(=O)c3ccc(F)cc3)nc2n1. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)N4CCN(c5ccc(Br)cc5)CC4)cc3)cc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cc(OCC(=O)c4ccc(Br)cc4)nn3C)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CONC(=O)C1OC(n2cnc3c(NCc4cccc(I)c4)nc(Cl)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc(-n2c(=O)n(Cc3c(F)cccc3F)c3cnc(NCCN(C)C)nc32)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1c(=O)n(CC(C)C)c(=O)c2[nH]c(-c3cnn(-c4ccc(C(F)(F)F)nc4)c3)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(Cc5ccc(F)c(OC)c5)CC4)cc3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC(c1cc(C(F)(F)F)nc2c(C(F)(F)F)cccc12)C1CCCCN1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1C(CSc2ccccc2F)OC(n2cnc3c(NC4CCOC4)ncnc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CN(CCCc1ccccn1)C(=O)c1cnc2c(CNC(=O)c3nc(N)nc4c(F)cccc34)cccc2c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CN(C)c1nc(-n2cccn2)nc(N)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: C#CCn1c(=O)c2[nH]c(Cc3c(F)cccc3F)nc2n(Cc2ccco2)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(C)n(-c2cc(NC(=O)COc3cc(F)cc(CN(C)C)c3)nc(-c3ccc(C)o3)n2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC1CCCC(C)N1Cc1cc2c(N)nc(-c3ccc(C(F)F)o3)nc2s1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1C(O)C(n2cnc3c(NCc4cccc(Cl)c4)nc(C#Cc4ccc(F)c(F)c4)nc32)C2CC12. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCn1cc2c(nc(NC(=O)Nc3ccc(F)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)N3CCC(C(C)(O)CF)CC3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1ccc(-c2nc(NC(=O)Nc3ccc(F)cc3)c3nnn(Cc4ccccc4F)c3n2)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC1CCCC(C)N1Cc1cc2c(N)nc(-c3ccc(Br)o3)nc2s1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCC1CCCN1c1cc(NCC(F)(F)F)nc(-n2nc(C)cc2C)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)c3ccc(F)cc3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2[nH]c(-c3ccc(COC(=O)Nc4ccc(F)cc4)cc3)cc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(N)nc(NCCN4CCN(c5ccc(Br)cc5)CC4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCN(C(C)=O)c1ccc(OC)c2nc(NC(=O)C3CCN(S(=O)(=O)c4cccc(C(F)(F)F)c4)C3)sc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCOc1ccc(N2CCN(C(=O)c3cnc4c(CNC(=O)c5nc(N)nc6c(F)cccc56)cccc4c3)CC2)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C1[Se]C(n2cnc3c(NCc4cccc(Br)c4)nc(Cl)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1C(CSc2ccccc2F)OC(n2cnc3c(NC4CCCC4)nc(Cl)nc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1SC(n2cnc3c(NCc4cccc(I)c4)ncnc32)C(O)C1O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cnc(N2CCOCC2)c2sc(NC(=O)C(C)(C)Oc3cccc(F)c3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2ncn(Cc3c(F)cc(F)cc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-n2cc(O)cn2)nc(-n2cccn2)c1Br. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CN(CCc1ccc(F)cc1)c1cc2nc(-c3ccco3)nn2c(N)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(cnn2CCN2CCN(c3ccc(F)cc3F)CC2)c2nn(Cc3cccc(Cl)c3)c(=O)n12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4ccc(F)cc4)c3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(Nc1ccc(-c2ccncc2)c(-c2cccc(F)c2)n1)C1CC1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCC#Cc1nc(NCc2cccc(Br)c2)c2ncn(C3SCC(O)C3O)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)n(CCN2CCN(c3ccc(OC(F)F)cc3)CC2)c2nc(N)n3nc(-c4ccco4)nc3c21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2c(nc3n2CCCN3c2ccc(F)cc2)n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1OC(n2cnc3c(N)nc(NCCN4CCN(c5ccccc5C(F)(F)F)CC4)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1ncnc2c1nc(Br)n2CC(O)CO. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNC(=O)C12CC1C(n1cnc3c(NCc4cc(I)ccc4Cl)nc(Cl)nc31)C(O)C2O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2c(cc(C=Cc3ccc(C(F)(F)F)cc3)n2C)n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(Cc5ccc(OC)c(F)c5)CC4)cc3)nc2c1=O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(c1ccc(C(=O)N2CCCCCC2)cc1)c1cnc2ccc(F)cn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNc1ncc(C(=O)NCc2ccc(F)cc2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1CSC(n2cnc3c(NCc4cccc(F)c4)ncnc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3cccnc23)c2cccc(I)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2[nH]c(-c3ccc(OCC(=O)NCc4ccc(F)cc4)cc3)cc2n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(N2CCOCC2)c2sc(NC(=O)C(C)(C)Oc3ccc(F)c(F)c3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCOc1ccc(N2CCN(CCn3c(=O)n(C)c4c3nc(N)n3nc(-c5ccco5)nc43)CC2)cc1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1c(C(=O)Nc2ccccc2F)sc2nc3c(cc12)C(=O)CCC3. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(cnn2CCN2CCc3ncc(Br)cc3C2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cc(OCC(=O)c4ccc(Br)cc4)n(C)n3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1CSC(n2cnc3c(NCc4cccc(F)c4)nc(Cl)nc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(cnn2CCCc2ccc(OCCF)cc2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(NC(=O)c2ccccc2OC(F)(F)F)c([N+](=O)[O-])c(-c2ccco2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ncco2)c2nnn(Cc3ccccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3ccc(S(=O)(=O)N4CCN(Cc5ccc(Br)cc5)CC4)cc3)nc2c1=O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)n(CCN2CCN(c3nc(C(F)(F)F)cs3)CC2)c2nc(N)n3nc(-c4ccco4)nc3c21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: OC1CSC(n2cnc3c(NCc4cccc(Br)c4)ncnc32)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(Br)nc2c(NCCc3ccccc3)ncnc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Brc1cccc(Nc2nc3c(N4CCCC4)ncnc3s2)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccccc4F)cc3)c(Cl)c2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cc(OCC(=O)Nc2ccc(F)cc2)ccc1-c1cc2c([nH]1)c(=O)n(C)c(=O)n2C. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1ccc(Cn2nnc3c(-c4ccco4)nc(N)nc32)c(F)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3cnn(Cc4cc(-c5ccc(C(F)(F)F)cc5)on4)c3)nc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(-c2ccco2)c2ncn(Cc3c(F)cccc3F)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: O=C(NC1CCC1)C1SC(n2cnc3c(NCc4cccc(I)c4)nc(Cl)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(cnn2CCCc2ccc(OCCCF)cc2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCNC(=O)C1SC(n2cnc3c(NCc4cccc(I)c4)nc(Cl)nc32)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCCn1cc2c(nc(NC(=O)Nc3ccc(F)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cc(-c2cc3c([nH]2)c(=O)n(C)c(=O)n3C)ccc1OCC(=O)Nc1ccc(F)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1ncnc2c1ncn2C1OC(C(=O)NCC(F)(F)F)C(O)C1O. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(N2CCN(Cc3ccc(Br)cc3)CC2)nc2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc2c(ncn2CCN2CCC(c3ccc(F)cc3)CC2)c2nc(-c3ccco3)nn12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2ccccc2F)cc(-c2ccco2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(-c2nnc(N)nc2-c2cc(F)cc(F)c2)cc(C)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(N(C=O)CC(=O)Nc4ccc(Br)cc4)cc3)cc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(C(=O)NCc2cccc3cccnc23)c2cccc(Br)c2n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CNc1ncc2c(n1)n(-c1cccc(OC)c1)c(=O)n2Cc1c(F)cccc1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Clc1ncnc2sc(Nc3cccc(Br)c3)nc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cccnc1CNC(=O)c1cc(-c2cccc(F)c2)nc(N)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)[nH]c2[nH]c(-c3cnn(Cc4cccc(F)c4)c3)nc2c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1cccc2c1nc(N)n1nc(CN3CCN(c4cccc(F)c4)CC3C)nc21. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1cc(C)n(-c2cc(NC(=O)COc3cc(F)cc(CN4CCCC4)c3)nc(-c3ccc(C)o3)n2)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)CCn1cc2c(nc(NC(=O)Cc3ccc(C(F)(F)F)cc3)n3nc(-c4ccco4)nc23)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CN(CCN1CCN(c2ccc(F)cc2F)CC1)c1cc2nc(-c3cccc(C#N)c3)nn2c(N)n1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COc1ccc(CC(=O)Nc2cc(-n3nc(C)cc3C)nc(-c3ccc(C)o3)n2)cc1F. 
Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: COCCOc1ccc(N2CCN(CCn3ncc4c3nc(N)n3nc(-c5ccco5)nc43)CC2)c(F)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cn1c(=O)c2c(nc3n2CCN(Cc2csc(-c4ccc(C(F)(F)F)cc4)n2)C3)n(C)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: N#Cc1cccc(-c2nc(N)c3cc(CN4CCCC(F)(F)C4)sc3n2)c1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Cc1ccc(-c2nc(N)nc3c2nnn3Cc2ccccc2F)o1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Fc1cccc(F)c1Nc1nc2c(Cl)ncnc2s1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CC(C)(C)CC(=O)Nc1ccc(C(=O)Nc2nccs2)cc1F. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(OCCc2c[nH]c3ccc(Br)cc23)nc2c1ncn2C1OC(CO)C(O)C1O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OC(CC)C(=O)Nc4ccc(Br)cc4)cc3)cc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCOC(=O)c1nc(NC(=O)c2ccc(F)cc2)nc2nn(C(C)C)cc12. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Fc1ccc(CNc2nc(-c3ccc(Cl)cc3)cc(C(F)(F)F)n2)cc1. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: CCCn1c(=O)c2[nH]c(-c3ccc(OCC(=O)Nc4ccc(F)cc4)cc3)cc2n(CCC)c1=O. Molecule removed.\n", + "qsprpred - WARNING - Molecule refused by standardizer: Nc1nc(CSc2nnc(N)s2)nc(Nc2ccc(F)cc2)n1. Molecule removed.\n" + ] + }, { "data": { "text/plain": [ "3286" ] }, - "execution_count": 29, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 29 + "execution_count": 15 }, { - "metadata": {}, "cell_type": "markdown", - "source": "You can see that you are required to also implement a few more things than just the `convert_smiles` method. 
This is because standardizers should be explicit about their settings and it should be possible to compare them. This will help you find out if two storages or data sets are compatible with each other or if you need to unify the standardization process between them:" + "metadata": {}, + "source": [ + "You can see that you are required to also implement a few more things than just the `convert_smiles` method. This is because standardizers should be explicit about their settings and it should be possible to compare them. This will help you find out if two storages or data sets are compatible with each other or if you need to unify the standardization process between them:" + ] }, { - "metadata": { - "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.557005Z", - "start_time": "2024-08-28T08:40:39.554920Z" - } - }, "cell_type": "code", - "source": "dataset.storage.standardizer.get_id()", + "metadata": {}, + "source": [ + "dataset.storage.standardizer.get_id()" + ], "outputs": [ { "data": { @@ -2190,26 +1679,28 @@ "'Br,F,I'" ] }, - "execution_count": 30, + "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 30 + "execution_count": 16 }, { - "metadata": {}, "cell_type": "markdown", - "source": "The standardizers used are saved with the storage so you can always retrieve them and check how the data was standardized:" + "metadata": {}, + "source": [ + "The standardizers used are saved with the storage so you can always retrieve them and check how the data was standardized:" + ] }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.586593Z", - "start_time": "2024-08-28T08:40:39.557560Z" + "end_time": "2024-09-03T21:46:41.765542Z", + "start_time": "2024-09-03T21:46:41.720800Z" } }, - "cell_type": "code", "source": [ "dataset.save()\n", "dataset = QSPRDataset.fromFile(\n", @@ -2224,22 +1715,24 @@ "{'halogens': ['Br', 'F', 'I']}" ] }, - "execution_count": 31, + "execution_count": 17, 
"metadata": {}, "output_type": "execute_result" } ], - "execution_count": 31 + "execution_count": 17 }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:40:39.589385Z", - "start_time": "2024-08-28T08:40:39.587198Z" + "end_time": "2024-09-03T21:46:41.768412Z", + "start_time": "2024-09-03T21:46:41.766144Z" } }, - "cell_type": "code", - "source": "dataset.storage.standardizer.get_id()", + "source": [ + "dataset.storage.standardizer.get_id()" + ], "outputs": [ { "data": { @@ -2247,12 +1740,19 @@ "'Br,F,I'" ] }, - "execution_count": 32, + "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 32 + "execution_count": 18 + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "QSPRpred offers a few standardizers in the `qspr.data.chem.standardizers` package so feel free to look at the documentation of this package." + ] }, { "cell_type": "markdown", @@ -2267,15 +1767,12 @@ "cell_type": "code", "metadata": { "collapsed": false, - "execution": { - "iopub.execute_input": "2023-09-21T15:41:38.523697Z", - "iopub.status.busy": "2023-09-21T15:41:38.523454Z", - "iopub.status.idle": "2023-09-21T15:43:17.462900Z", - "shell.execute_reply": "2023-09-21T15:43:17.462012Z" + "jupyter": { + "outputs_hidden": false }, "ExecuteTime": { - "end_time": "2024-08-28T08:41:02.040003Z", - "start_time": "2024-08-28T08:40:39.590094Z" + "end_time": "2024-09-03T21:47:05.464812Z", + "start_time": "2024-09-03T21:46:41.769091Z" } }, "source": [ @@ -2285,36 +1782,42 @@ "dataset.addDescriptors([MorganFP(radius=3, nBits=2048), RDKitDescs()])" ], "outputs": [], - "execution_count": 33 + "execution_count": 19 }, { - "metadata": {}, "cell_type": "markdown", - "source": "Notice that since we are using the `TabularStorageBasic` as `ChemStore` for the data set, we can also speed these calculations up with parallelization:" + "metadata": {}, + "source": [ + "Notice that since we are using the `TabularStorageBasic` as 
`ChemStore` for the data set, we can also speed these calculations up with parallelization:" + ] }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:41:02.042789Z", - "start_time": "2024-08-28T08:41:02.040775Z" + "end_time": "2024-09-03T21:47:05.467540Z", + "start_time": "2024-09-03T21:47:05.465409Z" } }, - "cell_type": "code", - "source": "dataset.nJobs = os.cpu_count()", + "source": [ + "dataset.nJobs = os.cpu_count()" + ], "outputs": [], - "execution_count": 34 + "execution_count": 20 }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:41:04.311526Z", - "start_time": "2024-08-28T08:41:02.043263Z" + "end_time": "2024-09-03T21:47:07.942129Z", + "start_time": "2024-09-03T21:47:05.468013Z" } }, - "cell_type": "code", - "source": "dataset.addDescriptors([MorganFP(radius=3, nBits=2048), RDKitDescs()], recalculate=True)", + "source": [ + "dataset.addDescriptors([MorganFP(radius=3, nBits=2048), RDKitDescs()], recalculate=True)" + ], "outputs": [], - "execution_count": 35 + "execution_count": 21 }, { "cell_type": "markdown", @@ -2326,14 +1829,16 @@ ] }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:41:04.315339Z", - "start_time": "2024-08-28T08:41:04.312427Z" + "end_time": "2024-09-03T21:47:07.946246Z", + "start_time": "2024-09-03T21:47:07.943330Z" } }, - "cell_type": "code", - "source": "dataset.descriptors", + "source": [ + "dataset.descriptors" + ], "outputs": [ { "data": { @@ -2341,170 +1846,174 @@ "[DescriptorTable (3286), DescriptorTable (3286)]" ] }, - "execution_count": 36, + "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 36 + "execution_count": 22 }, { - "metadata": {}, "cell_type": "markdown", - "source": "For your convenience, these are nothing else, but specialized implementations of `PandasDataTable` objects, so you can use all the methods and attributes of the `PropertyStorage` API on them as well:" 
+ "metadata": {}, + "source": [ + "For your convenience, these are nothing else, but specialized implementations of `PandasDataTable` objects, so you can use all the methods and attributes of the `PropertyStorage` API on them as well:" + ] }, { + "cell_type": "code", "metadata": { "ExecuteTime": { - "end_time": "2024-08-28T08:41:04.352436Z", - "start_time": "2024-08-28T08:41:04.316072Z" + "end_time": "2024-09-03T21:47:07.982975Z", + "start_time": "2024-09-03T21:47:07.946704Z" } }, - "cell_type": "code", - "source": "dataset.descriptors[1].getDF()", + "source": [ + "dataset.descriptors[1].getDF()" + ], "outputs": [ { "data": { "text/plain": [ " RDkit_AvgIpc RDkit_BCUT2D_CHGHI \\\n", "ID \n", - "WKDODQMRRMLXCA-UHFFFAOYSA-N 3.096595 2.435707 \n", - "WLJHLZNFHMEDTP-UHFFFAOYSA-N 3.647985 2.228107 \n", - "WLMUFMKHAZAXJK-UHFFFAOYSA-N 3.323272 2.150687 \n", - "WLODUKXHHYGQDP-UHFFFAOYSA-N 3.404520 2.187499 \n", - "WLQRARSNWWFZFK-UHFFFAOYSA-N 3.391757 2.143655 \n", + "AUBIOACYLHNVOM-UHFFFAOYSA-N 3.143765 2.147943 \n", + "AUNXSAUXRUCPOQ-UHFFFAOYSA-N 3.236408 2.148100 \n", + "AUORTGQKWNTDKT-UHFFFAOYSA-N 2.651005 2.149889 \n", + "AUQHNAQGJBULMA-UHFFFAOYSA-N 2.879929 2.106531 \n", + "AVCVCCQJIMUKOJ-UHFFFAOYSA-N 3.578061 2.416188 \n", "... ... ... 
\n", - "SEHVZOJKPYLYHJ-UHFFFAOYSA-N 3.428739 2.225750 \n", - "SEJFSOMEWFABFC-UHFFFAOYSA-N 3.516896 2.192647 \n", - "SENKKHIKGQSFKC-UHFFFAOYSA-N 2.736142 2.208091 \n", - "SEONSDFSIHPWGR-UHFFFAOYSA-N 3.163482 2.187793 \n", - "SERGQABLLZFSEN-UHFFFAOYSA-N 3.075536 2.153961 \n", + "ZXFPGFBEAAFTOL-UHFFFAOYSA-N 2.673123 2.171015 \n", + "ZXNLEZIGIRYUMA-UHFFFAOYSA-N 2.973333 2.159815 \n", + "ZXPDGTGMZKIESV-UHFFFAOYSA-N 2.962669 2.748903 \n", + "ZXUDFYXHMFXRAZ-UHFFFAOYSA-N 3.012530 2.139712 \n", + "ZYILDTWBSDDXCD-UHFFFAOYSA-N 3.212763 2.148222 \n", "\n", " RDkit_BCUT2D_CHGLO RDkit_BCUT2D_LOGPHI \\\n", "ID \n", - "WKDODQMRRMLXCA-UHFFFAOYSA-N -2.161774 2.317230 \n", - "WLJHLZNFHMEDTP-UHFFFAOYSA-N -2.318791 2.236758 \n", - "WLMUFMKHAZAXJK-UHFFFAOYSA-N -2.126135 2.227910 \n", - "WLODUKXHHYGQDP-UHFFFAOYSA-N -2.085804 2.198792 \n", - "WLQRARSNWWFZFK-UHFFFAOYSA-N -2.088833 2.263334 \n", + "AUBIOACYLHNVOM-UHFFFAOYSA-N -2.064522 2.152083 \n", + "AUNXSAUXRUCPOQ-UHFFFAOYSA-N -2.038914 2.327026 \n", + "AUORTGQKWNTDKT-UHFFFAOYSA-N -2.122324 2.239270 \n", + "AUQHNAQGJBULMA-UHFFFAOYSA-N -1.952251 2.285934 \n", + "AVCVCCQJIMUKOJ-UHFFFAOYSA-N -2.321011 2.341634 \n", "... ... ... 
\n", - "SEHVZOJKPYLYHJ-UHFFFAOYSA-N -2.131526 2.368438 \n", - "SEJFSOMEWFABFC-UHFFFAOYSA-N -2.105192 2.253108 \n", - "SENKKHIKGQSFKC-UHFFFAOYSA-N -2.320550 2.276421 \n", - "SEONSDFSIHPWGR-UHFFFAOYSA-N -2.150447 2.319330 \n", - "SERGQABLLZFSEN-UHFFFAOYSA-N -2.097605 2.239338 \n", + "ZXFPGFBEAAFTOL-UHFFFAOYSA-N -2.111679 2.324862 \n", + "ZXNLEZIGIRYUMA-UHFFFAOYSA-N -2.063629 2.380729 \n", + "ZXPDGTGMZKIESV-UHFFFAOYSA-N -2.232095 2.673987 \n", + "ZXUDFYXHMFXRAZ-UHFFFAOYSA-N -2.086109 2.313124 \n", + "ZYILDTWBSDDXCD-UHFFFAOYSA-N -2.077333 2.222794 \n", "\n", " RDkit_BCUT2D_LOGPLOW RDkit_BCUT2D_MRHI \\\n", "ID \n", - "WKDODQMRRMLXCA-UHFFFAOYSA-N -2.326432 5.825265 \n", - "WLJHLZNFHMEDTP-UHFFFAOYSA-N -2.454781 5.933234 \n", - "WLMUFMKHAZAXJK-UHFFFAOYSA-N -2.166341 5.913803 \n", - "WLODUKXHHYGQDP-UHFFFAOYSA-N -2.285132 5.994153 \n", - "WLQRARSNWWFZFK-UHFFFAOYSA-N -2.207007 7.125505 \n", + "AUBIOACYLHNVOM-UHFFFAOYSA-N -2.260265 5.904432 \n", + "AUNXSAUXRUCPOQ-UHFFFAOYSA-N -1.967526 7.208427 \n", + "AUORTGQKWNTDKT-UHFFFAOYSA-N -2.313572 5.967520 \n", + "AUQHNAQGJBULMA-UHFFFAOYSA-N -2.070887 7.183883 \n", + "AVCVCCQJIMUKOJ-UHFFFAOYSA-N -2.495074 6.182725 \n", "... ... ... 
\n", - "SEHVZOJKPYLYHJ-UHFFFAOYSA-N -2.035623 7.980846 \n", - "SEJFSOMEWFABFC-UHFFFAOYSA-N -2.221299 6.006421 \n", - "SENKKHIKGQSFKC-UHFFFAOYSA-N -2.466470 5.950158 \n", - "SEONSDFSIHPWGR-UHFFFAOYSA-N -2.189010 7.214696 \n", - "SERGQABLLZFSEN-UHFFFAOYSA-N -2.394732 5.820468 \n", + "ZXFPGFBEAAFTOL-UHFFFAOYSA-N -2.186289 8.143559 \n", + "ZXNLEZIGIRYUMA-UHFFFAOYSA-N -2.191988 5.708237 \n", + "ZXPDGTGMZKIESV-UHFFFAOYSA-N -2.410849 6.283495 \n", + "ZXUDFYXHMFXRAZ-UHFFFAOYSA-N -1.989427 6.301805 \n", + "ZYILDTWBSDDXCD-UHFFFAOYSA-N -2.192116 7.887616 \n", "\n", " RDkit_BCUT2D_MRLOW RDkit_BCUT2D_MWHI \\\n", "ID \n", - "WKDODQMRRMLXCA-UHFFFAOYSA-N -0.051059 16.562593 \n", - "WLJHLZNFHMEDTP-UHFFFAOYSA-N 0.189527 16.333855 \n", - "WLMUFMKHAZAXJK-UHFFFAOYSA-N -0.115143 16.342516 \n", - "WLODUKXHHYGQDP-UHFFFAOYSA-N 0.094746 16.477554 \n", - "WLQRARSNWWFZFK-UHFFFAOYSA-N -0.117656 32.133480 \n", + "AUBIOACYLHNVOM-UHFFFAOYSA-N 0.093952 16.462145 \n", + "AUNXSAUXRUCPOQ-UHFFFAOYSA-N 0.414717 32.133545 \n", + "AUORTGQKWNTDKT-UHFFFAOYSA-N 0.091172 16.474794 \n", + "AUQHNAQGJBULMA-UHFFFAOYSA-N 1.017097 32.134708 \n", + "AVCVCCQJIMUKOJ-UHFFFAOYSA-N 0.226536 16.486029 \n", "... ... ... \n", - "SEHVZOJKPYLYHJ-UHFFFAOYSA-N 0.299591 32.166546 \n", - "SEJFSOMEWFABFC-UHFFFAOYSA-N 0.261525 16.466509 \n", - "SENKKHIKGQSFKC-UHFFFAOYSA-N 0.066390 16.255518 \n", - "SEONSDFSIHPWGR-UHFFFAOYSA-N 0.936962 32.133556 \n", - "SERGQABLLZFSEN-UHFFFAOYSA-N 0.414619 16.465296 \n", + "ZXFPGFBEAAFTOL-UHFFFAOYSA-N 0.264614 32.166599 \n", + "ZXNLEZIGIRYUMA-UHFFFAOYSA-N 0.367977 16.327944 \n", + "ZXPDGTGMZKIESV-UHFFFAOYSA-N -0.131544 35.495701 \n", + "ZXUDFYXHMFXRAZ-UHFFFAOYSA-N 0.579939 35.495693 \n", + "ZYILDTWBSDDXCD-UHFFFAOYSA-N 0.102196 32.233116 \n", "\n", " RDkit_BCUT2D_MWLOW RDkit_BalabanJ ... \\\n", "ID ... \n", - "WKDODQMRRMLXCA-UHFFFAOYSA-N 10.128448 1.967740 ... \n", - "WLJHLZNFHMEDTP-UHFFFAOYSA-N 10.039444 1.133155 ... \n", - "WLMUFMKHAZAXJK-UHFFFAOYSA-N 10.119081 1.623868 ... 
\n", - "WLODUKXHHYGQDP-UHFFFAOYSA-N 10.272628 1.513568 ... \n", - "WLQRARSNWWFZFK-UHFFFAOYSA-N 10.223183 1.543120 ... \n", + "AUBIOACYLHNVOM-UHFFFAOYSA-N 10.272797 1.720541 ... \n", + "AUNXSAUXRUCPOQ-UHFFFAOYSA-N 10.172657 1.857624 ... \n", + "AUORTGQKWNTDKT-UHFFFAOYSA-N 10.100842 1.803300 ... \n", + "AUQHNAQGJBULMA-UHFFFAOYSA-N 10.291913 2.419333 ... \n", + "AVCVCCQJIMUKOJ-UHFFFAOYSA-N 10.163918 1.344940 ... \n", "... ... ... ... \n", - "SEHVZOJKPYLYHJ-UHFFFAOYSA-N 9.966175 1.679689 ... \n", - "SEJFSOMEWFABFC-UHFFFAOYSA-N 10.272453 1.567671 ... \n", - "SENKKHIKGQSFKC-UHFFFAOYSA-N 10.085078 1.664819 ... \n", - "SEONSDFSIHPWGR-UHFFFAOYSA-N 10.310705 1.704997 ... \n", - "SERGQABLLZFSEN-UHFFFAOYSA-N 10.140409 2.239727 ... \n", + "ZXFPGFBEAAFTOL-UHFFFAOYSA-N 9.991602 2.385960 ... \n", + "ZXNLEZIGIRYUMA-UHFFFAOYSA-N 10.116677 1.951521 ... \n", + "ZXPDGTGMZKIESV-UHFFFAOYSA-N 9.981504 1.537104 ... \n", + "ZXUDFYXHMFXRAZ-UHFFFAOYSA-N 9.997193 2.224349 ... \n", + "ZYILDTWBSDDXCD-UHFFFAOYSA-N 10.123233 1.396791 ... \n", "\n", " RDkit_fr_sulfone RDkit_fr_term_acetylene \\\n", "ID \n", - "WKDODQMRRMLXCA-UHFFFAOYSA-N 0.0 0.0 \n", - "WLJHLZNFHMEDTP-UHFFFAOYSA-N 0.0 0.0 \n", - "WLMUFMKHAZAXJK-UHFFFAOYSA-N 0.0 0.0 \n", - "WLODUKXHHYGQDP-UHFFFAOYSA-N 0.0 0.0 \n", - "WLQRARSNWWFZFK-UHFFFAOYSA-N 0.0 0.0 \n", + "AUBIOACYLHNVOM-UHFFFAOYSA-N 0.0 0.0 \n", + "AUNXSAUXRUCPOQ-UHFFFAOYSA-N 0.0 0.0 \n", + "AUORTGQKWNTDKT-UHFFFAOYSA-N 0.0 0.0 \n", + "AUQHNAQGJBULMA-UHFFFAOYSA-N 0.0 0.0 \n", + "AVCVCCQJIMUKOJ-UHFFFAOYSA-N 0.0 0.0 \n", "... ... ... 
\n", - "SEHVZOJKPYLYHJ-UHFFFAOYSA-N 0.0 0.0 \n", - "SEJFSOMEWFABFC-UHFFFAOYSA-N 0.0 0.0 \n", - "SENKKHIKGQSFKC-UHFFFAOYSA-N 0.0 0.0 \n", - "SEONSDFSIHPWGR-UHFFFAOYSA-N 0.0 0.0 \n", - "SERGQABLLZFSEN-UHFFFAOYSA-N 0.0 0.0 \n", + "ZXFPGFBEAAFTOL-UHFFFAOYSA-N 0.0 0.0 \n", + "ZXNLEZIGIRYUMA-UHFFFAOYSA-N 0.0 0.0 \n", + "ZXPDGTGMZKIESV-UHFFFAOYSA-N 0.0 0.0 \n", + "ZXUDFYXHMFXRAZ-UHFFFAOYSA-N 0.0 0.0 \n", + "ZYILDTWBSDDXCD-UHFFFAOYSA-N 0.0 0.0 \n", "\n", " RDkit_fr_tetrazole RDkit_fr_thiazole \\\n", "ID \n", - "WKDODQMRRMLXCA-UHFFFAOYSA-N 0.0 0.0 \n", - "WLJHLZNFHMEDTP-UHFFFAOYSA-N 0.0 0.0 \n", - "WLMUFMKHAZAXJK-UHFFFAOYSA-N 0.0 0.0 \n", - "WLODUKXHHYGQDP-UHFFFAOYSA-N 0.0 0.0 \n", - "WLQRARSNWWFZFK-UHFFFAOYSA-N 0.0 1.0 \n", + "AUBIOACYLHNVOM-UHFFFAOYSA-N 0.0 0.0 \n", + "AUNXSAUXRUCPOQ-UHFFFAOYSA-N 0.0 1.0 \n", + "AUORTGQKWNTDKT-UHFFFAOYSA-N 0.0 0.0 \n", + "AUQHNAQGJBULMA-UHFFFAOYSA-N 0.0 1.0 \n", + "AVCVCCQJIMUKOJ-UHFFFAOYSA-N 0.0 0.0 \n", "... ... ... \n", - "SEHVZOJKPYLYHJ-UHFFFAOYSA-N 0.0 0.0 \n", - "SEJFSOMEWFABFC-UHFFFAOYSA-N 0.0 0.0 \n", - "SENKKHIKGQSFKC-UHFFFAOYSA-N 0.0 0.0 \n", - "SEONSDFSIHPWGR-UHFFFAOYSA-N 0.0 1.0 \n", - "SERGQABLLZFSEN-UHFFFAOYSA-N 0.0 0.0 \n", + "ZXFPGFBEAAFTOL-UHFFFAOYSA-N 0.0 0.0 \n", + "ZXNLEZIGIRYUMA-UHFFFAOYSA-N 0.0 0.0 \n", + "ZXPDGTGMZKIESV-UHFFFAOYSA-N 0.0 0.0 \n", + "ZXUDFYXHMFXRAZ-UHFFFAOYSA-N 0.0 0.0 \n", + "ZYILDTWBSDDXCD-UHFFFAOYSA-N 0.0 0.0 \n", "\n", " RDkit_fr_thiocyan RDkit_fr_thiophene \\\n", "ID \n", - "WKDODQMRRMLXCA-UHFFFAOYSA-N 0.0 0.0 \n", - "WLJHLZNFHMEDTP-UHFFFAOYSA-N 0.0 0.0 \n", - "WLMUFMKHAZAXJK-UHFFFAOYSA-N 0.0 0.0 \n", - "WLODUKXHHYGQDP-UHFFFAOYSA-N 0.0 0.0 \n", - "WLQRARSNWWFZFK-UHFFFAOYSA-N 0.0 0.0 \n", + "AUBIOACYLHNVOM-UHFFFAOYSA-N 0.0 0.0 \n", + "AUNXSAUXRUCPOQ-UHFFFAOYSA-N 0.0 0.0 \n", + "AUORTGQKWNTDKT-UHFFFAOYSA-N 0.0 0.0 \n", + "AUQHNAQGJBULMA-UHFFFAOYSA-N 0.0 1.0 \n", + "AVCVCCQJIMUKOJ-UHFFFAOYSA-N 0.0 0.0 \n", "... ... ... 
\n", - "SEHVZOJKPYLYHJ-UHFFFAOYSA-N 0.0 0.0 \n", - "SEJFSOMEWFABFC-UHFFFAOYSA-N 0.0 0.0 \n", - "SENKKHIKGQSFKC-UHFFFAOYSA-N 0.0 0.0 \n", - "SEONSDFSIHPWGR-UHFFFAOYSA-N 0.0 0.0 \n", - "SERGQABLLZFSEN-UHFFFAOYSA-N 0.0 0.0 \n", + "ZXFPGFBEAAFTOL-UHFFFAOYSA-N 0.0 0.0 \n", + "ZXNLEZIGIRYUMA-UHFFFAOYSA-N 0.0 0.0 \n", + "ZXPDGTGMZKIESV-UHFFFAOYSA-N 0.0 0.0 \n", + "ZXUDFYXHMFXRAZ-UHFFFAOYSA-N 0.0 0.0 \n", + "ZYILDTWBSDDXCD-UHFFFAOYSA-N 0.0 0.0 \n", "\n", " RDkit_fr_unbrch_alkane RDkit_fr_urea RDkit_qed \\\n", "ID \n", - "WKDODQMRRMLXCA-UHFFFAOYSA-N 0.0 1.0 0.258843 \n", - "WLJHLZNFHMEDTP-UHFFFAOYSA-N 0.0 1.0 0.378186 \n", - "WLMUFMKHAZAXJK-UHFFFAOYSA-N 0.0 0.0 0.495511 \n", - "WLODUKXHHYGQDP-UHFFFAOYSA-N 1.0 0.0 0.475011 \n", - "WLQRARSNWWFZFK-UHFFFAOYSA-N 0.0 0.0 0.507193 \n", + "AUBIOACYLHNVOM-UHFFFAOYSA-N 0.0 0.0 0.684213 \n", + "AUNXSAUXRUCPOQ-UHFFFAOYSA-N 0.0 0.0 0.619291 \n", + "AUORTGQKWNTDKT-UHFFFAOYSA-N 0.0 0.0 0.665445 \n", + "AUQHNAQGJBULMA-UHFFFAOYSA-N 0.0 0.0 0.702394 \n", + "AVCVCCQJIMUKOJ-UHFFFAOYSA-N 0.0 1.0 0.464362 \n", "... ... ... ... 
\n", - "SEHVZOJKPYLYHJ-UHFFFAOYSA-N 0.0 0.0 0.574570 \n", - "SEJFSOMEWFABFC-UHFFFAOYSA-N 0.0 1.0 0.448056 \n", - "SENKKHIKGQSFKC-UHFFFAOYSA-N 0.0 0.0 0.702763 \n", - "SEONSDFSIHPWGR-UHFFFAOYSA-N 0.0 0.0 0.802397 \n", - "SERGQABLLZFSEN-UHFFFAOYSA-N 0.0 0.0 0.749242 \n", + "ZXFPGFBEAAFTOL-UHFFFAOYSA-N 0.0 0.0 0.861072 \n", + "ZXNLEZIGIRYUMA-UHFFFAOYSA-N 0.0 0.0 0.398507 \n", + "ZXPDGTGMZKIESV-UHFFFAOYSA-N 0.0 0.0 0.554752 \n", + "ZXUDFYXHMFXRAZ-UHFFFAOYSA-N 0.0 0.0 0.771410 \n", + "ZYILDTWBSDDXCD-UHFFFAOYSA-N 0.0 0.0 0.331005 \n", "\n", " ID \n", "ID \n", - "WKDODQMRRMLXCA-UHFFFAOYSA-N WKDODQMRRMLXCA-UHFFFAOYSA-N \n", - "WLJHLZNFHMEDTP-UHFFFAOYSA-N WLJHLZNFHMEDTP-UHFFFAOYSA-N \n", - "WLMUFMKHAZAXJK-UHFFFAOYSA-N WLMUFMKHAZAXJK-UHFFFAOYSA-N \n", - "WLODUKXHHYGQDP-UHFFFAOYSA-N WLODUKXHHYGQDP-UHFFFAOYSA-N \n", - "WLQRARSNWWFZFK-UHFFFAOYSA-N WLQRARSNWWFZFK-UHFFFAOYSA-N \n", + "AUBIOACYLHNVOM-UHFFFAOYSA-N AUBIOACYLHNVOM-UHFFFAOYSA-N \n", + "AUNXSAUXRUCPOQ-UHFFFAOYSA-N AUNXSAUXRUCPOQ-UHFFFAOYSA-N \n", + "AUORTGQKWNTDKT-UHFFFAOYSA-N AUORTGQKWNTDKT-UHFFFAOYSA-N \n", + "AUQHNAQGJBULMA-UHFFFAOYSA-N AUQHNAQGJBULMA-UHFFFAOYSA-N \n", + "AVCVCCQJIMUKOJ-UHFFFAOYSA-N AVCVCCQJIMUKOJ-UHFFFAOYSA-N \n", "... ... 
\n", - "SEHVZOJKPYLYHJ-UHFFFAOYSA-N SEHVZOJKPYLYHJ-UHFFFAOYSA-N \n", - "SEJFSOMEWFABFC-UHFFFAOYSA-N SEJFSOMEWFABFC-UHFFFAOYSA-N \n", - "SENKKHIKGQSFKC-UHFFFAOYSA-N SENKKHIKGQSFKC-UHFFFAOYSA-N \n", - "SEONSDFSIHPWGR-UHFFFAOYSA-N SEONSDFSIHPWGR-UHFFFAOYSA-N \n", - "SERGQABLLZFSEN-UHFFFAOYSA-N SERGQABLLZFSEN-UHFFFAOYSA-N \n", + "ZXFPGFBEAAFTOL-UHFFFAOYSA-N ZXFPGFBEAAFTOL-UHFFFAOYSA-N \n", + "ZXNLEZIGIRYUMA-UHFFFAOYSA-N ZXNLEZIGIRYUMA-UHFFFAOYSA-N \n", + "ZXPDGTGMZKIESV-UHFFFAOYSA-N ZXPDGTGMZKIESV-UHFFFAOYSA-N \n", + "ZXUDFYXHMFXRAZ-UHFFFAOYSA-N ZXUDFYXHMFXRAZ-UHFFFAOYSA-N \n", + "ZYILDTWBSDDXCD-UHFFFAOYSA-N ZYILDTWBSDDXCD-UHFFFAOYSA-N \n", "\n", "[3286 rows x 211 columns]" ], @@ -2576,17 +2085,17 @@ " \n", " \n", " \n", - " WKDODQMRRMLXCA-UHFFFAOYSA-N\n", - " 3.096595\n", - " 2.435707\n", - " -2.161774\n", - " 2.317230\n", - " -2.326432\n", - " 5.825265\n", - " -0.051059\n", - " 16.562593\n", - " 10.128448\n", - " 1.967740\n", + " AUBIOACYLHNVOM-UHFFFAOYSA-N\n", + " 3.143765\n", + " 2.147943\n", + " -2.064522\n", + " 2.152083\n", + " -2.260265\n", + " 5.904432\n", + " 0.093952\n", + " 16.462145\n", + " 10.272797\n", + " 1.720541\n", " ...\n", " 0.0\n", " 0.0\n", @@ -2595,46 +2104,46 @@ " 0.0\n", " 0.0\n", " 0.0\n", - " 1.0\n", - " 0.258843\n", - " WKDODQMRRMLXCA-UHFFFAOYSA-N\n", + " 0.0\n", + " 0.684213\n", + " AUBIOACYLHNVOM-UHFFFAOYSA-N\n", " \n", " \n", - " WLJHLZNFHMEDTP-UHFFFAOYSA-N\n", - " 3.647985\n", - " 2.228107\n", - " -2.318791\n", - " 2.236758\n", - " -2.454781\n", - " 5.933234\n", - " 0.189527\n", - " 16.333855\n", - " 10.039444\n", - " 1.133155\n", + " AUNXSAUXRUCPOQ-UHFFFAOYSA-N\n", + " 3.236408\n", + " 2.148100\n", + " -2.038914\n", + " 2.327026\n", + " -1.967526\n", + " 7.208427\n", + " 0.414717\n", + " 32.133545\n", + " 10.172657\n", + " 1.857624\n", " ...\n", " 0.0\n", " 0.0\n", " 0.0\n", + " 1.0\n", " 0.0\n", " 0.0\n", " 0.0\n", " 0.0\n", - " 1.0\n", - " 0.378186\n", - " WLJHLZNFHMEDTP-UHFFFAOYSA-N\n", + " 0.619291\n", + " 
AUNXSAUXRUCPOQ-UHFFFAOYSA-N\n", " \n", " \n", - " WLMUFMKHAZAXJK-UHFFFAOYSA-N\n", - " 3.323272\n", - " 2.150687\n", - " -2.126135\n", - " 2.227910\n", - " -2.166341\n", - " 5.913803\n", - " -0.115143\n", - " 16.342516\n", - " 10.119081\n", - " 1.623868\n", + " AUORTGQKWNTDKT-UHFFFAOYSA-N\n", + " 2.651005\n", + " 2.149889\n", + " -2.122324\n", + " 2.239270\n", + " -2.313572\n", + " 5.967520\n", + " 0.091172\n", + " 16.474794\n", + " 10.100842\n", + " 1.803300\n", " ...\n", " 0.0\n", " 0.0\n", @@ -2644,56 +2153,56 @@ " 0.0\n", " 0.0\n", " 0.0\n", - " 0.495511\n", - " WLMUFMKHAZAXJK-UHFFFAOYSA-N\n", + " 0.665445\n", + " AUORTGQKWNTDKT-UHFFFAOYSA-N\n", " \n", " \n", - " WLODUKXHHYGQDP-UHFFFAOYSA-N\n", - " 3.404520\n", - " 2.187499\n", - " -2.085804\n", - " 2.198792\n", - " -2.285132\n", - " 5.994153\n", - " 0.094746\n", - " 16.477554\n", - " 10.272628\n", - " 1.513568\n", + " AUQHNAQGJBULMA-UHFFFAOYSA-N\n", + " 2.879929\n", + " 2.106531\n", + " -1.952251\n", + " 2.285934\n", + " -2.070887\n", + " 7.183883\n", + " 1.017097\n", + " 32.134708\n", + " 10.291913\n", + " 2.419333\n", " ...\n", " 0.0\n", " 0.0\n", " 0.0\n", - " 0.0\n", - " 0.0\n", + " 1.0\n", " 0.0\n", " 1.0\n", " 0.0\n", - " 0.475011\n", - " WLODUKXHHYGQDP-UHFFFAOYSA-N\n", + " 0.0\n", + " 0.702394\n", + " AUQHNAQGJBULMA-UHFFFAOYSA-N\n", " \n", " \n", - " WLQRARSNWWFZFK-UHFFFAOYSA-N\n", - " 3.391757\n", - " 2.143655\n", - " -2.088833\n", - " 2.263334\n", - " -2.207007\n", - " 7.125505\n", - " -0.117656\n", - " 32.133480\n", - " 10.223183\n", - " 1.543120\n", + " AVCVCCQJIMUKOJ-UHFFFAOYSA-N\n", + " 3.578061\n", + " 2.416188\n", + " -2.321011\n", + " 2.341634\n", + " -2.495074\n", + " 6.182725\n", + " 0.226536\n", + " 16.486029\n", + " 10.163918\n", + " 1.344940\n", " ...\n", " 0.0\n", " 0.0\n", " 0.0\n", - " 1.0\n", " 0.0\n", " 0.0\n", " 0.0\n", " 0.0\n", - " 0.507193\n", - " WLQRARSNWWFZFK-UHFFFAOYSA-N\n", + " 1.0\n", + " 0.464362\n", + " AVCVCCQJIMUKOJ-UHFFFAOYSA-N\n", " \n", " \n", " ...\n", @@ -2720,17 
+2229,17 @@ " ...\n", " \n", " \n", - " SEHVZOJKPYLYHJ-UHFFFAOYSA-N\n", - " 3.428739\n", - " 2.225750\n", - " -2.131526\n", - " 2.368438\n", - " -2.035623\n", - " 7.980846\n", - " 0.299591\n", - " 32.166546\n", - " 9.966175\n", - " 1.679689\n", + " ZXFPGFBEAAFTOL-UHFFFAOYSA-N\n", + " 2.673123\n", + " 2.171015\n", + " -2.111679\n", + " 2.324862\n", + " -2.186289\n", + " 8.143559\n", + " 0.264614\n", + " 32.166599\n", + " 9.991602\n", + " 2.385960\n", " ...\n", " 0.0\n", " 0.0\n", @@ -2740,21 +2249,21 @@ " 0.0\n", " 0.0\n", " 0.0\n", - " 0.574570\n", - " SEHVZOJKPYLYHJ-UHFFFAOYSA-N\n", + " 0.861072\n", + " ZXFPGFBEAAFTOL-UHFFFAOYSA-N\n", " \n", " \n", - " SEJFSOMEWFABFC-UHFFFAOYSA-N\n", - " 3.516896\n", - " 2.192647\n", - " -2.105192\n", - " 2.253108\n", - " -2.221299\n", - " 6.006421\n", - " 0.261525\n", - " 16.466509\n", - " 10.272453\n", - " 1.567671\n", + " ZXNLEZIGIRYUMA-UHFFFAOYSA-N\n", + " 2.973333\n", + " 2.159815\n", + " -2.063629\n", + " 2.380729\n", + " -2.191988\n", + " 5.708237\n", + " 0.367977\n", + " 16.327944\n", + " 10.116677\n", + " 1.951521\n", " ...\n", " 0.0\n", " 0.0\n", @@ -2763,22 +2272,22 @@ " 0.0\n", " 0.0\n", " 0.0\n", - " 1.0\n", - " 0.448056\n", - " SEJFSOMEWFABFC-UHFFFAOYSA-N\n", + " 0.0\n", + " 0.398507\n", + " ZXNLEZIGIRYUMA-UHFFFAOYSA-N\n", " \n", " \n", - " SENKKHIKGQSFKC-UHFFFAOYSA-N\n", - " 2.736142\n", - " 2.208091\n", - " -2.320550\n", - " 2.276421\n", - " -2.466470\n", - " 5.950158\n", - " 0.066390\n", - " 16.255518\n", - " 10.085078\n", - " 1.664819\n", + " ZXPDGTGMZKIESV-UHFFFAOYSA-N\n", + " 2.962669\n", + " 2.748903\n", + " -2.232095\n", + " 2.673987\n", + " -2.410849\n", + " 6.283495\n", + " -0.131544\n", + " 35.495701\n", + " 9.981504\n", + " 1.537104\n", " ...\n", " 0.0\n", " 0.0\n", @@ -2788,45 +2297,45 @@ " 0.0\n", " 0.0\n", " 0.0\n", - " 0.702763\n", - " SENKKHIKGQSFKC-UHFFFAOYSA-N\n", + " 0.554752\n", + " ZXPDGTGMZKIESV-UHFFFAOYSA-N\n", " \n", " \n", - " SEONSDFSIHPWGR-UHFFFAOYSA-N\n", - " 3.163482\n", - " 2.187793\n", 
- " -2.150447\n", - " 2.319330\n", - " -2.189010\n", - " 7.214696\n", - " 0.936962\n", - " 32.133556\n", - " 10.310705\n", - " 1.704997\n", + " ZXUDFYXHMFXRAZ-UHFFFAOYSA-N\n", + " 3.012530\n", + " 2.139712\n", + " -2.086109\n", + " 2.313124\n", + " -1.989427\n", + " 6.301805\n", + " 0.579939\n", + " 35.495693\n", + " 9.997193\n", + " 2.224349\n", " ...\n", " 0.0\n", " 0.0\n", " 0.0\n", - " 1.0\n", " 0.0\n", " 0.0\n", " 0.0\n", " 0.0\n", - " 0.802397\n", - " SEONSDFSIHPWGR-UHFFFAOYSA-N\n", + " 0.0\n", + " 0.771410\n", + " ZXUDFYXHMFXRAZ-UHFFFAOYSA-N\n", " \n", " \n", - " SERGQABLLZFSEN-UHFFFAOYSA-N\n", - " 3.075536\n", - " 2.153961\n", - " -2.097605\n", - " 2.239338\n", - " -2.394732\n", - " 5.820468\n", - " 0.414619\n", - " 16.465296\n", - " 10.140409\n", - " 2.239727\n", + " ZYILDTWBSDDXCD-UHFFFAOYSA-N\n", + " 3.212763\n", + " 2.148222\n", + " -2.077333\n", + " 2.222794\n", + " -2.192116\n", + " 7.887616\n", + " 0.102196\n", + " 32.233116\n", + " 10.123233\n", + " 1.396791\n", " ...\n", " 0.0\n", " 0.0\n", @@ -2836,8 +2345,8 @@ " 0.0\n", " 0.0\n", " 0.0\n", - " 0.749242\n", - " SERGQABLLZFSEN-UHFFFAOYSA-N\n", + " 0.331005\n", + " ZYILDTWBSDDXCD-UHFFFAOYSA-N\n", " \n", " \n", "\n", @@ -2845,12 +2354,12 @@ "" ] }, - "execution_count": 37, + "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], - "execution_count": 37 + "execution_count": 23 }, { "cell_type": "markdown", @@ -2860,6 +2369,13 @@ "\n", "Now you know how data sets are represented in QSPRpred. Before you start modelling, you should also check out the [data preparation tutorial](data_preparation.ipynb) to learn how to prepare your data sets for modelling. This tutorial covers additional preparation steps such as feature filtering, selection and standardization through the `QSPRDataset.prepareDataset` method." ] + }, + { + "metadata": {}, + "cell_type": "code", + "outputs": [], + "execution_count": null, + "source": "" } ], "metadata": {