-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Combine ions to salt #73
Comments
This turned out to be more complicated than initially expected. We were planning to simply look at the lists of reactants, agents, and products, and in case there's a single cation and a single anion, merge them to the corresponding salt. After thinking more carefully, I realized the following issues with this:
I think it's quite likely that when there's multiple (formal) charges (especially within the same molecule) that adding such a function could lead to unexpected and unintended consequences. As an example, consider the following 2 reactions:
In addition to these reasons, consider the following:
|
* implemented tetrakis_synonyms * merged cat and ligand to one * corrected acetal smiles typo * make black * set temp=0 if (ice present) AND (temp is None) * implemented remove_none_and_empty_str * make black * assert '' not in data * added test case of rxn with no product * make black * make black * updated rxn participation logic * remove reactions with no reactants or no products * replace len(df) with (df.shape[0]) * checkpoint for ipynb * checkpoint 2 for ipynb * raw update to solvents * del merge ions issue #73 * added catch in case rxn str is not mapped * make black * added temp fix for solvents.csv * fixed mypy bugs * fixed check_valid_order confusing col with row * added source to some of solvents * make black * update test data * updated clean handling of unresolved names * make black * import types into cleaner * make black * expanded make pytest * expanded pytest * changed _0 to _000 * updated tests: unresolved name handling * make black * import optional * added todo * Set num_reag = -1 will result in no filtering * updated ipynb * updaed todo text * changed column headers * separate synonyms into different columns * remove placeholder O SMILES string * comparison to yehia's solvents * added "" around 2 cas to avoid excel date * removed solvent_name_4 column * moved the source column * drop duplicates * removed duplicate names in solvent_name_2 column * sorting * removed duplicates between different sources * strip spaces * make lower case * removal of duplicate names * updating solvents lists * resolved all smiles conflicts in solvents.csv * checked that all smiles are resolvable * make black * saved SMILES as canon SMILES in solvents.csv * regenerated test data, solvents csv col name * testing for duplicates * deleted unnecessary solvents csv files * removed ICE_PRESENT type * added back . between metals and organic molecules * Back to SMILES with . between metal (except Pd) * added procedure for how solvents.csv was generated * is_digit->is_number.Now del floats. name_list incl * removed the .ipynb files and converted to .py, moved to notebooks section * black * mypy * inspecting the large ORD file * Reconsiling mac vs ubuntu (#85) * adding reconcile changes * removing none and nan for string to <empty> * regen test data * temp disable tests * undoing * regen test data * updated doc string * removed remapping of none to nan * fixing the ubuntu bug * don't include the large cleaned USPTO file * reactivate test * update doc string: why not incl cleaned USPTO * add black and mypy * make edit --------- Co-authored-by: Daniel Wigh <[email protected]> --------- Co-authored-by: joearrowsmith <[email protected]> Co-authored-by: Joe Arrowsmith <[email protected]>
Combining ions to their respective salts will mean that there will be fewer components per reaction, which may be helpful for ML model prediction.
NB: "[Na+].[OH-]" becomes "[Na]O" because non-organic atoms must have square brackets around them for the string to be a valid smiles string.
The text was updated successfully, but these errors were encountered: