(WIP) Feature/postgres similarity functions #2224

Closed
wants to merge 30 commits into from
30 commits
58d6d9c
Make spellcheck work cross-platform
zmbc Apr 3, 2024
2fbcced
Merge branch 'master' into spellcheck-cross-platform
zmbc Apr 4, 2024
f159e05
Merge branch 'master' into spellcheck-cross-platform
zslade Apr 5, 2024
75e752f
Make spellchecker script executable
zmbc Apr 9, 2024
99838f3
Include task in pyspelling call
zmbc Apr 9, 2024
71844d5
Include sentence to encourage contributions
zmbc Apr 9, 2024
ab17375
Update documentation on settings validation in response to code changes
ThomasHepworth Apr 23, 2024
ebba34b
Update predict.py
samnlindsay Apr 24, 2024
86f955c
Merge pull request #2152 from moj-analytical-services/bugfix_predict_…
RobinL Apr 25, 2024
ad28a62
Merge pull request #2149 from moj-analytical-services/docs/updating_s…
ThomasHepworth Apr 30, 2024
3d7cf00
Fixing spurious error messages with Databricks enable_splink
aymonwuolanne May 1, 2024
7dccd66
format
RobinL May 1, 2024
268f77e
remove ref to github action
zslade May 2, 2024
8425395
Merge pull request #2163 from moj-analytical-services/docs_tweak
zslade May 2, 2024
e252813
Reword script instructions
zmbc May 6, 2024
df49f62
Merge branch 'master' into spellcheck-cross-platform
zmbc May 7, 2024
be6a9ad
Merge pull request #2159 from aymonwuolanne/master
RobinL May 8, 2024
5c6df64
Merge branch 'master' into spellcheck-cross-platform
zmbc May 8, 2024
6c0437c
Fix Splink 4 blog post link
probjects May 9, 2024
1c69ebc
Merge pull request #2172 from probjects/master
RobinL May 10, 2024
0a10a93
Merge branch 'master' into spellcheck-cross-platform
zslade May 10, 2024
479cef8
Merge pull request #2131 from zmbc/spellcheck-cross-platform
zslade May 10, 2024
0684423
(feat) #2198 add postgres backend similarity functions to fully suppo…
vflumeris May 31, 2024
139cf12
fix missing import, run format again?
vflumeris May 31, 2024
026a90d
update broken documentation build
vflumeris Jun 3, 2024
50d8cac
fix documententation bad on postgres_docker
vflumeris Jun 3, 2024
e8c7923
lint with black
vflumeris Jul 2, 2024
e35f43c
try removing additional install in docs workflow
RobinL Jul 2, 2024
7137024
add tests of postgres functions
RobinL Jul 2, 2024
e59bfd3
exports
RobinL Jul 2, 2024
2 changes: 2 additions & 0 deletions .gitignore
@@ -179,3 +179,5 @@ cython_debug/
 splink_db
 splink_db_log
 spark-warehouse
+
+scripts/pyspelling/dictionary.dic
2 changes: 1 addition & 1 deletion README.md
@@ -7,7 +7,7 @@
 [![Documentation](https://img.shields.io/badge/API-documentation-blue)](https://moj-analytical-services.github.io/splink/)

 > [!IMPORTANT]
-> Development has begun on Splink 4 on the `splink4_dev` branch. Splink 3 is in maintenance mode and we are no longer accepting new features. We welcome contributions to Splink 4. Read more on our latest [blog](https://moj-analytical-services.github.io/splink/blog/2024/03/19/splink4.html).
+> Development has begun on Splink 4 on the `splink4_dev` branch. Splink 3 is in maintenance mode and we are no longer accepting new features. We welcome contributions to Splink 4. Read more on our latest [blog](https://moj-analytical-services.github.io/splink/blog/2024/04/02/splink-3-updates-and-splink-4-development-announcement---april-2024.html).

 # Fast, accurate and scalable probabilistic data linkage

18 changes: 12 additions & 6 deletions docs/dev_guides/changing_splink/contributing_to_docs.md
@@ -16,29 +16,35 @@ Once you've finished updating Splink documentation we ask that you run our spell

 ## Spellchecking docs

-When updating Splink documentation, we ask that you run our spellchecker before submitting a pull request. This is to help ensure quality and consistency across the documentation. Please note, the spellchecker _only works on markdown files_ and currently only works on systems which support `Homebrew` package manager. Instructions for other operating systems will be released later.
+When updating Splink documentation, we ask that you run our spellchecker before submitting a pull request. This is to help ensure quality and consistency across the documentation. If for whatever reason you can't run the spellchecker on your system, please don't let this prevent you from contributing to the documentation. Please note, the spellchecker _only works on markdown files_.

-To run the spellchecker on either a single markdown file or folder of markdown files, you can use the following script:
+If you are a Mac user with the `Homebrew` package manager installed, the script below will automatically install
+the required system dependency, `aspell`.
+If you've created your development environment [using conda](./development_quickstart.md), `aspell` will have been installed as part of that
+process.
+Instructions for installing `aspell` through other means may be added here in the future.
+
+To run the spellchecker on either a single markdown file or folder of markdown files, you can run the following bash script:

 ```sh
-source scripts/pyspelling/spellchecker.sh <path_to_file_or_folder>
+./scripts/pyspelling/spellchecker.sh <path_to_file_or_folder>
 ```

 Omitting the file/folder path will run the spellchecker on all markdown files contained in the `docs` folder. We recommend running the spellchecker only on files that you have created or edited.

 The spellchecker uses the Python package [PySpelling](https://facelessuser.github.io/pyspelling/) and its underlying spellchecking tool, Aspell. Running the above script will automatically install these packages along with any other necessary dependencies.

-The spellchecker compares words to a [standard British English dictionary](https://github.com/LibreOffice/dictionaries/blob/master/en/en_GB.aff) and a custom dictionary (`scripts/pyspelling/custom_dictionary.txt`) of words. If no spelling mistakes are found, you will see the following terminal printout:
+The spellchecker compares words to a standard British English dictionary and a custom dictionary (`scripts/pyspelling/custom_dictionary.txt`) of words. If no spelling mistakes are found, you will see the following terminal printout:

-```sh
+```

 Spelling check passed :)

 ```

 otherwise, PySpelling will printout the spelling mistakes found in each file.

-Correct spellings of words not found in a standard dictionary (e.g. Splink) can be recorded as such by adding them to `scripts/pyspelling/custom_dictionary.txt`. (Don't worry about adding them in alphabetical order or accidental duplication as this will be handled automatically by a GitHub Action future.)
+Correct spellings of words not found in a standard dictionary (e.g. "Splink") can be recorded as such by adding them to `scripts/pyspelling/custom_dictionary.txt`.

 Please correct any mistakes found or update the custom dictionary to ensure the spellchecker passes before putting in a pull request containing updates to the documentation.

2 changes: 1 addition & 1 deletion docs/dev_guides/changing_splink/development_quickstart.md
@@ -114,7 +114,7 @@ and the teardown script each time you want to stop it:
 ```

 Included in the docker-compose file is a [pgAdmin](https://www.pgadmin.org/) container to allow easy exploration of the database as you work, which can be accessed in-browser on the default port.
-The default username is `[email protected]` with password `b`.
+The default url: http://localhost:80/ username is `[email protected]` with password `b`.

 ## Step 3, Conda install option: Install system dependencies

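
A note on the custom dictionary step in the `contributing_to_docs.md` changes: the reworded text drops the sentence saying that alphabetical ordering and duplicates would be handled automatically, so a contributor may want to tidy `scripts/pyspelling/custom_dictionary.txt` by hand. The following is only a sketch of one way to do that. `add_to_dictionary` is a hypothetical helper that is not part of this PR or of the Splink repository, and the default path is simply the one quoted in the docs.

```sh
# Hypothetical helper (not part of Splink): append a word to the custom
# dictionary, then sort the file and drop duplicate entries.
add_to_dictionary() {
    word="$1"
    # Default path as quoted in the docs; pass a second argument to override it.
    dict="${2:-scripts/pyspelling/custom_dictionary.txt}"
    printf '%s\n' "$word" >> "$dict"
    # LC_ALL=C gives a byte-wise, locale-independent order; -u removes duplicates.
    LC_ALL=C sort -u -o "$dict" "$dict"
}
```

After defining the function, `add_to_dictionary Splink` would add the word, and `./scripts/pyspelling/spellchecker.sh <path_to_file_or_folder>` could then be re-run to confirm the check passes. The byte-wise sort order is an assumption chosen for reproducibility; the project may prefer a different ordering.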