Split Detector::Journals model into two, extending BulkChecker into the detection model #162

matt-bernhardt · 2024-12-19T17:20:07Z

This splits the Detector::Journal model into two pieces:

Journal - which is concerned with storing the list of journal records which incoming terms are compared with. This is a database-backed Rails model, outside the detector namespace.
Detector::Journal - which actually does the comparison between incoming terms and the list of records. This is a plain ruby object, no backing database. It now includes the BulkChecker module, so developers can use the .check_all_matches method for their work.

In the process of making these changes, the existing set of journals which were imported via a rake task are dropped, so will need to be re-imported. The rake task has been updated to deposit this information in the new Journal model, but that population is not part of this PR. I left it out from a concern that the migration would take too long during deployment. The impact of this is that, for the minutes between this change being deployed and when the new rake task is run, TACOS will not match any journals.

Developer

Ticket(s)

https://mitlibraries.atlassian.net/browse/TCO-110

Accessibility

ANDI or Wave has been run in accordance to our guide and
all issues introduced by these changes have been resolved or opened
as new issues (link to those issues in the Pull Request details above)
There are no accessibility implications to this change

Documentation

Project documentation has been updated, and yard output previewed
No documentation changes are needed

ENV

All new ENV is documented in README.
All new ENV has been added to Heroku Pipeline, Staging and Prod.
ENV has not changed.

Stakeholders

Stakeholder approval has been confirmed
Stakeholder approval is not needed

Dependencies and migrations

NO dependencies are updated

YES migrations are included

PLEASE NOTE - running these migrations will result in the loss of data from your local application, as any existing Detector::Journal records will be lost. Rolling back the migration is possible, but not the restoring of lost records.

Reviewer

Code

I have confirmed that the code works as intended.
Any CodeClimate issues have been fixed or confirmed as
added technical debt.

Documentation

The commit message is clear and follows our guidelines
(not just this pull request message).
The documentation has been updated or is unnecessary.
New dependencies are appropriate or there were no changes.

Testing

There are appropriate tests covering any new functionality.
No additional test coverage is required.

** Why are these changes being introduced: Our current Detector::Journal model is asked to do two separate tasks: 1. Detect whether incoming search terms match with a list of known journals 2. Maintain that list of known journals This conflation of concerns prevents the model from using some of the shared tooling we are developing, like the Bulk Checker module, because the Bulk Checker ends up calling the .new() method (which doesn't look for a new detection, but creates a new journal record). ** Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/tco-110 ** How does this address that need: This commit takes the first step toward resolving this conflict, by creating a new Journal model that will ake over the management of journal records. The Detector::Journal will then be simplified in a future commit to only deal with the detection operations. Specifically: 1. Adds a migration to create the Journal database table, matching the structure of the Detector::Journal table. 2. Creates test fixtures for these records 3. Updates the existing rake tasks to manage Journal records, and not Detector::Journal. 4. The new ::Journal model copies the records-management methods and lifecycle hook from Detector::Journal (downcasing journal title) 5. The existing Detector::Journal model has its _match methods updated to look up matches using the ::Journal model - which is initially unpopulated (see side effects) 6. Tests are updated to reflect the use of the new Journal model. ** Document any side effects to this change: As of this commit, there is a bit of duplication in the application, which will be resolved in future commits in this PR. This commit also does not initially populate the new Journal table, so the updated rake task will need to be run as part of the promotion of this work. During the period of time between the deployment and running this rake task, the application will fail to detect journal matches correctly.

** Why are these changes being introduced: The previous commit on this branch created a new Journal model and table for the management of journal records, but did not remove that functionality from the existing Detector::Journal model. ** Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/tco-110 ** How does this address that need: This finishes the job, removing the database table for detector_journals and converting the Detector::Journal model from an ActiveRecord type to a "plain old ruby object" (PORO). During this change, we also remove the test fixtures, and the one test of this model that confirmed this records management function. That test was previously moved to the Journal test suite. ** Document any side effects to this change: None

** Why are these changes being introduced: Now that the Detector::Journal model is only concerns with detections, and calling Detector::Journal.new no longer spawns any database records, we can implement the BulkChecker module for better developer experience. ** Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/tco-110 ** How does this address that need: This defines a @Detections attribute on the Detector::Journal class, as well as an initialize method to populate that attribute based on the full_term_match method. This allows the BulkChecker-provided .check_all_matches method to return meaningful information. ** Document any side effects to this change: This implementation may be too minimal, as the initialize method ends up calling the singleton methods in a slightly awkward way (or maybe I need to get used to this way of working, and not expect to just call full_term_matches directly in the initializer).

jazairi

I haven't confirmed the BulkChecker functionality, but this approach aligns with my understanding of what we're trying to do here. Having a model for how to separate out the ActiveRecord bit will make this easier to implement for suggested resources, so thanks for kicking this off!

Why these changes are being introduced: We [recently decided](#145) to make a separate Fingerprint model, associated with Term, as multiple detectors are likely to use fringerprinting (implemented in #138). We have also begun to split the ActiveRecord components of detectors into separate models (implemented for Detector::Journal in #162). Relevant ticket(s): * [TCO-111](https://mitlibraries.atlassian.net/browse/TCO-111) * [TCO-122](https://mitlibraries.atlassian.net/browse/TCO-122) How this addresses that need: * Splits the ActiveRecord components of Detector::SuggestedResource into a separate SuggestedResource model. * Associates SuggestedResource with Fingerprint, via Term, such that a suggested resource can have multiple terms and fingerprints. * Removes the suggested resource dashboard (see side effects). Side effects of this change: * Terms that are associated with a suggested resource should not be destroyed. Rails does not allow the `:dependent` option on `belongs_to` associations, so this commit instead adds a `before_destroy` hook with a custom method that aborts the callback and logs the attempt in Sentry. * Because administrate does not handle has_many relationships well, we will need to build a custom dashboard to manage suggested resources. This is ticketed as [TCO-145](https://mitlibraries.atlassian.net/browse/TCO-145). Until that UI is ready, we will use the Rails console to make any requested changes to suggested resources.

mitlib temporarily deployed to tacos-api-pipeline-pr-162 December 19, 2024 17:22 Inactive

matt-bernhardt force-pushed the tco-110-journals branch from 5f99223 to 2e580ad Compare December 19, 2024 19:24

mitlib temporarily deployed to tacos-api-pipeline-pr-162 December 19, 2024 19:24 Inactive

matt-bernhardt requested review from JPrevost and jazairi December 19, 2024 19:37

matt-bernhardt marked this pull request as ready for review December 19, 2024 19:37

jazairi self-assigned this Dec 19, 2024

matt-bernhardt added 3 commits December 19, 2024 16:37

matt-bernhardt force-pushed the tco-110-journals branch from 2e580ad to ac9a31f Compare December 19, 2024 21:37

mitlib temporarily deployed to tacos-api-pipeline-pr-162 December 19, 2024 21:37 Inactive

jazairi approved these changes Dec 20, 2024

View reviewed changes

matt-bernhardt merged commit 143d467 into main Dec 20, 2024
6 checks passed

matt-bernhardt deleted the tco-110-journals branch December 20, 2024 16:09

jazairi mentioned this pull request Feb 10, 2025

Refactor SuggestedResource to leverage new Fingerprint model #185

Open

16 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split Detector::Journals model into two, extending BulkChecker into the detection model #162

Split Detector::Journals model into two, extending BulkChecker into the detection model #162

matt-bernhardt commented Dec 19, 2024 •

edited by jazairi

Loading

jazairi left a comment

Split Detector::Journals model into two, extending BulkChecker into the detection model #162

Split Detector::Journals model into two, extending BulkChecker into the detection model #162

Conversation

matt-bernhardt commented Dec 19, 2024 • edited by jazairi Loading

Developer

Ticket(s)

Accessibility

Documentation

ENV

Stakeholders

Dependencies and migrations

Reviewer

Code

Documentation

Testing

jazairi left a comment

Choose a reason for hiding this comment

matt-bernhardt commented Dec 19, 2024 •

edited by jazairi

Loading