Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split Detector::Journals model into two, extending BulkChecker into the detection model #162

Merged
merged 3 commits into from
Dec 20, 2024

Conversation

matt-bernhardt
Copy link
Member

@matt-bernhardt matt-bernhardt commented Dec 19, 2024

This splits the Detector::Journal model into two pieces:

  1. Journal - which is concerned with storing the list of journal records which incoming terms are compared with. This is a database-backed Rails model, outside the detector namespace.
  2. Detector::Journal - which actually does the comparison between incoming terms and the list of records. This is a plain ruby object, no backing database. It now includes the BulkChecker module, so developers can use the .check_all_matches method for their work.

In the process of making these changes, the existing set of journals which were imported via a rake task are dropped, so will need to be re-imported. The rake task has been updated to deposit this information in the new Journal model, but that population is not part of this PR. I left it out from a concern that the migration would take too long during deployment. The impact of this is that, for the minutes between this change being deployed and when the new rake task is run, TACOS will not match any journals.

Developer

Ticket(s)

https://mitlibraries.atlassian.net/browse/TCO-110

Accessibility

  • ANDI or Wave has been run in accordance to our guide and
    all issues introduced by these changes have been resolved or opened
    as new issues (link to those issues in the Pull Request details above)
  • There are no accessibility implications to this change

Documentation

  • Project documentation has been updated, and yard output previewed
  • No documentation changes are needed

ENV

  • All new ENV is documented in README.
  • All new ENV has been added to Heroku Pipeline, Staging and Prod.
  • ENV has not changed.

Stakeholders

  • Stakeholder approval has been confirmed
  • Stakeholder approval is not needed

Dependencies and migrations

NO dependencies are updated

YES migrations are included

PLEASE NOTE - running these migrations will result in the loss of data from your local application, as any existing Detector::Journal records will be lost. Rolling back the migration is possible, but not the restoring of lost records.

Reviewer

Code

  • I have confirmed that the code works as intended.
  • Any CodeClimate issues have been fixed or confirmed as
    added technical debt.

Documentation

  • The commit message is clear and follows our guidelines
    (not just this pull request message).
  • The documentation has been updated or is unnecessary.
  • New dependencies are appropriate or there were no changes.

Testing

  • There are appropriate tests covering any new functionality.
  • No additional test coverage is required.

@mitlib mitlib temporarily deployed to tacos-api-pipeline-pr-162 December 19, 2024 17:22 Inactive
@mitlib mitlib temporarily deployed to tacos-api-pipeline-pr-162 December 19, 2024 19:24 Inactive
@matt-bernhardt matt-bernhardt marked this pull request as ready for review December 19, 2024 19:37
@jazairi jazairi self-assigned this Dec 19, 2024
** Why are these changes being introduced:

Our current Detector::Journal model is asked to do two separate tasks:
1. Detect whether incoming search terms match with a list of known
   journals
2. Maintain that list of known journals

This conflation of concerns prevents the model from using some of the
shared tooling we are developing, like the Bulk Checker module, because
the Bulk Checker ends up calling the .new() method (which doesn't look
for a new detection, but creates a new journal record).

** Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/tco-110

** How does this address that need:

This commit takes the first step toward resolving this conflict, by
creating a new Journal model that will ake over the management of
journal records. The Detector::Journal will then be simplified in a
future commit to only deal with the detection operations.

Specifically:
1. Adds a migration to create the Journal database table, matching the
structure of the Detector::Journal table.

2. Creates test fixtures for these records

3. Updates the existing rake tasks to manage Journal records, and not
   Detector::Journal.

4. The new ::Journal model copies the records-management methods and
   lifecycle hook from Detector::Journal (downcasing journal title)

5. The existing Detector::Journal model has its _match methods updated
   to look up matches using the ::Journal model - which is initially
   unpopulated (see side effects)

6. Tests are updated to reflect the use of the new Journal model.

** Document any side effects to this change:

As of this commit, there is a bit of duplication in the application,
which will be resolved in future commits in this PR.

This commit also does not initially populate the new Journal table,
so the updated rake task will need to be run as part of the promotion
of this work. During the period of time between the deployment and
running this rake task, the application will fail to detect journal
matches correctly.
** Why are these changes being introduced:

The previous commit on this branch created a new Journal model and table
for the management of journal records, but did not remove that
functionality from the existing Detector::Journal model.

** Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/tco-110

** How does this address that need:

This finishes the job, removing the database table for detector_journals
and converting the Detector::Journal model from an ActiveRecord type to
a "plain old ruby object" (PORO).

During this change, we also remove the test fixtures, and the one test
of this model that confirmed this records management function. That test
was previously moved to the Journal test suite.

** Document any side effects to this change:

None
** Why are these changes being introduced:

Now that the Detector::Journal model is only concerns with detections,
and calling Detector::Journal.new no longer spawns any database records,
we can implement the BulkChecker module for better developer experience.

** Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/tco-110

** How does this address that need:

This defines a @Detections attribute on the Detector::Journal class,
as well as an initialize method to populate that attribute based on the
full_term_match method.

This allows the BulkChecker-provided .check_all_matches method to return
meaningful information.

** Document any side effects to this change:

This implementation may be too minimal, as the initialize method ends
up calling the singleton methods in a slightly awkward way (or maybe I
need to get used to this way of working, and not expect to just call
full_term_matches directly in the initializer).
@mitlib mitlib temporarily deployed to tacos-api-pipeline-pr-162 December 19, 2024 21:37 Inactive
Copy link
Contributor

@jazairi jazairi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't confirmed the BulkChecker functionality, but this approach aligns with my understanding of what we're trying to do here. Having a model for how to separate out the ActiveRecord bit will make this easier to implement for suggested resources, so thanks for kicking this off!

@matt-bernhardt matt-bernhardt merged commit 143d467 into main Dec 20, 2024
6 checks passed
@matt-bernhardt matt-bernhardt deleted the tco-110-journals branch December 20, 2024 16:09
jazairi added a commit that referenced this pull request Feb 10, 2025
Why these changes are being introduced:

We [recently decided](#145) to make a separate Fingerprint
model, associated with Term, as multiple detectors are likely
to use fringerprinting (implemented in #138). We have also begun
to split the ActiveRecord components of detectors into separate
models (implemented for Detector::Journal in #162).

Relevant ticket(s):

* [TCO-111](https://mitlibraries.atlassian.net/browse/TCO-111)
* [TCO-122](https://mitlibraries.atlassian.net/browse/TCO-122)

How this addresses that need:

* Splits the ActiveRecord components of Detector::SuggestedResource
into a separate SuggestedResource model.
* Associates SuggestedResource with Fingerprint, via Term, such
that a suggested resource can have multiple terms and fingerprints.
* Removes the suggested resource dashboard (see side effects).

Side effects of this change:

* Terms that are associated with a suggested resource should not
be destroyed. Rails does not allow the `:dependent` option on
`belongs_to` associations, so this commit instead adds a
`before_destroy` hook with a custom method that aborts the callback
and logs the attempt in Sentry.
* Because administrate does not handle has_many relationships well,
we will need to build a custom dashboard to manage suggested resources.
This is ticketed as [TCO-145](https://mitlibraries.atlassian.net/browse/TCO-145).
Until that UI is ready, we will use the Rails console to make any
requested changes to suggested resources.
jazairi added a commit that referenced this pull request Feb 10, 2025
Why these changes are being introduced:

We [recently decided](#145) to make a separate Fingerprint
model, associated with Term, as multiple detectors are likely
to use fringerprinting (implemented in #138). We have also begun
to split the ActiveRecord components of detectors into separate
models (implemented for Detector::Journal in #162).

Relevant ticket(s):

* [TCO-111](https://mitlibraries.atlassian.net/browse/TCO-111)
* [TCO-122](https://mitlibraries.atlassian.net/browse/TCO-122)

How this addresses that need:

* Splits the ActiveRecord components of Detector::SuggestedResource
into a separate SuggestedResource model.
* Associates SuggestedResource with Fingerprint, via Term, such
that a suggested resource can have multiple terms and fingerprints.
* Removes the suggested resource dashboard (see side effects).

Side effects of this change:

* Terms that are associated with a suggested resource should not
be destroyed. Rails does not allow the `:dependent` option on
`belongs_to` associations, so this commit instead adds a
`before_destroy` hook with a custom method that aborts the callback
and logs the attempt in Sentry.
* Because administrate does not handle has_many relationships well,
we will need to build a custom dashboard to manage suggested resources.
This is ticketed as [TCO-145](https://mitlibraries.atlassian.net/browse/TCO-145).
Until that UI is ready, we will use the Rails console to make any
requested changes to suggested resources.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants