Adds CSV loader for SearchEvents #54

Merged: 3 commits merged on Jul 11, 2024
lib/tasks/search_event_loader.rake: 31 additions, 0 deletions
@@ -0,0 +1,31 @@
# frozen_string_literal: true

require 'csv'

# Loaders can bulk load data
namespace :search_events do
  # CSV loader can bulk load SearchEvents and Terms.
  #
  # @note For use in development environments only. Duplicate search events will be created if the same CSV is loaded
  #   multiple times.
  #
  # @note The CSV should be formatted as `term phrase`, `timestamp`. A dataclip is available that can export in this
  #   format.
  # @example
  #   bin/rails "search_events:csv_loader[local_path_to_file.csv,some-source-to-use-for-all-loaded-records]"
  #
  # @param path [String] local file path to a CSV file to load
  # @param source [String] source name to load the data under
  desc 'Load search_events from csv'
  task :csv_loader, %i[path source] => :environment do |_task, args|
    raise ArgumentError, 'Path is required' unless args.path.present?
    raise ArgumentError, 'Source is required' unless args.source.present?

    Rails.logger.info("Loading data from #{args.path}")

    CSV.foreach(args.path) do |row|
      term = Term.create_or_find_by!(phrase: row.first)
      term.search_events.create!(source: args.source, created_at: row.last)
Review comments on this line:

Contributor:
It doesn't matter that this will create duplicate search events if, for example, the same CSV is loaded twice, right? I assume not, given that this is just for dev environments, but I want to make sure I'm not missing anything.

Contributor:
@JPrevost Tagging on ☝️, as it occurred to me that GitHub may not send notifications on comments until a review is complete.

Member Author:
You're correct: running this multiple times will load the same CSV multiple times, and it doesn't try to detect whether a file has already been loaded. I suspect this is fine, but maybe at minimum we should note it?
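To make the trade-off concrete: the task as merged has no duplicate detection. The following is a minimal, hypothetical plain-Ruby sketch (not part of this PR) of one way a load could be made idempotent, by keying each event on its phrase, source, and timestamp and skipping keys it has already seen. `InMemoryEventStore` is an illustrative name, and the in-memory `Set` stands in for what would have to be a database lookup in the real task.

```ruby
require 'set'

# Hypothetical idempotent store: remembers which (phrase, source, created_at)
# triples have been loaded and skips repeats. In the real task this check would
# need to query the database; an in-memory Set stands in for that lookup here.
class InMemoryEventStore
  attr_reader :events

  def initialize
    @events = []
    @seen = Set.new
  end

  # Returns true if the event was added, false if it was a duplicate.
  def add(phrase, source, created_at)
    key = [phrase, source, created_at]
    # Set#add? returns nil when the key is already present.
    return false unless @seen.add?(key)

    @events << { phrase: phrase, source: source, created_at: created_at }
    true
  end
end

rows = [
  ['hello world', '2024-07-01 12:00:00 UTC'],
  ['hello world', '2024-07-02 08:30:00 UTC']
]
store = InMemoryEventStore.new

# Simulate loading the same CSV twice; the second pass adds nothing.
2.times do
  rows.each { |phrase, timestamp| store.add(phrase, 'example-source', timestamp) }
end

puts store.events.length
```

One design caveat: keying on (phrase, source, timestamp) would also drop two genuinely distinct searches for the same phrase that happen to share a timestamp, which may or may not matter for the data being loaded.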

Contributor:
I just pushed a note to that effect. Will approve now so you can merge this as soon as you like, but feel free to modify the note as needed. (I'm not sure what multiple notes for the same method look like in YARD, and for some reason my local docserver isn't showing me the CSV loader task.)

Member Author:
Yeah, the tasks don't show up in YARD, sadly. I suspect we can nudge them in there with some config. Multiple notes are fine, though, from my poking around on other methods.

    end
  end
end
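For readers without a Rails app handy, here is a minimal plain-Ruby sketch of how the task interprets each CSV row, assuming the file has no header row and that every timestamp is something `Time.parse` can read. `parse_search_event_rows` is a hypothetical helper, not part of this PR: it returns hashes instead of creating `Term`/`SearchEvent` records, and in the real task ActiveRecord casts the `created_at` string itself rather than calling `Time.parse`.

```ruby
require 'csv'
require 'time'

# Hypothetical helper mirroring the rake task's loop without Rails: each row is
# expected to be [term phrase, timestamp] with no header row. Instead of creating
# Term/SearchEvent records, it returns one hash per row.
def parse_search_event_rows(csv_text, source)
  CSV.parse(csv_text).map do |row|
    # Same column access as the task: first column is the phrase, last is the timestamp.
    phrase = row.first
    timestamp = row.last
    # Assumption: timestamps are parseable by Time.parse (ActiveRecord does its own casting).
    { phrase: phrase, source: source, created_at: Time.parse(timestamp) }
  end
end

sample = <<~ROWS
  hello world,2024-07-01 12:00:00 UTC
  "climate change, policy",2024-07-02 08:30:00 UTC
ROWS

rows = parse_search_event_rows(sample, 'example-source')
rows.each do |r|
  puts "#{r[:phrase]} | #{r[:source]} | #{r[:created_at].utc.iso8601}"
end
```

Because the task reads `row.first` and `row.last`, a phrase that contains a comma must be quoted, as in the second sample row; unquoted, only the text before the first comma would be stored as the phrase.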