Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand Export logging abilities #1164

Merged
merged 17 commits into from
Nov 6, 2024
Merged

Expand Export logging abilities #1164

merged 17 commits into from
Nov 6, 2024

Conversation

CBroz1
Copy link
Member

@CBroz1 CBroz1 commented Oct 11, 2024

Description

This PR addresses #1144 by

  • logging table restrictions
  • logging table joins
  • Extracting Export functionality into a mixin that can be shared across SpyglassMixin and Merge

I made the decision to add a mixin folder with the future intent of breaking up SpyglassMixin into more manageable pieces.

This PR also makes changes to Merge.delete to avoid issues with attempting Part.delete with the new force_masters flag. I remove this flag and run a delete of orphaned master table entries

Checklist:

  • No. This PR should be accompanied by a release: (yes/no/unsure)
  • N/a. If release, I have updated the CITATION.cff
  • No. This PR makes edits to table definitions: (yes/no)
  • N/a. If table edits, I have included an alter snippet for release notes.
  • No. If this PR makes changes to position, I ran the relevant tests locally.
  • I have updated the CHANGELOG.md with PR number and description.
  • I have added/edited docs/notebooks to reflect the changes

@CBroz1 CBroz1 marked this pull request as ready for review October 14, 2024 18:40
@CBroz1 CBroz1 requested a review from samuelbray32 October 14, 2024 18:40
@CBroz1 CBroz1 marked this pull request as draft October 14, 2024 19:38
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@CBroz1 CBroz1 marked this pull request as ready for review October 14, 2024 21:52
Copy link
Collaborator

@samuelbray32 samuelbray32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass review, will update after testing

docs/src/Features/Export.md Show resolved Hide resolved
@samuelbray32
Copy link
Collaborator

samuelbray32 commented Oct 23, 2024

Just a general comment from testing. I found in my code several cases where I logged large sections of tables for export due to restriction calls.

ex)
Table & restrt1 & restr2 where restr1 captures many unnecessary entries. I know can rewrite this line to avoid the issue, but identifying these in a users code could be a bit of work.

Would it make sense to have an environment variable such as EXPORT_RESTRICTIONS and only log the restrictions for exports if set to true?

Edit:
Alternatively, we could make a warning requiring user confirmation when restrict-logging a table that results in a large number (e.g. >1000) entries

@CBroz1
Copy link
Member Author

CBroz1 commented Oct 23, 2024

Would it make sense to have an environment variable such as EXPORT_RESTRICTIONS and only log the restrictions for exports if set to true?

If I understand correctly, we would have one flag for turning on logging, and another for turning on logging of restrictions? The latter sound like a different way of achieving Table.restrict(restr, log_export=False), which is maybe syntax we want to avoid for the end user

Maybe a prompt?

MyTable & my_restr1 & my_restr2
>> Warning. This logged >1k entries. Proceed? [Yes/No]
>> You may want to change this to Table & dj.AndList([r1, r2])

Ideally, we could find a way around this AND -> OR issue. That may be possible by logging the restriction at the point of fetch, rather than at the point of restriction. I think a plain restriction call will hit fetch before returning the result, turning the restriction to a list for me

Comment on lines 541 to 542
if log_export and self.export_id:
self._log_fetch(restriction=merge_restriction)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary here now. The Merge table is logged for export by the restriction call self & {self._reserved_sk: source} & merge_restriction below.

This also logs extra entries when fetch_nwb is called by passing a restriction for the parent source table. In that case, merge_restriction = True and the entire merge table gets logged.

@CBroz1
Copy link
Member Author

CBroz1 commented Oct 24, 2024

Thanks for your review @samuelbray32. I incorporated your feedback. Let me know if you think the new join handling will work

Unfortunately, I did not see consistent hitting of fetch for cases like

  • Table & r1 & r2
  • (Table & r1) & r2
  • Table1 * Table2 & r1

So this will still bear the AND -> OR issue described in documentation, but I would welcome input on a way to only intercept the most restrictive version for logging

@samuelbray32
Copy link
Collaborator

Thanks Chris!

At the end of the day, it may just require users adding some manual calls to their tables in their export process to ensure things get hit.

For example, fetches or restrictions after a table join won't get logged anyways since it is now happening on a dj.FreeTable rather than a SpyglassMixin object. I think rather than catch every one of these cases, it might make more sense for the user to run an extra script that calls something like (CustomTable & {"nwb_file_name":xxxx}).fetch() to catch everything they need

@samuelbray32
Copy link
Collaborator

samuelbray32 commented Oct 30, 2024

@CBroz1

Just noticed while running in docker that the external table "`common_nwbfile`.`~external_analysis`" isn't getting included in the export. I'm presuming it's not getting caught in the restriction graph since it isn't actually a parent or child of any other table. Will probably require an explicit piece of code to get the externals

Proposal:
Since the external table doesn't have dependencies or sensitive data (just hashes and filepaths), could we just
dump the whole externals table into the sql file?

@samuelbray32
Copy link
Collaborator

Follow up on the external tables:

Here is a way to select just the entries in the external table we want in the export

from spyglass.common.common_nwbfile import schema as common_schema
location="analysis"
filepaths = (Export * Export.File() & paper_key).fetch("file_path")
external_key = []
for path in filepaths:
    if not location in path:
        continue
    external_key.append({"filepath": path.split(location)[1][1:]})
external = common_schema.external['analysis']
external & external_key

We would need to do this for location in ['analysis', 'raw']

@CBroz1
Copy link
Member Author

CBroz1 commented Oct 30, 2024

Just noticed while running in docker that the external table "`common_nwbfile`.`~external_analysis`" isn't getting included in the export. I'm presuming it's not getting caught in the restriction graph since it isn't actually a parent or child of any other table. Will probably require an explicit piece of code to get the externals

That's right! It looks like datajoint's DiGraph object was built to exclude externals, but I think I'll be able to cascade to it if I manually add it to RestrGraph.graph. I'll try to add this logic into the cascade_files logic to avoid extra steps elsewhere

@CBroz1 CBroz1 linked an issue Oct 31, 2024 that may be closed by this pull request
@edeno edeno requested a review from samuelbray32 November 5, 2024 19:26
Copy link
Collaborator

@samuelbray32 samuelbray32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CBroz1!

@edeno edeno merged commit 92d9c35 into LorenFrankLab:master Nov 6, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

First attribute read as log_export in fetch_nwb
3 participants