Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export logger #875

Merged
merged 21 commits into from
Apr 19, 2024
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
- Refactor `TableChain` to include `_searched` attribute. #867
- Fix errors in config import #882
- Save current spyglass version in analysis nwb files to aid diagnosis #897
- Add functionality to export vertical slice of database. #875
- Add pynapple support #898

### Pipelines
Expand Down Expand Up @@ -34,7 +35,7 @@
- Fixes to `_convert_mp4` #834
- Replace deprecated calls to `yaml.safe_load()` #834
- Spikesorting:
- Increase`spikeinterface` version to >=0.99.1, <0.100 #852
- Increase`spikeinterface` version to >=0.99.1, \<0.100 #852
- Bug fix in single artifact interval edge case #859
- Bug fix in FigURL #871
- LFP
Expand Down Expand Up @@ -211,3 +212,5 @@
[0.4.2]: https://github.com/LorenFrankLab/spyglass/releases/tag/0.4.2
[0.4.3]: https://github.com/LorenFrankLab/spyglass/releases/tag/0.4.3
[0.5.0]: https://github.com/LorenFrankLab/spyglass/releases/tag/0.5.0
[0.5.1]: https://github.com/LorenFrankLab/spyglass/releases/tag/0.5.1
[0.5.2]: https://github.com/LorenFrankLab/spyglass/releases/tag/0.5.2
2 changes: 2 additions & 0 deletions docs/mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ nav:
- Data Sync: notebooks/02_Data_Sync.ipynb
- Merge Tables: notebooks/03_Merge_Tables.ipynb
- Config Populate: notebooks/04_PopulateConfigFile.ipynb
- Export: notebooks/05_Export.ipynb
- Spikes:
- Spike Sorting V0: notebooks/10_Spike_SortingV0.ipynb
- Spike Sorting V1: notebooks/10_Spike_SortingV1.ipynb
Expand All @@ -75,6 +76,7 @@ nav:
- Insert Data: misc/insert_data.md
- Merge Tables: misc/merge_tables.md
- Database Management: misc/database_management.md
- Export: misc/export.md
- API Reference: api/ # defer to gen-files + literate-nav
- How to Contribute: contribute.md
- Change Log: CHANGELOG.md
Expand Down
131 changes: 131 additions & 0 deletions docs/src/misc/export.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,131 @@
# Export Process

## Why

DataJoint does not have any built-in functionality for exporting vertical slices
of a database. A lab can maintain a shared DataJoint pipeline across multiple
projects, but conforming to NIH data sharing guidelines may require that data
from only one project be shared during publication.

## Requirements

To export data with the current implementation, you must do the following:

- All custom tables must inherit from `SpyglassMixin` (e.g.,
`class MyTable(SpyglassMixin, dj.ManualOrOther):`)
- Only one export can be active at a time.
- Start the export process with `ExportSelection.start_export()`, run all
functions associated with a given analysis, and end the export process with
`ExportSelection.end_export()`.

## How

The current implementation relies on two classes in the Spyglass package
(`SpyglassMixin` and `RestrGraph`) and the `Export` tables.

- `SpyglassMixin`: See `spyglass/utils/dj_mixin.py`
- `RestrGraph`: See `spyglass/utils/dj_graph.py`
- `Export`: See `spyglass/common/common_usage.py`

### Mixin

The `SpyglassMixin` class adds functionality to DataJoint tables. A subset of
methods are used to set an environment variable, `SPYGLASS_EXPORT_ID`, and,
while active, intercept all `fetch`/`fetch_nwb` calls to tables. When `fetch` is
called, the mixin grabs the table name and the restriction applied to the table
and stores them in the `ExportSelection` part tables.

- `fetch_nwb` is specific to Spyglass and logs all analysis nwb files that are
fetched.
- `fetch` is a DataJoint method that retrieves data from a table.

### Graph

The `RestrGraph` class uses DataJoint's networkx graph to store each of the
tables and restrictions intercepted by the `SpyglassMixin`'s `fetch` as
'leaves'. The class then cascades these restrictions up from each leaf to all
ancestors. Use is modeled in the methods of `ExportSelection`.

```python
from spyglass.utils.dj_graph import RestrGraph

restr_graph = RestrGraph(seed_table=AnyTable, leaves=None, verbose=False)
restr_graph.add_leaves(
leaves=[
{
"table_name": MyTable.full_table_name,
"restriction": "any_restriction",
},
{
"table_name": AnotherTable.full_table_name,
"restriction": "another_restriction",
},
]
)
restr_graph.cascade()
restricted_leaves = restr_graph.leaf_ft
all_restricted_tables = restr_graph.all_ft
```

By default, a `RestrGraph` object is created with a seed table to have access to
a DataJoint connection and graph. One or more leaves can be added at
initialization or later with the `add_leaves` method. The cascade process is
delayed until `cascade`, or another method that requires the cascade, is called.

Cascading a single leaf involves transforming the leaf's restriction into its
parent's restriction, then repeating the process until all ancestors are
reached. If two leaves share a common ancestor, the restrictions are combined.
This process also accommodates projected fields, which appear as numeric alias
nodes in the graph.

### Export Table

The `ExportSelection` is where users should interact with this process.

```python
from spyglass.common.common_usage import ExportSelection
from spyglass.common.common_usage import Export

export_key = {paper_id: "my_paper_id", analysis_id: "my_analysis_id"}
ExportSelection().start_export(**export_key)
analysis_data = (MyTable & my_restr).fetch()
analysis_nwb = (MyTable & my_restr).fetch_nwb()
ExportSelection().end_export()

# Visual inspection
touched_files = ExportSelection.list_file_paths(**export_key)
restricted_leaves = ExportSelection.preview_tables(**export_key)

# Export
Export().populate_paper(**export_key)
```

`Export`'s populate will invoke the `write_export` method to collect cascaded
restrictions and file paths in its part tables, and write out a bash script to
export the data using a series of `mysqldump` commands. The script is saved to
Spyglass's directory, `base_dir/export/paper_id/`, using credentials from
`dj_config`. To use alternative credentials, create a
[mysql config file](https://dev.mysql.com/doc/refman/8.0/en/option-files.html).

To retain the ability to delete the logging from a particular analysis, the
`export_id` is a combination of the `paper_id` and `analysis_id` in
`ExportSelection`. When populated, the `Export` table, only the maximum
`export_id` for a given `paper_id` is used, resulting in one shell script per
paper. Each shell script one `mysqldump` command per table.

## External Implementation

To implement an export for a non-Spyglass database, you will need to ...

- Create a modified version of `SpyglassMixin`, including ...
- `_export_table` method to lazy load an export table like `ExportSelection`
- `export_id` attribute, plus setter and deleter methods, to manage the status
of the export.
- `fetch` and other methods to intercept and log exported content.
- Create a modified version of `ExportSelection`, that adjusts fields like
`spyglass_version` to match the new database.

Or, optionally, you can use the `RestrGraph` class to cascade hand-picked tables
and restrictions without the background logging of `SpyglassMixin`. The
assembled list of restricted free tables, `RestrGraph.all_ft`, can be passed to
`Export.write_export` to generate a shell script for exporting the data.
Loading
Loading