WIP: rebase Export process
CBroz1 committed Mar 26, 2024
1 parent c94335d commit 424d4cf
Showing 11 changed files with 1,071 additions and 30 deletions.
2 changes: 2 additions & 0 deletions docs/mkdocs.yml
@@ -51,6 +51,7 @@ nav:
- Data Sync: notebooks/02_Data_Sync.ipynb
- Merge Tables: notebooks/03_Merge_Tables.ipynb
- Config Populate: notebooks/04_PopulateConfigFile.ipynb
- Export: notebooks/05_Export.ipynb
- Spikes:
- Spike Sorting V0: notebooks/10_Spike_SortingV0.ipynb
- Spike Sorting V1: notebooks/10_Spike_SortingV1.ipynb
@@ -75,6 +76,7 @@ nav:
- Insert Data: misc/insert_data.md
- Merge Tables: misc/merge_tables.md
- Database Management: misc/database_management.md
- Export: misc/export.md
- API Reference: api/ # defer to gen-files + literate-nav
- How to Contribute: contribute.md
- Change Log: CHANGELOG.md
126 changes: 126 additions & 0 deletions docs/src/misc/export.md
@@ -0,0 +1,126 @@
# Export Process

## Why

DataJoint does not have any built-in functionality for exporting vertical slices
of a database. A lab can maintain a shared DataJoint pipeline across multiple
projects, but conforming to NIH data sharing guidelines may require that data
from only one project be shared during publication.

## Requirements

To export data with the current implementation, you must do the following:

- Inherit from `SpyglassMixin` in all custom tables (e.g.,
  `class MyTable(SpyglassMixin, dj.ManualOrOther):`).
- Run only one export at a time.
- Start the export process with `ExportSelection.start_export()`, run all
  functions associated with a given analysis, and end the export process with
  `ExportSelection.stop_export()`.

## How

The current implementation relies on two classes in the `spyglass` package,
`SpyglassMixin` and `RestrGraph`, and on the `Export` tables.

- `SpyglassMixin`: See `spyglass/utils/dj_mixin.py`
- `RestrGraph`: See `spyglass/utils/dj_graph.py`
- `Export`: See `spyglass/common/common_usage.py`

### Mixin

The `SpyglassMixin` class is a subclass of DataJoint's `Manual` class. A subset
of its methods sets an environment variable, `SPYGLASS_EXPORT_ID`, and, while an
export is active, intercepts all `fetch`/`fetch_nwb` calls to tables. When
`fetch` is called, the mixin records the table name and the restriction applied
to the table in the `ExportSelection` part tables.

- `fetch_nwb` is specific to Spyglass and logs all analysis nwb files that are
fetched.
- `fetch` is a DataJoint method that retrieves data from a table.
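
The interception pattern can be sketched without DataJoint. This is a minimal
illustration under stated assumptions, not Spyglass's implementation:
`ToyTable`, `LoggingMixin`, `MyToyTable`, and the in-memory `EXPORT_LOG` are
hypothetical stand-ins, and only the `SPYGLASS_EXPORT_ID` variable name is taken
from the description above.

```python
import os

EXPORT_LOG = []  # hypothetical stand-in for the ExportSelection part tables


class ToyTable:
    """Hypothetical stand-in for a DataJoint table with a restriction."""

    full_table_name = "`toy_schema`.`toy_table`"

    def __init__(self, rows, restriction=None):
        self.rows = rows
        self.restriction = restriction

    def __and__(self, restriction):
        # Mimic `Table & restriction` by returning a restricted copy
        return type(self)(self.rows, restriction)

    def fetch(self):
        if self.restriction is None:
            return list(self.rows)
        return [
            row
            for row in self.rows
            if all(row.get(k) == v for k, v in self.restriction.items())
        ]


class LoggingMixin:
    """Log table name and restriction on fetch while an export is active."""

    def fetch(self, *args, **kwargs):
        export_id = os.environ.get("SPYGLASS_EXPORT_ID")
        if export_id is not None:
            EXPORT_LOG.append(
                {
                    "export_id": export_id,
                    "table_name": self.full_table_name,
                    "restriction": self.restriction,
                }
            )
        # Defer to the next class in the MRO for the actual fetch
        return super().fetch(*args, **kwargs)


class MyToyTable(LoggingMixin, ToyTable):
    pass


table = MyToyTable([{"id": 1}, {"id": 2}])
os.environ["SPYGLASS_EXPORT_ID"] = "1"  # export active: fetches are logged
logged_result = (table & {"id": 1}).fetch()
del os.environ["SPYGLASS_EXPORT_ID"]  # export inactive: fetches are not logged
unlogged_result = (table & {"id": 2}).fetch()
```

Listing the mixin first in the class definition matters in this sketch: its
`fetch` runs before the table's own and then defers to it with `super()`.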

### Graph

The `RestrGraph` class uses DataJoint's networkx graph to store each of the
tables and restrictions intercepted by the `SpyglassMixin`'s `fetch` as
'leaves'. The class then cascades these restrictions up from each leaf to all
ancestors. For example usage, see the methods of `ExportSelection`.

```python
from spyglass.utils.dj_graph import RestrGraph

restr_graph = RestrGraph(seed_table=AnyTable, leaves=None, verbose=False)
restr_graph.add_leaves(
    leaves=[
        {
            "table_name": MyTable.full_table_name,
            "restriction": "any_restriction",
        },
        {
            "table_name": AnotherTable.full_table_name,
            "restriction": "another_restriction",
        },
    ]
)
restr_graph.cascade()
restricted_leaves = restr_graph.leaf_ft
all_restricted_tables = restr_graph.all_ft

restr_graph.write_export(paper_id="my_paper_id") # part of `populate` below
```

By default, a `RestrGraph` object is created with a seed table to have access to
a DataJoint connection and graph. One or more leaves can be added at
initialization or later with the `add_leaves` method. The cascade process is
delayed until `cascade`, or another method that requires the cascade, is called.

Cascading a single leaf involves transforming the leaf's restriction into its
parent's restriction, then repeating the process until all ancestors are
reached. If two leaves share a common ancestor, the restrictions are combined.
This process also accommodates projected fields, which appear as numeric alias
nodes in the graph.
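
As a rough illustration of the cascade, consider a toy model in which a
restriction is a set of key tuples and a parent's key is a prefix of its
child's key. Everything here is an assumption made for the sketch: `PARENTS`,
`cascade`, and the table names are hypothetical, and `RestrGraph` itself works
on DataJoint restrictions and the networkx graph rather than on Python sets.

```python
from collections import defaultdict

# Hypothetical schema: child table -> (parent table, number of key fields the
# parent keeps). Keys are tuples; a parent's key is a prefix of its child's.
PARENTS = {
    "lfp_band": ("lfp", 2),
    "lfp": ("session", 1),
    "position": ("session", 1),
}


def cascade(leaves):
    """Return {table: set of keys} for every leaf and all of its ancestors."""
    restricted = defaultdict(set)
    for table, keys in leaves.items():
        current, current_keys = table, set(keys)
        while True:
            # Union combines restrictions when leaves share an ancestor
            restricted[current] |= current_keys
            if current not in PARENTS:
                break  # reached a table with no parent
            parent, n_fields = PARENTS[current]
            # Transform the child's restriction into the parent's restriction
            current_keys = {key[:n_fields] for key in current_keys}
            current = parent
    return dict(restricted)


leaves = {
    "lfp_band": {("a.nwb", 0, "theta"), ("a.nwb", 1, "theta")},
    "position": {("b.nwb", "single_led")},
}
restricted = cascade(leaves)
```

In this toy model, `session` ends up restricted to both `a.nwb` and `b.nwb`
because the two leaves share it as an ancestor.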

### Export Table

`ExportSelection` is the table through which users should interact with this
process.

```python
from spyglass.common.common_usage import ExportSelection
from spyglass.common.common_usage import Export

export_key = {"paper_id": "my_paper_id", "analysis_id": "my_analysis_id"}
ExportSelection().start_export(**export_key)
ExportSelection().restart_export(**export_key) # to clear previous attempt
analysis_data = (MyTable & my_restr).fetch()
analysis_nwb = (MyTable & my_restr).fetch_nwb()
ExportSelection().stop_export()

# Visual inspection
touched_files = ExportSelection().list_file_paths(**export_key)
restricted_leaves = ExportSelection().preview_tables(**export_key)

# Export
Export().populate()
```

`Export` will invoke `RestrGraph.write_export` to collect cascaded restrictions
and file paths in its part tables, and write out a bash script that exports the
data using a series of `mysqldump` commands. The script is saved to
`base_dir/export/paper_id/` within Spyglass's directory and uses credentials
from `dj_config`. To use alternative credentials, create a
[mysql config file](https://dev.mysql.com/doc/refman/8.0/en/option-files.html).
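
To make the idea of a restricted dump concrete, the sketch below builds one
`mysqldump` command per table from a mapping of table names to SQL `WHERE`
clauses. It is an illustration only: `mysqldump_lines`, the `my_export.cnf`
option file, and the output file naming are hypothetical, and the script that
Spyglass actually writes may look different. The `--defaults-file` and `--where`
flags are standard `mysqldump` options.

```python
import shlex


def mysqldump_lines(restrictions, defaults_file="my_export.cnf"):
    """Build one mysqldump command per restricted table.

    `restrictions` maps '`schema`.`table`' names to SQL WHERE clauses.
    `defaults_file` is an assumed path to a MySQL option file with credentials.
    """
    lines = ["#!/bin/bash"]
    for full_name, where in restrictions.items():
        schema, table = full_name.replace("`", "").split(".")
        lines.append(
            # --defaults-file must be the first option on the command line
            f"mysqldump --defaults-file={defaults_file} "
            f"--where={shlex.quote(where)} {schema} {table} "
            f"> {schema}.{table}.sql"
        )
    return lines


script_lines = mysqldump_lines({"`my_schema`.`my_table`": "curation_id = 1"})
```

Quoting the restriction with `shlex.quote` keeps restrictions that contain
spaces or quotes from being split by the shell.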

## External Implementation

To implement an export for a non-Spyglass database, you will need to ...

- Create a modified version of `SpyglassMixin`, including ...
    - `_export_table` method to lazy load an export table like `ExportSelection`
    - `export_id` attribute, plus setter and deleter methods, to manage the
        status of the export.
    - `fetch` and other methods to intercept and log exported content.
- Create a modified version of `ExportSelection` that adjusts fields like
    `spyglass_version` to match the new database.

Or, optionally, you can use the `RestrGraph` class to cascade hand-picked tables
and restrictions without the background logging of `SpyglassMixin`.
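
One possible shape for the `export_id` attribute with its setter and deleter is
sketched here. The class name `ExportStateMixin`, the `MYLAB_EXPORT_ID` variable
name, and the convention that `0` means "no active export" are all assumptions
for illustration and are not taken from Spyglass.

```python
import os


class ExportStateMixin:
    """Hypothetical mixin that tracks the active export in the environment."""

    # Assumed variable name for a non-Spyglass database
    _export_env_var = "MYLAB_EXPORT_ID"

    @property
    def export_id(self):
        """Return the active export ID as an int, or 0 if none is active."""
        return int(os.environ.get(self._export_env_var, 0))

    @export_id.setter
    def export_id(self, value):
        # Enforce the one-export-at-a-time requirement
        if self.export_id not in (0, value):
            raise RuntimeError("Another export is already active.")
        os.environ[self._export_env_var] = str(value)

    @export_id.deleter
    def export_id(self):
        # Removing the variable marks the export as inactive
        os.environ.pop(self._export_env_var, None)


state = ExportStateMixin()
state.export_id = 3  # start logging under export 3
active = state.export_id
del state.export_id  # stop logging
inactive = state.export_id
```

Storing the ID in an environment variable, as the Mixin section describes, lets
every table instance in the session see the same export state.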
152 changes: 152 additions & 0 deletions notebooks/05_Export.ipynb
@@ -0,0 +1,152 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"# Export\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Intro\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_Developer Note:_ if you may make a PR in the future, be sure to copy this\n",
"notebook, and use the `gitignore` prefix `temp` to avoid future conflicts.\n",
"\n",
"This is one notebook in a multi-part series on Spyglass.\n",
"\n",
"- To set up your Spyglass environment and database, see\n",
" [the Setup notebook](./00_Setup.ipynb)\n",
"- To insert data, see [the Insert Data notebook](./01_Insert_Data.ipynb)\n",
"- For additional info on DataJoint syntax, including table definitions and\n",
" inserts, see\n",
" [these additional tutorials](https://github.com/datajoint/datajoint-tutorials)\n",
"- For information on what's going on behind the scenes of an export, see\n",
" [documentation](https://lorenfranklab.github.io/spyglass/0.5/misc/export/)\n",
"\n",
"In short, Spyglass offers the ability to generate exports of one or more subsets\n",
"of the database required for a specific analysis as long as you do the following:\n",
"\n",
"- Inherit `SpyglassMixin` for all custom tables.\n",
"- Run only one export at a time.\n",
"- Start and stop each export logging process.\n",
"\n",
"**NOTE:** For demonstration purposes, this notebook relies on a more populated\n",
"database to highlight restriction merging capabilities of the export process.\n",
"Adjust the restrictions to suit your own dataset.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start by importing the `spyglass` package, along with a few others.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[2024-01-29 16:15:00,903][INFO]: Connecting root@localhost:3309\n",
"[2024-01-29 16:15:00,912][INFO]: Connected root@localhost:3309\n"
]
}
],
"source": [
"import os\n",
"import datajoint as dj\n",
"\n",
"# change to the upper level folder to detect dj_local_conf.json\n",
"if os.path.basename(os.getcwd()) == \"notebooks\":\n",
"    os.chdir(\"..\")\n",
"dj.config.load(\"dj_local_conf.json\") # load config for database connection info\n",
"\n",
"# ignore datajoint+jupyter async warnings\n",
"from spyglass.common.common_usage import Export, ExportSelection\n",
"from spyglass.lfp.analysis.v1 import LFPBandV1\n",
"from spyglass.position.v1 import TrodesPosV1\n",
"from spyglass.spikesorting.v1.curation import CurationV1\n",
"\n",
"# TODO: Add commentary, describe helpers on ExportSelection\n",
"\n",
"paper_key = {\"paper_id\": \"paper1\"}\n",
"ExportSelection().start_export(**paper_key, analysis_id=\"test1\")\n",
"a = (\n",
" LFPBandV1 & \"nwb_file_name LIKE 'med%'\" & {\"filter_name\": \"Theta 5-11 Hz\"}\n",
").fetch()\n",
"b = (\n",
" LFPBandV1\n",
" & {\n",
" \"nwb_file_name\": \"mediumnwb20230802_.nwb\",\n",
" \"filter_name\": \"Theta 5-10 Hz\",\n",
" }\n",
").fetch()\n",
"ExportSelection().start_export(**paper_key, analysis_id=\"test2\")\n",
"c = (CurationV1 & \"curation_id = 1\").fetch_nwb()\n",
"d = (TrodesPosV1 & 'trodes_pos_params_name = \"single_led\"').fetch()\n",
"ExportSelection().stop_export()\n",
"Export().populate_paper(**paper_key)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Up Next\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the [next notebook](./10_Spike_Sorting.ipynb), we'll start working with\n",
"ephys data with spike sorting.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "spy",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
94 changes: 94 additions & 0 deletions notebooks/py_scripts/05_Export.py
@@ -0,0 +1,94 @@
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: light
# format_version: '1.5'
# jupytext_version: 1.16.0
# kernelspec:
# display_name: spy
# language: python
# name: python3
# ---

# # Export
#

# ## Intro
#

# _Developer Note:_ if you may make a PR in the future, be sure to copy this
# notebook, and use the `gitignore` prefix `temp` to avoid future conflicts.
#
# This is one notebook in a multi-part series on Spyglass.
#
# - To set up your Spyglass environment and database, see
# [the Setup notebook](./00_Setup.ipynb)
# - To insert data, see [the Insert Data notebook](./01_Insert_Data.ipynb)
# - For additional info on DataJoint syntax, including table definitions and
# inserts, see
# [these additional tutorials](https://github.com/datajoint/datajoint-tutorials)
# - For information on what's going on behind the scenes of an export, see
# [documentation](https://lorenfranklab.github.io/spyglass/0.5/misc/export/)
#
# In short, Spyglass offers the ability to generate exports of one or more subsets
# of the database required for a specific analysis as long as you do the following:
#
# - Inherit `SpyglassMixin` for all custom tables.
# - Run only one export at a time.
# - Start and stop each export logging process.
#
# **NOTE:** For demonstration purposes, this notebook relies on a more populated
# database to highlight restriction merging capabilities of the export process.
# Adjust the restrictions to suit your own dataset.
#

# ## Imports
#

# Let's start by importing the `spyglass` package, along with a few others.
#

# +
import os
import datajoint as dj

# change to the upper level folder to detect dj_local_conf.json
if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")
dj.config.load("dj_local_conf.json") # load config for database connection info

# ignore datajoint+jupyter async warnings
from spyglass.common.common_usage import Export, ExportSelection
from spyglass.lfp.analysis.v1 import LFPBandV1
from spyglass.position.v1 import TrodesPosV1
from spyglass.spikesorting.v1.curation import CurationV1

# TODO: Add commentary, describe helpers on ExportSelection

paper_key = {"paper_id": "paper1"}
ExportSelection().start_export(**paper_key, analysis_id="test1")
a = (
    LFPBandV1 & "nwb_file_name LIKE 'med%'" & {"filter_name": "Theta 5-11 Hz"}
).fetch()
b = (
    LFPBandV1
    & {
        "nwb_file_name": "mediumnwb20230802_.nwb",
        "filter_name": "Theta 5-10 Hz",
    }
).fetch()
ExportSelection().start_export(**paper_key, analysis_id="test2")
c = (CurationV1 & "curation_id = 1").fetch_nwb()
d = (TrodesPosV1 & 'trodes_pos_params_name = "single_led"').fetch()
ExportSelection().stop_export()
Export().populate_paper(**paper_key)
# -

# ## Up Next
#

# In the [next notebook](./10_Spike_Sorting.ipynb), we'll start working with
# ephys data with spike sorting.
#
2 changes: 1 addition & 1 deletion src/spyglass/common/common_lab.py
@@ -92,7 +92,7 @@ def _load_admin(cls):
"""Load admin list."""
cls._admin = list(
(cls.LabMemberInfo & {"admin": True}).fetch("datajoint_user_name")
- )
+ ) + ["root"]

@property
def admin(cls) -> list: