diff --git a/dj_local_conf_example.json b/dj_local_conf_example.json index 63efcae68..437d77577 100644 --- a/dj_local_conf_example.json +++ b/dj_local_conf_example.json @@ -32,7 +32,7 @@ "base": "/your/path/like/stelmo/nwb/" }, "kachery_dirs": { - "cloud": "/your/path/.kachery_cloud" + "cloud": "/your/path/.kachery-cloud" }, "dlc_dirs": { "base": "/your/path/like/nimbus/deeplabcut/" diff --git a/notebooks/02_Data_Sync.ipynb b/notebooks/02_Data_Sync.ipynb index defae02b0..74f01a46b 100644 --- a/notebooks/02_Data_Sync.ipynb +++ b/notebooks/02_Data_Sync.ipynb @@ -53,20 +53,23 @@ " inserts, see\n", " [these additional tutorials](https://github.com/datajoint/datajoint-tutorials)\n", "\n", - "Let's start by importing the `spyglass` package.\n" + "Let's start by importing the `spyglass` package and testing that your environment\n", + " is properly configured for kachery sharing\n", + "\n", + "If you haven't already done so, be sure to set up your Spyglass base directory and Kachery sharing directory with [Setup](./00_Setup.ipynb)" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "[2023-09-28 09:39:48,974][INFO]: Connecting root@localhost:3307\n", - "[2023-09-28 09:39:49,050][INFO]: Connected root@localhost:3307\n" + "[2023-12-22 08:22:32,189][INFO]: Connecting sambray@lmf-db.cin.ucsf.edu:3306\n", + "[2023-12-22 08:22:32,244][INFO]: Connected sambray@lmf-db.cin.ucsf.edu:3306\n" ] } ], @@ -154,7 +157,7 @@ "1. Try to load from the local file system/store. \n", "2. If unavailable, check if it is in the relevant sharing table (i.e., \n", " `NwbKachery` or `AnalysisNWBKachery`).\n", - "3. If present, attempt to download from the associated Kachery Resource.\n", + "3. 
If present, attempt to download from the associated Kachery Resource to the user's spyglass analysis directory.\n", "\n", "_Note:_ large file downloads may take a long time, so downloading raw data is\n", "not supported. We suggest direct transfer with\n", @@ -180,13 +183,14 @@ "3. `franklab.public`: Public file sharing (not yet active)\n", "\n", "Setting your zone can either be done as as an environment variable or an item \n", - "in a DataJoint config.\n", + "in a DataJoint config. Spyglass will automatically handle setting the appropriate zone when downloading\n", + "database files through kachery\n", "\n", "- Environment variable:\n", "\n", " ```bash\n", " export KACHERY_ZONE=franklab.default\n", - " export KACHERY_CLOUD_DIR=/stelmo/nwb/.kachery_cloud\n", + " export KACHERY_CLOUD_DIR=/stelmo/nwb/.kachery-cloud\n", " ```\n", "\n", "- DataJoint Config:\n", @@ -195,7 +199,7 @@ " \"custom\": {\n", " \"kachery_zone\": \"franklab.default\",\n", " \"kachery_dirs\": {\n", - " \"cloud\": \"/your/base/path/.kachery_cloud\"\n", + " \"cloud\": \"/your/base/path/.kachery-cloud\"\n", " }\n", " }\n", " ```" @@ -205,7 +209,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Host Setup" + "## Host Setup\n", + "\n", + "- If you are a member of a team with a pre-existing database and zone who will be sharing data, please skip to `Sharing Data`\n", + "\n", + "- If you are a collaborator outside your team's network and need to access files shared with you, please skip to `Accessing Shared Data`" ] }, { @@ -250,14 +258,19 @@ "suggest using the same name for the zone and resource.\n", "\n", "_Note:_ For each zone, you need to run the local daemon that listens for\n", - "requests from that zone. An example of the bash script we use is\n", + "requests from that zone and uploads data to the bucket for client download when requested. 
An example of the bash script we use is\n", "\n", "```bash\n", " export KACHERY_ZONE=franklab.collaborators\n", - " export KACHERY_CLOUD_DIR=/stelmo/nwb/.kachery_cloud\n", + " export KACHERY_CLOUD_DIR=/stelmo/nwb/.kachery-cloud\n", " cd /stelmo/nwb/franklab_collaborators_resource\n", " npx kachery-resource@latest share\n", - "```" + "```\n", + "\n", + "For convenience, we recommend saving this code as a bash script which can be executed by the local daemon. For Frank Lab members, these scripts can be found in the directory `/home/loren/bin/`:\n", + "\n", + "- run_restart_kachery_collab.sh\n", + "- run_restart_kachery_default.sh" ] }, { @@ -272,13 +285,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We'll add zones/resources to the Spyglass database. First, we'll check existing\n", - "Zones." + "Once you have a hosted zone running, we need to add its information to the Spyglass database. \n", + "This will allow Spyglass to manage linking files from our analysis tables to kachery.\n", + "First, we'll check existing Zones." ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 2, "metadata": {}, "outputs": [ { @@ -355,20 +369,29 @@ "
lab_name
\n", " \n", " \n", - "Total: 0
\n", + "Total: 2
\n", " " ], "text/plain": [ - "*kachery_zone_ description kachery_cloud_ kachery_proxy lab_name \n", - "+------------+ +------------+ +------------+ +------------+ +----------+\n", - "\n", - " (Total: 0)" + "*kachery_zone_ description kachery_cloud_ kachery_proxy lab_name \n", + "+------------+ +------------+ +------------+ +------------+ +------------+\n", + "franklab.colla franklab colla /stelmo/nwb/.k https://kacher Loren Frank \n", + "franklab.defau internal frank /stelmo/nwb/.k https://kacher Loren Frank \n", + " (Total: 2)" ] }, - "execution_count": 3, + "execution_count": 2, "metadata": {}, "output_type": "execute_result" } @@ -378,15 +401,71 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "Check existing file list:" + "To add a new hosted Zone, we need to prepare an entry for the `KacheryZone` table. \n", + "Note that the `kachery_cloud_dir` key should be the path for the server daemon _hosting_ the zone,\n", + " and is not required to be present on the client machine:" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 38, + "metadata": {}, + "outputs": [], + "source": [ + "zone_name = config.get(\"KACHERY_ZONE\")\n", + "cloud_dir = config.get(\"KACHERY_CLOUD_DIR\")\n", + "\n", + "zone_key = {\n", + " \"kachery_zone_name\": zone_name,\n", + " \"description\": \" \".join(zone_name.split(\".\")) + \" zone\",\n", + " \"kachery_cloud_dir\": cloud_dir,\n", + " \"kachery_proxy\": \"https://kachery-resource-proxy.herokuapp.com\",\n", + " \"lab_name\": sgc.Lab.fetch(\"lab_name\", limit=1)[0],\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Use caution when inserting into an active database, as it could interfere with\n", + "ongoing work."
+ ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "sgs.KacheryZone().insert1(zone_key)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Sharing Data" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once the zone exists, we can add `AnalysisNWB` files we want to share with members of the zone.\n", + "\n", + "The `AnalysisNwbfileKachery` table links analysis files made within other Spyglass tables with a `uri` \n", + "used by kachery. We can view files already made available through kachery here:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, "metadata": {}, "outputs": [ { @@ -457,20 +536,56 @@ "analysis_file_uri
\n", " the uri of the file\n", " \n", - "Total: 0
\n", + "...
\n", + "Total: 298
\n", " " ], "text/plain": [ "*kachery_zone_ *analysis_file analysis_file_\n", "+------------+ +------------+ +------------+\n", - "\n", - " (Total: 0)" + "franklab.colla Banner20220224 sha1://562b488\n", + "franklab.colla Frodo20230810_ sha1://b38d2b0\n", + "franklab.colla Frodo20230810_ sha1://ca9c238\n", + "franklab.colla Frodo20230810_ sha1://a8452cf\n", + "franklab.colla Frodo20230810_ sha1://ca9c238\n", + "franklab.colla Frodo20230810_ sha1://ca9c238\n", + "franklab.colla Frodo20230810_ sha1://aa0e060\n", + "franklab.colla Frodo20230810_ sha1://ca9c238\n", + "franklab.colla Frodo20230810_ sha1://2889b68\n", + "franklab.colla Frodo20230810_ sha1://973ea71\n", + "franklab.colla Frodo20230810_ sha1://088a345\n", + "franklab.colla Frodo20230810_ sha1://4aa3199\n", + " ...\n", + " (Total: 298)" ] }, - "execution_count": 4, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" } @@ -480,62 +595,12 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "Prepare an entry for the `KacheryZone` table:" - ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [], - "source": [ - "zone_name = config.get(\"KACHERY_ZONE\")\n", - "cloud_dir = config.get(\"KACHERY_CLOUD_DIR\")\n", + "We can share additional results by populating new entries in this table.\n", "\n", - "zone_key = {\n", - " \"kachery_zone_name\": zone_name,\n", - " \"description\": \" \".join(zone_name.split(\".\")) + \" zone\",\n", - " \"kachery_cloud_dir\": cloud_dir,\n", - " \"kachery_proxy\": \"https://kachery-resource-proxy.herokuapp.com\",\n", - " \"lab_name\": sgc.Lab.fetch(\"lab_name\", limit=1)[0],\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Use caution when inserting into an active database, as it could interfere with\n", - "ongoing work." 
- ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [], - "source": [ - "sgs.KacheryZone().insert1(zone_key)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Data Setup" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once the zone exists, we can add `AnalysisNWB` files we want to share by adding\n", - "entries to the `AnalysisNwbfileKacherySelection` table.\n", + "To do so we first add these entries to the `AnalysisNwbfileKacherySelection` table.\n", "\n", "_Note:_ This step depends on having previously run an analysis on the example \n", "file." @@ -584,6 +649,42 @@ "sgs.AnalysisNwbfileKachery.populate()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, we can share data based on its source table in the database using the helper function `share_data_to_kachery()` \n", + "\n", + "This will take a list of tables and add all associated analysis files for entries corresponding with a passed restriction. 
\n", + "Here, we are sharing LFP and position data for the Session \"minirec20230622_.nwb\"" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "from spyglass.sharing import share_data_to_kachery\n", + "from spyglass.lfp.v1 import LFPV1\n", + "from spyglass.position.v1 import TrodesPosV1\n", + "\n", + "tables = [LFPV1, TrodesPosV1]\n", + "restriction = {\"nwb_file_name\": \"minirec20230622_.nwb\"}\n", + "share_data_to_kachery(\n", + " table_list=tables,\n", + " restriction=restriction,\n", + " zone_name=\"franklab.collaborators\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Managing access" + ] + }, { "attachments": {}, "cell_type": "markdown", @@ -619,6 +720,576 @@ "```" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Accessing Shared Data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you are a collaborator accessing datasets, you first need to be given access to the zone by a collaborator admin (see above).\n", + "\n", + "If you know the uri for the dataset you are accessing, you can test this process below (the example is for members of `franklab.collaborators`)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import kachery_cloud as kcl\n", + "\n", + "path = \"/path/to/save/file/to/test\"\n", + "zone_name = \"franklab.collaborators\"\n", + "uri = \"sha1://ceac0c1995580dfdda98d6aa45b7dda72d63afe4\"\n", + "\n", + "os.environ[\"KACHERY_ZONE\"] = zone_name\n", + "kcl.load_file(uri=uri, dest=path, verbose=True)\n", + "assert os.path.exists(path), f\"File not downloaded to {path}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In normal use, Spyglass will manage setting the zone and uri when accessing files.\n", + "In general, the easiest way to access data values will be through the `fetch1_dataframe()`\n", +
"function part of many of the spyglass tables. In brief this will check for the appropriate\n", + "nwb analysis file in your local directory, and if not found, attempt to download it from the appropriate kachery zone.\n", + "It will then parse the relevant information from that nwb file into a pandas dataframe. \n", + "\n", + "We will look at an example with data from the `LFPV1` table:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " \n", + " \n",
+ " nwb_file_name \n", + " name of the NWB file\n", + " | \n",
+ " lfp_electrode_group_name \n", + " the name of this group of electrodes\n", + " | \n",
+ " target_interval_list_name \n", + " descriptive name of this interval list\n", + " | \n",
+ " filter_name \n", + " descriptive name of this filter\n", + " | \n",
+ " filter_sampling_rate \n", + " sampling rate for this filter\n", + " | \n",
+ " analysis_file_name \n", + " name of the file\n", + " | \n",
+ " interval_list_name \n", + " descriptive name of this interval list\n", + " | \n",
+ " lfp_object_id \n", + " the NWB object ID for loading this object from the file\n", + " | \n",
+ " lfp_sampling_rate \n", + " the sampling rate, in HZ\n", + " |
---|---|---|---|---|---|---|---|---|
Winnie20220713_.nwb | \n", + "tetrode_sample_Winnie | \n", + "pos 0 valid times | \n", + "LFP 0-400 Hz | \n", + "30000 | \n", + "Winnie20220713_C52XDICU6D.nwb | \n", + "lfp_tetrode_sample_Winnie_pos 0 valid times_valid times | \n", + "a89c590f-290b-4f9c-a568-b9ae67eee96d | \n", + "1000.0 |
Total: 1
\n", + " | 0 | \n", + "1 | \n", + "2 | \n", + "3 | \n", + "4 | \n", + "5 | \n", + "6 | \n", + "7 | \n", + "8 | \n", + "9 | \n", + "... | \n", + "18 | \n", + "19 | \n", + "20 | \n", + "21 | \n", + "22 | \n", + "23 | \n", + "24 | \n", + "25 | \n", + "26 | \n", + "27 | \n", + "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
time | \n", + "\n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " | \n", + " |
1.657741e+09 | \n", + "-90 | \n", + "-65 | \n", + "-104 | \n", + "-89 | \n", + "-31 | \n", + "-68 | \n", + "-27 | \n", + "-26 | \n", + "-32 | \n", + "-92 | \n", + "... | \n", + "-91 | \n", + "-99 | \n", + "-87 | \n", + "-117 | \n", + "-123 | \n", + "-85 | \n", + "-73 | \n", + "-74 | \n", + "-62 | \n", + "13 | \n", + "
1.657741e+09 | \n", + "-202 | \n", + "-145 | \n", + "-227 | \n", + "-220 | \n", + "-57 | \n", + "-130 | \n", + "-84 | \n", + "-68 | \n", + "-30 | \n", + "-191 | \n", + "... | \n", + "-168 | \n", + "-199 | \n", + "-176 | \n", + "-250 | \n", + "-238 | \n", + "-172 | \n", + "-158 | \n", + "-140 | \n", + "-127 | \n", + "54 | \n", + "
1.657741e+09 | \n", + "-218 | \n", + "-150 | \n", + "-224 | \n", + "-216 | \n", + "-84 | \n", + "-154 | \n", + "-84 | \n", + "-93 | \n", + "-29 | \n", + "-206 | \n", + "... | \n", + "-125 | \n", + "-153 | \n", + "-158 | \n", + "-219 | \n", + "-206 | \n", + "-137 | \n", + "-132 | \n", + "-129 | \n", + "-120 | \n", + "69 | \n", + "
1.657741e+09 | \n", + "-226 | \n", + "-151 | \n", + "-240 | \n", + "-230 | \n", + "-97 | \n", + "-144 | \n", + "-71 | \n", + "-95 | \n", + "-38 | \n", + "-236 | \n", + "... | \n", + "-105 | \n", + "-136 | \n", + "-149 | \n", + "-183 | \n", + "-210 | \n", + "-111 | \n", + "-83 | \n", + "-129 | \n", + "-92 | \n", + "116 | \n", + "
1.657741e+09 | \n", + "-235 | \n", + "-154 | \n", + "-250 | \n", + "-231 | \n", + "-54 | \n", + "-91 | \n", + "-81 | \n", + "-89 | \n", + "-30 | \n", + "-247 | \n", + "... | \n", + "-85 | \n", + "-107 | \n", + "-116 | \n", + "-140 | \n", + "-190 | \n", + "-68 | \n", + "-28 | \n", + "-114 | \n", + "-36 | \n", + "193 | \n", + "
... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "
1.657742e+09 | \n", + "-3 | \n", + "-27 | \n", + "-6 | \n", + "29 | \n", + "-227 | \n", + "-442 | \n", + "-1 | \n", + "67 | \n", + "25 | \n", + "-15 | \n", + "... | \n", + "-83 | \n", + "-217 | \n", + "-61 | \n", + "-248 | \n", + "-196 | \n", + "-63 | \n", + "-111 | \n", + "-211 | \n", + "-52 | \n", + "166 | \n", + "
1.657742e+09 | \n", + "44 | \n", + "19 | \n", + "44 | \n", + "82 | \n", + "-175 | \n", + "-407 | \n", + "13 | \n", + "95 | \n", + "62 | \n", + "38 | \n", + "... | \n", + "3 | \n", + "-112 | \n", + "32 | \n", + "-177 | \n", + "-123 | \n", + "22 | \n", + "-5 | \n", + "-147 | \n", + "54 | \n", + "285 | \n", + "
1.657742e+09 | \n", + "94 | \n", + "63 | \n", + "92 | \n", + "129 | \n", + "-121 | \n", + "-341 | \n", + "61 | \n", + "132 | \n", + "88 | \n", + "88 | \n", + "... | \n", + "62 | \n", + "-28 | \n", + "104 | \n", + "-99 | \n", + "-53 | \n", + "82 | \n", + "61 | \n", + "-62 | \n", + "125 | \n", + "347 | \n", + "
1.657742e+09 | \n", + "142 | \n", + "107 | \n", + "135 | \n", + "179 | \n", + "-106 | \n", + "-370 | \n", + "88 | \n", + "178 | \n", + "120 | \n", + "148 | \n", + "... | \n", + "113 | \n", + "48 | \n", + "199 | \n", + "-44 | \n", + "7 | \n", + "145 | \n", + "108 | \n", + "-13 | \n", + "213 | \n", + "453 | \n", + "
1.657742e+09 | \n", + "108 | \n", + "84 | \n", + "95 | \n", + "130 | \n", + "-82 | \n", + "-281 | \n", + "52 | \n", + "134 | \n", + "73 | \n", + "105 | \n", + "... | \n", + "97 | \n", + "46 | \n", + "169 | \n", + "-16 | \n", + "22 | \n", + "118 | \n", + "94 | \n", + "-3 | \n", + "175 | \n", + "348 | \n", + "
901529 rows × 28 columns
\n", + "