Breaking: merge instrument and rig, make rig accept components: List[Union[]] #1238

Draft: wants to merge 35 commits into release-v2.0.0

Commits (35):
5bceb2c  breaking-change: merge instrument/rig and convert rig to List[Optiona… (dbirman, Jan 14, 2025)
8d9bf81  examples: yikes (dbirman, Jan 14, 2025)
33d1f9d  refactor: rename Rig to Instrument, fixes to tests (dbirman, Jan 16, 2025)
9d812f2  chore: additional renaming (dbirman, Jan 16, 2025)
9da2d09  chore: more renaming and fixes (dbirman, Jan 16, 2025)
f936728  tests: fixing more tests (dbirman, Jan 17, 2025)
3d06811  tests: re-generate examples, fix missing Objective in test (dbirman, Jan 17, 2025)
d51a3fb  fix: update to support aind-data-schema-models v1 (dbirman, Jan 17, 2025)
26b3b35  Merge branch 'release-v2.0.0' into 1172-20-replace-optional-fields-wi… (dbirman, Jan 24, 2025)
b4b5a89  feat: deprecating platform (dbirman, Jan 24, 2025)
85e2269  chore: lint (dbirman, Jan 24, 2025)
5280c39  chore: lint (dbirman, Jan 24, 2025)
8e403f3  chore: fixing some typos, refactor modality->modalities, fixing a lin… (dbirman, Jan 24, 2025)
e08aceb  chore: various typos (dbirman, Jan 24, 2025)
9dddfa2  tests: working on fixing tests for instrument changes (dbirman, Jan 27, 2025)
28ab3b9  refactor: fixing name generation to pass data description tests (dbirman, Jan 27, 2025)
d72618f  refactor: use fromisoformat and fix tests from data_description (dbirman, Jan 27, 2025)
3a9ceab  refactor: missing renames for AindCoreModel (dbirman, Jan 27, 2025)
08471b4  fix: typos in test_metadata (dbirman, Jan 27, 2025)
b792046  tests: fix import issue in examples (dbirman, Jan 27, 2025)
13888d6  tests: re-generate examples (dbirman, Jan 27, 2025)
cf79a30  tests: fixing label issues (dbirman, Jan 27, 2025)
ac35a65  fix: fixing a bug in name generator (dbirman, Jan 27, 2025)
bc5c61f  tests: missing objective for valid instrument (dbirman, Jan 27, 2025)
7602294  tests: fixing various tests (dbirman, Jan 27, 2025)
f0bb384  chore: bump version numbers and re-generate examples (dbirman, Jan 27, 2025)
1396653  chore: remove dead examples files (dbirman, Jan 27, 2025)
695085d  fix: remove device_type fields (came from a bad merge) (dbirman, Jan 27, 2025)
c45b5e6  fix: bad merge renamed rig files, renaming to instrument and re-generate (dbirman, Jan 27, 2025)
1ff075e  tests: more test fixing (dbirman, Jan 27, 2025)
de6ddf1  tests: additional fixes (dbirman, Jan 27, 2025)
6503010  tests: final fixes to compatibility check code (dbirman, Jan 27, 2025)
dae201c  chore: lint (dbirman, Jan 27, 2025)
ab85604  chore: lint (dbirman, Jan 27, 2025)
8dd0432  chore: bump adsm to 2.0 (dbirman, Jan 27, 2025)

2 changes: 1 addition & 1 deletion README.md
@@ -115,7 +115,7 @@ If you'd like to propose a large change or addition, or generally have a questio

## Controlled Vocabularies

Controlled vocabularies and other enumerated lists are maintained in a separate repository: [aind-data-schema-models](https://github.com/AllenNeuralDynamics/aind-data-schema-models). This allows us to specify these lists without changing aind-data-schema. Controlled vocabularies include lists of organizations, manufacturers, species, modalities, platforms, units, harp devices, and registries.
Controlled vocabularies and other enumerated lists are maintained in a separate repository: [aind-data-schema-models](https://github.com/AllenNeuralDynamics/aind-data-schema-models). This allows us to specify these lists without changing aind-data-schema. Controlled vocabularies include lists of organizations, manufacturers, species, modalities, units, harp devices, and registries.

To upgrade to the latest data models version:
```
4 changes: 1 addition & 3 deletions docs/source/conf.py
@@ -18,7 +18,6 @@
metadata,
procedures,
processing,
rig,
session,
subject,
quality_control,
@@ -31,7 +30,7 @@
metadata,
procedures,
processing,
rig,
instrument,
session,
subject,
quality_control,
@@ -88,7 +87,6 @@
"metadata.nd",
"procedures",
"processing",
"rig",
"session",
"subject",
"quality_control",
30 changes: 0 additions & 30 deletions docs/source/data_description.rst
@@ -11,26 +11,9 @@ data modalities, dates of collection, and more.
The data description is created during data transfer based on information you provide to that service and pulling
information from internal resources.

**Q: What is the difference between modality and platform?**

Modalities are types of data being collected. A platform is a standardized way of collecting one or more modalities of
data that we give a name. Platform standardization -- of file formats, hardware setup, etc. -- enables us to automatically
and reliably process data with centrally managed data pipelines.

Example 1: the behavior platform leverages Harp and Bonsai to run behavioral experiments and acquire multiple
modalities of data (behavior videos, electrophysiology, photometry, etc).

Example 2: We use SmartSPIM lightsheet microscopes to collect mesoscale whole-brain neuroanatomy data. This
is a single-modality (SPIM) platform (mesoscale anatomy, SmartSPIM colloquially).

Questions for AIND users
------------------------

**Q: What platform should I use?**

There is a controlled vocabulary in `aind-data-schema-models <https://github.com/AllenNeuralDynamics/aind-data-schema-models>`_.
Pick the one that most closely aligns with how you have collected data. If none exists, talk to Saskia de Vries or David Feng.

**Q: This data is for an AIND project and not part of a grant. Shouldn’t the funder be AIND?**

No. The funding for internally funded AIND or AIBS work is listed as “Allen Institute”.
@@ -44,16 +27,3 @@ to make sure your grant is on that sheet.

In the future we may need to tag cloud resources based on the originating
group, which may or may not be in AIND, in order to track usage and spending.


**Q: What happened to the “experiment type” asset label? Why are we using platform names instead?**

Formerly we used a short label called “experiment type” in asset names instead of platform
names. This concept was confusing because it was difficult to distinguish from a “modality”.
Most of our data contains multiple modalities. A recording session may contain trained behavior
event data (e.g. lick times), behavior videos (e.g. face camera), neuropixels recordings, and
fiber photometry recordings.

Anchoring browsing on data collection platforms is clearer. We will tag sessions in our metadata
database to indicate which modalities are present in which sessions.

20 changes: 8 additions & 12 deletions docs/source/data_organization.rst
@@ -86,9 +86,7 @@ name be unique, but we should not use this name to encode essential metadata.

All primary data assets have the following naming convention:

<platform-abbreviation>_<subject-id>_<acquisition-date>_<acquisition-time>

A platform is a standardized system for collecting one or more modalities of data.
<subject-id>_<acquisition-date>_<acquisition-time>

A few points:

@@ -97,8 +95,7 @@ A few points:
- Acquisition date and time are essential for uniqueness
- Acquisition date and time are in local time zone
- Time-zone is documented in metadata
- All tokens (e.g. ``<platform-abbreviation>``, ``<subject-id>``) must not contain underscores or illegal filename characters.
- ``<platform-abbreviation>``: a shorthand of fewer than 10 characters for a data acquisition platform
- All tokens (e.g. ``<subject-id>``) must not contain underscores or illegal filename characters.

Again, this name is strictly for uniqueness. We could use a GUID, but choose
to have a relatively simple naming convention to facilitate casual browsing.
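The platform-free naming convention above can be sketched in Python. This is a minimal sketch, assuming the "yyyy-mm-dd" and "hh-mm-ss" local-time tokens used in the example name 655568_2022-04-26_11-48-09; `build_asset_name` and the `FORBIDDEN` character set (the `_` separator plus characters that are illegal in Windows file names) are hypothetical and are not part of aind-data-schema.

```python
import re
from datetime import datetime

# Hypothetical set of forbidden token characters: the "_" separator plus
# characters that are illegal in Windows file names.
FORBIDDEN = re.compile(r'[_<>:"/\\|?*]')


def build_asset_name(subject_id: str, acquired: datetime) -> str:
    """Hypothetical helper returning <subject-id>_<acquisition-date>_<acquisition-time>.

    `acquired` should already be in the local time zone of the acquisition;
    the time zone itself belongs in the metadata, not in the name.
    """
    if not subject_id or FORBIDDEN.search(subject_id):
        raise ValueError(f"invalid subject-id token: {subject_id!r}")
    date_token = acquired.strftime("%Y-%m-%d")  # yyyy-mm-dd
    time_token = acquired.strftime("%H-%M-%S")  # hh-mm-ss
    return "_".join((subject_id, date_token, time_token))


print(build_asset_name("655568", datetime(2022, 4, 26, 11, 48, 9)))
# prints 655568_2022-04-26_11-48-09
```

Rejecting bad tokens at build time, rather than sanitizing them, keeps the name reversible: a subject ID such as "655_568" raises instead of silently producing a four-token name.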
@@ -123,7 +120,7 @@ Primary data assets are organized as follows:
- logs (general log files generated by the instrument or rig that are not modality-specific)
- <list of files>

Platform abbreviation and modality terms come from controlled vocabularies in aind-data-schema-models.
Modality terms come from controlled vocabularies in aind-data-schema-models.

Example for simultaneous electrophysiology with optotagging and fiber photometry:

@@ -145,9 +142,9 @@ Example for simultaneous electrophysiology with optotagging and fiber photometry
- face_camera.mp4
- body_camera.mp4

Example for lightsheet microscopy data acquired on the ExaSPIM platform:
Example for lightsheet microscopy data:

- exaSPIM_655568_2022-04-26_11-48-09
- 655568_2022-04-26_11-48-09
- <metadata JSON files>
- SPIM
- SPIM.ome.zarr
@@ -200,10 +197,9 @@ File name guidelines
When naming files, we should:

- use terms from vocabularies defined in aind-data-schema, e.g.
- platform names and modalities behavior video file names
- modalities, etc
- use isoformat datetimes, e.g. "YYYY-MM-DDThhmmss"
- use "yyyy-mm-dd" and "hh-mm-ss" in the local time zone for dates and times
- separate tokens with underscores, and not include underscores in tokens, e.g.
- Do this: ``EFIP_655568_2022-04-26_11-48-09``
- Not this: ``EFIP-655568-2022_04_26-11_48_09``
- Do this: ``EFIP_655568_2022-04-26T114809``
- Do not include illegal filename characters in tokens
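Because tokens are separated by underscores and never contain them, a name can be split back into its parts without ambiguity. The sketch below assumes the three-token convention <subject-id>_<acquisition-date>_<acquisition-time> with an "hh-mm-ss" time token (not the ISO-style "T114809" variant); `parse_asset_name` is a hypothetical helper, not an aind-data-schema function.

```python
from datetime import datetime
from typing import Tuple


def parse_asset_name(name: str) -> Tuple[str, datetime]:
    """Hypothetical inverse of the naming convention.

    Returns (subject_id, acquisition datetime in local time). Raises
    ValueError if the name does not have exactly three underscore-separated
    tokens or if the date/time tokens are malformed.
    """
    tokens = name.split("_")
    if len(tokens) != 3:
        raise ValueError(f"expected 3 tokens, got {len(tokens)}: {name!r}")
    subject_id, date_token, time_token = tokens
    acquired = datetime.strptime(f"{date_token} {time_token}", "%Y-%m-%d %H-%M-%S")
    return subject_id, acquired


print(parse_asset_name("655568_2022-04-26_11-48-09"))
# prints ('655568', datetime.datetime(2022, 4, 26, 11, 48, 9))

# A platform-free variant of the "Not this" example splits into 5 tokens.
try:
    parse_asset_name("655568-2022_04_26-11_48_09")
except ValueError as err:
    print(err)
```

Putting underscores inside the date and time tokens is what breaks the split, which is why the guidelines reserve "_" for separating tokens.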

5 changes: 2 additions & 3 deletions docs/source/example_workflow/example_workflow.py
@@ -3,7 +3,6 @@
import pandas as pd
from aind_data_schema_models.modalities import Modality
from aind_data_schema_models.organizations import Organization
from aind_data_schema_models.platforms import Platform

from aind_data_schema.core.data_description import Funding, RawDataDescription
from aind_data_schema.core.procedures import NanojectInjection, Perfusion, Procedures, Surgery, ViralMaterial
@@ -31,13 +30,13 @@
for session_idx, session in sessions_df.iterrows():
# our data always contains planar optical physiology and behavior videos
d = RawDataDescription(
modality=[Modality.POPHYS, Modality.BEHAVIOR_VIDEOS],
platform=Platform.BEHAVIOR,
modalities=[Modality.POPHYS, Modality.BEHAVIOR_VIDEOS],
subject_id=str(session["mouse_id"]),
creation_time=session["end_time"].to_pydatetime(),
institution=Organization.OTHER,
experimenters=[experimenter],
funding_source=[Funding(funder=Organization.NIMH)],
investigators=[experimenter],
)

# we will store our json files in a directory named after the session
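The diff above implies two field-level changes to the data description: `modality` becomes `modalities`, and `platform` is dropped. A minimal sketch of applying just those two changes to a plain dictionary follows. `upgrade_data_description` is hypothetical and is not the project's upgrade path, the string values are placeholders standing in for serialized Modality and Platform values, and other v2 changes (such as the rig-to-instrument rename) are out of scope.

```python
import copy
from typing import Any, Dict


def upgrade_data_description(old: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: rename `modality` -> `modalities` and drop `platform`.

    Works on a deep copy so the input dictionary is left unchanged.
    """
    new = copy.deepcopy(old)
    new.pop("platform", None)  # platform is removed by this change
    if "modality" in new:
        if "modalities" in new:
            raise ValueError("both 'modality' and 'modalities' are present")
        new["modalities"] = new.pop("modality")
    return new


# Placeholder values, not real serialized schema output.
old = {"subject_id": "655568", "modality": ["POPHYS", "BEHAVIOR_VIDEOS"], "platform": "BEHAVIOR"}
print(upgrade_data_description(old))
# prints {'subject_id': '655568', 'modalities': ['POPHYS', 'BEHAVIOR_VIDEOS']}
```

Raising when both keys are present avoids silently overwriting a record that was already partially migrated.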