Export logger #875
Conversation
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB.
Another functionality question (maybe I just missed this): How does this handle a case where …
Thanks for your review, @samuelbray32. I've added a commit to incorporate
I haven't tested it, but my thinking was that the Kachery download process would make a copy of the file locally during the logged analysis, so it would then exist for the subsequent upload process. Is that not the case? If not, it might make sense to have the next step of 'prep file list for upload' fall back to Kachery in the event the file doesn't exist.
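A minimal sketch of that fallback idea (names here are hypothetical, not the actual spyglass API):

```python
import os


def prep_file_list(file_paths, kachery_download):
    """Collect files for upload, falling back to Kachery for missing ones.

    ``kachery_download`` is a stand-in for whatever call retrieves a file
    and returns its local path; it is an assumption, not spyglass API.
    """
    ready = []
    for path in file_paths:
        if not os.path.exists(path):
            # the logged analysis may have read the file via Kachery without
            # leaving a local copy, so fetch it now
            path = kachery_download(path)
        ready.append(path)
    return ready
```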
Fair. There is an edge case to handle, but it might be better to do so when making the dandiset:
Related: Should the …
One note on something we should either check for or warn people about:
src/spyglass/utils/dj_mixin.py
Outdated
```python
restr_str = make_condition(self, restr, set())

if isinstance(restr_str, str) and len(restr_str) > 2048:
    raise RuntimeError(
        "Export cannot handle restrictions > 2048.\n\t"
        + "If required, please open an issue on GitHub.\n\t"
        + f"Restriction: {restr_str}"
    )
self._export_table.Table.insert1(
    dict(
        export_id=self.export_id,
        table_name=self.full_table_name,
        restriction=make_condition(self, restr_str, set()),
    ),
    skip_duplicates=True,
)
```
When you get a list of restriction keys from the fetch call above, you get very long queries with multiple OR gates. Instead, add each restriction entry to the table separately:
```python
if not isinstance(restr, List):
    restr = [restr]
for r in restr:
    restr_str = make_condition(self, r, set())
    if isinstance(restr_str, str) and len(restr_str) > 2048:
        raise RuntimeError(
            "Export cannot handle restrictions > 2048.\n\t"
            + "If required, please open an issue on GitHub.\n\t"
            + f"Restriction: {restr_str}"
        )
    self._export_table.Table.insert1(
        dict(
            export_id=self.export_id,
            table_name=self.full_table_name,
            restriction=make_condition(self, restr_str, set()),
        ),
        skip_duplicates=True,
    )
```
I ran into issues with this approach when `restr` is an `AndList`, because `Table & A & B` would get logged as `Table & A` and `Table & B`.

It's also not clear to me how we benefit from separating `Or` lists, because the `RestrGraph` will have to merge each of the entries later on the same node. We'd be less likely to hit the varchar limit on the Selection table, but we'd still hit it on the Export table ... if someone is using `limit` in their fetch. `limit` is also not super predictable, so it seems like very much an edge case anyway.
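For reference, a small illustration of the semantics at stake (the restriction strings below are made up; only the DataJoint list-vs-`AndList` behavior is the point): restricting by a plain list ORs its elements, while an `AndList` ANDs them, so logging the pieces of an `AndList` as separate rows and re-applying them individually flips the AND to an OR.

```python
import datajoint as dj

# Example conditions -- placeholders, not from the PR
A = "curation_id = 0"
B = "nwb_file_name LIKE 'Winnie%'"

and_restr = dj.AndList([A, B])  # Table & and_restr -> rows matching A AND B
or_restr = [A, B]               # Table & or_restr  -> rows matching A OR B

# Splitting and_restr into separate Export.Table rows and later restricting
# the table by each entry independently reproduces or_restr, not and_restr.
```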
Might work with:

```python
if not isinstance(restr, Iterable) or isinstance(restr, dj.AndList):
    restr = [restr]
```

But I'm still dissuaded by the need to reassemble …
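A sketch of the reassembly step being referred to (illustrative only): if each restriction is logged as its own row, the graph has to OR them back together per table before building the export.

```python
def merge_restrictions(restrictions):
    """Combine separately-logged restriction strings into one OR condition."""
    return " OR ".join(f"({r})" for r in restrictions)


merge_restrictions(["paper_id='a'", "paper_id='b'"])
# -> "(paper_id='a') OR (paper_id='b')"
```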
Thanks for your review, @samuelbray32. I've made some edits to …
I hit a size error for the restriction at the …

To Replicate:

```python
import spyglass.spikesorting.v0
import spyglass.lfp.v1
import spyglass.spikesorting.analysis.v1.group
import spyglass
import spyglass.lfp.analysis.v1

paper_key = {"paper_id": "sb_export_live_test"}
Export().populate_paper(**paper_key)
```

Error Stack:

```
{
"name": "DataError",
"message": "(1406, \"Data too long for column 'restriction' at row 1\")",
"stack": "---------------------------------------------------------------------------
DataError Traceback (most recent call last)
Untitled-1.ipynb Cell 4 line 6
      4 import spyglass
      5 import spyglass.lfp.analysis.v1
----> 6 Export().populate_paper(**paper_key)
File ~/Documents/spyglass/src/spyglass/common/common_usage.py:202, in Export.populate_paper(self, paper_id)
200 if isinstance(paper_id, dict):
201 paper_id = paper_id.get(\"paper_id\")
--> 202 self.populate(ExportSelection().paper_export_id(paper_id))
File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/datajoint/autopopulate.py:241, in AutoPopulate.populate(self, suppress_errors, return_exception_objects, reserve_jobs, order, limit, max_calls, display_progress, processes, make_kwargs, *restrictions)
237 if processes == 1:
238 for key in (
239 tqdm(keys, desc=self.__class__.__name__) if display_progress else keys
240 ):
--> 241 error = self._populate1(key, jobs, **populate_kwargs)
242 if error is not None:
243 error_list.append(error)
File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/datajoint/autopopulate.py:292, in AutoPopulate._populate1(self, key, jobs, suppress_errors, return_exception_objects, make_kwargs)
290 self.__class__._allow_insert = True
291 try:
--> 292 make(dict(key), **(make_kwargs or {}))
293 except (KeyboardInterrupt, SystemExit, Exception) as error:
294 try:
File ~/Documents/spyglass/src/spyglass/common/common_usage.py:249, in Export.make(self, key)
244 self.write_export(
245 free_tables=restr_graph.all_ft, **paper_key, **version_key
246 )
248 self.insert1(key)
--> 249 self.Table().insert(table_inserts)
250 self.File().insert(file_inserts)
File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/datajoint/table.py:442, in Table.insert(self, rows, replace, skip_duplicates, ignore_extra_fields, allow_direct_insert)
426 try:
427 query = \"{command} INTO {destination}(`{fields}`) VALUES {placeholders}{duplicate}\".format(
428 command=\"REPLACE\" if replace else \"INSERT\",
429 destination=self.from_clause(),
(...)
440 ),
441 )
--> 442 self.connection.query(
443 query,
444 args=list(
445 itertools.chain.from_iterable(
446 (v for v in r[\"values\"] if v is not None) for r in rows
447 )
448 ),
449 )
450 except UnknownAttributeError as err:
451 raise err.suggest(
452 \"To ignore extra fields in insert, set ignore_extra_fields=True\"
453 )
File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/datajoint/connection.py:346, in Connection.query(self, query, args, as_dict, suppress_warnings, reconnect)
344 cursor = self._conn.cursor(cursor=cursor_class)
345 try:
--> 346 self._execute_query(cursor, query, args, suppress_warnings)
347 except errors.LostConnectionError:
348 if not reconnect:
File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/datajoint/connection.py:302, in Connection._execute_query(cursor, query, args, suppress_warnings)
300 cursor.execute(query, args)
301 except client.err.Error as err:
--> 302 raise translate_query_error(err, query)
File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/datajoint/connection.py:300, in Connection._execute_query(cursor, query, args, suppress_warnings)
297 if suppress_warnings:
298 # suppress all warnings arising from underlying SQL library
299 warnings.simplefilter(\"ignore\")
--> 300 cursor.execute(query, args)
301 except client.err.Error as err:
302 raise translate_query_error(err, query)
File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/pymysql/cursors.py:158, in Cursor.execute(self, query, args)
154 pass
156 query = self.mogrify(query, args)
--> 158 result = self._query(query)
159 self._executed = query
160 return result
File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/pymysql/cursors.py:325, in Cursor._query(self, q)
323 conn = self._get_db()
324 self._clear_result()
--> 325 conn.query(q)
326 self._do_get_result()
327 return self.rowcount
File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/pymysql/connections.py:549, in Connection.query(self, sql, unbuffered)
547 sql = sql.encode(self.encoding, \"surrogateescape\")
548 self._execute_command(COMMAND.COM_QUERY, sql)
--> 549 self._affected_rows = self._read_query_result(unbuffered=unbuffered)
550 return self._affected_rows
File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/pymysql/connections.py:779, in Connection._read_query_result(self, unbuffered)
777 else:
778 result = MySQLResult(self)
--> 779 result.read()
780 self._result = result
781 if result.server_status is not None:
File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/pymysql/connections.py:1157, in MySQLResult.read(self)
1155 def read(self):
1156 try:
-> 1157 first_packet = self.connection._read_packet()
1159 if first_packet.is_ok_packet():
1160 self._read_ok_packet(first_packet)
File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/pymysql/connections.py:729, in Connection._read_packet(self, packet_type)
727 if self._result is not None and self._result.unbuffered_active is True:
728 self._result.unbuffered_active = False
--> 729 packet.raise_for_error()
730 return packet
File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/pymysql/protocol.py:221, in MysqlPacket.raise_for_error(self)
219 if DEBUG:
220 print(\"errno =\", errno)
--> 221 err.raise_mysql_exception(self._data)
File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/pymysql/err.py:143, in raise_mysql_exception(data)
141 if errorclass is None:
142 errorclass = InternalError if errno < 1000 else OperationalError
--> 143 raise errorclass(errno, errval)
DataError: (1406, \"Data too long for column 'restriction' at row 1\")"
}
```
Thanks for testing! After reviewing the output, I decided to bump the limits on the downstream table. Your demo analysis has been populated on the prod db.
Ran into a new issue when exporting a paper id with just-fetched spikesorting data, where export tried to insert duplicate table entries for the common lab table.

```python
paper_key = {"paper_id": "sb_spikes_test"}
ExportSelection().start_export(**paper_key, analysis_id="analysis1")

from spyglass.spikesorting.analysis.v1.group import SortedSpikesGroup

key = {
    "nwb_file_name": "Winnie20220717_.nwb",
    "sorted_spikes_group_name": "08_lineartrack",
}
group_key = (SortedSpikesGroup() & key).fetch1("KEY")
SortedSpikesGroup().fetch_spike_data(group_key)

Export().populate_paper(**paper_key)
```
Error Stack:

```
{
  "name": "DuplicateError",
  "message": "(\"Duplicate entry '`common_lab`.`lab`' for key '__export__table.table_name'\", 'To ignore duplicate entries in insert, set skip_duplicates=True')",
  "stack": "---------------------------------------------------------------------------
DuplicateError                            Traceback (most recent call last)
Cell In[12], line 1
----> 1 Export().populate_paper(**paper_key)

File ~/Documents/spyglass/src/spyglass/common/common_usage.py:208, in Export.populate_paper(self, paper_id)

File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/datajoint/autopopulate.py:241, in AutoPopulate.populate(self, suppress_errors, return_exception_objects, reserve_jobs, order, limit, max_calls, display_progress, processes, make_kwargs, *restrictions)

File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/datajoint/autopopulate.py:292, in AutoPopulate._populate1(self, key, jobs, suppress_errors, return_exception_objects, make_kwargs)

File ~/Documents/spyglass/src/spyglass/common/common_usage.py:256, in Export.make(self, key)

File ~/mambaforge-pypy3/envs/spyglass/lib/python3.9/site-packages/datajoint/table.py:455, in Table.insert(self, rows, replace, skip_duplicates, ignore_extra_fields, allow_direct_insert)

DuplicateError: ("Duplicate entry '
```
Thanks, Sam. I've added a commit to fix uniqueness enforcement within …
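For context, a simplified sketch of what those two fixes amount to (the real definition lives in `spyglass.common.common_usage` and differs in detail; the schema name and `table_id` attribute here are assumptions): the `restriction` column becomes a `mediumblob` so long conditions fit, and uniqueness spans `(export_id, table_name)` rather than `table_name` alone, so a shared table like `common_lab`.`lab` can appear once per export.

```python
import datajoint as dj

schema = dj.schema("common_usage")  # schema name assumed for illustration


@schema
class Export(dj.Manual):  # the real table is auto-populated; simplified here
    definition = """
    export_id: int
    """

    class Table(dj.Part):
        definition = """
        -> master
        table_id: int                        # assumed surrogate id
        ---
        table_name: varchar(128)
        restriction: mediumblob              # bumped from varchar to avoid error 1406
        unique index (export_id, table_name)
        """
```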
* Add spyglass version to created analysis nwb files (#897)
* Add sg version to created analysis nwb files
* update changelog
* Change existing source script to spyglass version (#900)
* Add pynapple support (#898)
* Preliminary code
* Add retrieval of file names
* Add get_nwb_table function
* Update docstrings
* Update CHANGELOG.md
* Hot fixes for clusterless `get_ahead_behind_distance` and `get_spike_times` (#904)
* Squeeze results
* Make method and not class method
* Update CHANGELOG.md
* fix bugs in fetch_nwb (#913)
* Check for entry in merge part table prior to insert (#922)
* check for entry in merge part table prior to insert
* update changelog
* Kachery fixes (#918)
* Prioritize datajoint filepath for getting analysis file abs_path
* remove deprecated kachery tables
* update changelog
* fix lint

---------

Co-authored-by: Samuel Bray <[email protected]>
Co-authored-by: Eric Denovellis <[email protected]>

* remove old tables from init (#925)
* Fix improper uses of strip (#929) Strip will remove leading characters
* Update CHANGELOG.md
* Misc Issues (#903)
* #892
* #885
* #879
* Partial address of #860
* Update Changelog
* Partial solve of #886 - Ask import
* Fix failing tests
* Add note on order of inheritace
* #933
* Could not replicate fill_nan error. Reverting except clause
* Export logger (#875)
* WIP: rebase Export process
* WIP: revise doc
* ✅ : Generate working export script
* Cleanup: Expand notebook, migrate export process from graph class to export
* Revert dj_chains related edits
* Update changelog
* Revise doc
* Address review comments #875
* Remove walrus in eval
* prevent log on preview
* Fix arg order on fetch, iterate over restr
* Add upstream analysis files during cascade. Address false positive fetch
* Avoid regen file list on revisit node
* Bump Export.Table.restr to mediumblob
* Revise Export.Table uniqueness to include export_id
* Spikesorting quality of life helpers (#910)
* add utitlity function for finding spikesorting merge ids
* add option to select v1 sorts that didn't go through artifact detection
* add option to return merge keys as dicts for future restrictions
* Add tool to get brain region and electrode info for a spikesorting merge id
* update changelog
* style cleanup
* style cleanup
* fix restriction bug for curation_id
* account for change or radiu_um argument name in spikeinterface
* only do joins with metric curastion tables if have relevant keys in the restriction
* Update tutorial to use spikesorting merge table helper functions
* fix spelling
* Add logging of AnalysisNwbfile creation time and file size (#937)
* Add logging for any func that creates AnalysisNwbfile
* Migrate create to top of respective funcs
* Use pathlib for file size. Bump creation time to top of in spikesort
* Clear pre_create_time on create
* get/del -> pop
* Log when file accessed (#941)
* Add logging for any func that creates AnalysisNwbfile
* Fix bug on empty delete in merge table (#940)
* fix bug on empty delete in merge table
* update changelog
* fix spelling

---------

Co-authored-by: Chris Brozdowski <[email protected]>

* Remove master restriction
* Part delete takes restriction from self

---------

Co-authored-by: Samuel Bray <[email protected]>
Co-authored-by: Eric Denovellis <[email protected]>
Co-authored-by: Samuel Bray <[email protected]>
Co-authored-by: Eric Denovellis <[email protected]>
* Create class for group parts to help propagate deletes
* spelling
* update changelog
* Part delete edits (#946)
* Add spyglass version to created analysis nwb files (#897)
* Add sg version to created analysis nwb files
* update changelog
* Change existing source script to spyglass version (#900)
* Add pynapple support (#898)
* Preliminary code
* Add retrieval of file names
* Add get_nwb_table function
* Update docstrings
* Update CHANGELOG.md
* Hot fixes for clusterless `get_ahead_behind_distance` and `get_spike_times` (#904)
* Squeeze results
* Make method and not class method
* Update CHANGELOG.md
* fix bugs in fetch_nwb (#913)
* Check for entry in merge part table prior to insert (#922)
* check for entry in merge part table prior to insert
* update changelog
* Kachery fixes (#918)
* Prioritize datajoint filepath for getting analysis file abs_path
* remove deprecated kachery tables
* update changelog
* fix lint

---------

Co-authored-by: Samuel Bray <[email protected]>
Co-authored-by: Eric Denovellis <[email protected]>

* remove old tables from init (#925)
* Fix improper uses of strip (#929) Strip will remove leading characters
* Update CHANGELOG.md
* Misc Issues (#903)
* #892
* #885
* #879
* Partial address of #860
* Update Changelog
* Partial solve of #886 - Ask import
* Fix failing tests
* Add note on order of inheritace
* #933
* Could not replicate fill_nan error. Reverting except clause
* Export logger (#875)
* WIP: rebase Export process
* WIP: revise doc
* ✅ : Generate working export script
* Cleanup: Expand notebook, migrate export process from graph class to export
* Revert dj_chains related edits
* Update changelog
* Revise doc
* Address review comments #875
* Remove walrus in eval
* prevent log on preview
* Fix arg order on fetch, iterate over restr
* Add upstream analysis files during cascade. Address false positive fetch
* Avoid regen file list on revisit node
* Bump Export.Table.restr to mediumblob
* Revise Export.Table uniqueness to include export_id
* Spikesorting quality of life helpers (#910)
* add utitlity function for finding spikesorting merge ids
* add option to select v1 sorts that didn't go through artifact detection
* add option to return merge keys as dicts for future restrictions
* Add tool to get brain region and electrode info for a spikesorting merge id
* update changelog
* style cleanup
* style cleanup
* fix restriction bug for curation_id
* account for change or radiu_um argument name in spikeinterface
* only do joins with metric curastion tables if have relevant keys in the restriction
* Update tutorial to use spikesorting merge table helper functions
* fix spelling
* Add logging of AnalysisNwbfile creation time and file size (#937)
* Add logging for any func that creates AnalysisNwbfile
* Migrate create to top of respective funcs
* Use pathlib for file size. Bump creation time to top of in spikesort
* Clear pre_create_time on create
* get/del -> pop
* Log when file accessed (#941)
* Add logging for any func that creates AnalysisNwbfile
* Fix bug on empty delete in merge table (#940)
* fix bug on empty delete in merge table
* update changelog
* fix spelling

---------

Co-authored-by: Chris Brozdowski <[email protected]>

* Remove master restriction
* Part delete takes restriction from self

---------

Co-authored-by: Samuel Bray <[email protected]>
Co-authored-by: Eric Denovellis <[email protected]>
Co-authored-by: Samuel Bray <[email protected]>
Co-authored-by: Eric Denovellis <[email protected]>

* Fix linting

---------

Co-authored-by: Chris Brozdowski <[email protected]>
Co-authored-by: Eric Denovellis <[email protected]>
Co-authored-by: Samuel Bray <[email protected]>
Co-authored-by: Eric Denovellis <[email protected]>
Description
The goal of this work is to provide a means of starting and stopping a logging mechanism that captures enough information to generate (1) a sqldump of the subset of the database required for a given analysis and (2) the list of analysis files required for that analysis.
This partially addresses #861. Specifically, points 1 and 2.
- `common_usage` is expanded to include this functionality. Is this the best place for these tables?
- `export_dir` is added as a new location in `settings`, which will be home to a subdirectory for each paper. @samuelbray32 - do you think this will work for the file linking and upload process?
- `dj_graph` is added as a new util for cascading restrictions up a graph and merging restrictions across leaf nodes. Instrumental in creating a vertical slice.
- `unique_dicts` is added as a new helper function: take a list of dicts and return a list with only unique entries. This is used to eliminate duplication both before generating the `RestrGraph` and within it (a sketch follows the checklist).
- `fetch_nwb` … for readability. This reduces the control flow of 'which table' to a map that is expandable.
- `Export.populate` … when eliminating other entries for a given paper.
- `dj_mixin` is expanded to add logging whenever `fetch_nwb`, `fetch`, or `fetch1` are called. It keeps track of an ongoing export with an env variable, which makes exports not thread safe (a sketch follows the checklist).

Checklist:

- `CITATION.cff`
- `CHANGELOG.md`
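As referenced above, a minimal sketch of `unique_dicts` matching its description (the actual implementation may differ):

```python
def unique_dicts(dicts):
    """Return the list of dicts with duplicates removed, preserving order."""
    unique = []
    for d in dicts:
        if d not in unique:
            unique.append(d)
    return unique
```

And a hypothetical sketch of the `dj_mixin` logging hook; the variable and method names are invented for illustration, and only the env-variable mechanism and its thread-safety consequence come from the description above. Because the active export id lives in a process-wide environment variable, two exports cannot safely run concurrently in the same process.

```python
import os


class ExportLoggingMixin:
    """Assumes it is mixed into a DataJoint table (which provides
    ``full_table_name`` and ``restriction``)."""

    _export_env_var = "SPYGLASS_EXPORT_ID"  # assumed variable name

    @property
    def export_id(self) -> int:
        # 0 means "no export in progress"
        return int(os.environ.get(self._export_env_var, 0))

    def _log_fetch(self):
        if not self.export_id:
            return  # nothing to record outside an active export
        # The real mixin inserts into the export tables; printing stands in here.
        print(f"export {self.export_id}: {self.full_table_name} & {self.restriction}")

    def fetch(self, *args, **kwargs):
        self._log_fetch()
        return super().fetch(*args, **kwargs)

    def fetch1(self, *args, **kwargs):
        self._log_fetch()
        return super().fetch1(*args, **kwargs)
```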