Add CsvEpochSource and EpochEncoder enhancements #13
Since Excel assumes commas are the delimiter when the file extension is CSV, the use of tabs as the delimiter in the epoch encoder example results in output files that are read incorrectly by Excel. This could be resolved by either changing the file extension to TSV or by using commas with the extension CSV. This change does the latter.
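A minimal sketch of the delimiter choice, using only the standard library. The column names and rows are illustrative assumptions, not the example's actual data.

```python
import csv
import io

# Hypothetical epoch rows: (time, duration, label)
rows = [(0.0, 1.5, "rest"), (1.5, 0.5, "move")]


def write_epochs(rows, delimiter=","):
    """Serialize epoch rows to text using the given delimiter."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["time", "duration", "label"])
    writer.writerows(rows)
    return buf.getvalue()


comma_text = write_epochs(rows)      # matches a .csv extension
tab_text = write_epochs(rows, "\t")  # would match a .tsv extension instead
```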
CsvEpochSource was originally implemented for the purpose of demonstrating how to create a subclass of WritableEpochSource. However, if `'name': 'animal_state'` is removed, it's fully general and could be very useful to anyone interested in saving epochs using CSVs. Consequently, it's a great candidate for adoption into the main code. This change migrates the implementation of CsvEpochSource to the main code so that users do not need to implement this subclass themselves. They are still invited to create alternative implementations of subclasses of WritableEpochSource that use different file types. Dependence of ephyviewer on pandas is avoided by requiring pandas only when CsvEpochSource is instantiated.
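The deferred dependency could look roughly like the sketch below. The class name and the `required_module` attribute are hypothetical, chosen only to illustrate importing an optional package at instantiation time rather than at module import time.

```python
import importlib


class LazyDependencySource:
    """Hypothetical stand-in for a source class that needs an optional package."""

    required_module = "pandas"  # assumption: import name of the optional dependency

    def __init__(self, filename):
        # Import only when an instance is created, so that importing the
        # surrounding module never requires the optional package.
        try:
            self._backend = importlib.import_module(self.required_module)
        except ImportError as err:
            raise ImportError(
                f"{type(self).__name__} requires the optional package "
                f"'{self.required_module}'"
            ) from err
        self.filename = filename
```

Users who never instantiate the class are unaffected by a missing package; those who do get an explicit error message.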
Accessible via inherited method get_channel_name().
I also propose making |
This change allows epoch=None when creating a WritableEpochSource object. In this case, the new load() method is called to build an empty dictionary containing the appropriate keys and data types. Like the save() method, load() can be overridden in subclasses of WritableEpochSource to load epoch data from arbitrary sources. CsvEpochSource implements an example of this.
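A simplified sketch of this pattern follows. The class names are hypothetical, and plain lists stand in for the typed arrays the real class would use.

```python
class WritableSourceSketch:
    """Hypothetical, simplified stand-in for WritableEpochSource."""

    def __init__(self, epoch=None):
        # With epoch=None, defer to load(), which subclasses may override.
        self.epoch = self.load() if epoch is None else epoch

    def load(self):
        # Default: an empty epoch dictionary with the expected keys.
        return {"time": [], "duration": [], "label": [], "name": "epochs"}


class PreloadedSourceSketch(WritableSourceSketch):
    """A subclass may load from any source; here the data are hard-coded."""

    def load(self):
        return {"time": [0.0], "duration": [2.0], "label": ["rest"], "name": "epochs"}
```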
I think I came up with an alternative idea that is fully backwards compatible. Changing Instead, I made the constructor parameter |
(1) `WritableEpochSource._clean_and_set` now discards epochs with duration shorter than 1 microsecond. (2) `CsvEpochSource.save` now rounds time and duration to the nearest microsecond before writing to file. (3) `EpochEncoder.refresh_table` now rounds start, stop, and duration to the nearest microsecond before displaying in the data table. These changes (1) prevent the creation of spurious epochs resulting from floating point arithmetic. They also improve display of floating point numbers in both (2) saved files and (3) the EpochEncoder's data table.
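To illustrate why these steps matter, here is a small pure-Python sketch. The function names are hypothetical and epochs are represented as `(time, duration, label)` tuples rather than the arrays used in the actual code.

```python
def drop_spurious(epochs, min_duration=1e-6):
    """Discard epochs shorter than one microsecond."""
    return [(t, d, lab) for (t, d, lab) in epochs if d >= min_duration]


def round_epochs(epochs, decimals=6):
    """Round time and duration to the nearest microsecond."""
    return [(round(t, decimals), round(d, decimals), lab) for (t, d, lab) in epochs]


# (0.1 + 0.2) - 0.3 is a tiny positive number, not zero, so naive
# arithmetic leaves behind a spurious epoch "b".
epochs = [(0.0, 0.3, "a"), (0.3, (0.1 + 0.2) - 0.3, "b"), (0.1 + 0.2, 1.0, "c")]
cleaned = round_epochs(drop_spurious(epochs))
```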
Algorithms in `InMemoryEpochSource.get_chunk_by_time`, `WritableEpochSource.add_epoch`, `WritableEpochSource.delete_in_between`, and `WritableEpochSource.merge_neighbors` implicitly assumed that epochs never overlap. This commit rewrites them to allow for the possibility of overlapping epochs. Added sorting to `WritableEpochSource._clean_and_set`, since these modifications were simplest when not trying to maintain temporal ordering during operations (i.e., use of `np.append` instead of `insert_item`). Removed redundant epoch deletion code from `add_epoch`. Added calls to `delete_in_between` before each `add_epoch` call in EpochEncoder so the behavior remains the same. `WritableEpochSource.fill_blank` was not modified and still implicitly assumes that epochs do not overlap.
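A rough pure-Python sketch of overlap-tolerant chunk selection is shown below. The function name and the exact boundary conditions are assumptions; the actual code works on NumPy arrays with boolean masks.

```python
def overlapping_indices(ep_times, ep_durations, t_start, t_stop):
    """Return indices of epochs that intersect the window from t_start to t_stop."""
    keep = []
    for i, (start, dur) in enumerate(zip(ep_times, ep_durations)):
        stop = start + dur
        starts_inside = t_start <= start < t_stop
        ends_inside = t_start < stop <= t_stop
        spans_window = start <= t_start and stop > t_stop
        if starts_inside or ends_inside or spans_window:
            keep.append(i)
    return keep


# Epoch 0 spans the whole recording and overlaps all the others.
times = [0.0, 1.0, 2.0, 10.0]
durations = [20.0, 0.5, 5.0, 1.0]
visible = overlapping_indices(times, durations, 3.0, 4.0)
```

Because every epoch is tested independently, the result does not depend on the epochs being sorted or mutually exclusive.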
Removed assertion in `WritableEpochSource.__init__` that epochs do not overlap. Removed code in `CsvEpochSource.load` for working around inadvertent overlap caused by floating point arithmetic problems. Added `remove_old_when_inserting_new` boolean parameter to EpochEncoder. When True (default), existing epochs are deleted when new epochs are created using shortcut keys or the region selector (this was the old behavior). When False, existing epochs are not deleted, resulting in overlapping epochs.
What the EpochEncoder does about existing epochs when a new one is created (deletes them or not) is determined by the `remove_old_when_inserting_new` parameter. With this commit, the behavior can be temporarily switched by holding the Shift key when pressing a shortcut key. If `remove_old_when_inserting_new` is True, pressing a shortcut key without the modifier key will delete epochs that overlap with the new epoch. Holding Shift will prevent this deletion and allow overlapping. The logic is inverted if `remove_old_when_inserting_new` is False (i.e., Shift can be used to force deletion).
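The modifier logic described above amounts to an exclusive-or. A minimal sketch, with a hypothetical function name:

```python
def should_delete_old(remove_old_when_inserting_new, shift_held):
    """Holding Shift temporarily inverts the configured behavior (logical XOR)."""
    return remove_old_when_inserting_new != shift_held
```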
Beginning with e257e14, I've started addressing #16 by developing on top of the changes already proposed in this pull request. So far, these changes allow epochs to be created in the EpochEncoder that overlap, and functions like Merge Neighbors work regardless of whether epochs overlap. One exception is Fill Blanks. It has not been updated yet, and for the moment it's not clear what the behavior should be. See discussion in #16.
Added new "Epoch insertion mode" button group to EpochEncoder GUI, containing radio buttons for "Mutually exclusive" and "Overlapping". Renamed global option `remove_old_when_inserting_new` to `exclusive_mode`.
Plain text labels for epochs provided in the EpochEncoder's data table are replaced with drop-down menus that allow the user to change the label of an existing epoch.
After using a drop-down menu to change an existing epoch's label and then pressing a shortcut key to insert a new one, the time would jump back to the first epoch, rather than forward one step. This was caused by `on_seek_table` triggering when the table was cleared by `refresh_table`. Interacting with a drop-down menu in the data table before this apparently caused row selection to trigger for the first row in an unexpected way, even if the drop-down menu interacted with was not in the first row.
Added a new feature that makes it easier to find epochs in the EpochEncoder's data table. When an epoch's rectangle is clicked in the plot, the corresponding row in the data table is automatically selected. More specifically, the label drop-down menu is selected, which allows the up and down arrow keys to be used to change the label quickly. Time is also moved to the start of the epoch.
Epochs would fail to delete if they started within the selected region and ended exactly at its right boundary.
In addition to selecting the corresponding row in the data table and changing the time, clicking a rectangle in the EpochEncoder plot will now update the region selection to match the epoch for easier duplication or deletion.
Changed region match feature to double-click instead of single-click since changing region might not always be desired.
self._next_id = 0
for chan in self.all:
    chan['id'] = np.arange(self._next_id, self._next_id + len(chan['time']))
    self._next_id += len(chan['time'])
I already asked about this, but I still don't understand the purpose of 'id'.
ids are needed so that when `get_chunk*` returns a subset of the epochs, `on_rect_clicked` and `on_rect_doubleclicked` can still uniquely identify the epoch. This allows the appropriate table row to be selected, and allows look-up of the start and end of that rectangle for `self.region.setRegion`.
See also this post.
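A small sketch of why positions alone are not enough: once only a subset of epochs is drawn, a rectangle's position in the subset no longer matches its row in the full table. The data and the helper name below are hypothetical.

```python
def id_to_ind(ep_ids):
    """Map each unique epoch id to its row index in the full table."""
    return {ep_id: ind for ind, ep_id in enumerate(ep_ids)}


# Full table: ids are not contiguous because some epochs were deleted earlier.
all_ids = [0, 2, 5, 7]
all_times = [0.0, 5.0, 9.0, 20.0]

# Only epochs starting in the visible window are drawn as rectangles.
visible_ids = [i for i, t in zip(all_ids, all_times) if 4.0 <= t < 10.0]

# The user clicks the second visible rectangle; its id recovers the table row.
clicked_id = visible_ids[1]
row = id_to_ind(all_ids)[clicked_id]
```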
assert self.all[0]['time'].dtype.kind=='f'
assert self.all[0]['duration'].dtype.kind=='f'
assert np.all((self.times[:-1]+self.durations[:-1])<=self.times[1:])
Maybe this could be kept, with the check enabled or disabled by an optional keyword argument such as assert_epoch_not_overlap_at_load=True.
Sure, we could add that.
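One way the suggested option might look, as a sketch only. The class is a hypothetical stand-in, the keyword name is taken from the suggestion above, and the check assumes epochs are sorted by start time.

```python
def epochs_do_not_overlap(times, durations):
    """True if every epoch ends no later than the next one starts."""
    stops = [t + d for t, d in zip(times, durations)]
    return all(stop <= nxt for stop, nxt in zip(stops[:-1], times[1:]))


class SourceSketch:
    """Hypothetical stand-in for a source with an optional overlap check."""

    def __init__(self, times, durations, assert_epoch_not_overlap_at_load=True):
        if assert_epoch_not_overlap_at_load:
            assert epochs_do_not_overlap(times, durations), "epochs overlap"
        self.times = list(times)
        self.durations = list(durations)
```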
@property
def id_to_ind(self):
    return dict((id,ind) for ind,id in enumerate(self.ep_ids))
These setters with properties are fine, but if your final goal is a multi-channel EpochEncoder, won't they be useless? Each channel's data will be in self.all[0]['label'], self.all[1]['label'], ..., self.all[N]['label'], so the setters will have to become functions in the end.
Am I missing something?
I agree. The multi-channel idea I described here did not come to me until these properties were already in place. If we implement true multi-channel, these properties will need to change or be removed. I haven't started working on multi-channel since I wanted to stop making changes before you had a chance to review the existing changes.
keep3 = (ep_times<=t_start) & (ep_times+ep_durations>t_stop) # epochs that span the range
keep = keep1 | keep2 | keep3

return ep_times[keep], ep_durations[keep], ep_labels[keep], ep_ids[keep]
Why not use the parent function ?
Because `get_chunk*` in `WritableEpochSource` needed ids, but ids were not needed in `InMemoryEpochSource`. In fact, I originally put ids in `InMemoryEpochSource` and just called the parent function in `WritableEpochSource` as you suggest, but this broke some things, as I described in 6d5987d's commit message.
elif ep_times[i]<t1 and (t1<ep_stops[i]<t2):
#~ print 'b'
# if epoch starts and ends inside range, delete it
if ep_times[i]>=t1 and ep_stops[i]<=t2:
ep_stops[i]<=t2 became ep_stops[i]<t2.
I don't remember why it was important for me.
The general Python approach is that the left limit is included (>=) and the right limit is excluded (<).
The pandas approach, .loc[t1:t2] with a float index, matches yours (right limit included).
Did you choose that intentionally?
Yes, it was intentional and necessary to fix a bug that existed before I began this PR. Epochs would fail to delete with the region deletion tool if they started within the selected region and ended exactly at its right boundary. None of the `if` clauses below this would catch that case. This case was probably not noticed before rounding of times to the nearest microsecond was added simply due to small numerical inaccuracy in floating point numbers.
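The boundary case can be reproduced with a short sketch. The inclusive condition mirrors the diff line under review; the strict variant is an assumption about the earlier behavior, inferred from this discussion, and both function names are hypothetical.

```python
def inside_strict(start, stop, t1, t2):
    """Assumed earlier check: right boundary excluded."""
    return start >= t1 and stop < t2


def inside_inclusive(start, stop, t1, t2):
    """Check from the diff above: right boundary included."""
    return start >= t1 and stop <= t2


# An epoch from 1.0 to 2.0 and a selected region from 0.5 to 2.0:
# the epoch ends exactly at the region's right boundary.
missed_before = not inside_strict(1.0, 2.0, 0.5, 2.0)
caught_now = inside_inclusive(1.0, 2.0, 0.5, 2.0)
```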
Something else: I don't understand exactly how "split_epoch" behaves. Could we split at the cursor and have the epoch detected magically?
class CsvEpochSource(WritableEpochSource):
OK for me.
I will add my ExcelEpochSource here after this PR because it is basically the same, except that `df = pd.read_csv(self.filename, index_col=None)` becomes `df = pd.read_excel(self.filename, index_col=None)`.
Sounds great!
Thank you for taking the time to review this!
When I added the feature to click a rectangle to select that epoch in the table, I needed a way to match rectangles to table rows. Since
I tried very hard to do split carefully! I hope I succeeded.
Splitting by clicking with the mouse cursor (perhaps right-clicking, or clicking with a key modifier) is a great idea! Why didn't I think of that? (And it wouldn't use magic, it would need to use ids, haha!) But I think the existing functionality should be retained for high-precision splitting (splitting at exact moments in time). Yes, in its current form, you need to select in the table first. I chose to do it this way because I wanted to allow for splitting specific epochs that overlap with others that the user may not want to split. If the time marker (vline) is positioned inside two overlapping epochs and the user wants to split just one of them, automatic detection of the user's intention is not possible. Although splitting all epochs that intersect the time marker might be a desirable behavior, I did not implement it this way since I wanted more control. In its current form, splitting multiple epochs at the same location is easy to do if the user (1) clicks on a rectangle under the cursor to select the matching table row, (2) clicks the "Split" button, and (3) repeats for each epoch under the time marker that he or she wants to split.
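The two behaviors discussed here can be contrasted in a short sketch. Both function names are hypothetical, and epochs are modeled as `(time, duration, label)` tuples instead of the arrays used in the real code.

```python
def split_selected(epochs, index, t_split):
    """Split only the selected epoch at t_split, leaving the others untouched."""
    start, dur, label = epochs[index]
    if not (start < t_split < start + dur):
        return list(epochs)  # time marker is not inside the selected epoch
    first = (start, t_split - start, label)
    second = (t_split, start + dur - t_split, label)
    return epochs[:index] + [first, second] + epochs[index + 1:]


def split_all_under_marker(epochs, t_split):
    """Split every epoch that contains t_split."""
    result = []
    for start, dur, label in epochs:
        if start < t_split < start + dur:
            result.append((start, t_split - start, label))
            result.append((t_split, start + dur - t_split, label))
        else:
            result.append((start, dur, label))
    return result


# Two overlapping epochs; the time marker at 3.0 lies inside both.
epochs = [(0.0, 10.0, "a"), (2.0, 4.0, "b")]
only_b = split_selected(epochs, 1, 3.0)
both = split_all_under_marker(epochs, 3.0)
```

With overlapping epochs, only the selection-based variant lets the user split "b" while leaving "a" intact.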
I think I've replied to all of your comments and questions so far. Let me know if things are still confusing or if I missed any. I understand that this is a lot to review!
Based on your comments, I added a few items to the list of goals in #16.
OK. 2 other ideas: I am wondering something. It is just a proposal. For the split mode, by "cursor" I did not mean the mouse cursor, but the cursor as you implemented it, only without selecting the rectangle first. Maybe it could be useful to add a key to EpochEncoder.params like force_selection_before_split: True (what you did) requires a selection, and False cuts everything at this time. Of course we would have to choose a better keyword (easy for an English speaker).
Thanks @samuelgarcia for the write access privileges! Sorry I've been AWOL for the last several days, I had to focus on other projects. I plan to take a close look at all your recent work on ephyviewer tomorrow.
Welcome to the small council.
I understand the motivation to use the more efficient search algorithm, but I think there may be such a thing as unnecessary over-optimization, haha. I tried pushing the masking method to its limits using this script:
It wasn't until there were on the order of 1,000,000 epochs loaded in memory that the masking method became noticeably slower during random seeking (dragging the time slider back and forth) compared to the old searchsorted method on my computer. Of course, on slower systems or when many other data are loaded into ephyviewer, speed reduction due to the slower masking method might become an issue, but it seems unlikely given how large that number is. I'd like to eventually implement the "true multi-channel" described here. This would allow us to strictly enforce the no-overlaps rule within individual channels, and it would permit the use of the better searchsorted method. Nevertheless, if you still have concerns about performance with this PR in its current form, I think your suggestions are good for addressing them, and I could take a stab at making the changes. Let me know.
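For comparison, the two lookup strategies can be sketched in pure Python. The standard-library `bisect` module stands in here for NumPy's `searchsorted`, and both function names are hypothetical. The binary search is only valid when epochs are sorted and do not overlap, which is why allowing overlap led to the masking approach.

```python
from bisect import bisect_left, bisect_right


def chunk_by_search(starts, stops, t_start, t_stop):
    """Binary search: requires sorted, non-overlapping epochs."""
    first = bisect_right(stops, t_start)  # first epoch ending after t_start
    last = bisect_left(starts, t_stop)    # first epoch starting at or after t_stop
    return list(range(first, last))


def chunk_by_mask(starts, stops, t_start, t_stop):
    """Linear scan: every epoch is tested, so overlap and ordering do not matter."""
    return [i for i, (a, b) in enumerate(zip(starts, stops))
            if a < t_stop and b > t_start]


starts = [0.0, 2.0, 4.0, 6.0, 8.0]
stops = [1.0, 4.0, 5.0, 8.0, 9.0]
by_search = chunk_by_search(starts, stops, 3.0, 7.0)
by_mask = chunk_by_mask(starts, stops, 3.0, 7.0)
```

The search costs O(log n) per query and the scan O(n), which is consistent with the difference only becoming noticeable at very large epoch counts.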
Adding the ability to choose how splitting works seems reasonable. How about this as an alternative approach: I could rename the "Split" button to "Split selected epoch", and I could add another button called "Split all epochs under cursor". Would that be acceptable?
OK. Let's forget about performance for the moment. I still think that exclusive_mode should be in __init__ and not in params, because in params it can be changed on the fly, which will complicate the behavior. OK for the split proposal. This is great, we will have a world-class tool. Thank you very much.
Perhaps this proposal in our other thread would address your concerns about |
Someday I will need to revisit all this and clean it up... but for now, I'm pushing a couple fixes from my fork!
Hi Jeffrey.
Hi Sam, good to hear from you! Yes, I've noticed that the Neo projects have been very busy lately with overhauls to annotations and fixes to the Blackrock IO. My inbox has been stuffed, haha! As I recall, you suggested some alternative implementations for the features proposed here, especially relating to the new optional feature of overlapping epochs and the internal representation of multiple epoch channels. I said in #16 that I liked your suggestions and had a desire to implement them, but it would take a lot of rewriting, and I thought I might just start a new pull request. As it turned out, I didn't have much time to work on this (surprise!), so I haven't begun that rewrite. As it is now, I use the new features in this PR regularly and find them very useful. If you think that merging soon and making incremental improvements later makes sense, that's fine with me. On the other hand, perhaps if you look closely again you will remember what you didn't like!
In case those 2 pull requests weren't enough, I gave you 4 more. Don't worry, they are much smaller. 😜 You gave me write-access to the repo back in September. I can merge the new pull requests myself if you think they look sane.
I guess this ultimate and very important commit is a hidden message asking for a merge?
Haha, I wasn't trying to steal your attention, I'm just polishing some things because I'm beginning to train some students to use the software. I think you could merge now, and some of the things that we discussed last fall could be implemented later. I will have to read through these threads again to recall the details, but I think you had some good suggestions I wanted to try. However, I think they could be implemented as changes later, on top of these changes, rather than requiring a total rewrite before merging anything.
héhé.
`CsvEpochSource` was originally provided as a simple example of how to create a subclass of `WritableEpochSource`. However, it seems like it will be useful to many users of the epoch encoder, so I think it would be appropriate to migrate it into the main code.