Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add use_times option in get_duration/get_total_duration #3623

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alejoe91
Copy link
Member

Since the __repr__ uses the get_durations function, I added an option to avoid using timestamps to estimate duration.
In fact, this can be quite slow for long zarr datasets with timestamps, since the whole array is loaded in mem when calling get_times().

@alejoe91 alejoe91 added the core Changes to core module label Jan 15, 2025
@alejoe91 alejoe91 requested review from JoeZiminski and h-mayorquin and removed request for JoeZiminski January 15, 2025 10:10
@h-mayorquin
Copy link
Collaborator

h-mayorquin commented Jan 15, 2025

Where do you think the loading into memory is happening?
Maybe here in the asarray:

https://github.com/h-mayorquin/spikeinterface/blob/80820853354fbc2c6a56bb5dade4c94c8dacb321/src/spikeinterface/core/baserecording.py#L895-L898

Because the duration only requires the last and the first timestamps:

https://github.com/h-mayorquin/spikeinterface/blob/80820853354fbc2c6a56bb5dade4c94c8dacb321/src/spikeinterface/core/baserecording.py#L237-L240

Maybe we can just not transform to array if the timestamps are memmap and or zarr provided they are read only?

My concern is that changing the repr with these defaults will display wrong information for some cases but maybe I am wrong.

@alejoe91
Copy link
Member Author

I think that transforming to array is very convenient to cache the timestamps in memory and overall.it improves performance. Another option could be to have a get_start_time and get_end_time, which will only access the required samples.

@h-mayorquin
Copy link
Collaborator

Fair enough, changing the way that get_times work would probably require more discussion.

Adding private methods like this:

get_start_time and get_end_time, which will only access the required samples

Seems like a great idea to me. That way, we can use that in get duration without touching anything else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Changes to core module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants