Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrectly reconstructed Fourier frequency array #856

Open
matteolucchini1 opened this issue Oct 31, 2024 · 5 comments
Open

Incorrectly reconstructed Fourier frequency array #856

matteolucchini1 opened this issue Oct 31, 2024 · 5 comments
Labels
bug It's a bug! Unexpected or unwanted behavior. help wanted We need additional help with these issues!

Comments

@matteolucchini1
Copy link
Collaborator

Description of the Bug

We typically have two ways to construct arrays of Fourier frequency. One is to take the size of a segment to calculate the number of bins (e.g. flux.size), another is to calculate the number of bins in a segment of given size with numpy.rint by hand. The bug is these two methods round non-integers differently, and as a result the two arrays are not necessarily the same.

Steps/Code to Replicate the Bug

seg_size = 5
time_res = 0.03
freq_int = fftfreq(int(seg_size/time_res),time_res)
freq_rint = ftfreq(int(np.rint(seg_size/time_res))),time_res)

Expected Results

These two should be the same the parts of the code that use either method to be consistent.

Actual Results

int(5/0.03) = 166 and np.rint(5/0.03) = 167.

The result is that parts of the code that call the two methods, one can get an error due to the array masking not working correctly. For example, VarEnergyspectrum uses the first method in the initialization when calling avg_pds_from_timeseries, and the second in _get_good_frequency_bins().

@matteolucchini1 matteolucchini1 added bug It's a bug! Unexpected or unwanted behavior. help wanted We need additional help with these issues! labels Oct 31, 2024
@matteobachetti
Copy link
Member

Hi @matteolucchini1, good catch! Between the two methods, I would absolutely vote for using the machinery in avg_pds_from_timeseries, as it is the better tested on a variety of input data. In any case, calling _get_good_frequency_bins with the freq argument should be the recommended practice

@matteolucchini1
Copy link
Collaborator Author

matteolucchini1 commented Nov 1, 2024

@matteobachetti yea I agree. I think the easiest implementation to make this a bit more bulletproof would be to replace the calls with np.rint() to just int() - there's a few in other classes (e.g. iterate_lc_counts in crossspectrum_from_lc_iterable) that could also lead to similar issues.

@matteobachetti
Copy link
Member

@matteolucchini1 I don't agree with this though. The current implementation in fourier.py works around a lot of corner cases related to floating point rounding. Plain np.int is subject to a lot of issues like that. There is no bulletproof method, but we have a pretty robust implementation and we just need to use it consistently

@matteolucchini1
Copy link
Collaborator Author

I'm confused - the implementation in avg_pds_from_timeseries can only work if we are passing a time series with a known size (because n_bin is set explicitely from there). If instead we are only passing a segment size and time resolution, we can't figure out how many segments the time series is divided in because we do not know its length in time.

We could pass freq to _get_good_intervals, but that just means we would have to move to move the goal posts and reverse engineer the Fourier frequency array outside of that method, which doesn't actually solve the problem that np.rint() results in a wrong n_bin and therefore wrong frequencies.

@matteobachetti
Copy link
Member

avg_pds_from_timeseries works with binned and unbinned data, and does the inference on the size of the timeseries internally thanks to fix_segment_size_to_integer_samples. We could use the same method outside, or move a bigger chunk of inference logic outside avg_pds_from_timeseries to make it callable from other functions maybe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug It's a bug! Unexpected or unwanted behavior. help wanted We need additional help with these issues!
Projects
None yet
Development

No branches or pull requests

2 participants