'NaN***' and '+Inf**' #42

Closed
gcorrall opened this issue Feb 8, 2024 · 2 comments · Fixed by #43
Labels
bug Something isn't working

Comments

@gcorrall
Contributor

gcorrall commented Feb 8, 2024

I've been using powerjoular to log power consumption on a few machines; it runs via systemd, and the data is stored in a CSV file (and then processed by collectd).

Recently I noticed one machine (Ubuntu 20.04, Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz) was regularly logging "NaN***" and "+Inf**" in the CSV file for the CPU utilisation, which in turn was causing the collectd plugin to log parsing errors ('Ignoring trailing garbage').

After some experimentation I think the problem originates in this line in powerjoular.adb:

CPU_Utilization := (Float (CPU_CCI_After.cbusy) - Float (CPU_CCI_Before.cbusy)) / (Float (CPU_CCI_After.ctotal) - Float (CPU_CCI_Before.ctotal));

If the before and after readings are close enough together, the conversion to Float rounds both to the same value, so either CPU_CCI_After.cbusy and CPU_CCI_Before.cbusy, or CPU_CCI_After.ctotal and CPU_CCI_Before.ctotal, come out equal. When the ctotal difference is zero the division is by zero, which gives "+Inf**" if the cbusy difference is non-zero and "NaN***" if it is zero as well.

The actual Long_Integers aren't the same; the Float conversion rounds them to the same value. For example:

Put(Float(Long_Integer(15534323370)), Exp => 0, Fore => 0);
Put(Float(Long_Integer(15534324162)), Exp => 0, Fore => 0);

Both give the same answer (15534323712.00000), which caused the CPU_Utilization calculation to give me a "+Inf**".
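The collapse is consistent with single-precision rounding: assuming Float is an IEEE 754 32-bit type (as it is with GNAT on x86-64), it carries a 24-bit significand, so values between 2^33 and 2^34 are spaced 1024 apart, and both readings above are nearest to 15534323712. A minimal sketch of the comparison follows; the procedure and function names are hypothetical and nothing here is taken from powerjoular.adb. It also assumes a 64-bit Long_Integer.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Float_Collapse is

   --  The two readings quoted above (assumes a 64-bit Long_Integer).
   Reading_Before : constant Long_Integer := 15_534_323_370;
   Reading_After  : constant Long_Integer := 15_534_324_162;

   --  Convert each operand to single precision, then subtract.
   --  Going through parameters keeps the conversions out of static
   --  (compile-time, exact) evaluation, so they are rounded to Float
   --  just as they would be for values read at run time.
   function Diff_Single (After, Before : Long_Integer) return Float is
   begin
      return Float (After) - Float (Before);
   end Diff_Single;

   --  Same calculation with double-precision operands.
   function Diff_Double (After, Before : Long_Integer) return Long_Float is
   begin
      return Long_Float (After) - Long_Float (Before);
   end Diff_Double;

begin
   Put_Line ("Float:      "
             & Float'Image (Diff_Single (Reading_After, Reading_Before)));
   Put_Line ("Long_Float: "
             & Long_Float'Image (Diff_Double (Reading_After, Reading_Before)));
end Float_Collapse;
```

Under those assumptions the Float difference should come out as zero, while the Long_Float difference keeps the true gap of 792.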

I think this case needs to be checked for, so that the division is avoided if a zero is involved. I assume this is also an issue with other similar calculations in the code.
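A minimal sketch of such a check, under stated assumptions: Utilization is a hypothetical helper rather than code from powerjoular.adb, the counters are passed as plain Long_Integer values, and returning 0.0 when no time has elapsed is an arbitrary fallback. It also subtracts in integer arithmetic before converting, which is a slightly different technique from the original line and keeps the deltas exact.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Guarded_Utilization is

   --  Hypothetical helper: fraction of busy time between two samples.
   function Utilization
     (Busy_Before, Busy_After, Total_Before, Total_After : Long_Integer)
      return Long_Float
   is
      --  Subtract as integers first, so nearby readings cannot be
      --  rounded to the same floating-point value.
      Busy_Diff  : constant Long_Integer := Busy_After - Busy_Before;
      Total_Diff : constant Long_Integer := Total_After - Total_Before;
   begin
      --  Skip the division when the total counter has not advanced.
      --  Returning 0.0 here is an assumed fallback.
      if Total_Diff <= 0 then
         return 0.0;
      end if;

      return Long_Float (Busy_Diff) / Long_Float (Total_Diff);
   end Utilization;

begin
   --  50 busy ticks out of 200 total: a utilisation of 0.25.
   Put_Line (Long_Float'Image (Utilization (100, 150, 1_000, 1_200)));

   --  Total counter unchanged: the guard returns 0.0 instead of dividing.
   Put_Line (Long_Float'Image (Utilization (100, 150, 1_000, 1_000)));
end Guarded_Utilization;
```

Whether a zero interval should report 0.0, repeat the previous sample, or skip the CSV row is a design choice for the maintainers; the point is only that the value written out is always a finite number.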

Alternatively, greater precision could be used in the type conversions: Long_Float instead of Float. I crudely fixed this by changing all the Floats to Long_Floats (by running "sed -i 's/Float/Long_Float/g' src/*"), and that did stop the problem.

@adelnoureddine adelnoureddine self-assigned this Feb 8, 2024
@adelnoureddine adelnoureddine added the bug Something isn't working label Feb 8, 2024
@adelnoureddine
Member

Thanks @gcorrall for pointing out this issue.
Switching from Float to Long_Float would indeed solve the problem, though we need to check all calculations and platforms (PID, RPi).

If you wish, you can submit a PR with the first float conversions, and I'll take it from there next week to verify the remaining calculations and do some testing on RPi too.

@gcorrall
Contributor Author

gcorrall commented Feb 9, 2024

No problem - I've now submitted a PR.

@adelnoureddine adelnoureddine linked a pull request Feb 9, 2024 that will close this issue