Issue with two different sources of data #159

emmanuel-ruffio · 2024-10-29T11:15:43Z

emmanuel-ruffio
Oct 29, 2024

Hi,

I'm still trying to use binjr, but I cannot plot two series from different sources in the same plot, or even in the same worksheet.
Data1 comes from file1
Data2 comes from file2
File1 and File2 do not have exactly the same timestamps.
If I add Data2 to the Data1 plot, Data1 is shown but not Data2. The same happens if they are in two different plots.

Everything is ok when I use two different worksheets but it's not what I need to do.

And one suggestion: would it be possible to precache data that lies outside of, but close to, the current plot time range? My data file is 100 MB and every time I scroll the plot slightly, I must wait a few seconds for the plot to update.

Best regards,
Emmanuel.

fthevenet · 2024-10-30T18:24:36Z

fthevenet
Oct 30, 2024
Maintainer

Hi,

It sounds like the time intervals covered by your two series don't overlap, in which case you'll need to enlarge the interval that the worksheet is currently showing in order to view both data sets.

You can click on the icon to the left of the time selector for binjr to automatically pick a range that covers everything.

As for your performance issue, assuming you're using the CSV adapter, a few seconds to select a new interval does sound a bit excessive, so there might be a regression somewhere.
Could you please provide me with the logs for a session where the problem shows?
Thanks

1 reply

emmanuel-ruffio Oct 31, 2024
Author

Hi,

Thank you for your answer. You were right about the first point... I checked twice but didn't notice that one file has the timestamp:
0024-05-16 07:46:51,8903.32420138884,0,0,92.57,13.14,91.68
while the other has:
2024-05-16 07:55:47,nan,0,0,11.44,10.14,98.13,10.99,100,10.28,9.79,10.62,10.34,72.92
There are indeed 2000 years between the two... That's why I got weird results when I clicked on the "auto range" button.
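
A mismatch like this can be spotted before loading the files by comparing the time range each one covers. The following is a minimal sketch, not part of binjr: the function names are hypothetical, and it assumes the timestamp is the first CSV column in the `YYYY-MM-DD HH:MM:SS` layout of the two rows quoted above.

```python
from datetime import datetime

# Assumed timestamp layout, taken from the two sample rows quoted above.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def time_range(lines):
    """Return (earliest, latest) timestamp found in the first CSV column."""
    stamps = []
    for line in lines:
        first_field = line.split(",", 1)[0].strip()
        try:
            stamps.append(datetime.strptime(first_field, TIMESTAMP_FORMAT))
        except ValueError:
            continue  # header row, blank line, or unparseable timestamp
    if not stamps:
        raise ValueError("no parseable timestamp found")
    return min(stamps), max(stamps)


def ranges_overlap(a, b):
    """True when two (start, end) intervals share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


# One row from each file, shortened from the rows quoted above.
file1 = ["0024-05-16 07:46:51,8903.32420138884,0,0,92.57,13.14,91.68"]
file2 = ["2024-05-16 07:55:47,nan,0,0,11.44,10.14,98.13"]

range1, range2 = time_range(file1), time_range(file2)
print(range1[0].year, range2[0].year)
print("overlap:", ranges_overlap(range1, range2))
```

Since `time_range` accepts any iterable of lines, an open file object can be passed in directly to check a whole file.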

For the performance issue, I measured the update time: it's 10 seconds. I can still reduce the sample rate when I export the CSV file with MATLAB, but I think that is precisely the key benefit of software like binjr: handling large files efficiently.
Here is the log file. It appears there is an error. Could the "nan" values in the data be a problem?

[2024-10-31 11:08:39.234] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Starting...
[2024-10-31 11:08:39.258] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Version=3.20.0 (build #20241023.7)
[2024-10-31 11:08:39.260] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Java Version=23.0.1
[2024-10-31 11:08:39.262] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] JavaFX Version=23.0.1
[2024-10-31 11:08:39.263] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Java Vendor=Eclipse Adoptium
[2024-10-31 11:08:39.264] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Java VM name=OpenJDK 64-Bit Server VM (23.0.1+11)
[2024-10-31 11:08:39.265] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Java Home=C:\Users\emmanuel\AppData\Local\binjr\runtime
[2024-10-31 11:08:39.267] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Operating System=Windows 10 (10.0)
[2024-10-31 11:08:39.268] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] System Architecture=amd64
[2024-10-31 11:08:39.269] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] JVM Heap Stats=Max: 4096MB | Committed: 60MB | Used: 24MB
[2024-10-31 11:08:39.270] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Garbage Collectors=Shenandoah Pauses, Shenandoah Cycles
[2024-10-31 11:08:41.528] [INFO ] [JavaFX Application Thread] [eu.binjr.common.plugins.ServiceLoaderHelper] Looking for services of type eu.binjr.core.data.adapters.DataAdapterInfo in C:\Users\emmanuel\AppData\Local\binjr\libs..\plugins
[2024-10-31 11:08:42.082] [INFO ] [JavaFX Application Thread] [org.reflections.Reflections] Reflections took 374 ms to scan 1 urls, producing 110 keys and 346 values
[2024-10-31 11:08:42.334] [INFO ] [JavaFX Application Thread] [org.reflections.Reflections] Reflections took 164 ms to scan 7 urls, producing 124 keys and 424 values
[2024-10-31 11:08:42.364] [INFO ] [JavaFX Application Thread] [eu.binjr.common.plugins.ServiceLoaderHelper] Looking for services of type eu.binjr.core.appearance.UserInterfaceThemes in C:\Users\emmanuel\AppData\Local\binjr\libs..\plugins
[2024-10-31 11:08:55.507] [ERROR] [ForkJoinPool-1-worker-1] [stderr] oct. 31, 2024 11:08:55 AM org.apache.lucene.internal.vectorization.VectorizationProvider lookup
AVERTISSEMENT: Java vector incubator module is not readable. For optimal vector performance, pass '--add-modules jdk.incubator.vector' to enable Vector API.

Best regards,
Emmanuel.

fthevenet · 2024-11-03T10:12:33Z

fthevenet
Nov 3, 2024
Maintainer

Hi,

To investigate the performance issue, I'll need you to change the logging severity to the DEBUG level.
To do that, press F12 to invoke the debug console, and set the log level to DEBUG.

Then, please load your csv file and replay the scenario where you experienced the 10s loading episodes, and upload the log file for this session (you will find it in %TEMP%\binjr) to this issue (you can just drop the file onto an open comment to this thread).

This should hopefully provide me with enough info to work out a reproducer and see if something can be optimized.

Thanks!


emmanuel-ruffio · 2024-11-05T13:32:37Z

emmanuel-ruffio
Nov 5, 2024
Author

Hi,

Here is the log file. Delays of about 8000 ms can be seen after scroll events.
Uploading binjr_2024-11-05_14-25-48_25676.log…

Thanks,
Emmanuel.

1 reply

fthevenet Nov 5, 2024
Maintainer

Thanks for that.
It seems however that the file was not fully uploaded, and I can't access it.
If it's really big, you might want to compress it.

emmanuel-ruffio · 2024-11-05T15:42:43Z

emmanuel-ruffio
Nov 5, 2024
Author

binjr_2024-11-05_14-25-48_25676.zip

Here is the compressed file.
Emmanuel.


fthevenet · 2024-11-07T19:51:01Z

fthevenet
Nov 7, 2024
Maintainer

Hi,

I have found a serious performance regression in the latest v3.20.0 release compared to previous releases, which is tied to a suspected regression in the Shenandoah garbage collector in OpenJDK 23.0.1.
Some operations, such as retrieving large quantities of data from the index, could take up to 3 times longer.
Reverting to the latest version in the 21 branch of OpenJDK addresses the issue without any loss of functionality, and I have prepared a preview build that you can try here: https://github.com/binjr/binjr/releases/tag/v3.20.1-SNAPSHOT

With that said, I am a bit surprised at the numbers I saw in the log file you provided; would you mind sharing the specs of the PC you used and a few lines of your dataset, as an example? The performance seems too poor to be explained solely by the problem above.

For instance, in your logs I see that it takes 8.5 seconds to retrieve about 14,000 samples. On my machine, with v3.20.0 it takes the same amount of time to retrieve over 244,000 samples, and with the fix in v3.20.1 this comes down to 2.6 seconds.
On a more modest PC, an ultrabook from 2016 with an i5-7200U Intel CPU, it can still retrieve over 100,000 samples in 8s (and less than 3s with the fixed version).
So while the fix will hopefully still help you, I am wondering if there isn't another problem in your case, that could come either from your PC or maybe your dataset.
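
To put these figures side by side, here is a small back-of-the-envelope calculation. It only uses the numbers quoted in this comment, which are approximate ("about", "over"), so the resulting ratios are rough orders of magnitude rather than measurements.

```python
def samples_per_second(samples, seconds):
    """Average retrieval throughput for one fetch."""
    return samples / seconds


# Figures quoted in the comment above; "about" and "over" make them approximate.
measurements = {
    "reporter's log, v3.20.0": samples_per_second(14_000, 8.5),
    "maintainer's machine, v3.20.0": samples_per_second(244_000, 8.5),
    "maintainer's machine, v3.20.1": samples_per_second(244_000, 2.6),
    "2016 ultrabook, v3.20.0": samples_per_second(100_000, 8.0),
}

baseline = measurements["reporter's log, v3.20.0"]
for label, rate in measurements.items():
    ratio = rate / baseline
    print(f"{label}: {rate:,.0f} samples/s ({ratio:.1f}x the reporter's rate)")
```

Expressing each case as samples per second makes the gap explicit: on the same release, the older ultrabook is still several times faster than what the reporter's log shows.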


emmanuel-ruffio · 2024-11-08T08:29:24Z

emmanuel-ruffio
Nov 8, 2024
Author

Hi,

I really appreciate your help. I will try the preview build.

Here are the specs of my computer. It is quite old, but still efficient, especially with the additional RAM installed:
HP ProBook 650 G2
Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz, 2304 MHz, 2 core(s), 4 logical core(s)
Installed RAM: 16.0 GB

One key parameter which may explain the poor performance is that each sample has 55 columns (for the first file) and 37 columns (for the second file), i.e. close to 100 sensors are installed and more are coming. Here is one line from the first file:

2024-05-16 07:55:47,nan,0,0,11.44,10.14,98.13,10.99,100,10.28,9.79,10.62,10.34,72.92,10.92,100,10.14,10.99,10.06,11.53, 12.17,10.36,10.05,10.23,10.38,10.07,nan,nan,nan,9.56,9.56,9.56,9.81,9.56,9.75,9.69,9.76,100,10.71,10.66,947,3773,2268,6876, 222,1373,484,1909,138,628,1345,4392,nan,nan,nan,nan

Is the data reloaded from the ASCII file every time the time range is changed?


fthevenet · 2024-11-08T12:14:24Z

fthevenet
Nov 8, 2024
Maintainer

Your CPU is roughly the same generation as my older one and performance should be comparable; the only significant difference is that my model is able to increase its CPU clock significantly to handle a short duration, single threaded load (which is exactly the case here). Still, that doesn't explain a 10x difference.
Just a thought: you might want to check that the power management settings for your computer are set to favor performance over battery life when doing this, if that is not the case already, since this will tell Windows not to throttle the CPU as aggressively and should yield much better performance.
Also, if you have programs running at the same time that use a significant amount of CPU, this would have a negative impact on binjr's performance.

To answer your question, the data are not reloaded from the CSV files; content for all files is indexed into a temporary Lucene index. The index itself is stored on disk so that binjr can handle arbitrarily large datasets without being limited by available RAM, but Lucene makes a sophisticated enough use of IO caches that it is unlikely for the IO path throughput or latency to play a significant role in this case, outside of the initial indexing phase, as long as there is enough available memory to be used as page cache.
Now to be honest, an inverted index is arguably not the best tool for the job here; it is great at quickly matching which samples are in the required time range, but not at retrieving large quantities of hits in one go, at a high throughput (I only reused the code that was originally written to handle log files in binjr, as this is a perfect fit for full text search).
Still, it has proven good enough in most cases so far, so I've not yet had any strong reasons to rewrite it.

Finally, with regard to the high number of columns in the CSV file, this will definitely have an impact on the speed of the initial indexing phase, but not on the query matching nor on the data retrieval.
What will matter instead in the latter case is the number of columns/series that you plot in a binjr worksheet, but I see from your logs that you only put a few of them, not the whole 55, so that shouldn't have that much of an impact.

Anyway, please try the latest preview and let me know if it helps.

Cheers!


fthevenet · 2024-11-08T17:30:07Z

fthevenet
Nov 8, 2024
Maintainer

FYI, I have further improved the fetch operation by making it run in parallel on multiple threads, which can decrease the time taken by up to 60% in some cases (e.g. 1600 ms vs 2600 ms to fetch 244099 hits for a single series, on an Intel i7-10850H CPU, 6 cores/12 threads @2.70-5.0GHz).

This is a rather brute force approach though, so the benefits will vary greatly depending on the available CPU resources and the context in which the fetch happens (e.g. fetching data from multiple series on separate charts is already done in parallel, so CPU usage might already be maximized and this would provide no further benefit).
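
For readers unfamiliar with the idea, the general shape of such a chunked parallel fetch can be sketched as follows. This is only an illustration under stated assumptions, not binjr's actual code (binjr is written in Java and reads from a Lucene index): `fetch_chunk`, `split` and `fetch_parallel` are hypothetical names, and `fetch_chunk` is a stand-in for loading the stored values of a batch of hits. In CPython, threads will not speed up CPU-bound work because of the global interpreter lock, so the sketch shows the structure rather than the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(hit_ids):
    """Hypothetical stand-in for loading the stored values of a batch of hits."""
    return [(hit_id, float(hit_id)) for hit_id in hit_ids]


def split(items, parts):
    """Cut a list into at most `parts` contiguous chunks of similar size."""
    size = max(1, -(-len(items) // parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_sequential(hit_ids):
    """Baseline: load every hit in a single pass on one thread."""
    return fetch_chunk(hit_ids)


def fetch_parallel(hit_ids, workers=4):
    """Load chunks of hits on several threads, then merge them in order."""
    chunks = split(hit_ids, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order the chunks were submitted.
        results = list(pool.map(fetch_chunk, chunks))
    return [sample for chunk in results for sample in chunk]


hits = list(range(10))
print(fetch_parallel(hits, workers=3) == fetch_sequential(hits))
```

Because the chunks are contiguous and merged in submission order, the parallel path returns the samples in the same order as the sequential one; the gain depends entirely on how many idle cores are available when the fetch runs.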

In any case, this can be disabled if it proves to hurt performance instead of helping, by unchecking the option called useParallelIndexFetch in the debug console (it is on by default).


emmanuel-ruffio · 2024-11-09T11:49:47Z

emmanuel-ruffio
Nov 9, 2024
Author

I tried the new SNAPSHOT version, but it fails to start. "Unable to find java VM".
I have never written anything in Java and probably never will, so I don't know what to do. I restored the previous version.

4 replies

fthevenet Nov 9, 2024
Maintainer

Just re-install binjr v3.20.1-SNAPSHOT again.
If it fails with the same message after that, re-run the installer and choose "repair": this should fix the problem.

emmanuel-ruffio Nov 9, 2024
Author

OK, I had to uninstall first and then reinstall. It's working fine now and it's much faster. I don't even see the binjr logo anymore during scrolling: barely 1 second to update.

Thanks.

emmanuel-ruffio Nov 9, 2024
Author

However, it seems there is an issue with the installer. I don't have the "repair" option, and in the last window, if I click on "Back" instead of clicking on the "Finish" button to install the files, an error is raised and Windows closes the installer. It's probably a minor issue, and the installation works fine anyway.

fthevenet Nov 10, 2024
Maintainer

That's great news!
Thanks for catching and reporting the performance problem; I'll do a final release for 3.20.1 very soon.

With regard to the installer, the "repair" button is only available when you run it and the same version is already installed, not when upgrading or installing a fresh copy, so that part is normal.
The error is not, though; I'll take a look.

fthevenet · 2024-11-10T11:54:28Z

fthevenet
Nov 10, 2024
Maintainer

v3.20.1 is released, and fixes both the performance regression and the installer error.

