Issue with two different sources of data #159
-
Hi, to investigate the performance issue, I'll need you to change the logging severity to the […]. Then, please load your CSV file, replay the scenario where you experienced the 10 s loading episodes, and upload the log file for this session (you will find it in […]). This should hopefully provide me with enough info to work out a reproducer and see if something can be optimized. Thanks!
-
Hi, here is the log file. Delays of about 8000 ms can be seen after scroll events. Thanks,
-
binjr_2024-11-05_14-25-48_25676.zip Here is the compressed file.
-
Hi, I have found a serious performance regression in the latest v3.20.0 release compared to previous releases, which ties to a suspected regression in the Shenandoah garbage collector in OpenJDK 23.0.1. That said, I am a bit surprised by the numbers I saw in the log file you provided; would you mind sharing the specs of the PC you used and a few lines of your dataset, as an example? The performance seems too poor to be explained solely by the problem above. For instance, in your logs I see that it takes 8.5 seconds to retrieve about 14,000 samples. On my machine, with v3.20.0 it takes the same amount of time to retrieve over 244,000 samples, and with the fix in v3.20.1 this comes down to 2.6 seconds.
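To put the figures above side by side, here is a small sketch that converts them into retrieval rates. The sample counts are the approximate values quoted in the comment ("about 14,000", "over 244,000"), and the labels "reporter" and "maintainer" are just names chosen for this illustration.

```python
# Figures quoted in the comment above: (samples retrieved, seconds taken).
# The counts are approximate, so the rates are rough estimates as well.
measurements = {
    "reporter, v3.20.0": (14_000, 8.5),
    "maintainer, v3.20.0": (244_000, 8.5),
    "maintainer, v3.20.1": (244_000, 2.6),
}


def throughput(samples: int, seconds: float) -> float:
    """Return the retrieval rate in samples per second."""
    return samples / seconds


rates = {name: throughput(*m) for name, m in measurements.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} samples/s")

# Ratio between the two machines on the same release (v3.20.0).
gap = rates["maintainer, v3.20.0"] / rates["reporter, v3.20.0"]
print(f"gap on v3.20.0: {gap:.1f}x")
```

Comparing rates rather than raw durations makes it easier to see that the gap between the two machines is much larger than the gain brought by the fix alone.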
-
Hi, I really appreciate your help. I will try the preview build. Here are the specs of my computer, which is quite old but still efficient, especially with the additional RAM installed: […] One key parameter that may explain the poor performance is that each sample has 55 columns (for the first file) or 37 columns (for the second file), i.e. close to 100 sensors installed, with more coming. Here is one line from the first file: […]
Are the data reloaded from the ASCII file every time the time range is changed?
-
Your CPU is roughly the same generation as my older one and performance should be comparable; the only significant difference is that my model is able to increase its CPU clock significantly to handle a short-duration, single-threaded load (which is exactly the case here). Still, that doesn't explain a 10x difference. To answer your question, the data are not reloaded from the CSV files; content for all files is indexed into a temporary Lucene index. The index itself is stored on disk so that binjr can handle arbitrarily large datasets without being limited by available RAM, but Lucene makes sophisticated enough use of IO caches that the throughput or latency of the IO path is unlikely to play a significant role in this case, outside of the initial indexing phase, as long as there is enough available memory to be used as page cache. Finally, with regard to the high number of columns in the CSV file: this will definitely have an impact on the speed of the initial indexing phase, but not on the query matching nor on the data retrieval. Anyway, please try the latest preview and let me know if it helps. Cheers!
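The "index once, query many times" idea described above can be illustrated with a toy example. This is only a sketch of the principle and not how binjr or Lucene work internally: the `TimeIndex` class is hypothetical, it keeps everything in memory (whereas the real index lives on disk), and it assumes the first CSV column is a numeric timestamp.

```python
import bisect
import csv
import io


class TimeIndex:
    """Toy in-memory index: parse the CSV once, then answer range queries."""

    def __init__(self, csv_text: str):
        rows = csv.reader(io.StringIO(csv_text))
        # Parsing happens only here, i.e. during the initial indexing phase.
        parsed = sorted((float(r[0]), [float(v) for v in r[1:]]) for r in rows)
        self.timestamps = [t for t, _ in parsed]
        self.values = [v for _, v in parsed]

    def query(self, start: float, end: float, column: int):
        """Return (timestamp, value) pairs with start <= timestamp <= end."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_right(self.timestamps, end)
        return [(self.timestamps[i], self.values[i][column]) for i in range(lo, hi)]


# Made-up sample data: timestamp followed by two sensor columns.
sample = "0,1.0,10.0\n1,2.0,20.0\n2,3.0,30.0\n3,4.0,40.0\n"
index = TimeIndex(sample)

# Changing the time range only runs a new query; the CSV is not parsed again.
window = index.query(1, 2, column=1)
print(window)
```

In this sketch the number of columns affects only the constructor (the indexing step), while a query touches just the requested column, which mirrors the point made above about wide CSV files.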
-
FYI, I have further improved the fetch operation by making it run in parallel on multiple threads, which can decrease the time taken by up to 60% in some cases (e.g. 1600 ms vs 2600 ms to fetch 244099 hits for a single series, on an Intel i7-10850H CPU, 6 cores/12 threads @ 2.70-5.0 GHz). This is a rather brute-force approach though, so the benefits will vary greatly depending on the available CPU resources and the context in which the fetch happens (e.g. fetching data from multiple series on separate charts is already done in parallel, so CPU usage might already be maximized and this would provide no further benefit). In any case, this can be disabled if it proves to hurt performance instead of helping, by unchecking the option called […].
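For readers curious about the general shape of such a change, here is a minimal sketch of splitting one fetch into chunks handled by several threads, with a flag to fall back to the sequential path. It is not binjr's actual code: `fetch_chunk`, `fetch`, the `parallel` flag and the per-hit work are all hypothetical. Note also that in Python, threads will not speed up pure CPU-bound work, so this shows the structure only, not the gain.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(hits, start, stop):
    """Hypothetical stand-in for retrieving one slice of matching hits."""
    return [h * 2 for h in hits[start:stop]]  # placeholder per-hit work


def fetch(hits, parallel=True, workers=4):
    """Fetch all hits, optionally splitting the work across threads."""
    if not parallel or len(hits) < workers:
        # Sequential path, equivalent to disabling the option.
        return fetch_chunk(hits, 0, len(hits))
    size = -(-len(hits) // workers)  # ceiling division
    bounds = [(i, min(i + size, len(hits))) for i in range(0, len(hits), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so hit order is preserved.
        parts = pool.map(lambda b: fetch_chunk(hits, *b), bounds)
        return [item for part in parts for item in part]


hits = list(range(10))
result = fetch(hits, workers=3)
print(result == fetch(hits, parallel=False))
```

Keeping a sequential fallback behind a flag is what makes it cheap to turn the feature off when the CPU is already saturated by other fetches.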
-
I tried the new SNAPSHOT version, but it fails to start: "Unable to find java VM".
-
v3.20.1 is released, and fixes both the performance regression and the installer error.
-
Hi,
I'm still trying to use binjr, but I cannot plot two series from different sources in the same plot, or even in the same worksheet.
Data1 comes from file1.
Data2 comes from file2.
File1 and File2 do not have exactly the same timestamps.
If I add Data2 to the Data1 plot, Data1 is shown but not Data2. The same happens if they are in two different plots.
Everything is OK when I use two different worksheets, but that is not what I need to do.
And one suggestion: would it be possible to precache data that are outside but close to the current plot time range? My data file is 100 MB, and every time I scroll the plot slightly, I must wait a few seconds for the plot to update.
Best regards,
Emmanuel.
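The precaching suggestion above could look roughly like the following sketch, which loads a range wider than the visible one so that small scrolls are served from memory. This is only an illustration of the idea, not something binjr is known to implement: `PrefetchingWindow`, `slow_loader` and the `margin` parameter are hypothetical names, and timestamps are plain numbers for simplicity.

```python
class PrefetchingWindow:
    """Cache a time range wider than the visible one to absorb small scrolls."""

    def __init__(self, loader, margin=0.5):
        self.loader = loader      # callable (start, end) -> list of (t, value)
        self.margin = margin      # extra fraction of the span loaded on each side
        self.cached_range = None
        self.cached_data = []
        self.loads = 0            # number of times the slow loader was called

    def get(self, start, end):
        """Return the points in [start, end], reloading only on a cache miss."""
        if self.cached_range is None or not (
            self.cached_range[0] <= start and end <= self.cached_range[1]
        ):
            pad = (end - start) * self.margin
            self.cached_range = (start - pad, end + pad)
            self.cached_data = self.loader(*self.cached_range)
            self.loads += 1
        return [(t, v) for t, v in self.cached_data if start <= t <= end]


def slow_loader(start, end):
    """Hypothetical stand-in for the expensive data retrieval."""
    return [(t, t * t) for t in range(100) if start <= t <= end]


view = PrefetchingWindow(slow_loader)
first = view.get(40, 60)    # cache miss: loads the padded range 30..70
second = view.get(45, 65)   # small scroll: served from the cached range
print(view.loads)
```

The trade-off is that each miss loads more data than strictly needed, so the margin would have to be tuned against the cost of a single retrieval.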