# Preemptively close chunks before big messages to isolate them (#1291)
### Changelog

Close chunks before writing large messages so that those messages are isolated in chunks of their own. This yields much better read performance when iterating over small-message channels, since the isolated big-message chunks can be skipped entirely.

### Docs

None

### Description

A very common situation is to have a few topics with very large messages (images, point clouds, etc.) and many topics with very small and frequent messages (state, transforms, statuses, etc.). When all of these are written into a single ("merged") MCAP file, a chunk typically accumulates a series of small messages and then ends with one big message that fills the remainder and "closes" it. The result is that nearly all chunks are "mixed": messages from all channels appear in every chunk.

During playback, some client implementations (e.g., C++) skip chunks that contain no messages of interest (i.e., none of the topics being read). That optimization is completely defeated when most chunks are mixed in this pathological pattern.

This PR therefore makes a very simple change: when an incoming message is larger than the chunk size, instead of appending it to the current chunk (mixing it with the prior smaller messages), the writer closes the current chunk and writes the big message into a fresh chunk, which is in turn closed immediately afterward. The end result is that big-message channels are isolated in their own series of chunks, so readers can skip those chunks without loading them into RAM or decompressing them. The core of the idea is sketched below.

In theory, compression could also perform better on the isolated channels, where it matters most, although I have not observed this. What I have observed (see the results below) is that incompressible data (such as protobuf wire format) no longer gets uselessly compressed alongside the larger byte payloads (images, point clouds, etc.). This does not affect the resulting file size, but it does speed up reading. (As an aside, it would be nice to be able to configure the minimum compression size, which is currently hardcoded to 1 KB in the writer.)

The only slight drawback I can see is some smaller or partial chunks. There may be rather unrealistic pathological cases (e.g., strictly interleaving big and small messages) where this leads to a noticeable difference in file size (a bigger chunk index). If you want a writer option to disable this behavior, that's fine, but I believe it should be the default.

I ran some tests on some simple MCAP files with mixed channels; the results are shown in the table below.
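For concreteness, here is a minimal sketch of the writer-side rule, assuming a simplified buffered writer; `ChunkingWriter` and `flushChunk` are hypothetical stand-ins, not the actual libmcap API:

```cpp
// Sketch of the new chunking rule; hypothetical names, not the libmcap API.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Message {
  uint16_t channelId;
  std::vector<std::byte> data;
};

class ChunkingWriter {
 public:
  explicit ChunkingWriter(size_t chunkSize) : chunkSize_(chunkSize) {}

  void write(const Message& msg) {
    // The change: a message at least as large as the chunk size would close
    // the current chunk anyway, so close it *first*. The big message then
    // starts a fresh chunk of its own instead of mixing with the small
    // messages already buffered.
    if (msg.data.size() >= chunkSize_ && bufferedBytes_ > 0) {
      flushChunk();
    }
    buffer_.push_back(msg);
    bufferedBytes_ += msg.data.size();
    if (bufferedBytes_ >= chunkSize_) {
      flushChunk();  // full chunk, or an isolated big-message chunk
    }
  }

 private:
  void flushChunk() {
    // A real writer would compress buffer_ and emit a Chunk record plus a
    // chunk index entry here.
    std::printf("chunk closed: %zu message(s), %zu byte(s)\n",
                buffer_.size(), bufferedBytes_);
    buffer_.clear();
    bufferedBytes_ = 0;
  }

  size_t chunkSize_;
  size_t bufferedBytes_ = 0;
  std::vector<Message> buffer_;
};

int main() {
  ChunkingWriter writer(1024);
  writer.write({1, std::vector<std::byte>(100)});   // small: buffered
  writer.write({2, std::vector<std::byte>(4096)});  // big: closes the pending
                                                    // chunk, then gets its own
  writer.write({1, std::vector<std::byte>(100)});   // small: fresh chunk again
}
```

With, say, a 1 MiB chunk size, a 4 MiB point cloud closes whatever chunk is in progress and then occupies a chunk by itself, so a reader following only the small-message topics never has to touch it.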
<table>
<tr><th>Before</th><th>After</th></tr>
<tr><td>

Running `mcap-cli info` on the recorded mcap file (2.7 GB):

```
library:     libmcap 1.4.0
profile:
messages:    31870
duration:    3m50.316691925s
start:       2024-08-08T13:03:44.098069201-04:00 (1723136624.098069201)
end:         2024-08-08T13:07:34.414761126-04:00 (1723136854.414761126)
compression:
	zstd: [6724/6725 chunks] [3.30 GiB/2.52 GiB (23.58%)] [11.21 MiB/sec]
	: [1/6725 chunks] [516.00 B/516.00 B (0.00%)] [2.00 B/sec]
channels:
	(1) [redacted_topic_0]   2121 msgs (9.21 Hz)  : foxglove.Grid [protobuf]
	(2) [redacted_topic_1]   2121 msgs (9.21 Hz)  : [redacted] [protobuf]
	(3) [redacted_topic_2]   2297 msgs (9.97 Hz)  : foxglove.PointCloud [protobuf]
	(4) [redacted_topic_3]   2297 msgs (9.97 Hz)  : [redacted] [protobuf]
	(5) [redacted_topic_4]  11517 msgs (50.01 Hz) : foxglove.FrameTransforms [protobuf]
	(6) [redacted_topic_5]  11517 msgs (50.01 Hz) : [redacted] [protobuf]
attachments: 0
metadata:    1
```

Running `time mcap-cli cat --topic [redacted_topic_4]` on the recorded mcap file (2.7 GB):

```
real	0m3.063s
user	0m4.179s
sys	0m0.662s
```

</td><td>

Running `mcap-cli info` on the recorded mcap file (2.7 GB):

```
library:     libmcap 1.4.0
profile:
messages:    31870
duration:    3m50.316691925s
start:       2024-08-08T13:03:44.098069201-04:00 (1723136624.098069201)
end:         2024-08-08T13:07:34.414761126-04:00 (1723136854.414761126)
compression:
	: [2121/8839 chunks] [276.02 KiB/276.02 KiB (0.00%)] [1.20 KiB/sec]
	zstd: [6718/8839 chunks] [3.30 GiB/2.52 GiB (23.58%)] [11.21 MiB/sec]
channels:
	(1) [redacted_topic_2]   2297 msgs (9.97 Hz)  : foxglove.PointCloud [protobuf]
	(2) [redacted_topic_3]   2297 msgs (9.97 Hz)  : [redacted] [protobuf]
	(3) [redacted_topic_4]  11517 msgs (50.01 Hz) : foxglove.FrameTransforms [protobuf]
	(4) [redacted_topic_5]  11517 msgs (50.01 Hz) : [redacted] [protobuf]
	(5) [redacted_topic_0]   2121 msgs (9.21 Hz)  : foxglove.Grid [protobuf]
	(6) [redacted_topic_1]   2121 msgs (9.21 Hz)  : [redacted] [protobuf]
attachments: 0
metadata:    1
```

Running `time mcap-cli cat --topic [redacted_topic_4]` on the recorded mcap file (2.7 GB):

```
real	0m0.070s
user	0m0.059s
sys	0m0.024s
```

</td></tr>
</table>
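For reference, the reader-side skipping that benefits from this isolation can be sketched as follows. `ChunkIndexRecord` here is a simplified stand-in for MCAP's chunk index record, whose message index offsets are keyed by channel ID; it is not the libmcap type:

```cpp
// Sketch of chunk skipping via the chunk index; simplified types, not the
// libmcap API.
#include <cstdint>
#include <map>
#include <set>

struct ChunkIndexRecord {
  uint64_t chunkStartOffset = 0;
  // One entry per channel that has at least one message in the chunk, so the
  // keys alone reveal which channels are present without decompressing it.
  std::map<uint16_t, uint64_t> messageIndexOffsets;
};

// Returns true if the chunk must be loaded and decompressed, i.e., it holds
// at least one message on a channel the reader cares about.
bool chunkNeeded(const ChunkIndexRecord& index,
                 const std::set<uint16_t>& wantedChannels) {
  for (const auto& [channelId, offset] : index.messageIndexOffsets) {
    if (wantedChannels.count(channelId) > 0) return true;
  }
  return false;  // skip: no RAM and no decompression spent on this chunk
}
```

In the mixed layout ("Before"), essentially every chunk contains a `[redacted_topic_4]` message, so this test never lets anything be skipped; with isolation ("After"), the big-message chunks fail the test and are never decompressed, which is where the 0m3.063s to 0m0.070s improvement in the `cat --topic` timing comes from.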