clp-core: Refactor the BufferedFileReader to wrap around a ReaderInterface. #523

haiqi96 · 2024-08-22T01:55:35Z

Description

The BufferedFileReader was designed with the assumption that it always reads from the local file system. However, supporting compressing from a different input source, (s3 for example) would break this assumption. In this PR, we refactor the BufferedFileReader so it can takes in any abitrary reader interface as its data source. A more detailed list of changes:

Refactored BufferedFileReader to use RAII model.
Remove the internal FileDescriptor and C-style read/write from BufferedFileReader
Update the instantiation of all BufferedFileReader instance to use a FileDescriptorReader (instead of FileReader) to avoid double buffering.
Rewrite some BufferedFileReader apis using modern C++ features.

Note: this PR makes some modifications to the unit-test, but we decide to not update the unit-test to match the latest coding standard. In fact, the unit-test is very poorly written and would make more sense to refactor it in a separate PR.

Validation performed

1.Made sure exisitng Unit test passes.
2.Added a new unit test to cover the case where the BufferedFileReader takes in an reader interface with non-zero pos.
3. Compressed and decompressed a single file, verified the decompressed file matches original

Summary by CodeRabbit

New Features
- Enhanced file reading logic in BufferedFileReader by transitioning to a ReaderInterface.
- Improved flexibility in file handling through integration with FileDescriptorReader.
- Added a new method in FileCompressor for parsing and encoding content from a reader.
Improvements
- Streamlined error handling and buffer management in BufferedFileReader.
- Clarified behavior in BufferReader methods to handle edge cases more effectively.
- Updated try_compressing_as_archive method to allow flexible file reading during compression.
Tests
- Updated test cases for BufferedFileReader to utilize ReaderInterface for improved file handling.

Co-authored-by: kirkrodrigues <[email protected]>

components/core/src/clp/BufferReader.cpp

components/core/src/clp/BufferedFileReader.cpp

kirkrodrigues · 2024-09-11T13:09:26Z

components/core/src/clp/BufferedFileReader.cpp

    m_buffer_begin_pos = 0;
-    m_buffer_reader.emplace(m_buffer.data(), 0);
+    m_buffer_reader = BufferReader{m_buffer.data(), 0};
    m_highest_read_pos = 0;


Shouldn't this be set to m_file_reader->get_pos(), i.e., the highest position we've read to in the underlying file? Otherwise, I think if we set a checkpoint on a file that has already had some data read, the checkpoint position will be wrong (0); and if we then clear that checkpoint without reading or seeking any data, we'll cause the underlying file to seek backwards.

If we make the suggested change, then it probably makes sense that m_file_pos should be set to m_highest_read_pos (the end result is equivalent, but the intention of the code would be clearer).

Can we also add a test for setting and clearing a checkpoint without reading any data beforehand?

Thanks for catching this. We should set both m_buffer_begin_pos and m_highest_read_pos to be m_file_reader->get_pos()

Imo, it would be more clear to assign m_file_pos first, and let m_buffer_begin_pos and m_highest_read_pos to be m_file_pos though

components/core/src/clp/BufferedFileReader.cpp

kirkrodrigues · 2024-09-11T13:47:55Z

components/core/src/clp/BufferedFileReader.hpp

@@ -185,7 +171,6 @@ class BufferedFileReader : public ReaderInterface {
     * @param buf
     * @param num_bytes_to_read The number of bytes to try and read
     * @param num_bytes_read The actual number of bytes read
-     * @return ErrorCode_NotInit if the file is not open
     * @return ErrorCode_BadParam if buf is null
     * @return ErrorCode_errno on error reading from the underlying file


I don't think this is a case anymore, right?

Perhaps we should rewrite the return statements using the "Same as xyz" method we use elsewhere.

Should have fixed all?

components/core/src/clp/BufferedFileReader.hpp

components/core/src/clp/BufferedFileReader.cpp

kirkrodrigues · 2024-09-11T14:01:30Z

components/core/tests/test-BufferedFileReader.cpp

+    boost::filesystem::remove(test_file_path);
+}
+
+TEST_CASE("Test reading with beginning offset", "[BufferedFileReader]") {


Instead of adding this test case, I think you could modify the original test to advance both the reference buffer and the FileDescriptorReader by some amount X before running the tests. Then you could use Catch2's GENERATE(0, N) (where N is some arbitrary value) to generate tests for the cases where the underlying file hasn't had any content read and where it has had content read.

components/core/src/clp/clp/FileCompressor.hpp

Co-authored-by: kirkrodrigues <[email protected]>

…lp_fork into BufferedReaderRefactor

coderabbitai · 2024-10-08T00:25:05Z

Walkthrough

The pull request introduces substantial modifications to the BufferedFileReader and related components, transitioning from a file descriptor-based approach to a more abstracted ReaderInterface. Key changes include updated constructor and method signatures, the removal of redundant file management methods, and enhanced error handling. The FileCompressor class has also been updated to accommodate the new reading strategy, with added methods for improved functionality. Overall, these changes streamline file reading operations and enhance the flexibility of the components involved.

Changes

File Path	Change Summary
components/core/src/clp/BufferedFileReader.cpp	Transitioned from file descriptor to `ReaderInterface`; updated `read_into_buffer` and constructor; removed unnecessary methods; refined error handling and logic.
components/core/src/clp/BufferedFileReader.hpp	Constructor updated to accept `std::unique_ptr<ReaderInterface>`; removed methods for file management; updated member variables for improved buffer management.
components/core/src/clp/clp/FileCompressor.hpp	Added `parse_and_encode` method; updated `try_compressing_as_archive` to accept `ReaderInterface&`; removed `m_file_reader` variable.
components/core/tests/test-BufferedFileReader.cpp	Updated tests to use `FileDescriptorReader`; modified initialization of `BufferedFileReader`; enhanced test coverage for reading from non-zero positions.

Possibly related PRs

ffi: Redesign Deserializer API to deserialize key-value pair IR streams one IR unit at a time (fixes #539). #549: The changes in this PR involve a redesign of the Deserializer API to handle IR streams using a ReaderInterface, which aligns with the main PR's updates to the BufferedFileReader class that also utilizes a ReaderInterface for reading operations.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 9

🧹 Outside diff range and nitpick comments (6)

components/core/src/clp/BufferReader.hpp (1)
26-27: LGTM! Consider adding a comment about usage.

The addition of a default constructor improves the flexibility of the BufferReader class. The use of constructor delegation is a good C++ practice.

Consider adding a comment to clarify that this constructor creates an empty buffer, and users should ensure data is set before reading operations are performed. This can help prevent potential misuse.
+    // Creates an empty BufferReader. Ensure data is set before performing read operations.
     BufferReader() : BufferReader(nullptr, 0, 0) {}
components/core/src/clp/BufferedFileReader.hpp (4)
18-21: Clarify documentation regarding 'ReaderInterface' and sequential access

The documentation describes performing unordered reads from inputs that only support sequential access, like S3. It would be helpful to explain how BufferedFileReader enables unordered reads over sequential inputs and any limitations or requirements for the underlying ReaderInterface.

98-99: Enhance documentation of return values

In the try_refill_buffer_if_empty() method, the return values section mentions:

Same as refill_reader_buffer if it fails

ErrorCode_Success on success

Consider detailing the specific error codes that refill_reader_buffer might return. This would improve clarity and assist users in handling errors appropriately.

159-160: Detail possible error codes in documentation

For the try_read method, the return values state:

Same as BufferReader::try_read if it fails

ErrorCode_Success on success

It would be beneficial to enumerate potential error codes from BufferReader::try_read, aiding developers in understanding and handling different failure scenarios.

219-219: Consider renaming 'm_file_reader' to 'm_reader' for clarity

Since m_file_reader holds a ReaderInterface that may not necessarily represent a file, renaming it to m_reader could better reflect its purpose and enhance code readability.

Apply this diff to rename the variable:
-    std::unique_ptr<ReaderInterface> m_file_reader;
+    std::unique_ptr<ReaderInterface> m_reader;
components/core/src/clp/clp/FileCompressor.cpp (1)
303-303: Update parameter name for clarity

The parameter file_reader is now of type ReaderInterface&, which may cause confusion with the existing reader parameters. Consider renaming it to reader for consistency and clarity.

Apply this diff to rename the parameter:
-bool FileCompressor::try_compressing_as_archive(
...
-    ReaderInterface& file_reader,
+    ReaderInterface& reader,
...
And update all usages of file_reader to reader within the function.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 54b832c and e4bbd30.

📒 Files selected for processing (7)

components/core/src/clp/BufferReader.cpp (1 hunks)
components/core/src/clp/BufferReader.hpp (1 hunks)
components/core/src/clp/BufferedFileReader.cpp (8 hunks)
components/core/src/clp/BufferedFileReader.hpp (6 hunks)
components/core/src/clp/clp/FileCompressor.cpp (7 hunks)
components/core/src/clp/clp/FileCompressor.hpp (2 hunks)
components/core/tests/test-BufferedFileReader.cpp (2 hunks)

🧰 Additional context used

🪛 cppcheck

components/core/src/clp/clp/FileCompressor.cpp

[error] 191-191: Exception thrown in function declared not to throw exceptions.

(throwInNoexceptFunction)

🔇 Additional comments (12)

components/core/src/clp/BufferReader.hpp (2)
26-27: Review implementation of ReaderInterface methods.

With the addition of the default constructor, it's now possible to have a BufferReader with no data. Please review the implementation of the ReaderInterface methods (e.g., try_read, try_seek_from_begin, try_get_pos, try_read_to_delimiter) to ensure they handle the case of an empty buffer correctly.

To assist with this review, you can use the following script to locate the implementations of these methods:
#!/bin/bash
# Description: Locate implementations of ReaderInterface methods

# Test: Search for implementations of ReaderInterface methods
rg --type cpp -A 5 $'auto BufferReader::(try_read|try_seek_from_begin|try_get_pos|try_read_to_delimiter)'
26-27: Consider adding null checks in critical methods.

With the introduction of the default constructor, it's now possible to have a BufferReader with a null internal buffer. To prevent potential null pointer dereferences, consider adding null checks in methods that directly access the buffer, such as peek_buffer and get_remaining_data_size.

To verify the current implementation, let's check these methods:
components/core/src/clp/BufferReader.cpp (1)

18-18: Change correctly sets buf to nullptr when no data remains.

This update improves clarity by ensuring buf is set to nullptr when there is no data left to peek, which aligns with expected behaviour.
components/core/src/clp/clp/FileCompressor.hpp (2)
68-77: The addition of parse_and_encode method is well-structured

The new parse_and_encode method is appropriately defined with clear parameters, enhancing the flexibility and functionality of the FileCompressor class.

115-117: Ensure all callers of try_compressing_as_archive are updated with the new parameter

The method try_compressing_as_archive now includes an additional parameter ReaderInterface& file_reader. Please verify that all calls to this method throughout the codebase have been updated to include the new parameter to prevent any runtime errors or unexpected behaviour.

Run the following script to identify all invocations of try_compressing_as_archive and check if they have been updated:

✅ Verification successful

All calls to try_compressing_as_archive have been verified and are up to date with the new file_reader parameter.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify that all calls to 'try_compressing_as_archive' include the new 'file_reader' parameter.

# Test: Search for function calls to 'try_compressing_as_archive'. Expect: All calls should pass the 'file_reader' argument.

rg --type 'cpp' --pcre2 'try_compressing_as_archive\(' -A 2
Length of output: 1846
components/core/src/clp/BufferedFileReader.hpp (2)

29-29: Verify accuracy of 'file position' reference in documentation

The note states: "Unless otherwise noted, the 'file position' mentioned in docstrings is the position in the buffered file, not the position in the original input file." Given that BufferedFileReader wraps another ReaderInterface, please verify if this accurately reflects the intended meaning or if it should refer to the position in the original input data source.

87-87: Confirm necessity of 'override' specifier on destructor

The destructor is declared as ~BufferedFileReader() override = default;. Ensure that the base class ReaderInterface has a virtual destructor. If not, consider adding one to avoid undefined behaviour during object destruction.

components/core/tests/test-BufferedFileReader.cpp (1)

280-285: Good implementation of parameterized tests using GENERATE

Utilizing GENERATE(0, 127); effectively tests the BufferedFileReader starting from different offsets, enhancing test coverage and accounting for varied starting positions.

components/core/src/clp/clp/FileCompressor.cpp (3)

5-5: Include directives and using statements are appropriate

The added #include directives and using statements for <memory>, <utility>, std::make_unique, std::move, and std::unique_ptr are necessary for smart pointers and move semantics used in the code.

Also applies to: 7-7, 37-38, 41-41

129-129: Initialization of buffered_file_reader using make_unique

The initialization of buffered_file_reader with make_unique aligns with modern C++ practices and ensures proper memory management.

393-402: Confirm correct reader is passed to parse_and_encode

The function parse_and_encode is called with m_libarchive_file_reader as the reader parameter. Ensure that m_libarchive_file_reader correctly implements the ReaderInterface and is properly initialized before use.

Run the following script to verify the implementation of m_libarchive_file_reader:

components/core/src/clp/BufferedFileReader.cpp (1)

Line range hint 203-230: LGTM: The try_read_to_delimiter function is correctly updated

The modifications to try_read_to_delimiter appropriately handle error codes and update the file position accurately.

components/core/src/clp/BufferReader.cpp

components/core/src/clp/BufferedFileReader.hpp

coderabbitai · 2024-10-08T00:31:22Z

components/core/tests/test-BufferedFileReader.cpp

    for (size_t i = 0; i < test_data.size(); ++i) {
        test_data[i] = static_cast<char>('a' + (std::rand() % (cNumAlphabets)));
    }

    // Write to test file


🛠️ Refactor suggestion

Use modern random number generation instead of std::rand()

The use of std::rand() is outdated and not recommended in modern C++. Consider using the <random> header with std::mt19937 and std::uniform_int_distribution for better randomness and test reproducibility.

Here's how you might refactor the code:

#include <random> // Initialize random number generator std::random_device rd; std::mt19937 gen(rd()); std::uniform_int_distribution<> distrib('a', 'a' + cNumAlphabets - 1); for (size_t i = 0; i < test_data.size(); ++i) { test_data[i] = static_cast<char>(distrib(gen)); }

components/core/tests/test-BufferedFileReader.cpp

coderabbitai · 2024-10-08T00:31:23Z

components/core/src/clp/clp/FileCompressor.cpp

@@ -116,15 +121,15 @@ bool FileCompressor::compress_file(
        streaming_archive::writer::Archive& archive_writer,
        bool use_heuristic
 ) {
-    std::string file_name = std::filesystem::canonical(file_to_compress.get_path()).string();
+    string file_name = std::filesystem::canonical(file_to_compress.get_path()).string();


⚠️ Potential issue

Handle potential exceptions from std::filesystem::canonical

The use of std::filesystem::canonical may throw exceptions if the file path does not exist or there are permission issues. To prevent the application from terminating unexpectedly, consider adding exception handling around this call.

You can modify the code as follows to handle potential exceptions:

string file_name; try { file_name = std::filesystem::canonical(file_to_compress.get_path()).string(); } catch (const std::filesystem::filesystem_error& e) { SPDLOG_ERROR( "Failed to get canonical path for {}: {}", file_to_compress.get_path(), e.what() ); return false; }

It's good to have since I ran into this a few times. but do we want to have the fix in this PR?

@haiqi96, understood. We can address this in a future PR. Would you like me to open a new GitHub issue to track this enhancement?

Sure, please

@haiqi96, I've created a new GitHub issue to track this enhancement: Issue #553.

🔗 Analysis chain

@haiqi96, I've opened a new GitHub issue to track this enhancement.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Create a new GitHub issue to handle exceptions from std::filesystem::canonical gh issue create \ --title "Handle exceptions from std::filesystem::canonical in FileCompressor.cpp" \ --body "As discussed in [PR #523](https://github.com/y-scope/clp/pull/523) and [this comment](https://github.com/y-scope/clp/pull/523#discussion_r1791033838), we should handle potential exceptions from \`std::filesystem::canonical\` in \`FileCompressor.cpp\`. Requester: @haiqi96" \ --assignee haiqi96

Length of output: 438

components/core/src/clp/clp/FileCompressor.cpp

coderabbitai · 2024-10-08T00:31:23Z

components/core/src/clp/BufferedFileReader.cpp

+auto read_into_buffer(
+        ReaderInterface* reader,
+        char* buf,
+        size_t num_bytes_to_read,
+        size_t& num_bytes_read
+) -> ErrorCode;


🛠️ Refactor suggestion

Remove unnecessary forward declaration of read_into_buffer to simplify code

Since the implementation of read_into_buffer follows immediately and there are no calls to it before its definition, the forward declaration within the anonymous namespace is unnecessary.

You can safely remove the forward declaration at lines 32–37.

@kirkrodrigues I believe our coding standard is to keep the forward declaration?

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

components/core/src/clp/BufferedFileReader.cpp

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)

components/core/src/clp/BufferedFileReader.cpp (3)
32-50: LGTM: Refactored read_into_buffer to use ReaderInterface

The changes align well with the PR objective of refactoring to use a ReaderInterface. The implementation correctly uses reader->try_read instead of direct file descriptor operations.

However, consider simplifying the error handling:
 auto read_into_buffer(
         ReaderInterface* reader,
         char* buf,
         size_t num_bytes_to_read,
         size_t& num_bytes_read
 ) -> ErrorCode {
     num_bytes_read = 0;
     auto const error_code = reader->try_read(buf, num_bytes_to_read, num_bytes_read);
-    if (0 == num_bytes_read) {
-        return ErrorCode_EndOfFile;
-    }
-    return error_code;
+    return (0 == num_bytes_read) ? ErrorCode_EndOfFile : error_code;
 }
This change maintains the same logic but reduces nesting and improves readability.

126-131: LGTM: Updated seek operation to use ReaderInterface

The changes align well with the PR objective of using ReaderInterface. The error handling for the seek operation is more robust, checking for specific error codes.

Consider simplifying the error handling:
-            if (auto const error_code = m_file_reader->try_seek_from_begin(pos);
-                error_code != ErrorCode_Success)
-            {
-                return error_code;
-            }
+            if (auto const error_code = m_file_reader->try_seek_from_begin(pos); error_code != ErrorCode_Success)
+                return error_code;
This change reduces nesting and improves readability while maintaining the same logic.

163-187: LGTM: Improved try_read method with std::span and restructured loop

The use of std::span for buffer management improves safety and clarity. The restructured loop correctly handles reading from the BufferReader and refilling when necessary, aligning well with the overall refactoring objectives.

Consider simplifying the error handling in the loop:
         } else if (ErrorCode_EndOfFile == error_code) {
             error_code = refill_reader_buffer(m_base_buffer_size);
-            if (ErrorCode_EndOfFile == error_code) {
-                break;
-            }
-            if (ErrorCode_Success != error_code) {
+            if (ErrorCode_Success != error_code && ErrorCode_EndOfFile != error_code) {
                 return error_code;
             }
+            if (ErrorCode_EndOfFile == error_code) {
+                break;
+            }
         } else {
             return error_code;
         }
This change reduces nesting and improves readability while maintaining the same logic.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between e4bbd30 and 7fc448a.

📒 Files selected for processing (2)

components/core/src/clp/BufferReader.cpp (1 hunks)
components/core/src/clp/BufferedFileReader.cpp (8 hunks)

🧰 Additional context used

🔇 Additional comments (12)

components/core/src/clp/BufferReader.cpp (5)

7-16: Excellent improvements to the constructor!

The changes to the constructor are well-implemented:

The use of a member initializer list is a good C++ practice that can improve performance.

The nullptr check when data_size or pos is non-zero is correct and prevents potential issues.

The additional check pos > data_size addresses the past review comment and prevents invalid memory access.

These changes enhance the robustness and safety of the BufferReader class.

23-23: Good clarification in peek_buffer method

The change to set buf to nullptr when peek_size is zero is a good improvement. It clearly indicates that there's no data to peek, which enhances the method's clarity and prevents potential misuse of an invalid pointer.

Line range hint 27-59: try_read_to_delimiter method remains robust

The try_read_to_delimiter method has not undergone significant changes. The existing implementation appears to be correct and efficient, handling the delimiter search, appending to the string, and managing the internal buffer position appropriately.

Line range hint 61-64: Good defensive check in try_read method

The addition of the nullptr check when num_bytes_to_read > 0 is an excellent defensive programming practice. This check prevents potential issues with invalid memory access and improves the overall robustness of the method.

Line range hint 95-103: Improved clarity in overloaded try_read_to_delimiter

The addition of str.clear() when append is false is a good improvement. It ensures that the string is empty before appending when not in append mode, which enhances the method's correctness and clarity. This change aligns well with the expected behaviour of the method.

components/core/src/clp/BufferedFileReader.cpp (7)

3-18: LGTM: Appropriate includes and using declarations added

The new includes and using declarations are suitable for the changes made in the file. They support the transition to using smart pointers (std::unique_ptr) and modern C++ features (std::span), which aligns with the refactoring objectives.

54-68: LGTM: Constructor refactored to use ReaderInterface

The changes in the constructor align well with the PR objective. The use of std::unique_ptr instead of a file descriptor improves resource management and flexibility. The initialization of m_file_pos, m_buffer_reader, and m_highest_read_pos is correct and addresses previous review comments.

72-76: LGTM: Updated buffer emptiness check

The change to use m_buffer_reader.get_buffer_size() for checking buffer emptiness is consistent with the new BufferReader usage. The logic remains correct and aligns with the overall refactoring.

264-269: LGTM: Updated refill_reader_buffer to use ReaderInterface

The changes in refill_reader_buffer align well with the PR objective of using ReaderInterface. The use of read_into_buffer with m_file_reader.get() correctly replaces the direct file descriptor operations. The buffer management logic has been appropriately updated to work with the new BufferReader, maintaining the same functionality as before.

281-289: LGTM: Updated drop_content_before_current_pos to work with BufferReader

The changes in drop_content_before_current_pos are consistent with the new BufferReader usage. The use of m_buffer_reader.get_pos() and m_buffer_reader.get_buffer_size() for calculations is correct. The buffer management logic has been appropriately updated to work with the new BufferReader, maintaining the same functionality as before.

Line range hint 291-294: LGTM: Minor improvement in update_file_pos

The update_file_pos method now uses std::max to update m_highest_read_pos, which is a minor improvement in clarity and correctness. The method maintains its original functionality while aligning with modern C++ practices.

Line range hint 1-295: Overall: Excellent refactoring to use ReaderInterface

The changes in this file successfully achieve the PR objective of refactoring BufferedFileReader to use ReaderInterface instead of direct file descriptor operations. Key improvements include:

Adoption of modern C++ features like std::span and std::unique_ptr, enhancing code safety and clarity.

Consistent use of the new BufferReader interface throughout the file.

Improved error handling in several methods.

Removal of direct file descriptor operations, making the code more flexible and easier to test.

These changes significantly improve the code's flexibility, safety, and maintainability. The refactoring aligns well with the RAII model and modern C++ best practices.

While the changes are generally excellent, consider the minor suggestions for error handling improvements in previous comments to further enhance code readability.

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (7)

components/core/src/clp/BufferedFileReader.cpp (6)
32-50: Approve changes to read_into_buffer with a suggestion

The refactoring of read_into_buffer to use ReaderInterface is a good improvement. It simplifies the function and makes it more flexible. However, consider enhancing the error handling to distinguish between different types of errors that try_read might return.

Consider updating the error handling to be more specific:
 auto error_code = reader->try_read(buf, num_bytes_to_read, num_bytes_read);
-if (0 == num_bytes_read) {
+if (error_code != ErrorCode_Success) {
+    return error_code;
+} else if (0 == num_bytes_read) {
     return ErrorCode_EndOfFile;
 }
-return error_code;
+return ErrorCode_Success;
This change ensures that any error from try_read is properly propagated.

54-76: Approve constructor changes with a suggestion

The updated constructor properly implements the transition to using ReaderInterface. The null check for reader_interface is a good addition for robustness.

Consider initializing m_highest_read_pos in the initializer list for consistency:
 BufferedFileReader::BufferedFileReader(
         std::unique_ptr<ReaderInterface> reader_interface,
         size_t base_buffer_size
 )
-        : m_file_reader(std::move(reader_interface)) {
+        : m_file_reader(std::move(reader_interface)),
+          m_highest_read_pos(0) {
     // ... rest of the constructor
-    m_highest_read_pos = m_file_pos;
+    m_highest_read_pos = m_file_pos;  // Update after m_file_pos is set
 }
This change improves code clarity and ensures m_highest_read_pos is always initialized.

Line range hint 89-112: Approve checkpoint changes with a suggestion

The updates to set_checkpoint and clear_checkpoint align well with the new buffering mechanism and improve the checkpoint functionality. The use of m_highest_read_pos in clear_checkpoint is a good enhancement.

In set_checkpoint, consider adding a comment explaining the condition for dropping content:
 if (m_checkpoint_pos.has_value() && m_checkpoint_pos < m_file_pos
-    && m_buffer_reader.get_buffer_size() != m_base_buffer_size)
+    && m_buffer_reader.get_buffer_size() != m_base_buffer_size)  // Buffer not full
 {
     drop_content_before_current_pos();
 }
This comment would clarify the reasoning behind this condition, improving code maintainability.

Line range hint 121-160: Approve try_seek_from_begin changes with a suggestion

The updates to try_seek_from_begin successfully implement the transition to using ReaderInterface and BufferReader. The error handling is more comprehensive, which is a good improvement.

Consider extracting the logic for seeking beyond the current buffer into a separate private method to improve readability:
 auto BufferedFileReader::try_seek_from_begin(size_t pos) -> ErrorCode {
     // ... existing checks ...

     auto error_code = m_buffer_reader.try_seek_from_begin(get_buffer_relative_pos(pos));
     if (ErrorCode_Truncated == error_code) {
-        if (false == m_checkpoint_pos.has_value()) {
-            // ... existing logic ...
-        } else {
-            // ... existing logic ...
-        }
+        return seek_beyond_current_buffer(pos);
     } else if (ErrorCode_Success != error_code) {
         return error_code;
     }
     update_file_pos(pos);
     return ErrorCode_Success;
 }

+auto BufferedFileReader::seek_beyond_current_buffer(size_t pos) -> ErrorCode {
+    if (false == m_checkpoint_pos.has_value()) {
+        // ... existing logic ...
+    } else {
+        // ... existing logic ...
+    }
+}
This refactoring would make the main method easier to read and understand.

Line range hint 162-196: Approve try_read changes with a suggestion

The refactoring of try_read is a significant improvement. The use of std::span and the simplified loop structure enhance readability and safety. The error handling is well-aligned with the new design.

Consider adding a comment to explain the logic for determining end-of-file condition:
 if (dst_view.size() == num_bytes_to_read) {
+    // If no bytes were read, we've reached the end of the file
     return ErrorCode_EndOfFile;
 }
 return ErrorCode_Success;
This comment would clarify the reasoning behind this check, improving code maintainability.

Line range hint 241-286: Approve refill_reader_buffer changes with a suggestion

The refactoring of refill_reader_buffer successfully implements the transition to using ReaderInterface. The buffer management logic has been updated to handle checkpoints more efficiently, which is a good improvement.

Consider extracting the buffer size calculation logic into a separate private method to improve readability:
+auto BufferedFileReader::calculate_buffer_size(size_t num_bytes_to_refill, size_t data_size) -> size_t {
+    if (m_checkpoint_pos.has_value()) {
+        return int_round_up_to_multiple(
+                get_buffer_end_pos() + num_bytes_to_refill,
+                m_base_buffer_size
+        );
+    } else {
+        return m_base_buffer_size - (get_buffer_end_pos() % m_base_buffer_size);
+    }
+}

 auto BufferedFileReader::refill_reader_buffer(size_t num_bytes_to_refill) -> ErrorCode {
     // ... existing code ...

-    size_t num_bytes_to_read{0};
-    if (m_checkpoint_pos.has_value()) {
-        num_bytes_to_read = int_round_up_to_multiple(
-                buffer_end_pos + num_bytes_to_refill,
-                m_base_buffer_size
-        );
-        // ... existing code ...
-    } else {
-        num_bytes_to_read = m_base_buffer_size - (buffer_end_pos % m_base_buffer_size);
-        // ... existing code ...
-    }
+    size_t num_bytes_to_read = calculate_buffer_size(num_bytes_to_refill, data_size);

     // ... rest of the method ...
 }
This refactoring would make the main method easier to read and understand by separating the buffer size calculation logic.
components/core/tests/test-BufferedFileReader.cpp (1)

286-300: Excellent addition of non-zero position reading test.

The new test case effectively addresses the PR objective of testing BufferedFileReader with a non-zero position reader. The use of GENERATE improves test coverage by testing multiple starting positions.

One suggestion for improvement:

Consider adding a comment explaining the significance of the chosen values (0 and 127) for reader_begin_offset. This would enhance the test's maintainability and clarify the reasoning behind these specific values.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 7fc448a and 38076be.

📒 Files selected for processing (3)

components/core/src/clp/BufferReader.cpp (1 hunks)
components/core/src/clp/BufferedFileReader.cpp (8 hunks)
components/core/tests/test-BufferedFileReader.cpp (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

components/core/src/clp/BufferReader.cpp

🧰 Additional context used

🔇 Additional comments (12)

components/core/src/clp/BufferedFileReader.cpp (5)

3-18: LGTM: Updated includes and using declarations

The new includes and using declarations are appropriate for the refactoring changes. They indicate a shift towards using more standard C++ components, which is a good practice.

79-84: LGTM: Simplified try_refill_buffer_if_empty

The simplification of try_refill_buffer_if_empty improves readability and aligns well with the new design using BufferReader. The change to use m_buffer_reader.get_buffer_size() is more direct and efficient.

Line range hint 198-239: LGTM: Improved try_read_to_delimiter implementation

The refactoring of try_read_to_delimiter is well done. The use of BufferReader for reading to a delimiter simplifies the implementation. The error handling is more comprehensive, and the logic for handling end-of-file conditions has been improved. These changes align well with the overall refactoring goals and enhance the robustness of the method.

Line range hint 288-305: LGTM: Improved buffer management and position tracking

The changes to drop_content_before_current_pos and update_file_pos are well-implemented. The use of BufferReader methods for position and size calculations in drop_content_before_current_pos aligns with the new design and simplifies the buffer management logic. The update to m_highest_read_pos in update_file_pos is a good addition for maintaining the highest read position, which should improve the overall functionality of the class.

Line range hint 1-305: Overall assessment: Significant improvements with minor suggestions

The refactoring of BufferedFileReader.cpp to use ReaderInterface and BufferReader is a substantial improvement. These changes enhance the flexibility and abstraction of the file reading operations, which should make the code more maintainable and easier to extend in the future.

Key improvements include:

Consistent use of ReaderInterface throughout the class.

Enhanced error handling and buffer management.

Addition of m_highest_read_pos for better position tracking.

Use of modern C++ features like std::span.

While the changes are generally well-implemented, consider the following suggestions for further improvement:

Extract complex logic in methods like try_seek_from_begin and refill_reader_buffer into separate private methods for better readability.

Add more comments to explain complex conditions or logic, especially in checkpoint-related operations.

Consider further optimizing buffer management to reduce memory allocations where possible.

Overall, this refactoring significantly improves the BufferedFileReader class, making it more robust and flexible.

components/core/tests/test-BufferedFileReader.cpp (7)

7-7: Inclusion of FileDescriptorReader header is appropriate.

The addition of the FileDescriptorReader header is necessary for the changes made in the test cases and is correctly placed in the include section.

12-20: Appropriate updates to using statements and namespace usage.

The additions of using statements for ErrorCode, FileDescriptorReader, and make_unique are consistent with the changes in the test cases. The use of string instead of std::string improves code readability.

43-43: Correct implementation of BufferedFileReader with FileDescriptorReader.

The initialization of BufferedFileReader with a FileDescriptorReader aligns with the PR objective and uses modern C++ practices with make_unique.

265-268: Excellent update to modern random number generation.

The implementation of modern random number generation using the library is a significant improvement. It addresses the previous suggestion and enhances the quality and reproducibility of the tests.

276-276: Consistent use of modern random number generation for test data.

The test data generation now correctly uses the new random number generator, improving the randomness and quality of the test data.

305-305: Consistent use of modern random number generation for delimiter selection.

The delimiter selection now correctly uses the new random number generator, improving the randomness of the test cases.

Line range hint 1-315: Overall excellent refactoring and test improvements.

The changes in this file successfully implement the PR objectives:

BufferedFileReader now uses FileDescriptorReader, aligning with the goal of accepting any arbitrary reader interface.

Test coverage has been improved, particularly for non-zero position reading scenarios.

The use of modern C++ practices, especially in random number generation, enhances the overall quality and reliability of the tests.

These changes contribute significantly to the flexibility and robustness of the BufferedFileReader class and its testing suite.

coderabbitai

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)

components/core/src/clp/clp/FileCompressor.hpp (1)
58-67: Enhance method documentation with parameter descriptions.

The documentation should describe each parameter's purpose and any constraints or requirements.

Example improvement:
/**
 * Parses and encodes content from the given reader into the given archive_writer.
 * @param target_data_size_of_dicts Target size for dictionaries in bytes
 * @param archive_user_config User-specific archive configuration
 * @param target_encoded_file_size Target size for encoded file in bytes
 * @param path_for_compression Path of the file being compressed
 * @param group_id Group identifier for the content
 * @param archive_writer Destination archive for encoded content
 * @param reader Source interface for reading content
 * @param use_heuristic Whether to use heuristic-based parsing
 * @throws std::runtime_error If parsing or encoding fails
 */
components/core/src/clp/BufferedFileReader.cpp (1)

239-243: Consider adding documentation for buffer management strategy

While the implementation is correct, the complex buffer management strategy would benefit from detailed documentation explaining the relationship between checkpoints and buffer sizing.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 38076be and 74f57bc.

📒 Files selected for processing (3)

components/core/src/clp/BufferedFileReader.cpp (7 hunks)
components/core/src/clp/BufferedFileReader.hpp (9 hunks)
components/core/src/clp/clp/FileCompressor.hpp (3 hunks)

🔇 Additional comments (15)

components/core/src/clp/clp/FileCompressor.hpp (1)
Line range hint 106-117: Consider enhancing error reporting mechanism.

While the boolean return type indicates success/failure, it might not provide enough context about failure reasons. Consider using std::expected (C++23) or a Result type pattern for more detailed error reporting.

Example approach:
enum class CompressionError {
    InvalidFormat,
    ReadError,
    WriteError
    // ... other specific error cases
};

auto try_compressing_as_archive(
    // ... existing parameters ...
) -> std::expected<void, CompressionError>;  // C++23
Also, please complete the documentation for the new parameters as previously requested in the past review.
components/core/src/clp/BufferedFileReader.cpp (5)

3-18: LGTM: Modern C++ includes and using declarations

The added includes and using declarations appropriately support the refactoring to use modern C++ features like std::span for safer buffer management and std::unique_ptr for RAII-compliant resource management.

21-43: LGTM: Constructor properly implements RAII pattern

The constructor correctly:

Validates the reader interface and buffer size

Initializes file position from the reader interface

Sets up the buffer reader with appropriate initial state

46-51: LGTM: Simplified buffer refill check

The method efficiently checks the buffer state before attempting a refill, with proper error code handling.

138-161: LGTM: Robust implementation with span

The implementation correctly:

Uses span for safe buffer management

Handles partial reads appropriately

Implements proper error handling as discussed in previous reviews

255-263: LGTM: Clean buffer management implementation

The method correctly manages buffer content and follows the previous review feedback about removing unnecessary casts.

components/core/src/clp/BufferedFileReader.hpp (9)

4-4: Including <cstddef> for standard definitions is appropriate.

The inclusion of <cstddef> provides definitions for types like size_t, which are used throughout the class.

8-8: Including <utility> for std::move usage is necessary.

Including <utility> is required for the use of std::move in constructors and other parts of the code.

18-21: Updated class documentation accurately reflects new functionality.

The changes to the class comments appropriately describe that BufferedFileReader now reads from another ReaderInterface, enhancing flexibility and supporting various data sources.

29-29: Clarification on 'file position' improves understandability.

Specifying that "file position" refers to the buffered file rather than the original input file helps prevent confusion.

74-81: Constructor refactoring enhances flexibility and resource management.

Changing the constructor to accept a std::unique_ptr<ReaderInterface> allows BufferedFileReader to wrap any reader, not just file descriptors, increasing the utility of the class.

83-84: Delegating constructor simplifies initialization with default values.

Using a delegating constructor to set the default buffer size reduces code duplication and enhances maintainability.

87-87: Explicitly defaulted destructor ensures proper resource cleanup.

Declaring the destructor as override = default clarifies that the default destruction behaviour is sufficient, ensuring resources are properly released.

172-172: Verify accuracy of the @return statement.

The @return comment mentions refill_reader_buffer. Ensure that this reference is still accurate after the refactoring, or update it to reflect the current method behaviour.

223-223: Initialization of m_buffer_reader without std::optional simplifies usage.

By directly instantiating m_buffer_reader, the code eliminates unnecessary optional wrapping, simplifying access. Ensure that m_buffer_reader is properly initialized in all constructors to prevent potential issues.

coderabbitai · 2024-10-24T13:05:33Z

components/core/src/clp/clp/FileCompressor.hpp

+     * Parses and encodes content from the given reader into the given archive_writer.
     * @param target_data_size_of_dicts
     * @param archive_user_config
     * @param target_encoded_file_size
     * @param path_for_compression
     * @param group_id
     * @param archive_writer
     * @param reader
+     * @param use_heuristic
     */
+    auto parse_and_encode(
+            size_t target_data_size_of_dicts,
+            streaming_archive::writer::Archive::UserConfig& archive_user_config,
+            size_t target_encoded_file_size,
+            std::string const& path_for_compression,
+            group_id_t group_id,
+            streaming_archive::writer::Archive& archive_writer,
+            ReaderInterface& reader,
+            bool use_heuristic
+    ) -> void;
+


🛠️ Refactor suggestion

Consider refactoring method signature to reduce parameter count.

The method has 8 parameters, which could make it difficult to maintain and use correctly. Consider grouping related parameters into a configuration object.

Example refactor:

struct ParseConfig { size_t target_data_size_of_dicts; size_t target_encoded_file_size; std::string path_for_compression; group_id_t group_id; bool use_heuristic; streaming_archive::writer::Archive::UserConfig& archive_user_config; }; auto parse_and_encode( ParseConfig const& config, streaming_archive::writer::Archive& archive_writer, ReaderInterface& reader ) -> void;

This is legacy code refactor, so I would prefer not to extend the scope of this PR. Push this off to future

@haiqi96, understood. We can address this in a future PR.

✏️ Learnings added

Learnt from: haiqi96 PR: y-scope/clp#523 File: components/core/src/clp/clp/FileCompressor.hpp:58-78 Timestamp: 2024-10-24T14:25:17.796Z Learning: When reviewing legacy code refactors, avoid suggesting changes that would extend the scope of the PR.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

coderabbitai · 2024-10-24T13:05:33Z

components/core/src/clp/BufferedFileReader.cpp

+    auto error_code = m_buffer_reader.try_seek_from_begin(get_buffer_relative_pos(pos));
    if (ErrorCode_Truncated == error_code) {
        if (false == m_checkpoint_pos.has_value()) {
            // If checkpoint is not set, simply move the file_pos and invalidate
            // the buffer reader
-            auto offset = lseek(m_fd, static_cast<off_t>(pos), SEEK_SET);
-            if (-1 == offset) {
-                return ErrorCode_errno;
+            if (auto const error_code = m_file_reader->try_seek_from_begin(pos);
+                error_code != ErrorCode_Success)
+            {
+                return error_code;
            }
-            m_buffer_reader.emplace(m_buffer.data(), 0);
+            m_buffer_reader = BufferReader{m_buffer.data(), 0};


🛠️ Refactor suggestion

Consider simplifying error handling logic

The nested conditions and error handling could be made more readable by extracting the checkpoint-related logic into a separate method.

Consider refactoring like this:

- if (ErrorCode_Truncated == error_code) { - if (false == m_checkpoint_pos.has_value()) { - // If checkpoint is not set, simply move the file_pos and invalidate - // the buffer reader - if (auto const error_code = m_file_reader->try_seek_from_begin(pos); - error_code != ErrorCode_Success) - { - return error_code; - } - m_buffer_reader = BufferReader{m_buffer.data(), 0}; - m_buffer_begin_pos = pos; + if (ErrorCode_Truncated == error_code && !m_checkpoint_pos.has_value()) { + auto const seek_error = m_file_reader->try_seek_from_begin(pos); + if (seek_error != ErrorCode_Success) { + return seek_error; + } + m_buffer_reader = BufferReader{m_buffer.data(), 0}; + m_buffer_begin_pos = pos; + return ErrorCode_Success;

Committable suggestion was skipped due to low confidence.

I don't think this applies to the original code snippet.

@haiqi96, understood. I appreciate your feedback.

(つ﹏⊂)

✏️ Learnings added

Learnt from: haiqi96 PR: y-scope/clp#523 File: components/core/src/clp/BufferedFileReader.cpp:96-106 Timestamp: 2024-10-24T14:45:26.095Z Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

coderabbitai · 2024-10-24T13:05:33Z

components/core/src/clp/BufferedFileReader.hpp

     * @return ErrorCode_EndOfFile on EOF
+     * @return Same as BufferReader::try_read if it fails


⚠️ Potential issue

Update documentation to reflect dependency changes.

The @return statement references BufferReader::try_read, but since the class now utilizes ReaderInterface, consider updating this to reference ReaderInterface::try_read.

Apply this change to correct the documentation:

- * @return Same as BufferReader::try_read if it fails + * @return Same as ReaderInterface::try_read if it fails

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

* @return Same as BufferReader::try_read if it fails

* @return Same as ReaderInterface::try_read if it fails

the suggestion is incorrect, the function explicitly calls BufferReader::try_read

@haiqi96, thank you for the correction. You're right; since the function explicitly calls BufferReader::try_read, the documentation should reference BufferReader::try_read. I apologize for the mistake.

🐰😔

✏️ Learnings added

Learnt from: haiqi96 PR: y-scope/clp#523 File: components/core/src/clp/BufferedFileReader.hpp:158-158 Timestamp: 2024-10-24T14:46:00.490Z Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)

components/core/src/clp/BufferedFileReader.cpp (1)
40-43: Consider initializing m_highest_read_pos to m_pos for consistency

While the current initialization works, setting m_highest_read_pos to m_pos would be more consistent with the logical relationship between these variables.
 m_pos = m_reader->get_pos();
 m_buffer_begin_pos = 0;
 m_buffer_reader = BufferReader{m_buffer.data(), 0};
-m_highest_read_pos = m_pos;
+m_highest_read_pos = m_pos;
components/core/src/clp/BufferedFileReader.hpp (2)

74-74: Add description for 'reader_interface' parameter

The parameter reader_interface lacks a description in the constructor's documentation. Please add a brief explanation to clarify its purpose.

189-189: Add description for 'num_bytes_to_refill' parameter

The parameter num_bytes_to_refill lacks a description in the function's documentation. Providing a brief description will enhance clarity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 74f57bc and 3d843ec.

📒 Files selected for processing (2)

components/core/src/clp/BufferedFileReader.cpp (7 hunks)
components/core/src/clp/BufferedFileReader.hpp (8 hunks)

🧰 Additional context used

📓 Learnings (1)

components/core/src/clp/BufferedFileReader.hpp (1)

Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.664Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.

🔇 Additional comments (6)

components/core/src/clp/BufferedFileReader.cpp (5)
3-18: LGTM: Header includes and namespace declarations are appropriate

The added includes and using declarations properly support the refactoring to use ReaderInterface and modern C++ features.

21-39: LGTM: Constructor properly implements RAII pattern

The constructor correctly takes ownership of the ReaderInterface and validates parameters. The null check for reader_interface enhances safety.

138-159: LGTM: try_read implementation handles partial reads correctly

The implementation properly handles:

Partial reads from the buffer

Buffer refills when needed

End of file conditions

252-265: LGTM: Helper methods maintain buffer and position state correctly

Both drop_content_before_current_pos and update_pos properly maintain the buffer state and position tracking.

Line range hint 87-127: Verify seek behaviour with checkpoints

The seek implementation handles checkpoints correctly, but we should verify the behaviour when:

Seeking beyond the current buffer with an active checkpoint

Seeking to positions that require buffer resizing

✅ Verification successful

Let me try a different search strategy to verify the seek behavior with checkpoints.

This should help us find relevant test files and test cases that verify the seek behavior with checkpoints. Let's analyze the test coverage for these scenarios.

Let me try one more search strategy to locate the test files, adjusting the paths and patterns.

This broader search should help us locate the test files and relevant test cases regardless of their location in the repository.

Seek behaviour with checkpoints is well-tested and verified

The test file test-BufferedFileReader.cpp contains comprehensive test coverage for seek behaviour with checkpoints, specifically testing:

Seeking beyond current buffer with active checkpoint:

Tests verify seeking between checkpoint and current read head

Tests confirm seeking beyond highest file position

Validates buffer refilling when seeking past buffer end

Seeking with buffer resizing:

Tests include scenarios with large reads (base_buffer_size + 4)

Verifies seeking with different buffer states and sizes

Tests boundary conditions around checkpoint positions

The implementation correctly handles all edge cases:

Prevents seeking before checkpoint

Properly manages buffer refilling

Maintains position integrity during seeks

Handles checkpoint clearing correctly
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Search for test cases that verify seek behaviour with checkpoints
rg -l "set_checkpoint.*try_seek_from_begin" --type cpp
Length of output: 56

Script:
#!/bin/bash
# Search for test files related to BufferedFileReader
fd "BufferedFileReader.*test" --type f

# Search for checkpoint and seek related test cases with more context
rg "TEST.*BufferedFileReader.*seek" -A 10 --type cpp
rg "checkpoint" --type cpp components/core/test
Length of output: 204

Script:
#!/bin/bash
# Find all test files in the codebase
fd "test.*\.cpp$"

# Search for BufferedFileReader in test files
rg "BufferedFileReader" -A 5 --type cpp $(fd "test.*\.cpp$")

# Look for seek and checkpoint related tests
rg "TEST.*seek|checkpoint" -A 5 --type cpp $(fd "test.*\.cpp$")
Length of output: 18229
components/core/src/clp/BufferedFileReader.hpp (1)

18-21: LGTM: Code changes are well-implemented

The modifications and refactoring in these sections appear correct and enhance the codebase. Good use of modern C++ features and adherence to best practices.

Also applies to: 29-29, 78-81, 83-84, 86-87, 98-98, 146-146, 158-158, 172-172, 208-208, 211-211, 217-218, 223-223

haiqi96 added 30 commits July 26, 2024 16:09

first batch of change

edbcc6d

Remove dictionary utils

21b4471

fix exception issues and make sure unit tests and clg can build

3a3c9ba

Linter

2c40e29

Fixes

d503436

Rearrange code

1ff5ea8

fix clo build

8a7ddc7

fix

70325a9

Fix make-dictionaries-readable

99d45a6

fix constructors

b086953

refactor

746834f

Update docstrings and comments. Also updated destructor

d2a6836

Initial change

7d347c1

Linter fix

63f0cfd

Merge branch 'main' into BufferedReaderRefactor

4a0a323

Factor out duplicated function

7192022

Add sysfile Reader

f6b6757

Update BufferedFileReader to use move instead of reference

712d10b

FileDescriptor update

80532ea

Merge branch 'main' into FileReaderRefactor

d30030a

Revert unintended changes

4dabc96

fix

a8d0747

Missing fix

ff0b9d8

Fixes

b3f2795

Revert unintended changes

4d0d80b

Fix one weird issue

70c8342

small fix

83ee77e

Merge branch 'FileReaderRefactor' into BufferedReaderRefactor

9601fa8

Improvement

59db88b

Merge branch 'main' into BufferedReaderRefactor

450632a

haiqi96 and others added 5 commits September 9, 2024 10:46

Apply suggestions from code review

1bbf2a4

Co-authored-by: kirkrodrigues <[email protected]>

small fixes

1260043

allow BufferReader to support null initialization

aaa8c64

A few more polishing

c147b41

A few more polishing again

02b0d10

haiqi96 requested a review from kirkrodrigues September 9, 2024 15:50

kirkrodrigues requested changes Sep 11, 2024

View reviewed changes

haiqi96 and others added 4 commits September 11, 2024 10:28

Apply suggestions from code review

810a77b

Co-authored-by: kirkrodrigues <[email protected]>

update unit-test

654eb35

Merge branch 'BufferedReaderRefactor' of https://github.com/haiqi96/c…

c176c67

…lp_fork into BufferedReaderRefactor

linter

c2b6333

haiqi96 requested a review from kirkrodrigues September 16, 2024 19:31

Code review suggestions

e4bbd30

coderabbitai bot reviewed Oct 8, 2024

View reviewed changes

Apply suggestions from code review

7fc448a

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai bot reviewed Oct 8, 2024

View reviewed changes

haiqi96 added 4 commits October 7, 2024 22:53

update code based on comments

4e5d3ff

fix

676dc3c

Update unit test

dfb1211

linter

38076be

coderabbitai bot mentioned this pull request Oct 8, 2024

Handle exceptions from std::filesystem::canonical in FileCompressor.cpp #553

Open

coderabbitai bot reviewed Oct 8, 2024

View reviewed changes

haiqi96 and others added 4 commits October 7, 2024 23:30

Remove unnecessary function

a4eb75f

Missing docstrings

17e8cdb

Other docstrings fixes

8cc9e89

Merge branch 'main' into BufferedReaderRefactor

74f57bc

coderabbitai bot reviewed Oct 24, 2024

View reviewed changes

variable name refactor

3d843ec

coderabbitai bot reviewed Oct 24, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

clp-core: Refactor the BufferedFileReader to wrap around a ReaderInterface. #523

clp-core: Refactor the BufferedFileReader to wrap around a ReaderInterface. #523

haiqi96 commented Aug 22, 2024 •

edited by coderabbitai bot

Loading

kirkrodrigues Sep 11, 2024

haiqi96 Sep 11, 2024

kirkrodrigues Sep 11, 2024

haiqi96 Oct 8, 2024

kirkrodrigues Sep 11, 2024

coderabbitai bot commented Oct 8, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Oct 8, 2024

coderabbitai bot Oct 8, 2024 •

edited

Loading

haiqi96 Oct 8, 2024

coderabbitai bot Oct 8, 2024

haiqi96 Oct 8, 2024

coderabbitai bot Oct 8, 2024

coderabbitai bot Oct 8, 2024 •

edited

Loading

haiqi96 Oct 8, 2024

coderabbitai bot Oct 8, 2024

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot Oct 24, 2024 •

edited

Loading

haiqi96 Oct 24, 2024

coderabbitai bot Oct 24, 2024

coderabbitai bot Oct 24, 2024 •

edited

Loading

haiqi96 Oct 24, 2024

coderabbitai bot Oct 24, 2024

coderabbitai bot Oct 24, 2024 •

edited

Loading

haiqi96 Oct 24, 2024

coderabbitai bot Oct 24, 2024

coderabbitai bot left a comment

		* @return ErrorCode_EndOfFile on EOF
		* @return Same as BufferReader::try_read if it fails

	* @return Same as BufferReader::try_read if it fails
	* @return Same as ReaderInterface::try_read if it fails

clp-core: Refactor the BufferedFileReader to wrap around a ReaderInterface. #523

Are you sure you want to change the base?

clp-core: Refactor the BufferedFileReader to wrap around a ReaderInterface. #523

Conversation

haiqi96 commented Aug 22, 2024 • edited by coderabbitai bot Loading

Description

Validation performed

Summary by CodeRabbit

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot commented Oct 8, 2024 • edited Loading

Walkthrough

Changes

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 8, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Oct 8, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Oct 8, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Oct 8, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 24, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Oct 24, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 24, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Oct 24, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 24, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Oct 24, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

haiqi96 commented Aug 22, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 8, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

coderabbitai bot Oct 8, 2024 •

edited

Loading

coderabbitai bot Oct 8, 2024 •

edited

Loading

coderabbitai bot Oct 24, 2024 •

edited

Loading

coderabbitai bot Oct 24, 2024 •

edited

Loading

coderabbitai bot Oct 24, 2024 •

edited

Loading