-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clp-core: Refactor the BufferedFileReader to wrap around a ReaderInterface. #523
base: main
Are you sure you want to change the base?
Conversation
m_buffer_begin_pos = 0; | ||
m_buffer_reader.emplace(m_buffer.data(), 0); | ||
m_buffer_reader = BufferReader{m_buffer.data(), 0}; | ||
m_highest_read_pos = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be set to m_file_reader->get_pos()
, i.e., the highest position we've read to in the underlying file? Otherwise, I think if we set a checkpoint on a file that has already had some data read, the checkpoint position will be wrong (0); and if we then clear that checkpoint without reading or seeking any data, we'll cause the underlying file to seek backwards.
If we make the suggested change, then it probably makes sense that m_file_pos
should be set to m_highest_read_pos
(the end result is equivalent, but the intention of the code would be clearer).
Can we also add a test for setting and clearing a checkpoint without reading any data beforehand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this. We should set both m_buffer_begin_pos
and m_highest_read_pos
to be m_file_reader->get_pos()
Imo, it would be more clear to assign m_file_pos
first, and let m_buffer_begin_pos
and m_highest_read_pos
to be m_file_pos
though
@@ -185,7 +171,6 @@ class BufferedFileReader : public ReaderInterface { | |||
* @param buf | |||
* @param num_bytes_to_read The number of bytes to try and read | |||
* @param num_bytes_read The actual number of bytes read | |||
* @return ErrorCode_NotInit if the file is not open | |||
* @return ErrorCode_BadParam if buf is null | |||
* @return ErrorCode_errno on error reading from the underlying file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I don't think this is a case anymore, right?
- Perhaps we should rewrite the return statements using the "Same as xyz" method we use elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have fixed all?
boost::filesystem::remove(test_file_path); | ||
} | ||
|
||
TEST_CASE("Test reading with beginning offset", "[BufferedFileReader]") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding this test case, I think you could modify the original test to advance both the reference buffer and the FileDescriptorReader by some amount X before running the tests. Then you could use Catch2's GENERATE(0, N)
(where N is some arbitrary value) to generate tests for the cases where the underlying file hasn't had any content read and where it has had content read.
Co-authored-by: kirkrodrigues <[email protected]>
…lp_fork into BufferedReaderRefactor
WalkthroughThe pull request introduces substantial modifications to the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (6)
components/core/src/clp/BufferReader.hpp (1)
26-27
: LGTM! Consider adding a comment about usage.The addition of a default constructor improves the flexibility of the
BufferReader
class. The use of constructor delegation is a good C++ practice.Consider adding a comment to clarify that this constructor creates an empty buffer, and users should ensure data is set before reading operations are performed. This can help prevent potential misuse.
+ // Creates an empty BufferReader. Ensure data is set before performing read operations. BufferReader() : BufferReader(nullptr, 0, 0) {}
components/core/src/clp/BufferedFileReader.hpp (4)
18-21
: Clarify documentation regarding 'ReaderInterface' and sequential accessThe documentation describes performing unordered reads from inputs that only support sequential access, like S3. It would be helpful to explain how
BufferedFileReader
enables unordered reads over sequential inputs and any limitations or requirements for the underlyingReaderInterface
.
98-99
: Enhance documentation of return valuesIn the
try_refill_buffer_if_empty()
method, the return values section mentions:
Same as refill_reader_buffer if it fails
ErrorCode_Success on success
Consider detailing the specific error codes that
refill_reader_buffer
might return. This would improve clarity and assist users in handling errors appropriately.
159-160
: Detail possible error codes in documentationFor the
try_read
method, the return values state:
Same as BufferReader::try_read if it fails
ErrorCode_Success on success
It would be beneficial to enumerate potential error codes from
BufferReader::try_read
, aiding developers in understanding and handling different failure scenarios.
219-219
: Consider renaming 'm_file_reader' to 'm_reader' for claritySince
m_file_reader
holds aReaderInterface
that may not necessarily represent a file, renaming it tom_reader
could better reflect its purpose and enhance code readability.Apply this diff to rename the variable:
- std::unique_ptr<ReaderInterface> m_file_reader; + std::unique_ptr<ReaderInterface> m_reader;components/core/src/clp/clp/FileCompressor.cpp (1)
303-303
: Update parameter name for clarityThe parameter
file_reader
is now of typeReaderInterface&
, which may cause confusion with the existingreader
parameters. Consider renaming it toreader
for consistency and clarity.Apply this diff to rename the parameter:
-bool FileCompressor::try_compressing_as_archive( ... - ReaderInterface& file_reader, + ReaderInterface& reader, ...And update all usages of
file_reader
toreader
within the function.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
- components/core/src/clp/BufferReader.cpp (1 hunks)
- components/core/src/clp/BufferReader.hpp (1 hunks)
- components/core/src/clp/BufferedFileReader.cpp (8 hunks)
- components/core/src/clp/BufferedFileReader.hpp (6 hunks)
- components/core/src/clp/clp/FileCompressor.cpp (7 hunks)
- components/core/src/clp/clp/FileCompressor.hpp (2 hunks)
- components/core/tests/test-BufferedFileReader.cpp (2 hunks)
🧰 Additional context used
🪛 cppcheck
components/core/src/clp/clp/FileCompressor.cpp
[error] 191-191: Exception thrown in function declared not to throw exceptions.
(throwInNoexceptFunction)
🔇 Additional comments (12)
components/core/src/clp/BufferReader.hpp (2)
26-27
: Review implementation of ReaderInterface methods.With the addition of the default constructor, it's now possible to have a
BufferReader
with no data. Please review the implementation of theReaderInterface
methods (e.g.,try_read
,try_seek_from_begin
,try_get_pos
,try_read_to_delimiter
) to ensure they handle the case of an empty buffer correctly.To assist with this review, you can use the following script to locate the implementations of these methods:
#!/bin/bash # Description: Locate implementations of ReaderInterface methods # Test: Search for implementations of ReaderInterface methods rg --type cpp -A 5 $'auto BufferReader::(try_read|try_seek_from_begin|try_get_pos|try_read_to_delimiter)'
26-27
: Consider adding null checks in critical methods.With the introduction of the default constructor, it's now possible to have a
BufferReader
with a null internal buffer. To prevent potential null pointer dereferences, consider adding null checks in methods that directly access the buffer, such aspeek_buffer
andget_remaining_data_size
.To verify the current implementation, let's check these methods:
components/core/src/clp/BufferReader.cpp (1)
18-18
: Change correctly setsbuf
tonullptr
when no data remains.This update improves clarity by ensuring
buf
is set tonullptr
when there is no data left to peek, which aligns with expected behaviour.components/core/src/clp/clp/FileCompressor.hpp (2)
68-77
: The addition ofparse_and_encode
method is well-structuredThe new
parse_and_encode
method is appropriately defined with clear parameters, enhancing the flexibility and functionality of theFileCompressor
class.
115-117
: Ensure all callers oftry_compressing_as_archive
are updated with the new parameterThe method
try_compressing_as_archive
now includes an additional parameterReaderInterface& file_reader
. Please verify that all calls to this method throughout the codebase have been updated to include the new parameter to prevent any runtime errors or unexpected behaviour.Run the following script to identify all invocations of
try_compressing_as_archive
and check if they have been updated:✅ Verification successful
All calls to
try_compressing_as_archive
have been verified and are up to date with the newfile_reader
parameter.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that all calls to 'try_compressing_as_archive' include the new 'file_reader' parameter. # Test: Search for function calls to 'try_compressing_as_archive'. Expect: All calls should pass the 'file_reader' argument. rg --type 'cpp' --pcre2 'try_compressing_as_archive\(' -A 2Length of output: 1846
components/core/src/clp/BufferedFileReader.hpp (2)
29-29
: Verify accuracy of 'file position' reference in documentationThe note states: "Unless otherwise noted, the 'file position' mentioned in docstrings is the position in the buffered file, not the position in the original input file." Given that
BufferedFileReader
wraps anotherReaderInterface
, please verify if this accurately reflects the intended meaning or if it should refer to the position in the original input data source.
87-87
: Confirm necessity of 'override' specifier on destructorThe destructor is declared as
~BufferedFileReader() override = default;
. Ensure that the base classReaderInterface
has a virtual destructor. If not, consider adding one to avoid undefined behaviour during object destruction.components/core/tests/test-BufferedFileReader.cpp (1)
280-285
: Good implementation of parameterized tests usingGENERATE
Utilizing
GENERATE(0, 127);
effectively tests theBufferedFileReader
starting from different offsets, enhancing test coverage and accounting for varied starting positions.components/core/src/clp/clp/FileCompressor.cpp (3)
5-5
: Include directives andusing
statements are appropriateThe added
#include
directives andusing
statements for<memory>
,<utility>
,std::make_unique
,std::move
, andstd::unique_ptr
are necessary for smart pointers and move semantics used in the code.Also applies to: 7-7, 37-38, 41-41
129-129
: Initialization ofbuffered_file_reader
usingmake_unique
The initialization of
buffered_file_reader
withmake_unique
aligns with modern C++ practices and ensures proper memory management.
393-402
: Confirm correct reader is passed toparse_and_encode
The function
parse_and_encode
is called withm_libarchive_file_reader
as the reader parameter. Ensure thatm_libarchive_file_reader
correctly implements theReaderInterface
and is properly initialized before use.Run the following script to verify the implementation of
m_libarchive_file_reader
:components/core/src/clp/BufferedFileReader.cpp (1)
Line range hint
203-230
: LGTM: Thetry_read_to_delimiter
function is correctly updatedThe modifications to
try_read_to_delimiter
appropriately handle error codes and update the file position accurately.
for (size_t i = 0; i < test_data.size(); ++i) { | ||
test_data[i] = static_cast<char>('a' + (std::rand() % (cNumAlphabets))); | ||
} | ||
|
||
// Write to test file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use modern random number generation instead of std::rand()
The use of std::rand()
is outdated and not recommended in modern C++. Consider using the <random>
header with std::mt19937
and std::uniform_int_distribution
for better randomness and test reproducibility.
Here's how you might refactor the code:
#include <random>
// Initialize random number generator
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> distrib('a', 'a' + cNumAlphabets - 1);
for (size_t i = 0; i < test_data.size(); ++i) {
test_data[i] = static_cast<char>(distrib(gen));
}
@@ -116,15 +121,15 @@ bool FileCompressor::compress_file( | |||
streaming_archive::writer::Archive& archive_writer, | |||
bool use_heuristic | |||
) { | |||
std::string file_name = std::filesystem::canonical(file_to_compress.get_path()).string(); | |||
string file_name = std::filesystem::canonical(file_to_compress.get_path()).string(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle potential exceptions from std::filesystem::canonical
The use of std::filesystem::canonical
may throw exceptions if the file path does not exist or there are permission issues. To prevent the application from terminating unexpectedly, consider adding exception handling around this call.
You can modify the code as follows to handle potential exceptions:
string file_name;
try {
file_name = std::filesystem::canonical(file_to_compress.get_path()).string();
} catch (const std::filesystem::filesystem_error& e) {
SPDLOG_ERROR(
"Failed to get canonical path for {}: {}",
file_to_compress.get_path(),
e.what()
);
return false;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's good to have since I ran into this a few times. but do we want to have the fix in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@haiqi96, understood. We can address this in a future PR. Would you like me to open a new GitHub issue to track this enhancement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@haiqi96, I've created a new GitHub issue to track this enhancement: Issue #553.
🔗 Analysis chain
@haiqi96, I've opened a new GitHub issue to track this enhancement.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Create a new GitHub issue to handle exceptions from std::filesystem::canonical
gh issue create \
--title "Handle exceptions from std::filesystem::canonical in FileCompressor.cpp" \
--body "As discussed in [PR #523](https://github.com/y-scope/clp/pull/523) and [this comment](https://github.com/y-scope/clp/pull/523#discussion_r1791033838), we should handle potential exceptions from \`std::filesystem::canonical\` in \`FileCompressor.cpp\`.
Requester: @haiqi96" \
--assignee haiqi96
Length of output: 438
auto read_into_buffer( | ||
ReaderInterface* reader, | ||
char* buf, | ||
size_t num_bytes_to_read, | ||
size_t& num_bytes_read | ||
) -> ErrorCode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Remove unnecessary forward declaration of read_into_buffer
to simplify code
Since the implementation of read_into_buffer
follows immediately and there are no calls to it before its definition, the forward declaration within the anonymous namespace is unnecessary.
You can safely remove the forward declaration at lines 32–37.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kirkrodrigues I believe our coding standard is to keep the forward declaration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
components/core/src/clp/BufferedFileReader.cpp (3)
32-50
: LGTM: Refactored read_into_buffer to use ReaderInterfaceThe changes align well with the PR objective of refactoring to use a ReaderInterface. The implementation correctly uses reader->try_read instead of direct file descriptor operations.
However, consider simplifying the error handling:
auto read_into_buffer( ReaderInterface* reader, char* buf, size_t num_bytes_to_read, size_t& num_bytes_read ) -> ErrorCode { num_bytes_read = 0; auto const error_code = reader->try_read(buf, num_bytes_to_read, num_bytes_read); - if (0 == num_bytes_read) { - return ErrorCode_EndOfFile; - } - return error_code; + return (0 == num_bytes_read) ? ErrorCode_EndOfFile : error_code; }This change maintains the same logic but reduces nesting and improves readability.
126-131
: LGTM: Updated seek operation to use ReaderInterfaceThe changes align well with the PR objective of using ReaderInterface. The error handling for the seek operation is more robust, checking for specific error codes.
Consider simplifying the error handling:
- if (auto const error_code = m_file_reader->try_seek_from_begin(pos); - error_code != ErrorCode_Success) - { - return error_code; - } + if (auto const error_code = m_file_reader->try_seek_from_begin(pos); error_code != ErrorCode_Success) + return error_code;This change reduces nesting and improves readability while maintaining the same logic.
163-187
: LGTM: Improved try_read method with std::span and restructured loopThe use of std::span for buffer management improves safety and clarity. The restructured loop correctly handles reading from the BufferReader and refilling when necessary, aligning well with the overall refactoring objectives.
Consider simplifying the error handling in the loop:
} else if (ErrorCode_EndOfFile == error_code) { error_code = refill_reader_buffer(m_base_buffer_size); - if (ErrorCode_EndOfFile == error_code) { - break; - } - if (ErrorCode_Success != error_code) { + if (ErrorCode_Success != error_code && ErrorCode_EndOfFile != error_code) { return error_code; } + if (ErrorCode_EndOfFile == error_code) { + break; + } } else { return error_code; }This change reduces nesting and improves readability while maintaining the same logic.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- components/core/src/clp/BufferReader.cpp (1 hunks)
- components/core/src/clp/BufferedFileReader.cpp (8 hunks)
🧰 Additional context used
🔇 Additional comments (12)
components/core/src/clp/BufferReader.cpp (5)
7-16
: Excellent improvements to the constructor!The changes to the constructor are well-implemented:
- The use of a member initializer list is a good C++ practice that can improve performance.
- The nullptr check when data_size or pos is non-zero is correct and prevents potential issues.
- The additional check
pos > data_size
addresses the past review comment and prevents invalid memory access.These changes enhance the robustness and safety of the BufferReader class.
23-23
: Good clarification in peek_buffer methodThe change to set
buf
tonullptr
whenpeek_size
is zero is a good improvement. It clearly indicates that there's no data to peek, which enhances the method's clarity and prevents potential misuse of an invalid pointer.
Line range hint
27-59
: try_read_to_delimiter method remains robustThe try_read_to_delimiter method has not undergone significant changes. The existing implementation appears to be correct and efficient, handling the delimiter search, appending to the string, and managing the internal buffer position appropriately.
Line range hint
61-64
: Good defensive check in try_read methodThe addition of the nullptr check when
num_bytes_to_read > 0
is an excellent defensive programming practice. This check prevents potential issues with invalid memory access and improves the overall robustness of the method.
Line range hint
95-103
: Improved clarity in overloaded try_read_to_delimiterThe addition of
str.clear()
whenappend
is false is a good improvement. It ensures that the string is empty before appending when not in append mode, which enhances the method's correctness and clarity. This change aligns well with the expected behaviour of the method.components/core/src/clp/BufferedFileReader.cpp (7)
3-18
: LGTM: Appropriate includes and using declarations addedThe new includes and using declarations are suitable for the changes made in the file. They support the transition to using smart pointers (std::unique_ptr) and modern C++ features (std::span), which aligns with the refactoring objectives.
54-68
: LGTM: Constructor refactored to use ReaderInterfaceThe changes in the constructor align well with the PR objective. The use of std::unique_ptr instead of a file descriptor improves resource management and flexibility. The initialization of m_file_pos, m_buffer_reader, and m_highest_read_pos is correct and addresses previous review comments.
72-76
: LGTM: Updated buffer emptiness checkThe change to use m_buffer_reader.get_buffer_size() for checking buffer emptiness is consistent with the new BufferReader usage. The logic remains correct and aligns with the overall refactoring.
264-269
: LGTM: Updated refill_reader_buffer to use ReaderInterfaceThe changes in refill_reader_buffer align well with the PR objective of using ReaderInterface. The use of read_into_buffer with m_file_reader.get() correctly replaces the direct file descriptor operations. The buffer management logic has been appropriately updated to work with the new BufferReader, maintaining the same functionality as before.
281-289
: LGTM: Updated drop_content_before_current_pos to work with BufferReaderThe changes in drop_content_before_current_pos are consistent with the new BufferReader usage. The use of m_buffer_reader.get_pos() and m_buffer_reader.get_buffer_size() for calculations is correct. The buffer management logic has been appropriately updated to work with the new BufferReader, maintaining the same functionality as before.
Line range hint
291-294
: LGTM: Minor improvement in update_file_posThe update_file_pos method now uses std::max to update m_highest_read_pos, which is a minor improvement in clarity and correctness. The method maintains its original functionality while aligning with modern C++ practices.
Line range hint
1-295
: Overall: Excellent refactoring to use ReaderInterfaceThe changes in this file successfully achieve the PR objective of refactoring BufferedFileReader to use ReaderInterface instead of direct file descriptor operations. Key improvements include:
- Adoption of modern C++ features like std::span and std::unique_ptr, enhancing code safety and clarity.
- Consistent use of the new BufferReader interface throughout the file.
- Improved error handling in several methods.
- Removal of direct file descriptor operations, making the code more flexible and easier to test.
These changes significantly improve the code's flexibility, safety, and maintainability. The refactoring aligns well with the RAII model and modern C++ best practices.
While the changes are generally excellent, consider the minor suggestions for error handling improvements in previous comments to further enhance code readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (7)
components/core/src/clp/BufferedFileReader.cpp (6)
32-50
: Approve changes to read_into_buffer with a suggestionThe refactoring of
read_into_buffer
to useReaderInterface
is a good improvement. It simplifies the function and makes it more flexible. However, consider enhancing the error handling to distinguish between different types of errors thattry_read
might return.Consider updating the error handling to be more specific:
auto error_code = reader->try_read(buf, num_bytes_to_read, num_bytes_read); -if (0 == num_bytes_read) { +if (error_code != ErrorCode_Success) { + return error_code; +} else if (0 == num_bytes_read) { return ErrorCode_EndOfFile; } -return error_code; +return ErrorCode_Success;This change ensures that any error from
try_read
is properly propagated.
54-76
: Approve constructor changes with a suggestionThe updated constructor properly implements the transition to using
ReaderInterface
. The null check forreader_interface
is a good addition for robustness.Consider initializing
m_highest_read_pos
in the initializer list for consistency:BufferedFileReader::BufferedFileReader( std::unique_ptr<ReaderInterface> reader_interface, size_t base_buffer_size ) - : m_file_reader(std::move(reader_interface)) { + : m_file_reader(std::move(reader_interface)), + m_highest_read_pos(0) { // ... rest of the constructor - m_highest_read_pos = m_file_pos; + m_highest_read_pos = m_file_pos; // Update after m_file_pos is set }This change improves code clarity and ensures
m_highest_read_pos
is always initialized.
Line range hint
89-112
: Approve checkpoint changes with a suggestionThe updates to
set_checkpoint
andclear_checkpoint
align well with the new buffering mechanism and improve the checkpoint functionality. The use ofm_highest_read_pos
inclear_checkpoint
is a good enhancement.In
set_checkpoint
, consider adding a comment explaining the condition for dropping content:if (m_checkpoint_pos.has_value() && m_checkpoint_pos < m_file_pos - && m_buffer_reader.get_buffer_size() != m_base_buffer_size) + && m_buffer_reader.get_buffer_size() != m_base_buffer_size) // Buffer not full { drop_content_before_current_pos(); }This comment would clarify the reasoning behind this condition, improving code maintainability.
Line range hint
121-160
: Approve try_seek_from_begin changes with a suggestionThe updates to
try_seek_from_begin
successfully implement the transition to usingReaderInterface
andBufferReader
. The error handling is more comprehensive, which is a good improvement.Consider extracting the logic for seeking beyond the current buffer into a separate private method to improve readability:
auto BufferedFileReader::try_seek_from_begin(size_t pos) -> ErrorCode { // ... existing checks ... auto error_code = m_buffer_reader.try_seek_from_begin(get_buffer_relative_pos(pos)); if (ErrorCode_Truncated == error_code) { - if (false == m_checkpoint_pos.has_value()) { - // ... existing logic ... - } else { - // ... existing logic ... - } + return seek_beyond_current_buffer(pos); } else if (ErrorCode_Success != error_code) { return error_code; } update_file_pos(pos); return ErrorCode_Success; } +auto BufferedFileReader::seek_beyond_current_buffer(size_t pos) -> ErrorCode { + if (false == m_checkpoint_pos.has_value()) { + // ... existing logic ... + } else { + // ... existing logic ... + } +}This refactoring would make the main method easier to read and understand.
Line range hint
162-196
: Approve try_read changes with a suggestionThe refactoring of
try_read
is a significant improvement. The use ofstd::span
and the simplified loop structure enhance readability and safety. The error handling is well-aligned with the new design.Consider adding a comment to explain the logic for determining end-of-file condition:
if (dst_view.size() == num_bytes_to_read) { + // If no bytes were read, we've reached the end of the file return ErrorCode_EndOfFile; } return ErrorCode_Success;
This comment would clarify the reasoning behind this check, improving code maintainability.
Line range hint
241-286
: Approve refill_reader_buffer changes with a suggestionThe refactoring of
refill_reader_buffer
successfully implements the transition to usingReaderInterface
. The buffer management logic has been updated to handle checkpoints more efficiently, which is a good improvement.Consider extracting the buffer size calculation logic into a separate private method to improve readability:
+auto BufferedFileReader::calculate_buffer_size(size_t num_bytes_to_refill, size_t data_size) -> size_t { + if (m_checkpoint_pos.has_value()) { + return int_round_up_to_multiple( + get_buffer_end_pos() + num_bytes_to_refill, + m_base_buffer_size + ); + } else { + return m_base_buffer_size - (get_buffer_end_pos() % m_base_buffer_size); + } +} auto BufferedFileReader::refill_reader_buffer(size_t num_bytes_to_refill) -> ErrorCode { // ... existing code ... - size_t num_bytes_to_read{0}; - if (m_checkpoint_pos.has_value()) { - num_bytes_to_read = int_round_up_to_multiple( - buffer_end_pos + num_bytes_to_refill, - m_base_buffer_size - ); - // ... existing code ... - } else { - num_bytes_to_read = m_base_buffer_size - (buffer_end_pos % m_base_buffer_size); - // ... existing code ... - } + size_t num_bytes_to_read = calculate_buffer_size(num_bytes_to_refill, data_size); // ... rest of the method ... }This refactoring would make the main method easier to read and understand by separating the buffer size calculation logic.
components/core/tests/test-BufferedFileReader.cpp (1)
286-300
: Excellent addition of non-zero position reading test.The new test case effectively addresses the PR objective of testing BufferedFileReader with a non-zero position reader. The use of GENERATE improves test coverage by testing multiple starting positions.
One suggestion for improvement:
Consider adding a comment explaining the significance of the chosen values (0 and 127) for
reader_begin_offset
. This would enhance the test's maintainability and clarify the reasoning behind these specific values.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- components/core/src/clp/BufferReader.cpp (1 hunks)
- components/core/src/clp/BufferedFileReader.cpp (8 hunks)
- components/core/tests/test-BufferedFileReader.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/core/src/clp/BufferReader.cpp
🧰 Additional context used
🔇 Additional comments (12)
components/core/src/clp/BufferedFileReader.cpp (5)
3-18
: LGTM: Updated includes and using declarationsThe new includes and using declarations are appropriate for the refactoring changes. They indicate a shift towards using more standard C++ components, which is a good practice.
79-84
: LGTM: Simplified try_refill_buffer_if_emptyThe simplification of
try_refill_buffer_if_empty
improves readability and aligns well with the new design usingBufferReader
. The change to usem_buffer_reader.get_buffer_size()
is more direct and efficient.
Line range hint
198-239
: LGTM: Improved try_read_to_delimiter implementationThe refactoring of
try_read_to_delimiter
is well done. The use ofBufferReader
for reading to a delimiter simplifies the implementation. The error handling is more comprehensive, and the logic for handling end-of-file conditions has been improved. These changes align well with the overall refactoring goals and enhance the robustness of the method.
Line range hint
288-305
: LGTM: Improved buffer management and position trackingThe changes to
drop_content_before_current_pos
andupdate_file_pos
are well-implemented. The use ofBufferReader
methods for position and size calculations indrop_content_before_current_pos
aligns with the new design and simplifies the buffer management logic. The update tom_highest_read_pos
inupdate_file_pos
is a good addition for maintaining the highest read position, which should improve the overall functionality of the class.
Line range hint
1-305
: Overall assessment: Significant improvements with minor suggestionsThe refactoring of
BufferedFileReader.cpp
to useReaderInterface
andBufferReader
is a substantial improvement. These changes enhance the flexibility and abstraction of the file reading operations, which should make the code more maintainable and easier to extend in the future.Key improvements include:
- Consistent use of
ReaderInterface
throughout the class.- Enhanced error handling and buffer management.
- Addition of
m_highest_read_pos
for better position tracking.- Use of modern C++ features like
std::span
.While the changes are generally well-implemented, consider the following suggestions for further improvement:
- Extract complex logic in methods like
try_seek_from_begin
andrefill_reader_buffer
into separate private methods for better readability.- Add more comments to explain complex conditions or logic, especially in checkpoint-related operations.
- Consider further optimizing buffer management to reduce memory allocations where possible.
Overall, this refactoring significantly improves the
BufferedFileReader
class, making it more robust and flexible.components/core/tests/test-BufferedFileReader.cpp (7)
7-7
: Inclusion of FileDescriptorReader header is appropriate.The addition of the FileDescriptorReader header is necessary for the changes made in the test cases and is correctly placed in the include section.
12-20
: Appropriate updates to using statements and namespace usage.The additions of using statements for ErrorCode, FileDescriptorReader, and make_unique are consistent with the changes in the test cases. The use of
string
instead ofstd::string
improves code readability.
43-43
: Correct implementation of BufferedFileReader with FileDescriptorReader.The initialization of BufferedFileReader with a FileDescriptorReader aligns with the PR objective and uses modern C++ practices with
make_unique
.
265-268
: Excellent update to modern random number generation.The implementation of modern random number generation using the library is a significant improvement. It addresses the previous suggestion and enhances the quality and reproducibility of the tests.
276-276
: Consistent use of modern random number generation for test data.The test data generation now correctly uses the new random number generator, improving the randomness and quality of the test data.
305-305
: Consistent use of modern random number generation for delimiter selection.The delimiter selection now correctly uses the new random number generator, improving the randomness of the test cases.
Line range hint
1-315
: Overall excellent refactoring and test improvements.The changes in this file successfully implement the PR objectives:
- BufferedFileReader now uses FileDescriptorReader, aligning with the goal of accepting any arbitrary reader interface.
- Test coverage has been improved, particularly for non-zero position reading scenarios.
- The use of modern C++ practices, especially in random number generation, enhances the overall quality and reliability of the tests.
These changes contribute significantly to the flexibility and robustness of the BufferedFileReader class and its testing suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
components/core/src/clp/clp/FileCompressor.hpp (1)
58-67
: Enhance method documentation with parameter descriptions.The documentation should describe each parameter's purpose and any constraints or requirements.
Example improvement:
/** * Parses and encodes content from the given reader into the given archive_writer. * @param target_data_size_of_dicts Target size for dictionaries in bytes * @param archive_user_config User-specific archive configuration * @param target_encoded_file_size Target size for encoded file in bytes * @param path_for_compression Path of the file being compressed * @param group_id Group identifier for the content * @param archive_writer Destination archive for encoded content * @param reader Source interface for reading content * @param use_heuristic Whether to use heuristic-based parsing * @throws std::runtime_error If parsing or encoding fails */components/core/src/clp/BufferedFileReader.cpp (1)
239-243
: Consider adding documentation for buffer management strategyWhile the implementation is correct, the complex buffer management strategy would benefit from detailed documentation explaining the relationship between checkpoints and buffer sizing.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- components/core/src/clp/BufferedFileReader.cpp (7 hunks)
- components/core/src/clp/BufferedFileReader.hpp (9 hunks)
- components/core/src/clp/clp/FileCompressor.hpp (3 hunks)
🔇 Additional comments (15)
components/core/src/clp/clp/FileCompressor.hpp (1)
Line range hint
106-117
: Consider enhancing error reporting mechanism.While the boolean return type indicates success/failure, it might not provide enough context about failure reasons. Consider using std::expected (C++23) or a Result type pattern for more detailed error reporting.
Example approach:
enum class CompressionError { InvalidFormat, ReadError, WriteError // ... other specific error cases }; auto try_compressing_as_archive( // ... existing parameters ... ) -> std::expected<void, CompressionError>; // C++23Also, please complete the documentation for the new parameters as previously requested in the past review.
components/core/src/clp/BufferedFileReader.cpp (5)
3-18
: LGTM: Modern C++ includes and using declarationsThe added includes and using declarations appropriately support the refactoring to use modern C++ features like std::span for safer buffer management and std::unique_ptr for RAII-compliant resource management.
21-43
: LGTM: Constructor properly implements RAII patternThe constructor correctly:
- Validates the reader interface and buffer size
- Initializes file position from the reader interface
- Sets up the buffer reader with appropriate initial state
46-51
: LGTM: Simplified buffer refill checkThe method efficiently checks the buffer state before attempting a refill, with proper error code handling.
138-161
: LGTM: Robust implementation with spanThe implementation correctly:
- Uses span for safe buffer management
- Handles partial reads appropriately
- Implements proper error handling as discussed in previous reviews
255-263
: LGTM: Clean buffer management implementationThe method correctly manages buffer content and follows the previous review feedback about removing unnecessary casts.
components/core/src/clp/BufferedFileReader.hpp (9)
4-4
: Including<cstddef>
for standard definitions is appropriate.The inclusion of
<cstddef>
provides definitions for types likesize_t
, which are used throughout the class.
8-8
: Including<utility>
forstd::move
usage is necessary.Including
<utility>
is required for the use ofstd::move
in constructors and other parts of the code.
18-21
: Updated class documentation accurately reflects new functionality.The changes to the class comments appropriately describe that
BufferedFileReader
now reads from anotherReaderInterface
, enhancing flexibility and supporting various data sources.
29-29
: Clarification on 'file position' improves understandability.Specifying that "file position" refers to the buffered file rather than the original input file helps prevent confusion.
74-81
: Constructor refactoring enhances flexibility and resource management.Changing the constructor to accept a
std::unique_ptr<ReaderInterface>
allowsBufferedFileReader
to wrap any reader, not just file descriptors, increasing the utility of the class.
83-84
: Delegating constructor simplifies initialization with default values.Using a delegating constructor to set the default buffer size reduces code duplication and enhances maintainability.
87-87
: Explicitly defaulted destructor ensures proper resource cleanup.Declaring the destructor as
override = default
clarifies that the default destruction behaviour is sufficient, ensuring resources are properly released.
172-172
: Verify accuracy of the@return
statement.The
@return
comment mentionsrefill_reader_buffer
. Ensure that this reference is still accurate after the refactoring, or update it to reflect the current method behaviour.
223-223
: Initialization ofm_buffer_reader
withoutstd::optional
simplifies usage.By directly instantiating
m_buffer_reader
, the code eliminates unnecessary optional wrapping, simplifying access. Ensure thatm_buffer_reader
is properly initialized in all constructors to prevent potential issues.
* Parses and encodes content from the given reader into the given archive_writer. | ||
* @param target_data_size_of_dicts | ||
* @param archive_user_config | ||
* @param target_encoded_file_size | ||
* @param path_for_compression | ||
* @param group_id | ||
* @param archive_writer | ||
* @param reader | ||
* @param use_heuristic | ||
*/ | ||
auto parse_and_encode( | ||
size_t target_data_size_of_dicts, | ||
streaming_archive::writer::Archive::UserConfig& archive_user_config, | ||
size_t target_encoded_file_size, | ||
std::string const& path_for_compression, | ||
group_id_t group_id, | ||
streaming_archive::writer::Archive& archive_writer, | ||
ReaderInterface& reader, | ||
bool use_heuristic | ||
) -> void; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider refactoring method signature to reduce parameter count.
The method has 8 parameters, which could make it difficult to maintain and use correctly. Consider grouping related parameters into a configuration object.
Example refactor:
struct ParseConfig {
size_t target_data_size_of_dicts;
size_t target_encoded_file_size;
std::string path_for_compression;
group_id_t group_id;
bool use_heuristic;
streaming_archive::writer::Archive::UserConfig& archive_user_config;
};
auto parse_and_encode(
ParseConfig const& config,
streaming_archive::writer::Archive& archive_writer,
ReaderInterface& reader
) -> void;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is legacy code refactor, so I would prefer not to extend the scope of this PR. Push this off to future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@haiqi96, understood. We can address this in a future PR.
✏️ Learnings added
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.hpp:58-78
Timestamp: 2024-10-24T14:25:17.796Z
Learning: When reviewing legacy code refactors, avoid suggesting changes that would extend the scope of the PR.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
auto error_code = m_buffer_reader.try_seek_from_begin(get_buffer_relative_pos(pos)); | ||
if (ErrorCode_Truncated == error_code) { | ||
if (false == m_checkpoint_pos.has_value()) { | ||
// If checkpoint is not set, simply move the file_pos and invalidate | ||
// the buffer reader | ||
auto offset = lseek(m_fd, static_cast<off_t>(pos), SEEK_SET); | ||
if (-1 == offset) { | ||
return ErrorCode_errno; | ||
if (auto const error_code = m_file_reader->try_seek_from_begin(pos); | ||
error_code != ErrorCode_Success) | ||
{ | ||
return error_code; | ||
} | ||
m_buffer_reader.emplace(m_buffer.data(), 0); | ||
m_buffer_reader = BufferReader{m_buffer.data(), 0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider simplifying error handling logic
The nested conditions and error handling could be made more readable by extracting the checkpoint-related logic into a separate method.
Consider refactoring like this:
- if (ErrorCode_Truncated == error_code) {
- if (false == m_checkpoint_pos.has_value()) {
- // If checkpoint is not set, simply move the file_pos and invalidate
- // the buffer reader
- if (auto const error_code = m_file_reader->try_seek_from_begin(pos);
- error_code != ErrorCode_Success)
- {
- return error_code;
- }
- m_buffer_reader = BufferReader{m_buffer.data(), 0};
- m_buffer_begin_pos = pos;
+ if (ErrorCode_Truncated == error_code && !m_checkpoint_pos.has_value()) {
+ auto const seek_error = m_file_reader->try_seek_from_begin(pos);
+ if (seek_error != ErrorCode_Success) {
+ return seek_error;
+ }
+ m_buffer_reader = BufferReader{m_buffer.data(), 0};
+ m_buffer_begin_pos = pos;
+ return ErrorCode_Success;
Committable suggestion was skipped due to low confidence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this applies to the original code snippet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@haiqi96, understood. I appreciate your feedback.
(つ﹏⊂)
✏️ Learnings added
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.095Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
* @return ErrorCode_EndOfFile on EOF | ||
* @return Same as BufferReader::try_read if it fails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update documentation to reflect dependency changes.
The @return
statement references BufferReader::try_read
, but since the class now utilizes ReaderInterface
, consider updating this to reference ReaderInterface::try_read
.
Apply this change to correct the documentation:
- * @return Same as BufferReader::try_read if it fails
+ * @return Same as ReaderInterface::try_read if it fails
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
* @return Same as BufferReader::try_read if it fails | |
* @return Same as ReaderInterface::try_read if it fails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the suggestion is incorrect, the function explicitly calls BufferReader::try_read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@haiqi96, thank you for the correction. You're right; since the function explicitly calls BufferReader::try_read
, the documentation should reference BufferReader::try_read
. I apologize for the mistake.
🐰😔
✏️ Learnings added
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.hpp:158-158
Timestamp: 2024-10-24T14:46:00.490Z
Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
components/core/src/clp/BufferedFileReader.cpp (1)
40-43
: Consider initializing m_highest_read_pos to m_pos for consistencyWhile the current initialization works, setting m_highest_read_pos to m_pos would be more consistent with the logical relationship between these variables.
m_pos = m_reader->get_pos(); m_buffer_begin_pos = 0; m_buffer_reader = BufferReader{m_buffer.data(), 0}; -m_highest_read_pos = m_pos; +m_highest_read_pos = m_pos;components/core/src/clp/BufferedFileReader.hpp (2)
74-74
: Add description for 'reader_interface' parameterThe parameter
reader_interface
lacks a description in the constructor's documentation. Please add a brief explanation to clarify its purpose.
189-189
: Add description for 'num_bytes_to_refill' parameterThe parameter
num_bytes_to_refill
lacks a description in the function's documentation. Providing a brief description will enhance clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- components/core/src/clp/BufferedFileReader.cpp (7 hunks)
- components/core/src/clp/BufferedFileReader.hpp (8 hunks)
🧰 Additional context used
📓 Learnings (1)
components/core/src/clp/BufferedFileReader.hpp (1)
Learnt from: haiqi96 PR: y-scope/clp#523 File: components/core/src/clp/BufferedFileReader.hpp:158-158 Timestamp: 2024-10-24T14:46:00.664Z Learning: In `components/core/src/clp/BufferedFileReader.hpp`, the `BufferedFileReader::try_read` function explicitly calls `BufferReader::try_read`, so documentation should reference `BufferReader::try_read`.
🔇 Additional comments (6)
components/core/src/clp/BufferedFileReader.cpp (5)
3-18
: LGTM: Header includes and namespace declarations are appropriateThe added includes and using declarations properly support the refactoring to use ReaderInterface and modern C++ features.
21-39
: LGTM: Constructor properly implements RAII patternThe constructor correctly takes ownership of the ReaderInterface and validates parameters. The null check for reader_interface enhances safety.
138-159
: LGTM: try_read implementation handles partial reads correctlyThe implementation properly handles:
- Partial reads from the buffer
- Buffer refills when needed
- End of file conditions
252-265
: LGTM: Helper methods maintain buffer and position state correctlyBoth drop_content_before_current_pos and update_pos properly maintain the buffer state and position tracking.
Line range hint
87-127
: Verify seek behaviour with checkpointsThe seek implementation handles checkpoints correctly, but we should verify the behaviour when:
- Seeking beyond the current buffer with an active checkpoint
- Seeking to positions that require buffer resizing
✅ Verification successful
Let me try a different search strategy to verify the seek behavior with checkpoints.
This should help us find relevant test files and test cases that verify the seek behavior with checkpoints. Let's analyze the test coverage for these scenarios.
Let me try one more search strategy to locate the test files, adjusting the paths and patterns.
This broader search should help us locate the test files and relevant test cases regardless of their location in the repository.
Seek behaviour with checkpoints is well-tested and verified
The test file
test-BufferedFileReader.cpp
contains comprehensive test coverage for seek behaviour with checkpoints, specifically testing:
Seeking beyond current buffer with active checkpoint:
- Tests verify seeking between checkpoint and current read head
- Tests confirm seeking beyond highest file position
- Validates buffer refilling when seeking past buffer end
Seeking with buffer resizing:
- Tests include scenarios with large reads (base_buffer_size + 4)
- Verifies seeking with different buffer states and sizes
- Tests boundary conditions around checkpoint positions
The implementation correctly handles all edge cases:
- Prevents seeking before checkpoint
- Properly manages buffer refilling
- Maintains position integrity during seeks
- Handles checkpoint clearing correctly
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for test cases that verify seek behaviour with checkpoints rg -l "set_checkpoint.*try_seek_from_begin" --type cppLength of output: 56
Script:
#!/bin/bash # Search for test files related to BufferedFileReader fd "BufferedFileReader.*test" --type f # Search for checkpoint and seek related test cases with more context rg "TEST.*BufferedFileReader.*seek" -A 10 --type cpp rg "checkpoint" --type cpp components/core/testLength of output: 204
Script:
#!/bin/bash # Find all test files in the codebase fd "test.*\.cpp$" # Search for BufferedFileReader in test files rg "BufferedFileReader" -A 5 --type cpp $(fd "test.*\.cpp$") # Look for seek and checkpoint related tests rg "TEST.*seek|checkpoint" -A 5 --type cpp $(fd "test.*\.cpp$")Length of output: 18229
components/core/src/clp/BufferedFileReader.hpp (1)
18-21
: LGTM: Code changes are well-implementedThe modifications and refactoring in these sections appear correct and enhance the codebase. Good use of modern C++ features and adherence to best practices.
Also applies to: 29-29, 78-81, 83-84, 86-87, 98-98, 146-146, 158-158, 172-172, 208-208, 211-211, 217-218, 223-223
Description
The BufferedFileReader was designed with the assumption that it always reads from the local file system. However, supporting compressing from a different input source, (s3 for example) would break this assumption. In this PR, we refactor the BufferedFileReader so it can takes in any abitrary reader interface as its data source. A more detailed list of changes:
Note: this PR makes some modifications to the unit-test, but we decide to not update the unit-test to match the latest coding standard. In fact, the unit-test is very poorly written and would make more sense to refactor it in a separate PR.
Validation performed
1.Made sure exisitng Unit test passes.
2.Added a new unit test to cover the case where the BufferedFileReader takes in an reader interface with non-zero pos.
3. Compressed and decompressed a single file, verified the decompressed file matches original
Summary by CodeRabbit
New Features
BufferedFileReader
by transitioning to aReaderInterface
.FileDescriptorReader
.FileCompressor
for parsing and encoding content from a reader.Improvements
BufferedFileReader
.BufferReader
methods to handle edge cases more effectively.try_compressing_as_archive
method to allow flexible file reading during compression.Tests
BufferedFileReader
to utilizeReaderInterface
for improved file handling.