feat(clp-s): json to irv2 #657

AVMatthews · 2025-01-09T16:36:16Z

Description

This PR:

Exposes JSON-to-IR-v2 conversion to users through the command line.
Enables users to write IR v2 streams to files.

Validation performed

Generated IR v2 streams for all five public JSON datasets, e.g.:
./clp-s r elasticsearch_ir elasticsearch/

Summary by CodeRabbit

Release Notes

New Features
- Added a new command-line option for converting JSON files to Intermediate Representation (IR) format.
- Introduced advanced configuration options for JSON to IR conversion, including:
  - Output directory specification
  - Compression settings
  - Encoding type configuration
- New help message to guide users on the JSON to IR command usage.
Improvements
- Enhanced command-line interface with additional parsing capabilities.
- Improved error handling for new JSON conversion functionality.

coderabbitai · 2025-01-09T16:36:24Z

Walkthrough

The pull request introduces a new command-line option JsonToIr in the command-line parsing system, enabling the conversion of JSON files into an Intermediate Representation (IR) format. This implementation includes new command-line options for specifying input and output paths, configuring compression settings, and handling encoding types. Changes are made across multiple files in the components/core/src/clp_s/ directory, adding new methods, enumerations, and functions to support the JSON to IR conversion process.

Changes

`CommandLineArguments.cpp`
- Added new command handling for `JsonToIr`
- Implemented new command-line options parsing
- Added help message function for new command
- Updated error handling for new command parameters

`CommandLineArguments.hpp`
- Added `JsonToIr` to `Command` enum
- Introduced new getter methods for IR buffer size and encoding type
- Added private member variables for encoding and buffer configuration

`JsonParser.hpp`
- Created new `JsonToIrParserOption` structure with parsing configuration options

`clp-s.cpp`
- Added template functions for serialization and buffer management
- Implemented `generate_ir` function for JSON to IR conversion
- Modified main function to support new `JsonToIr` command

Sequence Diagram

sequenceDiagram
    participant CLI as Command Line Interface
    participant Parser as CommandLineArguments
    participant Generator as JSON to IR Generator
    participant Serializer as Serializer

    CLI->>Parser: Parse JsonToIr command
    Parser->>Generator: Validate and prepare options
    Generator->>Serializer: Initialize serialization
    Serializer->>Generator: Process JSON files
    Generator-->>CLI: Return conversion status

Possibly Related PRs

fix(ffi): Disallow input MessagePack maps that contain non-string keys or array values that contain unsupported types. #570: Enhances serialization processes for MessagePack data structures, which may share similar serialization techniques with the new JSON to IR conversion feature.


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (4)

components/core/src/clp_s/CommandLineArguments.cpp (2)
758-764: Fix typographical errors in option descriptions.

There are minor typos in the option descriptions:

Line 758: "before ir generation fails" should be "before IR generation fails".

Line 764: "befroe" should be "before".

Apply this diff to correct the typos:
-                        "Maximum allowed size (B) for a single document before ir generation fails."
+                        "Maximum allowed size (B) for a single document before IR generation fails."
...
-                        "Maximum allowed size (B) for an in memory IR buffer befroe being written to file."
+                        "Maximum allowed size (B) for an in-memory IR buffer before being written to file."
747-747: Consider renaming "Compression options" to "JSON to IR options".

For clarity, rename the header of the options from "Compression options" to "JSON to IR options", as these options are specific to the JsonToIr command.

Apply this diff to rename the options group:
-                po::options_description compression_options("Compression options");
+                po::options_description compression_options("JSON to IR options");
components/core/src/clp_s/CommandLineArguments.hpp (2)
29-30: Consider documenting the command character choice

The character 'r' for JsonToIr might not be immediately intuitive to users. Consider adding a comment explaining the rationale for this choice, or consider a more descriptive character if available.

202-203: Use named constants for magic numbers

Consider replacing the magic numbers with named constants to improve code readability and maintainability:

The encoding type value of 8 should be a named constant with documentation explaining its significance

The buffer size of 512MB could use a named constant similar to other size constants in the file
+    // Default encoding type for IR conversion
+    static constexpr int DEFAULT_ENCODING_TYPE = 8;
+    // Default maximum buffer size for IR conversion (512MB)
+    static constexpr size_t DEFAULT_MAX_IR_BUFFER_SIZE = 512ULL * 1024 * 1024;
-    int m_encoding_type{8};
-    size_t m_max_ir_buffer_size{512ULL * 1024 * 1024};
+    int m_encoding_type{DEFAULT_ENCODING_TYPE};
+    size_t m_max_ir_buffer_size{DEFAULT_MAX_IR_BUFFER_SIZE};
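A minimal sketch of the named-constant suggestion, assuming the constants live in a hypothetical `IrConversionDefaults` struct rather than directly in `CommandLineArguments`. The `static_assert`s only document the arithmetic: `512ULL * 1024 * 1024` is 2^29 bytes, i.e. 512 MiB.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical holder for the defaults; the constant names follow the review
// suggestion above, and the values are the ones currently in the PR.
struct IrConversionDefaults {
    // Default encoding type (8-byte encoding, as currently set in the PR).
    static constexpr int DEFAULT_ENCODING_TYPE = 8;
    // Default maximum in-memory IR buffer size: 512 MiB.
    static constexpr std::size_t DEFAULT_MAX_IR_BUFFER_SIZE = 512ULL * 1024 * 1024;
};

// 512 * 1024 * 1024 == 2^29 == 536,870,912 bytes.
static_assert(IrConversionDefaults::DEFAULT_MAX_IR_BUFFER_SIZE == (std::size_t{1} << 29U));
static_assert(IrConversionDefaults::DEFAULT_MAX_IR_BUFFER_SIZE == 536'870'912U);
```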

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d3b671 and 38229b5.

📒 Files selected for processing (4)

components/core/src/clp_s/CommandLineArguments.cpp (4 hunks)
components/core/src/clp_s/CommandLineArguments.hpp (4 hunks)
components/core/src/clp_s/JsonParser.hpp (1 hunks)
components/core/src/clp_s/clp-s.cpp (6 hunks)

🧰 Additional context used

📓 Path-based instructions (4)

components/core/src/clp_s/JsonParser.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/clp-s.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/CommandLineArguments.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/CommandLineArguments.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🪛 cppcheck (2.10-2)

components/core/src/clp_s/clp-s.cpp

[error] 191-191: Exception thrown in function declared not to throw exceptions.

(throwInNoexceptFunction)

🔇 Additional comments (4)

components/core/src/clp_s/clp-s.cpp (1)

191-191: Verify exception safety in functions to prevent throwing exceptions from noexcept functions.

Static analysis has flagged that an exception may be thrown in a function that should not throw exceptions. Please verify that serialize_msgpack_map does not throw exceptions, or ensure it is not declared with noexcept.

Run the following script to check if serialize_msgpack_map is declared noexcept:
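For context on the cppcheck warning: if an exception escapes a function declared `noexcept`, `std::terminate` is called. One common pattern, sketched below with hypothetical names (`may_throw`, `try_serialize`), is to catch inside the `noexcept` function and report failure through the return value. Whether this applies here depends on how the function flagged at line 191 is actually declared.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical serialization step that may throw.
void may_throw(bool fail) {
    if (fail) {
        throw std::runtime_error("serialization failed");
    }
}

// If an exception escaped this noexcept function, std::terminate would be
// called. Catching everything inside and reporting failure through the return
// value keeps the noexcept promise.
[[nodiscard]] auto try_serialize(bool fail) noexcept -> bool {
    try {
        may_throw(fail);
    } catch (...) {
        return false;
    }
    return true;
}

// noexcept is part of the compile-time contract and can be checked statically.
static_assert(noexcept(try_serialize(false)));
static_assert(false == noexcept(may_throw(false)));
```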


components/core/src/clp_s/JsonParser.hpp (1)

55-62: Struct JsonToIrParserOption added successfully.

The new structure JsonToIrParserOption is well-defined and follows appropriate coding standards.

components/core/src/clp_s/CommandLineArguments.hpp (2)

69-72: LGTM! Modern C++ practices well applied

The getter methods follow modern C++ best practices with the [[nodiscard]] attribute and trailing return types. The const qualification guarantees that the getters do not modify the object.

178-179: LGTM! Consistent with existing usage methods

The method declaration follows the established pattern of other usage printing methods in the class.

components/core/src/clp_s/clp-s.cpp

components/core/src/clp_s/CommandLineArguments.cpp

LinZhihao-723

Left some comments about using IR v2 APIs

LinZhihao-723 · 2025-01-09T17:03:48Z

components/core/src/clp_s/clp-s.cpp

+    for (auto& path : all_file_paths) {
+        bool success;
+        if (option.encoding == 4) {
+            success = run_serializer<int32_t>(option, path);


Technically, we should only generate 4-byte-encoded IR streams (even though we do support 8-byte encoding). Correct me if I'm wrong, @kirkrodrigues.

Confirmed with Kirk: we should only generate 4-byte-encoded IR streams.

components/core/src/clp_s/clp-s.cpp

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (3)

components/core/src/clp_s/clp-s.cpp (1)

181-272: Consider breaking down the function for better maintainability.

The function is quite long (91 lines) and handles multiple responsibilities. Consider breaking it down into smaller functions:

File handling setup

JSON processing

IR buffer management
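One possible split for two of those responsibilities (JSON processing and IR buffer management), sketched with hypothetical names (`BufferedWriter`, `process_lines`). A `std::vector` and a `std::ostream` stand in for the Serializer's IR buffer and the ZstdCompressor, and each input line is appended as-is instead of being parsed and serialized to IR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper owning the "IR buffer management" responsibility.
class BufferedWriter {
public:
    BufferedWriter(std::ostream& out, std::size_t max_buffer_size)
            : m_out{out},
              m_max_buffer_size{max_buffer_size} {}

    // Appends one record; flushes once the buffer reaches the threshold,
    // mirroring `bufferSize >= option.max_ir_buffer_size` in the PR.
    void append(std::string const& record) {
        m_buffer.insert(m_buffer.end(), record.begin(), record.end());
        if (m_buffer.size() >= m_max_buffer_size) {
            flush();
        }
    }

    // Writes any buffered bytes to the sink and clears the buffer.
    void flush() {
        if (m_buffer.empty()) {
            return;
        }
        m_out.write(
                reinterpret_cast<char const*>(m_buffer.data()),
                static_cast<std::streamsize>(m_buffer.size())
        );
        m_total_written += m_buffer.size();
        m_buffer.clear();
    }

    [[nodiscard]] auto total_written() const -> std::size_t { return m_total_written; }

private:
    std::ostream& m_out;
    std::size_t m_max_buffer_size;
    std::vector<std::int8_t> m_buffer;
    std::size_t m_total_written{0};
};

// Hypothetical "JSON processing" stage: feeds each input line to the writer.
// A real implementation would parse the line and serialize it to IR first.
auto process_lines(std::istream& in, BufferedWriter& writer) -> std::size_t {
    std::size_t num_lines{0};
    std::string line;
    while (std::getline(in, line)) {
        writer.append(line);
        ++num_lines;
    }
    // Final flush for whatever remains, as at the end of run_serializer.
    writer.flush();
    return num_lines;
}
```

File handling (opening the input, output, and compressor) would then be a third, separate function that passes the opened streams in.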


components/core/src/clp_s/CommandLineArguments.cpp (2)

764-764: Fix typo in option description.

The option description contains a typo: "befroe" should be "before".

-                    "Maximum allowed size (B) for an in memory IR buffer befroe being written to file."
+                    "Maximum allowed size (B) for an in memory IR buffer before being written to file."

828-861: Enhance error messages with valid value ranges.

The error messages could be more helpful by including the valid value ranges.

-                SPDLOG_ERROR(
-                        "Invalid encoding type specified; --encoding-type {}",
-                        m_encoding_type
-                );
+                SPDLOG_ERROR(
+                        "Invalid encoding type specified (must be 4 or 8); --encoding-type {}",
+                        m_encoding_type
+                );

-                SPDLOG_ERROR(
-                        "Invalid compression level specified; Compression level must be 1-9; "
-                        "--compression-level {}",
-                        m_compression_level
-                );
+                SPDLOG_ERROR(
+                        "Invalid compression level specified (must be between 1 and 9, where 1 is "
+                        "fastest/lowest compression and 9 is slowest/highest compression); "
+                        "--compression-level {}",
+                        m_compression_level
+                );
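The more descriptive messages could also come from small validation helpers that are testable on their own. This is a sketch with hypothetical function names; the accepted ranges are taken from the existing error messages (encoding type 4 or 8, compression level 1 to 9).

```cpp
#include <cassert>
#include <optional>
#include <string>

// Returns an error message if the encoding type is invalid, else nullopt.
auto validate_encoding_type(int encoding_type) -> std::optional<std::string> {
    if (4 == encoding_type || 8 == encoding_type) {
        return std::nullopt;
    }
    return "Invalid encoding type specified (must be 4 or 8); --encoding-type "
           + std::to_string(encoding_type);
}

// Returns an error message if the compression level is out of range, else nullopt.
auto validate_compression_level(int level) -> std::optional<std::string> {
    if (level >= 1 && level <= 9) {
        return std::nullopt;
    }
    return "Invalid compression level specified (must be between 1 and 9); "
           "--compression-level "
           + std::to_string(level);
}
```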

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 38229b5 and 82367fd.

📒 Files selected for processing (2)

components/core/src/clp_s/CommandLineArguments.cpp (4 hunks)
components/core/src/clp_s/clp-s.cpp (6 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

components/core/src/clp_s/CommandLineArguments.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/clp-s.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🪛 cppcheck (2.10-2)

components/core/src/clp_s/clp-s.cpp

[error] 191-191: Exception thrown in function declared not to throw exceptions.

(throwInNoexceptFunction)

🔇 Additional comments (6)

components/core/src/clp_s/clp-s.cpp (4)

57-61: LGTM! Well-structured implementation with proper error handling.

The function follows best practices with proper error handling, type safety checks, and coding guidelines.

Also applies to: 160-179

485-488: LGTM! Clean integration of the new command.

The implementation follows the established pattern for command handling and error management.

234-236: ⚠️ Potential issue

Use safer alternatives to reinterpret_cast.

The use of reinterpret_cast with const_cast is unsafe and could lead to alignment issues. Consider using std::bit_cast (C++20) or ensuring proper alignment.

Also applies to: 258-259
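A note on the cast: `char` has alignment 1 and may be used to access the bytes of any object, so viewing an `int8_t` buffer as `char` does not by itself cause alignment problems. `std::bit_cast` copies a whole value into another type of the same size, so it does not directly apply to passing a byte buffer. The part that can likely be dropped is the `const_cast`, provided the sink accepts a `char const*`. The sketch assumes a hypothetical `StringSink` with that signature; the real `ZstdCompressor::write` signature should be checked first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sink with a write(char const*, size_t) signature, standing in
// for the compressor in run_serializer.
class StringSink {
public:
    void write(char const* data, std::size_t size) { m_data.append(data, size); }

    [[nodiscard]] auto data() const -> std::string const& { return m_data; }

private:
    std::string m_data;
};

// Viewing int8_t bytes through a char pointer is well-defined (char may alias
// any object and has alignment 1), and the pointer stays const throughout.
void write_bytes(StringSink& sink, std::int8_t const* bytes, std::size_t size) {
    sink.write(reinterpret_cast<char const*>(bytes), size);
}
```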

304-308: Use 4-byte encoding by default.

Based on the comment from @LinZhihao-723, we should only generate 4-byte-encoded IR streams, even though 8-byte encoding is supported.

Consider defaulting to 4-byte encoding and adding a comment explaining why:
-        if (option.encoding == 4) {
-            success = run_serializer<int32_t>(option, path);
-        } else {
-            success = run_serializer<int64_t>(option, path);
-        }
+        // We only generate 4-byte encoded IR stream for better compatibility
+        success = run_serializer<int32_t>(option, path);
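For reference, the template argument of `run_serializer` selects the width of the encoded variables: `int32_t` for 4-byte encoding and `int64_t` for 8-byte encoding (CLP's IR code uses aliases such as `four_byte_encoded_variable_t` for this). A compile-time sketch with hypothetical names, showing the dispatch collapsing to a single instantiation once only 4-byte output is produced:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical aliases mirroring the two template arguments used in the PR.
using FourByteEncodedVariable = std::int32_t;
using EightByteEncodedVariable = std::int64_t;

// Hypothetical stand-in for run_serializer<T>: reports the encoding width.
template <typename EncodedVariableT>
constexpr auto encoding_width() -> int {
    static_assert(
            std::is_same_v<EncodedVariableT, std::int32_t>
                    || std::is_same_v<EncodedVariableT, std::int64_t>,
            "Only 4-byte and 8-byte encodings exist"
    );
    return static_cast<int>(sizeof(EncodedVariableT));
}

// With 4-byte output only, the runtime branch on option.encoding disappears
// and a single instantiation remains.
constexpr auto selected_encoding_width() -> int {
    return encoding_width<FourByteEncodedVariable>();
}
```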
components/core/src/clp_s/CommandLineArguments.cpp (2)

109-115: LGTM! Clear and consistent help message.

The help message follows the established format and provides clear information about the new command.

969-971: LGTM! Consistent usage message format.

The usage message follows the established format and provides clear information about the command syntax.

components/core/src/clp_s/clp-s.cpp

coderabbitai · 2025-01-15T20:08:17Z

components/core/src/clp_s/clp-s.cpp

+template <typename T>
+auto run_serializer(clp_s::JsonToIrParserOption const& option, std::string path) {
+    auto result{Serializer<T>::create()};
+    if (result.has_error()) {
+        SPDLOG_ERROR("Failed to create Serializer");
+        return false;
+    }
+    auto& serializer{result.value()};
+    std::ifstream in_file;
+    in_file.open(path, std::ifstream::in);
+    if (false == in_file.is_open()) {
+        SPDLOG_ERROR("Failed to open input file: {}", path);
+        return false;
+    }
+    std::filesystem::path input_path{path};
+    std::string filename = input_path.filename().string();
+    std::string out_path = option.irs_dir + "/" + filename + ".ir";
+
+    clp_s::FileWriter out_file;
+    out_file.open(out_path, clp_s::FileWriter::OpenMode::CreateForWriting);
+    clp_s::ZstdCompressor zc;
+    try {
+        zc.open(out_file, option.compression_level);
+    } catch (clp_s::ZstdCompressor::OperationFailed& error) {
+        SPDLOG_ERROR("Failed to open ZSTDcompressor - {}", error.what());
+        in_file.close();
+        out_file.close();
+        return false;
+    }
+
+    std::string line = "";
+    size_t total_size = 0;
+
+    if (in_file.is_open()) {
+        while (getline(in_file, line)) {
+            try {
+                auto j_obj = nlohmann::json::parse(line);
+                if (false
+                    == unpack_and_serialize_msgpack_bytes(
+                            nlohmann::json::to_msgpack(j_obj),
+                            serializer
+                    ))
+                {
+                    SPDLOG_ERROR("Failed to serialize msgpack bytes for line: {}", line);
+                    in_file.close();
+                    out_file.close();
+                    zc.close();
+                    return false;
+                }
+                auto bufferSize = serializer.get_ir_buf_view().size();
+                if (bufferSize >= option.max_ir_buffer_size) {
+                    total_size = total_size + bufferSize;
+                    zc.write(
+                            reinterpret_cast<char*>(
+                                    const_cast<int8_t*>(serializer.get_ir_buf_view().data())
+                            ),
+                            bufferSize
+                    );
+                    zc.flush();
+                    serializer.clear_ir_buf();
+                }
+            } catch (nlohmann::json::parse_error const& e) {
+                SPDLOG_ERROR("JSON parsing error: {}", e.what());
+                in_file.close();
+                out_file.close();
+                zc.close();
+                return false;
+            } catch (std::exception const& e) {
+                SPDLOG_ERROR("Error during serialization: {}", e.what());
+                in_file.close();
+                out_file.close();
+                zc.close();
+                return false;
+            }
+        }
+        total_size = total_size + serializer.get_ir_buf_view().size();
+        zc.write(
+                reinterpret_cast<char*>(const_cast<int8_t*>(serializer.get_ir_buf_view().data())),
+                serializer.get_ir_buf_view().size()
+        );
+        std::vector<int8_t> ir_buf;
+        ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof);
+        zc.write(reinterpret_cast<char*>(ir_buf.data()), ir_buf.size());
+        zc.flush();
+        serializer.clear_ir_buf();
+        in_file.close();
+        zc.close();
+        out_file.close();
+    }
+
+    return true;
+}


🛠️ Refactor suggestion

Refactor error handling to reduce code duplication.

The error handling code for closing files and compressor is duplicated in multiple catch blocks. Consider extracting this into a helper function.

+    auto cleanup = [&]() {
+        in_file.close();
+        out_file.close();
+        zc.close();
+    };
+
     try {
         zc.open(out_file, option.compression_level);
     } catch (clp_s::ZstdCompressor::OperationFailed& error) {
         SPDLOG_ERROR("Failed to open ZSTDcompressor - {}", error.what());
-        in_file.close();
-        out_file.close();
+        cleanup();
         return false;
     }


Suggested change


template <typename T>
auto run_serializer(clp_s::JsonToIrParserOption const& option, std::string path) {
    auto result{Serializer<T>::create()};
    if (result.has_error()) {
        SPDLOG_ERROR("Failed to create Serializer");
        return false;
    }
    auto& serializer{result.value()};
    std::ifstream in_file;
    in_file.open(path, std::ifstream::in);
    if (false == in_file.is_open()) {
        SPDLOG_ERROR("Failed to open input file: {}", path);
        return false;
    }
    std::filesystem::path input_path{path};
    std::string filename = input_path.filename().string();
    std::string out_path = option.irs_dir + "/" + filename + ".ir";

    clp_s::FileWriter out_file;
    out_file.open(out_path, clp_s::FileWriter::OpenMode::CreateForWriting);
    clp_s::ZstdCompressor zc;
    auto cleanup = [&]() {
        in_file.close();
        out_file.close();
        zc.close();
    };

    try {
        zc.open(out_file, option.compression_level);
    } catch (clp_s::ZstdCompressor::OperationFailed& error) {
        SPDLOG_ERROR("Failed to open ZSTDcompressor - {}", error.what());
        cleanup();
        return false;
    }

    std::string line = "";
    size_t total_size = 0;

    if (in_file.is_open()) {
        while (getline(in_file, line)) {
            try {
                auto j_obj = nlohmann::json::parse(line);
                if (false
                    == unpack_and_serialize_msgpack_bytes(
                            nlohmann::json::to_msgpack(j_obj),
                            serializer
                    ))
                {
                    SPDLOG_ERROR("Failed to serialize msgpack bytes for line: {}", line);
                    cleanup();
                    return false;
                }
                auto bufferSize = serializer.get_ir_buf_view().size();
                if (bufferSize >= option.max_ir_buffer_size) {
                    total_size = total_size + bufferSize;
                    zc.write(
                            reinterpret_cast<char*>(
                                    const_cast<int8_t*>(serializer.get_ir_buf_view().data())
                            ),
                            bufferSize
                    );
                    zc.flush();
                    serializer.clear_ir_buf();
                }
            } catch (nlohmann::json::parse_error const& e) {
                SPDLOG_ERROR("JSON parsing error: {}", e.what());
                cleanup();
                return false;
            } catch (std::exception const& e) {
                SPDLOG_ERROR("Error during serialization: {}", e.what());
                cleanup();
                return false;
            }
        }
        total_size = total_size + serializer.get_ir_buf_view().size();
        zc.write(
                reinterpret_cast<char*>(const_cast<int8_t*>(serializer.get_ir_buf_view().data())),
                serializer.get_ir_buf_view().size()
        );
        std::vector<int8_t> ir_buf;
        ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof);
        zc.write(reinterpret_cast<char*>(ir_buf.data()), ir_buf.size());
        zc.flush();
        serializer.clear_ir_buf();
        cleanup();
    }

    return true;
}
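An RAII alternative to the cleanup lambda, sketched with a hypothetical `ScopeGuard` template: the callback runs in the destructor, so every return path releases resources without repeating the close calls. Two details worth checking before adopting either variant: the suggested lambda closes `out_file` before `zc`, whereas the original success path closes `zc` first, and the lambda also calls `zc.close()` after `zc.open()` has failed, which may not be valid for a compressor that never opened.

```cpp
#include <cassert>
#include <utility>

// Minimal scope guard: runs the callback when the guard goes out of scope.
// The destructor is implicitly noexcept, so a throwing callback would call
// std::terminate; the callback should not throw.
template <typename Callback>
class ScopeGuard {
public:
    explicit ScopeGuard(Callback callback) : m_callback{std::move(callback)} {}

    ScopeGuard(ScopeGuard const&) = delete;
    auto operator=(ScopeGuard const&) -> ScopeGuard& = delete;

    ~ScopeGuard() { m_callback(); }

private:
    Callback m_callback;
};

// Illustrative use: the cleanup runs on both the early-return and success paths.
auto convert(bool fail_early, int& cleanup_count) -> bool {
    ScopeGuard guard{[&cleanup_count]() { ++cleanup_count; }};
    if (fail_early) {
        return false;  // guard's destructor runs the cleanup here...
    }
    return true;  // ...and here.
}
```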

🧰 Tools

🪛 cppcheck (2.10-2)

[error] 191-191: Exception thrown in function declared not to throw exceptions.

(throwInNoexceptFunction)

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

components/core/src/clp_s/clp-s.cpp (1)
162-181: Enhance error message with more context.

The error message could be more specific by including the type of the msgpack object when it's not a MAP.
-            return false;
+            SPDLOG_ERROR("Expected msgpack object of type MAP, got type {}", msgpack_obj.type);
+            return false;

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82367fd and 5fc76ed.

📒 Files selected for processing (4)

components/core/src/clp_s/CommandLineArguments.cpp (4 hunks)
components/core/src/clp_s/CommandLineArguments.hpp (4 hunks)
components/core/src/clp_s/JsonParser.hpp (1 hunks)
components/core/src/clp_s/clp-s.cpp (6 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

components/core/src/clp_s/JsonParser.hpp
components/core/src/clp_s/CommandLineArguments.hpp

🧰 Additional context used

📓 Path-based instructions (2)

components/core/src/clp_s/clp-s.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/CommandLineArguments.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🪛 cppcheck (2.10-2)

components/core/src/clp_s/clp-s.cpp

[error] 191-191: Exception thrown in function declared not to throw exceptions.

(throwInNoexceptFunction)

🔇 Additional comments (7)

components/core/src/clp_s/clp-s.cpp (5)

4-4: LGTM! Required headers are properly included.

The new includes are appropriate for the JSON to IR conversion functionality.

Also applies to: 17-18

488-491: LGTM! Command handling is consistent with existing patterns.

The JsonToIr command is properly integrated into the main function's command handling logic.

226-226: ⚠️ Potential issue

Security: Avoid logging raw JSON data.

Logging raw JSON data could expose sensitive information. Consider logging only the error details.
-                    SPDLOG_ERROR("Failed to serialize msgpack bytes for line: {}", line);
+                    SPDLOG_ERROR("Failed to serialize msgpack bytes: {}", e.what());
Likely invalid or redundant comment.

183-274: 🛠️ Refactor suggestion

Refactor error handling to reduce code duplication.

The cleanup code (closing files and compressor) is duplicated across multiple catch blocks. Consider using RAII or a cleanup helper.
+    auto cleanup = [&]() {
+        in_file.close();
+        out_file.close();
+        zc.close();
+    };
+
     try {
         zc.open(out_file, option.compression_level);
     } catch (clp_s::ZstdCompressor::OperationFailed& error) {
         SPDLOG_ERROR("Failed to open ZSTDcompressor - {}", error.what());
-        in_file.close();
-        out_file.close();
+        cleanup();
         return false;
     }
Likely invalid or redundant comment.


306-310: Consider using 4-byte encoding exclusively.

As per the team's guidance, we should only generate 4-byte-encoded IR streams, even though 8-byte encoding is supported.

✅ Verification successful

Confirmed: 4-byte encoding should be the exclusive option

Remove the 8-byte encoding option and validation in CommandLineArguments.hpp/cpp

Update the default encoding type from 8 to 4 bytes

Simplify the serializer selection in clp-s.cpp to use only int32_t

No tests, configurations, or documentation currently reference 8-byte encoding, making this change safe to implement.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Check if there are any existing uses of 8-byte encoding
rg -l "encoding.*8" --type cpp
Length of output: 134

Script:
#!/bin/bash
# Check the content of the files and surrounding context
echo "=== CommandLineArguments.hpp ==="
rg -B 2 -A 2 "encoding.*8" components/core/src/clp_s/CommandLineArguments.hpp

echo -e "\n=== CommandLineArguments.cpp ==="
rg -B 2 -A 2 "encoding.*8" components/core/src/clp_s/CommandLineArguments.cpp

echo -e "\n=== Checking for actual usage of encoding=8 or similar patterns ==="
rg "encoding.*=.*8" --type cpp -B 2 -A 2
Length of output: 1130

Script:
#!/bin/bash
# Check for test files that might use encoding type
echo "=== Checking test files for encoding references ==="
rg -l "encoding.*[48]" --type cpp "*test*"

echo -e "\n=== Checking for configuration files ==="
fd -e json -e yaml -e yml -e conf | xargs rg -l "encoding.*[48]"

echo -e "\n=== Checking for documentation mentioning encoding ==="
fd -e md -e txt | xargs rg -l "encoding.*[48]"
Length of output: 542

Script:
#!/bin/bash
# Search test files correctly
echo "=== Checking test files for encoding references ==="
fd -t f "test" -e cpp -e hpp | xargs rg -l "encoding.*[48]"

echo -e "\n=== Checking for direct usage of encoding type ==="
rg -C 3 "option\.encoding.*[48]" --type cpp
Length of output: 825
components/core/src/clp_s/CommandLineArguments.cpp (2)

825-953: LGTM! Command-line options are well-defined with proper validation.

The implementation includes:

Clear error messages

Thorough validation of all parameters

Helpful usage examples

1061-1063: LGTM! Usage message is consistent with other commands.

The help text follows the same format as other command usage messages.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

components/core/src/clp_s/CommandLineArguments.cpp (1)
1068-1070: Enhance help message with more details.

The help message could be more informative by including details about the IR format and its purpose.

Apply this diff to improve the help message:
 void CommandLineArguments::print_json_to_ir_usage() const {
-    std::cerr << "Usage: " << m_program_name << " r [OPTIONS] IRS_DIR [FILE/DIR ...]\n";
+    std::cerr << "Usage: " << m_program_name << " r [OPTIONS] IRS_DIR [FILE/DIR ...]\n"
+              << "Convert JSON files to Intermediate Representation (IR) format.\n"
+              << "The IR format is optimized for efficient processing and querying.\n";
 }

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5fc76ed and 680f1e3.

📒 Files selected for processing (3)

components/core/src/clp_s/CommandLineArguments.cpp (4 hunks)
components/core/src/clp_s/JsonParser.hpp (1 hunks)
components/core/src/clp_s/clp-s.cpp (6 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

components/core/src/clp_s/JsonParser.hpp

👮 Files not reviewed due to content moderation or server errors (1)

components/core/src/clp_s/clp-s.cpp

🧰 Additional context used

📓 Path-based instructions (2)

components/core/src/clp_s/clp-s.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/CommandLineArguments.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🪛 cppcheck (2.10-2)

components/core/src/clp_s/clp-s.cpp

[error] 191-191: Exception thrown in function declared not to throw exceptions.

(throwInNoexceptFunction)
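cppcheck's `throwInNoexceptFunction` warning means an exception can escape a function declared `noexcept`. If that happens at run time, `std::terminate` is called. Line 191 of `clp-s.cpp` is not shown in this thread, so the following is only a generic sketch of one common fix: catch inside the function and report failure through the return value. `parse_or_throw` and `try_parse` are hypothetical names. The other option is to drop `noexcept` if callers are expected to handle the exception themselves.

```cpp
#include <cassert>
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical helper that reports bad input by throwing.
auto parse_or_throw(std::string const& input) -> int {
    if (input.empty()) {
        throw std::invalid_argument("empty input");
    }
    return static_cast<int>(input.size());
}

// Declared noexcept, so no exception may propagate out of it. Anything thrown by the
// callee is caught here and converted into a `false` return value.
[[nodiscard]] auto try_parse(std::string const& input, int& result) noexcept -> bool {
    try {
        result = parse_or_throw(input);
        return true;
    } catch (std::exception const& e) {
        std::cerr << "Parse failed: " << e.what() << '\n';
        return false;
    } catch (...) {
        std::cerr << "Parse failed: unknown exception\n";
        return false;
    }
}
```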

⏰ Context from checks skipped due to timeout of 90000ms (11)

GitHub Check: ubuntu-focal-static-linked-bins
GitHub Check: ubuntu-jammy-static-linked-bins
GitHub Check: centos-stream-9-static-linked-bins
GitHub Check: ubuntu-focal-dynamic-linked-bins
GitHub Check: ubuntu-jammy-dynamic-linked-bins
GitHub Check: centos-stream-9-dynamic-linked-bins
GitHub Check: build-macos (macos-14, false)
GitHub Check: build-macos (macos-13, false)
GitHub Check: lint-check (ubuntu-latest)
GitHub Check: build-macos (macos-13, true)
GitHub Check: lint-check (macos-latest)

coderabbitai · 2025-01-16T10:20:19Z

components/core/src/clp_s/CommandLineArguments.cpp

+                        default_value(m_encoding_type),
+                    "4 (four byte encoding) or 8 (eight byte encoding)"
+            )(
+                    "files-from,f",
+                    po::value<std::string>(&input_path_list_file_path)
+                            ->value_name("FILE")
+                            ->default_value(input_path_list_file_path),
+                    "Compress files specified in FILE"
+            );
+            // clang-format on
+
+            po::positional_options_description positional_options;
+            positional_options.add("irs-dir", 1);
+            positional_options.add("input-paths", -1);
+
+            po::options_description all_compression_options;
+            all_compression_options.add(compression_options);
+            all_compression_options.add(compression_positional_options);
+
+            std::vector<std::string> unrecognized_options
+                    = po::collect_unrecognized(parsed.options, po::include_positional);
+            unrecognized_options.erase(unrecognized_options.begin());
+            po::store(
+                    po::command_line_parser(unrecognized_options)
+                            .options(all_compression_options)
+                            .positional(positional_options)
+                            .run(),
+                    parsed_command_line_options
+            );
+            po::notify(parsed_command_line_options);
+
+            if (parsed_command_line_options.count("help")) {
+                print_json_to_ir_usage();
+
+                std::cerr << "Examples:\n";
+                std::cerr << "  # Parse file1.json and dir1 into irs-dir\n";
+                std::cerr << "  " << m_program_name << " r irs-dir file1.json dir1\n";
+
+                po::options_description visible_options;
+                visible_options.add(general_options);
+                visible_options.add(compression_options);
+                std::cerr << visible_options << '\n';
+                return ParsingResult::InfoCommand;
+            }
+
+            if (m_archives_dir.empty()) {
+                throw std::invalid_argument("No IRs directory specified.");
+            }
+
+            if (false == input_path_list_file_path.empty()) {
+                if (false == read_paths_from_file(input_path_list_file_path, input_paths)) {
+                    SPDLOG_ERROR("Failed to read paths from {}", input_path_list_file_path);
+                    return ParsingResult::Failure;
+                }
+            }
+
+            for (auto const& path : input_paths) {
+                if (false == get_input_files_for_raw_path(path, m_input_paths)) {
+                    throw std::invalid_argument(fmt::format("Invalid input path \"{}\".", path));
+                }
+            }
+
+            if (m_input_paths.empty()) {
+                throw std::invalid_argument("No input paths specified.");
+            }
+
+            if ((4 != m_encoding_type) && (8 != m_encoding_type)) {
+                SPDLOG_ERROR(
+                        "Invalid encoding type specified; --encoding-type {}",
+                        m_encoding_type
+                );
+                return ParsingResult::Failure;
+            }
+
+            if (0 >= m_max_ir_buffer_size) {
+                SPDLOG_ERROR(
+                        "Invalid max_ir_buffer_size specified; Buffer size must be greater than "
+                        "zero; --max-ir-buffer-size {}",
+                        m_max_ir_buffer_size
+                );
+                return ParsingResult::Failure;
+            }
+
+            if (0 >= m_max_document_size) {
+                SPDLOG_ERROR(
+                        "Invalid max_document_size specified; Document size must be greater than "
+                        "zero; --max-document-size {}",
+                        m_max_document_size
+                );
+                return ParsingResult::Failure;
+            }
+
+            if ((1 > m_compression_level) || (9 < m_compression_level)) {
+                SPDLOG_ERROR(
+                        "Invalid compression level specified; Compression level must be 1-9; "
+                        "--compression-level {}",
+                        m_compression_level
+                );
+                return ParsingResult::Failure;
+            }


🛠️ Refactor suggestion

Add input validation for command line arguments.

While the implementation includes basic validation, consider adding these improvements:

Validate that compression level is within bounds before using it

Add minimum size requirements for buffer and document sizes

Apply this diff to enhance validation:

             if ((4 != m_encoding_type) && (8 != m_encoding_type)) {
                 SPDLOG_ERROR(
                         "Invalid encoding type specified; --encoding-type {}",
                         m_encoding_type
                 );
                 return ParsingResult::Failure;
             }
+            constexpr size_t cMinBufferSize = 1024; // 1KB minimum
+            constexpr size_t cMinDocumentSize = 1024; // 1KB minimum
+
             if (0 >= m_max_ir_buffer_size) {
                 SPDLOG_ERROR(
                         "Invalid max_ir_buffer_size specified; Buffer size must be greater than "
                         "zero; --max-ir-buffer-size {}",
                         m_max_ir_buffer_size
                 );
                 return ParsingResult::Failure;
+            } else if (m_max_ir_buffer_size < cMinBufferSize) {
+                SPDLOG_ERROR(
+                        "Invalid max_ir_buffer_size specified; Buffer size must be at least {} bytes; "
+                        "--max-ir-buffer-size {}",
+                        cMinBufferSize,
+                        m_max_ir_buffer_size
+                );
+                return ParsingResult::Failure;
             }
             if (0 >= m_max_document_size) {
                 SPDLOG_ERROR(
                         "Invalid max_document_size specified; Document size must be greater than "
                         "zero; --max-document-size {}",
                         m_max_document_size
                 );
                 return ParsingResult::Failure;
+            } else if (m_max_document_size < cMinDocumentSize) {
+                SPDLOG_ERROR(
+                        "Invalid max_document_size specified; Document size must be at least {} bytes; "
+                        "--max-document-size {}",
+                        cMinDocumentSize,
+                        m_max_document_size
+                );
+                return ParsingResult::Failure;
             }
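Because `m_max_ir_buffer_size` and `m_max_document_size` are bound through `po::value<size_t>`, the existing `0 >= value` checks can only fire when the value is exactly zero. A single lower-bound comparison therefore covers both the zero case and the proposed minimum. The sketch below shows that consolidation under stated assumptions: `validate_min_size` is a hypothetical helper, `std::cerr` stands in for `SPDLOG_ERROR`, and the 1 KiB minimums are taken from the suggestion above rather than from any documented requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string_view>

// Minimums copied from the review suggestion (1 KiB each); illustrative only.
constexpr std::size_t cMinBufferSize = 1024;
constexpr std::size_t cMinDocumentSize = 1024;

/**
 * Checks that a size option is at least `min_size` bytes. Since `value` is unsigned and
 * `min_size` is non-zero, this also rejects a value of zero.
 * @return Whether the value is valid.
 */
[[nodiscard]] auto validate_min_size(
        std::string_view option_name,
        std::size_t value,
        std::size_t min_size
) -> bool {
    if (value < min_size) {
        std::cerr << "Invalid " << option_name << " specified; must be at least " << min_size
                  << " bytes; --" << option_name << " " << value << '\n';
        return false;
    }
    return true;
}
```

With this shape, each option needs one call, e.g. `validate_min_size("max-ir-buffer-size", m_max_ir_buffer_size, cMinBufferSize)`, instead of an `if`/`else if` pair.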


LinZhihao-723 · 2025-01-16T18:36:41Z

components/core/src/clp_s/clp-s.cpp

+    for (auto& path : all_file_paths) {
+        bool success;
+        if (option.encoding == 4) {
+            success = run_serializer<int32_t>(option, path);


Confirmed with Kirk: we should only generate 4-byte encoding IR stream.

LinZhihao-723 · 2025-01-16T18:38:05Z

components/core/src/clp_s/clp-s.cpp

+    for (auto& path : all_file_paths) {
+        bool success;
+        if (option.encoding == 4) {
+            success = run_serializer<int32_t>(option, path);


Instead of using int32_t directly, we should use clp::ir::four_byte_encoded_variable_t defined here
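Taken together with the previous comment (only 4-byte encoding IR should be generated), the serializer can be instantiated with one named type instead of branching on `option.encoding` and spelling out `int32_t`. A minimal sketch follows. The `sketch::four_byte_encoded_variable_t` alias is a local stand-in for `clp::ir::four_byte_encoded_variable_t` (assumed here to be a 32-bit signed integer), and `run_serializer`/`serialize_all` are hypothetical simplifications that omit the parser options.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

namespace sketch {
// Local stand-in for clp::ir::four_byte_encoded_variable_t.
using four_byte_encoded_variable_t = std::int32_t;
}  // namespace sketch

// Hypothetical per-file serializer; the real one would parse `path` and emit IR.
template <typename encoded_variable_t>
[[nodiscard]] auto run_serializer(std::string const& path) -> bool {
    // Reject any accidental instantiation with a different encoding at compile time.
    static_assert(
            std::is_same_v<encoded_variable_t, sketch::four_byte_encoded_variable_t>,
            "Only four-byte encoded IR streams are generated"
    );
    // Placeholder body: succeeds for any non-empty path.
    return false == path.empty();
}

// No runtime branch on an encoding option: every file uses the four-byte encoding.
[[nodiscard]] auto serialize_all(std::vector<std::string> const& paths) -> bool {
    for (auto const& path : paths) {
        if (false == run_serializer<sketch::four_byte_encoded_variable_t>(path)) {
            return false;
        }
    }
    return true;
}
```

If only one encoding is ever generated, the `--encoding-type` option and its validation would presumably become unnecessary as well.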

LinZhihao-723 · 2025-01-16T18:40:33Z

components/core/src/clp_s/clp-s.cpp

+) -> bool;
+
+/**
+ * Given user specified options and a file path to a JSON file calls the serailizer one each JSON


Suggested change

- * Given user specified options and a file path to a JSON file calls the serailizer one each JSON
+ * Given user specified options and a file path to a JSON file calls the serializer one each JSON

LinZhihao-723 · 2025-01-16T18:43:39Z

components/core/src/clp_s/clp-s.cpp

+template <typename T>
+auto run_serializer(clp_s::JsonToIrParserOption const& option, std::string path);


Suggested change

-template <typename T>
-auto run_serializer(clp_s::JsonToIrParserOption const& option, std::string path);
+template <typename encoded_variable_t>
+[[nodiscard]] auto run_serializer(clp_s::JsonToIrParserOption const& option, std::string path) -> bool;

According to our guideline:

We should give a meaningful template parameter name instead of generic ones like T.

We should add [[nodiscard]] to any functions whose return value needs to be checked.

We should explicitly annotate the return type if it's deterministic.
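A declaration that follows all three guidelines, plus the `@tparam` documentation requested further down in this review, might look like the sketch below. To keep it self-contained, the `clp_s::JsonToIrParserOption` parameter is omitted and the body is a placeholder. Passing the path as `std::string const&` instead of by value is an additional assumption, not part of the suggestion.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

/**
 * Serializes each JSON record in the file at `path` into IR.
 * @tparam encoded_variable_t The integer type used to encode variables in the IR stream.
 * @param path Path to the input JSON file.
 * @return Whether serialization succeeded.
 */
template <typename encoded_variable_t>
[[nodiscard]] auto run_serializer(std::string const& path) -> bool;

template <typename encoded_variable_t>
auto run_serializer(std::string const& path) -> bool {
    // Placeholder body: succeeds for any non-empty path.
    return false == path.empty();
}
```

`[[nodiscard]]` only needs to appear on the first declaration; the compiler then warns whenever a caller ignores the returned status.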

LinZhihao-723 · 2025-01-16T18:44:29Z

components/core/src/clp_s/clp-s.cpp

+        return false;
+    }
+
+    std::string line = "";


Suggested change

-    std::string line = "";
+    std::string line;

We should rely on the default constructor when one is provided.

LinZhihao-723 · 2025-01-16T19:01:01Z

components/core/src/clp_s/clp-s.cpp

+    }
+
+    std::string line = "";
+    size_t total_size = 0;


Do we still need this variable?

LinZhihao-723 · 2025-01-16T19:02:29Z

components/core/src/clp_s/clp-s.cpp

+        }
+        total_size = total_size + serializer.get_ir_buf_view().size();
+        zc.write(
+                reinterpret_cast<char*>(const_cast<int8_t*>(serializer.get_ir_buf_view().data())),


LinZhihao-723 · 2025-01-16T19:04:58Z

components/core/src/clp_s/clp-s.cpp

+        std::vector<int8_t> ir_buf;
+        ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof);
+        zc.write(reinterpret_cast<char*>(ir_buf.data()), ir_buf.size());


Suggested change

-        std::vector<int8_t> ir_buf;
-        ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof);
-        zc.write(reinterpret_cast<char*>(ir_buf.data()), ir_buf.size());
+        constexpr std::array<int8_t, 1> cEndOfStreamBuf{clp::ffi::ir_stream::cProtocol::Eof};
+        zc.write(
+                clp::size_checked_pointer_cast<char const>(cEndOfStreamBuf.data()),
+                cEndOfStreamBuf.size()
+        );

We can make it a compile-time constant.
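A minimal standalone sketch of the compile-time end-of-stream buffer. Assumptions: `cEofTag` stands in for `clp::ffi::ir_stream::cProtocol::Eof` (its value here, 0x00, is illustrative only), a `std::ostream` stands in for the compressor `zc`, and `reinterpret_cast` replaces `clp::size_checked_pointer_cast`, which is not available outside the CLP code base.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>

// Stand-in for clp::ffi::ir_stream::cProtocol::Eof; the value is an assumption.
constexpr std::int8_t cEofTag = 0x00;

// Built at compile time, so no std::vector is allocated each time a stream is closed.
constexpr std::array<std::int8_t, 1> cEndOfStreamBuf{cEofTag};

// `out` plays the role of the compressor `zc` in the reviewed code.
void write_end_of_stream(std::ostream& out) {
    out.write(
            reinterpret_cast<char const*>(cEndOfStreamBuf.data()),
            static_cast<std::streamsize>(cEndOfStreamBuf.size())
    );
}
```

Casting `std::int8_t const*` to `char const*` is permitted because `char` may alias any object type, and the `const`-qualified target also avoids the `const_cast` seen elsewhere in the reviewed code.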

LinZhihao-723 · 2025-01-16T19:06:44Z

components/core/src/clp_s/clp-s.cpp

+
+/**
+ * Given user specified options and a file path to a JSON file calls the serailizer one each JSON
+ * entry to serialize into IR


Normally we should use @tparam to document template parameters.

LinZhihao-723 · 2025-01-16T19:09:18Z

components/core/src/clp_s/clp-s.cpp

+    std::string line = "";
+    size_t total_size = 0;
+
+    if (in_file.is_open()) {


It might be better to first check if in_file.is_open, and do the early exit if it's not.
This makes the code more readable since the major serialization logic has one fewer indentation level.
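A minimal sketch of the early-exit structure, which also folds in two earlier comments (default-constructing `line` and dropping the unused `total_size` counter). `process_json_lines` is hypothetical: it only counts non-empty lines where the real code would parse each JSON record and hand it to the serializer.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical stand-in for the per-file loop: counts non-empty lines in `path`.
[[nodiscard]] auto process_json_lines(std::string const& path, std::size_t& num_records)
        -> bool {
    std::ifstream in_file{path};
    // Early exit: the main logic below stays one indentation level shallower.
    if (false == in_file.is_open()) {
        std::cerr << "Failed to open " << path << '\n';
        return false;
    }

    std::string line;  // Default-constructed; `= ""` is unnecessary.
    num_records = 0;
    while (std::getline(in_file, line)) {
        if (line.empty()) {
            continue;
        }
        // The real code would parse `line` and pass it to the serializer here.
        ++num_records;
    }
    return true;
}
```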

AVMatthews added 2 commits January 8, 2025 15:15

Json to IR functionality moved over to branch

9a9553b

Remove unneeded option

38229b5

AVMatthews changed the title ~~Feat(clp s): json to irv2~~ feat(clp-s): json to irv2 Jan 9, 2025

coderabbitai bot reviewed Jan 9, 2025

components/core/src/clp_s/clp-s.cpp (outdated)

components/core/src/clp_s/CommandLineArguments.cpp

components/core/src/clp_s/CommandLineArguments.cpp

LinZhihao-723 requested changes Jan 9, 2025

remove extra buffer copy and add a few input validation checks

82367fd

coderabbitai bot reviewed Jan 15, 2025

Merge branch 'main' into feat(clp-s)-JSON-to-IRv2

5fc76ed

coderabbitai bot reviewed Jan 15, 2025

Fix build issues

680f1e3

coderabbitai bot reviewed Jan 16, 2025

LinZhihao-723 requested changes Jan 16, 2025


AVMatthews commented Jan 9, 2025 (edited by coderabbitai bot)

coderabbitai bot commented Jan 9, 2025 (edited)


+        } else if ((char)Command::JsonToIr == command_input) {
+            po::options_description compression_positional_options;
+            std::vector<std::string> input_paths;
+            // clang-format off
+             compression_positional_options.add_options()(
+                     "irs-dir",
+                     po::value<std::string>(&m_archives_dir)->value_name("DIR"),
+                     "output directory"
+             )(
+                     "input-paths",
+                     po::value<std::vector<std::string>>(&input_paths)->value_name("PATHS"),
+                     "input paths"
+             );
+            // clang-format on
+            po::options_description compression_options("Compression options");
+            std::string input_path_list_file_path;
+            // clang-format off
+            compression_options.add_options()(
+                    "compression-level",
+                    po::value<int>(&m_compression_level)->value_name("LEVEL")->
+                        default_value(m_compression_level),
+                    "1 (fast/low compression) to 9 (slow/high compression)."
+            )(
+                    "max-document-size",
+                    po::value<size_t>(&m_max_document_size)->value_name("DOC_SIZE")->
+                        default_value(m_max_document_size),
+                    "Maximum allowed size (B) for a single document before IR generation fails."
+            )(
+                    "max-ir-buffer-size",
+                    po::value<size_t>(&m_max_ir_buffer_size)->value_name("BUFFER_SIZE")->
+                        default_value(m_max_ir_buffer_size),
+                    "Maximum allowed size (B) for an in-memory IR buffer before being written to file."
+            )(
+                    "encoding-type",
+                    po::value<int>(&m_encoding_type)->value_name("ENCODING_TYPE")->
+                        default_value(m_encoding_type),
+                    "4 (four byte encoding) or 8 (eight byte encoding)"
+            )(
+                    "files-from,f",
+                    po::value<std::string>(&input_path_list_file_path)
+                            ->value_name("FILE")
+                            ->default_value(input_path_list_file_path),
+                    "Compress files specified in FILE"
+            );
+            // clang-format on
+            po::positional_options_description positional_options;
+            positional_options.add("irs-dir", 1);
+            positional_options.add("input-paths", -1);
+            po::options_description all_compression_options;
+            all_compression_options.add(compression_options);
+            all_compression_options.add(compression_positional_options);
+            std::vector<std::string> unrecognized_options
+                    = po::collect_unrecognized(parsed.options, po::include_positional);
+            unrecognized_options.erase(unrecognized_options.begin());
+            po::store(
+                    po::command_line_parser(unrecognized_options)
+                            .options(all_compression_options)
+                            .positional(positional_options)
+                            .run(),
+                    parsed_command_line_options
+            );
+            po::notify(parsed_command_line_options);
+            if (parsed_command_line_options.count("help")) {
+                print_json_to_ir_usage();
+                std::cerr << "Examples:\n";
+                std::cerr << "  # Parse file1.json and dir1 into irs-dir\n";
+                std::cerr << "  " << m_program_name << " r irs-dir file1.json dir1\n";
+                po::options_description visible_options;
+                visible_options.add(general_options);
+                visible_options.add(compression_options);
+                std::cerr << visible_options << '\n';
+                return ParsingResult::InfoCommand;
+            }
+            if (m_archives_dir.empty()) {
+                throw std::invalid_argument("No IRs directory specified.");
+            }
+            if (false == input_path_list_file_path.empty()) {
+                if (false == read_paths_from_file(input_path_list_file_path, input_paths)) {
+                    SPDLOG_ERROR("Failed to read paths from {}", input_path_list_file_path);
+                    return ParsingResult::Failure;
+                }
+            }
+            for (auto const& path : input_paths) {
+                if (false == get_input_files_for_raw_path(path, m_input_paths)) {
+                    throw std::invalid_argument(fmt::format("Invalid input path \"{}\".", path));
+                }
+            }
+            if (m_input_paths.empty()) {
+                throw std::invalid_argument("No input paths specified.");
+            }
+            if ((4 != m_encoding_type) && (8 != m_encoding_type)) {
+                SPDLOG_ERROR(
+                        "Invalid encoding type specified; --encoding-type {}",
+                        m_encoding_type
+                );
+                return ParsingResult::Failure;
+            }
+            constexpr size_t cMinBufferSize = 1024;  // 1KB minimum
+            constexpr size_t cMinDocumentSize = 1024;  // 1KB minimum
+            if (0 == m_max_ir_buffer_size) {
+                SPDLOG_ERROR(
+                        "Invalid max_ir_buffer_size specified; Buffer size must be greater than "
+                        "zero; --max-ir-buffer-size {}",
+                        m_max_ir_buffer_size
+                );
+                return ParsingResult::Failure;
+            } else if (m_max_ir_buffer_size < cMinBufferSize) {
+                SPDLOG_ERROR(
+                        "Invalid max_ir_buffer_size specified; Buffer size must be at least {} bytes; "
+                        "--max-ir-buffer-size {}",
+                        cMinBufferSize,
+                        m_max_ir_buffer_size
+                );
+                return ParsingResult::Failure;
+            }
+            if (0 == m_max_document_size) {
+                SPDLOG_ERROR(
+                        "Invalid max_document_size specified; Document size must be greater than "
+                        "zero; --max-document-size {}",
+                        m_max_document_size
+                );
+                return ParsingResult::Failure;
+            } else if (m_max_document_size < cMinDocumentSize) {
+                SPDLOG_ERROR(
+                        "Invalid max_document_size specified; Document size must be at least {} bytes; "
+                        "--max-document-size {}",
+                        cMinDocumentSize,
+                        m_max_document_size
+                );
+                return ParsingResult::Failure;
+            }
+            if ((1 > m_compression_level) || (9 < m_compression_level)) {
+                SPDLOG_ERROR(
+                        "Invalid compression level specified; Compression level must be 1-9; "
+                        "--compression-level {}",
+                        m_compression_level
+                );
+                return ParsingResult::Failure;
+            }
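The option checks above can be summarized as a single pure function, which makes the accepted ranges easy to unit-test. This is a minimal sketch, not the PR's code: `JsonToIrOptions` and `validate` are hypothetical names, and folding each zero check into the 1 KB minimum relies on the sizes being unsigned `size_t` values.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical mirror of the JSON-to-IR options validated above (not the PR's API).
struct JsonToIrOptions {
    int compression_level;
    std::size_t max_document_size;
    std::size_t max_ir_buffer_size;
    int encoding_type;
};

constexpr std::size_t cMinBufferSize = 1024;  // 1KB minimum
constexpr std::size_t cMinDocumentSize = 1024;  // 1KB minimum

// Returns a message for the first invalid option (same order as the checks above),
// or std::nullopt when every option is within range.
std::optional<std::string> validate(JsonToIrOptions const& opts) {
    if (4 != opts.encoding_type && 8 != opts.encoding_type) {
        return "encoding type must be 4 or 8";
    }
    // size_t is unsigned and the minimum is positive, so this also rejects zero.
    if (opts.max_ir_buffer_size < cMinBufferSize) {
        return "max IR buffer size must be at least 1024 bytes";
    }
    if (opts.max_document_size < cMinDocumentSize) {
        return "max document size must be at least 1024 bytes";
    }
    if (opts.compression_level < 1 || opts.compression_level > 9) {
        return "compression level must be 1-9";
    }
    return std::nullopt;
}
```

Collapsing each size's two checks into one `<` comparison works because both minimums are positive; the trade-off is losing the separate "must be greater than zero" message.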

	* Given user-specified options and a file path to a JSON file, calls the serializer on each JSON

		template <typename T>
		auto run_serializer(clp_s::JsonToIrParserOption const& option, std::string path);
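`run_serializer` is templated on `T`, while `--encoding-type` is a runtime integer, so the caller has to map 4 or 8 onto a concrete instantiation. The sketch below shows one way to do that, assuming `T` is the encoded-variable integer type (32-bit for four-byte encoding, 64-bit for eight-byte). The type aliases, the single-argument `run_serializer` stub, and `dispatch_serializer` are hypothetical; the real declaration also takes a `clp_s::JsonToIrParserOption`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the four- and eight-byte encoded variable types.
using four_byte_encoded_variable_t = std::int32_t;
using eight_byte_encoded_variable_t = std::int64_t;

// Placeholder for run_serializer<T>; it only reports the width of T in bytes.
template <typename T>
auto run_serializer(std::string const& /*path*/) -> std::size_t {
    return sizeof(T);
}

// Maps the runtime --encoding-type value onto a compile-time instantiation.
// Assumes argument parsing has already rejected values other than 4 and 8.
auto dispatch_serializer(int encoding_type, std::string const& path) -> std::size_t {
    if (4 == encoding_type) {
        return run_serializer<four_byte_encoded_variable_t>(path);
    }
    return run_serializer<eight_byte_encoded_variable_t>(path);
}
```

Falling through to the eight-byte instantiation is only safe because the option validation in the diff already fails on any other encoding type.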

feat(clp-s): json to irv2 #657

AVMatthews commented Jan 9, 2025 • edited by coderabbitai bot

Description

Validation performed

Summary by CodeRabbit

Release Notes

coderabbitai bot commented Jan 9, 2025 • edited

Walkthrough

Changes

Sequence Diagram

Possibly Related PRs

coderabbitai bot left a comment

LinZhihao-723 left a comment

coderabbitai bot left a comment

coderabbitai bot Jan 15, 2025

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot Jan 16, 2025