feat(clp-s): json to irv2 #657

Open
wants to merge 5 commits into
base: main
from

Conversation

AVMatthews
Contributor

@AVMatthews AVMatthews commented Jan 9, 2025

Description

This PR:

  • Exposes the JSON to IRV2 parsing to the user through the command line
  • Enables users to write the IRV2 format to a file.

Validation performed

Generated the IR v2 format for all 5 public JSON datasets.
Example: ./clp-s r elasticsearch_ir elasticsearch/

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a new command-line option for converting JSON files to Intermediate Representation (IR) format.
    • Introduced advanced configuration options for JSON to IR conversion, including:
      • Output directory specification
      • Compression settings
      • Encoding type configuration
    • New help message to guide users on the JSON to IR command usage.
  • Improvements

    • Enhanced command-line interface with additional parsing capabilities.
    • Improved error handling for new JSON conversion functionality.

Contributor

coderabbitai bot commented Jan 9, 2025

Walkthrough

The pull request introduces a new command-line option JsonToIr in the command-line parsing system, enabling the conversion of JSON files into an Intermediate Representation (IR) format. This implementation includes new command-line options for specifying input and output paths, configuring compression settings, and handling encoding types. Changes are made across multiple files in the components/core/src/clp_s/ directory, adding new methods, enumerations, and functions to support the JSON to IR conversion process.

Changes

Change summary by file:

CommandLineArguments.cpp
  • Added new command handling for JsonToIr
  • Implemented new command-line options parsing
  • Added help message function for new command
  • Updated error handling for new command parameters
CommandLineArguments.hpp
  • Added JsonToIr to Command enum
  • Introduced new getter methods for IR buffer size and encoding type
  • Added private member variables for encoding and buffer configuration
JsonParser.hpp
  • Created new JsonToIrParserOption structure with parsing configuration options
clp-s.cpp
  • Added template functions for serialization and buffer management
  • Implemented generate_ir function for JSON to IR conversion
  • Modified main function to support new JsonToIr command

Sequence Diagram

sequenceDiagram
    participant CLI as Command Line Interface
    participant Parser as CommandLineArguments
    participant Generator as JSON to IR Generator
    participant Serializer as Serializer

    CLI->>Parser: Parse JsonToIr command
    Parser->>Generator: Validate and prepare options
    Generator->>Serializer: Initialize serialization
    Serializer->>Generator: Process JSON files
    Generator-->>CLI: Return conversion status

@AVMatthews AVMatthews changed the title Feat(clp s): json to irv2 feat(clp-s): json to irv2 Jan 9, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (4)
components/core/src/clp_s/CommandLineArguments.cpp (2)

758-764: Fix typographical errors in option descriptions.

There are minor typos in the option descriptions:

  • Line 758: "before ir generation fails" should be "before IR generation fails".
  • Line 764: "befroe" should be "before".

Apply this diff to correct the typos:

-                        "Maximum allowed size (B) for a single document before ir generation fails."
+                        "Maximum allowed size (B) for a single document before IR generation fails."
...
-                        "Maximum allowed size (B) for an in memory IR buffer befroe being written to file."
+                        "Maximum allowed size (B) for an in-memory IR buffer before being written to file."

747-747: Consider renaming "Compression options" to "JSON to IR options".

For clarity, rename the header of the options from "Compression options" to "JSON to IR options", as these options are specific to the JsonToIr command.

Apply this diff to rename the options group:

-                po::options_description compression_options("Compression options");
+                po::options_description compression_options("JSON to IR options");
components/core/src/clp_s/CommandLineArguments.hpp (2)

29-30: Consider documenting the command character choice

The character 'r' for JsonToIr might not be immediately intuitive to users. Consider adding a comment explaining the rationale for this choice, or consider a more descriptive character if available.


202-203: Use named constants for magic numbers

Consider replacing the magic numbers with named constants to improve code readability and maintainability:

  • The encoding type value of 8 should be a named constant with documentation explaining its significance
  • The buffer size of 512MB could use a named constant similar to other size constants in the file
+    // Default encoding type for IR conversion
+    static constexpr int DEFAULT_ENCODING_TYPE = 8;
+    // Default maximum buffer size for IR conversion (512MB)
+    static constexpr size_t DEFAULT_MAX_IR_BUFFER_SIZE = 512ULL * 1024 * 1024;
-    int m_encoding_type{8};
-    size_t m_max_ir_buffer_size{512ULL * 1024 * 1024};
+    int m_encoding_type{DEFAULT_ENCODING_TYPE};
+    size_t m_max_ir_buffer_size{DEFAULT_MAX_IR_BUFFER_SIZE};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d3b671 and 38229b5.

📒 Files selected for processing (4)
  • components/core/src/clp_s/CommandLineArguments.cpp (4 hunks)
  • components/core/src/clp_s/CommandLineArguments.hpp (4 hunks)
  • components/core/src/clp_s/JsonParser.hpp (1 hunks)
  • components/core/src/clp_s/clp-s.cpp (6 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
components/core/src/clp_s/JsonParser.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/clp-s.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/CommandLineArguments.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/CommandLineArguments.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🪛 cppcheck (2.10-2)
components/core/src/clp_s/clp-s.cpp

[error] 191-191: Exception thrown in function declared not to throw exceptions.

(throwInNoexceptFunction)

🔇 Additional comments (4)
components/core/src/clp_s/clp-s.cpp (1)

191-191: Verify exception safety in functions to prevent throwing exceptions from noexcept functions.

Static analysis has flagged that an exception may be thrown in a function that should not throw exceptions. Please verify that serialize_msgpack_map does not throw exceptions, or ensure it is not declared with noexcept.

Run the following script to check if serialize_msgpack_map is declared noexcept:

components/core/src/clp_s/JsonParser.hpp (1)

55-62: Struct JsonToIrParserOption added successfully.

The new structure JsonToIrParserOption is well-defined and follows appropriate coding standards.

components/core/src/clp_s/CommandLineArguments.hpp (2)

69-72: LGTM! Modern C++ practices well applied

The getter methods follow modern C++ practices with the [[nodiscard]] attribute and trailing return types. The const qualification documents that they do not modify the object, though it does not by itself guarantee thread safety.


178-179: LGTM! Consistent with existing usage methods

The method declaration follows the established pattern of other usage printing methods in the class.

Member

@LinZhihao-723 LinZhihao-723 left a comment

Left some comments about using IR v2 APIs

for (auto& path : all_file_paths) {
    bool success;
    if (option.encoding == 4) {
        success = run_serializer<int32_t>(option, path);
Member

Technically, we should only generate 4-byte encoded IR streams (even though we do support 8-byte encoding). Correct me if I'm wrong @kirkrodrigues

Member

Confirmed with Kirk: we should only generate 4-byte encoded IR streams.
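
One way to reflect this decision in code is to reject any encoding width other than 4 bytes before dispatching to the serializer. The following is only a minimal sketch: `four_byte_encoded_variable_t` and `is_supported_encoding_width` are hypothetical names introduced for illustration, not identifiers from this PR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical alias for the 4-byte encoded variable type used for generated IR streams.
using four_byte_encoded_variable_t = int32_t;

// Returns true only for the encoding width that should be generated (4 bytes).
// The 8-byte width is deliberately rejected, even though it could be serialized.
constexpr auto is_supported_encoding_width(int width_in_bytes) -> bool {
    return sizeof(four_byte_encoded_variable_t) == static_cast<std::size_t>(width_in_bytes);
}

// Compile-time checks documenting the intended behaviour.
static_assert(is_supported_encoding_width(4));
static_assert(false == is_supported_encoding_width(8));
```

With a check like this in the argument validation, the call site could instantiate the serializer with the 4-byte type only.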

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
components/core/src/clp_s/clp-s.cpp (1)

181-272: Consider breaking down the function for better maintainability.

The function is quite long (92 lines) and handles multiple responsibilities. Consider breaking it down into smaller functions:

  1. File handling setup
  2. JSON processing
  3. IR buffer management
components/core/src/clp_s/CommandLineArguments.cpp (2)

764-764: Fix typo in option description.

The option description contains a typo: "befroe" should be "before".

-                    "Maximum allowed size (B) for an in memory IR buffer befroe being written to file."
+                    "Maximum allowed size (B) for an in memory IR buffer before being written to file."

828-861: Enhance error messages with valid value ranges.

The error messages could be more helpful by including the valid value ranges.

-                SPDLOG_ERROR(
-                        "Invalid encoding type specified; --encoding-type {}",
-                        m_encoding_type
-                );
+                SPDLOG_ERROR(
+                        "Invalid encoding type specified (must be 4 or 8); --encoding-type {}",
+                        m_encoding_type
+                );

-                SPDLOG_ERROR(
-                        "Invalid compression level specified; Compression level must be 1-9; "
-                        "--compression-level {}",
-                        m_compression_level
-                );
+                SPDLOG_ERROR(
+                        "Invalid compression level specified (must be between 1 and 9, where 1 is "
+                        "fastest/lowest compression and 9 is slowest/highest compression); "
+                        "--compression-level {}",
+                        m_compression_level
+                );
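
A range check of this kind can be kept in one small function so that the message and the bounds cannot drift apart. This is a sketch under assumptions: the bounds 1 and 9 are taken from the message text above, and `validate_compression_level` and the two constants are hypothetical names, not part of the PR.

```cpp
#include <optional>
#include <string>

// Hypothetical bounds, taken from the error message text above (levels 1 through 9).
constexpr int cMinCompressionLevel{1};
constexpr int cMaxCompressionLevel{9};

// Returns an error message naming the valid range, or std::nullopt if the level is valid.
auto validate_compression_level(int level) -> std::optional<std::string> {
    if (level < cMinCompressionLevel || level > cMaxCompressionLevel) {
        return "Invalid compression level " + std::to_string(level) + " (must be between "
               + std::to_string(cMinCompressionLevel) + " and "
               + std::to_string(cMaxCompressionLevel) + ")";
    }
    return std::nullopt;
}
```

The caller can then log the returned message and reject the arguments when a value is present.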
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 38229b5 and 82367fd.

📒 Files selected for processing (2)
  • components/core/src/clp_s/CommandLineArguments.cpp (4 hunks)
  • components/core/src/clp_s/clp-s.cpp (6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
components/core/src/clp_s/CommandLineArguments.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/clp-s.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🪛 cppcheck (2.10-2)
components/core/src/clp_s/clp-s.cpp

[error] 191-191: Exception thrown in function declared not to throw exceptions.

(throwInNoexceptFunction)

🔇 Additional comments (6)
components/core/src/clp_s/clp-s.cpp (4)

57-61: LGTM! Well-structured implementation with proper error handling.

The function follows best practices with proper error handling, type safety checks, and coding guidelines.

Also applies to: 160-179


485-488: LGTM! Clean integration of the new command.

The implementation follows the established pattern for command handling and error management.


234-236: ⚠️ Potential issue

Use safer alternatives to reinterpret_cast.

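Combining reinterpret_cast with const_cast discards the constness of the serializer's buffer view, which is unsafe if the callee ever writes through the pointer. Casting to a char pointer does not itself raise alignment concerns, so consider keeping the pointer const (for example, reinterpret_cast<char const*>) if the write API accepts it.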

Also applies to: 258-259
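
For a byte buffer, the const_cast can be avoided entirely when the sink accepts a const pointer. The sketch below assumes exactly that: `write_bytes` is a hypothetical sink, and whether `ZstdCompressor::write` accepts a `char const*` is not confirmed anywhere in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sink that takes a const pointer, so callers never need const_cast.
auto write_bytes(char const* data, std::size_t length, std::string& sink) -> void {
    sink.append(data, length);
}

// Views an int8_t buffer as chars without dropping const. Reading any object through a char
// pointer is permitted by the aliasing rules, and char has an alignment requirement of 1, so
// only constness needs care here.
auto write_ir_buffer(std::vector<int8_t> const& ir_buffer, std::string& sink) -> void {
    if (ir_buffer.empty()) {
        return;
    }
    write_bytes(reinterpret_cast<char const*>(ir_buffer.data()), ir_buffer.size(), sink);
}
```

If the write API only accepts a non-const pointer, changing its signature is likely a cleaner fix than casting at every call site.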


304-308: Use 4-byte encoding by default.

Based on the comment from @LinZhihao-723, we should only generate 4-byte encoded IR stream despite supporting 8-byte encoding.

Consider defaulting to 4-byte encoding and adding a comment explaining why:

-        if (option.encoding == 4) {
-            success = run_serializer<int32_t>(option, path);
-        } else {
-            success = run_serializer<int64_t>(option, path);
-        }
+        // We only generate 4-byte encoded IR stream for better compatibility
+        success = run_serializer<int32_t>(option, path);
components/core/src/clp_s/CommandLineArguments.cpp (2)

109-115: LGTM! Clear and consistent help message.

The help message follows the established format and provides clear information about the new command.


969-971: LGTM! Consistent usage message format.

The usage message follows the established format and provides clear information about the command syntax.

components/core/src/clp_s/clp-s.cpp
Comment on lines +181 to +272
template <typename T>
auto run_serializer(clp_s::JsonToIrParserOption const& option, std::string path) {
    auto result{Serializer<T>::create()};
    if (result.has_error()) {
        SPDLOG_ERROR("Failed to create Serializer");
        return false;
    }
    auto& serializer{result.value()};
    std::ifstream in_file;
    in_file.open(path, std::ifstream::in);
    if (false == in_file.is_open()) {
        SPDLOG_ERROR("Failed to open input file: {}", path);
        return false;
    }
    std::filesystem::path input_path{path};
    std::string filename = input_path.filename().string();
    std::string out_path = option.irs_dir + "/" + filename + ".ir";

    clp_s::FileWriter out_file;
    out_file.open(out_path, clp_s::FileWriter::OpenMode::CreateForWriting);
    clp_s::ZstdCompressor zc;
    try {
        zc.open(out_file, option.compression_level);
    } catch (clp_s::ZstdCompressor::OperationFailed& error) {
        SPDLOG_ERROR("Failed to open ZSTDcompressor - {}", error.what());
        in_file.close();
        out_file.close();
        return false;
    }

    std::string line = "";
    size_t total_size = 0;

    if (in_file.is_open()) {
        while (getline(in_file, line)) {
            try {
                auto j_obj = nlohmann::json::parse(line);
                if (false
                    == unpack_and_serialize_msgpack_bytes(
                            nlohmann::json::to_msgpack(j_obj),
                            serializer
                    ))
                {
                    SPDLOG_ERROR("Failed to serialize msgpack bytes for line: {}", line);
                    in_file.close();
                    out_file.close();
                    zc.close();
                    return false;
                }
                auto bufferSize = serializer.get_ir_buf_view().size();
                if (bufferSize >= option.max_ir_buffer_size) {
                    total_size = total_size + bufferSize;
                    zc.write(
                            reinterpret_cast<char*>(
                                    const_cast<int8_t*>(serializer.get_ir_buf_view().data())
                            ),
                            bufferSize
                    );
                    zc.flush();
                    serializer.clear_ir_buf();
                }
            } catch (nlohmann::json::parse_error const& e) {
                SPDLOG_ERROR("JSON parsing error: {}", e.what());
                in_file.close();
                out_file.close();
                zc.close();
                return false;
            } catch (std::exception const& e) {
                SPDLOG_ERROR("Error during serialization: {}", e.what());
                in_file.close();
                out_file.close();
                zc.close();
                return false;
            }
        }
        total_size = total_size + serializer.get_ir_buf_view().size();
        zc.write(
                reinterpret_cast<char*>(const_cast<int8_t*>(serializer.get_ir_buf_view().data())),
                serializer.get_ir_buf_view().size()
        );
        std::vector<int8_t> ir_buf;
        ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof);
        zc.write(reinterpret_cast<char*>(ir_buf.data()), ir_buf.size());
        zc.flush();
        serializer.clear_ir_buf();
        in_file.close();
        zc.close();
        out_file.close();
    }

    return true;
}
Contributor

🛠️ Refactor suggestion

Refactor error handling to reduce code duplication.

The error handling code for closing files and compressor is duplicated in multiple catch blocks. Consider extracting this into a helper function.

+    auto cleanup = [&]() {
+        in_file.close();
+        out_file.close();
+        zc.close();
+    };
+
     try {
         zc.open(out_file, option.compression_level);
     } catch (clp_s::ZstdCompressor::OperationFailed& error) {
         SPDLOG_ERROR("Failed to open ZSTDcompressor - {}", error.what());
-        in_file.close();
-        out_file.close();
+        cleanup();
         return false;
     }
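
A cleanup lambda still has to be called by hand on every return path. An alternative is a small scope guard that runs the cleanup automatically when the function exits. The sketch below is illustrative only: `ScopeGuard` and `process` are hypothetical, and in the real function the guard would also need to account for whether the compressor was successfully opened and for the order in which the resources are closed.

```cpp
#include <utility>

// Minimal scope guard: runs the stored callable when the guard goes out of scope, so every
// return path performs the same cleanup without repeating it.
template <typename Callable>
class ScopeGuard {
public:
    explicit ScopeGuard(Callable callable) : m_callable{std::move(callable)} {}

    ~ScopeGuard() { m_callable(); }

    // Non-copyable so the cleanup runs exactly once.
    ScopeGuard(ScopeGuard const&) = delete;
    auto operator=(ScopeGuard const&) -> ScopeGuard& = delete;

private:
    Callable m_callable;
};

// Hypothetical stand-in for run_serializer: counts how many times the cleanup ran.
auto process(bool fail_early, int& num_cleanups) -> bool {
    ScopeGuard guard{[&num_cleanups]() { ++num_cleanups; }};
    if (fail_early) {
        return false;  // The guard's destructor runs the cleanup on this path.
    }
    return true;  // The cleanup also runs on the success path.
}
```

This relies on C++17 class template argument deduction to infer the lambda's type.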
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
template <typename T>
auto run_serializer(clp_s::JsonToIrParserOption const& option, std::string path) {
    auto result{Serializer<T>::create()};
    if (result.has_error()) {
        SPDLOG_ERROR("Failed to create Serializer");
        return false;
    }
    auto& serializer{result.value()};
    std::ifstream in_file;
    in_file.open(path, std::ifstream::in);
    if (false == in_file.is_open()) {
        SPDLOG_ERROR("Failed to open input file: {}", path);
        return false;
    }
    std::filesystem::path input_path{path};
    std::string filename = input_path.filename().string();
    std::string out_path = option.irs_dir + "/" + filename + ".ir";
    clp_s::FileWriter out_file;
    out_file.open(out_path, clp_s::FileWriter::OpenMode::CreateForWriting);
    clp_s::ZstdCompressor zc;
    auto cleanup = [&]() {
        in_file.close();
        out_file.close();
        zc.close();
    };
    try {
        zc.open(out_file, option.compression_level);
    } catch (clp_s::ZstdCompressor::OperationFailed& error) {
        SPDLOG_ERROR("Failed to open ZSTDcompressor - {}", error.what());
        cleanup();
        return false;
    }
    std::string line = "";
    size_t total_size = 0;
    if (in_file.is_open()) {
        while (getline(in_file, line)) {
            try {
                auto j_obj = nlohmann::json::parse(line);
                if (false
                    == unpack_and_serialize_msgpack_bytes(
                            nlohmann::json::to_msgpack(j_obj),
                            serializer
                    ))
                {
                    SPDLOG_ERROR("Failed to serialize msgpack bytes for line: {}", line);
                    cleanup();
                    return false;
                }
                auto bufferSize = serializer.get_ir_buf_view().size();
                if (bufferSize >= option.max_ir_buffer_size) {
                    total_size = total_size + bufferSize;
                    zc.write(
                            reinterpret_cast<char*>(
                                    const_cast<int8_t*>(serializer.get_ir_buf_view().data())
                            ),
                            bufferSize
                    );
                    zc.flush();
                    serializer.clear_ir_buf();
                }
            } catch (nlohmann::json::parse_error const& e) {
                SPDLOG_ERROR("JSON parsing error: {}", e.what());
                cleanup();
                return false;
            } catch (std::exception const& e) {
                SPDLOG_ERROR("Error during serialization: {}", e.what());
                cleanup();
                return false;
            }
        }
        total_size = total_size + serializer.get_ir_buf_view().size();
        zc.write(
                reinterpret_cast<char*>(const_cast<int8_t*>(serializer.get_ir_buf_view().data())),
                serializer.get_ir_buf_view().size()
        );
        std::vector<int8_t> ir_buf;
        ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof);
        zc.write(reinterpret_cast<char*>(ir_buf.data()), ir_buf.size());
        zc.flush();
        serializer.clear_ir_buf();
        cleanup();
    }
    return true;
}

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
components/core/src/clp_s/clp-s.cpp (1)

162-181: Enhance error message with more context.

The error message could be more specific by including the type of the msgpack object when it's not a MAP.

-            return false;
+            SPDLOG_ERROR("Expected msgpack object of type MAP, got type {}", msgpack_obj.type);
+            return false;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82367fd and 5fc76ed.

📒 Files selected for processing (4)
  • components/core/src/clp_s/CommandLineArguments.cpp (4 hunks)
  • components/core/src/clp_s/CommandLineArguments.hpp (4 hunks)
  • components/core/src/clp_s/JsonParser.hpp (1 hunks)
  • components/core/src/clp_s/clp-s.cpp (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • components/core/src/clp_s/JsonParser.hpp
  • components/core/src/clp_s/CommandLineArguments.hpp
🧰 Additional context used
📓 Path-based instructions (2)
components/core/src/clp_s/clp-s.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/CommandLineArguments.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🪛 cppcheck (2.10-2)
components/core/src/clp_s/clp-s.cpp

[error] 191-191: Exception thrown in function declared not to throw exceptions.

(throwInNoexceptFunction)

🔇 Additional comments (7)
components/core/src/clp_s/clp-s.cpp (5)

4-4: LGTM! Required headers are properly included.

The new includes are appropriate for the JSON to IR conversion functionality.

Also applies to: 17-18


488-491: LGTM! Command handling is consistent with existing patterns.

The JsonToIr command is properly integrated into the main function's command handling logic.


226-226: ⚠️ Potential issue

Security: Avoid logging raw JSON data.

Logging raw JSON data could expose sensitive information. Consider logging only the error details.

-                    SPDLOG_ERROR("Failed to serialize msgpack bytes for line: {}", line);
+                    SPDLOG_ERROR("Failed to serialize msgpack bytes: {}", e.what());

Likely invalid or redundant comment.


183-274: 🛠️ Refactor suggestion

Refactor error handling to reduce code duplication.

The cleanup code (closing files and compressor) is duplicated across multiple catch blocks. Consider using RAII or a cleanup helper.

+    auto cleanup = [&]() {
+        in_file.close();
+        out_file.close();
+        zc.close();
+    };
+
     try {
         zc.open(out_file, option.compression_level);
     } catch (clp_s::ZstdCompressor::OperationFailed& error) {
         SPDLOG_ERROR("Failed to open ZSTDcompressor - {}", error.what());
-        in_file.close();
-        out_file.close();
+        cleanup();
         return false;
     }

Likely invalid or redundant comment.


306-310: Consider using 4-byte encoding exclusively.

As per the team's guidance, we should only generate 4-byte encoded IR stream despite supporting 8-byte encoding.

✅ Verification successful

Confirmed: 4-byte encoding should be the exclusive option

  • Remove the 8-byte encoding option and validation in CommandLineArguments.hpp/cpp
  • Update the default encoding type from 8 to 4 bytes
  • Simplify the serializer selection in clp-s.cpp to use only int32_t

No tests, configurations, or documentation currently reference 8-byte encoding, making this change safe to implement.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are any existing uses of 8-byte encoding
rg -l "encoding.*8" --type cpp

Length of output: 134


Script:

#!/bin/bash
# Check the content of the files and surrounding context
echo "=== CommandLineArguments.hpp ==="
rg -B 2 -A 2 "encoding.*8" components/core/src/clp_s/CommandLineArguments.hpp

echo -e "\n=== CommandLineArguments.cpp ==="
rg -B 2 -A 2 "encoding.*8" components/core/src/clp_s/CommandLineArguments.cpp

echo -e "\n=== Checking for actual usage of encoding=8 or similar patterns ==="
rg "encoding.*=.*8" --type cpp -B 2 -A 2

Length of output: 1130


Script:

#!/bin/bash
# Check for test files that might use encoding type
echo "=== Checking test files for encoding references ==="
rg -l "encoding.*[48]" --type cpp "*test*"

echo -e "\n=== Checking for configuration files ==="
fd -e json -e yaml -e yml -e conf | xargs rg -l "encoding.*[48]"

echo -e "\n=== Checking for documentation mentioning encoding ==="
fd -e md -e txt | xargs rg -l "encoding.*[48]"

Length of output: 542


Script:

#!/bin/bash
# Search test files correctly
echo "=== Checking test files for encoding references ==="
fd -t f "test" -e cpp -e hpp | xargs rg -l "encoding.*[48]"

echo -e "\n=== Checking for direct usage of encoding type ==="
rg -C 3 "option\.encoding.*[48]" --type cpp

Length of output: 825

components/core/src/clp_s/CommandLineArguments.cpp (2)

825-953: LGTM! Command-line options are well-defined with proper validation.

The implementation includes:

  • Clear error messages
  • Thorough validation of all parameters
  • Helpful usage examples

1061-1063: LGTM! Usage message is consistent with other commands.

The help text follows the same format as other command usage messages.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
components/core/src/clp_s/CommandLineArguments.cpp (1)

1068-1070: Enhance help message with more details.

The help message could be more informative by including details about the IR format and its purpose.

Apply this diff to improve the help message:

 void CommandLineArguments::print_json_to_ir_usage() const {
-    std::cerr << "Usage: " << m_program_name << " r [OPTIONS] IRS_DIR [FILE/DIR ...]\n";
+    std::cerr << "Usage: " << m_program_name << " r [OPTIONS] IRS_DIR [FILE/DIR ...]\n"
+              << "Convert JSON files to Intermediate Representation (IR) format.\n"
+              << "The IR format is optimized for efficient processing and querying.\n";
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5fc76ed and 680f1e3.

📒 Files selected for processing (3)
  • components/core/src/clp_s/CommandLineArguments.cpp (4 hunks)
  • components/core/src/clp_s/JsonParser.hpp (1 hunks)
  • components/core/src/clp_s/clp-s.cpp (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/src/clp_s/JsonParser.hpp
👮 Files not reviewed due to content moderation or server errors (1)
  • components/core/src/clp_s/clp-s.cpp
🧰 Additional context used
📓 Path-based instructions (2)
components/core/src/clp_s/clp-s.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp_s/CommandLineArguments.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🪛 cppcheck (2.10-2)
components/core/src/clp_s/clp-s.cpp

[error] 191-191: Exception thrown in function declared not to throw exceptions.

(throwInNoexceptFunction)

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: ubuntu-focal-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-focal-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: lint-check (macos-latest)

Comment on lines +825 to +960
default_value(m_encoding_type),
"4 (four byte encoding) or 8 (eight byte encoding)"
)(
"files-from,f",
po::value<std::string>(&input_path_list_file_path)
->value_name("FILE")
->default_value(input_path_list_file_path),
"Compress files specified in FILE"
);
// clang-format on

po::positional_options_description positional_options;
positional_options.add("irs-dir", 1);
positional_options.add("input-paths", -1);

po::options_description all_compression_options;
all_compression_options.add(compression_options);
all_compression_options.add(compression_positional_options);

std::vector<std::string> unrecognized_options
= po::collect_unrecognized(parsed.options, po::include_positional);
unrecognized_options.erase(unrecognized_options.begin());
po::store(
po::command_line_parser(unrecognized_options)
.options(all_compression_options)
.positional(positional_options)
.run(),
parsed_command_line_options
);
po::notify(parsed_command_line_options);

if (parsed_command_line_options.count("help")) {
print_json_to_ir_usage();

std::cerr << "Examples:\n";
std::cerr << " # Parse file1.json and dir1 into irs-dir\n";
std::cerr << " " << m_program_name << " r irs-dir file1.json dir1\n";

po::options_description visible_options;
visible_options.add(general_options);
visible_options.add(compression_options);
std::cerr << visible_options << '\n';
return ParsingResult::InfoCommand;
}

if (m_archives_dir.empty()) {
throw std::invalid_argument("No IRs directory specified.");
}

if (false == input_path_list_file_path.empty()) {
if (false == read_paths_from_file(input_path_list_file_path, input_paths)) {
SPDLOG_ERROR("Failed to read paths from {}", input_path_list_file_path);
return ParsingResult::Failure;
}
}

for (auto const& path : input_paths) {
if (false == get_input_files_for_raw_path(path, m_input_paths)) {
throw std::invalid_argument(fmt::format("Invalid input path \"{}\".", path));
}
}

if (m_input_paths.empty()) {
throw std::invalid_argument("No input paths specified.");
}

if ((4 != m_encoding_type) && (8 != m_encoding_type)) {
SPDLOG_ERROR(
"Invalid encoding type specified; --encoding-type {}",
m_encoding_type
);
return ParsingResult::Failure;
}

if (0 >= m_max_ir_buffer_size) {
SPDLOG_ERROR(
"Invalid max_ir_buffer_size specified; Buffer size must be greater than "
"zero; --max-ir-buffer-size {}",
m_max_ir_buffer_size
);
return ParsingResult::Failure;
}

if (0 >= m_max_document_size) {
SPDLOG_ERROR(
"Invalid max_document_size specified; Document size must be greater than "
"zero; --max-document-size {}",
m_max_document_size
);
return ParsingResult::Failure;
}

if ((1 > m_compression_level) || (9 < m_compression_level)) {
SPDLOG_ERROR(
"Invalid compression level specified; Compression level must be 1-9; "
"--compression-level {}",
m_compression_level
);
return ParsingResult::Failure;
}

Contributor

🛠️ Refactor suggestion

Add input validation for command line arguments.

While the implementation includes basic validation, consider adding these improvements:

  1. Validate that compression level is within bounds before using it
  2. Add minimum size requirements for buffer and document sizes

Apply this diff to enhance validation:

     if ((4 != m_encoding_type) && (8 != m_encoding_type)) {
         SPDLOG_ERROR(
                 "Invalid encoding type specified; --encoding-type {}",
                 m_encoding_type
         );
         return ParsingResult::Failure;
     }

+    constexpr size_t cMinBufferSize = 1024;  // 1KB minimum
+    constexpr size_t cMinDocumentSize = 1024;  // 1KB minimum
+
     if (0 >= m_max_ir_buffer_size) {
         SPDLOG_ERROR(
                 "Invalid max_ir_buffer_size specified; Buffer size must be greater than "
                 "zero; --max-ir-buffer-size {}",
                 m_max_ir_buffer_size
         );
         return ParsingResult::Failure;
+    } else if (m_max_ir_buffer_size < cMinBufferSize) {
+        SPDLOG_ERROR(
+                "Invalid max_ir_buffer_size specified; Buffer size must be at least {} bytes; "
+                "--max-ir-buffer-size {}",
+                cMinBufferSize,
+                m_max_ir_buffer_size
+        );
+        return ParsingResult::Failure;
     }

     if (0 >= m_max_document_size) {
         SPDLOG_ERROR(
                 "Invalid max_document_size specified; Document size must be greater than "
                 "zero; --max-document-size {}",
                 m_max_document_size
         );
         return ParsingResult::Failure;
+    } else if (m_max_document_size < cMinDocumentSize) {
+        SPDLOG_ERROR(
+                "Invalid max_document_size specified; Document size must be at least {} bytes; "
+                "--max-document-size {}",
+                cMinDocumentSize,
+                m_max_document_size
+        );
+        return ParsingResult::Failure;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
} else if ((char)Command::JsonToIr == command_input) {
po::options_description compression_positional_options;
std::vector<std::string> input_paths;
// clang-format off
compression_positional_options.add_options()(
"irs-dir",
po::value<std::string>(&m_archives_dir)->value_name("DIR"),
"output directory"
)(
"input-paths",
po::value<std::vector<std::string>>(&input_paths)->value_name("PATHS"),
"input paths"
);
// clang-format on
po::options_description compression_options("Compression options");
std::string input_path_list_file_path;
// clang-format off
compression_options.add_options()(
"compression-level",
po::value<int>(&m_compression_level)->value_name("LEVEL")->
default_value(m_compression_level),
"1 (fast/low compression) to 9 (slow/high compression)."
)(
"max-document-size",
po::value<size_t>(&m_max_document_size)->value_name("DOC_SIZE")->
default_value(m_max_document_size),
"Maximum allowed size (B) for a single document before ir generation fails."
)(
"max-ir-buffer-size",
po::value<size_t>(&m_max_ir_buffer_size)->value_name("BUFFER_SIZE")->
default_value(m_max_ir_buffer_size),
"Maximum allowed size (B) for an in memory IR buffer befroe being written to file."
)(
"encoding-type",
po::value<int>(&m_encoding_type)->value_name("ENCODING_TYPE")->
default_value(m_encoding_type),
"4 (four byte encoding) or 8 (eight byte encoding)"
)(
"files-from,f",
po::value<std::string>(&input_path_list_file_path)
->value_name("FILE")
->default_value(input_path_list_file_path),
"Compress files specified in FILE"
);
// clang-format on
po::positional_options_description positional_options;
positional_options.add("irs-dir", 1);
positional_options.add("input-paths", -1);
po::options_description all_compression_options;
all_compression_options.add(compression_options);
all_compression_options.add(compression_positional_options);
std::vector<std::string> unrecognized_options
= po::collect_unrecognized(parsed.options, po::include_positional);
unrecognized_options.erase(unrecognized_options.begin());
po::store(
po::command_line_parser(unrecognized_options)
.options(all_compression_options)
.positional(positional_options)
.run(),
parsed_command_line_options
);
po::notify(parsed_command_line_options);
if (parsed_command_line_options.count("help")) {
print_json_to_ir_usage();
std::cerr << "Examples:\n";
std::cerr << " # Parse file1.json and dir1 into irs-dir\n";
std::cerr << " " << m_program_name << " r irs-dir file1.json dir1\n";
po::options_description visible_options;
visible_options.add(general_options);
visible_options.add(compression_options);
std::cerr << visible_options << '\n';
return ParsingResult::InfoCommand;
}
if (m_archives_dir.empty()) {
throw std::invalid_argument("No IRs directory specified.");
}
if (false == input_path_list_file_path.empty()) {
if (false == read_paths_from_file(input_path_list_file_path, input_paths)) {
SPDLOG_ERROR("Failed to read paths from {}", input_path_list_file_path);
return ParsingResult::Failure;
}
}
for (auto const& path : input_paths) {
if (false == get_input_files_for_raw_path(path, m_input_paths)) {
throw std::invalid_argument(fmt::format("Invalid input path \"{}\".", path));
}
}
if (m_input_paths.empty()) {
throw std::invalid_argument("No input paths specified.");
}
if ((4 != m_encoding_type) && (8 != m_encoding_type)) {
SPDLOG_ERROR(
"Invalid encoding type specified; --encoding-type {}",
m_encoding_type
);
return ParsingResult::Failure;
}
if (0 >= m_max_ir_buffer_size) {
SPDLOG_ERROR(
"Invalid max_ir_buffer_size specified; Buffer size must be greater than "
"zero; --max-ir-buffer-size {}",
m_max_ir_buffer_size
);
return ParsingResult::Failure;
}
if (0 >= m_max_document_size) {
SPDLOG_ERROR(
"Invalid max_document_size specified; Document size must be greater than "
"zero; --max-document-size {}",
m_max_document_size
);
return ParsingResult::Failure;
}
if ((1 > m_compression_level) || (9 < m_compression_level)) {
SPDLOG_ERROR(
"Invalid compression level specified; Compression level must be 1-9; "
"--compression-level {}",
m_compression_level
);
return ParsingResult::Failure;
}
} else if ((char)Command::JsonToIr == command_input) {
po::options_description compression_positional_options;
std::vector<std::string> input_paths;
// clang-format off
compression_positional_options.add_options()(
"irs-dir",
po::value<std::string>(&m_archives_dir)->value_name("DIR"),
"output directory"
)(
"input-paths",
po::value<std::vector<std::string>>(&input_paths)->value_name("PATHS"),
"input paths"
);
// clang-format on
po::options_description compression_options("Compression options");
std::string input_path_list_file_path;
// clang-format off
compression_options.add_options()(
"compression-level",
po::value<int>(&m_compression_level)->value_name("LEVEL")->
default_value(m_compression_level),
"1 (fast/low compression) to 9 (slow/high compression)."
)(
"max-document-size",
po::value<size_t>(&m_max_document_size)->value_name("DOC_SIZE")->
default_value(m_max_document_size),
"Maximum allowed size (B) for a single document before ir generation fails."
)(
"max-ir-buffer-size",
po::value<size_t>(&m_max_ir_buffer_size)->value_name("BUFFER_SIZE")->
default_value(m_max_ir_buffer_size),
"Maximum allowed size (B) for an in memory IR buffer before being written to file."
)(
"encoding-type",
po::value<int>(&m_encoding_type)->value_name("ENCODING_TYPE")->
default_value(m_encoding_type),
"4 (four byte encoding) or 8 (eight byte encoding)"
)(
"files-from,f",
po::value<std::string>(&input_path_list_file_path)
->value_name("FILE")
->default_value(input_path_list_file_path),
"Compress files specified in FILE"
);
// clang-format on
po::positional_options_description positional_options;
positional_options.add("irs-dir", 1);
positional_options.add("input-paths", -1);
po::options_description all_compression_options;
all_compression_options.add(compression_options);
all_compression_options.add(compression_positional_options);
std::vector<std::string> unrecognized_options
= po::collect_unrecognized(parsed.options, po::include_positional);
unrecognized_options.erase(unrecognized_options.begin());
po::store(
po::command_line_parser(unrecognized_options)
.options(all_compression_options)
.positional(positional_options)
.run(),
parsed_command_line_options
);
po::notify(parsed_command_line_options);
if (parsed_command_line_options.count("help")) {
print_json_to_ir_usage();
std::cerr << "Examples:\n";
std::cerr << " # Parse file1.json and dir1 into irs-dir\n";
std::cerr << " " << m_program_name << " r irs-dir file1.json dir1\n";
po::options_description visible_options;
visible_options.add(general_options);
visible_options.add(compression_options);
std::cerr << visible_options << '\n';
return ParsingResult::InfoCommand;
}
if (m_archives_dir.empty()) {
throw std::invalid_argument("No IRs directory specified.");
}
if (false == input_path_list_file_path.empty()) {
if (false == read_paths_from_file(input_path_list_file_path, input_paths)) {
SPDLOG_ERROR("Failed to read paths from {}", input_path_list_file_path);
return ParsingResult::Failure;
}
}
for (auto const& path : input_paths) {
if (false == get_input_files_for_raw_path(path, m_input_paths)) {
throw std::invalid_argument(fmt::format("Invalid input path \"{}\".", path));
}
}
if (m_input_paths.empty()) {
throw std::invalid_argument("No input paths specified.");
}
if ((4 != m_encoding_type) && (8 != m_encoding_type)) {
SPDLOG_ERROR(
"Invalid encoding type specified; --encoding-type {}",
m_encoding_type
);
return ParsingResult::Failure;
}
constexpr size_t cMinBufferSize = 1024; // 1KB minimum
constexpr size_t cMinDocumentSize = 1024; // 1KB minimum
if (0 >= m_max_ir_buffer_size) {
SPDLOG_ERROR(
"Invalid max_ir_buffer_size specified; Buffer size must be greater than "
"zero; --max-ir-buffer-size {}",
m_max_ir_buffer_size
);
return ParsingResult::Failure;
} else if (m_max_ir_buffer_size < cMinBufferSize) {
SPDLOG_ERROR(
"Invalid max_ir_buffer_size specified; Buffer size must be at least {} bytes; "
"--max-ir-buffer-size {}",
cMinBufferSize,
m_max_ir_buffer_size
);
return ParsingResult::Failure;
}
if (0 >= m_max_document_size) {
SPDLOG_ERROR(
"Invalid max_document_size specified; Document size must be greater than "
"zero; --max-document-size {}",
m_max_document_size
);
return ParsingResult::Failure;
} else if (m_max_document_size < cMinDocumentSize) {
SPDLOG_ERROR(
"Invalid max_document_size specified; Document size must be at least {} bytes; "
"--max-document-size {}",
cMinDocumentSize,
m_max_document_size
);
return ParsingResult::Failure;
}
if ((1 > m_compression_level) || (9 < m_compression_level)) {
SPDLOG_ERROR(
"Invalid compression level specified; Compression level must be 1-9; "
"--compression-level {}",
m_compression_level
);
return ParsingResult::Failure;
}

for (auto& path : all_file_paths) {
bool success;
if (option.encoding == 4) {
success = run_serializer<int32_t>(option, path);

Member

Confirmed with Kirk: we should only generate 4-byte-encoded IR streams.
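
If the `--encoding-type` option is kept at all, one way to reflect this in argument validation is to reject everything except 4. This is only a rough sketch: `cFourByteEncodingType` and `is_supported_encoding_type` are hypothetical names that do not appear in the PR.

```cpp
// Hypothetical sketch: only the four-byte encoding is accepted.
constexpr int cFourByteEncodingType{4};

[[nodiscard]] constexpr auto is_supported_encoding_type(int encoding_type) -> bool {
    return cFourByteEncodingType == encoding_type;
}

// Compile-time checks of the intended behaviour.
static_assert(is_supported_encoding_type(4));
static_assert(false == is_supported_encoding_type(8));
```

Dropping the option entirely would be simpler still; the check above only matters if the flag stays for forward compatibility.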

for (auto& path : all_file_paths) {
bool success;
if (option.encoding == 4) {
success = run_serializer<int32_t>(option, path);

Member

Instead of using int32_t directly, we should use clp::ir::four_byte_encoded_variable_t defined here
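
A minimal sketch of what the call site could look like with a named alias. The `sketch_ir` namespace below is a stand-in so the snippet compiles on its own; the underlying type of the real `clp::ir::four_byte_encoded_variable_t` is assumed here, and `run_serializer_stub` and `serialize_path` are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Stand-in for the real alias in the CLP sources; the underlying type is an assumption.
namespace sketch_ir {
using four_byte_encoded_variable_t = std::int32_t;
}  // namespace sketch_ir

// Hypothetical stub; the real run_serializer also takes the parser options.
template <typename encoded_variable_t>
[[nodiscard]] auto run_serializer_stub(std::string const& path) -> bool {
    static_assert(
            std::is_same_v<encoded_variable_t, sketch_ir::four_byte_encoded_variable_t>,
            "Only the four-byte encoding is expected in this sketch"
    );
    // Placeholder logic: treat any non-empty path as a success.
    return false == path.empty();
}

// The call site names the encoding rather than a raw integer type.
[[nodiscard]] inline auto serialize_path(std::string const& path) -> bool {
    return run_serializer_stub<sketch_ir::four_byte_encoded_variable_t>(path);
}
```

Using the alias keeps the call site tied to the IR type definitions, so a change to the encoded variable type would not need to be repeated here.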

) -> bool;

/**
* Given user specified options and a file path to a JSON file calls the serailizer one each JSON

Member

Suggested change
- * Given user specified options and a file path to a JSON file calls the serailizer one each JSON
+ * Given user specified options and a file path to a JSON file calls the serializer on each JSON

Comment on lines +71 to +72
template <typename T>
auto run_serializer(clp_s::JsonToIrParserOption const& option, std::string path);

Member

Suggested change
- template <typename T>
- auto run_serializer(clp_s::JsonToIrParserOption const& option, std::string path);
+ template <typename encoded_variable_t>
+ [[nodiscard]] auto run_serializer(clp_s::JsonToIrParserOption const& option, std::string path) -> bool;

According to our guidelines:

  • We should give a meaningful template parameter name instead of generic ones like T.
  • We should add [[nodiscard]] to any functions whose return value needs to be checked.
  • We should explicitly annotate the return type if it's deterministic.
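
The three points can be sketched together as follows. `ParserOptionSketch` is a hypothetical stand-in for `clp_s::JsonToIrParserOption`, and the function body is placeholder logic, not the PR's implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for clp_s::JsonToIrParserOption.
struct ParserOptionSketch {
    std::size_t max_document_size{0};
};

// Declaration: descriptive template parameter name, [[nodiscard]], and an explicit
// trailing return type.
template <typename encoded_variable_t>
[[nodiscard]] auto
run_serializer_sketch(ParserOptionSketch const& option, std::string const& path) -> bool;

// Definition with placeholder logic only.
template <typename encoded_variable_t>
auto run_serializer_sketch(ParserOptionSketch const& option, std::string const& path) -> bool {
    return (0 < option.max_document_size) && (false == path.empty());
}
```

With `[[nodiscard]]` in place, a caller that ignores the result gets a compiler warning, which is the point of the second guideline.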

return false;
}

std::string line = "";

Member

Suggested change
- std::string line = "";
+ std::string line;

We should rely on the default constructor when one is provided.

}

std::string line = "";
size_t total_size = 0;

Member

Do we still need this variable?

}
total_size = total_size + serializer.get_ir_buf_view().size();
zc.write(
reinterpret_cast<char*>(const_cast<int8_t*>(serializer.get_ir_buf_view().data())),

Member

Same as above.

Comment on lines +263 to +265
std::vector<int8_t> ir_buf;
ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof);
zc.write(reinterpret_cast<char*>(ir_buf.data()), ir_buf.size());

Member

Suggested change
- std::vector<int8_t> ir_buf;
- ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof);
- zc.write(reinterpret_cast<char*>(ir_buf.data()), ir_buf.size());
+ constexpr std::array<int8_t, 1> cEndOfStreamBuf{clp::ffi::ir_stream::cProtocol::Eof};
+ zc.write(
+ clp::size_checked_pointer_cast<char const>(cEndOfStreamBuf.data()),
+ cEndOfStreamBuf.size()
+ );

We can make it a compile-time constant.
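
A self-contained sketch of the same idea, for illustration only. `cEofSketch` is an assumed placeholder for `clp::ffi::ir_stream::cProtocol::Eof`, `append_end_of_stream` is a hypothetical writer that appends to a `std::string` instead of calling the compressor, and a plain `reinterpret_cast` stands in for `clp::size_checked_pointer_cast`.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Assumed placeholder for clp::ffi::ir_stream::cProtocol::Eof; the real value lives in CLP.
constexpr std::int8_t cEofSketch{0x0};

// Built at compile time, so no heap allocation is needed as with std::vector.
constexpr std::array<std::int8_t, 1> cEndOfStreamBuf{cEofSketch};

// Hypothetical writer standing in for the compressor's write call.
inline auto append_end_of_stream(std::string& out) -> void {
    // int8_t and char have the same size, so the cast only reinterprets the bytes.
    out.append(reinterpret_cast<char const*>(cEndOfStreamBuf.data()), cEndOfStreamBuf.size());
}
```

Because the buffer is `constexpr`, the pointer cast is to `char const*`, which also removes the need for the `const_cast` seen elsewhere in the function.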


/**
* Given user specified options and a file path to a JSON file calls the serailizer one each JSON
* entry to serialize into IR

Member

Normally we should use @tparam to document template parameters.
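
For illustration, a docstring with `@tparam` might look like the sketch below. `should_serialize_line` is a hypothetical helper with placeholder logic, and the wording of the tags is only a suggestion.

```cpp
#include <string>

/**
 * Hypothetical helper that reports whether a line is non-empty, standing in for the serializer.
 * @tparam encoded_variable_t Type of the encoded variables in the IR stream.
 * @param line A single line read from the JSON file.
 * @return Whether the line would be handed to the serializer.
 */
template <typename encoded_variable_t>
[[nodiscard]] auto should_serialize_line(std::string const& line) -> bool {
    return false == line.empty();
}
```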

std::string line = "";
size_t total_size = 0;

if (in_file.is_open()) {

Member

It might be better to check in_file.is_open() first and exit early if the file isn't open.
This makes the code more readable, since the main serialization logic then sits one indentation level shallower.
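
A rough sketch of the early-exit shape, assuming the file is read line by line with `std::getline`. `count_lines_sketch` is hypothetical and only counts non-empty lines where the real code would call the serializer.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical sketch: counts non-empty lines; the real code would serialize each line instead.
[[nodiscard]] inline auto
count_lines_sketch(std::string const& path, std::size_t& num_lines) -> bool {
    std::ifstream in_file{path};
    if (false == in_file.is_open()) {
        // Early exit keeps the main loop below at a single indentation level.
        return false;
    }

    num_lines = 0;
    std::string line;
    while (std::getline(in_file, line)) {
        if (false == line.empty()) {
            ++num_lines;
        }
    }
    return true;
}
```

The output parameter is left untouched on failure, so callers can tell an unreadable file apart from an empty one.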
