Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

json base64 parse doesn't do what it is supposed to #1326

Open
JustinPrivitera opened this issue Oct 9, 2024 · 13 comments
Open

json base64 parse doesn't do what it is supposed to #1326

JustinPrivitera opened this issue Oct 9, 2024 · 13 comments
Assignees
Milestone

Comments

@JustinPrivitera
Copy link
Member

std::string RC_CINEMA_WEB =  R"xyzxyz(

{
  "schema": 
  {
    "index.html": {"dtype":"char8_str","number_of_elements": 1,"offset": 0,"stride": 1,"element_bytes": 1,"endianness": "little"},
    "cvlib": {"dtype":"char8_str","number_of_elements": 1,"offset": 1,"stride": 1,"element_bytes": 1,"endianness": "little"}
  },
  "data": 
  {
    "base64": "ab"
  }
}
)xyzxyz";

conduit::Node res;
res.parse(RC_CINEMA_WEB,"conduit_base64_json");
std::cout << res.to_yaml() << std::endl;

gives you

index.html: "i�"
cvlib: "�"

Both should be a single character long. index.html is picking up the character from cvlib.

@cyrush
Copy link
Member

cyrush commented Oct 9, 2024

thanks for finding this reproducer.

@cyrush
Copy link
Member

cyrush commented Oct 9, 2024

How did you create this example? The string usually should include the null term char.

@JustinPrivitera
Copy link
Member Author

@JustinPrivitera
Copy link
Member Author

Here's a better example:

std::string RC_CINEMA_WEB =  R"xyzxyz(

{
  "schema": 
  {
    "index.html": {"dtype":"char8_str","number_of_elements": 1,"offset": 0,"stride": 1,"element_bytes": 1,"endianness": "little"},
    "cvlib": {"dtype":"char8_str","number_of_elements": 1,"offset": 1,"stride": 1,"element_bytes": 1,"endianness": "little"},
    "cvlib2": {"dtype":"char8_str","number_of_elements": 1,"offset": 2,"stride": 1,"element_bytes": 1,"endianness": "little"}
  },
  "data": 
  {
    "base64": "ab123"
  }
}
)xyzxyz";

conduit::Node res;
res.parse(RC_CINEMA_WEB,"conduit_base64_json");
std::cout << res.to_yaml() << std::endl;
index.html: "i�v"
cvlib: "�v"
cvlib2: "v"

@cyrush
Copy link
Member

cyrush commented Oct 9, 2024

Strings usually should include the null term char.
Can you try creating one with conduit?

@cyrush
Copy link
Member

cyrush commented Oct 9, 2024

When i see number of elements ==1, there maybe an expectation that is a null termed string.

Could be why our other cases are wrong, or it could mean this case is ill posed.

@JustinPrivitera
Copy link
Member Author

How do I put the null term char into base64 representation to test?

@JustinPrivitera
Copy link
Member Author

I would argue that conduit should only read number_of_elements elements instead of reading until it hits a null terminator

@JustinPrivitera
Copy link
Member Author

What is going on here?

const char *src_ptr = base64_str.c_str();
index_t encoded_len = (index_t) base64_str.length();
index_t dec_buff_size = utils::base64_decode_buffer_size(encoded_len);
// decode buffer
Node bb64_decode;
bb64_decode.set(DataType::char8_str(dec_buff_size));
char *decode_ptr = (char*)bb64_decode.data_ptr();
memset(decode_ptr,0,dec_buff_size);
utils::base64_decode(src_ptr,
encoded_len,
decode_ptr);
node->set(s,decode_ptr);

My thought is that dec_buff_size isn't actually the right size, because it doesn't take into account the schema. If we take into account the schema, we must take it into account while decoding the base64 string so that we read the correct pieces of it.

@JustinPrivitera
Copy link
Member Author

std::string RC_CINEMA_WEB =  R"xyzxyz(

{
  "schema": 
  {
    "index.html": {"dtype":"char8_str","number_of_elements": 1,"offset": 0,"stride": 1,"element_bytes": 1,"endianness": "little"},
    "cvlib": {"dtype":"char8_str","number_of_elements": 4,"offset": 4,"stride": 1,"element_bytes": 1,"endianness": "little"},
    "cvlib2": {"dtype":"char8_str","number_of_elements": 4,"offset": 8,"stride": 1,"element_bytes": 1,"endianness": "little"}
  },
  "data": 
  {
    "base64": "YWFhAGJiYgBjY2MA"
  }
}
)xyzxyz";
// Copyright (c) Lawrence Livermore National Security, LLC and other Conduit
// Project developers. See top-level LICENSE AND COPYRIGHT files for dates and
// other details. No copyright assignment is required to contribute to Conduit.

//-----------------------------------------------------------------------------
///
/// file: t_blueprint_smoke.cpp
///
//-----------------------------------------------------------------------------

#include "conduit.hpp"
#include "conduit_blueprint.hpp"
#include "ascent_resources_cinema_web2.hpp"

#include <iostream>
#include "gtest/gtest.h"

void
gogogo(conduit::Node res, int tabs)
{
    for (const auto & childname : res.child_names())
    {
        for (int i = 0; i < tabs; i ++)
        {
            std::cout << "  ";
        }
        std::cout << childname << ":" << std::endl;
        gogogo(res[childname], tabs + 1);
    }
}

//-----------------------------------------------------------------------------
TEST(blueprint_smoke, basic_use)
{
    std::cout << conduit::blueprint::about() << std::endl;
    conduit::Node res;
    res.parse(RC_CINEMA_WEB,"conduit_base64_json");
    std::cout << res.to_yaml() << std::endl;
    res.schema_ptr()->print();

    gogogo(res, 0);

    conduit::Node bungus;
    bungus["mystring1"] = "aaa";
    bungus["mystring2"] = "bbb";
    bungus["mystring3"] = "ccc";

    // std::cout << bungus.to_string("conduit_base64_json") << std::endl;
}

@cyrush
Copy link
Member

cyrush commented Dec 2, 2024

See:
Alpine-DAV/ascent#1425

I think the underlying issue is related to string null terms.

@JustinPrivitera
Copy link
Member Author

It just seems bugged to me that no matter what number_of_elements is, base64 parse will keep reading until it hits a null-terminator. And then I think you get a node that does not match its own schema.

@cyrush cyrush added this to the 0.9.3 milestone Jan 13, 2025
@cyrush
Copy link
Member

cyrush commented Jan 14, 2025

I tried to modify as_string() to use the leaf size and it cause some issues. So we will need to explore more.
I think there might be an issue with base64 dropping the null term, which conduit always includes in the string.

@cyrush cyrush modified the milestones: 0.9.3, 0.9.4 Jan 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants