Possible early free of input in generated C++ Parser #377

rkesters · 2024-12-11T20:24:56Z

I'm getting a memory corruption of the name data for the Input instance in the following test:

TEST(Bugs, bad_input_after_good_string_FAILS) {
  DDL::Input i = DDL::Input("test", R"x("1")x");
  EXPECT_GT(i.length(), 0);
  DDL::ParseError<DDL::Input> err;
  std::vector<DDL::ResultOf::parseMain> out;
 
  std::ostringstream oss;
  parseMain(err, out, i);
  ASSERT_EQ(out.size(), 1);
 
  oss << i.getName();
  ASSERT_EQ(oss.str(), R"x("test")x");
}

If i.getName() is called before parseMain, then the above test will pass.

After some digging around in the debugger, one of the places the input is freed is:

void parseMain ( DDL::ParseError<DDL::Input> &error
               , std::vector<User::Value> &results
               , DDL::Input a1
               ) {
  DDL::ParserState<DDL::Input> p;
  User::Value out_result;
  DDL::Input out_input;
  if (parser_Main(p, &out_result, &out_input, a1)) {
    results.push_back(out_result);
    out_input.free();
  } else {
    error = p.getParseError();
  }
}

The text was updated successfully, but these errors were encountered:

yav · 2024-12-11T22:17:01Z

The issue here is that the generated parsers "own" their arguments, in particular they own the input. So when calling parserMain we are transferring ownership of i to the the parser, and we should not use i anymore after.

If you do want to use i after giving it to the parser, you need to make a copy (i.e., increment the reference count) before passing it to the parser. This is done by calling the copy method: i.copy(). If you do that, you should also call i.free() at the end of the function to decrement the reference count.

In general, this fine grained control over when things are copied/deallocated is very convenient for generating code, but is quite error prone for using manually from C++. To make it a bit more convenient, we have a wrapper class called Owned which defines the usual C++ operators that will do the increment/decrement of the object.

It is defined here:
https://github.com/GaloisInc/daedalus/blob/master/rts-c/ddl/owned.h

To use it in this particular example, one would change the first line to:

DDL::Owend<DDL::Input> i = DDL::Input("test", R"x("1")x");

Then when passing it to the parser, you need to use i.get() to get a new "owned" copy to give the parser.
Finally if you want to print the name, you can change i.getName() to i->getName().

In short get gives you a pointer to the underlying object, but first increments the reference count, while -> just gives you a pointer to the underlying object without incrementing the reference count and as such should not be stored in things.

yav · 2024-12-11T22:25:54Z

We should write some documentation about this in the reference manual.

Also, the main.cpp we generate for testing purposes, contains a similar error where it accesses the size of the file after having passed it to the parser. This works on a technicality, but the code should be updated to not do that.

rkesters · 2024-12-12T15:34:47Z

Agree that documentation should be improved on this matter. It is not the generally expected behavior in C/C++ that function will take ownership of an argument in this manner (normally if the caller created the memory then the caller free's the memory).

One suggestion is to not take ownership and instead, have the parserMain immediately call copy on i. then parserMain could free is as it does, without unexpected (and undocumented) side affects.

yav · 2024-12-12T17:50:46Z

The reason the top-level parsers take their arguments as owned is so they can be de-allocated as soon as the parser is done with them, rather than having to hold on to them until parsing is complete. For normal inputs this doesn't matter hugely as you can't deallocate the input in parts, however, this is quite important for the streaming parsers, where chunks of input arrive while parsing. In that case it is nice that the parser can let go of earlier chunks of input that it knows it does not need anymore, and is essential if you want to support "infinite" or long running streams.

I do agree that the behavior is quite surprising though, so at the very least it should be documented. Note that, at present, the same sort of memory management scheme is used everywhere in Daedalus, and you need the same sort of things with the results of the parser too.

I haven't thought about it very deeply, but perhaps a better plan would be to simply use Owned more in the user facing interface of the parser. We'd have to wrap the inputs and results of the parser in Owned, but that's pretty easy. The main big change is that we'd have to modify the Daedalus standard types, and the types generated by the compiler, to have additional accessor methods that return Owned values in additional to all the current accessors.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible early free of input in generated C++ Parser #377

Possible early free of input in generated C++ Parser #377

rkesters commented Dec 11, 2024

yav commented Dec 11, 2024

yav commented Dec 11, 2024

rkesters commented Dec 12, 2024

yav commented Dec 12, 2024

Possible early free of input in generated C++ Parser #377

Possible early free of input in generated C++ Parser #377

Comments

rkesters commented Dec 11, 2024

yav commented Dec 11, 2024

yav commented Dec 11, 2024

rkesters commented Dec 12, 2024

yav commented Dec 12, 2024