Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for macros to IonRawBinaryWriter_1_1 #668

Merged
merged 7 commits into from
Dec 15, 2023

Conversation

popematt
Copy link
Contributor

@popematt popematt commented Dec 12, 2023

Issue #, if available:

None

Description of changes:

Updates the signature of stepInEExp() and implements it.
There are a couple of incidental changes that I've called out with a 🗺️ comment.

Potentially controversial... this PR uses the end delimiter to signal the end of a macro's args if that macro has rest-parameter in it's signature.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@@ -55,7 +55,7 @@ class IonRawBinaryWriter_1_1 internal constructor(
/**
* Clears this [ContainerInfo] of old data and initializes it with the given new data.
*/
fun reset(type: ContainerType? = null, position: Long = -1, isDelimited: Boolean = false) {
fun reset(type: ContainerType, position: Long, isDelimited: Boolean = false) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Incidental change—I realize that the default values are always supplied for both type and position, so I removed the default values.

@@ -448,6 +467,9 @@ class IonRawBinaryWriter_1_1 internal constructor(
List, SExp, Struct -> {
val contentLength = currentContainer.length
if (contentLength <= 0xF && !currentContainer.usesFlexSym) {
// TODO: Right now, this is skipped if we switch to FlexSym after starting a struct
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This is incidental. I was aware of this when I implemented structs, but I had since forgotten it, and then I rediscovered this while I was writing tests for E-Exps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a little more detail here? What are the implications? That we can't reclaim space for FlexSym structs? Why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scenario is when we have a struct that starts out as a SID struct, has at least one SID field before switching to FlexSyms, and has a total length of less than 16 bytes. The consequence of this limitation is that we're wasting a byte. E.g.:

FC             | Variable length SID Struct     ]____ Could be replaced with
17             | Length = 11                    ]     just `CB` to save a byte.
81             | SID 64
5E             | true
01             | switch to FlexSym encoding
FB 66 6F 6F    | FlexSym 'foo'
5E             | true
02 01          | FlexSym SID 64
5E             | true

Ideally, we could write this starting with CB but we don't store enough state right now to track whether it's safe to do that. We only know that the struct is currently using FlexSyms, which means that we don't know whether the OpCode at the start is FC or FD. If it's FC, then it's safe to collapse FC 17 to CB (as in the example above). If it actually started with FD, then going to CB would cause the struct to be malformed.

It's not difficult to solve this—it just requires a way to check how the struct started out and a few more branches for the different scenarios—but I didn't really want to address it while the struct encoding is still in a bit of flux.

Comment on lines +532 to +562
// Instead of allocating iterator, we share one iterator instance within the scope of the container stack and
// reset the cursor every time we track back to the ancestors.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Incidental formatting change so that the line isn't ridiculously long.

@popematt popematt requested a review from tgregg December 12, 2023 01:09
@@ -1044,12 +1044,12 @@ private void patchSingleByteTypedOptimisticValue(final byte type, final Containe
if (info.length <= 0xD)
{
// we fit -- overwrite the type byte
buffer.writeUInt8At(info.position - 1, type | info.length);
buffer.writeByteAt(info.position - 1, type | info.length);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The writeUInt8At method got renamed to writeByteAt because I found it misleading to use writeUInt8 to write FlexInt.ZERO. I checked all usages of the method, and the new name writeByteAt works for all of them. None of the existing usages were specifically trying to write an 8-bit integer; rather they were writing either a sentinel (such as an op code) or 8 bits of a larger value.

Comment on lines +103 to +105
while (this.index != index) {
blocks.remove(this.index--);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This fixes a bug in WriteBuffer that could potentially lead to corrupted data. Basically, if we change the index without either removing or resetting the blocks higher than index, then when we start writing again, and we get to one of those blocks, we wouldn't have started at position 0. For example:

| Block 0 | Block 1 |
|=========|=========|
|         |         |  (1) write "ABCDEF" to the buffer
| A B C D | E F _ _ |
|         |         |  (2) truncate to position 3
| A B C _ | E F _ _ |  <-- Notice that while block 0 is reset to the correct position, there 
|         |         |      was no clean up done in block 1
|         |         |      
|         |         |  (3) write "123" to the buffer
| A B C 1 | E F 2 3 |  !!! Block 1 resumed at offset 2, so we still 
|         |         |      have the "EF" that we tried to truncate earlier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long has this bug existed? Does it require patching in master and/or previous versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's been there all along, but I haven't looked into the potential impact. Since it's been there all along, it's even possible that the current implementation has a way to avoid the buggy behavior by cleaning the blocks at some other time.

@@ -1300,7 +1301,11 @@ public void writeVarUIntDirect2At(long position, long value)
block.data[offset + 1] = (byte) ((value & VAR_INT_MASK) | VAR_INT_FINAL_OCTET_SIGNAL_MASK);
}

public void writeUInt8At(final long position, final long value)
public void writeByteAt(final long position, final byte value) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Convenience overload. If it's a perf problem we can remove it, but I bet the JIT is smart enough to optimize this out.

@@ -448,6 +467,9 @@ class IonRawBinaryWriter_1_1 internal constructor(
List, SExp, Struct -> {
val contentLength = currentContainer.length
if (contentLength <= 0xF && !currentContainer.usesFlexSym) {
// TODO: Right now, this is skipped if we switch to FlexSym after starting a struct
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a little more detail here? What are the implications? That we can't reclaim space for FlexSym structs? Why not?

Copy link

codecov bot commented Dec 15, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (ion-11-encoding@0255c1a). Click here to learn what that means.

Additional details and impacted files
@@                Coverage Diff                 @@
##             ion-11-encoding     #668   +/-   ##
==================================================
  Coverage                   ?   65.05%           
  Complexity                 ?     5828           
==================================================
  Files                      ?      200           
  Lines                      ?    25029           
  Branches                   ?     4468           
==================================================
  Hits                       ?    16282           
  Misses                     ?     7433           
  Partials                   ?     1314           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@popematt popematt merged commit b3417e2 into amazon-ion:ion-11-encoding Dec 15, 2023
8 of 32 checks passed
@popematt popematt deleted the ion11-macros branch December 15, 2023 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants