Store tag markup for serialization #136

peterzhu2118 · 2020-12-16T21:01:37Z

This PR stores the markup of tags encountered during parsing so we can use it during serialization. A block body has zero or more tags, and each tag in turn has zero or one block body.

For example, consider this liquid code:

{% assign test = true %}
{% if test %}
Hello
{% else %}
Goodbye
{% endif %}

It would generate a structure that looks like the following:

Block body (body: "")
  |-> Tag markup (tag_name: "assign", markup: "test = true")
  |-> Tag markup (tag_name: "if", markup: "test")
        |-> Block body (body: "Hello")
              |-> Tag markup (tag_name: "else", markup: "")
                    |-> Block body (body: "Goodbye")
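
To make the nesting above concrete, here is a minimal sketch of the same shape as an in-memory tree. The type and field names (block_body_node_t, tag_markup_node_t, first_tag, next) are made up for illustration; the PR itself stores this data in a serialization buffer whose layout is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of the structure above; not the PR's layout. */
typedef struct block_body_node block_body_node_t;

typedef struct tag_markup_node {
    const char *tag_name;             /* e.g. "if" */
    const char *markup;               /* e.g. "test" */
    block_body_node_t *block_body;    /* NULL when the tag has no block body */
    struct tag_markup_node *next;     /* next tag in the same block body */
} tag_markup_node_t;

struct block_body_node {
    const char *body;                 /* raw text of this block body */
    tag_markup_node_t *first_tag;     /* NULL when the body has no tags */
};

/* Count every tag reachable from a block body, descending into nested bodies. */
static size_t count_tags(const block_body_node_t *body)
{
    size_t count = 0;
    if (body == NULL) {
        return 0;
    }
    for (const tag_markup_node_t *tag = body->first_tag; tag != NULL; tag = tag->next) {
        assert(tag->tag_name != NULL); /* every tag markup carries a name */
        count += 1 + count_tags(tag->block_body);
    }
    return count;
}
```

Built for the template above, the root body would link "assign" to "if", the "if" tag would own the "Hello" body, and that body would hold the "else" tag with its "Goodbye" body.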

This PR also changes how tag objects (not to be confused with the tag markup) are stored. They are no longer pushed into the array of constants (since we don't want to serialize them) and are instead written to a separate buffer that doesn't get serialized. During serialization we serialize the tag markup, and upon deserialization we re-parse the tag markup into a tag object.
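
As a rough illustration of the serialize-then-re-parse flow described above, the sketch below writes a tag name and markup into a flat buffer and reads them back. The header type toy_tag_header_t and both functions are assumptions for this example only; the PR's tag_markup_header_t may have different fields, and the re-parse into a tag object is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; the PR's tag_markup_header_t may differ. */
typedef struct {
    uint32_t tag_name_len;
    uint32_t markup_len;
} toy_tag_header_t;

/* Append header + tag name + markup at offset; return the offset after the record. */
static size_t write_tag_markup(uint8_t *buf, size_t cap, size_t offset,
                               const char *tag_name, const char *markup)
{
    toy_tag_header_t header = { (uint32_t)strlen(tag_name), (uint32_t)strlen(markup) };
    size_t total = sizeof(header) + header.tag_name_len + header.markup_len;
    assert(offset + total <= cap);
    memcpy(buf + offset, &header, sizeof(header));
    memcpy(buf + offset + sizeof(header), tag_name, header.tag_name_len);
    memcpy(buf + offset + sizeof(header) + header.tag_name_len, markup, header.markup_len);
    return offset + total;
}

/* Read one record back as NUL-terminated strings; return the offset after it. */
static size_t read_tag_markup(const uint8_t *buf, size_t offset,
                              char *tag_name, size_t tag_name_cap,
                              char *markup, size_t markup_cap)
{
    toy_tag_header_t header;
    memcpy(&header, buf + offset, sizeof(header));
    assert(header.tag_name_len < tag_name_cap && header.markup_len < markup_cap);
    memcpy(tag_name, buf + offset + sizeof(header), header.tag_name_len);
    tag_name[header.tag_name_len] = '\0';
    memcpy(markup, buf + offset + sizeof(header) + header.tag_name_len, header.markup_len);
    markup[header.markup_len] = '\0';
    return offset + sizeof(header) + header.tag_name_len + header.markup_len;
}
```

On deserialization, the strings recovered by read_tag_markup would be what gets handed back to the tag parser to rebuild the tag object.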

macournoyer

Unless I'm missing something, the missing tags_ptr = assignments in vm.c are necessary.

ext/liquid_c/tag_markup.h

ext/liquid_c/vm.c

ext/liquid_c/tag_markup.c

dylanahsmith

I pushed some suggested changes to a branch (pz-deserialize...deserialize-suggestions) based on top of #138 and linked to the relevant commits in the review comments.

ext/liquid_c/document_body.c

dylanahsmith · 2020-12-18T22:40:05Z

ext/liquid_c/vm.c

@@ -361,7 +363,7 @@ static VALUE vm_render_until_error(VALUE uncast_args)
            }

            case OP_WRITE_NODE:
-                rb_funcall(cLiquidBlockBody, id_render_node, 3, vm->context.self, output, (VALUE)*const_ptr++);
+                rb_funcall(cLiquidBlockBody, id_render_node, 3, vm->context.self, output, *tags_ptr++);


Adding another moving constant pointer seems like a step in the wrong direction, since we are trying to get rid of these in order to not have to add extra complexity for control flow.

An alternative would be to combine these during deserialization into a single constant table. We could do that by marshal dumping the constant array with nil placeholders for the tag objects and serializing an index with the tag markup to use for assigning the tag object into the constant table on deserialization. However, this would be a lot of wasted effort if we wanted to have a separate tag table from a constant table when addressing #84.
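
A minimal sketch of that alternative, assuming each serialized tag markup carries the index of its placeholder slot. Everything here is hypothetical: value_t stands in for Ruby's VALUE, NIL_PLACEHOLDER for nil, and toy_parse_tag for whatever re-parses markup into a tag object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t value_t;            /* stand-in for Ruby's VALUE */
#define NIL_PLACEHOLDER ((value_t)0)  /* stand-in for nil */

/* Hypothetical record: tag markup plus the constant slot reserved for its tag. */
typedef struct {
    size_t const_index;
    const char *tag_name;
    const char *markup;
} tag_markup_record_t;

typedef value_t (*parse_tag_fn)(const char *tag_name, const char *markup);

/* Toy parser for the example: uses the name pointer as the "tag object". */
static value_t toy_parse_tag(const char *tag_name, const char *markup)
{
    (void)markup;
    return (value_t)tag_name;
}

/* After loading the constant array, assign each re-parsed tag into its slot. */
static void fill_tag_placeholders(value_t *constants, size_t constants_len,
                                  const tag_markup_record_t *records, size_t records_len,
                                  parse_tag_fn parse_tag)
{
    for (size_t i = 0; i < records_len; i++) {
        size_t index = records[i].const_index;
        assert(index < constants_len);
        assert(constants[index] == NIL_PLACEHOLDER); /* slot must still be empty */
        constants[index] = parse_tag(records[i].tag_name, records[i].markup);
    }
}
```

The cost this sketch makes visible is the extra index stored with every tag markup and the pass over the records at load time.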

Tags are used with a separate instruction, so I think a separate constant table might be appropriate, like how the liquid VM prototype uses a separate table for filters. E.g. this way we could have 255 tag write instructions with a 1-byte operation in a block. If so, this approach is fine for this PR, but I wanted to make sure we are aligned on the expected longer term approach.

Yeah, I don't think the current way is the best way of implementing this. I didn't store tags inside of the constant table because it added complexity (the need to iterate over the constants array and remove all the tags, and then during deserialization update the correct indexes in the constants array). I think in the future we will be storing indexes for constants/tags, which would eliminate the need for both the const_ptr and the tags_ptr.
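
To illustrate the index-based direction discussed in this thread, here is a small sketch of an interpreter loop where the tag write instruction carries a 1-byte index into a separate tag table. The opcodes, the tag_table_t type and the run function are invented for this example and are not taken from liquid-c or the liquid VM prototype.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_LEAVE, OP_WRITE_TAG, OP_JUMP }; /* hypothetical opcodes */

typedef struct {
    const char **tags;  /* separate tag table, addressed by a 1-byte index */
    size_t tags_len;
} tag_table_t;

/* Execute the instructions, recording each written tag in out.
 * Returns how many tags were written. */
static size_t run(const uint8_t *code, const tag_table_t *table,
                  const char **out, size_t out_cap)
{
    const uint8_t *ip = code;
    size_t written = 0;
    for (;;) {
        switch (*ip++) {
        case OP_WRITE_TAG: {
            uint8_t index = *ip++;        /* operand: index into the tag table */
            assert(index < table->tags_len);
            assert(written < out_cap);
            out[written++] = table->tags[index];
            break;
        }
        case OP_JUMP:
            ip = code + *ip;              /* operand: absolute 1-byte target */
            break;
        case OP_LEAVE:
        default:
            return written;
        }
    }
}
```

Because each instruction names its tag by index, a jump over a tag write needs no extra bookkeeping, whereas a moving tags_ptr would have to be advanced past every skipped tag.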

dylanahsmith · 2021-01-05T15:21:32Z

ext/liquid_c/vm_assembler.c

-void vm_assembler_add_write_node(vm_assembler_t *code, VALUE node)
+void vm_assembler_add_write_node(vm_assembler_t *code)
 {
    vm_assembler_write_opcode(code, OP_WRITE_NODE);
-    vm_assembler_write_ruby_constant(code, node);
 }


This function is no longer adding a complete instruction, unlike the other functions with a similar name. Should the tag objects be a part of the vm_assembler_t so that we can write them along with the instruction? Should it be given both the node and the node buffer so it can write to the buffer here? Otherwise, we should just use vm_assembler_write_opcode directly in the caller to make it clear that it is writing part of the instruction, rather than making it seem like it is using a function that writes a whole instruction.

The vm_assembler_t is recycled after freezing the block and the tag objects live longer than that, so I don't think they can be stored there. This function (vm_assembler_add_write_node) is only called from block_body_add_node, which then writes the tag onto the buffer.
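
For reference, one possible reading of the second option raised above (passing both the node and the node buffer) could look like the sketch below. The types instructions_t and node_buffer_t, the fixed capacities and the opcode value are assumptions made for this example; they are not the liquid-c definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t value_t;    /* stand-in for Ruby's VALUE */
enum { OP_WRITE_NODE = 1 };   /* hypothetical opcode value */

/* Hypothetical stand-in for the recycled assembler's instruction bytes. */
typedef struct {
    uint8_t bytes[64];
    size_t len;
} instructions_t;

/* Hypothetical buffer for tag objects that outlives the assembler. */
typedef struct {
    value_t nodes[64];
    size_t len;
} node_buffer_t;

/* Write the opcode and its tag object together, so one call emits the
 * whole instruction even though the two parts live in different buffers. */
static void add_write_node(instructions_t *code, node_buffer_t *tags, value_t node)
{
    assert(code->len < sizeof(code->bytes));
    assert(tags->len < sizeof(tags->nodes) / sizeof(tags->nodes[0]));
    code->bytes[code->len++] = OP_WRITE_NODE;
    tags->nodes[tags->len++] = node;
}
```

This keeps the lifetime split described in the reply, since the node buffer is owned by the caller and only borrowed for the duration of the call.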

ext/liquid_c/vm_assembler.h

ext/liquid_c/document_body.c

Co-Authored-By: Dylan Thacker-Smith <[email protected]>

peterzhu2118 requested a review from dylanahsmith December 16, 2020 21:01

peterzhu2118 mentioned this pull request Dec 16, 2020

Implement serialize #137

Open

dylanahsmith requested a review from macournoyer December 18, 2020 22:30

Base automatically changed from pz-raw-tag-tokenizer to master January 4, 2021 20:42

macournoyer requested changes Jan 5, 2021

macournoyer approved these changes Jan 5, 2021

peterzhu2118 force-pushed the pz-write-tags branch from a15e79f to 2621c37 Compare January 5, 2021 19:13

peterzhu2118 added 3 commits January 7, 2021 11:10

Store tag markup for serialization

1c693ba

Raise an error when child block body is not compiled

3f60dd7

Address comments

8968499

peterzhu2118 force-pushed the pz-write-tags branch from 2621c37 to 8968499 Compare January 7, 2021 16:10

dylanahsmith reviewed Jan 7, 2021

peterzhu2118 and others added 9 commits January 7, 2021 17:56

Remove unnecessary offset fields in tag_markup_header_t

d2cf89a

Co-Authored-By: Dylan Thacker-Smith <[email protected]>

Write directly into the buffer in document_body_write_tag_markup

4c030a9

Co-Authored-By: Dylan Thacker-Smith <[email protected]>

Remove unnecessary offset field in block_body_header_t

8adb792

Co-Authored-By: Dylan Thacker-Smith <[email protected]>

Write directly into the buffer in document_body_write_block_body

6c3f833

Co-Authored-By: Dylan Thacker-Smith <[email protected]>

Store first tag offset and next tag offsets

e8835c7

Co-Authored-By: Dylan Thacker-Smith <[email protected]>

Align tag markup headers

8245feb

Co-Authored-By: Dylan Thacker-Smith <[email protected]>

Rename tags to tag_markups on the vm_assembler_t to avoid confusion

87d9dfd

Co-Authored-By: Dylan Thacker-Smith <[email protected]>

Write line numbers in tag_markup

4f37858

Bind block to first tag only

bc1c705

dylanahsmith requested a review from ggmichaelgo October 21, 2021 13:46

peterzhu2118 commented Dec 16, 2020

macournoyer left a comment

dylanahsmith left a comment

dylanahsmith Dec 18, 2020

peterzhu2118 Jan 7, 2021

dylanahsmith Jan 5, 2021

peterzhu2118 Jan 7, 2021
