wazevo(frontend): simple bounds check elimination on mem access #1883

Merged: mathetake merged 2 commits into main from safeboundelimination on Dec 20, 2023

@mathetake mathetake commented Dec 20, 2023

This patch changes the lowering of memory accesses and introduces a very
simple bounds check elimination that is performed per basic block.

For example, the following assembly, which is extracted from
Zig's memset implementation built without the bulk-memory feature,

      loop ;; label = @2
        local.get 4
        local.get 1
        i32.store8 offset=7
        local.get 4
        local.get 1
        i32.store8 offset=6
        local.get 4
        local.get 1
        i32.store8 offset=5
        local.get 4
        local.get 1
        i32.store8 offset=4
        local.get 4
        local.get 1
        i32.store8 offset=3
        local.get 4
        local.get 1
        i32.store8 offset=2
        local.get 4
        local.get 1
        i32.store8 offset=1
        local.get 4
        local.get 1
...
      end

needs a bounds check only at the first i32.store8 inside the loop, because
the ranges accessed by the later stores are statically known to lie within
the range that the first check already covers.

In short, with this patch the frontend compiler caches the maximum of such
"already checked memory bounds" and uses it to eliminate the entire bounds
check sequence for any later access that stays within that maximum.
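The caching idea can be sketched in Go roughly as follows. This is an illustration only, not the code in this patch: `boundsCache`, `needsCheck`, `reset`, and the use of an integer ID to identify the base address value are all hypothetical names and assumptions made for the example.

```go
package main

import "fmt"

// boundsCache is a hypothetical per-basic-block cache. For each base address
// value (keyed here by an illustrative SSA value ID) it remembers the largest
// end offset (static offset + access size) that has been bounds-checked.
type boundsCache struct {
	checked map[int]uint64
}

func newBoundsCache() *boundsCache {
	return &boundsCache{checked: map[int]uint64{}}
}

// needsCheck reports whether an access of size bytes at the given static
// offset from base value baseID still needs a bounds check. When it does,
// the new end offset is recorded as checked.
func (c *boundsCache) needsCheck(baseID int, offset, size uint64) bool {
	end := offset + size
	if limit, ok := c.checked[baseID]; ok && end <= limit {
		return false // already covered by an earlier check in this block
	}
	c.checked[baseID] = end
	return true
}

// reset drops all cached bounds; in this sketch it would be called whenever
// the frontend starts lowering a new basic block.
func (c *boundsCache) reset() {
	c.checked = map[int]uint64{}
}

func main() {
	c := newBoundsCache()
	emitted := 0
	// Mirror the memset loop body: eight 1-byte stores at offsets 7 down to 0,
	// all relative to the same base value (v35 in the listing).
	for offset := 7; offset >= 0; offset-- {
		if c.needsCheck(35, uint64(offset), 1) {
			emitted++
		}
	}
	fmt.Println("bounds checks emitted:", emitted) // prints "bounds checks emitted: 1"
}
```

Note that in this sketch the order of accesses matters: if the stores were issued with ascending offsets, each one would exceed the cached maximum and would still be checked.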

The example is now lowered to

	v37:i64 = Iconst_64 0x8
	v38:i64 = UExtend v35, 32->64
	v39:i64 = Uload32 module_ctx, 0x10
	v40:i64 = Iadd v38, v37
	v41:i32 = Icmp lt_u, v39, v40
	ExitIfTrue v41, exec_ctx, memory_out_of_bounds
	v42:i64 = Load module_ctx, 0x8
	v43:i64 = Iadd v42, v38
	Istore8 v36, v43, 0x7
	Istore8 v36, v43, 0x6
	Istore8 v36, v43, 0x5
	Istore8 v36, v43, 0x4
	Istore8 v36, v43, 0x3
	Istore8 v36, v43, 0x2
	Istore8 v36, v43, 0x1
	Istore8 v36, v43, 0x0

vs previously

	v37:i64 = Iconst_64 0x8
	v38:i64 = UExtend v35, 32->64
	v39:i64 = Uload32 module_ctx, 0x10
	v40:i64 = Iadd v38, v37
	v41:i32 = Icmp lt_u, v39, v40
	ExitIfTrue v41, exec_ctx, memory_out_of_bounds
	v42:i64 = Load module_ctx, 0x8
	v43:i64 = Iadd v42, v38
	Istore8 v36, v43, 0x7
	v44:i64 = Iconst_64 0x7
	v45:i64 = UExtend v35, 32->64
	v46:i64 = Iadd v45, v44
	v47:i32 = Icmp lt_u, v39, v46
	ExitIfTrue v47, exec_ctx, memory_out_of_bounds
	v48:i64 = Iadd v42, v45
	Istore8 v36, v48, 0x6
	v49:i64 = Iconst_64 0x6
	v50:i64 = UExtend v35, 32->64
	v51:i64 = Iadd v50, v49
	v52:i32 = Icmp lt_u, v39, v51
	ExitIfTrue v52, exec_ctx, memory_out_of_bounds
	v53:i64 = Iadd v42, v50
	Istore8 v36, v53, 0x5
	v54:i64 = Iconst_64 0x5
	v55:i64 = UExtend v35, 32->64
	v56:i64 = Iadd v55, v54
	v57:i32 = Icmp lt_u, v39, v56
	ExitIfTrue v57, exec_ctx, memory_out_of_bounds
	v58:i64 = Iadd v42, v55
	Istore8 v36, v58, 0x4
	v59:i64 = Iconst_64 0x4
	v60:i64 = UExtend v35, 32->64
	v61:i64 = Iadd v60, v59
	v62:i32 = Icmp lt_u, v39, v61
	ExitIfTrue v62, exec_ctx, memory_out_of_bounds
	v63:i64 = Iadd v42, v60
	Istore8 v36, v63, 0x3
	v64:i64 = Iconst_64 0x3
	v65:i64 = UExtend v35, 32->64
	v66:i64 = Iadd v65, v64
	v67:i32 = Icmp lt_u, v39, v66
	ExitIfTrue v67, exec_ctx, memory_out_of_bounds
	v68:i64 = Iadd v42, v65
	Istore8 v36, v68, 0x2
	v69:i64 = Iconst_64 0x2
	v70:i64 = UExtend v35, 32->64
	v71:i64 = Iadd v70, v69
	v72:i32 = Icmp lt_u, v39, v71
	ExitIfTrue v72, exec_ctx, memory_out_of_bounds
	v73:i64 = Iadd v42, v70
	Istore8 v36, v73, 0x1
	v74:i64 = Iconst_64 0x1
	v75:i64 = UExtend v35, 32->64
	v76:i64 = Iadd v75, v74
	v77:i32 = Icmp lt_u, v39, v76
	ExitIfTrue v77, exec_ctx, memory_out_of_bounds
	v78:i64 = Iadd v42, v75
	Istore8 v36, v78, 0x0
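For readers unfamiliar with the IR, the check sequence in the listings (UExtend, Uload32, Iadd, Icmp lt_u, ExitIfTrue) can be read roughly as the Go below. This is a sketch under assumptions: it takes the value loaded from `module_ctx, 0x10` to be the current memory length in bytes, and `checkBounds` and `errMemoryOutOfBounds` are hypothetical names, not identifiers from the patch.

```go
package main

import (
	"errors"
	"fmt"
)

// errMemoryOutOfBounds stands in for the memory_out_of_bounds exit code.
var errMemoryOutOfBounds = errors.New("memory out of bounds")

// checkBounds mirrors one check sequence: widen the 32-bit base to 64 bits
// (UExtend), add the constant end offset (Iadd), and exit if the memory
// length is unsigned-less-than the sum (Icmp lt_u + ExitIfTrue).
func checkBounds(memLen uint64, base uint32, ceil uint64) error {
	end := uint64(base) + ceil // 64-bit add, so a large base cannot wrap around
	if memLen < end {
		return errMemoryOutOfBounds
	}
	return nil
}

func main() {
	const memLen = 65536 // one Wasm page, chosen only for illustration
	fmt.Println(checkBounds(memLen, 65528, 8)) // access ends exactly at memLen
	fmt.Println(checkBounds(memLen, 65529, 8)) // access ends one byte past memLen
}
```

The old lowering repeated this sequence before every store; the new one performs it once with the largest end offset (0x8) and reuses the computed address for the remaining stores.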

As a result, running the entire Zig stdlib gets 1.5x faster and the resulting binary
shrank from 80MB to 65MB. The CoreMark benchmark score also improved
from 12843.565 to 13535.463 on my local run.

As future work, we can extend this beyond a single basic block and make it CFG-aware
to eliminate bounds checks more aggressively. But even this simple version already
improves the baseline considerably!

Signed-off-by: Takeshi Yoneda <[email protected]>

@evacchi evacchi left a comment

cool!

@mathetake mathetake merged commit fa2b2fc into main Dec 20, 2023
55 checks passed
@mathetake mathetake deleted the safeboundelimination branch December 20, 2023 15:51