Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework StringEncOptStack option to use globals #68

Merged

Conversation

antoniofrighetto
Copy link
Collaborator

We should never desire to have the decoded string variable live in a stack-allocated variable; as literals must remain valid throughout the lifetime of the application. This poses the question of why not using StringEncOptGlobal option in the first place. Leveraging StringEncOptStack option allows the strings to be decoded lazily; whereas, using StringEncOptGlobal as a default option would come to the detriment of performance, since all the strings are decoded at load-time. Hence, decode the string in a global variable locally within the function, and do not decode it twice, if it has already been decoded.
Minor opportunity to clean up things here and there too.

src/passes/string-encoding/StringEncoding.cpp Show resolved Hide resolved
%buffer = alloca [13 x i8], align 1
call void @llvm.lifetime.start.p0(i64 13, ptr %buffer)
call void @llvm.memcpy.p0.p0.i64(ptr align 1 %buffer, ptr align 1 @__const.main.Hello, i64 13, i1 false)
%puts = call i32 @puts(ptr %buffer)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test representative? In particular, that there is a buffer existing inline in the original IR?

I had an example like void test() { puts("Hello, Stack"); } in mind, that results in IR like this:

@.str = private unnamed_addr constant [13 x i8] c"Hello, Stack\00", align 1

define void @test() #0 {
  %1 = call i32 @puts(ptr noundef @.str)
  ret void
}

Would the pass apply here as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both tests are fine, the one I added uses an array buffer, yours uses a string literal.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the difference is that the string literal is not a writable location. But ok, I see now that test is just exercising a more complex case. The string literal is the (read-only) input and the array buffer is basically unrelated: The alloca could be optimized away and %5 passed to @puts directly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string literal is not in a writeable location at runtime, the purpose of the pass is exactly to encode constant strings. What I meant above, is that both examples use a constant global variable, which is what we are testing here. The one I added simply involves an extra array buffer instead of directly passing the literal to puts. I agree it's slightly more complex, so I'm removing it and using directly the literal.

@antoniofrighetto antoniofrighetto force-pushed the feature/stackopt-with-gvars branch from 2e07cd3 to 312e45f Compare January 20, 2025 08:47
Copy link
Collaborator

@weliveindetail weliveindetail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. With the refactors stripped away it's more obvious now, that the code always allocates a global writable ClearBuffer as the decode destination. Before the cleanup I didn't see that this is generating the %5 in your test.

I think it's always worth having a test with minimal complexity (as well). It helps understanding things :) Otherwise LGTM

%buffer = alloca [13 x i8], align 1
call void @llvm.lifetime.start.p0(i64 13, ptr %buffer)
call void @llvm.memcpy.p0.p0.i64(ptr align 1 %buffer, ptr align 1 @__const.main.Hello, i64 13, i1 false)
%puts = call i32 @puts(ptr %buffer)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the difference is that the string literal is not a writable location. But ok, I see now that test is just exercising a more complex case. The string literal is the (read-only) input and the array buffer is basically unrelated: The alloca could be optimized away and %5 passed to @puts directly.

We should never desire to have the decoded string variable live in a
stack-allocated variable; as literals must remain valid throughout
the lifetime of the application. This poses the question of why not
using `StringEncOptGlobal` option in the first place. Leveraging
`StringEncOptStack` option allows the strings to be decoded lazily;
whereas, using `StringEncOptGlobal` as a default option would come
to the detriment of performance, since all the strings are decoded
at load-time. Hence, decode the string in a global variable locally
within the function, and do not decode it twice, if it has already
been decoded.
@antoniofrighetto antoniofrighetto force-pushed the feature/stackopt-with-gvars branch from 312e45f to 2ec6f40 Compare January 22, 2025 17:04
@antoniofrighetto antoniofrighetto merged commit 2ec6f40 into open-obfuscator:main Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants