Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Visit::alloc is unsound #8461

Open
overlookmotel opened this issue Jan 13, 2025 · 1 comment
Open

Visit::alloc is unsound #8461

overlookmotel opened this issue Jan 13, 2025 · 1 comment
Labels
A-ast Area - AST A-semantic Area - Semantic C-bug Category - Bug

Comments

@overlookmotel
Copy link
Contributor

overlookmotel commented Jan 13, 2025

#8437 surfaced unsoundness in SemanticBuilder - a use after free bug.

@bluurryy also hit this previously while working on #6668 (discussed on Discord).

The root cause of that problem is the lifetime extension in Visit::alloc (which all the walk_* methods call to create AstKinds).

fn alloc<T>(&self, t: &T) -> &'a T {
// SAFETY:
// This should be safe as long as `src` is an reference from the allocator.
// But honestly, I'm not really sure if this is safe.
unsafe { std::mem::transmute(t) }
}

The problem

Looking at how the lifetime extension causes #8437:

SemanticBuilder::build takes a &Program<'a> (unspecified lifetime on the & borrow of Program) and returns SemanticBuilderReturn<'a>:

pub fn build(mut self, program: &Program<'a>) -> SemanticBuilderReturn<'a> {

Tracing the effect of this through the different types:

  • SemanticBuilderReturn<'a> contains Semantic<'a>.
  • Semantic<'a> contains AstNodes<'a>.
  • AstNodes<'a> contains a Vec of AstNode<'a>.
  • AstNode<'a> contains AstKind<'a>.
  • AstKind<'a> contains &'a refs to AST nodes (including &'a Program<'a>).

pub struct SemanticBuilderReturn<'a> {
pub semantic: Semantic<'a>,
pub errors: Vec<OxcDiagnostic>,
}

pub struct Semantic<'a> {
/// Source code of the JavaScript/TypeScript program being analyzed.
source_text: &'a str,
/// What kind of source code is being analyzed. Comes from the parser.
source_type: SourceType,
/// The Abstract Syntax Tree (AST) nodes.
nodes: AstNodes<'a>,

pub struct AstNodes<'a> {
/// The root node should always point to a `Program`, which is the real
/// root of the tree. It isn't possible to statically check for this, so
/// users should beware.
root: Option<NodeId>,
nodes: IndexVec<NodeId, AstNode<'a>>,
/// `node` -> `parent`
parent_ids: IndexVec<NodeId, Option<NodeId>>,
}

pub struct AstNode<'a> {
id: NodeId,
/// A pointer to the ast node, which resides in the `bumpalo` memory arena.
kind: AstKind<'a>,

pub enum AstKind<'a> {
BooleanLiteral(&'a BooleanLiteral),
NullLiteral(&'a NullLiteral),
NumericLiteral(&'a NumericLiteral<'a>),
StringLiteral(&'a StringLiteral<'a>),
BigIntLiteral(&'a BigIntLiteral<'a>),
RegExpLiteral(&'a RegExpLiteral<'a>),
Program(&'a Program<'a>),
IdentifierName(&'a IdentifierName<'a>),
IdentifierReference(&'a IdentifierReference<'a>),
BindingIdentifier(&'a BindingIdentifier<'a>),
LabelIdentifier(&'a LabelIdentifier<'a>),
ThisExpression(&'a ThisExpression),
ArrayExpression(&'a ArrayExpression<'a>),
ArrayExpressionElement(&'a ArrayExpressionElement<'a>),
Elision(&'a Elision),
ObjectExpression(&'a ObjectExpression<'a>),
ObjectProperty(&'a ObjectProperty<'a>),
PropertyKey(&'a PropertyKey<'a>),
TemplateLiteral(&'a TemplateLiteral<'a>),
TaggedTemplateExpression(&'a TaggedTemplateExpression<'a>),
MemberExpression(&'a MemberExpression<'a>),
CallExpression(&'a CallExpression<'a>),
NewExpression(&'a NewExpression<'a>),
MetaProperty(&'a MetaProperty<'a>),
SpreadElement(&'a SpreadElement<'a>),
Argument(&'a Argument<'a>),
UpdateExpression(&'a UpdateExpression<'a>),
UnaryExpression(&'a UnaryExpression<'a>),
BinaryExpression(&'a BinaryExpression<'a>),
PrivateInExpression(&'a PrivateInExpression<'a>),
LogicalExpression(&'a LogicalExpression<'a>),
ConditionalExpression(&'a ConditionalExpression<'a>),
AssignmentExpression(&'a AssignmentExpression<'a>),
AssignmentTarget(&'a AssignmentTarget<'a>),
SimpleAssignmentTarget(&'a SimpleAssignmentTarget<'a>),
AssignmentTargetPattern(&'a AssignmentTargetPattern<'a>),
ArrayAssignmentTarget(&'a ArrayAssignmentTarget<'a>),
ObjectAssignmentTarget(&'a ObjectAssignmentTarget<'a>),
AssignmentTargetWithDefault(&'a AssignmentTargetWithDefault<'a>),
SequenceExpression(&'a SequenceExpression<'a>),
Super(&'a Super),
AwaitExpression(&'a AwaitExpression<'a>),
ChainExpression(&'a ChainExpression<'a>),
ParenthesizedExpression(&'a ParenthesizedExpression<'a>),
Directive(&'a Directive<'a>),
Hashbang(&'a Hashbang<'a>),
BlockStatement(&'a BlockStatement<'a>),
VariableDeclaration(&'a VariableDeclaration<'a>),
VariableDeclarator(&'a VariableDeclarator<'a>),
EmptyStatement(&'a EmptyStatement),
ExpressionStatement(&'a ExpressionStatement<'a>),
IfStatement(&'a IfStatement<'a>),
DoWhileStatement(&'a DoWhileStatement<'a>),
WhileStatement(&'a WhileStatement<'a>),
ForStatement(&'a ForStatement<'a>),
ForStatementInit(&'a ForStatementInit<'a>),
ForInStatement(&'a ForInStatement<'a>),
ForOfStatement(&'a ForOfStatement<'a>),
ContinueStatement(&'a ContinueStatement<'a>),
BreakStatement(&'a BreakStatement<'a>),
ReturnStatement(&'a ReturnStatement<'a>),
WithStatement(&'a WithStatement<'a>),
SwitchStatement(&'a SwitchStatement<'a>),
SwitchCase(&'a SwitchCase<'a>),
LabeledStatement(&'a LabeledStatement<'a>),
ThrowStatement(&'a ThrowStatement<'a>),
TryStatement(&'a TryStatement<'a>),
CatchClause(&'a CatchClause<'a>),
CatchParameter(&'a CatchParameter<'a>),
DebuggerStatement(&'a DebuggerStatement),
AssignmentPattern(&'a AssignmentPattern<'a>),
ObjectPattern(&'a ObjectPattern<'a>),
ArrayPattern(&'a ArrayPattern<'a>),
BindingRestElement(&'a BindingRestElement<'a>),
Function(&'a Function<'a>),
FormalParameters(&'a FormalParameters<'a>),
FormalParameter(&'a FormalParameter<'a>),
FunctionBody(&'a FunctionBody<'a>),
ArrowFunctionExpression(&'a ArrowFunctionExpression<'a>),
YieldExpression(&'a YieldExpression<'a>),
Class(&'a Class<'a>),
ClassBody(&'a ClassBody<'a>),
MethodDefinition(&'a MethodDefinition<'a>),
PropertyDefinition(&'a PropertyDefinition<'a>),
PrivateIdentifier(&'a PrivateIdentifier<'a>),
StaticBlock(&'a StaticBlock<'a>),
ModuleDeclaration(&'a ModuleDeclaration<'a>),
ImportExpression(&'a ImportExpression<'a>),
ImportDeclaration(&'a ImportDeclaration<'a>),
ImportSpecifier(&'a ImportSpecifier<'a>),
ImportDefaultSpecifier(&'a ImportDefaultSpecifier<'a>),
ImportNamespaceSpecifier(&'a ImportNamespaceSpecifier<'a>),
ExportNamedDeclaration(&'a ExportNamedDeclaration<'a>),
ExportDefaultDeclaration(&'a ExportDefaultDeclaration<'a>),
ExportAllDeclaration(&'a ExportAllDeclaration<'a>),
ExportSpecifier(&'a ExportSpecifier<'a>),
TSThisParameter(&'a TSThisParameter<'a>),
TSEnumDeclaration(&'a TSEnumDeclaration<'a>),
TSEnumMember(&'a TSEnumMember<'a>),
TSTypeAnnotation(&'a TSTypeAnnotation<'a>),
TSLiteralType(&'a TSLiteralType<'a>),
TSConditionalType(&'a TSConditionalType<'a>),
TSUnionType(&'a TSUnionType<'a>),
TSIntersectionType(&'a TSIntersectionType<'a>),
TSParenthesizedType(&'a TSParenthesizedType<'a>),
TSIndexedAccessType(&'a TSIndexedAccessType<'a>),
TSNamedTupleMember(&'a TSNamedTupleMember<'a>),
TSAnyKeyword(&'a TSAnyKeyword),
TSStringKeyword(&'a TSStringKeyword),
TSBooleanKeyword(&'a TSBooleanKeyword),
TSNumberKeyword(&'a TSNumberKeyword),
TSNeverKeyword(&'a TSNeverKeyword),
TSIntrinsicKeyword(&'a TSIntrinsicKeyword),
TSUnknownKeyword(&'a TSUnknownKeyword),
TSNullKeyword(&'a TSNullKeyword),
TSUndefinedKeyword(&'a TSUndefinedKeyword),
TSVoidKeyword(&'a TSVoidKeyword),
TSSymbolKeyword(&'a TSSymbolKeyword),
TSThisType(&'a TSThisType),
TSObjectKeyword(&'a TSObjectKeyword),
TSBigIntKeyword(&'a TSBigIntKeyword),
TSTypeReference(&'a TSTypeReference<'a>),
TSTypeName(&'a TSTypeName<'a>),
TSQualifiedName(&'a TSQualifiedName<'a>),
TSTypeParameterInstantiation(&'a TSTypeParameterInstantiation<'a>),
TSTypeParameter(&'a TSTypeParameter<'a>),
TSTypeParameterDeclaration(&'a TSTypeParameterDeclaration<'a>),
TSTypeAliasDeclaration(&'a TSTypeAliasDeclaration<'a>),
TSClassImplements(&'a TSClassImplements<'a>),
TSInterfaceDeclaration(&'a TSInterfaceDeclaration<'a>),
TSPropertySignature(&'a TSPropertySignature<'a>),
TSMethodSignature(&'a TSMethodSignature<'a>),
TSConstructSignatureDeclaration(&'a TSConstructSignatureDeclaration<'a>),
TSInterfaceHeritage(&'a TSInterfaceHeritage<'a>),
TSModuleDeclaration(&'a TSModuleDeclaration<'a>),
TSModuleBlock(&'a TSModuleBlock<'a>),
TSTypeLiteral(&'a TSTypeLiteral<'a>),
TSInferType(&'a TSInferType<'a>),
TSTypeQuery(&'a TSTypeQuery<'a>),
TSImportType(&'a TSImportType<'a>),
TSMappedType(&'a TSMappedType<'a>),
TSTemplateLiteralType(&'a TSTemplateLiteralType<'a>),
TSAsExpression(&'a TSAsExpression<'a>),
TSSatisfiesExpression(&'a TSSatisfiesExpression<'a>),
TSTypeAssertion(&'a TSTypeAssertion<'a>),
TSImportEqualsDeclaration(&'a TSImportEqualsDeclaration<'a>),
TSModuleReference(&'a TSModuleReference<'a>),
TSExternalModuleReference(&'a TSExternalModuleReference<'a>),
TSNonNullExpression(&'a TSNonNullExpression<'a>),
Decorator(&'a Decorator<'a>),
TSExportAssignment(&'a TSExportAssignment<'a>),
TSInstantiationExpression(&'a TSInstantiationExpression<'a>),
JSXElement(&'a JSXElement<'a>),
JSXOpeningElement(&'a JSXOpeningElement<'a>),
JSXClosingElement(&'a JSXClosingElement<'a>),
JSXFragment(&'a JSXFragment<'a>),
JSXElementName(&'a JSXElementName<'a>),
JSXNamespacedName(&'a JSXNamespacedName<'a>),
JSXMemberExpression(&'a JSXMemberExpression<'a>),
JSXMemberExpressionObject(&'a JSXMemberExpressionObject<'a>),
JSXExpressionContainer(&'a JSXExpressionContainer<'a>),
JSXAttributeItem(&'a JSXAttributeItem<'a>),
JSXSpreadAttribute(&'a JSXSpreadAttribute<'a>),
JSXIdentifier(&'a JSXIdentifier<'a>),
JSXText(&'a JSXText<'a>),
}

So the end result is that SemanticBuilder::build borrows the Program only for the duration of build (i.e. the borrow ends when build returns). But it's returning SemanticBuilderReturn<'a> which ultimately contains a &'a Program<'a>. i.e. the &Program in input params only guarantees the Program lives for 'very_short_time but then it returns a reference &'a Program which it claims will live as long as the allocator.

So nothing stops you dropping the Program while retaining a reference to it = use after free.

Solution for SemanticBuilder

In the case of SemanticBuilder, I think the fix is fairly simple. We can make SemanticBuilder::build take a &'a Program<'a>. This makes the lifetime extension in Visit::alloc valid, and removes the unsoundness.

See #8437 (comment).

Broader solution

The problem is wider than just SemanticBuilder, though. Visit::alloc is, in general, unsound. And that means Visit::enter_node and Visit::leave_node are unsound too.

The question is how to fix it in a way that's ergonomic.

The "correct" approach is to add a 2nd lifetime to AstKind:

/// `'a` is lifetime of the AST nodes.
/// `'r` is lifetime of the *references* to the AST nodes.
enum AstKind<'a, 'r> {
    BooleanLiteral(&'r BooleanLiteral),
    NullLiteral(&'r NullLiteral),
    NumericLiteral(&'r NumericLiteral<'a>),
    // ...
}

But then this probably also requires a 2nd lifetime on Visit:

pub trait Visit<'a, 'r>: Sized {
    fn enter_node(&mut self, kind: AstKind<'a, 'r>) {}
    fn leave_node(&mut self, kind: AstKind<'a, 'r>) {}
    // ...
}

or maybe we can avoid that by putting the lifetime on Visit::enter_node instead:

pub trait Visit<'a>: Sized {
    fn enter_node<'r>(&'r mut self, kind: AstKind<'a, 'r>) {}
    fn leave_node<'r>(&'r mut self, kind: AstKind<'a, 'r>) {}
    // ...
}

Either way, this is not at all ideal. In the linter (main user of AstKind), the current single-lifetime AstKind<'a> is fine, as linter immutably borrows the whole AST for the duration of it's operation, and introducing a 2nd lifetime complicates all that code for no gain.

Adding a 2nd lifetime to Visit would be really annoying. We use Visit in a lot of places, and most of them don't use AstKind or enter_node / leave_node so, again, it's a complication for no gain.

I'm not entirely sure of solution, but maybe we could:

  • Make AstKind a linter-only thing.
  • Remove Visit::enter_node + leave_node. Or make them receive an AstType instead of AstKind, like VisitMut::enter_node does - AstType has no lifetime at all.
  • SemanticBuilder can create AstKinds itself, and handle lifetimes correctly internally.

In any case, SemanticBuilder should not be using enter_node / leave_node because it hurts performance (oxc-project/backlog#72). But that refactor will be a fairly large task.

Battle plan

I suggest:

  • Solve ast: Out of bounds memory access crash in AST Vititor #8437 first with simple fix of making SemanticBuilder::build take a &'a Program<'a>.
  • Tackle the broader problem later.
  • In meantime, try to remove use of enter_node from the codebase as much as possible (it's not used much, and most usages outside of SemanticBuilder can be removed with simple refactoring).
@overlookmotel
Copy link
Contributor Author

I think I may have a solution.

  • Visit does need a 2nd lifetime.
  • So does AstKind.
  • But we can rename AstKind to AstKindRef, and have AstKind as a type alias with only 1 lifetime (so it's same as now).
  • Compiler is fine with omitting lifetimes in most cases, so the extra lifetime on Visit doesn't create too much noise.
/// 'a = AST
/// 'r = References
trait Visit<'a, 'r> {
    fn visit_program(&mut self, program: &'r Program<'a>) {
        self.enter_node(AstKindRef::Program(program));
        // ...
        self.leave_node(AstKindRef::Program(program));
    }

    fn enter_node(&mut self, kind: AstKindRef<'a, 'r>) {}
    fn leave_node(&mut self, kind: AstKindRef<'a, 'r>) {}
}

enum AstKindRef<'a, 'r> {
    Program(&'r Program<'a>),
    // ...
}

type AstKind<'a> = AstKindRef<'a, 'a>;

Most visitors remain exactly the same as before, except for an extra '_ lifetime in the impl line:

impl<'a> Visit<'a, '_> for MyVisitor<'a> { // <-- `'_` is new
    fn visit_program(&mut self, program: &Program<'a>) { // <-- `'r` lifetime on `&Program<'a>` omitted
        // ...
    }
}

SemanticBuilder can implement Visit<'a, 'a> (i.e. 'a and 'r lifetimes are the same):

struct SemanticBuilder<'a> {
    // Actually this is wrapped inside `AstNodes`, but principle is the same
    kinds: Vec<AstKind<'a>>,
}

impl<'a> Visit<'a, 'a> for SemanticBuilder<'a> { // <-- `Visit<'a, 'a>`
    fn enter_node(&mut self, kind: AstKind<'a>) { // <-- Because 'r = 'a, can use `AstKind<'a>` here
        self.kinds.push(kind);
    }
}

Visit::alloc is no longer needed, and can be removed. Everything now works as intended without any unsafe code.

overlookmotel added a commit that referenced this issue Jan 16, 2025
…f `AstKind`s (#8522)

This lint rule keeps a stack tracing the "visitation path" during `Visit`. Only the type of the nodes is used, not their values, so store only `AstType` (1 byte) rather than `AstKind` (16 bytes).

This should be more performant, but the main motivation is #8461. This is one of very few places in the codebase which uses `enter_node` and `leave_node`. Not storing `AstKind`s here clears the way to remove the unsound lifetime extension in `Visit::alloc`.
overlookmotel added a commit that referenced this issue Jan 16, 2025
Use `Visit::visit_*` method instead of `Visit::enter_node` / `leave_node`. A step towards solving #8461, but also may improve performance.
overlookmotel added a commit that referenced this issue Jan 16, 2025
…de` usage (#8538)

Use `Visit::visit_*` method instead of `Visit::enter_node` / `leave_node`. A step towards solving #8461, but also may improve performance.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ast Area - AST A-semantic Area - Semantic C-bug Category - Bug
Projects
None yet
Development

No branches or pull requests

1 participant