Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected Access form control supertype names #5881

Open
MDoerner opened this issue Oct 11, 2021 · 12 comments
Open

Unexpected Access form control supertype names #5881

MDoerner opened this issue Oct 11, 2021 · 12 comments
Labels
bug Identifies work items for known bugs

Comments

@MDoerner
Copy link
Contributor

Rubberduck version information

Version 2.5.2.5994
OS: Microsoft Windows NT 10.0.18363.0, x64
Host Product: Microsoft Office x64
Host Version: 16.0.13801.20960
Host Executable: MSACCESS.EXE

Description
As observed in issue #5782, the supertypes of (some) Access form controls we read in end in a dot, which previously failed the entire resolution. This probably indicates that the supertypes are not read correctly. The final part of the type seems to be missing for some reason.

To Reproduce

TBD -> Still need to figure out under which exact circumstances this happens.

Expected behavior
Read sensible supertypes that can be resolved.

Logfile
Extract provide in the original bug.

5994;Rubberduck.Parsing.Common.ParsingStageTimer;Added supertypes for document modules in 7ms.;
2021-10-11 08:30:07.4267;WARN-2.5.2.5994;Rubberduck.Parsing.VBA.ReferenceManagement.CompilationPasses.TypeHierarchyPass;Failed to resolve interface _Report_rptPCproduct_currentFY.;
@MDoerner MDoerner added the bug Identifies work items for known bugs label Oct 11, 2021
@bclothier
Copy link
Contributor

Having reproduced the error and examining the logs a bit more, I do not think the problem is with reading it correctly but rather introducing extraneous interfaces that do not in fact exist.

First, let's review the OLE view of the Access Form; there are 3 coclasses for a form.

coclass Form {
    [default] interface _Form3;
    [source] interface _FormEvents;
    [default, source] dispinterface _FormEvents2;
};
...
coclass FormOldV10 {
    [default] interface _Form2;
    [default, source] interface _FormEvents;
};
...
coclass FormOld {
    [default] interface _Form;
    [default, source] interface _FormEvents;
};

Next, let's do some tests via the VBA immediate window:

All those will return true.

?TypeOf Form_Inventory Is Form_Inventory
?TypeOf Form_Inventory Is Access.Form
?TypeOf Form_Inventory Is Access.FormOldV10
?TypeOf Form_Inventory Is Access.FormOld
?TypeOf Form_Inventory Is Access.[_Form]
?TypeOf Form_Inventory Is Access.[_Form2]
?TypeOf Form_Inventory Is Access.[_Form3]

This does fail with a compile error as the interfaces does not exist:

?TypeOf Form_Inventory Is [Form3]
?TypeOf Form_Inventory Is [Form2]

This also fails with a compile error, even though it is available via the typelib API. VBA may not expose this hidden interface:

?TypeOf Form_Inventory Is [_Form_Inventory]

When the type hierarchy pass runs, we have 9 classModule.SupertypeNames:
image

But only 3 uniques. I should note that for other forms, the count can vary (most seem to only have 6, not sure why this case, there was a 3rd set listed). Note that Form3 is listed even though it will not exist as a valid type in VBA (we can only use Form or [_Form3]).

Based on what I observed, the failures occurs with the Form3 and the _<specific form name> that are added to the supertypeNames incorrectly. I believe to fix this, the type needs to be cross-referenced with the coclasses in order to infer that _Form3 could reference Access.Form (which is the typical coclass used in VBA code). For the _Form_Inventory, I do not think it should be considered given that it cannot be referenced in VBA code at all. The Form_Inventory should be the only possible candidate.

Do you concur with the proposed fix?

@MDoerner
Copy link
Contributor Author

MDoerner commented Oct 12, 2021

I think Form3 is there because we generally add a version without the leading underscore. As far as I remember, this is necessary in Excel to get the supertype names Worksheet and Workbook, which do not exist in the type lib. In the Excel OM these seem to aliases [_Worksheet] and [_Workbook] respectively.

@MDoerner
Copy link
Contributor Author

Thinking again about the original issue this bug is a follow up to, I guess the real problem is that Access forms have rather lax naming restrictions.

Is whitespace allowed in the name or special characters? If yes, what does the specific supertype look like?

@bclothier
Copy link
Contributor

Yes.

I could create a form with this name, comparing the VBA's name with the name as seen via Access UI. I aligned the name for easy comparison, so ignore the leading spaces for the 2nd line.

?[Form_Inventory {},~@#$%&*()-_+=|\}"':?/><].Name
       Inventory {},~@#$%&*()-_+=|\}"':?/><

Access does not allow the following characters in a name. I only considered non-alphanumeric characters that I can type on a US keyboard:

[].!

Just to ensure everything agrees on how it's named, here's how we receive it from the TypeLib API:
image

The ProgID is not bracketed which seems questionable.

Looking at its implemented interface:
image

Again, ProgID isn't bracketed, contains a path, which doesn't feel right. I don't think we are using it but just wanted to point it out because there it contains a dot from the file path and is not bracketed.

When we then run the ReferenceResolverRunnerBase, we do get this:
image

We are able to resolve the supertypes apparently within the DocumentModuleSuperTypeNamesProvider but for some reason, the spaces gets stripped:
image

The first time, I did not hit the warning in the TypeHierarchyPass because there were no reference to the form in the project. Trying again with a single reference:

Private Sub Form_Current()
    Dim x As [Form_Inventory {},~@#$%&*()-_+=|\}"':?/><]
    Set x = Me
End Sub

But even so, that did not hit the warn log for that form.

@bclothier
Copy link
Contributor

bclothier commented Oct 12, 2021

Addendum:

The controls has different rules than forms. I did not do a full analysis but for quick reference:

?[Form_Inventory {},~@#$%&*()-_+=|\}"':?/><].Inventory_____________________________.Name
                                             Inventory {},~@#$%&*()-_+=|\}"':?/><.!

Note that in control's case, all characters are replaced with _, and the control name allows .! which is more lax than form's name. The [] still were not allowed.

@bclothier
Copy link
Contributor

bclothier commented Oct 12, 2021

An addendum to the rules used by the control naming....

If the name leads to ambiguous name in VBA, the name will be rejected. Thus, one cannot simply tranpose some characters in the name because that would result in the same VBA name which is already used. For example, a form cannot have controls with name Inventory {},~@#$%&*()-_+=|\}"':?/><.! and Inventory {},~@#$%&*()-_+=|\}"':?/><.. or Inventory {},~@#$%&*()-_+=|\}"':?/><!. (note the last 2 characters). Those 3 examples would have the same VBA name of Inventory_____________________________; only one such name could be allowed on the form.

Here are how the control is reported via the TypeLib API. Those exists on the form's hidden interface as a FuncDesc and we have 3 of them.
image
image
image

Though in VBA, the sanitized version will be only usable, the TypeLib API does provide the information and it may be possible that we can infer that 2 property getters refer to the same because they share the same pointer to the underlying FUNCDESC even though they have different names and member IDs (further testing show that this is not a valid assumption and may have been a mere concidence).

With the control added, the form Form_Inventory {},~@#$%&*()-_+=|\}"':?/>< did finally hit the warn log in the TypeHierarchyPass.

@bclothier
Copy link
Contributor

bclothier commented Oct 13, 2021

Looking at the issue itself, it seems to break down in the TypeHierarchyPass because we use an expression parser, which does not recognize spaces as a part of the identifier name. As an experiment, I tried bracketing the identifiers with spaces when both adding supertypes via ReferenceResolverRunnerBase and TypeHierarchyPass. However, that did not resolve the expression. It’s possible I’m missing something, though.

Addendum:

It appears that the hidden interfaces (e.g. _Form_...) are not added to the declaration finder, and as a consequence, it still fails to resolve even if we bracket the name (which do get de-bracketed as part of the resolution process).

I also wonder about the fact that the hidden interface are reported to be in another project (see screenshot in post 4 above where you can see the filepath to the accdb is used as a project rather than the actual VBA project's name) is causing problems.

@bclothier
Copy link
Contributor

From a test, I could determine that if a declaration was created for the hidden interface, then bracketing the supertype name would enable the resolution to succeed. In my test, I used ClassModuleDeclaration to "declare" the hidden interface but that would not be a proper use of the declarations. Thus, question then become:

  1. Do we want to add hidden interfaces as a declaration? For instance, we could create HiddenTypeDeclaration and stick those types which would then allow the resolver to fully resolve all the expressions without requiring changing the code or using some other kind of parser. Since it'd be a new type of declaration, it should not be picked up by existing code that works with ClassModuleDeclaration objects.

  2. Do we want to avoid using the expression parser to resolve the reference? Currently we require it because AddSuperType needs a declaration, and expression parser is a convenient way of generating a declaration from a string (SuperTypeNames). As an alternative, we could just create a declaration by directly newing it up, but it would then be not in the finder and would have no other references.

  3. We can consider adding a test if we fail to resolve, to try and resolve again without the leading _ in the name. That may still require bracketing the name but would not require any new declaration and would just reference the actual object rather than the hidden interface. This is probably the most minimal change to the code but this is dealing with only one specific edge cases. We have no way of knowing whether other hosts will behave the same way and may still fail to resolve for different cases.

Thoughts?

@retailcoder
Copy link
Member

Adding a new DeclarationType would work, but IIRC that flag enum is very near capacity already (there's only so many bits to offset in an Int32).

If hidden interfaces give us access to members we couldn't resolve before, then I'm all for it! Not giving it the ClassModule flag should effectively mitigate risks of regression, but Max would likely have deeper insights on that matter.

In my mind the decision hinges on what we're looking at - if having these types in the finder grants Rubberduck a better, more complete view of user modules, then I think it's worth having them in the resolver pipeline. OTOH if the idea is just to get the resolver to not choke and explode under certain circumstances, then the less-disruptive option 3 would be it.

@MDoerner
Copy link
Contributor Author

First of all, the unexpected supertype names no longer make the resolver explode. It simply issues a warning and does not add the supertype.

Option 2 is out of the question. The entire point of the TypeHierarchyPass is to connect classes with their supertypes. Adding a new declaration here would either duplication in our state with partial information on each instance, provided we actually make the effort to generate the right declaration, or add a disfunctional stub.

Anyway, we really need the parser here because we can have multi-part typenames like Excel.Workbook.

I think a proper solution to deal with interfaces in the typelib, but hidden by the VBE, is none of the options provided.

Rather, I think what we should do here is the following, if we really want to resolve the hidden interfaces.
We should add the declarations of these interfaces from the typelib as part of the parsing process. This requires figuring out which modules are available via the typelib and not via the VBE, enhancing theUserProjectsFromComProjectsLoader to be able to refresh selected modules. (The modules loaded that way are marked as not user defined, so that they do not interfere with actions concerning user defined modules, like refactorings and inspections.)
Then, the information what was unloaded or loaded needs to be enhanced to deal with the fact that we load only some declarations from some COM projects now.
In addition, we need to figure out how to best determine whether the hidden modules have changed, i.e. whether old ones have been deleted, new ones have been added or relevant data has been changed. (Currently, the assumption is that no relevant data changes while something is loaded from there; such projects are treated as references.)
Finally, we need to enhance the supertype resolution to use brackets appropriately, if non-identifier characters are present in a type name.

@bclothier
Copy link
Contributor

Adding a new type shouldn't be a problem because the DeclarationType is already a long and we are at 35th bit so we exceeded the 32nd bit a while ago. :-)

The hidden interface is the one that provides us with definitions of controls on a form. We would need to use that hidden interface to additionally determine whether a procedure declared on a form is an event handler.

However, from a user's POV, we want to show it in the toolbar as [Form_Silly Name].Silly_TextBox, rather than [_Form_Silly Name].Silly_TextBox. Therefore, we'd need to merge all the members from the original "interface" and the hidden interface and expose them as a children of the DocumentClassModule declaration. Not sure if we already have a mechanism for doing that in which case should be used if possible. Otherwise, we'll have to add a step somewhere in the resolving process to unify those members under the exposed declarations. We need to also think about the declaration finder as this can affect the expectations when trying to find a related declaration for a document module.

In addition, we need to figure out how to best determine whether the hidden modules have changed, i.e. whether old ones have been deleted, new ones have been added or relevant data has been changed.

Unfortunately this would work out poorly. As an example, if I rename a control on the form, there is no change in VBE's POV but the hidden interface would be now different and that needs to be refreshed. I would suggest that for simplicity, the type lib information should never be cached especially for the current project. Otherwise, the resolver will not be able to reliably give out current information about a document module.

Finally, we need to enhance the supertype resolution to use brackets appropriately, if non-identifier characters are present in a type name.

The issue is that currently the hidden interfaces don't even get declared so adding brackets wouldn't have helped. At least, it would only help resolving the non-hidden interface to the document class module. We can provide a trivial fix to bracket the interface name which would reduce the warnings and allow resolution of forms with spaces in their name to their public interfaces.

@bclothier
Copy link
Contributor

Just to take some notes based on the PR #5888 that may be relevant for full resolution of hidden interfaces.

The supertypes are also provided from the Implements statement, handled within DeclarationSymbolsListener. This is deferred until the resolution in TypeHierarchyPass and for that reason, the GetText() is used there. That made me wonder whether it was possible to implements a poorly named module. Thankfully, VBA does not seem to permit names like those:

Implements [Bad Names].[Silly Name]
Implements [Silly Name]
Implements [VBAProject].[Foo]
Implements [Foo]

In all those cases, the line will be highlighted red and be a compiler error. VBA will try to strip away the brackets. Thus it is impossible to Implements a poorly named module or even to bracket a legal identifier. Thus, we do not have to worry about bad names come through from the Implements statement. We only need to worry about it from ReferenceResolveRunnerBase.AddSuperTypeNamesForDocumentModules.

Incidentally, that made me wonder if the semantics for the Implements statement was different in regards to the identifier so I consulted MS-VBAL 5.2.4.2 which actually says nothing about the identifier itself. However, I noticed something else tangential and was reminded of @retailcoder 's odyssey with implementing the Excel worksheet interface:

An cannot occur within an extension module.

The extension module is defined in MS-VBAL 4.2.1. The term open host project is defined in MS-VBAL 4.1:

An open host project is one to which additional modules can be added to it by agents other than the host application. The means of designating an open host project and of adding modules to one is implementation defined.

I'm still unclear whether Excel qualifies as a "open host module" because it's not clear what qualifies as an "agent". Still, given that an extension module are supposed to have the same name as extensible module, it does implies that it would not implement a hidden interface, which is the case. In contrast, Access's document module are probably not extension modules since they simply implements Access' objects and provide their own hidden interfaces. This would explain why Access' document modules can contain Implements statements whereas Excel's document modules cannot (or rather, it could but behaves poorly as a consequence). Oddly, though, Excel or VBA won't give a compile-time error when one attempts to put in an Implements statement in such document module whereas the MS-VBAL implies that it should not be a valid thing to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Identifies work items for known bugs
Projects
None yet
Development

No branches or pull requests

3 participants