-
Notifications
You must be signed in to change notification settings - Fork 169
Improve performance when fetching Decls. #679
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The current implementation in clangsharp_Cursor_getDecl to get a Decl is an N^2 algorithm; getting a Decl for index X means looping through all the previous indices 0-(X-1) first. This is rather slow when dealing with hundreds of thousands of Decl instances, here's a screenshot from Instruments showing 74% of the time spent in clangsharp_Cursor_getDecl: <img width="921" height="348" alt="Screenshot 2026-01-07 at 12 47 26" src="https://github.com/user-attachments/assets/c96d5611-683e-4f0d-9878-cc551ba60c71" /> I had a few observations: 1. In my use case, I'll always iterate over all the Decls (once I start iterating). 2. There doesn't seem to be a way in native code to get the Decl given an index. 3. Storing the C++ iterator in C# to keep iterating on it is beyond my C++ knowledge (in particular any iterator lifetime management didn't look trivial). So I implemented a way to get all the Decls and store them into a C# array directly, so now the algorithm is 2N instead (iterate once to count the number of Decls, iterate again to get them). After this change, getting the Decls now takes up 0.0% (!): <img width="930" height="400" alt="Screenshot 2026-01-07 at 12 47 14" src="https://github.com/user-attachments/assets/b40ac6d3-abb3-47c7-977c-884015b617ce" /> Also note the total time of the profiling session: 1.30 min vs 3.38 min.
0ad2f48 to
d2dfe56
Compare
| _attrs = LazyList.Create<Attr>(Handle.NumAttrs, (i) => TranslationUnit.GetOrCreate<Attr>(Handle.GetAttr(unchecked((uint)i)))); | ||
| _body = new ValueLazy<Stmt?>(() => !Handle.Body.IsNull ? TranslationUnit.GetOrCreate<Stmt>(Handle.Body) : null); | ||
| _canonicalDecl = new ValueLazy<Decl>(() => TranslationUnit.GetOrCreate<Decl>(Handle.CanonicalCursor)); | ||
| _decls = LazyList.Create<Decl>(Handle.NumDecls, (i) => TranslationUnit.GetOrCreate<Decl>(Handle.GetDecl(unchecked((uint)i)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW counting the decls (i.e. calling Handle.NumDecls) also shows up in the profile, although not nearly as prominent, so I also considered lazy-loading the count of LazyList instead of having to specify it up-front.
| [Test] | ||
| public void AttrKindSpelling() | ||
| { | ||
| AssertNeedNewClangSharp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new native libClangSharp method makes a lot of tests fail because the new code is in a rather fundamental part of ClangSharp, not sure how to handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The simplest thing would be to use NativeLibrary.GetExport to try and resolve the new GetDecls API and fallback to the old logic if not present.
I can then fix that up to directly use the new API when I get the new libClangSharp published.
In this particular case, the int i = 0;
CXCursor currentDecl = Handle.GetDecl(i);
do
{
yield return TranslationUnit.GetOrCreate<Decl>(currentDecl);
currentDecl = currentDecl.NextDeclInContext;
}
while (!currentDecl.IsNull);If This would allow the implementation to be something like this: _decls = LazyList.Create<Decl>(Handle.NumDecls, (i, initializedDecls) => {
if (i == 0)
{
return Handle.GetDecl(0);
}
else
{
return initializedDecls[^1].NextDeclInContext;
}
}
); |
The current implementation in clangsharp_Cursor_getDecl to get a Decl is an N^2 algorithm; getting a Decl for index X means looping through all the previous indices 0-(X-1) first.
This is rather slow when dealing with hundreds of thousands of Decl instances, here's a screenshot from Instruments showing 74% of the time spent in clangsharp_Cursor_getDecl:
I had a few observations:
In my use case, I'll always iterate over all the Decls (once I start iterating).
There doesn't seem to be a way in native code to get the Decl given an index.
Storing the C++ iterator in C# to keep iterating on it is beyond my C++ knowledge (in particular any iterator lifetime management didn't look trivial).
So I implemented a way to get all the Decls and store them into a C# array directly, so now the algorithm is 2N instead (iterate once to count the number of Decls, iterate again to get them).
After this change, getting the Decls now takes up 0.0% (!):
Also note the total time of the profiling session: 1.30 min vs 3.38 min.