Skip to content

Conversation

@rolfbjarne
Copy link
Member

@rolfbjarne rolfbjarne commented Jan 7, 2026

The current implementation in clangsharp_Cursor_getDecl to get a Decl is an N^2 algorithm; getting a Decl for index X means looping through all the previous indices 0-(X-1) first.

This is rather slow when dealing with hundreds of thousands of Decl instances, here's a screenshot from Instruments showing 74% of the time spent in clangsharp_Cursor_getDecl:

Screenshot 2026-01-07 at 12 47 26

I had a few observations:

  1. In my use case, I'll always iterate over all the Decls (once I start iterating).

  2. There doesn't seem to be a way in native code to get the Decl given an index.

  3. Storing the C++ iterator in C# to keep iterating on it is beyond my C++ knowledge (in particular any iterator lifetime management didn't look trivial).

So I implemented a way to get all the Decls and store them into a C# array directly, so now the algorithm is 2N instead (iterate once to count the number of Decls, iterate again to get them).

After this change, getting the Decls now takes up 0.0% (!):

Screenshot 2026-01-07 at 12 47 14

Also note the total time of the profiling session: 1.30 min vs 3.38 min.

The current implementation in clangsharp_Cursor_getDecl to get a Decl is an N^2 algorithm;
getting a Decl for index X means looping through all the previous indices 0-(X-1)
first.

This is rather slow when dealing with hundreds of thousands of Decl instances, here's
a screenshot from Instruments showing 74% of the time spent in clangsharp_Cursor_getDecl:

<img width="921" height="348" alt="Screenshot 2026-01-07 at 12 47 26" src="https://github.com/user-attachments/assets/c96d5611-683e-4f0d-9878-cc551ba60c71" />

I had a few observations:

1. In my use case, I'll always iterate over all the Decls (once I start iterating).

2. There doesn't seem to be a way in native code to get the Decl given an index.

3. Storing the C++ iterator in C# to keep iterating on it is beyond my C++ knowledge
   (in particular any iterator lifetime management didn't look trivial).

So I implemented a way to get all the Decls and store them into a C# array directly,
so now the algorithm is 2N instead (iterate once to count the number of Decls, iterate
again to get them).

After this change, getting the Decls now takes up 0.0% (!):

<img width="930" height="400" alt="Screenshot 2026-01-07 at 12 47 14" src="https://github.com/user-attachments/assets/b40ac6d3-abb3-47c7-977c-884015b617ce" />

Also note the total time of the profiling session: 1.30 min vs 3.38 min.
_attrs = LazyList.Create<Attr>(Handle.NumAttrs, (i) => TranslationUnit.GetOrCreate<Attr>(Handle.GetAttr(unchecked((uint)i))));
_body = new ValueLazy<Stmt?>(() => !Handle.Body.IsNull ? TranslationUnit.GetOrCreate<Stmt>(Handle.Body) : null);
_canonicalDecl = new ValueLazy<Decl>(() => TranslationUnit.GetOrCreate<Decl>(Handle.CanonicalCursor));
_decls = LazyList.Create<Decl>(Handle.NumDecls, (i) => TranslationUnit.GetOrCreate<Decl>(Handle.GetDecl(unchecked((uint)i))));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW counting the decls (i.e. calling Handle.NumDecls) also shows up in the profile, although not nearly as prominent, so I also considered lazy-loading the count of LazyList instead of having to specify it up-front.

[Test]
public void AttrKindSpelling()
{
AssertNeedNewClangSharp();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new native libClangSharp method makes a lot of tests fail because the new code is in a rather fundamental part of ClangSharp, not sure how to handle this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simplest thing would be to use NativeLibrary.GetExport to try and resolve the new GetDecls API and fallback to the old logic if not present.

I can then fix that up to directly use the new API when I get the new libClangSharp published.

@tannergooding
Copy link
Member

Storing the C++ iterator in C# to keep iterating on it is beyond my C++ knowledge (in particular any iterator lifetime management didn't look trivial).

In this particular case, the decl_iterator is actually rather trivial, it is effectively:

int i = 0;
CXCursor currentDecl = Handle.GetDecl(i);

do
{
    yield return TranslationUnit.GetOrCreate<Decl>(currentDecl);
    currentDecl = currentDecl.NextDeclInContext;
}
while (!currentDecl.IsNull);

If LazyList changed the signature of valueFactory from Func<int, T> to instead be Func<int, ROSpan<T>, T>>, then each iteration could pass in the already initialized elements. i.e. 0 is empty while 2 contains elements 0 and 1.

This would allow the implementation to be something like this:

_decls = LazyList.Create<Decl>(Handle.NumDecls, (i, initializedDecls) => {
        if (i == 0)
        {
            return Handle.GetDecl(0);
        }
        else
        {
            return initializedDecls[^1].NextDeclInContext;
        }
    }
);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants