Skip to content

Conversation

@matthiasdiener
Copy link
Collaborator

@matthiasdiener matthiasdiener commented Apr 4, 2025

TODOs:

  • needs a test

Please squash

@matthiasdiener matthiasdiener self-assigned this Apr 4, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

arraycontext/impl/pytato/compile.py:643

  • Add tests that verify kernel profiling support works as expected when profile_kernels is enabled.
if self.actx.profile_kernels:

arraycontext/impl/pytato/init.py:299

  • Add tests for the new profiling methods (e.g., _wait_and_transfer_profile_events, get_profiling_data_for_kernel, reset_profiling_data_for_kernel, tabulate_profiling_data) to ensure that profiling data is collected and summarized correctly.
self.profile_kernels = profile_kernels

Copy link
Collaborator

@alexfikl alexfikl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got excited about the profiling so I skimmed a bit and left some comments, hope that's ok!

"""Reset profiling data for kernel `kernel_name`."""
self.profile_results.pop(kernel_name, None)

def tabulate_profiling_data(self) -> pytools.Table:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks more like a helper function than a class method?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dunno - this is currently the main way of "getting" the complete profiling data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the main way of getting it actx.profile_results and/or actx.get_profile_for_kernel? This is just a handy way to display it (probably one of many).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's just a convenience function for the user. We have the same method for eager profiling: https://github.com/illinois-ceesd/mirgecom/blob/41c6e567d44258557d95ef8453c3b82c4c89ab55/mirgecom/profiling.py#L216

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @alexfikl that this should be a function, not a method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you guys think of 1123c16?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

matthiasdiener and others added 3 commits April 4, 2025 15:27
@matthiasdiener
Copy link
Collaborator Author

matthiasdiener commented Apr 4, 2025

Thanks for taking a look @alexfikl ! This is ready for a first look @inducer .

@matthiasdiener matthiasdiener requested a review from inducer April 4, 2025 20:57
@matthiasdiener matthiasdiener marked this pull request as ready for review April 7, 2025 17:45
"""Reset profiling data for kernel `kernel_name`."""
self.profile_results.pop(kernel_name, None)

def tabulate_profiling_data(self) -> pytools.Table:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @alexfikl that this should be a function, not a method.

"""Holds a profile event that has not been collected by the profiler yet."""

cl_event: cl._cl.Event
translation_unit: Any
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want the translation_unit in here? I'm not sold.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to just the name of the translation unit in 7216ccf

allocator=self.allocator,
**bound_arguments)

self.add_profiling_event(evt, pt_prg)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only call the function if profiling is enabled, to avoid incurring the function call penalty. (Also below.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7216ccf

allocator=self.allocator,
**bound_arguments)

self.add_profiling_event(evt, pt_prg)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inaccurate: one loopy TranslationUnit will enqueue multiple kernels, so you'll need to either

  • collect multiple events, or
  • enqueue a marker before and after.

(Also below.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented the "marker" strategy in 7216ccf

@matthiasdiener matthiasdiener marked this pull request as draft April 7, 2025 21:33
@matthiasdiener matthiasdiener marked this pull request as ready for review April 8, 2025 19:42
@matthiasdiener matthiasdiener requested a review from inducer April 8, 2025 19:42
Comment on lines 385 to 402
def get_profiling_data_for_kernel(self, kernel_name: str) \
-> MultiCallKernelProfile:
"""Return profiling data for kernel *kernel_name*."""
self._wait_and_transfer_profile_events()

time = 0
num_calls = 0

if kernel_name in self.profile_results:
knl_results = self.profile_results[kernel_name]

num_calls = len(knl_results)

for r in knl_results:
time += r

return MultiCallKernelProfile(num_calls, time)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'd like from the user interface here is just minimal access to the raw data and a reset primitive, without aggregation, and without exposing the before-wait/after-wait distinction. Everything else belongs in external utility routines, IMO.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to minimize the API in 6bdb85b. What do you think?

# }}}


def tabulate_profiling_data(profile_results: dict[str, MultiCallKernelProfile])\
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add to docs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the (so far only) interface to the doc in 6bdb85b.

time += r

return MultiCallKernelProfile(num_calls, time)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to add the final interface to the class documentation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the (so far only) interface to the doc in 6bdb85b.

Co-authored-by: Andreas Klöckner <inform@tiker.net>
@inducer inducer merged commit 1013ecb into main Apr 16, 2025
12 checks passed
@inducer inducer deleted the profile-pytato-actx branch April 16, 2025 22:00
@inducer
Copy link
Owner

inducer commented Apr 16, 2025

Thx!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants