Added Plugin system in cubecl-runtime with an example in cubecl cuda…#1057
Added Plugin system in cubecl-runtime with an example in cubecl cuda…#1057Q-Lucca wants to merge 4 commits intotracel-ai:mainfrom
Conversation
| type ClientHandle: Send; | ||
| /// This is a type which needs to be build by a `ComputeServer`. | ||
| type ServerHandle: Send; |
There was a problem hiding this comment.
It is strange to have two handles here, I think it would be cleaner to have maybe 3 associated types:
- Input (ServerHandle) the input of the function.
- Output (ReturnVal) the output of the function.
- State (InitType) the state of the plugin that needs an init.
I would ditch the client handle.
There was a problem hiding this comment.
First of all thanks for your time and feedback. :)
Fair enough, I'm not particularly creative and consistent with my choice of words :D So yeah fully with you and will definitely improve on that in future iterations.
| /// Since we know how the `ServerHandle` will look the `Plugin` will | ||
| /// be able to define additional functions which can be executed over the | ||
| /// `ServerHandle` by a `ComputeServer` | ||
| type Fns: FnOnce(Self::ServerHandle) -> Result<Self::ReturnVal, PluginError>; |
There was a problem hiding this comment.
Why use an associated type for a function rather than.. a function:
fn execute(input: Self::Input) -> Result<Self::Output, PluginError>;I think it would be cleaner.
There was a problem hiding this comment.
In my targeted architecture the goal is to decouple cluster function orchestration from the whole cluster lifecycle. So a deployment could profit from ai research while handling production without the need of komplex re-deploynment iterations. There is an hole reseaoning for this decision, but ... (Conv0)
| /// Type we want for initialisation. | ||
| pub trait PluginType { | ||
| type Insert: Send + Sync; | ||
|
|
||
| fn init(self) -> Self::Insert; | ||
| } |
There was a problem hiding this comment.
I would put that function inside the Plugin trait:
pub trait Plugin {
...
fn init(state: Self::State);
}I don't think we need a returned type here.
There was a problem hiding this comment.
Thats true xD Made the type function while designing my fn orchestrator. The function use case is not necessary even with my architecture xD Unnecessary and like you said in future iterations.
| &mut self, | ||
| client_handle: SP::ClientHandle, | ||
| stream_id: StreamId, | ||
| op: SP::Fns, |
There was a problem hiding this comment.
I don't think we need to pass the function as input here, or maybe there's something I'm not understanding.
There was a problem hiding this comment.
(Conv0) ... fair enough, would say there is too much of my architecture leaking in here. These concepts are more for my specific case not for general use. Your suggestion is definitely a better match for CubeCL's general API.
| client0 | ||
| .plugin_fn::<NcclExtension>(handle0, op) |
There was a problem hiding this comment.
I think it's probably a bit complex for the user API. I like the idea of plugin, but the function call can use enum instead of function definitions. If for instance the Plugin::Input is an enum, like AllReduce, Init, Gather, etc. We don't have to support many functions per extension. It would be cleaner to call:
client.execute::<NcclExtension>(NcclArg::AllReduce(NcclReduce::Avg, inputs, output));Do you think that would be better?
There was a problem hiding this comment.
With a deeper understanding of the differentiations you introduced me to this makes sense. Also had the idea to improve on the execution method of ComputeClient, but i'm carefull with reworking implementations due to the respect of the author. Coming from a corporate employe background my additions already felt illegal. :D Will adapt and make a more integrated solution to abstract complexities inline with CubeCl's design. Already hade a more profound look into it and got an idea will iterate on this. But didn't wanted to do that if you have had no intressed in the general approach.
There was a problem hiding this comment.
No worries, I think the plugin system makes sense. It's normal to iterate over a design :)
|
Ok, big thanks for the input. I have a deeper understanding of the differentiation in our project philosophies. The general goal for my next iteration will be a more straightforward usage within the client for general use while also allowing me to approach plugins with capabilities for optimizations needed in production grade multi service network deployments. The second iteration will propose a integration with the client's execution methods, to abstract complexities in sync with CubeCL while maintaining compabilitie. |
Hello Lucca from Use-Ai.rs here,
Decided on branding therefore know my final account for my project.
So, to enable the possibility to add additional features to ComputeServer which can be accessed through the ComputeClient i propose the Plugin system implemented here. For now i used it to get nccl to cubecl-cuda. The system allows to introduce and initialize additional types and execute additional functions over ComputeServer. This is used to not lose threads when building worker based Runtime for integrated production deployments. A library released next week will be in need for a implementation of that kind.
This is a draft with the first docs within the files. Before implementing full docs for cubecl i would need to know if you guys would generally accept a full proposal?