Conversation
🎉 All Contributor License Agreements have been signed. Ready to merge.
Pull request overview
This PR introduces node type awareness into ducktape's cluster specification and allocation logic so tests can request heterogeneous clusters (e.g., "small", "large" nodes) while preserving backward compatibility when no type is specified.
Changes:
- Extend `NodeSpec`, `ClusterSpec`, `RemoteAccount`, and `NodeContainer` to carry and use a `node_type` label for matching and allocation.
- Wire `node_type` through JSON cluster configs, test context metadata, and the `@cluster` marker, plus add tests covering `NodeSpec`, `ClusterSpec.simple_linux`, and `NodeContainer` behavior with types.
- Maintain backward-compatible behavior when `node_type` is `None` (treated as "match any") and via an `os_to_nodes` compatibility view over the new `(os, node_type)` grouping (see the sketch below).
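A sketch of what that compatibility view amounts to (illustrative only; `NodeContainer`'s actual attribute and method names may differ):

```python
from collections import defaultdict

def os_to_nodes_view(nodes_by_key):
    """Collapse the (os, node_type) grouping back into the legacy os -> nodes mapping."""
    view = defaultdict(list)
    for (operating_system, _node_type), nodes in nodes_by_key.items():
        view[operating_system].extend(nodes)
    return dict(view)
```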
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/runner/fake_remote_account.py | Propagates node_type into fake accounts so tests can simulate typed nodes. |
| tests/cluster/check_node_container.py | Adds helpers and tests validating NodeContainer grouping and allocation behavior with node_type. |
| tests/cluster/check_cluster_spec.py | Adds tests for ClusterSpec.simple_linux and NodeSpec including node type, matching, and string representation. |
| ducktape/tests/test_context.py | Extends expected cluster spec derivation to honor node_type alongside num_nodes. |
| ducktape/mark/resource.py | Documents node_type as a recognized @cluster hint and shows example usage. |
| ducktape/mark/consts.py | Introduces CLUSTER_NODE_TYPE_KEYWORD constant used in test context. |
| ducktape/cluster/remoteaccount.py | Adds node_type to RemoteAccount to carry node classification from cluster configuration. |
| ducktape/cluster/node_spec.py | Extends NodeSpec with node_type, matching semantics, equality, hashing, and updated string representation. |
| ducktape/cluster/node_container.py | Refactors NodeContainer to group nodes by (os, node_type) and update remove/can_remove logic for type-aware allocation. |
| ducktape/cluster/json.py | Reads optional node_type from JSON cluster config and passes it to remote account construction. |
| ducktape/cluster/cluster_spec.py | Updates ClusterSpec.simple_linux and from_nodes to propagate node_type into specs. |
```python
node_type: Optional[str] = None,
logger: Optional[logging.Logger] = None,
ssh_exception_checks: List[Callable] = [],
```
Adding node_type between externally_routable_ip and logger changes the positional signature of RemoteAccount.__init__, so existing callers that pass logger or ssh_exception_checks positionally will now have their arguments bound incorrectly. To keep this change backward compatible, consider appending node_type after the existing parameters (or making it keyword-only via a * separator) so all prior positional call sites continue to work as before.
Suggested change:

```diff
-node_type: Optional[str] = None,
 logger: Optional[logging.Logger] = None,
 ssh_exception_checks: List[Callable] = [],
+*,
+node_type: Optional[str] = None,
```
```python
ssh_config = RemoteAccountSSHConfig(**ninfo.get("ssh_config", {}))

# Extract node_type from JSON (optional field)
node_type = ninfo.get("node_type")

remote_account = make_remote_account_func(
    ssh_config,
    ninfo.get("externally_routable_ip"),
    node_type=node_type,
    ssh_exception_checks=kwargs.get("ssh_exception_checks"),
)
```
If make_remote_account_func is a user-supplied factory, unconditionally passing a new node_type keyword argument will break existing implementations that only accept (ssh_config, externally_routable_ip, ssh_exception_checks). To preserve compatibility, either (a) keep the call signature unchanged and handle node_type inside your own factory, or (b) conditionally include the node_type keyword only when the target function's signature supports it (e.g., via inspect.signature) and clearly document the new optional parameter.
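A possible shape for option (b), sketched under the assumption that make_remote_account_func may be any user-supplied callable (the wrapper name below is made up for illustration):

```python
import inspect

def _call_remote_account_factory(make_remote_account_func, ssh_config, ip, node_type, ssh_exception_checks):
    """Invoke the factory, forwarding node_type only if its signature can accept it."""
    kwargs = {"ssh_exception_checks": ssh_exception_checks}
    params = inspect.signature(make_remote_account_func).parameters
    accepts_var_kwargs = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
    if "node_type" in params or accepts_var_kwargs:
        kwargs["node_type"] = node_type
    return make_remote_account_func(ssh_config, ip, **kwargs)
```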
```python
def _group_spec_by_key(self, cluster_spec: ClusterSpec) -> Dict[NodeGroupKey, List["NodeSpec"]]:
    """
    Group the NodeSpecs in a ClusterSpec by (os, node_type) key.

    :param cluster_spec: The cluster spec to group
    :return: Dictionary mapping (os, node_type) to list of NodeSpecs
    """
    result: Dict[NodeGroupKey, List["NodeSpec"]] = {}
    for node_spec in cluster_spec.nodes.elements():
        key = (node_spec.operating_system, node_spec.node_type)
        result.setdefault(key, []).append(node_spec)
    return result
```
The helper _group_spec_by_key is currently unused; grouping for specs is handled via cluster_spec.nodes.grouped_by_os_and_type() instead. To avoid dead code and potential confusion, either remove this method or refactor call sites to use it consistently for spec grouping.
Suggested change: delete the unused _group_spec_by_key method shown above.
```
- ``num_nodes`` provide hint about how many nodes the test will consume
- ``node_type`` provide hint about what type of nodes the test needs (e.g., "large", "small")
- ``cluster_spec`` provide hint about how many nodes of each type the test will consume
```
The phrase 'provide hint' is grammatically off; it should be 'provides a hint' to read correctly (and it would be good to update the existing entries to match for consistency).
Suggested change:

```diff
-- ``num_nodes`` provide hint about how many nodes the test will consume
-- ``node_type`` provide hint about what type of nodes the test needs (e.g., "large", "small")
-- ``cluster_spec`` provide hint about how many nodes of each type the test will consume
+- ``num_nodes`` provides a hint about how many nodes the test will consume
+- ``node_type`` provides a hint about what type of nodes the test needs (e.g., "large", "small")
+- ``cluster_spec`` provides a hint about how many nodes of each type the test will consume
```
Description
Adds a node_type abstraction that lets tests declare node requirements, enabling heterogeneous cluster support where different tests can request different types of nodes (e.g., "small", "large"). The cluster is partitioned into separate pools of nodes based on node_type, and tests can request a specific type of node through the @cluster annotation.
New Capability
Tests can now specify node_type in the @cluster decorator:
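A minimal illustrative sketch (the test class and method names are made up; the node_type keyword is the hint this PR adds):

```python
from ducktape.mark.resource import cluster
from ducktape.tests.test import Test

class ExampleTest(Test):
    @cluster(num_nodes=3, node_type="large")
    def test_needs_large_nodes(self):
        # allocated three nodes drawn from the "large" pool
        ...

    @cluster(num_nodes=2)
    def test_any_nodes(self):
        # no node_type: allocated from any pool, exactly as before this change
        ...
```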
Backward Compatible
node_type=None matches any available node
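Conceptually, the matching rule is roughly the following (a sketch of the intent, not NodeSpec's literal implementation):

```python
def node_type_matches(spec_node_type, node_node_type):
    """A spec without a node_type matches any node; otherwise the labels must be equal."""
    return spec_node_type is None or spec_node_type == node_node_type
```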
Cluster JSON Format
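An example of the kind of node entry the JSON cluster provider reads (hostnames, users, and key paths are illustrative; node_type is the new optional field, and omitting it keeps the old behavior):

```json
{
  "nodes": [
    {
      "externally_routable_ip": "192.168.1.10",
      "node_type": "large",
      "ssh_config": {
        "host": "worker1",
        "hostname": "192.168.1.10",
        "user": "ubuntu",
        "port": 22,
        "identityfile": "/path/to/key.pem"
      }
    },
    {
      "externally_routable_ip": "192.168.1.11",
      "ssh_config": {
        "host": "worker2",
        "hostname": "192.168.1.11",
        "user": "ubuntu",
        "port": 22,
        "identityfile": "/path/to/key.pem"
      }
    }
  ]
}
```

The second node has no node_type, so it behaves as an untyped node and can satisfy any request that does not ask for a specific type.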
Testing
Backward compatibility test
https://semaphore.ci.confluent.io/workflows/c0f7aaa3-b8ba-4b52-a76b-ac37a89ac709
Heterogeneous cluster test
https://semaphore.ci.confluent.io/workflows/1c50cdfa-288a-4c76-ab42-5fc31d52291d?pipeline_id=7617567a-d7b4-4911-b6ac-45efd8d6c68e
Issue
https://confluentinc.atlassian.net/browse/CPTF-1412