feat(spark): Refactor unit tests to sdk coding standards #293
feat(spark): Refactor unit tests to sdk coding standards #293 — digvijay-y wants to merge 5 commits into kubeflow:main from
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
🎉 Welcome to the Kubeflow SDK! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification! |
There was a problem hiding this comment.
Pull request overview
This PR refactors the SparkClient unit tests to follow the Kubeflow SDK coding standards by converting individual test classes to @pytest.mark.parametrize format with TestCase dataclass patterns. The refactoring aligns with issue #273 and follows the approach established in PR #225.
Changes:
- Introduced TestCase dataclass for parametrized test structure
- Converted all test methods from individual test classes to parametrized tests
- Added mock improvements for handling both existing and non-existent sessions
- Consolidated test organization into four main parametrized test functions
| call_args = mock_backend.create_and_connect.call_args | ||
| assert call_args.kwargs["options"] is None | ||
|
|
||
| print(f"✓ {test_case.name} succeeded") |
There was a problem hiding this comment.
Print statements are not used in other test files in the codebase. Remove these debugging print statements to maintain consistency with the codebase testing conventions.
| if test_case.expected_status == SUCCESS: | ||
| if "operation" in test_case.config: | ||
| if test_case.config["operation"] == "delete": | ||
| spark_client.delete_session(test_case.config["session_name"]) | ||
| mock_backend.delete_session.assert_called_once_with( | ||
| test_case.config["session_name"] | ||
| ) | ||
| elif test_case.config["operation"] == "logs": | ||
| result = list(spark_client.get_session_logs(test_case.config["session_name"])) | ||
| assert len(result) == test_case.expected_output | ||
| mock_backend.get_session_logs.assert_called_once_with( | ||
| test_case.config["session_name"], follow=False | ||
| ) | ||
| elif "session_name" in test_case.config: | ||
| result = spark_client.get_session(test_case.config["session_name"]) | ||
| assert result.name == test_case.expected_output | ||
| mock_backend.get_session.assert_called_with(test_case.config["session_name"]) | ||
| else: | ||
| result = spark_client.list_sessions() | ||
| assert len(result) == test_case.expected_output | ||
| mock_backend.list_sessions.assert_called_once() | ||
| print(f"✓ {test_case.name} succeeded") | ||
| else: | ||
| with pytest.raises(Exception) as exc_info: | ||
| spark_client.get_session(test_case.config["session_name"]) | ||
| assert test_case.expected_error in str(exc_info.value) | ||
| print(f"✓ {test_case.name} failed as expected") |
There was a problem hiding this comment.
The test structure should use try/except blocks to check expected_status instead of if/else conditionals, following the established pattern in other test files.
| if "options" in test_case.config: | ||
| options = test_case.config["options"] | ||
| spark_client.connect(options=options) | ||
| mock_backend.create_and_connect.assert_called_once() | ||
| call_args = mock_backend.create_and_connect.call_args | ||
| assert call_args.kwargs["options"] == options | ||
|
|
||
| def test_connect_without_options_auto_generates(self, spark_client, mock_backend): | ||
| """C19: Connect without options auto-generates name via backend.""" | ||
| mock_session = Mock() | ||
| mock_backend.create_and_connect.return_value = mock_session | ||
| else: | ||
| spark_client.connect() | ||
| mock_backend.create_and_connect.assert_called_once() | ||
| call_args = mock_backend.create_and_connect.call_args | ||
| assert call_args.kwargs["options"] is None | ||
|
|
||
| print(f"✓ {test_case.name} succeeded") |
There was a problem hiding this comment.
The test structure should use try/except blocks to check expected_status instead of if/else conditionals, following the established pattern in other test files.
| expected_output="backend_created", | ||
| ), | ||
| TestCase( | ||
| name="custom namespace initialization", | ||
| expected_status=SUCCESS, | ||
| config={"namespace": "spark"}, | ||
| expected_output="backend_created", |
There was a problem hiding this comment.
The expected_output value "backend_created" is not used in the test logic. Remove this unused field or use it to validate actual test outputs.
| expected_output="backend_created", | |
| ), | |
| TestCase( | |
| name="custom namespace initialization", | |
| expected_status=SUCCESS, | |
| config={"namespace": "spark"}, | |
| expected_output="backend_created", | |
| ), | |
| TestCase( | |
| name="custom namespace initialization", | |
| expected_status=SUCCESS, | |
| config={"namespace": "spark"}, |
| @dataclass | ||
| class TestCase: | ||
| """Test case structure for parametrized SparkClient tests.""" | ||
|
|
||
| name: str | ||
| expected_status: str | ||
| config: dict[str, Any] | ||
| expected_output: Optional[Any] = None | ||
| expected_error: Optional[str] = None | ||
| # Prevent pytest from collecting this dataclass as a test | ||
| __test__ = False | ||
|
|
||
|
|
||
| SUCCESS = "SUCCESS" | ||
| EXCEPTION = "EXCEPTION" |
There was a problem hiding this comment.
The TestCase dataclass and SUCCESS/EXCEPTION constants should be imported from kubeflow.spark.test.common instead of being redefined. This ensures consistency across the codebase. Additionally, expected_error should be Optional[type[Exception]] instead of Optional[str] to match the established pattern.
| expected_output="options_passed", | ||
| ), | ||
| TestCase( | ||
| name="connect without options auto-generates", | ||
| expected_status=SUCCESS, | ||
| config={}, | ||
| expected_output="no_options", |
There was a problem hiding this comment.
The expected_output value "options_passed" is not used in the test logic. Remove this unused field or use it to validate actual test outputs.
| expected_output="options_passed", | |
| ), | |
| TestCase( | |
| name="connect without options auto-generates", | |
| expected_status=SUCCESS, | |
| config={}, | |
| expected_output="no_options", | |
| ), | |
| TestCase( | |
| name="connect without options auto-generates", | |
| expected_status=SUCCESS, | |
| config={}, |
| with patch("kubeflow.spark.api.spark_client.KubernetesBackend"): | ||
| client = SparkClient() | ||
| assert client.backend is not None | ||
| print(f"✓ {test_case.name} succeeded") |
There was a problem hiding this comment.
Print statements are not used in other test files in the codebase. Remove these debugging print statements to maintain consistency with the codebase testing conventions.
| def test_connect_with_url_invalid(self, spark_client): | ||
| """C04b: Connect with invalid URL raises ValueError.""" | ||
| SparkClient(backend_config=test_case.config["backend_config"]) | ||
| print(f"✓ {test_case.name} failed as expected") |
There was a problem hiding this comment.
Print statements are not used in other test files in the codebase. Remove these debugging print statements to maintain consistency with the codebase testing conventions.
| ) | ||
| def test_spark_client_connect(test_case: TestCase, spark_client): | ||
| """Test SparkClient connect method scenarios.""" | ||
| print(f"Running test: {test_case.name}") |
There was a problem hiding this comment.
Print statements are not used in other test files in the codebase. Remove these debugging print statements to maintain consistency with the codebase testing conventions.
Signed-off-by: digvijay-y <yewaredigvijay@gmail.com>
Signed-off-by: Jon Burdo <jon@jonburdo.com> Signed-off-by: digvijay-y <yewaredigvijay@gmail.com>
Signed-off-by: digvijay-y <yewaredigvijay@gmail.com>
db47042 to
5d8280e
Compare
| except Exception as e: | ||
| # If we got an exception but expected success, fail | ||
| assert test_case.expected_status == EXCEPTION, ( | ||
| f"Unexpected exception in {test_case.name}: {e}" | ||
| ) | ||
| # Validate the exception type/message if specified | ||
| if test_case.expected_error: | ||
| assert test_case.expected_error in str(e), ( | ||
| f"Expected error '{test_case.expected_error}' but got '{str(e)}'" | ||
| ) |
There was a problem hiding this comment.
expected_error is set to the string "ValueError", but the assertion checks "ValueError" in str(e); str(ValueError(...)) is just the message and typically won’t contain the class name, so this test will fail even when the correct exception is raised (prefer asserting the exception type, e.g. type(e) is ValueError, or using pytest.raises(ValueError)).
| f"Unexpected exception in {test_case.name}: {e}" | ||
| ) | ||
| # Validate the exception type/message if specified | ||
| if test_case.expected_error: |
There was a problem hiding this comment.
Same issue here: the invalid-URL case sets expected_error="ValueError" but validates it via substring match against str(e), which won’t include the exception type (use pytest.raises(ValueError, match=...) or compare type(e) to the expected exception).
| if test_case.expected_error: | |
| if test_case.expected_error: | |
| # First, allow matching by exception type name (e.g., "ValueError") | |
| if type(e).__name__ == test_case.expected_error: | |
| return | |
| # Otherwise, fall back to matching against the exception message |
| TestCase( | ||
| name="connect create session verification", | ||
| expected_status=SUCCESS, | ||
| config={}, | ||
| ), | ||
| ], | ||
| ) | ||
| def test_spark_client_connect(test_case: TestCase, spark_client): | ||
| """Test SparkClient connect method scenarios.""" | ||
|
|
||
| try: | ||
| if "url" in test_case.config: | ||
| from kubeflow.spark.backends.kubernetes.utils import validate_spark_connect_url | ||
|
|
||
| result = validate_spark_connect_url(test_case.config["url"]) | ||
| assert result == test_case.expected_output | ||
| else: | ||
| # Verify backend methods are not called initially | ||
| spark_client.backend.create_session.assert_not_called() | ||
| spark_client.backend.wait_for_session_ready.assert_not_called() | ||
|
|
There was a problem hiding this comment.
The "connect create session verification" case doesn’t call SparkClient.connect() and only asserts some backend methods haven’t been called, so it doesn’t validate any behavior and can’t fail if connect() regresses; either remove this case or make it call connect() and assert the expected backend interaction (e.g., create_and_connect is invoked).
|
/ok-to-test |
Refactor unit tests to follow Kubeflow SDK coding standards:
- Use parametrized tests with a TestCase dataclass. Ref #225
Fixes #273
Checklist: