chore(spark): migrate SDK to kubeflow_spark_api Pydantic models #295

Open
tariq-hasan wants to merge 4 commits into kubeflow:main from tariq-hasan:refactor/spark-pypi-models

Conversation

@tariq-hasan
Contributor

What this PR does / why we need it:

This PR migrates the Spark SDK from constructing CRDs using raw dictionaries to using the typed Pydantic models provided by kubeflow_spark_api. There are no user-facing API changes in this PR.

What changed:

  • Replace raw dict-based CRD construction with typed Pydantic models from kubeflow_spark_api
  • Convert to dict only at the Kubernetes API boundary
  • Parse API responses using .from_dict() instead of manual extraction
  • Keep user-facing dataclasses (Driver, Executor, SparkConnectInfo) unchanged
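
The migration pattern can be sketched with hand-rolled stand-in types. This is illustrative only: the real kubeflow_spark_api models are generated, and the SparkConnect/SparkConnectSpec names and fields below are assumptions, not the library's actual API.

```python
from dataclasses import dataclass, field

# Hand-rolled stand-ins for the generated kubeflow_spark_api models
# (real model names and fields will differ).
@dataclass
class SparkConnectSpec:
    image: str = ""

@dataclass
class SparkConnect:
    api_version: str = "sparkoperator.k8s.io/v1alpha1"
    kind: str = "SparkConnect"
    spec: SparkConnectSpec = field(default_factory=SparkConnectSpec)

    def to_dict(self) -> dict:
        # Serialize only at the Kubernetes API boundary.
        return {
            "apiVersion": self.api_version,
            "kind": self.kind,
            "spec": {"image": self.spec.image},
        }

    @classmethod
    def from_dict(cls, data: dict) -> "SparkConnect":
        # Parse API responses back into a typed model instead of
        # extracting fields from raw dicts by hand.
        return cls(
            api_version=data["apiVersion"],
            kind=data["kind"],
            spec=SparkConnectSpec(image=data["spec"].get("image", "")),
        )

cr = SparkConnect(spec=SparkConnectSpec(image="spark:4.0.0"))
round_tripped = SparkConnect.from_dict(cr.to_dict())
print(round_tripped.spec.image)  # -> spark:4.0.0
```

Constructing the CR through typed attributes means a misspelled field fails at construction time rather than surfacing as a server-side validation error.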

Why:

  • Improves type safety within the SDK
  • Aligns Spark with the established Trainer/Optimizer architecture

Testing:
Tested against kubeflow_spark_api==2.4.0rc0.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):

Fixes #271

Checklist:

  • Docs included if any changes are user facing

Copilot AI review requested due to automatic review settings February 15, 2026 21:50
@github-actions
Contributor

🎉 Welcome to the Kubeflow SDK! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

  • If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards
  • Our team will review your PR soon! cc @kubeflow/kubeflow-sdk-team

Join the community:

Feel free to ask questions in the comments if you need any help or clarification!
Thanks again for contributing to Kubeflow! 🙏

@google-oss-prow
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andreyvelich for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files.

  • Approvers can indicate their approval by writing /approve in a comment
  • Approvers can cancel approval by writing /approve cancel in a comment

@coveralls

coveralls commented Feb 15, 2026

Pull Request Test Coverage Report for Build 22043884336

Details

  • 215 of 218 (98.62%) changed or added relevant lines in 6 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.06%) to 72.887%

Changes missing coverage:

| File | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| kubeflow/spark/backends/kubernetes/utils.py | 48 | 51 | 94.12% |

Totals:

  • Change from base Build 21955571005: 0.06%
  • Covered Lines: 4105
  • Relevant Lines: 5632

💛 - Coveralls

Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the Spark SDK to use typed Pydantic models from kubeflow_spark_api instead of raw dictionaries for CRD construction. This aligns the Spark SDK with the established architecture pattern used by the Trainer SDK.

Changes:

  • Added kubeflow-spark-api>=2.3.0 dependency and migrated from dict-based CRD construction to typed Pydantic models
  • Updated all option implementations to work with Pydantic models instead of dictionaries
  • Refactored backend methods to convert to/from Pydantic models at the Kubernetes API boundary
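
A minimal sketch of the boundary conversion described above, using a fake in-memory client in place of the real kubernetes client. SparkConnect, FakeCustomObjectsApi, and their methods are illustrative assumptions, not the actual SDK or client APIs.

```python
# Typed model stand-in: dicts exist only at the API boundary.
class SparkConnect:
    def __init__(self, name: str):
        self.name = name

    def to_dict(self) -> dict:
        return {"metadata": {"name": self.name}}

    @classmethod
    def from_dict(cls, data: dict) -> "SparkConnect":
        return cls(name=data["metadata"]["name"])

# Fake stand-in for the Kubernetes custom-objects API: stores and
# returns plain dicts, as the real API server would.
class FakeCustomObjectsApi:
    def __init__(self):
        self._store: dict = {}

    def create_namespaced_custom_object(self, body: dict) -> None:
        self._store[body["metadata"]["name"]] = body

    def get_namespaced_custom_object(self, name: str) -> dict:
        return self._store[name]

api = FakeCustomObjectsApi()
# Backend serializes the typed model only when calling the API ...
api.create_namespaced_custom_object(SparkConnect("demo").to_dict())
# ... and parses the API's dict response straight back into a model.
fetched = SparkConnect.from_dict(api.get_namespaced_custom_object("demo"))
print(fetched.name)  # -> demo
```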

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.

Summary per file:

| File | Description |
| --- | --- |
| pyproject.toml | Added kubeflow-spark-api>=2.3.0 dependency |
| uv.lock | Lock file updates for the new kubeflow-spark-api dependency |
| kubeflow/spark/backends/kubernetes/backend.py | Converts between dicts and Pydantic models at the API boundary using .to_dict() and .from_dict() |
| kubeflow/spark/backends/kubernetes/utils.py | Refactored build_spark_connect_crd to return a Pydantic model; renamed parse_spark_connect_status to get_spark_connect_info_from_cr with Pydantic input |
| kubeflow/spark/types/options.py | Updated all option callables to accept the SparkConnect Pydantic model instead of a dict |
| kubeflow/spark/backends/kubernetes/backend_test.py | Enhanced mock responses to include all required fields for Pydantic model validation |
| kubeflow/spark/backends/kubernetes/utils_test.py | Updated tests to work with Pydantic models and added a validation test for an invalid CR |
| kubeflow/spark/types/options_test.py | Migrated tests to use the spark_connect_model fixture and verify Pydantic model attributes |
| hack/Dockerfile.spark-e2e-runner | Added the --pre flag to allow installation of pre-release versions |

```python
role_spec.template = models.IoK8sApiCoreV1PodTemplateSpec()

# Convert existing template to dict, merge, and convert back
existing_dict = role_spec.template.to_dict() if role_spec.template else {}
```

Copilot AI Feb 15, 2026


Redundant None check in the ternary expression. Since role_spec.template is guaranteed to be non-None after line 193, the ternary expression role_spec.template.to_dict() if role_spec.template else {} can be simplified to role_spec.template.to_dict().

Suggested change:

```diff
- existing_dict = role_spec.template.to_dict() if role_spec.template else {}
+ existing_dict = role_spec.template.to_dict()
```
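
The simplification can be seen in a stand-alone sketch of the guard-then-use pattern Copilot flags (Template and role_template are illustrative names, not the SDK's actual code):

```python
# Minimal stand-in for the generated pod-template model.
class Template:
    def to_dict(self):
        return {"spec": {}}

role_template = None
if role_template is None:
    role_template = Template()  # guard guarantees non-None past this point

# The else-branch of the original ternary is unreachable here,
# so a plain call suffices.
existing = role_template.to_dict()
print(existing)  # -> {'spec': {}}
```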

@tariq-hasan tariq-hasan changed the title refactor(spark): migrate SDK to kubeflow_spark_api Pydantic models chore(spark): migrate SDK to kubeflow_spark_api Pydantic models Feb 15, 2026
Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>
@tariq-hasan tariq-hasan force-pushed the refactor/spark-pypi-models branch from 2df7db9 to 255b2ad Compare February 15, 2026 22:03


Development

Successfully merging this pull request may close these issues.

refactor(spark): Re-use Spark Python models from PyPI instead of custom dataclasses
