Replies: 1 comment 2 replies
-
|
@Yicong-Huang Did you see similar issues in other Apache projects? |
Beta Was this translation helpful? Give feedback.
2 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
summary
Seeing severe latency when data flows from an upstream operator into a Python-based operator. The compute inside Python is fast; the delays happen before and after it, suggesting a transmission/bridge issue.
Example workflow
Workflow A: CSV Scan → Sort (Python)
~3 minutes from workflow start until Sort begins processing
Sort compute itself < 1s
Then ~2 minutes before the workflow reports completion
Workflow B: CSV Scan → Aggregation (Java native)
End-to-end ≈ 20s
Hypothesis
Bottleneck in JVM ↔ Python handoff
Beta Was this translation helpful? Give feedback.
All reactions