Open
Description
From a comment in PR #101:
Apache recommends 512-1024 MB, while Databricks recommends 1024 MB and notes that this may reduce parallelism on smaller data sets. The caveat is to adjust partitioning to the data size and dimensions.
With that in mind, both are correct, though the second seems more complete. I would mash up the two or use the official Databricks/Apache recommendation.
1024 MB is about 1 GB, but I can't find the Databricks or Apache Spark links that give that figure. This issue is a placeholder so that we can get those links (requested from the original poster) and validate them before adding this to the content.
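To make the caveat about matching partitioning to data size concrete, here is a rough sketch of the arithmetic. It assumes the quoted figures are partition sizes in MB, which is still unverified pending the links. The function name `suggest_partition_count` and the `min_partitions` floor are hypothetical and not part of any Spark or Databricks API.

```python
import math


def suggest_partition_count(
    data_size_mb: float,
    target_partition_mb: float = 1024,  # assumed unit: MB, figure unverified
    min_partitions: int = 1,            # hypothetical floor, e.g. number of cores
) -> int:
    """Estimate a partition count from total data size and a target partition size."""
    if data_size_mb < 0 or target_partition_mb <= 0 or min_partitions < 1:
        raise ValueError("sizes must be non-negative/positive and min_partitions >= 1")
    # Number of partitions needed so that none exceeds the target size.
    by_size = math.ceil(data_size_mb / target_partition_mb)
    # Never go below the floor, so small data sets keep some parallelism.
    return max(by_size, min_partitions)


if __name__ == "__main__":
    print(suggest_partition_count(10 * 1024))               # 10 GB of data
    print(suggest_partition_count(500))                     # small data set
    print(suggest_partition_count(500, min_partitions=8))   # small data set with a floor
```

With a 1024 MB target, a 500 MB data set collapses to a single partition, which is the reduced-parallelism effect mentioned in the comment. A floor such as `min_partitions` is one possible way to compensate.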