Open
Description
- almost all of the links in the "Additional Spark Functions (Bonus)" section are broken 🥲
- Is anybody else having issues accessing this link? www.indellient.com/blog/a-journey-from-pandas-to-spark-data-frames/
- Same with this one: the "Streaming isn’t always the solution" link doesn't go to the batch vs stream article. I think the article doesn't exist anymore; I checked by manually typing out the URL https://www.webscale.com/engineering-education/batch-processing-vs-stream-processing/
- In the "Spark Workflow and Partitioning" course, the link on "to emphasise the point" is broken, or I'm not sure if it was meant as some joke. In the same section, the hyperlink on "The coalesce transformation applied to a DataFrame" seems to be broken too.
- Broken link (intro to streaming): https://datastorageasean.com/blogs/5-use-cases-stream-processing-demonstrate-its-business-value-0
- I noticed in the Handling Late Data section of House 9, under Tumbling Time Window section that the 3rd paragraph says we should "sum up all the prices for the stock symbol" but below in the example, the aggregation used is actually max.
- Stateful streaming: Exercise: Read from Stream: It says "Rows per Section" instead of "Rows per Second"
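To show why the sum vs max mismatch in the Tumbling Time Window section matters, here is a minimal plain-Python sketch (not the course's code) of a tumbling-window aggregation. The symbol `AAA`, the prices, and the 5-minute width are made-up values for illustration only; in PySpark the equivalent would presumably be a `groupBy` over `window(...)` with either `sum` or `max`.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def tumbling_window_start(ts, width):
    """Return the start of the fixed, non-overlapping window containing ts."""
    epoch = datetime(1970, 1, 1)
    whole_windows = (ts - epoch) // width  # number of full windows since the epoch
    return epoch + whole_windows * width


def aggregate(events, width, agg):
    """Group prices by (window start, symbol) and apply agg to each group."""
    groups = defaultdict(list)
    for ts, symbol, price in events:
        groups[(tumbling_window_start(ts, width), symbol)].append(price)
    return {key: agg(prices) for key, prices in groups.items()}


# Hypothetical stock ticks: (event time, symbol, price)
events = [
    (datetime(2024, 1, 1, 9, 0, 30), "AAA", 10.0),
    (datetime(2024, 1, 1, 9, 4, 0), "AAA", 12.0),
    (datetime(2024, 1, 1, 9, 7, 0), "AAA", 11.0),
]
width = timedelta(minutes=5)

totals = aggregate(events, width, sum)  # what the paragraph describes
peaks = aggregate(events, width, max)   # what the example actually computes

for key in sorted(totals):
    print(key[0].time(), key[1], "sum:", totals[key], "max:", peaks[key])
```

For the 09:00 window the two aggregations give different answers (22.0 vs 12.0), so the text and the example should agree on one of them.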
Local setup
- Installing Java: I'm on an M1 chip, which made it a bit more complex. I ended up downloading Rosetta 2 to make it work. Maybe it's worth adding?
- In the readme under "Apache Spark Set up instructions", step 8 says "The same can be done for pyspark". This instruction is too unclear: what does "the same" refer to? Is there any way to make it more explicit? I copied and pasted the code and it worked, but there is too little explanation.
- Also in the readme, for the Apache section, step 3 (exporting SPARK_HOME) says you can put it in your preferred location. Would it be possible to add a suggested location?