Found issues in Oct. 2024 tour

- Almost all of the links in the "Additional Spark Functions (Bonus)" section are broken 🥲
- Is anybody else having issues accessing this link? [www.indellient.com/blog/a-journey-from-pandas-to-spark-data-frames/](http://www.indellient.com/blog/a-journey-from-pandas-to-spark-data-frames/)
- Same with this one, it doesn't go to the batch vs. stream article: [Streaming isn’t always the solution](https://www.section.io/engineering-education/batch-processing-vs-stream-processing/). After manually typing out the URL https://www.webscale.com/engineering-education/batch-processing-vs-stream-processing/, I think the article doesn't exist anymore.
- In the "Spark Workflow and Partitioning" course, the link on "to emphasise the point" is broken, or I'm not sure whether it was meant as a joke. In the same section, the hyperlink "[The coalesce transformation applied to a DataFrame](https://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.coalesce)" also needs checking.
- Broken link (intro to streaming): https://datastorageasean.com/blogs/5-use-cases-stream-processing-demonstrate-its-business-value-0
- In the Handling Late Data section of House 9, under the Tumbling Time Window section, the 3rd paragraph says we should "sum up all the prices for the stock symbol", but the aggregation used in the example below it is actually max.
- Stateful streaming: Exercise: Read from Stream: It says "Rows per Section" instead of "Rows per Second"
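
To illustrate why the sum vs. max mismatch in the Tumbling Time Window example matters, here is a minimal sketch in plain Python. It is not the course's Spark code: the 10-minute window, the `AAPL` symbol, the prices, and the `aggregate_by_window` helper are all made-up assumptions for illustration only.

```python
from collections import defaultdict

WINDOW_SECONDS = 600  # hypothetical 10-minute tumbling window

# Hypothetical (event_time_in_seconds, symbol, price) records
events = [
    (0, "AAPL", 100.0),
    (120, "AAPL", 105.0),
    (590, "AAPL", 95.0),
    (610, "AAPL", 110.0),
]


def aggregate_by_window(records, agg):
    """Bucket records into tumbling windows per symbol, then apply agg to the prices."""
    buckets = defaultdict(list)
    for event_time, symbol, price in records:
        # Each event belongs to exactly one non-overlapping window
        window_start = (event_time // WINDOW_SECONDS) * WINDOW_SECONDS
        buckets[(window_start, symbol)].append(price)
    return {key: agg(prices) for key, prices in buckets.items()}


sums = aggregate_by_window(events, sum)
maxes = aggregate_by_window(events, max)

print("sum per window:", sums)
print("max per window:", maxes)
```

For the first window the two aggregations give different results (300.0 for sum, 105.0 for max), so the text and the example should agree on which one is intended.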

**Local setup**
- Installing Java on an M1 chip was a bit more complex. I ended up installing Rosetta 2 to make it work. Maybe it's worth adding that to the instructions?
- In the [readme](https://github.com/data-derp/exercise-vanilla-spark/tree/main?tab=readme-ov-file#readme) under "Apache Spark Set up instructions", step 8 says "The same can be done for pyspark". This instruction is unclear: what does "the same" refer to? Is there any way to make it more explicit? I copied and pasted the code and it worked, but there is too little explanation.
- Also in the readme, in the Apache section, step 3 says that when exporting SPARK_HOME you can put it in your preferred location. Would it be possible to add a suggested location?

Found issues in Oct. 2024 tour #132
