cur4so/SparkExamples

top_bottom_from_apache_log.py - this script parses a standard Apache log file and returns top/bottom with the highest/lowest successful/failed request ratio, with at least one failed request. The output: , the ratio, the request count.

To run:

spark-submit top_bottom_from_apache_log.py <your_access.log> top|bottom ['{"limit":, "select":["<valid_field1>","<valid_field2>"]}']
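The ranking described above can be sketched in plain Python, without Spark, to make the ratio logic concrete. This is only an illustration under stated assumptions, not the script's actual implementation: the grouping field (`host` here), the rule that status codes below 400 count as successful, and the names `LOG_PATTERN` and `ratio_ranking` are all hypothetical.

```python
import re
from collections import defaultdict

# Common Log Format: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def ratio_ranking(lines, key="host", order="top", limit=10):
    """Rank values of `key` by their successful/failed request ratio.

    Returns a list of (key_value, ratio, request_count) tuples.
    """
    ok = defaultdict(int)
    failed = defaultdict(int)
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # skip lines that do not match the expected format
        # Assumption: status codes below 400 count as successful.
        if int(m.group("status")) < 400:
            ok[m.group(key)] += 1
        else:
            failed[m.group(key)] += 1
    # Only keys with at least one failed request are ranked,
    # so the division below never hits zero.
    rows = [(k, ok[k] / failed[k], ok[k] + failed[k]) for k in failed]
    rows.sort(key=lambda r: r[1], reverse=(order == "top"))
    return rows[:limit]


sample = [
    '10.0.0.1 - - [10/Oct/2017:13:55:36 +0000] "GET /a HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2017:13:55:37 +0000] "GET /b HTTP/1.1" 200 128',
    '10.0.0.1 - - [10/Oct/2017:13:55:38 +0000] "GET /c HTTP/1.1" 404 0',
    '10.0.0.2 - - [10/Oct/2017:13:55:39 +0000] "GET /a HTTP/1.1" 500 0',
    '10.0.0.2 - - [10/Oct/2017:13:55:40 +0000] "GET /a HTTP/1.1" 200 512',
    '10.0.0.3 - - [10/Oct/2017:13:55:41 +0000] "GET /a HTTP/1.1" 200 512',
]
top = ratio_ranking(sample, order="top")
bottom = ratio_ranking(sample, order="bottom")
```

In this sample, `10.0.0.3` never fails and is therefore left out of both rankings.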

kafka-stream-find-word-example.py - this script reads a Kafka stream from the specified broker(s) every 5 seconds. If a message contains the word of interest, it is copied to a file together with the timestamp at which it was received. The output shows how many messages containing the word were received in each 30-second window, sliding every 30 seconds, and the file holds the matching messages.

To run:

spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.1 kafka-stream-find-word-example.py server:port <word_of_interest>
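Because the window length and the slide interval are both 30 seconds, the windows do not overlap. The counting can be illustrated in plain Python, without Spark or Kafka. This is a rough sketch only: the function name `count_matches_per_window`, the use of Unix timestamps, and matching the word as a whitespace-separated token (rather than as a substring) are assumptions, not details taken from the script.

```python
from collections import Counter

WINDOW_SECONDS = 30  # window length equals the slide, so windows never overlap


def count_matches_per_window(messages, word, window=WINDOW_SECONDS):
    """Count messages containing `word` per window.

    `messages` is an iterable of (unix_timestamp, text) pairs.
    Returns a dict mapping each window's start time to its match count.
    """
    counts = Counter()
    for ts, text in messages:
        # Assumption: the word must appear as a whitespace-separated token.
        if word in text.split():
            window_start = int(ts) - int(ts) % window
            counts[window_start] += 1
    return dict(counts)


stream = [
    (0, "spark is fast"),
    (12, "no match here"),
    (29, "try spark"),
    (30, "spark again"),
    (61, "sparkling water"),
]
per_window = count_matches_per_window(stream, "spark")
```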

kafka-in-out-example.py - this script reads an unstructured Kafka stream from the specified broker(s) and writes the filtered and modified unstructured stream to . Filtering: only messages starting with '#' go to the output. Modification: any '"' characters in a message are escaped with a backslash in the output.

To run:

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 kafka-in-out-example.py server:port <checkpoints_dir>
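The filtering and modification rules are per-message operations, so they can be shown as a small plain-Python function. This is a minimal sketch of the described behaviour, not the script's code; the name `transform` is hypothetical, and the real script presumably expresses the same logic as Spark column operations.

```python
def transform(message):
    """Return the output form of `message`, or None if it is filtered out."""
    # Filtering: only messages starting with '#' go to the output.
    if not message.startswith("#"):
        return None
    # Modification: put a backslash in front of every double quote.
    return message.replace('"', '\\"')


incoming = ['#tag "quoted" text', "plain message", "#no quotes"]
outgoing = [out for out in map(transform, incoming) if out is not None]
```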

kafka-to-file.py - this script reads a JSON-structured Kafka stream from the specified broker(s) and writes predefined columns and rows to the specified <data_dir>.

To run:

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 kafka-to-file.py server:port <data_dir> <checkpoints_dir>
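Selecting "predefined columns and rows" amounts to a projection plus a row filter over the parsed JSON. The plain-Python sketch below illustrates that idea only. The column names (`id`, `status`), the filter rule, and the function names are hypothetical placeholders; the source does not say which columns and rows the script actually keeps.

```python
import json

COLUMNS = ("id", "status")  # hypothetical column names


def keep_row(record):
    # Hypothetical row filter: keep only records whose status is "active".
    return record.get("status") == "active"


def project(raw_messages, columns=COLUMNS, predicate=keep_row):
    """Parse JSON messages, keep matching rows, and keep only `columns`."""
    rows = []
    for raw in raw_messages:
        try:
            record = json.loads(raw)
        except ValueError:
            continue  # skip messages that are not valid JSON
        if not isinstance(record, dict) or not predicate(record):
            continue
        # Columns missing from a record are filled with None.
        rows.append({c: record.get(c) for c in columns})
    return rows


raw = [
    '{"id": 1, "status": "active", "extra": "x"}',
    '{"id": 2, "status": "inactive"}',
    "not json",
]
rows = project(raw)
```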

files-to-cassandra-example.py - this script reads the specified <data_dir> and writes its content to the specified Cassandra

To run:

spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.6 files-to-cassandra-example.py <data_dir> []

About

A collection (current and planned) of simple Spark (pyspark) scripts.
