top_bottom_from_apache_log.py - the script parses a standard Apache access log and returns the top/bottom records with the highest/lowest success/failure request ratio, counting only records with at least one failed request. The output fields: , the ratio, the request count.
To run:
spark-submit top_bottom_from_apache_log.py <your_access.log> top|bottom ['{"limit": <limit>, "select": ["<valid_field1>","<valid_field2>"]}']
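For reference, a minimal PySpark sketch of the ratio logic is shown below. It assumes the common Apache combined log format, groups by client host, and treats 2xx/3xx statuses as successes and 4xx/5xx as failures; the regex, the grouping field and the hard-coded limit of 10 results are illustrative assumptions, not taken from the script itself.

    import re
    import sys

    from pyspark.sql import SparkSession

    # common Apache combined log format: host ident user [time] "request" status size ...
    LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) \S+')

    def parse(line):
        m = LOG_RE.match(line)
        if not m:
            return None
        host, status = m.group(1), int(m.group(2))
        ok = 1 if status < 400 else 0          # 2xx/3xx = success, 4xx/5xx = failure
        return (host, (ok, 1 - ok))

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("top_bottom_sketch").getOrCreate()
        lines = spark.sparkContext.textFile(sys.argv[1])
        mode = sys.argv[2] if len(sys.argv) > 2 else "top"

        per_host = (lines.map(parse)
                         .filter(lambda x: x is not None)
                         .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
                         .filter(lambda kv: kv[1][1] > 0))       # at least one failed request

        # (host, success/failure ratio, total request count)
        scored = per_host.map(lambda kv: (kv[0],
                                          kv[1][0] / float(kv[1][1]),
                                          kv[1][0] + kv[1][1]))

        for row in scored.sortBy(lambda r: r[1], ascending=(mode == "bottom")).take(10):
            print(row)

        spark.stop()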
kafka-stream-find-word-example.py - this script reads a Kafka stream from the specified broker(s) every 5 seconds. If a message contains the word of interest, it is copied to a file together with the timestamp of when it was received. The output shows how many messages with the word have been received in each 30-second window (sliding every 30 seconds), plus the file with the messages containing the word.
To run:
spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.1 kafka-stream-find-word-example.py server:port <word_of_interest>
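A minimal sketch of the DStream logic under the Kafka 0.8 direct API: 5-second batches, a filter for the word of interest, a file of matching messages with receive timestamps, and a 30-second count window sliding every 30 seconds. The topic name "test", the checkpoint directory and the output file path are illustrative assumptions.

    import sys
    from datetime import datetime

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    if __name__ == "__main__":
        brokers, word = sys.argv[1], sys.argv[2]

        sc = SparkContext(appName="kafka_find_word_sketch")
        ssc = StreamingContext(sc, 5)                       # 5-second batches
        ssc.checkpoint("/tmp/find-word-checkpoint")         # needed by countByWindow

        stream = KafkaUtils.createDirectStream(
            ssc, ["test"], {"metadata.broker.list": brokers})   # topic "test" is assumed

        # keep only the message values that contain the word of interest
        matches = stream.map(lambda kv: kv[1]).filter(lambda msg: word in msg)

        # append each matching message with a receive timestamp to a file on the driver
        def save(rdd):
            if not rdd.isEmpty():
                with open("/tmp/messages_with_word.txt", "a") as f:
                    for msg in rdd.collect():
                        f.write("%s\t%s\n" % (datetime.now().isoformat(), msg))

        matches.foreachRDD(save)

        # how many matching messages arrived in each 30-second window, sliding every 30 seconds
        matches.countByWindow(30, 30).pprint()

        ssc.start()
        ssc.awaitTermination()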
kafka-in-out-example.py - this script reads an unstructured Kafka stream from the specified broker(s) and writes a filtered and modified unstructured stream to . Filtering: only messages starting with '#' go to the output. Modification: if a message contains '"' characters, they are backslash-escaped in the output.
To run:
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 kafka-in-out-example.py server:port <checkpoints_dir>
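A minimal Structured Streaming sketch of the filter-and-escape step. The input topic "input", the output topic "output" and the Kafka sink itself are assumptions (the README leaves the output target unspecified); only the '#' filter and the backslash-escaping of '"' come from the description above.

    import sys

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, regexp_replace

    if __name__ == "__main__":
        brokers, checkpoints = sys.argv[1], sys.argv[2]

        spark = SparkSession.builder.appName("kafka_in_out_sketch").getOrCreate()

        raw = (spark.readStream
                    .format("kafka")
                    .option("kafka.bootstrap.servers", brokers)
                    .option("subscribe", "input")                # assumed input topic
                    .load())

        # keep only messages starting with '#' and backslash any '"' characters
        out = (raw.selectExpr("CAST(value AS STRING) AS value")
                  .filter(col("value").startswith("#"))
                  .withColumn("value", regexp_replace(col("value"), '"', '\\\\"')))

        query = (out.writeStream
                    .format("kafka")                             # assumed sink
                    .option("kafka.bootstrap.servers", brokers)
                    .option("topic", "output")                   # assumed output topic
                    .option("checkpointLocation", checkpoints)
                    .start())
        query.awaitTermination()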
kafka-to-file.py - this script reads a JSON-structured Kafka stream from the specified broker(s) and writes predefined columns and rows to the specified <data_dir>.
To run:
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 kafka-to-file.py server:port <data_dir> <checkpoints_dir>
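A minimal sketch of the same pattern, assuming a hypothetical topic "events", a two-field JSON schema and a Parquet file sink; the real topic, schema, column selection and row filter live in the script.

    import sys

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    if __name__ == "__main__":
        brokers, data_dir, checkpoints = sys.argv[1], sys.argv[2], sys.argv[3]

        spark = SparkSession.builder.appName("kafka_to_file_sketch").getOrCreate()

        # assumed two-field JSON payload
        schema = StructType([StructField("user", StringType()),
                             StructField("action", StringType())])

        events = (spark.readStream
                       .format("kafka")
                       .option("kafka.bootstrap.servers", brokers)
                       .option("subscribe", "events")            # assumed topic
                       .load()
                       .select(from_json(col("value").cast("string"), schema).alias("j"))
                       .select("j.user", "j.action")             # "predefined columns"
                       .where(col("action").isNotNull()))        # example row filter

        query = (events.writeStream
                       .format("parquet")                        # assumed file format
                       .option("path", data_dir)
                       .option("checkpointLocation", checkpoints)
                       .start())
        query.awaitTermination()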
files-to-cassandra-example.py - this script reads the specified <data_dir> and writes the content to the specified Cassandra table.
To run:
spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.6 files-to-cassandra-example.py
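A minimal sketch of the files-to-Cassandra step using the connector's DataFrame API. The connection host, input path, file format, keyspace and table names below are placeholders; since the run command takes no arguments, the script presumably configures these in the code.

    from pyspark.sql import SparkSession

    if __name__ == "__main__":
        spark = (SparkSession.builder
                             .appName("files_to_cassandra_sketch")
                             .config("spark.cassandra.connection.host", "127.0.0.1")
                             .getOrCreate())

        # read everything from the data directory (Parquet assumed here)
        df = spark.read.parquet("/path/to/data_dir")

        # append the rows to an existing keyspace.table
        (df.write
           .format("org.apache.spark.sql.cassandra")
           .options(keyspace="my_keyspace", table="my_table")
           .mode("append")
           .save())

        spark.stop()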