Welcome to PySpark. In this post, I will explain what PySpark is and introduce its core abstractions: RDDs, DataFrames, and Datasets.
RDD Introduction
RDDs are collections of objects, similar to a list in Python. The difference is that an RDD is computed across several processes scattered over multiple physical servers (called nodes) in a cluster, while a Python collection lives and is processed in a single process. RDD stands for Resilient Distributed Dataset, and it adds features such as immutability and fault tolerance.
In-memory processing: PySpark loads data from disk, processes it in memory, and keeps it in memory so that subsequent operations can reuse it without re-reading from disk.