
ashok918/pyspark


pyspark

Welcome to PySpark. This README explains what PySpark is and introduces RDDs, DataFrames, and Datasets.

RDD Introduction

RDDs are collections of objects, similar to a list in Python. The difference is that an RDD is computed by several processes scattered across multiple physical servers (also called nodes) in a cluster, while a Python collection lives in and is processed by a single process. RDD stands for Resilient Distributed Dataset; beyond a plain collection, it adds properties such as immutability and fault tolerance.

RDD Benefits

In-memory processing: PySpark loads data from disk, processes it in memory, and can keep it in memory (for example, when the RDD is cached) so that later operations reuse it instead of reading from disk again.
