MAP:
- map is a transformation operation in Spark, so it is lazily evaluated: nothing is computed until an action such as collect is called
- It is a narrow operation because it does not shuffle data from one partition to multiple partitions; each output partition depends on exactly one input partition (see the partition-count check after the example below)
scala> val x = sc.parallelize(List("spark","rdd","example","sample","example"), 3)
x: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[4] at parallelize at <console>:27

scala> val y = x.map(x => (x, 1))
y: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[5] at map at <console>:29

scala> y.collect
res0: Array[(String, Int)] = Array((spark,1), (rdd,1), (example,1), (sample,1), (example,1))
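
To illustrate the two bullet points above, the sketch below compares the partition count of the mapped RDD with its parent's in spark-shell. It is a minimal sketch (the RDD ids and res numbers will differ per session): map returns a new RDD definition without running a job, and the child keeps the same number of partitions as its parent because no shuffle is involved.

scala> val x = sc.parallelize(List("spark","rdd","example","sample","example"), 3)
scala> val y = x.map(w => (w, 1))   // returns immediately: no job runs yet, map is lazy
scala> x.getNumPartitions
res1: Int = 3
scala> y.getNumPartitions           // same as the parent: narrow dependency, no shuffle
res2: Int = 3
scala> y.collect                    // the action that actually triggers evaluation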