Q. How did changing values on the SparkSession property parameters affect the throughput and latency of the data?
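A. The effect was observed by checking processedRowsPerSecond in the query progress report after each change. A higher value means the query is processing more rows per second, i.e. higher throughput.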
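One way to make that comparison less eyeball-driven is to average the metric over several progress reports per run. The sketch below assumes the reports are dictionaries shaped like the ones PySpark exposes through a streaming query's recentProgress, with processedRowsPerSecond for throughput and durationMs.triggerExecution as a rough per-batch latency. The helper name summarize_progress and the sample numbers are hypothetical, not measured results.

```python
from statistics import mean


def summarize_progress(progress_reports):
    """Average throughput and batch duration over a list of progress dicts."""
    # Throughput: rows processed per second, as reported for each micro-batch.
    rates = [
        p["processedRowsPerSecond"]
        for p in progress_reports
        if p.get("processedRowsPerSecond") is not None
    ]
    # Latency proxy: total time spent executing each trigger, in milliseconds.
    durations = [
        p["durationMs"]["triggerExecution"]
        for p in progress_reports
        if "triggerExecution" in (p.get("durationMs") or {})
    ]
    return {
        "avg_rows_per_sec": mean(rates) if rates else 0.0,
        "avg_batch_ms": mean(durations) if durations else 0.0,
    }


# Illustrative input only: made-up numbers standing in for query.recentProgress.
sample_reports = [
    {"processedRowsPerSecond": 100.0, "durationMs": {"triggerExecution": 400}},
    {"processedRowsPerSecond": 140.0, "durationMs": {"triggerExecution": 600}},
]
print(summarize_progress(sample_reports))
```

Running the same summary after each property change gives one pair of numbers per configuration, which is easier to compare than individual batches.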
Q. What were the 2-3 most efficient SparkSession property key/value pairs? Through testing multiple variations on values, how can you tell these were the most optimal?
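A. The properties below, judged by comparing processedRowsPerSecond across runs with different values: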
- spark.streaming.kafka.maxRatePerPartition
- spark.sql.shuffle.partitions
- spark.default.parallelism
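Testing multiple variations of these three keys can be organized as a small grid. The sketch below is an assumption about how such a sweep might look, not the procedure actually used. The candidate values are placeholders, and config_grid and build_session are hypothetical helper names. Note that spark.streaming.kafka.maxRatePerPartition is documented for the Kafka direct stream (DStream) API, so in a Structured Streaming job the Kafka source option maxOffsetsPerTrigger may be the setting that actually limits the read rate.

```python
from itertools import product

# Placeholder candidate values (assumed for illustration, not tested results).
CANDIDATES = {
    "spark.streaming.kafka.maxRatePerPartition": [10, 100, 1000],
    "spark.sql.shuffle.partitions": [10, 50, 200],
    "spark.default.parallelism": [100, 1000],
}


def config_grid(candidates):
    """Yield one {key: value} dict per combination of candidate values."""
    keys = list(candidates)
    for values in product(*(candidates[k] for k in keys)):
        # Spark configuration values are passed as strings here.
        yield {k: str(v) for k, v in zip(keys, values)}


def build_session(conf, builder=None):
    """Apply the key/value pairs to a SparkSession builder and create the session."""
    if builder is None:
        # Imported lazily so the grid helper can be used without pyspark installed.
        from pyspark.sql import SparkSession
        builder = SparkSession.builder.master("local[*]").appName("conf-tuning")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


grid = list(config_grid(CANDIDATES))
print(len(grid), "configurations to try")
```

Each configuration would then be run for the same amount of time and its processedRowsPerSecond recorded. Because getOrCreate returns an already running session if one exists, each candidate is best tried in a fresh process, or after stopping the previous session.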