I need to sort an RDD. The sort has to be on multiple fields of my record, so I need a custom Comparator.
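To make the requirement concrete, here is a minimal plain-Java sketch (no Spark involved) of the kind of multi-field ordering I mean. The Rec class, its fields (dept, age, name) and the chosen sort directions are hypothetical placeholders, not my actual record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical record type; the real one has different fields.
final class Rec {
    final String dept;
    final int age;
    final String name;

    Rec(String dept, int age, String name) {
        this.dept = dept;
        this.age = age;
        this.name = name;
    }
}

final class MultiFieldSort {
    // Order by dept ascending, then age descending, then name ascending.
    static final Comparator<Rec> BY_DEPT_AGE_NAME =
        Comparator.comparing((Rec r) -> r.dept)
                  .thenComparing(Comparator.comparingInt((Rec r) -> r.age).reversed())
                  .thenComparing((Rec r) -> r.name);

    // Returns a sorted copy and leaves the input list untouched.
    static List<Rec> sorted(List<Rec> input) {
        List<Rec> copy = new ArrayList<>(input);
        copy.sort(BY_DEPT_AGE_NAME);
        return copy;
    }

    public static void main(String[] args) {
        List<Rec> data = Arrays.asList(
            new Rec("b", 1, "x"),
            new Rec("a", 1, "y"),
            new Rec("a", 2, "z"));
        for (Rec r : sorted(data)) {
            System.out.println(r.dept + " " + r.age + " " + r.name);
        }
    }
}
```

This is the ordering I would like to apply to the RDD as a whole, rather than only to a local list.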
I see that sortBy does not seem to fit, as it accepts only a single key. I chanced upon http://codingjunkie.net/spark-secondary-sort/ and thus used repartitionAndSortWithinPartitions to achieve the same.
Why doesn't sortBy accept a custom Comparator and sort? Why do I have to repartition just in order to use a custom Comparator?