Since we rung in the new year, we've been discussing various myths that I often see development teams run into when trying to optimize their Spark jobs. So far, we have covered:
Why increasing the executor memory may not give you the performance boost you expect.
Why increasing the number of executors also may not give you the boost you expect.
Why increasing driver memory will rarely have an impact on your system.
Why increasing overhead memory is almost never the right solution.
This week, we're going to talk about executor cores. First, as we've done with the previous posts, we'll understand how setting executor cores affects how our jobs run. We'll then discuss the issues I've seen with doing this, as well as the possible benefits in doing this.
This is a topic where I tend to differ with the overall Spark community, so if you disagree, feel free to comment on this post to start a conversation. As always, the better everyone understands how things work under the hood, the better we can come to agreement on these sorts of situations.
How Does Spark Use Multiple Executor Cores?
So the first thing to understand with executor cores is what exactly does having multiple executor cores buy you? To answer this, lets go all the way back to a diagram we discussed in the first post in this series.
As we discussed back then, every job is made up of one or more actions, which are further split into stages. These stages, in order to parallelize the job, is then split into tasks, which are spread across the cluster. Each task handles a subset of the data, and can be done in parallel to each other.
So how are these tasks actually run? Well if we assume the simpler single executor core example, it'll look like below. In this case, one or more tasks are run on each executor sequentially. As an executor finishes a task, it pulls the next one to do off the driver, and starts work on it. This allows as many executors as possible to be running for the entirety of the stage (and therefore the job), since slower executors will just perform fewer tasks than faster executors.
Note that we are skimming over some complications in the diagram above. Namely, the executors can be on the same nodes or different nodes from each other. Additionally, each executor is a YARN container. The driver may also be a YARN container, if the job is run in YARN cluster mode. Finally, the pending tasks on the driver would be stored in the driver memory section, but for clarity it has been called out separately.
So far so good. Now what happens when we request two executor cores instead of one? From the YARN point of view, we are just asking for more resources, so each executor now has two cores. Because YARN separates cores from memory, the memory amount is kept constant (assuming that no configuration changes were made other than increasing the number of executor cores).
Instead, what Spark does is it uses the extra core to spawn an extra thread. This extra thread can then do a second task concurrently, theoretically doubling our throughput. The result looks like the diagram below.
This seems like a win, right? We're using more cores to double our throughput, while keeping memory usage steady. Given that most clusters have higher usage percentages of memory than cores, this seems like an obvious win. Sadly, it isn't as simple as that.
What's the Problem?
Looking at the previous posts in this series, you'll come to the realization that the most common problem teams run into is setting executor memory correctly to not waste resources, while keeping their jobs running successfully and efficiently.
Let's say that we have optimized the executor memory setting so we have enough that it'll run successfully nearly every time, without wasting resources. Now let's take that job, and have the same memory amount be used for two tasks instead of one. It's pretty obvious you're likely to have issues doing that. That's because you've got the memory amount to the lowest it can be while still being safe, and now you're splitting that between two concurrent tasks.
This is essentially what we have when we increase the executor cores. Increasing executor cores alone doesn't change the memory amount, so you'll now have two cores for the same amount of memory. So once you increase executor cores, you'll likely need to increase executor memory as well. The naive approach would be to double the executor memory as well, so now you, on average, have the same amount of executor memory per core as before.
One note I should make here: I note this as the naive solution because it's not 100% true. Some memory is shared between the tasks, such as libraries. Assuming you'll need double the memory and then cautiously decreasing the amount is your best bet to ensure you don't have issues pop up later once you get to production.
And with that you've got a configuration which now works, except with two executor cores instead of one. But what has that really bought us now? We are using double the memory, so we aren't saving memory. At this point, we might as well have doubled the number of executors, and we'd be using the same resource count.
Increasing number of executors (instead of cores) would even make scheduling easier, since we wouldn't require the two cores to be on the same node. This means that using more than one executor core could even lead us to be stuck in the pending state longer on busy clusters.
Based on this, my advice has always been to use one executor core configurations unless there is a legitimate need to have more. But, this is against the common practice, so it's important to understand the benefits that multiple executor cores have that increasing the number of executors alone don't.
Hidden Benefits
The biggest benefit I've seen mentioned that isn't obvious from above is when you shuffle. If you shuffle between two tasks on the same executor, then the data doesn't even need to move. The data is still in the container in memory (or on disk, based on caching), so no network traffic is needed for that. This decreases your traffic utilization, and can make the network transfers that do need to occur faster, since the network isn't as busy.
Based on this, if you have a shuffle heavy load (joining many tables together, for instance), then using multiple executor cores may give you performance benefits. It is best to test this to get empirical results before going this way, however. The typical recommendations I've seen for executor core count fluctuates between 3 - 5 executor cores, so I would try that as a starting point. Keep in mind that you will likely need to increase executor memory by the same factor, in order to prevent Out of Memory exceptions. If you don't do this and it is still successful, you either have failures in your future, or you have been wasting YARN resources.
Conclusion
Based on the above, my complete recommendation is to default to a single executor core, increasing that value if you find the majority of your time is spent joining many different tables together. When doing this, make sure to empirically check your change, and make sure you are seeing a benefit worthy of the inherent risks of increasing your executor core count.
As I stated at the beginning, this is a contentious topic, and I could very well be wrong with this recommendation.
That said, based on my experience in recommending this to multiple clients, I have yet to have any issues. In contrast, I have had multiple instances of issues being solved by moving to a single executor core. I actually plan to discuss one such issue as a separate post sometime in the next month or two.
If you have a side of this topic you feel I didn't address, please let us know in the comments! I'd love nothing more than to be proven wrong by an eagle-eyed reader!