The unary increment operator: Prefixed vs. Postfixed
👋 Introduction
The unary increment operator (++) has two forms: prefixed and postfixed. Both increment the variable; the difference is that the former evaluates to the incremented value, while the latter evaluates to the original value. Their logic is different, but what about their performance?
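A minimal sketch of the two forms in C#, assuming a C# 9+ project with top-level statements; the variable names are purely illustrative:

```csharp
using System;

// Prefixed: the variable is incremented first, and the expression yields the new value.
int a = 5;
int preResult = ++a;   // preResult == 6, a == 6

// Postfixed: the expression yields the original value, but the variable is still incremented.
int b = 5;
int postResult = b++;  // postResult == 5, b == 6

Console.WriteLine($"pre:  result={preResult}, variable={a}");  // pre:  result=6, variable=6
Console.WriteLine($"post: result={postResult}, variable={b}"); // post: result=5, variable=6
```

Both variables end up as 6; only the value of the expression itself differs.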
⏱️ Performance
When it comes to performance, one should prefer the prefixed one, right? You'd think so, because based on their implementations, the prefixed one should use fewer registers and fewer instructions:
Implementation of the prefixed one:
x += 1;
return x;
Implementation of the postfixed one:
var temp = x; // keep the original value
x += 1;
return temp;  // return the original value, not the incremented one
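The two sketches above can be turned into runnable C# as a sanity check. PreIncrement and PostIncrement are hypothetical helpers, not how the compiler actually emits ++; they take the variable by ref so they can update the caller's copy, and the program compares them against the built-in operators (C# 9+ top-level statements assumed):

```csharp
using System;

// Hypothetical helper mirroring the prefixed implementation.
static int PreIncrement(ref int x)
{
    x += 1;      // increment first...
    return x;    // ...then return the new value
}

// Hypothetical helper mirroring the postfixed implementation.
static int PostIncrement(ref int x)
{
    int temp = x; // remember the original value
    x += 1;       // increment the variable
    return temp;  // return the value it had before the increment
}

int p1 = 10, p2 = 10;
int pre1 = PreIncrement(ref p1);
int pre2 = ++p2;

int q1 = 10, q2 = 10;
int post1 = PostIncrement(ref q1);
int post2 = q2++;

Console.WriteLine($"pre:  helper={pre1}, operator={pre2}");   // pre:  helper=11, operator=11
Console.WriteLine($"post: helper={post1}, operator={post2}"); // post: helper=10, operator=10
```

Only the postfixed helper needs the extra temp local, which is where the "more registers, more instructions" intuition comes from.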
Let's check some actual scenarios!
⚙️ JIT Compilation
public class IncrementOperator
{
    private int _dummy1;
    private int _dummy2;
    private int _dummy3;
    private int _dummy4;

    public int pre(int i) {
        _dummy1 = ++i;
        return i;
    }

    public int post(int i) {
        _dummy1 = i++;
        return i;
    }
}
Now, if you look at the postfixed one and think about how it will be compiled, you might conclude that it won't need a temp variable at all! Why? Because it gets inlined: incrementing an int is not a real method call, so nothing has to hand back a separate copy of the old value. The original value can be assigned directly to the target variable and then incremented, and just like that, we get the same performance whether the operator is prefixed or postfixed.
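To make the "no temp needed" argument concrete, here is a sketch of the two equivalent forms. WithOperator and WithoutTemp are hypothetical names, and the out parameter stands in for the _dummy1 field, so this demonstrates the logic only, not the exact machine code the JIT emits for the class above (C# 9+ top-level statements assumed):

```csharp
using System;

// The postfixed operator, written as in post().
static int WithOperator(int i, out int target)
{
    target = i++;
    return i;
}

// The same logic with the temporary eliminated: the original value is stored
// straight into its destination, then the variable is incremented.
static int WithoutTemp(int i, out int target)
{
    target = i;  // the original value goes directly to the target
    i += 1;      // no copy of the old value is needed anymore
    return i;
}

for (int n = -2; n <= 2; n++)
{
    int r1 = WithOperator(n, out int t1);
    int r2 = WithoutTemp(n, out int t2);
    Console.WriteLine($"i={n}: target {t1}/{t2}, return {r1}/{r2}");
}
```

Because the target receives the original value before the increment happens, there is never a moment when both the old and the new value must be kept alive for a later use.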
The two methods above compile to almost the same machine code:
- Compute the incremented value (i + 1) into the return register, eax.
- Store the incremented value (pre) or the original value (post) in _dummy1.
- Return.
public int pre(int i)
lea eax, [rdx+1]
mov [rcx+0xc], eax
ret
public int post(int i)
lea eax, [rdx+1]
mov [rcx+0xc], edx
ret
❗ When Time Wins Over Optimization
This was, of course, a well-designed example that serves our purpose. But! In some rare cases, the JIT, which has to compile quickly, cannot produce the most optimized machine code possible.
Let's check the jitted output of the method below.
public void post_unoptimized(int i)
{
    _dummy1 = i++;
    _dummy2 = i++;
    _dummy3 = i++;
    _dummy4 = i++;
}
You might be surprised by how redundant it is:
lea eax, [rdx+1]
mov [rcx+8], edx
mov edx, eax
lea eax, [rdx+1]
mov [rcx+0xc], edx
mov edx, eax
lea eax, [rdx+1]
mov [rcx+0x10], edx
mov [rcx+0x14], eax
ret
I mean... why are we bouncing between edx and eax? We could keep i in a single register, copy it to the appropriate field before each increment, and then increment it in place.
Actually... if you make the C# code more explicit, you can get the JIT to emit the most optimized code:
public void post_optimized(int i)
{
    _dummy1 = i;
    ++i;
    _dummy2 = i;
    ++i;
    _dummy3 = i;
    ++i;
    _dummy4 = i;
}
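Before relying on a hand-rewrite like this, it's worth checking that it preserves behavior. A minimal sketch, assuming C# 9+ top-level statements: PostUnoptimized and PostOptimized are hypothetical stand-ins for the two methods, with out parameters playing the role of the _dummy1 through _dummy4 fields, so their machine code won't match the listings exactly:

```csharp
using System;

// Stand-in for post_unoptimized: four postfixed increments.
static void PostUnoptimized(int i, out int d1, out int d2, out int d3, out int d4)
{
    d1 = i++;
    d2 = i++;
    d3 = i++;
    d4 = i++; // the final increment is never observed
}

// Stand-in for post_optimized: store first, then increment explicitly.
static void PostOptimized(int i, out int d1, out int d2, out int d3, out int d4)
{
    d1 = i;
    ++i;
    d2 = i;
    ++i;
    d3 = i;
    ++i;
    d4 = i;
}

PostUnoptimized(7, out int u1, out int u2, out int u3, out int u4);
PostOptimized(7, out int o1, out int o2, out int o3, out int o4);
Console.WriteLine($"unoptimized: {u1} {u2} {u3} {u4}"); // unoptimized: 7 8 9 10
Console.WriteLine($"optimized:   {o1} {o2} {o3} {o4}"); // optimized:   7 8 9 10
```

The explicit version drops the last increment entirely, which matches the listing below: there is no inc after the final store.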
And see for yourself the perfect machine code:
mov [rcx+8], edx
inc edx
mov [rcx+0xc], edx
inc edx
mov [rcx+0x10], edx
inc edx
mov [rcx+0x14], edx
ret
📋 Summary
In this post we saw that, in principle, there should be no difference in performance between the prefixed and postfixed increment operators. Unfortunately, in some cases time pressure wins when jitting the postfixed one, resulting in less optimized code than would otherwise be possible. We can make the JIT emit the most optimized code by being a little more explicit about what we'd like to achieve.