Await vs. return Task

#net7

#speed

#memory

2023-10-15

👋 Introduction

When you use a single task in your method you have two options:

await the task in the method, probably because you need its result, before returning from the method
return the task and leave it to the caller to await it
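
As a concrete illustration of the two options, here is a minimal sketch. The FileReader class and its method names are hypothetical and not part of the benchmark; File.ReadAllTextAsync merely stands in for any method that hands you a single task.

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical example type; not part of the benchmark code.
public static class FileReader
{
    // Option 1: await the task inside the method.
    // The compiler turns an async method into a state machine.
    public static async Task<string> ReadAwaitedAsync(string path)
    {
        return await File.ReadAllTextAsync(path);
    }

    // Option 2: return the task as-is and leave the awaiting to the caller.
    public static Task<string> ReadReturnedAsync(string path)
    {
        return File.ReadAllTextAsync(path);
    }
}
```

From the caller's point of view both are used the same way: `string text = await FileReader.ReadAwaitedAsync(path);` or `string text = await FileReader.ReadReturnedAsync(path);`.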

✏️ Benchmarking Plan

What do we actually want to measure? The cost of awaiting a task instead of returning it. The best way to isolate that is to use an existing, already completed task, so that the benchmark isn't polluted by the cost of creating the task or of waiting for it to complete. Therefore, I will use the Task.CompletedTask property: it's static, and it's already completed. Strictly speaking, I would only need to benchmark the await scenario, because the other one (returning the task) is almost an empty method (it basically just returns a reference), but let's include it anyway so that we can be sure we're not missing anything.

⏱️ Benchmark

Code

using BenchmarkDotNet.Attributes;

namespace Benchmarks.Console.Misc;

[MemoryDiagnoser]
public class AsyncVsReturnTask
{
    [Benchmark]
    public async Task AwaitCompletedTask()
    {
        await Task.CompletedTask;
    }

    [Benchmark]
    public Task ReturnCompletedTask()
    {
        return Task.CompletedTask;
    }
}

Results

BenchmarkDotNet v0.13.9+228a464e8be6c580ad9408e98f18813f6407fb5a, Windows 11 (10.0.22621.2428/22H2/2022Update/SunValley2)
Intel Core i5-10210U CPU 1.60GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 8.0.100-preview.2.23157.25
  [Host]     : .NET 7.0.5 (7.0.523.17405), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.5 (7.0.523.17405), X64 RyuJIT AVX2

Method	Mean	Allocated
AwaitCompletedTask	12.5317 ns	-
ReturnCompletedTask	0.3036 ns	-

Returning the task behaves like an empty method, and the JIT output confirms it (two mov instructions). So the methodology seems to be sound: the difference between the two methods is the cost of await. On this 10th gen i5 it's about 12 ns. That's not much, but it is still overhead. It might be tempting to summarize this in "relatives", like "it's 40x slower than returning the task", but that would be quite misleading, because:

In this special case we would essentially be comparing against an empty method, which in theory costs nothing, and a relative comparison to nothing just does not make sense (nothing is 0, and a relative comparison would mean dividing by 0, which is an invalid operation).
Also, it's quite rare that a task is already completed (and if you know that it is, you should use a ValueTask).
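
To show what the ValueTask suggestion can look like in practice, here is a minimal sketch of one common shape: a method that usually has its result at hand and only occasionally needs real asynchronous work. The SquareCache type and its members are made up for illustration, and Task.Yield only stands in for real asynchronous work.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical cache; type and member names are assumptions for illustration.
public class SquareCache
{
    private readonly ConcurrentDictionary<int, int> _cache = new();

    public ValueTask<int> GetSquareAsync(int n)
    {
        // Fast path: the result is already known, so wrap the value directly.
        // No Task object is created here.
        if (_cache.TryGetValue(n, out int cached))
            return new ValueTask<int>(cached);

        // Slow path: fall back to a real Task and wrap it.
        return new ValueTask<int>(ComputeAndStoreAsync(n));
    }

    private async Task<int> ComputeAndStoreAsync(int n)
    {
        await Task.Yield(); // placeholder for real asynchronous work
        int square = n * n;
        _cache[n] = square;
        return square;
    }
}
```

The caller still writes `int result = await cache.GetSquareAsync(3);`. Keep in mind that a ValueTask should be awaited only once.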

So, to summarize, you save about 12 ns by returning a task instead of awaiting it. Or do you?

Code v2

I figured that awaiting an already completed task might behave differently and thus produce different results. So let's run the same benchmark on a task that is not yet completed at the time of awaiting.

In the next code snippet I simply create a task from a loop that increments a counter and returns it (to prevent the loop from being optimized away). I will benchmark various iteration counts so that we can see scaling in action. While fine-tuning the benchmark I realized that I cannot really use huge iteration counts like a thousand or more, because the natural volatility of such a task is too high to measure the difference between returning and awaiting. Actually, even executing the underlying loop synchronously more than a thousand times shows quite noticeable volatility in performance (hundreds of nanoseconds, which is quite a disadvantage when measuring differences of 0-100 ns).

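using System.Runtime.CompilerServices; // needed for [MethodImpl]
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]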
public class AsyncVsReturnTask
{
    [Benchmark]
    [Arguments(0)]
    [Arguments(5)]
    [Arguments(10)]
    [Arguments(100)]
    public async Task<int> Await_Iterations(int iterations)
    {
        return await CreateDummyTask(iterations);
    }

    [Benchmark]
    [Arguments(0)]
    [Arguments(5)]
    [Arguments(10)]
    [Arguments(100)]
    public Task<int> Return_Iterations(int iterations)
    {
        return CreateDummyTask(iterations);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private Task<int> CreateDummyTask(int iterations)
    {
        return Task.Run(() =>
        {
            int dummy = 0;

            for (var i = 0; i < iterations; ++i) ++dummy;

            return dummy;
        });
    }
}

Method	iterations	Mean	Allocated
Await_Iterations	0	910.8 ns	280 B
Return_Iterations	0	773.4 ns	160 B
Await_Iterations	5	884.3 ns	280 B
Return_Iterations	5	757.5 ns	160 B
Await_Iterations	10	896.9 ns	280 B
Return_Iterations	10	772.6 ns	160 B
Await_Iterations	100	928.5 ns	280 B
Return_Iterations	100	830.8 ns	160 B

And now it gets interesting: we are not only seeing greater differences in performance, we are actually seeing allocations! Bear in mind that allocations now show up even for the version that simply returns the task, because we are creating a new Task on every call. Awaiting adds another 120 B on top of that (280 B vs. 160 B).

The "compare in absolutes, not in relatives" principle applies to this scenario too. Every task's execution time is different, but the cost of await is roughly constant: it does not "scale" with the task. Therefore it does not make sense (actually, it's impossible) to talk about it in relative terms. In individual cases, yes, you may be able to, but when you are talking about the "cost of await" in general, that can only be an absolute value.
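
To make that concrete, here is a small sketch of the arithmetic. The AwaitOverhead helper is hypothetical, and the 125 ns constant is just a round number picked from within the await/return gaps in the table above (roughly 98-137 ns), not a separate measurement.

```csharp
// Hypothetical helper for illustration only.
public static class AwaitOverhead
{
    // Assumption: a round value inside the gap measured in the table above.
    public const double OverheadNs = 125.0;

    // The same fixed overhead, expressed as a percentage of a task's own duration.
    public static double RelativePercent(double taskDurationNs) =>
        OverheadNs / taskDurationNs * 100.0;
}
```

For a task that takes 1 µs, `AwaitOverhead.RelativePercent(1_000)` gives 12.5 %; for a task that takes 1 ms, `AwaitOverhead.RelativePercent(1_000_000)` gives about 0.0125 %. The absolute cost is the same in both cases, which is why only the absolute number carries over from one task to another.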

📋 Summary

When comparing awaiting a task with returning it, we should expect some performance impact from awaiting. If the task is already completed, you can get away with a penalty of only a few nanoseconds (about 12 ns in this measurement) and no memory allocation. If the task is still running, expect a penalty of 80-150 ns and 120 B more allocated memory. Do not use these measurements as the basis for a relative comparison, because comparing against a task's execution time (which is different every time) just does not make sense. Treat them as absolute values.