|
|
|
## Introduction
|
|
|
|
The objective of this Wiki page is to provide some useful data about the absolute performance of **FastMutex** and **std::atomic<>** on the STM32 platform, with the hope of helping developers choose the best candidate whenever execution speed matters in multithreaded code.
|
|
|
|
|
|
|
|
## How tests were made
|
|
|
|
Every piece of data displayed on this page comes from several executions of the entrypoint named `mutex-benchmark` running on an `STM32F429ZI` board. The source file allows extensive customization of the testing environment (e.g. number of operations, number of threads), so if you're wondering how a different board would perform, and/or would like to try changing the test setup, check it out and feel free to play with the tunings.
|
|
|
|
|
|
|
|
Standard deviations are omitted for the tests whose results were so consistent that the deviation was practically zero.
|
|
|
|
|
|
|
|
## Test 1: FastMutex lock() and unlock() performance
|
|
|
|
The purpose of the first test is to measure how much overhead a varying number of threads synchronized with a mutex introduces compared to executing the exact same task sequentially on a single thread.
|
|
|
|
In this test, the sample task consists of writing 4 bytes to the SPI1 interface 24000 times. The following table shows the numerical results of the test. Note that the calls to lock() and unlock() are split equally between the threads, and the total number of SPI operations is the same for each configuration, so any excess execution time is due solely to FastMutex overhead.
|
|
|
|
|
|
|
|
| **Threads** | **Number of locks/unlocks** | **Execution time [ms]** | **Overhead** |
|
|
|
|
|-------------|-----------------------------|-------------------------|--------------|
|
|
|
|
| 1 | - | 2341 | - |
|
|
|
|
| 2 | 8000 | 2390 | 2.05 % |
|
|
|
|
| 2 | 24000 | 2494 | 6.54 % |
|
|
|
|
| 4 | 8000 | 2393 | 2.22 % |
|
|
|
|
| 4 | 24000 | 2497 | 6.66 % |
|
|
|
|
| 8 | 8000 | 2400 | 2.52 % |
|
|
|
|
| 8 | 24000 | 2496 | 6.62 % |
|
|
|
|
|
|
|
|
## Test 2: comparison of std::atomic vs FastMutex protecting an int
|
|
|
|
This test consists of measuring the execution time of 50000 increments of an integer protected by a FastMutex versus the same increments performed on a std::atomic<int>.
|
|
|
|
|
|
|
|
| **Threads** | **FastMutex or std::atomic?** | **Execution time [ms]** |
|
|
|
|
|-------------|-------------------------------|-------------------------|
|
|
|
|
| 1 | ordinary integer | 132 |
|
|
|
|
| 2 | FastMutex | 1652 ± 21 |
|
|
|
|
| 2 | std::atomic | ~132 |
|
|
|
|
| 4 | FastMutex | 3667 ± 4.6 |
|
|
|
|
| 4 | std::atomic | ~132 |
|
|
|
|
| 8 | FastMutex | 3647 ± 7.6 |
|
|
|
|
| 8 | std::atomic | ~132 |
|
|
|
|
Results show that std::atomic performs practically the same as an ordinary integer.
|
|
|
|
|
|
|
|
## Test 3: comparison of using std::atomic vs FastMutex as synchronizer
|
|
|
|
This test benchmarks two different ways to keep threads from executing a critical section in parallel. The FastMutex variant simply locks the mutex at the start of the section and unlocks it at the end, while the std::atomic variant uses the value of an atomic_int as a Dijkstra semaphore of multiplicity 1 (values 0/1).
|
|
|
|
The number of locks and unlocks is constant (50000) for all the tests:
|
|
|
|
|
|
|
|
| **Threads** | **FastMutex or std::atomic?** | **Execution time [ms]** |
|
|
|
|
|-------------|-------------------------------|-------------------------|
|
|
|
|
| 2 | FastMutex | 221.8 ± 11.6 |
|
|
|
|
| 2 | std::atomic | 237 |
|
|
|
|
| 4 | FastMutex | 376.9 ± 3.18 |
|
|
|
|
| 4 | std::atomic | 235 |
|
|
|
|
| 8 | FastMutex | 375.4 ± 1.17 |
|
|
|
|
| 8 | std::atomic | 233 |
|
|
|
|
|
|
|
|
This test clearly shows a significantly lower overhead for the std::atomic variant. Please note, however, that more tests (possibly involving longer critical sections of code) are needed before taking this as a proven fact.