Channel: KVR Audio

DSP and Plugin Development • Re: Atomic ring/dual buffer implementation?

The thing is, "volatile" is the wrong tool for this.
I found the doc page: https://learn.microsoft.com/en-us/cpp/b ... w=msvc-170
So with MSVC, an acquire-read from a volatile is the default on Intel (I guess a mov from memory is always an acquire there), but not necessarily elsewhere.
It's because on some architectures it might be cached in an inconsistent state and have changed between the read and the branch.
On any architecture you can get an interrupt after loading from pPendingOrRegaining and have the producer code run a few times, so that when InterlockedExchangePointer loads from pPendingOrRegaining again, it no longer sees the same value.

To avoid thinking about that, let's assume for a moment that the producer always allocates a new buffer with malloc() or new, fills it with data, and writes the address into pPendingOrRegaining.

Code:

#include <atomic>
#include <span>

std::atomic<std::span<float>*> g_p_data = nullptr;

void consume_data(std::span<float> data);

void consumer() {
    std::span<float>* p_data = g_p_data.load();
    if (p_data) {
        consume_data(*p_data);
    }
}

void produce_data(std::span<float> data);

void producer() {
    size_t data_size = 256;
    float* data_buf = new float[data_size];
    std::span<float>* p_data = new std::span<float>(data_buf, data_size);
    produce_data(*p_data);
    g_p_data.store(p_data);
}
The way I see it, for this to work correctly on ARM with ldr/str/dmb, we need the load() to compile to ldr, dmb (so a barrier after load) and store() to compile to dmb, str. What we are trying to achieve is that if the consumer saw the effect of store(), it also sees the effect of the preceding produce_data().

Now, without explicitly asking for acquire/release, load() and store() default to memory_order_seq_cst, and gcc for ARMv7 surrounds ldr/str with dmb: https://godbolt.org/z/Mavhnhbcn , but I don't think that stronger ordering is needed for correctness in this case.
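For comparison, here is a minimal sketch of the same publish/consume idea with the memory orders spelled out. The names Block, g_block, publish() and consume_sum() are made up for this illustration and are not from the code above. With memory_order_release on the store and memory_order_acquire on the load, I'd expect the compiler to need only the "dmb, str" and "ldr, dmb" pairs discussed above, but check the generated assembly for your own target.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical payload type for this sketch (not from the original post).
struct Block {
    std::size_t size;
    float* samples;
};

// Published pointer; nullptr means "nothing produced yet".
std::atomic<Block*> g_block{nullptr};

// Producer side: fill the block completely, then publish it.
// As in the post, every call allocates a fresh block; older blocks
// are deliberately never freed here to keep the sketch simple.
void publish(std::size_t n, float value) {
    Block* b = new Block{n, new float[n]};
    for (std::size_t i = 0; i < n; ++i) {
        b->samples[i] = value;
    }
    // Release: the writes above may not be reordered after this store.
    g_block.store(b, std::memory_order_release);
}

// Consumer side: returns the sum of the published samples,
// or -1.0f if nothing has been published yet.
float consume_sum() {
    // Acquire: if we see the pointer, we also see the data behind it.
    Block* b = g_block.load(std::memory_order_acquire);
    if (!b) {
        return -1.0f;
    }
    float sum = 0.0f;
    for (std::size_t i = 0; i < b->size; ++i) {
        sum += b->samples[i];
    }
    return sum;
}
```

In real use publish() and consume_sum() would run on different threads; calling publish(4, 0.5f) and then consume_sum() from one thread is enough to see the data flow through.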

Another argument could be that when writing this stuff by hand without atomics, we'd also like to prevent the compiler from moving unrelated expensive computation into our "critical sections" (not literally in this case, but close enough), and using some kind of barrier intrinsic does that.

In any case, now that we can use atomics, there should be less room to screw things up. If in the original code we replace `DataSlot* volatile` with `std::atomic<DataSlot*>`, normal read with load(), InterlockedExchangePointer() with exchange(), and remove all MemoryBarrier(), I think it becomes a very nice portable implementation.
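A rough sketch of what that replacement could look like, assuming a single producer and a single consumer. The original code isn't reproduced here, so the DataSlot layout and the function names producer_publish() and consumer_take() are assumptions of mine; only pPendingOrRegaining and the load()/exchange() substitution come from the discussion.

```cpp
#include <atomic>

// Hypothetical slot layout; the real DataSlot is not shown in this post.
struct DataSlot {
    float samples[4];
};

// Was `DataSlot* volatile`; now an atomic pointer, no MemoryBarrier() needed.
std::atomic<DataSlot*> pPendingOrRegaining{nullptr};

// Producer: hand over a filled slot. Returns whatever was in the mailbox
// before (an unconsumed slot the producer can reuse, or nullptr).
DataSlot* producer_publish(DataSlot* filled) {
    // Replaces InterlockedExchangePointer(); default order is seq_cst.
    return pPendingOrRegaining.exchange(filled);
}

// Consumer: take ownership of the pending slot, if any.
DataSlot* consumer_take() {
    // Cheap check first (replaces the plain volatile read).
    if (pPendingOrRegaining.load() == nullptr) {
        return nullptr;
    }
    // The producer may have run between the load and the exchange, so
    // only the value returned by exchange() is trusted from here on.
    return pPendingOrRegaining.exchange(nullptr);
}
```

The point of trusting only the exchange() result is that it doesn't matter if the pointer changed after the initial load: whatever slot the consumer gets back is one it now owns exclusively.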

Statistics: Posted by Alien Brother — Sat Aug 17, 2024 5:12 pm


