Wednesday, August 4, 2010

Cache Coherency

The presence and usage of ARM and DSP side caches poses a cache coherency problem when we want to access a shared area by both processors. Let's consider the following scenario:

1. A CMEM buffer is allocated for shared usage by the GPP side and its physical pointer is passed to the DSP.
2. The DSP wants to read and then write some data into this buffer. Let's say that there are free entry slots in the DSP L2 cache - so the data actually gets written to the DSP cache, instead of making it to the DDR shared region. The DSP then signals the GPP that it's done with the buffer for the time being.
3. The GPP attempts to read the buffer, but what it reads is just the garbage values present in the buffer after initialization since the DSP-written data is in the DSP cache. This buffer also gets cached in the ARM-side now, so when the GPP tries to write some new data into it, it stays into the ARM cache and doesn't make it to the main memory either.
4. If the DSP tries to read the cache now, it won't get what the ARM has written into it most recently since it'll be reading from its own cache, and vice-versa for the ARM side.

We can see that the "same" buffer actually exists in three different locations (main memory, DSP cache, ARM cache), all of which can contain totally different data - in this case it is said that they are not coherent, and that we have a cache coherency problem.

In most cached systems, there are cache coherency protocols which prevent these situations from occuring. The TMS320C64x+ DSP Cache User's Guide states:

In the following cases, it is your responsibility to maintain cache coherence:
• DMA or other external entity writes data or code to external memory that is then read by the CPU
• CPU writes data to external memory that is then read by DMA or another external entity

thus we have to manually maintain cache coherence for mutual access to CMEM regions by the DSP and the GPP. Studying the scenario above, we can observe that there are two underlying problems:

1. If the memory block to be read already exists in the local cache, there's a risk that the local cache is outdated: we need to discard the local cache entries and fill them up with information from the main memory. This process is called cache invalidation.
2. When the memory block is to be written into, there's a risk that the info remains in the local cache and doesn't make it to the main memory: we have to make sure that the new info gets written to the main memory as well. This process is called cache writeback.

Therefore, from a RPC perspective, for a call that involves transferring buffers, the steps we have to take are as follows:

1. Before passing the marshalled info via DSP/Link, the DSP must do a cache writeback
2. Before passing the params to the GPP side stub, the GPP must do a cache invalidate
3. After the GPP side stub is finished, the GPP must do a cache writeback
4. The DSP side stub must do a cache invalidate before terminating

This is assuming that both processor caches will be active - in case the DSP cache is disabled, the steps 1 and 4 will not be necessary, and likewise with steps 2 and 3 for a disabled GPP cache.

No comments:

Post a Comment