Friday, June 18, 2010

A brief overview of RPC over multiple cores

RPC (which stands for Remote Procedure Call) is defined as follows by Wikipedia:

Remote procedure call (RPC) is an Inter-process communication technology that allows a computer program to cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network) without the programmer explicitly coding the details for this remote interaction (...)

Within our context of heterogeneous OMAP3 multi-cores, it doesn't refer to exactly the same thing as it does in general, but there are enough similarities. For example, the DSP and GPP use the same memory (i.e., the DDR RAM on the Beagle) as main memory, but this doesn't make their address spaces the same - to start with, the DSP works with physical addresses whereas the GPP works with logical ones. And the binaries are obviously not compatible, since we have two different processors and architectures (hence the "heterogeneous" in "heterogeneous multicore processing"). If we want the GPP and DSP to co-operate towards the same goal (especially in situations like video decoding, where we'd really appreciate the DSP's help), it would be useful to get them to talk to each other without much effort. The DSP should be able to easily obtain the data it needs to work with (which usually resides in the GPP file system or program memory), process it and hand it back to the GPP.

While all of this is quite feasible at the moment, it is far from trivial (set up OE, build DSP/Link, examine the examples, learn the DSP/Link API, set up the GPP side structures for bringing up the DSP, try to get the DSP things working without being able to see what you're doing unless you have CCS or are using C6RunApp ;) and so on...). And DSP side code is C code: TI's C6000 code generation tools (CGT) include a C compiler. You could essentially write the same code and compile/run it on either side. So why all this hassle? Why not be able to pass parameters and call functions from one side to the other? Why not live in harmony and peace? Then RPC is the answer! (maybe except for that last one).

Scrolling down the Wikipedia article, we see the steps involved in making RPC work, and it's rather easy to map them onto DSP-to-GPP RPC. Let's say we have this function int gpp_side_function(int param1, float param2, char *param3) somewhere on the GPP side, which we want to call from the DSP.

  1. The client (the DSP in our case) calls the client stub, which has the same definition. The call is a local procedure call, with parameters pushed onto the stack in the normal way.
  2. The client stub packs the parameters into a message (done according to some predefined structure, e.g. 2 bytes function name length, n bytes function name, then 2 bytes parameter count, then each of the parameters with their lengths, etc.) and makes a system call to send the message. In our case, sending the message means putting a message on the DSP->GPP MSGQ. Packing the parameters is called marshaling.
  3. The server (the GPP side in our case) receives the message, unpacks (unmarshals) it and locates the corresponding local function call stub.
  4. Finally, the server stub calls the desired procedure. The reply traces the same path in the other direction.
Similarly, one could do GPP->DSP RPC calls - a component of the C6Run project called C6RunLib is meant for this purpose.
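
To make step 2 a bit more concrete, here is a minimal sketch of the example message layout mentioned above (2 bytes name length, the name, 2 bytes parameter count, then each parameter prefixed with its length). The names rpc_msg, put_u16, put_blob, get_u16 and rpc_msg_begin are hypothetical, and so are the 256-byte buffer and the low-byte-first byte order; a real implementation would write into a DSP/Link MSGQ message instead of a local array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message buffer standing in for a MSGQ message body. */
typedef struct {
    uint8_t data[256];
    size_t  len;
} rpc_msg;

/* Append a 16-bit value, low byte first (the byte order is an assumption). */
static int put_u16(rpc_msg *m, uint16_t v)
{
    if (m->len + 2 > sizeof m->data) return -1;
    m->data[m->len++] = (uint8_t)(v & 0xFF);
    m->data[m->len++] = (uint8_t)(v >> 8);
    return 0;
}

/* Append a length-prefixed blob: 2 bytes length, then the bytes themselves. */
static int put_blob(rpc_msg *m, const void *p, uint16_t n)
{
    if (m->len + 2u + n > sizeof m->data) return -1;
    put_u16(m, n);                      /* cannot fail: space was checked above */
    memcpy(m->data + m->len, p, n);
    m->len += n;
    return 0;
}

/* Read back a 16-bit value written by put_u16. */
static uint16_t get_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Start a message: name length, name, then the parameter count. */
static int rpc_msg_begin(rpc_msg *m, const char *name, uint16_t nparams)
{
    assert(m != NULL && name != NULL);
    m->len = 0;
    if (put_blob(m, name, (uint16_t)strlen(name)) != 0) return -1;
    return put_u16(m, nparams);
}
```

A client stub would call rpc_msg_begin once and then put_blob for each parameter; the server side walks the same layout in the same order using get_u16 to recover the lengths.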

There are certain implementation details that need special attention in our case:

Passing Pointers as RPC Parameters
Since the addressing mechanisms of the GPP and DSP are different, we need to make sure that any pointer parameters / buffers which are passed are accessible from both sides. When doing GPP->DSP calls, one can use the CMEM module to obtain a shared region of memory and translate the addresses back and forth to ensure mutual accessibility, but there is no CMEM interface on the DSP side, so one must resort to workarounds. If the size of each buffer parameter is always known, one can pass the contents of the buffers themselves back and forth via DSP/Link. Another approach is to use the DSP/Link POOL module, which allocates shared buffers and provides address translation, though this is not suitable for large amounts of memory since there is a limited number of fixed-size POOL buffers. A third is to use a special protocol to access the CMEM interface on the GPP side and do all the allocation from there. In both the POOL and the CMEM case, we have to ensure that the passed pointer parameters are allocated with our special method and not from the DSP stack region, otherwise we'll be in trouble.
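
As a rough illustration of what "translating the addresses" means, here is a minimal sketch that assumes a single contiguous shared region whose GPP-side mapping and DSP-side physical base are both known. The shared_region type and the gpp_to_dsp / dsp_to_gpp helpers are hypothetical and are not the CMEM or POOL API; those modules provide their own translation calls. Using 0 and NULL to signal "not in the shared region" is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one shared region as seen from both cores. */
typedef struct {
    uintptr_t gpp_virt_base;  /* where the GPP process has the region mapped */
    uint32_t  dsp_phys_base;  /* physical address the DSP uses for the same memory */
    size_t    size;           /* region size in bytes */
} shared_region;

/* GPP pointer -> DSP physical address; returns 0 if ptr is outside the region. */
static uint32_t gpp_to_dsp(const shared_region *r, const void *ptr)
{
    uintptr_t v = (uintptr_t)ptr;
    assert(r != NULL);
    if (v < r->gpp_virt_base || v - r->gpp_virt_base >= r->size) return 0;
    return r->dsp_phys_base + (uint32_t)(v - r->gpp_virt_base);
}

/* DSP physical address -> GPP pointer; returns NULL if outside the region. */
static void *dsp_to_gpp(const shared_region *r, uint32_t phys)
{
    assert(r != NULL);
    if (phys < r->dsp_phys_base || phys - r->dsp_phys_base >= r->size) return NULL;
    return (void *)(r->gpp_virt_base + (phys - r->dsp_phys_base));
}
```

The range checks are the interesting part: a pointer that was not allocated from the shared region (a stack buffer, for instance) fails translation instead of silently turning into a bogus address on the other core.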

Openness to Expansion
Having an RPC framework that can only call 5 (or 50, or 100) predefined functions on the other side is not very useful; one wants to be able to define and call one's own functions. To make this possible, the framework must not have too many hardcoded details, such as when to do the address translations mentioned above. To address this, I'm using a function signature system in my implementation to identify parameter types and where special attention is needed. For example, for our above function

int gpp_side_function(int param1, float param2, char *param3)

the corresponding signature would be

iif@

where the first symbol identifies the return type as an integer, and the other three identify the parameter types (i for the int, f for the float, @ for the pointer). The final @ indicates that some manner of address translation (or passing the buffer back and forth, if that's the preferred approach) is needed: the marshaler may take care of the translation before passing the message, or the server stub may do the translation itself, depending on how the protocol is defined.
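
To show how little code is needed to interpret such a signature, here is a minimal sketch. The three symbols i, f and @ are the ones from the example above; the helper names sig_sym_size, sig_param_count and sig_param_needs_translation are hypothetical, and treating any other symbol as unknown is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size in bytes of the value behind one signature symbol; 0 if unknown. */
static size_t sig_sym_size(char sym)
{
    switch (sym) {
    case 'i': return sizeof(int);
    case 'f': return sizeof(float);
    case '@': return sizeof(void *);  /* pointer that needs special handling */
    default:  return 0;
    }
}

/* Number of parameters: every symbol after the leading return-type symbol. */
static size_t sig_param_count(const char *sig)
{
    assert(sig != NULL && sig[0] != '\0');
    return strlen(sig) - 1;
}

/* Nonzero if parameter idx (0-based) needs address translation. */
static int sig_param_needs_translation(const char *sig, size_t idx)
{
    assert(idx < sig_param_count(sig));
    return sig[idx + 1] == '@';
}
```

For "iif@" this reports three parameters, of which only the last one needs translation.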

Also, I've designed the marshaler itself to be as user-friendly as possible. Let's say you want to provide the DSP side function stub for the above function. All you have to do is pass the function name, the function signature (which can be extracted via a simple lookup table) and all the parameters to the marshaler, trigger the data transfer, and return the result with the desired data type. So the stub for the above function would look like this:

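int gpp_side_function(int param1, float param2, char *param3)
{
    /* Pack the function name, its signature and the arguments into a message. */
    rpc_marshal("gpp_side_function", "iif@", param1, param2, param3);
    /* Trigger the data transfer to the GPP side. */
    rpc_makecall();
    /* Hand back the result as the declared return type. */
    return RPC_GETRESULT(int);
}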

The process is quite suitable for automation, and I'm planning to have a script that auto-generates the stubs given the definitions.
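
For the curious, here is one way a signature-driven marshaler could walk its variable arguments. This is only a sketch under my own assumptions and not the actual rpc_marshal: the function name sketch_marshal_args, the caller-supplied output buffer and the flat back-to-back layout are all hypothetical. One real C detail worth noting is that a float passed through "..." is promoted to double, so it has to be read back as a double.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack the arguments described by sig into out.
 * Returns the number of bytes written, or -1 on error. */
static int sketch_marshal_args(uint8_t *out, size_t cap, const char *sig, ...)
{
    va_list ap;
    size_t len = 0;
    const char *s;

    assert(out != NULL && sig != NULL && sig[0] != '\0');
    va_start(ap, sig);
    for (s = sig + 1; *s != '\0'; s++) {      /* skip the return-type symbol */
        union { int i; float f; void *p; } v;
        size_t n;

        switch (*s) {
        case 'i':
            v.i = va_arg(ap, int);
            n = sizeof v.i;
            break;
        case 'f':
            v.f = (float)va_arg(ap, double);  /* float arrives promoted to double */
            n = sizeof v.f;
            break;
        case '@':
            v.p = va_arg(ap, void *);         /* address translation would go here */
            n = sizeof v.p;
            break;
        default:                              /* unknown symbol */
            va_end(ap);
            return -1;
        }
        if (len + n > cap) {                  /* output buffer too small */
            va_end(ap);
            return -1;
        }
        memcpy(out + len, &v, n);             /* all union members start at offset 0 */
        len += n;
    }
    va_end(ap);
    return (int)len;
}
```

Because the signature string is the only source of type information, a stub generator has to emit it consistently with the C prototype; a mismatch would make va_arg read the wrong type.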

Finding and invoking the corresponding GPP stub
This is easier said than done: you have a string giving you the GPP-side function name and a bunch of parameters. How do you locate the corresponding function, and how do you actually make the function call? So far, I've been thinking of these approaches:
  1. Use a static jump table. An intuitive albeit not-so-elegant solution. Associate each function name with a number (perhaps via hashing?), and use a function pointer table to jump to the stub. The stub has to take care of unmarshaling the parameters and passing them on. This is a static approach, so one has to add the table entry as well as the GPP-side stub for every new function.
  2. Use an assembly function caller. Once we are able to determine the function address (perhaps statically as in 1, perhaps via dynamic loading with libdl functions), we can have an assembly routine which pushes the parameters onto the stack and calls the function living at the located address. This is what C6RunLib uses, and it will eventually be the preferred method.
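
The first approach can be sketched as follows. Everything here is hypothetical: the rpc_stub_fn type, the table layout, the add_two example function and the assumption that its payload is simply two ints back to back. It uses a plain string comparison where a hash or numeric ID could be substituted, and it reuses -1 as an error value, which a real protocol would need to separate from genuine results.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A server-side stub unpacks its own arguments from the raw payload. */
typedef int (*rpc_stub_fn)(const unsigned char *payload, size_t len);

typedef struct {
    const char *name;   /* function name as sent by the client */
    rpc_stub_fn stub;   /* stub that unmarshals and calls the real function */
} rpc_table_entry;

/* Hypothetical target function living on the GPP side. */
static int add_two(int a, int b)
{
    return a + b;
}

/* Stub for add_two: expects two ints back to back in the payload. */
static int add_two_stub(const unsigned char *payload, size_t len)
{
    int a, b;
    if (len != 2 * sizeof(int)) return -1;
    memcpy(&a, payload, sizeof a);
    memcpy(&b, payload + sizeof a, sizeof b);
    return add_two(a, b);
}

/* The static table: one entry per callable function. */
static const rpc_table_entry rpc_table[] = {
    { "add_two", add_two_stub },
};

/* Linear lookup by name; returns NULL if no stub is registered. */
static rpc_stub_fn rpc_find_stub(const char *name)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < sizeof rpc_table / sizeof rpc_table[0]; i++) {
        if (strcmp(rpc_table[i].name, name) == 0)
            return rpc_table[i].stub;
    }
    return NULL;
}
```

The drawback mentioned above is visible here: exposing a new function means writing another stub and adding another table row by hand.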
