I'm implementing a Python wrapper for a C library. The obvious choice is to write a C extension linked against libpython; the less obvious but simpler one is to use ctypes. ctypes is really simple to use: it has ways to declare structures and function types, plus several classes that represent the simple C types: c_int, c_char, c_char_p, c_void_p, etc.

Now, this library has a function, write(), that handles a buffer. By buffer I mean a fixed-size region of memory holding data, paired with an integer that tells us how much of that region is actually data. So its declaration is basically this:

void (*write) (const char *buf, size_t size);

The declaration uses function-pointer syntax because this is the type of a struct member that has to point to such a function. Looking at it, and assuming c_size_t is already declared with the correct type, one would think the corresponding ctypes declaration is the following (the first argument to CFUNCTYPE is the return type, and ctypes spells void as None; c_void_p would mean a void * return):

write_t = CFUNCTYPE(None, c_char_p, c_size_t)

This is what the ctypes documentation calls a callback function.
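
Since write is a struct member, it may help to see how such a member can be mirrored in ctypes. This is a minimal Python 3 sketch assuming a hypothetical struct (IoOps) whose only field is the write callback; the real library's struct isn't shown in this post. It already uses c_void_p for the buffer, which is where this post ends up, and None for the void return type.

```python
from ctypes import CFUNCTYPE, Structure, addressof, c_size_t, c_void_p, create_string_buffer

# Prototype for: void (*write)(const char *buf, size_t size)
# None is the void return type; the buffer is declared as c_void_p.
write_t = CFUNCTYPE(None, c_void_p, c_size_t)


class IoOps(Structure):
    """Hypothetical ctypes mirror of a C struct with a write callback member."""
    _fields_ = [("write", write_t)]


sizes_seen = []


def py_write(buf, size):
    # With c_void_p, buf arrives as a plain int address; size as an int.
    sizes_seen.append(size)


# Keep a reference to the wrapped callback for as long as C might call it.
c_write = write_t(py_write)
ops = IoOps(write=c_write)

# Call through the struct member, the way the C library would.
data = create_string_buffer(b"hello", 5)
ops.write(addressof(data), 5)
```

Calling ops.write from Python here only stands in for the C library invoking the pointer it was handed.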

The problem arises with that c_char_p. With this class, ctypes assumes the parameter is a string, not a buffer. In C, both are just a region of memory holding chars; the difference is in how their length is known. A string is terminated by \x00, so its length is determined by the first occurrence of \x00 in that memory, while a buffer has to be accompanied by an integer, as I mentioned before. A \x00 cannot occur inside a string (the terminating one is not considered part of the string itself), while it can occur any number of times in a buffer. In fact, a buffer can be entirely full of \x00s.

So what ctypes does here is convert our buffer into a string. The first \x00 in the original data makes c_char_p end the string there, dropping the rest of the data and ignoring the real size of the buffer. Worse, if the data contains no \x00 at all, ctypes keeps reading past the end of the buffer looking for one, which yields garbage at best and a segmentation fault at worst, if it runs into memory the process can't access. This doesn't just corrupt data; it can crash the app!
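
A quick way to see the truncation without writing any C is to put a buffer with embedded NULs in memory and read it back through c_char_p. This is a Python 3 sketch, so the results are bytes rather than the str of Python 2.5; cast() is used here to get the same kind of pointer-to-string conversion that ctypes applies to a c_char_p callback argument.

```python
from ctypes import c_char_p, cast, create_string_buffer

# A 6-byte buffer with embedded NULs: buffer data, not a C string.
raw = b"ab\x00cd\x00"
buf = create_string_buffer(raw, len(raw))

# Reading it through c_char_p stops at the first NUL...
as_string = cast(buf, c_char_p).value
# ...while the buffer itself still holds every byte.
as_buffer = buf.raw

print(as_string)  # b'ab'
print(as_buffer)  # b'ab\x00cd\x00'
```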

The solution is simple, luckily: just treat your buffer as a void * instead of a char *. So the declaration ends up being (with None as the void return type):

write_t = CFUNCTYPE(None, c_void_p, c_size_t)

Then, in our callback, we can use string_at() to copy that buffer into a str[1] and manipulate it as such:

from ctypes import string_at

def write(buf, size):
    data = string_at(buf, size)
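
Putting the pieces together, here is a self-contained Python 3 sketch (so string_at() returns bytes) that wraps a Python function with the void * prototype and calls it the way the C library would, passing a buffer full of NULs. Calling the wrapped callback directly from Python stands in for the real library, which isn't shown here; the names chunks and payload are illustrative only.

```python
from ctypes import CFUNCTYPE, addressof, c_size_t, c_void_p, create_string_buffer, string_at

# void (*write)(const char *buf, size_t size), with the buffer declared as void *
write_t = CFUNCTYPE(None, c_void_p, c_size_t)

chunks = []


def write(buf, size):
    # buf is an int address here; copy exactly `size` bytes, NULs included
    chunks.append(string_at(buf, size))


c_write = write_t(write)  # keep this object alive while C may call it

# Simulate the library calling the callback with binary data.
payload = b"\x00\x01\x00binary\x00"
buf = create_string_buffer(payload, len(payload))
c_write(addressof(buf), len(payload))

print(chunks)  # [b'\x00\x01\x00binary\x00']
```

Every byte survives, including the leading and trailing \x00s that c_char_p would have cut off.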

The size parameter is important here too: without it, string_at() goes back to thinking in terms of strings rather than buffers and stops at the first \x00. I think this could be improved a little. Maybe at the next PyCamp I'll file a bug and develop a patch, either for the code or for the documentation; maybe both.
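
To see what the size argument buys us, compare both forms of string_at() on the same memory. Per the ctypes documentation, when no size is given (the default is -1) the data is assumed to be zero-terminated. This Python 3 sketch uses a made-up key/value payload purely for illustration.

```python
from ctypes import addressof, create_string_buffer, string_at

raw = b"key\x00value"
# Without an explicit size, create_string_buffer adds one trailing NUL,
# so the no-size read below can't run past the end of the buffer.
buf = create_string_buffer(raw)
addr = addressof(buf)

# Without a size, string_at() reads up to the first NUL, like a C string.
print(string_at(addr))            # b'key'
# With the buffer's real size, every byte comes back.
print(string_at(addr, len(raw)))  # b'key\x00value'
```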

[1] This is Python 2.5. In Python 3, string_at() returns bytes rather than str.


Posted Wed 27 Jan 2010 11:55:55 PM CET Tags: c