Prompted by my earlier post, I've just rewritten my portable C event reactor with a new API, including proactor-like calls (i.e., a proactor is emulated). The new API replaces clumsy calls like this:
```c
react_fdcond cond = { .fd = myfd, .mode = react_MIN };
react_prime(ev, react_CFD, &cond);
```
…with this:
```c
react_prime_fdin(ev, myfd);
```
I can't remember what value the old style had, except that it looked very `fcntl`/`ioctl`-like. The new style is much better: it's type-safe, and it doesn't rely on arbitrary constants (such as `react_CFD`) that have to be available even when the corresponding condition type is not. Internally, I also got rid of all the `switch`es on condition type and replaced them with a pair of function pointers, `defuse` and `act`, which has led to a cleaner model of how event handles are manipulated in general. The core has a platform-dependent ‘core registration’, a record of everything being monitored, including:
- a set of event handles primed on idle events,
- a binary heap of handles primed on timed events,
- on Windows, an optional handle for Windows messages/events,
- on Windows, an optional handle for APCs,
- on Windows, an array of `HANDLE`s to be passed to `WaitForMultipleObjects` and friends,
- on Unix-like systems with support for `poll` and `ppoll`, an array of `struct pollfd`s to be passed to those functions,
- on other Unix-like systems with support for `select` and `pselect`, an array of three `fd_set`s,
- on Unix-like systems, an optional handle for signals.
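On a `poll`-based Unix build, one could imagine the registration record looking roughly like the sketch below. All names here are invented for illustration; the library's actual types are different.

```c
#include <poll.h>
#include <stddef.h>

struct handle;  /* an event handle; opaque for this sketch */

/* Invented illustration of a poll-based core registration, roughly
   mirroring the list above; not the library's real layout. */
struct core_registration {
  struct handle **idle;     /* handles primed on idle events */
  size_t nidle;
  struct handle **timed;    /* binary min-heap, keyed on expiry time */
  size_t ntimed;
  struct pollfd *fds;       /* array handed to poll()/ppoll() */
  struct handle **watcher;  /* parallel array: fds[i] -> watching handle */
  size_t nfds;
  struct handle *sig;       /* optional handle for signals */
};
```

The parallel `watcher` array is the "structure that maps the event to the event handle watching for it" mentioned below.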
The last three options (the `HANDLE` array, the `pollfd` array and the `fd_set`s) are mutually exclusive, and each comes with a parallel structure that maps each event to the event handle watching for it, as well as a record of which events are currently in use but not necessarily being watched for. When an event handle is primed, it makes some contribution to the core registration, and sets its `defuse` and `act` pointers to indicate what to do in two circumstances.

When `react_yield` is called, it checks the core registration and decides what data to pass to the system call (`WaitForMultipleObjects`, `poll`, etc.), and sometimes even which system call to make. When the call returns, the reactor works out which events have occurred, and invokes the corresponding `act` function on the appropriate handle. The same handle's `act` function may be called several times if it has been primed on several events at once. The function should record in the user's structures what has happened, ensure the handle has been queued for processing (which will later result in the user's function being invoked), and remove the specific event from the core registration so it won't be watched for again. `act` does not remove the event from the in-use set, as we want to prevent other handles from priming on an event that has occurred but has not yet been passed to the user. After all the `act`s have been called, the non-empty queue with the highest priority is processed. Just before each user function is called, `defuse` is invoked; it removes any of the handle's events remaining in the core registration, and removes all events that the handle was primed with from the in-use set. Handles in lower-priority queues are not processed until at least one further invocation of `react_yield`.
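As a toy model of this protocol (again with invented names, not the library's real structures), the two pointers might behave like this:

```c
#include <stdbool.h>

/* A toy event handle: events are bits in a mask. */
struct toy_handle {
  void (*act)(struct toy_handle *, int event);
  void (*defuse)(struct toy_handle *);
  unsigned pending;     /* events accumulated by act() */
  unsigned registered;  /* events still in the core registration */
  bool queued;          /* waiting for the user's function? */
};

static void toy_no_defuse(struct toy_handle *h) { (void) h; }

/* act: record the event, queue the handle, and stop watching for
   that specific event (it stays in use until defuse runs). */
static void toy_act(struct toy_handle *h, int event)
{
  h->pending |= 1u << event;
  h->registered &= ~(1u << event);
  h->queued = true;
}

/* defuse: clear any events still registered; reset the pointer
   first, so a second defuse is a harmless no-op. */
static void toy_defuse(struct toy_handle *h)
{
  h->defuse = toy_no_defuse;
  h->registered = 0;
}
```

The point of resetting `defuse` inside itself is exactly the robustness property described later: defusing an already-defused handle is safe.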
This approach allows something that was not clearly possible under the old one: a handle can be primed on multiple events, and receive and accumulate them over separate calls to `react_yield`, so as to present them to the user in a single call when eventually processed. This was necessary to allow ALSA's `snd_seq_poll_descriptors` and related functions to interoperate easily with the reactor.
We can program robustly within the reactor using `defuse` and `act`. `defuse` is reset just before being invoked, so it is safe to defuse a handle twice, and its job is clear: remove any remaining events from the core registration, and remove the original events from the in-use set. We can be sloppier with `act`; it is only called while the handle still has events in the core registration, which `defuse` removes, so we can leave it in place until the handle is primed again.
Unix-related specifications could do with a few standardized additions to `fd_set` to make the use of `select` more efficient:
```c
// Get and remove from *s an FD in range [0, nfds), or return FD_SETSIZE.
int FD_EXTRACT(fd_set *s, int nfds);

// Add all FDs in *f to *s.  Optionally write already present ones to *alr.
void FD_ADD(fd_set *alr, const fd_set *f, fd_set *s);

// Remove all FDs in *f from *s.  Optionally write missing ones to *mis.
void FD_SUBTRACT(fd_set *mis, const fd_set *f, fd_set *s);

// Count collisions in *a and *b, and optionally write them to *res.
int FD_INTERSECT(fd_set *res, const fd_set *a, const fd_set *b);
```
(Actually, only `FD_EXTRACT` and `FD_INTERSECT` would really be valuable for the reactor, but the others should be trivially implementable with inside knowledge of `fd_set`s.)
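For illustration, here is one portable (if slow) way to realize these, written as lower-case functions over the standard `FD_ISSET`/`FD_SET`/`FD_CLR` macros; a real implementation would walk `fd_set`'s internal word array instead of probing every descriptor:

```c
#include <sys/select.h>

/* Get and remove from *s an FD in [0, nfds), or return FD_SETSIZE. */
int fd_extract(fd_set *s, int nfds)
{
  for (int fd = 0; fd < nfds; fd++)
    if (FD_ISSET(fd, s)) {
      FD_CLR(fd, s);
      return fd;
    }
  return FD_SETSIZE;
}

/* Add all FDs in *f to *s; record already-present ones in *alr if given. */
void fd_add(fd_set *alr, const fd_set *f, fd_set *s)
{
  for (int fd = 0; fd < FD_SETSIZE; fd++)
    if (FD_ISSET(fd, f)) {
      if (FD_ISSET(fd, s)) {
        if (alr)
          FD_SET(fd, alr);
      } else
        FD_SET(fd, s);
    }
}

/* Remove all FDs in *f from *s; record missing ones in *mis if given. */
void fd_subtract(fd_set *mis, const fd_set *f, fd_set *s)
{
  for (int fd = 0; fd < FD_SETSIZE; fd++)
    if (FD_ISSET(fd, f)) {
      if (FD_ISSET(fd, s))
        FD_CLR(fd, s);
      else if (mis)
        FD_SET(fd, mis);
    }
}

/* Count FDs present in both *a and *b; record them in *res if given. */
int fd_intersect(fd_set *res, const fd_set *a, const fd_set *b)
{
  int n = 0;
  for (int fd = 0; fd < FD_SETSIZE; fd++)
    if (FD_ISSET(fd, a) && FD_ISSET(fd, b)) {
      n++;
      if (res)
        FD_SET(fd, res);
    }
  return n;
}
```

With inside knowledge, `fd_extract` reduces to a find-first-set on each word, and the set operations to word-wise `|`, `&~` and `&`.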
`select`-based implementations can manage up to `FD_SETSIZE` descriptors. `WaitForMultipleObjects`-based implementations can handle up to `MAXIMUM_WAIT_OBJECTS - 1` `HANDLE`s, but at least that limit is per-thread. `poll` should be able to handle a lot more; I believe its array limit is the maximum number of descriptors in the process, although that doesn't account for the possibility of the same FD appearing multiple times in the array. Still, I don't think it's a limit that can easily be reached.
Anyway, the library is there now. It needs people to use it, try it out, and find bugs.
Update 2017-12-09: Found a nasty bug. Some events in the system array could be ‘forgotten’ if they happened to be last in the array. The check that causes the array to shrink was being applied too generally, i.e., even when the element being removed wasn't at the end. Silly, but fixed.