2016-12-24

Resolving directed acyclic graph with ordered edges in SQL

Suppose you have a table holding the adjacency set of a directed acyclic graph (DAG). It has two columns, parent and child, each row indicating an edge from one node to another. With a few temporary tables, it's easy enough to resolve any set of nodes into a fuller set of those nodes and all their descendants. You can use it to represent group membership, and find rules that match a given node directly or indirectly. You can also create a deterministic function to do the resolution for you, and use it in triggers to prevent cycles.

Now add another column, seqno, that orders the direct children of a node. Now you want to resolve a node to a sequence with depth-first traversal. SQL (at least, the MySQL of the time) doesn't do recursion, so how do you go about it? Here's some example data:

parent  child  seqno
     1      2      0
     1      3      1
     1      4      2
     2      4      0
     4      5      0
     4      6      1

So 1 imports 2, 3 and 4, but 2 also imports 4, which also imports 5 and 6. Although 1 directly imports 4 after 3, 4 should appear before 3 in the result. The expected sequence is 1, 2, 4, 5, 6, 3, 4, 5, 6, from which we remove the later duplicates to give 1, 2, 4, 5, 6, 3.

Here's a technique that scans the graph in two passes. In the first, we descend level by level, recording the maximum fanout at each level, i.e., the highest seqno on any node's outgoing edges at that level, plus one. In our example, that gives us 3, 2, 2, 0 (the final 0 being the leaf level). We then work back from the bottom, starting with a counter of zero: multiply the fanout by the counter and add one, and that becomes the new counter: 0×0+1=1; 2×1+1=3; 2×3+1=7. Note that the topmost fanout, 3, is never used. We reverse the results 1, 3, 7, and associate these factors with each level of the graph, so level 1 maps to 7, level 2 to 3, and level 3 to 1.
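To make the first pass concrete, here's the fanout-and-factor computation in Python (an illustrative sketch mirroring the prose, not the SQL below; the leaf level's 0 is omitted, since the initial zero counter supplies it anyway):

```python
# Example edge table: (parent, child, seqno).
edges = [(1, 2, 0), (1, 3, 1), (1, 4, 2),
         (2, 4, 0), (4, 5, 0), (4, 6, 1)]

def level_fanouts(root):
    # Descend level by level; at each level record the maximum
    # fanout (highest seqno on any outgoing edge, plus one).
    fanouts, frontier = [], {root}
    while frontier:
        out = [s + 1 for p, c, s in edges if p in frontier]
        if out:
            fanouts.append(max(out))
        frontier = {c for p, c, s in edges if p in frontier}
    return fanouts

def level_factors(fanouts):
    # Work back from the deepest level: each factor is one more
    # than the product of the fanout and factor from the level
    # below.  The top level's fanout is never used.
    factors = [0] * len(fanouts)
    scale = prev = 0
    for lvl in reversed(range(len(fanouts))):
        factors[lvl] = scale * prev + 1
        prev, scale = fanouts[lvl], factors[lvl]
    return factors

print(level_fanouts(1))                 # [3, 2, 2]
print(level_factors(level_fanouts(1)))  # [7, 3, 1]
```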

Now we start the second pass, this time keeping a computed sequence number. The root element 1 is assigned 0 as its sequence number. Then we find its children are 2, 3 and 4 in sequence 0, 1 and 2. We assign their sequence numbers as the root's assigned number 0, plus 1, plus the original sequence number times the factor at the current level: so 2 is placed at 0+1+0×7=1, 3 is placed at 0+1+1×7=8, and 4 is placed at 0+1+2×7=15.

When we move to the next level, using factor 3, 2 at 1 imports 4 with seqno 0, so 4 is placed at 1+1+0×3=2. We now have 4 at both 2 and 15, so we can lose the later one at 15. That also puts 4 (at 2) before 3 (at 8).

Essentially, what we're doing is leaving a space at each level for a full tree (but with potentially different fanout at each level). We then use the highest factor at the top level to spread the first generation out, leaving enough gap between two siblings for the descendants of the 'older' sibling.
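The whole placement scheme condenses into a few lines of Python (an illustrative sketch, not a transcription of the SQL that follows; the factor table is the one derived above):

```python
# Edge table from the example, and the per-level factors derived
# above (level 1 -> 7, level 2 -> 3, level 3 -> 1).
edges = [(1, 2, 0), (1, 3, 1), (1, 4, 2),
         (2, 4, 0), (4, 5, 0), (4, 6, 1)]
factors = {1: 7, 2: 3, 3: 1}

def resolve(root):
    best = {root: 0}      # node -> earliest computed position
    frontier = {root: 0}  # nodes placed at the previous level
    level = 1
    while frontier:
        nxt = {}
        for parent, child, seqno in edges:
            if parent in frontier:
                # Child position: parent's position, plus one, plus
                # the child's seqno scaled by this level's factor.
                pos = frontier[parent] + 1 + seqno * factors[level]
                nxt[child] = min(pos, nxt.get(child, pos))
        # Keep only the earliest occurrence of each node.
        for child, pos in nxt.items():
            best[child] = min(pos, best.get(child, pos))
        frontier = nxt
        level += 1
    return sorted(best, key=best.get)

print(resolve(1))   # [1, 2, 4, 5, 6, 3]
```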

Here's the body of a MySQL stored function that takes one node id, root, and resolves it to a sequence representing depth-first traversal, with later duplicates removed. Let's hope I've got it right:

BEGIN
DECLARE idlist TEXT DEFAULT '';
DECLARE scl INT UNSIGNED;
DECLARE lyr INT UNSIGNED;
DECLARE prv INT UNSIGNED;

-- Create a table to hold fan-outs for each level of the graph.
DROP TEMPORARY TABLE IF EXISTS fanouts;
CREATE TEMPORARY TABLE fanouts
   (layer INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
    max_fanout INT UNSIGNED,
    factor INT UNSIGNED DEFAULT NULL);

-- Create a table to hold candidates, and populate it with the root
-- node.
DROP TEMPORARY TABLE IF EXISTS cands;
CREATE TEMPORARY TABLE cands (id INT UNSIGNED);
INSERT INTO cands (id) VALUES (root);

-- Create a table to hold the next level of nodes (the children of
-- the current candidates, despite its name).
DROP TEMPORARY TABLE IF EXISTS parents;
CREATE TEMPORARY TABLE parents (id INT UNSIGNED);

-- Work out the fanouts for each level.
WHILE EXISTS (SELECT id FROM cands) DO
   -- Get the maximum fan-out at this level.
   INSERT INTO fanouts
      SELECT NULL AS layer,
             MAX(seqno) + 1 AS max_fanout,
             NULL AS factor
         FROM edges
         WHERE parent IN (SELECT id FROM cands)
         GROUP BY parent
         ORDER BY max_fanout DESC
   LIMIT 1;

   -- Work out what's at the next level.
   INSERT INTO parents
      SELECT edges.child AS id
         FROM cands
   INNER JOIN edges
     ON edges.parent = cands.id;

   -- Make the next level the current level, and reset the next level.
   DELETE FROM cands;
   INSERT INTO cands
      SELECT DISTINCT id FROM parents;
   DELETE FROM parents;
END WHILE;

-- Work out the scale factor for each level.
SET scl = 0;
SET prv = 0;
SELECT MAX(layer) FROM fanouts INTO lyr;
WHILE lyr > 0 DO
   UPDATE fanouts
      SET factor = scl * prv + 1
      WHERE layer = lyr;
   SELECT max_fanout, factor
      FROM fanouts
      WHERE layer = lyr
      INTO prv, scl;
   SET lyr = lyr - 1;
END WHILE;



-- SECOND PHASE


-- This table holds the final result.  seqno defines the order.
DROP TEMPORARY TABLE IF EXISTS done;
CREATE TEMPORARY TABLE done
   (id INT UNSIGNED, seqno INT UNSIGNED);

-- This table lists nodes that we have to add to the result.
DROP TEMPORARY TABLE IF EXISTS cands;
CREATE TEMPORARY TABLE cands
   (id INT UNSIGNED, seqno INT UNSIGNED);

-- This table holds the next level of nodes (the children of the
-- current candidates), with their computed sequence numbers.
DROP TEMPORARY TABLE IF EXISTS parents;
CREATE TEMPORARY TABLE parents
   (id INT UNSIGNED, seqno INT UNSIGNED);

INSERT INTO cands (id, seqno) VALUES (root, 0);

SET lyr = 1;
WHILE EXISTS (SELECT id FROM cands) DO
   SELECT factor FROM fanouts WHERE layer = lyr INTO scl;
   SET lyr = lyr + 1;

   INSERT INTO parents
      SELECT child AS id,
             cands.seqno + 1 + scl * edges.seqno AS seqno FROM cands
   JOIN edges
     ON edges.parent = cands.id
   ORDER BY cands.seqno ASC, edges.seqno ASC;

   -- The current candidates now have their children computed, so
   -- move the candidates into the result.
   INSERT INTO done
      SELECT id, seqno FROM cands;
   DELETE FROM cands;

   -- Keep the earliest entry from done and parents.
   DELETE FROM parents
      WHERE id IN (SELECT id FROM done WHERE done.seqno < parents.seqno);
   DELETE FROM done
      WHERE id IN (SELECT id FROM parents WHERE done.seqno > parents.seqno);

   -- Make the parents the new candidates.
   INSERT INTO cands
      SELECT id, seqno FROM parents;
   DELETE FROM parents;
END WHILE;

SELECT GROUP_CONCAT(id ORDER BY seqno ASC SEPARATOR ',')
    FROM done
    INTO idlist;
RETURN idlist;
END;

Happy now?

2016-11-20

Extracting an email attachment to a pipe in procmail

At work, some of my emails come through some Microsoft mail server, which I don't use. However, through its web interface, I can set up redirection to the address I do actually use. But there's a problem: the redirected email is identical to the original, except that the Message-Id header field is overwritten with one the server chooses locally. What the hell could this be useful for?!! It completely messes up threading, and achieves nothing in its place! Worse, the new id ends in .local!! I thought these things were supposed to be globally unique…?

My own emails run through a procmail script that I control, so I already get it to fix things like restoring the subject line when mangled by spam filters. This does the job for that:

:0 fhw
* ^X-Spam-Prev-Subject:
| formail -R X-Spam-Prev-Subject Subject -U Subject

Can I do something similar for Message-Id? No, not with what gets redirected. However, there is another option: forward email as attachment. I checked, and the message id within the attachment is preserved!

I hunted around for a tool to unpack the attachment and print it to standard output (so it could be used as a procmail filter), but tools like munpack and ripmime only unpack to a directory. With nothing ready-made, I decided to pick a language with some hopefully mature RFC822 library:

#!/usr/bin/env python

"""Unpack a MIME message onto STDOUT."""

# Distilled from https://docs.python.org/2/library/email-examples.html

import sys
import email

def main():

    msg = email.message_from_file(sys.stdin)

    if not msg.is_multipart():
        sys.stderr.write('Not multipart\n')
        sys.exit(1)

    # The second MIME part is the forwarded message; print the
    # message encapsulated within it, without a leading From line.
    attachment = msg.get_payload(1)
    print attachment.get_payload(0).as_string(False)
    sys.exit(0)

if __name__ == '__main__':
    main()
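To convince myself the unpacking logic is sound, here's the same round trip sketched in a later Python 3 (whose email API differs from the 2.x script above): build a forwarded-as-attachment message in memory and check that the inner Message-Id survives. All addresses and ids here are made up for illustration.

```python
import email
import email.policy
from email.message import EmailMessage

# Build a message whose Message-Id we care about.
orig = EmailMessage()
orig['From'] = 'alice@example.com'
orig['Subject'] = 'hello'
orig['Message-Id'] = '<original-id@example.com>'
orig.set_content('body text')

# Forward it as an attachment, as the mail client's 'forward as
# attachment' would: the original becomes a message/rfc822 part.
outer = EmailMessage()
outer['From'] = 'me@example.com'
outer['Subject'] = 'Fwd: hello'
outer.set_content('see attached')
outer.add_attachment(orig)

# Unpack as the script does: take the second MIME part, then the
# message encapsulated within it.
parsed = email.message_from_string(outer.as_string(),
                                   policy=email.policy.default)
attachment = parsed.get_payload(1)
inner = attachment.get_payload(0)
print(inner['Message-Id'])   # the original id survives
```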

The procmail configuration just has to make certain that the message really has come through the mindlessly mangling server. I used three conditions that should prevent mishaps:

:0 fw
* ^From:.*<my@email\.address\.example\.com>
* ^X-MS-Has-Attach: yes
* ^X-MS-Exchange-Inbox-Rules-Loop: my@email\.address\.example\.com
| /path/to/the/python/script

The encapsulating message should always come from me, even if the original is from someone else.

It seems to do the job! Just tried it with an image attachment (within the attached message), and it was fine. I wonder how well it will cope with very big emails. Will the Python library automatically save to disc when the size is above a threshold?

2016-07-11

Headless cropping of SVGs exported from LibreOffice, and making them scalable

Finally, I've found a satisfactory way of generating decent SVGs from ODG files. By ‘decent’, I mean that:

  • They scale. The width and height attributes must be removed, or contain only percentages. Leaving these in with any other units (or no units) means you've got a VG, not an SVG!
  • They are properly cropped. Export from ODG using unoconv places the graphic on a canvas as big as the page it happened to be created on.
  • They have no background, unless that was explicitly defined in the source.

By ‘satisfactory’, I mean that you can automate the process in a headless environment.

Previously, I used unoconv to export to PDF, pdfcrop to crop to the area it actually uses, then pdf2svg to make an SVG from it, and finally xsltproc to strip its width, height and background. However, this stopped working for me recently, as the libraries that pdf2svg uses changed, so the background element could not so easily be found with xsltproc.

So here's the new way:

  • Export with unoconv directly to SVG. It will still have the wrong viewBox and an absolute size, but there will be no background, so nothing special is needed to remove it.
  • Use xsltproc with stripSize.xsl to remove the absolute size. This step also makes sure that the next step generates dimensions in the correct units for a viewBox.
  • Use inkscape --query-all to get the desired viewBox. Only the first line of the output is required.
  • Compute the new width and height as percentages. At least one will be 100%.
  • Use xsltproc with replaceViewBox.xsl to replace the current viewBox with the desired one, and add the width and height back in as percentages.

Here's a series of commands to try:

unoconv -f svg -o stage1.svg input.odg
xsltproc --novalid stripSize.xsl stage1.svg > stage2.svg
viewbox="$(inkscape -z --query-all stage2.svg | head -1 | \
  awk -e 'BEGIN { FS="," } { print $2 " " $3 " " $4 " " $5 }')"
vbp=($viewbox)
if (( $( echo "${vbp[3]} > ${vbp[2]}" | bc -lq) )) ; then
  height="100%"
  width=$(echo "100 * ${vbp[2]} / ${vbp[3]}" | bc -lq)
  width="$(printf '%.4g%%\n' "$width")"
else
  width="100%"
  height=$(echo "100 * ${vbp[3]} / ${vbp[2]}" | bc -lq)
  height="$(printf '%.4g%%\n' "$height")"
fi
xsltproc --novalid -param viewbox "'$viewbox'" \
  -param width "'$width'" \
  -param height "'$height'" \
  replaceViewBox.xsl stage2.svg > output.svg

This use of inkscape does not start a GUI, so you can use it in headless environments. It would be nice if this could all fit into a single pipe, but I don't think there's a way, as stage2.svg is used twice. Is there a way to use process substitution to substitute the same file twice?

The two XSLT files are here:

<!-- stripSize.xsl -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:svg="http://www.w3.org/2000/svg">

  <xsl:output omit-xml-declaration="no"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/svg:svg/@width" />

  <xsl:template match="/svg:svg/@height" />
</xsl:stylesheet>
<!-- replaceViewBox.xsl -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:svg="http://www.w3.org/2000/svg">

  <xsl:output omit-xml-declaration="no"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/svg:svg/@viewBox">
    <xsl:attribute name="viewBox">
      <xsl:value-of select="$viewbox" />
    </xsl:attribute>
    <xsl:attribute name="width">
      <xsl:value-of select="$width" />
    </xsl:attribute>
    <xsl:attribute name="height">
      <xsl:value-of select="$height" />
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>

This feature has been added to Mokvino Web. However, conversion from PDF to SVG is now disabled to ensure that the new conversion path is chosen.


Update 2019-02-15: I added the width and height attributes back in, but only with computed percentages. This prevents some software balking at size-less SVGs, without turning scalability back off.

2016-05-16

Jardeps: Makefiles for Java the right way

When Java came out, folks tried to build Java projects using makefiles the same way they did for C. Foo.class depends on Foo.java, so compile the latter to get the former:

%.class: %.java
	$(JAVAC) $<

The problem is, it's just not as ‘simple’ as in C, where all the things you depend on are in header files whose contents you consciously make distinct from source files. Instead, Java method declarations are not separate from their definitions, which is easier to maintain, but undermines the principles that makefiles depend on.

This then means that, when compiling one source file, which needs declarations from another source file, you end up compiling both in one go. (javac works better this way anyway, so maybe it's best not to fight it.)

I tried to tackle this by extracting a kind of ‘header’ from a Java file, and using this to build a dependency rule. The first question was whether to try and do it before compilation of the file, or after. Since Java only needs the source files to compile, headers can be generated after the fact, provided we can force compilation before we test any dependencies against the headers.

Another question is what can be used as a header. A class file is not suitable, because it contains implementation details which should be hidden. Also, dependencies based on class files can produce complex and cyclic rules, which no flavour of make is likely to relish. Verbose output from javap, or something based on it, is more suitable. Ordering of declarations should also have no influence on it, so some line-based format (perhaps each declaration on an unbroken line) is best, after it has been sorted to produce a canonical form.

The next problem is about aggregation of compiler commands. As far as I know, GNU Make doesn't have any feature to aggregate multiple similar commands into one. It's not so important for C compilers, but javac can have a significant start-up time, and one of the goals of using a makefile is to reduce build times by minimizing the amount of work to be done. I tried picking out source files that had changed, but this misses an occasional case in Java, and can result in inconsistent annotation processing, as some classes get compiled implicitly some of the time, and explicitly on other occasions. Per-file dependencies were also not easy to compute and keep up-to-date accurately, as (IIRC) some would change even though their source files didn't. From this, I concluded that it is better to consider per-source-tree dependencies rather than per-file. This also helps to deal with cyclic dependencies; provided they don't span more than one tree, they don't cause a problem.

I'm going to use the term ‘module’ to refer to a set of related classes in a source tree, but independent of whether they are expressed as source or as class files (so I'll be talking about per-module dependencies, in fact). I tend to use ‘(source) tree’ and ‘module’ interchangeably though.

The next problem stems from generating headers after compilation. If you generate them every time, everything that depends on them will get recompiled, even if the content didn't change. This is relatively easy to solve: generate the header elsewhere, compare it with the current header, and replace the current header with it if their contents differ.
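The generate-compare-replace step is language-independent; here's a minimal sketch in Python (the bar.api file names are purely illustrative):

```python
import filecmp
import os
import tempfile

def replace_if_changed(tmp_path, final_path):
    # Install tmp_path as final_path only if the contents differ,
    # so that final_path's timestamp moves only on a real change.
    if os.path.exists(final_path) and filecmp.cmp(tmp_path, final_path,
                                                  shallow=False):
        os.remove(tmp_path)
        return False
    os.replace(tmp_path, final_path)
    return True

work = tempfile.mkdtemp()
api = os.path.join(work, 'bar.api')
tmp = os.path.join(work, 'bar.api-tmp')

with open(tmp, 'w') as f:
    f.write('interface v1\n')
print(replace_if_changed(tmp, api))   # True: first installation

with open(tmp, 'w') as f:
    f.write('interface v1\n')
print(replace_if_changed(tmp, api))   # False: no content change
```

Anything that depends on bar.api's timestamp is then recompiled only when the header genuinely changes.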

Now we have a new problem. The header isn't a regular target that you depend on any more. In fact, it's a “potentially non-updating by-product” of meeting another target (compilation). You must complete that target before comparing the by-product with one of its dependencies. I achieved this initially with:

foo.compiled: | bar.compiled
foo.compiled: bar.api

…where *.compiled is the target that compiles a module, and *.api is the header (the by-product) generated from it. Here, module foo depends on the header of module bar. The first rule ensures that bar has been compiled, but it doesn't demand that foo gets compiled if bar is newer. It also ensures that, as a by-product, bar.api will exist; it might not be updated every time, but it will exist. Even though there is no rule targeting bar.api, the second rule is now safe.

The problem here is that this approach is not safe for a parallel build. (I don't particularly require a parallel build, and javac already seems to exploit parallelism to some extent. I just want projects to be able to include my makefile library without having to globally disable any parallelism they're exploiting.) Because there is no way to express the relationship between bar.compiled and bar.api, a parallel build tries to meet both rules at the same time, not realising that the tests of the second must not begin before the first is complete. One solution might be a .WAIT pseudo-prerequisite:

foo.compiled: | bar.compiled
foo.compiled: .WAIT bar.api

…but that's not widely supported. It also only rather indirectly implies the relationship between bar.compiled and bar.api. I thought I might be able to hack a new rule type into GNU Make to do this explicitly:

foo.compiled: | bar.compiled
foo.compiled: bar.api
bar.api: & bar.compiled

…but I'm just not sufficiently familiar with the program structure to do this, or to know whether it is straightforwardly possible. An alternative strategy was proposed on the GNU Make users' mailing list:

foo.compiled: bar.api
bar.api: | bar.compiled
%.api: %.compiled
	cmp -s $*.api-tmp $*.api || cp $*.api-tmp $*.api

(I wonder, can I fold the order-only rule into the pattern rule?)

This is completely safe for parallel builds, but has one small caveat. If bar is compiled, a new bar.api-tmp is produced, which will be newer than bar.api. That means the pattern rule will be invoked, and the conditional copy will be applied. If the contents have changed, bar.api will be updated, preventing the rule from being invoked next time. Otherwise, bar.api-tmp remains perpetually newer than its counterpart, and the rule's recipe will keep being followed, until bar's header really does change. GNU Make will assume that some work was actually done, so it won't report ‘nothing to be done for…’. If we try to hide the command with @, we hide the activity from the user, but not from Make, so we don't get back the reassuring message of inactivity.

My solution here is a patch for GNU Make, one rather simpler than trying to add a new rule type. Instead, support a special .IDLE target to list targets whose commands should not be considered activity that suppresses the inactivity message:

foo.compiled: bar.api
bar.api: | bar.compiled
.IDLE: bar.api
%.api: %.compiled
	cmp -s $*.api-tmp $*.api || cp $*.api-tmp $*.api

This isn't a radical change. It just extends to user-defined targets a property that generated included makefiles already have.

The result is Jardeps, a makefile that you can include in your own projects to manage the building of Java components, possibly alongside other parts. You still have to express inter-module dependencies, but it's a matter of writing deps_foo += bar, and Jardeps does the rest. It also has a few other bells and whistles:

  • It generates ServiceLoader meta-data from in-code annotations.
  • It manages resource bundles.
  • It merges per-module manifests into per-jar manifests.
  • It generates OSGi imports and exports.
  • It includes a versioned-jar installation tool.
  • I just added support for CORBA IDL code generation. (I probably broke everything else with that.)

Anyway, if you write Java programs, please give it a look and try it out. Even if you have only one source tree, but other non-Java components to build, Jardeps can help to prevent the Java part being recompiled unconditionally each time. I've been using it for a couple of years. Get back to me with any problems.

2016-05-07

C reactor rewrite

Driven by my earlier post, I just rewrote my portable C event reactor with a new API, including proactor-like calls, i.e., a proactor is emulated.  The new API replaces clumsy calls like this:

react_fdcond cond = { .fd = myfd, .mode = react_MIN };
react_prime(ev, react_CFD, &cond);

…with this:

react_prime_fdin(ev, myfd);

I can't remember what value the old style had, except that it looked very fcntl/ioctl-like.  This style is much better, as it's type-safe, and doesn't rely on arbitrary constants (react_CFD) that have to be always available even if the condition type is not.  Internally, I also got rid of all the switches on condition type, and replaced them with a couple of function pointers, defuse and act, and this has led to a cleaner model of how event handles are manipulated in general.  The core has a platform-dependent ‘core registration’ which is a record of all things that are being monitored, including:

  1. a set of event handles primed on idle events,

  2. a binary heap of handles primed on timed events,

  3. on Windows, an optional handle for Windows messages/events,

  4. on Windows, an optional handle for APCs,

  5. on Windows, an array of HANDLEs to be passed to WaitForMultipleObjects and friends,

  6. on Unix-like systems with support for poll and ppoll, an array of struct pollfd to be passed to those functions,

  7. on other Unix-like systems with support for select and pselect, an array of three fd_sets,

  8. on Unix-like systems, an optional handle for signals.

Options 5, 6 and 7 are mutually exclusive, and each comes with a parallel structure that maps the event to the event handle watching for it, as well as a record of which events are currently in use but not necessarily being watched for.  When an event handle is primed, it makes some contribution to the core registration, and sets its defuse and act pointers to indicate what to do in two circumstances.

When react_yield is called, it checks the core registration, and decides what data it has to pass to the system call (WaitForMultipleObjects, poll, etc.), and sometimes even decides on the system call itself.  When the call returns, the reactor finds out which events have occurred, and invokes the corresponding act function on the appropriate handle.  The same handle's act function may be called several times if it has been primed on several events at once.  The function should record in the user's structures what has happened, ensure the handle has been queued for processing (which will later result in the user's function being invoked), and remove the specific event from the core registration so it won't be watched for again.  The act does not remove the event from the in-use set, as we want to prevent other handles from priming on an event that has occurred but has not yet been passed to the user.

After all acts have been called, the non-empty queue with the highest priority is processed.  Just before each user function is called, defuse is invoked, which removes any of the handle's events remaining in the core registration, and removes all events that the handle was primed with from the in-use set.  Handles in lower-priority queues do not get processed until at least another invocation of react_yield.

This approach allows for something that was not clearly possible under the old approach: a handle could be primed on multiple events, and receive and accumulate them on separate calls to react_yield, so as to present them to the user as a single call when eventually processed.  This was necessary to allow ALSA's snd_seq_poll_descriptors and related functions to interoperate easily with the reactor.

We can program robustly within the reactor with defuse and act.  defuse is reset just before being invoked, so it is safe to defuse a handle twice, and its job is clear: remove remaining events from the core registration, and remove original events from the in-use set.  We can be sloppier with act; it's only called if the handle still has events in the core, which defuse removes, so we can keep it around until the handle is primed again.

Unix-related specifications could do with a few standardized additions to fd_set to make the use of select more optimal:

// Get and remove from *s an FD in range [0, nfds), or return FD_SETSIZE.
int FD_EXTRACT(fd_set *s, int nfds);

// Add all FDs in *f to *s.  Optionally write already present ones to *alr.
void FD_ADD(fd_set *alr, const fd_set *f, fd_set *s);

// Remove all FDs in *f from *s.  Optionally write missing ones to *mis.
void FD_SUBTRACT(fd_set *mis, const fd_set *f, fd_set *s);

// Count collisions in *a and *b, and optionally write them to *res.
int FD_INTERSECT(fd_set *res, const fd_set *a, const fd_set *b);

(Actually, only FD_EXTRACT and FD_INTERSECT would really be valuable for the reactor, but the others should be trivially implementable with inside knowledge of fd_sets.)

select-based implementations can manage up to FD_SETSIZE descriptors.  WaitForMultipleObjects-based implementations can handle up to MAXIMUM_WAIT_OBJECTS-1 HANDLEs, but at least that's per-thread.  poll should be able to handle a lot more, although I think its array length is limited to the maximum number of descriptors in the process, which doesn't account for the possibility of the same FD appearing multiple times in the array.  Still, I don't think it's a limit that can easily be reached.

Anyway, the library is there now.  It needs people to use it, try it out, find bugs.


Update 2017-12-09: Found a nasty bug. Some events in the system array could be 'forgotten' if they happen to be last in the array. The check that causes the array to shrink was being applied too generally, i.e., even when the element being removed wasn't at the end. Silly, but fixed.

2016-04-01

How to define a low-level C event reactor/proactor API?

Recently looking through the source of lispd (a daemon to support the LISP protocol for mobility), I saw it has some sort of bespoke reactor in it. It would be nice if there was a relatively simple reactor API that could be easily implemented on multiple platforms, and be documented only once, so that (for example) the authors of lispd could just use it directly, and other developers could follow the code without having to understand how a bespoke reactor works internally.

I have a portable reactor implementation, but I doubt anyone uses it but me. If you were designing a portable reactor API, e.g., as a POSIX interface, what should it look like? If you were using one, what would you expect to find?

Also, a proactor does a similar job, but has a different API. Sometimes, this API has better performance, because it permits the kernel to partly implement it. Can the APIs of a reactor and proactor be combined (into a “reproactor”)?

  • You can always emulate a proactor over a reactor. You don't get the kernel optimizations, but you just rebuild on a different system, and there they are.
  • Having the reactor interface around as well means that you can still access special APIs that are not represented in the proactor API.
  • Using event-type-specific functions (see below) blurs the distinction between reactor and proactor anyway.
  • So, I'm leaning towards supporting proactor-like calls in the API, even if they are not kernel-optimized.

Should events be automatically reprimed?

  • The main benefit is that it allows the reproactor to avoid rebuilding its internal structures on each iteration. It might make the application simpler too.
  • On the other hand, the event handler will almost always know how to reprime if it has been written properly.
  • Auto-repriming makes sense for monitoring (say) sockets, but not for timed events (where the handler would be repeatedly called until you explicitly cancelled the event). I don't like requiring the user to intuit this, as there may be cases where the expected behaviour is not intuitive. Also, if the interface is re-implemented, I foresee this subtle behaviour being missed if it's not clearly formalized, producing variable behaviour with the same interface, and chaos ensuing.
  • I'm leaning towards sticking with manual repriming. Auto-repriming also has other impacts on the issues below.

How should a reproactor support priorities?

  • It could support multiple (major-)priority queues, and on each iteration, flush out only one queue (the highest non-empty). Low-priority events could then be forced to wait for several iterations before being handled. You could ensure that an idle event handler would be invoked only when there really was nothing else to do.
  • There could be sub-priorities (or minor priorities) within a queue. This would merely guarantee ordering of execution of user functions within a queue. Is that worthwhile?
  • Both forms could be supported simultaneously, and the application could exploit one, both or neither, as required.
  • The number of priorities and sub-priorities doesn't have to be established beforehand. Just keep track of the highest priority requested, and create new queues as necessary. There can be implementation-defined limits.
  • Is 0 the highest priority or the lowest? It's perhaps more intuitive to make it the lowest. Yet, if you're going to allow the number of priorities to be dynamic, you probably want 0 to be the highest, and dynamically grow out from there, so you always know what the highest is, and put essential events on it. If you still want < and > to have the expected meanings, then you could grow down from 0, -1, -2, etc., and only do that internally. Probably not high-value.
  • For a real proactor (not one emulated over a reactor), must the priorities be part of the underlying OS API?
  • If multi-level priorities are supported, should they be set together in one call, or set separately?
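The flush-only-the-highest-queue behaviour sketched above is simple to prototype (Python, illustrative only; 0 is taken as the highest priority here):

```python
from collections import deque

class Queues:
    """Major-priority queues: each call to flush_one runs only the
    highest non-empty queue (0 is the highest priority here)."""
    def __init__(self):
        self.queues = []                 # grown on demand

    def enqueue(self, prio, fn):
        while len(self.queues) <= prio:
            self.queues.append(deque())
        self.queues[prio].append(fn)

    def flush_one(self):
        for q in self.queues:
            if q:
                while q:
                    q.popleft()()        # run queued user functions
                return True
        return False                     # idle: nothing was run

ran = []
qs = Queues()
qs.enqueue(2, lambda: ran.append('low'))
qs.enqueue(0, lambda: ran.append('high'))
qs.flush_one()
print(ran)   # ['high'] -- low-priority work waits a full iteration
qs.flush_one()
print(ran)   # ['high', 'low']
```

A false return from flush_one is exactly the condition under which idle event handlers would be invoked.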

How should a reproactor interact with multi-threaded programs?

  • Does it ever make sense to have more than one per thread? If not, we can make it a thread-local. In C11, call_once can be used to establish a tss_t key, and a function to identify the thread's reproactor will use that to create the reproactor in the first instance. A destructor will clean everything up when the thread terminates.
  • Should one thread be able to express interest in an event to be handled by a different thread? Perhaps, but it introduces synchronization complexity, undermining the simplicity of thread-locality.
  • Should pools of threads be able to watch for the same event? Doesn't make a lot of sense, unless you can guarantee that only one of the watchers will be informed of readiness. If two watchers are informed, one of them will be successful, while the other gets blocked when it actually tries to process the event. In only informing one thread to avoid this, you'd have to remember which thread you informed, so that the other threads don't test for it again.
    • Is it important to support this? Or should the user just do it themselves, by-passing the reactor to (say) listen for connections, and just get the threads to use their own reactors on their own sockets thereafter?

How should a reproactor interact with some platform-specific features, like Windows events?

  • The Windows events are especially high-priority, so should they be handled distinctly, with a special yield function? That would be really ugly.
  • Or do we ignore the priority, and always make them the highest?
  • Or perhaps, if the event is low-priority, and is blocked by higher-priority activity, we don't test for further events of the same type until it has been dealt with. This makes the most sense. We use a zero timeout, and use WaitForMultipleObjectsEx instead of MsgWaitForMultipleObjectsEx on Windows. A timeout of zero should be used anyway, because there will be unprocessed event handlers still pending.
    • The problem with the Windows events is that we don't find out ourselves what events have come in. Could we efficiently peek to probe for different types? If not, we would only be able to permit one handler to watch for events, and not watch for any more until the handler was called.

How do we handle signals?

  • Handling signals is do-able so long as a signal will interrupt all threads that have not blocked it, and we can have a signal handler per thread.
  • Handling signals at all should only be done if the user can also avoid using the reproactor for some signals, in case it is important that the user's signal handler actually take some action directly. That should be fine if we expose the sigset_t passed to pselect/ppoll.
  • Per-thread signal handling looks tricky, to say the least. There is some Stack Overflow advice. I suppose, with one static array of volatile sig_atomic_t counters, atomic operations to increment them from the signal handler, and decrements from repro_yield(), it doesn't matter how many threads access the array. We just have to make sure that any interrupted ppoll/pselect returns with EINTR after the signal handler has completed.

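The counter-array idea above can be sketched as follows. This swaps the volatile sig_atomic_t array for C11 atomics, since a plain increment in the handler racing a decrement in repro_yield() could lose updates; lock-free atomics are async-signal-safe. REPRO_MAXSIG and the function names are assumptions for illustration.

```c
#include <signal.h>
#include <stdatomic.h>

#define REPRO_MAXSIG 64 /* assumption: signal numbers stay below 64 */

/* One counter per signal number; statically zero-initialized. */
static atomic_int repro_sigcount[REPRO_MAXSIG];

static void repro_sighandler(int sig)
{
    if (sig >= 0 && sig < REPRO_MAXSIG)
        atomic_fetch_add(&repro_sigcount[sig], 1); /* async-signal-safe */
}

/* Called from repro_yield(): consume one recorded delivery of sig,
   returning 1 if one was pending, 0 otherwise. */
static int repro_take_signal(int sig)
{
    int n = atomic_load(&repro_sigcount[sig]);
    while (n > 0)
        if (atomic_compare_exchange_weak(&repro_sigcount[sig], &n, n - 1))
            return 1;
    return 0;
}
```

Any number of threads can call repro_take_signal() safely; each delivery is handed to exactly one caller.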
Can a reproactor exploit splice() or similar functions to copy from one descriptor directly to another?

  • Yes. You should first wait for the destination to become writable; the proactive action is then to perform the splice. Interesting: that's a use case for an event to be primed alternately on two distinct event types, i.e., when auto-repriming is undesirable.

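The wait-then-splice order described above looks roughly like this on Linux (splice() is Linux-specific and needs _GNU_SOURCE; the function name is hypothetical, and a real reproactor would do the poll step through its own event loop rather than inline):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Copy up to len bytes from 'from' to 'to' without a userspace buffer,
   first confirming the destination is writable. At least one of the two
   descriptors must be a pipe for splice() to work. */
static ssize_t splice_when_writable(int from, int to, size_t len)
{
    struct pollfd pfd = { .fd = to, .events = POLLOUT };
    if (poll(&pfd, 1, -1) < 0)
        return -1;
    return splice(from, NULL, to, NULL, len, 0);
}
```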
Do we allow internal, user-defined events?

  • What for? Isn't that the application's problem?
  • We could have a null event which never fires, except by an explicit trigger call.
  • It might help when providing external extensions to the API.
  • Current position: probably not useful at the moment, but having a null event type is fairly trivial. Could cause problems with repro_yield() detecting when there's no more work.

So far, I've listed event types as normal C enums. All types are listed, including ones not supported on a particular platform.

  • This means the API has to include all possible types, and report a run-time error if one attempts to prime on an unsupported type. That way, if a type later becomes supported on its platform, its number won't change. You can't simply append new types per platform, because a type added at the end on one platform might already exist on another. This is cumbersome.
  • An alternative is to use addresses of static constant objects. They don't have to have any contents, just distinct addresses. Unsupported event types won't even compile. However, although it's not a big deal, it's not possible to switch on addresses, even constant ones.
  • Another alternative is to do away with condition types, and have specifically typed functions. They will fail in the same way as static constant object addresses, and they are more type-safe. In fact, I'm leaning towards this.

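The address-of-static-object trick can be sketched like this (names here are invented for illustration, not part of the proposed API): the objects need no contents, only distinct addresses, and an unsupported type simply isn't declared, so misuse fails at link time rather than run time.

```c
/* Distinct static objects give distinct, link-time-checked condition
   identifiers; the struct contents are irrelevant. */
struct repro_cond { char unused; };
typedef const struct repro_cond *repro_cond_t;

static const struct repro_cond OSOCK, OTIMER;
#define CSOCK  (&OSOCK)
#define CTIMER (&OTIMER)

static const char *cond_name(repro_cond_t c)
{
    /* Can't switch on addresses, even constant ones, so dispatch
       with an if/else chain instead. */
    if (c == CSOCK)  return "socket";
    if (c == CTIMER) return "timer";
    return "unknown";
}
```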
Should we allow a single event handle to wait on multiple events, triggering as soon as any one of them occurs?

  • That means we must either stop watching for the others until the event handler has been called, or let the handle exist in the queue while still primed for the remaining events. It would simultaneously be primed and triggered! Not impossible, but it breaks a previous invariant.
  • It might help to deal with the Windows events too…?
  • I'm leaning towards supporting this.

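The "simultaneously primed and triggered" state the first bullet worries about stops being contradictory if the handle's state is a pair of bit sets rather than a single enum. A minimal sketch, with invented names:

```c
/* A handle watching several conditions keeps two bit sets: conditions
   still being watched, and conditions already queued for the handler. */
enum { COND_READ = 1, COND_WRITE = 2, COND_SIGNAL = 4 };

struct ev_state {
    unsigned primed;    /* conditions still being watched */
    unsigned triggered; /* conditions queued for the handler */
};

static void trigger(struct ev_state *s, unsigned cond)
{
    s->primed &= ~cond;   /* stop watching this condition... */
    s->triggered |= cond; /* ...and queue it for the handler */
}
```

After `trigger(&s, COND_READ)` on a handle primed for read and write, both sets are non-empty at once: primed for the write, triggered for the read.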
How do we fit in ALSA's attempts to interact with a reproactor?

  • ALSA has three functions to:
    • get the number of structures to pass to poll(),
    • write those structures into an array,
    • pass them back after poll() has modified them.
  • If we try to support this directly, it means we are allowing a library to wait on multiple events. (See above.)

What does the API look like now?

// Thread-locally create and destroy an event handle.
repro_ev_t repro_open(void);
void repro_close(repro_ev_t);

// Set and get priorities.
void repro_setprio(repro_ev_t, repro_prio_t);
void repro_setsubprio(repro_ev_t, repro_subprio_t);
repro_prio_t repro_getprio(repro_ev_t);
repro_subprio_t repro_getsubprio(repro_ev_t);
void repro_setprios(repro_ev_t, repro_prio_t, repro_subprio_t);
void repro_getprios(repro_ev_t, repro_prio_t *, repro_subprio_t *);

// Specify the event handler. 
typedef void repro_func_t(void *ctxt);
void repro_direct(repro_ev_t, repro_func_t *, void *ctxt);

// Artificially trigger/cancel an event.
void repro_defuse(repro_ev_t);
void repro_trigger(repro_ev_t); // Defuse and queue?
void repro_cancel(repro_ev_t); // Defuse and dequeue?

typedef const struct repro_cond *repro_cond_t;

// Example condition type as address of a static object
extern const struct repro_cond repro_OSOCK;
#define repro_CSOCK (&repro_OSOCK)

// Generic priming function
int repro_prime(repro_ev_t, repro_cond_t, const void *conddata);

// Type-specific priming function
int repro_prime_sock(repro_ev_t, int sock, repro_iomode_t);

// More proactive type-specific priming functions
int repro_prime_idle(repro_ev_t);
int repro_prime_recv(repro_ev_t, int sock,
                     void *buf, size_t len, int flags,
                     ssize_t *rc, int *ec);
int repro_prime_recvfrom(repro_ev_t, int sock,
                         void *buf, size_t len, int flags,
                         struct sockaddr *addr,
                         socklen_t *addrlen,
                         ssize_t *rc, int *ec);
int repro_prime_read(repro_ev_t, int fd, void *buf, size_t len,
                     ssize_t *rc, int *ec);
// Revised form of repro_prime_sock above, adding a status output:
int repro_prime_sock(repro_ev_t, int sock, repro_iomode_t, int *status);
int repro_prime_splice(repro_ev_t, int from, int to, ssize_t *rc, int *ec);
int repro_prime_poll(repro_ev_t, struct pollfd *, size_t);
int repro_prime_select(repro_ev_t, fd_set *, fd_set *, fd_set *);
int repro_prime_signal(repro_ev_t, int sig);

#ifdef __WIN32__
int repro_prime_winhandle(repro_ev_t, HANDLE, int *status);
#endif

#if _POSIX_C_SOURCE >= 1 || _XOPEN_SOURCE || _POSIX_SOURCE
// Access the signal mask, or return NULL if not available.
sigset_t *repro_sigset(void);
#endif

// Yield control to the reproactor.
int repro_yield(void);

Priming functions should return 0 on success. On error, they could set errno to:

  • EADDRINUSE/EBUSY - something is already waiting for that event.
    • Both could be used when auto-repriming, with one code indicating that an event has already triggered, and will re-prime in a conflicting way when handled. Not good when handling multiple events with one handle, as both error codes are then valid.
  • EINVAL - condition not recognized or has invalid parameters.

repro_yield() would set errno to:

  • EWOULDBLOCK/EAGAIN - nothing to wait for.
  • EINTR - interrupted by signal not handled internally.
  • ? - some essential events not being awaited.

So, can you fill in the blanks? Or point out flaws?

Updated formatting 2021-05-18