2016-05-16

Jardeps: Makefiles for Java the right way

When Java came out, folks tried to build Java projects using makefiles the same way they did for C. Foo.class depends on Foo.java, so compile the latter to get the former:
%.class: %.java
	$(JAVAC) $<

The problem is that it's just not as ‘simple’ as in C, where everything you depend on lives in header files whose contents you consciously keep distinct from the source files. In Java, declarations are not separate from their definitions, which is easier to maintain, but undermines the premise that makefiles are built on: that a change to an interface shows up as a change to a separate prerequisite file.

This then means that, when compiling one source file, which needs declarations from another source file, you end up compiling both in one go. (javac works better this way anyway, so maybe it's best not to fight it.)

I tried to tackle this by extracting a kind of ‘header’ from a Java file, and using this to build a dependency rule. The first question was whether to try and do it before compilation of the file, or after. Since Java only needs the source files to compile, headers can be generated after the fact, provided we can force compilation before we test any dependencies against the headers.

Another question is what can be used as a header. A class file is not suitable, because it contains implementation details which should be hidden. Also, dependencies based on class files can produce complex and cyclic rules, which no flavour of make is likely to relish. Verbose output from javap, or something based on it, is more suitable. Ordering of declarations should also have no influence on it, so some line-based format (perhaps each declaration on an unbroken line) is best, after it has been sorted to produce a canonical form.
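
For example, something like the following (purely illustrative, not Jardeps's actual rule) yields a sorted, line-based digest of a class's visible declarations:

javap -protected -classpath classes com.example.Foo | sort > Foo.api

Sorting makes the result independent of declaration order.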

The next problem is about aggregation of compiler commands. As far as I know, GNU Make doesn't have any feature to aggregate multiple similar commands into one. It's not so important for C compilers, but javac can have a significant start-up time, and one of the goals of using a makefile is to reduce build times by minimizing the amount of work to be done. I tried picking out source files that had changed, but this misses an occasional case in Java, and can result in inconsistent annotation processing, as some classes get compiled implicitly some of the time, and explicitly on other occasions. Per-file dependencies were also not easy to compute and keep up-to-date accurately, as (IIRC) some would change even though their source files didn't. From this, I concluded that it is better to consider per-source-tree dependencies rather than per-file. This also helps to deal with cyclic dependencies; provided they don't span more than one tree, they don't cause a problem.

I'm going to use the term ‘module’ to refer to a set of related classes in a source tree, but independent of whether they are expressed as source or as class files (so I'll be talking about per-module dependencies, in fact). I tend to use ‘(source) tree’ and ‘module’ interchangeably though.
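
Per-module compilation then reduces to a stamp-file rule, along these lines (a minimal sketch, assuming sources under src/foo; the names are illustrative and the real rules are more involved):

# Compile the whole foo tree in one go, recording completion in a stamp.
foo_sources := $(shell find src/foo -name '*.java')

foo.compiled: $(foo_sources)
	$(JAVAC) -d classes/foo $(foo_sources)
	touch $@

A single javac invocation covers the whole tree, so intra-tree dependencies, cyclic or not, never trouble make.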

The next problem stems from generating headers after compilation. If you generate them every time, everything that depends on them will get recompiled, even if the content didn't change. This is relatively easy to solve: generate the header elsewhere, compare it with the current header, and replace the current header with it if their contents differ.

Now we have a new problem. The header isn't a regular target that you depend on any more. In fact, it's a “potentially non-updating by-product” of meeting another target (compilation). You must complete that target before the by-product's timestamp is tested as a dependency of anything else. I achieved this initially with:

foo.compiled: | bar.compiled
foo.compiled: bar.api

…where *.compiled is the target that compiles a module, and *.api is the header (the by-product) generated from it. Here, module foo depends on the header of module bar. The first rule ensures that bar has been compiled, but it doesn't demand that foo gets compiled if bar is newer. It also ensures that, as a by-product, bar.api will exist; it might not be updated every time, but it will exist. Even though there is no rule targeting bar.api, the second rule is now safe.

The problem here is that this approach is not safe for a parallel build. (I don't particularly require a parallel build, and javac already seems to exploit parallelism to some extent. I just want projects to be able to include my makefile library without having to globally disable any parallelism they're exploiting.) Because there is no way to express the relationship between bar.compiled and bar.api, a parallel build tries to meet both rules at the same time, not realising that the tests of the second must not begin before the first is complete. One solution might be a .WAIT pseudo-prerequisite:

foo.compiled: | bar.compiled
foo.compiled: .WAIT bar.api

…but that's not widely supported. It also only rather indirectly implies the relationship between bar.compiled and bar.api. I thought I might be able to hack a new rule type into GNU Make to do this explicitly:

foo.compiled: | bar.compiled
foo.compiled: bar.api
bar.api: & bar.compiled

…but I'm just not sufficiently familiar with the program structure to do this, or to know that it is straightforwardly possible. An alternative strategy was proposed on the GNU Make users' mailing list:

foo.compiled: bar.api
bar.api: | bar.compiled
%.api: %.compiled
	cmp -s $*.api-tmp $*.api || cp $*.api-tmp $*.api

(I wonder, can I fold the order-only rule into the pattern rule?)

This is completely safe for parallel builds, but has one small caveat. If bar is compiled, a new bar.api-tmp is produced, which will be newer than bar.api. That means the pattern rule will be invoked, and the conditional copy will be applied. If the contents have changed, bar.api will be updated, preventing the rule from being invoked next time. Otherwise, bar.api-tmp remains perpetually newer than its counterpart, and the rule's recipe will keep being followed, until bar's header really does change. GNU Make will assume that some work was actually done, so it won't report ‘nothing to be done for…’. If we try to hide the command with @, we hide the activity from the user, but not from Make, so we don't get back the reassuring message of inactivity.

My solution here is a patch for GNU Make, one rather simpler than trying to add a new rule type. Instead, support a special .IDLE target to list targets whose commands should not be considered activity that suppresses the inactivity message:

foo.compiled: bar.api
bar.api: | bar.compiled
.IDLE: bar.api
%.api: %.compiled
	cmp -s $*.api-tmp $*.api || cp $*.api-tmp $*.api

This isn't a radical change. It just extends to user-defined targets a property that generated included makefiles already have.

The result is Jardeps, a makefile that you can include in your own projects to manage the building of Java components, possibly alongside other parts. You still have to express inter-module dependencies, but it's just a matter of writing deps_foo += bar, and Jardeps does the rest; a sketch of a project makefile follows the list below. It also has a few other bells and whistles:

  • It generates ServiceLoader meta-data from in-code annotations.
  • It manages resource bundles.
  • It merges per-module manifests into per-jar manifests.
  • It generates OSGi imports and exports.
  • It includes a versioned-jar installation tool.
  • I just added support for CORBA IDL code generation. (I probably broke everything else with that.)
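
For illustration, a project makefile might contain something like this (only deps_foo is the real syntax; the tree-listing variable and the include path are placeholders):

# Modules in this project (hypothetical variable name):
trees += foo bar

# foo needs declarations from bar:
deps_foo += bar

# Pull in the Jardeps rules (path illustrative):
include jardeps.mk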

Anyway, if you write Java programs, please give it a look and try it out. Even if you have only one source tree, but other non-Java components to build, Jardeps can help to prevent the Java part being recompiled unconditionally each time. I've been using it for a couple of years. Get back to me with any problems.

2016-05-07

C reactor rewrite

Driven by my earlier post, I just rewrote my portable C event reactor with a new API, including proactor-like calls, i.e., a proactor is emulated.  The new API replaces clumsy calls like this:

react_fdcond cond = { .fd = myfd, .mode = react_MIN };
react_prime(ev, react_CFD, &cond);

…with this:

react_prime_fdin(ev, myfd);

I can't remember what value the old style had, except that it looked very fcntl/ioctl-like.  This style is much better, as it's type-safe, and doesn't rely on arbitrary constants (react_CFD) that always have to be available even if the condition type is not.  Internally, I also got rid of all the switches on condition type, and replaced them with a couple of function pointers, defuse and act, and this has led to a cleaner model of how event handles are manipulated in general.  The core has a platform-dependent ‘core registration’, a record of all the things being monitored (sketched after this list), including:

  1. a set of event handles primed on idle events,

  2. a binary heap of handles primed on timed events,

  3. on Windows, an optional handle for Windows messages/events,

  4. on Windows, an optional handle for APCs,

  5. on Windows, an array of HANDLEs to be passed to WaitForMultipleObjects and friends,

  6. on Unix-like systems with support for poll and ppoll, an array of struct pollfd to be passed to those functions,

  7. on other Unix-like systems with support for select and pselect, an array of three fd_sets,

  8. on Unix-like systems, an optional handle for signals.
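
Concretely, such a registration might look roughly like the following.  This is only a sketch: every name in it (including the HAVE_POLL configuration macro) is illustrative, not the library's actual API.

#ifdef _WIN32
#include <windows.h>
#else
#include <poll.h>
#include <sys/select.h>
#endif

struct core_reg {
    struct handle_set *idle;             /* 1: handles primed on idle events */
    struct timer_heap *timers;           /* 2: binary heap of timed events */
#ifdef _WIN32
    struct handle *winmsg;               /* 3: Windows messages/events */
    struct handle *apc;                  /* 4: APCs */
    HANDLE objs[MAXIMUM_WAIT_OBJECTS];   /* 5: for WaitForMultipleObjects */
    DWORD nobjs;
#elif defined HAVE_POLL
    struct pollfd *fds;                  /* 6: for poll/ppoll */
    nfds_t nfds;
#else
    fd_set readfds, writefds, exceptfds; /* 7: for select/pselect */
    int maxfd;
#endif
#ifndef _WIN32
    struct handle *sig;                  /* 8: signals */
#endif
};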

Options 5, 6 and 7 are mutually exclusive, and each comes with a parallel structure that maps each event to the event handle watching for it, as well as a record of which events are currently in use but not necessarily being watched for.  When an event handle is primed, it makes some contribution to the core registration, and sets its defuse and act pointers to indicate what to do in two circumstances.

When react_yield is called, it checks the core registration, and decides what data it has to pass to the system call (WaitForMultipleObjects, poll, etc.), and sometimes even which system call to use.  When the call returns, the reactor finds out which events have occurred, and invokes the corresponding act function on the appropriate handle.  The same handle's act function may be called several times if it has been primed on several events at once.  The function should record in the user's structures what has happened, ensure the handle has been queued for processing (which will later result in the user's function being invoked), and remove the specific event from the core registration so it won't be watched for again.  act does not remove the event from the in-use set, as we want to prevent other handles from priming on an event that has occurred but has not yet been passed to the user.

After all acts have been called, the non-empty queue with the highest priority is processed.  Just before each user function is called, defuse is invoked, which removes any of the handle's events remaining in the core registration, and removes all events that the handle was primed with from the in-use set.  Handles in lower-priority queues do not get processed until at least another invocation of react_yield.

This approach allows for something that was not clearly possible under the old approach: a handle could be primed on multiple events, and receive and accumulate them on separate calls to react_yield, so as to present them to the user as a single call when eventually processed.  This was necessary to allow ALSA's snd_seq_poll_descriptors and related functions to interoperate easily with the reactor.

We can program robustly within the reactor with defuse and act.  defuse is reset just before being invoked, so it is safe to defuse a handle twice, and its job is clear: remove remaining events from the core registration, and remove the original events from the in-use set.  We can be sloppier with act; it's only called if the handle still has events in the core, which defuse removes, so we can keep it around until the handle is primed again.
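
The reset can be pictured like this; struct handle and all the names here are illustrative, not the library's real types:

struct handle {
    void (*act)(struct handle *, int event);
    void (*defuse)(struct handle *);
};

static void defuse_nothing(struct handle *h) { (void) h; }

static void call_defuse(struct handle *h)
{
    void (*fn)(struct handle *) = h->defuse;
    h->defuse = defuse_nothing; /* reset first, so a second defuse is a no-op */
    fn(h);                      /* then drop remaining events from the core
                                   registration and the in-use set */
}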

Unix-related specifications could do with a few standardized additions to fd_set to make the use of select more efficient:

// Get and remove from *s an FD in range [0, nfds), or return FD_SETSIZE.
int FD_EXTRACT(fd_set *s, int nfds);

// Add all FDs in *f to *s.  Optionally write already present ones to *alr.
void FD_ADD(fd_set *alr, const fd_set *f, fd_set *s);

// Remove all FDs in *f from *s.  Optionally write missing ones to *mis.
void FD_SUBTRACT(fd_set *mis, const fd_set *f, fd_set *s);

// Count collisions in *a and *b, and optionally write them to *res.
int FD_INTERSECT(fd_set *res, const fd_set *a, const fd_set *b);

(Actually, only FD_EXTRACT and FD_INTERSECT would really be valuable for the reactor, but the others should be trivially implementable with inside knowledge of fd_sets.)
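
For comparison, here is a sketch of what a portable FD_EXTRACT must look like without that inside knowledge; it has to scan linearly, which is exactly the cost a standardized version could avoid:

#include <sys/select.h>

/* Get and remove from *s an FD in [0, nfds), or return FD_SETSIZE. */
int fd_extract(fd_set *s, int nfds)
{
    for (int fd = 0; fd < nfds; fd++) {
        if (FD_ISSET(fd, s)) {
            FD_CLR(fd, s);
            return fd;
        }
    }
    return FD_SETSIZE;
}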

select-based implementations can manage up to FD_SETSIZE descriptors.  WaitForMultipleObjects-based implementations can handle up to MAXIMUM_WAIT_OBJECTS-1 HANDLEs, but at least that's per thread.  poll should be able to handle a lot more, although I think its array is limited only by the maximum number of descriptors in the process, which doesn't account for the possibility of the same FD appearing multiple times in the array.  Still, I don't think it's a limit that can easily be reached.

Anyway, the library is there now.  It needs people to use it, try it out, find bugs.


Update 2017-12-09: Found a nasty bug. Some events in the system array could be ‘forgotten’ if they happened to be last in the array. The check that causes the array to shrink was being applied too generally, i.e., even when the element being removed wasn't at the end. Silly, but fixed.