2016-05-16

Jardeps: Makefiles for Java the right way

When Java came out, folks tried to build Java projects using makefiles the same way they did for C. Foo.class depends on Foo.java, so compile the latter to get the former:
%.class: %.java
	$(JAVAC) $<

The problem is that it's just not as ‘simple’ as in C, where everything you depend on sits in header files whose contents you consciously keep distinct from source files. In Java, method declarations are not separate from their definitions, which is easier to maintain, but undermines the principles that makefiles depend on.

This means that, when you compile one source file that needs declarations from another, you end up compiling both in one go. (javac works better this way anyway, so maybe it's best not to fight it.)

I tried to tackle this by extracting a kind of ‘header’ from a Java file, and using this to build a dependency rule. The first question was whether to do it before compilation of the file, or after. Since Java only needs the source files to compile, headers can be generated after the fact, provided we can force compilation before we test any dependencies against the headers.

Another question is what can be used as a header. A class file is not suitable, because it contains implementation details which should be hidden. Also, dependencies based on class files can produce complex and cyclic rules, which no flavour of make is likely to relish. Verbose output from javap, or something based on it, is more suitable. Ordering of declarations should also have no influence on it, so some line-based format (perhaps each declaration on an unbroken line) is best, after it has been sorted to produce a canonical form.
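As a rough illustration of such a canonical form, here is a minimal shell sketch. It assumes the header is derived from javap's default text output; the function name canonical_api and the ‘ :: ’ separator are made up for this example, and Jardeps itself may well do it differently. Each member goes on one line, prefixed with the type that declares it, and the lines are then sorted so that reordering declarations in the source makes no difference.

```shell
# Reads javap-style text on stdin; writes one declaration per line, sorted.
# (canonical_api is a hypothetical name for this sketch, not part of Jardeps.)
canonical_api() {
    awk '
        /^Compiled from / { next }      # names the source file; not part of the API
        /[{][ \t]*$/ {                  # a line such as "public class a.B {" opens a type
            sub(/[ \t]*[{][ \t]*$/, "")
            owner = $0
            print owner
            next
        }
        /^[}]/ { owner = ""; next }     # end of the type
        NF {                            # a member: prefix it with its owner
            sub(/^[ \t]+/, "")
            print owner " :: " $0
        }
    ' | LC_ALL=C sort
}
```

With compiled classes under a directory such as classes/, it could be fed with something like javap -protected -classpath classes com.example.Foo | canonical_api > foo.api-tmp, where the paths and the class name are placeholders.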

The next problem is about aggregation of compiler commands. As far as I know, GNU Make doesn't have any feature to aggregate multiple similar commands into one. It's not so important for C compilers, but javac can have a significant start-up time, and one of the goals of using a makefile is to reduce build times by minimizing the amount of work to be done. I tried picking out source files that had changed, but this misses an occasional case in Java, and can result in inconsistent annotation processing, as some classes get compiled implicitly some of the time, and explicitly on other occasions. Per-file dependencies were also not easy to compute and keep up-to-date accurately, as (IIRC) some would change even though their source files didn't. From this, I concluded that it is better to consider per-source-tree dependencies rather than per-file. This also helps to deal with cyclic dependencies; provided they don't span more than one tree, they don't cause a problem.
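To make the per-tree idea concrete, here is a minimal shell sketch of compiling a whole source tree with a single javac invocation. The function names, the directory layout and the location of the list file are assumptions for illustration, not a description of what Jardeps does internally. It relies on javac's ‘@file’ argument-file feature, and assumes that no path contains whitespace.

```shell
# List every .java file under a source tree, in a stable order.
list_sources() {
    find "$1" -type f -name '*.java' | LC_ALL=C sort
}

# Compile the whole tree $1 into the class directory $2 in one go.
# (compile_tree and the "$2.list" file name are hypothetical.)
compile_tree() {
    src=$1
    out=$2
    list=$out.list
    mkdir -p "$out" || return 1
    list_sources "$src" > "$list" || return 1
    [ -s "$list" ] || return 0          # nothing to compile
    # javac reads further arguments from the file named after '@',
    # so it starts up once however many sources there are.
    javac -d "$out" -sourcepath "$src" "@$list"
}
```

Because every source file in the tree is always named explicitly, no class is compiled implicitly on one occasion and explicitly on another.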

I'm going to use the term ‘module’ to refer to a set of related classes in a source tree, but independent of whether they are expressed as source or as class files (so I'll be talking about per-module dependencies, in fact). I tend to use ‘(source) tree’ and ‘module’ interchangeably though.

The next problem stems from generating headers after compilation. If you generate them every time, everything that depends on them will get recompiled, even if the content didn't change. This is relatively easy to solve: generate the header elsewhere, compare it with the current header, and replace the current header with it if their contents differ.
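In shell terms, the compare-and-replace step might look like the sketch below. The helper name update_if_changed and the words it prints are invented for this example; the important part is that the current header's timestamp only moves when its content does.

```shell
# Replace CURRENT ($2) with NEW ($1) only when their contents differ, so
# that the timestamp of CURRENT moves only when the header really changed.
# (update_if_changed is a hypothetical helper name for this sketch.)
update_if_changed() {
    new=$1
    current=$2
    if cmp -s "$new" "$current"; then
        echo unchanged      # same bytes: leave CURRENT and its timestamp alone
    else
        # the contents differ, or CURRENT does not exist yet
        cp "$new" "$current" && echo updated
    fi
}
```

Called as update_if_changed bar.api-tmp bar.api after each compilation, it leaves anything that depends on bar.api looking up to date unless the header really changed.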

Now we have a new problem. The header isn't a regular target that you depend on any more. In fact, it's a “potentially non-updating by-product” of meeting another target (compilation). You must complete that target before comparing the by-product with anything that depends on it. I achieved this initially with:

foo.compiled: | bar.compiled
foo.compiled: bar.api

…where *.compiled is the target that compiles a module, and *.api is the header (the by-product) generated from it. Here, module foo depends on the header of module bar. The first rule, being order-only, ensures that bar has been compiled, but doesn't demand that foo gets recompiled just because bar.compiled is newer. It also ensures that, as a by-product, bar.api will exist; it might not be updated every time, but it will exist. Even though there is no rule targeting bar.api, the second rule is now safe.

The problem here is that this approach is not safe for a parallel build. (I don't particularly require a parallel build, and javac already seems to exploit parallelism to some extent. I just want projects to be able to include my makefile library without having to globally disable any parallelism they're exploiting.) Because there is no way to express the relationship between bar.compiled and bar.api, a parallel build tries to meet both rules at the same time, not realising that the tests of the second must not begin before the first is complete. One solution might be a .WAIT pseudo-prerequisite:

foo.compiled: | bar.compiled
foo.compiled: .WAIT bar.api

…but that's not widely supported. It also implies the relationship between bar.compiled and bar.api only indirectly. I thought I might be able to hack a new rule type into GNU Make to do this explicitly:

foo.compiled: | bar.compiled
foo.compiled: bar.api
bar.api: & bar.compiled

…but I'm just not sufficiently familiar with the program structure to do this, or to know whether it is straightforwardly possible. An alternative strategy was proposed on the GNU Make users' mailing list:

foo.compiled: bar.api
bar.api: | bar.compiled
%.api: %.compiled
	cmp -s $*.api-tmp $*.api || cp $*.api-tmp $*.api

(I wonder, can I fold the order-only rule into the pattern rule?)

This is completely safe for parallel builds, but has one small caveat. If bar is compiled, a new bar.api-tmp is produced, and bar.compiled, like it, will be newer than bar.api. That means the pattern rule will be invoked, and the conditional copy will be applied. If the contents have changed, bar.api will be updated, preventing the rule from being invoked next time. Otherwise, bar.compiled remains perpetually newer than bar.api, and the rule's recipe will run again on every build until bar's header really does change. GNU Make will assume that some work was actually done, so it won't report ‘nothing to be done for…’. If we try to hide the command with @, we hide the activity from the user, but not from Make, so we don't get back the reassuring message of inactivity.
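The behaviour described above can be reproduced with a toy project. The shell sketch below writes one out; everything in it apart from the three dependency rules is an assumption made for the demonstration. In particular, write_demo, the .src suffix and the use of grep as a stand-in for javac plus header extraction are hypothetical, and the recipes use GNU Make's ‘target: prerequisites ; recipe’ form purely to avoid tab characters.

```shell
# Write a toy two-module project into directory $1.
# "Compiling" a module here just copies its public lines to <module>.api-tmp;
# a real build would run javac and derive the header from the classes.
# (write_demo and the .src suffix are hypothetical, for illustration only.)
write_demo() {
    mkdir -p "$1" || return 1
    printf '%s\n' 'public int f();' 'private impl 1' > "$1/bar.src"
    printf '%s\n' 'public void g();' > "$1/foo.src"
    cat > "$1/Makefile" <<'EOF'
.PHONY: all
all: foo.compiled
%.compiled: %.src ; @echo compiling $* && grep '^public' $< > $*.api-tmp && touch $@
foo.compiled: bar.api
bar.api: | bar.compiled
%.api: %.compiled ; cmp -s $*.api-tmp $*.api || cp $*.api-tmp $*.api
EOF
}
```

After write_demo demo, running make in demo should compile both modules. Editing only the ‘private’ line of bar.src and running make again should recompile bar but leave foo alone, and the cmp command should then be repeated on every later run until one of bar's ‘public’ lines changes.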

My solution here is a patch for GNU Make, one rather simpler than trying to add a new rule type. Instead, it adds support for a special .IDLE target, which lists targets whose commands should not be considered activity that suppresses the inactivity message:

foo.compiled: bar.api
bar.api: | bar.compiled
.IDLE: bar.api
%.api: %.compiled
	cmp -s $*.api-tmp $*.api || cp $*.api-tmp $*.api

This isn't a radical change. It just extends to user-defined targets a property that generated included makefiles already have.

The result is Jardeps, a makefile that you can include in your own projects to manage the building of Java components, possibly alongside other parts. You still have to express inter-module dependencies, but it's a matter of writing deps_foo += bar, and Jardeps does the rest. It also has a few other bells and whistles:

  • It generates ServiceLoader meta-data from in-code annotations.
  • It manages resource bundles.
  • It merges per-module manifests into per-jar manifests.
  • It generates OSGi imports and exports.
  • It includes a versioned-jar installation tool.
  • I just added support for CORBA IDL code generation. (I probably broke everything else with that.)

Anyway, if you write Java programs, please give it a look and try it out. Even if you have only one source tree, but other non-Java components to build, Jardeps can help to prevent the Java part being recompiled unconditionally each time. I've been using it for a couple of years. Get back to me with any problems.
