2018-08-18

JDK9 doclet API frustration

The new Javadoc doclet API promises a better view of Javadoc comments than before, one consistent and integrated with other source-related tools. I recently decided that my old doclet (“ssdoc”) based on the old API was becoming unmaintainable, and that I should start writing afresh against the new API (“Polydoclot” at the same location).

One way the new API helps is that HTML tags and entity/character references in Javadoc comments are parsed distinctly, along with Javadoc tags, which is important if your doclet is generating something other than HTML. If you were writing XHTML, you'd have to recognize empty HTML tags, and infer implicit closing of (say) <p> by a <div>, so that you could meet the strict requirement of XHTML that all elements are properly closed. For LaTeX output, references like &amp; would first have to be decoded into & before being re-escaped as \&.

So, that's a big improvement. However, I've found a few faults (at least, as I deem them) in the new API/implementation:

  1. It does not resolve recognized HTML entity/character references, even though the new API makes it unnecessary to retain them in their original form. (The old API did not have this option.)
  2. It does not resolve context-sensitive signatures in {@link}, {@linkplain}, {@value} and @see tags any more. (It used to!)
  3. It does not recursively parse the content of unknown in-line tags. (It used to!)
  4. Unknown in-line tags have their own class UnknownInlineTagTree, instead of simply being of the supertype InlineTagTree. Similarly, unknown block tags have their own class UnknownBlockTagTree, instead of simply being of the supertype BlockTagTree. This causes problems when tags defined in future JDKs are supplied to doclets compiled against older APIs.
  5. By now, there ought to be a formal way of determining how to link to elements within a Javadoc installation. (The linking scheme keeps changing, and fixing any single scheme would be too restrictive for alternative doclets.)

Here are those points in detail.

Lack of HTML reference resolution

The new API parses Javadoc source looking for Javadoc tags, HTML tags, and HTML entity/character references, and has distinct classes to represent each of these three groups. Since HTML tags are represented distinctly from plain text by StartElementTree and EndElementTree, HTML references no longer need to remain escaped, and could just appear as the resolved character in a TextTree. The only times that can't happen are when the referenced entity is not recognized, or when it maps to a character not expressible in a Java string. Otherwise, why not just resolve them away? Whether you're generating HTML or something else, the escaping is only required within the source. The resolution has to be done whatever the output, and it is the same whatever the output.
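To show how little is involved, here's a minimal sketch of that resolution as a standalone helper. HtmlRefs is hypothetical (nothing like it is in the doclet API), and it assumes a deliberately tiny entity table; a real doclet would load the full HTML entity list. Unrecognized references are left escaped, matching the exception described above.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: resolves HTML references in text, leaving unknown ones intact. */
final class HtmlRefs {
    // Deliberately small table; a real doclet would use the full HTML entity set.
    private static final Map<String, String> NAMED = Map.of(
        "amp", "&", "lt", "<", "gt", ">",
        "quot", "\"", "apos", "'", "nbsp", "\u00A0");

    // Matches decimal (&#123;), hexadecimal (&#x7B;) and named (&amp;) references.
    private static final Pattern REF =
        Pattern.compile("&(?:#([0-9]+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));");

    static String resolve(String text) {
        Matcher m = REF.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = m.group(); // default: keep the reference as written
            try {
                if (m.group(1) != null) {
                    replacement = codePoint(Integer.parseInt(m.group(1)), replacement);
                } else if (m.group(2) != null) {
                    replacement = codePoint(Integer.parseInt(m.group(2), 16), replacement);
                } else {
                    replacement = NAMED.getOrDefault(m.group(3), replacement);
                }
            } catch (NumberFormatException e) {
                // Number too large for an int: leave the reference unresolved.
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the character(s) for cp, or the fallback if cp is not a valid code point.
    private static String codePoint(int cp, String fallback) {
        return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : fallback;
    }
}
```

The result would then simply be merged into the neighbouring TextTree content, since the HTML tags have already been separated out.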

Lack of signature resolution

The old API modelled {@link}, {@linkplain}, {@value} and @see tags with the SeeTag class. Javadoc would parse (say) {@link Service#close()}, work out that Service referred to (say) org.example.Service based on imports, on nested class declarations of the file containing the {@link}, or on the enclosing package, pick the zero-argument method called close from it, and then provide references to the modelled method through SeeTag.referencedMember().

In the new API, {@link}/{@linkplain}, {@value} and @see tags are modelled with LinkTree, ValueTree and SeeTree respectively. The first two each provide a ReferenceTree directly, and SeeTree provides one as the first element of its content, as it's meant to cope with other kinds of references. In turn, ReferenceTree provides just a flat string taken unchanged from the tag. This requires the doclet author to write some 300 lines of code to meet this contract:

/**
 * Resolve a signature in a given element context.
 *
 * @param context the element whose documentation
 * provided the signature
 *
 * @param signature the flat, unresolved signature,
 * as provided by ReferenceTree.getSignature()
 *
 * @return the corresponding element, or null if
 * not found
 */
Element resolveSignature(Element context, String signature);

I imagine this design decision is based on not wanting the Javadoc tool to do things that are doclet-specific. But how else should {@link} be interpreted? The output might be different between (say) HTML and LaTeX, but it still fundamentally refers to the same program element, independently of how the doclet will choose to use it!
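The purely syntactic first step of that contract is splitting the flat string into its parts. Here's a minimal sketch of just that step. SignatureParts is a hypothetical helper; it assumes the usual Type#member(ParamType, ...) form and doesn't cope with generic type arguments containing commas.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical result of splitting a reference such as "Service#close()". */
final class SignatureParts {
    final String type;          // "" for a member-only reference, e.g. "#close()"
    final String member;        // null when only a type or package is named
    final List<String> params;  // null when there is no parameter list

    private SignatureParts(String type, String member, List<String> params) {
        this.type = type;
        this.member = member;
        this.params = params;
    }

    /** Splits a signature syntactically; no name lookup is attempted here. */
    static SignatureParts parse(String signature) {
        String sig = signature.trim();
        int hash = sig.indexOf('#');
        if (hash < 0) {
            return new SignatureParts(sig, null, null);
        }
        String type = sig.substring(0, hash);
        String rest = sig.substring(hash + 1);
        int open = rest.indexOf('(');
        if (open < 0) {
            return new SignatureParts(type, rest, null); // field, or method without parens
        }
        int close = rest.lastIndexOf(')');
        if (close < open) {
            throw new IllegalArgumentException("Unbalanced parentheses: " + signature);
        }
        String member = rest.substring(0, open);
        String inner = rest.substring(open + 1, close).trim();
        List<String> params = new ArrayList<>();
        if (!inner.isEmpty()) {
            for (String p : inner.split(",")) {
                // Keep only the type; drop an optional parameter name, e.g. "int count".
                params.add(p.trim().split("\\s+")[0]);
            }
        }
        return new SignatureParts(type, member, Collections.unmodifiableList(params));
    }
}
```

The hard part, where most of those 300 lines go, is then resolving type against the file's imports, enclosing classes and package, and matching params against the candidate members.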

Lack of recursive parsing of in-line tag content

In the old API, if an unknown in-line tag was encountered, it would be modelled as a plain (unspecialized) Tag, but its content would be parsed as a sequence of inner tags, accessible through inlineTags(). In the new API, the content is just a flat string! Yet UnknownInlineTagTree.getContent() returns a list of documentation tree nodes, implying that the content should have been recursively parsed. Instead, it returns a list of exactly one TextTree. This requires an explicit parsing routine that reparses arbitrary text according to Javadoc rules, and the only way I could find to do that was to spoof a FileObject with the content wrapped in <body>.
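The spoofing workaround looks roughly like this. It's a sketch under assumptions: InlineContentParser and StringHtmlFile are hypothetical names, the DocTrees instance is assumed to come from DocletEnvironment.getDocTrees(), and it relies on DocTrees.getDocCommentTree(FileObject), which parses an HTML file as a doc comment.

```java
import java.net.URI;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;

import com.sun.source.doctree.DocCommentTree;
import com.sun.source.util.DocTrees;

/** Hypothetical helper: reparses raw tag content as Javadoc + HTML via a fake HTML file. */
final class InlineContentParser {
    /** In-memory "HTML file" whose body is the content to be reparsed. */
    static final class StringHtmlFile extends SimpleJavaFileObject {
        private final String html;

        StringHtmlFile(String content) {
            // The URI only needs to be well-formed; nothing is read from disk.
            super(URI.create("string:///inline-content.html"), JavaFileObject.Kind.HTML);
            this.html = "<body>" + content + "</body>";
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return html;
        }
    }

    /** Wraps the flat content of an unknown in-line tag as an HTML file object. */
    static StringHtmlFile asHtmlFile(String content) {
        return new StringHtmlFile(content);
    }

    /** Reparses the content; returns null if no doc comment is found. */
    static DocCommentTree reparse(DocTrees docTrees, String content) {
        return docTrees.getDocCommentTree(asHtmlFile(content));
    }
}
```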

Again, this looks like a design decision to keep the Javadoc tool from doing something doclet-specific, but Javadoc already has to impose some basic structure on the content (the braces of nested tags have to match up), so it can't be left as free-format for the doclet. And, if Javadoc is going so far as to parse the braces, it might as well finish the job, especially since &#123; and &#125; will be needed to escape any braces to be passed literally to the doclet, which means &amp; will also be needed. Besides, the documenter shouldn't have to remember which characters need to be escaped based on context (especially if the doclet doesn't recognize a tag), and the doclet author shouldn't have to re-escape & or piece the parsed components back together just so that the rest of it can be re-interpreted as Javadoc+HTML again.

An alternative might be for the doclet to be able to declare which tags it recognizes, which ones should have their content parsed, etc. A method declareTags(TagTypes tts) on Doclet could be invoked at a sufficiently early stage to collect that information. It would be an opportunity to specify argument syntax in general too, as you might want to define {@link}-like tags that take an element reference as an argument, for example. However, that forces the documenter to be over-conscious of whether an extension tag will be recognized.
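A minimal sketch of what that declaration might look like follows. Everything here (TagTypes, TagKind, TagTrait, TagDeclaringDoclet) is hypothetical; nothing like it exists in jdk.javadoc.doclet. Keying declarations on block versus in-line kind as well as name is an assumption on my part, but it would also keep an in-line {@foo} from being confused with a block @foo.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical: whether a tag is written as @name or {@name ...}. */
enum TagKind { BLOCK, INLINE }

/** Hypothetical traits a doclet could ask the tool to honour for a tag. */
enum TagTrait { PARSE_CONTENT, ELEMENT_REFERENCE_ARGUMENT }

/** Hypothetical registry handed to a doclet before any comments are parsed. */
final class TagTypes {
    private final Map<TagKind, Map<String, Set<TagTrait>>> declared =
        new EnumMap<>(TagKind.class);

    /** Records that the doclet recognizes the tag, with the given traits. */
    TagTypes declare(TagKind kind, String name, TagTrait... traits) {
        Set<TagTrait> set = EnumSet.noneOf(TagTrait.class);
        Collections.addAll(set, traits);
        declared.computeIfAbsent(kind, k -> new HashMap<>()).put(name, set);
        return this;
    }

    boolean isRecognized(TagKind kind, String name) {
        return declared.getOrDefault(kind, Map.of()).containsKey(name);
    }

    boolean has(TagKind kind, String name, TagTrait trait) {
        return declared.getOrDefault(kind, Map.of())
            .getOrDefault(name, Set.of()).contains(trait);
    }
}

/** Hypothetical extension to Doclet, invoked early by the tool. */
interface TagDeclaringDoclet {
    void declareTags(TagTypes tts);
}
```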

Special classes for unknown tags

So, there's a class BlockTagTree, the base type for all block tags. It also has a subtype UnknownBlockTagTree, which adds a method to get the parsed content of the block tag. What if a previously unknown block tag @foo starts being recognized by a new Javadoc implementation and API? You'd have a new FooTagTree class extending BlockTagTree, but now the object representing the tag can't go to the same places as it did when it was unknown. Sure, the visitor type probably has a new method on it to accept the new type, but if the doclet was compiled against the old API, it cannot override that, and it won't go through visitUnknownBlockTag(), because it's the wrong type. Fortunately, the doclet can specify the most recent version of Java (and Javadoc, implicitly?) it recognizes, allowing Javadoc to deliberately fail to recognize the new tag. Does it do that for block tags? Not sure yet.
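For reference, the version declaration is just one method of the Doclet interface. Here's a minimal skeleton; the class name Java9OnlyDoclet is hypothetical, and whether javadoc then treats later tags as unknown is exactly the open question above.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import javax.lang.model.SourceVersion;
import jdk.javadoc.doclet.Doclet;
import jdk.javadoc.doclet.DocletEnvironment;
import jdk.javadoc.doclet.Reporter;

/** Minimal doclet skeleton declaring Java 9 as the newest version it understands. */
class Java9OnlyDoclet implements Doclet {
    @Override
    public void init(Locale locale, Reporter reporter) {
        // Nothing to set up in this sketch.
    }

    @Override
    public String getName() {
        return "Java9OnlyDoclet";
    }

    @Override
    public Set<? extends Option> getSupportedOptions() {
        return Collections.emptySet();
    }

    @Override
    public SourceVersion getSupportedSourceVersion() {
        // The newest source version this doclet was written against.
        return SourceVersion.RELEASE_9;
    }

    @Override
    public boolean run(DocletEnvironment environment) {
        // A real doclet would walk environment.getIncludedElements() here.
        return true;
    }
}
```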

It doesn't do that for in-line tags! JDK10 introduces a {@summary} in-line tag to explicitly delimit the “first sentence” of an element's description, for when the default rules (“Look for the first dot and whitespace.”) give the wrong result. It also defines a SummaryTree class to represent it. However, even though my doclet declares 9 as its highest supported language version, the {@summary} tag doesn't come through as UnknownInlineTagTree, so it is ignored, and the most important content of the documentation goes missing. My doclet is compiled against 9, where SummaryTree doesn't exist, so it has no way to provide a special visitor for that case. If I compile against 10, it won't run on 9, because SummaryTree will be unavailable at runtime.

If UnknownInlineTagTree were to be abolished, with InlineTagTree subsuming its functions, a JDK10 default implementation of visitSummary(...) (which no JDK9 doclet can override) could call visitUnknownInlineTag(...) (which would now take an InlineTagTree instead of UnknownInlineTagTree), and some sensible default action could be taken, leading to some future-proofing for doclet implementations.

(This ties in with the generic, recursive parsing of in-line tags. The method getContent() is on UnknownInlineTagTree, but moving it to InlineTagTree kind-of implies that you unconditionally parse all tags' content, whether the tag type is known or not.)
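The future-proofing argument can be modelled with a few cut-down, hypothetical types. These are not the real com.sun.source.doctree interfaces; they only illustrate the proposal. The "old" visitor knows nothing of {@summary}, but because the "newer" API adds visitSummary as a default method delegating to visitUnknownInlineTag, the old visitor still receives the tag's content instead of dropping it.

```java
import java.util.List;

/** Hypothetical stand-in for InlineTagTree, with getContent() moved up as proposed. */
interface InlineTag {
    String tagName();
    List<String> getContent(); // simplified: real content would be DocTree nodes
    <R, P> R accept(TagVisitor<R, P> v, P p);
}

/** Hypothetical visitor; visitSummary is the method a newer release would add. */
interface TagVisitor<R, P> {
    R visitUnknownInlineTag(InlineTag node, P p);

    // Added later as a default method, so older visitors still compile and, at run
    // time, treat {@summary} as an unknown in-line tag rather than ignoring it.
    default R visitSummary(InlineTag node, P p) {
        return visitUnknownInlineTag(node, p);
    }
}

/** A tag type introduced in the "newer" release. */
final class SummaryTag implements InlineTag {
    private final List<String> content;

    SummaryTag(List<String> content) {
        this.content = content;
    }

    public String tagName() {
        return "summary";
    }

    public List<String> getContent() {
        return content;
    }

    public <R, P> R accept(TagVisitor<R, P> v, P p) {
        return v.visitSummary(this, p);
    }
}

/** Written against the "older" release: only knows how to handle unknown tags. */
final class OldDocletVisitor implements TagVisitor<String, Void> {
    @Override
    public String visitUnknownInlineTag(InlineTag node, Void unused) {
        // Sensible default: render the tag as if only its content existed.
        return String.join("", node.getContent());
    }
}
```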

Not distinguishing between block and in-line tags

Now that JDK10 recognizes the in-line {@summary}, it tramples on my own @summary, even though it's a block tag. These are syntactically distinguishable!

Lack of mechanism to derive URI for element documentation

The original Javadoc mapped methods to simple fragment identifiers, so foo(String,int) became #foo(java.lang.String, int). Later versions of Javadoc changed the scheme to avoid brackets and spaces, possibly to make it more compatible with (say) the more limited XML fragment-identifier syntax. It also used to erase parameter types, but later versions do not, and varargs are no longer flattened into arrays. (And wtf? Brackets are back in 10!) This makes linking to an installation generated by a different doclet awkward.
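To make the incompatibility concrete, this sketch produces the fragment identifier for the same method in three styles: the original, the later bracket-free one (as far as I know, what JDK 8 and 9 emit), and JDK 10's. AnchorStyle is a hypothetical name, the rules are inferred from observed output rather than any specification, and arrays and varargs, which the styles also spell differently, are left out.

```java
import java.util.List;

/** Hypothetical: method-anchor styles, for simple (non-array) parameter types only. */
enum AnchorStyle {
    ORIGINAL {    // e.g. foo(java.lang.String, int)
        String fragment(String method, List<String> params) {
            return method + "(" + String.join(", ", params) + ")";
        }
    },
    NO_BRACKETS { // e.g. foo-java.lang.String-int-, and close-- with no parameters
        String fragment(String method, List<String> params) {
            return method + "-" + String.join("-", params) + "-";
        }
    },
    JDK10 {       // e.g. foo(java.lang.String,int)
        String fragment(String method, List<String> params) {
            return method + "(" + String.join(",", params) + ")";
        }
    };

    /** Returns the fragment identifier, without the leading '#'. */
    abstract String fragment(String method, List<String> params);
}
```

A doclet linking to a foreign installation has to guess which of these applies, which is the motivation for the proposal below.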

By now, there ought to be a formal way of determining how to link into a Javadoc installation without having to be the doclet that created it. For both the old and new versions of my doclet, I came up with the following. The doclet should generate (say) doc-properties.xml alongside package-list or element-list. This would be the XML representation of a Properties object, one property of which describes how to mechanically generate links to the documentation of specific elements, relative to the documentation's base address. Another doclet, told to -link to such an installation, would look up doc-properties.xml (in the same way it must already look up package-list/element-list), extract a well-known property, and use its value in a MacroFormatter. This would automatically tell it how to link into that site, while it continues to use its own scheme for its own output, which it can in turn express to other doclets through the same mechanism. The format string would be arcane, e.g.:

{?PACKAGE:{${PACKAGE}:\\.:/}{?CLASS:/{${CLASS}:\\.:\\$}{?FIELD:-field-{FIELD}:{?EXEC:-{?CONSTR:constr:method-{EXEC}}{@PARAMETER:I:/{?PARAMETER.{I}.DIMS:{PARAMETER.{I}.DIMS}:0}{${PARAMETER.{I}}:\\.:\\$}}}}:/package-summary}:{${MODULE}:\\.:\\$}-module}

…but it's only meant to be machine-readable.

I chose XML as it obviates charset issues. Simply serve as application/xml. A Properties object leaves room for expansion, and you could probably deprecate package-list/element-list altogether by incorporating their information into the same doc-properties.xml, although retaining the simpler format could still be useful for interfacing with other languages.
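Both sides of the proposal map directly onto java.util.Properties: storeToXML writes UTF-8 XML, and loadFromXML reads it back. This is a minimal sketch; DocProperties and the property name element-link-format are my own placeholders, and interpreting the value (the MacroFormatter step) is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

/** Hypothetical reader/writer for the proposed doc-properties.xml. */
final class DocProperties {
    // Assumed well-known property name; the proposal does not fix one.
    static final String ELEMENT_LINK_FORMAT = "element-link-format";

    /** Publishing side: writes the link format as an XML Properties document (UTF-8). */
    static void write(OutputStream out, String linkFormat) throws IOException {
        Properties props = new Properties();
        props.setProperty(ELEMENT_LINK_FORMAT, linkFormat);
        props.storeToXML(out, "Documentation properties");
    }

    /** Consuming side: reads the link format, or null if the installation omits it. */
    static String readLinkFormat(InputStream in) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(in);
        return props.getProperty(ELEMENT_LINK_FORMAT);
    }
}
```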

Summary

Please, authors of javadoc:

  • Specify contractually that the documentation author shall write literal text, HTML element tags, HTML references, and Javadoc in-line tags (recursively containing such structured content) in the bodies and block-tag content of Javadoc comments, regardless of the documentation output format. Javadoc shall supply literal text, HTML element tags, unrecognized HTML references, and Javadoc in-line tags to the doclet, regardless of the documentation output format.
  • Resolve HTML references into their corresponding unescaped text, if possible, and merge with adjacent literal text.
  • Specify that unrecognized in-line tags should be interpreted as if only their content existed.
  • If you're going to make the effort of recognizing @see, {@link} and {@value} tags, bother to resolve the signatures within them to Elements too.
  • Either uniformly parse all tags' content recursively, or introduce a means for the doclet to declare tags whose content should be recursively parsed. Failing that, at least expose the routine to do the parsing directly, rather than forcing the doclet author to draw such a routine out of the API's own rectum.
  • Introduce a means for a doclet to declare tags whose arguments should be resolved as element references.
  • Move the methods of UnknownBlockTagTree and UnknownInlineTagTree to BlockTagTree and InlineTagTree respectively, and deprecate Unknown*TagTree.
  • Devise and specify a technique for expressing how to link to the documentation of specific elements, something that can be statically served with the documentation just like package-list already is.

Fixing SDDM scale on 4K screens

I'm running Kubuntu 18.04 on a 4K screen*, and everything is tiny. I can fix the desktop when I'm logged in by scaling the display in the “Display and Monitor” settings. This doesn't affect the display manager's screen before you log in, though. As a note to myself if I have to do this again, I modified /usr/share/sddm/scripts/Xsetup, adding this to the end:

xrandr --output eDP-1-1 --fbmm 346x194

That file is obviously for SDDM only. Other display managers might have a similar script in a different location.

The string eDP-1-1 and the screen's physical size are given by xrandr:

$ xrandr --query | grep ' connected'
eDP-1-1 connected primary 3840x2160+0+0 (normal left inverted right x axis y axis) 346mm x 194mm

I suspect that the reported dimensions might only be accurate after you've applied scaling in the desktop.
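Since the Xsetup line just echoes two values from that query output, here's a small sketch that derives it from a "connected" line. It's in Java only because that's the language used elsewhere here (a shell one-liner would do as well); FbmmCommand is a hypothetical helper, and it assumes the trailing "<width>mm x <height>mm" format shown above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: turns an xrandr "connected" line into the Xsetup command. */
final class FbmmCommand {
    // Output name first, then the trailing "<width>mm x <height>mm" physical size.
    private static final Pattern LINE =
        Pattern.compile("^(\\S+) connected\\b.*?(\\d+)mm x (\\d+)mm\\s*$");

    /** Returns the command for a connected output, or empty for any other line. */
    static Optional<String> fromQueryLine(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of("xrandr --output " + m.group(1)
            + " --fbmm " + m.group(2) + "x" + m.group(3));
    }
}
```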

*(Why did I get a 4K screen? Twenty years ago, I might actually have been able to see the difference…)