RTF hyperlink conversion with ScroogeXHTML (XPath-based post-processing)

ScroogeXHTML for the Java platform supports hyperlink conversion to HTML in two ways. Many RTF documents use special RTF keywords which include the hyperlink target as “invisible” text, so the HTTP address is already available in the document. Other RTF documents, however, only use blue, underlined text without a hidden link address. In this case, the conversion requires a different solution.

Earlier versions of ScroogeXHTML used a hard-coded solution for text-to-hyperlink conversion, which did not support advanced tweaks and manipulations of the resulting HTML.

The next release of ScroogeXHTML provides an XPath-based post-processor class as a starting point for customized conversion of blue, underlined text to hyperlinks.

One line of code is required to add the post processor class:

       .add(new ConvertUnderlinedToHyperlinks());

The post processor locates all text elements which are blue and underlined and turns each of them into a hyperlink.

With some custom code, the post processor can be adapted to special needs; for example, it could use a dictionary (map) to assign specific URLs to the blue, underlined text.
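A minimal sketch of such a dictionary lookup, assuming a plain `Map<String, String>` from link text to URL: the hypothetical `HrefResolver` class below could take the place of the `anchor.setAttribute("href", linkText)` step in the source code excerpt, and falls back to the link text itself when the dictionary has no entry. The class name and the example URLs are illustrative only and are not part of the ScroogeXHTML API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of ScroogeXHTML): resolves the href for a
 * blue, underlined text run from a dictionary, falling back to the text itself.
 */
public class HrefResolver {

    private final Map<String, String> dictionary;

    public HrefResolver(Map<String, String> dictionary) {
        // defensive copy, so later changes to the caller's map have no effect
        this.dictionary = new HashMap<>(dictionary);
    }

    /** Returns the mapped URL, or the trimmed link text if there is no entry. */
    public String resolve(String linkText) {
        String key = linkText.trim();
        return dictionary.getOrDefault(key, key);
    }

    public static void main(String[] args) {
        Map<String, String> urls = new HashMap<>();
        // placeholder entry: visible link text -> target URL
        urls.put("product page", "https://www.example.com/product");
        HrefResolver resolver = new HrefResolver(urls);

        System.out.println(resolver.resolve("product page"));           // dictionary hit
        System.out.println(resolver.resolve("https://www.example.org/")); // fallback
    }
}
```

Falling back to the link text keeps the default behaviour for text that already is a URL, so the dictionary only needs entries for descriptive link texts.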

Note: this is a breaking change; the next version no longer has the property ConvertHyperlinksForBlueUnderlinedText.

Source code excerpt:

  public void postProcess(PostProcessEventObject e) {
    try {
      XPathFactory xpathFactory = XPathFactory.newInstance();

      // "color" is the link color as it appears in the style attribute
      // (a field of this post processor class, as is LOGGER)
      String exp = String.format(
          "//span[contains(@style, 'color:%s;') and contains(@style, 'text-decoration:underline;')]",
          color);
      XPathExpression xpathExp = xpathFactory.newXPath().compile(exp);

      NodeList hyperlinkNodes = (NodeList) xpathExp.evaluate(e.getDocument(), XPathConstants.NODESET);

      // Iterate over all found nodes
      for (int i = 0; i < hyperlinkNodes.getLength(); i++) {
        Element linkNode = (Element) hyperlinkNodes.item(i);

        // remove the hyperlink style
        String style = linkNode.getAttribute("style");
        style = style.replace("color:" + color + ";", "");
        style = style.replace("text-decoration:underline;", "");
        if (style.isEmpty()) {
          linkNode.removeAttribute("style");
        } else {
          linkNode.setAttribute("style", style);
        }

        // create anchor with href attribute
        Element anchor = e.getDocument().createElement("a");
        String linkText = linkNode.getTextContent();
        anchor.setAttribute("href", linkText);

        // insert the a element
        Node parent = linkNode.getParentNode();

        if (linkNode.getAttributes().getLength() == 0) {
          // no attributes left: move the span's children into the anchor
          // and replace the now redundant span with the anchor
          while (linkNode.hasChildNodes()) {
            anchor.appendChild(linkNode.getFirstChild());
          }
          parent.replaceChild(anchor, linkNode);
        } else {
          // other styles remain: keep the span and wrap it in the anchor
          parent.insertBefore(anchor, linkNode);
          anchor.appendChild(linkNode);
        }
      }
    } catch (XPathExpressionException ex) {
      LOGGER.error(ex.getMessage(), ex);
    }
  }
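To try the XPath expression and the DOM changes outside of ScroogeXHTML, the same steps can be run on a hand-written XHTML fragment using only the JDK's DOM, XPath and Transformer APIs. This is a minimal sketch: the class name `UnderlinedLinkDemo`, the sample markup and the link color `#0000ff` are assumptions, and the style strings ScroogeXHTML actually emits may be formatted differently.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UnderlinedLinkDemo {

    // Assumed link color; the notation used by ScroogeXHTML may differ.
    static final String COLOR = "#0000ff";

    /** Wraps every blue, underlined span in an a element (standalone sketch). */
    static void convert(Document doc) throws Exception {
        String exp = String.format(
            "//span[contains(@style, 'color:%s;') and contains(@style, 'text-decoration:underline;')]",
            COLOR);
        // NODESET results are a snapshot, so the DOM can be changed while iterating
        NodeList spans = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(exp, doc, XPathConstants.NODESET);

        for (int i = 0; i < spans.getLength(); i++) {
            Element span = (Element) spans.item(i);
            String style = span.getAttribute("style")
                .replace("color:" + COLOR + ";", "")
                .replace("text-decoration:underline;", "");
            if (style.isEmpty()) {
                span.removeAttribute("style");
            } else {
                span.setAttribute("style", style);
            }

            Element anchor = doc.createElement("a");
            anchor.setAttribute("href", span.getTextContent());
            Node parent = span.getParentNode();
            if (span.getAttributes().getLength() == 0) {
                // no styles left: the span is redundant, so replace it
                while (span.hasChildNodes()) {
                    anchor.appendChild(span.getFirstChild());
                }
                parent.replaceChild(anchor, span);
            } else {
                // other styles remain: wrap the span in the anchor
                parent.insertBefore(anchor, span);
                anchor.appendChild(span);
            }
        }
    }

    /** Serializes a node to a string without the XML declaration. */
    static String toXml(Node node) throws Exception {
        StringWriter out = new StringWriter();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<p>Visit <span style=\"color:#0000ff;text-decoration:underline;\">"
            + "https://www.example.com/</span> today.</p>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(html)));
        convert(doc);
        System.out.println(toXml(doc.getDocumentElement()));
    }
}
```

In the sample, the span carries only the link style, so it is replaced by an `a` element whose `href` is the link text; a span with additional styles (for example bold) would instead be kept and wrapped.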

Wingdings bullets workaround (XPath-based post-processing)

Some RTF writers (such as WPTools) use the Wingdings font for bullet signs in unnumbered lists. Wingdings is not a web-safe font, so additional tweaking is required to turn a document generated by ScroogeXHTML into web-safe HTML5.

The library already includes post-processing classes, so we can build a workaround based on existing code and clean up the intermediate DOM.

A basic implementation is shown below. It iterates over all span elements whose style contains font-family:Wingdings and does two things:

  • replace the ‘l’ character with a Unicode bullet sign
  • replace the ‘Wingdings’ font name with ‘serif’

Note: older versions of WPTools 7 emitted the font name as “WingDings” (with a capital D) and used a “Ÿ” character instead of “l” for the bullet. For better readability, the code example below is simplified for newer WPTools versions.

public void postProcess(PostProcessEventObject e) {
  try {
    XPathFactory xpathFactory = XPathFactory.newInstance();
    // XPath to find Wingdings text nodes.
    XPathExpression xpathExp = xpathFactory.newXPath().compile(
       "//span[contains(@style, 'font-family:Wingdings')]");
    NodeList nodes = (NodeList) xpathExp
      .evaluate(e.getDocument(), XPathConstants.NODESET);

    for (int i = 0; i < nodes.getLength(); i++) {
      Element node = (Element) nodes.item(i);

      // replace the bullet
      String textContent = node.getTextContent();
      if ("l".equals(textContent)) {
        node.setTextContent("\u2022"); // Unicode bullet sign (U+2022)
      }

      // replace the font name
      String style = node.getAttribute("style");
      style = style.replace("Wingdings", "serif");
      node.setAttribute("style", style);
    }
  } catch (XPathExpressionException ex) {
    // LOGGER is a field of the post processor class
    LOGGER.error(ex.getMessage(), ex);
  }
}
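The older WPTools 7 output mentioned in the note above could be covered with two small changes to the same approach: a case-insensitive match on the font name (XPath 1.0 has no lower-case() function, so translate() folds the style to lower case) and a check for both bullet characters. The standalone sketch below uses only the JDK's DOM and XPath APIs on a hand-written fragment; the class name `WingdingsBullets` is hypothetical, and the assumption that the legacy “Ÿ” reaches the DOM as U+0178 has not been verified against real converter output.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WingdingsBullets {

    // "l" (newer WPTools) and "Ÿ" (U+0178, assumed for older WPTools 7 output)
    static final Set<String> BULLET_CHARS = Set.of("l", "\u0178");

    static final String UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static final String LOWER = "abcdefghijklmnopqrstuvwxyz";

    /** Replaces Wingdings bullets with U+2022 and the font name with serif. */
    static void fixBullets(Document doc) throws Exception {
        // translate() lower-cases the style, so "WingDings" matches as well
        String exp = "//span[contains(translate(@style, '" + UPPER + "', '" + LOWER
            + "'), 'font-family:wingdings')]";
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(exp, doc, XPathConstants.NODESET);

        for (int i = 0; i < nodes.getLength(); i++) {
            Element node = (Element) nodes.item(i);
            if (BULLET_CHARS.contains(node.getTextContent())) {
                node.setTextContent("\u2022"); // Unicode bullet sign
            }
            // (?i) replaces both "Wingdings" and "WingDings"
            String style = node.getAttribute("style").replaceAll("(?i)wingdings", "serif");
            node.setAttribute("style", style);
        }
    }

    public static void main(String[] args) throws Exception {
        String html = "<ul><li><span style=\"font-family:WingDings;\">\u0178</span> legacy</li>"
            + "<li><span style=\"font-family:Wingdings;\">l</span> current</li></ul>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(html)));
        fixBullets(doc);
        NodeList spans = doc.getElementsByTagName("span");
        for (int i = 0; i < spans.getLength(); i++) {
            Element span = (Element) spans.item(i);
            System.out.println(span.getAttribute("style") + " " + span.getTextContent());
        }
    }
}
```

Spans in other fonts are left untouched, because the XPath filter only selects styles that mention Wingdings in any letter case.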

ScroogeXHTML for the Java™ platform 6.3.0 – fast RTF to HTML5 and XHTML conversion

Habarisoft released version 6.3.0 of its RTF to HTML5 and XHTML converter library, ScroogeXHTML for the Java™ platform. The new version introduces two enhancements.

You can evaluate the new release with the online converter demo, which displays the configuration property values of the converter and lets you modify many of them.