ScroogeXHTML for the Java™ platform 6.3.1 – fast RTF to HTML5 and XHTML conversion

Habarisoft released version 6.3.1 of its RTF to HTML5 and XHTML converter library, ScroogeXHTML for the Java™ platform.

The new version introduces four enhancements and two fixes. Release notes are available on the ScroogeXHTML web site and in the HTML API documentation.

Starting with release 6.3.1, progress listeners are deprecated. They will be removed in a future version to improve conversion speed. See all deprecated methods.

Online demo

You can evaluate the new release with the online converter demo, which displays the converter's configuration property values and lets you modify many of them. When a preview (or development snapshot) of the next version becomes available, the demo page also links to a demo of it.


RTF hyperlink conversion with ScroogeXHTML (XPath based post processing)

ScroogeXHTML for the Java platform supports hyperlink conversion to HTML in two ways. Many RTF documents use special RTF keywords that store the hyperlink target as “invisible” text, so the HTTP address is already available in the document. Other RTF documents, however, only format the link text as underlined and blue and contain no hidden link address. In this case, the conversion requires a different solution.
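
The two cases can be illustrated with a small sketch. It assumes that the “special RTF keywords” are the standard RTF field syntax (\field, \fldinst, \fldrslt) with a HYPERLINK instruction. The class RtfLinkProbe and its regular expression are hypothetical and only serve as an illustration; they are not part of ScroogeXHTML, and real documents may place additional formatting keywords inside the field instruction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical probe for illustration only, not part of ScroogeXHTML. */
final class RtfLinkProbe {

  // Captures the quoted target of a HYPERLINK field instruction.
  // Simplification: formatting keywords in front of HYPERLINK are not handled.
  private static final Pattern FIELD_LINK =
      Pattern.compile("\\\\fldinst\\s*\\{?\\s*HYPERLINK\\s+\"([^\"]+)\"");

  /** Returns the link targets stored in field instructions, in document order. */
  static List<String> fieldTargets(String rtf) {
    List<String> targets = new ArrayList<>();
    Matcher m = FIELD_LINK.matcher(rtf);
    while (m.find()) {
      targets.add(m.group(1));
    }
    return targets;
  }

  public static void main(String[] args) {
    // Case 1: the target is stored as hidden text in the field instruction.
    String withField =
        "{\\field{\\*\\fldinst{HYPERLINK \"https://www.scroogexhtml.com/\"}}"
        + "{\\fldrslt{\\ul\\cf1 ScroogeXHTML}}}";
    // Case 2: only underlined, colored text (\cf1 is assumed to be blue);
    // no link target is stored anywhere in the document.
    String formattedOnly = "{\\ul\\cf1 https://www.scroogexhtml.com/}";

    System.out.println(fieldTargets(withField));
    System.out.println(fieldTargets(formattedOnly));
  }
}
```

For the second input, the probe finds no stored target, which is the situation that requires the post-processing approach described in this article.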

Earlier versions of ScroogeXHTML used a hard-coded solution for text-to-hyperlink conversion, which did not support advanced tweaks and manipulations of the resulting HTML.

The next release of ScroogeXHTML provides an XPath-based post-processor class as a starting point for customized conversion of blue, underlined text to hyperlinks.

One line of code is required to add the post processor class:

scrooge.getPostProcessListeners()
       .add(new ConvertUnderlinedToHyperlinks());

The post processor locates all text elements that are underlined and blue, and turns them into hyperlinks.

With some custom code, the post processor can be adjusted to your needs. For example, it could use a dictionary (map) to assign specific URLs to the blue/underlined text.
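
A minimal sketch of the dictionary idea follows. It uses only the standard Java DOM and XPath APIs, so it can be tried without the library. The class name DictionaryLinker, the color value #0000ff, and the style fragments in the XPath expression are assumptions made for this example; a real post processor would receive the document from the post-process event instead of parsing a string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper for illustration only, not part of ScroogeXHTML. */
final class DictionaryLinker {

  // Assumed style fragments; check them against the actual converter output.
  private static final String BLUE_UNDERLINED =
      "//span[contains(@style, 'color:#0000ff;')"
      + " and contains(@style, 'text-decoration:underline;')]";

  /** Wraps each matching span whose text is a key in urls; returns the link count. */
  static int link(Document doc, Map<String, String> urls)
      throws XPathExpressionException {
    NodeList found = (NodeList) XPathFactory.newInstance().newXPath()
        .evaluate(BLUE_UNDERLINED, doc, XPathConstants.NODESET);

    // Copy the result first so that DOM changes cannot disturb the iteration.
    List<Element> spans = new ArrayList<>();
    for (int i = 0; i < found.getLength(); i++) {
      spans.add((Element) found.item(i));
    }

    int linked = 0;
    for (Element span : spans) {
      String url = urls.get(span.getTextContent().trim());
      if (url == null) {
        continue; // no dictionary entry: leave the text as it is
      }
      Element anchor = doc.createElement("a");
      anchor.setAttribute("href", url);
      span.getParentNode().replaceChild(anchor, span); // anchor takes the span's position
      anchor.appendChild(span); // the span keeps its styling inside the link
      linked++;
    }
    return linked;
  }

  /** Convenience wrapper for trying the idea on an XML string. */
  static String linkXml(String xml, Map<String, String> urls) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      link(doc, urls);
      Transformer t = TransformerFactory.newInstance().newTransformer();
      t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      t.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    } catch (Exception ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    String style = "color:#0000ff;text-decoration:underline;";
    String xml = "<p><span style=\"" + style + "\">ScroogeXHTML</span> and "
        + "<span style=\"" + style + "\">unmapped text</span></p>";
    System.out.println(
        linkXml(xml, Map.of("ScroogeXHTML", "https://www.scroogexhtml.com/")));
  }
}
```

In this sketch, text without a dictionary entry stays untouched, which avoids creating links with meaningless targets. Whether unmapped text should fall back to using its own content as the address is a design decision for the custom code.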

Note: this is a breaking change. The next version will no longer have the property ConvertHyperlinksForBlueUnderlinedText.

Source code excerpt:

  @Override
  public void postProcess(PostProcessEventObject e) {
    try {
      XPathFactory xpathFactory = XPathFactory.newInstance();

      String exp = String.format("//span[contains(@style, 'color:%s;') and contains(@style ,'text-decoration:underline;') ]", color);
      XPathExpression xpathExp = xpathFactory.newXPath().compile(exp);

      NodeList hyperlinkNodes = (NodeList) xpathExp.evaluate(e.getDocument(), XPathConstants.NODESET);

      // Iterate over all found nodes
      for (int i = 0; i < hyperlinkNodes.getLength(); i++) {
        Element linkNode = (Element) hyperlinkNodes.item(i);

        // remove the hyperlink style
        String style = linkNode.getAttribute("style");
        style = style.replace("color:" + color + ";", "");
        style = style.replace("text-decoration:underline;", "");
        if (style.isEmpty()) {
          linkNode.removeAttribute("style");
        } else {
          linkNode.setAttribute("style", style);
        }

        // create anchor with href attribute
        Element anchor = e.getDocument().createElement("a");
        String linkText = linkNode.getTextContent();
        anchor.setAttribute("href", linkText);

        // insert the a element
        Node parent = linkNode.getParentNode();

        if (linkNode.getAttributes().getLength() == 0) {
          anchor.setTextContent(linkText);
          // replace in place so the link keeps its position among its siblings
          parent.replaceChild(anchor, linkNode);
        } else {
          parent.insertBefore(anchor, linkNode);
          anchor.appendChild(linkNode);
        }
      }
    } catch (XPathExpressionException ex) {
      LOGGER.error(ex.getMessage(), ex);
    }
  }

Wingdings bullets workaround (XPath based post-processing)

Some RTF writers (such as WPTools) use the Wingdings font for bullet signs in unnumbered lists. Wingdings is not a web-safe font, so additional tweaking is required to transform a document generated by ScroogeXHTML to web-safe HTML5.

The library already includes post-processing classes, so we can build a workaround based on existing code and clean up the intermediate DOM.

A basic implementation is shown below. It iterates over all nodes that carry a font-family:Wingdings style and does two things:

  • replace the ‘l’ character with a Unicode bullet sign
  • replace the ‘Wingdings’ font name with ‘serif’

Note: older versions of WPTools 7 emitted the font name as “WingDings” (with capital D) and used a “Ÿ” character instead of “l” for the bullet. The code example below has been simplified for newer WPTools versions for better readability.
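
For documents written by those older WPTools versions, a variant along the following lines could cover both spellings and both bullet characters. This is a sketch based on the standard Java DOM and XPath APIs only; the class name WingdingsCompat and the string-based helper are assumptions made for this example and are not part of ScroogeXHTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper for illustration only, not part of ScroogeXHTML. */
final class WingdingsCompat {

  // translate() lower-cases the style value inside the XPath expression,
  // so "WingDings" and "Wingdings" are both matched.
  private static final String WINGDINGS_SPANS =
      "//span[contains(translate(@style, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ',"
      + " 'abcdefghijklmnopqrstuvwxyz'), 'font-family:wingdings')]";

  static void fix(Document doc) throws XPathExpressionException {
    NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
        .evaluate(WINGDINGS_SPANS, doc, XPathConstants.NODESET);

    for (int i = 0; i < nodes.getLength(); i++) {
      Element node = (Element) nodes.item(i);

      // "l" (newer WPTools) or U+0178, Y with diaeresis (older WPTools 7)
      String text = node.getTextContent();
      if ("l".equals(text) || "\u0178".equals(text)) {
        node.setTextContent("\u25CF");
      }

      // replace the font name regardless of its capitalization
      String style = node.getAttribute("style");
      node.setAttribute("style", style.replaceAll("(?i)wingdings", "serif"));
    }
  }

  /** Convenience wrapper for trying the idea on an XML string. */
  static String fixXml(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      fix(doc);
      Transformer t = TransformerFactory.newInstance().newTransformer();
      t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      t.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    } catch (Exception ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(fixXml(
        "<ul><li><span style=\"font-family:WingDings;\">\u0178</span> old</li>"
        + "<li><span style=\"font-family:Wingdings;\">l</span> new</li></ul>"));
  }
}
```

The case-insensitive match is done with the XPath 1.0 translate() function because XPath 1.0 has no lower-case() function.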

public void postProcess(PostProcessEventObject e) {
  try {
    XPathFactory xpathFactory = XPathFactory.newInstance();
    // XPath to find span elements styled with the Wingdings font.
    XPathExpression xpathExp = xpathFactory.newXPath().compile(
       "//span[contains(@style, 'font-family:Wingdings')]");
    NodeList nodes = (NodeList) xpathExp
      .evaluate(e.getDocument(), XPathConstants.NODESET);

    for (int i = 0; i < nodes.getLength(); i++) {
      Element node = (Element) nodes.item(i);

      // replace the bullet
      String textContent = node.getTextContent();
      if ("l".equals(textContent)) {
        node.setTextContent("\u25CF");
      }

      // replace the font name
      String style = node.getAttribute("style");
      style = style.replace("Wingdings", "serif");
      node.setAttribute("style", style);
    }
  } catch (XPathExpressionException ex) {
    LOGGER.error(ex.getMessage(), ex);
  }
}

Tiny RTF Viewer 2.7 using ScroogeXHTML RTF to HTML5 converter

Habarisoft released Tiny RTF Viewer 2.7 for Android™. This small viewer app converts RTF documents (which can be stored locally or accessed by choosing a hyperlink in a web browser) to HTML5, and displays them in the internal web browser.


For the internal conversion from Rich Text Format (RTF) to HTML5, the app uses ScroogeXHTML for the Java™ platform version 6.3. More information and an online demo of the converter library are available at https://www.scroogexhtml.com/


Android is a trademark of Google Inc. ♦ Google Play is a trademark of Google Inc.

 

ScroogeXHTML for the Java™ platform 6.3.0 – fast RTF to HTML5 and XHTML conversion

Habarisoft released version 6.3.0 of its RTF to HTML5 and XHTML converter library, ScroogeXHTML for the Java™ platform. The new version introduces 2 enhancements.

You can evaluate the new release with the online converter demo, which displays the converter's configuration property values and lets you modify many of them.


 

ScroogeXHTML for the Java™ platform 6.2.0 – fast RTF to HTML5 and XHTML conversion

Habarisoft released version 6.2.0 of its RTF to HTML5 and XHTML converter library, ScroogeXHTML for the Java™ platform. The new version resolves 3 bugs and introduces 3 enhancements and new features, including support for table row height and text format changes within hyperlinks.

You can evaluate the new release with the online converter demo, which displays the converter's configuration property values and lets you modify many of them.


 

“Tiny RTF Viewer” 2.5 using ScroogeXHTML RTF to HTML5 converter

Habarisoft released Tiny RTF Viewer 2.5 for Android. This small viewer app converts RTF documents (which can be stored locally or accessed by choosing a hyperlink in a web browser) to HTML5, and displays them in the internal web browser.

For the internal conversion from Rich Text Format (RTF) to HTML5, the app uses the ScroogeXHTML library from Habarisoft. More information and an online demo of the converter library are available at https://www.scroogexhtml.com/

