ScroogeXHTML for the Java™ platform 7.0 Milestone 2 released

Habarisoft released the second milestone for version 7.0 of its RTF to HTML5 and XHTML converter library, ScroogeXHTML for the Java™ platform.

Download

The installer for 7.0.0.MS2 is now available at https://www.habarisoft.com/scroogexhtml_j/download/release (for registered users only).

As this is a work in progress, source code is not included in the installer.

API

The Javadoc API is included in the installer.

It is also available at https://www.habarisoft.com/scroogexhtml_j/7.0.0-MS2/docs/api/index.html

Getting Started PDF

The “Getting Started” PDF is included in the installer.

It is also available at https://www.habarisoft.com/scroogexhtml_j/7.0.0-MS2/docs/ScroogeXHTMLGettingStarted.pdf

New in 7.0.0-MS2

Experimental list conversion support

  • a new experimental implementation for numbered and unnumbered lists is now available and may be enabled with the ConversionKeys.USE_LIST_TABLE switch (see PDF for details)
  • multi-level bullet list conversion support is now available when list table support is enabled
  • numbered lists with roman numbers are now supported when list table support is enabled
  • Note: not all RTF writers generate correct and consistent list level code
  • the library includes an example post processor which replaces Wingdings bullets with web-safe Unicode bullet characters

Table conversion support

  • cell background color support
  • faster algorithm for cell merging
  • table border (whole table border) detection improved
  • other improvements are still in development

Minor changes

  • paragraph border conversion switch
  • JavaBean manifest entry
  • installer updated to IzPack 5.1.2
  • new property ConvertAlignment
  • Javadoc has been cleaned up to be compatible with the new JDK 8 Doclet

Breaking changes

Release 7.0 will contain breaking changes. Please consult the Getting Started PDF for details.

ScroogeXHTML for the Java™ platform 7.0 Milestone 1 released

Habarisoft released the first milestone for version 7.0 of its RTF to HTML5 and XHTML converter library, ScroogeXHTML for the Java™ platform.

Download

The installer for 7.0.0.MS1 is now available at https://www.habarisoft.com/scroogexhtml_j/download/release (for registered users only).

As this is a work in progress, source code is not included in the installer.

API

The Javadoc API is included in the installer.

It is also available at https://www.habarisoft.com/scroogexhtml_j/7.0.0-MS1/docs/api/index.html

Getting Started PDF

The “Getting Started” PDF is included in the installer.

It is also available at https://www.habarisoft.com/scroogexhtml_j/7.0.0-MS1/docs/ScroogeXHTMLGettingStarted.pdf

New in 7.0.0-MS1

List conversion support

  • a new experimental implementation for numbered and unnumbered lists is now available and may be enabled with the ConversionKeys.USE_LIST_TABLE switch (see PDF for details)
  • multi-level bullet list conversion support is now available when list table support is enabled
  • numbered lists with roman numbers are now supported when list table support is enabled
  • Note: not all RTF writers generate correct and consistent list level code
  • the library includes an example post processor which replaces Wingdings bullets with web-safe Unicode bullet characters

Table conversion support

  • cell background color support
  • faster algorithm for cell merging
  • other improvements are still in development

Breaking changes

Release 7.0 will contain breaking changes. Please consult the Getting Started PDF for details.

ScroogeXHTML for the Java™ platform 6.5 – fast RTF to HTML5 and XHTML conversion

Habarisoft released version 6.5 of its RTF to HTML5 and XHTML converter library, ScroogeXHTML for the Java™ platform. Habarisoft encourages all users of earlier versions to upgrade to this latest release.

Changes

  • new: support for paragraph background color and paragraph border box
  • fixed: conversion of RTF without a trailing \par token
  • improved: conversion of blanks to a sequence of non-breaking spaces

Release notes are available on the ScroogeXHTML web site and in the HTML API documentation.

ScroogeXHTML 6.5 example

Online demo

You can evaluate the new release with the online converter demo, which displays the configuration property values of the converter and lets you modify many of them. The demo page also links to a demo of a preview (or a development snapshot) of the next version.

ScroogeXHTML for the Java™ platform 6.4 – fast RTF to HTML5 and XHTML conversion

Habarisoft released version 6.4 of its RTF to HTML5 and XHTML converter library, ScroogeXHTML for the Java™ platform. The new version focuses on HTML5 conformance and enhancements in the HTML head element. Habarisoft encourages all users of earlier versions to upgrade to this latest release.

Changes

  • new: conversion of space before and after paragraphs to CSS (margin-top and margin-bottom)
  • new: the Viewport property allows a meta viewport element to be included in the HTML head
  • the order of elements in the head element changed to
    • charset
    • viewport (new)
    • title
    • description
    • keywords
    • author
    • date (deprecated)
    • generator
    • style sheets
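
The paragraph spacing conversion listed above can be pictured with a small sketch. RTF stores space before and after a paragraph in twips (the \sb and \sa control words), and 20 twips equal one point. The class name ParagraphSpacing and the choice of pt as the output unit are assumptions made for illustration; the declarations the library actually writes may differ.

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration; not part of the ScroogeXHTML API.
class ParagraphSpacing {

  // Builds CSS declarations from RTF space-before/space-after values in twips.
  static String toCss(int spaceBeforeTwips, int spaceAfterTwips) {
    StringBuilder css = new StringBuilder();
    if (spaceBeforeTwips > 0) {
      css.append("margin-top:").append(toPoints(spaceBeforeTwips)).append(';');
    }
    if (spaceAfterTwips > 0) {
      css.append("margin-bottom:").append(toPoints(spaceAfterTwips)).append(';');
    }
    return css.toString();
  }

  // 20 twips are one point; the division by 20 is always exact in decimal.
  private static String toPoints(int twips) {
    BigDecimal points = BigDecimal.valueOf(twips).divide(BigDecimal.valueOf(20));
    return points.stripTrailingZeros().toPlainString() + "pt";
  }

  public static void main(String[] args) {
    // prints "margin-top:12pt;margin-bottom:6pt;"
    System.out.println(toCss(240, 120));
  }
}
```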

Release notes are available on the ScroogeXHTML web site and in the HTML API documentation.

Starting with release 6.4, the compiled jar of the library is sealed for additional security.

Starting with release 6.4, MetaDate and MetaDateAuto properties are deprecated. They will be removed in a future version. See all deprecated methods.

Online demo

You can evaluate the new release with the online converter demo, which displays the configuration property values of the converter and lets you modify many of them. The demo page also links to a demo of a preview (or a development snapshot) of the next version when it becomes available.

ScroogeXHTML for the Java™ platform 6.3.1 – fast RTF to HTML5 and XHTML conversion

Habarisoft released version 6.3.1 of its RTF to HTML5 and XHTML converter library, ScroogeXHTML for the Java™ platform.

The new version introduces four enhancements and two fixes. Release notes are available on the ScroogeXHTML web site and in the HTML API documentation.

Starting with release 6.3.1, progress listeners are deprecated. They will be removed in a future version to improve conversion speed. See all deprecated methods.

Online demo

You can evaluate the new release with the online converter demo, which displays the configuration property values of the converter and lets you modify many of them. The demo page also links to a demo of a preview (or a development snapshot) of the next version when it becomes available.

RTF hyperlink conversion with ScroogeXHTML (XPath based post processing)

ScroogeXHTML for the Java platform supports hyperlink conversion to HTML in two ways. Many RTF documents use special RTF keywords which include the hyperlink target as “invisible” text, so that the HTTP address is already available in the document. Other RTF documents, however, only mark links with underlined blue text and contain no hidden link addresses. In this case, the conversion requires a different solution.

Earlier versions of ScroogeXHTML used a hard-coded solution for text-to-hyperlink conversion, which did not support advanced tweaks and manipulations of the result HTML.

The next release of ScroogeXHTML provides an XPath-based post processor class as a starting point for customized blue/underlined hyperlink conversion.

One line of code is required to add the post processor class:

scrooge.getPostProcessListeners()
       .add(new ConvertUnderlinedToHyperlinks());

The post processor locates all text elements which are underlined and blue, and turns each of them into a hyperlink that uses the element's text as the link address.

With some custom code, the post processor may be adjusted to your special needs, for example it may use a dictionary (map) to assign specific URLs to the blue/underlined text.

Note: this is a breaking change. The next version will no longer have the property ConvertHyperlinksForBlueUnderlinedText.

Source code excerpt:

  @Override
  public void postProcess(PostProcessEventObject e) {
    try {
      XPathFactory xpathFactory = XPathFactory.newInstance();

      String exp = String.format("//span[contains(@style, 'color:%s;') and contains(@style, 'text-decoration:underline;')]", color);
      XPathExpression xpathExp = xpathFactory.newXPath().compile(exp);

      NodeList hyperlinkNodes = (NodeList) xpathExp.evaluate(e.getDocument(), XPathConstants.NODESET);

      // Iterate over all found nodes
      for (int i = 0; i < hyperlinkNodes.getLength(); i++) {
        Element linkNode = (Element) hyperlinkNodes.item(i);

        // remove the hyperlink style
        String style = linkNode.getAttribute("style");
        style = style.replace("color:" + color + ";", "");
        style = style.replace("text-decoration:underline;", "");
        if (style.isEmpty()) {
          linkNode.removeAttribute("style");
        } else {
          linkNode.setAttribute("style", style);
        }

        // create anchor with href attribute
        Element anchor = e.getDocument().createElement("a");
        String linkText = linkNode.getTextContent();
        anchor.setAttribute("href", linkText);

        // insert the a element
        Node parent = linkNode.getParentNode();

        if (linkNode.getAttributes().getLength() == 0) {
          // no attributes left: the anchor takes the place of the span,
          // so the link keeps its position among its siblings
          anchor.setTextContent(linkText);
          parent.replaceChild(anchor, linkNode);
        } else {
          parent.insertBefore(anchor, linkNode);
          anchor.appendChild(linkNode);
        }
      }
    } catch (XPathExpressionException ex) {
      LOGGER.error(ex.getMessage(), ex);
    }
  }
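
The dictionary idea mentioned earlier (assigning specific URLs to the blue/underlined text) could be sketched as follows. This is a minimal standalone example that works on a plain DOM document, not code from the library: the class DictionaryLinker, its methods, and the color value #0000ff are assumptions made for illustration. In a real post processor the same logic would run on the document provided by the event object.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of the ScroogeXHTML API.
class DictionaryLinker {

  // Wraps each blue, underlined span whose text is a key of urlByText
  // in an anchor element and returns the number of links created.
  static int link(Document doc, String color, Map<String, String> urlByText) {
    String exp = String.format(
        "//span[contains(@style, 'color:%s;') and contains(@style, 'text-decoration:underline;')]",
        color);
    try {
      NodeList spans = (NodeList) XPathFactory.newInstance().newXPath()
          .evaluate(exp, doc, XPathConstants.NODESET);
      int count = 0;
      for (int i = 0; i < spans.getLength(); i++) {
        Element span = (Element) spans.item(i);
        String url = urlByText.get(span.getTextContent().trim());
        if (url == null) {
          continue; // text without a dictionary entry stays a plain span
        }
        Element anchor = doc.createElement("a");
        anchor.setAttribute("href", url);
        // put the anchor where the span was, then move the span inside it
        span.getParentNode().replaceChild(anchor, span);
        anchor.appendChild(span);
        count++;
      }
      return count;
    } catch (XPathExpressionException ex) {
      throw new IllegalArgumentException("Invalid color value: " + color, ex);
    }
  }

  // Parses a well-formed XHTML fragment; only needed for the demo below.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    Document doc = parse("<p><span style=\"color:#0000ff;text-decoration:underline;\">"
        + "Habarisoft</span> and <span style=\"color:#0000ff;text-decoration:underline;\">"
        + "unlisted text</span></p>");
    int links = link(doc, "#0000ff", Map.of("Habarisoft", "https://www.habarisoft.com/"));
    System.out.println(links + " link(s) created");
  }
}
```

Unlike the excerpt above, this sketch keeps the span and its style inside the anchor and leaves text without a dictionary entry untouched, which avoids creating links with meaningless addresses.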

Wingdings bullets workaround (XPath based post-processing)

Some RTF writers (such as WPTools) use the Wingdings font for bullet signs in unnumbered lists. Wingdings is not a web-safe font, so additional tweaking is required to transform a document generated by ScroogeXHTML to web-safe HTML5.

The library already includes post-processing classes, so we can build a workaround based on existing code and clean up the intermediate DOM.

A basic implementation is shown below. It iterates over all span elements which carry a font-family:Wingdings style and does two things:

  • replace the ‘l’ character with a Unicode bullet sign
  • replace the ‘Wingdings’ font name with ‘serif’

Note: older versions of WPTools 7 emitted the font name as “WingDings” (with capital D) and used a “Ÿ” character instead of “l” for the bullet. The code example below has been simplified for newer WPTools versions for better readability.

public void postProcess(PostProcessEventObject e) {
  try {
    XPathFactory xpathFactory = XPathFactory.newInstance();
    // XPath to find span elements styled with the Wingdings font
    XPathExpression xpathExp = xpathFactory.newXPath().compile(
       "//span[contains(@style, 'font-family:Wingdings')]");
    NodeList nodes = (NodeList) xpathExp
      .evaluate(e.getDocument(), XPathConstants.NODESET);

    for (int i = 0; i < nodes.getLength(); i++) {
      Element node = (Element) nodes.item(i);

      // replace the bullet
      String textContent = node.getTextContent();
      if ("l".equals(textContent)) {
        node.setTextContent("\u25CF");
      }

      // replace the font name
      String style = node.getAttribute("style");
      style = style.replace("Wingdings", "serif");
      node.setAttribute("style", style);
    }
  } catch (XPathExpressionException ex) {
    LOGGER.error(ex.getMessage(), ex);
  }
}
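
For documents written by the older WPTools 7 versions mentioned in the note, the same approach can be extended to accept both spellings of the font name and both bullet characters. The following standalone sketch works on a plain DOM document; the class WingdingsBulletFixer and its methods are assumptions made for illustration and are not the example post processor shipped with the library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical variant for illustration; not part of the ScroogeXHTML API.
class WingdingsBulletFixer {

  // Replaces Wingdings bullets and font names; returns the number of spans changed.
  static int fix(Document doc) {
    // XPath 1.0 has no lower-case() function, so translate() maps 'D' to 'd'
    // before matching; this covers both "Wingdings" and "WingDings".
    String exp = "//span[contains(translate(@style, 'D', 'd'), 'font-family:Wingdings')]";
    try {
      NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
          .evaluate(exp, doc, XPathConstants.NODESET);
      for (int i = 0; i < nodes.getLength(); i++) {
        Element node = (Element) nodes.item(i);

        // 'l' (newer WPTools) and U+0178 (older WPTools 7) both stand for a bullet
        String text = node.getTextContent();
        if ("l".equals(text) || "\u0178".equals(text)) {
          node.setTextContent("\u25CF");
        }

        // replace either spelling of the font name
        String style = node.getAttribute("style")
            .replace("WingDings", "serif")
            .replace("Wingdings", "serif");
        node.setAttribute("style", style);
      }
      return nodes.getLength();
    } catch (XPathExpressionException ex) {
      throw new IllegalStateException(ex);
    }
  }

  // Parses a well-formed XHTML fragment; only needed for the demo below.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    Document doc = parse("<p><span style=\"font-family:WingDings;\">\u0178</span>"
        + "<span style=\"font-family:Wingdings;\">l</span></p>");
    System.out.println(fix(doc) + " span(s) changed");
  }
}
```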