ScroogeXHTML for the Java™ platform 6.0 – fast RTF to HTML5 conversion

Habarisoft released version 6.0 of its RTF to HTML5 and XHTML converter library, ScroogeXHTML for the Java™ platform.

The new major version resolves 5 bugs and introduces more than 30 enhancements and new features.

A short introduction to major changes is available on the ScroogeXHTML home page, in the Getting Started guide (PDF documentation), in the API documentation, and in this blog post.

You can evaluate the final release with the online converter demo, which now displays the configuration property values of the converter and allows you to modify many of them.

[Screenshot taken 2016-07-01 at 09:37:29]

ScroogeXHTML for the Java™ platform 6.0: new features

The upcoming 6.0 release of ScroogeXHTML for the Java™ platform introduces useful new features. Here is a short overview:

Embedding images with HTML Data URI scheme

The traditional MemoryPictureAdapter class in ScroogeXHTML for the Java platform generates image elements (<img src="…">) whose src attribute points to a resource location. This keeps the document small, but requires that the image resources be accessible to the web browser at the given location.

In some cases, however, it is useful to embed the whole image in-line in the web page. The Data URI scheme allows this: the image data is included in the page itself but is handled as if it were an external resource.

The new MemoryPictureAdapterBase64 class returns Data URIs for small JPEG and PNG images. By default, the size threshold is set to 32 kB.

Usage example:

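// create the converter
ScroogeXHTML scrooge = new ScroogeXHTML();
// create the adapter which returns Data URIs for small JPEG and PNG images
PictureAdapter adapter = new MemoryPictureAdapterBase64();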

The new class inherits from the old MemoryPictureAdapter, and will return the inherited result for images which exceed the size limit.
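To illustrate the idea behind the size threshold, here is a minimal sketch that uses only the Java standard library. It is not the ScroogeXHTML implementation: the class DataUriSketch, the method imageSource and its parameters are hypothetical names chosen for this example, and reading "32 kB" as 32 * 1024 bytes is an assumption.

```java
import java.util.Base64;

/** Hypothetical helper for illustration only; not part of the ScroogeXHTML API. */
final class DataUriSketch {

    /** Assumed threshold in bytes (32 kB read as 32 * 1024). */
    static final int DEFAULT_THRESHOLD = 32 * 1024;

    /**
     * Returns a Data URI for images up to the threshold,
     * and the plain resource link for images which exceed it.
     */
    static String imageSource(byte[] imageData, String mimeType,
                              String fallbackUrl, int threshold) {
        if (imageData.length > threshold) {
            // too large to embed: keep the document small, link to the resource
            return fallbackUrl;
        }
        // embed the image bytes in-line, Base64 encoded
        String base64 = Base64.getEncoder().encodeToString(imageData);
        return "data:" + mimeType + ";base64," + base64;
    }

    public static void main(String[] args) {
        // first four bytes of the PNG signature, used as stand-in image data
        byte[] tiny = {(byte) 0x89, 'P', 'N', 'G'};
        String src = imageSource(tiny, "image/png", "images/1.png", DEFAULT_THRESHOLD);
        System.out.println("<img src=\"" + src + "\">");
    }
}
```

Small images end up in-line in the src attribute, while larger ones keep the traditional link, so the document size stays bounded.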

Data URIs are fully supported by most major browsers, and partially supported in Internet Explorer and Microsoft Edge.

Event listeners for DOM post processing

The converter internally uses an XML DOM tree to create the HTML document structure. Before converting the DOM to the resulting HTML5 String, the converter calls a sequence of post-processing handlers, which apply optimizations and custom modifications to the DOM tree. Post-processing handlers must implement the PostProcessListener interface.

The converter stores the event handlers in its PostProcessListeners property, which is a list of PostProcessListener implementations. By default, the converter library creates and assigns post-processing handlers to perform these tasks:

  • strip empty (whitespace-only) text nodes
  • strip empty span nodes
  • strip attribute-less span nodes
  • replace empty paragraph (<p>) nodes with <br> nodes

These default PostProcessListener implementations are located in the com.habarisoft.scroogexhtml.tidy package and use XPath to perform the DOM modification (see Stack Overflow example code).

Application code may create and add more post process listeners as needed.
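The following sketch shows how XPath-based DOM clean-up of this kind can look with the standard Java XML APIs. It is an illustration only and not the code from the com.habarisoft.scroogexhtml.tidy package: the class TidySketch and its methods are hypothetical, and because the method signature of PostProcessListener is not given here, the sketch works directly on an org.w3c.dom.Document instead of implementing that interface.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical example class; not part of the ScroogeXHTML API. */
final class TidySketch {

    /** Removes all nodes matched by the XPath expression and returns their count. */
    static int removeMatches(Document doc, String expression) {
        try {
            NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            for (int i = 0; i < matches.getLength(); i++) {
                Node node = matches.item(i);
                node.getParentNode().removeChild(node);
            }
            return matches.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    /** Strips empty (whitespace-only) text nodes. */
    static int stripWhitespaceOnlyText(Document doc) {
        return removeMatches(doc, "//text()[normalize-space(.) = '']");
    }

    /** Strips span elements which have no child nodes. */
    static int stripEmptySpans(Document doc) {
        return removeMatches(doc, "//span[not(node())]");
    }

    /** Parses an XML string into a DOM document (helper for the demo). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<body><p>Hello <span/>world</p> <p><span> </span></p></body>");
        stripWhitespaceOnlyText(doc); // run first, so whitespace-only spans become empty
        stripEmptySpans(doc);
        System.out.println(doc.getElementsByTagName("span").getLength());
    }
}
```

The order of the handlers matters in this sketch: removing whitespace-only text nodes first turns a span that contained only whitespace into an empty span, which the second step can then remove.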

On-line demo

The new demo page allows you to upload and convert RTF files.


ScroogeXHTML for the Java™ platform 6.0 – RTF to HTML5 converter preview

Habarisoft released the first public preview of version 6.0 of its RTF to HTML5 converter library, ScroogeXHTML for the Java™ platform.

Major changes

  • only HTML5 and XHTML will be supported – for other markup language versions, ScroogeXHTML 5.X is still available
  • new DOM based post processing event listeners
  • new support for embedded JPEG and PNG images with HTML Data URI scheme
  • improved support for RTF tables
  • improved support for monospace fonts
  • improved conversion of footnotes
  • improved support for listtable (paragraph numbering)
