ScroogeXHTML for the Java™ platform is a library which can convert a subset of the RTF standard to HTML5 and XHTML.

Release Notes

2018-08-10: Version 7.3.0

  • Added support for multiple external style sheets (property StyleSheetLinks); the StyleSheetLink property is now deprecated
  • Changed finishColortableEntry() to improve conversion speed
  • Changed removeHtmlTags() to improve conversion speed
  • Updated izpack installer to version 5.1.3
  • Removed unused methods
  • Fixed Findbugs/Spotbugs warnings

2018-03-24: Version 7.2.0

  • Added support for vertical alignment in table cells
  • Standalone XHTML documents begin with an XML declaration if the charset is not UTF-8
  • Table conversion uses the class="table table-bordered" attribute (instead of border="1") to indicate that the table is bordered. This fixes the W3C HTML validator warning "The border attribute on the table element is presentational markup". Applications which still require the border="1" attribute may enable it with setOutputProperty(ConversionKeys.USE_TABLE_BORDER_ATTRIBUTE, "yes");
  • Removed the enclosing <!-- ... --> around the CSS code within the <style> element for standalone documents
  • Removed the attribute type="text/css" for the <style> element for standalone HTML5 documents. This fixes the W3C HTML validator warning: "The type attribute for the style element is not needed and should be omitted".
  • Changed BODY {... to lowercase body {... in auto-generated CSS code
  • The <style> element includes comments before auto-generated and custom styles
  • Fixed Findbugs warnings for non-transient non-serializable instance fields in the MemoryPictureAdapter and ListHeaderInfo classes
  • Fixed Findbugs warnings for reliance on default encoding in com.scroogexhtml.ScroogeXHTML.convert
  • Fixed Findbugs warnings for casting and passing to ceil in com.scroogexhtml.converter.AbstractWriter.getFontSizeStyle
  • Fixed Findbugs warnings for casting and passing to ceil in getWGoalPx

2018-02-12: Version 7.1.0

  • Added support for five character encodings, including MacRoman
  • Added support for non-breaking hyphen (RTF token \_)
  • Improved conversion of 'Symbol' font
  • As a side effect of the enhanced 'Symbol' font conversion, bullet list conversion now (correctly) emits &bullet; instead of &middot;

Bug Fixes
  • Emit the HTML bullet \u2022 for RTF token "\bullet" instead of middot

2017-10-28: Version 7.0.0

Bug Fixes
  • Always hide all "hidden" text (even if ConvertFontStyle is false)
  • Added option to disable paragraph border conversion - to disable, use setOutputProperty(ConversionKeys.CONVERT_PARAGRAPH_BORDERS, "no");
  • Improved algorithm for ConvertEmptyParagraphs
  • Improved Unicode support for Japanese text
  • Improved initialization speed of DOM tree transformation
  • Improved support for detection of outer table border
  • Experimental support for a multilevel numbering writer - enabled with setOutputProperty(ConversionKeys.SUPPORT_MULTILEVEL, "yes");
  • Experimental support for uppercase and lowercase Roman numerals - requires setOutputProperty(ConversionKeys.SUPPORT_LIST_TABLE, "yes");
  • Experimental support for \*\pn paragraph numbering - enabled with setOutputProperty(ConversionKeys.SUPPORT_STAR_PN, "yes");
  • ConvertFootnotes value default changed to false
  • Experimental UseListTable property is deprecated now - use setOutputProperty(ConversionKeys.SUPPORT_LIST_TABLE, "yes");
  • UseListTable property default changed to false
  • Removed ProgressListener properties
  • Removed detection of hyperlinks based on blue/underlined text format
  • Removed MetaDateAuto property
  • Removed default creation of post process listeners
  • Added ScroogeXHTMLMain.addDefaultListeners() method for backward compatibility

2017-10-20: Version 6.7.0

  • Improved initialization speed of DOM tree transformation
  • Improved support for Japanese text
  • Added ReplaceWingdingsBullets constructor parameters
  • Updated izpack installer to version 5.1.2

2017-09-02: Version 6.6.0

Bug Fixes
  • Fixed JavaDoc comments (JDK 8 JavaDoc warnings)
  • Added support for table cell background color
  • Added property ConvertAlignment (default true)
  • Faster cell merging algorithm
  • Faster RGB to HTML color conversion
  • Logging of invalid font numbers
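
The notes do not describe how the faster RGB to HTML color conversion works. As a minimal sketch of one common approach, the following builds the "#rrggbb" string with a hex digit lookup table instead of string formatting. The class and method names are hypothetical and are not part of the ScroogeXHTML API.

```java
// Hypothetical helper, not part of the ScroogeXHTML API.
final class HtmlColorSketch {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Builds "#rrggbb" from three color components; only the low 8 bits of each value are used.
    static String toHtmlColor(int red, int green, int blue) {
        char[] out = new char[7];
        out[0] = '#';
        writeByte(out, 1, red);
        writeByte(out, 3, green);
        writeByte(out, 5, blue);
        return new String(out);
    }

    // Writes the two hex digits of one component at the given position.
    private static void writeByte(char[] out, int pos, int value) {
        int v = value & 0xFF;
        out[pos] = HEX[v >>> 4];
        out[pos + 1] = HEX[v & 0x0F];
    }

    public static void main(String[] args) {
        System.out.println(toHtmlColor(255, 0, 128));
    }
}
```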

2017-08-02: Version 6.5.0

Bug Fixes
  • Fixed empty conversion for RTF which does not end with \par token
  • Fixed JavaDoc comments
  • Added support for paragraph background color
  • Added support for paragraph border box
  • Convert multiple blanks (convert blank+blank to &nbsp;&nbsp;)
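
The blank conversion mentioned above can be illustrated with a small sketch. It assumes that every run of two or more blanks is replaced by the same number of &nbsp; entities while single blanks stay unchanged; the library's actual rules may differ, and the class and method names are hypothetical.

```java
// Hypothetical helper, not part of the ScroogeXHTML API.
final class BlankRunSketch {

    // Replaces each run of two or more blanks with the same number of &nbsp; entities.
    static String preserveBlankRuns(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // Measure the run of blanks starting at position i (may be zero).
            int j = i;
            while (j < text.length() && text.charAt(j) == ' ') {
                j++;
            }
            int run = j - i;
            if (run >= 2) {
                for (int k = 0; k < run; k++) {
                    out.append("&nbsp;");
                }
                i = j;
            } else if (run == 1) {
                out.append(' ');
                i = j;
            } else {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(preserveBlankRuns("a  b c"));
    }
}
```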

2017-07-12: Version 6.4.0

Bug Fixes
  • Fixed stylesheet link element in HTML head
  • Added support for space before and after paragraph
  • Added Viewport property for HTML head
  • Use recommended order of HTML head elements (charset / viewport / title / description / keywords / author)
  • The MetaDate and MetaDateAuto properties are deprecated
  • The ConvertHyperlinksForBlueUnderlinedText property is deprecated
  • The compiled library JAR is sealed for additional security

2017-06-21: Version 6.3.1

Bug Fixes
  • Fixed missing initialization of unicodeSkip state before conversion
  • Fixed release notes in HTML API docs
  • Minor improvements for WPTools and TRichView bullet lists
  • Refactoring of table support (introduced TableWriter interface)
  • Progress listeners are now deprecated; they will be removed in a future version to increase conversion performance
  • Updated izpack installer to version 5.1.1

2017-02-10: Version 6.3.0

  • Added option flag ConversionKeys.CONVERT_HEADERS_AND_FOOTERS
  • Added lang attribute generation for html element (if default language is non-empty)

2016-12-30: Version 6.2.1

Bug Fixes
  • Fixed validation error (anchor div must not be inside p element)

2016-12-30: Version 6.2.0

Bug Fixes
  • Fixed replacement of invalid characters in id attributes
  • Fixed conversion of \line token
  • Fixed generation of missing table columns
  • Added support for merged cells (based on clmrg and clmgf tokens)
  • Added conversion of all bookmarks to div elements with anchor
  • Added property OutputProperties for experimental settings
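
To illustrate the replacement of invalid characters in id attributes, here is a minimal sketch. It assumes a conservative rule set (ASCII letters, digits, underscore and hyphen are kept, everything else becomes an underscore, and a prefix is added if the result does not start with a letter). The rules used by the library are not stated in these notes, and all names below are hypothetical.

```java
// Hypothetical helper, not part of the ScroogeXHTML API.
final class IdSanitizerSketch {

    // Returns an id value without whitespace or other characters outside a conservative safe set.
    static String sanitizeId(String raw) {
        StringBuilder id = new StringBuilder(raw.length() + 2);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            boolean safe = isAsciiLetter(c) || (c >= '0' && c <= '9') || c == '_' || c == '-';
            id.append(safe ? c : '_');
        }
        // Make sure the id is non-empty and starts with a letter.
        if (id.length() == 0 || !isAsciiLetter(id.charAt(0))) {
            id.insert(0, "id");
        }
        return id.toString();
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        System.out.println(sanitizeId("my bookmark"));
    }
}
```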

2016-12-04: Version 6.1.0

Bug Fixes
  • Fixed IndexOutOfBoundsException in conversion of merged table cells
  • Fixed validation error ID must not contain whitespace
  • Added support for table row height
  • Added support for text format changes within hyperlinks
  • Added support and property to disable footnotes conversion (experimental feature)
  • Updated izpack installer to version 5.0.10

2016-08-05: Version 6.0.1

  • Added support for conversion of merged table cells
  • Improved support for Android platform (Base64Utils class)
  • Updated izpack installer to version 5.0.9

2016-07-01: Version 6.0

Bug Fixes
  • Fixed conversion of multi-paragraph text in table cells
  • Fixed character property reset bug
  • Fixed anchor support in LinkURIBuilder class
  • Fixed usage of Serializable interface and serialization capability of converter instances
  • Fixed table conversion when ConvertTables property is false
  • Added support for XHTML 1.0 Transitional document type to DOM based converter
  • Added support for conversion of table left margin
  • Added support for conversion of table width
  • Added support for conversion of table column width
  • Added support for conversion of table border detection
  • Added post process event handlers
  • Added support for Data URI image embedding
  • Added support for automatic footnote numbering
  • Added converter property IndentAmount to define output document indentation
  • Added converter property ConvertHyperlinksForBlueUnderlinedText
  • Added converter method convert(String, Charset)
  • Added assertion that AddOuterHtml is true for conversion to file
  • Added conversion of RTF byte value 9 as a \tab control word
  • Improved document builder to use DOM also for the html head section
  • Improved document clean up (remove attribute-less span elements etc)
  • Improved conversion of hyperlinks
  • Improved processing of listtable section
  • Improved replacement of sequence of space characters with 'non breaking space'
  • Changed default value of converter property UseListTable to true
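
Data URI image embedding, listed above, places the picture bytes directly into the src attribute of an img element. The following sketch shows the general technique with java.util.Base64 (Java 8 or later); it is not the library's implementation, and the class and method names are hypothetical.

```java
import java.util.Base64;

// Hypothetical helper, not part of the ScroogeXHTML API.
final class DataUriSketch {

    // Builds a "data:" URI that embeds the given bytes with Base64 encoding.
    static String toDataUri(String mimeType, byte[] data) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] sample = {72, 105}; // the ASCII bytes of "Hi"
        System.out.println("<img src=\"" + toDataUri("image/png", sample) + "\" alt=\"\">");
    }
}
```

The result can be used wherever an image URL is expected, at the cost of a larger HTML document.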

2016-04-30: Version 5.5

  • The converter no longer generates span tags without attributes (<span>text</span>)
  • Fixed incorrect values of the ISO 8601 time stamp in the meta date header
  • Improved handling of empty font names: the converter now uses the specified default font instead of the 'first' font in the font replace list
  • Improved implementation of RTF keyword processing, uses no access indirection
  • Code refactorings
  • Usage of Java 7 'try with resources'
  • Moved Symbol font converter to new class
  • Moved XML DOM based result builder to a new class
  • Changed RTFProperties to enum type
  • Improved thread safety of UnicodeConverter

2016-03-12: Version 5.4

  • Fixed image conversion in XML DOM mode when ConvertEmptyParagraphs is true

Minor changes and fixes:

  • Moved and renamed ParagraphConstants to enum ParagraphProperties.Alignment
  • Moved numbering style constants to interface NumberingStyle
  • Updated documentation for Android support (PDF), convertFields default
  • Updated izpack installer to version 5.0.7

2015-12-11: Version 5.3

  • Added experimental support for Word 97 / 2007 list definitions in XML DOM Mode. To activate it use setUseListTable(true).
  • Removed JAXP dependency entry from POM. This change makes it possible to use the library on the Android platform, including table support
  • Since JAXP is no longer required, "JAXP mode" now is called "XML DOM mode", and JAXPWriter is renamed to XMLDOMWriter.
  • Fixed missing blank error for ScroogeXHTMLBase.setConvertLanguage(boolean) in XML DOM mode
  • Fixed ScroogeXHTMLBase.setConvertEmptyParagraphs(boolean) to keep attributes when replacing <p ...></p> with <br> in XML DOM mode
  • Removed line break after style attribute in XML DOM mode - it is not necessary, slows down conversion, and does not work when ScroogeXHTMLBase.setConvertLanguage(boolean) is set to true
  • Deprecated methods related to the classic DOM Writer

Minor changes and fixes:

  • Moved classic DOMWriter to package com.scroogexhtml.dom
  • Moved new XMLDOMWriter to package com.scroogexhtml.xmldom
  • Removed dependencies on package com.scroogexhtml.dom from package com.scroogexhtml.converter
  • Removed dependencies on package com.scroogexhtml.dom from LinkURIBuilder
  • Do not setConvertEmptyParagraphs(false) when <br> is not supported, only log a warning
  • Added more unit tests
  • Improved logging of font name substitution
  • Updated izpack installer to version 5.0.6

2015-11-22: Version 5.2

2015-11-01: Version 5.1

  • Added support for RTF tokens emdash, endash, emspace, enspace, zwj, zwnj
  • Added ScroogeXHTMLBase.setTabString(java.lang.String) to set the tab string
  • Added ScroogeXHTML.convert(ByteArrayInputStream rtf) conversion method
  • Refactored code to use more Java enum types (KeyType, RtfFnc, DestGrp)
  • Removed deprecated property convertUsingPrettyIndents
  • Fixed missing PUSHBACK_BUFFER_SIZE
  • Added some JUnit tests to binary distribution
  • Updated izpack installer to version 5.0.5
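
The RTF tokens added in this release correspond to fixed Unicode characters. A lookup table is one simple way to model this; the sketch below uses the standard code points for these characters, while the class and method names are hypothetical and do not reflect the library's internal design.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the ScroogeXHTML API.
final class RtfSymbolSketch {

    private static final Map<String, Character> SYMBOLS;

    static {
        Map<String, Character> m = new HashMap<String, Character>();
        m.put("emdash", '\u2014');  // em dash
        m.put("endash", '\u2013');  // en dash
        m.put("emspace", '\u2003'); // em space
        m.put("enspace", '\u2002'); // en space
        m.put("zwj", '\u200D');     // zero width joiner
        m.put("zwnj", '\u200C');    // zero width non-joiner
        SYMBOLS = Collections.unmodifiableMap(m);
    }

    // Returns the Unicode character for a control word (without backslash), or null if unknown.
    static Character lookup(String controlWord) {
        return SYMBOLS.get(controlWord);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(lookup("emdash")));
    }
}
```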

2015-09-12: Version 5.0

  • Changed minimum required Java version to 7
  • Updated source to make use of Java 7 language elements (try with resources, multicatch exception handlers etc.)
  • Added experimental table support which uses a JAXP based document model. To enable table conversion, set setUseJAXP and ScroogeXHTMLBase.setConvertTables(boolean) to true.
  • Adding hyperlink listeners is now optional. If ScroogeXHTMLBase.setConvertHyperlinks(boolean) is set to true and no hyperlink listeners are defined, the converter will use the LinkURIBuilder to generate the link element.
  • Changed the default document type from XHTML to HTML5
  • Changed the root package from de.betabeans to com.habarisoft
  • Changed document types to Enum types
  • Deprecated method setConvertUsingPrettyIndents
  • Added logging of warnings for configuration problems
  • Removed conversion methods which have been deprecated in version 4.4
  • Changed default installation folder to the user profile folder as base
  • Updated izpack installer to version 5.0.3

4.X releases

2015-05-02: Version 4.6

  • Fixed image sizing error in the WMF picture helper class WMFPictureHelper
  • Fixed support for special characters
  • Improved support for field conversion

2014-10-04: Version 4.5

  • Added support for bookmarks. Bookmarks will be converted to id attributes of additional <span> elements. Bookmark processing is disabled by default and can be enabled by setting ScroogeXHTMLBase.setConvertBookmarks(boolean) to true.
  • Added support for field expressions to capture hyperlink targets. Field expression processing is disabled by default and can be enabled by setting ScroogeXHTMLBase.setConvertFields(boolean) to true.
  • Added experimental helper class LinkURIBuilder which converts field expressions to hyperlink target URLs
  • Updated installer to version 5.0.0-rc3

2014-08-31: Version 4.4

  • Fixed missing stream#close() calls
  • Fixed potential null pointer exception for documents which contain invalid font table references
  • Fixed uppercase character support in RTF token keywords
  • Experimental support for WMF to PNG conversion using Apache Batik 1.7, the required conversion helper sources are available on request
  • Deprecated all conversion methods which take String arguments for the RTF input file name and encoding, and provided new methods which take File and Charset type parameters
  • MemoryPictureAdapter "base" property defaults to empty string "" instead of "/img?". Note: this is a potentially breaking change
  • Updated installer to check for installed JDK
  • Updated installer to require JRE 1.6 or newer
  • Updated installer to ask for administrator permissions

2014-07-05: Version 4.3

  • Removed legacy debug mode code to reduce library code size
  • Dependency on SLF4J updated to 1.7.7
  • Installer updated to IzPack 5.0.0 rc2
  • Removed unnecessary setFont and getFont methods from Fonttable class
  • Removed needless valueOf call in valueOf(getCurFontNr())
  • Java 5: use import static java.util.Collections.unmodifiableMap
  • Added private constructors for utility classes

2014-02-19: Version 4.2

  • The conversion method convert(String rtf) no longer uses HTML encoding (&#nnnnn;) for non-ASCII characters
  • New method convert(String rtfileName, String htmlFileName, String charsetName)
  • Method convert(String rtfileName, String htmlFileName) uses UTF-8 charset
  • New Method convert(String rtf, File outFile, String charsetName)
  • Method convert(String rtf, File outFile) uses UTF-8 charset
  • Fixed unchecked calls in classes FontTable and RTFKeywords
  • Fixed double blanks before lang attribute (961)
  • Code clean-up based on NetBeans IDE hints
  • Use javadoc, remove doxygen
  • Dependency on SLF4J updated to 1.7.5
  • Installer updated to IzPack 5.0.0 rc1
  • HTML5 added to unit tests

2013-06-12: Version 4.1

  • Code clean-up based on NetBeans IDE hints
  • Use SLF4J logging framework
  • Installer updated to IzPack 4.3.5

2011-05-03: Version 4.0

  • moved to Java 5: uses Generics, StringBuilder
  • fix for end of subscript and superscript on start of superscript and subscript
  • fix for UnicodeConverter.symbolToUnicode
  • fix for EmbeddedPicture.getWResult() function
  • basic support for Flex compatible font formatting (provided by new HTML3Fx translator class)

Older releases

2011-03-08: Version 3.6

  • support for embedded images
  • fixes for bean serialization (tested with online demo on Google App Engine)
  • online documentation built with doxygen
  • switched to Maven 2 build system

2009-09-01: Version 3.5

  • Removed javax dependencies.
  • Compiles on the Android platform.

2009-02-03: Version 3.4

  • Fixed AfterTextConversionListener, BeforeTextConversionListener, HyperLinkListener and ProgressListener to make them usable in Java IDE property editors.
  • Source code improvements based on FindBugs and PMD reports.

2009-01-11: Version 3.3

  • Improved support for documents with mixed character sets
  • Installer updated to IzPack 4.2

2008-08-02: Version 3.2

  • Added support for hidden text
  • Use no escape XHTML &apos; entity for single quote
  • Improved ANSI character handling

2008-06-29: Version 3.1

  • Improved support for documents with mixed character sets

2008-04-05: Version 3.0

  • New installer

2007-07-10: Version 2.9

  • Improved support for parameter values in the range -2^63..2^63-1.

2006-03-11: Version 2.8

Conversion speed has been improved; this version is about 60-80% faster.

Bugs fixed in 2.8

  • Fixed a bug in ScroogeXHTMLBase.getDefaultFontStyleDefinition()

Changes in 2.8

  • Added simple PlainText conversion
  • Added property convertIndent (note: defaults to true, should not break existing code)
  • Added support for "pnlvlcont" RTF token
  • Added support for "pict" RTF token to route picture data to DG_IGNORE
  • Added support for font definitions (in \fonttbl) which are not embedded in braces
  • Added private method isAlpha to ScroogeXHTMLMain class (see JavaDoc for more information)
  • Added check for EOF ("empty stack") in main conversion loop
  • Added class "Formatter" which controls line breaks and indentation
  • Added class "TranslatorFactory" which creates the Translator object
  • Changed method CharacterProperties.DeepCopy
  • Changed ScroogeXHTMLMain.getLeadingHTMLTags() to add no empty line if the getStyleSheetInclude() property is empty
  • Changed design of the logging helper classes, to allow easy migration to Log4J. For more details, see docs for package "logging"
  • Changed source to be more compliant with Sun's Java coding guidelines
  • Changed classes to final classes whenever possible
  • Changed from Vector to List and ArrayList
  • Changed from Enumeration to Iterator
  • Renamed method "textElementToXHTML" to "process"
  • Renamed class "XHTMLMobileProfile10Translator" to "XHTMLMobileProfile10"
  • Renamed class "ScroogeXHTMLReader" to "RTFReader"
  • Renamed class "ScroogeXHTMLUnicode" to "UnicodeConverter"
  • Renamed class "ScroogeXHTMLWriter" to "DOMWriter"
  • Renamed class "ScroogeXHTMLDocParagraph" to "Paragraph"
  • Renamed class "ScroogeXHTMLDocText" to "FormattedText"
  • Renamed class "ScroogeXHTMLDocument" to "Document"
  • Moved converter classes to new package de.betabeans.scroogexhtml.converter

2004-03-04: Version 2.7

Major changes:

  • Added support for right-to-left languages
  • Added support for XHTML Mobile Profile 1.0 document type
  • Added support for point, em, ex or percent font sizes (property FontSizeScale)
  • Added support for language attributes (lang="..")
  • Added support for Double Byte Character Set strings in font names

Minor changes:

  • Added JUnit tests which perform XML validation
  • Added and improved JavaDoc documentation
  • Fixed a problem which caused a null pointer exception if there was no default font with font number zero
  • Fixed a problem in the XHTML Transitional translator which generated an invalid parameter for paragraphs with 'justify' alignment
  • Removed all deprecated properties and methods
  • Removed unused class LogAdapter

2003-08-01: Version 2.6

Major changes:

  • Added Double Byte Character Set support for Japanese, Simplified Chinese, Traditional Chinese and Korean

Minor changes:

  • Added JUnit tests
  • Added method setTagStyle(String tagName, String style) which allows defining an additional CSS style parameter for <p>, <br /> and <li> tags
  • Added method setTagClass(String tagName, String className) which allows defining a CSS class for <p>, <br /> and <li> tags. Example: setTagClass("p", "pink_border") will change the conversion of <p> tags to <p class="pink_border">
  • Added support for \up0 and \dn0 tokens (switch super/subscript off)
  • Added optimizations in ScroogeXHTMLDocument.buildHtml()
  • Added a fix for a bug which added font color "#000000" to hyperlink text
  • Added a bugfix for documents which only contain empty paragraphs
  • Removed ShowMessages property
  • Removed unused constant INDENTCRLF
  • Removed method 'public void add(DocumentNode node)' from DocumentNode interface and all implementing classes
  • Removed isEmpty method

2003-06-19: Version 2.5

Major changes:

  • Added support for XHTML Basic 1.0, HTML 4.01 Strict and HTML 4.01 Transitional
  • Added performance improvements for the debugger and logger classes
  • Added BeforeTextConversionEvent which receives the text before it is encoded and entities are replaced, and changed the Unicode and XHTML encoding procedures to allow 'deferred' conversion
  • Added property 'useAposTag' which switches conversion of apostrophe between &apos; and &#39;. Default: true
  • Added property 'convertEmptyParagraphs' which optionally replaces <p></p> with <br />. Default: false
  • Changed source code to be compliant with the Java coding style guide
  • New properties 'convertEmptyParagraphs', 'logger', 'loggingEnabled'
  • Symbol font support (greek alphabet)

Minor changes:

  • Added performance improvements for the debugger and logger classes
  • Added optimization for internal ScroogeXHTMLDocParagraph.isEmpty() method
  • Added LogInterface with DefaultLogger and EmptyLogger implementations
  • Added LogAdapter interface and implementation class ScroogeXHTMLLogger to support custom logger implementations
  • Added property 'loggingSupported' which decides if a DefaultLogger or the EmptyLogger will be used
  • Added Encoder interface and implementation class ScroogeXHTMLEncoder
  • Changed debugger implementation, debug mode is now about 100% faster
  • Changed source code to be compliant with the Java coding style guide
  • Changed Encoder interface and implementation class ScroogeXHTMLEncoder to Translator interface, XHTMLTranslator, XHTML10StrictTranslator, XHTML10TransitionalTranslator
  • Changed debugger output, uses a style sheet
  • Changed package structure: classes and interfaces for bean methods are now in package de.betabeans.scroogexhtml.methods

2003-04-06: Version 2.4

Major changes:

  • Added definitions for special entities to ScroogeXHTMLWriter class. Note: there are two possible declarations for the constant XHTML_SPECIAL_ENTITTY_APOS (the apostrophe mark, U+0027 ISOnum). By default, the XHTML standard compliant code "&apos;" will be used by ScroogeXHTML. To support older browsers however it is also possible to use the second declaration, using the code "&#39;"
  • Added support for simple numbered lists
  • Added support for documents which use different character sets. Based on the character set which is assigned to a font, characters will be translated to Unicode.
  • Changed default document content type to UTF-8
  • Added conversion for single quote character to "&apos;"
  • Added support for left, right and first line paragraph indent
  • Added support for highlight color
  • Added property 'IncludeDefaultFontStyle': it sets the font attributes of the "BODY" tag, so all text in the document body will have the defined default font style
  • Renamed the 'textElement' event to 'afterTextConversion'

Minor changes:

  • Changed encoding for Unicode characters from hexadecimal (&#x...;) to decimal (&#...;)
  • Changed conversion for unknown character sets: do not use Cp1252 as default
  • Added support for font background color in RTF documents
  • Added support for the 'bullet' keyword as a workaround for Writer bullet lists
  • Changed method "replaceHyperlink" to use target="_new" only if XHTMLTransitional = true
  • Fixed a bug which disabled conversion of 'justified' paragraphs
  • Fixed a bug in Unicode support
  • Changed translation for &zwnj; / &zwj; special characters to Unicode
  • Added new property 'Version'
  • Removed unused ScroogeXHTMLBeanInfo class
  • Added support for "\line" token (required line break)
  • Added property "IncludeXMLDeclaration" which inserts the line <?xml version="1.0">. Note: default = true
  • Added filter for invalid form feed characters
  • Added automatic deletion of empty paragraphs at the end of the generated document
  • Changed some logging messages to have a lower level
  • Added method 'getFormattedMessage()' to LogEvent class
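
The decimal encoding of Unicode characters mentioned in the minor changes above can be sketched as follows. The example assumes that all characters outside the ASCII range are written as decimal numeric character references; the exact set of characters the converter encoded is not stated here, and the class and method names are hypothetical.

```java
// Hypothetical helper, not part of the ScroogeXHTML API.
final class DecimalReferenceSketch {

    // Writes every code point above 127 as a decimal numeric character reference (&#nnnn;).
    static String encodeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 128) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            // Advance by one or two chars so that surrogate pairs are handled as one code point.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeNonAscii("10 \u20AC"));
    }
}
```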

2002-12-13: Version 2.3

  • Added new conversion method public String convert(String rtf)

2002-05-11: Version 2.2

  • Changed all event handlers to conform to the JavaBeans standard. All events can be accessed in the NetBeans IDE now.

2002-05-05: Version 2.1

  • Added GUI demo application
  • Added support for token (non-breaking space)
  • Added property "replaceFonts"
  • Added property "logLevel"
  • Added EventListener "log", using the Log4J logging levels (DEBUG, INFO, WARN, ERROR, FATAL)
  • Changed output on System.err / System.out to log method calls
  • Changed showMessages default to false
  • Changed showMessages implementation to write on System.out (allows working without a log event listener)
  • Removed deprecated profiler class
  • Fixed a bug in the ScroogeXHTMLReader.finishFontName method (use single quotes)
  • Fixed a bug in the ScroogeXHTMLWriter.getWriterstate method (use deepCopy method)
  • Fixed a bug in the ScroogeXHTMLReader class (initialization of curFontNr)

2002-03-01: Version 2.0

  • Initial Version of ScroogeXHTML, replacing the former "Scrooge" JavaBean.

Package descriptions

  • Provides the main bean class, ScroogeXHTML.
  • Provides the main converter classes.
  • Provides classes for bean event methods.
  • Provides classes for RTF field expressions.
  • Provides classes for RTF list definition processing.
  • Provides embedded picture support classes.
  • Provides RTF standard datatypes and constants.
  • Provides post processing (PostProcessListener implementations).
  • Provides a 'document object model' based on javax.xml.
  • Provides RTF paragraph numbering support classes.
  • Provides RTF table support classes.