ScroogeXHTML for the Java™ platform is a library which can convert a subset of the RTF standard to HTML5 and XHTML.

Release Notes

2018-08-10: Version 7.3.0

  • Added support for multiple external style sheets (property StyleSheetLinks); the StyleSheetLink property is now deprecated
  • Changed finishColortableEntry() to improve conversion speed
  • Changed removeHtmlTags() to improve conversion speed
  • Updated IzPack installer to version 5.1.3
  • Removed unused methods
  • Fixed FindBugs/SpotBugs warnings

2018-03-24: Version 7.2.0

  • Added support for vertical alignment in table cells
  • Standalone XHTML documents begin with an XML declaration if the charset is not UTF-8
  • Table conversion uses the class="table table-bordered" attribute (instead of border="1") to indicate that the table is bordered. This fixes the W3C HTML validator warning "The border attribute on the table element is presentational markup". Applications which still require the border="1" attribute may enable it with setOutputProperty(ConversionKeys.USE_TABLE_BORDER_ATTRIBUTE, "yes");
  • Removed the enclosing <!-- ... --> around the CSS code within the <style> element for standalone documents
  • Removed the attribute type="text/css" for the <style> element for standalone HTML5 documents. This fixes the W3C HTML validator warning: "The type attribute for the style element is not needed and should be omitted".
  • Changed BODY {... to lowercase body {... in auto-generated CSS code
  • The <style> element includes comments before auto-generated and custom styles
  • Fixed FindBugs warnings for non-transient non-serializable instance fields in the MemoryPictureAdapter and ListHeaderInfo classes
  • Fixed FindBugs warnings for reliance on default encoding in com.scroogexhtml.ScroogeXHTML.convert
  • Fixed FindBugs warnings for an integral value cast to double and then passed to Math.ceil in com.scroogexhtml.converter.AbstractWriter.getFontSizeStyle
  • Fixed FindBugs warnings for an integral value cast to double and then passed to Math.ceil in and getWGoalPx
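
The XML declaration rule above can be illustrated with a minimal sketch. XmlDeclarationSketch and xmlDeclaration are hypothetical names, not part of the ScroogeXHTML API, and the sketch covers only the charset check, not the rest of the document header.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the ScroogeXHTML API. It applies the rule from
// the 7.2.0 note: a standalone XHTML document gets an XML declaration only when
// its output charset is not UTF-8 (UTF-8 is the default encoding for XML).
final class XmlDeclarationSketch {

    private XmlDeclarationSketch() {
    }

    static String xmlDeclaration(Charset charset) {
        if (StandardCharsets.UTF_8.equals(charset)) {
            // UTF-8 output needs no declaration, so nothing is emitted.
            return "";
        }
        // Any other charset is declared explicitly so that XML parsers decode it correctly.
        return "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>\n";
    }
}
```

For example, xmlDeclaration(StandardCharsets.ISO_8859_1) returns <?xml version="1.0" encoding="ISO-8859-1"?> followed by a line break, while UTF-8 output gets no declaration at all.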

2018-02-12: Version 7.1.0

  • Added support for five character encodings, including MacRoman
  • Added support for non-breaking hyphen (RTF token \_)
  • Improved conversion of 'Symbol' font
  • As a side effect of the enhanced 'Symbol' font conversion, bullet list conversion now (correctly) emits &bullet; instead of &middot;

Bug Fixes
  • Emit the HTML bullet \u2022 for the RTF token \bullet instead of a middot

2017-10-28: Version 7.0.0

Bug Fixes
  • Always hide all "hidden" text (even if ConvertFontStyle is false)
  • Added option to disable paragraph border conversion - to disable, use setOutputProperty(ConversionKeys.CONVERT_PARAGRAPH_BORDERS, "no");
  • Improved algorithm for ConvertEmptyParagraphs
  • Improved Unicode support for Japanese text
  • Improved initialization speed of DOM tree transformation
  • Improved support for detection of outer table border
  • Experimental support for a multilevel numbering writer - enabled with setOutputProperty(ConversionKeys.SUPPORT_MULTILEVEL, "yes");
  • Experimental support for uppercase and lowercase Roman numerals - requires setOutputProperty(ConversionKeys.SUPPORT_LIST_TABLE, "yes");
  • Experimental support for \*\pn paragraph numbering - enabled with setOutputProperty(ConversionKeys.SUPPORT_STAR_PN, "yes");
  • The default value of the ConvertFootnotes property changed to false
  • The experimental UseListTable property is now deprecated; use setOutputProperty(ConversionKeys.SUPPORT_LIST_TABLE, "yes") instead
  • UseListTable property default changed to false
  • Removed ProgressListener properties
  • Removed detection of hyperlinks based on blue/underlined text format
  • Removed MetaDateAuto property
  • Removed default creation of post process listeners
  • Added ScroogeXHTMLMain.addDefaultListeners() method for backward compatibility

2017-10-20: Version 6.7.0

  • Improved initialization speed of DOM tree transformation
  • Improved support for Japanese text
  • Added ReplaceWingdingsBullets constructor parameters
  • Updated IzPack installer to version 5.1.2

2017-09-02: Version 6.6.0

Bug Fixes
  • Fixed JavaDoc comments (JDK 8 JavaDoc warnings)
  • Added support for table cell background color
  • Added property ConvertAlignment (default true)
  • Faster cell merging algorithm
  • Faster RGB to HTML color conversion
  • Logging of invalid font numbers
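
One way to make RGB to HTML color conversion cheap is to avoid String.format and look up hex digits in a table. This is a sketch under that assumption only; HtmlColorSketch is a hypothetical class, and the library's actual implementation may differ.

```java
// Hypothetical sketch, not the library's code: converts 0..255 RGB components to
// an HTML hex color such as "#ff0080" using a hex-digit lookup table instead of
// String.format.
final class HtmlColorSketch {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    private HtmlColorSketch() {
    }

    static String toHtmlColor(int red, int green, int blue) {
        char[] out = new char[7];
        out[0] = '#';
        writeComponent(out, 1, red);
        writeComponent(out, 3, green);
        writeComponent(out, 5, blue);
        return new String(out);
    }

    // Writes one component as two lowercase hex digits starting at position pos.
    private static void writeComponent(char[] out, int pos, int value) {
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException("color component out of range: " + value);
        }
        out[pos] = HEX[value >>> 4];
        out[pos + 1] = HEX[value & 0x0f];
    }
}
```

The lookup table keeps the conversion to a few array accesses per component, which matters when a document contains many colored runs.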

2017-08-02: Version 6.5.0

Bug Fixes
  • Fixed empty conversion result for RTF documents which do not end with a \par token
  • Fixed JavaDoc comments
  • Added support for paragraph background color
  • Added support for paragraph border box
  • Convert multiple blanks (each pair of consecutive blanks is converted to &nbsp;&nbsp;)
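
The blank conversion rule can be sketched in a few lines, assuming the input text is already HTML-escaped. BlankRunSketch is a hypothetical name and not the library's code.

```java
// Hypothetical sketch of the 6.5.0 rule, not the library's code: every pair of
// consecutive blanks becomes "&nbsp;&nbsp;" so that browsers no longer collapse
// the run into a single space. An odd blank at the end of a run stays a normal blank.
final class BlankRunSketch {

    private BlankRunSketch() {
    }

    static String convertBlankPairs(String text) {
        // String.replace scans left to right and replaces non-overlapping matches.
        return text.replace("  ", "&nbsp;&nbsp;");
    }
}
```

With this rule, "a   b" (three blanks) becomes "a&nbsp;&nbsp; b": the remaining single blank is kept as a normal space, which a browser displays as one visible space.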

2017-07-12: Version 6.4.0

Bug Fixes
  • Fixed stylesheet link element in HTML head
  • Added support for space before and after paragraph
  • Added Viewport property for HTML head
  • Use recommended order of HTML head elements (charset / viewport / title / description / keywords / author)
  • The MetaDate and MetaDateAuto properties are deprecated
  • The ConvertHyperlinksForBlueUnderlinedText property is deprecated
  • The compiled library JAR is sealed for additional security

2017-06-21: Version 6.3.1

Bug Fixes
  • Fixed missing initialization of unicodeSkip state before conversion
  • Fixed release notes in HTML API docs
  • Minor improvements for WPTools and TRichView bullet lists
  • Refactoring of table support (introduced TableWriter interface)
  • Progress listeners are now deprecated; they will be removed in a future version to increase conversion performance
  • Updated IzPack installer to version 5.1.1

2017-02-10: Version 6.3.0

  • Added option flag ConversionKeys.CONVERT_HEADERS_AND_FOOTERS
  • Added generation of the lang attribute for the html element (if the default language is non-empty)

2016-12-30: Version 6.2.1

Bug Fixes
  • Fixed validation error (anchor div must not be inside p element)

2016-12-30: Version 6.2.0

Bug Fixes
  • Fixed replacement of invalid characters in id attributes
  • Fixed conversion of \line token
  • Fixed generation of missing table columns
  • Added support for merged cells (based on clmrg and clmgf tokens)
  • Added conversion of all bookmarks to div elements with anchor
  • Added property OutputProperties for experimental settings
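
One possible way to replace invalid characters in id attributes generated from bookmark names is shown in this sketch. It applies a conservative ASCII-only rule that is valid for both HTML5 (non-empty, no whitespace) and XHTML 1.0 (an XML Name, which must start with a letter or underscore). IdSketch is a hypothetical class, and the library's actual replacement rule may differ.

```java
// Hypothetical sketch, not ScroogeXHTML's implementation: turns a bookmark name
// into an id value that is valid in HTML5 and XHTML 1.0.
final class IdSketch {

    private IdSketch() {
    }

    static String toId(String bookmarkName) {
        StringBuilder sb = new StringBuilder(bookmarkName.length() + 1);
        for (int i = 0; i < bookmarkName.length(); i++) {
            char c = bookmarkName.charAt(i);
            boolean allowed = isAsciiLetter(c) || (c >= '0' && c <= '9')
                    || c == '_' || c == '-' || c == '.';
            // Whitespace and every other character outside the allowed set become '_'.
            sb.append(allowed ? c : '_');
        }
        // Ids must not be empty, and XML Names must not start with a digit, '-' or '.'.
        if (sb.length() == 0 || (!isAsciiLetter(sb.charAt(0)) && sb.charAt(0) != '_')) {
            sb.insert(0, '_');
        }
        return sb.toString();
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```

For example, "My Bookmark" becomes "My_Bookmark" and "1st" becomes "_1st".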

2016-12-04: Version 6.1.0

Bug Fixes
  • Fixed IndexOutOfBoundsException in conversion of merged table cells
  • Fixed validation error "ID must not contain whitespace"
  • Added support for table row height
  • Added support for text format changes within hyperlinks
  • Added support and property to disable footnotes conversion (experimental feature)
  • Updated IzPack installer to version 5.0.10

2016-08-05: Version 6.0.1

  • Added support for conversion of merged table cells
  • Improved support for Android platform (Base64Utils class)
  • Updated IzPack installer to version 5.0.9

2016-07-01: Version 6.0

Bug Fixes
  • Fixed conversion of multi-paragraph text in table cells
  • Fixed character property reset bug
  • Fixed anchor support in LinkURIBuilder class
  • Fixed usage of Serializable interface and serialization capability of converter instances
  • Fixed table conversion when ConvertTables property is false
  • Added support for XHTML 1.0 Transitional document type to DOM based converter
  • Added support for conversion of table left margin
  • Added support for conversion of table width
  • Added support for conversion of table column width
  • Added support for table border detection
  • Added post process event handlers
  • Added support for Data URI image embedding
  • Added support for automatic footnote numbering
  • Added converter property IndentAmount to define output document indentation
  • Added converter property ConvertHyperlinksForBlueUnderlinedText
  • Added converter method convert(String, Charset)
  • Added assertion that AddOuterHtml is true for conversion to file
  • Added conversion of RTF byte value 9 as a \tab control word
  • Improved document builder to use DOM also for the html head section
  • Improved document clean-up (removal of span elements without attributes, etc.)
  • Improved conversion of hyperlinks
  • Improved processing of listtable section
  • Improved replacement of sequences of space characters with non-breaking spaces
  • Changed default value of converter property UseListTable to true
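
Data URI image embedding (RFC 2397) can be sketched as follows. The sketch uses java.util.Base64, which requires Java 8 or newer, whereas the 5.0 release notes list Java 7 as the minimum version. DataUriSketch is a hypothetical class, not the library's API, and the MIME type is assumed to be known by the caller.

```java
import java.util.Base64;

// Hypothetical sketch, not ScroogeXHTML's implementation: embeds image bytes
// directly in an <img> element as a Data URI instead of referencing a file.
final class DataUriSketch {

    private DataUriSketch() {
    }

    static String toDataUri(String mimeType, byte[] imageData) {
        // Format: data:<mediatype>;base64,<data>
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(imageData);
    }

    static String imgElement(String mimeType, byte[] imageData, String alt) {
        // The alt text is assumed to be HTML-escaped already.
        return "<img src=\"" + toDataUri(mimeType, imageData) + "\" alt=\"" + alt + "\">";
    }
}
```

Embedding keeps the HTML output self-contained at the cost of roughly one third more bytes than the raw image, due to Base64 encoding.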

2016-04-30: Version 5.5

  • The converter no longer generates span tags without attributes (<span>text</span>)
  • Fixed incorrect values of the ISO 8601 time stamp in the meta date header
  • Improved handling of empty font names, the converter now uses the specified default font instead of the 'first' font in the font replace list
  • Improved implementation of RTF keyword processing, uses no access indirection
  • Code refactorings
  • Use of the Java 7 try-with-resources statement
  • Moved Symbol font converter to new class
  • Moved XML DOM based result builder to a new class
  • Changed RTFProperties to enum type
  • Improved thread safety of UnicodeConverter

2016-03-12: Version 5.4

  • Fixed image conversion in XML DOM mode when ConvertEmptyParagraphs is true

Minor changes and fixes:

  • Moved and renamed ParagraphConstants to enum ParagraphProperties.Alignment
  • Moved numbering style constants to interface NumberingStyle
  • Updated documentation for Android support (PDF), convertFields default
  • Updated IzPack installer to version 5.0.7

2015-12-11: Version 5.3

  • Added experimental support for Word 97 / 2007 list definitions in XML DOM Mode. To activate it use setUseListTable(true).
  • Removed JAXP dependency entry from POM. This change makes it possible to use the library on the Android platform, including table support
  • Since JAXP is no longer required, "JAXP mode" now is called "XML DOM mode", and JAXPWriter is renamed to XMLDOMWriter.
  • Fixed missing blank error for ScroogeXHTMLBase.setConvertLanguage(boolean) in XML DOM mode
  • Fixed ScroogeXHTMLBase.setConvertEmptyParagraphs(boolean) to keep attributes when replacing <p ...></p> with <br> in XML DOM mode
  • Removed line break after style attribute in XML DOM mode - it is not necessary, slows down conversion, and does not work when ScroogeXHTMLBase.setConvertLanguage(boolean) is set to true
  • Deprecated methods related to the classic DOM Writer

Minor changes and fixes:

  • Moved classic DOMWriter to package com.scroogexhtml.dom
  • Moved new XMLDOMWriter to package com.scroogexhtml.xmldom
  • Removed dependencies on package com.scroogexhtml.dom from package com.scroogexhtml.converter
  • Removed dependencies on package com.scroogexhtml.dom from LinkURIBuilder
  • If <br> is not supported, setConvertEmptyParagraphs(false) is no longer called automatically; only a warning is logged
  • Added more unit tests
  • Improved logging of font name substitution
  • Updated IzPack installer to version 5.0.6

2015-11-22: Version 5.2

2015-11-01: Version 5.1

  • Added support for RTF tokens emdash, endash, emspace, enspace, zwj, zwnj
  • Added ScroogeXHTMLBase.setTabString(java.lang.String) to set the tab string
  • Added ScroogeXHTML.convert(ByteArrayInputStream rtf) conversion method
  • Refactored code to use more Java enum types (KeyType, RtfFnc, DestGrp)
  • Removed deprecated property convertUsingPrettyIndents
  • Fixed missing PUSHBACK_BUFFER_SIZE
  • Added some JUnit tests to binary distribution
  • Updated IzPack installer to version 5.0.5

2015-09-12: Version 5.0

  • Changed minimum required Java version to 7
  • Updated source to make use of Java 7 language elements (try with resources, multicatch exception handlers etc.)
  • Added experimental table support which uses a JAXP based document model. To enable table conversion, set setUseJAXP and ScroogeXHTMLBase.setConvertTables(boolean) to true.
  • Adding hyperlink listeners is now optional. If ScroogeXHTMLBase.setConvertHyperlinks(boolean) is set to true and no hyperlink listeners are defined, the converter will use the LinkURIBuilder to generate the link element.
  • Changed the default document type from XHTML to HTML5
  • Changed the root package from de.betabeans to com.habarisoft
  • Changed document types to Enum types
  • Deprecated method setConvertUsingPrettyIndents
  • Added logging of warnings for configuration problems
  • Removed conversion methods which have been deprecated in version 4.4
  • Changed default installation folder to the user profile folder as base
  • Updated IzPack installer to version 5.0.3

4.X releases

2015-05-02: Version 4.6

  • Fixed image sizing error in the WMF picture helper class WMFPictureHelper
  • Fixed support for special characters
  • Improved support for field conversion

2014-10-04: Version 4.5

  • Added support for bookmarks. Bookmarks will be converted to id attributes of additional <span> elements. Bookmark processing is disabled by default and can be enabled by setting ScroogeXHTMLBase.setConvertBookmarks(boolean) to true.
  • Added support for field expressions to capture hyperlink targets. Field expression processing is disabled by default and can be enabled by setting ScroogeXHTMLBase.setConvertFields(boolean) to true.
  • Added experimental helper class LinkURIBuilder which converts field expressions to hyperlink target URLs
  • Updated installer to version 5.0.0-rc3

2014-08-31: Version 4.4

  • Fixed missing stream#close() calls
  • Fixed potential null pointer exception for documents which contain invalid font table references
  • Fixed uppercase character support in RTF token keywords
  • Experimental support for WMF to PNG conversion using Apache Batik 1.7, the required conversion helper sources are available on request
  • Deprecated all conversion methods which take String arguments for the RTF input file name and encoding, and provided new methods which take File and Charset type parameters
  • MemoryPictureAdapter "base" property defaults to empty string "" instead of "/img?". Note: this is a potentially breaking change
  • Updated installer to check for installed JDK
  • Updated installer to require JRE 1.6 or newer
  • Updated installer to ask for administrator permissions

2014-07-05: Version 4.3

  • Removed legacy debug mode code to reduce library code size
  • Dependency on SLF4J updated to 1.7.7
  • Installer updated to IzPack 5.0.0 rc2
  • Removed unnecessary setFont and getFont methods from Fonttable class
  • Removed needless valueOf call in valueOf(getCurFontNr())
  • Java 5: use import static java.util.Collections.unmodifiableMap
  • Added private constructors for utility classes

2014-02-19: Version 4.2

  • The conversion method convert(String rtf) no longer uses HTML encoding (&#nnnnn;) for non-ASCII characters
  • New method convert(String rtfileName, String htmlFileName, String charsetName)
  • Method convert(String rtfileName, String htmlFileName) uses UTF-8 charset
  • New method convert(String rtf, File outFile, String charsetName)
  • Method convert(String rtf, File outFile) uses UTF-8 charset
  • Fixed unchecked calls in classes FontTable and RTFKeywords
  • Fixed double blanks before lang attribute (961)
  • Code clean-up based on NetBeans IDE hints
  • Switched API documentation from Doxygen to Javadoc
  • Dependency on SLF4J updated to 1.7.5
  • Installer updated to IzPack 5.0.0 rc1
  • HTML5 added to unit tests

2013-06-12: Version 4.1

  • Code clean-up based on NetBeans IDE hints
  • Use SLF4J logging framework
  • Installer updated to IzPack 4.3.5

2011-05-03: Version 4.0

  • moved to Java 5: uses Generics, StringBuilder,
  • fix for end of subscript and superscript on start of superscript and subscript
  • fix for UnicodeConverter.symbolToUnicode
  • fix for EmbeddedPicture.getWResult() function
  • basic support for Flex compatible font formatting (provided by new HTML3Fx translator class)

Older releases

2011-03-08: Version 3.6

  • support for embedded images
  • fixes for bean serialization (tested with online demo on Google App Engine)
  • online documentation built with doxygen
  • switched to Maven 2 build system

2009-09-01: Version 3.5

  • Removed javax dependencies.
  • Compiles on the Android platform.

2009-02-03: Version 3.4

  • Fixed AfterTextConversionListener, BeforeTextConversionListener, HyperLinkListener and ProgressListener to make them usable in Java IDE property editors.
  • Source code improvements based on FindBugs and PMD reports.

2009-01-11: Version 3.3

  • Improved support for documents with mixed character sets
  • Installer updated to IzPack 4.2

2008-08-02: Version 3.2

  • Added support for hidden text
  • Use no escape XHTML &apos; entity for single quote
  • Improved ANSI character handling

2008-06-29: Version 3.1

  • Improved support for documents with mixed character sets

2008-04-05: Version 3.0

  • New installer

2007-07-10: Version 2.9

  • Improved support for parameter values in the range -2^63..2^63-1.
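
Reading an RTF control word parameter as a Java long is one way to cover this range. The following is a minimal sketch, assuming the token has already been separated from its delimiter; ControlWordSketch is hypothetical and not the library's RTFReader class.

```java
// Hypothetical sketch, not the library's RTF reader: splits a control word such as
// "\fs24" or "\li-720" into its name and an optional signed parameter. Parsing the
// parameter as a long covers the range -2^63..2^63-1.
final class ControlWordSketch {

    private ControlWordSketch() {
    }

    static final class ControlWord {
        final String name;
        final Long parameter; // null when the control word has no parameter

        ControlWord(String name, Long parameter) {
            this.name = name;
            this.parameter = parameter;
        }
    }

    // Parses a single control word token without its trailing delimiter.
    static ControlWord parse(String token) {
        if (token.length() < 2 || token.charAt(0) != '\\') {
            throw new IllegalArgumentException("not a control word: " + token);
        }
        int end = 1;
        while (end < token.length() && isAsciiLetter(token.charAt(end))) {
            end++;
        }
        if (end == 1) {
            throw new IllegalArgumentException("control word has no letters: " + token);
        }
        String name = token.substring(1, end);
        if (end == token.length()) {
            return new ControlWord(name, null);
        }
        // Long.parseLong accepts a leading '-' and rejects values outside the long range.
        return new ControlWord(name, Long.parseLong(token.substring(end)));
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```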

2006-03-11: Version 2.8

Conversion speed has been improved; this version is about 60-80% faster.

Bugs fixed in 2.8

  • Fixed a bug in ScroogeXHTMLBase.getDefaultFontStyleDefinition()

Changes in 2.8

  • Added simple PlainText conversion
  • Added property convertIndent (note: defaults to true, should not break existing code)
  • Added support for "pnlvlcont" RTF token
  • Added support for "pict" RTF token to route picture data to DG_IGNORE
  • Added support for font definitions (in \fonttbl) which are not embedded in braces
  • Added private method isAlpha to ScroogeXHTMLMain class (see JavaDoc for more information)
  • Added check for EOF ("empty stack") in main conversion loop
  • Added class "Formatter" which controls line breaks and indentation
  • Added class "TranslatorFactory" which creates the Translator object
  • Changed method CharacterProperties.DeepCopy
  • Changed ScroogeXHTMLMain.getLeadingHTMLTags() so that it adds no empty line if the getStyleSheetInclude() property is empty
  • Changed design of the logging helper classes, to allow easy migration to Log4J. For more details, see docs for package "logging"
  • Changed source to be more compliant with Sun's Java coding guidelines
  • Changed classes to final classes whenever possible
  • Changed from Vector to List and ArrayList
  • Changed from Enumeration to Iterator
  • Renamed method "textElementToXHTML" to "process"
  • Renamed class "XHTMLMobileProfile10Translator" to "XHTMLMobileProfile10"
  • Renamed class "ScroogeXHTMLReader" to "RTFReader"
  • Renamed class "ScroogeXHTMLUnicode" to "UnicodeConverter"
  • Renamed class "ScroogeXHTMLWriter" to "DOMWriter"
  • Renamed class "ScroogeXHTMLDocParagraph" to "Paragraph"
  • Renamed class "ScroogeXHTMLDocText" to "FormattedText"
  • Renamed class "ScroogeXHTMLDocument" to "Document"
  • Moved converter classes to new package de.betabeans.scroogexhtml.converter

2004-03-04: Version 2.7

Major changes:

  • Added support for right-to-left languages
  • Added support for XHTML Mobile Profile 1.0 document type
  • Added support for point, em, ex or percent font sizes (property FontSizeScale)
  • Added support for language attributes (lang="..")
  • Added support for Double Byte Character Set strings in font names

Minor changes:

  • Added JUnit tests which perform XML validation
  • Added and improved JavaDoc documentation
  • Fixed a problem which caused a NullPointerException if there was no default font with font number zero
  • Fixed a problem in the XHTML Transitional translator which generated an invalid parameter for paragraphs with 'justify' alignment
  • Removed all deprecated properties and methods
  • Removed unused class LogAdapter

2003-08-01: Version 2.6

Major changes:

  • Added Double Byte Character Set support for Japanese, Simplified Chinese, Traditional Chinese and Korean

Minor changes:

  • Added JUnit tests
  • Added method setTagStyle(String tagName, String style) which defines an additional CSS style parameter for <p>, <br /> and <li> tags
  • Added method setTagClass(String tagName, String className) which defines a CSS class for <p>, <br /> and <li> tags. Example: setTagClass("p", "pink_border") will change the conversion of <p> tags to <p class="pink_border">
  • Added support for \up0 and \dn0 tokens (switch super/subscript off)
  • Added optimizations in ScroogeXHTMLDocument.buildHtml()
  • Added a fix for a bug which added font color "#000000" to hyperlink text
  • Added a bugfix for documents which only contain empty paragraphs
  • Removed ShowMessages property
  • Removed unused constant INDENTCRLF
  • Removed method 'public void add(DocumentNode node)' from DocumentNode interface and all implementing classes
  • Removed isEmpty method

2003-06-19: Version 2.5

Major changes:

  • Added support for XHTML Basic 1.0, HTML 4.01 Strict and HTML 4.01 Transitional
  • Added performance improvements for the debugger and logger classes
  • Added BeforeTextConversionEvent which receives the text before it is encoded and entities are replaced, and changed the Unicode and XHTML encoding procedures to allow 'deferred' conversion
  • Added property 'useAposTag' which switches conversion of the apostrophe between &apos; and &#39;. Default: true
  • Added property 'convertEmptyParagraphs' which optionally replaces <p></p> with <br />. Default: false
  • Changed source code to be compliant with the Java coding style guide 
  • New properties 'convertEmptyParagraphs', 'logger', 'loggingEnabled'
  • Symbol font support (Greek alphabet)

Minor changes:

  • Added optimization for internal ScroogeXHTMLDocParagraph.isEmpty() method
  • Added LogInterface with DefaultLogger and EmptyLogger implementations
  • Added LogAdapter interface and implementation class ScroogeXHTMLLogger to support custom logger implementations
  • Added property 'loggingSupported' which decides if a DefaultLogger or the EmptyLogger will be used
  • Added Encoder interface and implementation class ScroogeXHTMLEncoder
  • Changed debugger implementation, debug mode is now about 100% faster
  • Changed Encoder interface and implementation class ScroogeXHTMLEncoder to Translator interface, XHTMLTranslator, XHTML10StrictTranslator, XHTML10TransitionalTranslator
  • Changed debugger output, uses a style sheet
  • Changed package structure: classes and interfaces for bean methods are now in package de.betabeans.scroogexhtml.methods
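
The effect of the 'useAposTag' property can be illustrated with a small escaping sketch. EscapeSketch and its useApos flag are hypothetical stand-ins for the property, not the library's code. &apos; is defined in XML, XHTML and HTML5 but not in HTML 4.01, which is why &#39; is the safer choice for older browsers.

```java
// Hypothetical sketch: escapes text for (X)HTML output. The useApos flag mirrors the
// 'useAposTag' property by choosing between &apos; and &#39; for the apostrophe.
final class EscapeSketch {

    private EscapeSketch() {
    }

    static String escape(String text, boolean useApos) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '"':
                    sb.append("&quot;");
                    break;
                case '\'':
                    // Named entity for XHTML, numeric reference for older HTML browsers.
                    sb.append(useApos ? "&apos;" : "&#39;");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }
}
```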

2003-04-06: Version 2.4

Major changes:

  • Added definitions for special entities to the ScroogeXHTMLWriter class. Note: there are two possible declarations for the constant XHTML_SPECIAL_ENTITTY_APOS (the apostrophe mark, U+0027 ISOnum). By default, ScroogeXHTML uses the XHTML standard compliant code "&apos;". To support older browsers, however, it is also possible to use the second declaration, which uses the code "&#39;"
  • Added support for simple numbered lists
  • Added support for documents which use different character sets. Based on the character set which is assigned to a font, characters will be translated to Unicode.
  • Changed default document content type to UTF-8
  • Added conversion of the single quote character to "&apos;"
  • Added support for left, right and first line paragraph indent
  • Added support for highlight color
  • Added property 'IncludeDefaultFontStyle': it sets the font attributes of the "BODY" tag, so all text in the document body will have the defined default font style
  • Renamed the 'textElement' event to 'afterTextConversion'

Minor changes:

  • Changed encoding for Unicode characters from hexadecimal (&#x...;) to decimal (&#...;)
  • Changed conversion for unknown character sets: do not use Cp1252 as default
  • Added support for font background color in RTF documents
  • Added support for the 'bullet' keyword as a workaround for Writer bullet lists
  • Changed method "replaceHyperlink" to use target="_new" only if XHTMLTransitional = true
  • Fixed a bug which disabled conversion of 'justified' paragraphs
  • Fixed a bug in Unicode support
  • Changed translation for &zwnj; / &zwj; special characters to Unicode
  • Added new property 'Version'
  • Removed unused ScroogeXHTMLBeanInfo class
  • Added support for "\line" token (required line break)
  • Added property "IncludeXMLDeclaration" which inserts the line <?xml version="1.0"?>. Note: default = true
  • Added filter for invalid form feed characters
  • Added automatic deletion of empty paragraphs at the end of the generated document
  • Changed some logging messages to have a lower level
  • Added method 'getFormattedMessage()' to LogEvent class
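
The switch from hexadecimal to decimal character references can be sketched as follows, assuming every non-ASCII code point is encoded. DecimalReferenceSketch is a hypothetical class, not the library's implementation.

```java
// Hypothetical sketch: encodes every non-ASCII code point as a decimal numeric
// character reference, e.g. U+20AC becomes &#8364; instead of &#x20AC;.
final class DecimalReferenceSketch {

    private DecimalReferenceSketch() {
    }

    static String encodeNonAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // codePointAt combines surrogate pairs, so characters outside the BMP
            // produce one reference instead of two.
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```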

2002-12-13: Version 2.3

  • Added new conversion method public String convert(String rtf)

2002-05-11: Version 2.2

  • Changed all event handlers to conform to the JavaBeans standard. All events can now be accessed in the NetBeans IDE.

2002-05-05: Version 2.1

  • Added GUI demo application
  • Added support for token (non-breaking space)
  • Added property "replaceFonts"
  • Added property "logLevel"
  • Added EventListener "log", using the Log4J logging levels (DEBUG, INFO, WARN, ERROR, FATAL)
  • Changed output on System.err / System.out to log method calls
  • Changed showMessages default to false
  • Changed showMessages implementation to write to System.out (allows working without a log event listener)
  • Removed deprecated profiler class
  • Fixed a bug in the ScroogeXHTMLReader.finishFontName method (use single quotes)
  • Fixed a bug in the ScroogeXHTMLWriter.getWriterstate method (use deepCopy method)
  • Fixed a bug in the ScroogeXHTMLReader class (initialization of curFontNr)

2002-03-01: Version 2.0

  • Initial Version of ScroogeXHTML, replacing the former "Scrooge" JavaBean.
