ScroogeXHTML RTF converter 10.0.0 API



ScroogeXHTML is a library which can convert a subset of the RTF standard to HTML5 and XHTML.

Release Notes

2022-10-22: Version 10.0.0

  • Move all properties related to conversion of characters, paragraphs and HTML head section to nested properties
  • Introduce an enum based configuration for experimental conversion options
  • Introduce the ConvertParagraphBorders property
  • Target Java SE 11 and newer
  • Use <meta charset='...'> instead of <meta http-equiv='Content-Type' content='text/html; charset=...'>
  • Bump slf4j from 1.7.36 to 2.0.3
  • Remove indirect write access from interface ConversionConfiguration
  • Remove deprecated statistics support
  • Remove deprecated configuration methods
  • Remove deprecated interface FontStatisticsCollecting
  • Remove deprecated class MemoryPictureAdapterBase64
  • Remove method nameAndVersion
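
To illustrate what an enum based configuration for experimental options can look like, here is a minimal sketch. It is not the ScroogeXHTML API: the class name, the nested enum and its constants are hypothetical and only loosely modeled on the SUPPORT_* keys mentioned in older release notes.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names throughout; this is an illustration, not library code.
final class ConverterConfigSketch {

    // Assumed option names, chosen for this sketch only.
    enum Option { MULTILEVEL_NUMBERING, LIST_TABLE, STAR_PN }

    private final Set<Option> enabled = EnumSet.noneOf(Option.class);

    // Enables one option and returns this object so calls can be chained.
    ConverterConfigSketch enable(Option option) {
        enabled.add(option);
        return this;
    }

    boolean isEnabled(Option option) {
        return enabled.contains(option);
    }

    public static void main(String[] args) {
        ConverterConfigSketch config = new ConverterConfigSketch()
                .enable(Option.LIST_TABLE);
        System.out.println(config.isEnabled(Option.LIST_TABLE));
        System.out.println(config.isEnabled(Option.STAR_PN));
    }
}
```

Compared with string keys and "yes"/"no" values, an enum lets the compiler reject misspelled option names.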

2022-09-06: Version 9.6.0

  • Improved conversion of hyperlinks in nested fields

2022-06-16: Version 9.5.0

Bug Fixes
  • Bug with current JDK 1.8 u332: expected '<br/>', found '<br />'
  • Deprecate getCharPropConvConfig properties in favor of new nested properties
  • Deprecate getParaPropConvConfig properties in favor of new nested properties
  • Deprecate HtmlHead properties in favor of new nested properties
  • New method HtmlHeadConfiguration#addStyleSheetLink(String)
  • The method HtmlHeadConfiguration#getStyleSheetLinks() returns an unmodifiable collection
  • SLF4J dependency updated to 1.7.36
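
The combination of an add method and a getter that returns an unmodifiable collection can be sketched as follows. The method names mirror those in the notes above, but the class HeadConfigSketch is a hypothetical stand-in and says nothing about how HtmlHeadConfiguration is actually implemented.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in class; not the library's HtmlHeadConfiguration.
final class HeadConfigSketch {

    private final List<String> styleSheetLinks = new ArrayList<>();

    // The only way to add a link; null values are rejected early.
    void addStyleSheetLink(String href) {
        styleSheetLinks.add(Objects.requireNonNull(href, "href"));
    }

    // Read-only view: callers can iterate but cannot modify the list.
    Collection<String> getStyleSheetLinks() {
        return Collections.unmodifiableList(styleSheetLinks);
    }

    public static void main(String[] args) {
        HeadConfigSketch head = new HeadConfigSketch();
        head.addStyleSheetLink("base.css");
        Collection<String> view = head.getStyleSheetLinks();
        try {
            view.add("other.css");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only view, size " + view.size());
        }
    }
}
```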

2022-02-12: Version 9.4.0

Bug Fixes
  • The paragraph border width/color of the paragraph before a table is copied to paragraph border after the table
  • EmbeddedPicture is not serializable (required for Servlet)
  • XML parsing is vulnerable to XXE
  • Javadoc error message with JDK 17
  • Break out of the main conversion loop using a label instead of throwing an exception
  • Deprecated the complimentary class MemoryPictureAdapterBase64. The new class MemoryPictureAdapterDataURI may be used instead.
  • Added translation of the special Unicode character uf0b7 (\u61623, Unicode Private Use Area), which is used in the RTF specification, to the bullet character
  • Changed public interface NumberingLevel to a class
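
The notes do not say how the XXE issue was fixed. As general background, a common way to harden a JAXP DocumentBuilderFactory is sketched below. The class XxeHardeningSketch and its methods are hypothetical; the feature settings are standard JDK parser options, and the sample documents are made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper; not taken from the ScroogeXHTML sources.
final class XxeHardeningSketch {

    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Rejecting DOCTYPE declarations rules out external entity definitions.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    // Returns true if the hardened parser accepts the document, false otherwise.
    static boolean parses(String xml) {
        try {
            hardenedFactory().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<p>plain</p>"));
        System.out.println(parses(
                "<!DOCTYPE p [<!ENTITY x SYSTEM 'file:///etc/passwd'>]><p>&x;</p>"));
    }
}
```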

2021-05-29: Version 9.3.0

Bug Fixes
  • Fallback to the original picture dimensions if the RTF does not specify the desired picture dimensions
  • Added property ImgAltAttribute to class, which contains the value of the IMG tag attribute "alt". Its default value is "picture" for backwards compatibility.
  • Support for WMF conversion using the complimentary class MemoryPictureAdapterDataURI is disabled by default; it must be enabled by including PF_WMF at runtime.
  • Added null checks to setter methods of class
  • Moved class com.scroogexhtml.fonts.FontDef to the Example artifact, as it is used only there.
  • Removed assertions in class com.scroogexhtml.xmldom.table.TableSupport.
  • Fixed QA hints and warnings

2021-04-22: Version 9.2.0

Bug Fixes
  • Reset character attributes on listtext token (workaround for WPTools bug)
  • Added boolean property ConvertParagraphMargins (default true)
  • Support picture data extraction for binary data
  • Support dibitmap token (device independent bitmap) for image data extraction
  • WMF mime type is image/x-wmf, EMF is image/x-emf
  • GetPostProcessListeners returns an UnmodifiableCollection
  • Deprecated FontStatisticsCollecting interface
  • Use xmlunit in TableSupportTests
  • Fixed QA hints and warnings

2021-03-06: Version 9.1.0

Bug Fixes
  • The lang attribute on the html tag must not be created if ConvertLanguage is false
  • The <!-- ... --> comments in CSS cause errors, replaced by /* ... */
  • Code in createNumberingWriter() is executed too often
  • Ignore tbldef tokens at the end of the row
  • RTF with bin keyword causes conversion error
  • Added prefix EXPERIMENTAL_ for experimental conversion options
  • Use {} for log message parameters instead of String.format
  • Lowered the fonttable entry log level to TRACE
  • Fixed QA hints and warnings

2020-06-26: Version 9.0.0

  • Minimum supported Java version is JRE 8
  • Improved support for header and footer sections
  • Added Path based conversion methods
  • Moved MemoryPictureAdapter and MemoryPictureAdapterBase64 to new ScroogeXHTML-Pictures artifact
  • Moved optional DefaultFontStatistics class to ScroogeXHTML-Addons artifact
  • Moved optional post-processing classes to ScroogeXHTML-Addons artifact
  • Removed deprecated code and file based conversion methods

2020-03-14: Version 8.7.0


2020-01-19: Version 8.6.0

  • Do not throw an exception on unknown borderstyle in BorderStyleBuilder#borderSideToString(Border)
  • Fixed PMD warnings
  • Fixed CheckStyle hints
  • Updated SLF4J dependency to 1.7.30

2019-08-20: Version 8.5.0

  • Check for empty text in addStyledTexts() (this makes the post-processing class StripWhitespaceSpanNodes obsolete)
  • Check for span attributes in prepareHyperlinkElement() (this makes the post-processing class StripAttributeLessSpanNodes obsolete)
  • Remove references to obsolete post-processing classes in addDefaultListeners()
  • Deprecate all obsolete post processor classes and the com.scroogexhtml.tidy package
  • Deprecate addDefaultListeners() because all post-processing classes are no longer used

2019-07-10: Version 8.4.0

  • Post process listener ReplaceEmptyParagraphNodes is no longer included in the addDefaultListeners() method. Its functionality is already covered more efficiently in the core conversion routines.
  • Post process listener ReplaceMonospaceBlanks is no longer included in the addDefaultListeners() method. Its functionality is already covered more efficiently in the core conversion routines.
  • Avoid throwing IOException to exit the main conversion loop
  • Internal code improvements
Bug Fixes
  • Setting the ConvertHyperlinks property to false has no effect
  • Removed ReplaceMonospaceBlanks post process listener as it had side effects (e.g. missing hyperlinks), and is obsolete
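
Leaving a conversion loop without throwing IOException is a control-flow choice. One exception-free option in Java is a labeled break, sketched below. The token lists and the "END" marker are hypothetical and do not reflect the converter's actual main loop.

```java
import java.util.List;

// Hypothetical example; the real conversion loop is not shown in these notes.
final class LabeledBreakSketch {

    // Counts tokens until the end marker appears anywhere in the input.
    static int countUntilEnd(List<List<String>> lines, String endMarker) {
        int count = 0;
        scan:
        for (List<String> line : lines) {
            for (String token : line) {
                if (token.equals(endMarker)) {
                    break scan; // leaves both loops at once, no exception needed
                }
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> lines = List.of(
                List.of("a", "b"),
                List.of("c", "END", "d"),
                List.of("e"));
        System.out.println(countUntilEnd(lines, "END"));
    }
}
```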

2019-06-08: Version 8.3.0

  • Pictures which are tagged with the \nonshppict token are not included in the conversion
  • Unit test improvements and additions

2019-05-07: Version 8.2.1

Bug Fixes
  • Example post-processing class MergeBorderDivNodes does not merge div nodes because it searches for 'border-style' instead of 'border'

2019-03-15: Version 8.2.0

  • Adjusted image support code for WMF to PNG image conversion example
  • Added WMF to PNG conversion example code using Apache Batik
  • Changed CSS to use no leading blank after the colon
Bug Fixes
  • Fixed CheckStyle warnings
  • Fixed support for Java 11 in Base64Utils
  • Fixed support for Java 11 in integration tests
  • Upgraded all tests from JUnit 3 to JUnit 4

2019-03-01: Version 8.1.0

  • Added support for paragraph border color conversion
  • Added support for paragraph border width conversion
  • Added support for cell border width conversion
  • Avoid hard exit on missing cell table definitions

2019-01-12: Version 8.0.0

  • Moved from package com.habarisoft.scroogexhtml to com.scroogexhtml
  • Tested with Oracle JDK 8 and Oracle OpenJDK 11 on Windows and Linux
  • New FontReplacing interface and FontReplacer property
  • New FontStatisticsCollecting interface and FontStatistics property
  • Improved table cell border conversion
  • Improved paragraph border conversion
  • MergeBorderDivNodes, a post-processor example class for paragraph border cleanup
  • ConvertIndent default value is false now (as documented)
  • Methods String convert(String rtf) and String convert(ByteArrayInputStream rtf) throw the unchecked RuntimeException instead of IOException
  • Fixed Spotbugs warnings and JavaDoc errors
Bug Fixes
  • Fixed a color conversion bug
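
The switch from IOException to an unchecked exception in the String-based convert methods means callers no longer need a try/catch block. A minimal sketch of that pattern follows. It assumes UncheckedIOException as the wrapper (the notes only say RuntimeException), and the class and its pass-through readAll step are hypothetical; no real conversion takes place.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch; it copies its input instead of converting RTF.
final class UncheckedConvertSketch {

    // Stand-in for a stream-based step that declares IOException.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }

    // String-based entry point: checked IOException is wrapped, not declared.
    static String convert(String rtf) {
        try (Reader reader = new StringReader(rtf)) {
            return readAll(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("{\\rtf1 Hello}"));
    }
}
```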

2018-08-10: Version 7.3.0

  • Added support for multiple external style sheets (property StyleSheetLinks), the StyleSheetLink property is now deprecated
  • Changed finishColortableEntry() to improve conversion speed
  • Changed removeHtmlTags() to improve conversion speed
  • Updated izpack installer to version 5.1.3
  • Removed unused methods
  • Fixed Findbugs/Spotbugs warnings

2018-03-24: Version 7.2.0

  • Added support for vertical alignment in table cells
  • Standalone XHTML documents begin with an XML declaration if the charset is not UTF-8
  • Table conversion uses the class="table table-bordered" attribute (instead of border="1") to indicate that the table is bordered. This fixes the W3C HTML validator warning "The border attribute on the table element is presentational markup". Applications which still require the border="1" attribute may enable it with setOutputProperty(ConversionKeys.USE_TABLE_BORDER_ATTRIBUTE, "yes");
  • Removed the enclosing <!-- ... --> around the CSS code within the <style> element for standalone documents
  • Removed the attribute type="text/css" for the <style> element for standalone HTML5 documents. This fixes the W3C HTML validator warning: "The type attribute for the style element is not needed and should be omitted".
  • Changed BODY {... to lowercase body {... in auto-generated CSS code
  • The <style> element includes comments before auto-generated and custom styles
  • Fixed Findbugs warnings for non-transient non-serializable instance fields in MemoryPictureAdapter and ListHeaderInfo class
  • Fixed Findbugs warnings for reliance on default encoding in com.scroogexhtml.ScroogeXHTML.convert
  • Fixed Findbugs warnings for casting and passing to ceil in com.scroogexhtml.converter.AbstractWriter.getFontSizeStyle
  • Fixed Findbugs warnings for casting and passing to ceil in and getWGoalPx

2018-02-12: Version 7.1.0

  • Added support for five character encodings, including MacRoman
  • Added support for non-breaking hyphen (RTF token \_)
  • Improved conversion of 'Symbol' font
  • As a side effect of the enhanced 'Symbol' font conversion, bullet list conversion now (correctly) emits &bullet; instead of &middot;
Bug Fixes
  • Emit the HTML bullet \u2022 for RTF token "\bullet" instead of middot
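
The two special characters named in this release can be pictured as a small lookup table. The sketch below is an assumption-laden illustration: the class is hypothetical, the bullet mapping follows the note above, and mapping the non-breaking hyphen token to U+2011 is an assumption about the output, not a documented behavior.

```java
import java.util.Map;

// Hypothetical lookup table; not the converter's actual implementation.
final class RtfSymbolSketch {

    // RTF token -> replacement text (second entry is an assumption).
    static final Map<String, String> TOKEN_TO_TEXT = Map.of(
            "\\bullet", "\u2022",  // bullet character
            "\\_", "\u2011");      // non-breaking hyphen

    // Returns the replacement text, or an empty string for unknown tokens.
    static String translate(String token) {
        return TOKEN_TO_TEXT.getOrDefault(token, "");
    }

    public static void main(String[] args) {
        System.out.println(translate("\\bullet"));
        System.out.println(translate("\\_"));
    }
}
```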

2017-10-28: Version 7.0.0

Bug Fixes
  • Always hide all "hidden" text (even if ConvertFontStyle is false)
  • Added option to disable paragraph border conversion - to disable, use setOutputProperty(ConversionKeys.CONVERT_PARAGRAPH_BORDERS, "no");
  • Improved algorithm for ConvertEmptyParagraphs
  • Improved Unicode support for Japanese text
  • Improved initialization speed of DOM tree transformation
  • Improved support for detection of outer table border
  • Experimental support for a multilevel numbering writer - enabled with setOutputProperty(ConversionKeys.SUPPORT_MULTILEVEL, "yes");
  • Experimental support for uppercase and lowercase roman number - requires setOutputProperty(ConversionKeys.SUPPORT_LIST_TABLE, "yes");
  • Experimental support for \*\pn paragraph numbering - enabled with setOutputProperty(ConversionKeys.SUPPORT_STAR_PN, "yes");
  • ConvertFootnotes value default changed to false
  • Experimental UseListTable property is deprecated now - use setOutputProperty(ConversionKeys.SUPPORT_LIST_TABLE, "yes");
  • UseListTable property default changed to false
  • Removed ProgressListener properties
  • Removed detection of hyperlinks based on blue/underlined text format
  • Removed MetaDateAuto property
  • Removed default creation of post process listeners
  • Added ScroogeXHTMLMain.addDefaultListeners() method for backward compatibility

Packages

Provides the main converter class, ScroogeXHTML.
Provides the main converter classes.
Provides classes for CSS.
Provides classes for event methods.
Provides classes for RTF field expressions.
Provides classes for font substitution and font statistics.
Provides classes for RTF list definition processing.
Provides embedded picture support classes.
Provides RTF standard datatypes and constants.
Provides table conversion support classes.
Provides a 'document object model' based on javax.xml.
Provides RTF paragraph numbering support classes.
Provides section / header / footer support classes.
Provides RTF table support classes.