Overview
Introduction
ScroogeXHTML is a library which can convert a subset of the RTF standard to HTML5 and XHTML.
Release Notes
2022-06-16: Version 9.5.0
Bug Fixes
- Bug in the current JDK 1.8u332: expected '<br/>', found '<br />'
Enhancements
- Deprecate getCharPropConvConfig properties in favor of new nested properties
- Deprecate getParaPropConvConfig properties in favor of new nested properties
- Deprecate HtmlHead properties in favor of new nested properties
- New method HtmlHeadConfiguration#addStyleSheetLink(String)
- The method HtmlHeadConfiguration#getStyleSheetLinks() returns an unmodifiable collection
- SLF4J dependency updated to 1.7.36
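Since 9.5.0, getStyleSheetLinks() returns an unmodifiable collection, so new links go through addStyleSheetLink(String). The following minimal sketch shows that pattern. HeadConfigSketch and renderLinks() are hypothetical names, not the library's HtmlHeadConfiguration, and the HTML5 form of the <link> element is an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for HtmlHeadConfiguration; only the two method
// names come from the release notes, the internals are assumptions.
class HeadConfigSketch {
    private final List<String> styleSheetLinks = new ArrayList<>();

    // Links are added through a dedicated method ...
    void addStyleSheetLink(String href) {
        styleSheetLinks.add(href);
    }

    // ... while the getter exposes a read-only view, so callers cannot
    // bypass addStyleSheetLink by mutating the returned collection.
    Collection<String> getStyleSheetLinks() {
        return Collections.unmodifiableList(styleSheetLinks);
    }

    // Renders one HTML5 <link> element per registered style sheet.
    String renderLinks() {
        StringBuilder sb = new StringBuilder();
        for (String href : styleSheetLinks) {
            sb.append("<link rel=\"stylesheet\" href=\"").append(href).append("\">\n");
        }
        return sb.toString();
    }
}
```

With this design, calling getStyleSheetLinks().add(...) throws UnsupportedOperationException, while renderLinks() emits one element per entry in insertion order.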
2022-02-12: Version 9.4.0
Bug Fixes
- The paragraph border width/color of the paragraph before a table is copied to the paragraph border after the table
- EmbeddedPicture is not serializable (required for Servlet)
- XML parsing is vulnerable to XXE
- Javadoc error message with JDK 17
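The release notes do not say how the XXE issue was fixed. A common JAXP hardening approach, sketched here as an assumption rather than ScroogeXHTML's actual code, is to reject DOCTYPE declarations outright so that no external entity can be declared or resolved. SafeXmlParsing is a hypothetical class name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class SafeXmlParsing {
    // Builds a factory that refuses DOCTYPE declarations, which blocks
    // external entity (XXE) and entity-expansion attacks.
    static DocumentBuilderFactory newSafeFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Xerces feature, supported by the JDK's built-in parser.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    // Parses a UTF-8 XML string with the hardened factory.
    static Document parse(String xml) throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = newSafeFactory().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Plain documents still parse normally; any input containing a DOCTYPE fails with a SAXParseException before an entity can be resolved.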
Enhancements
- Break out of the main conversion loop using a label instead of throwing an exception
- Deprecated the complimentary class MemoryPictureAdapterBase64. The new class MemoryPictureAdapterDataURI may be used instead.
- Added translation of the special Unicode character U+F0B7 (\u61623, Unicode Private Use Area), which is used in the RTF specification, to the bullet character
- Changed public interface NumberingLevel to a class
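Breaking out of nested loops with a labeled break, as 9.4.0 now does for the main conversion loop, avoids using an exception for ordinary control flow. A minimal sketch; the nested token lists, the "EOF" marker and LabeledLoopSketch are hypothetical.

```java
import java.util.List;

class LabeledLoopSketch {
    // Walks grouped tokens and stops at the first "EOF" token. The labeled
    // break exits both loops at once, so no exception is needed.
    static int countTokensBeforeEof(List<List<String>> groups) {
        int count = 0;
        mainLoop:
        for (List<String> group : groups) {
            for (String token : group) {
                if ("EOF".equals(token)) {
                    break mainLoop; // leaves the outer loop directly
                }
                count++;
            }
        }
        return count;
    }
}
```

Compared with throwing and catching an exception, the labeled break is cheaper (no stack trace is captured) and keeps the exit point visible in the loop itself.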
2021-05-29: Version 9.3.0
Bug Fixes
- Fallback to the original picture dimensions if the RTF does not specify the desired picture dimensions
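In RTF, \picwgoal and \pichgoal give the desired picture size in twips, while \picw and \pich give the original size (in pixels for bitmaps). A minimal sketch of the fallback, assuming 96 pixels per inch (15 twips per pixel) and bitmap input; PictureSizeSketch and targetPx are hypothetical, and the library's own EmbeddedPicture.getWGoalPx/getHGoalPx may compute this differently.

```java
class PictureSizeSketch {
    // 1440 twips per inch / 96 pixels per inch = 15 twips per pixel.
    static final double TWIPS_PER_PX = 15.0;

    // goalTwips: value of \picwgoal or \pichgoal (0 if absent).
    // originalPx: value of \picw or \pich, assumed to be in pixels.
    static int targetPx(int goalTwips, int originalPx) {
        if (goalTwips > 0) {
            return (int) Math.round(goalTwips / TWIPS_PER_PX);
        }
        // No desired size in the RTF: fall back to the original dimension.
        return originalPx;
    }
}
```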
Enhancements
- Added property ImgAltAttribute to class com.scroogexhtml.pictures.EmbeddedPicture, which contains the value of the IMG tag attribute "alt". Its default value is "picture" for backwards compatibility.
- Support for WMF conversion using the complimentary class MemoryPictureAdapterDataURI is disabled by default; it must be enabled by including PF_WMF at runtime.
- Added null checks to setter methods of class com.scroogexhtml.pictures.EmbeddedPicture
- Moved class com.scroogexhtml.fonts.FontDef to the Example artifact, as it is used only there.
- Removed assertions in class com.scroogexhtml.xmldom.table.TableSupport.
- Fixed QA hints and warnings
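The ImgAltAttribute default and the new setter null checks can be pictured together. This is a hypothetical stand-in for EmbeddedPicture, not its real code; the use of NullPointerException via Objects.requireNonNull and the HTML5 <img> form (XHTML would need "/>") are assumptions.

```java
import java.util.Objects;

// Hypothetical stand-in for EmbeddedPicture, showing only alt handling.
class PictureAltSketch {
    private String imgAltAttribute = "picture"; // backwards-compatible default

    String getImgAltAttribute() {
        return imgAltAttribute;
    }

    // Rejects null, in line with the null checks added to the setters.
    void setImgAltAttribute(String value) {
        this.imgAltAttribute = Objects.requireNonNull(value, "imgAltAttribute");
    }

    // Builds an <img> tag; the alt text is escaped for use in an attribute.
    String toImgTag(String src) {
        String alt = imgAltAttribute.replace("&", "&amp;").replace("\"", "&quot;");
        return "<img src=\"" + src + "\" alt=\"" + alt + "\">";
    }
}
```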
2021-04-22: Version 9.2.0
Bug Fixes
- Reset character attributes on listtext token (workaround for WPTools bug)
Enhancements
- Added boolean property ConvertParagraphMargins (default true)
- Support picture data extraction for binary data
- Support dibitmap token (device independent bitmap) for image data extraction
- The MIME type for WMF is image/x-wmf, for EMF image/x-emf
- getPostProcessListeners() returns an unmodifiable collection
- Deprecated FontStatisticsCollecting interface
- Use XMLUnit in TableSupportTests
- Fixed QA hints and warnings
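A data URI combines the MIME type with Base64-encoded picture bytes, which is the kind of output a class like MemoryPictureAdapterDataURI produces. The sketch below is not that class. DataUriSketch is hypothetical; its keys are the RTF picture-type control words (\pngblip, \jpegblip, \wmetafile, \emfblip), and the WMF and EMF MIME types follow the 9.2.0 note.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

class DataUriSketch {
    // RTF picture-type control word (without backslash) -> MIME type.
    private static final Map<String, String> MIME_TYPES = new HashMap<>();
    static {
        MIME_TYPES.put("pngblip", "image/png");
        MIME_TYPES.put("jpegblip", "image/jpeg");
        MIME_TYPES.put("wmetafile", "image/x-wmf");
        MIME_TYPES.put("emfblip", "image/x-emf");
    }

    static String mimeType(String rtfPictureWord) {
        String mime = MIME_TYPES.get(rtfPictureWord);
        if (mime == null) {
            throw new IllegalArgumentException("Unknown picture type: " + rtfPictureWord);
        }
        return mime;
    }

    // Encodes raw picture bytes as an RFC 2397 data URI.
    static String toDataUri(String rtfPictureWord, byte[] data) {
        return "data:" + mimeType(rtfPictureWord) + ";base64,"
                + Base64.getEncoder().encodeToString(data);
    }
}
```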
2021-03-06: Version 9.1.0
Bug Fixes
- The lang attribute on the html tag must not be created if ConvertLanguage is false
- HTML comments (<!-- ... -->) in CSS cause errors; replaced by CSS comments (/* ... */)
- Code in createNumberingWriter() is executed too often
- Ignore tbldef tokens at the end of the row
- RTF with the \bin keyword causes a conversion error
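In RTF, \binN announces that N raw bytes follow; those bytes may contain braces or backslashes and must not be tokenized. The notes do not describe the fix, so the following is only a sketch of the general technique. BinKeywordSketch and readBin are hypothetical names, and a single space after N is treated as the control word delimiter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper illustrating RTF \binN handling; not the library's code.
class BinKeywordSketch {
    // Extracted raw bytes plus the index just past them.
    static final class BinData {
        final byte[] bytes;
        final int nextIndex;

        BinData(byte[] bytes, int nextIndex) {
            this.bytes = bytes;
            this.nextIndex = nextIndex;
        }
    }

    private static final byte[] KEYWORD = "\\bin".getBytes(StandardCharsets.US_ASCII);

    // Expects rtf[start..] to begin with \bin, the decimal byte count N and an
    // optional single space delimiter. Returns the N raw bytes that follow;
    // they may contain '{', '}' or '\' and must not be tokenized as RTF.
    static BinData readBin(byte[] rtf, int start) {
        if (start < 0 || start + KEYWORD.length > rtf.length) {
            throw new IllegalArgumentException("not a \\bin keyword");
        }
        for (int k = 0; k < KEYWORD.length; k++) {
            if (rtf[start + k] != KEYWORD[k]) {
                throw new IllegalArgumentException("not a \\bin keyword");
            }
        }
        int i = start + KEYWORD.length;
        int n = 0;
        while (i < rtf.length && rtf[i] >= '0' && rtf[i] <= '9') {
            n = n * 10 + (rtf[i] - '0');
            i++;
        }
        if (i < rtf.length && rtf[i] == ' ') {
            i++; // the space delimiter belongs to the control word
        }
        if (i + n > rtf.length) {
            throw new IllegalArgumentException("truncated \\bin data");
        }
        return new BinData(Arrays.copyOfRange(rtf, i, i + n), i + n);
    }
}
```

A tokenizer would call readBin when it meets \bin and then resume at nextIndex, so a stray '}' inside the binary data cannot close the enclosing group.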
Enhancements
- Added prefix EXPERIMENTAL_ for experimental conversion options
- Use {} for log message parameters instead of String.format
- Lowered the fonttable entry log level to TRACE
- Fixed QA hints and warnings
2020-06-26: Version 9.0.0
Enhancements
- Minimum supported Java version is JRE 8
- Improved support for header and footer sections
- Added Path based conversion methods
- Moved MemoryPictureAdapter and MemoryPictureAdapterBase64 to new ScroogeXHTML-Pictures artifact
- Moved optional DefaultFontStatistics class to ScroogeXHTML-Addons artifact
- Moved optional post-processing classes to ScroogeXHTML-Addons artifact
- Removed deprecated code and file based conversion methods
2020-03-14: Version 8.7.0
Enhancements
- Added conversion methods ScroogeXHTML.convert(Path) and ScroogeXHTML.convert(Path, Charset)
- Added conversion methods ScroogeXHTML.convert(Path, Path) and ScroogeXHTML.convert(Path, Path, Charset)
- Extracted ConversionConfiguration interface from ScroogeXHTMLBase
- Refactoring of the Colortable and FontTable classes
- Improved code test coverage
- DefaultFontStatistics and EmbeddedPicture classes implement Serializable
- Use {} instead of string concatenation for logging
- Fixed PMD warnings
2020-01-19: Version 8.6.0
Enhancements
- Do not throw an exception on unknown borderstyle in BorderStyleBuilder#borderSideToString(Border)
- Fixed PMD warnings
- Fixed CheckStyle hints
- Updated SLF4J dependency to 1.7.30
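Instead of throwing on an unknown border style, a converter can map known RTF border control words to CSS border-style values and fall back to a default. A sketch, assuming "solid" as the fallback (the notes do not say which default BorderStyleBuilder uses); BorderStyleSketch is hypothetical and covers only some RTF border words.

```java
import java.util.HashMap;
import java.util.Map;

class BorderStyleSketch {
    // RTF border control words (without backslash) -> CSS border-style.
    private static final Map<String, String> STYLES = new HashMap<>();
    static {
        STYLES.put("brdrs", "solid");
        STYLES.put("brdrth", "solid");
        STYLES.put("brdrdb", "double");
        STYLES.put("brdrdot", "dotted");
        STYLES.put("brdrdash", "dashed");
        STYLES.put("brdrinset", "inset");
        STYLES.put("brdroutset", "outset");
        STYLES.put("brdrnone", "none");
    }

    // Unknown styles fall back to "solid" instead of throwing, so one
    // unusual border does not abort the whole conversion.
    static String toCss(String rtfBorderWord) {
        return STYLES.getOrDefault(rtfBorderWord, "solid");
    }
}
```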
2019-08-20: Version 8.5.0
Enhancements
- Check for empty text in addStyledTexts() (this makes the post-processing class StripWhitespaceSpanNodes obsolete)
- Check for span attributes in prepareHyperlinkElement() (this makes the post-processing class StripAttributeLessSpanNodes obsolete)
- Remove references to obsolete post-processing classes in addDefaultListeners()
- Deprecate all obsolete post processor classes and the com.scroogexhtml.tidy package
- Deprecate addDefaultListeners() because all post-processing classes are no longer used
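The two 8.5.0 checks (skip empty text, omit <span> elements without attributes) are what made the StripWhitespaceSpanNodes and StripAttributeLessSpanNodes post-processors unnecessary. A hypothetical single-method sketch of both checks, not the library's addStyledTexts():

```java
class SpanSketch {
    // Appends styled text: empty text produces no output at all, and text
    // without a style is emitted without a wrapping <span>.
    static void addStyledText(StringBuilder out, String text, String style) {
        if (text == null || text.isEmpty()) {
            return; // nothing to emit, so no empty <span></span>
        }
        String escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        if (style == null || style.isEmpty()) {
            out.append(escaped); // no attributes: a bare text node suffices
        } else {
            out.append("<span style=\"").append(style).append("\">")
               .append(escaped).append("</span>");
        }
    }
}
```

Doing these checks while building the output avoids a second pass over the finished DOM tree.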
2019-07-10: Version 8.4.0
Enhancements
- Post process listener ReplaceEmptyParagraphNodes is no longer included in the addDefaultListeners() method. Its functionality is already covered more efficiently in the core conversion routines.
- Post process listener ReplaceMonospaceBlanks is no longer included in the addDefaultListeners() method. Its functionality is already covered more efficiently in the core conversion routines.
- Avoid throwing IOException to exit the main conversion loop
- Internal code improvements
Bug Fixes
- Setting the ConvertHyperlinks property to false has no effect
- Removed ReplaceMonospaceBlanks post process listener as it had side effects (e.g. missing hyperlinks), and is obsolete
2019-06-08: Version 8.3.0
Enhancements
- Pictures which are tagged with the \nonshppict token are not included in the conversion
- Unit test improvements and additions
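Word typically writes a picture twice: once inside {\*\shppict ...} and once, as a fallback for other readers, inside {\nonshppict ...}. Skipping the \nonshppict group avoids emitting the same picture twice. A simplified sketch under that assumption; NonShpPictSketch is a hypothetical name, and \bin data inside the group is not handled.

```java
class NonShpPictSketch {
    // Removes every "{\nonshppict ...}" group from an RTF string by brace
    // counting. Escaped characters (such as \{ and \}) are skipped.
    static String stripNonShpPict(String rtf) {
        final String marker = "{\\nonshppict";
        StringBuilder out = new StringBuilder(rtf.length());
        int i = 0;
        while (i < rtf.length()) {
            if (rtf.startsWith(marker, i)) {
                int depth = 0;
                int j = i;
                for (; j < rtf.length(); j++) {
                    char c = rtf.charAt(j);
                    if (c == '\\') {
                        j++; // skip the escaped or control-word character
                        continue;
                    }
                    if (c == '{') {
                        depth++;
                    } else if (c == '}' && --depth == 0) {
                        break; // closing brace of the \nonshppict group
                    }
                }
                i = j + 1; // continue after the closing brace
            } else {
                out.append(rtf.charAt(i++));
            }
        }
        return out.toString();
    }
}
```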
2019-05-07: Version 8.2.1
Bug Fixes
- Example post-processing class MergeBorderDivNodes does not merge div nodes because it searches for 'border-style' instead of 'border'
2019-03-15: Version 8.2.0
Enhancements
- Adjusted image support code for WMF to PNG image conversion example
- Added WMF to PNG conversion example code using Apache Batik
- Changed generated CSS to use no blank after the colon
Bug Fixes
- Fixed CheckStyle warnings
- Fixed support for Java 11 in Base64Utils
- Fixed support for Java 11 in integration tests
- Upgraded all tests from JUnit 3 to JUnit 4
2019-03-01: Version 8.1.0
Enhancements
- Added support for paragraph border color conversion
- Added support for paragraph border width conversion
- Added support for cell border width conversion
- Avoid hard exit on missing cell table definitions
2019-01-12: Version 8.0.0
Enhancements
- Moved from package com.habarisoft.scroogexhtml to com.scroogexhtml
- Tested with Oracle JDK 8 and Oracle OpenJDK 11 on Windows and Linux
- New FontReplacing interface and FontReplacer property
- New FontStatisticsCollecting interface and FontStatistics property
- Improved table cell border conversion
- Improved paragraph border conversion
- MergeBorderDivNodes, a post-processor example class for paragraph border cleanup
- ConvertIndent default value is false now (as documented)
- Methods String convert(String rtf) and String convert(ByteArrayInputStream rtf) throw the unchecked RuntimeException instead of IOException
- Fixed Spotbugs warnings and JavaDoc errors
Bug Fixes
- Fixed a color conversion bug
2018-08-10: Version 7.3.0
Enhancements
- Added support for multiple external style sheets (property StyleSheetLinks), the StyleSheetLink property is now deprecated
- Changed finishColortableEntry() to improve conversion speed
- Changed removeHtmlTags() to improve conversion speed
- Updated izpack installer to version 5.1.3
- Removed unused methods
- Fixed Findbugs/Spotbugs warnings
2018-03-24: Version 7.2.0
Enhancements
- Added support for vertical alignment in table cells
- Standalone XHTML documents begin with an XML declaration if the charset is not UTF-8
- Table conversion uses the class="table table-bordered" attribute (instead of border="1") to indicate that the table is bordered. This fixes the W3C HTML validator warning "The border attribute on the table element is presentational markup". Applications which still require the border="1" attribute may enable it with setOutputProperty(ConversionKeys.USE_TABLE_BORDER_ATTRIBUTE, "yes");
- Removed the enclosing <!-- ... --> around the CSS code within the <style> element for standalone documents
- Removed the attribute type="text/css" for the <style> element for standalone HTML5 documents. This fixes the W3C HTML validator warning: "The type attribute for the style element is not needed and should be omitted".
- Changed BODY {... to lowercase body {... in auto-generated CSS code
- The <style> element includes comments before auto-generated and custom styles
- Fixed Findbugs warnings for non-transient non-serializable instance fields in the MemoryPictureAdapter and ListHeaderInfo classes
- Fixed Findbugs warnings for reliance on default encoding in com.scroogexhtml.ScroogeXHTML.convert
- Fixed Findbugs warnings for casting and passing to ceil in com.scroogexhtml.converter.AbstractWriter.getFontSizeStyle
- Fixed Findbugs warnings for casting and passing to ceil in com.scroogexhtml.pictures.EmbeddedPicture.getHGoalPx and getWGoalPx
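XML processors assume UTF-8 (or UTF-16 with a byte order mark) when a document has no XML declaration, so a standalone XHTML document only needs the declaration when another encoding is used. A minimal sketch of the 7.2.0 rule; XmlDeclarationSketch is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class XmlDeclarationSketch {
    // Returns the XML declaration for a standalone XHTML document, or an
    // empty string when the output charset is UTF-8.
    static String xmlDeclaration(Charset charset) {
        if (StandardCharsets.UTF_8.equals(charset)) {
            return "";
        }
        return "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>\n";
    }
}
```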
2018-02-12: Version 7.1.0
Enhancements
- Added support for five character encodings, including MacRoman
- Added support for non-breaking hyphen (RTF token \_)
- Improved conversion of 'Symbol' font
- As a side effect of the enhanced 'Symbol' font conversion, bullet list conversion now (correctly) emits • instead of ·
Bug Fixes
- Emit the HTML bullet \u2022 for the RTF token \bullet instead of a middle dot
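The Symbol font change can be pictured as a lookup table from Symbol font codes to Unicode. The sketch below contains only four entries and assumes, as is common for Windows symbol fonts, that such characters may also arrive in the Private Use Area at U+F000 plus the code, which is how U+F0B7 (see 9.4.0) relates to the bullet. SymbolFontSketch is a hypothetical name, not the library's implementation.

```java
import java.util.HashMap;
import java.util.Map;

class SymbolFontSketch {
    // A few entries of the Symbol font encoding (code -> Unicode).
    private static final Map<Integer, Character> SYMBOL = new HashMap<>();
    static {
        SYMBOL.put(0x61, '\u03B1'); // a -> Greek small alpha
        SYMBOL.put(0x62, '\u03B2'); // b -> Greek small beta
        SYMBOL.put(0x70, '\u03C0'); // p -> Greek small pi
        SYMBOL.put(0xB7, '\u2022'); // bullet (not the middle dot U+00B7)
    }

    // Maps a character written in the Symbol font to Unicode. Characters in
    // the range U+F020..U+F0FF are first reduced by 0xF000.
    static char fromSymbolFont(char c) {
        int code = (c >= 0xF020 && c <= 0xF0FF) ? c - 0xF000 : c;
        Character mapped = SYMBOL.get(code);
        return mapped != null ? mapped : (char) code;
    }
}
```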
2017-10-28: Version 7.0.0
Bug Fixes
- Always hide all "hidden" text (even if ConvertFontStyle is false)
Enhancements
- Added option to disable paragraph border conversion - to disable, use setOutputProperty(ConversionKeys.CONVERT_PARAGRAPH_BORDERS, "no");
- Improved algorithm for ConvertEmptyParagraphs
- Improved Unicode support for Japanese text
- Improved initialization speed of DOM tree transformation
- Improved support for detection of outer table border
- Experimental support for a multilevel numbering writer - enabled with setOutputProperty(ConversionKeys.SUPPORT_MULTILEVEL, "yes");
- Experimental support for uppercase and lowercase Roman numerals - requires setOutputProperty(ConversionKeys.SUPPORT_LIST_TABLE, "yes");
- Experimental support for \*\pn paragraph numbering - enabled with setOutputProperty(ConversionKeys.SUPPORT_STAR_PN, "yes");
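The experimental Roman numbering support needs an integer-to-Roman conversion. The greedy subtraction below is a common way to do it, shown as a sketch rather than the library's code; in the RTF list table, \levelnfc1 and \levelnfc2 select uppercase and lowercase Roman numbering. RomanNumeralSketch is a hypothetical name.

```java
import java.util.Locale;

class RomanNumeralSketch {
    private static final int[] VALUES = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS =
            {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

    // Converts 1..3999 to an uppercase Roman numeral by greedy subtraction.
    static String toUpperRoman(int n) {
        if (n < 1 || n > 3999) {
            throw new IllegalArgumentException("out of range: " + n);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < VALUES.length; i++) {
            while (n >= VALUES[i]) {
                sb.append(SYMBOLS[i]);
                n -= VALUES[i];
            }
        }
        return sb.toString();
    }

    // Lowercase variant for lowercase Roman list levels.
    static String toLowerRoman(int n) {
        return toUpperRoman(n).toLowerCase(Locale.ROOT);
    }
}
```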
Other
- ConvertFootnotes default value changed to false
- Experimental UseListTable property is deprecated now - use setOutputProperty(ConversionKeys.SUPPORT_LIST_TABLE, "yes");
- UseListTable property default changed to false
- Removed ProgressListener properties
- Removed detection of hyperlinks based on blue/underlined text format
- Removed MetaDateAuto property
- Removed default creation of post process listeners
- Added ScroogeXHTMLMain.addDefaultListeners() method for backward compatibility
Package | Description |
---|---|
com.scroogexhtml | Provides the main converter class, ScroogeXHTML. |
com.scroogexhtml.converter | Provides the main converter classes. |
com.scroogexhtml.css | Provides classes for CSS. |
com.scroogexhtml.events | Provides classes for event methods. |
com.scroogexhtml.fields | Provides classes for RTF field expressions. |
com.scroogexhtml.fonts | Provides classes for font substitution and font statistics. |
com.scroogexhtml.lists | Provides classes for RTF list definition processing. |
com.scroogexhtml.pictures | Provides embedded picture support classes. |
com.scroogexhtml.rtf | Provides RTF standard datatypes and constants. |
com.scroogexhtml.statistics | Provides default implementations of statistics classes. |
com.scroogexhtml.table | Provides table conversion support classes. |
com.scroogexhtml.xmldom | Provides a 'document object model' based on javax.xml. |
com.scroogexhtml.xmldom.numbering | Provides RTF paragraph numbering support classes. |
com.scroogexhtml.xmldom.sections | Provides section / header / footer support classes. |
com.scroogexhtml.xmldom.table | Provides RTF table support classes. |