Release Notes

11.0.EA25 July 2024
  System requirements
New Increase minimum required Java Runtime Environment version from Java SE 8 to Java SE 11.
  Character property conversion
New The property FontSizeUnit moved. It can now be set using getCharPropConvConfig().setFontSizeUnit().
New The signature of FontReplacing#replace changed. It no longer takes a ScroogeXHTMLBase instance as first parameter. The default implementation has been updated accordingly.
  Paragraph property conversion
New The default value for property ConvertEmptyParagraphs changed. It is now enabled by default.
  List conversion
New Improve support for RTF listtable and listoverridetable processing. It is now enabled by default.
New Remove property SupportMultiLevel, as multi-level lists are now handled when SupportListTable is enabled.
New Removed property ConvertFields, as it is currently unused. Field result text will always be emitted in the conversion.
  Font size units
New Rename class FontSizeUnit to LengthUnit.
New Change internal font size handling from points to half-points for higher precision.
New Remove support for the 'ex' font size unit, and improve support for the 'em' and 'px' units.
New The default font size unit is now 'em'
New The allowed font size units are now 'em', 'point' and 'px'
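
The half-point handling described above can be illustrated with a minimal sketch. It is not the library's implementation: the class name HalfPointSketch, the base size of 24 half-points used for 'em', and the CSS ratio of 96 px to 72 pt are assumptions made for illustration. RTF stores font sizes in half-points, so keeping them as integers avoids losing sizes such as 10.5 pt.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper, not part of ScroogeXHTML: renders half-point sizes as CSS lengths. */
final class HalfPointSketch {

    /** Assumed base size for 'em' output: 24 half-points, i.e. 12 pt. */
    static final int BASE_HALF_POINTS = 24;

    private HalfPointSketch() { }

    /** Two half-points are one point. */
    static String toPoint(int halfPoints) {
        return format(halfPoints / 2.0) + "pt";
    }

    /** Size relative to the assumed base size. */
    static String toEm(int halfPoints) {
        return format((double) halfPoints / BASE_HALF_POINTS) + "em";
    }

    /** CSS defines 1 pt as 96/72 px. */
    static String toPx(int halfPoints) {
        return format(halfPoints / 2.0 * 96.0 / 72.0) + "px";
    }

    /** Rounds to two decimals and drops trailing zeros. */
    private static String format(double value) {
        return BigDecimal.valueOf(value)
                .setScale(2, RoundingMode.HALF_UP)
                .stripTrailingZeros()
                .toPlainString();
    }

    public static void main(String[] args) {
        int halfPoints = 21; // 10.5 pt, not representable as a whole number of points
        System.out.println(toPoint(halfPoints));
        System.out.println(toEm(halfPoints));
        System.out.println(toPx(halfPoints));
    }
}
```
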
  Complimentary picture adapter
New The default maximum size for data URI in the MemoryPictureAdapterDataURI class has been increased from 32 to 256 kilobytes.
New The PictureFormat PICT is no longer enabled by default in MemoryPictureAdapterDataURI
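
To show what a size limit on data URIs means in practice, here is a minimal sketch using only the JDK. It does not reproduce MemoryPictureAdapterDataURI: the class and method names are hypothetical, and it assumes the limit applies to the length of the encoded URI, which the notes above do not specify.

```java
import java.util.Base64;
import java.util.Optional;

/** Hypothetical sketch, not the library's picture adapter. */
final class DataUriSketch {

    /** 256 kilobytes, matching the new default mentioned above. */
    static final int MAX_DATA_URI_LENGTH = 256 * 1024;

    private DataUriSketch() { }

    /**
     * Builds a Base64 data URI, or returns empty if the result exceeds maxLength.
     * Assumption: the limit is checked against the encoded URI length.
     */
    static Optional<String> toDataUri(String mimeType, byte[] imageData, int maxLength) {
        String uri = "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(imageData);
        return uri.length() <= maxLength ? Optional.of(uri) : Optional.empty();
    }

    public static void main(String[] args) {
        byte[] tiny = {1, 2, 3};
        System.out.println(toDataUri("image/png", tiny, MAX_DATA_URI_LENGTH).orElse("(too large)"));
    }
}
```

A caller would typically fall back to an external image file or skip the picture when the result is empty.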
  Other changes
New The value of the DefaultLanguage property will be used for the lang attribute on the HTML5 root element (even if it is empty).
New Enforce that HTML5 uses UTF-8 by throwing an exception when a different encoding is set.
New ResultBuilderException will no longer be thrown. The class has been removed.
  Known issues with experimental features
Unresolved Complimentary MemoryPictureAdapterDataURI fails to convert from PICT to PNG.
Unresolved Nested table conversion fails for RTF documents created by LibreOffice 7.x
10.7.0 released 20 May 2024
New Add property InvalidRtfHeaderAction, which is the action to be taken when an invalid RTF header is encountered
New Refactor keyword and number parsing in class ScroogeXHTMLMain
New Refactor table conversion conditions in ScroogeXHTML
New Refactor class ListHeaderInfo
New Bump slf4j from 2.0.12 to 2.0.13
10.6.0 released 22 March 2024
Fixed Fix right-to-left support
Fixed Bump slf4j from 2.0.9 to 2.0.12
New Improve experimental support for nested tables
New Improve numberFormat instance creation in CssBuilder (conversion speed increase)
New SetConvertTables in ScroogeXHTMLBase no longer sets the experimental ConvertTableBorder property to true
New Refactor 'Writer' interface, add specialized interfaces
New Replace StringBuilder with char array in ScroogeXHTMLMain
New Update MultilevelNumberingSupport to use HtmlElement enum
New Rename experimental property ConvertTableBorders to ConvertCellStyles
New Improve formatting of expected test results for improved readability
New Add JUnit test cases
10.5.0 released 04 February 2024
New Reach 98% code line coverage for the core library tests
New Add pixel unit support to font size in CssBuilder
New Optimize border style handling in ScroogeXHTML
New Replace logging with assertions in TableCellMerger class
New Replace error handling with assertion in Table and TableRowHolder class
New Refactor code in MultilevelNumberingSupport and add tests
New Refactor ListLevel and NumberType classes
New Refactor setProperty function in
New Refactor RTFReader and add unit tests
New Convert the Elements class to an enum and rename it to HtmlElement
New Convert the Attributes class to an enum and rename it to Attribute
New Update the handling of NoSuchElementException in to throw a ConversionException instead.
New Simplify toBrowserHexValue method
New Add UnicodeConverter tests and expose methods for charset handling
New Add tests for null octet and form feed octet in the RTF code
New Rename test classes for experimental features
New Remove unused DG_FIELD case in
New Comment out trace logging in searchToken method
10.4.0 released 23 December 2023
New Add experimental support for nested tables
New Add five languages to installer
New Add test testConvertParagraphMarginsSpaceBefore and testConvertParagraphMarginsSpaceAfter
New Add test testHtmlHeadMetaHeaders()
New Add test for a minimal table
New Add itap to paragraph properties
New Add example code for document lang attribute
New Add example code for meta date
New Add logging and toString() method to Table class
New Use ATTR_STYLE instead of hard-coded "style"
New Use a Deque for Table Rows property
New Use a Deque for StyledTexts
New Use a stack structure for tables
New Refactor away intable field
Fixed Fix "nonesttables" to set DG_IGNORE for nested tables
Fixed Bump slf4j from 2.0.7 to 2.0.9
Fixed Bump com.twelvemonkeys.imageio:imageio-pict from 3.9.4 to 3.10.0
10.3.0 released 18 June 2023
New Cleanup and refactor Table class
New Remove unnecessary try ... catch in class ScroogeXHTMLMain
New Refactor constant PX in class CssBuilder into FontSizeUnit enum
New Simplify anchor conversion in XMLDOMWriter#addAnchor: always create a div node
New Remove unnecessary unit px from DEFAULT_CSS in class DefaultConfiguration
New Remove unused method Node getColGroup() in interface TableWriter
New Add itap property to TableWriter
New Add experimental support for font size based on the relative unit 'vw'.
New Add tests for 'empty' input string and for minimal RTF string
New Refactor away flag dummyRowNeeded
New Refactor flag inTableRow
New Refactor flag inTableRowDefinition
New Refactor flag RowPropertiesAlreadySeenAtBeginOfRow (was RowPropertiesComplete)
New Refactor method finishColGroup()
New Refactor method finishTableNode()
New Add logging and defensive code in class TableRowHolder
New Rename isInRow to isInTableRowDefinition
New Rename tableRowStarts to tableRowDefinitionStarts
New Remove unused field inNestedTableProperties
New Extract TableSupportBase class
New Treat \nestcell and \nestrow tokens as \cell and \row
New Remove unused method getCurrentPageWidth
Fixed Suppress warning UnnecessaryUnicodeEscape
Fixed Add workaround for missing colgroup
Fixed Add workaround for missing cellXs
Fixed Fixed PMD warnings
10.2.0 released 06 May 2023
New New property TablePropConfiguration#MaxTableWidthPercent
New Target Java SE 8 and newer
New Bump slf4j from 2.0.6 to 2.0.7
10.1.0 released 04 March 2023
New Migrate from ConversionFlags to nested properties for experimental features
New Add experimental property TablePropConvConfig#ConvertWidthToPercent
New Add experimental support for macpict
New Add classes for HTML attributes and elements
New Delete unused class MemoryPictureAdapter
New Bump slf4j from 2.0.3 to 2.0.6
Fixed Fix hyperlink conversion for extra spaces before HYPERLINK
Fixed Fix PMD warnings
Fixed Fix PMD processing error in class UnicodeConverterTest
10.0.0 released 22 October 2022
New Move all properties related to conversion of characters, paragraphs and HTML head section to nested properties
New Introduce an enum based configuration for experimental conversion options
New Introduce the ConvertParagraphBorders property
New Target Java SE 11 and newer
New Use <meta charset='...'> instead of <meta http-equiv='Content-Type' content='text/html; charset=...'>
New Bump slf4j from 1.7.36 to 2.0.3
Fixed Remove indirect write access from interface ConversionConfiguration
Fixed Remove deprecated statistics support
Fixed Remove deprecated configuration methods
Fixed Remove deprecated interface FontStatisticsCollecting
Fixed Remove deprecated class MemoryPictureAdapterBase64
Fixed Remove method nameAndVersion
9.6.0 released 06 September 2022
New Improved conversion of hyperlinks in nested fields
9.5.0 released 25 June 2022
Fixed unit test failure on newer JDK versions, which now write BR elements as <br /> instead of <br/>
New Properties related to character attribute conversion, paragraph attribute conversion and HTML head section options have been deprecated in favour of the new HtmlHeadConfiguration, CharPropsConfig and ParaPropConfig properties, which will be the only API to set these properties in the next major version
New New method HtmlHeadConfiguration#addStyleSheetLink(String)
New The method HtmlHeadConfiguration#getStyleSheetLinks() returns an unmodifiable collection
New SLF4J dependency updated to 1.7.36
New Tested with Eclipse Temurin JDK 8, 11, 17 and 18
New Tested with more JDK versions (Amazon Corretto, Eclipse Temurin, IBM Semeru)
New Internal code maintenance
9.4.0 released 12 February 2022
Fixed The paragraph border width/color of the paragraph before a table is copied to paragraph border after the table
Fixed EmbeddedPicture is not serializable (required for Servlet)
Fixed XML parsing is vulnerable to XXE
Fixed Javadoc error message with JDK 17
New Break out of the main conversion loop using a label instead of throwing an exception
New Deprecated the complimentary class MemoryPictureAdapterBase64. The new class MemoryPictureAdapterDataURI may be used instead.
New Added translation of the special Unicode character uf0b7 (\u61623, Unicode Private Use Area) which is used in the RTF specification to the bullet character
New Changed public interface NumberingLevel to a class
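
The XXE fix listed above is not described in detail. As a general illustration, the following sketch shows a common way to harden a JAXP DocumentBuilderFactory against XXE. It is an assumption about the kind of change involved, not the library's actual code, and the class name XxeSafeParserSketch is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

/** Hypothetical sketch of XXE hardening with the JDK's built-in parser. */
final class XxeSafeParserSketch {

    private XxeSafeParserSketch() { }

    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Enable the parser's built-in processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Reject any document that contains a DOCTYPE declaration,
        // which removes the entry point for external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    static Document parse(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return newHardenedBuilder().parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<p>ok</p>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

With this configuration, a document that declares a DOCTYPE is rejected with a parse exception instead of being resolved.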
9.3.0 released 29 May 2021
Fixed Fallback to the original picture dimensions if the RTF does not specify the desired picture dimensions
New Added complimentary code for WMF picture conversion using an external converter (not included) to the complimentary class MemoryPictureAdapterDataURI.
New Added example code for plain text extraction from an XHTML document (using XSLT).
New Added property ImgAltAttribute to class, which contains the value of the IMG tag attribute 'alt'. Its default value is 'picture' for backwards compatibility.
New Moved class com.scroogexhtml.fonts.FontDef to the Example artifact, as it is used only there.
New Deprecated the complimentary class MemoryPictureAdapterBase64. The new class MemoryPictureAdapterDataURI may be used instead.
New Updated the documentation for an experimental feature which is enabled implicitly when the ConvertTables property is set to true.
New Updated tutorials and documentation
New Added JUnit tests
New Fixed QA hints and warnings
New Minor code improvements (see JavaDoc)
9.2.0 released 22 April 2021
Fixed Reset character attributes on listtext token (workaround for WPTools bug)
New Added boolean property ConvertParagraphMargins (default true)
New Support picture data extraction for binary data
New Support dibitmap token (device independent bitmap) for image data extraction
New WMF mime type is image/x-wmf, EMF is image/x-emf
New GetPostProcessListeners returns an UnmodifiableCollection
New Deprecated FontStatisticsCollecting interface
New Use xmlunit in TableSupportTests
New Fixed QA hints and warnings
Fixed The lang attribute on html tag must not be created if ConvertLanguage is false
Fixed The <!-- ... --> comments in CSS cause errors, replaced by /* ... */
Fixed The code in createNumberingWriter() is executed too often
Fixed Ignore tbldef token at the end of the row
Fixed RTF with bin keyword causes conversion errors
New Added prefix EXPERIMENTAL_ for experimental conversion options
New Use {} for log message parameters instead of String.format
New Lowered the fonttable entry log level to TRACE
New Fixed QA hints and warnings
9.0.0 released 26 June 2020
New Improved support for header and footer sections
New Added Path based conversion methods
New Moved MemoryPictureAdapter and MemoryPictureAdapterBase64 to new ScroogeXHTML-Pictures artifact
New Moved optional DefaultFontStatistics class to ScroogeXHTML-Addons artifact
New Moved optional post processing classes to ScroogeXHTML-Addons artifact
New Removed deprecated code and file based conversion methods
New Added new conversion methods String convert(Path) and String convert(Path, Charset)
New Added new conversion methods void convert(Path, Path) and void convert(Path, Path, Charset)
New Many internal refactorings and fixed PMD warnings
Fixed Do not throw an exception on unknown borderstyle in BorderStyleBuilder#borderSideToString(Border)
Fixed Fixed PMD warnings
Fixed Fixed CheckStyle hints
New Updated SLF4J dependency to 1.7.30
Fixed Binary Jar contains a malfunctioning pom.xml (it refers to a parent project)
Fixed Example source required unit tests for WMF images which is not included
New Check for empty text in addStyledTexts() (this makes the post processing class StripWhitespaceSpanNodes obsolete)
New Check for span attributes in prepareHyperlinkElement() (this makes the post processing class StripAttributeLessSpanNodes obsolete)
New Remove references to obsolete post processing classes in addDefaultListeners()
New Deprecate all obsolete post processor classes and the com.scroogexhtml.tidy package
New Deprecate addDefaultListeners() because all post processing classes are no longer used
Fixed Setting the ConvertHyperlinks property to false has no effect
Fixed Removed ReplaceMonospaceBlanks post process listener as it had side effects (e.g. missing hyperlinks), and is obsolete
New Post process listener ReplaceEmptyParagraphNodes is no longer included in the addDefaultListeners() method. Its functionality is already covered more efficiently in the core conversion routines.
New Post process listener ReplaceMonospaceBlanks is no longer included in the addDefaultListeners() method. Its functionality is already covered more efficiently in the core conversion routines.
New Avoid throwing IOException to exit the main conversion loop
New Internal code improvements
Fixed Pictures which are tagged with the \nonshppict token are not included in the conversion
Fixed Removed needless test for binary data flag FN_BIN
New Include code coverage tests
New Include tests for ReplaceEmptyParagraphNodes, ListHeaderInfo, DefaultFontStatistics
Fixed Example post processing class MergeBorderDivNodes does not merge div nodes because it searches for 'border-style'
New Adjusted image support code for WMF to PNG image conversion example
New Added WMF to PNG conversion example code using Apache Batik
Fixed Fixed Checkstyle warnings
Fixed Fixed support for Java 11 in Base64Utils
Fixed Fixed support for Java 11 in integration tests
Fixed Upgraded all tests from JUnit 3 to JUnit 4
New Paragraph border color conversion
New Paragraph border width conversion
New Table border width conversion
Fixed [2491] Avoid hard exit on missing cell definitions
8.0.0 released 12 January 2019
New Moved package to com.scroogexhtml
New Tested with Oracle JDK 8 and Oracle OpenJDK 11 on Windows and Linux, requires Java 8 or newer
New New FontStatistics property / FontStatisticsCollecting interface
New New FontReplacer property / FontReplacing interface
New Improved table cell border conversion
New Improved paragraph border conversion
New Removed deprecated properties and methods getISO8601DateTime, getStyleSheetLink, setCompatibleDefaults, useListTable, metaDate
Fixed [2435] ConvertIndent default value is false
Fixed [2389] Fixed a color conversion bug
New Added support for multiple external style sheets (property StyleSheetLinks), the StyleSheetLink property is now deprecated
New Changed finishColortableEntry() to improve conversion speed
New Changed removeHtmlTags() to improve conversion speed
New Updated izpack installer to version 5.1.3
New Removed unused methods
Fixed Fixed Findbugs/Spotbugs warnings
New Added support for vertical alignment in table cells
New Standalone XHTML documents begin with an XML declaration if the charset is not UTF-8
New Table conversion uses the class="table table-bordered" attribute (instead of border="1") to indicate that the table is bordered. This fixes the W3C HTML validator warning "The border attribute on the table element is presentational markup". Applications which still require the border="1" attribute may enable it with setOutputProperty(ConversionKeys.USE_TABLE_BORDER_ATTRIBUTE, "yes");
New Removed the enclosing <!-- ... --> around the CSS code within the <style> element for standalone documents
New Removed the attribute type="text/css" for the <style> element for standalone HTML5 documents. This fixes the W3C HTML validator warning: "The type attribute for the style element is not needed and should be omitted".
New Changed BODY {... to lowercase body {... in auto-generated CSS code
New The <style> element includes comments before auto-generated and custom styles
Fixed Fixed Findbugs warnings for non-transient non-serializable instance fields in MemoryPictureAdapter and ListHeaderInfo class
Fixed Fixed Findbugs warnings for reliance on default encoding in com.habarisoft.scroogexhtml.ScroogeXHTML.convert
Fixed Fixed Findbugs warnings for casting and passing to ceil in com.habarisoft.scroogexhtml.converter.AbstractWriter.getFontSizeStyle
Fixed Fixed Findbugs warnings for casting and passing to ceil in and getWGoalPx
Fixed Fixed Findbugs warnings with medium severity
New Added support for five character encodings, including MacRoman
New Added support for non-breaking hyphen (RTF token \_)
New Improved conversion of 'Symbol' font
New As a side effect of enhanced 'Symbol' font conversion, bullet list conversion now (correctly) emits &bullet; instead of &middot;
Fixed Emit the HTML bullet character \u2022 or &bull; for RTF token '\bullet' instead of &middot;
7.0.0 released 28 October 2017
New Added option to disable paragraph border conversion
New Improved algorithm for ConvertEmptyParagraphs
New Improved Unicode support for Japanese text
New Improved initialization speed of DOM tree transformation
New Improved detection of outer table border
New Experimental support for a multilevel numbering writer
New Experimental support for uppercase and lowercase roman numbers
New Experimental support for \*\pn paragraph numbering
New ConvertFootnotes default value changed to false
New Experimental UseListTable property is deprecated; use setOutputProperty(ConversionKeys.SUPPORT_LIST_TABLE, "yes") instead
New UseListTable property default changed to false
New Removed ProgressListener properties
New Removed detection of hyperlinks based on blue/underlined text format
New Removed MetaDateAuto property
New Removed default creation of post process listeners
New Added ScroogeXHTMLMain.addDefaultListeners() method for backward compatibility
Fixed Always hide all hidden text (even if ConvertFontStyle is false)