Apache Tika 1.26
The most notable changes in Tika 1.26 over the previous release are:
- Fix thread safety bug in OpenOffice parser (TIKA-3334).
- The "writeLimit" header in the /rmeta endpoint of tika-server now applies to the combined characters written across a container document and all of its embedded documents (TIKA-3325); it no longer applies separately to each container or embedded document.
- Extract more embedded files in PDFs by recursively processing the embedded file tree (TIKA-3332).
- Allow case-insensitive headers for configuration of the PDFParser and the TesseractOCRParser in tika-server, via Subhajit Das (TIKA-3320).
- Improve detection and parsing of XPS files (TIKA-3316).
- General dependency upgrades (TIKA-3244).
- Great optimization in ForkParser (TIKA-3237).
- Fix parsing of emails attached to other emails in PST files (TIKA-3004).
- MP3 parser should output the xmpDM:duration metadata as seconds, not milliseconds, consistent with the other Audio and Video parsers (TIKA-3318).
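As an illustration of the "writeLimit" change above, the following is a minimal sketch of how a client might set that header on a request to the /rmeta endpoint. It assumes Java 11 or later, a tika-server instance at http://localhost:9998, and the /rmeta/text variant of the endpoint; the class name, method name, and limit value are hypothetical and chosen only for this example. The request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: builds a request for tika-server's /rmeta endpoint.
final class RmetaWriteLimitExample {

    // Builds (but does not send) a PUT request carrying the "writeLimit" header.
    // serverBase is an assumption, for example "http://localhost:9998".
    static HttpRequest buildRequest(String serverBase, byte[] document, int writeLimit) {
        return HttpRequest.newBuilder(URI.create(serverBase + "/rmeta/text"))
                .header("Accept", "application/json")
                // As of 1.26 this limit covers the container and its embedded documents combined.
                .header("writeLimit", Integer.toString(writeLimit))
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for a real file's contents.
        byte[] placeholder = "placeholder bytes".getBytes(StandardCharsets.UTF_8);
        HttpRequest request = buildRequest("http://localhost:9998", placeholder, 10000);
        System.out.println(request.method() + " " + request.uri());
        System.out.println("writeLimit: "
                + request.headers().firstValue("writeLimit").orElse("(unset)"));
    }
}
```

To actually submit the request, it could be passed to java.net.http.HttpClient's send method against a running server; clients that previously sized the limit per embedded document may need a larger value now that the budget is shared.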
The following people have contributed to Tika 1.26 by submitting or commenting on the issues resolved in this release:
- Andrew Pavlin
- Bertrand Caron
- Julien Massiera
- Nick Burch
- Nick Harmer
- Peter Kronenberg
- Ross Johnson
- Subhajit Das
- Tilman Hausherr
- Tim Allison
See https://s.apache.org/yjp3v for more details on these contributions.