Apache Tika 2.7.0
The most notable changes in Tika 2.7.0 over the previous release are:
- Add SVG detection for SVG files that lack the XML header (TIKA-3308).
- Migrate to a live fork of Universal Charset Detector (TIKA-3213).
- Improve handling of text-based attachments inside .eml files (TIKA-3959).
- Add tika-parser-nlp-package to release artifacts (TIKA-3958).
- Remove need for <params/> element in classes that extend ConfigBase (TIKA-3946).
- Add X-TIKA:embedded_id_path to ensure unique embedded file paths (TIKA-3942).
- Fix bug that prevented digests when the fallback/EmptyParser was called (TIKA-3939).
- Remove log4j 1.2.x (and slf4j-log4j12, which now redirects to slf4j-reload4j) from all modules (TIKA-3935).
- Upgrade mime4j to 0.8.9 (TIKA-3950).
- Refactor date parsing for emails (TIKA-3957).
- Upgrade to Bouncy Castle 1.71 and jdk18on jars (TIKA-3933).
- Add a JDBCPipesReporter (TIKA-3931).
- Add multivalued field strategy option in jdbc-emitter (TIKA-3930). Default is now 'concatenate' with ', ' as the delimiter.
The following people have contributed to Tika 2.7.0 by submitting or commenting on the issues resolved in this release:
- Anant Dahiya
- Anas Hammani
- Gregory Lepore
- Joseph Goh
- Julien Massiera
- Konstantin Gribov
- Tilman Hausherr
- Tim Allison
- Valery Yatsynovich
- Yury Kats
See https://s.apache.org/ys2y0 for more details on these contributions.