Apache Tika

Apache Tika 1.24

The most notable changes in Tika 1.24 over the previous release are:

Upgrade Drew Noakes' metadata-extractor to 2.13.0 (TIKA-2952).
Enable optional extraction of structural tags in PDFs (alpha-grade) (TIKA-3026).
Tika app's --extract mode now outputs to STDOUT (TIKA-3035).
Add an optional Preflight parser for PDFs (TIKA-3055).
Improve detection of some zip-based formats (TIKA-3057).
Upgrade POI to 4.1.2 (TIKA-3047).
Extract XMP from PSD files (TIKA-3050).
Add XMLProfiler as an optional parser to profile XFA and XMP in PDFs (TIKA-3045).
Extract inline images that rely on the DCT filter from PDFs (TIKA-3041).
Upgrade PDFBox to 2.0.19 (TIKA-3033).
Fix bug in ASM parser configuration (TIKA-2992).
Upgrade java-libpst to 0.9.3 (TIKA-2546).
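
For readers unfamiliar with what detecting a zip-based format involves (see TIKA-3057 above), the general idea can be sketched with the JDK alone: check for the zip signature, then look at the entry names inside the container. This is only an illustration under stated assumptions, not Tika's detector. The class name ZipSniffer, its methods, and the returned labels are all hypothetical, and the handful of marker entries it checks is far from complete.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; this is NOT how Tika implements detection.
public class ZipSniffer {

    // True if the data starts with the zip local-file-header signature "PK\3\4".
    // An empty archive starts with a different signature and is not matched here.
    static boolean looksLikeZip(byte[] data) {
        return data.length >= 4
                && data[0] == 'P' && data[1] == 'K'
                && data[2] == 3 && data[3] == 4;
    }

    // Returns a coarse, made-up label based on well-known marker entries.
    static String sniff(byte[] data) {
        if (!looksLikeZip(data)) {
            return "not-zip";
        }
        Set<String> names = new HashSet<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (names.contains("[Content_Types].xml")) {
            // OOXML packages carry a content-types part at the root.
            if (names.contains("word/document.xml")) return "ooxml-word";
            if (names.contains("xl/workbook.xml")) return "ooxml-spreadsheet";
            return "ooxml";
        }
        if (names.contains("META-INF/MANIFEST.MF")) return "jar";
        if (names.contains("mimetype")) return "mimetype-declared"; // e.g. ODF or EPUB
        return "zip";
    }

    // Test helper: builds an in-memory zip with one small entry per name.
    static byte[] buildZip(String... entryNames) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bos)) {
            for (String name : entryNames) {
                zout.putNextEntry(new ZipEntry(name));
                zout.write("x".getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(sniff(buildZip("[Content_Types].xml", "word/document.xml")));
        System.out.println(sniff(buildZip("META-INF/MANIFEST.MF", "Foo.class")));
        System.out.println(sniff("plain text".getBytes(StandardCharsets.UTF_8)));
    }
}
```

A real detector has to cope with much more than this, such as truncated or streamed input, archives whose marker entries appear late, and many more container formats, which is why improvements in this area are worth calling out in a release.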

The following people have contributed to Tika 1.24 by submitting or commenting on the issues resolved in this release:

Aman Mishra
Arvind Jain
Carina Antunes
Clark Perkins
David Eric Pugh
David Pilato
Don
Jan Vlug
Jorge Spinsanti
Luís Filipe Nassif
Markus Mandalka
Michael Moritz
MRIT64
Nick Burch
Richard Jones
Soren Daugaard
Steve
Syed Osama Anwer
Tilman Hausherr
Tim Allison
Zoltan Farago

See https://s.apache.org/xa01p for more details on these contributions.