Package org.apache.tika.extractor
@Version("1.0.0")
package org.apache.tika.extractor
Extraction of component documents.
Class descriptions:
- For now, this is an in-memory EmbeddedDocumentBytesHandler that stores all the bytes in memory.
- Tika container extractor interface.
- Loads EmbeddedStreamTranslators via service loading.
- Interface for different document selection strategies, for purposes such as embedded document extraction by a ContainerExtractor instance.
- Factories that create EmbeddedDocumentExtractors requiring an EmbeddedDocumentBytesHandler in the ParseContext should extend this.
- Utility class to handle common issues with embedded documents.
- Tika container extractor callback interface.
- Interface for different strategies of filtering embedded streams.
- Simple pointer class that allows parsers to pass the parent content handler through to the embedded document's parse.
- An implementation of ContainerExtractor powered by the regular Parser API.
- Helper class for parsers of package archives or other compound document formats that support embedded or attached component documents.
- Recursive unpacker and text and metadata extractor.
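The descriptions above pair a container extractor with a per-document callback interface and a document-selection strategy. The Java sketch below illustrates one way those three roles can fit together. Every type and method in it (`ContainerWalkSketch`, `Doc`, `ResourceHandler`, `extract`) is hypothetical and uses only the JDK; it is not Tika's `ContainerExtractor` API, whose real signatures should be checked in the Tika Javadoc.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical, self-contained sketch of the container-extractor / callback /
 * selector pattern. None of these types are part of the Tika API.
 */
final class ContainerWalkSketch {

    /** Hypothetical in-memory model of a (possibly nested) container document. */
    static final class Doc {
        final String name;
        final String mediaType;
        final List<Doc> children = new ArrayList<>();

        Doc(String name, String mediaType, Doc... kids) {
            this.name = name;
            this.mediaType = mediaType;
            for (Doc kid : kids) {
                children.add(kid);
            }
        }
    }

    /** Plays the callback role: invoked once per selected embedded document. */
    interface ResourceHandler {
        void handle(String path, String mediaType);
    }

    /**
     * Plays the extractor role: reports each embedded document of the container
     * that the selector accepts, and descends into nested documents only when
     * recursive is true.
     */
    static void extract(Doc container, Predicate<Doc> selector,
                        boolean recursive, ResourceHandler handler) {
        walk(container, container.name, selector, recursive, handler);
    }

    private static void walk(Doc parent, String parentPath, Predicate<Doc> selector,
                             boolean recursive, ResourceHandler handler) {
        for (Doc child : parent.children) {
            String path = parentPath + "/" + child.name;
            if (selector.test(child)) {
                handler.handle(path, child.mediaType);
            }
            // A rejected document is still descended into in this sketch,
            // so selected documents inside it are still reported.
            if (recursive) {
                walk(child, path, selector, true, handler);
            }
        }
    }
}
```

For example, with a selector such as `d -> d.mediaType.startsWith("text/")` and `recursive` set to true, a ZIP holding `a.txt` and a nested ZIP holding `b.txt` and `c.png` yields `archive.zip/a.txt` and `archive.zip/inner.zip/b.txt`. Descending into rejected documents is a design choice of this sketch, not a claim about how Tika's implementations behave.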
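The first description notes that one bytes handler keeps every embedded document's bytes in memory. A minimal sketch of that idea follows, assuming embedded documents are identified by an integer id. That assumption, the class name `InMemoryBytesStoreSketch`, and its methods are all hypothetical and are not Tika API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of an in-memory store for embedded-document bytes,
 * keyed by an (assumed) integer embedded-document id. Not the Tika API.
 */
final class InMemoryBytesStoreSketch {
    // Insertion order is preserved so documents can be listed in the order seen.
    private final Map<Integer, byte[]> store = new LinkedHashMap<>();

    /** Reads the whole stream into memory; the full payload stays on the heap. */
    void add(int embeddedId, InputStream in) throws IOException {
        store.put(embeddedId, in.readAllBytes());
    }

    /** Returns a copy of the stored bytes, or null if nothing was stored for the id. */
    byte[] get(int embeddedId) {
        byte[] bytes = store.get(embeddedId);
        return bytes == null ? null : bytes.clone();
    }

    /** Number of embedded documents currently held. */
    int size() {
        return store.size();
    }

    /** Total number of bytes held across all embedded documents. */
    long totalBytes() {
        long total = 0;
        for (byte[] b : store.values()) {
            total += b.length;
        }
        return total;
    }
}
```

This approach is simple, but heap use grows with the combined size of all embedded documents; a disk-backed store would bound memory at the cost of extra I/O.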