Package org.apache.tika.eval.app.tools
package org.apache.tika.eval.app.tools
Class    Description
Utility class that runs TopCommonTokenCounter against a directory of table files (named {lang}_table.gz or Leipzig-like afr_...
COPIED VERBATIM FROM LUCENE. This class forces a composite reader (e.g. a MultiReader or DirectoryReader) to emulate a LeafReader.
Utility class that reads in a UTF-8 input file with one document per row and outputs the 20000 tokens with the highest document frequencies.
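The last description turns on document frequency: the number of documents (rows) that contain a token, not the total number of times it occurs. The sketch below illustrates that counting on an in-memory list of rows. It is not the Tika implementation. The class name DocFreqSketch, the lowercasing, the whitespace tokenization, and the alphabetical tie-break are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of org.apache.tika.eval.app.tools.
final class DocFreqSketch {

    // Returns up to n tokens ordered by descending document frequency.
    // Each element of docs stands for one document (one row of the input file).
    static List<String> topTokens(List<String> docs, int n) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            // Collect into a set so a token repeated within one document counts once.
            Set<String> seen = new HashSet<>();
            // Assumption: lowercase and split on whitespace; a real analyzer may differ.
            for (String token : doc.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!token.isEmpty()) {
                    seen.add(token);
                }
            }
            for (String token : seen) {
                docFreq.merge(token, 1, Integer::sum);
            }
        }
        List<String> tokens = new ArrayList<>(docFreq.keySet());
        // Highest document frequency first; ties broken alphabetically (an assumption).
        tokens.sort((a, b) -> {
            int byFreq = Integer.compare(docFreq.get(b), docFreq.get(a));
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        return new ArrayList<>(tokens.subList(0, Math.min(n, tokens.size())));
    }
}
```

Calling DocFreqSketch.topTokens(rows, 20000) on the rows of a file would mirror the cutoff named in the description, under the tokenization assumptions stated above.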