Unzip compressed files and bypass unsupported files firstly then extract text. #5

Open
fishfree opened this issue Dec 13, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@fishfree
raise exceptions.ExtensionNotSupported(ext)
textract.exceptions.ExtensionNotSupported: The filename extension .zip is not yet supported by
textract. Please suggest this filename extension here:

    https://github.com/deanmalmgren/textract/issues

Available extensions include: .csv, .doc, .docx, .eml, .epub, .gif, .htm, .html, .jpeg, .jpg, .json, .log, .mp3, .msg, .odt, .ogg, .pdf, .png, .pptx, .ps, .psv, .rtf, .tab, .tff, .tif, .tiff, .tsv, .txt, .wav, .xls, .xlsx

@fishfree fishfree changed the title Unzip compressed files firstly then extract text. Unzip compressed files and bypass unsupported files firstly then extract text. Dec 13, 2023
@jaluoma
Owner
jaluoma commented Dec 13, 2023

Good idea! Definitely worth doing at some point (I'll leave the issue open), but I'd also be happy to accept a PR.
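For anyone picking this up, here is a rough sketch of the requested behaviour: unpack the archive into a temporary directory, skip members whose extension is not supported, and extract text from the rest. It is only an illustration, not how this project does or will do it. The function name `extract_text_from_zip` and its parameters `extract_one` and `supported_exts` are made up for this example; only the standard-library `zipfile`, `tempfile`, and `os` calls are real. Nested archives are not handled.

```python
import os
import tempfile
import zipfile


def extract_text_from_zip(zip_path, extract_one, supported_exts):
    """Unpack zip_path and run extract_one on every supported member.

    extract_one and supported_exts are hypothetical parameters for this
    sketch: a callable taking a file path, and a set of lowercase
    extensions such as {".txt", ".pdf"}.
    Returns (texts, skipped).
    """
    texts = {}    # member name -> whatever extract_one returned
    skipped = []  # members bypassed because of their extension
    with tempfile.TemporaryDirectory() as tmp_dir:
        with zipfile.ZipFile(zip_path) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                ext = os.path.splitext(info.filename)[1].lower()
                if ext not in supported_exts:
                    # Bypass instead of raising, as the issue title asks.
                    skipped.append(info.filename)
                    continue
                # extract() writes the member under tmp_dir and returns its path.
                extracted_path = archive.extract(info, path=tmp_dir)
                texts[info.filename] = extract_one(extracted_path)
    return texts, skipped
```

Assuming textract is installed, something like `extract_text_from_zip("docs.zip", textract.process, {".txt", ".pdf", ".docx"})` should work, since `textract.process` takes a file path; note that it returns bytes rather than str. Passing the extractor in as an argument keeps the sketch independent of any particular backend.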

@jaluoma jaluoma added the enhancement New feature or request label Dec 18, 2023