Add upstream changes and clean up where possible #42
Merged
38 commits
8bc4b89 Comment out sections of tables that weren't used to save memory. (dan-blanchard)
20cad49 Add 3.4 to list for Travis testing and remove 3.2 (dan-blanchard)
44752d7 Bunch of little clean up things (dan-blanchard)
7251430 Merge branch 'master' into feature/upstream-changes-and-overhaul (dan-blanchard)
0ecaf05 Add if __name__... to test.py and a break to speed things up in loop. (dan-blanchard)
9b8b12c Modernize testings (dan-blanchard)
3e13cc7 Fix missing req_path in setup.py (dan-blanchard)
59f30b7 Simplify Travis setup and just use pip. conda was overkill for our s… (dan-blanchard)
a5c7484 Make tests slightly more efficient. (dan-blanchard)
07a5849 Merge branch 'master' into feature/upstream-changes-and-overhaul (dan-blanchard)
267c5d8 Switch to new Travis docker VMs and add PyPy testing. (dan-blanchard)
c665459 Add C-equivalent implementation of filter_english_letters. (dan-blanchard)
7cfa45c Fix some pylint warnings in universaldetector.py (dan-blanchard)
125575f Made latin1 equivalent to windows-1252 when running unit tests. (dan-blanchard)
d9c42c7 A bunch of little clean up changes. (dan-blanchard)
04398ff Comment out pypy line in .travis.yml. It's 10x slower, which is ridi… (dan-blanchard)
475ffa6 Re-enable PyPy on Travis, but disable coverage for it (dan-blanchard)
2eae0d6 Fix syntax error in .travis.yml (dan-blanchard)
b45c331 Fix coverage logic reversal in .travis.yml (dan-blanchard)
b382f22 Fix TypeError on PyPy in utf8prober.py (dan-blanchard)
be09612 Switch to using enums instead of constants, and a bunch of cleanup st… (dan-blanchard)
431bd39 Get rid of set literal to appease Python 2.6 (dan-blanchard)
3fb82c9 Some minor PEP8 name changes (dan-blanchard)
6058456 Merge branch 'master' into feature/upstream-changes-and-overhaul (dan-blanchard)
bd9951f Loads of PEP8 naming convention fixes. (dan-blanchard)
4317be7 Fix some NOTES.rst formatting issues (dan-blanchard)
01e82e3 Update MANIFEST.in to include test files and docs (dan-blanchard)
b8f8b24 Remove PyCharm stuff from .gitignore (dan-blanchard)
50f701c Remove flake8: noqa lines. (dan-blanchard)
e42c4d1 Add missing __version__ import to __init__.py (dan-blanchard)
0913a91 Remove unnecessary import sys import from conf.py (dan-blanchard)
c7f01c1 Switch to using pip for installation in .travis.yml (dan-blanchard)
1e0f1a5 Rename SMState to MachineState (dan-blanchard)
4a8084d Get rid of messy ternary operator in charsetprober.py (dan-blanchard)
5449248 Fix __version typo in __init__.py (dan-blanchard)
369875d Add comment about why we're slicing in filter_with_english_letters (dan-blanchard)
8e3fc03 Made more attributes public. (dan-blanchard)
da6c0a0 Temporarily disable Hungarian probers, and update missing encodings list (dan-blanchard)
Modernize testings
- Switch to using nose
- Calculate coverage with coveralls
- Use conda for setting up Python since Travis is flakey about that
commit 9b8b12c61f24100509ea0d2d06776a3eb55d7c8a
@@ -0,0 +1,9 @@
+[run]
+source = chardet
+omit =
+    */python?.?/*
+    */lib-python/?.?/*.py
+    */lib_pypy/_*.py
+    */site-packages/ordereddict.py
+    */site-packages/nose/*
+    */unittest2/*
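
As a rough illustration of which files the `omit` globs above would exclude from the coverage report, the sketch below approximates them with Python's standard-library `fnmatch`. This is an assumption for illustration only: coverage.py applies its own matcher, whose details may differ, and both the `is_omitted` helper and the sample paths are hypothetical.

```python
from fnmatch import fnmatch

# Patterns copied from the omit list in the coverage configuration above.
OMIT_PATTERNS = [
    '*/python?.?/*',
    '*/lib-python/?.?/*.py',
    '*/lib_pypy/_*.py',
    '*/site-packages/ordereddict.py',
    '*/site-packages/nose/*',
    '*/unittest2/*',
]


def is_omitted(path):
    """Return True if path matches any omit pattern (hypothetical helper)."""
    return any(fnmatch(path, pattern) for pattern in OMIT_PATTERNS)


# Standard-library and third-party files match; project files do not.
print(is_omitted('/usr/lib/python2.7/os.py'))
print(is_omitted('chardet/universaldetector.py'))
```

The intent of such a list is to keep interpreter and test-tool internals out of the numbers so that only the `chardet` package itself is measured.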
@@ -0,0 +1 @@
+nose
@@ -1,56 +1,47 @@
 """
 Run chardet on a bunch of documents and see that we get the correct encodings.
 
 :author: Dan Blanchard
 :author: Ian Cordasco
 """
 
 from __future__ import with_statement
 
-import os
-import sys
-import unittest
+from os import listdir
+from os.path import dirname, isdir, join, realpath, relpath, splitext
 
+from nose.tools import eq_
+
 import chardet
 
 
-class TestCase(unittest.TestCase):
-    def __init__(self, file_name, encoding):
-        unittest.TestCase.__init__(self)
-        self.file_name = file_name
-        encoding = encoding.lower()
-        for postfix in ['-arabic',
-                        '-bulgarian',
-                        '-cyrillic',
-                        '-greek',
-                        '-hebrew',
-                        '-hungarian',
-                        '-turkish']:
-            if encoding.endswith(postfix):
-                encoding, _, _ = encoding.rpartition(postfix)
-                break
-        self.encoding = encoding
-
-    def runTest(self):
-        with open(self.file_name, 'rb') as f:
-            result = chardet.detect(f.read())
-            self.assertEqual(result['encoding'].lower(), self.encoding,
-                             "Expected %s, but got %s in %s" %
-                             (self.encoding, result['encoding'],
-                              self.file_name))
-
-
-def main():
-    suite = unittest.TestSuite()
-    if len(sys.argv) > 1:
-        base_path = sys.argv[1]
-    else:
-        base_path = os.path.join(
-            os.path.dirname(os.path.abspath(__file__)), 'tests')
-    for encoding in os.listdir(base_path):
-        path = os.path.join(base_path, encoding)
-        if not os.path.isdir(path):
+def check_file_encoding(file_name, encoding):
+    """ Ensure that we detect the encoding for file_name correctly. """
+    encoding = encoding.lower()
+    for postfix in ['-arabic', '-bulgarian', '-cyrillic', '-greek', '-hebrew',
+                    '-hungarian', '-turkish']:
+        if encoding.endswith(postfix):
+            encoding = encoding.rpartition(postfix)[0]
+            break
+
+    with open(file_name, 'rb') as f:
+        result = chardet.detect(f.read())
+    eq_(result['encoding'].lower(), encoding, ("Expected %s, but got %s for "
+                                                "%s" % (encoding,
+                                                        result['encoding'],
+                                                        file_name)))
+
+
+def test_encoding_detection():
+    base_path = relpath(join(dirname(realpath(__file__)), 'tests'))
+    for encoding in listdir(base_path):
+        path = join(base_path, encoding)
+        if not isdir(path):
             continue
-        for file_name in os.listdir(path):
-            _, ext = os.path.splitext(file_name)
+        for file_name in listdir(path):
+            ext = splitext(file_name)[1].lower()
             if ext not in ['.html', '.txt', '.xml', '.srt']:
                 continue
-            suite.addTest(TestCase(os.path.join(path, file_name), encoding))
-    unittest.TextTestRunner().run(suite)
-
-
-if __name__ == '__main__':
-    main()
+            yield check_file_encoding, join(path, file_name), encoding
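
The new test is data-driven: each directory name under `tests` doubles as the expected encoding, with an optional language suffix stripped off. Below is a minimal, runner-independent sketch of that discovery logic so it can be tried without nose or chardet. The names `expected_encoding`, `iter_cases`, `LANGUAGE_SUFFIXES` and `TEXT_EXTENSIONS` are hypothetical; the diff above inlines this logic rather than factoring it out.

```python
import os

# Hypothetical constants mirroring the literals used in the diff above.
LANGUAGE_SUFFIXES = ('-arabic', '-bulgarian', '-cyrillic', '-greek',
                     '-hebrew', '-hungarian', '-turkish')
TEXT_EXTENSIONS = ('.html', '.txt', '.xml', '.srt')


def expected_encoding(dir_name):
    """Map a test directory name to the encoding the detector should report."""
    encoding = dir_name.lower()
    for suffix in LANGUAGE_SUFFIXES:
        if encoding.endswith(suffix):
            # Keep only the part before the trailing language suffix.
            return encoding.rpartition(suffix)[0]
    return encoding


def iter_cases(base_path):
    """Yield (file_path, expected_encoding) for every sample file found."""
    for dir_name in sorted(os.listdir(base_path)):
        path = os.path.join(base_path, dir_name)
        if not os.path.isdir(path):
            continue
        for file_name in sorted(os.listdir(path)):
            ext = os.path.splitext(file_name)[1].lower()
            if ext in TEXT_EXTENSIONS:
                yield (os.path.join(path, file_name),
                       expected_encoding(dir_name))
```

A generator test like the one in the diff would then yield one check per pair, so each sample file shows up as its own pass or failure instead of one combined result.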
What about `pip install .`?
Any reason to prefer that over `python setup.py install` if we're using setuptools?
Is that not how almost every one of our users will be installing chardet?
Hmm... good point. 👍