Add upstream changes and clean up where possible #42

Merged: 38 commits, Jan 9, 2015
8bc4b89
Comment out sections of tables that weren't used to save memory.
dan-blanchard Oct 11, 2014
20cad49
Add 3.4 to list for Travis testing and remove 3.2
dan-blanchard Oct 11, 2014
44752d7
Bunch of little clean up things
dan-blanchard Dec 1, 2014
7251430
Merge branch 'master' into feature/upstream-changes-and-overhaul
dan-blanchard Dec 2, 2014
0ecaf05
Add if __name__... to test.py and a break to speed things up in loop.
dan-blanchard Dec 2, 2014
9b8b12c
Modernize testings
dan-blanchard Dec 2, 2014
3e13cc7
Fix missing req_path in setup.py
dan-blanchard Dec 2, 2014
59f30b7
Simplify Travis setup and just use pip. conda was overkill for our s…
dan-blanchard Dec 2, 2014
a5c7484
Make tests slightly more efficient.
dan-blanchard Dec 2, 2014
07a5849
Merge branch 'master' into feature/upstream-changes-and-overhaul
dan-blanchard Dec 21, 2014
267c5d8
Switch to new Travis docker VMs and add PyPy testing.
dan-blanchard Dec 29, 2014
c665459
Add C-equivalent implementation of filter_english_letters.
dan-blanchard Dec 30, 2014
7cfa45c
Fix some pylint warnings in universaldetector.py
dan-blanchard Dec 30, 2014
125575f
Made latin1 equivalent to windows-1252 when running unit tests.
dan-blanchard Dec 30, 2014
d9c42c7
A bunch of little clean up changes.
dan-blanchard Dec 30, 2014
04398ff
Comment out pypy line in .travis.yml. It's 10x slower, which is ridi…
dan-blanchard Dec 30, 2014
475ffa6
Re-enable PyPy on Travis, but disable coverage for it
dan-blanchard Dec 30, 2014
2eae0d6
Fix syntax error in .travis.yml
dan-blanchard Dec 30, 2014
b45c331
Fix coverage logic reversal in .travis.yml
dan-blanchard Dec 30, 2014
b382f22
Fix TypeError on PyPy in utf8prober.py
dan-blanchard Dec 30, 2014
be09612
Switch to using enums instead of constants, and a bunch of cleanup st…
dan-blanchard Jan 2, 2015
431bd39
Get rid of set literal to appease Python 2.6
dan-blanchard Jan 2, 2015
3fb82c9
Some minor PEP8 name changes
dan-blanchard Jan 5, 2015
6058456
Merge branch 'master' into feature/upstream-changes-and-overhaul
dan-blanchard Jan 5, 2015
bd9951f
Loads of PEP8 naming convention fixes.
dan-blanchard Jan 5, 2015
4317be7
Fix some NOTES.rst formatting issues
dan-blanchard Jan 6, 2015
01e82e3
Update MANIFEST.in to include test files and docs
dan-blanchard Jan 6, 2015
b8f8b24
Remove PyCharm stuff from .gitignore
dan-blanchard Jan 6, 2015
50f701c
Remove flake8: noqa lines.
dan-blanchard Jan 6, 2015
e42c4d1
Add missing __version__ import to __init__.py
dan-blanchard Jan 6, 2015
0913a91
Remove unnecessary import sys import from conf.py
dan-blanchard Jan 6, 2015
c7f01c1
Switch to using pip for installation in .travis.yml
dan-blanchard Jan 6, 2015
1e0f1a5
Rename SMState to MachineState
dan-blanchard Jan 6, 2015
4a8084d
Get rid of messy ternary operator in charsetprober.py
dan-blanchard Jan 6, 2015
5449248
Fix __version typo in __init__.py
dan-blanchard Jan 6, 2015
369875d
Add comment about why we're slicing in filter_with_english_letters
dan-blanchard Jan 6, 2015
8e3fc03
Made more attributes public.
dan-blanchard Jan 6, 2015
da6c0a0
Temporarily disable Hungarian probers, and update missing encodings list
dan-blanchard Jan 7, 2015
Modernize testings
- Switch to using nose
- Calculate coverage with coveralls
- Use conda for setting up Python since Travis is flakey about that
dan-blanchard committed Dec 2, 2014
commit 9b8b12c61f24100509ea0d2d06776a3eb55d7c8a
9 changes: 9 additions & 0 deletions .coveragerc
@@ -0,0 +1,9 @@
+[run]
+source = chardet
+omit =
+    */python?.?/*
+    */lib-python/?.?/*.py
+    */lib_pypy/_*.py
+    */site-packages/ordereddict.py
+    */site-packages/nose/*
+    */unittest2/*
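
The omit entries above are shell-style wildcard patterns. As a rough illustration of which paths they would exclude, here is a sketch that approximates the matching with the standard library's fnmatch module. This is an assumption for illustration only: coverage.py uses its own matcher, whose treatment of `*` and path separators may differ between versions, and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the omit list in .coveragerc.
OMIT_PATTERNS = [
    '*/python?.?/*',
    '*/lib-python/?.?/*.py',
    '*/lib_pypy/_*.py',
    '*/site-packages/ordereddict.py',
    '*/site-packages/nose/*',
    '*/unittest2/*',
]


def is_omitted(path, patterns=OMIT_PATTERNS):
    """Return True if path matches any omit pattern (fnmatch approximation)."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


if __name__ == '__main__':
    # Hypothetical paths, chosen only to exercise the patterns.
    for path in ['/usr/lib/python2.7/os.py',
                 '/venv/site-packages/nose/core.py',
                 'chardet/universaldetector.py']:
        print(path, is_omitted(path))
```

Under this approximation, standard library and nose files are excluded while files in the chardet package itself are still measured.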
32 changes: 29 additions & 3 deletions .travis.yml
@@ -5,8 +5,34 @@ python:
   - 3.3
   - 3.4

-script: python test.py
+before_install:
+  - wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh
+  - chmod +x miniconda.sh
+  - ./miniconda.sh -b
+  - export PATH=/home/travis/miniconda/bin:$PATH
+  - conda config --add channels https://conda.binstar.org/dan_blanchard
+  - conda update --yes conda
+
+install:
+  # Setup desired Python in conda environments with python-coveralls
+  - conda install --yes pip python=$TRAVIS_PYTHON_VERSION python-coveralls
+  # Have to use pip for nose-cov because its entry points are not supported by conda yet
+  - pip install nose-cov
+  # Multiprocessing fix for Travis
+  - sudo rm -rf /dev/shm
+  - sudo ln -s /run/shm /dev/shm
+  # Actually install chardet
+  - python setup.py install
Member: What about pip install .?

Member Author: Any reason to prefer that over python setup.py install if we're using setuptools?

Member: Is that not how almost every one of our users will be installing chardet?

Member Author: Hmm... good point. 👍


+# Run test
+script:
+  - nosetests -v --with-cov --cov skll --cov-config .coveragerc --logging-level=DEBUG
+
+# Calculate coverage
+after_success:
+  - coveralls --config_file .coveragerc

 notifications:
-  on_success: change
-  on_failure: always
+  email:
+    on_success: change
+    on_failure: always
1 change: 1 addition & 0 deletions requirements.txt
@@ -0,0 +1 @@
+nose
17 changes: 12 additions & 5 deletions setup.py
@@ -9,20 +9,25 @@ def readme():
         return f.read()


+def requirements():
+    with open(req_path) as f:
+        reqs = f.read().splitlines()
+    return reqs
+
 setup(name='chardet',
       version=__version__,
       description='Universal encoding detector for Python 2 and 3',
       long_description=readme(),
       author='Mark Pilgrim',
       author_email='mark@diveintomark.org',
-      maintainer='Ian Cordasco',
-      maintainer_email='graffatcolmingov@gmail.com',
+      maintainer='Daniel Blanchard',
+      maintainer_email='dblanchard@ets.org',
       url='https://github.com/chardet/chardet',
       license="LGPL",
       keywords=['encoding', 'i18n', 'xml'],
       classifiers=["Development Status :: 4 - Beta",
                    "Intended Audience :: Developers",
-                   ("License :: OSI Approved :: GNU Library or Lesser General" +
+                   ("License :: OSI Approved :: GNU Library or Lesser General"
                     " Public License (LGPL)"),
                    "Operating System :: OS Independent",
                    "Programming Language :: Python",
@@ -32,8 +37,10 @@ def readme():
                    'Programming Language :: Python :: 3',
                    'Programming Language :: Python :: 3.2',
                    'Programming Language :: Python :: 3.3',
-                   ("Topic :: Software Development :: Libraries :: Python " +
+                   ("Topic :: Software Development :: Libraries :: Python "
                     "Modules"),
                    "Topic :: Text Processing :: Linguistic"],
       packages=['chardet'],
-      entry_points={'console_scripts': ['chardetect = chardet.chardetect:main']})
+      install_requires=requirements(),
+      entry_points={'console_scripts':
+                    ['chardetect = chardet.chardetect:main']})
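
The requirements() helper added in this diff feeds the lines of a requirements file into install_requires. The diff itself does not show where req_path is defined (a later commit in this PR is titled "Fix missing req_path in setup.py"), so the standalone sketch below takes the path as a parameter. The function name read_requirements, the demo helper, and the filtering of blank and comment lines are assumptions added for illustration; the helper in the diff returns every line unfiltered.

```python
import os
import tempfile


def read_requirements(req_path):
    """Return the non-empty, non-comment lines of a requirements file."""
    with open(req_path) as f:
        lines = f.read().splitlines()
    # Filtering is an illustrative addition; the diff returns all lines.
    return [line.strip() for line in lines
            if line.strip() and not line.strip().startswith('#')]


def demo():
    """Write a throwaway requirements file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        req_path = os.path.join(tmp, 'requirements.txt')
        with open(req_path, 'w') as f:
            f.write('# test dependencies\nnose\n\n')
        return read_requirements(req_path)


if __name__ == '__main__':
    print(demo())
```

In a setup.py, a path built from the directory of the setup script (for example with os.path.dirname and os.path.abspath on __file__) would likely be more robust than a path relative to the current working directory.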
81 changes: 36 additions & 45 deletions test.py
@@ -1,56 +1,47 @@
 """
 Run chardet on a bunch of documents and see that we get the correct encodings.

+:author: Dan Blanchard
 :author: Ian Cordasco
 """

 from __future__ import with_statement

-import os
-import sys
-import unittest
+from os import listdir
+from os.path import dirname, isdir, join, realpath, relpath, splitext
+
+from nose.tools import eq_

 import chardet


-class TestCase(unittest.TestCase):
-    def __init__(self, file_name, encoding):
-        unittest.TestCase.__init__(self)
-        self.file_name = file_name
-        encoding = encoding.lower()
-        for postfix in ['-arabic',
-                        '-bulgarian',
-                        '-cyrillic',
-                        '-greek',
-                        '-hebrew',
-                        '-hungarian',
-                        '-turkish']:
-            if encoding.endswith(postfix):
-                encoding, _, _ = encoding.rpartition(postfix)
-                break
-        self.encoding = encoding
-
-    def runTest(self):
-        with open(self.file_name, 'rb') as f:
-            result = chardet.detect(f.read())
-        self.assertEqual(result['encoding'].lower(), self.encoding,
-                         "Expected %s, but got %s in %s" %
-                         (self.encoding, result['encoding'],
-                          self.file_name))
-
-
-def main():
-    suite = unittest.TestSuite()
-    if len(sys.argv) > 1:
-        base_path = sys.argv[1]
-    else:
-        base_path = os.path.join(
-            os.path.dirname(os.path.abspath(__file__)), 'tests')
-    for encoding in os.listdir(base_path):
-        path = os.path.join(base_path, encoding)
-        if not os.path.isdir(path):
+def check_file_encoding(file_name, encoding):
+    """ Ensure that we detect the encoding for file_name correctly. """
+    encoding = encoding.lower()
+    for postfix in ['-arabic', '-bulgarian', '-cyrillic', '-greek', '-hebrew',
+                    '-hungarian', '-turkish']:
+        if encoding.endswith(postfix):
+            encoding = encoding.rpartition(postfix)[0]
+            break
+
+    with open(file_name, 'rb') as f:
+        result = chardet.detect(f.read())
+    eq_(result['encoding'].lower(), encoding, ("Expected %s, but got %s for "
+                                               "%s" % (encoding,
+                                                       result['encoding'],
+                                                       file_name)))
+
+
+def test_encoding_detection():
+    base_path = relpath(join(dirname(realpath(__file__)), 'tests'))
+    for encoding in listdir(base_path):
+        path = join(base_path, encoding)
+        if not isdir(path):
             continue
-        for file_name in os.listdir(path):
-            _, ext = os.path.splitext(file_name)
+        for file_name in listdir(path):
+            ext = splitext(file_name)[1].lower()
             if ext not in ['.html', '.txt', '.xml', '.srt']:
                 continue
-            suite.addTest(TestCase(os.path.join(path, file_name), encoding))
-    unittest.TextTestRunner().run(suite)
-
-
-if __name__ == '__main__':
-    main()
+            yield check_file_encoding, join(path, file_name), encoding
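
The new test.py relies on two ideas: stripping a language suffix from the directory name to get the expected encoding, and a nose-style test generator that yields a check function plus its arguments for each case. The sketch below shows both without depending on nose or chardet. The names strip_language_suffix, check_expected, generate_checks and run_generated, and the example directory names, are hypothetical; run_generated is only a stand-in for what a test runner would do with the yielded tuples.

```python
LANGUAGE_SUFFIXES = ['-arabic', '-bulgarian', '-cyrillic', '-greek',
                     '-hebrew', '-hungarian', '-turkish']


def strip_language_suffix(dir_name):
    """Lower-case a test directory name and drop a trailing language tag."""
    encoding = dir_name.lower()
    for postfix in LANGUAGE_SUFFIXES:
        if encoding.endswith(postfix):
            # rpartition splits at the last occurrence; [0] is the prefix.
            encoding = encoding.rpartition(postfix)[0]
            break
    return encoding


def check_expected(dir_name, expected):
    """Check function called once per generated case."""
    actual = strip_language_suffix(dir_name)
    assert actual == expected, "Expected %s, but got %s" % (expected, actual)


def generate_checks():
    """Yield (function, arg, ...) tuples, as a nose test generator does."""
    cases = [('windows-1255-hebrew', 'windows-1255'),
             ('ISO-8859-2-hungarian', 'iso-8859-2'),
             ('utf-8', 'utf-8')]
    for dir_name, expected in cases:
        yield check_expected, dir_name, expected


def run_generated(generator):
    """Minimal stand-in for a runner: call each yielded check, count them."""
    count = 0
    for item in generator:
        func, args = item[0], item[1:]
        func(*args)
        count += 1
    return count


if __name__ == '__main__':
    print(run_generated(generate_checks()))
```

Each yielded tuple is reported as a separate test by a runner that supports this pattern, so one failing file does not hide the results for the others.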