This PR implements `_suggest_batch` instead of single-document `_suggest` in the SVC backend, allowing the use of more efficient vector operations.

I tested this with the 20 Newsgroups data set example, as shown on the wiki page of the SVC backend. Evaluation on the test set is 20-30% faster, with the results otherwise unchanged.
There is one special case that requires attention: what to do when the input to the SVC classifier is empty or near-empty. The old `_suggest` code handled this as a special case: it short-circuited the classifier and returned an empty result. There is also a unit test for this, using the special input text "j", which is an unknown token and thus equivalent to empty input. The SVC model has no problem returning some class (maybe the majority class?) in this case, but I reimplemented the empty-input check to retain the old behaviour, which refuses to classify such input, and to keep the unit test happy.

With 1 job

With 4 jobs
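
For reviewers, the empty-input handling described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this PR: `VOCAB`, `LABELS`, `vectorize`, `classify_batch` and this `suggest_batch` are hypothetical stand-ins for the fitted vectorizer, the subject labels, and the batched SVC call. The idea is to vectorize the whole batch, call the classifier once on the non-empty rows only, and return an empty result for every row that contains no known tokens.

```python
from typing import Dict, List

# Hypothetical toy vocabulary and labels; a real backend would use a
# fitted vectorizer and the project's own subject vocabulary.
VOCAB = {"graphics": 0, "image": 1, "hockey": 2, "goal": 3}
LABELS = ["comp.graphics", "rec.sport.hockey"]


def vectorize(text: str) -> List[int]:
    """Count known tokens; unknown tokens such as "j" contribute nothing."""
    vec = [0] * len(VOCAB)
    for token in text.lower().split():
        if token in VOCAB:
            vec[VOCAB[token]] += 1
    return vec


def classify_batch(vectors: List[List[int]]) -> List[Dict[str, float]]:
    """Stand-in for a single batched classifier call (one score dict per row)."""
    return [
        {LABELS[0]: float(vec[0] + vec[1]), LABELS[1]: float(vec[2] + vec[3])}
        for vec in vectors
    ]


def suggest_batch(texts: List[str]) -> List[Dict[str, float]]:
    """Return one result per input text; empty inputs get an empty result."""
    vectors = [vectorize(text) for text in texts]
    # A row with no known tokens is treated as empty input.
    nonempty = [i for i, vec in enumerate(vectors) if any(vec)]
    # The classifier is called once, and only on the non-empty rows.
    scores = classify_batch([vectors[i] for i in nonempty]) if nonempty else []
    results: List[Dict[str, float]] = [{} for _ in texts]
    for i, row in zip(nonempty, scores):
        results[i] = row
    return results


if __name__ == "__main__":
    # "j" is not in the vocabulary, so it behaves like empty input.
    print(suggest_batch(["image graphics", "j", ""]))
```

Filtering before the classifier call, rather than discarding its output afterwards, keeps the batch call free of all-zero rows and makes the "refuse to classify" behaviour explicit.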
Fixes #667