-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Idea: analyze serially, lint in parallel #339
Comments
Thank you for bringing this up. But, being able to shave off 200ms for a single ns sounds interesting. For reference, linting my work project (~70kloc) takes between four to five minutes. |
Testing my tentative change more extensively: I got a nice shave from 130s to 80s when linting one of my biggest projects. As agreed over Slack I'll just introduce a "linter-executor" to be optionally passed as config (not dynvar). My only hesitation would be around the thread safety of the linters themselves. Having a look they seem safe to me: they have no eval, require, etc. I could only spot a single eastwood/src/eastwood/linters/typetags.clj Line 145 in 81d9c55
...but it seems a very delimited Hope to PR soon! |
That has been my assumption too, that the linters are thread safe. |
I do not recall any of the linters being anything other than sequential, pure functions of their inputs, and I do not recall any of them using the results of others as inputs. Caveat: It has been a couple of years since I looked at them. |
I've used `pmap` over the last 12 months, continuously (in CI, and work/personal projects) observing no issues. So it seems a safe and convenient default to improve. `clojure.core/pmap` is not too bad of a choice (vs. other pmap-like functions) because its implementation caps the maximum amount of spawned threads to a number relative to CPU count. So the workloads, which are CPU-bound, will be treated as such, without possibly hogging the OS with an excess of threads.
I've used `pmap` over the last 12 months, continuously (in CI, and work/personal projects) observing no issues. So it seems a safe and convenient default to improve. `clojure.core/pmap` is not too bad of a choice (vs. other pmap-like functions) because its implementation caps the maximum amount of spawned threads to a number relative to CPU count. So the workloads, which are CPU-bound, will be treated as such, without possibly hogging the OS with an excess of threads.
Problem statement
Although https://github.com/jonase/eastwood/tree/81d9c551c345261dcb2676c78034ad9406c9d383#parallelism is a thing, I think performing AST analysis in parallel in a repl environment can be problematic: tools.analyzer performs a
clojure.core/require
as part of its evaluations, andrequire
is not thread-safe.I documented a similar previous experience here.
Proposal
So, if AST analysis cannot be safely done in parallel, the next big candidate to parallelization would be the execution of the linters themselves.
Here, we can see how the set of linters (by default: 20) is run serially:
eastwood/src/eastwood/lint.clj
Line 168 in 81d9c55
As per usual, changing
map
->pmap
is not magic (as it can easily 'overparallelize' workloads).However, running a custom
pmap
did succeed in improving the performance. Linting a single big ns (eastwood.lint
) I saw ~200 ms being shaved. Obviously, these gains could add up nicely when linting a whole project.Nuances
pmap
just like that(def ^:dynamic *linter-executor* map)
pmap
, a custompmap
etcWDYT?
The text was updated successfully, but these errors were encountered: