
Idea: analyze serially, lint in parallel #339

Closed
vemv opened this issue Mar 2, 2020 · 4 comments

Comments

@vemv (Collaborator) commented Mar 2, 2020

Problem statement

Although parallelism is already documented at https://github.com/jonase/eastwood/tree/81d9c551c345261dcb2676c78034ad9406c9d383#parallelism, I think performing AST analysis in parallel in a REPL environment can be problematic: tools.analyzer performs a clojure.core/require as part of its evaluations, and require is not thread-safe.

I documented a similar previous experience here.

Proposal

So, if AST analysis cannot safely be done in parallel, the next big candidate for parallelization is the execution of the linters themselves.

Here, we can see how the set of linters (by default: 20) is run serially:

(map (partial lint-ns* ns-sym analyze-results opts)))

As usual, simply changing map to pmap is no magic bullet (it can easily 'over-parallelize' workloads).

However, running a custom pmap did improve performance: linting a single big ns (eastwood.lint), I saw ~200 ms shaved off. These gains could add up nicely when linting a whole project.
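For reference, one way to write such a "custom pmap" might look like the following (an illustrative sketch with explicitly bounded parallelism; `bounded-pmap` is a hypothetical name, not Eastwood's actual implementation):

```clojure
;; Illustrative "custom pmap": run at most n tasks concurrently,
;; processing the input in chunks of size n.
;; (Hypothetical helper; not Eastwood's actual implementation.)
(defn bounded-pmap [n f coll]
  (->> coll
       (partition-all n)                      ; split work into chunks of size n
       (mapcat (fn [chunk]
                 (->> chunk
                      (mapv #(future (f %))) ; start the chunk's tasks in parallel
                      (mapv deref))))))      ; wait for the chunk to finish

;; e.g. (bounded-pmap 4 run-one-linter linters) would run at most
;; four linters concurrently.
```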

Nuances

  • I assume that the 20 linters are pure functions and don't perform further AST evaluations
    • i.e. thread-safety is a prerequisite
  • Learning from recent experiences, I'd be careful about introducing pmap just like that
    • idea: (def ^:dynamic *linter-executor* map)
      • i.e. change nothing, allow consumers to bring pmap, a custom pmap etc
      • allowing third-party consumers to test this out properly, particularly on CI, etc.
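The dynamic-var idea above could be sketched roughly like so (hypothetical names; `run-linters` is not an actual Eastwood function):

```clojure
;; Sketch of the dynamic-var idea: change nothing by default,
;; but let consumers bring pmap, a custom pmap, etc.
(def ^:dynamic *linter-executor*
  "Map-like function used to run the linters. Defaults to the serial
  clojure.core/map; rebind to pmap (or a custom pmap) to parallelize."
  map)

;; Hypothetical call site, mirroring the serial `map` shown above:
(defn run-linters [linters analyze-results]
  (doall (*linter-executor* (fn [linter] (linter analyze-results)) linters)))

;; Consumers opt in explicitly:
;; (binding [*linter-executor* pmap]
;;   (run-linters linters analyze-results))
```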

WDYT?

@slipset (Collaborator) commented Mar 2, 2020

Thank you for bringing this up.
Running the linters in parallel was my first idea for speeding up Eastwood, but my analysis showed that the time spent analyzing was so much greater than the time spent linting that I chose to focus on analysis instead.

But, being able to shave off 200ms for a single ns sounds interesting.

For reference, linting my work project (~70 kloc) takes between four and five minutes.

@vemv (Collaborator, Author) commented Mar 4, 2020

Testing my tentative change more extensively, I got a nice improvement from 130 s to 80 s when linting one of my biggest projects.

As agreed over Slack I'll just introduce a "linter-executor" to be optionally passed as config (not dynvar).

My only hesitation is around the thread safety of the linters themselves. Having had a look, they seem safe to me: no require, in-ns, etc. I could only spot a single eval:

(replace-variable-tag-part (eval tag))

...but it seems a very delimited eval. Does that sound correct?

Hope to PR soon!

@slipset (Collaborator) commented Mar 4, 2020

That has been my assumption too: that the linters are thread-safe.
The only thing I would worry about is output, but if I'm not mistaken the code first collects data and then outputs it, so that should not be a problem.

@jafingerhut (Collaborator) commented

I do not recall any of the linters being anything other than sequential, pure functions of their inputs, and I do not recall any of them using the results of others as inputs. Caveat: It has been a couple of years since I looked at them.

vemv added a commit to reducecombine/eastwood that referenced this issue Mar 4, 2020
@vemv vemv mentioned this issue Mar 4, 2020
slipset pushed a commit that referenced this issue Mar 15, 2020
vemv added a commit to reducecombine/eastwood that referenced this issue Apr 1, 2021
I've used `pmap` continuously over the last 12 months (in CI, and in work/personal projects), observing no issues.

So it seems a safe and convenient default.

`clojure.core/pmap` is not a bad choice (vs. other pmap-like functions) because its implementation caps the maximum number of spawned threads relative to the CPU count.

So the workloads, which are CPU-bound, will be treated as such, without hogging the OS with an excess of threads.
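That cap comes straight from `pmap`'s implementation in clojure.core, which keeps roughly `(+ 2 available-processors)` computations in flight. A quick illustrative REPL check (`busy` is just a toy CPU-bound workload):

```clojure
;; pmap's window of in-flight computations is tied to the CPU count:
;; clojure.core's implementation uses (+ 2 available-processors).
(def in-flight-cap
  (+ 2 (.availableProcessors (Runtime/getRuntime))))

;; A toy CPU-bound workload; pmap parallelizes it without spawning
;; an unbounded number of threads, and yields the same results as map:
(defn busy [n] (reduce + (range n)))

(time (doall (map  busy (repeat 8 1000000))))  ; serial baseline
(time (doall (pmap busy (repeat 8 1000000))))  ; parallel, thread count capped
```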
@slipset slipset closed this as completed in 53318c8 Apr 1, 2021