
Idea: analyze serially, lint in parallel #339

Closed
vemv opened this issue Mar 2, 2020 · 4 comments

Comments

@vemv (Collaborator) commented Mar 2, 2020

Problem statement

Although parallelism is already documented at https://github.com/jonase/eastwood/tree/81d9c551c345261dcb2676c78034ad9406c9d383#parallelism, I think performing AST analysis in parallel in a REPL environment can be problematic: tools.analyzer performs a clojure.core/require as part of its evaluations, and require is not thread-safe.

I documented a similar previous experience here.

Proposal

So, if AST analysis cannot safely be done in parallel, the next big candidate for parallelization is the execution of the linters themselves.

Here, we can see how the set of linters (by default: 20) is run serially:

(map (partial lint-ns* ns-sym analyze-results opts)))

As usual, simply changing map to pmap is no magic bullet (it can easily 'over-parallelize' workloads).

However, running a custom pmap did improve performance: linting a single big ns (eastwood.lint), I saw ~200 ms shaved off. These gains could add up nicely when linting a whole project.
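For reference, one way to write such a "custom pmap" might look like the following (an illustrative sketch with explicitly bounded parallelism; `bounded-pmap` is a hypothetical name, not Eastwood's actual implementation):

```clojure
;; Illustrative "custom pmap": run at most n tasks concurrently,
;; processing the input in chunks of size n.
;; (Hypothetical helper; not Eastwood's actual implementation.)
(defn bounded-pmap [n f coll]
  (->> coll
       (partition-all n)                      ; split work into chunks of size n
       (mapcat (fn [chunk]
                 (->> chunk
                      (mapv #(future (f %))) ; start the chunk's tasks in parallel
                      (mapv deref))))))      ; wait for the chunk to finish

;; e.g. (bounded-pmap 4 run-one-linter linters) would run at most
;; four linters concurrently.
```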

Nuances

  • I assume that the 20 linters are pure functions and don't perform further AST evaluations
    • i.e. thread-safety is a prerequisite
  • Learning from recent experiences, I'd be careful about introducing pmap just like that
    • idea: (def ^:dynamic *linter-executor* map)
      • i.e. change nothing, allow consumers to bring pmap, a custom pmap etc
      • allowing third-party consumers to test this out properly, particularly on CI, etc.
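The dynamic-var idea above could be sketched roughly like so (hypothetical names; `run-linters` is not an actual Eastwood function):

```clojure
;; Sketch of the dynamic-var idea: change nothing by default,
;; but let consumers bring pmap, a custom pmap, etc.
(def ^:dynamic *linter-executor*
  "Map-like function used to run the linters. Defaults to the serial
  clojure.core/map; rebind to pmap (or a custom pmap) to parallelize."
  map)

;; Hypothetical call site, mirroring the serial `map` shown above:
(defn run-linters [linters analyze-results]
  (doall (*linter-executor* (fn [linter] (linter analyze-results)) linters)))

;; Consumers opt in explicitly:
;; (binding [*linter-executor* pmap]
;;   (run-linters linters analyze-results))
```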

WDYT?

@slipset (Collaborator) commented Mar 2, 2020

Thank you for bringing this up.
Running the linters in parallel was my first idea for speeding up Eastwood, but my analysis showed that the time spent analyzing was so much greater than the time spent linting that I chose to focus on analysis instead.

But, being able to shave off 200ms for a single ns sounds interesting.

For reference, linting my work project (~70 kloc) takes between four and five minutes.

@vemv (Collaborator, Author) commented Mar 4, 2020

Testing my tentative change more extensively, I got a nice improvement from 130 s to 80 s when linting one of my biggest projects.

As agreed over Slack I'll just introduce a "linter-executor" to be optionally passed as config (not dynvar).

My only hesitation is around the thread safety of the linters themselves. Having had a look, they seem safe to me: no require, in-ns, etc. I could only spot a single eval:

(replace-variable-tag-part (eval tag))

...but it seems a very delimited eval. Does that sound correct?

Hope to PR soon!

@slipset (Collaborator) commented Mar 4, 2020

That has been my assumption too: that the linters are thread-safe.
The only thing I would worry about is output, but if I'm not mistaken the code first collects data and then outputs it, so that should not be a problem.

@jafingerhut (Collaborator) commented

I do not recall any of the linters being anything other than sequential, pure functions of their inputs, and I do not recall any of them using the results of others as inputs. Caveat: It has been a couple of years since I looked at them.

vemv added a commit to reducecombine/eastwood that referenced this issue Mar 4, 2020
@vemv vemv mentioned this issue Mar 4, 2020
slipset pushed a commit that referenced this issue Mar 15, 2020
vemv added a commit to reducecombine/eastwood that referenced this issue Apr 1, 2021
I've used `pmap` continuously over the last 12 months (in CI, and in work/personal projects), observing no issues.

So it seems a safe and convenient default.

`clojure.core/pmap` is not a bad choice (vs. other pmap-like functions) because its implementation caps the maximum number of spawned threads relative to the CPU count.

So the workloads, which are CPU-bound, will be treated as such, without hogging the OS with an excess of threads.
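That cap comes straight from `pmap`'s implementation in clojure.core, which keeps roughly `(+ 2 available-processors)` computations in flight. A quick illustrative REPL check (`busy` is just a toy CPU-bound workload):

```clojure
;; pmap's window of in-flight computations is tied to the CPU count:
;; clojure.core's implementation uses (+ 2 available-processors).
(def in-flight-cap
  (+ 2 (.availableProcessors (Runtime/getRuntime))))

;; A toy CPU-bound workload; pmap parallelizes it without spawning
;; an unbounded number of threads, and yields the same results as map:
(defn busy [n] (reduce + (range n)))

(time (doall (map  busy (repeat 8 1000000))))  ; serial baseline
(time (doall (pmap busy (repeat 8 1000000))))  ; parallel, thread count capped
```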
@slipset slipset closed this as completed in 53318c8 Apr 1, 2021