WhatsApp/erlfmt

erlfmt

erlfmt is an opinionated Erlang code formatter. By automating the process of formatting the code, it frees your team up to focus on what the code does, instead of what it looks like.

erlfmt is feature complete and released as version 1.0. This means only backwards compatible changes and bug fixes can be adopted without very serious consideration. We do not want to put users in the position where they need to reformat code without a very good reason.

Before

Remember reading code before erlfmt and having arguments with co-workers :(

what_is(Erlang) ->
case Erlang of movie->[hello(mike,joe,robert),credits]; language->formatting_arguments end
.

After

Now, with the new erlfmt, code is readable and you get along with your co-workers :D

what_is(Erlang) ->
    case Erlang of
        movie -> [hello(mike, joe, robert), credits];
        language -> no_more_formatting_arguments
    end.

Disclaimer: erlfmt is just a code formatter, not a solution to all life's problems.

Comparison with other Erlang formatters

| | erlfmt | rebar3_format | steamroller | erl_tidy |
|---|---|---|---|---|
| File Types | .erl, .hrl, .app, .app.src, .config, .script, .escript | .erl, .hrl | .erl, .hrl, .app, .app.src, .config, .script | .erl |
| Macros | No crashes formatting OTP | Skips entire files sometimes | Skips entire files sometimes | Crashes sometimes |
| Comments | Preserves and moves to line before | Preserves but Floating | Crashes sometimes and Reorders | Crashes sometimes and Floating |
| Configurable vs Opinionated | Opinionated | Configurable | Opinionated | Configurable |
| Preserving Representation | Yes | Some | Some | No |
| Line Break Hints | Yes | No | No | No |
| Opt In/Out | per file, per top level expression | per file | No | No |
| Speed | OTP lib in 7s | N/A | N/A | N/A |

See the comparison with other Erlang formatters document for more details.

Usage

Rebar3

The easiest way to use erlfmt is as a rebar plugin, by adding to your rebar.config:

{project_plugins, [erlfmt]}.

This will provide a new rebar3 fmt task. All erlfmt command-line options can be configured with defaults in your rebar.config, for example:

{erlfmt, [write]}.

Now you can format all the files in your project by running:

$ rebar3 fmt

And you can add the following command in your CI to ensure your Erlang is formatted:

$ rebar3 fmt --check

For more usage instructions, see RebarUsage.
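As a slightly fuller sketch of the defaults mentioned above, the rebar.config below combines the write option with two further options. The print_width and files option names, the value 100, and the glob pattern are assumptions here and should be checked against RebarUsage before use.

```erlang
%% rebar.config (sketch; every option except `write` is an assumption)
{project_plugins, [erlfmt]}.

{erlfmt, [
    %% Rewrite files in place instead of printing the formatted output.
    write,
    %% Assumed option: the maximum line length the formatter aims for.
    {print_width, 100},
    %% Assumed option: which files `rebar3 fmt` should pick up.
    {files, "{src,include,test}/*.{hrl,erl}"}
]}.
```

Keeping these defaults in rebar.config means that local runs and the CI check share the same settings.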

Escript

Alternatively, you can build a standalone and portable escript and use erlfmt without rebar (it still requires Erlang to be installed on the target system).

$ rebar3 as release escriptize
$ _build/release/bin/erlfmt -h

You can then run it from the command line:

$ erlfmt -w './otp/lib/*/{src,include}/*.{erl,hrl}'

Requirements

erlfmt requires Erlang/OTP 21+ and works on all platforms.

Integrations

Add your integration here, by making a pull request.

Line length

erlfmt enforces a consistent style by parsing your code and re-printing it, while enforcing a selected maximum line-length.

For example, this line that exceeds the length limit:

scenario(dial_phone_number(),  ring(), hello(mike),hello(joe), hello(robert),   system_working(), seems_to_be())

will be re-printed automatically in a vertical style:

scenario(
    dial_phone_number(),
    ring(),
    hello(mike),
    hello(joe),
    hello(robert),
    system_working(),
    seems_to_be()
)

But this snippet:

hello(mike, joe, robert)

will be kept as-is, since it fits in a single line.

Note: The enforcing of line-length is best effort and will sometimes overrun the selected line length, because the algorithm is greedy.

Design principles

The formatter was designed with these main principles in mind:

First, the formatter never changes the semantics or structure of the code. This means the input AST and the output AST are equivalent. The formatter does not try to arbitrarily "improve" the code. For the most part it limits its behaviour to shifting whitespace around - it won't rewrite literals, add parentheses, reorder exports, etc.

The second principle is to provide as little configuration as possible. This removes contention points and makes it easier to achieve the goal of consistent code. Instead of providing configuration, the formatter respects a limited set of choices made in the original code to preserve intent and make it easier to achieve beautiful code based on contextual hints.

Furthermore, the formatter avoids special cases as much as possible. For example, there is no hard-coded behaviour specific to some function - all functions are laid out the same. There are some clear layout rules and general structures that are re-used as much as possible between different constructs. For example, the general layout of lists, functions, maps, records, and similar, all follow the same "container" rules.

Finally, the formatter should be idempotent. Formatting the code once should produce the same output as formatting it multiple times.
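The first and last principles can be checked mechanically. The module below is a minimal sketch of such checks, not erlfmt's own test code: principles_check, same_ast/2, and idempotent/2 are hypothetical names, and the standard erl_scan and erl_parse modules stand in for erlfmt's own parser, so snippets containing macros are not handled.

```erlang
-module(principles_check).
-export([same_ast/2, idempotent/2]).

%% True when two source strings parse to the same expressions once
%% location annotations are erased. Each string holds expressions
%% terminated by a dot, for example "hello(mike, joe).".
same_ast(Before, After) ->
    parse(Before) =:= parse(After).

%% True when applying Format a second time changes nothing.
%% Format is any fun from a source string to a formatted string.
idempotent(Format, Source) ->
    Once = Format(Source),
    Format(Once) =:= Once.

parse(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    %% Replace every annotation with the same dummy location so that only
    %% the structure of the code takes part in the comparison.
    [erl_parse:map_anno(fun(_Anno) -> erl_anno:new(0) end, E) || E <- Exprs].
```

For example, principles_check:same_ast("hello(mike,joe,robert).", "hello(mike, joe, robert).") returns true, because only whitespace differs, while comparing "[a, b]." with "[b, a]." returns false.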

Manual interventions

In some cases, the formatter rules might lead to code that looks decent, but not perfect. Therefore some manual intervention to help the formatter out might be needed. For example, given the following code:

split_tokens([{TokenType, Meta, TokenValue} | Rest], TokenAcc, CommentAcc) ->
    split_tokens(Rest, [{TokenType, token_anno(erl_anno:to_term(Meta), #{}), TokenValue} | TokenAcc], CommentAcc).

Because the line-length is exceeded, the formatter will produce the following:

split_tokens([{TokenType, Meta, TokenValue} | Rest], TokenAcc, CommentAcc) ->
    split_tokens(
        Rest,
        [{TokenType, token_anno(erl_anno:to_term(Meta), #{}), TokenValue} | TokenAcc],
        CommentAcc
    ).

It might be more desirable, though, to extract a variable and allow the call to still be rendered in a single line, for example:

split_tokens([{TokenType, Meta, TokenValue} | Rest], TokenAcc, CommentAcc) ->
    Token = {TokenType, token_anno(erl_anno:to_term(Meta), #{}), TokenValue},
    split_tokens(Rest, [Token | TokenAcc], CommentAcc).

A similar situation could happen with long patterns in function heads, for example let's look at this function:

my_function(
    #user{name = Name, age = Age, ...},
    Arg2,
    Arg3
) ->
    ...

Even though the code is perfectly valid, you might prefer not to split the arguments across multiple lines and move the pattern extraction into the function body instead:

my_function(User, Arg2, Arg3) ->
    #user{name = Name, age = Age, ...} = User,
    ...

Such transformations cannot be automated since the formatter is not allowed to change the AST of your program. After running the formatter, especially if running it for the first time on a sizeable codebase, it's recommended to inspect the code manually to correct similar sub-par layouts.

Respecting original format

The formatter keeps the original decisions in two key places:

  • when choosing between a "collapsed", "semi-expanded", and an "expanded" layout for containers
  • when choosing between single-line and multi-line clauses.

In containers

For containers like lists, tuples, maps, records, function calls, etc., there are three possible layouts: "collapsed", where the entire collection is printed on a single line; "semi-expanded", where the enclosing brackets/braces/parentheses are printed on lines of their own, but all elements are printed on a single line; and "expanded", where each element is printed on a separate line. The formatter respects this choice, if possible. If there is a newline between the opening bracket/brace/parenthesis and the first element, the collection will always be printed "semi-expanded", for example:

[
    Foo, Bar
]

will be preserved, even though it could fit on a single line.

Similarly, if there's a break between any elements, the container will be printed in the "expanded" format:

[
    Foo,
    Bar
]

This is controlled by the newlines in the original version. For example, merely deleting the newlines from the above sequence:

[    Foo, Bar]

and re-running the formatter, will produce:

[Foo, Bar]

Similarly, adding the single initial newline back:

[
Foo, Bar]

and re-running the formatter, will produce the "semi-expanded" format again.

While adding a newline in the middle:

[Foo,
Bar]

and re-running the formatter, will produce the "expanded" format again.
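The newline rules above can be summarised as a small decision procedure. The following is a toy sketch only, not how erlfmt is implemented: layout_hint and classify/1 are hypothetical names, and the function only handles a flat list literal whose elements are single tokens such as variables or atoms.

```erlang
-module(layout_hint).
-export([classify/1]).

%% Classify a flat list literal such as "[Foo, Bar]" by where its
%% newlines sit, following the three layouts described above.
classify(Source) ->
    {ok, [Open | Rest], _EndLocation} = erl_scan:string(Source),
    OpenLine = erl_scan:line(Open),
    %% Keep only the element tokens: drop commas and the closing bracket.
    ElemLines = [erl_scan:line(T) || T <- Rest, not is_punct(T)],
    case lists:usort(ElemLines) of
        %% No elements at all.
        [] -> collapsed;
        %% All elements share one line, which is below the opening bracket.
        [Line] when Line > OpenLine -> semi_expanded;
        %% All elements share the line of the opening bracket.
        [_] -> collapsed;
        %% Elements are spread over several lines.
        _ -> expanded
    end.

is_punct({',', _}) -> true;
is_punct({']', _}) -> true;
is_punct(_) -> false.
```

With this sketch, classify("[\nFoo, Bar]") gives semi_expanded and classify("[Foo,\nBar]") gives expanded, matching the single-newline edits shown above.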

In clauses

A similar approach is followed when laying out clause sequences in functions, case expressions, receives, etc. The main choice there is simple: should the clause body be printed directly after -> or indented on a new line? The formatter imposes one constraint: either every clause is printed on a single line, or in every clause the body is printed on a new line. This is controlled by the layout of the first clause, which again lets you change the layout of the entire sequence with just one character, for example:

case is_beautiful(Code) of
    true ->
        ring_the_bell();
    false ->
        dig_a_hole()
end

Even though the expressions could all fit on a single line, this layout is preserved because there is a newline after -> in the first clause. If we'd like to "collapse" it, we can do that by removing the first newline:

case is_beautiful(Code) of
    true ->        ring_the_bell();
    false ->
        dig_a_hole()
end

and re-running the formatter will produce:

case is_beautiful(Code) of
    true -> ring_the_bell();
    false -> dig_a_hole()
end

To go back to the original layout, we can insert the newline back again:

case is_beautiful(Code) of
    true ->
ring_the_bell();
    false -> dig_a_hole()
end

which after re-formatting will result in the original layout again.

Ignoring Formatting

We found that it is usually possible to format Erlang code in an at least somewhat acceptable way, but exceptions do occur. For these cases we introduced the erlfmt:ignore comment: when placed before a top-level expression, it tells erlfmt to skip that expression, leave it as is, and move on to the next one. For documentation purposes, a reason for not formatting can be given.

%% erlfmt:ignore I like it more this way
-define(DELTA_MATRIX, [
    [0,   0,   0,   0,   0,   0],
    [0, -16,   0,   0,   0,   0],
    [0,   0,  15,   0,   0,   0],
    [0,   0,   0,   6,   0,   0],
    [0, -16,   0,   0, -14,   0],
    [0,   0,  15,   0,   0,   0]
]).

You can also enclose multiple top-level forms in an erlfmt:ignore-begin, erlfmt:ignore-end section.

%% erlfmt:ignore-begin
-define(DELTA_MATRIX1, [
    [0,   0,   0,   0,   0,   0]
]).
-define(DELTA_MATRIX2, [
    [0,   0,   0,   0,   0,   0]
]).
%% erlfmt:ignore-end

-define(THIS_IS_FORMATTED, ok).

Only top-level expressions are supported. Nested expressions, for example expressions inside functions, are not supported.

You can also add a %%% % @noformat comment at the top of a file to opt that file out of formatting. Conversely, when erlfmt is run with the --require-pragma flag, only files that have a %%% % @format comment at the top are formatted.
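As an illustration, an opted-in file might start like this. The module name opted_in and its contents are hypothetical; only the pragma comment on the first line comes from the text above.

```erlang
%%% % @format
%% Hypothetical module: the pragma above marks this file as opted in.
-module(opted_in).
-export([hello/0]).

hello() -> world.
```

When the --require-pragma flag is used, a file without such a comment would be left untouched.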

Join the erlfmt community

To learn more about erlfmt internals, please explore the doc/ directory.

See the CONTRIBUTING file for how to help out.

Test

$ rebar3 ct
$ rebar3 dialyzer
# or
$ make check

Local use

To format erlfmt itself:

$ make fmt

Release Process

The release process requires a few steps, updating the CHANGELOG.md, releasing to hex and more.

Decision Documents

Formatting Decisions documents are intended to explain our reasoning for making certain formatting decisions.

License

erlfmt is Apache 2.0 licensed, as found in the LICENSE file.