
Propagate end_stream from downstream http/2 requests #7196

Merged
7 commits merged into mitmproxy:main on Sep 30, 2024

Conversation

@rubu (Contributor) commented Sep 19, 2024

Description

I encountered an issue where mitmdump was failing on a request that worked in the browser. It was a GET request with no body, but the server was returning an error about the GET request having a body.

After looking at packet captures, the difference between mitmdump and the browser was that the browser was setting END_STREAM on the HEADERS frame, whereas mitmdump was sending an empty DATA frame with END_STREAM.

I patched mitmdump to propagate END_STREAM if it is present on the downstream HEADERS frame.
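For illustration, the two wire shapes look roughly like this with the h2 library (a minimal sketch; the stream IDs, headers, and host are made up):

import h2.config
import h2.connection

config = h2.config.H2Configuration(client_side=True)
conn = h2.connection.H2Connection(config=config)
conn.initiate_connection()

headers = [
    (":method", "GET"),
    (":path", "/"),
    (":scheme", "https"),
    (":authority", "example.com"),
]

# What the browser sends: END_STREAM set directly on the HEADERS frame.
conn.send_headers(stream_id=1, headers=headers, end_stream=True)

# What mitmdump sent before this patch: HEADERS without END_STREAM,
# followed by an empty DATA frame carrying END_STREAM.
conn.send_headers(stream_id=3, headers=headers, end_stream=False)
conn.send_data(stream_id=3, data=b"", end_stream=True)

print(conn.data_to_send())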

Tests pass and my test case works. Could anyone elaborate on whether this can cause other issues? If so, I can adjust this to get it to a state where it can be merged. Since this is breaking for me, I'll roll with a local patch for now, but it would be great to use an official release.

Checklist

  • I have updated tests where applicable.
  • I have added an entry to the CHANGELOG.

@mhils (Member) commented Sep 20, 2024

Thanks! Generally strong +1 on this, great find. :)

This needs a test in test/mitmproxy/proxy/layers/http/test_http2.py and ideally the same logic applied to responses. You can probably add a new test based on the following two without too much hassle:
https://github.com/mitmproxy/mitmproxy/blob/v10.4.2/test/mitmproxy/proxy/layers/http/test_http2.py#L95-L144
https://github.com/mitmproxy/mitmproxy/blob/v10.4.2/test/mitmproxy/proxy/layers/http/test_http2.py#L223-L243
Let me know if something is unclear, happy to answer questions.

I'm also wondering if we can just change

if self.flow.request.stream:

to

if self.flow.request.stream and not event.end_stream:

as a simpler patch, but that is easy to test out once we have a test. Thanks!
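For context, the suggested condition would sit roughly like this (a paraphrase of the request-side handler, mirroring the response-side snippet quoted below; not the exact mitmproxy source):

if self.flow.request.stream and not event.end_stream:
    yield from self.start_request_stream()
else:
    self.client_state = self.state_consume_request_body

The idea: if the client already set END_STREAM on its HEADERS frame, there is no body left to stream, so the non-streaming path can consume the empty body without emitting an extra DATA frame.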

@rubu (Contributor, Author) commented Sep 20, 2024

Thank you for the comment - the suggested change does make sense and could be more elegant. I'll try to write a test and then verify whether that change works.

What did you mean by "and ideally the same logic applied to responses" - was this regarding tests or something else?

@mhils (Member) commented Sep 20, 2024

What did you mean by "and ideally the same logic applied to responses" - was this regarding tests or something else?

The PR currently removes empty data frames for requests, but it doesn't do so for responses. We basically have the same pattern again for responses:

elif self.flow.response.stream:
yield from self.start_response_stream()
else:
self.server_state = self.state_consume_response_body

This should be fixable in exactly the same way, so I suggest we do both in one go. :)
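Applied to the response side, the same guard would look roughly like this (a sketch mirroring the snippet above, not the final diff):

elif self.flow.response.stream and not event.end_stream:
    yield from self.start_response_stream()
else:
    self.server_state = self.state_consume_response_body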

@rubu (Contributor, Author) commented Sep 23, 2024

@mhils sadly I gave it a shot, but I'm totally unfamiliar with the playbook APIs and ChatGPT was also not up to the task, so I'll need some more guidance.

  1. Are there some docs on the playbook APIs to understand all the hooks?
  2. As far as I can see, I need to construct headers with the END_STREAM flag, send them, and then ensure that the flag is present when the frames are received on the upstream server. I don't understand how to extract the upstream frames from the server. So I have this, which does not work:
assert (
    playbook
    >> DataReceived(
        tctx.client,
        cff.build_headers_frame(
            example_request_headers, flags=["END_STREAM"]
        ).serialize(),
    )
    << http.HttpRequestHeadersHook(flow)
    >> reply(side_effect=enable_request_streaming)
    << OpenConnection(server)
    << DataReceived(tctx.server, frames_from_mitmproxy := tctx.server.data)  # there is no data in server, this is what ChatGPT gave me
    << reply()
)

So how can I peek at (or retrieve?) the data sent to the upstream server? What should go in place of DataReceived?

@mhils (Member) commented Sep 25, 2024

Hi @rubu!

Are there some docs on the playbook APIs to understand all the hooks?

Not really, but this here is relevant.

    << OpenConnection(server)
    << DataReceived(tctx.server, frames_from_mitmproxy := tctx.server.data)  # there is no data in server, this is what ChatGPT gave me
    << reply()

My recommendation would be to skip ChatGPT for this. << DataReceived doesn't make sense, because DataReceived is an event received from the outside. So >> DataReceived(...) to feed it into your layers, never << DataReceived. Outgoing data is sent as a command (<< SendData(...)). Same with << reply(): replies to hooks can only come from the outside.

For OpenConnection you need a reply(None) (where None is the error), or you do tctx.server = open_h2_server_conn before the playbook and avoid the OpenConnection altogether (see tests I linked). I'll make a note to document that a reply is expected in the OpenConnection docstring, that's currently missing.

Try making a copy of test_simple and turn

        << http.HttpRequestHeadersHook(flow)
        >> reply()

into

        << http.HttpRequestHeadersHook(flow)
        >> reply(side_effect=enable_streaming)

That should get you onto the right track. If not let me know and I'm happy to lend a hand. :)
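Putting those rules together, the corrected fragment might look roughly like this (a sketch based on the linked tests, assuming tctx.server was prepared with open_h2_server_conn so no OpenConnection is needed; Placeholder comes from mitmproxy's proxy test utilities and captures the bytes mitmproxy sends upstream):

assert (
    playbook
    >> DataReceived(
        tctx.client,
        cff.build_headers_frame(
            example_request_headers, flags=["END_STREAM"]
        ).serialize(),
    )
    << http.HttpRequestHeadersHook(flow)
    >> reply(side_effect=enable_streaming)
    << SendData(tctx.server, data := Placeholder(bytes))
)
# data() now holds the frames sent upstream and can be decoded to assert
# that END_STREAM is set on the HEADERS frame.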

rubu and others added 3 commits September 29, 2024 22:54
*) add test to check if end_stream is propagated on request / response
*) simplify the patch
@rubu (Contributor, Author) commented Sep 29, 2024

Hi @mhils,

ok, I figured out how to make the test cover both directions (request and response), but the response fix was a bit more complex, since applying the same logic broke a websocket test: the bytes on the wire were not deframed properly. That seemed logical, since your proposed strategy of avoiding the call to start_response_stream() (versus mine, which propagates end_stream) does not really work for websockets; I assume the hooks need to be called as early as possible (for the websocket frame handlers to be set up?). So I moved the is_websocket logical condition into a member method so that it can be reused both in the original place and in the new logic.

Does this PR seem good, or is there something that should be adjusted? I am not a Python person, so I'm not sure if the bool | None thing is proper style or not.

@rubu (Contributor, Author) commented Sep 29, 2024

Ok, now I actually see the code coverage check failing due to the extra if:

def is_websocket(self) -> bool | None:
    if self.flow.response is None:
        return None
    return (
        self.flow.response.status_code == 101
        and self.flow.response.headers.get("upgrade", "").lower() == "websocket"
        and self.flow.request.headers.get("Sec-WebSocket-Version", "").encode()
        == wsproto.handshake.WEBSOCKET_VERSION
        and self.context.options.websocket
    )

What is the proper approach here? Move the self.flow.response check inside the whole expression and just leave the method as bool? I wanted the bool | None return to signify whether the result is known.
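Illustrating that option: folding the None check into the expression keeps the method a plain bool (a sketch; note it reports False rather than None when the response is not yet known):

def is_websocket(self) -> bool:
    # No separate early return, so there is no extra branch for
    # coverage to miss.
    return (
        self.flow.response is not None
        and self.flow.response.status_code == 101
        and self.flow.response.headers.get("upgrade", "").lower() == "websocket"
        and self.flow.request.headers.get("Sec-WebSocket-Version", "").encode()
        == wsproto.handshake.WEBSOCKET_VERSION
        and self.context.options.websocket
    )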

@mhils (Member) commented Sep 29, 2024

@rubu: Thanks! This looks great so far; thank you for doing the majority of the grunt work with tests etc. I can't give you an immediate answer without digging deeper, so let me play around with this in the coming days. :)

@mhils (Member) left a comment

Thanks again! I ended up adapting the WebSocket test, now the actual code diff is super simple. 🎉

@mhils enabled auto-merge (squash) on September 30, 2024 at 18:45
@mhils merged commit 72a0448 into mitmproxy:main on Sep 30, 2024
26 checks passed