
Propagate end_stream from downstream http/2 requests #7196

Merged
7 commits merged into mitmproxy:main on Sep 30, 2024

Conversation

@rubu (Contributor) commented Sep 19, 2024

Description

I encountered an issue where mitmdump was failing on a request that worked in the browser. It was a GET request with no body, but the server was returning an error about the GET request having a body.

After looking at packet captures, the difference between mitmdump and the browser was that the browser was setting END_STREAM on the HEADERS frame, whereas mitmdump was sending an empty DATA frame with END_STREAM.

I patched mitmdump to propagate END_STREAM if it is present on the downstream HEADERS frame.
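For illustration, the two wire shapes look roughly like this with the h2 library (a minimal sketch; the stream IDs, headers, and host are made up):

import h2.config
import h2.connection

config = h2.config.H2Configuration(client_side=True)
conn = h2.connection.H2Connection(config=config)
conn.initiate_connection()

headers = [
    (":method", "GET"),
    (":path", "/"),
    (":scheme", "https"),
    (":authority", "example.com"),
]

# What the browser sends: END_STREAM set directly on the HEADERS frame.
conn.send_headers(stream_id=1, headers=headers, end_stream=True)

# What mitmdump sent before this patch: HEADERS without END_STREAM,
# followed by an empty DATA frame carrying END_STREAM.
conn.send_headers(stream_id=3, headers=headers, end_stream=False)
conn.send_data(stream_id=3, data=b"", end_stream=True)

print(conn.data_to_send())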

Tests pass and my test case works. Could anyone elaborate on whether this can cause other issues? If so, I can adjust this to get it to a state where it can be merged. Since this is breaking for me, I'll roll with a local patch for now, but it would be great to use an official release.

Checklist

  • I have updated tests where applicable.
  • I have added an entry to the CHANGELOG.

@mhils (Member) commented Sep 20, 2024

Thanks! Generally strong +1 on this, great find. :)

This needs a test in test/mitmproxy/proxy/layers/http/test_http2.py and ideally the same logic applied to responses. You can probably add a new test based on the following two without too much hassle:
https://github.com/mitmproxy/mitmproxy/blob/v10.4.2/test/mitmproxy/proxy/layers/http/test_http2.py#L95-L144
https://github.com/mitmproxy/mitmproxy/blob/v10.4.2/test/mitmproxy/proxy/layers/http/test_http2.py#L223-L243
Let me know if something is unclear, happy to answer questions.

I'm also wondering if we can just change

if self.flow.request.stream:

to

if self.flow.request.stream and not event.end_stream:

as a simpler patch, but that is easy to test out once we have a test. Thanks!
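For context, the suggested condition would sit roughly like this (a paraphrase of the request-side handler, mirroring the response-side snippet quoted below; not the exact mitmproxy source):

if self.flow.request.stream and not event.end_stream:
    yield from self.start_request_stream()
else:
    self.client_state = self.state_consume_request_body

The idea: if the client already set END_STREAM on its HEADERS frame, there is no body left to stream, so the non-streaming path can consume the empty body without emitting an extra DATA frame.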

@rubu (Contributor, Author) commented Sep 20, 2024

Thank you for the comment - the suggested change does make sense and could be more elegant. I'll try to write a test and then verify whether that change works.

What did you mean by "and ideally the same logic applied to responses" - was this regarding tests or something else?

@mhils (Member) commented Sep 20, 2024

What did you mean by "and ideally the same logic applied to responses" - was this regarding tests or something else?

The PR currently removes empty data frames for requests, but it doesn't do so for responses. We basically have the same pattern again for responses:

elif self.flow.response.stream:
yield from self.start_response_stream()
else:
self.server_state = self.state_consume_response_body

This should be fixable in exactly the same way, so I suggest we do both in one go. :)
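Applied to the response side, the same guard would look roughly like this (a sketch mirroring the snippet above, not the final diff):

elif self.flow.response.stream and not event.end_stream:
    yield from self.start_response_stream()
else:
    self.server_state = self.state_consume_response_body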

@rubu (Contributor, Author) commented Sep 23, 2024

@mhils sadly I gave it a shot, but I'm totally unfamiliar with the playbook APIs and ChatGPT was also not up to the task, so I'll need some more guidance.

  1. Are there some docs on the playbook APIs to understand all the hooks?
  2. As far as I can see, I need to construct headers with the END_STREAM flag, send them, and then ensure that the flag is present when the frames are received on the upstream server. I don't understand how to extract the upstream frames from the server. So I have this, which does not work:
assert (
    playbook
    >> DataReceived(
        tctx.client,
        cff.build_headers_frame(
            example_request_headers, flags=["END_STREAM"]
        ).serialize(),
    )
    << http.HttpRequestHeadersHook(flow)
    >> reply(side_effect=enable_request_streaming)
    << OpenConnection(server)
    << DataReceived(tctx.server, frames_from_mitmproxy := tctx.server.data)  # there is no data in server, this is what ChatGPT gave me
    << reply()
)

So how can I peek at (or retrieve?) the data sent to the upstream server? What should go in place of DataReceived?

@mhils (Member) commented Sep 25, 2024

Hi @rubu!

Are there some docs on the playbook APIs to understand all the hooks?

Not really, but this here is relevant.

    << OpenConnection(server)
    << DataReceived(tctx.server, frames_from_mitmproxy := tctx.server.data)  # there is no data in server, this is what ChatGPT gave me
    << reply()

My recommendation would be to skip ChatGPT for this. << DataReceived doesn't make sense, because DataReceived is an event received from the outside. So >> DataReceived(...) to feed it into your layers, never << DataReceived. Outgoing data is sent as a command (<< SendData(...)). Same with << reply(): replies to hooks can only come from the outside.

For OpenConnection you need a reply(None) (where None is the error), or you do tctx.server = open_h2_server_conn before the playbook and avoid the OpenConnection altogether (see tests I linked). I'll make a note to document that a reply is expected in the OpenConnection docstring, that's currently missing.

Try making a copy of test_simple and turn

        << http.HttpRequestHeadersHook(flow)
        >> reply()

into

        << http.HttpRequestHeadersHook(flow)
        >> reply(side_effect=enable_streaming)

That should get you onto the right track. If not let me know and I'm happy to lend a hand. :)
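Putting those rules together, the corrected fragment might look roughly like this (a sketch based on the linked tests, assuming tctx.server was prepared with open_h2_server_conn so no OpenConnection is needed; Placeholder comes from mitmproxy's proxy test utilities and captures the bytes mitmproxy sends upstream):

assert (
    playbook
    >> DataReceived(
        tctx.client,
        cff.build_headers_frame(
            example_request_headers, flags=["END_STREAM"]
        ).serialize(),
    )
    << http.HttpRequestHeadersHook(flow)
    >> reply(side_effect=enable_streaming)
    << SendData(tctx.server, data := Placeholder(bytes))
)
# data() now holds the frames sent upstream and can be decoded to assert
# that END_STREAM is set on the HEADERS frame.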

rubu and others added 3 commits September 29, 2024 22:54
*) add test to check if end_stream is propagated on request / response
*) simplify the patch
@rubu (Contributor, Author) commented Sep 29, 2024

Hi @mhils,

ok, I figured out how to make the test cover both directions (request and response), but the response fix was a bit more complex, since applying the same logic broke a websocket test: the bytes on the wire were not deframed properly. That seemed logical, since your proposed strategy of avoiding the call to start_response_stream() (versus mine, which propagates end_stream) does not really work for websockets; I assume the hooks need to be called as early as possible (for the websocket frame handlers to be set up?). So I moved the is_websocket logical condition into a member method so that it can be reused both in the original place and in the new logic.

Does this PR seem good, or is there something that should be adjusted? I am not a Python person, so I'm not sure if the bool | None thing is proper style or not.

@rubu (Contributor, Author) commented Sep 29, 2024

Ok, now I actually see the code coverage check failing due to the extra if:

def is_websocket(self) -> bool | None:
    if self.flow.response is None:
        return None
    return (
        self.flow.response.status_code == 101
        and self.flow.response.headers.get("upgrade", "").lower() == "websocket"
        and self.flow.request.headers.get("Sec-WebSocket-Version", "").encode()
        == wsproto.handshake.WEBSOCKET_VERSION
        and self.context.options.websocket
    )

What is the proper approach here? Move the self.flow.response check inside the whole expression and just leave the method as bool? I wanted the bool | None return to signify whether the result is known.
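Illustrating that option: folding the None check into the expression keeps the method a plain bool (a sketch; note it reports False rather than None when the response is not yet known):

def is_websocket(self) -> bool:
    # No separate early return, so there is no extra branch for
    # coverage to miss.
    return (
        self.flow.response is not None
        and self.flow.response.status_code == 101
        and self.flow.response.headers.get("upgrade", "").lower() == "websocket"
        and self.flow.request.headers.get("Sec-WebSocket-Version", "").encode()
        == wsproto.handshake.WEBSOCKET_VERSION
        and self.context.options.websocket
    )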

@mhils (Member) commented Sep 29, 2024

@rubu: Thanks! This looks great so far; thank you for doing the majority of the grunt work with tests etc. I can't give you an immediate answer without digging deeper, so let me play around with this in the coming days. :)

@mhils (Member) left a comment

Thanks again! I ended up adapting the WebSocket test, now the actual code diff is super simple. 🎉

@mhils enabled auto-merge (squash) on September 30, 2024 at 18:45
@mhils merged commit 72a0448 into mitmproxy:main on Sep 30, 2024
26 checks passed