[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSL handshake timeout leaves persistent connections hanging #8409

Closed
jlesueur opened this issue Apr 20, 2022 · 11 comments
Closed

SSL handshake timeout leaves persistent connections hanging #8409

jlesueur opened this issue Apr 20, 2022 · 11 comments

Comments

@jlesueur
Copy link

Description

We have a redis server, we're using the phpredis extension to connect to it. The phpredis extension is built on top of php's stream sockets. The bug I'm reporting can be duplicated without the phpredis extension.

Sometimes, a redis server will timeout during the ssl negotiation. This will generate a warning from php_openssl_enable_crypto(). If there is an error handler that stops execution on warnings, then the underlying connection will be left in place, even without SSL negotiation succeeding. We can see this by looking at client list on the redis server. The connection will still be hanging around, for as long as the fpm process exists.

For testing purposes, we can disable tls on the redis server, which will force the "handshake timed out" results.

Is there a reason that the connection is maintained even if the ssl negotiation fails? Could/should php close the connection when ssl negotiation fails?

The following code:

<?php

set_error_handler(function($errno, $errstring, $errfile, $errline, $errcontext) {
	var_dump(get_resources());
	exit(1);
});
$socket = stream_socket_client("tls://redis:6379", $error_code, $error_message, 0.2, STREAM_CLIENT_CONNECT | STREAM_CLIENT_PERSISTENT, stream_context_create(['tls' => ['verify_peer_name' => false]]));
echo "here";

Resulted in this output:

array(2) { [2]=> resource(2) of type (stream-context) [3]=> resource(3) of type (persistent stream) }

But I expected this output instead:

array(2) { [2]=> resource(2) of type (stream-context) }

PHP Version

PHP 7.4.28

Operating System

Ubuntu 20.04

@cmb69
Copy link
Member
cmb69 commented Apr 21, 2022

This looks related to https://bugs.php.net/bug.php?id=79501.

@staabm
Copy link
Contributor
staabm commented Apr 21, 2022

OT: it has to be related.. it was posted on the very same day - 2 years ago ;-)

grafik

@bukka
Copy link
Member
bukka commented Apr 29, 2022

Just FYI I have got this on my list with relatively high priority. Although there's bunch of other things so it might take few months to get this resolved.

@jlesueur
Copy link
Author

Thanks for the update, let me know if there's something I can do to make duplication easier.

@bukka
Copy link
Member
bukka commented Apr 29, 2022

I assume it's this, right? phpredis/phpredis#1726 . After reading it, it seems that it's specific to TLS 1.3. Can you confirm that it's just related to TLS 1.3? Also does this happen with the latest Redis server version?

@jlesueur
Copy link
Author

I ran this to check if tls version matters:

<?php

set_error_handler(function($errno, $errstring, $errfile, $errline, $errcontext) {
        var_dump(get_resources());
        exit(1);
});
$socket = stream_socket_client("tlsv1.2://redis:6379", $error_code, $error_message, 0.2, STREAM_CLIENT_CONNECT | STREAM_CLIENT_PERSISTENT, stream_context_create(['tls' => ['verify_peer_name' => false]]));
echo "here";

Sorry, this is long. tl;dr: I am proposing that the internals stream code should close connections where the tls handshake has timed out before emitting a warning.

For some context, I don't think the issue I'm seeing is the same as phpredis/phpredis#1726
How we found this issue was that we're moving to redis as a session store. We're using persistent connections to limit the number of new connections that need to be created, however, we still sometimes see bursts of new fpm workers coming online, triggering 500 or more new tls-enabled connections to redis. Redis does not like this. It starts to bog down, and ssl connections start to take longer and longer, eventually hitting our timeout (which we set somewhat aggressively at .2 seconds). In this case, the timeout is not a bug with php or redis, but a capacity problem at the redis server. But there is some strange behavior, in that the connections where phpredis hit the timeout while negotiating the tls handshake, are not closed until the fpm process exits. This meant each new request created a new connection, failed tls, but left that connection open, resulting in 20k open connections on a redis host.

This was because the stream internals code emits a warning when the tls handshake times out, and then returns the connection to the phpredis extension, and the phpredis extension attempts to work with the connection, notices communication problems, and closes the connection, except if an error handler exists that stops execution in the case of warnings. In that case, the connection is not closed.

@bukka bukka self-assigned this Aug 4, 2022
@bukka
Copy link
Member
bukka commented Aug 7, 2022

I took a closer look on this one and the problem is basically that there is an area of code in _php_stream_xport_create (specifically everything after stream is created and freed:

stream = (factory)(protocol, n,
(char*)name, namelen, persistent_id, options, flags, timeout,
context STREAMS_REL_CC);
if (stream) {
php_stream_context_set(stream, context);
if ((flags & STREAM_XPORT_SERVER) == 0) {
/* client */
if (flags & (STREAM_XPORT_CONNECT|STREAM_XPORT_CONNECT_ASYNC)) {
if (-1 == php_stream_xport_connect(stream, name, namelen,
flags & STREAM_XPORT_CONNECT_ASYNC ? 1 : 0,
timeout, &error_text, error_code)) {
ERR_RETURN(error_string, error_text, "connect() failed: %s");
failed = 1;
}
}
} else {
/* server */
if (flags & STREAM_XPORT_BIND) {
if (0 != php_stream_xport_bind(stream, name, namelen, &error_text)) {
ERR_RETURN(error_string, error_text, "bind() failed: %s");
failed = 1;
} else if (flags & STREAM_XPORT_LISTEN) {
zval *zbacklog = NULL;
int backlog = 32;
if (PHP_STREAM_CONTEXT(stream) && (zbacklog = php_stream_context_get_option(PHP_STREAM_CONTEXT(stream), "socket", "backlog")) != NULL) {
backlog = zval_get_long(zbacklog);
}
if (0 != php_stream_xport_listen(stream, backlog, &error_text)) {
ERR_RETURN(error_string, error_text, "listen() failed: %s");
failed = 1;
}
}
}
}
}
if (failed) {
/* failure means that they don't get a stream to play with */
if (persistent_id) {
php_stream_pclose(stream);
} else {
php_stream_close(stream);
}
stream = NULL;
}
) where it's not safe to trigger error as it can call error handler which exits so the stream cannot be cleaned up which is especially issue for persistent streams like it's the case here.

I think it's not very practical handle it in every function that triggers error but instead store the errors and emit them later (after freeing the stream). The good thing that we have got already in core (used for inheritance) some functionality to record errors so I will take a look if we could modify that slightly so we can do just record and then reply it after the stream is cleaned up.

bukka added a commit to bukka/php-src that referenced this issue Aug 9, 2022
bukka added a commit to bukka/php-src that referenced this issue Aug 9, 2022
bukka added a commit to bukka/php-src that referenced this issue Aug 9, 2022
This is not actually related to SSL handshake but stream socket creation
which does not clean errors if the error handler is set. This fix
prevents emitting errors until the stream is freed.
bukka added a commit to bukka/php-src that referenced this issue Aug 9, 2022
This is not actually related to SSL handshake but stream socket creation
which does not clean errors if the error handler is set. This fix
prevents emitting errors until the stream is freed.
@bukka bukka closed this as completed in d052742 Aug 12, 2022
@bukka
Copy link
Member
bukka commented Aug 12, 2022

Just for the record, the fix should be part of 8.1.10 . It's unfortunately non trivial task to back port the changes to 8.0 in an ABI compatible way as it relies on additions in 8.1 so it won't be part of 8.0.

@jlesueur
Copy link
Author

Thank you for the fix here!

@twose
Copy link
Member
twose commented Aug 14, 2022

Hi @bukka
Why not use zend_try and zend_catch to solve this problem? This patch (d052742) makes Swoole/Swow can not work anymore, because Coroutine will yield to another one during socket operation, and EG(record_errors) assertion will always fail.
Before that, zend_begin_record_errors was only used during compile time, this patch kills extensions like Swoole/Swow...

twose added a commit to twose/php-src that referenced this issue Aug 14, 2022
This patch reverts part of d052742, d052742 makes Swoole/Swow can not work anymore, because Coroutine will yield to another one during socket operation, and EG(record_errors) assertion will always fail.
Use zend_try and zend_catch to fix it, and this patch also can be applied to PHP-8.0.
twose added a commit to twose/php-src that referenced this issue Aug 14, 2022
This patch reverts part of d052742, d052742 makes Swoole/Swow can not work anymore, because Coroutine will yield to another one during socket operation, and EG(record_errors) assertion will always fail.
Use zend_try and zend_catch to fix it, and this patch also can be applied to PHP-8.0.
twose added a commit to twose/php-src that referenced this issue Aug 14, 2022
This patch reverts part of d052742, d052742 makes Swoole/Swow can not work anymore, because Coroutine will yield to another one during socket operation, and EG(record_errors) assertion will always fail.
Use zend_try and zend_catch to fix it, and this patch also can be applied to PHP-8.0.
twose added a commit to twose/php-src that referenced this issue Aug 14, 2022
This patch reverts part of d052742, d052742 makes Swoole/Swow can not work anymore, because Coroutine will yield to another one during socket operation, and EG(record_errors) assertion will always fail, and zend_begin_record_errors was only used during compile time before.
Use zend_try and zend_catch to fix it, and this patch also can be applied to PHP-8.0.
@bukka
Copy link
Member
bukka commented Aug 14, 2022

Ah yeah forgot about generators. You right I should have used zend_try, zend_catch, zend_bailout here as that's what is used elsewhere. It's just the test is a bit harder but might actually do an FPM test for this which should work.

@bukka bukka reopened this Aug 14, 2022
@twose twose closed this as completed in 897ca85 Aug 14, 2022
twose added a commit that referenced this issue Aug 14, 2022
This fix is another solution to replace d052742, use zend_try and zend_catch to make sure persistent stream will be released when error occurred.

Closes GH-9332.
twose added a commit that referenced this issue Aug 14, 2022
* PHP-8.0:
  Re-fix GH-8409: SSL handshake timeout persistent connections hanging
twose added a commit that referenced this issue Aug 14, 2022
* PHP-8.1:
  Re-fix GH-8409: SSL handshake timeout persistent connections hanging
  Revert "Fix GH-8409: SSL handshake timeout persistent connections hanging"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants
@staabm @bukka @jlesueur @cmb69 @twose and others