ansible/hg-web: increase network timeout from 60s to 120s (bug 1291926); r?fubar draft
author Gregory Szorc <gps@mozilla.com>
Tue, 11 Apr 2017 11:36:25 -0700
changeset 10758 b05a6ae648a3f6d1e90e9b9d15f23afc2a4cf47f
parent 10757 06dc130e1a283b768db6414bb9a58874d0799b39
push id 1618
push user bmo:gps@mozilla.com
push date Tue, 11 Apr 2017 18:36:29 +0000
reviewers fubar
bugs 1291926
ansible/hg-web: increase network timeout from 60s to 120s (bug 1291926); r?fubar

This should hopefully make many of the Mercurial client failures reported in this bug go away. We had ~8000 of these "failed to proxy response to client" errors in March. And the rate went up last week when we converted various server repos to generaldelta. So we should know relatively quickly if this change reduces the failure rate.

Currently, the load balancer is not enforcing an idle timeout on connections. We should consider changing that. And once we do, we can increase Timeout to effectively infinity, since as the in-line comment explains, the thing it is measuring isn't terribly important so it doesn't add much value.

MozReview-Commit-ID: AmsL7EZCnN6
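
One way to check whether the failure rate drops after this change is to count the "failed to proxy response to client" entries per day in the Apache error log. The following is a minimal sketch, not part of this changeset. The function name `count_proxy_failures` is hypothetical, and the timestamp layout it parses (a leading `[Tue Apr 11 11:36:25.123456 2017]`) is an assumption about the server's error log format. Lines whose timestamp does not match are counted under "unknown".

```python
import re
from collections import Counter

# Marker text taken from the error message quoted in the commit (matched case-insensitively).
MARKER = "failed to proxy response to client"

# Assumed timestamp layout at the start of each error log line, e.g.
# "[Tue Apr 11 11:36:25.123456 2017]"; the fractional seconds are optional.
TIMESTAMP = re.compile(r"^\[\w{3} (\w{3}) +(\d{1,2}) [\d:.]+ (\d{4})\]")


def count_proxy_failures(lines):
    """Return a Counter mapping 'YYYY Mon DD' to the number of matching log lines."""
    counts = Counter()
    for line in lines:
        if MARKER not in line.lower():
            continue
        match = TIMESTAMP.match(line)
        if match:
            month, day, year = match.groups()
            key = f"{year} {month} {int(day):02d}"
        else:
            # Keep the entry even if the timestamp layout differs from the assumption.
            key = "unknown"
        counts[key] += 1
    return counts
```

Passing an open error log file object to `count_proxy_failures` yields per-day totals that can be compared before and after the deployment.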
ansible/roles/hg-web/templates/vhost.conf.j2
--- a/ansible/roles/hg-web/templates/vhost.conf.j2
+++ b/ansible/roles/hg-web/templates/vhost.conf.j2
@@ -6,16 +6,35 @@ WSGIPythonHome /var/hg/venv_hgweb
 # Default is 100. Mercurial encodes some arguments in HTTP request headers.
 # Repos with large numbers of heads (namely Try) run into these limits.
 LimitRequestFields 1000
 
 <VirtualHost *:80>
     ServerName hg.mozilla.org
     DocumentRoot /repo_local/mozilla/webroot_wsgi
 
+    # Clients processing e.g. bundle data may consume data much slower than the
+    # server can emit. This can result in the network being idle for >60s.
+    # Compounding this problem is that the load balancer has its own buffer.
+    # So there may be network activity between the client and the load balancer
+    # but not between the load balancer and this server. That can lead to even
+    # longer periods of network idle. We increase the timeout from its default
+    # of 60s to mitigate this problem.
+    #
+    # Since the TCP connection between the load balancer and this server doesn't
+    # totally reflect what the TCP connection between the client and load
+    # balancer is doing, measuring idle on this server is not very useful.
+    # Instead, network idle should be measured (and enforced) on the load
+    # balancer.
+    #
+    # If this value is too small, logs will appear in the error log:
+    #
+    #   The timeout specified has expired: ... mod_wsgi ... Failed to proxy response to client
+    Timeout 120
+
     RewriteEngine on
     RewriteRule ^/(.*)index.cgi/?(.*) https://hg.mozilla.org/$1$2
 
     SetEnv HGENCODING UTF-8
     SetEnv LC_TYPE UTF-8
 
     WSGIDaemonProcess hg.mozilla.org processes={{ wsgi_processes }} threads=1 maximum-requests=20 deadlock-timeout=60 inactivity-timeout=300 user=hg group=hg display-name=hg.mozilla.org
     WSGIProcessGroup hg.mozilla.org