Bug 1291035 - Increase HTTP connection pool capacity; r?dustin draft
author: Gregory Szorc <gps@mozilla.com>
date: Mon, 01 Aug 2016 14:49:55 -0700
changeset 395278 0521447e0f3ad19709288c3ed6d82c3143b1acde
parent 395277 b2a6aaf59783cecf8cde63fe51fbd65e14b026df
child 526962 5c84bbfc8b0dcf40a96d0d6a07ed005e23ee6a2b
push id: 24732
push user: bmo:gps@mozilla.com
push date: Mon, 01 Aug 2016 21:50:09 +0000
reviewers: dustin
bugs: 1291035
milestone: 50.0a1
Bug 1291035 - Increase HTTP connection pool capacity; r?dustin

I was looking at some decision task logs and noticed lines like:

    Connection pool is full, discarding connection: taskcluster

I also noticed lines like:

    Starting new HTTP connection (153): taskcluster

In this case, we had established 153 TCP connections to a server.

Looking at the requests source code, a requests.Session by default creates a
connection pool with a capacity of 10. There are actually 2 components to the
capacity: idle connections and active connections. What appeared to be
happening was we could obtain an idle connection, use it, and then it would be
discarded when put back in the idle pool because the idle pool was at
capacity. Furthermore, it also appears that requests were sitting around
waiting for a TCP connection.

This commit uses a custom "adapter" with an increased pool size that matches
the concurrency level of the code issuing the HTTP requests. This should
increase the number of concurrent TCP connections / requests, decrease the
number of TCP connections being used overall, and make decision tasks
complete faster.

MozReview-Commit-ID: 6NDbz78TM2y
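As a standalone sketch of the technique the patch applies: requests lets you
mount a custom HTTPAdapter on a Session to raise the per-host connection pool
limits above the default of 10. The CONCURRENCY value below mirrors the
constant in create.py; the example URL is hypothetical.

```python
import requests
import requests.adapters

# Matches the thread pool size used by the code issuing requests.
CONCURRENCY = 50

session = requests.Session()

# The default HTTPAdapter caps the idle pool at 10 connections per host,
# so under higher concurrency, connections returned to a full pool are
# discarded and later requests must open fresh TCP connections.
adapter = requests.adapters.HTTPAdapter(pool_connections=CONCURRENCY,
                                        pool_maxsize=CONCURRENCY)

# Mount the adapter for both schemes so all requests through this
# session share the larger pool.
session.mount('https://', adapter)
session.mount('http://', adapter)
```

Connections are still established lazily, so a larger pool only changes how
many idle connections the session is willing to keep around for reuse.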
taskcluster/taskgraph/create.py
--- a/taskcluster/taskgraph/create.py
+++ b/taskcluster/taskgraph/create.py
@@ -25,16 +25,24 @@ CONCURRENCY = 50
 
 def create_tasks(taskgraph, label_to_taskid):
     # TODO: use the taskGroupId of the decision task
     task_group_id = slugid()
     taskid_to_label = {t: l for l, t in label_to_taskid.iteritems()}
 
     session = requests.Session()
 
+    # Default HTTPAdapter uses 10 connections. Mount custom adapter to increase
+    # that limit. Connections are established as needed, so using a large value
+    # should not negatively impact performance.
+    http_adapter = requests.adapters.HTTPAdapter(pool_connections=CONCURRENCY,
+                                                 pool_maxsize=CONCURRENCY)
+    session.mount('https://', http_adapter)
+    session.mount('http://', http_adapter)
+
     decision_task_id = os.environ.get('TASK_ID')
 
     with futures.ThreadPoolExecutor(CONCURRENCY) as e:
         fs = {}
 
         # We can't submit a task until its dependencies have been submitted.
         # So our strategy is to walk the graph and submit tasks once all
         # their dependencies have been submitted.