serverlog: perform garbage collection on every request (bug 1443984); r?sheehan, glob draft
author Gregory Szorc <gps@mozilla.com>
Mon, 12 Mar 2018 14:13:14 -0700
changeset 12171 c17c210a4f106b7ae35b5c8f5f30e0d0273c6014
parent 12170 7d8dc89dee25e95e3cdfce2b587316f292933e90
push id 1908
push user bmo:gps@mozilla.com
push date Mon, 12 Mar 2018 21:13:22 +0000
reviewers sheehan, glob
bugs 1443984
serverlog: perform garbage collection on every request (bug 1443984); r?sheehan, glob

hgwebdir is currently leaking repository objects. This can lead to OOM
on hgweb machines in production.

The leaks are likely due to a cycle in repository objects. Those leaks
likely won't get fixed until 4.6 at the earliest.

Since memory leaks are effectively a fact of life at this juncture,
let's mitigate their existence by forcing a garbage collection at
the end of every request.

A similar patch to do this is proposed upstream. Worst case, both
us and core perform a collect. The 2nd collect should be very fast.

We implement this in the serverlog extension because monkeypatching
hgweb is hard and this extension already does it.

MozReview-Commit-ID: 3HDarYrDF3J
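To illustrate why a forced collection helps, here is a minimal standalone sketch, not the extension's code. FakeRepo, handle_request and tracker are hypothetical names invented for this example; FakeRepo only stands in for an object that refers to itself, which is the kind of cycle the message suspects in repository objects. Reference counting alone never frees such an object, so the sketch calls gc.collect() in the finally block of a generator, mirroring where the patch places its call.

```python
import gc
import weakref


class FakeRepo(object):
    """Hypothetical stand-in for an object that is part of a reference cycle."""

    def __init__(self):
        # The instance refers to itself, so its refcount never drops to zero.
        self.selfref = self


def handle_request(chunks, tracker):
    """Yield response chunks, then force a collection when the request ends."""
    repo = FakeRepo()
    # Record a weak reference so callers can observe whether repo was freed.
    tracker.append(weakref.ref(repo))
    try:
        for chunk in chunks:
            yield chunk
    finally:
        # Drop the only strong local reference; the cycle keeps the object
        # alive until the cyclic garbage collector runs.
        del repo
        gc.collect()
```

After the generator is exhausted, the weak reference stored in tracker should resolve to None, which indicates that the cyclic object was reclaimed by the explicit collection rather than lingering until an automatic collection happens to run.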
hgext/serverlog/__init__.py
--- a/hgext/serverlog/__init__.py
+++ b/hgext/serverlog/__init__.py
@@ -162,16 +162,17 @@ The extension currently only uses syslog
 
 The extension assumes only 1 thread is running per process. If multiple threads
 are running, CPU time calculations will not be accurate. Other state may get
 mixed up.
 """
 
 from __future__ import absolute_import
 
+import gc
 import inspect
 import os
 import resource
 import syslog
 import time
 import uuid
 
 from mercurial import (
@@ -272,16 +273,22 @@ class hgwebwrapped(hgweb_mod.hgweb):
             for what in super(hgwebwrapped, self)._runwsgi(req, repo):
                 sl['writecount'] += len(what)
                 yield what
 
                 if sl['writecount'] - lastlogamount > datasizeinterval:
                     logsyslog(sl, 'WRITE_PROGRESS', '%d' % sl['writecount'])
                     lastlogamount = sl['writecount']
         finally:
+            # It is easy to introduce cycles in localrepository instances.
+            # Versions of Mercurial up to and including 4.5 leak repo instances
+            # in hgwebdir. We force a GC on every request to help mitigate
+            # these leaks.
+            gc.collect()
+
             endtime = time.time()
             endusage = resource.getrusage(resource.RUSAGE_SELF)
             endcpu = endusage.ru_utime + endusage.ru_stime
 
             deltatime = endtime - starttime
             deltacpu = endcpu - startcpu
 
             logsyslog(sl, 'END_REQUEST', '%d' % sl['writecount'],