robustcheckout: detect and recover from open locks (bug 1297153); r?glob draft
author Gregory Szorc <gps@mozilla.com>
Thu, 10 Aug 2017 16:46:54 -0700
changeset 11523 7c03f1e51bd81509eb1820740a85522fafac238e
parent 11522 16908138278717151569b639889f6bd4031c3bca
push id 1757
push user bmo:gps@mozilla.com
push date Mon, 14 Aug 2017 23:57:42 +0000
reviewers glob
bugs 1297153
In automation, `hg` processes can get killed by SIGKILL. SIGKILL doesn't give processes time to clean up and gracefully abort whatever operation they were performing. If a Mercurial operation is forcibly killed while holding a lock, it will orphan a lock file.

On normal machines with local filesystems, Mercurial can tell that the process behind the lock has gone away, that the lock is stale, and that it can be removed automatically. However, in TaskCluster, operations exist in PID namespaces and may have random hostnames between tasks. This confuses Mercurial's lock disambiguation mechanism and prevents Mercurial from reclaiming the lock. This leads to repositories getting wedged.

This commit adds detection for locks before we attempt any Mercurial operation. If a lock is present, we assume it was left over from a dead process and nuke the repository data because we can't easily guarantee the repository isn't corrupt. (In many scenarios a SIGKILL'd process will corrupt the repository.) It is much easier to just re-clone the repo or create a new working directory than to validate the state of the repo. And because we use streaming clones, it is probably faster too!

MozReview-Commit-ID: 7PjWD8BQQDh
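The detect-and-purge policy described above can be sketched outside of Mercurial. This is a minimal illustration only, assuming plain directories and the standard library rather than Mercurial's vfs layer; the names `is_file_or_link` and `purge_if_locked` are hypothetical, and `store` here means the directory that contains the shared `.hg`. The actual change is in the diff that follows.

```python
import os
import shutil
import stat


def is_file_or_link(path):
    """Return True if ``path`` is a regular file or a symlink.

    lstat() is used so that a dangling symlink still counts. The tests
    in this change show the lock as a symlink whose target looks like
    "dummyhost:<pid>", which does not point at a real file.
    """
    try:
        mode = os.lstat(path).st_mode
    except OSError:
        return False
    return stat.S_ISREG(mode) or stat.S_ISLNK(mode)


def purge_if_locked(dest, store):
    """Delete data that a leftover lock may have tainted.

    Assumes we are the only process using these paths, so any lock
    found is treated as left over from a killed process.
    Returns the list of directories that were removed.
    """
    removed = []
    if is_file_or_link(os.path.join(store, '.hg', 'store', 'lock')):
        # A store lock taints the shared store, and the checkout that
        # points at it is useless without it: remove both.
        for path in (dest, store):
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    elif is_file_or_link(os.path.join(dest, '.hg', 'wlock')):
        # A working directory lock only taints the checkout itself.
        shutil.rmtree(dest, ignore_errors=True)
        removed.append(dest)
    return removed
```

Checking for the lock file's existence, rather than asking whether its owner is still alive, is what sidesteps the PID namespace and hostname problem: the sketch never tries to interpret who holds the lock.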
hgext/robustcheckout/__init__.py
hgext/robustcheckout/tests/test-locks.t
--- a/hgext/robustcheckout/__init__.py
+++ b/hgext/robustcheckout/__init__.py
@@ -188,18 +188,34 @@ def _docheckout(ui, url, dest, upstream,
 
     def callself():
         return _docheckout(ui, url, dest, upstream, revision, branch, purge,
                            sharebase, networkattemptlimit, networkattempts)
 
     ui.write('ensuring %s@%s is available at %s\n' % (url, revision or branch,
                                                       dest))
 
+    # We assume that we're the only process on the machine touching the
+    # repository paths that we were told to use. This means our recovery
+    # scenario when things aren't "right" is to just nuke things and start
+    # from scratch. This is easier to implement than verifying the state
+    # of the data and attempting recovery. And in some scenarios (such as
+    # potential repo corruption), it is probably faster, since verifying
+    # repos can take a while.
+
     destvfs = getvfs()(dest, audit=False, realpath=True)
 
+    def deletesharedstore(path=None):
+        storepath = path or destvfs.read('.hg/sharedpath').strip()
+        if storepath.endswith('.hg'):
+            storepath = os.path.dirname(storepath)
+
+        storevfs = getvfs()(storepath, audit=False)
+        storevfs.rmtree(forcibly=True)
+
     if destvfs.exists() and not destvfs.exists('.hg'):
         raise error.Abort('destination exists but no .hg directory')
 
     # Require checkouts to be tied to shared storage because efficiency.
     if destvfs.exists('.hg') and not destvfs.exists('.hg/sharedpath'):
         ui.warn('(destination is not shared; deleting)\n')
         destvfs.rmtree(forcibly=True)
 
@@ -212,26 +228,33 @@ def _docheckout(ui, url, dest, upstream,
         if not os.path.exists(storepath):
             ui.warn('(shared store does not exist; deleting destination)\n')
             destvfs.rmtree(forcibly=True)
         elif not re.search('[a-f0-9]{40}/\.hg$', storepath.replace('\\', '/')):
             ui.warn('(shared store does not belong to pooled storage; '
                     'deleting destination to improve efficiency)\n')
             destvfs.rmtree(forcibly=True)
 
+        storevfs = getvfs()(storepath, audit=False)
+        if storevfs.isfileorlink('store/lock'):
+            ui.warn('(shared store has an active lock; assuming it is left '
+                    'over from a previous process and that the store is '
+                    'corrupt; deleting store and destination just to be '
+                    'sure)\n')
+            destvfs.rmtree(forcibly=True)
+            deletesharedstore(storepath)
+
         # FUTURE when we require generaldelta, this is where we can check
         # for that.
 
-    def deletesharedstore():
-        storepath = destvfs.read('.hg/sharedpath').strip()
-        if storepath.endswith('.hg'):
-            storepath = os.path.dirname(storepath)
-
-        storevfs = getvfs()(storepath, audit=False)
-        storevfs.rmtree(forcibly=True)
+    if destvfs.isfileorlink('.hg/wlock'):
+        ui.warn('(dest has an active working directory lock; assuming it is '
+                'left over from a previous process and that the destination '
+                'is corrupt; deleting it just to be sure)\n')
+        destvfs.rmtree(forcibly=True)
 
     def handlerepoerror(e):
         if e.message == _('abandoned transaction found'):
             ui.warn('(abandoned transaction found; trying to recover)\n')
             repo = hg.repository(ui, dest)
             if not repo.recover():
                 ui.warn('(could not recover repo state; '
                         'deleting shared store)\n')
--- a/hgext/robustcheckout/tests/test-locks.t
+++ b/hgext/robustcheckout/tests/test-locks.t
@@ -31,22 +31,25 @@
 
 Simulate a held lock on the working directory for a no-op pull but working
 directory update.
 
   $ hg -R wdirlock acquirewlock
   $ readlink wdirlock/.hg/wlock
   dummyhost:* (glob)
 
-  $ hg --config ui.timeout=1 robustcheckout http://localhost:$HGPORT/repo0 wdirlock --revision aada1b3e573f
+  $ hg robustcheckout http://localhost:$HGPORT/repo0 wdirlock --revision aada1b3e573f
   ensuring http://localhost:$HGPORT/repo0@aada1b3e573f is available at wdirlock
   (existing repository shared store: $TESTTMP/share/b8b78f0253d822e33ba652fd3d80a5c0837cfdf3/.hg)
-  waiting for lock on working directory of wdirlock held by * (glob)
-  abort: working directory of wdirlock: timed out waiting for lock held by * (glob)
-  [255]
+  (dest has an active working directory lock; assuming it is left over from a previous process and that the destination is corrupt; deleting it just to be sure)
+  (sharing from existing pooled repository b8b78f0253d822e33ba652fd3d80a5c0837cfdf3)
+  searching for changes
+  no changes found
+  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  updated to aada1b3e573f7272bb2ef93b34acbf0f77c69d44
 
 Simulate a held lock on the working directory for a store pull and working
 directory update.
 
   $ hg -q robustcheckout http://localhost:$HGPORT/repo0 wdirlock-pull --revision 5d6cdc75a09b
   ensuring http://localhost:$HGPORT/repo0@5d6cdc75a09b is available at wdirlock-pull
   updated to 5d6cdc75a09bcccf76f9339a28e1d89360c59dce
 
@@ -54,38 +57,52 @@ directory update.
   $ touch newfile
   $ hg -q commit -A -m 'add newfile'
   $ cd ../..
 
   $ hg -R wdirlock-pull acquirewlock
   $ readlink wdirlock-pull/.hg/wlock
   dummyhost:* (glob)
 
-  $ hg --config ui.timeout=1 robustcheckout http://localhost:$HGPORT/repo0 wdirlock-pull --revision a7c4155bc8eb
+  $ hg robustcheckout http://localhost:$HGPORT/repo0 wdirlock-pull --revision a7c4155bc8eb
   ensuring http://localhost:$HGPORT/repo0@a7c4155bc8eb is available at wdirlock-pull
   (existing repository shared store: $TESTTMP/share/b8b78f0253d822e33ba652fd3d80a5c0837cfdf3/.hg)
-  (pulling to obtain a7c4155bc8eb)
-  waiting for lock on working directory of wdirlock-pull held by * (glob)
-  abort: working directory of wdirlock-pull: timed out waiting for lock held by * (glob)
-  [255]
+  (dest has an active working directory lock; assuming it is left over from a previous process and that the destination is corrupt; deleting it just to be sure)
+  (sharing from existing pooled repository b8b78f0253d822e33ba652fd3d80a5c0837cfdf3)
+  searching for changes
+  adding changesets
+  adding manifests
+  adding file changes
+  added 1 changesets with 1 changes to 1 files
+  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  updated to a7c4155bc8eb86ecec78c91f744f597e7c9a3ff3
 
 Simulate a held lock on the store for a no-op pull and working directory
 update. This should work because no store update is needed so no lock needs
 to be acquired.
 
   $ hg -q robustcheckout http://localhost:$HGPORT/repo1 storelock --revision 7d5b54cb09e1
   ensuring http://localhost:$HGPORT/repo1@7d5b54cb09e1 is available at storelock
   updated to 7d5b54cb09e1172a3684402520112cab3f3a1b70
   $ hg -R storelock acquirestorelock
   $ readlink share/65cd4e3b46a3f22a08ec4162871e67f57c322f6a/.hg/store/lock
   dummyhost:* (glob)
 
-  $ hg --config ui.timeout=1 robustcheckout http://localhost:$HGPORT/repo1 storelock --revision 65cd4e3b46a3
+  $ hg robustcheckout http://localhost:$HGPORT/repo1 storelock --revision 65cd4e3b46a3
   ensuring http://localhost:$HGPORT/repo1@65cd4e3b46a3 is available at storelock
   (existing repository shared store: $TESTTMP/share/65cd4e3b46a3f22a08ec4162871e67f57c322f6a/.hg)
+  (shared store has an active lock; assuming it is left over from a previous process and that the store is corrupt; deleting store and destination just to be sure)
+  (sharing from new pooled repository 65cd4e3b46a3f22a08ec4162871e67f57c322f6a)
+  requesting all changes
+  adding changesets
+  adding manifests
+  adding file changes
+  added 2 changesets with 2 changes to 1 files
+  searching for changes
+  no changes found
   1 files updated, 0 files merged, 0 files removed, 0 files unresolved
   updated to 65cd4e3b46a3f22a08ec4162871e67f57c322f6a
 
 Clean up for next test
 
   $ rm -rf share/65cd4e3b46a3f22a08ec4162871e67f57c322f6a
 
 Simulate a held lock on the store for a pull plus working directory update.
@@ -97,15 +114,22 @@ Simulate a held lock on the store for a 
   $ readlink share/65cd4e3b46a3f22a08ec4162871e67f57c322f6a/.hg/store/lock
   dummyhost:* (glob)
 
   $ cd server/repo1
   $ touch newfile
   $ hg -q commit -A -m 'add newfile'
   $ cd ../..
 
-  $ hg --config ui.timeout=1 robustcheckout http://localhost:$HGPORT/repo1 storelock-pull --revision fca136d824da
+  $ hg robustcheckout http://localhost:$HGPORT/repo1 storelock-pull --revision fca136d824da
   ensuring http://localhost:$HGPORT/repo1@fca136d824da is available at storelock-pull
   (existing repository shared store: $TESTTMP/share/65cd4e3b46a3f22a08ec4162871e67f57c322f6a/.hg)
-  (pulling to obtain fca136d824da)
-  waiting for lock on repository storelock-pull held by * (glob)
-  abort: repository storelock-pull: timed out waiting for lock held by * (glob)
-  [255]
+  (shared store has an active lock; assuming it is left over from a previous process and that the store is corrupt; deleting store and destination just to be sure)
+  (sharing from new pooled repository 65cd4e3b46a3f22a08ec4162871e67f57c322f6a)
+  requesting all changes
+  adding changesets
+  adding manifests
+  adding file changes
+  added 3 changesets with 3 changes to 2 files
+  searching for changes
+  no changes found
+  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  updated to fca136d824dac41b19345549edfda68fe63213c4