Bug 1271035 - Disable Places during reftests, preventing 50 GB of I/O; r?dholbert draft
authorGregory Szorc <gps@mozilla.com>
Sat, 07 May 2016 02:53:14 -0700
changeset 364654 626374e5bd48d42ae76c9a43e042ade19d8ddcb6
parent 364611 70dfd4254cf977be2000b41f7a6b97eaada8e2a0
child 520348 69acba38a216dd435048e03c7d454e2009586b53
push id17523
push usergszorc@mozilla.com
push dateSat, 07 May 2016 09:53:27 +0000
reviewersdholbert
bugs1271035
milestone49.0a1
Bug 1271035 - Disable Places during reftests, preventing 50 GB of I/O; r?dholbert It was a cold Friday night in San Francisco. Earlier in the day, I informed Chris AtLee that I was going to start focusing on improving the efficiency of Firefox automation and asked him where the biggest capacity issues were. He said "we're hurting most on Windows tests." As I was casually drinking a barleywine (note to reader: barleywines are serious beers - there's nothing casual about them), I found myself tediously clicking through Treeherder looking at logs for Windows jobs, looking for patterns and other oddities. As I was clicking through, something stood out to me: the sheer number of reftest jobs. I recalled a random project I started a few years ago. Its aim was to analyze buildbot job metadata so we could better understand where time was spent in automation. I had mostly written off the side project as a failure and a near complete waste of my time. Not even a stray random thought of this project had entered my mind in the past year. But clicking through Treeherder after a few glasses of barleywine somehow reminded me of one of the few useful findings of that project: reftest jobs consumed a seemingly disproportiate amount of machine time, something like 35 or 40% IIRC of the time spent on all jobs. Now, this finding was several years ago and almost certainly no longer relevant. But, again, I had a few glasses of barleywine in me and was bothered by the amount of reftest jobs and their duration, so I thought "hey, why don't I run reftests and see why they take so long." So I built Firefox on Windows - the platform Chris AtLee said we're "hurting most on." I decided to start my very casual profiling session by recording a `mach reftest` run using Sysinternals Process Monitor. To my surprise, it yielded a very obvious and unexpected result: the Places SQLite database was incurring a lot of I/O. On my i7-6700K Windows 10 desktop with a high performance SSD, `mach reftest` yielded the following: File Time #Events #Reads #Writes RBytes WBytes Path 198s 980,872 243,270 669,231 7,971,471,360 20,667,084,080 places.sqlite-wal 165s 645,853 222,407 367,775 7,287,701,820 14.071,529,472 places.sqlite 2s 377,121 1 0 32,768 0 places.sqlite-shm The Places SQLite database accounts for 2,003,846 of the total of 3,547,527 system calls (56.49%) recorded by procmon during `mach reftest` execution. This represents a staggering 49,997,786,732 of the 50,307,660,589 (99.38%) bytes of I/O recorded! Yes, that's 50 GB. I reckon the reason the Places database accumulates so much I/O load during reftests is because the reftest suite essentially loads thousands of pages as quickly as possible. This effectively performs a stress test against the Places history recording service. As effective as reftests are at stress-testing Places, it adds no value to reftests because reftests are testing the layout features, not the performance of history recording. So these 2M system calls and 50 GB of I/O are overhead. This commit disables Places when executing reftests and prevents the overhead. After this commit, `mach reftest` has significantly reduced interaction with the Places SQLite database: File Time #Events #Reads #Writes RBytes WBytes Path 0.09s 502 138 302 4,521,984 8,961,528 places.sqlite-wal 0.07s 254 20 140 524,604 8,126,464 places.sqlite 0.01s 3,289 1 0 32,768 0 places.sqlite-shm Of the 948,033 system calls recorded with this change (26.7% of original), 691,322 were related to I/O. The Places SQLite database only consumed ~22MB of I/O, <0.01% of original. It's worth noting that ~half of the remaining I/O system calls are related to reftest.log, which now accounts for the largest percentage of write I/O (only ~53 MB, however). It's worth noting that reftest.log appears to be using unbuffered writes and is requiring an excessive amount of system calls for writing. But that's for another bug and commit. In terms of wall time, the drastic I/O reduction during `mach reftest` appears to have minimal impact on my machine: maybe 30s shaved from a ~900s execution, or ~3%. But my machine with its modern SSD doesn't struggle with I/O. In automation, it is a different story. I pushed this change to Try along with the base revision and triggered 4 runs of most reftest jobs. The runtime improvements in automation are impressive. Here are the fastest reported times for various jobs: Job Before After Delta Linux Opt R1 31m 34m +3m Linux Opt R2 43m 35m -8m Linux Opt Ru1 40m 34m -6m Linux Opt Ru2 43m 37m -6m Linux Opt R E10s 89m 72m -17m Linux Debug R1 52m 40m -12m Linux Debug R2 49m 42m -7m Linux Debug R3 60m 51m -9m Linux Debug R4 42m 37m -5m Linux Debug R1 E10s 84m 72m -12m Linux Debug R2 E10s 97m 85m -12m Linux64 Opt R1 35m 24m -11m Linux64 Opt R2 37m 26m -11m Linux64 Opt Ru1 32m 29m -3m Linux64 Opt Ru2 37m 26m -12m Linux64 Opt TC R1 12m 10m -2m Linux64 Opt TC R2 10m 7m -3m Linux64 Opt TC R3 11m 9m -2m Linux64 Opt TC R4 11m 9m -2m Linux64 Opt TC R5 13m 11m -2m Linux64 Opt TC R6 11m 9m -2m Linux64 Opt TC R7 9m 8m -1m Linux64 Opt TC R8 11m 10m -1m Linux64 Opt TC Ru1 30m 25m -5m Linux64 Opt TC Ru2 36m 27m -11m OS X 10.10 Opt 31m 27m -4m OS X 10.10 Opt E10s 26m 25m -1m OS X 10.10 Debug 68m 55m -13m Win7 Opt R 30m 28m -2m Win7 Opt Ru 28m 26m -2m Win7 Opt R E10S 29m 27m -2m Win7 Debug R 85m 76m -9m Win7 Debug R E10S 75m 65m -10m Win8 x64 Opt R 29m 26m -3m Win8 x64 Opt Ru 27m 25m -2m Win8 x64 Debug R 90m 71m -19m Android 4.3 API15 Opt R1 89m 71m -18m Android 4.3 API15 Opt R2 78m 64m -14m Android 4.3 API15 Opt R3 75m 64m -11m Android 4.3 API15 Opt R4 74m 68m -6m Android 4.3 API15 Opt R5 75m 69m -6m Android 4.3 API15 Opt R6 91m 86m -5m Android 4.3 API15 Opt R7 87m 66m -21m Android 4.3 API15 Opt R8 87m 82m -5m Android 4.3 API15 Opt R9 80m 66m -14m Android 4.3 API15 Opt R10 80m 67m -13m Android 4.3 API15 Opt R11 73m 66m -7m Android 4.3 API15 Opt R12 105m 91m -14m Android 4.3 API15 Opt R13 72m 59m -13m Android 4.3 API15 Opt R14 82m 61m -21m Android 4.3 API15 Opt R15 73m 62m -11m Android 4.3 API15 Opt R16 79m 78m -1m The savings total 6+ *hours* or ~15% when running all reftests. I'd say this isn't bad for a one-line code change! MozReview-Commit-ID: H1LkACgSpVn
layout/tools/reftest/reftest-preferences.js
--- a/layout/tools/reftest/reftest-preferences.js
+++ b/layout/tools/reftest/reftest-preferences.js
@@ -112,8 +112,13 @@ user_pref("startup.homepage_welcome_url"
 user_pref("startup.homepage_welcome_url.additional", "");
 user_pref("startup.homepage_override_url", "");
 user_pref("browser.usedOnWindows10.introURL", "");
 
 user_pref("media.gmp-manager.url.override", "http://localhost/dummy-gmp-manager.xml");
 
 // A fake bool pref for "@supports -moz-bool-pref" sanify test.
 user_pref("testing.supports.moz-bool-pref", true);
+
+// Reftests load a lot of URLs very quickly. This puts avoidable and
+// unnecessary I/O pressure on the Places DB (measured to be in the
+// gigabytes).
+user_pref("places.history.enabled", false);
\ No newline at end of file