Bug 1397503 - Vary cache name when using out-of-tree Docker images; r?dustin
We currently vary the cache name for run-task tasks whenever run-task
changes. This allows us to not worry about backwards or forwards
compatibility of caches in run-task tasks.
This strategy doesn't work for out-of-tree Docker images because
the content of run-task cannot be determined at Taskgraph time:
it was fixed when the Docker image was built, and there is no
efficient way to retrieve it during Taskgraph.
So, for out-of-tree Docker images we now vary the cache name by
the Docker image value, which includes its name and a tag or
hash. This means that out-of-tree run-task tasks will get separate
caches for each distinct Docker image.
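
As a rough, standalone sketch of the suffix computation described above
(not the in-tree code): `cache_suffix` and its `run_task_hash` argument
are hypothetical stand-ins for the logic around `_run_task_suffix()`,
the image strings are made up, and the explicit UTF-8 encode is an
assumption added so the sketch also runs on Python 3.

```python
import hashlib


def cache_suffix(run_task_hash, out_of_tree_image=None):
    """Return the suffix appended to a run-task cache name (illustrative only)."""
    # Every run-task cache varies by the hash of the run-task script.
    suffix = '-%s' % run_task_hash

    if out_of_tree_image:
        # Out-of-tree images additionally contribute 12 hex characters of
        # the SHA-256 of the image value (name plus tag or hash). As in
        # the patch, no separator is added between the two parts.
        name_hash = hashlib.sha256(out_of_tree_image.encode('utf-8')).hexdigest()
        suffix += name_hash[0:12]

    return suffix


# Hypothetical values, for illustration only.
in_tree = cache_suffix('abc123')
image_a = cache_suffix('abc123', 'example/image:1.0')
image_b = cache_suffix('abc123', 'example/image:2.0')
```

Two tasks that differ only in their out-of-tree image value get different
suffixes, and therefore separate caches, while in-tree tasks keep the
plain run-task suffix.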
This isn't ideal. Ideally we would share caches if run-task doesn't
vary between Docker images. But without any way of proving that
at Taskgraph time, we take the safe road and force cache separation.
MozReview-Commit-ID: FMiQBqfvjqW
--- a/taskcluster/taskgraph/transforms/task.py
+++ b/taskcluster/taskgraph/transforms/task.py
@@ -5,16 +5,17 @@
These transformations take a task description and turn it into a TaskCluster
task definition (along with attributes, label, etc.). The input to these
transformations is generic to any kind of task, but abstracts away some of the
complexities of worker implementations, scopes, and treeherder annotations.
"""
from __future__ import absolute_import, print_function, unicode_literals
+import hashlib
import json
import os
import re
import time
from copy import deepcopy
from mozbuild.util import memoize
from taskgraph.util.attributes import TRUNK_PROJECTS
@@ -717,19 +718,36 @@ def build_docker_worker_payload(config,
# run-task knows how to validate caches.
#
# To help ensure new run-task features and bug fixes don't interfere
# with existing caches, we seed the hash of run-task into cache names.
# So, any time run-task changes, we should get a fresh set of caches.
# This means run-task can make changes to cache interaction at any time
# without regards for backwards or future compatibility.
-
+ #
+ # But this mechanism only works for in-tree Docker images that are built
+ # with the current run-task! For out-of-tree Docker images, we have no
+ # way of knowing the content of their run-task. So, in addition to varying
+ # cache names by the contents of run-task, we also take the Docker image
+ # name into consideration. This means that different Docker images will
+ # never share the same cache. This is a bit unfortunate. But it is the
+ # safest thing to do. Fortunately, most images are defined in-tree.
+ #
+ # For out-of-tree Docker images, we don't strictly need to incorporate
+ # the run-task content into the cache name. However, doing so preserves
+ # the mechanism whereby changing run-task results in new caches
+ # everywhere.
 if run_task:
     suffix = '-%s' % _run_task_suffix()
+
+    if out_of_tree_image:
+        name_hash = hashlib.sha256(out_of_tree_image).hexdigest()
+        suffix += name_hash[0:12]
+
 else:
     suffix = ''
skip_untrusted = config.params['project'] == 'try' or level == 1
for cache in worker['caches']:
# Some caches aren't enabled in environments where we can't
# guarantee certain behavior. Filter those out.