Bug 1247168 - Add special Dockerfile syntax to add arbitrary files to context; r?dustin
A limitation of traditional `docker build` is that it only has access
to files in the same directory as the Dockerfile.
Typically, when you run `docker build`, the Docker client creates a tar
archive of all files in the same directory as the Dockerfile and uploads
it to the Docker daemon. The image build process then has access to all
files in that archive.
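For readers unfamiliar with build contexts: the context is just a tar stream. The following is a rough illustration of what the default behaviour amounts to. It is not code from this patch and ignores `.dockerignore`. `make_default_context` is a hypothetical helper name.

```python
import io
import os
import tarfile


def make_default_context(context_dir):
    """Return an in-memory tar of every file under context_dir.

    Hypothetical helper approximating the context `docker build <dir>`
    uploads (no .dockerignore handling).
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for root, dirs, files in os.walk(context_dir):
            dirs.sort()  # walk subdirectories in a stable order
            for name in sorted(files):
                fs_path = os.path.join(root, name)
                # Archive names are relative to context_dir, so nothing
                # outside that directory can appear in the context.
                tar.add(fs_path, arcname=os.path.relpath(fs_path, context_dir))
    return buf.getvalue()
```

Because every archive name is derived from `context_dir`, parent or sibling files are unreachable. This is the limitation described above.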
Over a year ago, I realized you could write code to create custom
context archives and talk to the Docker build API directly to use them.
I hacked some code into version-control-tools that parsed Dockerfiles
for special syntax denoting extra paths from the source checkout and
then added those paths to the context archive. This commit essentially
copies that code for use by taskgraph's built-in Docker image building.
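Roughly, "talking to the build API directly" means POSTing your own tar archive to the Docker Engine API's `/build` endpoint with a `Content-Type: application/x-tar` header. The following only prepares such a request. `build_request` is a hypothetical helper, and it assumes a daemon listening on TCP at `localhost:2375`. The default UNIX socket would need a custom connection class. None of this is taken from version-control-tools.

```python
import urllib.parse
import urllib.request


def build_request(context_tar, tag, daemon_url="http://localhost:2375"):
    """Prepare (but do not send) a Docker Engine API image build request.

    Assumption: the daemon is reachable over plain TCP at daemon_url.
    """
    query = urllib.parse.urlencode({"t": tag})  # "t" is the image tag
    return urllib.request.Request(
        "%s/build?%s" % (daemon_url, query),
        data=context_tar,  # the custom context archive, as raw bytes
        headers={"Content-Type": "application/x-tar"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen()` would send it, and the response body carries the build output. Docker never sees the source tree itself, only whatever archive you hand it.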
Using the syntax "# %include <path>", you can include files or
directories (relative to the root of the source directory) in the
generated context archive. Files added this way are available under
the "topsrcdir/" path.
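The include rules can be sketched as a standalone function. This is a simplified re-implementation for illustration, not the code in docker_image.py. `collect_includes` is a hypothetical name, and it takes the Dockerfile text rather than a path. It compares against `topsrcdir + os.sep` so that a sibling directory sharing the same prefix is not mistaken for a child.

```python
import os


def collect_includes(dockerfile_text, topsrcdir, image_name):
    """Return {archive_path: fs_path} for every '# %include' line.

    Hypothetical helper illustrating the rules above; not the patch's code.
    """
    topsrcdir = os.path.normpath(topsrcdir)
    files = {}
    for line in dockerfile_text.splitlines():
        line = line.rstrip()
        if not line.startswith("# %include "):
            continue
        p = line[len("# %include "):].strip()
        if os.path.isabs(p):
            raise ValueError("include path cannot be absolute: %s" % p)

        fs_path = os.path.normpath(os.path.join(topsrcdir, p))
        # Require a separator after topsrcdir so "../<topsrcdir>-evil"
        # (a sibling with the same prefix) is rejected.
        if fs_path != topsrcdir and not fs_path.startswith(topsrcdir + os.sep):
            raise ValueError("include path outside topsrcdir: %s" % p)
        if not os.path.exists(fs_path):
            raise ValueError("include path does not exist: %s" % p)

        if p.endswith("/"):
            # Directory include: keep each file's path relative to
            # topsrcdir so nested subdirectories survive in the archive.
            for root, dirs, names in os.walk(fs_path):
                for name in names:
                    src = os.path.join(root, name)
                    rel = os.path.relpath(src, topsrcdir)
                    files[os.path.join(image_name, "topsrcdir", rel)] = src
        else:
            # Single-file include.
            rel = os.path.relpath(fs_path, topsrcdir)
            files[os.path.join(image_name, "topsrcdir", rel)] = fs_path
    return files
```

For example, `# %include testing/mozharness/` maps `testing/mozharness/sub/b.py` to `<image>/topsrcdir/testing/mozharness/sub/b.py`.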
The "lint" image has been changed to use this syntax to add the
in-tree version of tooltool.py (instead of downloading it from
github.com). This eliminates a dependency on a third-party service and
improves security and determinism. Yay.
In order to write tests, I had to make archiving deterministic. That's
why we no longer use a single "tar.add()" for the Dockerfile directory.
Instead, we obtain the list of files up front, sort them, then add each
one with uid/gid set to 0, so ownership is consistent regardless of the
uid/gid on the filesystem where the context is created. More
determinism, yay.
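The deterministic archiving step can be sketched as follows, assuming a mapping of archive path to filesystem path like the one create_context_tar() builds. `deterministic_tar_gz` is a hypothetical helper. Unlike the patch, it also clears owner names and mtimes and pins the gzip header timestamp, because tarfile and gzip otherwise copy those values from the filesystem and the clock. It needs Python 3.8+ for `gzip.compress(..., mtime=...)`.

```python
import gzip
import io
import tarfile


def deterministic_tar_gz(context_files):
    """Build a .tar.gz whose bytes depend only on names, modes and contents.

    context_files maps archive path -> filesystem path (hypothetical input
    mirroring the patch's context_files dict).
    """
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        # Sorting fixes member order regardless of dict/filesystem order.
        for archive_path, fs_path in sorted(context_files.items()):
            ti = tar.gettarinfo(fs_path, arcname=archive_path)
            ti.uid = ti.gid = 0          # what the patch does
            ti.uname = ti.gname = ""     # extra: drop local user/group names
            ti.mtime = 0                 # extra: ignore modification times
            with open(fs_path, "rb") as fh:
                tar.addfile(ti, fileobj=fh)
    # gzip stores a timestamp in its header; pin it as well.
    return gzip.compress(raw.getvalue(), mtime=0)
```

Calling it twice, even after touching a file's mtime or reordering the input dict, yields byte-identical output.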
I would like to test this feature a bit more. However, the test
environment for custom Docker image building doesn't currently support
custom source paths: it expects Dockerfiles to be in
$topsrcdir/testing/docker. If we add more functionality here, we
should definitely invest in writing better tests.
MozReview-Commit-ID: 4hPZesJuGQV
new file mode 100644
--- /dev/null
+++ b/taskcluster/docs/docker-images.rst
@@ -0,0 +1,46 @@
+.. _taskcluster_dockerimages:
+
+=============
+Docker Images
+=============
+
+TaskCluster Docker images are defined in the source directory under
+``testing/docker``. Each directory therein is named after an image
+used as part of the task graph.
+
+Adding Extra Files to Images
+============================
+
+Typically, when Docker images are built, all files from the directory
+containing the Dockerfile are made available to the ``ADD`` instruction
+of the created image. It is not possible to add files from parent or
+sibling directories. This limitation is annoying and results in
+duplicated files and patterns, so we have worked around it.
+
+Docker images built as part of task graph execution can make **any**
+file under source control available to the ``docker build`` context.
+
+Extra files can be added to the context by specifying special syntax
+in a ``Dockerfile``::
+
+ # %include <path>
+
+For example::
+
+ # %include mach
+ # %include testing/mozharness/
+
+The ``# %include`` syntax expects a path after it, relative to the
+root of the source directory. The file or directory at that path is
+included in the image's build context.
+
+If the path ends with a ``/``, it is interpreted as a directory and
+all files under it are included. Otherwise, it must name a single file.
+
+Files added using ``# %include`` syntax are available inside the build
+context under the ``topsrcdir/`` path.
+
+Here is an example Dockerfile snippet::
+
+ # %include mach
+ ADD topsrcdir/mach /home/worker/mach
--- a/taskcluster/docs/index.rst
+++ b/taskcluster/docs/index.rst
@@ -23,8 +23,9 @@ check out the :doc:`how-to section <how-
taskgraph
parameters
attributes
kinds
transforms
yaml-templates
how-tos
+ docker-images
--- a/taskcluster/taskgraph/task/docker_image.py
+++ b/taskcluster/taskgraph/task/docker_image.py
@@ -75,17 +75,17 @@ class DockerImageTask(base.Task):
image_artifact_path = \
"public/decision_task/image_contexts/{}/context.tar.gz".format(image_name)
if os.environ.get('TASK_ID'):
destination = os.path.join(
os.environ['HOME'],
"artifacts/decision_task/image_contexts/{}/context.tar.gz".format(image_name))
image_parameters['context_url'] = ARTIFACT_URL.format(
os.environ['TASK_ID'], image_artifact_path)
- cls.create_context_tar(context_path, destination, image_name)
+ cls.create_context_tar(GECKO, context_path, destination, image_name)
else:
# skip context generation since this isn't a decision task
# TODO: generate context tarballs using subdirectory clones in
# the image-building task so we don't have to worry about this.
image_parameters['context_url'] = 'file:///tmp/' + image_artifact_path
image_task = templates.load('image.yml', image_parameters)
@@ -127,24 +127,89 @@ class DockerImageTask(base.Task):
# HEAD success on the artifact is enough
return True, existing_task['taskId']
except urllib2.HTTPError:
pass
return False, None
@classmethod
- def create_context_tar(cls, context_dir, destination, image_name):
- 'Creates a tar file of a particular context directory.'
+ def create_context_tar(cls, topsrcdir, context_dir, destination, image_name):
+ """Creates a tar file of a particular context directory.
+
+ We also scan the source Dockerfile for special syntax that influences
+ context generation.
+
+ If a line in the Dockerfile has the form ``# %include <path>``,
+ the relative path specified on that line will be matched against
+ files in the source repository and added to the context under the
+ path ``topsrcdir/``. If an entry ends in a ``/``, we add all files
+ under that directory. Otherwise, we assume it is a literal match and
+ only add a single file.
+ """
+ topsrcdir = os.path.normpath(topsrcdir)
+
destination = os.path.abspath(destination)
if not os.path.exists(os.path.dirname(destination)):
os.makedirs(os.path.dirname(destination))
+ # Gather list of context files first so we can sort to make
+ # behavior deterministic.
+ context_files = {}
+ for root, dirs, files in os.walk(context_dir):
+ for f in files:
+ if f == '.dockerignore':
+ raise Exception('.dockerignore not currently supported')
+
+ fs_path = os.path.join(root, f)
+            archive_path = os.path.join(image_name, os.path.relpath(fs_path, context_dir))
+ context_files[archive_path] = fs_path
+
+ # Parse Dockerfile for special syntax of extra files to include.
+ with open(os.path.join(context_dir, 'Dockerfile'), 'rb') as fh:
+ for line in fh:
+ line = line.rstrip()
+ if not line.startswith('# %include'):
+ continue
+
+ p = line[len('# %include '):].strip()
+ if os.path.isabs(p):
+ raise Exception('extra include path cannot be absolute: %s' % p)
+
+ fs_path = os.path.normpath(os.path.join(topsrcdir, p))
+ # Check for filesystem traversal exploits.
+                if fs_path != topsrcdir and not fs_path.startswith(topsrcdir + os.sep):
+ raise Exception('extra include path outside topsrcdir: %s' % p)
+
+ if not os.path.exists(fs_path):
+ raise Exception('extra include path does not exist: %s' % p)
+
+ if p.endswith('/'):
+ for root, dirs, files in os.walk(fs_path):
+ for f in files:
+ source_path = os.path.join(root, f)
+                            archive_path = os.path.join(image_name, 'topsrcdir', os.path.relpath(source_path, topsrcdir))
+ context_files[archive_path] = source_path
+ else:
+ archive_path = os.path.join(image_name, 'topsrcdir', p)
+ context_files[archive_path] = fs_path
+
with tarfile.open(destination, 'w:gz') as tar:
- tar.add(context_dir, arcname=image_name)
+ # Sort for determinism.
+ for archive_path, fs_path in sorted(context_files.items()):
+ ti = tar.gettarinfo(fs_path, arcname=archive_path)
+
+ # Make files owned by root:root to improve determinism.
+ # Otherwise the filesystem uid/gid gets inherited by the
+ # archive and applied to the container.
+ ti.uid = 0
+ ti.gid = 0
+
+ with open(fs_path, 'rb') as fh:
+ tar.addfile(ti, fileobj=fh)
@classmethod
def from_json(cls, task_dict):
# Generating index_paths for optimization
routes = task_dict['task']['routes']
index_paths = []
for route in routes:
index_path_regex = re.compile(INDEX_REGEX)
--- a/taskcluster/taskgraph/test/test_task_docker_image.py
+++ b/taskcluster/taskgraph/test/test_task_docker_image.py
@@ -2,16 +2,17 @@
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
from __future__ import absolute_import, print_function, unicode_literals
import unittest
import tempfile
import os
+import tarfile
from ..task import docker_image
from mozunit import main
KIND_PATH = os.path.join(docker_image.GECKO, 'taskcluster', 'ci', 'docker-image')
@@ -29,14 +30,41 @@ class TestDockerImageKind(unittest.TestC
# this one's easy!
self.assertEqual(self.task.get_dependencies(None), [])
# TODO: optimize_task
def test_create_context_tar(self):
image_dir = os.path.join(docker_image.GECKO, 'testing', 'docker', 'image_builder')
tarball = tempfile.mkstemp()[1]
- self.task.create_context_tar(image_dir, tarball, 'image_builder')
+ self.task.create_context_tar(docker_image.GECKO, image_dir, tarball, 'image_builder')
self.failUnless(os.path.exists(tarball))
- os.unlink(tarball)
+
+ try:
+ with tarfile.open(tarball, 'r:gz') as tf:
+ self.assertEqual([ti.name for ti in tf], [
+ 'image_builder/Dockerfile',
+ 'image_builder/REGISTRY',
+ 'image_builder/VERSION',
+ 'image_builder/bin/build_image.sh',
+ ])
+ finally:
+ os.unlink(tarball)
+
+ def test_context_topsrcdir_files(self):
+ image_dir = os.path.join(docker_image.GECKO, 'testing', 'docker', 'lint')
+ tarball = tempfile.mkstemp()[1]
+
+ self.task.create_context_tar(docker_image.GECKO, image_dir, tarball, 'lint')
+ self.failUnless(os.path.exists(tarball))
+ try:
+ with tarfile.open(tarball, 'r:gz') as tf:
+ self.assertEqual([ti.name for ti in tf], [
+ 'lint/Dockerfile',
+ 'lint/bin/checkout-and-run',
+ 'lint/system-setup.sh',
+ 'lint/topsrcdir/testing/docker/decision/tooltool.py',
+ ])
+ finally:
+ os.unlink(tarball)
if __name__ == '__main__':
main()
--- a/testing/docker/lint/Dockerfile
+++ b/testing/docker/lint/Dockerfile
@@ -1,17 +1,17 @@
FROM ubuntu:16.04
MAINTAINER Andrew Halberstadt <ahalberstadt@mozilla.com>
RUN useradd -d /home/worker -s /bin/bash -m worker
WORKDIR /home/worker
-# Install tooltool directly from github.
+# %include testing/docker/decision/tooltool.py
RUN mkdir /build
-ADD https://raw.githubusercontent.com/mozilla/build-tooltool/master/tooltool.py /build/tooltool.py
+ADD topsrcdir/testing/docker/decision/tooltool.py /build/tooltool.py
RUN chmod +rx /build/tooltool.py
# Install lint packages
ADD system-setup.sh /tmp/system-setup.sh
RUN bash /tmp/system-setup.sh
ADD bin /home/worker/bin
RUN chown -R worker:worker /home/worker/bin && chmod 755 /home/worker/bin*