Bug 1402010 - Only evaluate Files() context managers in files reading mode draft
authorGregory Szorc <gps@mozilla.com>
Thu, 18 Jan 2018 23:26:24 -0800
changeset 722534 25225531fd4dffb05517f298efd9ea84b593136f
parent 722533 e4eb0f7bf9254593b63ba450179ff18f673c4b0b
child 746630 f3c8a6c214ca379f2d31b74462b8904061901440
push id96166
push userbmo:gps@mozilla.com
push dateFri, 19 Jan 2018 07:29:47 +0000
bugs1402010
milestone59.0a1
Bug 1402010 - Only evaluate Files() context managers in files reading mode THIS COMMIT CURRENTLY FAILS TESTS AND SHOULD NOT BE CHECKED IN. Reading Files() metadata is super useful. However, in order to read moz.build files, you need to evaluate Python code. And the context of that evaluation needs access to known primitives in the moz.build sandbox. So, if you have old moz.build reading code and you want to read an arbitrary moz.build file, you may run into issues where that moz.build file uses a sandbox symbol or primitive unknown to the old moz.build reading code. This will result in a run-time failure. Some may say "but just use the moz.build reading code in the repository!" This works - but only if you trust the moz.build reading code in the repository. In some contexts - say on headless servers - we want to extract relevant moz.build data for untrusted content. In this case, the best we can do is use trusted code for reading moz.build files, sandbox the execution, and hope for the best. The big problem with that is that any commit may introduce a new moz.build sandbox primitive and the trusted code may not be able to read a new moz.build file using that newer primitive! That leads us to this commit. This commit changes the way moz.build files are evaluated in file-traversal mode (which cares about reading the content of Files() context managers). When the moz.build file is loaded, instead of evaluating the code as-is, we parse it to an AST, strip out root nodes that aren't `with Files()`, and execute that. Essentially, we strip the moz.build file down to only the bits we care about. This reduces the surface area of sandbox changes that can cause Files() reading incompatibility down to changes to Files() itself. And if we wanted to go a step further, we could strip AST nodes under Files() that belong to variables that are unknown to the reader. This is left as a follow-up feature. The way this is implemented at the sandbox layer (which is generic) is by specifying a filter of context manager names that evaluation will be reduced to. From a higher level, we pass ['Files']. A lot of code that hacked up the evaluation context for purposes of Files() evaluation has been removed because none of that code runs any more. Open Issues: * Tests fail * If `with Files()` isn't a top-level AST node, we strip it. We probably want a lint check that ensures this pattern is never used. * `with Files(), Files()` may not work properly. It definitely needs tests. * If a `with Files()` references a symbol defined outside the context manager, evaluation fails due to the symbol being stripped. We could use static analysis and avoid this pattern (at the cost of convenience in moz.build files). * Should we allow `include()`? If we do, we need to ensure that file's content is filtered as well. * Consider setting the "only context managers" mode on the MozbuildSandbox instance so all calls to exec_file() contain the filter. Otherwise, some inner caller (like include()) may call exec_file() without the AST stripping and our feature will be defeated. I'm submitting this for consideration to see if there is interest in this approach. If not, then the only viable alternative I see for the futures of Files() metadata if we want trusted execution is to extract Files() to separate files that can be statically evaluated without executing Python. This *is* probably the best path forward. But it fragments data between moz.build and *something else*. MozReview-Commit-ID: 5dgwpHHZdye
python/mozbuild/mozbuild/frontend/reader.py
python/mozbuild/mozbuild/frontend/sandbox.py
python/mozbuild/mozbuild/test/frontend/test_sandbox.py
--- a/python/mozbuild/mozbuild/frontend/reader.py
+++ b/python/mozbuild/mozbuild/frontend/reader.py
@@ -180,17 +180,17 @@ class MozbuildSandbox(Sandbox):
     We expose a few useful functions and expose the set of variables defining
     Mozilla's build system.
 
     context is a Context instance.
 
     metadata is a dict of metadata that can be used during the sandbox
     evaluation.
     """
-    def __init__(self, context, metadata={}, finder=default_finder):
+    def __init__(self, context, metadata=None, finder=default_finder):
         assert isinstance(context, Context)
         metadata = metadata or {}
 
         Sandbox.__init__(self, context, finder=finder)
 
         self._log = logging.getLogger(__name__)
 
         self.metadata = dict(metadata)
@@ -229,30 +229,31 @@ class MozbuildSandbox(Sandbox):
             raise KeyError('Cannot set "%s" because it is a reserved keyword'
                            % key)
         if key in self.exports:
             self._context[key] = value
             self.exports.remove(key)
             return
         Sandbox.__setitem__(self, key, value)
 
-    def exec_file(self, path):
+    def exec_file(self, path, only_context_managers=None):
         """Override exec_file to normalize paths and restrict file loading.
 
         Paths will be rejected if they do not fall under topsrcdir or one of
         the external roots.
         """
 
         # realpath() is needed for true security. But, this isn't for security
         # protection, so it is omitted.
         if not is_read_allowed(path, self._context.config):
             raise SandboxLoadError(self._context.source_stack,
                 sys.exc_info()[2], illegal_path=path)
 
-        Sandbox.exec_file(self, path)
+        Sandbox.exec_file(self, path,
+                          only_context_managers=only_context_managers)
 
     def _add_java_jar(self, name):
         """Add a Java JAR build target."""
         if not name:
             raise Exception('Java JAR cannot be registered without a name')
 
         if '/' in name or '\\' in name or '.jar' in name:
             raise Exception('Java JAR names must not include slashes or'
@@ -1026,17 +1027,18 @@ class BuildReader(object):
                 source = fh.read()
 
             tree = ast.parse(source, full)
             Visitor().visit(tree)
 
             for name, key, value in assignments:
                 yield p, name, key, value
 
-    def read_mozbuild(self, path, config, descend=True, metadata=None):
+    def read_mozbuild(self, path, config, descend=True, metadata=None,
+                      only_context_managers=None):
         """Read and process a mozbuild file, descending into children.
 
         This starts with a single mozbuild file, executes it, and descends into
         other referenced files per our traversal logic.
 
         The traversal logic is to iterate over the *DIRS variables, treating
         each element as a relative directory path. For each encountered
         directory, we will open the moz.build file located in that
@@ -1046,23 +1048,30 @@ class BuildReader(object):
         directories and files per variable values.
 
         Arbitrary metadata in the form of a dict can be passed into this
         function. This feature is intended to facilitate the build reader
         injecting state and annotations into moz.build files that is
         independent of the sandbox's execution context.
 
         Traversal is performed depth first (for no particular reason).
+
+        If ``only_context_managers`` is set to an iterable, only context
+        managers calling a function of a name in that iterable will be
+        evaluated in the moz.build file. This works by stripping the AST of
+        top-level nodes that aren't context managers that call something having
+        a name in the specified iterable.
         """
         metadata = metadata or {}
 
         self._execution_stack.append(path)
         try:
             for s in self._read_mozbuild(path, config, descend=descend,
-                                         metadata=metadata):
+                                         metadata=metadata,
+                                         only_context_managers=only_context_managers):
                 yield s
 
         except BuildReaderError as bre:
             raise bre
 
         except SandboxCalledError as sce:
             raise BuildReaderError(list(self._execution_stack),
                 sys.exc_info()[2], sandbox_called_error=sce)
@@ -1078,17 +1087,18 @@ class BuildReader(object):
         except SandboxValidationError as ve:
             raise BuildReaderError(list(self._execution_stack),
                 sys.exc_info()[2], validation_error=ve)
 
         except Exception as e:
             raise BuildReaderError(list(self._execution_stack),
                 sys.exc_info()[2], other_error=e)
 
-    def _read_mozbuild(self, path, config, descend, metadata):
+    def _read_mozbuild(self, path, config, descend, metadata,
+                       only_context_managers):
         path = mozpath.normpath(path)
         log(self._log, logging.DEBUG, 'read_mozbuild', {'path': path},
             'Reading file: {path}')
 
         if path in self._read_files:
             log(self._log, logging.WARNING, 'read_already', {'path': path},
                 'File already read. Skipping: {path}')
             return
@@ -1115,17 +1125,18 @@ class BuildReader(object):
             config = ConfigEnvironment.from_config_status(
                 mozpath.join(topobjdir, reldir, 'config.status'))
             config.topobjdir = topobjdir
             config.external_source_dir = None
 
         context = Context(VARIABLES, config, self.finder)
         sandbox = MozbuildSandbox(context, metadata=metadata,
                                   finder=self.finder)
-        sandbox.exec_file(path)
+
+        sandbox.exec_file(path, only_context_managers=only_context_managers)
         self._execution_time += time.time() - time_start
         self._file_count += len(context.all_paths)
 
         # Yield main context before doing any processing. This gives immediate
         # consumers an opportunity to change state before our remaining
         # processing is performed.
         yield context
 
@@ -1202,17 +1213,18 @@ class BuildReader(object):
                 raise SandboxValidationError(
                     'Attempting to process file outside of allowed paths: %s' %
                         child_path, context)
 
             if not descend:
                 continue
 
             for res in self.read_mozbuild(child_path, context.config,
-                                          metadata=child_metadata):
+                                          metadata=child_metadata,
+                                          only_context_managers=only_context_managers):
                 yield res
 
         self._execution_stack.pop()
 
     def _find_relevant_mozbuilds(self, paths):
         """Given a set of filesystem paths, find all relevant moz.build files.
 
         We assume that a moz.build file in the directory ancestry of a given path
@@ -1287,37 +1299,24 @@ class BuildReader(object):
 
             for i, mbpath in enumerate(mbpaths[0:-1]):
                 source_dir = mozpath.dirname(mbpath)
                 target_dir = mozpath.dirname(mbpaths[i + 1])
 
                 d = mozpath.normpath(mozpath.join(topsrcdir, mbpath))
                 dirs[d].add(mozpath.relpath(target_dir, source_dir))
 
-        # Exporting doesn't work reliably in tree traversal mode. Override
-        # the function to no-op.
-        functions = dict(FUNCTIONS)
-        def export(sandbox):
-            return lambda varname: None
-        functions['export'] = tuple([export] + list(FUNCTIONS['export'][1:]))
-
-        metadata = {
-            'functions': functions,
-        }
-
         contexts = defaultdict(list)
         all_contexts = []
         for context in self.read_mozbuild(mozpath.join(topsrcdir, 'moz.build'),
-                                          self.config, metadata=metadata):
-            # Explicitly set directory traversal variables to override default
-            # traversal rules.
+                                          self.config,
+                                          only_context_managers=['Files']):
+            # Explicitly set directory traversal variables so we traverse
+            # to desired children.
             if not isinstance(context, SubContext):
-                for v in ('DIRS', 'GYP_DIRS'):
-                    context[v][:] = []
-
                 context['DIRS'] = sorted(dirs[context.main_path])
 
             contexts[context.main_path].append(context)
             all_contexts.append(context)
 
         result = {}
         for path, paths in path_mozbuilds.items():
             result[path] = reduce(lambda x, y: x + y, (contexts[p] for p in paths), [])
--- a/python/mozbuild/mozbuild/frontend/sandbox.py
+++ b/python/mozbuild/mozbuild/frontend/sandbox.py
@@ -14,16 +14,17 @@ the execution.
 Code in this module takes a different approach to exception handling compared
 to what you'd see elsewhere in Python. Arguments to built-in exceptions like
 KeyError are machine parseable. This machine-friendly data is used to present
 user-friendly error messages in the case of errors.
 """
 
 from __future__ import absolute_import, unicode_literals
 
+import ast
 import os
 import sys
 import weakref
 
 from mozbuild.util import (
     exec_,
     ReadOnlyDict,
 )
@@ -139,57 +140,113 @@ class Sandbox(dict):
         self._current_source = None
 
         self._finder = finder
 
     @property
     def _context(self):
         return self._active_contexts[-1]
 
-    def exec_file(self, path):
+    def exec_file(self, path, only_context_managers=None):
         """Execute code at a path in the sandbox.
 
         The path must be absolute.
         """
         assert os.path.isabs(path)
 
         try:
             source = self._finder.get(path).read()
         except Exception as e:
             raise SandboxLoadError(self._context.source_stack,
                 sys.exc_info()[2], read_error=path)
 
-        self.exec_source(source, path)
+        self.exec_source(source, path,
+                         only_context_managers=only_context_managers)
 
-    def exec_source(self, source, path=''):
+    def exec_source(self, source, path='', only_context_managers=None):
         """Execute Python code within a string.
 
         The passed string should contain Python code to be executed. The string
         will be compiled and executed.
 
         You should almost always go through exec_file() because exec_source()
         does not perform extra path normalization. This can cause relative
         paths to behave weirdly.
         """
         def execute():
-            # compile() inherits the __future__ from the module by default. We
-            # do want Unicode literals.
-            code = compile(source, path, 'exec')
+            # Due to closure weirdness, we can't reassign `source` here.
+            if only_context_managers:
+                src = self._filter_context_managers(source, path,
+                                                    only_context_managers)
+            else:
+                src = source
+
+            code = compile(src, path, 'exec')
             # We use ourself as the global namespace for the execution. There
             # is no need for a separate local namespace as moz.build execution
             # is flat, namespace-wise.
             old_source = self._current_source
             self._current_source = source
             try:
                 exec_(code, self)
             finally:
                 self._current_source = old_source
 
         self.exec_function(execute, path=path)
 
+    def _filter_context_managers(self, source, path, names):
+        """Filters out ast nodes that aren't context managers.
+
+        This is called to evaluate moz.build in a constrained mode that
+        only allows access to well-defined primitives. It is intended for
+        use cases where the executing code may have an old version of the
+        moz.build reading code that may not be aware of every primitive in the
+        moz.build file.
+
+        The caller passes a list of names of context manager functions
+        that are recognized. All AST elements not context managers
+        calling functions with the specified names are filtered out of the
+        AST.
+
+        For example, if ``names == ['Files']``, then only code that
+        looks like the following will be in the returned AST:
+
+            with Files(...):
+                ...
+        """
+        names = set(names)
+
+        def fltr(node):
+            # A ``with`` statement.
+            if not isinstance(node, ast.With):
+                return False
+
+            # That calls something.
+            if not isinstance(node.context_expr, ast.Call):
+                return False
+
+            # That has a name.
+            if not isinstance(node.context_expr.func, ast.Name):
+                return False
+
+            # That is in our allow list.
+            return node.context_expr.func.id in names
+
+        # unicode_literals needs to be passed as a flag to compile().
+        # ast.parse() is a wrapper to compile(), so call compile()
+        # directly.
+        code = compile(source, path, 'exec',
+                       unicode_literals.compiler_flag | ast.PyCF_ONLY_AST)
+
+        # TODO use a node transformer instead. This will ensure line
+        # numbers are adjusted, etc.
+        code.body = filter(fltr, code.body)
+
+        return code
+
     def exec_function(self, func, args=(), kwargs={}, path='',
                       becomes_current_path=True):
         """Execute function with the given arguments in the sandbox.
         """
         if path and becomes_current_path:
             self._context.push_source(path)
 
         old_sandbox = self._context._sandbox
--- a/python/mozbuild/mozbuild/test/frontend/test_sandbox.py
+++ b/python/mozbuild/mozbuild/test/frontend/test_sandbox.py
@@ -134,19 +134,20 @@ class TestedSandbox(MozbuildSandbox):
             mozpath.join(self._context.config.topsrcdir, path))
 
     def source_path(self, path):
         return SourcePath(self._context, path)
 
     def exec_file(self, path):
         super(TestedSandbox, self).exec_file(self.normalize_path(path))
 
-    def exec_source(self, source, path=''):
+    def exec_source(self, source, path='', only_context_managers=None):
         super(TestedSandbox, self).exec_source(source,
-            self.normalize_path(path) if path else '')
+            self.normalize_path(path) if path else '',
+            only_context_managers=only_context_managers)
 
 
 class TestMozbuildSandbox(unittest.TestCase):
     def sandbox(self, data_path=None, metadata={}):
         config = None
 
         if data_path is not None:
             config = MockConfig(mozpath.join(test_data_path, data_path))