Bug 1402010 - Only evaluate Files() context managers in files reading mode
THIS COMMIT CURRENTLY FAILS TESTS AND SHOULD NOT BE CHECKED IN.
Reading Files() metadata is super useful. However, in order to read
moz.build files, you need to evaluate Python code. And the context
of that evaluation needs access to known primitives in the moz.build
sandbox. So, if you have old moz.build reading code and you want to
read an arbitrary moz.build file, you may run into issues where that
moz.build file uses a sandbox symbol or primitive unknown to the old
moz.build reading code. This will result in a run-time failure.
Some may say "but just use the moz.build reading code in the
repository!" This works - but only if you trust the moz.build reading
code in the repository. In some contexts - say on headless servers -
we want to extract relevant moz.build data for untrusted content. In
this case, the best we can do is use trusted code for reading moz.build
files, sandbox the execution, and hope for the best. The big problem
with that is that any commit may introduce a new moz.build sandbox
primitive and the trusted code may not be able to read a new moz.build
file using that newer primitive!
That leads us to this commit.
This commit changes the way moz.build files are evaluated in
file-traversal mode (which cares about reading the content of Files()
context managers). When the moz.build file is loaded, instead of
evaluating the code as-is, we parse it to an AST, strip out root
nodes that aren't `with Files()`, and execute that. Essentially, we
strip the moz.build file down to only the bits we care about. This
reduces the surface area of sandbox changes that can cause Files()
reading incompatibility down to changes to Files() itself. And if we
wanted to go a step further, we could strip AST nodes under Files()
that belong to variables that are unknown to the reader. This is
left as a follow-up feature.
The way this is implemented at the sandbox layer (which is generic)
is by specifying a filter of context manager names that evaluation
will be reduced to. From a higher level, we pass ['Files'].
A lot of code that hacked up the evaluation context for purposes of
Files() evaluation has been removed because none of that code runs
any more.
Open Issues:
* Tests fail
* If `with Files()` isn't a top-level AST node, we strip it.
We probably want a lint check that ensures this pattern is never
used.
* `with Files(), Files()` may not work properly. It definitely needs
tests.
* If a `with Files()` references a symbol defined outside the context
manager, evaluation fails due to the symbol being stripped. We
could use static analysis and avoid this pattern (at the cost of
convenience in moz.build files).
* Should we allow `include()`? If we do, we need to ensure that file's
content is filtered as well.
* Consider setting the "only context managers" mode on the MozbuildSandbox
instance so all calls to exec_file() contain the filter. Otherwise,
some inner caller (like include()) may call exec_file() without the
AST stripping and our feature will be defeated.
I'm submitting this for consideration to see if there is interest
in this approach. If not, then the only viable alternative I see
for the futures of Files() metadata if we want trusted execution
is to extract Files() to separate files that can be statically
evaluated without executing Python. This *is* probably the best
path forward. But it fragments data between moz.build and
*something else*.
MozReview-Commit-ID: 5dgwpHHZdye
--- a/python/mozbuild/mozbuild/frontend/reader.py
+++ b/python/mozbuild/mozbuild/frontend/reader.py
@@ -180,17 +180,17 @@ class MozbuildSandbox(Sandbox):
We expose a few useful functions and expose the set of variables defining
Mozilla's build system.
context is a Context instance.
metadata is a dict of metadata that can be used during the sandbox
evaluation.
"""
- def __init__(self, context, metadata={}, finder=default_finder):
+ def __init__(self, context, metadata=None, finder=default_finder):
assert isinstance(context, Context)
metadata = metadata or {}
Sandbox.__init__(self, context, finder=finder)
self._log = logging.getLogger(__name__)
self.metadata = dict(metadata)
@@ -229,30 +229,31 @@ class MozbuildSandbox(Sandbox):
raise KeyError('Cannot set "%s" because it is a reserved keyword'
% key)
if key in self.exports:
self._context[key] = value
self.exports.remove(key)
return
Sandbox.__setitem__(self, key, value)
- def exec_file(self, path):
+ def exec_file(self, path, only_context_managers=None):
"""Override exec_file to normalize paths and restrict file loading.
Paths will be rejected if they do not fall under topsrcdir or one of
the external roots.
"""
# realpath() is needed for true security. But, this isn't for security
# protection, so it is omitted.
if not is_read_allowed(path, self._context.config):
raise SandboxLoadError(self._context.source_stack,
sys.exc_info()[2], illegal_path=path)
- Sandbox.exec_file(self, path)
+ Sandbox.exec_file(self, path,
+ only_context_managers=only_context_managers)
def _add_java_jar(self, name):
"""Add a Java JAR build target."""
if not name:
raise Exception('Java JAR cannot be registered without a name')
if '/' in name or '\\' in name or '.jar' in name:
raise Exception('Java JAR names must not include slashes or'
@@ -1026,17 +1027,18 @@ class BuildReader(object):
source = fh.read()
tree = ast.parse(source, full)
Visitor().visit(tree)
for name, key, value in assignments:
yield p, name, key, value
- def read_mozbuild(self, path, config, descend=True, metadata=None):
+ def read_mozbuild(self, path, config, descend=True, metadata=None,
+ only_context_managers=None):
"""Read and process a mozbuild file, descending into children.
This starts with a single mozbuild file, executes it, and descends into
other referenced files per our traversal logic.
The traversal logic is to iterate over the *DIRS variables, treating
each element as a relative directory path. For each encountered
directory, we will open the moz.build file located in that
@@ -1046,23 +1048,30 @@ class BuildReader(object):
directories and files per variable values.
Arbitrary metadata in the form of a dict can be passed into this
function. This feature is intended to facilitate the build reader
injecting state and annotations into moz.build files that is
independent of the sandbox's execution context.
Traversal is performed depth first (for no particular reason).
+
+ If ``only_context_managers`` is set to an iterable, only context
+ managers calling a function of a name in that iterable will be
+ evaluated in the moz.build file. This works by stripping the AST of
+ top-level nodes that aren't context managers that call something having
+ a name in the specified iterable.
"""
metadata = metadata or {}
self._execution_stack.append(path)
try:
for s in self._read_mozbuild(path, config, descend=descend,
- metadata=metadata):
+ metadata=metadata,
+ only_context_managers=only_context_managers):
yield s
except BuildReaderError as bre:
raise bre
except SandboxCalledError as sce:
raise BuildReaderError(list(self._execution_stack),
sys.exc_info()[2], sandbox_called_error=sce)
@@ -1078,17 +1087,18 @@ class BuildReader(object):
except SandboxValidationError as ve:
raise BuildReaderError(list(self._execution_stack),
sys.exc_info()[2], validation_error=ve)
except Exception as e:
raise BuildReaderError(list(self._execution_stack),
sys.exc_info()[2], other_error=e)
- def _read_mozbuild(self, path, config, descend, metadata):
+ def _read_mozbuild(self, path, config, descend, metadata,
+ only_context_managers):
path = mozpath.normpath(path)
log(self._log, logging.DEBUG, 'read_mozbuild', {'path': path},
'Reading file: {path}')
if path in self._read_files:
log(self._log, logging.WARNING, 'read_already', {'path': path},
'File already read. Skipping: {path}')
return
@@ -1115,17 +1125,18 @@ class BuildReader(object):
config = ConfigEnvironment.from_config_status(
mozpath.join(topobjdir, reldir, 'config.status'))
config.topobjdir = topobjdir
config.external_source_dir = None
context = Context(VARIABLES, config, self.finder)
sandbox = MozbuildSandbox(context, metadata=metadata,
finder=self.finder)
- sandbox.exec_file(path)
+
+ sandbox.exec_file(path, only_context_managers=only_context_managers)
self._execution_time += time.time() - time_start
self._file_count += len(context.all_paths)
# Yield main context before doing any processing. This gives immediate
# consumers an opportunity to change state before our remaining
# processing is performed.
yield context
@@ -1202,17 +1213,18 @@ class BuildReader(object):
raise SandboxValidationError(
'Attempting to process file outside of allowed paths: %s' %
child_path, context)
if not descend:
continue
for res in self.read_mozbuild(child_path, context.config,
- metadata=child_metadata):
+ metadata=child_metadata,
+ only_context_managers=only_context_managers):
yield res
self._execution_stack.pop()
def _find_relevant_mozbuilds(self, paths):
"""Given a set of filesystem paths, find all relevant moz.build files.
We assume that a moz.build file in the directory ancestry of a given path
@@ -1287,37 +1299,24 @@ class BuildReader(object):
for i, mbpath in enumerate(mbpaths[0:-1]):
source_dir = mozpath.dirname(mbpath)
target_dir = mozpath.dirname(mbpaths[i + 1])
d = mozpath.normpath(mozpath.join(topsrcdir, mbpath))
dirs[d].add(mozpath.relpath(target_dir, source_dir))
- # Exporting doesn't work reliably in tree traversal mode. Override
- # the function to no-op.
- functions = dict(FUNCTIONS)
- def export(sandbox):
- return lambda varname: None
- functions['export'] = tuple([export] + list(FUNCTIONS['export'][1:]))
-
- metadata = {
- 'functions': functions,
- }
-
contexts = defaultdict(list)
all_contexts = []
for context in self.read_mozbuild(mozpath.join(topsrcdir, 'moz.build'),
- self.config, metadata=metadata):
- # Explicitly set directory traversal variables to override default
- # traversal rules.
+ self.config,
+ only_context_managers=['Files']):
+ # Explicitly set directory traversal variables so we traverse
+ # to desired children.
if not isinstance(context, SubContext):
- for v in ('DIRS', 'GYP_DIRS'):
- context[v][:] = []
-
context['DIRS'] = sorted(dirs[context.main_path])
contexts[context.main_path].append(context)
all_contexts.append(context)
result = {}
for path, paths in path_mozbuilds.items():
result[path] = reduce(lambda x, y: x + y, (contexts[p] for p in paths), [])
--- a/python/mozbuild/mozbuild/frontend/sandbox.py
+++ b/python/mozbuild/mozbuild/frontend/sandbox.py
@@ -14,16 +14,17 @@ the execution.
Code in this module takes a different approach to exception handling compared
to what you'd see elsewhere in Python. Arguments to built-in exceptions like
KeyError are machine parseable. This machine-friendly data is used to present
user-friendly error messages in the case of errors.
"""
from __future__ import absolute_import, unicode_literals
+import ast
import os
import sys
import weakref
from mozbuild.util import (
exec_,
ReadOnlyDict,
)
@@ -139,57 +140,113 @@ class Sandbox(dict):
self._current_source = None
self._finder = finder
@property
def _context(self):
return self._active_contexts[-1]
- def exec_file(self, path):
+ def exec_file(self, path, only_context_managers=None):
"""Execute code at a path in the sandbox.
The path must be absolute.
"""
assert os.path.isabs(path)
try:
source = self._finder.get(path).read()
except Exception as e:
raise SandboxLoadError(self._context.source_stack,
sys.exc_info()[2], read_error=path)
- self.exec_source(source, path)
+ self.exec_source(source, path,
+ only_context_managers=only_context_managers)
- def exec_source(self, source, path=''):
+ def exec_source(self, source, path='', only_context_managers=None):
"""Execute Python code within a string.
The passed string should contain Python code to be executed. The string
will be compiled and executed.
You should almost always go through exec_file() because exec_source()
does not perform extra path normalization. This can cause relative
paths to behave weirdly.
"""
def execute():
- # compile() inherits the __future__ from the module by default. We
- # do want Unicode literals.
- code = compile(source, path, 'exec')
+ # Due to closure weirdness, we can't reassign `source` here.
+ if only_context_managers:
+ src = self._filter_context_managers(source, path,
+ only_context_managers)
+ else:
+ src = source
+
+ code = compile(src, path, 'exec')
# We use ourself as the global namespace for the execution. There
# is no need for a separate local namespace as moz.build execution
# is flat, namespace-wise.
old_source = self._current_source
self._current_source = source
try:
exec_(code, self)
finally:
self._current_source = old_source
self.exec_function(execute, path=path)
+ def _filter_context_managers(self, source, path, names):
+ """Filters out ast nodes that aren't context managers.
+
+ This is called to evaluate moz.build in a constrained mode that
+ only allows access to well-defined primitives. It is intended for
+ use cases where the executing code may have an old version of the
+ moz.build reading code that may not be aware of every primitive in the
+ moz.build file.
+
+ The caller passes a list of names of context manager functions
+ that are recognized. All AST elements not context managers
+ calling functions with the specified names are filtered out of the
+ AST.
+
+ For example, if ``names == ['Files']``, then only code that
+ looks like the following will be in the returned AST:
+
+ with Files(...):
+ ...
+ """
+ names = set(names)
+
+ def fltr(node):
+ # A ``with`` statement.
+ if not isinstance(node, ast.With):
+ return False
+
+ # That calls something.
+ if not isinstance(node.context_expr, ast.Call):
+ return False
+
+ # That has a name.
+ if not isinstance(node.context_expr.func, ast.Name):
+ return False
+
+ # That is in our allow list.
+ return node.context_expr.func.id in names
+
+ # unicode_literals needs to be passed as a flag to compile().
+ # ast.parse() is a wrapper to compile(), so call compile()
+ # directly.
+ code = compile(source, path, 'exec',
+ unicode_literals.compiler_flag | ast.PyCF_ONLY_AST)
+
+ # TODO use a node transformer instead. This will ensure line
+ # numbers are adjusted, etc.
+ code.body = filter(fltr, code.body)
+
+ return code
+
def exec_function(self, func, args=(), kwargs={}, path='',
becomes_current_path=True):
"""Execute function with the given arguments in the sandbox.
"""
if path and becomes_current_path:
self._context.push_source(path)
old_sandbox = self._context._sandbox
--- a/python/mozbuild/mozbuild/test/frontend/test_sandbox.py
+++ b/python/mozbuild/mozbuild/test/frontend/test_sandbox.py
@@ -134,19 +134,20 @@ class TestedSandbox(MozbuildSandbox):
mozpath.join(self._context.config.topsrcdir, path))
def source_path(self, path):
return SourcePath(self._context, path)
def exec_file(self, path):
super(TestedSandbox, self).exec_file(self.normalize_path(path))
- def exec_source(self, source, path=''):
+ def exec_source(self, source, path='', only_context_managers=None):
super(TestedSandbox, self).exec_source(source,
- self.normalize_path(path) if path else '')
+ self.normalize_path(path) if path else '',
+ only_context_managers=only_context_managers)
class TestMozbuildSandbox(unittest.TestCase):
def sandbox(self, data_path=None, metadata={}):
config = None
if data_path is not None:
config = MockConfig(mozpath.join(test_data_path, data_path))