Bug 1413178 - Update webrender to commit fae962bfd6e1997f4b921ee93c3c1cc5abca3256. r?jrmuizel draft
author Kartikaya Gupta <kgupta@mozilla.com>
Fri, 03 Nov 2017 10:28:15 -0400
changeset 692743 e47a625b94f231b98a1a5738c8300d0348eeeb3b
parent 692714 4ada8f0d5cc011af4bc0f4fadbb25ea3e1e62bfc
child 692744 3501819ac8c73c2604e71641e005b96c7ffaac96
child 692748 f7d7ba26ab2c2cbb608f49afe963804b1ba9449f
child 692750 70d92e3d445e302a1d5f88ba36119ee043993e8b
push id 87591
push user kgupta@mozilla.com
push date Fri, 03 Nov 2017 14:33:18 +0000
reviewers jrmuizel
bugs 1413178
milestone 58.0a1
Bug 1413178 - Update webrender to commit fae962bfd6e1997f4b921ee93c3c1cc5abca3256. r?jrmuizel MozReview-Commit-ID: LStxYqdw50U
gfx/doc/README.webrender
gfx/webrender/doc/text-rendering.md
gfx/webrender/res/prim_shared.glsl
gfx/webrender/res/ps_border_corner.glsl
gfx/webrender/res/ps_text_run.glsl
gfx/webrender/src/clip_scroll_node.rs
gfx/webrender/src/clip_scroll_tree.rs
gfx/webrender/src/device.rs
gfx/webrender/src/frame.rs
gfx/webrender/src/frame_builder.rs
gfx/webrender/src/gamma_lut.rs
gfx/webrender/src/glyph_rasterizer.rs
gfx/webrender/src/platform/macos/font.rs
gfx/webrender/src/platform/unix/font.rs
gfx/webrender/src/platform/windows/font.rs
gfx/webrender/src/prim_store.rs
gfx/webrender/src/renderer.rs
gfx/webrender/src/resource_cache.rs
gfx/webrender/src/tiling.rs
gfx/webrender_api/src/api.rs
gfx/webrender_api/src/display_item.rs
gfx/webrender_api/src/display_list.rs
gfx/webrender_api/src/font.rs
--- a/gfx/doc/README.webrender
+++ b/gfx/doc/README.webrender
@@ -170,9 +170,9 @@ 2. Sometimes autoland tip has changed en
    has an env var you can set to do this). In theory you can get the same
    result by resolving the conflict manually but Cargo.lock files are usually not
    trivial to merge by hand. If it's just the third_party/rust dir that has conflicts
    you can delete it and run |mach vendor rust| again to repopulate it.
 
 -------------------------------------------------------------------------------
 
 The version of WebRender currently in the tree is:
-c0194de78ce26106a8497484dc8d159069e3a482
+fae962bfd6e1997f4b921ee93c3c1cc5abca3256
new file mode 100644
--- /dev/null
+++ b/gfx/webrender/doc/text-rendering.md
@@ -0,0 +1,684 @@
+# Text Rendering
+
+This document describes the details of how WebRender renders text, particularly the blending stage of text rendering.
+We will go into grayscale text blending, subpixel text blending, and "subpixel text with background color" blending.
+
+### Prerequisites
+
+The description below assumes you're familiar with regular rgba compositing, operator over,
+and the concept of premultiplied alpha.
+
+### Not covered in this document
+
+We are going to treat the origin of the text mask as a black box.
+We're also going to assume we can blend text in the device color space and will not go into the gamma correction and linear pre-blending that happens in some of the backends that produce the text masks.
+
+## Grayscale Text Blending
+
+Grayscale text blending is the simplest form of text blending. Our blending function has three inputs:
+
+ - The text color, as a premultiplied rgba color.
+ - The text mask, as a single-channel alpha texture.
+ - The existing contents of the framebuffer that we're rendering to, the "destination". This is also a premultiplied rgba buffer.
+
+Note: The word "grayscale" here does *not* mean that we can only draw gray text.
+It means that the mask only has a single alpha value per pixel, so we can visualize
+the mask in our minds as a grayscale image.
+
+### Deriving the math
+
+We want to mask our text color using the single-channel mask, and composite that to the destination.
+This compositing step uses operator "over", just like regular compositing of rgba images.
+
+I'll be using GLSL syntax to describe the blend equations, but please consider most of the code below pseudocode.
+
+We can express the blending described above as the following blend equation:
+
+```glsl
+vec4 textblend(vec4 text_color, vec4 mask, vec4 dest) {
+  return over(in(text_color, mask), dest);
+}
+```
+
+with `over` being the blend function for (premultiplied) operator "over":
+
+```glsl
+vec4 over(vec4 src, vec4 dest) {
+  return src + (1.0 - src.a) * dest;
+}
+```
+
+and `in` being the blend function for (premultiplied) operator "in", i.e. the masking operator:
+
+```glsl
+vec4 in(vec4 src, vec4 mask) {
+  return src * mask.a;
+}
+```
+
+So the complete blending function is:
+
+```glsl
+result.r = text_color.r * mask.a + (1.0 - text_color.a * mask.a) * dest.r;
+result.g = text_color.g * mask.a + (1.0 - text_color.a * mask.a) * dest.g;
+result.b = text_color.b * mask.a + (1.0 - text_color.a * mask.a) * dest.b;
+result.a = text_color.a * mask.a + (1.0 - text_color.a * mask.a) * dest.a;
+```
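+
+As a quick sanity check of these equations, here is a worked example. The input values are made up for illustration and are not taken from WebRender, and `textblend_expanded` is a hypothetical name for the four per-channel lines above written as a single `vec4` expression:
+
+```glsl
+// Illustrative inputs, chosen only for this example:
+//   text_color = vec4(0.5, 0.0, 0.0, 0.5)  // half-transparent red, premultiplied
+//   mask.a     = 0.5
+//   dest       = vec4(1.0, 1.0, 1.0, 1.0)  // opaque white
+vec4 textblend_expanded(vec4 text_color, vec4 mask, vec4 dest) {
+  return text_color * mask.a + (1.0 - text_color.a * mask.a) * dest;
+}
+// result.r = 0.5 * 0.5 + (1.0 - 0.5 * 0.5) * 1.0 = 0.25 + 0.75 = 1.0
+// result.g = 0.0 * 0.5 + (1.0 - 0.5 * 0.5) * 1.0 = 0.0  + 0.75 = 0.75
+// result.b = 0.0 * 0.5 + (1.0 - 0.5 * 0.5) * 1.0 = 0.0  + 0.75 = 0.75
+// result.a = 0.5 * 0.5 + (1.0 - 0.5 * 0.5) * 1.0 = 0.25 + 0.75 = 1.0
+```
+
+The destination stays opaque, and the red text contributes only a quarter of its coverage, which tints the white destination towards red.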
+
+### Rendering this with OpenGL
+
+In general, a fragment shader does not have access to the destination.
+So the full blend equation needs to be expressed in a way that the shader only computes values that are independent of the destination,
+and the parts of the equation that use the destination values need to be applied by the OpenGL blend pipeline itself.
+The OpenGL blend pipeline can be tweaked using the functions `glBlendEquation` and `glBlendFunc`.
+
+In our example, the fragment shader can output just `text_color * mask.a`:
+
+```glsl
+  oFragColor = text_color * mask.a;
+```
+
+and the OpenGL blend pipeline can be configured like so:
+
+```rust
+    pub fn set_blend_mode_premultiplied_alpha(&self) {
+        self.gl.blend_func(gl::ONE, gl::ONE_MINUS_SRC_ALPHA);
+        self.gl.blend_equation(gl::FUNC_ADD);
+    }
+```
+
+This results in an overall blend equation of
+
+```
+result.r = 1 * oFragColor.r + (1 - oFragColor.a) * dest.r;
+           ^                ^  ^^^^^^^^^^^^^^^^^
+           |                |         |
+           +--gl::ONE       |         +-- gl::ONE_MINUS_SRC_ALPHA
+                            |
+                            +-- gl::FUNC_ADD
+
+         = 1 * (text_color.r * mask.a) + (1 - (text_color.a * mask.a)) * dest.r
+         = text_color.r * mask.a + (1 - text_color.a * mask.a) * dest.r
+```
+
+which is exactly what we wanted.
+
+### Differences to the actual WebRender code
+
+There are two minor differences between the shader code above and the actual code in the text run shader in WebRender:
+
+```glsl
+oFragColor = text_color * mask.a;    // (shown above)
+// vs.
+oFragColor = vColor * mask * alpha;  // (actual webrender code)
+```
+
+`vColor` is set to the text color. The differences are:
+
+ - WebRender multiplies with all components of `mask` instead of just with `mask.a`.
+   However, our font rasterization code fills the rgb values of `mask` with the value of `mask.a`,
+   so this is completely equivalent.
+ - WebRender applies another alpha to the text. This is coming from the clip.
+   You can think of this alpha as a pre-adjustment of the text color for that pixel, or as an
+   additional mask that gets applied to the mask.
+
+## Subpixel Text Blending
+
+Now that we have the blend equation for single-channel text blending, we can look at subpixel text blending.
+
+The main difference between subpixel text blending and grayscale text blending is the fact that,
+for subpixel text, the text mask contains a separate alpha value for each color component.
+
+### Component alpha
+
+Regular painting uses four values per pixel: three color values, and one alpha value. The alpha value applies to all components of the pixel equally.
+
+Imagine for a second a world in which you have *three alpha values per pixel*, one for each color component.
+
+ - Old world: Each pixel has four values: `color.r`, `color.g`, `color.b`, and `color.a`.
+ - New world: Each pixel has *six* values: `color.r`, `color.a_r`, `color.g`, `color.a_g`, `color.b`, and `color.a_b`.
+
+In such a world we can define a component-alpha-aware operator "over":
+
+```glsl
+vec6 over_comp(vec6 src, vec6 dest) {
+  vec6 result;
+  result.r = src.r + (1.0 - src.a_r) * dest.r;
+  result.g = src.g + (1.0 - src.a_g) * dest.g;
+  result.b = src.b + (1.0 - src.a_b) * dest.b;
+  result.a_r = src.a_r + (1.0 - src.a_r) * dest.a_r;
+  result.a_g = src.a_g + (1.0 - src.a_g) * dest.a_g;
+  result.a_b = src.a_b + (1.0 - src.a_b) * dest.a_b;
+  return result;
+}
+```
+
+and a component-alpha-aware operator "in":
+
+```glsl
+vec6 in_comp(vec6 src, vec6 mask) {
+  vec6 result;
+  result.r = src.r * mask.a_r;
+  result.g = src.g * mask.a_g;
+  result.b = src.b * mask.a_b;
+  result.a_r = src.a_r * mask.a_r;
+  result.a_g = src.a_g * mask.a_g;
+  result.a_b = src.a_b * mask.a_b;
+  return result;
+}
+```
+
+and even a component-alpha-aware version of `textblend`:
+
+```glsl
+vec6 textblend_comp(vec6 text_color, vec6 mask, vec6 dest) {
+  return over_comp(in_comp(text_color, mask), dest);
+}
+```
+
+This results in the following set of equations:
+
+```glsl
+result.r = text_color.r * mask.a_r + (1.0 - text_color.a_r * mask.a_r) * dest.r;
+result.g = text_color.g * mask.a_g + (1.0 - text_color.a_g * mask.a_g) * dest.g;
+result.b = text_color.b * mask.a_b + (1.0 - text_color.a_b * mask.a_b) * dest.b;
+result.a_r = text_color.a_r * mask.a_r + (1.0 - text_color.a_r * mask.a_r) * dest.a_r;
+result.a_g = text_color.a_g * mask.a_g + (1.0 - text_color.a_g * mask.a_g) * dest.a_g;
+result.a_b = text_color.a_b * mask.a_b + (1.0 - text_color.a_b * mask.a_b) * dest.a_b;
+```
+
+### Back to the real world
+
+If we want to transfer the component alpha blend equation into the real world, we need to make a few small changes:
+
+ - Our text color only needs one alpha value.
+   So we'll replace all instances of `text_color.a_r/g/b` with `text_color.a`.
+ - We're currently not making use of the mask's `r`, `g` and `b` values, only of the `a_r`, `a_g` and `a_b` values.
+   So in the real world, we can use the rgb channels of `mask` to store those component alphas and
+   replace `mask.a_r/g/b` with `mask.r/g/b`.
+
+These two changes give us:
+
+```glsl
+result.r = text_color.r * mask.r + (1.0 - text_color.a * mask.r) * dest.r;
+result.g = text_color.g * mask.g + (1.0 - text_color.a * mask.g) * dest.g;
+result.b = text_color.b * mask.b + (1.0 - text_color.a * mask.b) * dest.b;
+result.a_r = text_color.a * mask.r + (1.0 - text_color.a * mask.r) * dest.a_r;
+result.a_g = text_color.a * mask.g + (1.0 - text_color.a * mask.g) * dest.a_g;
+result.a_b = text_color.a * mask.b + (1.0 - text_color.a * mask.b) * dest.a_b;
+```
+
+There's a third change we need to make:
+
+ - We're rendering to a destination surface that only has one alpha channel instead of three.
+   So `dest.a_r/g/b` and `result.a_r/g/b` will need to become `dest.a` and `result.a`.
+
+This creates a problem: We're currently assigning different values to `result.a_r`, `result.a_g` and `result.a_b`.
+Which of them should we use to compute `result.a`?
+
+This question does not have an answer. One alpha value per pixel is simply not sufficient
+to express the same information as three alpha values.
+
+However, see what happens if the destination is already opaque:
+
+We have `dest.a_r == 1`, `dest.a_g == 1`, and `dest.a_b == 1`.
+
+```
+result.a_r = text_color.a * mask.r + (1 - text_color.a * mask.r) * dest.a_r
+           = text_color.a * mask.r + (1 - text_color.a * mask.r) * 1
+           = text_color.a * mask.r + 1 - text_color.a * mask.r
+           = 1
+same for result.a_g and result.a_b
+```
+
+In other words, for opaque destinations, it doesn't matter which channel of the mask we use when computing `result.a`; the result will always be completely opaque anyway. In WebRender we just pick `mask.g` (or rather,
+have font rasterization set `mask.a` to the value of `mask.g`) because it's as good as any.
+
+The takeaway here is: **Subpixel text blending is only supported for opaque destinations.** Attempting to render subpixel
+text into partially transparent destinations will result in bad alpha values. Or rather, it will result in alpha values which
+are not anticipated by the r, g, and b values in the same pixel, so that subsequent blend operations, which will mix r and a values
+from the same pixel, will produce incorrect colors.
+
+Here's the final subpixel blend function:
+
+```glsl
+vec4 subpixeltextblend(vec4 text_color, vec4 mask, vec4 dest) {
+  vec4 result;
+  result.r = text_color.r * mask.r + (1.0 - text_color.a * mask.r) * dest.r;
+  result.g = text_color.g * mask.g + (1.0 - text_color.a * mask.g) * dest.g;
+  result.b = text_color.b * mask.b + (1.0 - text_color.a * mask.b) * dest.b;
+  result.a = text_color.a * mask.a + (1.0 - text_color.a * mask.a) * dest.a;
+  return result;
+}
+```
+
+or for short:
+
+```glsl
+vec4 subpixeltextblend(vec4 text_color, vec4 mask, vec4 dest) {
+  return text_color * mask + (1.0 - text_color.a * mask) * dest;
+}
+```
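+
+To make the per-channel behaviour concrete, here is a worked example of `subpixeltextblend` on an opaque destination. The input values are assumptions picked for illustration only. The mask has a different coverage per channel, as it would on a glyph edge, and `mask.a` is set to `mask.g` as described above:
+
+```glsl
+vec4 text_color = vec4(0.0, 0.0, 0.0, 1.0);  // opaque black text
+vec4 mask       = vec4(1.0, 0.5, 0.0, 0.5);  // per-channel coverage, mask.a == mask.g
+vec4 dest       = vec4(1.0, 1.0, 1.0, 1.0);  // opaque white
+vec4 result = subpixeltextblend(text_color, mask, dest);
+// result.r = 0.0 * 1.0 + (1.0 - 1.0 * 1.0) * 1.0 = 0.0
+// result.g = 0.0 * 0.5 + (1.0 - 1.0 * 0.5) * 1.0 = 0.5
+// result.b = 0.0 * 0.0 + (1.0 - 1.0 * 0.0) * 1.0 = 1.0
+// result.a = 1.0 * 0.5 + (1.0 - 1.0 * 0.5) * 1.0 = 1.0
+```
+
+Each color channel is attenuated by its own mask value, and the result stays opaque because the destination was opaque.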
+
+To recap, here's what we gained and lost by making the transition from the full-component-alpha world to the
+regular rgba world: All colors and textures now only need four values to be represented, we still use a
+component alpha mask, and the results are equivalent to the full-component-alpha result assuming that the
+destination is opaque. We lost the ability to draw to partially transparent destinations.
+
+### Making this work in OpenGL
+
+We have the complete subpixel blend function.
+Now we need to cut it into pieces and mix it with the OpenGL blend pipeline in such a way that
+the fragment shader does not need to know about the destination.
+
+Compare the equation for the red channel and the alpha channel between the two ways of text blending:
+
+```
+  single-channel alpha:
+    result.r = text_color.r * mask.a + (1.0 - text_color.a * mask.a) * dest.r
+    result.a = text_color.a * mask.a + (1.0 - text_color.a * mask.a) * dest.a
+
+  component alpha:
+    result.r = text_color.r * mask.r + (1.0 - text_color.a * mask.r) * dest.r
+    result.a = text_color.a * mask.a + (1.0 - text_color.a * mask.a) * dest.a
+```
+
+Notably, in the single-channel alpha case, all three destination color channels are multiplied with the same thing:
+`(1.0 - text_color.a * mask.a)`. This factor also happens to be "one minus `oFragColor.a`".
+So we were able to take advantage of OpenGL's `ONE_MINUS_SRC_ALPHA` blend func.
+
+In the component alpha case, we're not so lucky: Each destination color channel
+is multiplied with a different factor. We can use `ONE_MINUS_SRC_COLOR` instead,
+and output `text_color.a * mask` from our fragment shader.
+But then there's still the problem that the first summand of the computation for `result.r` uses
+`text_color.r * mask.r` and the second summand uses `text_color.a * mask.r`.
+
+There's no way around it, we have to use two passes.
+(Actually, there is a way around it, but it requires the use of `glBlendColor`, which we want to avoid because
+we'd have to use different draw calls for different text colors, or it requires "dual source blending" which is
+not supported everywhere.)
+
+Here's how we can express the subpixel text blend function with two passes:
+
+ - The first pass outputs `text_color.a * mask` from the fragment shader and
+   uses `gl::ZERO, gl::ONE_MINUS_SRC_COLOR` as the glBlendFuncs. This achieves:
+
+```
+oFragColor = text_color.a * mask;
+
+result_after_pass0.r = 0 * oFragColor.r + (1 - oFragColor.r) * dest.r
+                     = (1 - text_color.a * mask.r) * dest.r
+
+result_after_pass0.g = 0 * oFragColor.g + (1 - oFragColor.g) * dest.g
+                     = (1 - text_color.a * mask.g) * dest.g
+
+...
+```
+
+ - The second pass outputs `text_color * mask` from the fragment shader and uses
+   `gl::ONE, gl::ONE` as the glBlendFuncs. This gets us:
+
+```
+oFragColor = text_color * mask;
+
+result_after_pass1.r
+ = 1 * oFragColor.r + 1 * result_after_pass0.r
+ = text_color.r * mask.r + result_after_pass0.r
+ = text_color.r * mask.r + (1 - text_color.a * mask.r) * dest.r
+```
+
+And analogous results for the other channels.
+
+This achieves what we set out to do, so we're done here.
+
+## Subpixel Text Rendering to Transparent Destinations with a Background Color Hint
+
+### Motivation
+
+As we've seen in the previous section, subpixel text drawing has the limitation that it only works on opaque destinations.
+
+In other words, if you use the `subpixeltextblend` function to draw something to a transparent surface,
+and then composite that surface onto an opaque background,
+the result will generally be different from drawing the text directly onto the opaque background.
+
+Let's express that inequality in code.
+
+```
+ - vec4 text_color
+ - vec4 mask
+ - vec4 transparency = vec4(0.0, 0.0, 0.0, 0.0)
+ - vec4 background with background.a == 1.0
+
+over(subpixeltextblend(text_color, mask, transparency), background).rgb
+ is, in general, not equal to
+subpixeltextblend(text_color, mask, background).rgb
+```
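+
+A concrete counterexample may help here. The values below are assumptions chosen for illustration: opaque black text, a mask with different coverage per channel (with `mask.a` set to `mask.g`), and an opaque white background:
+
+```glsl
+vec4 text_color   = vec4(0.0, 0.0, 0.0, 1.0);  // opaque black text
+vec4 mask         = vec4(1.0, 0.5, 0.0, 0.5);  // mask.a == mask.g
+vec4 transparency = vec4(0.0, 0.0, 0.0, 0.0);
+vec4 white        = vec4(1.0, 1.0, 1.0, 1.0);
+
+// Drawing into a transparent surface first:
+vec4 surface = subpixeltextblend(text_color, mask, transparency);
+//   surface = vec4(0.0, 0.0, 0.0, 0.5)
+vec4 via_surface = over(surface, white);
+//   via_surface.rgb = vec3(0.0) + (1.0 - 0.5) * vec3(1.0) = vec3(0.5, 0.5, 0.5)
+
+// Drawing directly onto the background:
+vec4 direct = subpixeltextblend(text_color, mask, white);
+//   direct.rgb = vec3(0.0, 0.5, 1.0)
+```
+
+The intermediate surface collapses the three per-channel coverages into the single alpha value 0.5, so the subpixel information is lost and the pixel comes out as a neutral gray.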
+
+However, one interesting observation is that if the background is black, the two *are* equal:
+
+```
+vec4 black = vec4(0.0, 0.0, 0.0, 1.0);
+
+over(subpixeltextblend(text_color, mask, transparency), black).r
+ = subpixeltextblend(text_color, mask, transparency).r +
+     (1 - subpixeltextblend(text_color, mask, transparency).a) * black.r
+ = subpixeltextblend(text_color, mask, transparency).r +
+     (1 - subpixeltextblend(text_color, mask, transparency).a) * 0
+ = subpixeltextblend(text_color, mask, transparency).r
+ = text_color.r * mask.r + (1 - text_color.a * mask.r) * transparency.r
+ = text_color.r * mask.r + (1 - text_color.a * mask.r) * 0
+ = text_color.r * mask.r + (1 - text_color.a * mask.r) * black.r
+ = subpixeltextblend(text_color, mask, black).r
+```
+
+So it works out for black backgrounds. The further your *actual* background color gets away from black,
+the more incorrect your result will be.
+
+If it works for black, is there a way to make it work for other colors?
+This is the motivating question for this third way of text blending:
+
+We want to be able to specify an *estimated background color*, and have a blending function
+`vec4 subpixeltextblend_withbgcolor(vec4 text_color, vec4 mask, vec4 bg_color, vec4 dest)`,
+in such a way that the error we get by using an intermediate surface is somehow in relation
+to the error we made when estimating the background color. In particular, if we estimated
+the background color perfectly, we want the intermediate surface to go unnoticed.
+
+Expressed as code:
+
+```
+over(subpixeltextblend_withbgcolor(text_color, mask, bg_color, transparency), bg_color)
+ should always be equal to
+subpixeltextblend(text_color, mask, bg_color)
+```
+
+This is one of three constraints we'd like `subpixeltextblend_withbgcolor` to satisfy.
+
+The next constraint is the following: If `dest` is already opaque, `subpixeltextblend_withbgcolor`
+should have the same results as `subpixeltextblend`, and the background color hint should be ignored.
+
+```
+ If dest.a == 1.0,
+subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest)
+ should always be equal to
+subpixeltextblend(text_color, mask, dest)
+```
+
+And there's a third condition we'd like it to fulfill:
+In places where the mask is zero, the destination should be unaffected.
+
+```
+subpixeltextblend_withbgcolor(text_color, transparency, bg_color, dest)
+ should always be equal to
+dest
+```
+
+### Use cases
+
+The primary use case for such a blend method is text on top of vibrant areas of a window on macOS.
+
+Vibrant backgrounds with behind-window blending are computed by the window server, and they are tinted
+in a color that's based on the chosen vibrancy type.
+
+The window's rgba buffer is transparent in the vibrant areas. Window contents, even text, are drawn onto
+that transparent rgba buffer. Then the window server composites the window onto an opaque backdrop.
+So the results on the screen are computed as follows:
+
+```glsl
+window_buffer_pixel = subpixeltextblend_withbgcolor(text_color, mask, bg_color, transparency);
+screen_pixel = over(window_buffer_pixel, window_backdrop);
+```
+
+### Prior art
+
+Apple has implemented such a method of text blending in CoreGraphics, specifically for rendering text onto vibrant backgrounds.
+It's hidden behind the private API `CGContextSetFontSmoothingBackgroundColor` and is called by AppKit internally before
+calling the `-[NSView drawRect:]` method of your `NSVisualEffectView`, with the appropriate font smoothing background color
+for the vibrancy type of that view.
+
+I'm not aware of any public documentation of this way of text blending.
+It seems to be considered an implementation detail by Apple, and is probably hidden by default because it can be a footgun:
+If the font smoothing background color you specify is very different from the actual background that our surface is placed
+on top of, the text will look glitchy.
+
+### Deriving the blending function from first principles
+
+Before we dive into the math, let's repeat our goal once more.
+
+We want to create a blending function of the form
+`vec4 subpixeltextblend_withbgcolor(vec4 text_color, vec4 mask, vec4 bg_color, vec4 dest)`
+(with `bg_color` being an opaque color)
+which satisfies the following three constraints:
+
+```
+Constraint I:
+  over(subpixeltextblend_withbgcolor(text_color, mask, bg_color, transparency), bg_color)
+   should always be equal to
+  subpixeltextblend(text_color, mask, bg_color)
+
+Constraint II:
+   If dest.a == 1.0,
+  subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest)
+   should always be equal to
+  subpixeltextblend(text_color, mask, dest)
+
+Constraint III:
+  subpixeltextblend_withbgcolor(text_color, transparency, bg_color, dest)
+   should always be equal to
+  dest
+```
+
+Constraint I and constraint II are about what happens depending on the destination's alpha.
+In particular: If the destination is completely transparent, we should blend into the
+estimated background color, and if it's completely opaque, we should blend into the destination color.
+In fact, we really want to blend into `over(dest, bg_color)`: we want `bg_color` to be used
+as a backdrop *behind* the current destination. So let's combine constraints I and II into a new
+constraint IV:
+
+```
+Constraint IV:
+  over(subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest), bg_color)
+   should always be equal to
+  subpixeltextblend(text_color, mask, over(dest, bg_color))
+```
+
+Let's look at just the left side of that equation and rejiggle it a bit:
+
+```
+over(subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest), bg_color).r
+ = subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest).r +
+   (1 - subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest).a) * bg_color.r
+
+<=>
+
+over(subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest), bg_color).r -
+(1 - subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest).a) * bg_color.r
+ = subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest).r
+```
+
+Now insert the right side of constraint IV:
+
+```
+subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest).r
+ = over(subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest), bg_color).r -
+   (1 - subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest).a) * bg_color.r
+ = subpixeltextblend(text_color, mask, over(dest, bg_color)).r -
+   (1 - subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest).a) * bg_color.r
+```
+
+Our blend function is almost finished. We just need to select an alpha for our result.
+Constraints I, II and IV don't really care about the alpha value. But constraint III requires that:
+
+```
+  subpixeltextblend_withbgcolor(text_color, transparency, bg_color, dest).a
+   should always be equal to
+  dest.a
+```
+
+so the computation of the alpha value somehow needs to take into account the mask.
+
+Let's say we have an unknown function `make_alpha(text_color.a, mask)` which returns
+a number between 0 and 1 and which is 0 if the mask is entirely zero, and let's defer
+the actual implementation of that function until later.
+
+Now we can define the alpha of our overall function using the `over` function:
+
+```
+subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest).a
+ := make_alpha(text_color.a, mask) + (1 - make_alpha(text_color.a, mask)) * dest.a
+```
+
+We can plug this in to our previous result:
+
+```
+subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest).r
+ = subpixeltextblend(text_color, mask, over(dest, bg_color)).r
+   - (1 - subpixeltextblend_withbgcolor(text_color, mask, bg_color, dest).a) * bg_color.r
+ = subpixeltextblend(text_color, mask, over(dest, bg_color)).r
+   - (1 - (make_alpha(text_color.a, mask) +
+           (1 - make_alpha(text_color.a, mask)) * dest.a)) * bg_color.r
+ = text_color.r * mask.r + (1 - text_color.a * mask.r) * over(dest, bg_color).r
+   - (1 - (make_alpha(text_color.a, mask)
+           + (1 - make_alpha(text_color.a, mask)) * dest.a)) * bg_color.r
+ = text_color.r * mask.r
+   + (1 - text_color.a * mask.r) * (dest.r + (1 - dest.a) * bg_color.r)
+   - (1 - (make_alpha(text_color.a, mask)
+           + (1 - make_alpha(text_color.a, mask)) * dest.a)) * bg_color.r
+ = text_color.r * mask.r
+   + (dest.r + (1 - dest.a) * bg_color.r)
+   - (text_color.a * mask.r) * (dest.r + (1 - dest.a) * bg_color.r)
+   - (1 - make_alpha(text_color.a, mask)
+      - (1 - make_alpha(text_color.a, mask)) * dest.a) * bg_color.r
+ = text_color.r * mask.r
+   + dest.r + (1 - dest.a) * bg_color.r
+   - text_color.a * mask.r * dest.r
+   - text_color.a * mask.r * (1 - dest.a) * bg_color.r
+   - (1 - make_alpha(text_color.a, mask)
+      - (1 - make_alpha(text_color.a, mask)) * dest.a) * bg_color.r
+ = text_color.r * mask.r
+   + dest.r + (1 - dest.a) * bg_color.r
+   - text_color.a * mask.r * dest.r
+   - text_color.a * mask.r * (1 - dest.a) * bg_color.r
+   - ((1 - make_alpha(text_color.a, mask)) * 1
+      - (1 - make_alpha(text_color.a, mask)) * dest.a) * bg_color.r
+ = text_color.r * mask.r
+   + dest.r + (1 - dest.a) * bg_color.r
+   - text_color.a * mask.r * dest.r
+   - text_color.a * mask.r * (1 - dest.a) * bg_color.r
+   - ((1 - make_alpha(text_color.a, mask)) * (1 - dest.a)) * bg_color.r
+ = text_color.r * mask.r
+   + dest.r - text_color.a * mask.r * dest.r
+   + (1 - dest.a) * bg_color.r
+   - text_color.a * mask.r * (1 - dest.a) * bg_color.r
+   - (1 - make_alpha(text_color.a, mask)) * (1 - dest.a) * bg_color.r
+ = text_color.r * mask.r
+   + (1 - text_color.a * mask.r) * dest.r
+   + (1 - dest.a) * bg_color.r
+   - text_color.a * mask.r * (1 - dest.a) * bg_color.r
+   - (1 - make_alpha(text_color.a, mask)) * (1 - dest.a) * bg_color.r
+ = text_color.r * mask.r
+   + (1 - text_color.a * mask.r) * dest.r
+   + (1 - text_color.a * mask.r) * (1 - dest.a) * bg_color.r
+   - (1 - make_alpha(text_color.a, mask)) * (1 - dest.a) * bg_color.r
+ = text_color.r * mask.r
+   + (1 - text_color.a * mask.r) * dest.r
+   + ((1 - text_color.a * mask.r)
+      - (1 - make_alpha(text_color.a, mask))) * (1 - dest.a) * bg_color.r
+ = text_color.r * mask.r
+   + (1 - text_color.a * mask.r) * dest.r
+   + (1 - text_color.a * mask.r
+      - 1 + make_alpha(text_color.a, mask)) * (1 - dest.a) * bg_color.r
+ = text_color.r * mask.r
+   + (1 - text_color.a * mask.r) * dest.r
+   + (make_alpha(text_color.a, mask) - text_color.a * mask.r) * (1 - dest.a) * bg_color.r
+```
+
+We now have a term of the form `A + B + C`, with `A` and `B` being guaranteed to
+be between zero and one.
+
+We also want `C` to be between zero and one.
+We can use this restriction to help us decide on an implementation of `make_alpha`.
+
+If we define `make_alpha` as
+
+```glsl
+float make_alpha(float text_color_a, vec4 mask) {
+  float max_rgb = max(max(mask.r, mask.g), mask.b);
+  return text_color_a * max_rgb;
+}
+```
+
+then `(make_alpha(text_color.a, mask) - text_color.a * mask.r)` becomes
+`(text_color.a * max(max(mask.r, mask.g), mask.b) - text_color.a * mask.r)`, which is
+`text_color.a * (max(max(mask.r, mask.g), mask.b) - mask.r)`, and the subtraction will
+always yield something that's greater than or equal to zero for r, g, and b,
+because we subtract each channel from the maximum of the channels.
+
+Putting this all together, we have:
+
+```glsl
+vec4 subpixeltextblend_withbgcolor(vec4 text_color, vec4 mask, vec4 bg_color, vec4 dest) {
+  float max_rgb = max(max(mask.r, mask.g), mask.b);
+  vec4 result;
+  result.r = text_color.r * mask.r + (1 - text_color.a * mask.r) * dest.r +
+             text_color.a * bg_color.r * (max_rgb - mask.r) * (1 - dest.a);
+  result.g = text_color.g * mask.g + (1 - text_color.a * mask.g) * dest.g +
+             text_color.a * bg_color.g * (max_rgb - mask.g) * (1 - dest.a);
+  result.b = text_color.b * mask.b + (1 - text_color.a * mask.b) * dest.b +
+             text_color.a * bg_color.b * (max_rgb - mask.b) * (1 - dest.a);
+  result.a = text_color.a * max_rgb + (1 - text_color.a * max_rgb) * dest.a;
+  return result;
+}
+```
+
+This is the final form of this blend function. It satisfies all of the four constraints.
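+
+As a spot check of constraint I, here is a worked example with assumed input values (opaque black text, a per-channel mask, a white background color hint, and a fully transparent destination). It is an illustration for one set of inputs, not a proof:
+
+```glsl
+vec4 text_color   = vec4(0.0, 0.0, 0.0, 1.0);  // opaque black text
+vec4 mask         = vec4(1.0, 0.5, 0.0, 0.5);  // max_rgb == 1.0
+vec4 bg_color     = vec4(1.0, 1.0, 1.0, 1.0);  // opaque white hint
+vec4 transparency = vec4(0.0, 0.0, 0.0, 0.0);
+
+vec4 surface = subpixeltextblend_withbgcolor(text_color, mask, bg_color, transparency);
+// surface.r = 0.0 + (1.0 - 1.0) * 0.0 + 1.0 * 1.0 * (1.0 - 1.0) * (1.0 - 0.0) = 0.0
+// surface.g = 0.0 + (1.0 - 0.5) * 0.0 + 1.0 * 1.0 * (1.0 - 0.5) * (1.0 - 0.0) = 0.5
+// surface.b = 0.0 + (1.0 - 0.0) * 0.0 + 1.0 * 1.0 * (1.0 - 0.0) * (1.0 - 0.0) = 1.0
+// surface.a = 1.0 * 1.0 + (1.0 - 1.0 * 1.0) * 0.0                             = 1.0
+
+vec4 via_surface = over(surface, bg_color);
+// via_surface = surface + (1.0 - 1.0) * bg_color = vec4(0.0, 0.5, 1.0, 1.0)
+
+vec4 direct = subpixeltextblend(text_color, mask, bg_color);
+// direct = vec4(0.0, 0.5, 1.0, 1.0)
+```
+
+With a perfectly estimated background color, going through the intermediate surface gives the same pixel as drawing directly onto the background, which is what constraint I asks for.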
+
+### Implementing it with OpenGL
+
+Our color channel equations consist of three pieces:
+
+ - `text_color.r * mask.r`, which simply gets added to the rest.
+ - `(1 - text_color.a * mask.r) * dest.r`, a factor which gets multiplied with the destination color.
+ - `text_color.a * bg_color.r * (max_rgb - mask.r) * (1 - dest.a)`, a factor which gets multiplied
+   with "one minus destination alpha".
+
+We will need three passes. Each pass modifies the color channels in the destination.
+This means that the part that uses `dest.r` needs to be applied first.
+Then we can apply the part that uses `1 - dest.a`.
+(This means that the first pass needs to leave `dest.a` untouched.)
+And the final pass can apply the `result.a` equation and modify `dest.a`.
+
+```
+pub fn set_blend_mode_subpixel_with_bg_color_pass0(&self) {
+    self.gl.blend_func_separate(gl::ZERO, gl::ONE_MINUS_SRC_COLOR, gl::ZERO, gl::ONE);
+}
+pub fn set_blend_mode_subpixel_with_bg_color_pass1(&self) {
+    self.gl.blend_func_separate(gl::ONE_MINUS_DST_ALPHA, gl::ONE, gl::ZERO, gl::ONE);
+}
+pub fn set_blend_mode_subpixel_with_bg_color_pass2(&self) {
+    self.gl.blend_func_separate(gl::ONE, gl::ONE, gl::ONE, gl::ONE_MINUS_SRC_ALPHA);
+}
+
+Pass0:
+    oFragColor = vec4(text.color.a) * mask;
+Pass1:
+    oFragColor = vec4(text.color.a) * text.bg_color * (vec4(mask.a) - mask);
+Pass2:
+    oFragColor = text.color * mask;
+
+result_after_pass0.r = 0 * (text_color.a * mask.r) + (1 - text_color.a * mask.r) * dest.r
+result_after_pass0.a = 0 * (text_color.a * mask.a) + 1 * dest.a
+
+result_after_pass1.r = (1 - result_after_pass0.a) * (text_color.a * (mask.max_rgb - mask.r) * bg_color.r) + 1 * result_after_pass0.r
+result_after_pass1.a = 0 * (text_color.a * (mask.max_rgb - mask.a) * bg_color.a) + 1 * result_after_pass0.a
+
+result_after_pass2.r = 1 * (text_color.r * mask.r) + 1 * result_after_pass1.r
+result_after_pass2.a = 1 * (text_color.a * mask.max_rgb) + (1 - text_color.a * mask.max_rgb) * result_after_pass1.a
+```
+
+Instead of computing `max_rgb` in the shader, we can just require the font rasterization code to fill
+`mask.a` with the `max_rgb` value.
+
--- a/gfx/webrender/res/prim_shared.glsl
+++ b/gfx/webrender/res/prim_shared.glsl
@@ -69,17 +69,17 @@ vec4[2] fetch_from_resource_cache_2(int 
     );
 }
 
 #ifdef WR_VERTEX_SHADER
 
 #define VECS_PER_LAYER              9
 #define VECS_PER_RENDER_TASK        3
 #define VECS_PER_PRIM_HEADER        2
-#define VECS_PER_TEXT_RUN           2
+#define VECS_PER_TEXT_RUN           3
 #define VECS_PER_GRADIENT           3
 #define VECS_PER_GRADIENT_STOP      2
 
 uniform HIGHP_SAMPLER_FLOAT sampler2D sLayers;
 uniform HIGHP_SAMPLER_FLOAT sampler2D sRenderTasks;
 
 // Instanced attributes
 in ivec4 aData0;
@@ -712,23 +712,24 @@ struct Rectangle {
 
 Rectangle fetch_rectangle(int address) {
     vec4 data = fetch_from_resource_cache_1(address);
     return Rectangle(data);
 }
 
 struct TextRun {
     vec4 color;
+    vec4 bg_color;
     vec2 offset;
     int subpx_dir;
 };
 
 TextRun fetch_text_run(int address) {
-    vec4 data[2] = fetch_from_resource_cache_2(address);
-    return TextRun(data[0], data[1].xy, int(data[1].z));
+    vec4 data[3] = fetch_from_resource_cache_3(address);
+    return TextRun(data[0], data[1], data[2].xy, int(data[2].z));
 }
 
 struct Image {
     vec4 stretch_size_and_tile_spacing;  // Size of the actual image and amount of space between
                                          //     tiled instances of this image.
     vec4 sub_rect;                          // If negative, ignored.
 };
 
--- a/gfx/webrender/res/ps_border_corner.glsl
+++ b/gfx/webrender/res/ps_border_corner.glsl
@@ -14,16 +14,18 @@ flat varying vec4 vColorEdgeLine;
 // Border radius
 flat varying vec2 vClipCenter;
 flat varying vec4 vRadii0;
 flat varying vec4 vRadii1;
 flat varying vec2 vClipSign;
 flat varying vec4 vEdgeDistance;
 flat varying float vSDFSelect;
 
+flat varying float vIsBorderRadiusLessThanBorderWidth;
+
 // Border style
 flat varying float vAlphaSelect;
 
 #ifdef WR_FEATURE_TRANSFORM
 varying vec3 vLocalPos;
 #else
 varying vec2 vLocalPos;
 #endif
@@ -175,16 +177,18 @@ void main(void) {
                       border.widths.xy,
                       adjusted_widths.xy);
             set_edge_line(border.widths.xy,
                           corners.tl_outer,
                           vec2(1.0, 1.0));
             edge_distances = vec4(p0 + adjusted_widths.xy,
                                   p0 + inv_adjusted_widths.xy);
             color_delta = vec2(1.0);
+            vIsBorderRadiusLessThanBorderWidth = any(lessThan(border.radii[0].xy,
+                                                              border.widths.xy)) ? 1.0 : 0.0;
             break;
         }
         case 1: {
             p0 = vec2(corners.tr_inner.x, corners.tr_outer.y);
             p1 = vec2(corners.tr_outer.x, corners.tr_inner.y);
             color0 = border.colors[1];
             color1 = border.colors[2];
             vClipCenter = corners.tr_outer + vec2(-border.radii[0].z, border.radii[0].w);
@@ -199,16 +203,18 @@ void main(void) {
             set_edge_line(border.widths.zy,
                           corners.tr_outer,
                           vec2(-1.0, 1.0));
             edge_distances = vec4(p1.x - adjusted_widths.z,
                                   p0.y + adjusted_widths.y,
                                   p1.x - border.widths.z + adjusted_widths.z,
                                   p0.y + inv_adjusted_widths.y);
             color_delta = vec2(1.0, -1.0);
+            vIsBorderRadiusLessThanBorderWidth = any(lessThan(border.radii[0].zw,
+                                                              border.widths.zy)) ? 1.0 : 0.0;
             break;
         }
         case 2: {
             p0 = corners.br_inner;
             p1 = corners.br_outer;
             color0 = border.colors[2];
             color1 = border.colors[3];
             vClipCenter = corners.br_outer - border.radii[1].xy;
@@ -223,16 +229,18 @@ void main(void) {
             set_edge_line(border.widths.zw,
                           corners.br_outer,
                           vec2(-1.0, -1.0));
             edge_distances = vec4(p1.x - adjusted_widths.z,
                                   p1.y - adjusted_widths.w,
                                   p1.x - border.widths.z + adjusted_widths.z,
                                   p1.y - border.widths.w + adjusted_widths.w);
             color_delta = vec2(-1.0);
+            vIsBorderRadiusLessThanBorderWidth = any(lessThan(border.radii[1].xy,
+                                                              border.widths.zw)) ? 1.0 : 0.0;
             break;
         }
         case 3: {
             p0 = vec2(corners.bl_outer.x, corners.bl_inner.y);
             p1 = vec2(corners.bl_inner.x, corners.bl_outer.y);
             color0 = border.colors[3];
             color1 = border.colors[0];
             vClipCenter = corners.bl_outer + vec2(border.radii[1].z, -border.radii[1].w);
@@ -247,16 +255,18 @@ void main(void) {
             set_edge_line(border.widths.xw,
                           corners.bl_outer,
                           vec2(1.0, -1.0));
             edge_distances = vec4(p0.x + adjusted_widths.x,
                                   p1.y - adjusted_widths.w,
                                   p0.x + inv_adjusted_widths.x,
                                   p1.y - border.widths.w + adjusted_widths.w);
             color_delta = vec2(-1.0, 1.0);
+            vIsBorderRadiusLessThanBorderWidth = any(lessThan(border.radii[1].zw,
+                                                              border.widths.xw)) ? 1.0 : 0.0;
             break;
         }
     }
 
     switch (style) {
         case BORDER_STYLE_DOUBLE: {
             vEdgeDistance = edge_distances;
             vAlphaSelect = 0.0;
@@ -327,17 +337,18 @@ void main(void) {
     float aa_range = compute_aa_range(local_pos);
 
     float distance_for_color;
     float color_mix_factor;
 
     // Only apply the clip AA if inside the clip region. This is
     // necessary for correctness when the border width is greater
     // than the border radius.
-    if (all(lessThan(local_pos * vClipSign, vClipCenter * vClipSign))) {
+    if (vIsBorderRadiusLessThanBorderWidth == 0.0 ||
+        all(lessThan(local_pos * vClipSign, vClipCenter * vClipSign))) {
         vec2 p = local_pos - vClipCenter;
 
         // The coordinate system is snapped to pixel boundaries. To sample the distance,
         // however, we are interested in the center of the pixels which introduces an
         // error of half a pixel towards the exterior of the curve (See issue #1750).
         // This error is corrected by offsetting the distance by half a device pixel.
         // This not entirely correct: it leaves an error that varries between
         // 0 and (sqrt(2) - 1)/2 = 0.2 pixels but it is hardly noticeable and is better
--- a/gfx/webrender/res/ps_text_run.glsl
+++ b/gfx/webrender/res/ps_text_run.glsl
@@ -12,17 +12,20 @@ flat varying vec4 vUvBorder;
 varying vec3 vLocalPos;
 #endif
 
 #ifdef WR_VERTEX_SHADER
 
 #define MODE_ALPHA          0
 #define MODE_SUBPX_PASS0    1
 #define MODE_SUBPX_PASS1    2
-#define MODE_COLOR_BITMAP   3
+#define MODE_SUBPX_BG_PASS0 3
+#define MODE_SUBPX_BG_PASS1 4
+#define MODE_SUBPX_BG_PASS2 5
+#define MODE_COLOR_BITMAP   6
 
 void main(void) {
     Primitive prim = load_primitive();
     TextRun text = fetch_text_run(prim.specific_prim_address);
 
     int glyph_index = prim.user_data0;
     int resource_address = prim.user_data1;
 
@@ -54,42 +57,55 @@ void main(void) {
                                  prim.layer,
                                  prim.task,
                                  local_rect);
     vec2 f = (vi.local_pos - local_rect.p0) / local_rect.size;
 #endif
 
     write_clip(vi.screen_pos, prim.clip_area);
 
+#ifdef WR_FEATURE_SUBPX_BG_PASS1
+    vColor = vec4(text.color.a) * text.bg_color;
+#else
     switch (uMode) {
         case MODE_ALPHA:
         case MODE_SUBPX_PASS1:
+        case MODE_SUBPX_BG_PASS2:
             vColor = text.color;
             break;
         case MODE_SUBPX_PASS0:
+        case MODE_SUBPX_BG_PASS0:
         case MODE_COLOR_BITMAP:
             vColor = vec4(text.color.a);
             break;
+        case MODE_SUBPX_BG_PASS1:
+            // This should never be reached.
+            break;
     }
+#endif
 
     vec2 texture_size = vec2(textureSize(sColor0, 0));
     vec2 st0 = res.uv_rect.xy / texture_size;
     vec2 st1 = res.uv_rect.zw / texture_size;
 
     vUv = vec3(mix(st0, st1, f), res.layer);
     vUvBorder = (res.uv_rect + vec4(0.5, 0.5, -0.5, -0.5)) / texture_size.xyxy;
 }
 #endif
 
 #ifdef WR_FRAGMENT_SHADER
 void main(void) {
     vec3 tc = vec3(clamp(vUv.xy, vUvBorder.xy, vUvBorder.zw), vUv.z);
-    vec4 color = texture(sColor0, tc);
+    vec4 mask = texture(sColor0, tc);
 
     float alpha = 1.0;
 #ifdef WR_FEATURE_TRANSFORM
     init_transform_fs(vLocalPos, alpha);
 #endif
     alpha *= do_clip();
 
-    oFragColor = color * vColor * alpha;
+#ifdef WR_FEATURE_SUBPX_BG_PASS1
+    mask.rgb = vec3(mask.a) - mask.rgb;
+#endif
+
+    oFragColor = vColor * mask * alpha;
 }
 #endif
--- a/gfx/webrender/src/clip_scroll_node.rs
+++ b/gfx/webrender/src/clip_scroll_node.rs
@@ -225,24 +225,26 @@ impl ClipScrollNode {
         Self::new(pipeline_id, Some(parent_id), &frame_rect, node_type)
     }
 
 
     pub fn add_child(&mut self, child: ClipId) {
         self.children.push(child);
     }
 
-    pub fn apply_old_scrolling_state(&mut self, new_scrolling: &ScrollingState) {
+    pub fn apply_old_scrolling_state(&mut self, old_scrolling_state: &ScrollingState) {
         match self.node_type {
             NodeType::ScrollFrame(ref mut scrolling) => {
                 let scroll_sensitivity = scrolling.scroll_sensitivity;
-                *scrolling = *new_scrolling;
+                let scrollable_size = scrolling.scrollable_size;
+                *scrolling = *old_scrolling_state;
                 scrolling.scroll_sensitivity = scroll_sensitivity;
+                scrolling.scrollable_size = scrollable_size;
             }
-            _ if new_scrolling.offset != LayerVector2D::zero() => {
+            _ if old_scrolling_state.offset != LayerVector2D::zero() => {
                 warn!("Tried to scroll a non-scroll node.")
             }
             _ => {}
         }
     }
 
     pub fn set_scroll_origin(&mut self, origin: &LayerPoint, clamp: ScrollClamping) -> bool {
         let scrollable_size = self.scrollable_size();
--- a/gfx/webrender/src/clip_scroll_tree.rs
+++ b/gfx/webrender/src/clip_scroll_tree.rs
@@ -42,20 +42,19 @@ pub struct ClipScrollTree {
     /// added frames and clips. The ClipScrollTree increments this by one every
     /// time a new dynamic frame is created.
     current_new_node_item: u64,
 
     /// The root reference frame, which is the true root of the ClipScrollTree. Initially
     /// this ID is not valid, which is indicated by ```node``` being empty.
     pub root_reference_frame_id: ClipId,
 
-    /// The topmost scrolling node that we have, which is decided by the first scrolling node
-    /// to be added to the tree. This is really only useful for Servo, so we should figure out
-    /// a good way to remove it in the future.
-    pub topmost_scrolling_node_id: Option<ClipId>,
+    /// The root scroll node which is the first child of the root reference frame.
+    /// Initially this ID is not valid, which is indicated by ```nodes``` being empty.
+    pub topmost_scrolling_node_id: ClipId,
 
     /// A set of pipelines which should be discarded the next time this
     /// tree is drained.
     pub pipelines_to_discard: FastHashSet<PipelineId>,
 }
 
 #[derive(Clone)]
 pub struct TransformUpdateState {
@@ -78,29 +77,36 @@ pub struct TransformUpdateState {
 impl ClipScrollTree {
     pub fn new() -> ClipScrollTree {
         let dummy_pipeline = PipelineId::dummy();
         ClipScrollTree {
             nodes: FastHashMap::default(),
             pending_scroll_offsets: FastHashMap::default(),
             currently_scrolling_node_id: None,
             root_reference_frame_id: ClipId::root_reference_frame(dummy_pipeline),
-            topmost_scrolling_node_id: None,
+            topmost_scrolling_node_id: ClipId::root_scroll_node(dummy_pipeline),
             current_new_node_item: 1,
             pipelines_to_discard: FastHashSet::default(),
         }
     }
 
     pub fn root_reference_frame_id(&self) -> ClipId {
         // TODO(mrobinson): We should eventually make this impossible to misuse.
         debug_assert!(!self.nodes.is_empty());
         debug_assert!(self.nodes.contains_key(&self.root_reference_frame_id));
         self.root_reference_frame_id
     }
 
+    pub fn topmost_scrolling_node_id(&self) -> ClipId {
+        // TODO(mrobinson): We should eventually make this impossible to misuse.
+        debug_assert!(!self.nodes.is_empty());
+        debug_assert!(self.nodes.contains_key(&self.topmost_scrolling_node_id));
+        self.topmost_scrolling_node_id
+    }
+
     pub fn collect_nodes_bouncing_back(&self) -> FastHashSet<ClipId> {
         let mut nodes_bouncing_back = FastHashSet::default();
         for (clip_id, node) in self.nodes.iter() {
             if let NodeType::ScrollFrame(ref scrolling) = node.node_type {
                 if scrolling.bouncing_back {
                     nodes_bouncing_back.insert(*clip_id);
                 }
             }
@@ -130,16 +136,21 @@ impl ClipScrollTree {
             if node.ray_intersects_node(cursor) {
                 Some(clip_id)
             } else {
                 None
             }
         })
     }
 
+    pub fn find_scrolling_node_at_point(&self, cursor: &WorldPoint) -> ClipId {
+        self.find_scrolling_node_at_point_in_node(cursor, self.root_reference_frame_id())
+            .unwrap_or(self.topmost_scrolling_node_id())
+    }
+
     pub fn is_point_clipped_in_for_node(
         &self,
         point: WorldPoint,
         node_id: &ClipId,
         cache: &mut FastHashMap<ClipId, Option<LayerPoint>>,
         clip_store: &ClipStore
     ) -> bool {
         if let Some(point) = cache.get(node_id) {
@@ -245,27 +256,21 @@ impl ClipScrollTree {
         scroll_location: ScrollLocation,
         cursor: WorldPoint,
         phase: ScrollEventPhase,
     ) -> bool {
         if self.nodes.is_empty() {
             return false;
         }
 
-        let topmost_scrolling_node_id = match self.topmost_scrolling_node_id {
-            Some(id) => id,
-            None => return false,
-        };
-
-        let scrolling_node = self.find_scrolling_node_at_point_in_node(
-            &cursor,
-            self.root_reference_frame_id()
-        ).unwrap_or(topmost_scrolling_node_id);;
-
-        let clip_id = match (phase, scrolling_node, self.currently_scrolling_node_id) {
+        let clip_id = match (
+            phase,
+            self.find_scrolling_node_at_point(&cursor),
+            self.currently_scrolling_node_id,
+        ) {
             (ScrollEventPhase::Start, scroll_node_at_point_id, _) => {
                 self.currently_scrolling_node_id = Some(scroll_node_at_point_id);
                 scroll_node_at_point_id
             }
             (_, scroll_node_at_point_id, Some(cached_clip_id)) => {
                 let clip_id = match self.nodes.get(&cached_clip_id) {
                     Some(_) => cached_clip_id,
                     None => {
@@ -273,16 +278,17 @@ impl ClipScrollTree {
                         scroll_node_at_point_id
                     }
                 };
                 clip_id
             }
             (_, _, None) => return false,
         };
 
+        let topmost_scrolling_node_id = self.topmost_scrolling_node_id();
         let non_root_overscroll = if clip_id != topmost_scrolling_node_id {
             self.nodes.get(&clip_id).unwrap().is_overscrolling()
         } else {
             false
         };
 
         let mut switch_node = false;
         if let Some(node) = self.nodes.get_mut(&clip_id) {
@@ -475,20 +481,16 @@ impl ClipScrollTree {
             frame_rect,
             sticky_frame_info,
             id.pipeline_id(),
         );
         self.add_node(node, id);
     }
 
     pub fn add_node(&mut self, node: ClipScrollNode, id: ClipId) {
-        if let NodeType::ScrollFrame(..) = node.node_type {
-            self.topmost_scrolling_node_id.get_or_insert(id);
-        }
-
         // When the parent node is None this means we are adding the root.
         match node.parent {
             Some(parent_id) => self.nodes.get_mut(&parent_id).unwrap().add_child(id),
             None => self.root_reference_frame_id = id,
         }
 
         debug_assert!(!self.nodes.contains_key(&id));
         self.nodes.insert(id, node);
--- a/gfx/webrender/src/device.rs
+++ b/gfx/webrender/src/device.rs
@@ -1908,16 +1908,28 @@ impl Device {
         self.gl.blend_equation_separate(gl::MIN, gl::FUNC_ADD);
     }
     pub fn set_blend_mode_subpixel_pass0(&self) {
         self.gl.blend_func(gl::ZERO, gl::ONE_MINUS_SRC_COLOR);
     }
     pub fn set_blend_mode_subpixel_pass1(&self) {
         self.gl.blend_func(gl::ONE, gl::ONE);
     }
+    pub fn set_blend_mode_subpixel_with_bg_color_pass0(&self) {
+        self.gl.blend_func_separate(gl::ZERO, gl::ONE_MINUS_SRC_COLOR, gl::ZERO, gl::ONE);
+        self.gl.blend_equation(gl::FUNC_ADD);
+    }
+    pub fn set_blend_mode_subpixel_with_bg_color_pass1(&self) {
+        self.gl.blend_func_separate(gl::ONE_MINUS_DST_ALPHA, gl::ONE, gl::ZERO, gl::ONE);
+        self.gl.blend_equation(gl::FUNC_ADD);
+    }
+    pub fn set_blend_mode_subpixel_with_bg_color_pass2(&self) {
+        self.gl.blend_func_separate(gl::ONE, gl::ONE, gl::ONE, gl::ONE_MINUS_SRC_ALPHA);
+        self.gl.blend_equation(gl::FUNC_ADD);
+    }
 }
 
 /// return (gl_internal_format, gl_format)
 fn gl_texture_formats_for_image_format(
     gl: &gl::Gl,
     format: ImageFormat,
 ) -> (gl::GLint, gl::GLuint) {
     match format {
--- a/gfx/webrender/src/frame.rs
+++ b/gfx/webrender/src/frame.rs
@@ -73,17 +73,17 @@ impl<'a> FlattenContext<'a> {
             .get(complex_clips)
             .collect()
     }
 
     fn flatten_root(
         &mut self,
         traversal: &mut BuiltDisplayListIter<'a>,
         pipeline_id: PipelineId,
-        frame_size: &LayoutSize,
+        content_size: &LayoutSize,
     ) {
         self.builder.push_stacking_context(
             &LayerVector2D::zero(),
             pipeline_id,
             CompositeOps::default(),
             TransformStyle::Flat,
             true,
             true,
@@ -92,21 +92,21 @@ impl<'a> FlattenContext<'a> {
         // We do this here, rather than above because we want any of the top-level
         // stacking contexts in the display list to be treated like root stacking contexts.
         // FIXME(mrobinson): Currently only the first one will, which for the moment is
         // sufficient for all our use cases.
         self.builder.notify_waiting_for_root_stacking_context();
 
         // For the root pipeline, there's no need to add a full screen rectangle
         // here, as it's handled by the framebuffer clear.
-        let clip_id = ClipId::root_reference_frame(pipeline_id);
+        let clip_id = ClipId::root_scroll_node(pipeline_id);
         if self.scene.root_pipeline_id != Some(pipeline_id) {
             if let Some(pipeline) = self.scene.pipelines.get(&pipeline_id) {
                 if let Some(bg_color) = pipeline.background_color {
-                    let root_bounds = LayerRect::new(LayerPoint::zero(), *frame_size);
+                    let root_bounds = LayerRect::new(LayerPoint::zero(), *content_size);
                     let info = LayerPrimitiveInfo::new(root_bounds);
                     self.builder.add_solid_rectangle(
                         ClipAndScrollInfo::simple(clip_id),
                         &info,
                         &RectangleContent::Fill(bg_color),
                         PrimitiveFlags::None,
                     );
                 }
@@ -115,24 +115,22 @@ impl<'a> FlattenContext<'a> {
 
 
         self.flatten_items(traversal, pipeline_id, LayerVector2D::zero());
 
         if self.builder.config.enable_scrollbars {
             let scrollbar_rect = LayerRect::new(LayerPoint::zero(), LayerSize::new(10.0, 70.0));
             let info = LayerPrimitiveInfo::new(scrollbar_rect);
 
-            if let Some(node_id) = self.clip_scroll_tree.topmost_scrolling_node_id {
-                self.builder.add_solid_rectangle(
-                    ClipAndScrollInfo::simple(clip_id),
-                    &info,
-                    &RectangleContent::Fill(DEFAULT_SCROLLBAR_COLOR),
-                    PrimitiveFlags::Scrollbar(node_id, 4.0),
-                );
-            }
+            self.builder.add_solid_rectangle(
+                ClipAndScrollInfo::simple(clip_id),
+                &info,
+                &RectangleContent::Fill(DEFAULT_SCROLLBAR_COLOR),
+                PrimitiveFlags::Scrollbar(self.clip_scroll_tree.topmost_scrolling_node_id(), 4.0),
+            );
         }
 
         self.builder.pop_stacking_context();
     }
 
     fn flatten_items(
         &mut self,
         traversal: &mut BuiltDisplayListIter<'a>,
@@ -337,30 +335,40 @@ impl<'a> FlattenContext<'a> {
             self.clip_scroll_tree,
         );
 
         self.pipeline_epochs.push((pipeline_id, pipeline.epoch));
 
         let iframe_rect = LayerRect::new(LayerPoint::zero(), bounds.size);
         let origin = reference_frame_relative_offset + bounds.origin.to_vector();
         let transform = LayerToScrollTransform::create_translation(origin.x, origin.y, 0.0);
-        self.builder.push_reference_frame(
+        let iframe_reference_frame_id = self.builder.push_reference_frame(
             Some(clip_id),
             pipeline_id,
             &iframe_rect,
             &transform,
             origin,
             true,
             self.clip_scroll_tree,
         );
 
+        self.builder.add_scroll_frame(
+            ClipId::root_scroll_node(pipeline_id),
+            iframe_reference_frame_id,
+            pipeline_id,
+            &iframe_rect,
+            &pipeline.content_size,
+            ScrollSensitivity::ScriptAndInputEvents,
+            self.clip_scroll_tree,
+        );
+
         self.flatten_root(
             &mut pipeline.display_list.iter(),
             pipeline_id,
-            &bounds.size,
+            &pipeline.content_size,
         );
 
         self.builder.pop_reference_frame();
     }
 
     fn flatten_item<'b>(
         &'b mut self,
         item: DisplayItemRef<'a, 'b>,
@@ -666,16 +674,17 @@ impl<'a> FlattenContext<'a> {
         };
 
         self.builder.add_solid_rectangle(
             *clip_and_scroll,
             &prim_info,
             content,
             PrimitiveFlags::None,
         );
+
         for clipped_rect in &clipped_rects {
             let mut info = info.clone();
             info.rect = *clipped_rect;
             self.builder.add_solid_rectangle(
                 *clip_and_scroll,
                 &info,
                 content,
                 PrimitiveFlags::None,
@@ -1092,16 +1101,17 @@ impl FrameContext {
                 tiled_image_map: resource_cache.get_tiled_image_map(),
                 pipeline_epochs: Vec::new(),
                 replacements: Vec::new(),
             };
 
             roller.builder.push_root(
                 root_pipeline_id,
                 &root_pipeline.viewport_size,
+                &root_pipeline.content_size,
                 roller.clip_scroll_tree,
             );
 
             roller.builder.setup_viewport_offset(
                 window_size,
                 inner_rect,
                 device_pixel_ratio,
                 roller.clip_scroll_tree,
--- a/gfx/webrender/src/frame_builder.rs
+++ b/gfx/webrender/src/frame_builder.rs
@@ -463,42 +463,55 @@ impl FrameBuilder {
                     viewport_offset.x,
                     viewport_offset.y,
                     0.0,
                 );
             }
             root_node.local_clip_rect = viewport_clip;
         }
 
-        if let Some(clip_id) = clip_scroll_tree.topmost_scrolling_node_id {
-            if let Some(root_node) = clip_scroll_tree.nodes.get_mut(&clip_id) {
-                root_node.local_clip_rect = viewport_clip;
-            }
+        let clip_id = clip_scroll_tree.topmost_scrolling_node_id();
+        if let Some(root_node) = clip_scroll_tree.nodes.get_mut(&clip_id) {
+            root_node.local_clip_rect = viewport_clip;
         }
     }
 
     pub fn push_root(
         &mut self,
         pipeline_id: PipelineId,
         viewport_size: &LayerSize,
+        content_size: &LayerSize,
         clip_scroll_tree: &mut ClipScrollTree,
     ) -> ClipId {
         let viewport_rect = LayerRect::new(LayerPoint::zero(), *viewport_size);
         let identity = &LayerToScrollTransform::identity();
         self.push_reference_frame(
             None,
             pipeline_id,
             &viewport_rect,
             identity,
             LayerVector2D::zero(),
             true,
             clip_scroll_tree,
         );
 
-        clip_scroll_tree.root_reference_frame_id
+        let topmost_scrolling_node_id = ClipId::root_scroll_node(pipeline_id);
+        clip_scroll_tree.topmost_scrolling_node_id = topmost_scrolling_node_id;
+
+        self.add_scroll_frame(
+            topmost_scrolling_node_id,
+            clip_scroll_tree.root_reference_frame_id,
+            pipeline_id,
+            &viewport_rect,
+            content_size,
+            ScrollSensitivity::ScriptAndInputEvents,
+            clip_scroll_tree,
+        );
+
+        topmost_scrolling_node_id
     }
 
     pub fn add_clip_node(
         &mut self,
         new_node_id: ClipId,
         parent_id: ClipId,
         pipeline_id: PipelineId,
         clip_region: ClipRegion,
@@ -1123,16 +1136,17 @@ impl FrameBuilder {
                 }
             }
         }
 
         let prim_font = FontInstance::new(
             font.font_key,
             font.size,
             *color,
+            font.bg_color,
             render_mode,
             font.subpx_dir,
             font.platform_options,
             font.variations.clone(),
             font.synthetic_italics,
         );
         let prim = TextRunPrimitiveCpu {
             font: prim_font,
--- a/gfx/webrender/src/gamma_lut.rs
+++ b/gfx/webrender/src/gamma_lut.rs
@@ -6,16 +6,17 @@
 Gamma correction lookup tables.
 
 This is a port of Skia gamma LUT logic into Rust, used by WebRender.
 */
 //#![warn(missing_docs)] //TODO
 #![allow(dead_code)]
 
 use api::ColorU;
+use std::cmp::max;
 
 /// Color space responsible for converting between lumas and luminances.
 #[derive(Clone, Copy, Debug, PartialEq)]
 pub enum LuminanceColorSpace {
     /// Linear space - no conversion involved.
     Linear,
     /// Simple gamma space - uses the `luminance ^ gamma` function.
     Gamma(f32),
@@ -290,100 +291,58 @@ impl GammaLut {
             tables: [[0; 256]; 1 << LUM_BITS],
         };
 
         table.generate_tables(contrast, paint_gamma, device_gamma);
 
         table
     }
 
-    // Skia normally preblends based on what the text color is.
-    // If we can't do that, use Skia default colors.
-    pub fn preblend_default_colors_bgra(&self, pixels: &mut [u8], width: usize, height: usize) {
-        let preblend_color = ColorU::new(0x7f, 0x80, 0x7f, 0xff);
-        self.preblend_bgra(pixels, width, height, preblend_color);
-    }
+    // Assumes pixels are in BGRA format. Assumes pixel values are in linear space already.
+    pub fn preblend(&self, pixels: &mut [u8], color: ColorU) {
+        let table_r = self.get_table(color.r);
+        let table_g = self.get_table(color.g);
+        let table_b = self.get_table(color.b);
 
-    fn replace_pixels_bgra(&self, pixels: &mut [u8], width: usize, height: usize,
-                           table_r: &[u8; 256], table_g: &[u8; 256], table_b: &[u8; 256]) {
-         for y in 0..height {
-            let current_height = y * width * 4;
-
-            for pixel in pixels[current_height..current_height + (width * 4)].chunks_mut(4) {
-                pixel[0] = table_b[pixel[0] as usize];
-                pixel[1] = table_g[pixel[1] as usize];
-                pixel[2] = table_r[pixel[2] as usize];
-                // Don't touch alpha
-            }
+        for pixel in pixels.chunks_mut(4) {
+            let (b, g, r) = (table_b[pixel[0] as usize], table_g[pixel[1] as usize], table_r[pixel[2] as usize]);
+            pixel[0] = b;
+            pixel[1] = g;
+            pixel[2] = r;
+            pixel[3] = max(max(b, g), r);
         }
     }
 
-    // Mostly used by windows and GlyphRunAnalysis::GetAlphaTexture
-    fn replace_pixels_rgb(&self, pixels: &mut [u8], width: usize, height: usize,
-                          table_r: &[u8; 256], table_g: &[u8; 256], table_b: &[u8; 256]) {
-         for y in 0..height {
-            let current_height = y * width * 3;
-
-            for pixel in pixels[current_height..current_height + (width * 3)].chunks_mut(3) {
-                pixel[0] = table_r[pixel[0] as usize];
-                pixel[1] = table_g[pixel[1] as usize];
-                pixel[2] = table_b[pixel[2] as usize];
-            }
+    #[cfg(target_os="macos")]
+    pub fn coregraphics_convert_to_linear(&self, pixels: &mut [u8]) {
+        for pixel in pixels.chunks_mut(4) {
+            pixel[0] = self.cg_inverse_gamma[pixel[0] as usize];
+            pixel[1] = self.cg_inverse_gamma[pixel[1] as usize];
+            pixel[2] = self.cg_inverse_gamma[pixel[2] as usize];
         }
     }
 
     // Assumes pixels are in BGRA format. Assumes pixel values are in linear space already.
-    pub fn preblend_bgra(&self, pixels: &mut [u8], width: usize, height: usize, color: ColorU) {
-        let table_r = self.get_table(color.r);
-        let table_g = self.get_table(color.g);
-        let table_b = self.get_table(color.b);
-
-        self.replace_pixels_bgra(pixels, width, height, table_r, table_g, table_b);
-    }
-
-    // Assumes pixels are in RGB format. Assumes pixel values are in linear space already. NOTE:
-    // there is no alpha here.
-    pub fn preblend_rgb(&self, pixels: &mut [u8], width: usize, height: usize, color: ColorU) {
-        let table_r = self.get_table(color.r);
-        let table_g = self.get_table(color.g);
-        let table_b = self.get_table(color.b);
-
-        self.replace_pixels_rgb(pixels, width, height, table_r, table_g, table_b);
-    }
-
-    #[cfg(target_os="macos")]
-    pub fn coregraphics_convert_to_linear_bgra(&self, pixels: &mut [u8], width: usize, height: usize) {
-        self.replace_pixels_bgra(pixels, width, height,
-                                 &self.cg_inverse_gamma,
-                                 &self.cg_inverse_gamma,
-                                 &self.cg_inverse_gamma);
-    }
-
-    // Assumes pixels are in BGRA format. Assumes pixel values are in linear space already.
-    pub fn preblend_grayscale_bgra(&self, pixels: &mut [u8], width: usize, height: usize, color: ColorU) {
+    pub fn preblend_grayscale(&self, pixels: &mut [u8], color: ColorU) {
         let table_g = self.get_table(color.g);
 
-         for y in 0..height {
-            let current_height = y * width * 4;
-
-            for pixel in pixels[current_height..current_height + (width * 4)].chunks_mut(4) {
-                let luminance = compute_luminance(pixel[2], pixel[1], pixel[0]);
-                pixel[0] = table_g[luminance as usize];
-                pixel[1] = table_g[luminance as usize];
-                pixel[2] = table_g[luminance as usize];
-                pixel[3] = table_g[luminance as usize];
-            }
+        for pixel in pixels.chunks_mut(4) {
+            let luminance = compute_luminance(pixel[2], pixel[1], pixel[0]);
+            let alpha = table_g[luminance as usize];
+            pixel[0] = alpha;
+            pixel[1] = alpha;
+            pixel[2] = alpha;
+            pixel[3] = alpha;
         }
     }
 
 } // end impl GammaLut
 
 #[cfg(test)]
 mod tests {
-    use std::cmp;
     use super::*;
 
     fn over(dst: u32, src: u32, alpha: u32) -> u32 {
         (src * alpha + dst * (255 - alpha))/255
     }
 
     fn overf(dst: f32, src: f32, alpha: f32) -> f32 {
         ((src * alpha + dst * (255. - alpha))/255.) as f32
@@ -409,17 +368,17 @@ mod tests {
                     let preblend = table[alpha as usize];
                     let lin_dst = (dst as f32 / 255.).powf(g) * 255.;
                     let lin_src = (src as f32 / 255.).powf(g) * 255.;
 
                     let preblend_result = over(dst, src, preblend as u32);
                     let true_result = ((overf(lin_dst, lin_src, alpha as f32) / 255.).powf(1. / g) * 255.) as u32;
                     let diff = absdiff(preblend_result, true_result);
                     //println!("{} -- {} {} = {}", alpha, preblend_result, true_result, diff);
-                    max_diff = cmp::max(max_diff, diff);
+                    max_diff = max(max_diff, diff);
                 }
 
                 //println!("{} {} max {}", src, dst, max_diff);
                 assert!(max_diff <= 33);
                 dst += 1;
 
             }
             src += 1;
--- a/gfx/webrender/src/glyph_rasterizer.rs
+++ b/gfx/webrender/src/glyph_rasterizer.rs
@@ -1,14 +1,14 @@
 /* This Source Code Form is subject to the terms of the Mozilla Public
  * License, v. 2.0. If a copy of the MPL was not distributed with this
  * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
 
 #[cfg(test)]
-use api::{ColorF, IdNamespace, LayoutPoint, SubpixelDirection};
+use api::{ColorF, ColorU, IdNamespace, LayoutPoint, SubpixelDirection};
 use api::{DevicePoint, DeviceUintSize, FontInstance, FontRenderMode};
 use api::{FontKey, FontTemplate, GlyphDimensions, GlyphKey};
 use api::{ImageData, ImageDescriptor, ImageFormat};
 #[cfg(test)]
 use app_units::Au;
 use device::TextureFilter;
 use glyph_cache::{CachedGlyphInfo, GlyphCache};
 use gpu_cache::GpuCache;
@@ -442,16 +442,17 @@ fn raterize_200_glyphs() {
 
     let font_key = FontKey::new(IdNamespace(0), 0);
     glyph_rasterizer.add_font(font_key, FontTemplate::Raw(Arc::new(font_data), 0));
 
     let font = FontInstance::new(
         font_key,
         Au::from_px(32),
         ColorF::new(0.0, 0.0, 0.0, 1.0),
+        ColorU::new(0, 0, 0, 0),
         FontRenderMode::Subpixel,
         SubpixelDirection::Horizontal,
         None,
         Vec::new(),
         false,
     );
 
     let mut glyph_keys = Vec::with_capacity(200);
--- a/gfx/webrender/src/platform/macos/font.rs
+++ b/gfx/webrender/src/platform/macos/font.rs
@@ -366,30 +366,26 @@ impl FontContext {
                 }
             })
     }
 
     // Assumes the pixels here are linear values from CG
     fn gamma_correct_pixels(
         &self,
         pixels: &mut Vec<u8>,
-        width: usize,
-        height: usize,
         render_mode: FontRenderMode,
         color: ColorU,
     ) {
         // Then convert back to gamma corrected values.
         match render_mode {
             FontRenderMode::Alpha => {
-                self.gamma_lut
-                    .preblend_grayscale_bgra(pixels, width, height, color);
+                self.gamma_lut.preblend_grayscale(pixels, color);
             }
             FontRenderMode::Subpixel => {
-                self.gamma_lut
-                    .preblend_bgra(pixels, width, height, color);
+                self.gamma_lut.preblend(pixels, color);
             }
             _ => {} // Again, give mono untouched since only the alpha matters.
         }
     }
 
     #[allow(dead_code)]
     fn print_glyph_data(&mut self, data: &[u8], width: usize, height: usize) {
         // Rust doesn't have step_by support on stable :(
@@ -590,55 +586,46 @@ impl FontContext {
             // allowed to write to the alpha channel, because we're done calling
             // CG functions now.
 
             if smooth {
                 // Convert to linear space for subpixel AA.
                 // We explicitly do not do this for grayscale AA ("Alpha without
                 // smoothing" or Mono) because those rendering modes are not
                 // gamma-aware in CoreGraphics.
-                self.gamma_lut.coregraphics_convert_to_linear_bgra(
+                self.gamma_lut.coregraphics_convert_to_linear(
                     &mut rasterized_pixels,
-                    metrics.rasterized_width as usize,
-                    metrics.rasterized_height as usize,
                 );
             }
 
-            for i in 0 .. metrics.rasterized_height {
-                let current_height = (i * metrics.rasterized_width * 4) as usize;
-                let end_row = current_height + (metrics.rasterized_width as usize * 4);
-
-                for pixel in rasterized_pixels[current_height .. end_row].chunks_mut(4) {
-                    if invert {
-                        pixel[0] = 255 - pixel[0];
-                        pixel[1] = 255 - pixel[1];
-                        pixel[2] = 255 - pixel[2];
-                    }
+            for pixel in rasterized_pixels.chunks_mut(4) {
+                if invert {
+                    pixel[0] = 255 - pixel[0];
+                    pixel[1] = 255 - pixel[1];
+                    pixel[2] = 255 - pixel[2];
+                }
 
-                    // Set alpha to the value of the green channel. For grayscale
-                    // text, all three channels have the same value anyway.
-                    // For subpixel text, the mask's alpha only makes a difference
-                    // when computing the destination alpha on destination pixels
-                    // that are not completely opaque. Picking an alpha value
-                    // that's somehow based on the mask at least ensures that text
-                    // blending doesn't modify the destination alpha on pixels where
-                    // the mask is entirely zero.
-                    pixel[3] = pixel[1];
-                } // end row
-            } // end height
+                // Set alpha to the value of the green channel. For grayscale
+                // text, all three channels have the same value anyway.
+                // For subpixel text, the mask's alpha only makes a difference
+                // when computing the destination alpha on destination pixels
+                // that are not completely opaque. Picking an alpha value
+                // that's somehow based on the mask at least ensures that text
+                // blending doesn't modify the destination alpha on pixels where
+                // the mask is entirely zero.
+                pixel[3] = pixel[1];
+            }
 
             if smooth {
                 // Convert back from linear space into device space, and perform
                 // some "preblending" based on the text color.
                 // In Alpha + smoothing mode, this will also convert subpixel AA
                 // into grayscale AA.
                 self.gamma_correct_pixels(
                     &mut rasterized_pixels,
-                    metrics.rasterized_width as usize,
-                    metrics.rasterized_height as usize,
                     font.render_mode,
                     font.color,
                 );
             }
         }
 
         Some(RasterizedGlyph {
             left: metrics.rasterized_left as f32,
--- a/gfx/webrender/src/platform/unix/font.rs
+++ b/gfx/webrender/src/platform/unix/font.rs
@@ -6,42 +6,44 @@ use api::{FontInstance, FontKey, FontRen
 use api::{FontInstancePlatformOptions, FontLCDFilter, FontHinting};
 use api::{NativeFontHandle, SubpixelDirection, GlyphKey, ColorU};
 use api::{FONT_FORCE_AUTOHINT, FONT_NO_AUTOHINT, FONT_EMBEDDED_BITMAP};
 use api::{FONT_EMBOLDEN, FONT_VERTICAL_LAYOUT, FONT_SUBPIXEL_BGR};
 use freetype::freetype::{FT_BBox, FT_Outline_Translate, FT_Pixel_Mode, FT_Render_Mode};
 use freetype::freetype::{FT_Done_Face, FT_Error, FT_Get_Char_Index, FT_Int32};
 use freetype::freetype::{FT_Done_FreeType, FT_Library_SetLcdFilter, FT_Pos};
 use freetype::freetype::{FT_F26Dot6, FT_Face, FT_Glyph_Format, FT_Long, FT_UInt};
-use freetype::freetype::{FT_GlyphSlot, FT_LcdFilter, FT_New_Memory_Face};
+use freetype::freetype::{FT_GlyphSlot, FT_LcdFilter, FT_New_Face, FT_New_Memory_Face};
 use freetype::freetype::{FT_Init_FreeType, FT_Load_Glyph, FT_Render_Glyph};
 use freetype::freetype::{FT_Library, FT_Outline_Get_CBox, FT_Set_Char_Size, FT_Select_Size};
 use freetype::freetype::{FT_LOAD_COLOR, FT_LOAD_DEFAULT, FT_LOAD_FORCE_AUTOHINT};
 use freetype::freetype::{FT_LOAD_IGNORE_GLOBAL_ADVANCE_WIDTH, FT_LOAD_NO_AUTOHINT};
 use freetype::freetype::{FT_LOAD_NO_BITMAP, FT_LOAD_NO_HINTING, FT_LOAD_VERTICAL_LAYOUT};
 use freetype::freetype::{FT_FACE_FLAG_SCALABLE, FT_FACE_FLAG_FIXED_SIZES, FT_Err_Cannot_Render_Glyph};
 use glyph_rasterizer::{GlyphFormat, RasterizedGlyph};
 use internal_types::FastHashMap;
 use std::{cmp, mem, ptr, slice};
+use std::cmp::max;
+use std::ffi::CString;
 use std::sync::Arc;
 
 // These constants are not present in the freetype
 // bindings due to bindgen not handling the way
 // the macros are defined.
 //const FT_LOAD_TARGET_NORMAL: FT_UInt = 0 << 16;
 const FT_LOAD_TARGET_LIGHT: FT_UInt  = 1 << 16;
 const FT_LOAD_TARGET_MONO: FT_UInt   = 2 << 16;
 const FT_LOAD_TARGET_LCD: FT_UInt    = 3 << 16;
 const FT_LOAD_TARGET_LCD_V: FT_UInt  = 4 << 16;
 
 struct Face {
     face: FT_Face,
     // Raw byte data has to live until the font is deleted, according to
     // https://www.freetype.org/freetype2/docs/reference/ft2-base_interface.html#FT_New_Memory_Face
-    _bytes: Arc<Vec<u8>>,
+    _bytes: Option<Arc<Vec<u8>>>,
 }
 
 pub struct FontContext {
     lib: FT_Library,
     faces: FastHashMap<FontKey, Face>,
     lcd_extra_pixels: i64,
 }
 
@@ -98,27 +100,49 @@ impl FontContext {
                     &mut face,
                 )
             };
             if result.succeeded() && !face.is_null() {
                 self.faces.insert(
                     *font_key,
                     Face {
                         face,
-                        _bytes: bytes,
+                        _bytes: Some(bytes),
                     },
                 );
             } else {
                 println!("WARN: webrender failed to load font {:?}", font_key);
             }
         }
     }
 
-    pub fn add_native_font(&mut self, _font_key: &FontKey, _native_font_handle: NativeFontHandle) {
-        panic!("TODO: Not supported on Linux");
+    pub fn add_native_font(&mut self, font_key: &FontKey, native_font_handle: NativeFontHandle) {
+        if !self.faces.contains_key(&font_key) {
+            let mut face: FT_Face = ptr::null_mut();
+            let pathname = CString::new(native_font_handle.pathname).unwrap();
+            let result = unsafe {
+                FT_New_Face(
+                    self.lib,
+                    pathname.as_ptr(),
+                    native_font_handle.index as FT_Long,
+                    &mut face,
+                )
+            };
+            if result.succeeded() && !face.is_null() {
+                self.faces.insert(
+                    *font_key,
+                    Face {
+                        face,
+                        _bytes: None,
+                    },
+                );
+            } else {
+                println!("WARN: webrender failed to load font {:?}", font_key);
+            }
+        }
     }
 
     pub fn delete_font(&mut self, font_key: &FontKey) {
         if let Some(face) = self.faces.remove(font_key) {
             let result = unsafe { FT_Done_Face(face.face) };
             assert!(result.succeeded());
         }
     }
@@ -549,67 +573,55 @@ impl FontContext {
                         final_buffer[dest + 1] = alpha;
                         final_buffer[dest + 2] = alpha;
                         final_buffer[dest + 3] = alpha;
                         src = unsafe { src.offset(1) };
                         dest += 4;
                     }
                 }
                 FT_Pixel_Mode::FT_PIXEL_MODE_LCD => {
-                    if subpixel_bgr {
-                        while dest < row_end {
-                            final_buffer[dest + 0] = unsafe { *src };
-                            final_buffer[dest + 1] = unsafe { *src.offset(1) };
-                            final_buffer[dest + 2] = unsafe { *src.offset(2) };
-                            final_buffer[dest + 3] = 0xff;
-                            src = unsafe { src.offset(3) };
-                            dest += 4;
+                    while dest < row_end {
+                        let (mut r, g, mut b) = unsafe { (*src, *src.offset(1), *src.offset(2)) };
+                        if subpixel_bgr {
+                            mem::swap(&mut r, &mut b);
                         }
-                    } else {
-                        while dest < row_end {
-                            final_buffer[dest + 2] = unsafe { *src };
-                            final_buffer[dest + 1] = unsafe { *src.offset(1) };
-                            final_buffer[dest + 0] = unsafe { *src.offset(2) };
-                            final_buffer[dest + 3] = 0xff;
-                            src = unsafe { src.offset(3) };
-                            dest += 4;
-                        }
+                        final_buffer[dest + 0] = b;
+                        final_buffer[dest + 1] = g;
+                        final_buffer[dest + 2] = r;
+                        final_buffer[dest + 3] = max(max(b, g), r);
+                        src = unsafe { src.offset(3) };
+                        dest += 4;
                     }
                 }
                 FT_Pixel_Mode::FT_PIXEL_MODE_LCD_V => {
-                    if subpixel_bgr {
-                        while dest < row_end {
-                            final_buffer[dest + 0] = unsafe { *src };
-                            final_buffer[dest + 1] = unsafe { *src.offset(bitmap.pitch as isize) };
-                            final_buffer[dest + 2] = unsafe { *src.offset((2 * bitmap.pitch) as isize) };
-                            final_buffer[dest + 3] = 0xff;
-                            src = unsafe { src.offset(1) };
-                            dest += 4;
+                    while dest < row_end {
+                        let (mut r, g, mut b) =
+                            unsafe { (*src, *src.offset(bitmap.pitch as isize), *src.offset((2 * bitmap.pitch) as isize)) };
+                        if subpixel_bgr {
+                            mem::swap(&mut r, &mut b);
                         }
-                    } else {
-                        while dest < row_end {
-                            final_buffer[dest + 2] = unsafe { *src };
-                            final_buffer[dest + 1] = unsafe { *src.offset(bitmap.pitch as isize) };
-                            final_buffer[dest + 0] = unsafe { *src.offset((2 * bitmap.pitch) as isize) };
-                            final_buffer[dest + 3] = 0xff;
-                            src = unsafe { src.offset(1) };
-                            dest += 4;
-                        }
+                        final_buffer[dest + 0] = b;
+                        final_buffer[dest + 1] = g;
+                        final_buffer[dest + 2] = r;
+                        final_buffer[dest + 3] = max(max(b, g), r);
+                        src = unsafe { src.offset(1) };
+                        dest += 4;
                     }
                     src_row = unsafe { src_row.offset((2 * bitmap.pitch) as isize) };
                 }
                 FT_Pixel_Mode::FT_PIXEL_MODE_BGRA => {
                     // The source is premultiplied BGRA data.
                     let dest_slice = &mut final_buffer[dest .. row_end];
                     let src_slice = unsafe { slice::from_raw_parts(src, dest_slice.len()) };
                     dest_slice.copy_from_slice(src_slice);
                 }
                 _ => panic!("Unsupported {:?}", pixel_mode),
             }
             src_row = unsafe { src_row.offset(bitmap.pitch as isize) };
+            dest = row_end;
         }
 
         Some(RasterizedGlyph {
             left: ((dimensions.left + left) as f32 * scale).round(),
             top: ((dimensions.top + top - actual_height) as f32 * scale).round(),
             width: actual_width as u32,
             height: actual_height as u32,
             scale,
--- a/gfx/webrender/src/platform/windows/font.rs
+++ b/gfx/webrender/src/platform/windows/font.rs
@@ -248,56 +248,56 @@ impl FontContext {
                     top: -bounds.top,
                     width,
                     height,
                     advance: advance,
                 }
             })
     }
 
-    // DWRITE gives us values in RGB. WR doesn't really touch it after. Note, CG returns in BGR
-    // TODO: Decide whether all fonts should return RGB or BGR
-    fn convert_to_rgba(&self, pixels: &[u8], render_mode: FontRenderMode) -> Vec<u8> {
+    // DWrite ClearType gives us values in RGB, but WR expects BGRA.
+    fn convert_to_bgra(&self, pixels: &[u8], render_mode: FontRenderMode) -> Vec<u8> {
         match render_mode {
             FontRenderMode::Bitmap => {
                 unreachable!("TODO: bitmap fonts");
             }
             FontRenderMode::Mono => {
-                let mut rgba_pixels: Vec<u8> = vec![0; pixels.len() * 4];
+                let mut bgra_pixels: Vec<u8> = vec![0; pixels.len() * 4];
                 for i in 0 .. pixels.len() {
-                    rgba_pixels[i * 4 + 0] = pixels[i];
-                    rgba_pixels[i * 4 + 1] = pixels[i];
-                    rgba_pixels[i * 4 + 2] = pixels[i];
-                    rgba_pixels[i * 4 + 3] = pixels[i];
+                    let alpha = pixels[i];
+                    bgra_pixels[i * 4 + 0] = alpha;
+                    bgra_pixels[i * 4 + 1] = alpha;
+                    bgra_pixels[i * 4 + 2] = alpha;
+                    bgra_pixels[i * 4 + 3] = alpha;
                 }
-                rgba_pixels
+                bgra_pixels
             }
             FontRenderMode::Alpha => {
                 let length = pixels.len() / 3;
-                let mut rgba_pixels: Vec<u8> = vec![0; length * 4];
+                let mut bgra_pixels: Vec<u8> = vec![0; length * 4];
                 for i in 0 .. length {
                     // Only take the G channel, as its closest to D2D
                     let alpha = pixels[i * 3 + 1] as u8;
-                    rgba_pixels[i * 4 + 0] = alpha;
-                    rgba_pixels[i * 4 + 1] = alpha;
-                    rgba_pixels[i * 4 + 2] = alpha;
-                    rgba_pixels[i * 4 + 3] = alpha;
+                    bgra_pixels[i * 4 + 0] = alpha;
+                    bgra_pixels[i * 4 + 1] = alpha;
+                    bgra_pixels[i * 4 + 2] = alpha;
+                    bgra_pixels[i * 4 + 3] = alpha;
                 }
-                rgba_pixels
+                bgra_pixels
             }
             FontRenderMode::Subpixel => {
                 let length = pixels.len() / 3;
-                let mut rgba_pixels: Vec<u8> = vec![0; length * 4];
+                let mut bgra_pixels: Vec<u8> = vec![0; length * 4];
                 for i in 0 .. length {
-                    rgba_pixels[i * 4 + 0] = pixels[i * 3 + 0];
-                    rgba_pixels[i * 4 + 1] = pixels[i * 3 + 1];
-                    rgba_pixels[i * 4 + 2] = pixels[i * 3 + 2];
-                    rgba_pixels[i * 4 + 3] = 0xff;
+                    bgra_pixels[i * 4 + 0] = pixels[i * 3 + 0];
+                    bgra_pixels[i * 4 + 1] = pixels[i * 3 + 1];
+                    bgra_pixels[i * 4 + 2] = pixels[i * 3 + 2];
+                    bgra_pixels[i * 4 + 3] = 0xff;
                 }
-                rgba_pixels
+                bgra_pixels
             }
         }
     }
 
     pub fn is_bitmap_font(&mut self, _font: &FontInstance) -> bool {
         // TODO(gw): Support bitmap fonts in DWrite.
         false
     }
@@ -323,53 +323,47 @@ impl FontContext {
         &mut self,
         font: &FontInstance,
         key: &GlyphKey,
     ) -> Option<RasterizedGlyph> {
         let analysis = self.create_glyph_analysis(font, key);
         let texture_type = dwrite_texture_type(font.render_mode);
 
         let bounds = analysis.get_alpha_texture_bounds(texture_type);
-        let width = (bounds.right - bounds.left) as usize;
-        let height = (bounds.bottom - bounds.top) as usize;
+        let width = (bounds.right - bounds.left) as u32;
+        let height = (bounds.bottom - bounds.top) as u32;
 
         // Alpha texture bounds can sometimes return an empty rect
         // Such as for spaces
         if width == 0 || height == 0 {
             return None;
         }
 
-        let mut pixels = analysis.create_alpha_texture(texture_type, bounds);
+        let pixels = analysis.create_alpha_texture(texture_type, bounds);
+        let mut bgra_pixels = self.convert_to_bgra(&pixels, font.render_mode);
 
         match font.render_mode {
             FontRenderMode::Mono | FontRenderMode::Bitmap => {}
             FontRenderMode::Alpha | FontRenderMode::Subpixel => {
                 let lut_correction = match font.platform_options {
                     Some(option) => if option.force_gdi_rendering {
                         &self.gdi_gamma_lut
                     } else {
                         &self.gamma_lut
                     },
                     None => &self.gamma_lut,
                 };
 
-                lut_correction.preblend_rgb(
-                    &mut pixels,
-                    width,
-                    height,
-                    font.color,
-                );
+                lut_correction.preblend(&mut bgra_pixels, font.color);
             }
         }
 
-        let rgba_pixels = self.convert_to_rgba(&mut pixels, font.render_mode);
-
         Some(RasterizedGlyph {
             left: bounds.left as f32,
             top: -bounds.top as f32,
-            width: width as u32,
-            height: height as u32,
+            width,
+            height,
             scale: 1.0,
             format: GlyphFormat::from(font.render_mode),
-            bytes: rgba_pixels,
+            bytes: bgra_pixels,
         })
     }
 }
--- a/gfx/webrender/src/prim_store.rs
+++ b/gfx/webrender/src/prim_store.rs
@@ -1,14 +1,14 @@
 /* This Source Code Form is subject to the terms of the Mozilla Public
  * License, v. 2.0. If a copy of the MPL was not distributed with this
  * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
 
 use api::{BorderRadius, BuiltDisplayList, ColorF, ComplexClipRegion, DeviceIntRect};
-use api::{DevicePoint, ExtendMode, FontInstance, FontRenderMode, GlyphInstance, GlyphKey};
+use api::{DevicePoint, ExtendMode, FontInstance, GlyphInstance, GlyphKey};
 use api::{GradientStop, ImageKey, ImageRendering, ItemRange, ItemTag, LayerPoint, LayerRect};
 use api::{ClipMode, LayerSize, LayerVector2D, LineOrientation, LineStyle};
 use api::{TileOffset, YuvColorSpace, YuvFormat};
 use border::BorderCornerInstance;
 use clip::{ClipSourcesHandle, ClipStore, Geometry};
 use frame_builder::PrimitiveContext;
 use gpu_cache::{GpuBlockData, GpuCache, GpuCacheAddress, GpuCacheHandle, GpuDataRequest,
                 ToGpuBlocks};
@@ -549,49 +549,32 @@ pub struct TextRunPrimitiveCpu {
     pub font: FontInstance,
     pub offset: LayerVector2D,
     pub glyph_range: ItemRange<GlyphInstance>,
     pub glyph_count: usize,
     pub glyph_keys: Vec<GlyphKey>,
     pub glyph_gpu_blocks: Vec<GpuBlockData>,
 }
 
-#[derive(Debug, Copy, Clone, Eq, PartialEq)]
-pub enum TextRunMode {
-    Normal,
-    Shadow,
-}
 
 impl TextRunPrimitiveCpu {
-    pub fn get_font(&self,
-                    run_mode: TextRunMode,
-                    device_pixel_ratio: f32,
-    ) -> FontInstance {
+    pub fn get_font(&self, device_pixel_ratio: f32) -> FontInstance {
         let mut font = self.font.clone();
-        match run_mode {
-            TextRunMode::Normal => {}
-            TextRunMode::Shadow => {
-                // Shadows never use subpixel AA, but need to respect the alpha/mono flag
-                // for reftests.
-                font.render_mode = font.render_mode.limit_by(FontRenderMode::Alpha);
-            }
-        };
         font.size = font.size.scale_by(device_pixel_ratio);
         font
     }
 
     fn prepare_for_render(
         &mut self,
         resource_cache: &mut ResourceCache,
         device_pixel_ratio: f32,
         display_list: &BuiltDisplayList,
-        run_mode: TextRunMode,
         gpu_cache: &mut GpuCache,
     ) {
-        let font = self.get_font(run_mode, device_pixel_ratio);
+        let font = self.get_font(device_pixel_ratio);
 
         // Cache the glyph positions, if not in the cache already.
         // TODO(gw): In the future, remove `glyph_instances`
         //           completely, and just reference the glyphs
         //           directly from the display list.
         if self.glyph_keys.is_empty() {
             let subpx_dir = font.subpx_dir.limit_by(font.render_mode);
             let src_glyphs = display_list.get(self.glyph_range);
@@ -623,16 +606,17 @@ impl TextRunPrimitiveCpu {
             }
         }
 
         resource_cache.request_glyphs(font, &self.glyph_keys, gpu_cache);
     }
 
     fn write_gpu_blocks(&self, request: &mut GpuDataRequest) {
         request.push(ColorF::from(self.font.color).premultiplied());
+        request.push(ColorF::from(self.font.bg_color));
         request.push([
             self.offset.x,
             self.offset.y,
             self.font.subpx_dir.limit_by(self.font.render_mode) as u32 as f32,
             0.0,
         ]);
         request.extend_from_slice(&self.glyph_gpu_blocks);
 
@@ -1092,17 +1076,16 @@ impl PrimitiveStore {
 
     fn prepare_prim_for_render_inner(
         &mut self,
         prim_index: PrimitiveIndex,
         prim_context: &PrimitiveContext,
         resource_cache: &mut ResourceCache,
         gpu_cache: &mut GpuCache,
         render_tasks: &mut RenderTaskTree,
-        text_run_mode: TextRunMode,
     ) {
         let metadata = &mut self.cpu_metadata[prim_index.0];
         match metadata.prim_kind {
             PrimitiveKind::Rectangle | PrimitiveKind::Border | PrimitiveKind::Line => {}
             PrimitiveKind::Picture => {
                 self.cpu_pictures[metadata.cpu_prim_index.0]
                     .prepare_for_render(
                         prim_index,
@@ -1111,17 +1094,16 @@ impl PrimitiveStore {
                     );
             }
             PrimitiveKind::TextRun => {
                 let text = &mut self.cpu_text_runs[metadata.cpu_prim_index.0];
                 text.prepare_for_render(
                     resource_cache,
                     prim_context.device_pixel_ratio,
                     prim_context.display_list,
-                    text_run_mode,
                     gpu_cache,
                 );
             }
             PrimitiveKind::Image => {
                 let image_cpu = &mut self.cpu_images[metadata.cpu_prim_index.0];
 
                 resource_cache.request_image(
                     image_cpu.image_key,
@@ -1352,17 +1334,16 @@ impl PrimitiveStore {
                 let sub_prim_index = PrimitiveIndex(run.prim_index.0 + i);
 
                 self.prepare_prim_for_render_inner(
                     sub_prim_index,
                     prim_context,
                     resource_cache,
                     gpu_cache,
                     render_tasks,
-                    TextRunMode::Shadow,
                 );
             }
         }
 
         if !self.update_clip_task(
             prim_index,
             prim_context,
             geometry.device_rect,
@@ -1375,17 +1356,16 @@ impl PrimitiveStore {
         }
 
         self.prepare_prim_for_render_inner(
             prim_index,
             prim_context,
             resource_cache,
             gpu_cache,
             render_tasks,
-            TextRunMode::Normal,
         );
 
         Some(geometry)
     }
 }
 
 
 //Test for one clip region contains another
--- a/gfx/webrender/src/renderer.rs
+++ b/gfx/webrender/src/renderer.rs
@@ -218,17 +218,20 @@ bitflags! {
 // behaviour per draw-call.
 type ShaderMode = i32;
 
 #[repr(C)]
 enum TextShaderMode {
     Alpha = 0,
     SubpixelPass0 = 1,
     SubpixelPass1 = 2,
-    ColorBitmap = 3,
+    SubpixelWithBgColorPass0 = 3,
+    SubpixelWithBgColorPass1 = 4,
+    SubpixelWithBgColorPass2 = 5,
+    ColorBitmap = 6,
 }
 
 impl Into<ShaderMode> for TextShaderMode {
     fn into(self) -> i32 {
         self as i32
     }
 }
 
@@ -633,16 +636,17 @@ impl SourceTextureResolver {
 
 #[derive(Debug, Copy, Clone, PartialEq)]
 pub enum BlendMode {
     None,
     Alpha,
     PremultipliedAlpha,
     PremultipliedDestOut,
     Subpixel,
+    SubpixelWithBgColor,
 }
 
 // Tracks the state of each row in the GPU cache texture.
 struct CacheRow {
     is_dirty: bool,
 }
 
 impl CacheRow {
@@ -1004,17 +1008,18 @@ impl BrushShader {
     ) where M: Into<ShaderMode> {
         match blend_mode {
             BlendMode::None => {
                 self.opaque.bind(device, projection, mode, renderer_errors)
             }
             BlendMode::Alpha |
             BlendMode::PremultipliedAlpha |
             BlendMode::PremultipliedDestOut |
-            BlendMode::Subpixel => {
+            BlendMode::Subpixel |
+            BlendMode::SubpixelWithBgColor => {
                 self.alpha.bind(device, projection, mode, renderer_errors)
             }
         }
     }
 
     fn deinit(self, device: &mut Device) {
         self.opaque.deinit(device);
         self.alpha.deinit(device);
@@ -1207,16 +1212,17 @@ pub struct Renderer {
     // Most draw directly to the framebuffer, but some use inputs
     // from the cache shaders to draw. Specifically, the box
     // shadow primitive shader stretches the box shadow cache
     // output, and the cache_image shader blits the results of
     // a cache shader (e.g. blur) to the screen.
     ps_rectangle: PrimitiveShader,
     ps_rectangle_clip: PrimitiveShader,
     ps_text_run: PrimitiveShader,
+    ps_text_run_subpx_bg_pass1: PrimitiveShader,
     ps_image: Vec<Option<PrimitiveShader>>,
     ps_yuv_image: Vec<Option<PrimitiveShader>>,
     ps_border_corner: PrimitiveShader,
     ps_border_edge: PrimitiveShader,
     ps_gradient: PrimitiveShader,
     ps_angle_gradient: PrimitiveShader,
     ps_radial_gradient: PrimitiveShader,
     ps_line: PrimitiveShader,
@@ -1476,16 +1482,23 @@ impl Renderer {
 
         let ps_text_run = try!{
             PrimitiveShader::new("ps_text_run",
                                  &mut device,
                                  &[],
                                  options.precache_shaders)
         };
 
+        let ps_text_run_subpx_bg_pass1 = try!{
+            PrimitiveShader::new("ps_text_run",
+                                 &mut device,
+                                 &["SUBPX_BG_PASS1"],
+                                 options.precache_shaders)
+        };
+
         // All image configuration.
         let mut image_features = Vec::new();
         let mut ps_image: Vec<Option<PrimitiveShader>> = Vec::new();
         // PrimitiveShader is not clonable. Use push() to initialize the vec.
         for _ in 0 .. IMAGE_BUFFER_KINDS.len() {
             ps_image.push(None);
         }
         for buffer_kind in 0 .. IMAGE_BUFFER_KINDS.len() {
@@ -1819,16 +1832,17 @@ impl Renderer {
             brush_image_rgba8,
             brush_image_a8,
             cs_clip_rectangle,
             cs_clip_border,
             cs_clip_image,
             ps_rectangle,
             ps_rectangle_clip,
             ps_text_run,
+            ps_text_run_subpx_bg_pass1,
             ps_image,
             ps_yuv_image,
             ps_border_corner,
             ps_border_edge,
             ps_gradient,
             ps_angle_gradient,
             ps_radial_gradient,
             ps_blend,
@@ -2473,17 +2487,18 @@ impl Renderer {
             }
             BatchKind::Transformable(transform_kind, batch_kind) => match batch_kind {
                 TransformBatchKind::Rectangle(needs_clipping) => {
                     debug_assert!(
                         !needs_clipping || match key.blend_mode {
                             BlendMode::Alpha |
                             BlendMode::PremultipliedAlpha |
                             BlendMode::PremultipliedDestOut |
-                            BlendMode::Subpixel => true,
+                            BlendMode::Subpixel |
+                            BlendMode::SubpixelWithBgColor => true,
                             BlendMode::None => false,
                         }
                     );
 
                     if needs_clipping {
                         self.ps_rectangle_clip.bind(
                             &mut self.device,
                             transform_kind,
@@ -2817,16 +2832,17 @@ impl Renderer {
             for batch in &target.alpha_batcher.batch_list.alpha_batch_list.batches {
                 if self.debug_flags.contains(DebugFlags::ALPHA_PRIM_DBG) {
                     let color = match batch.key.blend_mode {
                         BlendMode::None => ColorF::new(0.3, 0.3, 0.3, 1.0),
                         BlendMode::Alpha => ColorF::new(0.0, 0.9, 0.1, 1.0),
                         BlendMode::PremultipliedAlpha => ColorF::new(0.0, 0.3, 0.7, 1.0),
                         BlendMode::PremultipliedDestOut => ColorF::new(0.6, 0.2, 0.0, 1.0),
                         BlendMode::Subpixel => ColorF::new(0.5, 0.0, 0.4, 1.0),
+                        BlendMode::SubpixelWithBgColor => ColorF::new(0.6, 0.0, 0.5, 1.0),
                     }.into();
                     for item_rect in &batch.item_rects {
                         self.debug.add_rect(item_rect, color);
                     }
                 }
 
                 match batch.key.kind {
                     BatchKind::Transformable(transform_kind, TransformBatchKind::TextRun(glyph_format)) => {
@@ -2892,16 +2908,68 @@ impl Renderer {
 
                                 // When drawing the 2nd pass, we know that the VAO, textures etc
                                 // are all set up from the previous draw_instanced_batch call,
                                 // so just issue a draw call here to avoid re-uploading the
                                 // instances and re-binding textures etc.
                                 self.device
                                     .draw_indexed_triangles_instanced_u16(6, batch.instances.len() as i32);
                             }
+                            BlendMode::SubpixelWithBgColor => {
+                                // Using the three pass "component alpha with font smoothing
+                                // background color" rendering technique:
+                                //
+                                // /webrender/doc/text-rendering.md
+                                //
+                                self.device.set_blend_mode_subpixel_with_bg_color_pass0();
+
+                                self.ps_text_run.bind(
+                                    &mut self.device,
+                                    transform_kind,
+                                    projection,
+                                    TextShaderMode::SubpixelWithBgColorPass0,
+                                    &mut self.renderer_errors,
+                                );
+
+                                self.draw_instanced_batch(
+                                    &batch.instances,
+                                    VertexArrayKind::Primitive,
+                                    &batch.key.textures
+                                );
+
+                                self.device.set_blend_mode_subpixel_with_bg_color_pass1();
+
+                                self.ps_text_run_subpx_bg_pass1.bind(
+                                    &mut self.device,
+                                    transform_kind,
+                                    projection,
+                                    TextShaderMode::SubpixelWithBgColorPass1,
+                                    &mut self.renderer_errors,
+                                );
+
+                                // When drawing the 2nd and 3rd passes, we know that the VAO, textures etc
+                                // are all set up from the previous draw_instanced_batch call,
+                                // so just issue a draw call here to avoid re-uploading the
+                                // instances and re-binding textures etc.
+                                self.device
+                                    .draw_indexed_triangles_instanced_u16(6, batch.instances.len() as i32);
+
+                                self.device.set_blend_mode_subpixel_with_bg_color_pass2();
+
+                                self.ps_text_run.bind(
+                                    &mut self.device,
+                                    transform_kind,
+                                    projection,
+                                    TextShaderMode::SubpixelWithBgColorPass2,
+                                    &mut self.renderer_errors,
+                                );
+
+                                self.device
+                                    .draw_indexed_triangles_instanced_u16(6, batch.instances.len() as i32);
+                            }
                             BlendMode::Alpha | BlendMode::PremultipliedDestOut | BlendMode::None => {
                                 unreachable!("bug: bad blend mode for text");
                             }
                         }
 
                         prev_blend_mode = BlendMode::None;
                         self.device.set_blend(false);
                     }
@@ -2918,17 +2986,17 @@ impl Renderer {
                                 BlendMode::PremultipliedAlpha => {
                                     self.device.set_blend(true);
                                     self.device.set_blend_mode_premultiplied_alpha();
                                 }
                                 BlendMode::PremultipliedDestOut => {
                                     self.device.set_blend(true);
                                     self.device.set_blend_mode_premultiplied_dest_out();
                                 }
-                                BlendMode::Subpixel => {
+                                BlendMode::Subpixel | BlendMode::SubpixelWithBgColor => {
                                     unreachable!("bug: subpx text handled earlier");
                                 }
                             }
                             prev_blend_mode = batch.key.blend_mode;
                         }
 
                         self.submit_batch(
                             &batch.key,
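
Note on the renderer.rs hunks above: the new `BlendMode::SubpixelWithBgColor` arm issues three draws per text batch. The following is a minimal sketch of that pass ordering only, not the renderer's actual code. The `Shader`, `Draw`, and `Pass` types and the `subpixel_with_bg_color_passes` function are hypothetical names introduced for illustration; the real code binds `ps_text_run` or `ps_text_run_subpx_bg_pass1` and sets the blend state through `set_blend_mode_subpixel_with_bg_color_pass0/1/2`.

```rust
// Hypothetical model of the three-pass sequence; these types do not exist in webrender.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Shader {
    TextRun,             // stands in for `ps_text_run`
    TextRunSubpxBgPass1, // stands in for `ps_text_run_subpx_bg_pass1`
}

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Draw {
    UploadInstancesAndDraw, // stands in for `draw_instanced_batch`
    DrawOnly,               // stands in for `draw_indexed_triangles_instanced_u16`
}

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pass {
    shader: Shader,
    draw: Draw,
}

// Passes in the order the diff issues them: only the first pass uploads
// instances and binds textures; the later two reuse that state and just draw.
#[allow(dead_code)]
fn subpixel_with_bg_color_passes() -> [Pass; 3] {
    [
        Pass { shader: Shader::TextRun, draw: Draw::UploadInstancesAndDraw },
        Pass { shader: Shader::TextRunSubpxBgPass1, draw: Draw::DrawOnly },
        Pass { shader: Shader::TextRun, draw: Draw::DrawOnly },
    ]
}
```

Only the middle pass uses the shader variant compiled with the `SUBPX_BG_PASS1` feature; passes 0 and 2 share the default `ps_text_run` program and differ in the `TextShaderMode` passed to `bind`.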
--- a/gfx/webrender/src/resource_cache.rs
+++ b/gfx/webrender/src/resource_cache.rs
@@ -346,23 +346,25 @@ impl ResourceCache {
         options: Option<FontInstanceOptions>,
         platform_options: Option<FontInstancePlatformOptions>,
         variations: Vec<FontVariation>,
     ) {
         let FontInstanceOptions {
             render_mode,
             subpx_dir,
             synthetic_italics,
+            bg_color,
             ..
         } = options.unwrap_or_default();
         assert!(render_mode != FontRenderMode::Bitmap);
         let mut instance = FontInstance::new(
             font_key,
             glyph_size,
             ColorF::new(0.0, 0.0, 0.0, 1.0),
+            bg_color,
             render_mode,
             subpx_dir,
             platform_options,
             variations,
             synthetic_italics,
         );
         if self.glyph_rasterizer.is_bitmap_font(&instance) {
             instance.render_mode = instance.render_mode.limit_by(FontRenderMode::Bitmap);
--- a/gfx/webrender/src/tiling.rs
+++ b/gfx/webrender/src/tiling.rs
@@ -15,17 +15,17 @@ use glyph_rasterizer::GlyphFormat;
 use gpu_cache::{GpuCache, GpuCacheAddress, GpuCacheHandle, GpuCacheUpdateList};
 use gpu_types::{BlurDirection, BlurInstance, BrushInstance, BrushImageKind, ClipMaskInstance};
 use gpu_types::{CompositePrimitiveInstance, PrimitiveInstance, SimplePrimitiveInstance};
 use gpu_types::{BRUSH_FLAG_USES_PICTURE};
 use internal_types::{FastHashMap, SourceTexture};
 use internal_types::BatchTextures;
 use picture::PictureKind;
 use prim_store::{PrimitiveIndex, PrimitiveKind, PrimitiveMetadata, PrimitiveStore};
-use prim_store::{BrushMaskKind, BrushKind, DeferredResolve, RectangleContent, TextRunMode};
+use prim_store::{BrushMaskKind, BrushKind, DeferredResolve, RectangleContent};
 use profiler::FrameProfileCounters;
 use render_task::{AlphaRenderItem, ClipWorkItem, MaskGeometryKind, MaskSegment};
 use render_task::{RenderTaskAddress, RenderTaskId, RenderTaskKey, RenderTaskKind};
 use render_task::{BlurTask, ClearMode, RenderTaskLocation, RenderTaskTree};
 use renderer::BlendMode;
 use renderer::ImageBufferKind;
 use resource_cache::{GlyphFetchResult, ResourceCache};
 use std::{cmp, usize, f32, i32};
@@ -52,19 +52,23 @@ impl AlphaBatchHelpers for PrimitiveStor
         metadata: &PrimitiveMetadata,
         transform_kind: TransformedRectKind,
     ) -> BlendMode {
         let needs_blending = !metadata.opacity.is_opaque || metadata.clip_task_id.is_some() ||
             transform_kind == TransformedRectKind::Complex;
 
         match metadata.prim_kind {
             PrimitiveKind::TextRun => {
-                let text_run_cpu = &self.cpu_text_runs[metadata.cpu_prim_index.0];
-                match text_run_cpu.font.render_mode {
-                    FontRenderMode::Subpixel => BlendMode::Subpixel,
+                let font = &self.cpu_text_runs[metadata.cpu_prim_index.0].font;
+                match font.render_mode {
+                    FontRenderMode::Subpixel => if font.bg_color.a != 0 {
+                        BlendMode::SubpixelWithBgColor
+                    } else {
+                        BlendMode::Subpixel
+                    },
                     FontRenderMode::Alpha |
                     FontRenderMode::Mono |
                     FontRenderMode::Bitmap => BlendMode::PremultipliedAlpha,
                 }
             },
             PrimitiveKind::Rectangle => {
                 let rectangle_cpu = &self.cpu_rectangles[metadata.cpu_prim_index.0];
                 match rectangle_cpu.content {
@@ -272,17 +276,18 @@ impl BatchList {
     fn get_suitable_batch(
         &mut self,
         key: BatchKey,
         item_bounding_rect: &DeviceIntRect,
     ) -> &mut Vec<PrimitiveInstance> {
         match key.blend_mode {
             BlendMode::None => self.opaque_batch_list.get_suitable_batch(key),
             BlendMode::Alpha | BlendMode::PremultipliedAlpha |
-            BlendMode::PremultipliedDestOut | BlendMode::Subpixel => {
+            BlendMode::PremultipliedDestOut | BlendMode::Subpixel |
+            BlendMode::SubpixelWithBgColor => {
                 self.alpha_batch_list
                     .get_suitable_batch(key, item_bounding_rect)
             }
         }
     }
 
     fn finalize(&mut self) {
         self.opaque_batch_list.finalize()
@@ -556,17 +561,17 @@ impl AlphaRenderItem {
                         );
                         let batch = batch_list.get_suitable_batch(key, item_bounding_rect);
                         batch.push(base_instance.build(uv_address.as_int(gpu_cache), 0, 0));
                     }
                     PrimitiveKind::TextRun => {
                         let text_cpu =
                             &ctx.prim_store.cpu_text_runs[prim_metadata.cpu_prim_index.0];
 
-                        let font = text_cpu.get_font(TextRunMode::Normal, ctx.device_pixel_ratio);
+                        let font = text_cpu.get_font(ctx.device_pixel_ratio);
 
                         ctx.resource_cache.fetch_glyphs(
                             font,
                             &text_cpu.glyph_keys,
                             glyph_fetch_buffer,
                             gpu_cache,
                             |texture_id, glyph_format, glyphs| {
                                 debug_assert_ne!(texture_id, SourceTexture::Invalid);
@@ -1231,17 +1236,17 @@ impl RenderTarget for ColorRenderTarget 
                                     PrimitiveKind::TextRun => {
                                         // Add instances that reference the text run GPU location. Also supply
                                         // the parent shadow prim address as a user data field, allowing
                                         // the shader to fetch the shadow parameters.
                                         let text = &ctx.prim_store.cpu_text_runs
                                             [sub_metadata.cpu_prim_index.0];
                                         let text_run_cache_prims = &mut self.text_run_cache_prims;
 
-                                        let font = text.get_font(TextRunMode::Shadow, ctx.device_pixel_ratio);
+                                        let font = text.get_font(ctx.device_pixel_ratio);
 
                                         ctx.resource_cache.fetch_glyphs(
                                             font,
                                             &text.glyph_keys,
                                             &mut self.glyph_fetch_buffer,
                                             gpu_cache,
                                             |texture_id, _glyph_format, glyphs| {
                                                 let batch = text_run_cache_prims
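
Note on the tiling.rs hunks above: the blend mode for a text run now depends on both the font's render mode and the alpha of its `bg_color`. The sketch below restates that selection rule in isolation. The enums and `ColorU` are simplified stand-ins for the webrender types (the real `BlendMode` has more variants), and `text_blend_mode` is a hypothetical free function, not part of the patch.

```rust
// Simplified stand-ins for the webrender types named in the diff.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum FontRenderMode {
    Mono,
    Alpha,
    Subpixel,
    Bitmap,
}

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum BlendMode {
    PremultipliedAlpha,
    Subpixel,
    SubpixelWithBgColor,
}

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct ColorU {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

// Hypothetical helper mirroring the TextRun arm of the match in the diff:
// subpixel text with a non-zero background alpha takes the three-pass path,
// all other render modes use premultiplied alpha blending.
#[allow(dead_code)]
fn text_blend_mode(render_mode: FontRenderMode, bg_color: ColorU) -> BlendMode {
    match render_mode {
        FontRenderMode::Subpixel => {
            if bg_color.a != 0 {
                BlendMode::SubpixelWithBgColor
            } else {
                BlendMode::Subpixel
            }
        }
        FontRenderMode::Alpha | FontRenderMode::Mono | FontRenderMode::Bitmap => {
            BlendMode::PremultipliedAlpha
        }
    }
}
```

Because `FontInstanceOptions::default()` sets `bg_color` to `ColorU::new(0, 0, 0, 0)`, callers that never set a background color keep the existing `BlendMode::Subpixel` behavior.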
--- a/gfx/webrender_api/src/api.rs
+++ b/gfx/webrender_api/src/api.rs
@@ -663,17 +663,22 @@ impl RenderApi {
         clamp: ScrollClamping,
     ) {
         self.send(
             document_id,
             DocumentMsg::ScrollNodeWithId(origin, id, clamp),
         );
     }
 
-    /// Does a hit test as the given point
+    /// Does a hit test on display items in the specified document, at the given
+    /// point. If a pipeline_id is specified, it is used to further restrict the
+    /// hit results so that only items inside that pipeline are matched. If the
+    /// HitTestFlags argument contains the FIND_ALL flag, then the vector of hit
+    /// results will contain all display items that match, ordered from front
+    /// to back.
     pub fn hit_test(&self,
                     document_id: DocumentId,
                     pipeline_id: Option<PipelineId>,
                     point: WorldPoint,
                     flags: HitTestFlags)
                     -> HitTestResult {
         let (tx, rx) = channel::msg_channel().unwrap();
         self.send(document_id, DocumentMsg::HitTest(pipeline_id, point, flags, tx));
--- a/gfx/webrender_api/src/display_item.rs
+++ b/gfx/webrender_api/src/display_item.rs
@@ -684,25 +684,29 @@ impl ComplexClipRegion {
 #[derive(Clone, Copy, Debug, Deserialize, Eq, Hash, PartialEq, Serialize)]
 pub enum ClipId {
     Clip(u64, PipelineId),
     ClipExternalId(u64, PipelineId),
     DynamicallyAddedNode(u64, PipelineId),
 }
 
 impl ClipId {
+    pub fn root_scroll_node(pipeline_id: PipelineId) -> ClipId {
+        ClipId::Clip(0, pipeline_id)
+    }
+
     pub fn root_reference_frame(pipeline_id: PipelineId) -> ClipId {
         ClipId::DynamicallyAddedNode(0, pipeline_id)
     }
 
     pub fn new(id: u64, pipeline_id: PipelineId) -> ClipId {
-        // We do this because it is very easy to accidentally create something that
-        // seems like the root node, but isn't one.
+        // We do this because it is very easy to accidentally create something that
+        // seems like a root scroll node, but isn't one.
         if id == 0 {
-            return ClipId::root_reference_frame(pipeline_id);
+            return ClipId::root_scroll_node(pipeline_id);
         }
 
         ClipId::ClipExternalId(id, pipeline_id)
     }
 
     pub fn pipeline_id(&self) -> PipelineId {
         match *self {
             ClipId::Clip(_, pipeline_id) |
@@ -713,15 +717,15 @@ impl ClipId {
 
     pub fn external_id(&self) -> Option<u64> {
         match *self {
             ClipId::ClipExternalId(id, _) => Some(id),
             _ => None,
         }
     }
 
-    pub fn is_root(&self) -> bool {
+    pub fn is_root_scroll_node(&self) -> bool {
         match *self {
-            ClipId::DynamicallyAddedNode(0, _) => true,
+            ClipId::Clip(0, _) => true,
             _ => false,
         }
     }
 }
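
Note on the display_item.rs hunks above: after this change, id 0 maps to the root scroll node (`ClipId::Clip(0, _)`) rather than the root reference frame. The sketch below copies the post-patch `ClipId` logic into a standalone form so the behavior can be checked in isolation. `PipelineId` is a simplified stand-in for the real webrender_api type, and the serde derives are omitted.

```rust
// Stand-in for webrender_api's PipelineId, reduced to what the sketch needs.
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
pub struct PipelineId(pub u32, pub u32);

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
pub enum ClipId {
    Clip(u64, PipelineId),
    ClipExternalId(u64, PipelineId),
    DynamicallyAddedNode(u64, PipelineId),
}

#[allow(dead_code)]
impl ClipId {
    pub fn root_scroll_node(pipeline_id: PipelineId) -> ClipId {
        ClipId::Clip(0, pipeline_id)
    }

    pub fn root_reference_frame(pipeline_id: PipelineId) -> ClipId {
        ClipId::DynamicallyAddedNode(0, pipeline_id)
    }

    pub fn new(id: u64, pipeline_id: PipelineId) -> ClipId {
        // Id 0 is reserved: it always resolves to the root scroll node,
        // never to an externally supplied clip id.
        if id == 0 {
            return ClipId::root_scroll_node(pipeline_id);
        }
        ClipId::ClipExternalId(id, pipeline_id)
    }

    pub fn is_root_scroll_node(&self) -> bool {
        match *self {
            ClipId::Clip(0, _) => true,
            _ => false,
        }
    }
}
```

With these definitions, `ClipId::new(0, p)` and `ClipId::root_scroll_node(p)` compare equal, while `ClipId::root_reference_frame(p)` is a distinct value for which `is_root_scroll_node()` returns false.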
--- a/gfx/webrender_api/src/display_list.rs
+++ b/gfx/webrender_api/src/display_list.rs
@@ -19,18 +19,18 @@ use serde::{Deserialize, Serialize, Seri
 use serde::ser::{SerializeMap, SerializeSeq};
 use std::io::{Read, Write};
 use std::{io, ptr};
 use std::marker::PhantomData;
 use std::slice;
 use time::precise_time_ns;
 
 // We don't want to push a long text-run. If a text-run is too long, split it into several parts.
-// Please check the renderer::MAX_VERTEX_TEXTURE_WIDTH for the detail.
-pub const MAX_TEXT_RUN_LENGTH: usize = 2040;
+// This needs to be set to (renderer::MAX_VERTEX_TEXTURE_WIDTH - VECS_PER_PRIM_HEADER - VECS_PER_TEXT_RUN) * 2
+pub const MAX_TEXT_RUN_LENGTH: usize = 2038;
 
 #[repr(C)]
 #[derive(Clone, Copy, Debug, Deserialize, Eq, Hash, PartialEq, Serialize)]
 pub struct ItemRange<T> {
     start: usize,
     length: usize,
     _boo: PhantomData<T>,
 }
@@ -652,17 +652,17 @@ impl DisplayListBuilder {
 
         // We start at 1 here, because the root scroll id is always 0.
         const FIRST_CLIP_ID: u64 = 1;
 
         DisplayListBuilder {
             data: Vec::with_capacity(capacity),
             pipeline_id,
             clip_stack: vec![
-                ClipAndScrollInfo::simple(ClipId::root_reference_frame(pipeline_id)),
+                ClipAndScrollInfo::simple(ClipId::root_scroll_node(pipeline_id)),
             ],
             next_clip_id: FIRST_CLIP_ID,
             builder_start_time: start_time,
             content_size,
             save_state: None,
         }
     }
 
--- a/gfx/webrender_api/src/font.rs
+++ b/gfx/webrender_api/src/font.rs
@@ -46,21 +46,22 @@ impl<'de> Deserialize<'de> for NativeFon
             Ok(font) => Ok(NativeFontHandle(font)),
             _ => Err(de::Error::custom(
                 "Couldn't find a font with that PostScript name!",
             )),
         }
     }
 }
 
-/// Native fonts are not used on Linux; all fonts are raw.
 #[cfg(not(any(target_os = "macos", target_os = "windows")))]
-#[cfg_attr(not(any(target_os = "macos", target_os = "windows")),
-           derive(Clone, Serialize, Deserialize))]
-pub struct NativeFontHandle;
+#[derive(Clone, Serialize, Deserialize)]
+pub struct NativeFontHandle {
+    pub pathname: String,
+    pub index: u32,
+}
 
 #[cfg(target_os = "windows")]
 pub type NativeFontHandle = FontDescriptor;
 
 #[repr(C)]
 #[derive(Copy, Clone, Deserialize, Serialize, Debug)]
 pub struct GlyphDimensions {
     pub left: i32,
@@ -204,24 +205,29 @@ pub struct GlyphOptions {
 }
 
 #[repr(C)]
 #[derive(Clone, Copy, Debug, Deserialize, Hash, Eq, PartialEq, PartialOrd, Ord, Serialize)]
 pub struct FontInstanceOptions {
     pub render_mode: FontRenderMode,
     pub subpx_dir: SubpixelDirection,
     pub synthetic_italics: bool,
+    /// When bg_color.a != 0 and render_mode is FontRenderMode::Subpixel,
+    /// the text will be rendered with bg_color.r/g/b as an opaque estimated
+    /// background color.
+    pub bg_color: ColorU,
 }
 
 impl Default for FontInstanceOptions {
     fn default() -> FontInstanceOptions {
         FontInstanceOptions {
             render_mode: FontRenderMode::Subpixel,
             subpx_dir: SubpixelDirection::Horizontal,
             synthetic_italics: false,
+            bg_color: ColorU::new(0, 0, 0, 0),
         }
     }
 }
 
 #[cfg(target_os = "windows")]
 #[repr(C)]
 #[derive(Clone, Copy, Debug, Deserialize, Hash, Eq, PartialEq, PartialOrd, Ord, Serialize)]
 pub struct FontInstancePlatformOptions {
@@ -308,38 +314,41 @@ pub struct FontInstance {
     pub font_key: FontKey,
     // The font size is in *device* pixels, not logical pixels.
     // It is stored as an Au since we need sub-pixel sizes, but
     // can't store as a f32 due to use of this type as a hash key.
     // TODO(gw): Perhaps consider having LogicalAu and DeviceAu
     //           or something similar to that.
     pub size: Au,
     pub color: ColorU,
+    pub bg_color: ColorU,
     pub render_mode: FontRenderMode,
     pub subpx_dir: SubpixelDirection,
     pub platform_options: Option<FontInstancePlatformOptions>,
     pub variations: Vec<FontVariation>,
     pub synthetic_italics: bool,
 }
 
 impl FontInstance {
     pub fn new(
         font_key: FontKey,
         size: Au,
         color: ColorF,
+        bg_color: ColorU,
         render_mode: FontRenderMode,
         subpx_dir: SubpixelDirection,
         platform_options: Option<FontInstancePlatformOptions>,
         variations: Vec<FontVariation>,
         synthetic_italics: bool,
     ) -> FontInstance {
         FontInstance {
             font_key,
             size,
             color: color.into(),
+            bg_color,
             render_mode,
             subpx_dir,
             platform_options,
             variations,
             synthetic_italics,
         }
     }