Bug 1438687 - Add Developer documentation for LocaleService. draft
author Zibi Braniecki <zbraniecki@mozilla.com>
Sun, 25 Mar 2018 15:01:57 +0200
changeset 772246 a97eed4b7a966ffe65f9076200ab7bcc72df1195
parent 772245 551c159e6284c0fa360cd2fe06abcc2de0245b53
child 772247 0cfa620e34f2c29391ca1e5b06ae4c41c9530ab8
push id 103879
push user bmo:gandalf@aviary.pl
push date Sun, 25 Mar 2018 13:04:10 +0000
bugs 1438687
milestone 61.0a1
Bug 1438687 - Add Developer documentation for LocaleService. MozReview-Commit-ID: JBfR1B6FwmJ
intl/docs/index.rst
intl/docs/locale.rst
intl/moz.build
new file mode 100644
--- /dev/null
+++ b/intl/docs/index.rst
@@ -0,0 +1,25 @@
+====================
+Internationalization
+====================
+
+Internationalization ("i18n") is a domain of computer science focused on making
+software accessible across languages, regions, scripts and cultures.
+
+At the most abstract level, Gecko i18n is a combination of algorithms, data structures
+and APIs that aim to enable Gecko to work with all human scripts and languages,
+both as a UI toolkit and as a web engine.
+
+To achieve that, i18n has to hook into many components such as layout, gfx, dom,
+widget, build, front-end, JS engine and accessibility.
+It also has to be available across the programming languages and frameworks used in the
+platform and front-end.
+
+Below is a list of articles that introduce the concepts necessary to understand and
+use Mozilla's I18n APIs.
+
+.. toctree::
+   :maxdepth: 1
+
+   locale
+   dataintl
+   localization
new file mode 100644
--- /dev/null
+++ b/intl/docs/locale.rst
@@ -0,0 +1,582 @@
+.. role:: js(code)
+   :language: javascript
+
+=================
+Locale management
+=================
+
+Internationalization requires a way to recognize which combination of languages,
+scripts and regional preferences the user wants their data formatted in.
+
+A combination of those preferences is called a Locale.
+
+There are multiple models of locale data structures in the industry, with varying degrees
+of compatibility with each other. Historically, each major platform used its own,
+and many standards bodies provided conflicting proposals.
+
+Mozilla follows the Unicode and W3C recommendations and conforms to a standard known
+as BCP47, which describes a low-level textual representation of the `Locale` as a
+language tag.
+
+A few examples of language tags: *en-US*, *de*, *ar*, *zh-Hans*, *es-CL*.
+
+Locales and Language Tags
+=========================
+
+The examples above present language tags with several fields omitted, which is allowed
+by the standard.
+In order to understand the full model it is necessary to look at the more complete data
+structure of the Locale.
+
+The locale data structure consists of four primary fields.
+
+ - Language (Example: English - *en*, French - *fr*, Serbian - *sr*)
+ - Script (Example: Latin - *Latn*, Cyrillic - *Cyrl*)
+ - Region (Example: American - *US*, Canadian - *CA*, Russian - *RU*)
+ - Variants (Example: Mac OS - *macos*, Windows - *windows*, Linux - *linux*)
+
+On top of that a locale may contain:
+ - extensions and private fields
+     Mozilla currently has partial support for them in JS implementation and plans to
+     extend the support to all APIs.
+ - extkeys and grandfathered tags
+       Mozilla does not support them.
+
+
+As a result, an example locale can be visualized as:
+
+.. code-block:: javascript
+
+  {
+      "language": "sr",
+      "script": "Cyrl",
+      "region": "RU",
+      "variants": []
+  }
+
+which can then be serialized into a string: **"sr-Cyrl-RU"**.
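+
+The relationship between the data structure and its string form can be sketched in a
+few lines of plain JavaScript. The helpers `parseLanguageTag` and `serializeLocale`
+below are hypothetical and are not Gecko APIs; they handle only the four primary
+fields and stop at the first extension or private-use singleton.
+
```javascript
// Hypothetical helpers for illustration only; not part of LocaleService.
function parseLanguageTag(tag) {
  const locale = { language: undefined, script: undefined, region: undefined, variants: [] };
  const subtags = tag.split("-");
  locale.language = subtags.shift().toLowerCase();

  for (const subtag of subtags) {
    if (subtag.length === 1) {
      // A singleton ("u", "x", ...) starts an extension or private-use
      // sequence, which this sketch ignores.
      break;
    }
    const nothingAfterLanguage =
      !locale.script && !locale.region && locale.variants.length === 0;
    if (nothingAfterLanguage && /^[A-Za-z]{4}$/.test(subtag)) {
      // Script: four letters, title-cased.
      locale.script = subtag[0].toUpperCase() + subtag.slice(1).toLowerCase();
    } else if (!locale.region && locale.variants.length === 0 &&
               /^([A-Za-z]{2}|[0-9]{3})$/.test(subtag)) {
      // Region: two letters or three digits, upper-cased.
      locale.region = subtag.toUpperCase();
    } else {
      // Everything else is treated as a variant.
      locale.variants.push(subtag.toLowerCase());
    }
  }
  return locale;
}

function serializeLocale({ language, script, region, variants }) {
  // Omitted fields are simply skipped, as the standard allows.
  return [language, script, region, ...variants].filter(Boolean).join("-");
}
```
+
+With these helpers, parsing `"sr-cyrl-ru"` and serializing the result yields the
+canonically cased `"sr-Cyrl-RU"`, while a tag like `"zh-Hans"` round-trips with its
+region field left unset.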
+
+Locale Fallback Chains
+======================
+
+Locale-sensitive operations are always considered "best-effort". That means that it
+cannot be assumed that a perfect match will exist between what the user requested and what
+the API can provide.
+
+As a result, the practice at Mozilla is to *always* operate on locale fallback chains -
+ordered lists of locales according to the user preference.
+
+An example of a locale fallback chain may be: :js:`["es-CL", "es-ES", "es", "fr", "en"]`.
+
+The above means a request to format the data according to Chilean Spanish if possible,
+falling back on Spanish Spanish, then any (generic) Spanish, then French and eventually
+English.
+
+It is *always* better to use a locale fallback chain over a single locale.
+In case there's only one locale available, a list with one element will handle that,
+and allow for future extensions without a costly refactor.
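+
+A consumer of a fallback chain typically walks the list in order and uses the first
+locale that can satisfy the request. The sketch below illustrates that pattern; the
+`resources` table and the `lookupMessage` helper are hypothetical and only stand in
+for whatever data source a real component would use.
+
```javascript
// Hypothetical resource table keyed by locale; not a Gecko data structure.
const resources = {
  "es": { greeting: "Hola" },
  "en": { greeting: "Hello", farewell: "Goodbye" },
};

// Walk the chain in order of preference and return the first hit,
// together with the locale that provided it.
function lookupMessage(fallbackChain, key) {
  for (const locale of fallbackChain) {
    const bundle = resources[locale];
    if (bundle && Object.prototype.hasOwnProperty.call(bundle, key)) {
      return { locale, value: bundle[key] };
    }
  }
  return null; // nothing in the chain could provide the message
}

const chain = ["es-CL", "es-ES", "es", "fr", "en"];
const greeting = lookupMessage(chain, "greeting"); // served by "es"
const farewell = lookupMessage(chain, "farewell"); // served by "en"
```
+
+Because the function always takes a list, a caller with only one locale passes a
+one-element array and nothing else needs to change.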
+
+Language Negotiation
+====================
+
+Due to the imperfections in data matching, all operations on locales should always
+use a language negotiation algorithm to resolve the best available set of locales
+based on the list of all available locales and an ordered list of requested locales.
+
+Such algorithms may vary in sophistication and number of used strategies. Mozilla's
+solution is based on modified logic from RFC4647.
+
+The three lists of locales used in negotiation:
+
+ - **Available** - locales that can be selected
+ - **Requested** - locales that the user selected in decreasing order of preference
+ - **Resolved** - result of the negotiation
+
+On top of that there's a concept of *DefaultLocale*, which is a single locale out
+of the list of available ones that should be used as a fallback in case there is no match
+to be found between available and requested locales.
+
+The result of a negotiation is an ordered list of locales that are available to the system,
+and the consumer is expected to attempt using the locales in the resolved order.
+
+Negotiation should be used in all scenarios, such as selecting language resources,
+calendars, number formatting, etc.
+
+Filtering / Matching / Lookup
+-----------------------------
+
+Mozilla's `LocaleService` API offers three language negotiation strategies.
+
+Filtering
+^^^^^^^^^
+
+The first one is called **filtering** and handles the scenario in which the caller wants
+to get all available locales that match the requested locales.
+
+This is the most common scenario, where there is a benefit in creating the largest
+possible list of locales that the user may benefit from.
+
+An example of a scenario:
+
+.. code-block:: javascript
+
+    let requested = ["fr-CA", "en-US"];
+    let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];
+
+    let result = Services.locale.negotiateLanguages(requested, available);
+
+    result; // ["fr-CA", "fr", "fr-CH", "en-GB", "en-ZA"]
+
+In the example above the algorithm was able to match *"fr-CA"* as a perfect match,
+but then was able to find other matches as well - a generic French is a very
+good match, and Swiss French is also very close to the top requested language.
+
+In case of the second of the requested locales, unfortunately American English
+is not available, but British English and South African English are there.
+
+The algorithm is greedy and attempts to match as many locales
+as possible. This is usually what the developer wants.
+
+Matching
+^^^^^^^^
+
+In less common scenarios the code needs to match just one, best, available locale for
+each of the requested locales.
+
+An example of this scenario:
+
+.. code-block:: javascript
+
+    let requested = ["fr-CA", "en-US"];
+    let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];
+
+    // Assumes the optional default-locale and strategy arguments of negotiateLanguages.
+    let result = Services.locale.negotiateLanguages(
+      requested, available, undefined, Services.locale.langNegStrategyMatching);
+
+    result; // ["fr-CA", "en-GB"]
+
+The best available locale for *"fr-CA"* is a perfect match, and for *"en-US"* the
+algorithm selected British English.
+
+Lookup
+^^^^^^
+
+The third strategy should be used in cases where, no matter what, only one locale
+can ever be used. Some third-party APIs don't support fallback and it doesn't make
+sense to continue resolving after finding the first locale.
+
+It is still advised to continue using this API as a fallback chain list, just in
+this case with a single element.
+
+.. code-block:: javascript
+
+    let requested = ["fr-CA", "en-US"];
+    let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];
+
+    // Assumes the optional default-locale and strategy arguments of negotiateLanguages.
+    let result = Services.locale.negotiateLanguages(
+      requested, available, Services.locale.defaultLocale,
+      Services.locale.langNegStrategyLookup);
+
+    result; // ["fr-CA"]
+
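+To make the difference between the three strategies concrete, here is a deliberately
+simplified sketch in plain JavaScript. It is not Gecko's implementation: the function
+name `negotiate`, the string strategy names and the matching rule (exact match first,
+then any available locale sharing the language subtag) are assumptions made for
+illustration, and the real algorithm uses more matching steps.
+
```javascript
// Simplified sketch; not the algorithm used by LocaleService.
function negotiate(requested, available, strategy = "filtering", defaultLocale) {
  const language = (tag) => tag.split("-")[0].toLowerCase();
  const resolved = [];

  for (const req of requested) {
    // Candidates: the exact match first, then locales with the same language,
    // in the order in which they appear in `available`.
    const exact = available.filter(
      (loc) => loc.toLowerCase() === req.toLowerCase());
    const related = available.filter(
      (loc) => loc.toLowerCase() !== req.toLowerCase() &&
               language(loc) === language(req));

    for (const candidate of [...exact, ...related]) {
      if (resolved.includes(candidate)) {
        continue;
      }
      resolved.push(candidate);
      if (strategy === "lookup") {
        return resolved; // only one locale can ever be used
      }
      if (strategy === "matching") {
        break; // one best locale per requested locale
      }
      // "filtering" keeps collecting every candidate
    }
  }

  // Fall back on the default locale when it is provided and not yet listed.
  if (defaultLocale !== undefined && !resolved.includes(defaultLocale)) {
    resolved.push(defaultLocale);
  }
  return resolved;
}

const requestedLocales = ["fr-CA", "en-US"];
const availableLocales = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

const filtered = negotiate(requestedLocales, availableLocales, "filtering");
const matched = negotiate(requestedLocales, availableLocales, "matching");
const lookedUp = negotiate(requestedLocales, availableLocales, "lookup");
```
+
+For these inputs the sketch returns the French entries followed by the English ones
+for filtering, one locale per requested locale for matching, and a single-element
+list for lookup.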
+
+Chained Language Negotiation
+============================
+
+In some cases the user may want to link a language selection to another component.
+
+For example, a Firefox extension may come with its own list of available locales, which
+may include locales that Firefox doesn't have.
+
+In that case, negotiation between user requested locales and the addon's list may result
+in a selection of locales superseding that of Firefox itself.
+
+
+.. code-block:: none
+
+         Fx Available
+        +-------------+
+        |  it, fr, ar |
+        +-------------+                 Fx Locales
+                      |                +--------+
+                      +--------------> | fr, ar |
+                      |                +--------+
+            Requested |
+     +----------------+
+     | es, fr, pl, ar |
+     +----------------+                 Addon Locales
+                      |                +------------+
+                      +--------------> | es, fr, ar |
+      Addon Available |                +------------+
+    +-----------------+
+    |  de, es, fr, ar |
+    +-----------------+
+
+
+In that case, an addon may end up being displayed in Spanish, while Firefox UI will
+use French. In most cases this is a bad UX.
+
+In order to avoid that, one can chain the addon negotiation: take Firefox's resolved
+locales as the `requested` list and negotiate that against the addon's `available` list.
+
+.. code-block:: none
+
+        Fx Available
+       +-------------+
+       |  it, ar, fr |
+       +-------------+                Fx Locales
+                     |                +--------+
+                     +--------------> | fr, ar |
+                     |                +--------+
+           Requested |                         |                Addon Locales
+    +----------------+                         |                +--------+
+    | es, fr, pl, ar |                         +------------->  | fr, ar |
+    +----------------+                         |                +--------+
+                                               |
+                               Addon Available |
+                             +-----------------+
+                             |  de, es, ar, fr |
+                             +-----------------+
+
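+The two diagrams can be reproduced with a small sketch. The `negotiateExact` helper
+below is hypothetical and reduces negotiation to exact matches only, which is enough
+to show the effect of chaining on the locale lists used in the diagrams.
+
```javascript
// Hypothetical helper: keeps requested locales that are available,
// preserving the requested order. Real negotiation is more lenient.
function negotiateExact(requested, available) {
  return requested.filter((locale) => available.includes(locale));
}

const requested = ["es", "fr", "pl", "ar"];
const fxAvailable = ["it", "fr", "ar"];
const addonAvailable = ["de", "es", "fr", "ar"];

// Firefox negotiates against its own available locales.
const fxLocales = negotiateExact(requested, fxAvailable); // ["fr", "ar"]

// Independent negotiation: the addon may end up in a different language.
const independentAddonLocales =
  negotiateExact(requested, addonAvailable); // ["es", "fr", "ar"]

// Chained negotiation: Firefox's resolved list becomes the addon's requested list.
const chainedAddonLocales =
  negotiateExact(fxLocales, addonAvailable); // ["fr", "ar"]
```
+
+With chaining, the addon can never resolve to a locale that Firefox itself did not
+select, so the two UIs stay in the same language.
+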
+Available Locales
+=================
+
+In Gecko, available locales are combined from a list of locale resources
+packaged into the bundle (see: `Packaged Locales`), and a list of installed special
+addons called `language packs`.
+
+The primary notion of which locales are available is based on which locales Gecko has
+localization resources for.
+
+In the future Mozilla may want to separate this into different sets of available locales,
+such as locales available for the Intl API, the installer, components like devtools,
+hyphenation, etc.
+
+Each addon may also have its own list of locales, but currently Gecko does not store a
+centralized list of those sets.
+
+Requested Locales
+=================
+
+Similarly to available locales, Gecko currently only stores a single list of requested locales.
+
+The locale list is stored in the pref `intl.locale.requested`, but within the app it
+should always be set and read via the `LocaleService::GetRequestedLocales` and
+`LocaleService::SetRequestedLocales` APIs, never directly via the pref itself.
+
+Using the API performs the necessary sanity checks and canonicalizes the values.
+
+The pref will usually store a comma-separated list of valid BCP47 locale
+codes, but it can also have two special meanings:
+
+ - If the pref is not set at all, Gecko will use the default locale as the requested one.
+ - If the pref is set to an empty string, Gecko will use the OS app locales as the requested ones.
+
+If the developer wants to programmatically request the app to follow OS locales,
+they can call the `SetRequestedLocales` API with no argument.
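+
+The three interpretations of the pref value described above can be summarized in a
+short sketch. The function `readRequestedLocales` and its parameters are hypothetical;
+the actual logic lives inside `LocaleService` and also canonicalizes the values,
+which this sketch does not attempt.
+
```javascript
// Hypothetical helper mirroring the three cases of `intl.locale.requested`.
function readRequestedLocales(prefValue, defaultLocale, osLocales) {
  if (prefValue === undefined || prefValue === null) {
    // Pref not set at all: the default locale is the requested one.
    return [defaultLocale];
  }
  if (prefValue.trim() === "") {
    // Pref set to an empty string: follow the OS locales.
    return [...osLocales];
  }
  // Otherwise: a comma-separated list of language tags.
  return prefValue
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);
}

const fromList = readRequestedLocales("es-CL, fr", "en-US", ["pl"]);
const fromDefault = readRequestedLocales(undefined, "en-US", ["pl"]);
const fromOS = readRequestedLocales("", "en-US", ["pl", "de"]);
```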
+
+Regional Preferences
+====================
+
+Every locale comes with a set of default preferences that are specific to a culture
+and region. This contains preferences such as calendar system, way to display
+time (24h vs 12h clock), which day the week starts on, which days constitute a weekend,
+what numbering system and date time formatting a given locale uses
+(for example "MM/DD" in en-US vs "DD/MM" in en-AU).
+
+For all such preferences Gecko has a list of default settings for every region,
+but there's also a degree of customization every user may want to make.
+
+All major operating systems have Settings UI for selecting those preferences,
+and since Firefox does not provide its own, Gecko looks into the OS for them.
+
+There are two ways Gecko can try to encode those preferences.
+
+One is to store them as a separate data structure and provide an API to retrieve them.
+Another is to use Unicode Extension Keys to add such information to the Locale
+object and, as a result, to the language tag.
+An example of a correctly encoded BCP47 language tag for American English with
+the hour cycle specifically set to be *"24"* is **"en-US-u-hc-h24"**.
+
+At the moment Mozilla uses the former method. A special API `mozilla::intl::OSPreferences`
+handles communication with the host operating system retrieving regional preferences
+and altering internationalization formatting with user preferences.
+
+One thing to notice is that the boundary between regional preferences and language
+selection is not strong. In many cases the internationalization formats
+will contain language-specific terms and literals. For example, a date formatted
+in Japanese may look like this - *"2018年3月24日"*, or the date format
+may contain names of months or weekdays, which will be translated
+("April", "Tuesday" etc.).
+
+For that reason it is tricky to follow regional preferences in a scenario where the
+operating system locale selection does not match the Firefox UI locales.
+
+Such behavior might lead to a UI case like "Today is październik" in an English Firefox
+with the Polish name of the month.
+
+For that reason, by default, Gecko will *only* look into OS Preferences if the *language*
+portions of the OS locale and the Firefox locale match.
+That means that if Windows is in *"en-AU"* and Firefox is in *"en-US"* Gecko will look
+into Windows Regional Preferences, but if Windows is in *"de-CH"* and Firefox
+is in *"fr-FR"* it won't.
+In order to force Gecko to look into OS preferences regardless of the language match,
+set the flag `intl.locale.use_os_preferences` to `true`.
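+
+The rule above boils down to comparing the language subtags of the two locales. A
+minimal sketch, assuming a hypothetical helper `shouldUseOSPreferences` whose third
+argument stands in for the override flag:
+
```javascript
// Hypothetical helper; the real decision is made inside Gecko.
function shouldUseOSPreferences(osLocale, appLocale, forceUseOSPreferences = false) {
  if (forceUseOSPreferences) {
    return true; // the override flag skips the language comparison
  }
  const language = (tag) => tag.split("-")[0].toLowerCase();
  return language(osLocale) === language(appLocale);
}

const sameLanguage = shouldUseOSPreferences("en-AU", "en-US"); // true
const differentLanguage = shouldUseOSPreferences("de-CH", "fr-FR"); // false
const forced = shouldUseOSPreferences("de-CH", "fr-FR", true); // true
```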
+
+Default and Last Fallback Locales
+=================================
+
+Every Gecko application is built with a single locale as the default one. Such locale
+is guaranteed to have all linguistic resources available, should be used
+as the default locale in case language negotiation cannot find any match, and also
+as the last locale to look for in a fallback chain.
+
+If all else fails, Gecko also supports the notion of a last fallback locale, which is
+currently hardcoded to *"en-US"* and is the very final locale to try in case
+nothing else (including the default locale) works.
+Notice that Unicode and ICU use *"en-GB"* in that role because more English-speaking
+people around the world recognize British regional preferences than American ones
+(metric vs. imperial, Fahrenheit vs. Celsius, etc.).
+Mozilla may switch to *"en-GB"* in the future.
+
+Packaged Locales
+================
+
+When a Gecko application is packaged, it bundles a selection of locale resources
+to be available within it. At the moment, for example, most Firefox for Android
+builds come with almost 100 locales packaged into them, while Desktop Firefox usually
+comes with just one packaged locale.
+
+There is currently work being done on enabling more flexibility in how
+the locales are packaged to allow for bundling applications with different
+sets of locales in different areas - dictionaries, hyphenation, product language resources,
+installer language resources, etc.
+
+UI Direction
+============
+
+Since the UI direction is so tightly coupled with the locale selection, the
+main method of testing the directionality of the Gecko app lives in LocaleService.
+
+`LocaleService::IsAppLocaleRTL` returns a boolean indicating whether the current
+direction of the app UI is right-to-left.
+
+Server / Client
+===============
+
+Locale management can operate in a client/server model. This allows a Gecko process
+to manage locales (server mode) or just receive the locale selection from a parent
+process (client mode).
+
+The client mode is currently used by all child processes of Desktop Firefox, and
+may be used by, for example, GeckoView to follow locale selection from a parent
+process.
+
+To check which mode the process operates in, the `LocaleService::IsServer` method is available.
+
+Mozilla Exceptions
+==================
+
+There's currently only a single exception to BCP47 used across the board, and that's
+the legacy "ja-JP-mac" locale. The "mac" is a variant and BCP47 requires all variants
+to be 5-8 characters long.
+
+Gecko works around the limitation by accepting 3-letter variants in its APIs and also
+provides a special `GetAppLocalesAsLangTags` method which returns this locale in that form.
+(`GetAppLocalesAsBCP47` will canonicalize it and turn it into `ja-JP-x-variant-mac`).
+
+In the future there's hope this exception will be removed.
+
+Environments
+============
+
+While all the concepts described above apply to all programming languages and frameworks
+used by Mozilla, there are differences in completeness of the implementation.
+
+Below is the current list of APIs supported in each environment and examples of how to
+use them:
+
+C++
+---
+
+In C++ the core API for Locale is `mozilla::intl::Locale` and the service for locale
+management is `mozilla::intl::LocaleService`.
+
+For any OSPreference operations there's `mozilla::intl::OSPreferences`.
+
+
+JavaScript
+----------
+
+In JavaScript users can use the `mozilla.org/intl/mozILocaleService` XPCOM API to access
+the LocaleService and `mozilla.org/intl/mozIOSPreferences` for OS preferences.
+
+The LocaleService API is exposed as the `Services.locale` object.
+
+There's currently no API available for operations on language tags and Locale objects,
+but an `Intl.Locale` API is in the works.
+
+Rust
+----
+
+For Rust Mozilla provides a crate `fluent-locale` which implements the concepts described
+above. This crate is not yet vendored into mozilla-central.
+
+Events
+======
+
+The `LocaleService` observes one pref for changes - `intl.locale.requested`, and one
+event - `intl:system-locales-changed` which can be emitted by OSPreferences.
+
+`LocaleService` also emits two events itself: `intl:app-locales-changed` and
+`intl:requested-locales-changed` which all code can listen to.
+
+Those events may be broadcast as a result of new language packs being installed or
+uninstalled, or of the user's selection of languages changing.
+
+In most cases, the code should probably observe the `intl:app-locales-changed`
+and react to only that event since this is the one indicating a change
+in the currently used language settings that the components should follow.
+
+Testing
+=======
+
+Many components may have logic encoded to react to changes in requested, available
+or resolved locales.
+
+In order to test a component's behavior, it is important to replicate
+the environment in which such a change may happen.
+
+Since in most cases it is advised for the components of the application to tie their
+language negotiation to the main application (see `Chained Language Negotiation`),
+it is not enough to add a new locale to requested to trigger the language change.
+
+First, it is necessary to add a new locale to the available ones, then change
+the requested ones, and only that will result in a new negotiation and language
+change happening.
+
+There are two primary ways to add a locale to the available ones.
+
+The first one is to use `L10nRegistry` to add a new `FileSource`. It may look like this:
+
+.. code-block:: javascript
+
+    let fs = new FileSource(["ko-KR", "ar"], "resource://mock-addon/localization/{locale}");
+
+    // here one can populate `fs.cache`
+    // to get particular resources available for the new locales.
+
+    L10nRegistry.registerSource(fs);
+
+    let availableLocales = Services.locale.getAvailableLocales();
+
+    assert(availableLocales.includes("ko-KR"));
+    assert(availableLocales.includes("ar"));
+
+    Services.locale.setRequestedLocales(["ko-KR"]);
+
+    let appLocales = Services.locale.getAppLocalesAsBCP47();
+    assert(appLocales[0] === "ko-KR");
+
+The second method is much more limited, as it only mocks the locale availability,
+but it is also simpler:
+
+.. code-block:: javascript
+
+    Services.locale.setAvailableLocales(["ko-KR", "ar"]);
+    Services.locale.setRequestedLocales(["ko-KR"]);
+
+    let appLocales = Services.locale.getAppLocalesAsBCP47();
+    assert(appLocales[0] === "ko-KR");
+
+In the future Mozilla plans to add a special, third way for addons, allowing their
+locales to be disconnected from the main application's ones for either manual or
+automated testing purposes.
+
+Testing the outcome
+-------------------
+
+Apart from testing for reaction to locale changes, it is advised to avoid writing
+tests that expect a certain locale to be selected, or certain internationalization
+or localization data to be used.
+
+Doing so locks down the test infrastructure to be only usable when launched in
+a single locale environment and requires those tests to be updated whenever the underlying
+data changes.
+
+When testing locale selection, it is best to use a fake locale like `x-test`, which
+is unlikely to be already present at the beginning of the test.
+When testing internationalization data, it is best to use `resolvedOptions()`
+to verify that the right data is being used, rather than comparing the output string.
+When testing localization, it is best to test against the correct `data-l10n-id`
+being set or, in edge cases, to verify that a given variable is present in the string using
+`String.prototype.includes`.
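+
+As an illustration of the `resolvedOptions()` advice, the sketch below uses the
+standard ECMAScript `Intl.DateTimeFormat` API. The locale, the options and the
+helper name `usesExpectedHourCycle` are chosen only for the example.
+
```javascript
// Standard ECMAScript Intl API; locale and options are illustrative.
const timeFormat = new Intl.DateTimeFormat("en-US", {
  hour: "numeric",
  minute: "numeric",
  hourCycle: "h23",
});

// Fragile: the exact string depends on the patterns shipped in the
// underlying locale data, which may change between releases.
const output = timeFormat.format(new Date(2018, 2, 24, 15, 1));

// More robust: check which settings the formatter actually resolved to.
function usesExpectedHourCycle(formatter, expected) {
  return formatter.resolvedOptions().hourCycle === expected;
}

const ok = usesExpectedHourCycle(timeFormat, "h23");
```
+
+A test written this way keeps passing when the formatting pattern changes, as long
+as the formatter still resolves to the expected hour cycle.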
+
+Startup
+=======
+
+There are cases where it may be important to understand how Gecko locale management
+acts during the startup.
+
+Below is the description of the `server` mode, since the `client` mode starts
+with no data and doesn't perform any operations, waiting for the parent to fill the
+basic locale lists (`requested` and `appLocales`) and then maintain them in a
+unidirectional way.
+
+In the `server` mode Gecko starts with no knowledge of available locales, nor of
+`requested`.
+
+Initially, all fields are resolved lazily, so no data for available, requested,
+default or resolved locales is retrieved.
+
+If any code queries any of the APIs, it triggers the initial data fetching
+and language negotiation.
+
+The initial request comes from the XPCLocale which is initializing
+the first JS context and needs to know what locale the JS context should use as
+the default.
+
+At that moment `LocaleService` fetches the list of available locales using
+packaged locales which are retrieved via `multilocale.txt` file in the toolkit's
+package.
+This gives LocaleService information about which locales are initially available.
+
+Notice that this happens before any of the language packs gets registered, so
+at that point Gecko only knows about packaged locales.
+
+For requested locales, the initial request comes before user profile preferences
+are read, so the data is fetched using packaged preferences.
+
+In most cases the `intl.locale.requested` pref will not be set, which means Gecko will
+use the default locale, which is retrieved from the `update.locale` file (also packaged).
+
+This means that the initial result of language negotiation is between packaged
+locales as available and the default requested locale.
+
+Next, the profile is read and, if the user set any requested locales,
+LocaleService updates its list of requested locales and broadcasts the
+`intl:requested-locales-changed` event.
+
+This may lead to language renegotiation if the requested locale is one of the packaged
+ones. In that case, `intl:app-locales-changed` will be broadcast.
+
+Finally, the AddonManager registers all the language packs; they get added to
+`L10nRegistry` and as a result update LocaleService's available locales.
+
+That triggers language negotiation and, if the language from the language pack
+is used in the requested list, the final list of locales is set.
+
+All of that happens before any UI is built, but there's no guarantee of this
+order being preserved, so it is important to understand that, depending on where the
+code is used during startup, it may receive a different list of locales.
+
+In order to maintain the correct locale settings it is important to set an observer
+on `intl:app-locales-changed` and update the code when the locale list changes.
+
+That ensures the code always uses the best possible locale selection during startup,
+but also during runtime in case the user changes their requested locale list, or
+language packs are updated/removed on the fly.
+
+Summary
+=======
+
+The model of locale management described above was designed in 2017 and,
+as a result, a lot of older Gecko code may not be well integrated into it.
+
+In case of questions, please consult Intl module peers.
--- a/intl/moz.build
+++ b/intl/moz.build
@@ -45,8 +45,10 @@ with Files("icu-patches/**"):
 with Files("tzdata/**"):
     BUG_COMPONENT = ("Core", "JavaScript: Internationalization API")
 
 with Files("update*"):
     BUG_COMPONENT = ("Core", "JavaScript: Internationalization API")
 
 with Files("icu_sources_data.py"):
     BUG_COMPONENT = ("Firefox Build System", "General")
+
+SPHINX_TREES['/intl'] = 'docs'