ansible: switch hgmaster to CentOS 7 (bug 1261212); r?fubar, kang
author Gregory Szorc <gps@mozilla.com>
Fri, 01 Apr 2016 17:39:02 -0700
changeset 7674 e407e3225ccfc7d3f76cc5b2cac6bad4a5fe9704
parent 7673 f32bc8e372a32a1eda15b779c9cafbbed07ad2e5
push id 730
push user gszorc@mozilla.com
push date Sat, 02 Apr 2016 00:53:17 +0000
reviewers fubar, kang
bugs 1261212
As part of the switch, the sshd config has changed significantly. Before, we used OpenSSH with the LPK (LDAP) patches. It listened on port 22 and processed root and hg logins.

In the new world, we have 2 sshd daemons. The system sshd listens on port 22 and behaves like a normal sshd. The hg sshd listens on port 222. It disallows root login (yay!). LDAP integration is handled by the AuthorizedKeysCommand calling the custom LDAP key lookup script we just implemented.

The new hg sshd configuration is using modern and recommended OpenSSH settings. We now generate and prefer ED25519 keys. DSA keys are gone. The RSA key length has increased from 2048 to 4096 bits.

CentOS 7 also brings systemd to the table. The new sshd for hg is managed via systemd. Kafka and ZooKeeper are still managed via supervisor. We will likely want to convert these to systemd someday. But it's easier to leave them as supervisor for now since they are still running on CentOS 6 for hgweb.

On the Docker side of things, the hg-ssh role now needs to start more system services via supervisor. I couldn't get systemd working in Docker, and the internet tells me it is pain, pain, and more pain.

For some reason, system LDAP integration didn't work in CentOS 7 until I started nscd in the Docker container. I'm not sure why. But having nscd running is a good thing because that's how production works.

Since we're no longer using openssh-lpk, a lot of the code from the openssh-lpk Ansible role has been copied into the hg-ssh role. This could probably be split into a new role. I imagine we'll do that when we need to convert reviewboard-hg to CentOS 7. For now, I'm fine with the duplication. If nothing else, it minimizes the risk of CentOS 6 and CentOS 7 interfering with each other in the Ansible configs.

MozReview-Commit-ID: HIDMv539Apm
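The AuthorizedKeysCommand mechanism described above can be sketched roughly as follows. This is a hypothetical minimal version for illustration only: the real lookup script lives at scripts/ldap-lookup-ssh-key in version-control-tools, and the attribute, filter, and base DN names used here (sshPublicKey, mail, dc=mozilla) are assumptions, not necessarily what the production script uses.

```python
#!/usr/bin/env python
# Hypothetical sketch of an AuthorizedKeysCommand helper. sshd invokes the
# configured command with the login name as an argument and treats its
# stdout as authorized_keys content; empty output means no keys, so the
# login is denied.
import sys


def format_authorized_keys(values):
    """Turn raw LDAP sshPublicKey values into authorized_keys lines.

    Drops empty values and surrounding whitespace so sshd never sees a
    malformed line.
    """
    lines = []
    for value in values:
        value = value.strip()
        if value:
            lines.append(value)
    return '\n'.join(lines)


def lookup_keys(username):
    # In a real script this would be an LDAP search, e.g. with python-ldap:
    #   conn.search_s('dc=mozilla', ldap.SCOPE_SUBTREE,
    #                 '(mail=%s)' % username, ['sshPublicKey'])
    # The attribute and filter names above are assumptions.
    raise NotImplementedError('wire up an LDAP query here')


if __name__ == '__main__' and len(sys.argv) > 1:
    print(format_authorized_keys(lookup_keys(sys.argv[1])))
```

Because sshd only sees stdout, an LDAP outage surfaces as "no keys found" rather than an explicit error, which is why the test-auth.t change below now expects "Permission denied (publickey)" instead of an LDAP connection error.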
ansible/deploy-hgmo.yml
ansible/group_vars/hgmo
ansible/hgmo-strip-repo.yml
ansible/hosts
ansible/roles/docker-hg-ssh/files/entrypoint.py
ansible/roles/docker-hg-ssh/files/supervisor-docker.conf
ansible/roles/docker-hg-ssh/meta/main.yml
ansible/roles/hg-ssh/defaults/main.yml
ansible/roles/hg-ssh/files/sshd_config
ansible/roles/hg-ssh/files/sshd_hg.service
ansible/roles/hg-ssh/handlers/main.yml
ansible/roles/hg-ssh/meta/main.yml
ansible/roles/hg-ssh/tasks/main.yml
ansible/roles/hg-ssh/templates/nslcd.conf.j2
ansible/roles/hg-ssh/templates/sshd_config_hg.j2
hgserver/tests/helpers.sh
hgserver/tests/test-auth.t
testing/clobber.hgmaster
testing/vcttesting/docker.py
testing/vcttesting/hgmo.py
testing/vcttesting/hgmo_mach_commands.py
--- a/ansible/deploy-hgmo.yml
+++ b/ansible/deploy-hgmo.yml
@@ -1,10 +1,10 @@
 ---
-- hosts: hgssh2.dmz.scl3.mozilla.com
+- hosts: hgssh3.dmz.scl3.mozilla.com
   gather_facts: no
   tasks:
     - name: verify deploying changeset is available on server
       command: hg -R {{ vct }} log -r {{ lookup('file', '../.vctnode') }} -T '{phase}'
       register: vct_node_phase
       delegate_to: 127.0.0.1
 
     - name: require public vct node
@@ -35,35 +35,33 @@
       run_once: true
 
     # We need to write this out on clients.
     - name: capture mirror key
       slurp: src=/etc/mercurial/mirror
       register: mirror_private_key
 
     - name: capture host key
-      slurp: src=/etc/ssh/ssh_host_rsa_key.pub
+      slurp: src=/etc/mercurial/ssh/ssh_host_rsa_key.pub
       register: mirror_host_key
 
 - hosts: hgweb-prod
   roles:
     - { role: hg-web,
         # We have to use hostvars to reference variables on other hosts.
         # slurp captures content in base64 encoded form. Decode it
         # before it is passed in.
-        mirror_private_key: "{{ hostvars['hgssh2.dmz.scl3.mozilla.com'].mirror_private_key.content | b64decode }}",
-        mirror_host_key: "{{ hostvars['hgssh2.dmz.scl3.mozilla.com'].mirror_host_key.content | b64decode }}",
+        mirror_private_key: "{{ hostvars['hgssh3.dmz.scl3.mozilla.com'].mirror_private_key.content | b64decode }}",
+        mirror_host_key: "{{ hostvars['hgssh3.dmz.scl3.mozilla.com'].mirror_host_key.content | b64decode }}",
         # hg-zlb.vips.scl3.mozilla.com resolves to multiple IPs.
         mirror_ips: ["63.245.215.25", "63.245.215.102"],
         vct_node: "{{ lookup('file', '../.vctnode') }}",
       }
 
-- hosts:
-    - hgssh2.dmz.scl3.mozilla.com
-
+- hosts: hgssh-prod
   pre_tasks:
     # Until we integrate secrets with Ansible, the LDAP config is
     # pre-defined on the server.
     - name: capture LDAP config
       slurp: src=/etc/mercurial/ldap.json
       register: ldap_config
 
   roles:
@@ -74,17 +72,17 @@
         ldap_uri: "{{ (ldap_config.content | b64decode | from_json).url }}",
         hgweb_hosts: "{{ groups['hgweb-prod'] }}",
       }
 
   tasks:
     # Install CRON to generate Mercurial bundle files. This only needs
     # to run on the master.
     - include: tasks/hgmo-bundle-cron.yml
-      when: ansible_hostname == "hgssh2"
+      when: ansible_hostname == "hgssh3"
 
     - name: discover kafka topics
       command: /opt/kafka/bin/kafka-topics.sh --zookeeper {{ kafka_zookeeper_connect }} --list
       register: kafka_topics
       run_once: true
 
     - name: create kafka topics
       command: /opt/kafka/bin/kafka-topics.sh --zookeeper {{ kafka_zookeeper_connect }} --create --topic {{ item.topic }} --partitions {{ item.partitions }} --replication-factor {{ kafka_replication_factor }} --config min.insync.replicas={{ kafka_min_insync_replicas }} --config unclean.leader.election.enable=false --config max.message.bytes=104857600
@@ -95,17 +93,17 @@
         - { topic: pushlog, partitions: 1 }
 
     - name: record deployment of this changeset
       copy: dest=/etc/mercurial/deployed_vct_changeset
             content={{ lookup('file', '../.vctnode') }}
             owner=root
             group=root
             mode=0644
-      when: ansible_hostname == 'hgssh2'
+      when: ansible_hostname == 'hgssh3'
 
     - name: notify IRC of deployment
       irc: server=irc.mozilla.org
            port=6697
            use_ssl=true
            channel="#vcs"
            nick=hg-deploy-bot
            color=red
--- a/ansible/group_vars/hgmo
+++ b/ansible/group_vars/hgmo
@@ -14,10 +14,9 @@ kafka_zookeeper_connect: "hgssh3.dmz.scl
 
 kafka_replication_factor: 5
 kafka_min_insync_replicas: 3
 
 # LDAP integration on hgssh servers combined with a zookeeper user
 # defined in LDAP means that Ansible's muckery of the zookeeper user
 # doesn't work. So ignore it on these hosts.
 ignore_zookeeper_user:
-  - hgssh1.dmz.scl3.mozilla.com
   - hgssh2.dmz.scl3.mozilla.com
--- a/ansible/hgmo-strip-repo.yml
+++ b/ansible/hgmo-strip-repo.yml
@@ -1,10 +1,10 @@
 ---
-- hosts: hgssh1.dmz.scl3.mozilla.com
+- hosts: hgssh3.dmz.scl3.mozilla.com
   gather_facts: no
   tasks:
     - name: Strip repo on master
       command: /var/hg/venv_pash/bin/hg --config extensions.strip= -R /repo/hg/mozilla/{{ repo | mandatory }} strip -r {{ rev | mandatory }}
 
 - hosts: hgweb-prod
   gather_facts: no
   tasks:
--- a/ansible/hosts
+++ b/ansible/hosts
@@ -4,16 +4,17 @@ reviewboard-hg1.dmz.scl3.mozilla.com ans
 
 [rbweb-prod]
 reviewboard[1:2].webapp.scl3.mozilla.com ansible_ssh_user=root
 
 [hgweb-prod]
 hgweb[1:10].dmz.scl3.mozilla.com ansible_ssh_user=root
 
 [hgssh-prod]
-hgssh[1:2].dmz.scl3.mozilla.com ansible_ssh_user=root
+hgssh2.dmz.scl3.mozilla.com ansible_ssh_user=root
+hgssh3.dmz.scl3.mozilla.com ansible_sudo=yes
 
 [hgssh-stage]
 hgssh.stage.dmz.scl3.mozilla.com ansible_ssh_user=root
 
 [hgmo:children]
 hgssh-prod
 hgweb-prod
--- a/ansible/roles/docker-hg-ssh/files/entrypoint.py
+++ b/ansible/roles/docker-hg-ssh/files/entrypoint.py
@@ -20,24 +20,24 @@ subprocess.check_call([
     cwd='/vct/ansible')
 
 del os.environ['DOCKER_ENTRYPOINT']
 
 ldap_hostname = os.environ['LDAP_PORT_389_TCP_ADDR']
 ldap_port = os.environ['LDAP_PORT_389_TCP_PORT']
 ldap_uri = 'ldap://%s:%s/' % (ldap_hostname, ldap_port)
 
-# Generate host SSH keys.
-if not os.path.exists('/etc/ssh/ssh_host_dsa_key'):
-    subprocess.check_call(['/usr/bin/ssh-keygen', '-t', 'dsa',
-                           '-f', '/etc/ssh/ssh_host_dsa_key'])
+# Generate host SSH keys for hg.
+if not os.path.exists('/etc/mercurial/ssh/ssh_host_ed25519_key'):
+    subprocess.check_call(['/usr/bin/ssh-keygen', '-t', 'ed25519',
+                           '-f', '/etc/mercurial/ssh/ssh_host_ed25519_key', '-N', ''])
 
-if not os.path.exists('/etc/ssh/ssh_host_rsa_key'):
-    subprocess.check_call(['/usr/bin/ssh-keygen', '-t', 'rsa', '-b', '2048',
-                           '-f', '/etc/ssh/ssh_host_rsa_key'])
+if not os.path.exists('/etc/mercurial/ssh/ssh_host_rsa_key'):
+    subprocess.check_call(['/usr/bin/ssh-keygen', '-t', 'rsa', '-b', '4096',
+                           '-f', '/etc/mercurial/ssh/ssh_host_rsa_key', '-N', ''])
 
 ldap_conf = open('/etc/mercurial/ldap.json', 'rb').readlines()
 with open('/etc/mercurial/ldap.json', 'wb') as fh:
     for line in ldap_conf:
         line = line.replace('%url%', ldap_uri)
         line = line.replace('%writeurl%', ldap_uri)
         fh.write(line)
 
@@ -61,11 +61,9 @@ with open('/etc/mercurial/hgrc', 'wb') a
     for line in hgrc_lines:
         # This isn't the most robust ini parsing logic in the world, but it
         # gets the job done.
         if line.startswith('hosts = '):
             line = 'hosts = %s\n' % ', '.join(kafka_servers)
 
         fh.write(line)
 
-subprocess.check_call(['/sbin/service', 'rsyslog', 'start'])
-
 os.execl(sys.argv[1], *sys.argv[1:])
--- a/ansible/roles/docker-hg-ssh/files/supervisor-docker.conf
+++ b/ansible/roles/docker-hg-ssh/files/supervisor-docker.conf
@@ -1,4 +1,15 @@
-[program:sshd]
-command=/usr/sbin/sshd -D
+[program:rsyslog]
+command = /usr/sbin/rsyslogd -n
 autorestart = true
 redirect_stderr = true
+
+# We need to run nslcd or system integration with LDAP doesn't work.
+[program:nslcd]
+command = /usr/sbin/nslcd -d
+autorestart = true
+redirect_stderr = true
+
+[program:sshd]
+command=/usr/sbin/sshd -D -f /etc/mercurial/ssh/sshd_config
+autorestart = true
+redirect_stderr = true
--- a/ansible/roles/docker-hg-ssh/meta/main.yml
+++ b/ansible/roles/docker-hg-ssh/meta/main.yml
@@ -2,14 +2,15 @@
 dependencies:
   - docker-kafkabroker
   - docker-python-coverage
   # hgweb_hosts is used to populate SSH host keys. We don't know these
   # until Docker containers have started, so make it empty.
   # Something similar applies to ZooKeeper and Kafka settings.
   - {
       role: hg-ssh,
+      sshd_hg_port: 22,
       hgweb_hosts: [],
       kafka_broker_id: 1024,
       kafka_host_name: dummyhost,
       kafka_zookeeper_connect: dummyhostports,
       zk_servers: { localhost: dummy }
     }
--- a/ansible/roles/hg-ssh/defaults/main.yml
+++ b/ansible/roles/hg-ssh/defaults/main.yml
@@ -1,2 +1,5 @@
 ---
+basedn: "dc=mozilla"
 pash_hostname: "hg.mozilla.org"
+home_attribute: "fakeHome"
+uid_attribute: "mail"
new file mode 100644
--- /dev/null
+++ b/ansible/roles/hg-ssh/files/sshd_config
@@ -0,0 +1,23 @@
+# This is the system level sshd. It is *not* the sshd used by Mercurial. See
+# the sshd_config_hg file for that.
+
+SyslogFacility AUTHPRIV
+LogLevel VERBOSE
+PermitRootLogin no
+PasswordAuthentication no
+# TODO hook up 2FA (bug 1259231)
+ChallengeResponseAuthentication no
+Protocol 2
+UsePrivilegeSeparation sandbox
+
+# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
+# but this is overridden so installations will only check .ssh/authorized_keys
+AuthorizedKeysFile .ssh/authorized_keys
+
+AllowAgentForwarding no
+
+AcceptEnv LANG LC_ALL LC_MESSAGES
+
+# Add extra logging to enable forensics.
+Subsystem sftp /usr/libexec/openssh/sftp-server -f AUTHPRIV -l INFO
+
new file mode 100644
--- /dev/null
+++ b/ansible/roles/hg-ssh/files/sshd_hg.service
@@ -0,0 +1,14 @@
+[Unit]
+Description=OpenSSH server daemon for Mercurial
+Documentation=man:sshd(8) man:sshd_config(5)
+After=network.target syslog.target
+
+[Service]
+ExecStart=/usr/sbin/sshd -D -f /etc/mercurial/ssh/sshd_config
+ExecReload=/bin/kill -HUP $MAINPID
+KillMode=process
+Restart=on-failure
+RestartSec=42s
+
+[Install]
+WantedBy=multi-user.target
--- a/ansible/roles/hg-ssh/handlers/main.yml
+++ b/ansible/roles/hg-ssh/handlers/main.yml
@@ -1,4 +1,16 @@
 ---
-# Fails in Docker.
+- name: run authconfig
+  command: /usr/sbin/authconfig --enablemkhomedir --enableldap --enableldapauth --ldapserver={{ ldap_uri }} --ldapbasedn={{ basedn }} --updateall
+
+# TODO Ansible isn't recognizing systemd services?
 - name: restart rsyslogd
   service: name=rsyslog state=restarted
+  ignore_errors: True
+
+- name: restart sshd
+  service: name=sshd state=restarted
+  ignore_errors: True
+
+- name: systemd daemon reload
+  command: /usr/bin/systemctl daemon-reload
+  ignore_errors: True
--- a/ansible/roles/hg-ssh/meta/main.yml
+++ b/ansible/roles/hg-ssh/meta/main.yml
@@ -1,22 +1,13 @@
 ---
 dependencies:
-  - ius-repo
+  - {
+      role: ius-repo,
+      when: "{{ ansible_distribution == 'CentOS' and ansible_distribution_major_version == '6' }}"
+    }
   - supervisor
   - {
-      role: openssh-lpk,
-      home_attribute: fakeHome,
-      accept_env: "AUTOLAND_REQUEST_USER LANG LC_ALL LC_MESSAGES",
-      force_command: /usr/local/bin/pash_wrapper,
-      # By default, SSH limits to 10 concurrent connections for
-      # individual users. This may interfere with replication if
-      # multiple replication events are in progress. So we up the limit.
-      # See bug 1038478.
-      max_startups: 50,
-      max_sessions: 50,
-    }
-  - {
       role: kafka-broker,
       kafka_host_name: "{{ inventory_hostname }}",
       kafka_broker_id: "{{ zk_servers[inventory_hostname] }}",
       when: "{{ inventory_hostname in zk_servers }}",
     }
--- a/ansible/roles/hg-ssh/tasks/main.yml
+++ b/ansible/roles/hg-ssh/tasks/main.yml
@@ -1,37 +1,81 @@
 ---
 - name: determine if running in Docker
   stat: path=/vct
   register: vct_dir
 
+# This is needed so authconfig can have the appropriate ldap_uri at container
+# start time, which results in /etc/openldap/ldap.conf getting updated
+# appropriately, which is necessary for nscd and `ldapsearch` to
+# "just work."
+- name: find LDAP URI in Docker
+  set_fact: ldap_uri=ldap://{{ ansible_env.LDAP_PORT_389_TCP_ADDR }}:{{ ansible_env.LDAP_PORT_389_TCP_PORT }}/
+  when: ansible_env.LDAP_PORT_389_TCP_ADDR is defined
+  tags: docker-startup
+
 - name: Install packages required to run a Mercurial server
   yum: name={{ item }} state=present
   with_items:
+    - authconfig
+    - nss-pam-ldapd
+    - openldap-clients
     # Needed to build python-ldap package for virtualenv.
     - openldap-devel
-    - python27
-    - python27-devel
-    - python-ldap
+    - openssh-server
+    - python-devel
     - sudo
     - rsyslog
     - tar
 
 # yum will incur network traffic when URLs are specified. Download the
 # package locally first so we can run offline after initial bootstrap.
 - name: download Mozilla rpms
-  get_url: url=https://s3-us-west-2.amazonaws.com/moz-packages/CentOS6/{{ item.path }}
+  get_url: url=https://s3-us-west-2.amazonaws.com/moz-packages/CentOS7/{{ item.path }}
            dest=/var/tmp/{{ item.path }}
            sha256sum={{ item.sha256 }}
   with_items:
-    - { path: mercurial-3.7.3-1.x86_64.rpm, sha256: 924a8828cfe53901db1366115d927b958f35f5e6a9c418cbc670c5e19137c090 }
+    - { path: mercurial-3.7.3-1.x86_64.rpm, sha256: 7cdd06e8fb5266fe9bd726c79db6040b68053a601daecb2418820c1d3e4f56a2 }
 
 - name: install Mozilla rpms
   command: yum localinstall -y /var/tmp/mercurial-3.7.3-1.x86_64.rpm
 
+- name: create directory for LDAP certificates
+  file: path=/etc/openldap/cacerts
+        state=directory
+        owner=root
+        group=root
+        mode=0755
+
+- name: install Mozilla certificates
+  copy: src={{ item.src }}
+        dest=/etc/openldap/cacerts/{{ item.dest }}
+        owner=root
+        group=root
+        mode=0644
+  with_items:
+    - { src: files/mozilla-root-ca.crt, dest: mozilla.crt }
+    - { src: files/mozilla-root-certificate-services.crt, dest: ca.crt }
+
+- name: configure system authentication settings
+  template: src=nslcd.conf.j2
+            dest=/etc/nslcd.conf
+  notify: run authconfig
+  tags: docker-startup
+
+- name: configure sshd
+  copy: src=sshd_config
+        dest=/etc/ssh/sshd_config
+  notify: restart sshd
+  tags: docker-startup
+
+- name: generate SSH host keys (Docker only)
+  command: /usr/bin/ssh-keygen -A -N ''
+  when: vct_dir.stat.exists == True
+
 - name: install global ssh config
   copy: src=ssh_config
         dest=/etc/ssh/ssh_config
         owner=root
         group=root
         mode=0640
 
 - name: Create groups for SCM ACLs
@@ -49,49 +93,92 @@
   group: name=hg
 
 - name: Create hg user
   user: name=hg group=hg
 
 - name: hg user ssh config is prepared
   file: path=/home/hg/.ssh state=directory mode=0775 owner=hg group=hg
 
-- name: Mercurial config directory is present
-  file: path=/etc/mercurial state=directory mode=0775
+- name: mercurial config directory is present
+  file: path=/etc/mercurial state=directory mode=0755
+
+- name: directory for hg sshd files
+  file: path=/etc/mercurial/ssh
+        state=directory
+        owner=root
+        group=root
+        mode=0750
+
+- name: sshd config for hg server
+  template: src=sshd_config_hg.j2
+            dest=/etc/mercurial/ssh/sshd_config
+            owner=root
+            group=root
+            mode=0640
+
+# entrypoint.py from the docker container will generate these keys. But there is
+# a race condition between it and the startup code in hgmo.py wanting to copy
+# the file. So generate the cert at image build time to be on the safe side.
+- name: generate hg ED25519 host key (Docker only)
+  command: /usr/bin/ssh-keygen -t ed25519 -N '' -f /etc/mercurial/ssh/ssh_host_ed25519_key creates=/etc/mercurial/ssh/ssh_host_ed25519_key.pub
+
+- name: generate hg RSA host key (Docker only)
+  command: /usr/bin/ssh-keygen -t rsa -b 4096 -N '' -f /etc/mercurial/ssh/ssh_host_rsa_key creates=/etc/mercurial/ssh/ssh_host_rsa_key.pub
+
+# In order to be used as an AuthorizedKeysCommand in sshd, the
+# file has to be in a tree that is root:root 0755 all the way to /.
+- name: install ldap ssh key lookup script
+  copy: src={{ vct }}/scripts/ldap-lookup-ssh-key
+        dest=/usr/local/bin/ldap-lookup-ssh-key
+        owner=root
+        group=root
+        mode=0755
+
+- name: systemd service file for hg sshd
+  copy: src=sshd_hg.service
+        dest=/etc/systemd/system/sshd_hg.service
+        owner=root
+        group=root
+        mode=0644
+  notify: systemd daemon reload
+
+- name: ensure hg sshd runs on startup
+  command: /usr/bin/systemctl enable sshd_hg.service
 
 - name: directories for support tools is present
   file: path=/usr/local/bin
         state=directory
         owner=root
         group=root
         mode=0755
 
 - name: install pash configuration file
   template: src=pash.json.j2
             dest=/etc/mercurial/pash.json
             owner=root
             group=root
             mode=0644
 
 - name: replication SSH key is present
-  command: /usr/bin/ssh-keygen -b 2048 -f /etc/mercurial/mirror -t rsa creates=/etc/mercurial/mirror
+  command: /usr/bin/ssh-keygen -b 4096 -f /etc/mercurial/mirror -t rsa -N '' creates=/etc/mercurial/mirror
 
 - name: capture content of replication SSH key
   slurp: src=/etc/mercurial/mirror.pub
   register: mirror_ssh_key_public
 
 - name: ensure proper permissions on replication key
   file: path={{ item }} owner=hg group=hg
   with_items:
     - /etc/mercurial/mirror
     - /etc/mercurial/mirror.pub
 
 - name: hg user has replication key configured in authorized_keys
   copy: dest=/home/hg/.ssh/authorized_keys
-        content={{ mirror_ssh_key_public.content | b64decode }}
+        content="{{ mirror_ssh_key_public.content | b64decode }}"
         owner=hg
         group=hg
         mode=0640
 
 - name: known hosts file for mirrors is populated
   template: src=known_hosts.j2
             dest=/etc/mercurial/known_hosts
             owner=hg
@@ -132,16 +219,18 @@
         group=root
         mode=0644
   with_items:
     - { venv: venv_pash, path: hghooks, pth: mozhghooks }
     - { venv: venv_tools, path: hghooks, pth: mozhghooks }
     - { venv: venv_pash, path: pylib/vcsreplicator, pth: vcsreplicator }
     - { venv: venv_tools, path: pylib/vcsreplicator, pth: vcsreplicator }
 
+# TODO need to ensure /var/hg/version-control-tools exists
+
 - name: set up version-control-tools repo (server only)
   command: /var/hg/venv_tools/bin/hg --config extensions.vcsreplicator=! -R /var/hg/version-control-tools pull https://hg.mozilla.org/hgcustom/version-control-tools
   when: vct_dir.stat.exists == False
 
 - name: update version-control-tools repo (server only)
   command: /var/hg/venv_tools/bin/hg -R /var/hg/version-control-tools up -r {{ lookup('file', '../../../../.vctnode') }}
   when: vct_dir.stat.exists == False
 
@@ -200,16 +289,26 @@
     # Install pash.py first to ensure SSH root login works.
     - pash.py
     - hg_helper.py
     - ldap_helper.py
     - pash_wrapper
     - repo_group.py
     - sh_helper.py
 
+# Until reviewboard-hg switches to CentOS 7, pash.py's shebang needs to be
+# adjusted to run from the virtualenv. We purposefully use /usr/bin/python
+# in the shebang by default because we can't rely on the virtualenv being
+# present during fresh installs. If we didn't do this, we'd easily lock
+# ourselves out of root login.
+- name: replace shebang in pash.py
+  replace: dest=/usr/local/bin/pash.py
+           regexp='^#! \/usr\/bin\/python'
+           replace='#!/var/hg/venv_pash/bin/python'
+
 - name: install repo-push script
   copy: src={{ vct }}/scripts/repo-push.sh
         dest=/usr/local/bin/repo-push.sh
         owner=root
         group=root
         mode=0755
 
 - name: ensure bundles directory exists
new file mode 100644
--- /dev/null
+++ b/ansible/roles/hg-ssh/templates/nslcd.conf.j2
@@ -0,0 +1,17 @@
+uri {{ ldap_uri | mandatory }}
+base {{ basedn | mandatory }}
+binddn {{ bind_dn | mandatory }}
+bindpw {{ bind_pw | mandatory }}
+
+scope sub
+
+bind_timelimit 30
+
+map passwd uid {{ uid_attribute | mandatory }}
+map passwd homeDirectory {{ home_attribute | mandatory }}
+
+uid nslcd
+gid ldap
+ssl no
+tls_cacertdir /etc/openldap/cacerts
+tls_cacertfile /etc/openldap/cacerts/mozilla.crt
new file mode 100644
--- /dev/null
+++ b/ansible/roles/hg-ssh/templates/sshd_config_hg.j2
@@ -0,0 +1,62 @@
+# This is the sshd config for the Mercurial server. It is integrated with
+# LDAP and dispatches logins to the pash tool.
+
+# Logs SSH key on login, which is used to establish a better audit
+# trail.
+LogLevel VERBOSE
+SyslogFacility AUTHPRIV
+
+# No root for the hg ssh daemon
+PermitRootLogin no
+
+# Only allow public key auth.
+PasswordAuthentication no
+ChallengeResponseAuthentication no
+Protocol 2
+PidFile /var/run/sshd_hg.pid
+
+Port {{ sshd_hg_port | default(222) }}
+
+# We have no need for an SSH agent, so don't accept it.
+AllowAgentForwarding no
+
+# We have no need for TCP forwarding, so disable it.
+AllowTcpForwarding no
+
+# Use a separate set of keys from the host SSH.
+# Keys are in order of preference.
+HostKey /etc/mercurial/ssh/ssh_host_ed25519_key
+HostKey /etc/mercurial/ssh/ssh_host_rsa_key
+
+# Keep in sync with "modern" settings from
+# https://wiki.mozilla.org/Security/Guidelines/OpenSSH
+KexAlgorithms curve25519-sha256@libssh.org,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,diffie-hellman-group-exchange-sha256
+Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
+MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com
+
+UsePrivilegeSeparation sandbox
+
+# AUTOLAND_REQUEST_USER is set by autoland to "spoof" the pushlog
+# user. pash verifies only the special autoland account can perform
+# the spoofing.
+AcceptEnv AUTOLAND_REQUEST_USER LANG LC_ALL LC_MESSAGES
+
+# We search for SSH keys for the requested user in LDAP using
+# an external program.
+# TODO establish dedicated user for LDAP lookups
+AuthorizedKeysCommand /usr/local/bin/ldap-lookup-ssh-key
+AuthorizedKeysCommandUser hg
+
+# Handles launching HG and perform other admin related
+# tasks, such as modifying user repos.
+ForceCommand /usr/local/bin/pash_wrapper
+
+# TODO enable ChrootDirectory and run SSH sessions in a
+# limited environment.
+
+# By default, SSH limits to 10 concurrent connections for
+# individual users. This may interfere with replication if
+# multiple replication events are in progress. So we up the limit.
+# See bug 1038478.
+MaxStartups 50
+MaxSessions 50
--- a/hgserver/tests/helpers.sh
+++ b/hgserver/tests/helpers.sh
@@ -15,17 +15,18 @@ hgmoenv() {
 
   hgmo start --master-ssh-port $HGPORT > /dev/null
   if [ $? -ne 0 ]; then
     exit 80
   fi
   $(hgmo shellinit)
 
   cat > ssh-known-hosts << EOF
-${SSH_SERVER} ssh-rsa ${SSH_HOST_KEY}
+${SSH_SERVER} ssh-rsa ${SSH_HOST_RSA_KEY}
+${SSH_SERVER} ssh-ed25519 ${SSH_HOST_ED25519_KEY}
 EOF
 
   cat > ssh_config << EOF
 Host *
   StrictHostKeyChecking no
   PasswordAuthentication no
   PreferredAuthentications publickey
   UserKnownHostsFile `pwd`/ssh-known-hosts
--- a/hgserver/tests/test-auth.t
+++ b/hgserver/tests/test-auth.t
@@ -198,22 +198,25 @@ Do another login to verify no pash error
   $ hgmo exec hgssh cat /var/log/pash.log
 
 mozreview-ldap-associate isn't enabled on hgssh
 
   $ ssh -T -F ssh_config -i key1 -l user1@example.com -p $HGPORT $SSH_SERVER mozreview-ldap-associate
   mozreview-ldap-associate command not available
   [1]
 
-Failure to connect to LDAP mirror is fatal
+Failure to connect to LDAP mirror locks us out
+What happens here is nscd caches the valid passwd entry lookup for the user.
+However, the SSH key lookup via LDAP fails and this manifests as no public keys
+available.
 
   $ hgmo exec hgssh /set-ldap-property url ldap://localhost:6000
   $ ssh -T -F ssh_config -i key1 -l user1@example.com -p $HGPORT $SSH_SERVER
-  Could not connect to the LDAP server at ldap://localhost:6000
-  [1]
+  Permission denied (publickey).\r (esc)
+  [255]
 
   $ hgmo exec hgssh /set-ldap-property url real
 
 Failure to connect to LDAP master server is not fatal
 
   $ hgmo exec hgssh /set-ldap-property write_url ldap://localhost:6000
 
   $ ssh -T -F ssh_config -i key1 -l user1@example.com -p $HGPORT $SSH_SERVER
new file mode 100644
--- /dev/null
+++ b/testing/clobber.hgmaster
@@ -0,0 +1,1 @@
+Switching image from CentOS 6 to CentOS 7
--- a/testing/vcttesting/docker.py
+++ b/testing/vcttesting/docker.py
@@ -689,17 +689,17 @@ class Docker(object):
 
         hg-master runs the ssh service while hg-slave runs hgweb. The mirroring
         and other bits should be the same as in production with the caveat that
         LDAP integration is probably out of scope.
         """
         images = self.ensure_images_built([
             'ldap',
         ], ansibles={
-            'hgmaster': ('docker-hgmaster', 'centos6'),
+            'hgmaster': ('docker-hgmaster', 'centos7'),
             'hgweb': ('docker-hgweb', 'centos6'),
         }, existing=images, verbose=verbose, use_last=use_last)
 
         self.state['last-hgmaster-id'] = images['hgmaster']
         self.state['last-hgweb-id'] = images['hgweb']
         self.state['last-ldap-id'] = images['ldap']
 
         return images
@@ -1246,17 +1246,17 @@ class Docker(object):
             ansible_images['hgrb'] = ('docker-hgrb', 'centos6')
             ansible_images['rbweb'] = ('docker-rbweb', 'centos6')
             ansible_images['hgweb'] = ('docker-hgweb', 'centos6')
 
         if hgmo:
             docker_images |= {
                 'ldap',
             }
-            ansible_images['hgmaster'] = ('docker-hgmaster', 'centos6')
+            ansible_images['hgmaster'] = ('docker-hgmaster', 'centos7')
             ansible_images['hgweb'] = ('docker-hgweb', 'centos6')
 
         if bmo:
             docker_images |= {
                 'bmoweb',
             }
 
         images = self.ensure_images_built(docker_images,
--- a/testing/vcttesting/hgmo.py
+++ b/testing/vcttesting/hgmo.py
@@ -82,17 +82,18 @@ class HgCluster(object):
             self.master_image = master_image
             self.web_image = web_image
             self.ldap_id = None
             self.master_id = None
             self.web_ids = []
             self.ldap_uri = None
             self.master_ssh_hostname = None
             self.master_ssh_port = None
-            self.master_host_key = None
+            self.master_host_rsa_key = None
+            self.master_host_ed25519_key = None
             self.web_urls = []
             self.kafka_hostports = []
             self.zookeeper_connect = None
 
     def start(self, ldap_port=None, master_ssh_port=None, web_count=2,
               coverage=False):
         """Start the cluster.
 
@@ -197,37 +198,42 @@ class HgCluster(object):
                     '/set-kafka-servers',
                     host,
                     str(port),
                 ] + zookeeper_hostports
                 e.submit(self._d.execute, s['Id'], command)
 
         # Obtain replication SSH key from master. This key is random since it
         # is generated at container build time.
-        with futures.ThreadPoolExecutor(3) as e:
+        with futures.ThreadPoolExecutor(4) as e:
             f_private_key = e.submit(self._d.get_file_content, master_id, '/etc/mercurial/mirror')
             f_public_key = e.submit(self._d.get_file_content, master_id, '/etc/mercurial/mirror.pub')
-            f_master_host_key = e.submit(self._d.get_file_content, master_id,
-                                         '/etc/ssh/ssh_host_rsa_key.pub')
+            f_master_host_ed25519_key = e.submit(self._d.get_file_content, master_id,
+                                                 '/etc/mercurial/ssh/ssh_host_ed25519_key.pub')
+            f_master_host_rsa_key = e.submit(self._d.get_file_content, master_id,
+                                             '/etc/mercurial/ssh/ssh_host_rsa_key.pub')
 
         mirror_private_key = f_private_key.result()
         mirror_public_key = f_public_key.result()
-        master_host_key = f_master_host_key.result()
-        master_host_key = ' '.join(master_host_key.split()[0:2])
+        master_host_rsa_key = f_master_host_rsa_key.result()
+        master_host_rsa_key = ' '.join(master_host_rsa_key.split()[0:2])
+        master_host_ed25519_key = f_master_host_ed25519_key.result()
+        master_host_ed25519_key = ' '.join(master_host_ed25519_key.split()[0:2])
 
         f_mirror_host_keys = []
 
         with futures.ThreadPoolExecutor(web_count + 1) as e:
             # Set SSH keys on hgweb instances.
             cmd = [
                 '/set-mirror-key.py',
                 mirror_private_key,
                 mirror_public_key,
                 master_state['NetworkSettings']['IPAddress'],
-                master_host_key,
+                # FUTURE this will need updated once hgweb supports ed25519 keys
+                master_host_rsa_key,
             ]
             for i in web_ids:
                 e.submit(self._d.execute(i, cmd))
 
             # Obtain host keys from mirrors.
             for s in web_states:
                 f_mirror_host_keys.append((
                     s['NetworkSettings']['IPAddress'],
@@ -260,17 +266,18 @@ class HgCluster(object):
         self.ldap_image = ldap_image
         self.master_image = master_image
         self.web_image = web_image
         self.ldap_id = ldap_id
         self.master_id = master_id
         self.web_ids = web_ids
         self.master_ssh_hostname = master_ssh_hostname
         self.master_ssh_port = master_ssh_hostport
-        self.master_host_key = master_host_key
+        self.master_host_rsa_key = master_host_rsa_key
+        self.master_host_ed25519_key = master_host_ed25519_key
         self.web_urls = []
         self.kafka_hostports = []
         for s in all_states:
             hostname, hostport = self._d._get_host_hostname_port(s, '9092/tcp')
             self.kafka_hostports.append('%s:%d' % (hostname, hostport))
         for s in web_states:
             hostname, hostport = self._d._get_host_hostname_port(s, '80/tcp')
             self.web_urls.append('http://%s:%d/' % (hostname, hostport))
@@ -325,17 +332,18 @@ class HgCluster(object):
                 'master_image': self.master_image,
                 'web_image': self.web_image,
                 'ldap_id': self.ldap_id,
                 'master_id': self.master_id,
                 'web_ids': self.web_ids,
                 'ldap_uri': self.ldap_uri,
                 'master_ssh_hostname': self.master_ssh_hostname,
                 'master_ssh_port': self.master_ssh_port,
-                'master_host_key': self.master_host_key,
+                'master_host_rsa_key': self.master_host_rsa_key,
+                'master_host_ed25519_key': self.master_host_ed25519_key,
                 'web_urls': self.web_urls,
                 'kafka_hostports': self.kafka_hostports,
                 'zookeeper_connect': self.zookeeper_connect,
         }
         with open(self.state_path, 'wb') as fh:
             json.dump(s, fh, sort_keys=True, indent=4)
 
         return s
--- a/testing/vcttesting/hgmo_mach_commands.py
+++ b/testing/vcttesting/hgmo_mach_commands.py
@@ -57,17 +57,18 @@ class HgmoCommands(object):
 
     @Command('shellinit', category='hgmo',
              description='Print shell commands to export variables')
     def shellinit(self):
         print('export SSH_CID=%s' % self.c.master_id)
         print('export SSH_SERVER=%s' % self.c.master_ssh_hostname)
         print('export SSH_PORT=%d' % self.c.master_ssh_port)
         # Don't export the full value because spaces.
-        print('export SSH_HOST_KEY=%s' % self.c.master_host_key.split()[1])
+        print('export SSH_HOST_RSA_KEY=%s' % self.c.master_host_rsa_key.split()[1])
+        print('export SSH_HOST_ED25519_KEY=%s' % self.c.master_host_ed25519_key.split()[1])
         for i, url in enumerate(self.c.web_urls):
             print('export HGWEB_%d_URL=%s' % (i, url))
         for i, cid in enumerate(self.c.web_ids):
             print('export HGWEB_%d_CID=%s' % (i, cid))
         for i, hostport in enumerate(self.c.kafka_hostports):
             print('export KAFKA_%d_HOSTPORT=%s' % (i, hostport))
         print('export ZOOKEEPER_CONNECT=%s' % self.c.zookeeper_connect)