Monday, 6 October 2014

Mandatory data retention in Australia

Once again there are proposals for mandatory retention of Australian Internet data to improve domestic surveillance. I think these are a terrible idea, both personally and professionally. Here is a letter I just sent to my local parliamentarian and senator that explains my reasoning.

Dear Mr Thomson,

I'm a resident in your electorate and am writing with regard to the proposed changes to the Telecommunications (Interception and Access) Act to support mandatory data retention.

As an information security professional with more than 15 years' experience, I find the proposed legislation highly concerning. The proposals would create a huge trove of information that service providers would need to keep absolutely secure.

Evidence suggests that this will not be possible. Consider the numerous security breaches of large financial institutions; the latest, just last week at JP Morgan, exposed the information of 76 million customers. Financial organisations are subject to strong regulation, have a direct incentive to maintain the security of their customer data, maintain generally good internal controls, undergo frequent external audits and hold decades of experience dealing with rogue insiders. Despite all of this, they are frequently broken into.

Internet service providers have little to none of these factors in their favour. They generally maintain good security of their networks, but they do not have the skills, incentives or mindset needed to maintain the security of the highly-private data that is proposed to be collected. I would consider a breach to be inevitable - it might be an ISP employee checking on their spouse's browsing habits, an unethical provider selling the information or perhaps a foreign intelligence agency chuckling as they download the browsing habits of Australia...

As a citizen, I am appalled by these proposals. They treat all citizens as suspects, and have the temerity to bill us for the privilege. They create a system of domestic surveillance more complete than any that enabled the horrors of last century and, in doing so, allow countries that lack our commitment to human rights to point and say "hey, but Australia's already doing it". They create moral hazard for more surveillance, more collection and greater access to the collected data that will be difficult to wind back. There has been no explanation of what threat could justify this immense undertaking.

I ask you to oppose these amendments and exhort your colleagues to do the same. If you would like to discuss the above further I would be happy to oblige.

Damien Miller

If you agree, I encourage you to compose an email of your own or join one of the online campaigns like

Tuesday, 14 January 2014

Hostname canonicalisation in OpenSSH

OpenSSH 6.5 will introduce some new options to allow the client to canonicalise unqualified domain names, allowing it (for example) to understand that I actually meant "" when I typed "ssh bigserver". This turns out to be important because, even though your host's DNS resolver will connect you to the host that you intended, ssh doesn't know the full name for it.

If ssh doesn't know the full name for a host then it can't reliably match it with a host key. The problem is even worse when the server is offering a certificate host key - these (should) contain the fully-qualified domain name (FQDN) of the server, but this breaks when users type "ssh bigserver" without the remainder of the domain name. A common workaround is to add "Hostname" or "HostkeyAlias" directives to ssh_config, but that is messy and doesn't scale well to lots of hosts. The other workaround for certificates, adding the unqualified names to the list of certificate principals, is also terrible.

One might be forgiven for thinking that the system resolver should be able to help us here; after all - it knows the FQDN for the destination host because it knows all the domain search paths the user configured and which one was actually taken. Unfortunately, it turns out not to be useful for two reasons:

  1. The resolver doesn't actually offer a way to figure out what the fully-qualified name is. Some platforms do, via the AI_FQDN option - but it isn't widely available (Windows and OpenBSD only AFAIK)
  2. Even if we could get the name, then we couldn't trust it for anything configured via DHCP anyway. On most systems, the set of domain search paths is configured by DHCP and a rogue DHCP server could supply a malicious set.

My solution has been to add explicit hostname canonicalisation options that allow the user to define their own optional DNS search paths in OpenSSH itself. These options are: CanonicalDomains, CanonicalizeFallbackLocal, CanonicalizeHostname, CanonicalizeMaxDots and CanonicalizePermittedCNAMEs. You may notice that they substantially duplicate the search path functionality you'd expect to find in resolv.conf.

CanonicalizeHostname turns canonicalisation off and on (it's off by default). CanonicalDomains specifies the list of domains in which to search for an unqualified hostname. CanonicalizeMaxDots sets the maximum number of '.' characters that may appear in a name for it still to be considered unqualified (e.g. if you want names like "ftp.dmz" to be subject to canonicalisation then you would set this to one or more). CanonicalizeFallbackLocal specifies whether the original, unqualified name should be passed to the system resolver if it wasn't found under any of the suffixes in CanonicalDomains. Finally, CanonicalizePermittedCNAMEs specifies some rules for selectively following CNAMEs (DNS aliases) when canonicalising a name.

This should all be more clear with an example. This is what is at the top of my ~/.ssh/config:

CanonicalizeHostname yes
CanonicalizeMaxDots 0
CanonicalizeFallbackLocal no

This enables canonicalisation with a single search path (set with CanonicalDomains). When I type "ssh mail", the hostname mail will be judged unqualified since it contains no period characters (specifically, a number less than or equal to CanonicalizeMaxDots), so ssh will try to resolve it in one of the CanonicalDomains. If no such name existed, ssh wouldn't bother attempting to continue with the original hostname mail.
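The decision process described above can be sketched in a few lines of Python. This is only an illustration of the logic, not OpenSSH's implementation: the function name, the `resolves` callback and the example.org names are all made up for the example, and CNAME handling is left out entirely.

```python
from typing import Callable, Iterable, Optional


def canonicalise(
    name: str,
    domains: Iterable[str],
    max_dots: int,
    fallback_local: bool,
    resolves: Callable[[str], bool],
) -> Optional[str]:
    """Return the name to continue with, or None if the lookup should fail.

    `resolves` is a hypothetical stand-in for a DNS lookup: it returns True
    when the given fully-qualified name exists.
    """
    # Names with more dots than max_dots are treated as already qualified.
    if name.count(".") > max_dots:
        return name
    # Try each configured suffix in order; the first one that resolves wins.
    for domain in domains:
        candidate = f"{name}.{domain}"
        if resolves(candidate):
            return candidate
    # Nothing matched: hand the original name to the system resolver, or give up.
    return name if fallback_local else None


# Hypothetical zone contents, used in place of real DNS.
known_hosts = {"mail.example.org"}
print(canonicalise("mail", ["example.org"], 0, False, known_hosts.__contains__))
```

With fallback disabled, a name that is not found under any suffix yields None here, which mirrors ssh refusing to continue with the unqualified name.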

I haven't mentioned CanonicalizePermittedCNAMEs yet, since it is the most complex and most users won't need it. It's useful in cases where your organisation's DNS has a number of CNAME aliases that point to the same host(s). It allows the user to specify rules for when the alias should be allowed to replace the original host name. This option accepts multiple arguments, each of the form source_pattern:target_pattern. If the name (after canonicalisation and resolution) matches source_pattern and the destination of the CNAME matches target_pattern then the target of the CNAME will replace the original name. The rules are pretty flexible; they accept the pattern syntax used widely in OpenSSH (with negation, '*' and '?' wildcards). Hopefully an example will make this clear too:

CanonicalizeHostname yes
CanonicalizeMaxDots 1
CanonicalizeFallbackLocal yes
CanonicalizePermittedCNAMEs mail.* dns**

This example enables canonicalisation with a couple of suffixes in the search path. It also turns CanonicalizeMaxDots up to 1, so a name like mail.dmz will be searched in each suffix. If a name does not resolve in any suffix then it will be passed to the system resolver as a fallback. Finally, some rules for following CNAMEs are specified: any CNAME matching mail.* will be followed so long as the ultimate destination is and any host name matching dns* will be followed so long as the destination matches dns*
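The source_pattern:target_pattern check can be approximated like this. It is a rough sketch, not OpenSSH's matcher: it leans on Python's fnmatch (which also accepts '[...]' classes that OpenSSH patterns don't), it assumes each side may be a comma-separated list with '!' negation, and the example.org host names and both function names are invented for illustration.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def match_pattern_list(name: str, pattern_list: str) -> bool:
    """Approximate an OpenSSH-style pattern list: '*', '?' and '!' negation."""
    matched = False
    for pattern in pattern_list.split(","):
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        if fnmatchcase(name.lower(), pattern.lower()):
            if negated:
                # A matching negated pattern vetoes the whole list.
                return False
            matched = True
    return matched


def cname_permitted(name: str, target: str, rules: Iterable[str]) -> bool:
    """True if any source:target rule allows `target` to replace `name`."""
    for rule in rules:
        source_pattern, target_pattern = rule.split(":", 1)
        if match_pattern_list(name, source_pattern) and match_pattern_list(
            target, target_pattern
        ):
            return True
    return False


# Hypothetical rules in the same shape as the option's arguments.
rules = [
    "mail.*.example.org:mx.example.org",
    "dns*.example.org:dns*.dmz.example.org",
]
print(cname_permitted("dns1.example.org", "dns1.dmz.example.org", rules))
```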

These options will be available in OpenSSH 6.5, which is due really soon (hopefully by the end of the month). I'd love to hear any feedback about them.

Tuesday, 10 December 2013

PGP keys rotated

I just (belatedly) rotated my PGP keys. The old ID was 86FF9C48 and the new 6D920D30 with a fingerprint of 59C2 118E D206 D927 E667 EBE3 D3E5 F56B 6D92 0D30. The new key should be available from the keyserver network and is signed by my old key. As a very infrequent user of gnupg for anything but generating signatures, I found's key transition guidelines to be very useful in doing this.

One thing that I noticed along the way doesn't seem to be in the documentation: where gnupg asks you for an expiry duration, it will actually accept an exact timestamp too. So you can answer something like 20201231T235959 and it will do the right thing.
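If you want to produce a timestamp in that shape programmatically, something like the following should do. The helper name is made up, and I'm only reproducing the format shown above; how gnupg treats time zones is not something this sketch addresses.

```python
from datetime import datetime


def gpg_expiry_timestamp(when: datetime) -> str:
    """Format a datetime in the YYYYMMDDTHHMMSS shape shown above."""
    return when.strftime("%Y%m%dT%H%M%S")


print(gpg_expiry_timestamp(datetime(2020, 12, 31, 23, 59, 59)))
```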

My whole key blob is now:

Friday, 29 November 2013

ChaCha20 and Poly1305 in OpenSSH

Recently, I committed support for a new authenticated encryption cipher for OpenSSH. This cipher combines two primitives from Daniel J. Bernstein: the ChaCha20 cipher and the Poly1305 MAC (Message Authentication Code) and was inspired by Adam Langley's similar proposal for TLS.

Why another cipher and MAC? A few reasons... First, we would like a high-performance cipher to replace RC4, since it is pretty close to broken now. Second, we'd like an authenticated encryption mode to complement AES-GCM - which is great if your hardware supports it, but takes significant voodoo to make run in constant time. Finally, having an authenticated encryption mode that is based on a stream cipher allows us to encrypt the packet lengths again.

Wait, what do you mean by "encrypt the packet lengths again"? (last rhetorical question, I promise) Well, it's a long story that requires a little background:

Back in the dark ages of the SSH2 protocol's design, there wasn't consensus among cryptographers on the best order to apply encryption and authentication in protocols - in fact, the three main cryptographic protocols to emerge from the 1990s - SSL, SSH and IPsec - all use different choices: SSL calculated a MAC over the packet's plaintext, appended it to the plaintext packet and encrypted and sent the lot - a construction now called "MAC then Encrypt" or "MtE". IPsec encrypted the plaintext, calculated the MAC over the ciphertext and appended it - this is now called "Encrypt then MAC" (EtM). SSH calculated the MAC over the plaintext, encrypted it and then appended the MAC - this is called "Encrypt and MAC" (EaM).

Of these, only "Encrypt then MAC" is now considered safe and in retrospect it's pretty easy to see why: for MtE and EaM, it's necessary to decrypt and process the packet before checking the MAC. Doing this allows an active attacker (i.e. one who is happy to forge or modify messages) the chance to peek behind the veil of the encryption before the MAC check detects their mischief. This has resulted in attacks on both SSL/TLS and SSH that wouldn't otherwise have been possible.
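To make the three orderings concrete, here is a toy sketch. Everything in it is a stand-in chosen so it runs with only the Python standard library: the "cipher" is a throwaway SHA-256 keystream XOR (not secure, and not what any of these protocols use), the MAC is HMAC-SHA256, and the function names are invented. It ignores real packet framing entirely; the point is only where the MAC is computed and what the receiver can check before decrypting.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 output size


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR with SHA-256(key || counter) blocks. NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], block))
    return bytes(out)


def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def mac_then_encrypt(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    # SSL-style: MAC the plaintext, then encrypt plaintext and MAC together.
    return keystream_xor(enc_key, plaintext + mac(mac_key, plaintext))


def encrypt_and_mac(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    # SSH-style: encrypt the plaintext, append a MAC of the *plaintext*.
    return keystream_xor(enc_key, plaintext) + mac(mac_key, plaintext)


def encrypt_then_mac(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    # IPsec-style: encrypt first, then MAC the *ciphertext*.
    ciphertext = keystream_xor(enc_key, plaintext)
    return ciphertext + mac(mac_key, ciphertext)


def open_encrypt_then_mac(enc_key: bytes, mac_key: bytes, packet: bytes) -> bytes:
    ciphertext, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    # The MAC is verified before any decryption happens.
    if not hmac.compare_digest(tag, mac(mac_key, ciphertext)):
        raise ValueError("MAC check failed")
    return keystream_xor(enc_key, ciphertext)
```

Note that only the EtM receiver can reject a forged packet without touching the decryption routine; for the other two, the MAC input is only available after decrypting.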

Recent versions of OpenSSH have offered some solutions to the problems caused by the original Encrypt-and-MAC design: AES-GCM cipher modes and Encrypt-then-MAC MAC modes. The AES-GCM ciphers replace the usual cipher+MAC combination with a combined authenticated encryption mode that provides confidentiality and integrity in a single cryptographic algorithm. The Encrypt-then-MAC MAC modes alter the SSH packet format to be more IPsec-like: performing encryption first and then authenticating the ciphertext.

Both AES-GCM and the EtM MAC modes have a small downside though: because we no longer desire to decrypt the packet as we go, the packet length must be transmitted in plaintext. This unfortunately makes some forms of traffic analysis easier as the attacker can just read the packet lengths directly. OpenSSH takes some countermeasures to obscure the lengths of obvious secrets like passwords used for login or typed into an active session, but I haven't felt entirely comfortable with the protocol revealing the length of every packet sent on the wire.

The new cipher avoids this though. In addition to providing authenticated encryption with integrity-checking performed before unwrapping encrypted data, this mode uses a second stream cipher instance to separately encrypt the packet lengths to obscure them from eavesdroppers. An active attacker can still play games by fiddling with the packet lengths, but doing so will reveal nothing about the packet payloads themselves - they can make the receiving end read a smaller or larger packet than intended, but the MAC will be checked (and the check will fail) before anything is decrypted or used. Fortunately ChaCha20 is very fast and has quite small keys, so maintaining a separate instance is very cheap.
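The shape of that packet handling can be sketched as below. This is a simplified illustration and not the real construction: a toy SHA-256 keystream stands in for ChaCha20, HMAC-SHA256 stands in for Poly1305, the separate mac_key is a simplification of mine, and the function names and framing are invented. See PROTOCOL.chacha20poly1305 for how the actual keys, nonces and tags are defined.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32  # HMAC-SHA256 here; a stand-in for the Poly1305 tag


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher standing in for ChaCha20. NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], block))
    return bytes(out)


def seal(len_key: bytes, payload_key: bytes, mac_key: bytes,
         seqnr: int, payload: bytes) -> bytes:
    nonce = struct.pack(">Q", seqnr)
    # The 4-byte length is encrypted with its own cipher instance.
    enc_len = keystream_xor(len_key, nonce, struct.pack(">I", len(payload)))
    enc_payload = keystream_xor(payload_key, nonce, payload)
    # The tag covers the ciphertext, including the encrypted length.
    tag = hmac.new(mac_key, nonce + enc_len + enc_payload, hashlib.sha256).digest()
    return enc_len + enc_payload + tag


def open_packet(len_key: bytes, payload_key: bytes, mac_key: bytes,
                seqnr: int, wire: bytes) -> bytes:
    nonce = struct.pack(">Q", seqnr)
    # Decrypt only the length so we know how much to read.
    (length,) = struct.unpack(">I", keystream_xor(len_key, nonce, wire[:4]))
    if len(wire) < 4 + length + TAG_LEN:
        raise ValueError("packet truncated")
    body = wire[4:4 + length]
    tag = wire[4 + length:4 + length + TAG_LEN]
    expected = hmac.new(mac_key, nonce + wire[:4] + body, hashlib.sha256).digest()
    # Verify the MAC before the payload is decrypted or used.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC check failed")
    return keystream_xor(payload_key, nonce, body)
```

Tampering with the encrypted length makes the receiver look for the tag in the wrong place, so the check fails and the payload is never decrypted.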

We're not done yet though - an attacker may still observe the encrypted packets on the network to try to ascertain their length, and right now they are likely to be successful. I hope to add some features to frustrate this sort of traffic analysis some time next year.

Full details on the new mode are in the PROTOCOL.chacha20poly1305 file in OpenSSH and the source code for the cipher itself. If there is anything that these don't explain, then feel free to contact me.