Certificates have been a key aspect of the software world for nearly 35 years now. They are responsible for a large part of how software is secured. What better reason to explore what a certificate is and how it works?
Essentially, a certificate associates an identity with a digital key pair for a period of time. This key pair is part of a system that can be used to create secure communications, attest to the validity of messages, and prevent tampering. But how does this work?
The key pair
To understand certificates, we first need to understand the concept of a key pair. Ron Rivest, Adi Shamir, and Leonard Adleman (RSA) proved mathematically in 1977 that if you have two large prime numbers, you can calculate two sets of values – keys – that share a special relationship. Anything encrypted by one key can be decrypted by the other. One of the keys is published and shared; this is called the public key. When a public key is shared, it’s usually provided in a certificate. The second key is retained by the owner. This is the private key. Because the numbers are very large, it is mathematically prohibitive to try to recalculate the private key from the public key.
We always talk about keys as if they are single values: one public, one private. Most of the time, this is sufficient for everyday use. Under the covers, each key contains multiple values as defined in PKCS#1 (RFC 3447). The public key contains two values that are mathematically derived from the prime numbers: the modulus (n) and the exponent (e, which is usually 65,537).
The private key can contain as few as two values (the modulus and the private exponent, d). Typically, however, it contains both prime numbers, the modulus, both exponents, and a few calculated values which can save time when encrypting or decrypting messages. If this file is exposed, the key pair is no longer secure. Anyone with the private key can compromise any data the public key protects. Because of this, the key file must be kept secure (preferably in a key vault).
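To make the relationship between these values concrete, here is a minimal sketch that builds a toy key pair. The primes 61 and 53, the exponent 17, and the variable names are assumptions chosen for illustration. Real keys use primes hundreds of digits long and apply padding before encrypting.

```python
# Toy RSA key pair built from tiny primes. Illustration only: never use
# numbers this small, or unpadded ("textbook") RSA, for real data.
p, q = 61, 53                 # the two secret primes
n = p * q                     # modulus, shared by both keys
phi = (p - 1) * (q - 1)       # used only to derive the private exponent
e = 17                        # public exponent (real keys usually use 65537)
d = pow(e, -1, phi)           # private exponent: modular inverse of e (Python 3.8+)

# Extra calculated values often stored in a private key to speed up its math.
d_p = d % (p - 1)
d_q = d % (q - 1)
q_inv = pow(q, -1, p)

message = 65                          # any integer smaller than n
ciphertext = pow(message, e, n)       # encrypt with the public key (n, e)
recovered = pow(ciphertext, d, n)     # decrypt with the private key (n, d)

print(f"public key:  n={n}, e={e}")
print(f"private key: n={n}, d={d}")
print(f"{message} -> {ciphertext} -> {recovered}")
```

The same arithmetic works in the other direction: a value transformed with d can be recovered with e, which is the property signatures rely on.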
What’s in a certificate
A certificate is simply a binary file that contains the owner’s public key and a few fields which can be used for determining the owner of the certificate, the authority that created and issued the certificate, and details for ensuring the certificate is valid at the time it is used.
Field | Description |
---|---|
Version number | The version of the X.509 specification |
Serial number | Uniquely identifies the certificate, allowing it to be revoked |
Signing algorithm | Specifies how the certificate was signed (more on that below) |
Issuer | The party responsible for issuing the certificate and attesting to its validity, typically a certificate authority. |
Validity | The dates when the certificate starts and stops being trusted. Newer systems (like Let's Encrypt) keep the validity period shorter, narrowing the window in which a leaked key or compromised certificate can be misused. |
Subject | Who the certificate represents |
Public Key | The public component of the RSA key pair |
Extensions | Additional certificate configuration, including approved uses, a URL that can be used to determine if the certificate was revoked, and validation policies |
Subject Key Identifier | Used to make it easy to identify certificates containing a particular public key, typically containing the SHA-1 hash of the public key |
Authority Key Identifier | Used to make it easy to identify certificates from a particular issuer, typically containing the SHA-1 hash of the issuer’s public key |
Signature | This hash is used to certify the validity of the information in the certificate (more on this in a moment) |
Encoding certificates
Both certificate data and the private key rely on the ASN.1 standard (Abstract Syntax Notation One) to describe their data, which is stored in a binary format called DER (Distinguished Encoding Rules). Basically, this is just a standardized way of serializing the fields. Each field is encoded as a tuple consisting of a type (tag number), the length of the data, and the data value. To make this easier to understand, let’s show a few examples. There are quite a few more types that can be encoded, but this will provide you with the basics.
Encoding strings
A string type is represented with the hex value `0x13`. Encoding the word Hello in ASN.1 format would give me `13 05 48 65 6C 6C 6F`:
- `13` - Printable String type
- `05` - Length of 5 characters
- `48 65 6C 6C 6F` - The hexadecimal values for each letter.
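The tag, length, value layout can be reproduced in a few lines. This is a minimal sketch, not a full DER encoder: the function name is hypothetical, it assumes the data is shorter than 128 bytes (the single-byte length form), and it does not check that every character is allowed in a Printable String.

```python
def encode_printable_string(text: str) -> bytes:
    """Encode text as tag 0x13, a one-byte length, then the characters."""
    data = text.encode("ascii")
    # Assumption: only the short length form is handled (under 128 bytes).
    if len(data) > 127:
        raise ValueError("this sketch only handles short-form lengths")
    return bytes([0x13, len(data)]) + data


encoded = encode_printable_string("Hello")
print(encoded.hex(" ").upper())  # 13 05 48 65 6C 6C 6F
```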
Encoding integers
An integer is represented with the tag `0x02`. Encoding the numeric value 10 would be `02 01 0A`:
- `02` - Integer
- `01` - Length of 1 byte to decode
- `0A` - The value to decode, with `0x0A` being the hexadecimal value of the number 10.
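Integers have one wrinkle worth seeing in code: DER integers are signed, so a value whose top bit is set needs a leading zero byte. The sketch below assumes non-negative values and short lengths only, and the function name is hypothetical.

```python
def encode_integer(value: int) -> bytes:
    """Encode a non-negative integer as tag 0x02, a one-byte length, then the value."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # Reserve room for the sign bit so large values are not read as negative.
    size = value.bit_length() // 8 + 1
    data = value.to_bytes(size, "big")
    if len(data) > 127:
        raise ValueError("this sketch only handles short-form lengths")
    return bytes([0x02, len(data)]) + data


print(encode_integer(10).hex(" ").upper())     # 02 01 0A
print(encode_integer(128).hex(" ").upper())    # 02 02 00 80
print(encode_integer(65537).hex(" ").upper())  # 02 03 01 00 01
```

The last line is the common public exponent mentioned earlier, which is why the bytes `01 00 01` show up so often when inspecting RSA public keys.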
Encoding sequences
When there are multiple items in a record, this is often stored as a Sequence (`0x30`). For example, let’s assume we want to represent a complex type that contains two fields: word and number. We’ll use the values we encoded above. This is notated as:
```
Message ::= SEQUENCE {
    word    PrintableString,
    number  INTEGER
}
```
Encoding our values, we get the following:
```
30 0A                      # A sequence (`0x30`) with 10 bytes of data (`0x0A`)
   13 05 48 65 6C 6C 6F    # The encoded string (7 bytes)
   02 01 0A                # The encoded integer (3 bytes)
```
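Reading the bytes back is the same process in reverse: read a tag, read a length, then take that many bytes as the value. The sketch below walks the sequence above. The `read_tlv` helper is hypothetical, and it only understands short lengths and the two field types used in this example.

```python
def read_tlv(data: bytes, offset: int = 0):
    """Return (tag, value, next_offset) for the element starting at offset."""
    tag = data[offset]
    length = data[offset + 1]
    # Assumption: only the short length form is handled (under 128 bytes).
    if length > 127:
        raise ValueError("this sketch only handles short-form lengths")
    start = offset + 2
    end = start + length
    return tag, data[start:end], end


encoded = bytes.fromhex("30 0A 13 05 48 65 6C 6C 6F 02 01 0A")

outer_tag, body, _ = read_tlv(encoded)
assert outer_tag == 0x30  # the outer element is a Sequence

fields = []
offset = 0
while offset < len(body):
    tag, value, offset = read_tlv(body, offset)
    if tag == 0x13:    # Printable String
        fields.append(value.decode("ascii"))
    elif tag == 0x02:  # Integer
        fields.append(int.from_bytes(value, "big", signed=True))
    else:
        raise ValueError(f"unsupported tag 0x{tag:02X}")

print(fields)  # ['Hello', 10]
```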
The most important thing to understand is that an encoded certificate is really just a Sequence of encoded fields representing a few different data types. A more complete example which illustrates this is shown below in Figure 1:
Signatures
A signature ensures that the certificate is valid and has not been altered. It is built from the fields of the certificate (technically called the TBSCertificate, or “To Be Signed Certificate”). A core set of fields (including the subject) is typically provided to the certificate authority as a certificate signing request (CSR).
If the CA approves the request, it adds additional fields, including extensions indicating the specific use, a serial number, and details identifying itself as the issuer. These fields are encoded as a sequence in a specific order. This data is then hashed, and the hash is signed with the issuer’s private key. To validate the certificate, the signature is decrypted with the issuer’s public key. The TBSCertificate is re-hashed and compared to the decrypted value. If the two values match, the certificate is unaltered.
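The hash, sign, and compare steps can be sketched with nothing more than modular arithmetic. Everything here is an assumption made for illustration: the issuer key is built from two publicly known Mersenne primes (so it offers no security at all), the "certificate" is a placeholder byte string rather than real DER, and the padding schemes that real RSA signatures require are left out.

```python
import hashlib

# Toy issuer key pair. These primes are public knowledge, so this key is
# for demonstration only.
p, q = 2**127 - 1, 2**521 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)


def digest_as_int(tbs: bytes) -> int:
    """Hash the to-be-signed bytes and read the digest as an integer."""
    return int.from_bytes(hashlib.sha256(tbs).digest(), "big")


def sign(tbs: bytes) -> int:
    """Issuer side: transform the hash with the private key."""
    return pow(digest_as_int(tbs), d, n)


def verify(tbs: bytes, signature: int) -> bool:
    """Verifier side: undo the signature with the public key, then compare hashes."""
    return pow(signature, e, n) == digest_as_int(tbs)


tbs_certificate = b"subject=example.test;serial=1234"  # placeholder, not real DER
signature = sign(tbs_certificate)

print(verify(tbs_certificate, signature))                       # True
print(verify(b"subject=attacker.test;serial=1234", signature))  # False
```

Changing even one byte of the signed data produces a different hash, so the comparison fails and the tampering is detected.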
Distributing the files
To distribute the files, we typically Base64-encode the DER file and add a header (“BEGIN CERTIFICATE”/“BEGIN PRIVATE KEY”) and footer (“END CERTIFICATE”/“END PRIVATE KEY”). This standardized format is referred to as PEM (originally meaning “Privacy Enhanced Mail”). A certificate is typically given the extension `.crt` or `.cer`, and private keys often get the extension `.pem`. By using this standardized approach, the files can be delivered in the body of an email message or stored as a string field in a secret management system.
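Wrapping DER bytes in PEM takes only a few lines. In this sketch the function name is hypothetical and the input is the small example sequence from earlier standing in for a real certificate. Python's standard library also provides `ssl.DER_cert_to_PEM_cert` for the certificate case.

```python
import base64


def der_to_pem(der: bytes, label: str = "CERTIFICATE") -> str:
    """Base64-encode DER bytes and wrap them in PEM header and footer lines."""
    b64 = base64.b64encode(der).decode("ascii")
    # PEM bodies are conventionally wrapped at 64 characters per line.
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    body = "\n".join(lines)
    return f"-----BEGIN {label}-----\n{body}\n-----END {label}-----\n"


# Placeholder bytes standing in for a real DER-encoded certificate.
der = bytes.fromhex("30 0A 13 05 48 65 6C 6C 6F 02 01 0A")
pem = der_to_pem(der)
print(pem)
```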
This is the format you will most commonly see when working with certificates or private keys.
Fun fact: a PEM file can actually contain multiple certificates, each delimited with its own header and footer. This allows multiple certificates to be distributed together. This can include additional certificates from a certificate authority, or it can be used as a way to distribute a set of certificates representing the trusted certificate authorities.
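Splitting such a bundle back into individual DER blobs is mostly a matter of finding each header and footer pair. A minimal sketch follows. The function name and regular expression are assumptions, and the two blocks in the sample bundle are tiny placeholder sequences rather than real certificates.

```python
import base64
import re

# Matches one PEM block and captures its label and Base64 body.
PEM_BLOCK = re.compile(
    r"-----BEGIN (?P<label>[A-Z0-9 ]+)-----\s*(?P<body>.*?)\s*-----END (?P=label)-----",
    re.DOTALL,
)


def split_pem_bundle(text: str):
    """Return a list of (label, der_bytes) for every PEM block in the text."""
    blocks = []
    for match in PEM_BLOCK.finditer(text):
        b64 = "".join(match.group("body").split())  # drop line breaks
        blocks.append((match.group("label"), base64.b64decode(b64)))
    return blocks


bundle = """\
-----BEGIN CERTIFICATE-----
MAoTBUhlbGxvAgEK
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MAoTBVdvcmxkAgEK
-----END CERTIFICATE-----
"""

blocks = split_pem_bundle(bundle)
for label, der in blocks:
    print(label, der.hex(" ").upper())
```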
Certificate authorities
The entire process of using certificates relies on trust. This trust begins with a set of issuers – the certificate authorities – that follow standardized practices to securely manage and distribute certificates.
In the next post, we’ll explore how this part of the process works. We’ll also look at self-signed certificates and where those fit into the model.