Deploy HSTS Safely in Production: A Real-World Case Study

A few years ago I watched a team turn on HSTS in production with a one-line config change and a lot of confidence.

By lunch, support had a queue full of users who couldn’t reach a legacy upload app on a forgotten subdomain. By the end of the day, the team had learned the hard way that HSTS is easy to enable and surprisingly hard to roll back once browsers cache it.

That’s the part people skip.

HSTS is one of the best low-effort security headers you can deploy. It tells browsers: “stop trying HTTP for this site, always use HTTPS.” That blocks protocol downgrade attacks and helps kill off accidental insecure requests. But if you deploy it carelessly, you can brick parts of your own estate for weeks or months.

Here’s a real-world-style rollout pattern that works, with the mistakes first and the safer version after.

The situation

The company had:

www.example.com on modern HTTPS
app.example.com on HTTPS
api.example.com on HTTPS
files.example.com pointing to an old system that still served some content over plain HTTP
a bunch of mystery subdomains nobody had audited in years

The goal was simple: enforce HTTPS everywhere and eventually qualify for the browser HSTS preload list.

The first attempt looked like this.

Before: the risky rollout

The ops team added this to the main Nginx config for example.com:

add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;

On paper, that looked great:

a one-year policy (max-age=31536000)
covers all subdomains
ready for preload

In reality, it assumed every current and future subdomain was HTTPS-clean, permanently.

It wasn’t.

What broke

Users who had visited the apex, example.com, got the HSTS policy cached by their browser. Because includeSubDomains was set on that apex policy, the browser also forced HTTPS for files.example.com.

But files.example.com didn’t fully support HTTPS. Some requests failed. Some redirected in circles. Some hit certificate errors.

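The ugly part: once the browser cached the policy, telling users “just try HTTP” stopped working. The browser rewrote every http:// request to https:// before sending it, and on certificate errors it no longer offered the usual option to click through the warning.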

That’s the operational trap with HSTS. A bad redirect can be fixed on the server. A bad HSTS policy lives in the client.

Why this happens

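A browser that receives this header over a valid HTTPS connection to example.com:

Strict-Transport-Security: max-age=31536000; includeSubDomains

stores a rule for example.com for one year. After that:

http://example.com becomes https://example.com before the request is sent
if includeSubDomains is present, http://anything.example.com also gets upgraded (the same header sent only by www.example.com would cover www.example.com and names below it, not its siblings)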
if HTTPS is broken on one of those hosts, users are stuck until the policy expires or they manually clear browser state
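To make the matching rule concrete, here is a minimal sketch of how a browser decides whether to upgrade a request. The cache shape and the function name mustUpgrade are hypothetical, and real browsers also consult the preload list and normalize hostnames. The point it shows: a policy from example.com with includeSubDomains covers files.example.com, while the same policy from www.example.com does not.

```javascript
// Hypothetical in-memory HSTS cache: hostname -> { expiresAt (ms), includeSubDomains }
function mustUpgrade(host, cache, now = Date.now()) {
  const labels = host.toLowerCase().split('.');
  // Walk from the full hostname up through its parents, skipping the bare TLD.
  for (let i = 0; i < labels.length - 1; i++) {
    const candidate = labels.slice(i).join('.');
    const entry = cache.get(candidate);
    if (!entry || entry.expiresAt <= now) continue; // no live policy for this name
    if (i === 0) return true;                        // exact host match
    if (entry.includeSubDomains) return true;        // parent policy with includeSubDomains
  }
  return false;
}

const now = Date.now();
const oneYearMs = 31536000 * 1000;
const apexPolicy = new Map([['example.com', { expiresAt: now + oneYearMs, includeSubDomains: true }]]);
const wwwPolicy = new Map([['www.example.com', { expiresAt: now + oneYearMs, includeSubDomains: true }]]);

console.log(mustUpgrade('files.example.com', apexPolicy, now)); // true
console.log(mustUpgrade('files.example.com', wwwPolicy, now));  // false
```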

Preload makes this even more permanent, because browsers ship your domain in a built-in list. That’s great when you’re ready. It’s reckless when you’re not.

The safer rollout

The second attempt was boring, staged, and much better.

Phase 1: audit first

Before sending any HSTS header, the team made an inventory:

every DNS record under example.com
every app and CDN endpoint
every external vendor CNAME
certificates for all public hosts
redirect behavior from HTTP to HTTPS

This is the unglamorous part nobody wants to do. Do it anyway.

I usually care about four checks:

Does the host resolve publicly?
Does HTTP redirect cleanly to HTTPS?
Does HTTPS load with a valid cert and full chain?
Are there mixed-content or callback flows that still assume HTTP?
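The first three checks can be scripted. The sketch below is one way to do it, assuming Node 18+ with its built-in fetch; the function names are hypothetical, and the mixed-content and callback check still needs a human or a browser-based crawler. It treats only a permanent redirect (301 or 308) to HTTPS on the same host and path as “clean”, which is a deliberately strict assumption.

```javascript
// Hypothetical helper: is this response a clean HTTP -> HTTPS redirect?
// "Clean" here means a permanent redirect to https:// on the same host, path and query.
function isCleanHttpsRedirect(requestedUrl, status, locationHeader) {
  if (![301, 308].includes(status) || !locationHeader) return false;
  try {
    const from = new URL(requestedUrl);
    const to = new URL(locationHeader, from); // also resolves relative Location values
    return to.protocol === 'https:' && to.hostname === from.hostname &&
      to.pathname === from.pathname && to.search === from.search;
  } catch {
    return false; // unparseable Location header
  }
}

// Hypothetical audit of one host (needs network access; Node 18+ global fetch).
async function auditHost(host) {
  const result = { host, reachable: true, redirectOk: false, httpsOk: false };
  try {
    const httpUrl = `http://${host}/`;
    const res = await fetch(httpUrl, { redirect: 'manual' }); // inspect the redirect itself
    result.redirectOk = isCleanHttpsRedirect(httpUrl, res.status, res.headers.get('location'));
  } catch {
    result.reachable = false; // DNS failure, refused connection, timeout...
    return result;
  }
  try {
    // Node's fetch rejects when the certificate is invalid or the chain is incomplete.
    await fetch(`https://${host}/`, { redirect: 'manual' });
    result.httpsOk = true;
  } catch {
    result.httpsOk = false;
  }
  return result;
}
```

Running auditHost over every name from the DNS inventory gives a table you can sort by what is broken.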

A quick external headers check also helps catch obvious mistakes. If you want a fast sanity check before rollout, run a free security headers scan at HeaderTest.

Phase 2: start with a short max-age

Instead of going straight to one year, they deployed HSTS only on the main site with a tiny duration:

add_header Strict-Transport-Security "max-age=300" always;

That’s 5 minutes.

No includeSubDomains. No preload. No heroics.

This gave them a safe test window. If they found a problem, the browser cache would age out quickly.

For Apache, the equivalent looked like this:

Header always set Strict-Transport-Security "max-age=300"

For an Express app behind TLS termination, if the app itself sets headers:

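const express = require('express');
const app = express();

// Send a 5-minute HSTS policy on every response
app.use((req, res, next) => {
  res.setHeader('Strict-Transport-Security', 'max-age=300');
  next();
});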

Though honestly, I prefer setting HSTS at the edge proxy or load balancer so it’s consistent across apps.
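If the header does live in app code, a tiny builder keeps the directive string consistent as the rollout moves through phases. This is a hypothetical helper, not part of Express; it only assembles the value and rejects a nonsensical max-age.

```javascript
// Hypothetical helper: composes a Strict-Transport-Security value from options,
// so each rollout phase changes a number or a flag instead of a hand-edited string.
function buildHstsHeader({ maxAge, includeSubDomains = false, preload = false }) {
  if (!Number.isInteger(maxAge) || maxAge < 0) {
    throw new RangeError('maxAge must be a non-negative integer number of seconds');
  }
  const parts = [`max-age=${maxAge}`];
  if (includeSubDomains) parts.push('includeSubDomains');
  if (preload) parts.push('preload');
  return parts.join('; ');
}

console.log(buildHstsHeader({ maxAge: 300 }));
// max-age=300
console.log(buildHstsHeader({ maxAge: 31536000, includeSubDomains: true }));
// max-age=31536000; includeSubDomains
```

In the Express middleware, the call would become res.setHeader('Strict-Transport-Security', buildHstsHeader({ maxAge: 300 })).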

Phase 3: verify real behavior

After deploying the short policy, the team tested:

fresh browser sessions
repeat visits after HSTS caching
login redirects
SSO callbacks
mobile apps using embedded webviews
old bookmarked HTTP URLs
direct visits to known subdomains

They also checked that HTTP always redirected to HTTPS before app logic ran.

Bad:

server {
    listen 80;
    server_name example.com;

    # No redirect: plain-HTTP requests go straight to the app
    location / {
        proxy_pass http://app_backend;
    }
}

Better:

server {
    listen 80;
    server_name example.com www.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    server_name example.com www.example.com;

    ssl_certificate /etc/ssl/fullchain.pem;
    ssl_certificate_key /etc/ssl/private/key.pem;

    add_header Strict-Transport-Security "max-age=300" always;

    location / {
        proxy_pass http://app_backend;
    }
}

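That ordering matters. HSTS does not replace redirects: browsers ignore an HSTS header received over plain HTTP, so a user’s first visit still depends on the redirect. HSTS teaches browsers to stop needing it later.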
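When TLS terminates at a proxy that forwards both HTTP and HTTPS traffic to the app, the same “redirect before app logic” rule can be enforced in the app itself. This is a sketch of an Express-style middleware under that assumption; the name redirectToHttps is hypothetical, and it trusts X-Forwarded-Proto, which is only safe if clients cannot reach the app except through your proxy.

```javascript
// Express-style middleware: (req, res, next). Register it before any routes.
// Assumes a trusted proxy sets X-Forwarded-Proto on every request.
function redirectToHttps(req, res, next) {
  const proto = String(req.headers['x-forwarded-proto'] || '').split(',')[0].trim();
  if (proto === 'https') return next(); // already secure: continue to app logic

  const rawHost = req.headers.host;
  if (!rawHost) {
    res.statusCode = 400;
    return res.end('Missing Host header');
  }
  const host = rawHost.replace(/:\d+$/, ''); // drop ":80" like nginx's $host does
  const path = req.originalUrl || req.url;   // originalUrl survives Express mounting

  res.statusCode = 301; // permanent, matching "return 301" in the nginx config
  res.setHeader('Location', `https://${host}${path}`);
  res.end();
}
```

With Express this would be app.use(redirectToHttps) as the first middleware. Setting it at the proxy, as in the nginx config, is still the simpler option.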

Phase 4: increase max-age gradually

Once the short rollout was stable, they increased the duration:

First day:

Strict-Transport-Security: max-age=300

Then:

Strict-Transport-Security: max-age=86400

Then:

Strict-Transport-Security: max-age=2592000

Finally:

Strict-Transport-Security: max-age=31536000

This staged approach gave them checkpoints. If something weird surfaced after a week, they hadn’t committed every browser to a year-long policy yet.
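Each stage’s max-age is also the worst-case time a browser can stay pinned after a mistake, which is why the ramp works as a set of checkpoints. A small sketch encoding the four stages from the text; the function names are hypothetical.

```javascript
// The max-age stages from the rollout above, in seconds.
const HSTS_STAGES = [300, 86400, 2592000, 31536000];

// Worst case: a browser that cached the policy just before you spotted a problem keeps
// enforcing it for the full max-age, unless it later fetches a new header over HTTPS.
function describeExposure(maxAgeSeconds) {
  const plural = (n, unit) => `${n} ${unit}${n === 1 ? '' : 's'}`;
  return maxAgeSeconds >= 86400
    ? plural(maxAgeSeconds / 86400, 'day')
    : plural(maxAgeSeconds / 60, 'minute');
}

// Returns the next stage, or null once the one-year stage is live.
function nextStage(current) {
  const i = HSTS_STAGES.indexOf(current);
  if (i === -1) throw new Error(`unknown stage: ${current}`);
  return i + 1 < HSTS_STAGES.length ? HSTS_STAGES[i + 1] : null;
}

for (const s of HSTS_STAGES) {
  console.log(`max-age=${s}: up to ${describeExposure(s)} of enforcement`);
}
// max-age=300: up to 5 minutes of enforcement
// max-age=86400: up to 1 day of enforcement
// max-age=2592000: up to 30 days of enforcement
// max-age=31536000: up to 365 days of enforcement
```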

After: enabling includeSubDomains safely

Only after the subdomain audit was complete did they move to this:

add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

That change happened after:

files.example.com was migrated to valid HTTPS
old subdomains were removed or firewalled off
wildcard and SAN cert coverage was cleaned up
vendor-managed subdomains were verified

This is where a lot of teams get burned. They think includeSubDomains means “the subdomains we care about.” It means all of them.

That includes:

legacy admin tools
forgotten staging boxes
old marketing microsites
third-party services on CNAMEs
future subdomains someone creates next month

If your org is sloppy with DNS hygiene, includeSubDomains will expose it immediately.
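One way to keep that decision honest is to gate includeSubDomains on the inventory rather than on memory. A sketch, assuming a hypothetical record shape { host, httpsOk } produced by the Phase 1 audit; promo.example.com is a made-up vendor name for illustration. Anything not positively confirmed counts as a blocker, including internal-only names, since browsers on your own network apply the policy to them too.

```javascript
// Hypothetical inventory records: { host, httpsOk } where httpsOk is true only if verified.
function includeSubDomainsBlockers(apex, inventory) {
  const suffix = `.${apex}`;
  return inventory
    .filter(({ host }) => host === apex || host.endsWith(suffix)) // every name under the apex
    .filter(({ httpsOk }) => httpsOk !== true)                   // unverified counts as broken
    .map(({ host }) => host);
}

const inventory = [
  { host: 'www.example.com', httpsOk: true },
  { host: 'files.example.com', httpsOk: false },
  { host: 'promo.example.com', httpsOk: undefined }, // hypothetical vendor CNAME, not yet checked
  { host: 'example.org', httpsOk: false },           // different domain, out of scope
];

console.log(includeSubDomainsBlockers('example.com', inventory));
// [ 'files.example.com', 'promo.example.com' ]
```

An empty result is the signal to add includeSubDomains; anything else is a list of tickets.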

Preload is the last step, not the first

Preload is attractive because it hardens first contact too. Even the very first visit is forced to HTTPS because the browser already knows your domain’s policy.

But preload has a high cost if you make a mistake.

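The team waited until they met the preload list’s submission requirements:

HTTPS on the apex and www
valid certs
HTTP redirects to HTTPS on the same host first (http://example.com to https://example.com, not straight to www)
HSTS max-age of at least one year (31536000 seconds)
includeSubDomains
preload token present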
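The header-related items on that list are easy to check mechanically before submitting. A sketch with a hypothetical function name; it only parses the header value (directive names are case-insensitive), so certificate, redirect, and subdomain checks still come from the audit, and it is not a substitute for the preload site’s own checker.

```javascript
// Returns a list of problems with an HSTS header value for preload purposes (empty = OK).
function preloadHeaderIssues(headerValue) {
  const directives = new Map();
  for (const part of headerValue.split(';')) {
    const [rawName, ...rest] = part.trim().split('=');
    if (!rawName) continue; // tolerate empty segments such as a trailing ";"
    directives.set(rawName.trim().toLowerCase(), rest.join('=').trim().replace(/^"|"$/g, ''));
  }

  const issues = [];
  const rawMaxAge = directives.get('max-age');
  if (rawMaxAge === undefined || !/^\d+$/.test(rawMaxAge)) {
    issues.push('missing or invalid max-age');
  } else if (Number(rawMaxAge) < 31536000) {
    issues.push(`max-age ${rawMaxAge} is below 31536000`);
  }
  if (!directives.has('includesubdomains')) issues.push('missing includeSubDomains');
  if (!directives.has('preload')) issues.push('missing preload');
  return issues;
}

console.log(preloadHeaderIssues('max-age=31536000; includeSubDomains; preload')); // []
console.log(preloadHeaderIssues('max-age=300'));
// [ 'max-age 300 is below 31536000', 'missing includeSubDomains', 'missing preload' ]
```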

Then they switched to:

add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;

That was months after the initial rollout, not the same day.

I think that’s the right call for most production systems. Preload should feel a little annoying to approve. That friction is healthy.

A rollback plan that actually works

You can revoke an HSTS policy by sending:

Strict-Transport-Security: max-age=0

That tells browsers to delete their cached policy for that host.

But there’s a catch: the browser has to successfully reach the site over HTTPS and receive that header. If HTTPS is broken, users may never get the rollback instruction.

That’s why “we’ll just revert it” is not much of a plan.

A real rollback plan includes:

keeping HTTPS stable during rollback
preserving valid certificates
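serving the corrected policy (without includeSubDomains, or with max-age=0) from the host that originally set it, usually the apex, because a subdomain cannot clear its parent’s includeSubDomains rule and browsers only pick up the change when they revisit that host over HTTPS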
understanding that preload removal takes time and browser release cycles
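A toy model of the browser side makes the two catches visible. It is a hypothetical sketch of the RFC 6797 processing rules, not any browser’s code: headers received over plain HTTP are ignored, and max-age=0 deletes only the entry for the host that sent it, so a subdomain cannot undo an apex policy with includeSubDomains.

```javascript
// Toy HSTS store: hostname -> { expiresAt (ms), includeSubDomains }.
// Simplification: real browsers also refuse headers on connections with certificate errors.
function processStsHeader(store, host, headerValue, { overHttps, now = Date.now() }) {
  if (!overHttps) return; // headers received over plain HTTP are ignored
  const match = /(?:^|;)\s*max-age\s*=\s*"?(\d+)"?\s*(?:;|$)/i.exec(headerValue);
  if (!match) return;     // no valid max-age: ignore the header entirely
  const maxAge = Number(match[1]);
  if (maxAge === 0) {
    store.delete(host);   // the rollback instruction: forget this host's own policy
    return;
  }
  store.set(host, {
    expiresAt: now + maxAge * 1000,
    includeSubDomains: /(?:^|;)\s*includesubdomains\s*(?:;|$)/i.test(headerValue),
  });
}

const store = new Map();
processStsHeader(store, 'example.com', 'max-age=31536000; includeSubDomains', { overHttps: true });

processStsHeader(store, 'example.com', 'max-age=0', { overHttps: false });      // ignored: plain HTTP
console.log(store.has('example.com')); // true

processStsHeader(store, 'files.example.com', 'max-age=0', { overHttps: true }); // wrong host
console.log(store.has('example.com')); // true: the apex rule still covers files.example.com

processStsHeader(store, 'example.com', 'max-age=0', { overHttps: true });       // the real rollback
console.log(store.has('example.com')); // false
```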

The final production pattern

The final setup for the company looked like this:

server {
    listen 80;
    server_name example.com www.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    server_name example.com www.example.com;

    ssl_certificate /etc/ssl/fullchain.pem;
    ssl_certificate_key /etc/ssl/private/key.pem;

    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;

    location / {
        proxy_pass http://app_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

And the real win wasn’t the header itself. It was the cleanup work it forced:

no more half-broken HTTP endpoints
no more stray public subdomains
no more guessing which apps were TLS-ready

That’s how HSTS should be deployed: not as a checkbox, but as the final lock after you’ve already shut the doors.

The production checklist I’d use

If I were rolling out HSTS on a real site today, I’d do it in this order:

Inventory every public subdomain.
Fix HTTPS on every host under the domain, not just the ones you think matter.
Redirect all HTTP to HTTPS.
Start with max-age=300.
Watch logs, support tickets, and auth flows.
Increase to a day, then a month, then a year.
Add includeSubDomains only after a full subdomain audit.
Add preload only when you’re sure you want the commitment.

The mistake is treating HSTS like a simple header change. It’s really a browser-side policy rollout with a long memory.

Do the boring audit work first, and HSTS becomes one of the safest wins in your security baseline. Skip that work, and one line of config can haunt you for months.
