
Fail fast on invalid certificates at TLS config load#2999

Merged
zuiderkwast merged 5 commits into valkey-io:unstable from yang-z-o:tls-fail-fast-expired-cert
Jan 28, 2026

Conversation

@yang-z-o
Contributor

@yang-z-o yang-z-o commented Jan 3, 2026

Closes #2997

Overview

This PR adds certificate validation at TLS load time and rejects invalid (expired or not-yet-valid) certificates.

Applies to all TLS config paths:

  • Server certificates tls-cert-file
  • Server-side client certificates tls-client-cert-file
  • CA certificate file tls-ca-cert-file
  • CA certificate directory tls-ca-cert-dir (now eagerly loaded to be consistent with file-based CAs)

Applies to both scenarios:

  • Server startup (initial TLS load)
  • Runtime reload via CONFIG SET
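Why eager loading matters for tls-ca-cert-dir: when a CA directory is given to OpenSSL as a lookup path (for example through SSL_CTX_load_verify_locations with a CApath), certificates are normally read lazily, by subject-name hash, at verification time. Nothing in the directory is therefore inspected at config load. Below is a minimal C sketch of an up-front directory walk. loadCaCertDirSketch, certFileLoader and countCertFile are hypothetical names, and the per-file loader is a callback standing in for the real PEM parsing and X509_STORE insertion. It is not the PR's loadCaCertDir.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical per-file loader. In a real server this would parse the PEM
 * file and add its certificate(s) to the X509_STORE. Returns true on success. */
typedef bool (*certFileLoader)(const char *path, void *ctx);

/* Walk `dir` once, up front, and hand every regular file to `load`.
 * Returns the number of files loaded, or -1 if the directory cannot be
 * opened, a path is too long, or any file fails to load. stat() follows
 * symlinks, so hash links created by `openssl rehash` may cause the same
 * certificate to be visited more than once. */
int loadCaCertDirSketch(const char *dir, certFileLoader load, void *ctx) {
    DIR *d = opendir(dir);
    if (d == NULL) return -1;

    int loaded = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        char path[4096];
        int n = snprintf(path, sizeof(path), "%s/%s", dir, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof(path)) {
            closedir(d);
            return -1;
        }

        struct stat st;
        /* Skip ".", "..", subdirectories and anything we cannot stat. */
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode)) continue;

        if (!load(path, ctx)) {
            closedir(d);
            return -1; /* fail fast: one bad file rejects the whole directory */
        }
        loaded++;
    }
    closedir(d);
    return loaded;
}

/* Example loader that only counts the files it is given (illustration only). */
bool countCertFile(const char *path, void *ctx) {
    (void)path;
    int *count = ctx;
    (*count)++;
    return true;
}
```

Returning -1 on the first failing file mirrors the fail-fast intent: a directory containing one bad certificate is rejected as a whole rather than partially loaded.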

Implementation

  • Added isCertValid function to check if an X509 certificate is within its validity period (not expired, not future-dated)
  • Added areAllCaCertsValid function to iterate through all loaded CA certificates and validate them
  • Added loadCaCertDir function to eagerly load all certificates from a directory into the X509_STORE
  • Modified createSSLContext to validate:
    • Server/client certificates immediately after loading
    • All CA certificates after loading from file/directory
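A check like isCertValid comes down to a window comparison: a certificate is usable only if notBefore <= now <= notAfter. The C sketch below assumes both bounds have already been extracted as time_t values. Real code would work on OpenSSL's ASN1_TIME, for instance with X509_get0_notBefore(), X509_get0_notAfter() and X509_cmp_current_time(). certWindow, isCertValidAt and areAllCertsValidAt are hypothetical names, not the PR's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for an X509 certificate that keeps only its
 * validity window. */
typedef struct {
    time_t not_before; /* not valid before this instant */
    time_t not_after;  /* not valid after this instant */
} certWindow;

/* Valid only if not_before <= now <= not_after: both a future-dated
 * certificate and an expired one are rejected. */
bool isCertValidAt(const certWindow *cert, time_t now) {
    if (now < cert->not_before) return false; /* not yet valid */
    if (now > cert->not_after) return false;  /* expired */
    return true;
}

/* Same idea as checking every loaded CA certificate: a single invalid
 * entry makes the whole set invalid. */
bool areAllCertsValidAt(const certWindow *certs, size_t count, time_t now) {
    for (size_t i = 0; i < count; i++) {
        if (!isCertValidAt(&certs[i], now)) return false;
    }
    return true;
}
```

The function takes `now` as a parameter instead of calling time() internally. This keeps the check deterministic and makes both boundary cases easy to test.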

Test results

1. Server startup (initial TLS load)

tls-cert-file ./tests/tls/server-expired.crt

41522:M 31 Dec 2025 16:13:18.851 # Server TLS certificate is invalid. Aborting TLS configuration.
41522:M 31 Dec 2025 16:13:18.851 # Failed to configure TLS. Check logs for more info.


tls-client-cert-file ./tests/tls/client-expired.crt

41557:M 31 Dec 2025 16:14:43.296 # Client TLS certificate is invalid. Aborting TLS configuration.
41557:M 31 Dec 2025 16:14:43.296 # Failed to configure TLS. Check logs for more info.


tls-ca-cert-file ./tests/tls/ca-expired.crt
tls-ca-cert-dir ./tests/tls/ca-expired

41567:M 31 Dec 2025 16:15:15.635 # One or more loaded CA certificates are invalid. Aborting TLS configuration.
41567:M 31 Dec 2025 16:15:15.635 # Failed to configure TLS. Check logs for more info.

2. Runtime reload via CONFIG SET

127.0.0.1:6379> config set tls-cert-file ./tests/tls/server-expired.crt
(error) ERR CONFIG SET failed (possibly related to argument 'tls-cert-file') - Unable to update TLS configuration. Check server logs.

62975:M 02 Jan 2026 20:10:43.588 # Server TLS certificate is invalid. Aborting TLS configuration.
62975:M 02 Jan 2026 20:10:43.588 # Failed applying new configuration. Possibly related to new tls-cert-file setting. Restoring previous settings.


127.0.0.1:6379> config set tls-client-cert-file ./tests/tls/client-expired.crt
(error) ERR CONFIG SET failed (possibly related to argument 'tls-client-cert-file') - Unable to update TLS configuration. Check server logs.

62975:M 02 Jan 2026 20:10:57.972 # Client TLS certificate is invalid. Aborting TLS configuration.
62975:M 02 Jan 2026 20:10:57.972 # Failed applying new configuration. Possibly related to new tls-client-cert-file setting. Restoring previous settings.


127.0.0.1:6379> config set tls-ca-cert-file ./tests/tls/ca-expired.crt
127.0.0.1:6379> config set tls-ca-cert-dir ./tests/tls/ca-expired
(error) ERR CONFIG SET failed (possibly related to argument 'tls-ca-cert-file') - Unable to update TLS configuration. Check server logs.

62975:M 02 Jan 2026 20:10:50.175 # One or more loaded CA certificates are invalid. Aborting TLS configuration.
62975:M 02 Jan 2026 20:10:50.175 # Failed applying new configuration. Possibly related to new tls-ca-cert-file setting. Restoring previous settings.

@codecov

codecov bot commented Jan 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.24%. Comparing base (278607b) to head (cb20800).
⚠️ Report is 8 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2999      +/-   ##
============================================
- Coverage     74.38%   74.24%   -0.14%     
============================================
  Files           129      129              
  Lines         71041    71041              
============================================
- Hits          52844    52745      -99     
- Misses        18197    18296      +99     
Files with missing lines Coverage Δ
src/tls.c 100.00% <ø> (ø)

... and 24 files with indirect coverage changes


@yang-z-o yang-z-o mentioned this pull request Jan 12, 2026
@yang-z-o yang-z-o force-pushed the tls-fail-fast-expired-cert branch from 570801b to 51925b4 on January 16, 2026 21:02
@yang-z-o
Contributor Author

Hello @madolson @zuiderkwast @PingXie, I briefly mentioned this PR in last week's weekly meeting.
This PR adds TLS certificate validation at server startup and reloads. Since it introduces a behavior change, it may require core team guidance. Would really appreciate it if you could take a look and share your thoughts. Thanks! 🙏

@zuiderkwast zuiderkwast moved this to In Progress in Valkey 9.1 Jan 20, 2026
@zuiderkwast
Contributor

Thanks for the heads up. A small behavior change doesn't necessarily need to be a major decision, if it's backward compatible and doesn't introduce any new public API (configs, commands, etc.) and doesn't break any user scenario. So what exactly is the behavior change?

Before:

  • It was possible to configure expired certificates. CONFIG SET returned OK.
  • The result is that clients can't connect.

After:

  • Not possible to configure expired certificates. CONFIG SET returns an error.
  • Clients can still connect if the old certificates are still valid. When they expire, clients can't connect.

CONFIG SET can now return an error where it previously returned OK, but that OK was silencing a severe admin error resulting in clients unable to connect.

To me, it doesn't seem like a breaking change, so I don't think it needs to be a major decision, but @valkey-io/core-team please speak up if you think otherwise.

I (or someone else) will review this PR soon.
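The before/after behavior described above amounts to validate-then-commit: build and check the candidate configuration first, and replace the active one only on success. The C sketch below illustrates that observable behavior. It is hypothetical: tlsSettings, buildAndValidate and applyTlsSettings are not names from the PR, and "building a context" is reduced to a validity-window check. The server log's "Restoring previous settings" suggests the real code restores the old values after a failure rather than staging them, so this sketch is not a claim about the implementation.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, heavily simplified TLS settings: only the validity window
 * of the configured certificate is kept. */
typedef struct {
    time_t not_before;
    time_t not_after;
} tlsSettings;

/* Stand-in for "build a new SSL context from these settings and validate
 * it": here it only rejects expired or not-yet-valid certificates. */
bool buildAndValidate(const tlsSettings *candidate, time_t now) {
    return now >= candidate->not_before && now <= candidate->not_after;
}

/* Validate first, commit second. On failure *active is left untouched, so
 * the existing TLS setup keeps working and the caller can report an error
 * (as CONFIG SET now does) instead of silently returning OK. */
bool applyTlsSettings(tlsSettings *active, const tlsSettings *candidate, time_t now) {
    if (!buildAndValidate(candidate, now)) return false;
    *active = *candidate;
    return true;
}
```

Clients therefore keep connecting with the previous certificate until it expires. This matches the "After" bullets: the admin error now surfaces as a failed CONFIG SET instead of as a connection outage.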

Contributor

@zuiderkwast zuiderkwast left a comment


Looks good to me in general. I have only minor comments.

We use camelCase for function names and snake_case for variables, so please follow this style in new code. (Sorry, the style is not written anywhere. We're working on it.)

Contributor

@zuiderkwast zuiderkwast left a comment


Looks good, thank you!

@madolson do you think the behavior change poses any risk?

@zuiderkwast zuiderkwast added release-notes This issue should get a line item in the release notes to-be-merged Almost ready to merge labels Jan 20, 2026
@PingXie
Member

PingXie commented Jan 20, 2026

The new behavior is better so good from my end.

Btw, @zuiderkwast once we merge #3076 copilot should be able to take care of the coding style review (and we can keep iterating on the instructions).

@zuiderkwast
Contributor

Quoting @PingXie: "Btw, @zuiderkwast once we merge #3076 copilot should be able to take care of the coding style review (and we can keep iterating on the instructions)."

Gotcha. I approved it now. :)

Contributor

Copilot AI left a comment


Pull request overview

This PR adds fail-fast validation for TLS certificates at configuration load time, rejecting certificates that are expired or not yet valid. This applies to server certificates, client certificates, and CA certificates (both file and directory-based), and works during both server startup and runtime CONFIG SET operations.

Changes:

  • Added certificate validity checking functions (isCertValid, loadCaCertDir, areAllCaCertsValid) to validate X509 certificates
  • Enhanced TLS configuration to eagerly load and validate all CA certificates from directories
  • Generated test certificates (expired and not-yet-valid) for comprehensive testing
  • Added integration tests to verify rejection of invalid certificates at startup

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Files:
  • src/tls.c: Implements certificate validation logic with three new functions to check certificate validity, eagerly load CA certificates from directories, and validate all loaded CA certificates
  • utils/gen-test-certs.sh: Extends the certificate generation script to create expired and not-yet-valid certificates for testing (server, client, and CA certificates)
  • tests/unit/tls.tcl: Adds integration tests to verify that invalid certificates are rejected at server startup for all certificate types
  • tests/README.md: Updates documentation to mention generation of invalid test certificates

madolson pushed a commit that referenced this pull request Jan 21, 2026
### Overview
This PR adds support for automatic background TLS reloading, closes #2649.
TLS validity checks and fail-fast behavior on invalid certificates are handled separately in #2999.
- New configuration
  - `tls-auto-reload-interval <seconds>`
  - `0` disabled (default, backward compatible)
  - `>0` check interval in seconds
- TLS materials change detection in background
  - SHA-256 fingerprint checking for certificate files
  - `inode + mtime` checking for CA certificate directories and key files
  - Skips reload if materials haven't changed `tlsCheckMaterialsAndUpdateCache`
- TLS contexts reload
  - CPU-intensive certificate parsing happens in dedicated BIO worker thread `BIO_TLS_RELOAD`
  - Main thread never blocks, atomically swaps SSL contexts
  - Two-phase reload: background preparation `tlsConfigureAsync` + main thread application `tlsApplyPendingReload`

**Note**: Original TLS load and reload still remain in main thread using `tlsConfigureSync`, including:
- Initial TLS load (server startup)
- Runtime reload via CONFIG SET

---------

Signed-off-by: Yang Zhao <zymy701@gmail.com>
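Two ideas from the auto-reload commit message above can be sketched in a few lines of C. The first is detecting a changed CA directory or key file by comparing inode and mtime. The second is the two-phase handoff, where a background thread publishes a prepared context and the main thread swaps it in with a single atomic exchange. Everything here is hypothetical. materialStamp, tlsContext, publishPendingContext and applyPendingContext are illustrative names, and tlsContext stands in for SSL_CTX. SHA-256 fingerprinting is omitted. This shows only the general technique, not how that commit implements it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical change stamp for a CA directory or key file. */
typedef struct {
    ino_t inode;
    time_t mtime_sec;
} materialStamp;

/* Record inode + mtime of `path`; returns false if stat() fails. */
bool readMaterialStamp(const char *path, materialStamp *out) {
    struct stat st;
    if (stat(path, &st) != 0) return false;
    out->inode = st.st_ino;
    out->mtime_sec = st.st_mtime;
    return true;
}

/* A replaced file (new inode) or a touched one (new mtime) counts as changed. */
bool materialChanged(const materialStamp *a, const materialStamp *b) {
    return a->inode != b->inode || a->mtime_sec != b->mtime_sec;
}

typedef struct { int generation; } tlsContext; /* stand-in for SSL_CTX */

/* Single slot through which the background thread hands over a context. */
static _Atomic(tlsContext *) pending_ctx = NULL;

/* Phase 1 (background thread): publish a freshly prepared context. A
 * previously prepared one that the main thread never picked up is freed. */
void publishPendingContext(tlsContext *prepared) {
    tlsContext *stale = atomic_exchange(&pending_ctx, prepared);
    free(stale);
}

/* Phase 2 (main thread): take the pending context, if any, and swap it in.
 * This never blocks; it is one atomic exchange. With OpenSSL the old context
 * would be released with SSL_CTX_free(), which only drops a reference, so
 * connections still using it stay valid. Returns true if a swap happened. */
bool applyPendingContext(tlsContext **active) {
    tlsContext *next = atomic_exchange(&pending_ctx, NULL);
    if (next == NULL) return false;
    free(*active);
    *active = next;
    return true;
}
```

Each pointer leaves the slot through exactly one atomic_exchange, so the worker and the main thread can never both take ownership of the same context.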
@yang-z-o

This comment was marked as off-topic.

arshidkv12 pushed a commit to arshidkv12/valkey that referenced this pull request Jan 23, 2026
@yang-z-o yang-z-o force-pushed the tls-fail-fast-expired-cert branch from e341ea3 to cb20800 on January 24, 2026 02:25
@yang-z-o yang-z-o added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Jan 24, 2026
@github-actions github-actions bot removed the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Jan 24, 2026
@yang-z-o
Contributor Author

Rebased onto the latest unstable to resolve conflicts.
Opened the TLS documentation PR: valkey-io/valkey-doc#402

@zuiderkwast zuiderkwast merged commit eb3f465 into valkey-io:unstable Jan 28, 2026
64 of 65 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Valkey 9.1 Jan 28, 2026
@zuiderkwast zuiderkwast removed the to-be-merged Almost ready to merge label Jan 28, 2026
zuiderkwast pushed a commit to valkey-io/valkey-doc that referenced this pull request Feb 3, 2026
Changes include:
- Unify TLS topic naming: previously some references used “encryption” while others used “tls”
- Remove source code repo specific information: instructions on building or running unit tests
- Add information on new TLS feature and behavior:
   - valkey-io/valkey#2999
   - valkey-io/valkey#3020

---------

Signed-off-by: Yang Zhao <zymy701@gmail.com>

Labels

release-notes This issue should get a line item in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[NEW] Fail fast on invalid certificates at TLS config load

3 participants