From 1ba816e618c1e2dae7230020a234692f120b0593 Mon Sep 17 00:00:00 2001
From: Gladwin Johnson <90415114+gladjohn@users.noreply.github.com>
Date: Wed, 8 Oct 2025 08:43:07 -0700
Subject: [PATCH 1/4] Document caching strategy for Managed Identity v2

Added detailed caching strategy and resilience plan for Managed Identity v2, including problem identification, proposed solutions, call sequence, cache renewal matrix, invalidation rules, and security considerations.
---
 docs/msi_v2/caching_strategy.md | 66 +++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)
 create mode 100644 docs/msi_v2/caching_strategy.md

diff --git a/docs/msi_v2/caching_strategy.md b/docs/msi_v2/caching_strategy.md
new file mode 100644
index 0000000000..1d1ea15244
--- /dev/null
+++ b/docs/msi_v2/caching_strategy.md
@@ -0,0 +1,66 @@
+# Managed Identity v2 (Attested TB) — Resilience & Caching Plan
+
+## TL;DR
+We reduce cold-start latency and dependency risk for MSI v2 by caching safe, long-lived artifacts, coordinating renewal across processes, and keeping the hot path in memory. **MAA is used only to (re)issue the binding certificate**; bound AT acquisition relies on that cert. Result: fewer failures, less churn, smoother CX.
+
+---
+
+## Problem
+- Cold starts/reboots trigger extra external calls (MAA → IMDS → eSTS).
+- OS certificate store I/O can contend under load.
+- Multiple processes may race to re-issue binding certificates.
+- We want resilience to **MAA** issues and predictable **cert renewal**.
+
+---
+
+## Solution (What’s Changing)
+1. **Probe once** (link-local) to detect **MSI v2** → cache result **in-proc**.
+2. Treat the **binding certificate** (from IMDS `/issuecredential`) as the **primary anchor** (~7-day validity); use it to get ATs.
+3. **Proactive renewal at half-life (+ small jitter)** to rotate well before expiry.
+4. **Single-writer coordination** so only one process issues/renews; others reuse the same cert.
+5. **MAA token** is used **only** for issuance/renewal; short-lived cache to prevent attestations calls.
+
+---
+
+## Call Sequence (cold start)
+```
+Call 0 (local): Probe IMDS v2 → cache MSI source (V2/V1)
+1       (local): Create KeyGuard key (per reboot)
+2       (external): Get MAA token  // only for (re)issuing cert
+3       (local): IMDS /issuecredential → binding cert + metadata
+4       (external): eSTS-R → bound AT (mtls_pop/bearer) using client mTLS
+5       (external): Call resource with bound AT + client mTLS
+```
+---
+
+## Cache & Renewal Matrix
+
+| Item | Scope | Where | TTL | Notes |
+|---|---|---|---|---|
+| **MSI v2 probe result** | Per process | In-proc static | Process lifetime | NO changes needed here |
+| **MAA token** | Per **keyHandle** | small file cache | ≤ JWT `exp` (~8h) | Only for cert issuance; evict on reboot/policy change/attest fail; refresh half-life + jitter |
+| **Binding cert + `/issuecredential` metadata** | Per **Managed Identity per user context** | Persisted (Win: `CurrentUser\My`; Linux: protected file/PEM) | ~7 days | Renew at **half-life + jitter**; Serialize issuance |
+| **Access tokens (`bearer` or `mtls_pop`)** | Per audience | In memory | Service-configured | Reacquire after reboot (new key) |
+
+---
+
+## Invalidation Rules
+- **Reboot** → Use **persisted binding cert** to fetch new ATs; re-attest on first demand on service failure.
+- **Cert expiry** → re-issue.
+- **MAA token expired** → re-attest and re-issue.
+
+---
+
+## Security
+- Keys are **non-exportable** in **KeyGuard**; MSAL stores **handles**, not private keys.
+- Persisted items are **user-scoped** and protected (DPAPI on Windows; restricted file perms/OS keyring on Linux).
+
+---
+
+  ## Why This Improves CX
+- **MAA is out of the hot path**—steady-state calls rely on a **multi-day binding cert**.
+- Different identities on the same VM, uses **cached MAA token**
+- **No thundering herd**—single process renews certificate; others reuse.
+- **Predictable renewals**—half-life + jitter prevents synchronized spikes.
+
+---

From 2cae747107a1250b9bdb57337e55b2c8acb1ff18 Mon Sep 17 00:00:00 2001
From: Gladwin Johnson <90415114+gladjohn@users.noreply.github.com>
Date: Sat, 15 Nov 2025 13:07:26 -0800
Subject: [PATCH 2/4] Revise MSI v2 caching strategy for resilience and
 efficiency

Updated the caching strategy for MSI v2 to enhance resilience and reduce cold-start latency. Key changes include improved certificate renewal processes and better caching mechanisms.
---
 docs/msi_v2/caching_strategy.md | 72 ++++++++++++---------------------
 1 file changed, 25 insertions(+), 47 deletions(-)

diff --git a/docs/msi_v2/caching_strategy.md b/docs/msi_v2/caching_strategy.md
index 1d1ea15244..6a06e1256e 100644
--- a/docs/msi_v2/caching_strategy.md
+++ b/docs/msi_v2/caching_strategy.md
@@ -1,66 +1,44 @@
 # Managed Identity v2 (Attested TB) — Resilience & Caching Plan
 
 ## TL;DR
-We reduce cold-start latency and dependency risk for MSI v2 by caching safe, long-lived artifacts, coordinating renewal across processes, and keeping the hot path in memory. **MAA is used only to (re)issue the binding certificate**; bound AT acquisition relies on that cert. Result: fewer failures, less churn, smoother CX.
+
+We reduce cold-start latency and dependency risk for MSI v2 by caching safe, long-lived artifacts, coordinating renewal across processes, and keeping the hot path in memory. **MAA is used only to (re)issue the binding certificate**; bound access tokens (ATs) are obtained using that certificate. If the binding cert or its cache is lost or invalid, we recover by re-attesting and re-issuing. No expirations are hardcoded; we always use the values returned by services.
 
 ---
 
 ## Problem
+
 - Cold starts/reboots trigger extra external calls (MAA → IMDS → eSTS).
-- OS certificate store I/O can contend under load.
+- Accessing persisted certificates can contend under load.
 - Multiple processes may race to re-issue binding certificates.
-- We want resilience to **MAA** issues and predictable **cert renewal**.
+- We want resilience to **MAA** issues and predictable **cert renewal** while:
+  - avoiding thundering herds, and
+  - not hardcoding any lifetimes.
 
 ---
 
 ## Solution (What’s Changing)
-1. **Probe once** (link-local) to detect **MSI v2** → cache result **in-proc**.
-2. Treat the **binding certificate** (from IMDS `/issuecredential`) as the **primary anchor** (~7-day validity); use it to get ATs.
-3. **Proactive renewal at half-life (+ small jitter)** to rotate well before expiry.
-4. **Single-writer coordination** so only one process issues/renews; others reuse the same cert.
-5. **MAA token** is used **only** for issuance/renewal; short-lived cache to prevent attestations calls.
-
----
-
-## Call Sequence (cold start)
-```
-Call 0 (local): Probe IMDS v2 → cache MSI source (V2/V1)
-1       (local): Create KeyGuard key (per reboot)
-2       (external): Get MAA token  // only for (re)issuing cert
-3       (local): IMDS /issuecredential → binding cert + metadata
-4       (external): eSTS-R → bound AT (mtls_pop/bearer) using client mTLS
-5       (external): Call resource with bound AT + client mTLS
-```
----
-
-## Cache & Renewal Matrix
 
-| Item | Scope | Where | TTL | Notes |
-|---|---|---|---|---|
-| **MSI v2 probe result** | Per process | In-proc static | Process lifetime | NO changes needed here |
-| **MAA token** | Per **keyHandle** | small file cache | ≤ JWT `exp` (~8h) | Only for cert issuance; evict on reboot/policy change/attest fail; refresh half-life + jitter |
-| **Binding cert + `/issuecredential` metadata** | Per **Managed Identity per user context** | Persisted (Win: `CurrentUser\My`; Linux: protected file/PEM) | ~7 days | Renew at **half-life + jitter**; Serialize issuance |
-| **Access tokens (`bearer` or `mtls_pop`)** | Per audience | In memory | Service-configured | Reacquire after reboot (new key) |
+1. **Probe IMDS once per process** to detect **MSI v2** and cache that result in memory for the life of the process.
+2. Use the **binding certificate** returned by IMDS `/issuecredential` as the long‑lived credential for bound AT requests:
+   - its lifetime comes from the cert / metadata returned by IMDS (no hardcoded duration).
+3. **Proactively renew** the binding cert and the MAA token when roughly **half** of their lifetime has elapsed, with a **small random offset** per process, and always **well before** their actual expiry.
+4. Use **single‑writer coordination per managed identity (per user context)** on each machine so that only one process issues/renews the binding cert and MAA token; other processes reuse the same artifacts.
+5. Use the **MAA token only** for issuing/renewing the binding certificate:
+   - cache it for up to its JWT `exp`,
+   - refresh it at half‑life + jitter,
+   - and evict it on attestation/policy failures.
 
 ---
 
-## Invalidation Rules
-- **Reboot** → Use **persisted binding cert** to fetch new ATs; re-attest on first demand on service failure.
-- **Cert expiry** → re-issue.
-- **MAA token expired** → re-attest and re-issue.
-
----
-
-## Security
-- Keys are **non-exportable** in **KeyGuard**; MSAL stores **handles**, not private keys.
-- Persisted items are **user-scoped** and protected (DPAPI on Windows; restricted file perms/OS keyring on Linux).
-
----
+## Call Sequence (cold start)
 
-  ## Why This Improves CX
-- **MAA is out of the hot path**—steady-state calls rely on a **multi-day binding cert**.
-- Different identities on the same VM, uses **cached MAA token**
-- **No thundering herd**—single process renews certificate; others reuse.
-- **Predictable renewals**—half-life + jitter prevents synchronized spikes.
+Cold start / first bound call for a given managed identity + user context:
 
----
+```text
+0: Probe IMDS to detect MSI v2 vs v1 and cache the result in the process.
+1: Ensure a KeyGuard key / handle exists for this reboot.
+2: Call MAA to obtain an attestation token (using KeyGuard evidence).
+3: Call IMDS `/issuecredential` with the MAA token → returns binding certificate + metadata.
+4: Call eSTS to request a bound AT (mtls_pop or bearer) using client mTLS with the binding certificate.
+5: Call the resource using the bound AT and client mTLS.

From 7a995e9da60eb7a79953a6113c0d3a0d9f2a04cf Mon Sep 17 00:00:00 2001
From: Gladwin Johnson <90415114+gladjohn@users.noreply.github.com>
Date: Sat, 15 Nov 2025 17:19:44 -0800
Subject: [PATCH 3/4] Update caching_strategy.md

---
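Note on the access-token row of the cache table added in this patch: the row keys tokens by (audience, managed identity, binding-cert thumbprint) and says that rotating the binding cert makes tokens for the old thumbprint "naturally not reused". The Python sketch below illustrates that behavior only. `AtCacheKey`, `InMemoryAtCache`, and the epoch-second timestamps are hypothetical names and choices made for illustration; they are not MSAL types or APIs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class AtCacheKey:
    """Hypothetical cache key mirroring the scope column of the table."""
    audience: str
    managed_identity: str
    cert_thumbprint: str


class InMemoryAtCache:
    """Per-process, in-memory token cache (illustrative only)."""

    def __init__(self) -> None:
        self._tokens: Dict[AtCacheKey, Tuple[str, float]] = {}

    def put(self, key: AtCacheKey, token: str, expires_at: float) -> None:
        # expires_at is the service-provided expiry, in epoch seconds.
        self._tokens[key] = (token, expires_at)

    def get(self, key: AtCacheKey, now: float) -> Optional[str]:
        entry = self._tokens.get(key)
        if entry is None:
            return None          # cache miss: caller reacquires a token
        token, expires_at = entry
        if now >= expires_at:
            del self._tokens[key]  # never reuse a token past its expiry
            return None
        return token
```

Because the thumbprint is part of the key, a lookup made with the rotated certificate's thumbprint misses without any explicit invalidation step, and the caller reacquires a token with the current cert.
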
 docs/msi_v2/caching_strategy.md | 119 ++++++++++++++++++++++++--------
 1 file changed, 90 insertions(+), 29 deletions(-)

diff --git a/docs/msi_v2/caching_strategy.md b/docs/msi_v2/caching_strategy.md
index 6a06e1256e..f881957856 100644
--- a/docs/msi_v2/caching_strategy.md
+++ b/docs/msi_v2/caching_strategy.md
@@ -2,43 +2,104 @@
 
 ## TL;DR
 
-We reduce cold-start latency and dependency risk for MSI v2 by caching safe, long-lived artifacts, coordinating renewal across processes, and keeping the hot path in memory. **MAA is used only to (re)issue the binding certificate**; bound access tokens (ATs) are obtained using that certificate. If the binding cert or its cache is lost or invalid, we recover by re-attesting and re-issuing. No expirations are hardcoded; we always use the values returned by services.
+We reduce cold-start latency and dependency risk for MSI v2 by:
+
+- treating the **binding certificate from IMDS `/issuecredential`** as the long-lived credential for bound ATs,
+- caching safe artifacts (MAA token, binding cert, ATs),
+- renewing at **half-life with jitter**, and
+- using a **single writer per managed identity per user** to avoid thundering herds.
+
+All lifetimes (MAA tokens, binding certs, ATs) come from **MAA / IMDS / eSTS**; nothing is hardcoded.  
+If a cached artifact is missing, invalid, or corrupted, we treat it as a **cache miss** and re-acquire via the normal flow.
 
 ---
 
-## Problem
+## Behavior Summary
 
-- Cold starts/reboots trigger extra external calls (MAA → IMDS → eSTS).
-- Accessing persisted certificates can contend under load.
-- Multiple processes may race to re-issue binding certificates.
-- We want resilience to **MAA** issues and predictable **cert renewal** while:
-  - avoiding thundering herds, and
-  - not hardcoding any lifetimes.
+1. **IMDS probe (per process)**  
+   - On first MSI use in a process, we probe IMDS to detect **MSI v2 vs v1**.  
+   - The result is cached **in that process only** (no cross-process state).
 
----
+2. **Binding cert as the long-lived credential**  
+   - IMDS `/issuecredential` returns a **binding certificate + metadata**.  
+   - This cert is the **credential we use to get bound ATs** (mtls_pop/bearer).  
+   - Its validity window comes from IMDS (e.g., cert `notBefore` / `notAfter`); we do **not** assume “7 days” or any fixed value.
 
-## Solution (What’s Changing)
+3. **Renewal timing (half‑life + jitter)**
 
-1. **Probe IMDS once per process** to detect **MSI v2** and cache that result in memory for the life of the process.
-2. Use the **binding certificate** returned by IMDS `/issuecredential` as the long‑lived credential for bound AT requests:
-   - its lifetime comes from the cert / metadata returned by IMDS (no hardcoded duration).
-3. **Proactively renew** the binding cert and the MAA token when roughly **half** of their lifetime has elapsed, with a **small random offset** per process, and always **well before** their actual expiry.
-4. Use **single‑writer coordination per managed identity (per user context)** on each machine so that only one process issues/renews the binding cert and MAA token; other processes reuse the same artifacts.
-5. Use the **MAA token only** for issuing/renewing the binding certificate:
-   - cache it for up to its JWT `exp`,
-   - refresh it at half‑life + jitter,
-   - and evict it on attestation/policy failures.
+For any artifact whose expiry comes from the service (MAA, IMDS, eSTS), we:
 
----
+- treat the time between “when we obtained it” and “when it expires” as its **lifetime**;
+- plan to renew it **around halfway through that lifetime** (half‑life);
+- add a small, per‑process **random jitter** around that halfway point so different processes don’t all renew at the same time; and
+- always enforce a small safety buffer so renewal completes **before** expiry (for example, at least a few minutes before in the worst case).
+
+**Binding certificate vs. others**
+
+- For the **binding certificate**, we additionally guarantee that it is rotated **at least 24 hours before the certificate’s expiry time**. 
+- Other artifacts (MAA token, access tokens) simply follow the **half‑life + jitter** rule with the normal safety buffer.
+
+
+4. **Caches and how they are shared**
+
+- **MAA token (file cache, shared across processes)**  
+  - The MAA token is stored in a small per‑user file cache so that all MSAL processes for that user on the same machine can reuse it.  
+  - Access to this cache is coordinated so that only one process at a time writes or refreshes the token; other processes read the latest complete value from the file.
+
+- **Binding certificate (persisted in certificate store)**  
+  - The binding certificate returned by IMDS `/issuecredential` is persisted in the OS certificate store, scoped per user and per managed identity.  
+  - When the certificate is renewed, updates to the store entry are coordinated so that only one process at a time replaces it; other processes continue to read the stored certificate.
+
+- **Access tokens (in‑memory MSAL cache)**  
+  - Access tokens remain in MSAL’s existing in‑memory cache, scoped to a single process.  
+  - There is no new cross‑process sharing for ATs: each process uses its own in‑memory cache and reacquires bound ATs as needed using the shared binding certificate.
 
-## Call Sequence (cold start)
 
-Cold start / first bound call for a given managed identity + user context:
+5. **Cache summary**
+
+   | Item | Scope | Stored as | TTL source | Behavior |
+   |---|---|---|---|---|
+   | **MSI v2 probe result** | Per process | In-memory | Process lifetime | First MSI call in a process probes IMDS and caches v2/v1/none. If the probe fails, that process falls back to MSI v1 behavior. New processes probe again. |
+   | **MAA token** | Per key / identity context | Per-user file cache (shared across processes) | JWT `exp` from MAA | Used **only** for `/issuecredential`. Stored in a small per-user file cache so all MSAL processes for that user on the same machine can reuse it. When it needs to be refreshed, processes coordinate so that only one process updates the file; others read the latest complete value. Renewed at half-life with per-process jitter (always before `exp`). If missing, expired, invalid, or attestation/policy/key errors occur, we discard and get a new token next time. |
+   | **Binding cert + `/issuecredential` metadata** | Per managed identity per user | User certificate store (plus metadata) | Cert / metadata from IMDS | Long-lived credential for bound ATs. Persisted in the user’s certificate store so all processes for that user can read the same cert. The cert is renewed at roughly half-life with per-process jitter, but in all cases rotation completes **at least 24 hours before the certificate’s expiry** (where lifetime allows). When renewal happens, only one process at a time updates the stored certificate and metadata; others continue to read the existing entry. If the cert or metadata is missing, invalid, or rejected by IMDS/eSTS (expired, not yet valid, binding mismatch, etc.), we discard it and re-issue via MAA → `/issuecredential`. |
+   | **Access tokens (bearer / mtls_pop)** | Per (audience, managed identity, binding-cert thumbprint) | In-memory per process | `exp` from eSTS | Regular MSAL token cache, unchanged by this design. Tokens are cached per process in memory. Never reused past `exp`. On 401/403 or invalid token errors, we drop the token and reacquire with the **current** binding cert. Rotating the binding cert changes the thumbprint, so tokens for the old thumbprint are naturally not reused. |
+
+6. **Failure & recovery**
+
+   - **Lost / deleted cache files** (MAA token or binding cert metadata):  
+     - treated as a cache miss → we obtain a new MAA token and/or re‑issue the binding cert on the next call, with only one process updating the shared cache or cert store entry at a time.
+   - **Corrupted or invalid entries** (cannot parse, cert not usable, token fails validation):  
+     - treated as a cache miss → we discard the bad entry and re-acquire using the normal MAA → IMDS → eSTS flow.
+   - **MAA policy / key rotation**:  
+     - we don’t poll for changes; we infer them from MAA/IMDS/eSTS errors that clearly indicate attestation/policy/key issues;  
+     - on such errors we drop the affected MAA token (and binding cert if needed) and perform a **fresh attestation** on next demand.
+   - **Reboot**:  
+     - we try the persisted binding cert first; if it is valid and accepted by eSTS, we reuse it and reacquire ATs;  
+     - if it fails locally or at eSTS, we treat it as invalid and re-run MAA → `/issuecredential` to get a new cert.
+
+7. **Retries**
+
+   - **MAA**  
+     - Calls go through **MAA Native**, which implements its own retry and backoff.  
+     - MSAL does **not** control the per-call retry policy for MAA and does not add an extra retry layer on top. We apply the cache invalidation rules above only to the final outcome of an MAA call, after MAA Native has finished its own retries.
+   - **IMDS and eSTS**  
+     - Use the existing MSAL HTTP retry pipeline (bounded retries, exponential backoff, jitter) for transient failures (network, certain 5xx/429, etc.).  
+     - No retries for permanent 4xx that indicate bad input or policy violations.  
+     - If all retries fail, we surface the error and do not overwrite previously valid cache entries.
+
+8. **Security & isolation (high level)**  
+
+   - Private keys stay in the platform key store (e.g., KeyGuard); MSAL only deals with **handles/evidence**, not raw keys.  
+   - Persisted artifacts (MAA tokens, binding certs, metadata) are:
+     - scoped to the **current user** and **managed identity**, and
+     - stored in per-user secure locations with restricted permissions.  
+   - Deleting these artifacts is safe; it just forces a clean re-attestation and re-issuance on next use.
+
+---
+
+## Why This Improves CX
 
-```text
-0: Probe IMDS to detect MSI v2 vs v1 and cache the result in the process.
-1: Ensure a KeyGuard key / handle exists for this reboot.
-2: Call MAA to obtain an attestation token (using KeyGuard evidence).
-3: Call IMDS `/issuecredential` with the MAA token → returns binding certificate + metadata.
-4: Call eSTS to request a bound AT (mtls_pop or bearer) using client mTLS with the binding certificate.
-5: Call the resource using the bound AT and client mTLS.
+- **MAA is out of the hot path**: steady-state uses cached binding certs and ATs; MAA is only needed to (re)issue certs.
+- **No thundering herd**: renew at half-life with per-process jitter, and shared caches (file for MAA token, cert store for binding cert) ensure that only one process refreshes them at a time while others reuse the result.
+- **Predictable behavior**: missing/corrupt/expired artifacts always behave like cache misses with a well-defined recovery path.
+- **No hidden hardcoded lifetimes**: we always use the lifetimes returned by MAA, IMDS, and eSTS; the only additional rule is that binding certs are rotated at least 24 hours before their expiry.

From d90e6af35bf47694eedddcb79b3d48075c48742c Mon Sep 17 00:00:00 2001
From: Gladwin Johnson <90415114+gladjohn@users.noreply.github.com>
Date: Sat, 15 Nov 2025 17:49:20 -0800
Subject: [PATCH 4/4] Update caching_strategy.md

---
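Note on the renewal rules this patch makes concrete (half-life, an offset between -5 and +5 minutes, a clamp of at least 5 minutes before expiry, and at least 24 hours before expiry for the binding certificate): a minimal Python sketch of that schedule follows. The function and constant names are hypothetical, and the handling of a lifetime shorter than the buffer (renew immediately) is an assumption of this sketch, not something the document specifies.

```python
import random
from datetime import datetime, timedelta, timezone

JITTER = timedelta(minutes=5)          # per-process offset range: -5 to +5 minutes
MIN_BUFFER = timedelta(minutes=5)      # renew at least 5 minutes before expiry
CERT_MIN_BUFFER = timedelta(hours=24)  # binding cert: at least 24 hours before expiry


def renewal_time(obtained_at, expires_at, is_binding_cert=False, rng=random):
    """Return the scheduled renewal time for one artifact in one process."""
    lifetime = expires_at - obtained_at
    half_life = obtained_at + lifetime / 2
    # Random offset around the half-life point, drawn once per process.
    span = JITTER.total_seconds()
    scheduled = half_life + timedelta(seconds=rng.uniform(-span, span))
    # Clamp so that renewal never lands later than the safety buffer allows.
    buffer = CERT_MIN_BUFFER if is_binding_cert else MIN_BUFFER
    latest_allowed = expires_at - buffer
    # Assumption: if the lifetime is shorter than the buffer, latest_allowed
    # precedes obtained_at and the artifact is due for renewal immediately.
    return min(scheduled, latest_allowed)


def should_renew(now, scheduled):
    """Front-end trigger: the first caller that observes this refreshes."""
    return now >= scheduled
```

For example, an 8-hour token obtained at `t0` is scheduled somewhere between `t0 + 3h55m` and `t0 + 4h05m`, while a binding cert with a 30-hour lifetime is pulled forward from its 15-hour midpoint to `t0 + 6h` by the 24-hour rule.
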
 docs/msi_v2/caching_strategy.md | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)
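Note on the "atomic from the reader's point of view" wording added to the MAA token row: one common way to get old-or-new semantics is to write a temporary file in the same directory and rename it over the target. The Python sketch below shows that pattern together with a reader that treats any unreadable or expired entry as a cache miss. The JSON layout, the function names, and the use of rename are assumptions made for illustration; the document does not say how MSAL implements this, and the single-writer coordination between processes is deliberately left out.

```python
import json
import os
import tempfile


def write_token_atomically(path, token, expires_at):
    """Write the cache file so readers see the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".maa-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump({"token": token, "exp": expires_at}, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Rename within the same directory replaces the target in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave no partial temp file behind; the old cache file is untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def read_token(path, now):
    """Return the cached token, or None on any miss (absent, corrupt, expired)."""
    try:
        with open(path, encoding="utf-8") as f:
            entry = json.load(f)
        token, exp = entry["token"], entry["exp"]
    except (OSError, ValueError, KeyError, TypeError):
        return None
    if not isinstance(token, str) or not isinstance(exp, (int, float)):
        return None
    if now >= exp:
        return None
    return token
```

A `None` result from `read_token` corresponds to the cache-miss path in the table: the caller discards nothing further and simply reacquires a fresh token.
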

diff --git a/docs/msi_v2/caching_strategy.md b/docs/msi_v2/caching_strategy.md
index f881957856..1d2a792cff 100644
--- a/docs/msi_v2/caching_strategy.md
+++ b/docs/msi_v2/caching_strategy.md
@@ -30,9 +30,15 @@ If a cached artifact is missing, invalid, or corrupted, we treat it as a **cache
 For any artifact whose expiry comes from the service (MAA, IMDS, eSTS), we:
 
 - treat the time between “when we obtained it” and “when it expires” as its **lifetime**;
-- plan to renew it **around halfway through that lifetime** (half‑life);
-- add a small, per‑process **random jitter** around that halfway point so different processes don’t all renew at the same time; and
-- always enforce a small safety buffer so renewal completes **before** expiry (for example, at least a few minutes before in the worst case).
+- schedule renewal at **half‑life** (the midpoint of that lifetime); and
+- add a small **random jitter** so different processes don’t all renew at the same instant.
+
+Concretely:
+
+- For each artifact, **each process** picks a random offset in the range **–5 minutes to +5 minutes** around the half‑life point.
+- We always clamp the final renewal time so that it is **at least 5 minutes before expiry**.
+- For the **binding certificate**, we also guarantee that renewal happens **no later than 24 hours before the cert expires**; if half‑life + jitter would land later than that, we move renewal earlier to stay ≥ 24 hours before expiry.
+- Renewal is triggered on demand, on the calling path (the **front‑end**): the first caller that sees “now ≥ scheduled renewal time” performs the refresh; other callers keep using the last valid value until the update completes.
 
 **Binding certificate vs. others**
 
@@ -60,7 +66,7 @@ For any artifact whose expiry comes from the service (MAA, IMDS, eSTS), we:
    | Item | Scope | Stored as | TTL source | Behavior |
    |---|---|---|---|---|
    | **MSI v2 probe result** | Per process | In-memory | Process lifetime | First MSI call in a process probes IMDS and caches v2/v1/none. If the probe fails, that process falls back to MSI v1 behavior. New processes probe again. |
-   | **MAA token** | Per key / identity context | Per-user file cache (shared across processes) | JWT `exp` from MAA | Used **only** for `/issuecredential`. Stored in a small per-user file cache so all MSAL processes for that user on the same machine can reuse it. When it needs to be refreshed, processes coordinate so that only one process updates the file; others read the latest complete value. Renewed at half-life with per-process jitter (always before `exp`). If missing, expired, invalid, or attestation/policy/key errors occur, we discard and get a new token next time. |
+   | **MAA token (Windows only)** | Per key / identity context | Per-user file cache (shared across processes) | JWT `exp` from MAA | Used **only** for `/issuecredential`. Stored in a small per-user file so all MSAL processes for that user on the same machine can reuse it. When it needs to be refreshed, processes coordinate so that **only one process at a time** updates the file; others read the latest complete value. File updates are **atomic from the reader’s point of view**: a reader sees either the old token or the new token, never a partially written one. If a write fails and the file cannot be parsed or validated, we treat it as a cache miss and reacquire a fresh token. Renewed at half-life with per-process jitter (always before `exp`). If missing, expired, invalid, or attestation/policy/key errors occur, we discard and get a new token next time. |
    | **Binding cert + `/issuecredential` metadata** | Per managed identity per user | User certificate store (plus metadata) | Cert / metadata from IMDS | Long-lived credential for bound ATs. Persisted in the user’s certificate store so all processes for that user can read the same cert. The cert is renewed at roughly half-life with per-process jitter, but in all cases rotation completes **at least 24 hours before the certificate’s expiry** (where lifetime allows). When renewal happens, only one process at a time updates the stored certificate and metadata; others continue to read the existing entry. If the cert or metadata is missing, invalid, or rejected by IMDS/eSTS (expired, not yet valid, binding mismatch, etc.), we discard it and re-issue via MAA → `/issuecredential`. |
    | **Access tokens (bearer / mtls_pop)** | Per (audience, managed identity, binding-cert thumbprint) | In-memory per process | `exp` from eSTS | Regular MSAL token cache, unchanged by this design. Tokens are cached per process in memory. Never reused past `exp`. On 401/403 or invalid token errors, we drop the token and reacquire with the **current** binding cert. Rotating the binding cert changes the thumbprint, so tokens for the old thumbprint are naturally not reused. |
 
@@ -76,6 +82,9 @@ For any artifact whose expiry comes from the service (MAA, IMDS, eSTS), we:
    - **Reboot**:  
      - we try the persisted binding cert first; if it is valid and accepted by eSTS, we reuse it and reacquire ATs;  
      - if it fails locally or at eSTS, we treat it as invalid and re-run MAA → `/issuecredential` to get a new cert.
+   - **Linux binding-cert files (corruption / deletion / access)**:  
+     - On Linux, the binding certificate and its metadata are stored as files in a per-user directory with restricted permissions (for example, only that user can read/write). We rely on the OS to prevent other users on the machine from accessing or tampering with these files.  
+     - If the file is deleted, truncated, or corrupted outside of MSAL, the next read will fail parsing or validation. We treat that as a cache miss: we discard any unusable data and recover by re-issuing the binding certificate via the normal IMDS flow.  
 
 7. **Retries**