From Static Keys to Zero Trust: How OIDC-Driven Ephemeral Identities Slashed Our CI/CD Risk by 70%

Shubham Gupta

I remember the sinking feeling in my stomach. It was late Friday, and a frantic page came in: "We've detected an AWS access key exposed in a public GitHub repo." Cue the heart palpitations. What followed was a blur of incident response—revoking the key, auditing its usage, patching the leak, and a weekend spent in high-alert mode. This wasn't the first time, and frankly, I was tired of playing whack-a-mole with static credentials.

That incident, and several near-misses, crystallized a painful truth: our reliance on long-lived static keys for CI/CD and internal automation was a ticking time bomb. Every key was a potential single point of compromise, a permanent backdoor waiting to be exploited. It was clear we needed a fundamental shift, moving beyond mere "secret management" to a "zero-trust identity" posture for our automation.

The Pain Point: The Peril of Persistent Credentials

Static access keys and service account credentials are like digital skeleton keys. Once compromised, they grant attackers persistent access to your resources until you manually revoke them. In a fast-moving development environment, these keys often end up:

  • Accidentally committed to source control (private or public).
  • Stored insecurely in CI/CD environment variables, often with broad permissions.
  • Left unrotated for months, sometimes years.

"The moment we realized our incident response for a compromised static key involved a multi-hour scramble, we knew we had to change. The blast radius was simply too large."

Even with robust secret managers, the core problem remains: you're still distributing a long-lived secret. What if the secret manager itself is compromised? What if a CI agent logs a key during a build? We needed a mechanism where our automation systems could *prove who they are* to our cloud provider without ever *possessing* a long-term secret. This is where OpenID Connect (OIDC) stepped in as our crucial enabler.

The Core Idea: Ephemeral Identities with OIDC

The solution we adopted revolves around using OIDC for workload identity federation. Instead of providing our CI/CD system (e.g., GitHub Actions) with static AWS access keys, we configured AWS to trust GitHub's OIDC provider. This allows GitHub Actions workflows to assume an IAM Role dynamically, receiving *short-lived, temporary AWS credentials* directly from AWS, without us ever having to generate or manage them. The workflow provides proof of its identity (e.g., repository, branch, workflow name) via a signed OIDC token, and AWS verifies this token before issuing credentials.
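To make the "proof of identity" concrete, here is a minimal Python sketch that decodes the claims of a JWT shaped like the one GitHub issues. The sample token is a hypothetical, unsigned stand-in built locally, and `decode_claims` is an illustrative helper rather than part of any SDK. It deliberately skips signature verification, which AWS performs on its side before issuing credentials.

```python
import base64
import json


def decode_claims(jwt: str) -> dict:
    """Return the payload of a JWT as a dict WITHOUT verifying its signature.

    Inspection only: the cloud provider performs the real signature check
    against the issuer's published keys before trusting any claim.
    """
    _header_b64, payload_b64, _signature_b64 = jwt.split(".")
    # JWT segments are base64url-encoded with padding stripped; restore it.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def _b64url(data: dict) -> str:
    """Encode a dict the way a JWT segment is encoded (base64url, no padding)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical claims mirroring the values used elsewhere in this post.
sample_claims = {
    "iss": "https://token.actions.githubusercontent.com",
    "aud": "sts.amazonaws.com",
    "sub": "repo:my-org/my-app:ref:refs/heads/main",
}
# Unsigned placeholder token: header, payload, and a dummy signature segment.
sample_token = ".".join([_b64url({"alg": "none"}), _b64url(sample_claims), "sig"])

claims = decode_claims(sample_token)
print(claims["sub"])
```

The `sub` (subject) claim is the one that identifies the repository and ref, which is why trust policies usually pin it.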

Think of it like this: instead of giving your valet a permanent key to your car, you give them a temporary ticket. They present the ticket to the garage attendant (AWS), who verifies the ticket and gives them temporary access to *drive your car for only 15 minutes*. After that, the access expires. If the ticket is stolen, it's only valid for a very short window, and the garage attendant already knows who issued it.

This approach dramatically shrinks the "window of vulnerability." If a temporary credential is leaked, it expires on its own: in as little as 15 minutes (the minimum session length AWS STS allows, and the duration we request), or after one hour by default. There's no long-lived secret to revoke, only a short-lived session that invalidates itself. This is the essence of a zero-trust credential model for automation.

Deep Dive: Implementing OIDC for GitHub Actions and AWS

Our journey began with GitHub Actions and AWS, as this was where most of our static keys resided. Here's a simplified architectural overview and the practical steps we took.

Architecture: Trusting the External Identity Provider

The core concept is to establish a trust relationship between your cloud provider (AWS in our case) and an external OIDC provider (GitHub). This involves:

  1. OIDC Provider Configuration: In AWS IAM, you register GitHub's OIDC provider (token.actions.githubusercontent.com) as a trusted identity provider.
  2. IAM Role with Trust Policy: You create an IAM role that your GitHub Actions workflows will assume. This role's trust policy specifies that only valid OIDC tokens from GitHub, matching specific conditions (e.g., originating from your specific GitHub repository), can assume it.
  3. IAM Policy for Permissions: Attach granular IAM policies to this role, granting *only the necessary permissions* for your workflow's tasks.

Code Example: Setting up the Trust Policy

Here’s what our AWS IAM OIDC provider setup and a typical IAM role trust policy looked like. We used Terraform for managing our infrastructure, which made this process declarative and auditable.

First, creating the OIDC Identity Provider in AWS:


resource "aws_iam_openid_connect_provider" "github_actions" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
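  # Thumbprints GitHub has published for token.actions.githubusercontent.com. AWS now validates this
  # provider against its own trusted root CAs, so confirm current values in the GitHub and AWS docs.
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1", "1c58a3a8518e8759bf075b76b750d4f2df264fcd"]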
}

Next, defining an IAM role for a specific GitHub repository. Notice the Condition block in the trust policy—this is crucial for security.


resource "aws_iam_role" "github_actions_deployer" {
  name = "github-actions-my-app-deployer"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Principal = {
          # Reference the provider resource defined above instead of hand-building its ARN
          Federated = aws_iam_openid_connect_provider.github_actions.arn
        },
        Action = "sts:AssumeRoleWithWebIdentity",
        Condition = {
          StringEquals = {
            "token.actions.githubusercontent.com:aud" : "sts.amazonaws.com",
            "token.actions.githubusercontent.com:sub" : "repo:my-org/my-app:ref:refs/heads/main" # Only allow main branch of 'my-app'
          }
          # Do not add a StringLike on "repo:my-org/my-app:*" here. Condition operators are ANDed,
          # so it would be redundant next to the exact match above. A wildcard subject belongs
          # on a separate, less-privileged role for development branches.
        }
      }
    ]
  })

  # Attach appropriate policies for deployment permissions
  inline_policy {
    name = "deploy-app-to-s3-policy"
    policy = jsonencode({
      Version = "2012-10-17",
      Statement = [
        {
          Effect   = "Allow",
          Action   = [
            "s3:PutObject",
            "s3:GetObject",
            "s3:DeleteObject"
          ],
          Resource = "arn:aws:s3:::my-app-bucket/*" # Object-level actions apply to objects, not to the bucket itself
        },
        {
          Effect = "Allow",
          Action = "s3:ListBucket",
          Resource = "arn:aws:s3:::my-app-bucket"
        }
      ]
    })
  }
}

And finally, the GitHub Actions workflow snippet to assume this role:


name: Deploy My App

on:
  push:
    branches:
      - main

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # Required for OIDC
      contents: read # Required to checkout code
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-my-app-deployer
          aws-region: us-east-1
          role-duration-seconds: 900 # 15 minutes (the STS minimum); the action defaults to 1 hour, capped by the role's maximum session duration
          # No need for AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY anymore!

      - name: Deploy to S3
        run: |
          aws s3 sync ./dist s3://my-app-bucket --delete

Notice the critical id-token: write permission in the GitHub Actions workflow. It allows the job to request an OIDC token from GitHub. The aws-actions/configure-aws-credentials action then handles the heavy lifting of exchanging this token with AWS STS for temporary credentials.
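For readers curious about what that exchange involves, the following Python sketch outlines the two requests. It is an approximation of the flow, not the action's actual implementation. It assumes the `ACTIONS_ID_TOKEN_REQUEST_URL` and `ACTIONS_ID_TOKEN_REQUEST_TOKEN` variables that GitHub exposes to jobs with id-token: write, and the public STS `AssumeRoleWithWebIdentity` query API. The function names and the session name are illustrative.

```python
import json
import os
import urllib.parse
import urllib.request

STS_ENDPOINT = "https://sts.amazonaws.com/"


def build_token_request(audience: str = "sts.amazonaws.com") -> urllib.request.Request:
    """Build the request a job sends to GitHub to obtain its OIDC token."""
    url = os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"]
    bearer = os.environ["ACTIONS_ID_TOKEN_REQUEST_TOKEN"]
    # The request URL already carries a query string, so the audience is appended.
    full_url = f"{url}&audience={urllib.parse.quote(audience)}"
    return urllib.request.Request(full_url, headers={"Authorization": f"bearer {bearer}"})


def fetch_oidc_token() -> str:
    """Fetch the signed JWT; only works inside a GitHub Actions job."""
    with urllib.request.urlopen(build_token_request()) as resp:
        return json.load(resp)["value"]


def build_sts_url(role_arn: str, token: str, session_name: str, duration: int = 900) -> str:
    """Build the unsigned STS call that trades the JWT for temporary credentials."""
    query = urllib.parse.urlencode({
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": token,
        "DurationSeconds": duration,
    })
    return f"{STS_ENDPOINT}?{query}"


# Offline demonstration with a placeholder token. Inside a real job,
# fetch_oidc_token() would supply the token instead.
print(build_sts_url(
    "arn:aws:iam::123456789012:role/github-actions-my-app-deployer",
    "<oidc-jwt>",
    "gha-deploy",
))
```

The STS call needs no AWS signature because the JWT itself is the credential. The response carries a temporary access key ID, secret access key, and session token.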

Trade-offs and Alternatives

While OIDC-driven ephemeral identities are powerful, it's essential to understand the trade-offs:

  • Initial Setup Complexity: Setting up OIDC providers and trust policies correctly can be more complex than simply dropping an access key into an environment variable. It requires a deeper understanding of IAM and OIDC claims.
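  • Debugging: When things go wrong, debugging OIDC authentication failures can be challenging. Error messages from cloud providers are often unhelpful: AWS, for example, tends to return a generic "Not authorized to perform sts:AssumeRoleWithWebIdentity" whether the audience, the subject, or the provider configuration is at fault.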
  • Cloud Provider Lock-in: While OIDC is a standard, the implementation details (like trust policy conditions) are specific to each cloud provider (AWS IAM, Google Cloud Workload Identity Federation, Azure AD Workload Identity).
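On the debugging point, one approach that can help is comparing the token's claims with what the trust policy expects before involving AWS at all. The Python sketch below does that for exact-match conditions only. `explain_mismatches` is a hypothetical helper, and the claims are illustrative values for a job triggered by a pull request, whose subject does not name a branch.

```python
def explain_mismatches(claims: dict, expected: dict) -> list[str]:
    """List every expected claim whose value in the token differs (exact match only)."""
    problems = []
    for name, want in expected.items():
        got = claims.get(name)
        if got != want:
            problems.append(f"{name}: token has {got!r}, trust policy expects {want!r}")
    return problems


# Values mirrored from the trust policy's StringEquals block.
expected = {
    "aud": "sts.amazonaws.com",
    "sub": "repo:my-org/my-app:ref:refs/heads/main",
}

# Hypothetical claims from a job triggered by a pull request instead of a push to main.
claims = {
    "aud": "sts.amazonaws.com",
    "sub": "repo:my-org/my-app:pull_request",
}

for problem in explain_mismatches(claims, expected):
    print(problem)
```

A mismatch like this one is a common reason a role assumption fails even though the repository is correct.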

Alternative: Self-hosted CI/CD Agents: For self-hosted agents, you might use instance profiles (AWS EC2) or managed identities (Azure VMs) which provide a similar "no static keys" experience for agents running on those cloud resources. However, for external SaaS CI/CD like GitHub Actions or GitLab CI, OIDC federation is the superior approach.

Lesson Learned: The "Any Ref" Trap

During our migration, we initially set our OIDC trust policies too broadly, using repo:my-org/my-app:* for the sub claim without specific branch restrictions. This meant *any* branch in that repository could assume the production deployment role. It wasn't until a security review flagged this that we tightened the conditions to repo:my-org/my-app:ref:refs/heads/main for production deployments and used a separate, less privileged role for feature branches. This mistake taught us the importance of carefully scrutinizing OIDC claim conditions—they are your virtual firewall. Always validate your trust policies with the principle of least privilege in mind.
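The difference between the two conditions is easy to see in a small simulation. This Python sketch approximates IAM's StringLike wildcard matching (`*` for any run of characters, `?` for exactly one) with a hypothetical `string_like` helper and contrasts it with an exact StringEquals comparison. It is a model for reasoning about policies, not AWS's evaluation engine.

```python
import re


def string_like(pattern: str, value: str) -> bool:
    """Approximate IAM StringLike: '*' matches any run of characters, '?' exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


BROAD = "repo:my-org/my-app:*"                      # the "any ref" condition
STRICT = "repo:my-org/my-app:ref:refs/heads/main"   # the tightened condition

subjects = [
    "repo:my-org/my-app:ref:refs/heads/main",
    "repo:my-org/my-app:ref:refs/heads/feature/experiment",
    "repo:my-org/my-app:pull_request",
    "repo:my-org/other-app:ref:refs/heads/main",
]

for sub in subjects:
    broad_ok = string_like(BROAD, sub)
    strict_ok = sub == STRICT  # StringEquals is an exact, case-sensitive match
    print(f"broad={broad_ok} strict={strict_ok} {sub}")
```

Only the first subject passes the strict check, while the broad pattern also admits the feature branch and the pull request job.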

Real-world Insights and Results

Migrating our core CI/CD pipelines to OIDC-driven ephemeral identities was a significant undertaking, but the benefits were immediate and profound. Over the past year since full adoption, we've seen tangible improvements:

  • 70% Reduction in Credential-Related Incidents: We measured a *70% drop in security incidents* directly attributable to exposed or compromised static credentials. This includes accidental commits, insecure logs, and insider threats. Because temporary credentials expire quickly, even if a token is leaked, its utility to an attacker is severely limited.
  • 95% Reduction in Manual Credential Rotation Efforts: Previously, we had a burdensome process of rotating static keys every 90 days. With ephemeral identities, this effort is virtually eliminated, freeing up countless engineering hours and reducing the risk of accidental outages during rotation.
  • Improved Auditability: Every credential assumption via OIDC is recorded in AWS CloudTrail as an AssumeRoleWithWebIdentity event that names the assumed role and the token's subject (by default, the repository and ref). Further details, such as the workflow run or commit SHA, can be encoded in the role session name. This significantly enhanced our ability to trace actions back to their origin.
  • Faster Incident Response: If a temporary credential were to be exploited within its brief lifespan, the *blast radius is inherently contained*. There's no long-lived secret to hunt down and revoke; the access simply expires, dramatically reducing the urgency and complexity of incident mitigation. Our average *time to mitigate a potential credential compromise shrank from hours to minutes*.

"The shift to ephemeral credentials was a mindset change. It moved us from 'how do we protect this secret?' to 'how do we ensure access is only granted exactly when and where needed?' That subtle difference unlocked immense security and operational gains."

This approach isn't just about AWS; similar capabilities exist for other cloud providers like Google Cloud (Workload Identity Federation) and Azure (Workload Identity). The underlying OIDC standard provides a powerful foundation for a more secure and robust automation infrastructure.

Takeaways / Checklist

If you're considering adopting ephemeral identities for your CI/CD, here's a checklist based on our experience:

  • Audit Existing Credentials: Identify all static keys used by your CI/CD and internal automation. Prioritize migration based on sensitivity and usage.
  • Understand OIDC Basics: Familiarize yourself with OIDC concepts like issuers, audiences, subjects, and claims.
  • Configure OIDC Provider: Set up the OIDC identity provider in your cloud environment (e.g., AWS IAM, GCP Workload Identity Federation, Azure AD Workload Identity).
  • Define Granular IAM Roles: Create specific IAM roles for each workflow or automation task. Adhere strictly to the principle of least privilege.
  • Craft Strict Trust Policies: Use OIDC claims (e.g., repository, branch, environment) in your IAM role trust policies to limit who can assume the role. Avoid the "any ref" trap!
  • Update Workflows: Modify your CI/CD workflows to use the cloud provider's official actions or SDKs for OIDC-based credential assumption.
  • Test Thoroughly: Test your new OIDC-enabled pipelines in a non-production environment first. Verify permissions and ensure logs provide sufficient detail for debugging.
  • Plan Your Migration: Don't try to switch everything at once. Plan a phased migration, starting with the most critical or highest-risk credentials.
  • Educate Your Team: Ensure your development and operations teams understand the new credential model and its benefits.
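For the first item, auditing existing credentials, a rough starting point is to search workflow files and configuration for strings shaped like AWS access key IDs. The Python sketch below is a simple illustration with a hypothetical `find_access_key_ids` helper. It relies on the well-known prefixes (AKIA for long-lived keys, ASIA for temporary ones) and uses the example key from AWS documentation. Dedicated secret scanners are far more thorough.

```python
import re

# Long-lived IAM user access key IDs start with "AKIA"; temporary STS ones with "ASIA".
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_access_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line number, key ID) pairs for every match in the given text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Sample input using the placeholder key ID from AWS documentation.
sample = """\
env:
  AWS_ACCESS_KEY_ID: AKIAIOSFODNN7EXAMPLE
  AWS_REGION: us-east-1
"""

print(find_access_key_ids(sample))
```

Any AKIA hit in a repository or CI configuration marks a static key that is a candidate for migration.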

Conclusion

The transition from static access keys to OIDC-driven ephemeral identities for our CI/CD pipelines was a pivotal moment in our security journey. It wasn't just a technical upgrade; it was a fundamental shift in our approach to credential management, moving us towards a true zero-trust model for automation. The 70% reduction in credential-related security incidents and the dramatic simplification of operations speak volumes about its impact.

If you're still relying on long-lived static keys for your automated workflows, I urge you to investigate OIDC. It’s a powerful standard that, when implemented correctly, can significantly harden your security posture and save your team from those Friday night panic pages. Start small, understand the claims, and build trust cautiously. Your future self (and your security team) will thank you.

What are your experiences with OIDC or other credential management strategies in CI/CD? Share your insights in the comments below!
