
Learn how to implement Just-In-Time, ephemeral developer access to production resources. Our team cut standing administrative privileges by 80% and sped up compliance audits with this practical guide.
TL;DR: Persistent access to production environments is a security nightmare and a compliance headache. In my experience, shifting to a Just-In-Time (JIT) access model with ephemeral credentials dramatically reduces your attack surface and simplifies auditing. Our team implemented a JIT system, primarily leveraging Teleport, and saw a measurable 80% reduction in standing administrative privileges across our critical production systems, alongside a 50% faster audit cycle time. This guide shares our journey, architecture, and practical insights for building a similar system.
Introduction: The Midnight Call and the Persistent Problem
I still remember the feeling of dread when my pager went off at 3 AM. A critical vulnerability had been reported in a third-party library, and we needed immediate, surgical access to a production database to patch it and verify data integrity. The problem wasn't the urgency; it was the existing access model. To even begin, I had to find someone with standing SSH access, then someone else with standing database credentials, and then hope those credentials were still valid and hadn't been shared or compromised. The incident was resolved, but the process highlighted a glaring, systemic risk: our developers, even senior ones, often held persistent, high-privilege access to production environments they didn't constantly need.
This wasn't born of negligence but necessity. In a rapidly scaling startup, granting broad, standing access was the path of least resistance. It made day-to-day operations easier, or so we thought. But the mental overhead of tracking who had what, the anxiety before every compliance audit, and the sheer blast radius of a compromised laptop kept me up more often than any incident itself. We needed a better way to manage privileged access without becoming a bottleneck for our engineering teams.
The Pain Point / Why It Matters: The Silent Killers of Security and Productivity
The problem of persistent access isn't just a hypothetical security risk; it's a very real operational and compliance burden. Let me break down the specific pain points we faced:
- Massive Attack Surface: Every developer with standing access to production represents a potential entry point for attackers. A compromised workstation, a stolen credential, or an insider threat could immediately lead to a breach. The more standing access, the larger the target.
- Audit Nightmares: Proving "least privilege" and demonstrating "need-to-know" access during compliance audits (SOC2, ISO27001, HIPAA) was always a scramble. We spent weeks before each audit trying to reconcile access logs, justify permissions, and often remove access retroactively, only for it to creep back.
- Credential Sprawl: SSH keys, database passwords, cloud console roles – they proliferate. Managing rotations, ensuring strong unique credentials, and dealing with forgotten secrets became a full-time job for our security team. Our previous efforts to master dynamic secret management for microservices had helped significantly for service-to-service communication, but developer access remained a gaping hole.
- Developer Friction (Paradoxically): While standing access *seemed* to make things easy, developers often had to juggle multiple sets of credentials, remember different bastion hosts, or navigate complex VPNs. When they *did* need elevated access, the process was often ad-hoc and inconsistent, leading to frustration and delays. This also hindered our progress towards building a truly hyper-productive internal developer platform.
- Lack of Granularity: Granting access typically meant giving broad permissions. "Admin" on a Kubernetes cluster, "read/write" on a database. Rarely could we fine-tune permissions to the specific task at hand.
We realized that our existing model wasn't sustainable. It was a ticking time bomb for security and a drain on engineering resources. The goal became clear: eliminate standing privileges entirely, or at least for high-risk operations, and replace them with a system that provides access *only when needed*, *for precisely the duration needed*, and *with the least privilege necessary*.
The Core Idea or Solution: Embracing Just-In-Time and Ephemeral Access
The solution we gravitated towards was a paradigm shift: Just-In-Time (JIT) and ephemeral access. Instead of developers holding persistent credentials, they would request access only when they needed it, and that access would be automatically provisioned for a short, predefined duration, then automatically revoked. This concept aligns perfectly with Zero-Trust identity principles, where trust is never assumed and access is granted on a per-request basis.
Here's how JIT access fundamentally changes the security posture:
- Minimal Attack Surface: With no standing privileges, there's nothing for an attacker to steal ahead of time. Credentials simply don't exist until requested and are short-lived.
- Enhanced Auditability: Every access request, approval, and revocation is logged. We get a clear, auditable trail of who accessed what, when, and for how long.
- Enforced Least Privilege: Access requests can be tied to specific roles and resources, ensuring developers only get the permissions they absolutely need for their task.
- Automated Lifecycle: Access is automatically granted and revoked, removing manual intervention and the potential for human error.
- Improved Developer Experience (Eventually): Once integrated, developers interact with a single, unified system for all privileged access requests, simplifying their workflow after an initial learning curve.
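To make the lifecycle concrete, here is a minimal Python sketch of a time-boxed grant. It is an illustration of the idea only, not Teleport's implementation; the `Grant` class, its fields, and the user name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """A hypothetical time-boxed access grant (not a Teleport object)."""
    user: str
    role: str
    issued_at: datetime
    ttl: timedelta

    @property
    def expires_at(self) -> datetime:
        return self.issued_at + self.ttl

    def is_active(self, now: datetime) -> bool:
        # Access lapses on its own once the TTL elapses; nothing has to be revoked.
        return self.issued_at <= now < self.expires_at


issued = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
grant = Grant("alice", "developer-read-only-jit", issued, timedelta(hours=1))

print(grant.is_active(issued + timedelta(minutes=30)))  # True
print(grant.is_active(issued + timedelta(hours=1)))     # False
```

The point of the model is that expiry is a property of the credential itself, so there is no cleanup job that can be forgotten.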
After evaluating several options, we decided to implement Teleport as our primary JIT access solution. Teleport provides a unified access plane for SSH, Kubernetes, Databases, Web Applications, and Windows desktops. Its certificate-based authentication, built-in auditing, and robust role-based access control (RBAC) were key factors in our decision. It acts as a critical component of our internal developer platform, streamlining secure access.
Deep Dive, Architecture and Code Example: Building Our Teleport-Powered Access Plane
Our architecture for JIT access centers around a central Teleport cluster. Here’s a simplified breakdown:
Architecture Overview
- Teleport Proxy/Auth Server: This is the brain of the operation, deployed in a highly available setup (e.g., Kubernetes or VMs). It handles all authentication (integrating with our IdP like Okta or Google Workspace), authorization, and acts as the entry point for all client connections.
- Teleport Nodes: These are lightweight agents (the teleport service, as opposed to the tsh client that developers run locally) running on the target resources (SSH servers, Kubernetes clusters, database proxies, web applications). They establish outbound, mutually authenticated TLS connections to the Teleport cluster, eliminating the need for open inbound firewall ports.
- Identity Provider (IdP): We integrated Teleport with our existing IdP (Okta) for user authentication. This means developers use their familiar corporate credentials.
- Policy Engine (Optional but Recommended): For more advanced authorization logic, especially around dynamic request approvals, we explored integrating with Open Policy Agent (OPA). While Teleport's internal RBAC is powerful, OPA offers externalized, declarative policy management. This helps us ensure centralized, dynamic authorization for microservices.
- Audit Log Sink: All access events (login, session start, command execution, file transfer) are streamed to our SIEM for centralized logging and analysis.
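As a rough illustration of what "streamed to our SIEM" can look like, the sketch below serializes one access event as a JSON line. The field names and the `audit_event` helper are assumptions made for this example; they are not Teleport's actual audit schema.

```python
import json
from datetime import datetime, timezone


def audit_event(event_type: str, user: str, resource: str,
                when: datetime, **details: str) -> str:
    """Serialize one access event as a JSON line (hypothetical schema)."""
    record = {
        "time": when.astimezone(timezone.utc).isoformat(),
        "event": event_type,
        "user": user,
        "resource": resource,
        **details,
    }
    # Sorted keys keep the output stable, which helps when diffing log lines.
    return json.dumps(record, sort_keys=True)


line = audit_event(
    "session.start", "alice", "prod-web-01",
    when=datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc),
    role="developer-read-only-jit",
    reason="Investigating dashboard latency",
)
print(line)
```

One self-describing JSON object per line is a format most log pipelines can ingest without custom parsing.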
Key Implementation Details & Code Examples
1. Defining Ephemeral Roles
The core of JIT access lies in defining roles with short Time-To-Live (TTL) certificates. Here’s an example of a Teleport role for a developer who needs temporary read-only access to a specific Kubernetes namespace and a database:
kind: role
version: v5
metadata:
  name: developer-read-only-jit
spec:
  allow:
    logins: ["devops"] # Unix logins allowed on SSH hosts
    # kubernetes_labels match clusters registered with Teleport. Scoping to
    # web-app-namespace comes from the Kubernetes RBAC bindings of the
    # groups below, not from these labels.
    kubernetes_labels:
      "teleport.dev/cluster": "production"
    kubernetes_groups: ["system:authenticated"] # K8s groups for general access
    db_labels:
      "teleport.dev/cluster": "production"
      "env": "production"
      "db-type": "postgres"
    db_names: ["web_app_db"] # Specific database name
    db_users: ["readonly_user"] # Database user
    node_labels:
      "*": "*" # Allow access to any SSH node in the cluster
    # Web App access
    app_labels:
      "teleport.dev/cluster": "production"
      "app": "internal-dashboard"
  deny: {} # No explicit deny rules for this read-only role
  options:
    max_session_ttl: 1h # Maximum session duration for this role
    # Optional: require MFA for every session started with this role
    require_session_mfa: true
    # Optional: require session recording (strict refuses sessions that cannot be recorded)
    record_session:
      default: strict
Notice the max_session_ttl: 1h. This is crucial. Certificates issued for this role are valid for at most one hour. Once they expire, access simply lapses, with no revocation step that someone could forget.
2. Requesting Access
Developers use the tsh request create command to request a specific JIT role:
# Login using your IdP credentials (e.g., Okta)
tsh login --proxy=teleport.yourcompany.com
# Request the developer-read-only-jit role
tsh request create --roles=developer-read-only-jit \
  --reason="Investigating dashboard latency in web-app-namespace" \
  --session-ttl=30m # Ask for 30 minutes, less than max_session_ttl
The --reason and --session-ttl flags are key for auditability and enforcing a "just enough" time principle. Flag names have changed between Teleport releases, so check tsh request create --help for your version. The request then goes into a pending state.
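The relationship between the requested duration and the role's max_session_ttl can be sketched as a simple clamp. This is a hedged model of the "shorter value wins" idea, not a description of how Teleport computes certificate lifetimes internally; `effective_ttl` is a hypothetical helper.

```python
from datetime import timedelta


def effective_ttl(requested: timedelta, role_max: timedelta) -> timedelta:
    """Return the lifetime a grant would get: never longer than the role allows."""
    if requested <= timedelta(0):
        raise ValueError("requested duration must be positive")
    return min(requested, role_max)


ROLE_MAX = timedelta(hours=1)  # mirrors max_session_ttl: 1h in the role above

print(effective_ttl(timedelta(minutes=30), ROLE_MAX))  # 0:30:00
print(effective_ttl(timedelta(hours=4), ROLE_MAX))     # 1:00:00
```

Asking for less than the maximum is always honored, while asking for more is silently capped, so the role definition stays the upper bound.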
3. Approval Workflow
For sensitive roles, we implemented an approval workflow. When a developer requests a privileged JIT role, it triggers a notification (e.g., to Slack or PagerDuty) to a designated approver group. Approvers can then review the request (reason, duration, requested resources) and approve or deny it.
# As an approver: List pending requests
tsh request ls
# Approve a request (assuming request ID is 'abcdef123')
tsh request review --approve --reason "Looks reasonable, proceed." abcdef123
# Deny a request
tsh request review --deny --reason "Access not required for this task." abcdef123
Once approved, the developer's client certificate is issued with the requested permissions and TTL. They can then access resources:
# SSH to a production server (the role only permits the "devops" login, not root)
tsh ssh devops@prod-web-01
# Access Kubernetes
tsh kube login production
kubectl get pods -n web-app-namespace
# Connect to the PostgreSQL database (assumes the Teleport database resource is also named web_app_db)
tsh db connect web_app_db --db-user=readonly_user --db-name=web_app_db
All these connections are proxied through Teleport, encrypted, and recorded, providing a complete audit trail.
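The request and approval flow described in this section is essentially a small state machine. The sketch below models it in Python under stated assumptions: the `AccessRequest` class, its states, and the "no self-approval" rule are illustrative choices for this example, not Teleport's data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class AccessRequest:
    """A hypothetical access request moving from pending to a final state."""
    request_id: str
    requester: str
    role: str
    reason: str
    state: State = State.PENDING
    reviews: list = field(default_factory=list)

    def review(self, reviewer: str, approve: bool, reason: str) -> None:
        # A request can only be decided once.
        if self.state is not State.PENDING:
            raise ValueError(f"request {self.request_id} is already {self.state.value}")
        # Assumed policy for this sketch: nobody approves their own request.
        if reviewer == self.requester:
            raise PermissionError("requesters cannot review their own request")
        self.state = State.APPROVED if approve else State.DENIED
        # Keep who decided what, and why, for the audit trail.
        self.reviews.append((reviewer, self.state.value, reason))


req = AccessRequest("abcdef123", "alice", "developer-read-only-jit",
                    "Investigating dashboard latency in web-app-namespace")
req.review("bob", approve=True, reason="Looks reasonable, proceed.")
print(req.state.value)  # approved
```

Recording the reviewer and reason alongside the decision is what makes each grant explainable later during an audit.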
4. Automation & Integration
We've also begun integrating JIT access into our self-service developer platform using Backstage. Developers can request access directly from a service catalog page, triggering the Teleport workflow in the background. This greatly reduces friction once the system is mature.
"A critical insight we gained was that developer experience is paramount for security adoption. If requesting JIT access is harder than finding a shared password, developers will find workarounds. Investing in a smooth, integrated workflow is not a luxury, but a necessity for security compliance."
Trade-offs and Alternatives: The Path Less Traveled
While JIT access and Teleport have been transformative for us, they are not without trade-offs. It's important to understand these when considering your own implementation.
Complexity and Adoption Curve
- Initial Setup Overhead: Deploying and configuring Teleport (or any similar JIT system) involves significant upfront work. Integrating with your IdP, defining roles, setting up proxies, and configuring agents on all target resources takes time and expertise.
- Developer Training: Developers need to learn a new way of accessing systems. While tsh is intuitive, moving from static SSH keys to certificate-based, temporary access requires a mindset shift. Expect some initial friction and invest heavily in documentation and training.
Alternatives Considered
Before settling on Teleport, we evaluated a few other approaches:
- Cloud-Native IAM Features: For environments heavily invested in a single cloud provider (e.g., AWS), native IAM capabilities can provide a robust foundation. AWS IAM, for instance, allows for assuming roles with session durations and using session policies for fine-grained permissions. This can be combined with custom Lambda functions for approval workflows.
"While powerful for a single cloud, managing JIT access across multi-cloud or hybrid environments with native IAM becomes incredibly complex, often requiring custom scripting and federation nightmares. Teleport's unified approach was a significant advantage for our polyglot infrastructure."
- HashiCorp Vault: Vault is excellent for dynamic secret generation and, in conjunction with its SSH secrets engine and dynamic database credentials, can facilitate a form of JIT access. We already use Vault for managing application secrets. However, Vault's focus is primarily on secrets, and building a full "unified access plane" for SSH, Kubernetes, and web apps requires more orchestration and potentially other tools. For purely dynamic database credentials, Vault remains a strong choice.
- Boundary by HashiCorp: Similar to Teleport, Boundary focuses on secure remote access management. It's an open-source alternative that provides session recording and proxying. At the time of our evaluation, Teleport's Kubernetes and database access integrations felt more mature and tightly coupled with its certificate-based identity.
- Custom Solutions: Some teams opt to build their own systems using SSH certificate authorities, custom web UIs for requests, and scripting for temporary IAM policy changes. This offers ultimate flexibility but incurs a very high maintenance cost and the burden of securing a custom-built privileged access management (PAM) system. Unless you have extremely unique requirements and ample security engineering resources, I strongly advise against this.
Ultimately, Teleport's out-of-the-box support for multiple resource types, its robust security model, and its focus on developer experience made it the strongest fit for our needs, offering the most comprehensive solution with the least amount of custom integration work.
Real-world Insights or Results: Quantifying the Security Win
Implementing JIT access wasn't a silver bullet overnight, but the long-term benefits have been profound. Our journey to a truly ephemeral access model took about six months from initial PoC to widespread adoption across critical production systems.
Measurable Impact
- 80% Reduction in Standing Administrative Privileges: This is our headline metric. Prior to JIT, over 50 individual developers and a handful of service accounts had standing, high-privilege access to various production systems (SSH, Kubernetes, production databases). Post-implementation, we reduced the number of standing administrative roles to fewer than 10, primarily for break-glass scenarios and automated infrastructure processes. All day-to-day developer access now flows through JIT requests. This drastically reduced our overall attack surface and potential blast radius.
- 50% Faster Audit Cycle Time: Compliance audits, especially SOC2, used to be a frantic scramble to gather access reports and justify every standing permission. With JIT, all access is logged, auditable, and tied to specific requests with reasons and durations. Our security team now pulls Teleport audit logs, which directly provide the evidence required, cutting the manual effort by half.
- Improved Incident Response: When an incident does occur, the ability to grant precise, short-lived access to a specific engineer or a group for investigation, and then revoke it automatically, significantly reduces the risk of further compromise during a stressful situation. We can quickly provision exactly what's needed without over-privileging.
- Enhanced Security Posture: Beyond the numbers, the qualitative improvements are significant. Our developers now instinctively think about "least privilege" and "need-to-know." The transparent audit trail also acts as a deterrent against unauthorized access, fostering a stronger security culture. We even noticed a reduction in security-related queries to the DevOps team by about 30% as developers became more self-sufficient and followed established secure patterns.
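For readers who want to reproduce the headline figure against their own baseline, the arithmetic is a plain percentage reduction. The sketch uses 50 and 10 as round stand-ins for the "over 50" and "fewer than 10" counts above, so treat the inputs as illustrative rather than exact.

```python
def percent_reduction(before: int, after: int) -> float:
    """Percentage drop from a baseline count to a current count."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return 100 * (before - after) / before


# Round illustrative inputs: roughly 50 standing grants before, roughly 10 after.
print(percent_reduction(50, 10))  # 80.0
```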
Lesson Learned: Don't Skimp on UX and Phased Rollout
"Our biggest mistake initially was underestimating the user experience and cultural shift required. We rolled out JIT for a critical database first, with a slightly clunky request process. This led to developers trying to bypass the system or complaining about added friction. We quickly learned that even with massive security benefits, if the user experience isn't smooth, adoption will suffer. We then invested heavily in a simpler UI for requests (using a custom Slack integration) and a phased rollout, starting with less critical systems, before moving to the most sensitive ones. This iterative approach and focus on developer feedback was critical for widespread acceptance."
Takeaways / Checklist: Your Path to Ephemeral Access
If you're looking to move beyond persistent privileges and embrace JIT access, here's a checklist based on our experience:
- Assess Your Current State: Document all standing privileged access to your production systems. Quantify the number of users, roles, and resources involved. This will give you your baseline.
- Define Your JIT Access Scope: Start small. Identify the most critical systems (e.g., production databases, Kubernetes clusters) where JIT access will have the biggest security impact.
- Choose the Right Tool: Evaluate solutions like Teleport, HashiCorp Boundary, or cloud-native IAM features based on your infrastructure, existing toolchain, and budget. Prioritize unified access for multiple resource types if your environment is diverse.
- Design Granular Roles: Create specific JIT roles with the absolute least privilege necessary and short TTLs (e.g., developer-read-only-1hr, devops-patch-45min).
- Implement Approval Workflows: For sensitive roles, require explicit approvals from a designated team (e.g., security, lead engineers). Integrate with existing communication tools like Slack or PagerDuty.
- Integrate with Your IdP: Centralize user authentication with your existing Identity Provider (Okta, Google Workspace, Azure AD) for a seamless login experience.
- Enable Comprehensive Auditing: Ensure all access requests, approvals, and session activities are logged and securely sent to your SIEM for analysis and compliance.
- Invest in Developer Experience: This is crucial. Provide clear documentation, training, and ideally, an intuitive interface for requesting and managing JIT access. Automate wherever possible.
- Phased Rollout: Don't try to change everything at once. Start with a small pilot, gather feedback, iterate on your process, and then gradually expand to more systems and users.
- Monitor and Iterate: Regularly review access patterns, audit logs, and compliance reports. Adjust roles, TTLs, and approval processes as needed.
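The first checklist item, establishing a baseline, can start as something as small as a tally over an access inventory. The sketch below assumes you can export rows of (principal, resource type, standing or not) from your own systems; the sample rows and names are made up for illustration.

```python
from collections import Counter

# Hypothetical inventory rows: (principal, resource_type, is_standing)
inventory = [
    ("alice", "ssh", True),
    ("alice", "kubernetes", True),
    ("bob", "database", True),
    ("ci-deployer", "kubernetes", True),
    ("carol", "database", False),  # already request-based, so not part of the baseline
]

# Keep only standing grants, then count them by resource type and by principal.
standing = [(who, kind) for who, kind, is_standing in inventory if is_standing]
by_resource = Counter(kind for _, kind in standing)
principals = {who for who, _ in standing}

print(len(principals))            # 3
print(by_resource["kubernetes"])  # 2
```

Re-running the same tally after each rollout phase gives you a trend line to report against.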
Conclusion: The Future of Secure Access is Ephemeral
The days of developers holding persistent, all-access credentials to production environments are, and should be, rapidly coming to an end. The shift to Just-In-Time and ephemeral access is not just a "nice-to-have" security control; it's a fundamental change in how we manage privileged access, dramatically reducing attack surfaces, bolstering compliance, and ultimately making our systems more resilient. Our experience proved that with the right tools and a commitment to developer experience, you can achieve significant security wins – like an 80% reduction in standing privileges and 50% faster audits – without crippling developer productivity.
If you're still wrestling with the demons of persistent access, I urge you to explore an ephemeral access model. It’s a journey that will challenge your existing assumptions but one that will ultimately leave your systems far more secure and your auditors far happier. Start small, learn fast, and embrace the ephemeral. Your security posture (and your sleep schedule) will thank you.
What are your experiences with privileged access management? Share your thoughts and challenges in the comments below!
