What I keep finding on the way to your crown jewels

I can't remember the last zero-day I used on a red team engagement. I don't think I ever have. Twelve years of breaking into companies that pay me to break into them, and I've never once needed an exploit developer, a CVE, or a novel technique to achieve my objectives. I've used exploits and novel techniques, certainly, especially for initial access. But most often, the credentials were already there.

This isn't bragging. Any competent red teamer will tell you the same thing. I'm saying it because it matters for how you think about risk. Every post-incident write-up and tabletop exercise assumes the attacker needs to do something. Find the unpatched library. Develop the exploit. Move laterally. The reality I see, month after month, is that the attacker's job is mostly reading and assuming the permissions of the victim.

Here's what the work looks like.

AWS keys, good hygiene or not

First thing I do when I land on a developer laptop is cat ~/.aws/credentials. Sometimes there's a long-lived access key sitting in plaintext. Sometimes there's a profile backed by aws sso login with a cached session in ~/.aws/sso/cache/. Sometimes, rarely, the keys aren't on disk at all because the team keeps everything in Keychain and requires MFA on every role assumption. What I want you to notice is that none of these three cases changes the story I'm about to tell.

On one engagement the key in the default profile hadn't been touched in seven hundred days. Long-lived, plaintext, active. The compliance dashboard said all IAM user keys rotated every ninety days, and the dashboard was telling the truth about IAM. What it wasn't tracking was what the laptop looked like. New keys had been issued on schedule. The old ones had never been revoked, because the rotation policy was "issue a new key," not "deactivate the old one." The developer, reasonably, had never re-run aws configure. Why would they? Their stuff kept working.
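The check that dashboard was missing is small. Here's a minimal Python sketch of it, assuming you already have the per-key metadata that IAM's ListAccessKeys returns (AccessKeyId, Status, CreateDate). The function name and the ninety-day constant are mine, for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the ninety-day rotation window from the compliance policy.
MAX_KEY_AGE = timedelta(days=90)


def stale_active_keys(keys, now=None):
    """Return IDs of keys that are still Active but older than MAX_KEY_AGE.

    `keys` is a list of dicts shaped like IAM ListAccessKeys metadata:
    AccessKeyId, Status ("Active" or "Inactive"), and CreateDate
    (a timezone-aware datetime).
    """
    now = now or datetime.now(timezone.utc)
    return [
        key["AccessKeyId"]
        for key in keys
        if key["Status"] == "Active" and now - key["CreateDate"] > MAX_KEY_AGE
    ]
```

A policy that only issues new keys passes the "newest key is under ninety days old" test forever. This version asks whether any old key still works.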

On another engagement the team had done everything you'd want. No long-lived keys anywhere. Every engineer used aws sso login against an Okta-backed Identity Center. Role sessions capped at eight hours. MFA required. Their dev role was scoped tight: no S3, no Secrets Manager, just a handful of staging resources. Textbook.

I didn't need long-lived keys. I didn't need to bypass MFA. I just needed to be on the laptop while they were working, and the SSO session in the cache was valid until the end of their day. On paper that session could do almost nothing. In practice it could do one thing that mattered: sts:AssumeRole into a shared-services account.

The shared-services account had been set up years earlier so pipelines could reach a common artifact bucket. Over time it had accumulated read access to half a dozen data stores across the org, because it was the only account every team trusted. Hop one, the dev role scoped to staging. Hop two, shared-services, assumed through that dev role, with read access to production customer data. The IAM graph never looked dangerous from any single vantage point. End to end, from a laptop disk to customer records, it was four minutes of work.
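Walking that graph end to end is an ordinary breadth-first search. The sketch below is illustrative only: the role names and the two dictionaries are hypothetical stand-ins for whatever your real trust policies and resource grants say. It isn't output from any tool.

```python
from collections import deque


def reachable(start, assume_edges):
    """Breadth-first walk of who-can-assume-what.

    Returns {principal: [hop, hop, ...]} for every principal reachable
    from `start`, keeping the shortest chain found.
    """
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in assume_edges.get(current, []):
            if nxt not in paths:
                paths[nxt] = paths[current] + [nxt]
                queue.append(nxt)
    return paths


def data_exposure(start, assume_edges, read_access):
    """Map each readable data store to the role chain that reaches it."""
    exposure = {}
    for principal, path in reachable(start, assume_edges).items():
        for store in read_access.get(principal, []):
            exposure.setdefault(store, path)
    return exposure


# Hypothetical graph mirroring the two hops described above.
ASSUME_EDGES = {
    "laptop-sso-session": ["dev-staging-role"],
    "dev-staging-role": ["shared-services-role"],
}
READ_ACCESS = {
    "dev-staging-role": ["staging-resources"],
    "shared-services-role": ["artifact-bucket", "prod-customer-data"],
}
```

Calling data_exposure("laptop-sso-session", ASSUME_EDGES, READ_ACCESS) reports prod-customer-data at the end of a three-node chain. No single node in that chain looks alarming on its own, which is the point.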

The EDR was green. IAM Access Analyzer was green. The compliance dashboard was green. The only thing red was the credential state sitting on one developer's laptop, chained against a trust graph nobody had walked end to end since the shared-services account was stood up.

The GCP service account

Three weeks into a larger engagement against a $3B customer-support-services company, I was trying to reach their GCP environment. I had access to JAMF through a social-engineered help desk ticket and a dumb misconfiguration, but the real data was gated behind SRE-only laptops.

I spent several days mapping the SRE team through AD groups and Confluence. One name came up constantly in infrastructure-automation docs. I pushed a signed osascript through their macOS management platform to that one laptop. On disk, in a folder called ~/projects/infrastructure-automation/, was a file named prod-data-pipeline-sa.json. A GCP service account key. Plaintext JSON with the private key and client email right there.

The service account behind that one file had roles/cloudsql.client on the primary customer database, roles/storage.objectViewer on three dozen buckets, and the one that gets me every time, roles/cloudkms.cryptoKeyDecrypter on the default keyring. Encryption at rest doesn't mean much when the compromised identity is authorized to decrypt.

A short BigQuery query:

SELECT table_id, row_count, size_bytes
FROM `project.customer_data.__TABLES__`

Four point two million customer records across twelve tables.

That service account key had been on that one SRE's laptop for at least the past year. I know because I could see the file's creation date. Nobody was watching that file. Nobody could have told you that the identity sitting in an SRE's home folder could decrypt and read four point two million customer records across twelve tables in a different account. The EDR on that laptop reported no issues, of course, because nothing unusual was happening. It was just a file.
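Finding files like that one doesn't take anything exotic. A downloaded GCP service account key is JSON with a "type" of "service_account" plus "private_key" and "client_email" fields, so a rough Python sketch can simply look for that shape. The function name is mine, and it only finds the file. It says nothing about what the identity can reach.

```python
import json
from pathlib import Path


def find_service_account_keys(root):
    """Yield (path, client_email) for JSON files shaped like GCP service account keys."""
    for path in Path(root).rglob("*.json"):
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable, not UTF-8, or not valid JSON
        if (
            isinstance(doc, dict)
            and doc.get("type") == "service_account"
            and "private_key" in doc
        ):
            yield path, doc.get("client_email", "<unknown>")
```

Pointing it at a home directory, for example list(find_service_account_keys(Path.home())), gives you the list of identities to go ask questions about.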

A risky CI runner

On a different engagement the credential I wanted was on a machine nobody thought of as a machine.

They ran self-hosted GitHub Actions runners on dedicated VMs for internal CI. Dozens of them, provisioned by Terraform, humming along for months. The runners pulled jobs from any internal repo. A nightly workflow on one repo authenticated to the production container registry to push new base images. That workflow wrote the registry token into ~/.docker/config.json on the runner and, because the runner wasn't ephemeral and the workspace wasn't cleaned between jobs, left it there.

I didn't need to break into anything. I opened a PR on an unrelated internal repo with a three-line workflow step: cat ~/.docker/config.json. The runner dutifully picked up my PR, ran the job, and printed the registry token into the CI log, which my branch had permission to read.

That token let me push arbitrary images to the production registry namespace. The production Kubernetes cluster pulled from that registry and trusted anything signed by the org's key, which the token was authorized to use. I didn't push anything. I wrote a finding that said I could have replaced the next base image update with code of my choosing, and closed the PR.

The runner was inventoried and even had an EDR agent on it. Nothing watching it had any concept of "credentials that one job wrote, that a different job running as a different trust principal can read." That risk class doesn't really exist in the model the tools were built to evaluate.
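One cheap guard is to check what a job left behind before the next job starts. The sketch below is an assumption-laden example, not something from the engagement: it reads the text of a Docker config.json and reports registries whose credentials are stored inline under "auths" as a base64 "user:password" string. It doesn't cover credential helpers or other token fields.

```python
import base64
import json


def stored_registry_logins(config_text):
    """Return {registry: username} for credentials stored inline in a Docker config.json."""
    config = json.loads(config_text)
    logins = {}
    for registry, entry in config.get("auths", {}).items():
        auth = entry.get("auth")
        if not auth:
            continue  # empty entry: no inline secret stored for this registry
        # The "auth" value is base64 of "username:password"; keep only the username.
        username, _, _password = base64.b64decode(auth).decode("utf-8").partition(":")
        logins[registry] = username
    return logins
```

Run as a final cleanup step on a long-lived runner, a non-empty result means the next job, from any repo, inherits that login. Ephemeral runners avoid the question entirely.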

The .env pointed at "staging"

A junior engineer on their second week, a repo README that said cp .env.example .env and fill in your keys, and a .env file that had been sitting on their laptop ever since.

The entry I cared about was a DATABASE_URL pointing at what everyone on the team called staging. I ran one psql query against it. It returned full names, emails, dates of birth, and social security numbers. Staging, it turned out, was a weekly restore of production with no scrubbing. Somebody had proposed a pipeline to tokenize PII before the restore three quarters earlier. It was still in the backlog. The data platform team kept the restore job running because analysts depended on it and nobody had raised a flag loud enough to block them.

So "staging" was production. A .env file on a second-week intern's laptop, protected by nothing beyond the same disk encryption every other file on the laptop had, contained read credentials to what was effectively a copy of the customer database, SSNs included.

Nobody had done anything malicious. The intern hadn't done anything wrong. The platform team had rotated the staging database endpoint in their Kubernetes configs and in the secrets manager, and both rotations looked complete from the inside. What they hadn't rotated, because they didn't know it existed, was the .env file on one laptop in one repo.
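Knowing that file exists starts with reading it the way I did. As a rough Python sketch, with a function name and output shape that are my own choices: pull every URL-style value out of a .env file and report where it points and whether it embeds a password, without echoing the password itself.

```python
from urllib.parse import urlsplit


def connection_targets(env_text):
    """Return (variable, host, port, database, has_password) for URL-style .env values."""
    targets = []
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # blank line, comment, or not an assignment
        name, _, value = line.partition("=")
        value = value.strip().strip("'\"")
        parts = urlsplit(value)
        if parts.scheme and parts.hostname:
            targets.append((
                name.strip(),
                parts.hostname,
                parts.port,
                parts.path.lstrip("/"),
                parts.password is not None,
            ))
    return targets
```

The output tells you which hosts a laptop can talk to with a stored password. Whether "staging" really is staging is a question for whoever owns the host.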

The internal git history graveyard

Secret scanners catch the commit going to production. Peer review or your LLM catches it on the PR. Most of your repos and commit histories, most of the time, are clean. But that isn't where the credentials live.

Internal repos are a different story. Risk tolerance is higher because the audience is narrower. A developer knows a live DATABASE_URL with creds pasted into a test fixture won't ship to customers, so it goes in. Someone notices a week later, opens a follow-up commit that replaces the string with a placeholder, and moves on without rotating anything. The string is still in the history. The credential is still live, because nobody rotated it and the database monitoring doesn't care who reads.

On one engagement I pulled a valid GitHub PAT off a developer's laptop. Nothing clever, it was just in ~/.config/gh/hosts.yml, exactly where gh auth login puts it. The token had repo scope across the entire org, because that was the default the platform team had set when they rolled out the CLI. From that one token I could clone every private repo the org had, just over four hundred of them, and run a secret extractor across the full history of each.

What shook out was dozens of AWS keys of varying scope, four live Datadog API keys, a Slack bot token that we didn't leverage though it was interesting, several database URLs pointing at production replicas of customer data, and a service account JWT for an internal data system that was still valid. Of those findings, a handful had follow-up commits with messages like "update secret."

The CircleCI incident exposed this pattern at industry scale. When an attacker lands on a box that holds tokens to an internal source-control platform, you get the exercise of adding hundreds of repos to a spreadsheet, analyzing them, and having conversations with a thousand people about rotating all of those credentials.

The endpoint side of this story is quiet, too. A GitHub PAT in a config file. Privileged app session cookies in a browser database file. SSH keys without passphrases. Any of those, on any laptop that has them, reaches the whole graveyard in one API call.

IR Slack channels at 8pm

A different seat, same problem. A good chunk of my work wasn't purely red team operations. It was getting added to a Slack channel at an hour nobody wants to be in a Slack channel, because an incident has been opened over external impact that traces back to some developer laptop, and the IR lead is trying to scope what else is on fire.

The first question, every time, is some version of: what can that laptop reach?

Nobody has a current answer. The laptop's owner is asleep (or panicking). The IAM admin is in another timezone. The cloud posture tool last evaluated the IAM yesterday and doesn't know what credentials are actually sitting on that particular endpoint right now. So it's done by hand. Grep the home folder for .aws, .ssh, .kube, .docker, credentials.json, anything ending in .pem. Pull the browser state. Check the shell history. Validate each credential against its API. Then pivot from what comes back.
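The first pass of that by-hand search is easy to write down. This is a minimal Python sketch of the inventory step only, using the directory and file names listed above. The function name is mine, and it does no validation or pivoting.

```python
import os
from pathlib import Path

CREDENTIAL_DIRS = {".aws", ".ssh", ".kube", ".docker"}
CREDENTIAL_FILES = {"credentials.json"}
CREDENTIAL_SUFFIXES = {".pem"}


def credential_candidates(home):
    """Return sorted paths under `home` that are worth a closer look."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(home):
        # True when any component of the relative path is a known credential directory.
        inside_cred_dir = bool(
            CREDENTIAL_DIRS & set(Path(dirpath).relative_to(home).parts)
        )
        for name in filenames:
            if (
                inside_cred_dir
                or name in CREDENTIAL_FILES
                or Path(name).suffix in CREDENTIAL_SUFFIXES
            ):
                hits.append(Path(dirpath) / name)
    return sorted(hits)
```

Everything it returns still has to be opened, validated, and pivoted from, which is where the hours go.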

By hour four we usually know roughly what the attacker had access to, assuming they moved the way we would've. By hour six we have a containment plan. The hours between "EDR fired" and "we know the blast radius" are the hours that determine whether this is a contained incident or a breach notification. And they're almost entirely spent doing, by hand, the thing nobody had done preemptively.

Patterns over stories

I could keep going. The GitHub PAT in ~/.config/gh/hosts.yml belonging to someone who left the company eight months ago, still an org owner. The kube config with a service account token that was minted without an expiration because "we'll rotate it in the next sprint" five sprints ago. The contractor laptop whose browser had a saved session cookie for the production admin panel. The .ssh/id_rsa without a passphrase whose known_hosts included three customer-facing bastions.

The stories are different but the shape is the same.

A credential exists somewhere nobody's thinking about it.
The credential is still valid.
The credential's actual reach is far larger than anybody currently mapping risk in that company believes.

That third point is the one I want you to sit with. We have a whole industry of tools pointed at the first two. Secret scanners in repos. Credential hygiene reports. IAM policy analyzers. CSPM dashboards. Least-privilege reports. They're fine at what they do. What they don't do is hold the credential up against the live environment and ask: if I had this right now, where could I actually go?

The answer changes every day. New roles get attached to service accounts. New trust relationships open between accounts. New data lands in buckets that an identity can read. The credential you mapped last month isn't the same credential this month, because the world it can reach has moved underneath it.

Why red teams find this and your controls don't

When I find a credential on an engagement, I do two things. I validate it by hitting the API and seeing what comes back. Then I pivot. From the response, I learn about the next thing I can reach. Cross-account AssumeRoles. Federated identities. Service-account impersonation chains. Network paths that only become visible once you're inside the VPC. That second step is the one that turns a finding into a breach.

No tool your company is running does that second step continuously. EDR watches processes. SIEM watches logs. A cloud posture tool knows the IAM graph at the moment it was last evaluated, but it doesn't know which credentials are actually sitting on which endpoints to light up that graph. Secret scanners find the string. They don't validate it. They don't pivot from it.

Red teams find this once a year, if you're lucky and you pay well. Even good internal red teams run only a single-digit number of operations a year. IR teams find things under time pressure, after some alert has already gone off (or not). Attackers find it every day. The attacker advantage isn't necessarily skill, but tempo. They get to look continuously. You get to look when procurement cycles align and a consultant is available, or when the pager goes off and there's no choice.

There's a version of this argument that ends with "and that's why you need to rotate credentials more often" or "that's why you need tighter IAM." Those are both true and both insufficient. You can't rotate what you don't know exists. You can't tighten the blast radius of a key when nobody has modeled what the key actually reaches today. The gap isn't between best practice and current practice. The gap is between what exists on the endpoint and what anybody has bothered to map.

Closing

I've been thinking about this problem for a long time, because I kept doing the same thing over and over on engagements and it kept working, and then kept getting called in on the response side to do the same work under worse conditions. That's what convinced me it isn't a red team parlor trick. It's a structural gap in how security is instrumented. Continuous detection exists for behavior. It doesn't exist for reach.

That's what I'm building toward with Puck. A lightweight agent that does the two things I do on an engagement and the two things I do on an incident call: find the credentials already sitting on an endpoint, and validate what they actually reach, continuously, on every endpoint, so the blast radius of what's already there is something you can see without hiring me and without waiting for the alert.

The open-source one-shot version, Puck Scout, is where I'd start. Run it against one of your own laptops. See what falls out. You might be surprised. I stopped being surprised a long time ago.