03 Jul 2026

feedKernel Planet

Matthew Garrett: Securing agentic identity

As is the case for many people working in the security industry, the last few months of my life have been focused on dealing with people wanting to use LLMs everywhere. From an enterprise security perspective that's not an inherent problem - what's more of a problem is that people want those agents to have access to resources like their calendar and email and so on, and now we have somewhat non-deterministic agents that seem very enthusiastic to achieve what you asked whether that's a good idea or not, and we're combining this with credentials that give them access to sensitive data, and leaving those credentials on disk where they can be committed into git repos or exfiltrated to some other service to make use of them on the agent's behalf or well just any other number of things, at which point your CEO's email is suddenly readable by everyone and you're having a bad day.

As I mentioned in my last post, pretty much every strong mechanism for keeping credentials in place is just not supported in the wider world. We can imagine a universe where agents use hardware (or at least hypervisor) backed certificates to obtain credentials and any that end up leaking are worthless as a result. But, sadly, that's not an option for most people using existing identity providers. The state of the art is that you use the device code flow and a human authenticates and the token ends up back inside the agent environment and then it proceeds to do whatever it wants with it and you just hope that you wake up the next morning without an awful infoleak occurring.

(An aside: I do not like the device code flow as used in enterprise environments, and I never will. The identity provider doesn't have a real opportunity to inspect the security posture of the system asking for the token, and as a result some identity providers will restrict tokens that are issued in this way. The common alternative of doing stuff using a more standard flow and having a redirect URI pointing at localhost works fine for local systems and is a pain for remote ones, even if you can commit crimes with SSH forwarding. I'm going to suggest something that I think is better, and you are free to disagree)

I'm not in a position to get every identity provider and service provider to change their security posture, so I'm somewhat stuck in terms of the tokens they're willing to issue me - largely either JWTs or opaque access tokens, with no support for any mechanism of binding that token to an instance. The token that's going to have to be provided to the remote service is something I have little influence over. But that doesn't mean I can't influence the token that lands inside the agent's environment. I can issue a placeholder token to the agent, and force it to communicate via a proxy that swaps out the placeholder for the real thing. The worst the agent can do is exfiltrate the placeholder token, and as long as malicious actors don't have access to that proxy, it doesn't matter - nobody else can do anything with the placeholder.
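As a rough illustration of the placeholder swap, here is a minimal sketch of the proxy-side bookkeeping. The class name, the `ph_` prefix and the in-memory dict are all assumptions made for the example, not anything a particular product does.

```python
import secrets


class PlaceholderProxy:
    """Hypothetical proxy-side table mapping placeholder tokens to real ones."""

    def __init__(self):
        # Placeholder -> real token. This table is state the proxy has to keep.
        self._real_by_placeholder = {}

    def issue_placeholder(self, real_token: str) -> str:
        # The agent only ever sees this random value, never the real token.
        placeholder = "ph_" + secrets.token_urlsafe(16)
        self._real_by_placeholder[placeholder] = real_token
        return placeholder

    def rewrite_headers(self, headers: dict) -> dict:
        # Swap the placeholder in the Authorization header for the real token.
        scheme, _, presented = headers.get("Authorization", "").partition(" ")
        real = self._real_by_placeholder.get(presented)
        if scheme != "Bearer" or real is None:
            raise PermissionError("unknown placeholder token")
        out = dict(headers)
        out["Authorization"] = "Bearer " + real
        return out
```

A leaked placeholder is only useful to someone who can also reach the proxy, since nothing else knows how to map it back.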

This isn't a terribly novel insight, and it seems like almost everybody has reinvented this on their own. But a lot of these implementations involve you somehow obtaining the real token in advance and then pasting that into something that generates a placeholder that you provide to your agent environment somehow, and it's all a bit clunky and awkward, and it also means that you need to deal with something that keeps track of the mapping between placeholders and real tokens and oh no we've just invented a secret store, and if you want this to work at scale and reliably you've just invented a high availability distributed secret store, and a lot of people who've read that are now shaking their heads and reaching for gin. Can we simplify this, and improve security at the same time? I think we can!

Remember when I said "as long as malicious actors don't have access to that proxy, it doesn't matter"? What if they do? What if they compromise one machine inside your environment and are then able to email a bunch of employees and convince their agents to send more tokens back to them and then delete the email before a human reads it? Now you have someone inside the wall with access to those tokens, and presumably with access to the proxy, and now they can be anyone whose agent was gullible enough to think sending them a token was a good idea. This isn't good!

So, I thought for a while, and I came up with a new idea. We can have a broker service that obtains credentials for us. We can run that centrally, away from the agents. A client in an agentic environment can request a token, and that can result in a URL being generated and the user being directed to open a URL in a browser and authenticate. When the user authenticates, the authentication flow redirects the confirmation back via the broker, and the broker obtains the real auth token. The obvious thing to do now would be to return the auth token to the client in the agentic environment, but we don't do that. Instead, we mint a new JWT, and add a new claim - one that contains an encrypted copy of the token. In the process we can copy over all the original claims, because those aren't secret - and now even if the client inspects the token to figure out what access it has, it'll get a correct answer. We sign the new token with our own signing key, and pass that back to the client. The client now has a legitimate JWT that is utterly useless, because the signature isn't trusted by anyone other than us.

How does it use it? It makes an API request via a proxy, including the new token in the Authorization: header. The proxy verifies the signature on the token, and then decrypts the original token and swaps out the fake token for the real one. The remote API sees what it expects, and everyone is happy. There's never a real token in the agentic environment, but also we don't need to store anything anywhere. The only state is the encryption keys, and those can be injected into the environment at startup. You need to scale? Just start more of these processes. You need to support multiple availability zones? Just start more of these processes in different places. No persistent data is ever held in the broker or the proxy. You don't need to care about distributed databases or secret stores.
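The mint-then-unwrap flow can be sketched with nothing but the Python standard library. Everything here is an assumption made for illustration: the claim name `wrapped_token`, the use of HS256 with a key shared between broker and proxy, and above all the toy XOR keystream, which stands in for a vetted AEAD such as AES-GCM and should not be used for real.

```python
import base64
import hashlib
import hmac
import json
import secrets


def b64u(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64u_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # TOY cipher for illustration only: XOR with an HMAC-SHA256 keystream.
    # A real deployment would use a vetted AEAD such as AES-GCM.
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hmac.new(key, nonce + offset.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], block))
    return bytes(out)


def _sign(key: bytes, signing_input: str) -> str:
    return b64u(hmac.new(key, signing_input.encode(), hashlib.sha256).digest())


def mint_wrapped_token(real_token: str, original_claims: dict,
                       sign_key: bytes, enc_key: bytes) -> str:
    """Broker side: copy the original claims and add the encrypted real token."""
    nonce = secrets.token_bytes(16)
    sealed = nonce + _keystream_xor(enc_key, nonce, real_token.encode())
    claims = dict(original_claims)
    claims["wrapped_token"] = b64u(sealed)  # hypothetical claim name
    header = b64u(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64u(json.dumps(claims).encode())
    signing_input = header + "." + payload
    return signing_input + "." + _sign(sign_key, signing_input)


def unwrap_token(wrapped: str, sign_key: bytes, enc_key: bytes) -> str:
    """Proxy side: verify our own signature, then recover the real token."""
    signing_input, _, signature = wrapped.rpartition(".")
    expected = _sign(sign_key, signing_input)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise PermissionError("bad signature")
    claims = json.loads(b64u_decode(signing_input.split(".")[1]))
    sealed = b64u_decode(claims["wrapped_token"])
    return _keystream_xor(enc_key, sealed[:16], sealed[16:]).decode()
```

Neither function touches a database: the two keys are the only state, which is why more broker or proxy processes can be started anywhere the keys can be injected.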

This felt wonderfully elegant and I felt smug about coming up with a better idea, and then I went to a bar earlier this week and sat down to read RFC 8705 and the guy next to me saw that over my shoulder and asked what I was reading and I explained why I was interested and we talked about agentic identity and then he mentioned that fly.io had something that sounded very similar and I read that and gosh yes it is very similar, so damn you fly.io for stealing my ideas 3 years before I even had them. Anyway. Now I need to do better.

Remember that there's still a risk around anyone who has access to the proxy having access to the encrypted keys? We can remove that risk as well. It's not uncommon for agentic environments to have an identity issued via something like SPIFFE, at which point they have a client certificate. You can probably guess where I'm going with this. If we require that an agent present a client cert to the broker when requesting a token, we can embed a representation of that client cert into the token we mint. The proxy can then require mTLS for the client connection, and can verify that the presented certificate matches the one represented in the token. If it does then whoever's using the token has access to the private key associated with the environment it was issued to. If we then ensure that the private keys backing these certificates are either hardware or hypervisor backed, and as such tied to a specific instance, we now have a high degree of confidence that the token can only be used in its intended environment. Even if our identity provider doesn't support RFC 8705, we can.
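A minimal sketch of the binding check, assuming the minted token carries the certificate hash in the same `cnf` / `x5t#S256` form that RFC 8705 uses (the base64url-encoded SHA-256 of the DER-encoded certificate). The function names are made up for the example, and signature verification of the token itself is assumed to have happened already.

```python
import base64
import hashlib
import hmac


def cert_thumbprint(cert_der: bytes) -> str:
    # base64url(SHA-256(DER certificate)), without padding.
    digest = hashlib.sha256(cert_der).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()


def bind_claims_to_cert(claims: dict, cert_der: bytes) -> dict:
    # Broker side: record which client certificate asked for the token.
    bound = dict(claims)
    bound["cnf"] = {"x5t#S256": cert_thumbprint(cert_der)}
    return bound


def presented_cert_matches(claims: dict, peer_cert_der: bytes) -> bool:
    # Proxy side: compare the mTLS peer certificate with the one in the token.
    expected = claims.get("cnf", {}).get("x5t#S256", "")
    actual = cert_thumbprint(peer_cert_der)
    return hmac.compare_digest(expected.encode(), actual.encode())
```

In a Python proxy the DER bytes could come from `ssl.SSLSocket.getpeercert(binary_form=True)`. The check only proves anything because the TLS handshake has already shown that the peer holds the matching private key.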

This is fairly straightforward where you're using a platform where your identity provider is also the environment that's consuming your tokens, and more annoying for third parties. The broker potentially needs some amount of third party vendor knowledge to make that work for everyone. This is even more the case where login isn't via your identity provider (thanks, github), but none of this is insurmountable - just annoying. And where vendors issue opaque tokens rather than JWTs, this still isn't a problem; we can just mint a new JWT that includes the opaque token as an encrypted claim, and include the same certificate binding. The opaque token ends up being the thing that's presented to the third party, but only after we've verified the mTLS binding.

In an ideal world none of this would be necessary - someone would spin up a new agentic environment, a user would prove their identity, and a certificate embodying that identity would be issued to the environment with a private key that can't be exfiltrated. That certificate would be sufficient to obtain new certificates associated with the same private key, and we could still bind that into mTLS identity. This would be much simpler, but browsers don't support it, so it's not likely to happen any time soon.

Anyway. Even if we can't have the best thing, we can do better than we are at the moment, and also it would be lovely if we could standardise on this rather than have everyone build their own thing. The end.

03 Jul 2026 12:38am GMT

02 Jul 2026

Matthew Garrett: Preventing token theft

When you log into a service you're given an authentication token. Each further request to the site includes that token, allowing the server to figure out who you are and ensuring that you have access to your data. Depending on site policy, this token may either be stored in memory (and so vanish if you restart your browser) or disk. The token is the proof of your identity. As far as the site is concerned, anyone with your token is you. These tokens may be traditional browser cookies, but they may also be stored in either site local storage or (if you're not using a browser) in some other storage location.

In recent years we've seen infostealer malware (like LummaC2) gain the ability to exfiltrate user tokens, allowing attackers to gain access to the user's data without needing to retain access to the user's machine. This attack is viable even if the site has strong MFA requirements, so passkeys don't help. Encrypting the tokens on disk doesn't prevent the malware from scraping them out of the browser's RAM or obtaining whatever key is used to encrypt them. This feels like a pretty hard problem to solve.

But that hasn't stopped people from trying! Dirk Balfanz wrote an IETF draft describing a mechanism for using self-signed certificates for TLS authentication. This uses the mutual authentication feature of the TLS protocol that requires both sides prove their identity to each other. In regular TLS, the remote site presents a signed certificate that tells you who it is. When performing mutual authentication, you then present a certificate to the remote site telling it who you are. These client certificates are largely unused outside enterprise environments because they're a huge pain to deploy. It's not so much that this has sharp edges, it's that it's entirely made of sharp edges. Managing certificate deployment to your devices is hard. Browsers get confused if the certificates change under them. You have one certificate and it lives forever, so sites you present it to can track your identity. Users are prompted to choose a certificate to authenticate with, and if they pick the wrong one everything breaks and is hard to recover. I've deployed this and I did not have a good time.

But Balfanz's idea was simple. Rather than require certificates to be deployed, browsers would simply generate a certificate on the fly. The goal wasn't to prove the device or user's identity in any global way - but it would associate a TLS session with a specific certificate. You could then, for example, include a hash of the certificate in the cookie, and if someone tried to use that cookie without presenting that certificate then the cookie could be rejected. If the browser used a hardware-backed private key for the certificate then it would be impossible for an attacker to steal it. Sure, you could still steal cookies, but you wouldn't be able to use them.

This was written almost 15 years ago, and seems simple, elegant, and functional. It didn't happen. Part of the reason for that is that, well, it wasn't quite so simple. One problem was privacy related. Cookies are only sent after the TLS session is established, so anyone monitoring the network doesn't know anything about the user identity. A naive implementation of this approach would have meant the client certificate being sent before session establishment, and now user identity can be tracked (no longer an issue if this was implemented on top of TLS 1.3, but this was a long time ago). This was avoided by reordering the client handshake, but that meant having to modify the TLS specification and implementations would have to be updated to support this. Another was that figuring out the granularity of the certificates was difficult. You'd want to use different certificates for every site to avoid them effectively becoming tracking cookies, but you need to provide the certificate before cookies are set, and you don't know what origin the site is going to set in its cookies. If you generate a certificate for a.example.com and a different one for b.example.com, and a.example.com sets a cookie for *.example.com and includes the certificate you used for a.example.com, that cookie isn't going to work on b.example.com and things are broken. This meant supporting it wasn't as straightforward as it seemed - you'd need to ensure that your cookie scope was compatible with the certificate scope. You could probably make this work well enough by aligning it with the Public Suffix List, but there was still some risk of expectations not being aligned.
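The a.example.com / b.example.com mismatch can be made concrete with a small sketch. The function names are invented for the example, and `naive_site_scope` just keeps the last two labels of the hostname, which is a deliberately crude stand-in for a real Public Suffix List lookup.

```python
def cookie_sent_to(cookie_domain: str, host: str) -> bool:
    # A cookie set for example.com is sent to example.com and its subdomains.
    return host == cookie_domain or host.endswith("." + cookie_domain)


def per_host_scope(host: str) -> str:
    # One certificate per hostname.
    return host


def naive_site_scope(host: str) -> str:
    # ASSUMPTION: last two labels approximate the registrable domain.
    # Real code would consult the Public Suffix List instead.
    return ".".join(host.split(".")[-2:])


def bound_cookie_accepted(set_by: str, cookie_domain: str,
                          request_host: str, cert_scope) -> bool:
    # The cookie is bound to whichever certificate was in use when it was set.
    if not cookie_sent_to(cookie_domain, request_host):
        return False
    return cert_scope(set_by) == cert_scope(request_host)
```

With `per_host_scope`, a cookie set by a.example.com for example.com is sent to b.example.com but rejected there, because a different certificate is presented. With the wider scope both hosts share a certificate and the cookie works, at the cost of the two hosts being linkable.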

And, perhaps most importantly, TLS session resumption (replaced by pre-shared keys in TLS 1.3) somewhat defeats the purpose of the exercise - clients store state that allows them to re-establish a TLS connection without performing certificate exchange (this reduces overhead if a connection gets interrupted or you switch to a new network or anything along those lines), and anyone in a position to steal cookies could steal that state as well.

The followup attempt was channel IDs. This simplified the implementation somewhat - rather than certificates, a raw public key would be sent, along with proof of possession of the private key in the form of a signature over a portion of the TLS handshake. This was required even in the event of session resumption, which avoided having to worry about theft of session secrets. The timing of the exchange was after the encrypted session had been established, so user identity couldn't be leaked that way either. Cookies could then be bound to this identifier. Unfortunately it didn't really deal with the problem of scoping keys in a way that would match cookie requirements, and the spec suggests that the right way of handling this is to scope keys to TLDs, which would enable user tracking across sites (Chrome's implementation apparently restricted it to eTLD+1, which would match the third party cookie policy and avoid the tracking risk).

Chrome added support for this, but it was removed in early 2018. The discussion of some of the pain points in that message is interesting, explicitly calling out problems with connection coalescing across domains and the incompatibility with zero-RTT TLS1.3. The overall consensus at the time seems to be that trying to solve this entirely at the TLS layer has too many rough edges, and a different approach should be taken.

And so almost 7 years after the initial draft for origin bound certificates, we come to token binding. This ended up being a rather more complex endeavour, covering 3 different RFCs describing how it impacts TLS, how to incorporate it into HTTP, and how to manage all the various parties involved in the process. The short version is that it's pretty similar to channel ID, except that there's also a documented mechanism for allowing tokens to be bound to one party and consumed by another, avoiding any need for widely scoped keys. Token binding effectively solved all the issues in the original proposal, but at the cost of somewhat more complexity.

The RFC was finalised in October 2018. Chrome removed its (incomplete, draft) support for token binding in November 2018. Edge carried support until late 2024. Despite getting all the way through the RFC process, it's functionally dead.

The process up until this point had been largely initiated by Google, with Microsoft contributing significantly to the token binding standards. The work had been focused on identifying a generic solution to the problem rather than tying it to any specific authentication flow. The next step was in a different direction - rather than trying to fix this for the entire internet, how about we try to fix it for OAuth?

RFC 8705 is titled "OAuth 2.0 Mutual-TLS Client Authentication and Certificate-Bound Access Tokens". This is basically the 2011 approach, but (a) with an explicit definition of how the certificate should be incorporated into issued auth cookies, and (b) with a proviso that well uh if you're going to use tokens issued by your IdP to authenticate to someone else then well you're going to need to use the same cert for both. This is probably fine for the company-owned-laptop case where you're actually fine with multiple sites being able to tie identities together (that's kind of the point here!), and also works for "I am using an app and not a browser", but doesn't work for more generic scenarios. It also doesn't seem to take the session resumption case into account at all? Support for RFC8705 seems poor, as far as I can tell of the big players only Auth0 implements it. In theory it works fine with self-signed client certs but in reality that's going to be almost as difficult to support across multiple platforms as just issuing proper client certs in the first place, so deployment is going to be kind of a pain. But the good news is it doesn't rely on any TLS extensions or custom browser behaviour, so at the client side it works fine with any browser.

Which brings us on to RFC 9449, "Demonstrating Proof of Possession". This goes even further than RFC8705 in terms of reducing the burden of deployment - it works fine with existing browsers, and it doesn't even require any certs. The client generates a keypair and provides the pubkey when requesting the cookie. The cookie contains the pubkey. Every request to the service now provides the cookie with the pubkey and also provides a signature over the URI and HTTP method. If the signature matches the pubkey in the token then clearly the signature came from the machine the token was issued to, and everything is good.

This does come with some downsides, though. The first is that it uses browser interfaces to generate the keys (typically crypto.subtle.generateKey()) and as far as I can tell there are no browsers that guarantee that that key is going to be generated in hardware even if it's marked non-exportable, so anyone able to steal the cookies can also steal the keys. The second is that the signature only covers the URI and HTTP method, and not the message content or any other headers, so anyone able to exfiltrate a valid signature can replay it against the same URI with different message content. The recommended way to handle this is to reject any signatures that weren't generated within the last few seconds, which is a wonderful additional way to allow clock skew to give you a Bad Day. And the third is that every single request has to be separately signed, which is not intrinsically a problem because computers are fast and have multiple cores, but if you're trying to solve the first problem by sticking the key in a TPM then you're dealing with something that's slow and single threaded and that's maybe acceptable if you're using client certificates (because there's going to be one signature per session and you can use the same session for multiple requests) but probably not if you're dealing with a user opening a browser that restores previous tabs and each of those is a webapp that fires off 100 requests in parallel.
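To make the second downside concrete, here is a sketch of the server-side checks on an already signature-verified DPoP proof. The `htm`, `htu` and `iat` claim names come from RFC 9449; the function name and the five second window are assumptions for the example, and the signature verification itself is left out.

```python
def dpop_claims_acceptable(claims: dict, method: str, url: str,
                           now: int, max_age_seconds: int = 5) -> bool:
    # The proof names the HTTP method and URI it was made for...
    if claims.get("htm") != method or claims.get("htu") != url:
        return False
    # ...and when it was made. Anything outside the window is rejected,
    # including proofs from a client whose clock is simply wrong.
    if abs(now - claims.get("iat", 0)) > max_age_seconds:
        return False
    # Note what is NOT checked: the request body and the other headers
    # are not part of the proof at all.
    return True
```

Nothing in the signature describes the request body, so within the freshness window the same proof is accepted for any body sent to that method and URI. The window is the only thing limiting replay here, and it is also what turns clock skew into failed requests.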

In case it wasn't clear, I don't like DPoP. It doesn't feel like it actually solves the underlying problem that we see in the real world (malware running in a context where if it can grab the tokens it can grab the keys), it adds a massive amount of overhead, and it has baked in replay vulnerabilities. I don't know why it exists and I'm incredibly suspicious of vendors telling me that it fixes my problems, because if they're telling me that then I'm going to end up assuming that they either don't understand my problems or they don't understand their technology, and neither of those is good.

Still. Then we get to the thing that prompted me to write this - Chrome's announcement that they had launched device-bound session credentials. This is interesting because it's a Chrome feature that's explicitly intended to counter on-device malware, which was one of the things that was out of scope in 2018 when token binding was being removed. Since this is entirely web level it doesn't have to be an RFC, and so is instead defined by W3C. I'm going to handwave all the complexity and say that it's basically a way to register a public key when a cookie is issued, and then prove possession of the private key when it's time to renew the cookie. By making the cookies short-lived and having support for rotating them in the background, user impact is basically zero and while it's still possible for an attacker to exfiltrate and use a cookie they'll only be able to do so for a short window before it needs to be refreshed - something the attacker can't do, since they don't have the private key. This avoids the DPoP overhead because you only need to do signing once per cookie per cookie lifetime, and not on every single request. I don't like this due to the window where exfiltrated tokens can be used, but it feels like a strict improvement over the status quo. An extension called device-bound session credentials for enterprise allows pre-enrollment of device keys, so even though the actual runtime DBSCE flow doesn't involve certificates, certificates can be used for device registration in enterprise environments and you can make sure that auth cookies only go to trusted devices. Unfortunately this is Chrome-only, and so we're going to need to wait for it to be backported to all the random app frameworks for it to have widespread support on mobile or for almost everyone's desktop app that's actually three websites in an Electron wrapper. Mozilla's current position is that they're not in favour of it, so I guess we'll see where Safari lands in terms of broad uptake.

The last thing on my list is another client cert/OAuth binding, this one still in draft state at the time of writing. This one is aimed primarily at the use of agent-driven tooling, where you have something running in the background using a whole bunch of tools that are each acting on your behalf. Authenticating to all of them separately isn't a fun time, but giving broadly scoped access tokens to a non-deterministic agent and trusting that it'll never post them somewhere public also isn't a fun time. The key distinction between it and RFC8705 is that it's aimed at connections rather than sessions, which avoids the worries about session resumption. This is done with TLS Exporters, which in TLS 1.3 should be unique to the connection even over session resumption (TLS 1.2 may reuse some of the same key material for exporters over session resumption, so it's recommended to enforce 1.3 for this). By providing a new signature alongside the cookie on every new connection, the client proves that it still has access to the private key. This is a very new spec and I haven't had much time to work through it yet, but my naive understanding is that unlike RFC8705 this would require some additional client support to be able to regenerate the client signature on every TLS reconnection.

This doesn't avoid all the problems that RFC8705 has, including how to scope certificates. For the agentic use case that probably doesn't matter - all these tools are acting on behalf of the same user, it's fine if all the sites involved know they're the same user. But it doesn't solve the general purpose user use case, and right now DBSC seems like the best we have there.

But. Part of me still wonders whether Dirk Balfanz's approach was the right one. Yes, there's risk associated with TLS session resumption, but in the worst case you could just switch that off for high risk setups. The cookie scope argument is real, and also in cases where it could violate privacy the site owner could already choose to broaden their cookie scope and violate your privacy, and in cases where it breaks things you could just not make use of it. The other problems are largely fixed by TLS 1.3, and then we're just left with "Browsers handle client certificates badly" to which my answer is "Yes, and we should fix that anyway".

Despite having a pretty good answer to this problem over a decade ago, the closest we have to actual deployment is something that offers strictly worse security guarantees. And tokens keep getting stolen, and compromises keep occurring, and for the most part people shrug and get on with things.

02 Jul 2026 2:23am GMT

04 Jun 2026

Dave Airlie (blogspot): Appearing on the Software Engineering Radio Podcast

Software Engineering Radio is a podcast for people in IT/development with over 700 episodes across many topics over 20 years. They haven't touched on the Linux kernel much. I was invited on as part of my role at Red Hat as a Distinguished Engineer, but the podcast is really an insight into kernel maintenance, in graphics and beyond, touching on the scope and scale of the project.

It was my first time recording something that wasn't just me talking at a conference/meetup, and it was all very professional, with sound checks and brainstorming beforehand.

The content is at a pretty broad and introductory level. We talked about kernel development processes and maintenance processes, and we touched on Rust in the kernel a bit. It's mostly about the sheer size and scale of the project and how Linus releases things, how trees get to Linus and how the GPU work is done.

Hopefully you enjoy listening to it!

[1] https://se-radio.net/2026/06/se-radio-723-dave-airlie-on-linux-kernel-maintenance/

04 Jun 2026 12:05am GMT