23 Oct 2025
Planet Debian
Steinar H. Gunderson: Modern perfect hashing
Wojciech Muła posted about modern perfect hashing for strings and I wanted to make some comments about my own implementation (that sadly never got productionized because doubling the speed compared to gperf wasn't really that impactful in the end).
First, let's define the problem, just so we're all on the same page; the goal is to create code that maps a known, fixed set of strings to a predefined integer (per string), and rejects everything else. This is essentially the same as a hash table, except that since the set of strings is known ahead of time, we can do better than a normal hash table. (So no "but I heard SwissTables uses SIMD and thus cannot be beat", please. :-) ) My use case is around a thousand strings or so, and we'll assume that a couple of minutes of build time is OK (shorter would be better, but we can probably cache somehow). If you've got millions of strings, and you don't know them compile-time (for instance because you want to use your hash table in the join phase of a database), see this survey; it's a different problem with largely different solutions.
Like Wojciech, I started splitting by length. This means that we can drop all bounds checking after this, memcmp will be optimized by the compiler to use SIMD if relevant, and so on.
But after that, he recommends using PEXT (bit extraction, from BMI2), which has two problems: First, the resulting table can get quite big if your input set isn't well-behaved. (You can do better than the greedy algorithm he suggests, but not infinitely so, and finding the optimal mask quickly is sort of annoying if you don't want to embed a SAT solver or something.) Second, I needed the code to work on Arm, where you simply don't have this instruction or anything like it available. (Also, not all x86 has it, and on older Zen, it's slow.)
So, we need some other way, short of software emulation of PEXT (which exists, but we'd like to do better), to convert a sparse set of bits into a table without any collisions. It turns out the computer chess community has needed to grapple with this for a long time (they want to convert from "I have a \ on \ and there are pieces on relevant squares \, give me an index that points to an array of squares I can move to"), and their solution is to use… well, magic. It turns out that if you do something like ((value & mask) * magic)
, it is very likely that the upper bits will be collision-free between your different value
s if you try enough different numbers for magic
. We can use this too; for instance, here is code for all length-4 CSS keywords:
static const uint8_t table[] = { 6, 0, 0, 3, 2, 5, 9, 0, 0, 1, 0, 8, 7, 0, 0, }; static const uint8_t strings[] = { 1, 0, 'z', 'o', 'o', 'm', 2, 0, 'c', 'l', 'i', 'p', 3, 0, 'f', 'i', 'l', 'l', 4, 0, 'l', 'e', 'f', 't', 5, 0, 'p', 'a', 'g', 'e', 6, 0, 's', 'i', 'z', 'e', 7, 0, 'f', 'l', 'e', 'x', 8, 0, 'f', 'o', 'n', 't', 9, 0, 'g', 'r', 'i', 'd', 10, 0, 'm', 'a', 's', 'k', }; uint16_t block; memcpy(&block, str + 0, sizeof(block)); uint32_t pos = uint32_t(block * 0x28400000U) >> 28; const uint8_t *candidate = strings + 6 * table[pos]; if (memcmp(candidate + 2, str, 4) == 0) { return candidate[0] + (candidate[1] << 8); } return 0;
There's a bit to unpack here; we read the first 16 bits from our value with memcpy (big-endian users beware!), multiply it with the magic value 0x28400000U
found by trial and error, shift the top bits down, and now all of our ten candidate values ("zoom", "clip", etc.) have different top four bits. We use that to index into a small table, check that we got the right one instead of a random collision (e.g. "abcd", 0x6261, would get a value of 12, and table[12]
is 7, so we need to disambiguate that from "flex", which is what we are actually looking for in that spot), and then return the 16-bit identifier related to the match (or zero, if we didn't find it).
We don't need to use the first 16 bits; we could have used any other consecutive 16 bits, or any 32 bits, or any 64 bits, or possibly any of those masked off, or even XOR of two different 32-bit sets if need be. My code prefers smaller types because a) they tend to give smaller code size (easier to load into registers, or can even be used as immediates), and b) you can bruteforce them instead of doing random searches (which, not the least, has the advantage that you can give up much quicker).
You also don't really need the intermediate table; if the fit is particularly good, you can just index directly into the final result without wasting any space. Here's the case for length-24 CSS keywords, where we happened to have exactly 16 candidates and we found a magic giving a perfect (4-bit) value, making it a no-brainer:
static const uint8_t strings[] = { 95, 0, 'b', 'o', 'r', 'd', 'e', 'r', '-', 'b', 'l', 'o', 'c', 'k', '-', 's', 't', 'a', 'r', 't', '-', 'w', 'i', 'd', 't', 'h', 40, 0, '-', 'w', 'e', 'b', 'k', 'i', 't', '-', 't', 'e', 'x', 't', '-', 'o', 'r', 'i', 'e', 'n', 't', 'a', 't', 'i', 'o', 'n', 115, 1, 's', 'c', 'r', 'o', 'l', 'l', '-', 'p', 'a', 'd', 'd', 'i', 'n', 'g', '-', 'b', 'l', 'o', 'c', 'k', '-', 'e', 'n', 'd', 198, 2, '-', 'w', 'e', 'b', 'k', 'i', 't', '-', 't', 'r', 'a', 'n', 's', 'f', 'o', 'r', 'm', '-', 'o', 'r', 'i', 'g', 'i', 'n', 225, 0, '-', 'i', 'n', 't', 'e', 'r', 'n', 'a', 'l', '-', 'o', 'v', 'e', 'r', 'f', 'l', 'o', 'w', '-', 'b', 'l', 'o', 'c', 'k', 101, 2, '-', 'w', 'e', 'b', 'k', 'i', 't', '-', 'b', 'o', 'r', 'd', 'e', 'r', '-', 'e', 'n', 'd', '-', 's', 't', 'y', 'l', 'e', 93, 0, 'b', 'o', 'r', 'd', 'e', 'r', '-', 'b', 'l', 'o', 'c', 'k', '-', 's', 't', 'a', 'r', 't', '-', 'c', 'o', 'l', 'o', 'r', 102, 2, '-', 'w', 'e', 'b', 'k', 'i', 't', '-', 'b', 'o', 'r', 'd', 'e', 'r', '-', 'e', 'n', 'd', '-', 'w', 'i', 'd', 't', 'h', 169, 1, 't', 'e', 'x', 't', '-', 'd', 'e', 'c', 'o', 'r', 'a', 't', 'i', 'o', 'n', '-', 's', 'k', 'i', 'p', '-', 'i', 'n', 'k', 156, 0, 'c', 'o', 'n', 't', 'a', 'i', 'n', '-', 'i', 'n', 't', 'r', 'i', 'n', 's', 'i', 'c', '-', 'h', 'e', 'i', 'g', 'h', 't', 201, 2, '-', 'w', 'e', 'b', 'k', 'i', 't', '-', 't', 'r', 'a', 'n', 's', 'i', 't', 'i', 'o', 'n', '-', 'd', 'e', 'l', 'a', 'y', 109, 1, 's', 'c', 'r', 'o', 'l', 'l', '-', 'm', 'a', 'r', 'g', 'i', 'n', '-', 'i', 'n', 'l', 'i', 'n', 'e', '-', 'e', 'n', 'd', 240, 0, '-', 'i', 'n', 't', 'e', 'r', 'n', 'a', 'l', '-', 'v', 'i', 's', 'i', 't', 'e', 'd', '-', 's', 't', 'r', 'o', 'k', 'e', 100, 2, '-', 'w', 'e', 'b', 'k', 'i', 't', '-', 'b', 'o', 'r', 'd', 'e', 'r', '-', 'e', 'n', 'd', '-', 'c', 'o', 'l', 'o', 'r', 94, 0, 'b', 'o', 'r', 'd', 'e', 'r', '-', 'b', 'l', 'o', 'c', 'k', '-', 's', 't', 'a', 'r', 't', '-', 's', 't', 'y', 'l', 'e', 196, 2, '-', 'w', 'e', 'b', 'k', 'i', 't', '-', 't', 'e', 'x', 't', '-', 's', 'i', 'z', 'e', '-', 'a', 'd', 'j', 'u', 's', 't', }; uint32_t block; memcpy(&block, str + 16, sizeof(block)); uint32_t pos = uint32_t(block * 0xe330a008U) >> 28; const uint8_t *candidate = strings + 26 * pos; if (memcmp(candidate + 2, str, 24) == 0) { return candidate[0] + (candidate[1] << 8); } return 0;
You can see that we used a 32-bit value here (bytes 16 through 19 of the input), and a corresponding 32-bit magic (though still not with an AND mask). So we got fairly lucky, but sometimes you do that. Of course, we need to validate the entire 24-byte value even though we only discriminated on four of the bytes! (Unless you know for sure that you never have any out-of-distribution inputs, that is. There are use cases where this is true.)
(If you wonder what 95, 0
or similar is above; that's just "the answer the user wanted for that input". It corresponds to a 16-bit enum in the parser.)
If there are only a few values, we don't need any of this; just like Wojciech, we do with a simple compare. Here's the generated code for all length-37 CSS keywords, plain and simple:
if (memcmp(str, "-internal-inactive-list-box-selection", 37) == 0) { return 171; } return 0;
(Again 171 is "the desired output for that input", not a value the code generator decides in any way.)
So how do we find these magic values? There's really only one way: Try lots of different ones and see if they work. But there's a trick to accelerate "see if they work", which I also borrowed from computer chess: The killer heuristic.
See, to try if a magic is good, you generally try to hash all the different values and see if any two go into the same bucket. (If they do, it's not a perfect hash and the entire point of the exercise is gone.) But it turns out that most of the time, it's the same two values that collide. So every couple hundred candidates, we check which two values disproved the magic, and put those in a slot. Whenever we check magics, we can now try those first, and more likely than not, discard the candidate right away and move on to the next one (whether it is by exhaustive search or randomness). It's actually a significant speedup.
But occasionally, we simply cannot find a magic for a given group; either there is none, or we didn't have enough time to scan through enough of the 64-bit space. At this point, Wojciech suggests we switch on one of the characters (heuristically) to get smaller subgroups and try again. I didn't actually find this to perform all that well; indirect branch predictors are better than 20 years ago, but the pattern is typically not that predictable. What I tried instead was to have more of a yes/no on some character (i.e., a non-indirect branch), which makes for a coarser split.
It's not at all obvious where the best split would be. You'd intuitively think that 50/50 would be a good idea, but if you have e.g. 40 elements, you'd much rather split them 32/8… if you can find perfect hashes for both subgroups (5-bit and 3-bit, respectively). If not, a 20-20 split is most likely better, since you very easily can find magics that put 20 elements into 32 buckets without collisions. I ended up basically trying all the different splits and scoring them, but this makes the searcher rather slow, and it means you basically must have some sort of cache if you want to run it as part of your build system. This is the part I'm by far the least happy about; gperf isn't great by modern standards, but it never feels slow to run.
The end result for me was: Runtime about twice as fast as gperf, compiled code about half as big. That's with everything hard-coded; if you're pushed for space (or are icache-bound), you could make more generic code at the expense of some speed.
So, if anyone wants to make a more modern gperf, I guess this space is up for grabs? It's not exactly technology that will make your stock go to AI levels, though.
23 Oct 2025 7:30pm GMT
Russ Allbery: Review: Politics on the Edge
Review: Politics on the Edge, by Rory Stewart
Publisher: | Penguin Books |
Copyright: | 2023, 2025 |
Printing: | 2025 |
ISBN: | 979-8-217-06167-9 |
Format: | Kindle |
Pages: | 429 |
Rory Stewart is a former British diplomat, non-profit executive, member of Parliament, and cabinet minister. Politics on the Edge is a memoir of his time in the UK Parliament from 2019 to 2019 as a Tory (Conservative) representing the Penrith and The Border constituency in northern England. It ends with his failed run against Boris Johnson for leader of the Conservative Party and Prime Minister.
This book provoked many thoughts, only some of which are about the book. You may want to get a beverage; this review will be long.
Since this is a memoir told in chronological order, a timeline may be useful. After Stewart's time as a regional governor in occupied Iraq (see The Prince of the Marshes), he moved to Kabul to found and run an NGO to preserve traditional Afghani arts and buildings (the Turquoise Mountain Foundation, about which I know nothing except what Stewart wrote in this book). By his telling, he found that work deeply rewarding but thought the same politicians who turned Iraq into a mess were going to do the same to Afghanistan. He started looking for ways to influence the politics more directly, which led him first to Harvard and then to stand for Parliament.
The bulk of this book covers Stewart's time as MP for Penrith and The Border. The choice of constituency struck me as symbolic of Stewart's entire career: He was not a resident and had no real connection to the district, which he chose for political reasons and because it was the nearest viable constituency to his actual home in Scotland. But once he decided to run, he moved to the district and seems sincerely earnest in his desire to understand it and become part of its community. After five years as a backbencher, he joined David Cameron's government in a minor role as Minister of State in the Department for Environment, Food, and Rural Affairs. He then bounced through several minor cabinet positions (more on this later) before being elevated to Secretary of State for International Development under Theresa May. When May's government collapsed during the fight over the Brexit agreement, he launched a quixotic challenge to Boris Johnson for leader of the Conservative Party.
I have enjoyed Rory Stewart's writing ever since The Places in Between. This book is no exception. Whatever one's other feelings about Stewart's politics (about which I'll have a great deal more to say), he's a talented memoir writer with an understated and contemplative style and a deft ability to shift from concrete description to philosophical debate without bogging down a story. Politics on the Edge is compelling reading at the prose level. I spent several afternoons happily engrossed in this book and had great difficulty putting it down.
I find Stewart intriguing since, despite being a political conservative, he's neither a neoliberal nor any part of the new right. He is instead an apparently-sincere throwback to a conservatism based on epistemic humility, a veneration of rural life and long-standing traditions, and a deep commitment to the concept of public service. Some of his principles are baffling to me, and I think some of his political views are obvious nonsense, but there were several things that struck me throughout this book that I found admirable and depressingly rare in politics.
First, Stewart seems to learn from his mistakes. This goes beyond admitting when he was wrong and appears to include a willingness to rethink entire philosophical positions based on new experience.
I had entered Iraq supporting the war on the grounds that we could at least produce a better society than Saddam Hussein's. It was one of the greatest mistakes in my life. We attempted to impose programmes made up by Washington think tanks, and reheated in air-conditioned palaces in Baghdad - a new taxation system modelled on Hong Kong; a system of ministers borrowed from Singapore; and free ports, modelled on Dubai. But we did it ultimately at the point of a gun, and our resources, our abstract jargon and optimistic platitudes could not conceal how much Iraqis resented us, how much we were failing, and how humiliating and degrading our work had become. Our mission was a grotesque satire of every liberal aspiration for peace, growth and democracy.
This quote comes from the beginning of this book and is a sentiment Stewart already expressed in The Prince of the Marshes, but he appears to have taken this so seriously that it becomes a theme of his political career. He not only realized how wrong he was on Iraq, he abandoned the entire neoliberal nation-building project without abandoning his belief in the moral obligation of international aid. And he, I think correctly, identified a key source of the error: an ignorant, condescending superiority that dismissed the importance of deep expertise.
Neither they, nor indeed any of the 12,000 peacekeepers and policemen who had been posted to South Sudan from sixty nations, had spent a single night in a rural house, or could complete a sentence in Dinka, Nuer, Azande or Bande. And the international development strategy - written jointly between the donor nations - resembled a fading mission statement found in a new space colony, whose occupants had all been killed in an alien attack.
Second, Stewart sincerely likes ordinary people. This shone through The Places in Between and recurs here in his descriptions of his constituents. He has a profound appreciation for individual people who have spent their life learning some trade or skill, expresses thoughtful and observant appreciation for aspects of local culture, and appears to deeply appreciate time spent around people from wildly different social classes and cultures than his own. Every successful politician can at least fake gregariousness, and perhaps that's all Stewart is doing, but there is something specific and attentive about his descriptions of other people, including long before he decided to enter politics, that makes me think it goes deeper than political savvy.
Third, Stewart has a visceral hatred of incompetence. I think this is the strongest through-line of his politics in this book: Jobs in government are serious, important work; they should be done competently and well; and if one is not capable of doing that, one should not be in government. Stewart himself strikes me as an insecure overachiever: fiercely ambitious, self-critical, a bit of a micromanager (I suspect he would be difficult to work for), but holding himself to high standards and appalled when others do not do the same. This book is scathing towards multiple politicians, particularly Boris Johnson whom Stewart clearly despises, but no one comes off worse than Liz Truss.
David Cameron, I was beginning to realise, had put in charge of environment, food and rural affairs a Secretary of State who openly rejected the idea of rural affairs and who had little interest in landscape, farmers or the environment. I was beginning to wonder whether he could have given her any role she was less suited to - apart perhaps from making her Foreign Secretary. Still, I could also sense why Cameron was mesmerised by her. Her genius lay in exaggerated simplicity. Governing might be about critical thinking; but the new style of politics, of which she was a leading exponent, was not. If critical thinking required humility, this politics demanded absolute confidence: in place of reality, it offered untethered hope; instead of accuracy, vagueness. While critical thinking required scepticism, open-mindedness and an instinct for complexity, the new politics demanded loyalty, partisanship and slogans: not truth and reason but power and manipulation. If Liz Truss worried about the consequences of any of this for the way that government would work, she didn't reveal it.
And finally, Stewart has a deeply-held belief in state capacity and capability. He and I may disagree on the appropriate size and role of the government in society, but no one would be more disgusted by an intentional project to cripple government in order to shrink it than Stewart.
One of his most-repeated criticisms of the UK political system in this book is the way the cabinet is formed. All ministers and secretaries come from members of Parliament and therefore branches of government are led by people with no relevant expertise. This is made worse by constant cabinet reshuffles that invalidate whatever small amounts of knowledge a minister was able to gain in nine months or a year in post. The center portion of this book records Stewart's time being shuffled from rural affairs to international development to Africa to prisons, with each move representing a complete reset of the political office and no transfer of knowledge whatsoever.
A month earlier, they had been anticipating every nuance of Minister Rogerson's diary, supporting him on shifts twenty-four hours a day, seven days a week. But it was already clear that there would be no pretence of a handover - no explanation of my predecessor's strategy, and uncompleted initiatives. The arrival of a new minister was Groundhog Day. Dan Rogerson was not a ghost haunting my office, he was an absence, whose former existence was suggested only by the black plastic comb.
After each reshuffle, Stewart writes of trying to absorb briefings, do research, and learn enough about his new responsibilities to have the hope of making good decisions, while growing increasingly frustrated with the system and the lack of interest by most of his colleagues in doing the same. He wants government programs to be successful and believes success requires expertise and careful management by the politicians, not only by the civil servants, a position that to me both feels obviously correct and entirely at odds with politics as currently practiced.
I found this a fascinating book to read during the accelerating collapse of neoliberalism in the US and, to judge by current polling results, the UK. I have a theory that the political press are so devoted to a simplistic left-right political axis based on seating arrangements during the French Revolution that they are missing a significant minority whose primary political motivation is contempt for arrogant incompetence. They could be convinced to vote for Sanders or Trump, for Polanski or Farage, but will never vote for Biden, Starmer, Romney, or Sunak.
Such voters are incomprehensible to those who closely follow and debate policies because their hostile reaction to the center is not about policies. It's about lack of trust and a nebulous desire for justice. They've been promised technocratic competence and the invisible hand of market forces for most of their lives, and all of it looks like lies. Everyday living is more precarious, more frustrating, more abusive and dehumanizing, and more anxious, despite (or because of) this wholehearted embrace of economic "freedom." They're sick of every complaint about the increasing difficulty of life being met with accusations about their ability and work ethic, and of being forced to endure another round of austerity by people who then catch a helicopter ride to a party on some billionaire's yacht.
Some of this is inherent in the deep structural weaknesses in neoliberal ideology, but this is worse than an ideological failure. The degree to which neoliberalism started as a project of sincere political thinkers is arguable, but that is clearly not true today. The elite class in politics and business is now thoroughly captured by people whose primary skill is the marginal manipulation of complex systems for their own power and benefit. They are less libertarian ideologues than narcissistic mediocrities. We are governed by management consultants. They are firmly convinced their organizational expertise is universal, and consider the specific business of the company, or government department, irrelevant.
Given that context, I found Stewart's instinctive revulsion towards David Cameron quite revealing. Stewart, later in the book, tries to give Cameron some credit by citing several policy accomplishments and comparing him favorably to Boris Johnson (which, true, is a bar Cameron probably flops over). But I think Stewart's baffled astonishment at Cameron's vapidity says a great deal about how we have ended up where we are. This last quote is long, but I think it provides a good feel for Stewart's argument in this book.
But Cameron, who was rumoured to be sceptical about nation-building projects, only nodded, and then looking confidently up and down the table said, "Well, at least we all agree on one extremely straightforward and simple point, which is that our troops are doing very difficult and important work and we should all support them."
It was an odd statement to make to civilians running humanitarian operations on the ground. I felt I should speak. "No, with respect, we do not agree with that. Insofar as we have focused on the troops, we have just been explaining that what the troops are doing is often futile, and in many cases making things worse." Two small red dots appeared on his cheeks. Then his face formed back into a smile. He thanked us, told us he was out of time, shook all our hands, and left the room.
Later, I saw him repeat the same line in interviews: "the purpose of this visit is straightforward... it is to show support for what our troops are doing in Afghanistan". The line had been written, in London, I assumed, and tested on focus groups. But he wanted to convince himself it was also a position of principle.
"David has decided," one of his aides explained, when I met him later, "that one cannot criticise a war when there are troops on the ground."
"Why?"
"Well... we have had that debate. But he feels it is a principle of British government."
"But Churchill criticised the conduct of the Boer War; Pitt the war with America. Why can't he criticise wars?"
"British soldiers are losing their lives in this war, and we can't suggest they have died in vain."
"But more will die, if no one speaks up..."
"It is a principle thing. And he has made his decision. For him and the party."
"Does this apply to Iraq too?"
"Yes. Again he understands what you are saying, but he voted to support the Iraq War, and troops are on the ground."
"But surely he can say he's changed his mind?"
The aide didn't answer, but instead concentrated on his food. "It is so difficult," he resumed, "to get any coverage of our trip." He paused again. "If David writes a column about Afghanistan, we will struggle to get it published."
"But what would he say in an article anyway?" I asked.
"We can talk about that later. But how do you get your articles on Afghanistan published?"
I remembered how the US politicians and officials had shown their mastery of strategy and detail. I remembered the earnestness of Gordon Brown when I had briefed him on Iraq. Cameron seemed somehow less serious. I wrote as much in a column in the New York Times, saying that I was afraid the party of Churchill was becoming the party of Bertie Wooster.
I don't know Stewart's reputation in Britain, or in the constituency that he represented. I know he's been accused of being a self-aggrandizing publicity hound, and to some extent this is probably true. It's hard to find an ambitious politician who does not have that instinct. But whatever Stewart's flaws, he can, at least, defend his politics with more substance than a corporate motto. One gets the impression that he would respond favorably to demonstrated competence linked to a careful argument, even if he disagreed. Perhaps this is an illusion created by his writing, but even if so, it's a step in the right direction.
When people become angry enough at a failing status quo, any option that promises radical change and punishment for the current incompetents will sound appealing. The default collapse is towards demagogues who are skilled at expressing anger and disgust and are willing to promise simple cures because they are indifferent to honesty. Much of the political establishment in the US, and possibly (to the small degree that I can analyze it from an occasional news article) in the UK, can identify the peril of the demagogue, but they have no solution other than a return to "politics as usual," represented by the amoral mediocrity of a McKinsey consultant. The rare politicians who seem to believe in something, who will argue for personal expertise and humility, who are disgusted by incompetence and have no patience for facile platitudes, are a breath of fresh air.
There are a lot of policies on which Stewart and I would disagree, and perhaps some of his apparent humility is an affectation from the rhetorical world of the 1800s that he clearly wishes he were inhabiting, but he gives the strong impression of someone who would shoulder a responsibility and attempt to execute it with competence and attention to detail. He views government as a job, where coworkers should cooperate to achieve defined goals, rather than a reality TV show. The arc of this book, like the arc of current politics, is the victory of the reality TV show over the workplace, and the story of Stewart's run against Boris Johnson is hard reading because of it, but there's a portrayal here of a different attitude towards politics that I found deeply rewarding.
If you liked Stewart's previous work, or if you want an inside look at parliamentary politics, highly recommended. I will be thinking about this book for a long time.
Rating: 9 out of 10
23 Oct 2025 4:47am GMT
21 Oct 2025
Planet Debian
Gunnar Wolf: LLM Hallucinations in Practical Code Generation — Phenomena, Mechanism, and Mitigation
This post is an unpublished review for LLM Hallucinations in Practical Code Generation - Phenomena, Mechanism, and Mitigation
How good can Large Language Models (LLMs) be at generating code? This would not seem like a very novel question to ask, as there are several benchmarks such as HumanEval and MBPP, published in 2021, before LLMs burst into the public view starting the current AI inflation. However, this article's authors point out code generation is very seldom done as isolated functions, but must be deployed in a coherent fashion together with the rest of the project or repository it is meant to be integrated in. By 2024 there are several benchmarks, such as CoderEval o EvoCodeBench, measuring functional correctness of LLM-generated code, measured by test case pass rates.
This article brings a new proposal to the table: comparing LLM-generated repository-level-evaluated code, by examining the hallucinations generated. To do this, they begin by running the Python code generation tasks proposed in the CoderEval benchmark against six code-generating LLMs, analizing the results and building a taxonomy to describe code-based LLM hallucinations, with three types of conflicts (task requirement, factual knowledge and project context) as first-level categories and eight subcategories within them. Second, the authors compare the results of each of the LLMs per main hallucination category. Third, they try to find the root cause for the hallucinations.
The article is structured very clearly, not only presenting the three Research Questions (RQ) but also refering to them as needed to explain why and how each partial result is interpreted. RQ1 (establishing a hallucination taxonomy) is, in my opinion, the most thoroughly explored. RQ2 (LLM comparison) is clear, although it just presents "straight" results, seemingly without much analysis to them. RQ3 (root cause discussion) is undoubtedly interesting, but I feel it to be much more speculative and not directly related to the analysis performed.
After tackling their research questions, they venture a possible mitigation to counter the effect of hallucinations: enhancing the presented LLMs with a retrieval-augmented generation (RAG), so it better understands task requirements, factual knowledge and project context, hopefully reducing hallucination; they present results showing all of the models are clearly although modestly improved by the proposed RAG-based mitigation.
The article is clearly written and easy to read. I would like they would have dedicated more space to detail their RAG implementation, but I suppose it will appear in a follow-up article, as it was only briefly mentioned here. It should provide its target audience, is quite specialized but numerous nowadays, with interesting insights and discussion.
21 Oct 2025 10:08pm GMT