20 Jun 2026
Slashdot
OpenAI Announces Benchmarks for AI Life Sciences Research. Its Best Model Failed 63.9% of the Test
This week OpenAI announced a 750-task test to measure "whether AI systems can support realistic life science research tasks, not just answer biology questions." But while OpenAI's top-performing GPT-Rosalind model led the rankings, Slashdot reader BrianFagioli notes that "it achieved a pass rate of just 36.1 percent, failing nearly two-thirds of benchmark tasks." Nerds.xyz points out that means "the best-performing model failed nearly two-thirds of the benchmark's tasks." The benchmark also revealed a familiar weakness. AI systems generally perform better when everything is presented as text. Once they are forced to work with supporting documents, figures, or complex datasets, performance drops noticeably. GPT-Rosalind's pass rate fell from 45.1 percent on text-only tasks to 28.1 percent on tasks involving artifacts or URLs. To be fair, the benchmark is not intended to suggest AI is useless in research. Quite the opposite. OpenAI found that models are becoming increasingly capable of scientific communication, evidence synthesis, and translating research findings into practical explanations. Those are valuable skills, particularly for researchers drowning in information. But LifeSciBench serves as a useful reminder that today's AI systems are still far from autonomous scientists. They can help. They can assist. They can sometimes provide surprisingly useful insights. What they cannot reliably do, however, is replace the expertise, judgment, and skepticism that real scientific research requires.
Read more of this story at Slashdot.
20 Jun 2026 9:34pm GMT
Remembering When Alan Turing Developed a Portable Voice Encryption Device
Long-time Slashdot reader smooth wombat writes: Alan Turing, one of the more famous people who worked at Bletchley Park to decipher the German Enigma coding machine, was also working on a separate project. His private papers, known as the Bayley papers for his assistant Donald Bayley who held onto the papers until his death in 2020, reveal Turing had produced a working model of a portable voice encryption device. He even demonstrated it by using a Winston Churchill speech recording. "Weighing just 39 kg, including its power pack," Jack Copeland wrote in an article for IEEE Spectrum, "Delilah would be at home in a truck, a trench, or a large backpack." More from Popular Mechanics: Turing's work at Bletchley Park actually informed the Delilah experimentation he was doing at Hanslope Park, and not just because he used Red Forms, the Army-issue sheets Hanslope staffers were meant to use to alert Bletchley staffers to enemy signals, as his personal scrap paper for Delilah experiments. He drew inspiration from one of the German cipher machines they had decoded at Bletchley; not the famed Enigma machine, but rather the SZ42. While the former relied on Morse Code, the latter utilized a 5-bit telegraph code, which Copeland notes "was a forerunner of ASCII and Unicode and is still used by some ham radio operators." The SZ42 produced an obscuring key of telegraph characters, with an identical key produced to both the sender and receiver. If it could be done for text, Turing reasoned it could be done for sound as well... [T]he reason Delilah fell to the wayside of history isn't because it was a failure, but rather because it simply wasn't needed anymore. By the time Turing had built and demonstrated his device, the war was over. What good was a portable voice encryptor if you had no major enemies trying to intercept your calls, the government reasoned. So funding for the project stopped, and Turing's two-year experiment ended with a whimper.
Turing's time as an electrical engineer at Hanslope Park became a footnote in his story, if even that.
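The shared-key idea described for the SZ42 can be illustrated with a minimal sketch. It assumes the key is combined with each 5-bit character by bitwise XOR, which the summary above does not spell out. The names make_key and combine, the seed value, and the sample symbols are all hypothetical, and Python's random module is only a stand-in for the machine's key generator, not a model of it and not secure.

```python
import random

def make_key(seed, length):
    """Hypothetical key generator: the same seed yields the same
    stream of 5-bit values, standing in for identical key settings
    at the sender and the receiver."""
    rng = random.Random(seed)  # not cryptographically secure
    return [rng.randrange(32) for _ in range(length)]

def combine(symbols, key):
    """XOR each 5-bit symbol with the matching key symbol.
    Applying the same key a second time restores the input."""
    if len(symbols) != len(key):
        raise ValueError("key must be as long as the message")
    return [(s ^ k) & 0b11111 for s, k in zip(symbols, key)]

# Arbitrary 5-bit values, not real teleprinter codes.
plaintext = [3, 25, 14, 0, 31]

sender_key = make_key(seed=1942, length=len(plaintext))
receiver_key = make_key(seed=1942, length=len(plaintext))

ciphertext = combine(plaintext, sender_key)
recovered = combine(ciphertext, receiver_key)

print(recovered == plaintext)
```

Because XOR is its own inverse, encryption and decryption are the same operation, so the only thing the two ends have to agree on is the key stream.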
20 Jun 2026 8:34pm GMT
Tech Pundit Cringely Co-Founds Startup '2Brains Inc' to Solve LLM Hallucinations
Long-time tech pundit Robert Cringely started his career at the Stanford Artificial Intelligence Lab back in 1978. Last month 73-year-old Cringely explained why his site went on a two-year hiatus - and it's not just because of a heart attack and a stroke last July: Just like everyone else, I've been busy all this time on Artificial Intelligence, founding with two partners a company called 2Brains... The work we were doing together is unfinished, but it's not stopped. The patents are filed, the architecture is documented, and the small team continuing the work includes me. Cringely's first piece made the case that "the trillion-dollar bet the AI industry is making right now may be wrong, and that there's an architectural alternative we've patented and built." In Machines of Loving Grace, Amodei made the case that scaling compute would eventually solve essentially every hard problem in artificial intelligence. Buried in that optimism - or maybe not buried, maybe right out in the open - was a quiet absolution. Hallucinations, the embarrassing tendency of these systems to state falsehoods with total confidence, would take care of themselves. Make the models big enough, train them long enough, and the problem dissolves. You don't have to solve it. You just have to wait, and spend. And so the entire AI industry breathed a sigh of relief. I have spent forty years watching this industry, and I know a permission slip when I see one. Because that is what the essay became, whatever Amodei intended. It gave every other person writing nine- and ten-figure checks a reason not to worry about the one thing that should worry them most. The hallucination problem is the difference between a clever toy and a system a hospital or a bank or a court can actually rely on. It is the whole ballgame for enterprise AI. And the prevailing wisdom, blessed from the top, is that you needn't address it directly. Scale will provide...
A small company I helped start, 2Brains Inc., set out in 2022 to solve hallucinations - before ChatGPT, before the scaling consensus hardened into received truth, back when the polite assumption was that the problem was simply insurmountable. We did not solve it by waiting for bigger models. We solved it architecturally, by separating the part of the system that generates language from the part that retrieves and verifies facts, and reconciling the two before anything reaches the user. It runs on ordinary processors. It is cheap. And on the industry's own benchmark for this kind of faithfulness, it more than doubles the published baseline, with no fabricated facts in the verified case at all. The article asks whether scaling will, at tremendous cost, eventually reduce hallucinations - or even worse, if the largest companies in the world "are spending a fortune chasing a cure that is not coming." And last week Cringely pitched more advantages for their solution, noting that most prompts aren't even chatbot-level creative prompts - but just requests to retrieve simple data: The reason 2Brains doesn't lie and the reason it's cheap are the same reason. It looks the fact up instead of guessing it - so it cannot fabricate, and the lookup runs on a processor that sips power instead of a chip that gulps it. Trust and thrift are not a trade-off you balance against each other. They fall out of a single design decision. You do not pay extra for the honest version. The honest version is the cheap version. That sentence is the whole company.
20 Jun 2026 7:34pm GMT
Ars Technica
The UK will scan asylum-seekers’ faces for age checks—despite knowing the tech is flawed
Tests of age-verification technology show the risks of life-altering errors.
20 Jun 2026 11:15am GMT
19 Jun 2026
OSnews
What was nice about the UI of Windows 2000
I mean, this is preaching to the choir, but let's go anyway. I liked the UIs of the entire era from 3.0 to 2000, really. I'm mostly using Windows 2000 as an example here because it runs so well in QEMU/KVM and that allows me to easily take screenshots. Some of the following will sound absolutely trivial, but I think it's worth pointing out. ↫ movq.de blog Just a series of observations about how much better graphical user interfaces were back in the '90s and early 2000s. We've lost so many affordances based on both common sense and scientific study, and what we ended up with is a confusing, inconsistent mess. It doesn't really matter where you look - user interface design has deteriorated since the early 2000s, a decline that only accelerated thanks to the arrival of the iPhone, where consistency is a dirty word, and the web, where the advertising people took prominence over the design people. I just want my buttons to look like buttons, man.
19 Jun 2026 8:21pm GMT
Ars Technica
Rocket Report: Rebuild begins at Blue Origin launch pad; Relativity targets Mars
A French launch startup is scrapping the name of its rocket, apparently due to a trademark issue.
19 Jun 2026 1:36pm GMT
OSnews
To study how chips really work, MIT researchers built their own operating system
Researchers at MIT have taken a fascinating, novel approach, called Fractal, to studying in depth how processors actually work. A team at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) decided to build something different. Fractal, an operating system kernel written from the ground up, treats the hardware itself as the object of study. Its first major use, a deep look at branch predictors - a CPU's way of guessing what code to run next, before it knows for certain, so it doesn't have to waste time waiting to find out - inside Apple's M1 processor, has already turned up findings that prior work missed, including the first evidence that a class of speculative attack known as "Phantom" affects Apple Silicon. "We're using hardware in ways it wasn't designed for," says Joseph Ravichandran, the MIT PhD student in electrical engineering and computer science (EECS) who led the project. "It's not even obvious that this is a possible thing you could do with the hardware. But we found a way to pull all these different primitives off. It's like a microscope. If you've got a hand magnifying glass, you can see a little bit. But if you had an electron microscope, now we're really talking. That's what Fractal is. The electron microscope of operating systems." ↫ Rachel Gordon at MIT News While Fractal is small, its creators also added POSIX system calls, a C library, vim, GCC, a shell, and more. This way, it feels more familiar and makes it easier for researchers to get started with the tool. Fractal is open source and hosted on GitHub; it has its own website, and there's a detailed research paper with more in-depth information.
19 Jun 2026 12:34pm GMT
Ars Technica
As global warming threatens corals, scientists search for reefs that can take the heat
Researchers say these coral strongholds may help repopulate more degraded reefs.
19 Jun 2026 11:15am GMT
18 Jun 2026
OSnews
AmigaOS 2: the greatest upgrade
Five years after releasing the Amiga 1000, Commodore was about to launch the Amiga 3000, their first real high-end Amiga. With a 68030 processor, on-board SCSI and a slightly updated graphics chipset, all in a sleek desktop case, the Amiga was truly ready for the era of professional 32-bit computing. But Moore's law wasn't the only thing that had been pressuring Commodore since the release of the Amiga 1000: The desktop metaphor had matured even further, and the competition had been hard at work. IBM had launched OS/2, Windows 3.0 had turned Microsoft's offering from a proof of concept into something actually usable, and new players had entered the scene - among them NeXTStep, with its polished 3D look. It was time to bring AmigaOS, too, into the 1990s. ↫ Carl Svensson It's interesting - there's a lot of focus on the first version of the Amiga operating system and the third one, but you don't hear a lot about AmigaOS 2.x. It turns out this is rather odd, because as Svensson details, this version came with an absolute ton of changes and improvements, from an entirely new widget toolkit to a brand new file system, and so much more. The new widget toolkit and accompanying style guide also ensured that the operating system looked, felt, and behaved consistently. Remember when we cared about that? There are so many more cool features, though, like command history, line editing, universal clipboard support and more just for the CLI, as well as something called Commodities. These were tiny little programs managed from a central location, which didn't even need a GUI to work. Commodities included by default were things like ClickToFront, a focus-follows-mouse option, and more. Oh, and of course, BASIC was replaced by ARexx. The list just keeps going, and you should really read Svensson's article.
18 Jun 2026 9:40pm GMT
01 Jun 2026
Planet Arch Linux
Today is my first day at JetBrains
Good morning from the JetBrains Berlin office!
01 Jun 2026 12:00am GMT
11 May 2026
Planet Arch Linux
Ratty: A terminal emulator with inline 3D graphics
Just trying to answer one simple question: What if the terminal was 3D?
11 May 2026 12:00am GMT
18 Apr 2026
Planet Arch Linux
Break the loop, move to Berlin
Break the pattern today or the loop will repeat tomorrow.
18 Apr 2026 12:00am GMT