13 Feb 2026

Planet Debian

Erich Schubert: Dogfood Generative AI

Current AI companies ignore licenses such as the GPL, and often train on anything they can scrape. This is not acceptable.

The AI companies ignore web conventions, e.g., they deep-link images from your web site (even adding ?utm_source=chatgpt.com to image URIs; I suggest returning 403 on these requests), but do not direct visitors to your site. You get no reliable way of opting out of generative AI training or use. For example, the only way to prevent your content from being used in "Google AI Overviews" is to use data-nosnippet, which cripples the snippet preview in Google. The "AI" browsers such as Comet and Atlas do not identify themselves as such, but pretend to be standard Chromium, so there is no way to ban such AI use on your web site.

Overall, generative AI is flooding the internet with garbage. It has been estimated that a third of the content uploaded to YouTube is by now AI-generated. This includes the same "veteran stories" crap in thousands of variants, as well as brainrot content (which at least does not pretend to be authentic), some of which is among the most viewed recent uploads. Hence, the platforms even benefit from the AI slop. And don't blame the "creators" - because you can currently earn a decent amount of money from such content, people will keep generating brainrot.

If you have recently tried to find honest reviews of products you considered buying, you will have noticed thousands of sites with AI-generated fake product reviews, all financed by Amazon PartnerNet commissions - often with hilarious nonsense such as recommending "sewing thread with German instructions" as a tool for repairing a sewing machine. On Amazon itself, there are plenty of AI-generated product reviews - the use of emoji is a strong hint. And if you leave a negative product review, there is a chance they will offer you a refund to get rid of it… The majority of spam that gets through my filters is by now sent via Gmail and Amazon SES.

Partially because of GenAI, StackOverflow - which used to be one of the most valuable programming resources - is pretty much dead. (While a lot of people complain about moderation, famous moderator Shog9 from the early SO days suggested that a change in Google's ranking is also to blame: it began favoring "new" content over the existing answered questions, causing more and more duplicates to be posted because people no longer found the existing good answers.) In January 2026, there were around 3400 questions and 6000 answers posted - fewer than in the first month of SO, August 2008, before the official launch.

Open-source projects are suffering in many ways, too; for example, false bug reports caused curl to stop its bug bounty program. Wikipedia is also suffering badly from GenAI.

Science is also flooded with poor AI-generated papers, often reviewed with the help of AI. This is largely due to bad incentives: to graduate, you are expected to publish many papers at certain "A" conferences, such as NeurIPS. At these conferences the number of submissions is growing at an insane rate, and review quality plummets. All too often, the references in these papers are hallucinated, too; libraries complain that they receive more and more requests to locate literature that does not appear to exist.

However, the worst effect (at least to me as an educator) is the noskilling effect - a rather novel term derived from deskilling, which I have so far only seen in this article by Weßels and Maibaum.

Instead of acquiring skills (writing, reading, summarizing, programming) by practising, too many people now outsource all of this to AI, and therefore never learn the basics necessary to advance to a higher skill level. In my impression, this effect is dramatic. It is even worse than deskilling, because it does not mean losing an advanced skill that you apparently can replace, but often means not acquiring basic skills in the first place. And the earlier pupils start using generative AI, the fewer skills they acquire.

Dogfood the AI

Let's dogfood the AI. Here's an outline:

  1. Get a list of programming topics, e.g., a list of algorithms from Wikidata (see the sketch after this list) or a StackOverflow data dump.
  2. Generate flawed code examples for the algorithms / programming questions, maybe generate blog posts, too.
    You do not need a high-quality model for this. Use something you can run locally or access for free.
  3. Date everything back in time, remove typical indications of AI use.
  4. Upload to GitHub, because Microsoft will feed this to OpenAI…
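
A minimal sketch for step 1, pulling algorithm names from Wikidata, could look like the following. It uses the public SPARQL endpoint; the item Q8366 ("algorithm") and the property path wdt:P31/wdt:P279* are my assumptions and worth double-checking before use.

# Hypothetical sketch: fetch algorithm names and descriptions from Wikidata.
# Q8366 ("algorithm") and the property path are assumptions - verify before use.
import json
import urllib.parse
import urllib.request

QUERY = """
SELECT ?item ?itemLabel ?itemDescription WHERE {
  ?item wdt:P31/wdt:P279* wd:Q8366 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 200
"""

def list_algorithms():
    url = "https://query.wikidata.org/sparql?" + urllib.parse.urlencode(
        {"query": QUERY, "format": "json"})
    req = urllib.request.Request(url, headers={"User-Agent": "dogfood-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    for row in data["results"]["bindings"]:
        # The label service fills in English labels and descriptions.
        yield row["itemLabel"]["value"], row.get("itemDescription", {}).get("value", "")

if __name__ == "__main__":
    for name, desc in list_algorithms():
        print(f"{name}: {desc}")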

Here is an example prompt that you can use:

You are a university educator, preparing homework assignments in debugging.
The programming language used is {lang}.
The students are tasked to find bugs in given code.
Do not just call existing implementations from libraries, but implement the algorithm from scratch.
Make sure there are two mistakes in the code that need to be discovered by the students.
Do NOT repeat instructions. Do NOT add small-talk. Do NOT provide a solution.
The code may have (misleading) comments, but must NOT mention the bugs.
If you do not know how to implement the algorithm, output an empty response.
Output only the code for the assignment! Do not use markdown.
Begin with a code comment that indicates the algorithm name and idea.
If you indicate a bug, always use a comment with the keyword BUG

Generate a {lang} implementation (with bugs) of: {n} ({desc})

Remember to remove the BUG comments! If you pick a slightly less common programming language (by quantity of available code - say, Go or Rust), your chances are higher that this ends up in the training data.
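
For step 2, a hypothetical driver (not the author's actual tooling) could fill the prompt template above, query a locally running model, and strip the BUG comments. It assumes an Ollama instance at localhost:11434, a placeholder model name, and the prompt template saved to prompt.txt - adjust all of these for your setup.

# Hypothetical pipeline sketch: fill the prompt, ask a local model, clean the output.
import json
import re
import urllib.request

# The prompt template shown above, with {lang}, {n} and {desc} placeholders.
PROMPT_TEMPLATE = open("prompt.txt").read()

def generate(lang, name, desc, model="qwen2.5-coder:7b"):
    payload = json.dumps({
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(lang=lang, n=name, desc=desc),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate", data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        code = json.loads(resp.read())["response"]
    # Drop any comment that still mentions BUG, as suggested above.
    return re.sub(r"[ \t]*(//|#|--).*BUG.*", "", code)

if __name__ == "__main__":
    # Placeholder topic list; in practice this comes from Wikidata or an SO dump.
    for lang, name, desc in [("Go", "binary search", "find an element in a sorted slice")]:
        with open(name.replace(" ", "_") + ".txt", "w") as out:
            out.write(generate(lang, name, desc))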

If many of us do this, we can feed GenAI its own garbage. If we generate thousands of bad code examples, this will poison their training data, and may eventually lead to an effect known as "model collapse".

In the long run, we need to get back to an internet for people, not an internet for bots - some kind of "internet 2.0". But I do not have a clear vision of how to keep AI out: if AI can train on it, they will. And someone will copy and paste the AI-generated crap back into whatever system we build. Hence I don't think technology is the answer here, but human networks of trust.

13 Feb 2026 10:29am GMT

12 Feb 2026

Planet Debian

Dirk Eddelbuettel: RcppSpdlog 0.0.27 on CRAN: C++20 Accommodations

Version 0.0.27 of RcppSpdlog arrived on CRAN moments ago, and will be uploaded to Debian and built for r2u shortly. The (nice) documentation site will be refreshed too. RcppSpdlog bundles spdlog, a wonderful header-only C++ logging library with all the bells and whistles you would want, written by Gabi Melman; it also includes fmt by Victor Zverovich. You can learn more at the nice package documentation site.

Brian Ripley has now turned C++20 on as the default for R-devel (aka R 4.6.0 'to be'), and this turned up misbehavior in packages using RcppSpdlog, such as our spdl wrapper (offering a nicer interface from both R and C++), when relying on std::format. So for now, we turned this off and remain with fmt::format from the fmt library while we investigate further.

The NEWS entry for this release follows.

Changes in RcppSpdlog version 0.0.27 (2026-02-11)

  • Under C++20 or later, keep relying on fmt::format until issues experienced using std::format can be identified and resolved

Courtesy of my CRANberries, there is also a diffstat report detailing changes. More detailed information is on the RcppSpdlog page, or the package documentation site.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

12 Feb 2026 1:59pm GMT

Freexian Collaborators: Debian Contributions: cross building, rebootstrap updates, Refresh of the patch tagging guidelines and more! (by Anupa Ann Joseph)

Debian Contributions: 2026-01

Contributing to Debian is part of Freexian's mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

cross building, by Helmut Grohne

In version 1.10.1, Meson merged a patch to make it call the correct g-ir-scanner by default, thanks to Eli Schwarz. This problem affected more than 130 source packages. Helmut retried building them all and filed 69 patches as a result. A significant portion of those packages require another Meson change to call the correct vapigen. Another notable change is converting gnu-efi to multiarch, which ended up requiring changes to a number of other packages. Since Aurelien dropped the libcrypt-dev dependency from libc6-dev, this transition is now mostly complete and has resulted in most of the Perl ecosystem correctly expressing the perl-xs-dev dependencies needed for cross building. It is these infrastructure changes affecting several client packages that this work targets. As a result of this continued work, about 66% of Debian's source packages now have satisfiable cross Build-Depends in unstable, and about 10000 (55%) can actually be cross built. There are now more than 500 open bug reports affecting more than 2000 packages, most of which carry patches.

rebootstrap, by Helmut Grohne

Maintaining architecture cross-bootstrap requires continued effort to adapt to archive changes, such as glib2.0 dropping a build profile or an e2fsprogs FTBFS. Beyond those generic problems, architecture-specific problems may arise, e.g., with musl-linux-any or sparc. While all these changes move things forward on the surface, the bootstrap tooling has become a growing pile of patches. Helmut managed to upstream two changes to glibc that reduce its Build-Depends in the stage2 build profile, and thanks Aurelien Jarno for that.

Refresh of the patch tagging guidelines, by Raphaël Hertzog

Debian Enhancement Proposal #3 (DEP-3), named "Patch Tagging Guidelines", standardizes meta-information that Debian contributors can put in patches included in Debian source packages. With the feedback received over the years, and with the changes in the package management landscape, the need to refresh those guidelines became evident. As the initial driver of that DEP, I spent a good day reviewing all the feedback (which I had kept in a folder) and producing a new version of the document. The changes aim to give more weight to the syntax that is compatible with git format-patch's output, and to clarify the expected uses and meanings of a couple of fields, including an algorithm that parsers should follow to determine the state of a patch. After the announcement of the new draft on debian-devel, the revised DEP-3 received a significant number of comments that I still have to process.
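
To illustrate, a patch header in the git-format-patch-compatible DEP-3 style might look roughly like this; the values are invented, and the refreshed draft may adjust the exact field set.

From: Jane Doe <jane@example.org>
Subject: Fix build failure with newer toolchains
Origin: backport, https://example.org/upstream/commit/abcdef1
Bug-Debian: https://bugs.debian.org/999999
Forwarded: not-needed
Last-Update: 2026-01-31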

Miscellaneous contributions

12 Feb 2026 12:00am GMT