22 Jan 2026

feedPlanet Debian

Steinar H. Gunderson: Rewriting Git merge history, part 2

In part 1, we discovered the problem of rewriting git history in the presence of nontrivial merges. Today, we'll discuss the workaround I chose.

As I previously mentioned, and as Julia Evans' excellent data model document explains, a git commit is just a snapshot of a tree (suitably deduplicated by means of content hashes), a commit message and a (possibly empty) set of parents. So fundamentally, we don't really need to mess with diffs; if we can make the changes we want directly to the tree (well, technically, make a new tree that looks like what we want, and a new commit using that tree), we're good. (Diffs in git are, generally, just git diff looking at two trees and trying to make sense of it. This has the unfortunate result that there is no solid way of representing a rename; there are heuristics, but if you rename a file and change it in the same commit, they may fail and stuff like git blame or git log may be broken, depending on flags. Gerrit doesn't even seem to understand a no-change copy.)

In earlier related cases, I've taken this to the extreme by simply hand-writing a commit using git commit-tree. Create exactly the state that you want by whatever means, commit it in some dummy commit and then use that commit's tree with some suitable commit message and parent(s); voila. But it doesn't help us with history; while we can fix up an older commit in exactly the way we'd like, we also need the latter commits to have our new fixed-up commit as parent.

Thus, enter git filter-branch. git filter-branch comes with a suitable set of warnings about eating your repository and being deprecated (I never really figured out its supposed replacement git filter-repo, so I won't talk much about it), but it's useful when all else fails.

In particular, git filter-branch allows you to do arbitrary changes to the tree of a series of commits, updating the parent commit IDs as rewrites happen. So if you can express your desired changes in a way that's better than "run the editor" (or if you're happy running the editor and making the same edit manually 300 times!), you can just run that command over all commits in the entire branch (forgive me for breaking lines a bit):

git filter-branch -f --tree-filter \
  '! [ -f src/cluster.cpp ] || sed -i "s/if (mi.rank != 0)/if (mi.rank != 0
    \&\& mi.rank == rank())/" src/cluster.cpp' \
  665155410753978998c8080c813da660fc64bbfe^..cluster-master

This is suitably terrible. Remember, if we only did this for one commit, the change wouldn't be there in the next one (git diff would show that it was immediately reverted), so filter-branch needs to do this over and over again, once for each commit (tree) in the branch. And I wanted multiple fixups, so I had a bunch of these; some of them were as simple as "copy this file from /tmp" and some were shell scripts that did things like running clang-format.

You can do similar things for commit messages; at some point, I figured I should write "cluster" (the official name for the branch) and not "cluster-master" (my local name) in the merge messages, so I could just do

git filter-branch \
  --commit-msg-filter 'sed s/cluster-master/cluster/g' \
  665155410753978998c8080c813da660fc64bbfe^..cluster-master

I also did a bunch of them to fix up my email address (GIT_COMMITTER_EMAIL wasn't properly set), although I cannot honestly remember whether I used --env-filter or something else. Perhaps that was actually with git rebase and `-r --exec 'git commit --amend --no-edit --author …'` or similar. There are many ways to do ugly things. :-)

Eventually, I had the branch mostly in a state where I thought it would be ready for review, but after uploading to GitHub, one reviewer commented that some of my merges against master were commits that didn't exist in master. Huh? That's… surprising.

It took a fair bit of digging to figure out what had happened: git filter-branch had rewritten some commits that it didn't actually have to; the merge sources from upstream. This is normally harmless, since git hashes are deterministic, but these commits were signed by the author! And filter-branch (or perhaps fast-export, upon which it builds?) generally assumes that it can't sign stuff with other people's keys, so it just strips the signatures, deeming that better than having invalid ones sitting around. Now, of course, these commit signatures would still be valid since we didn't change anything, but evidently, filter-branch doesn't have any special code for that.

Removing an object like this (a "gpgsig" attribute, it seems) changes the commit hash, which is where the phantom commits came from. I couldn't get filter-branch to turn it off… but again, parents can be freely changed, diffs don't exist anyway. So I wrote a little script that took in parameters suitable for git commit-tree (mostly the parent list), rewrote known-bad parents to known-good parents, gave the script to git filter-branch --commit-filter, and that solved the problem. (I guess --parent-filter would also have worked; I don't think I saw it in the man page at the time.)

So, well, I won't claim this is an exercise in elegancy. (Perhaps my next adventure will be figuring out how this works in jj, which supposedly has conflicts as more of a first-class concept.) But it got the job done in a couple of hours after fighting with rebase for a long time, the PR was reviewed, and now the Stockfish cluster branch is a little bit more alive.

22 Jan 2026 7:45am GMT

21 Jan 2026

feedPlanet Debian

Evgeni Golov: Validating cloud-init configs without being root

Somehow this whole DevOps thing is all about generating the wildest things from some (usually equally wild) template.

And today we're gonna generate YAML from ERB, what could possibly go wrong?!

Well, actually, quite a lot, so one wants to validate the generated result before using it to break systems at scale.

The YAML we generate is a cloud-init cloud-config, and while checking that we generated a valid YAML document is easy (and we were already doing that), it would be much better if we could check that cloud-init can actually use it.

Enter cloud-init schema, or so I thought. Turns out running cloud-init schema is rather broken without root privileges, as it tries to load a ton of information from the running system. This seems like a bug (or multiple), as the data should not be required for the validation of the schema itself. I've not found a way to disable that behavior.

Luckily, I know Python.

Enter evgeni-knows-better-and-can-write-python:

#!/usr/bin/env python3

import sys
from cloudinit.config.schema import get_schema, validate_cloudconfig_file, SchemaValidationError

try:
    valid = validate_cloudconfig_file(config_path=sys.argv[1], schema=get_schema())
    if not valid:
        raise RuntimeError("Schema is not valid")
except (SchemaValidationError, RuntimeError) as e:
    print(e)
    sys.exit(1)

The canonical1 version if this lives in the Foreman git repo, so go there if you think this will ever receive any updates.

The hardest part was to understand thevalidate_cloudconfig_file API, as it will sometimes raise an SchemaValidationError, sometimes a RuntimeError and sometimes just return False. No idea why. But the above just turns it into a couple of printed lines and a non zero exit code, unless of course there are no problems, then you get peaceful silence.

21 Jan 2026 7:42pm GMT

20 Jan 2026

feedPlanet Debian

Sahil Dhiman: Conferences, why?

Back in December, I was working to help organize multiple different conferences. One has already happened; the rest are still works in progress. That's when the thought struck me: why so many conferences, and why do I work for them?

I have been fairly active in the scene since 2020. For most conferences, I usually arrive late in the city on the previous day and usually leave the city on conference close day. Conferences for me are the place to meet friends and new folks and hear about them, their work, new developments, and what's happening in their interest zones. I feel naturally happy talking to folks. In this case, people inspire me to work. Nothing can replace a passionate technical and social discussion, which stretches way into dinner parties and later.

For most conference discussions now, I just show up without a set role (DebConf is probably an exception to it). It usually involves talking to folks, suggesting what needs to be done, doing a bit of it myself, and finishing some last-minute stuff during the actual thing.

Having more of these conferences and helping make them happen naturally gives everyone more places to come together, meet, talk, and work on something.

No doubt, one reason for all these conferences is evangelism for, let's say Free Software, OpenStreetMap, Debian etc. which is good and needed for the pipeline. But for me, the primary reason would always be meeting folks.

20 Jan 2026 2:27am GMT