14 Jan 2026

Planet Mozilla

The Mozilla Blog: How founders are meeting the moment: Lessons from Mozilla Ventures’ 2025 portfolio convening

[Cover image: Mozilla Ventures Convening 2025 Report, green geometric design on a black background]

At Mozilla, we've long believed that technology can be built differently - not only more openly, but more responsibly, more inclusively, and more in service of the people who rely on it. As AI reshapes nearly every layer of the internet, those values are being tested in real time.

Our 2025 Mozilla Ventures Portfolio Convening Report captures how a new generation of founders is meeting that moment.

At the Mozilla Festival 2025 in Barcelona, from Nov. 7-9, we brought together 50 founders from 30 companies across our portfolio to grapple with some of the most pressing questions in technology today: How do we build AI that is trustworthy and governable? How do we protect privacy at scale? What does "better social" look like after the age of the global feed? And how do we ensure that the future of technology is shaped by people and communities far beyond today's centers of power?

Over three days of panels, talks, and hands-on sessions, founders shared not just what they're building, but what they're learning as they push into new terrain. What emerged is a vivid snapshot of where the industry is heading - and the hard choices required to get there.

Open source as strategy, not slogan

A major theme emerging across conversations with our founders was that open source is no longer a "nice to have." It's the backbone of trust, adoption, and long‑term resilience in AI, and a critical pillar for the startup ecosystem. But these founders aren't naïve about the challenges. Training frontier‑scale models costs staggering sums, and the gravitational pull of a few dominant labs is real. Yet companies like Union.ai, Jozu, and Oumi show that openness can still be a moat - if it's treated as a design choice, not a marketing flourish.

Their message is clear: open‑washing won't cut it. True openness means clarity about what's shared - weights, data, governance, standards - and why. It means building communities that outlast any single company. And it means choosing investors who understand that open‑source flywheels take time to spin up.

Community as the real competitive edge

Across November's sessions, founders returned to a simple truth: community is the moat. Flyte's growth into a Linux Foundation project, Jozu's push for open packaging standards, and Lelapa's community‑governed language datasets all demonstrate that the most durable advantage isn't proprietary code - it's shared infrastructure that people trust.

Communities harden technology, surface edge cases, and create the kind of inertia that keeps systems in place long after competitors appear. But they also require care: documentation, governance, contributor experience, and transparency. As one founder put it, "You can't build community overnight. It's years of nurturing."

Ethics as infrastructure

One of the most powerful threads came from Lelapa AI, which reframes data not as raw material to be mined but as cultural property. Their licensing model, inspired by Māori data sovereignty, ensures that African languages - and the communities behind them - benefit from the value they create. This is openness with accountability, a model that challenges extractive norms and points toward a more equitable AI ecosystem.

It's a reminder that ethical design isn't a layer on top of technology - it's part of the architecture.

The real competitor: fear

Founders spoke candidly about the biggest barrier to adoption: fear. Enterprises default to hyperscalers because no one gets fired for choosing the biggest vendor. Overcoming that inertia requires more than values. It requires reliability, security features, SSO, RBAC, audit logs - the "boring" but essential capabilities that make open systems viable in real organizations.

In other words, trust is built not only through ideals but through operational excellence.

A blueprint for builders

Across all 16 essays, a blueprint started to emerge for founders and startups committed to building responsible technology and open source AI.

Taken together, the 16 essays in this report point to something larger than any single technology or trend. They show founders wrestling with how AI is governed, how trust is earned, how social systems can be rebuilt at human scale, and how innovation looks different when it starts from Lagos or Johannesburg instead of Silicon Valley.

The future of AI doesn't have to be centralized, extractive, or opaque. The founders in this portfolio are proving that openness, trustworthiness, diversity, and public benefit can reinforce one another - and that competitive companies can be built on all four.

We hope you'll dig into the report, explore the ideas these founders are surfacing, and join us in backing the people building what comes next.


14 Jan 2026 5:00pm GMT

The Rust Programming Language Blog: What does it take to ship Rust in safety-critical?

This is another post in our series covering what we learned through the Vision Doc process. In our first post, we described the overall approach and what we learned about doing user research. In our second post, we explored what people love about Rust. This post goes deep on one domain: safety-critical software.

When we set out on the Vision Doc work, one area we wanted to explore in depth was safety-critical systems: software where malfunction can result in injury, loss of life, or environmental harm. Think vehicles, airplanes, medical devices, industrial automation. We spoke with engineers at OEMs, integrators, and suppliers across automotive (mostly), industrial, aerospace, and medical contexts.

What we found surprised us a bit. The conversations kept circling back to a single tension: Rust's compiler-enforced guarantees address much of what Functional Safety Engineers and Software Engineers in these spaces spend their time preventing, but once you move beyond prototyping into the higher-criticality parts of a system, the ecosystem support thins out fast. There is no MATLAB/Simulink Rust code generation. There is no OSEK or AUTOSAR Classic-compatible RTOS written in Rust or with first-class Rust support. The tooling for qualification and certification is still maturing.

Quick context: what makes software "safety-critical"

If you've never worked in these spaces, here's the short version. Each safety-critical domain has standards that define a ladder of integrity levels: ISO 26262 in automotive, IEC 61508 in industrial, IEC 62304 in medical devices, DO-178C in aerospace. The details differ, but the shape is similar: as you climb the ladder toward higher criticality, the demands on your development process, verification, and evidence all increase, and so do the costs.[1]

This creates a strong incentive for decomposition: isolate the highest-criticality logic into the smallest surface area you can, and keep everything else at lower levels where costs are more manageable and you can move faster.

We'll use automotive terminology in this post (QM through ASIL D) since that's where most of our interviews came from, but the patterns generalize. These terms represent increasing levels of safety-criticality, with QM being the lowest and ASIL D being the highest. The story at low criticality looks very different from the story at high criticality, regardless of domain.

Rust is already in production for safety-critical systems

Before diving into the challenges, it is worth noting that Rust is not just being evaluated in these domains. It is deployed and running in production.

We spoke with a principal firmware engineer working on mobile robotics systems certified to IEC 61508 SIL 2:

"We had a new project coming up that involved a safety system. And in the past, we'd always done these projects in C using third party stack analysis and unit testing tools that were just generally never very good, but you had to do them as part of the safety rating standards. Rust presented an opportunity where 90% of what the stack analysis stuff had to check for is just done by the compiler. That combined with the fact that now we had a safety qualified compiler to point to was kind of a breakthrough." -- Principal Firmware Engineer (mobile robotics)

We also spoke with an engineer at a medical device company deploying IEC 62304 Class B software to intensive care units:

"All of the product code that we deploy to end users and customers is currently in Rust. We do EEG analysis with our software and that's being deployed to ICUs, intensive care units, and patient monitors." -- Rust developer at a medical device company

"We changed from this Python component to a Rust component and I think that gave us a 100-fold speed increase." -- Rust developer at a medical device company

These are not proofs of concept. They are shipping systems in regulated environments, going through audits and certification processes. The path is there. The question is how to make it easier for the next teams coming through.

Rust adoption is easiest at QM, and the constraints sharpen fast

At low criticality, teams described a pragmatic approach: use Rust and the crates ecosystem to move quickly, then harden what you ship. One architect at an automotive OEM told us:

"We can use any crate [from crates.io] [..] we have to take care to prepare the software components for production usage." -- Architect at Automotive OEM

But at higher levels, third-party dependencies become difficult to justify. Teams either rewrite, internalize, or strictly constrain what they use. An embedded systems engineer put it bluntly:

"We tend not to use 3rd party dependencies or nursery crates [..] solutions become kludgier as you get lower in the stack." -- Firmware Engineer

Some teams described building escape hatches, abstraction layers designed for future replacement:

"We create an interface that we'd eventually like to have to simplify replacement later on [..] sometimes rewrite, but even if re-using an existing crate we often change APIs, write more tests." -- Team Lead at Automotive Supplier (ASIL D target)

Even teams that do use crates from crates.io described treating that as a temporary accelerator, something to track carefully and remove from critical paths before shipping:

"We use crates mainly for things in the beginning where we need to set up things fast, proof of concept, but we try to track those dependencies very explicitly and for the critical parts of the software try to get rid of them in the long run." -- Team lead at an automotive software company developing middleware in Rust

In aerospace, the "control the whole stack" instinct is even stronger:

"In aerospace there's a notion of we must own all the code ourselves. We must have control of every single line of code." -- Engineering lead in aerospace

This is the first big takeaway: a lot of "Rust in safety-critical" is not just about whether Rust compiles for a target. It is about whether teams can assemble an evidence-friendly software stack and keep it stable over long product lifetimes.

The compiler is doing work teams used to do elsewhere

Many interviewees framed Rust's value in terms of work shifted earlier and made more repeatable by the compiler. This is not just "nice"; it changes how much manual review you can realistically afford. Much of what was historically process-based enforcement through coding standards like MISRA C and CERT C becomes a language-level concern in Rust, checked by the compiler rather than external static analysis or manual review.

"Roughly 90% of what we used to check with external tools is built into Rust's compiler." -- Principal Firmware Engineer (mobile robotics)

We heard variations of this from teams dealing with large codebases and varied skill levels:

"We cannot control the skill of developers from end to end. We have to check the code quality. Rust by checking at compile time, or Clippy tools, is very useful for our domain." -- Engineer at a major automaker

Even on smaller teams, the review load matters:

"I usually tend to work on teams between five and eight. Even so, it's too much code. I feel confident moving faster, a certain class of flaws that you aren't worrying about." -- Embedded systems engineer (mobile robotics)

Closely related: people repeatedly highlighted Rust's consistency around error handling:

"Having a single accepted way of handling errors used throughout the ecosystem is something that Rust did completely right." -- Automotive Technical Lead

For teams building products with 15-to-20-year lifetimes and "teams of teams," compiler-enforced invariants scale better than "we will just review harder."
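For concreteness, here is a minimal sketch of the error-handling convention that quote points to: fallible operations return Result, callers propagate with ?, and failure modes are an explicit, enumerable type. All names here are hypothetical:

```rust
// Minimal sketch of the ecosystem-wide error-handling convention.
// The sensor, its scaling, and the error variants are invented.
#[derive(Debug)]
pub enum SensorError {
    OutOfRange(u16),
    BusTimeout,
}

fn read_raw() -> Result<u16, SensorError> {
    // Hardware access elided; pretend the bus returned a 10-bit sample.
    Ok(512)
}

pub fn read_millivolts() -> Result<u32, SensorError> {
    let raw = read_raw()?; // propagate failure instead of sentinel values
    if raw > 1023 {
        return Err(SensorError::OutOfRange(raw));
    }
    Ok(u32::from(raw) * 3300 / 1024) // scale a 10-bit reading to a 3.3 V range
}
```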

Teams want newer compilers, but also stability they can explain

A common pattern in safety-critical environments is conservative toolchain selection. But engineers pointed out a tension: older toolchains carry their own defect history.

"[..] traditional wisdom is that after something's been around and gone through motions / testing then considered more stable and safer [..] older compilers used tend to have more bugs [and they become] hard to justify" -- Software Engineer at an Automotive supplier

Rust's edition system was described as a real advantage here, especially for incremental migration strategies that are common in automotive programs:

"[The edition system is] golden for automotive, where incremental migration is essential." -- Software Engineer at major Automaker

In practice, "stability" is also about managing the mismatch between what the platform supports and what the ecosystem expects. Teams described pinning Rust versions, then fighting dependency drift:

"We can pin the Rust toolchain, but because almost all crates are implemented for the latest versions, we have to downgrade. It's very time-consuming." -- Engineer at a major automaker

For safety-critical adoption, "stability" is operational. Teams need to answer questions like: What does a Rust upgrade change, and what does it not change? What are the bounds on migration work? How do we demonstrate we have managed upgrade risk?

Target support matters in practical ways

Safety-critical software often runs on long-lived platforms and RTOSs. Even when "support exists," there can be caveats. Teams described friction around targets like QNX, where upstream Rust support exists but with limitations (for example, QNX 8.0 support is currently no_std only).[2]

This connects to Rust's target tier policy: the policy itself is clear, but regulated teams still need to map "tier" to "what can I responsibly bet on for this platform and this product lifetime."

"I had experiences where all of a sudden I was upgrading the compiler and my toolchain and dependencies didn't work anymore for the Tier 3 target we're using. That's simply not acceptable. If you want to invest in some technology, you want to have a certain reliability." -- Senior software engineer at a major automaker

core is the spine, and it sets expectations

In no_std environments, core becomes the spine of Rust. Teams described it as both rich enough to build real products and small enough to audit.

A lot of Rust's safety leverage lives there: Option and Result, slices, iterators, Cell and RefCell, atomics, MaybeUninit, Pin. But we also heard a consistent shape of gaps: many embedded and safety-critical projects want no_std-friendly building blocks (fixed-size collections, queues) and predictable math primitives, but do not want to rely on "just any" third-party crate at higher integrity levels.

"Most of the math library stuff is not in core, it's in std. Sin, cosine... the workaround for now has been the libm crate. It'd be nice if it was in core." -- Principal Firmware Engineer (mobile robotics)

Async is appealing, but the long-run story is not settled

Some safety-critical-adjacent systems are already heavily asynchronous: daemons, middleware frameworks, event-driven architectures. That makes Rust's async story interesting.

But people also expressed uncertainty about ecosystem lock-in and what it would take to use async in higher-criticality components. One team lead developing middleware told us:

"We're not sure how async will work out in the long-run [in Rust for safety-critical]. [..] A lot of our software is highly asynchronous and a lot of our daemons in the AUTOSAR Adaptive Platform world are basically following a reactor pattern. [..] [C++14] doesn't really support these concepts, so some of this is lack of familiarity." -- Team lead at an automotive software company developing middleware in Rust

And when teams look at async through an ISO 26262 lens, the runtime question shows up immediately:

"If we want to make use of async Rust, of course you need some runtime which is providing this with all the quality artifacts and process artifacts for ISO 26262." -- Team lead at an automotive software company developing middleware in Rust

Async is not "just a language feature" in safety-critical contexts. It pulls in runtime choices, scheduling assumptions, and, at higher integrity levels, the question of what it would mean to certify or qualify the relevant parts of the stack.

Recommendations

Find ways to help the safety-critical community support their own needs. Open source helps those who help themselves. The Ferrocene Language Specification (FLS) shows this working well: it started as an industry effort to create a specification suitable for safety-qualification of the Rust compiler, companies invested in the work, and it now has a sustainable home under the Rust Project with a team actively maintaining it.[3]

Contrast this with MC/DC coverage support in rustc. Earlier efforts stalled due to lack of sustained engagement from safety-critical companies.[4] The technical work was there, but without industry involvement to help define requirements, validate the implementation, and commit to maintaining it, the effort lost momentum. A major concern was that the MC/DC code added maintenance burden to the rest of the coverage infrastructure without a clear owner. Now there is renewed interest in doing this the right way: companies are working through the Safety-Critical Rust Consortium to create a Rust Project Goal in 2026 to collaborate with the Rust Project on MC/DC support. The model is shared ownership of requirements, with primary implementation and maintenance done by companies with a vested interest in safety-critical Rust, in a way that does not impede maintenance of the rest of the coverage code.

The remaining recommendations follow this pattern: the Safety-Critical Rust Consortium can help the community organize requirements and drive work, with the Rust Project providing the deep technical knowledge of Rust Project artifacts needed for successful collaboration. The path works when both sides show up.

Establish ecosystem-wide MSRV conventions. The dependency drift problem is real: teams pin their Rust toolchain for stability, but crates targeting the latest compiler make this difficult to sustain. An LTS release scheme, combined with encouraging libraries to maintain MSRV compatibility with LTS releases, could reduce this friction. This would require coordination between the Rust Project (potentially the release team) and the broader ecosystem, with the Safety-Critical Rust Consortium helping to articulate requirements and adoption patterns.

Turn "target tier policy" into a safety-critical onramp. The friction we heard is not about the policy being unclear, it is about translating "tier" into practical decisions. A short, target-focused readiness checklist would help: Which targets exist? Which ones are no_std only? What is the last known tested OS version? What are the top blockers? The raw ingredients exist in rustc docs, release notes, and issue trackers, but pulling them together in one place would lower the barrier. Clearer, consolidated information also makes it easier for teams who depend on specific targets to contribute to maintaining them. The Safety-Critical Rust Consortium could lead this effort, working with compiler team members and platform maintainers to keep the information accurate.

Document "dependency lifecycle" patterns teams are already using. The QM story is often: use crates early, track carefully, shrink dependencies for higher-criticality parts. The ASIL B+ story is often: avoid third-party crates entirely, or use abstraction layers and plan to replace later. Turning those patterns into a reusable playbook would help new teams make the same moves with less trial and error. This seems like a natural fit for the Safety-Critical Rust Consortium's liaison work.

Define requirements for a safety-case friendly async runtime. Teams adopting async in safety-critical contexts need runtimes with appropriate quality and process artifacts for standards like ISO 26262. Work is already happening in this space.[5] The Safety-Critical Rust Consortium could lead the effort to define what "safety-case friendly" means in concrete terms, working with the async working group and libs team on technical feasibility and design.

Treat interop as part of the safety story. Many teams are not going to rewrite their world in Rust. They are going to integrate Rust into existing C and C++ systems and carry that boundary for years. Guidance and tooling to keep interfaces correct, auditable, and in sync would help. The compiler team and lang team could consider how FFI boundaries are surfaced and checked, informed by requirements gathered through the Safety-Critical Rust Consortium.

"We rely very heavily on FFI compatibility between C, C++, and Rust. In a safety-critical space, that's where the difficulty ends up being, generating bindings, finding out what the problem was." -- Embedded systems engineer (mobile robotics)

Conclusion

To sum up the main points in this post: Rust is already shipping in production safety-critical systems; adoption is easiest at QM, with constraints sharpening fast at higher integrity levels; the compiler absorbs verification work teams used to do with external tools and manual review; teams want newer compilers but also stability they can explain; target support and the contents of core matter in practical ways; and async is appealing, but its long-run story is not settled.

We make six recommendations: find ways to help the safety-critical community support their own needs, establish ecosystem-wide MSRV conventions, create target-focused readiness checklists, document dependency lifecycle patterns, define requirements for safety-case friendly async runtimes, and treat C/C++ interop as part of the safety story.

Get involved

If you're working in safety-critical Rust, or you want to help make it easier, check out the Rust Foundation's Safety-Critical Rust Consortium and the in-progress Safety-Critical Rust coding guidelines.

Hearing concrete constraints, examples of assessor feedback, and what "evidence" actually looks like in practice is incredibly helpful. The goal is to make Rust's strengths more accessible in environments where correctness and safety are not optional.

  1. If you're curious about how rigor scales with cost in ISO 26262, this Feabhas guide gives a good high-level overview.

  2. See the QNX target documentation for current status.

  3. The FLS team was created under the Rust Project in 2025. The team is now actively maintaining the specification, reviewing changes and keeping the FLS in sync with language evolution.

  4. See the MC/DC tracking issue for context. The initial implementation was removed due to maintenance concerns.

  5. Eclipse SDV's Eclipse S-CORE project includes an Orchestrator written in Rust for their async runtime, aimed at safety-critical automotive software.

14 Jan 2026 12:00am GMT

Tarek Ziadé: The Economics of AI Coding: A Real-World Analysis

My whole stream in the past months has been about AI coding. From skeptical engineers who say it creates unmaintainable code, to enthusiastic (or scared) engineers who say it will replace us all, the discourse is polarized. But I've been more interested in a different question: what does AI coding actually cost, and what does it actually save?

I recently had Claude help me with a substantial refactoring task: splitting a monolithic Rust project into multiple workspace repositories with proper dependency management. The kind of task that's tedious, error-prone, and requires sustained attention to detail across hundreds of files. When it was done, I asked Claude to analyze the session: how much it cost, how long it took, and how long a human developer would have taken.

The answer surprised me. Not because AI was faster or cheaper (that's expected), but because of how much faster and cheaper.

The Task: Repository Split and Workspace Setup

The work involved splitting a monolithic Rust project into multiple workspace repositories: migrating code and updating import statements, configuring Cargo workspaces and build tooling, rewriting CI/CD workflows, updating documentation, and verifying everything against the test suites.

This is real work. Not a toy problem, not a contrived benchmark. The kind of multi-day slog that every engineer has faced: important but tedious, requiring precision but not creativity.

The Numbers

AI Execution Time

Total: approximately 3.5 hours across two sessions

AI Cost

Total tokens: 72,146

Estimated marginal cost: approximately $4.95

This is the marginal execution cost for this specific task. It doesn't include my Claude subscription, the time I spent iterating on prompts and reviewing output, or the risk of having to revise or fix AI-generated changes. For a complete accounting, you'd also need to consider those factors, though for this task they were minimal.

Human Developer Time Estimate

Conservative estimate: 2-3 days (16-24 hours)

This is my best guess based on experience with similar tasks, but it comes with uncertainty. A senior engineer deeply familiar with this specific codebase might work faster. Someone encountering similar patterns for the first time might work slower. Some tasks could be partially templated or parallelized across a team.

Breaking down the work:

  1. Planning and research (2-4 hours): Understanding codebase structure, planning dependency strategy, reading PyO3/Maturin documentation
  2. Code migration (4-6 hours): Copying files, updating all import statements, fixing compilation errors, resolving workspace conflicts
  3. Build system setup (2-3 hours): Writing Makefile, configuring Cargo.toml, setting up pyproject.toml, testing builds
  4. CI/CD configuration (2-4 hours): Writing GitHub Actions workflows, testing syntax, debugging failures, setting up matrix builds
  5. Documentation updates (2-3 hours): Updating multiple documentation files, ensuring consistency, writing migration guides
  6. Testing and debugging (3-5 hours): Running test suites, fixing unexpected failures, verifying tests pass, testing on different platforms
  7. Git operations and cleanup (1-2 hours): Creating branches, writing commit messages, final verification

Even if we're generous and assume a very experienced developer could complete this in 8 hours of focused work, the time and cost advantages remain substantial. The economics don't depend on the precise estimate.

The Bottom Line

These numbers compare execution time and per-task marginal costs. They don't capture everything (platform costs, review time, long-term maintenance implications), but they illustrate the scale of the difference for this type of systematic refactoring work.
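For concreteness, here is that comparison as a back-of-the-envelope calculation. The $100/hour rate is an illustrative assumption (consistent with the "hundreds or thousands of dollars" figure later in the post); the other numbers come from the text above.

```rust
// Back-of-the-envelope version of the comparison in this post. The hourly
// rate is an illustrative assumption; the other figures come from the text.
fn main() {
    let ai_cost = 4.95_f64;                 // estimated marginal API cost
    let ai_hours = 3.5_f64;                 // AI execution time, two sessions
    let human_hours = (16.0_f64, 24.0_f64); // conservative human estimate
    let rate = 100.0_f64;                   // assumed loaded $/hour (hypothetical)

    let human_cost = (human_hours.0 * rate, human_hours.1 * rate);
    println!(
        "AI: ${ai_cost:.2} / {ai_hours}h  vs  human: ${:.0}-${:.0} / {}-{}h",
        human_cost.0, human_cost.1, human_hours.0, human_hours.1
    );
    // Prints: AI: $4.95 / 3.5h  vs  human: $1600-$2400 / 16-24h
}
```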

Why AI Was Faster

The efficiency gains weren't magic. They came from specific characteristics of how AI approaches systematic work:

No context switching fatigue. Claude maintained focus across three repositories simultaneously without the cognitive load that would exhaust a human developer. No mental overhead from jumping between files, no "where was I?" moments after a break.

Instant file operations. Reading and writing files happens without the delays of IDE loading, navigation, or search. What takes a human seconds per file took Claude milliseconds.

Pattern matching without mistakes. Updating thousands of import statements consistently, without typos, without missing edge cases. No ctrl-H mistakes, no regex errors that you catch three files later.

Parallel mental processing. Tracking multiple files at once without the working memory constraints that force humans to focus narrowly.

Documentation without overhead. Generating comprehensive, well-structured documentation in one pass. No switching to a different mindset, no "I'll document this later" debt.

Error recovery. When workspace conflicts or dependency issues appeared, Claude fixed them immediately without the frustration spiral that can derail a human's momentum.

Commit message quality. Detailed, well-structured commit messages generated instantly. No wrestling with how to summarize six hours of work into three bullet points.

What Took Longer

AI wasn't universally faster. Two areas stood out:

Initial codebase exploration. Claude spent time systematically understanding the structure before implementing. A human developer might have jumped in faster with assumptions (though possibly paying for it later with rework).

User preference clarification. Some back-and-forth on git dependencies versus crates.io, version numbering conventions. A human working alone would just make these decisions implicitly based on their experience.

These delays were minimal compared to the overall time savings, but they're worth noting. AI coding isn't instantaneous magic. It's a different kind of work with different bottlenecks.

The Economics of Coding

Let me restate those numbers because they still feel surreal: roughly $5 of marginal cost and 3.5 hours of AI execution, against an estimated 16-24 hours of human work.

For this type of task, these are order-of-magnitude improvements over solo human execution. And they weren't achieved through cutting corners or sacrificing immediate quality. The tests passed, the documentation was comprehensive, the commits were well-structured, the code compiled cleanly.

That said, tests passing and documentation existing are necessary but not sufficient signals of quality. Long-term maintainability, latent bugs that only surface later, or future refactoring friction are harder to measure immediately. The code is in production and working, but it's too soon to know if there are subtle issues that will emerge over time.

This creates strange economics for a specific class of work: systematic, pattern-based refactoring with clear success criteria. For these tasks, the time and cost reductions change how we value engineering effort and prioritize maintenance work.

I used to avoid certain refactorings because the payoff didn't justify the time investment. Clean up import statements across 50 files? Update documentation after a restructure? Write comprehensive commit messages? These felt like luxuries when there was always more pressing work.

But at $5 marginal cost and 3.5 hours for this type of systematic task, suddenly they're not trade-offs anymore. They're obvious wins. The economics shift from "is this worth doing?" to "why haven't we done this yet?"

What This Doesn't Mean

Before the "AI will replace developers" crowd gets too excited, let me be clear about what this data doesn't show:

This was a perfect task for AI. Systematic, pattern-based, well-scoped, with clear success criteria. The kind of work where following existing patterns and executing consistently matters more than creative problem-solving or domain expertise.

AI did not decide what "good" looked like, validate the results, or weigh the business context - those judgments stayed with me.

The task was pure execution. Important execution, skilled execution, but execution nonetheless. A human developer would have brought the same capabilities to the table, just slower and at higher cost.

Where This Goes

I keep thinking about that 85-90% time reduction for this specific type of task. Not simple one-liners where AI already shines, but systematic maintenance work with high regularity, strong compiler or test feedback, and clear end states.

Tasks with similar characteristics might include the kind of work mentioned earlier: cleaning up import statements across dozens of files, updating documentation after a restructure, writing comprehensive commit messages - systematic changes with clear end states.

Many maintenance tasks are messier: ambiguous semantics, partial test coverage, undocumented invariants, organizational constraints. The economics I observed here don't generalize to all refactoring work. But for the subset that is systematic and well-scoped, the shift is significant.

All the work that we know we should do but often defer because it doesn't feel like progress. What if the economics shifted enough for these specific tasks that deferring became the irrational choice?

I'm not suggesting AI replaces human judgment. Someone still needs to decide what "good" looks like, validate the results, understand the business context. But if the execution of systematic work becomes 10x cheaper and faster, maybe we stop treating certain categories of technical debt like unavoidable burdens and start treating them like things we can actually manage.

The Real Cost

There's one cost the analysis didn't capture: my time. I wasn't passive during those 3.5 hours. I was reading Claude's updates, reviewing file changes, answering questions, validating decisions, checking test results.

I don't know exactly how much time I spent, but it was less than the 3.5 hours Claude was working. Maybe 2 hours of active engagement? The rest was Claude working autonomously while I did other things.

So the real comparison isn't 3.5 AI hours versus 16-24 human hours. It's 2 hours of human guidance plus 3.5 hours of AI execution versus 16-24 hours of human solo work. Still a massive win, but different from pure automation.

This feels like the right model: AI as an extremely capable assistant that amplifies human direction rather than replacing human judgment. The economics work because you're multiplying effectiveness, not substituting one for the other.

Final Thoughts

Five dollars marginal cost. Three and a half hours. For systematic refactoring work that would have taken me days and cost hundreds or thousands of dollars in my time.

These numbers make me think differently about certain kinds of work. About how we prioritize technical debt in the systematic, pattern-based category. About what "too expensive to fix" really means for these specific tasks. About whether we're approaching some software maintenance decisions with outdated economic assumptions.

I'm still suspicious of broad claims that AI fundamentally changes how we work. But I'm less suspicious than I was. When the economics shift this dramatically for a meaningful class of tasks, some things that felt like pragmatic trade-offs start to look different.

The tests pass. The documentation is up to date. And I paid less than the cost of a fancy coffee drink.

Maybe the skeptics and the enthusiasts are both right. Maybe AI doesn't replace developers and maybe it does change some things meaningfully. Maybe it just makes certain kinds of systematic work cheap enough that we can finally afford to do them right.

What About Model and Pricing Changes?

One caveat worth noting: these economics depend on Claude Sonnet 4.5 at January 2026 pricing. Model pricing can change, model performance can regress or improve with updates, tool availability can shift, and organizational data governance constraints might limit what models you can use or what tasks you can delegate to them.

For individuals and small teams, this might not matter much in the short term. For larger organizations making long-term planning decisions, these factors matter. The specific numbers here are a snapshot, not a guarantee.


14 Jan 2026 12:00am GMT