03 Jun 2020

Planet Debian

Keith Packard: picolibc-ryu

Float/String Conversion in Picolibc: Enter "Ryū"

I recently wrote about this topic, having concluded that the best route for now was to use the malloc-free, but imprecise, conversion routines in the tinystdio alternative.

A few days later, Sreepathi Pai pointed me at some very recent work in this area:

This is amazing! Thirty years after the papers referenced in the previous post, Ulf Adams came up with some really cool ideas and managed to reduce the math required for 64-bit conversion to 128-bit integers. This is a huge leap forward; we were doing long multi-precision computations before, and now it's all short enough to fit in registers (ok, a lot of registers, but still).

Getting the Ryū Code

The code is available on github: https://github.com/ulfjack/ryu. Reading through it, it's very clear that the author focuses on performance with lots of tuning for common cases. Still, it's quite readable, especially compared with the newlib multi-precision based code.

Picolibc String/Float conversion interface

Picolibc has some pretty basic needs for the float/string conversion code; it wants four functions:

  1. __dtoa_engine

    int
    __dtoa_engine(double x, struct dtoa *dtoa, uint8_t max_digits, uint8_t max_decimals);

    This converts the double x to a string of decimal digits and a decimal exponent stored inside the 'dtoa' struct. It limits the total number of digits to max_digits and, optionally (when max_decimals is non-zero), limits the number of fractional digits to max_decimals - 1. The latter supports 'f' formats. It returns the number of digits stored, which is at most max_digits, and fewer when the number can be represented accurately in fewer digits.

  2. __ftoa_engine

    int
    __ftoa_engine(float x, struct ftoa *ftoa, uint8_t max_digits, uint8_t max_decimals);

    The same as __dtoa_engine, except for floats.

  3. __atod_engine

    double
    __atod_engine(uint64_t m10, int e10);

    To avoid needing to handle stdio inside the conversion function, __atod_engine receives fully parsed values: the base-10 significand (m10) and exponent (e10). The value to convert is m10 * pow(10, e10).

  4. __atof_engine

    float
    __atof_engine(uint32_t m10, int e10);

    The same as __atod_engine, except for floats.

With these, it can do printf, scanf, ecvt, fcvt, gcvt, strtod, strtof and atof.
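To make the "fully parsed values" idea concrete, here is a minimal sketch of how a caller might split a decimal string into m10 and e10 before handing them to __atod_engine. The helper name parse_decimal is hypothetical and is not picolibc code; it assumes an unsigned input and ignores overflow, infinities and NaN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not picolibc code: split an unsigned decimal string
 * such as "1.25e3" into a base-10 significand and exponent, so that the
 * value equals m10 * 10^e10. Sign, overflow, inf and nan are ignored. */
static bool
parse_decimal(const char *s, uint64_t *m10, int *e10)
{
    uint64_t m = 0;
    int e = 0;
    bool seen_digit = false;
    bool seen_point = false;

    for (;; s++) {
        if (*s >= '0' && *s <= '9') {
            m = m * 10 + (uint64_t)(*s - '0');
            if (seen_point)
                e--;            /* each fractional digit scales by 10^-1 */
            seen_digit = true;
        } else if (*s == '.' && !seen_point) {
            seen_point = true;
        } else {
            break;
        }
    }
    if (!seen_digit)
        return false;

    if (*s == 'e' || *s == 'E') {
        int sign = 1;
        int x = 0;

        s++;
        if (*s == '+' || *s == '-') {
            if (*s == '-')
                sign = -1;
            s++;
        }
        if (!(*s >= '0' && *s <= '9'))
            return false;       /* "1e" with no exponent digits */
        while (*s >= '0' && *s <= '9')
            x = x * 10 + (*s++ - '0');
        e += sign * x;
    }
    if (*s != '\0')
        return false;           /* trailing junk */

    *m10 = m;
    *e10 = e;
    return true;
}
```

With this split, "1.25e3" becomes m10 = 125 and e10 = 1, which is the pair a caller would then pass on as __atod_engine(m10, e10).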

Porting Ryū to Picolibc

The existing Ryū float-to-string code always generates the number of digits necessary for accurate output. I had to hack it up to generate correctly rounded shorter output when max_digits or max_decimals were smaller. I'm not sure I managed to do that correctly, but at least it appears to be passing all of the test cases I have. In normal operation, Ryū iteratively removes digits from the answer that aren't necessary to disambiguate with neighboring values.

What I changed was to keep removing digits using that method until the answer had few enough digits to fit in the desired length. There's some tricky rounding code that adjusts the final result and I had to bypass that if I'd removed extra digits.
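The general shape of "keep removing digits until the answer fits" can be sketched as follows. This is not the code on the ryu branch: the function names are made up, it works on an already-computed decimal significand, and it assumes round-half-to-even on the removed digits and max_digits >= 1. Rounding an already-shortened decimal is not necessarily the same as rounding the exact binary value, which is part of why the real change is tricky.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of decimal digits in v (at least 1). */
static int
decimal_length(uint64_t v)
{
    int n = 1;

    while (v >= 10) {
        v /= 10;
        n++;
    }
    return n;
}

/* Illustrative only: trim m10 * 10^(*e10) to at most max_digits digits.
 * Assumes max_digits >= 1. */
static uint64_t
shorten_digits(uint64_t m10, int *e10, int max_digits)
{
    int len = decimal_length(m10);
    unsigned last = 0;      /* most recently removed digit */
    bool sticky = false;    /* an earlier removed digit was non-zero */

    while (len > max_digits) {
        sticky = sticky || last != 0;
        last = (unsigned)(m10 % 10);
        m10 /= 10;
        (*e10)++;
        len--;
    }

    /* Round half to even, looking only at the digits that were removed. */
    if (last > 5 || (last == 5 && (sticky || (m10 & 1)))) {
        m10++;
        if (decimal_length(m10) > max_digits) {   /* e.g. 999 -> 1000 */
            m10 /= 10;
            (*e10)++;
        }
    }
    return m10;
}
```

For example, trimming 9996 (exponent 0) to three digits yields 100 with exponent 2, because the rounding carries into a new leading digit.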

That was about the only change necessary to the core algorithm. I also trimmed the code to only include the general case and not the performance improvements, then wrapped it with code to provide the _engine interface.

On the string-to-float side, most of what I needed to do was remove the string parsing bits at the start of the function and switch from performance-optimized to space-optimized versions of a couple of internal routines.

Correctness Results

Because these new functions are now 'exact', I was able to adjust the picolibc tests to compare all of the bits for string/float conversion instead of having to permit a bit of slop in the answers. With those changes, the picolibc test suite passes, which offers some assurance that things aren't completely broken.

Size Results

Snek uses the 32-bit float versions of the conversion routines, and for that, the size difference is:

   text    data     bss     dec     hex filename
  59068      44   37968   97080   17b38 snek-qemu-riscv-orig.elf
  59430      44   37968   97442   17ca2 snek-qemu-riscv-ryu.elf
    362

362 bytes added to gain accurate printf/strtof results seems like a good trade-off in this case.

Performance

I haven't measured performance at all, but I suspect that it won't be nearly as problematic on most platforms as the source code makes it appear. And that's because Ryū is entirely integer arithmetic with no floating point at all. This avoids using the soft fp code for platforms without hardware float support.

Pointers to the Code

I haven't merged this to picolibc master yet; it's on the ryu branch:

Review, especially of the hack above to return short results, would be greatly appreciated!

Thanks again to Ulf Adams for creating this code and to Sreepathi Pai for sending me a note about it!

03 Jun 2020 1:33am GMT

Dima Kogan: vnlog now functional on *BSD and OSX

So somebody finally bugged me about supporting vnlog tools on OSX. I was pretty sure that between all the redirection, process communication, and file descriptor management something was Linux-specific, but apparently not: everything just works. There were a few uninteresting issues with tool paths, core tool and linker flags and so on, but it was all really minor. I have a report that the test suite passes on OSX, and I verified it on FreeBSD.

I made a new 1.28 release tag, but it exists mostly for the benefit of any OSX or *BSD people who'd want to make a package for their system. Progress!

03 Jun 2020 1:20am GMT

02 Jun 2020

Planet Debian

Olivier Berger: Mixing NRELab’s Antidote and Eclipse Che on the same k8s cluster

You may have heard of my search for Cloud solutions to run labs in an academic context, with a focus on free and open source solutions. You may read previous installments of this blog or, for a shorter overview, check the presentation I recorded last week.

Over the past month, I've become quite interested in two projects: NRELab's Antidote and Eclipse Che.

Antidote is the software that powers NRELabs, a labs platform for learning network automation, which runs on top of Kubernetes (k8s). The interesting thing is that for each learner, there can be a dedicated k8s namespace with multiple virtual nodes running on a separate network. This can be used in the context of virtual classes/labs where our students will perform network labs in parallel on the same cluster.

Eclipse Che powers Eclipse "on the Cloud", making available software development environments, for developers, on a Kubernetes Cloud. Developers typically work from a Web page instead of installing local development tools.

Both projects seem quite complementary. For one, we teach both networks and software development, so that would naturally appeal to many professors.

Furthermore, Eclipse Che provides a few features that Antidote is lacking: authenticating users (with keycloak), and persisting their work in workspaces between work sessions. This is typically what we need in our academic context, where students will work on the same labs week after week, during scheduled classes or off-hours.

Thus it would be great to have more integration between the 2 environments.

I intend to work on that front, but that takes time, as running stuff on Kubernetes isn't exactly trivial, at least when you're like me and want to use a "vanilla" kubernetes.

I've mainly relied on running k8s inside VMs using Vagrant and/or minikube so far.

A first milestone I've achieved is making sure that Antidote and Eclipse Che aren't incompatible. Antidote's "selfmedicate" script was actually running inside a Vagrant VM, where I had difficulties installing Eclipse Che (probably because of old software, or particular networking setup details). I've overcome this hurdle, as I'm now able to install both environments on a single Kubernetes VM (using my own Vagrant setup).

Running Eclipse Che (alongside Antidote) on a k8s Vagrant VM.

This proves only that there's no show stopper there, but a lot of work remains.

Stay tuned.

02 Jun 2020 10:06pm GMT

Planet Grep

Wouter Verhelst: SReview 0.6

... isn't ready yet, but it's getting there.

I had planned to release a new version of SReview once it was set up and running properly for FOSDEM 2020. SReview is my online video review and transcoding system, which I originally wrote for FOSDEM but which is being used for DebConf, too. However, things got a bit busy (both in my personal life and in the world at large), so it fell a bit by the wayside.

I've now also been working on things a bit more, in preparation for an improved administrator's interface, and have started implementing a REST API to deal with talks etc. through HTTP calls. This seems to be coming along nicely, thanks to OpenAPI and the Mojolicious plugin for parsing that. I can now design the API nicely, and autogenerate client-side libraries to call it.

While at it, because libmojolicious-plugin-openapi-perl isn't available in Debian 10 "buster", I moved the docker containers over from stable to testing. This revealed that both bs1770gain and inkscape changed their command line incompatibly, resulting in me having to work around those incompatibilities. The good news is that I managed to do so in a way that keeps running SReview on Debian 10 viable, provided one installs Mojolicious::Plugin::OpenAPI from CPAN rather than from a Debian package. Or installs a backport of that package, of course. Or, heck, uses the Docker containers in a kubernetes environment or some such -- I'd love to see someone use that in production.

Anyway, I'm still finishing the API, and the implementation of that API and the test suite that ensures the API works correctly, but progress is happening; and as soon as things seem to be working properly, I'll do a release of SReview 0.6, and will upload that to Debian.

Hopefully that'll be soon.

02 Jun 2020 10:44am GMT

01 Jun 2020

Planet Grep

Mattias Geniar: A PHP package to simplify working with percentages

I created a PHP package that can make it easier to work with percentages in any PHP application.

01 Jun 2020 12:00am GMT

28 May 2020

Planet Grep

Steven Wittens: Software Development as Advanced Damage Control

The reason software isn't better is because it takes a lifetime to understand how much of a mess we've made of things, and by the time you get there, you will have contributed significantly to the problem.

[Image: Two software developers pairing up on a Rails app]

The fastest code is code that doesn't need to run, and the best code is code you don't need to write. This is rather obvious. Less obvious is how to get there, or who knows. Every coder has their favored framework or language, their favored patterns and practices. Advice on what to do is easy to find. More rare is what not to do. They'll often say "don't use X, because of Y," but that's not so much advice as it is a specific criticism.

The topic interests me because significant feats of software engineering often don't seem to revolve around new ways of doing things. Rather, they involve old ways of not doing things. Constraining your options as a software developer often enables you to reach higher than if you hadn't.

Many of these lessons are hard learned, and in retrospect often come from having tried to push an approach further than it merited. Some days much of software feels like this, as if computing has already been pushing our human faculties well past the collective red line. Hence I find the best software advice is often not about code at all. If it's about anything, it's about data, and how you organize it throughout its lifecycle. That is the real currency of the coder's world.

Usually data is the ugly duckling, relegated to the role of an unlabeled arrow on a diagram. The main star is all the code that we will write, which we draw boxes around. But I prefer to give data top billing, both here and in general.

One-way Data Flow

In UI, there's the concept of one-way data flow, popularized by the now omnipresent React. One-way data flow is all about what it isn't, namely not two-way. This translates into benefits for the developer, who can reason more simply about their code. Unlike traditional Model-View-Controller architectures, React is sold as being just the View.

Expert readers, however, will note that the original trinity of Model-View-Controller does all flow one way, in theory. Its View receives changes from the Model and updates itself. The View never talks back to the Model; it only operates through the Controller.

[Figure: model view controller]

The reason it's often two-way in practice is because there are lots of M's, V's and C's which all need to communicate and synchronize in some unspecified way:

[Figure: model view controller - data flow]

The source of truth is some kind of nebulous Ur-Model, and each widget in the UI is tied to a specific part of it. Each widget has its own local model, which has to bi-directionally sync up to it. Children go through their parent to reach up to the top.

When you flatten this, it starts to look more like this:

[Figure: model view controller - 2-way data flow]

Between an original model and a final view must sit a series of additional "Model Controllers" whose job it is to pass data down and to the right, and vice versa. Changes can be made in either direction, and there is no single source of truth. If both sides change at the same time, you don't know which is correct without more information. This is what makes it two-way.

[Figure: model view controller - one-way stateless data flow]

The innovation in one-way UI isn't exactly to remove the Controller, but to centralize it and call it a Reducer. It also tends to be stateless, in that it replaces the entire Model for every change, rather than updating it in place.

This makes all the intermediate arrows one-way, restoring the original idea behind MVC. But unlike most MVC, it uses a stateless function f: model => views to derive all the Views from the Ur-Model in one go. There are no permanent Views that are created and then set up to listen to an associated Model. Instead Views are pure data, re-derived for every change, at least conceptually.
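As a rough illustration of the reducer plus f: model => views shape, here is a minimal sketch in C rather than in React. The counter model, the action names and the text "view" are all invented for the example; the point is only that the reducer returns a whole new model and the view is re-derived from it as plain data.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model and actions for a single counter widget. */
struct model {
    int count;
};

enum action {
    ACTION_INCREMENT,
    ACTION_RESET
};

/* Reducer: returns a whole new model instead of updating one in place. */
static struct model
reduce(struct model m, enum action a)
{
    switch (a) {
    case ACTION_INCREMENT:
        return (struct model){ .count = m.count + 1 };
    case ACTION_RESET:
        return (struct model){ .count = 0 };
    }
    return m;   /* unknown action: model is unchanged */
}

/* View as pure data: the same model always renders the same text.
 * Returns the length snprintf reports for the rendered string. */
static int
render(struct model m, char *out, size_t out_size)
{
    return snprintf(out, out_size, "Count: %d", m.count);
}
```

Nothing in render holds on to the model or listens for changes; after each action the caller simply runs reduce and then render again, which is the "pretend you re-run everything" model without any of the incremental machinery.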

In practice there is an actual trick to making this fast, namely incrementalism and the React Reconciler. You don't re-run everything, but you can pretend you do. A child is guaranteed to be called again if a parent has changed. But only after giving that parent, and its parents, a chance to react first.

Even if the Views are a complex nested tree, the data flow is entirely one way except at the one point where it loops back to the start. If done right, you can often shrink the controller/reducer to such a degree that it may as well not be there.

Much of the effort in developing UI is not in the widgets but in the logic around them, so this can save a lot of time. Typical MVC instead tends to spread synchronization concerns all over the place as the UI develops, somewhat like a slow but steadily growing cancer.

The solution seems to be to forbid a child from calling or changing the state of its parent directly. Many common patterns in old UI code become impossible and must be replaced with alternatives. Parents do often pass down callbacks to children to achieve the same thing by another name. But this is a cleaner split, because the child component doesn't know who it's calling. The parent can decide to pass-through or decorate a callback given to it by its parent, and this enables all sorts of fun composition patterns with little to no boilerplate.

You don't actually need to have one absolute Ur-Model. Rather the idea is separation of concerns along lines of where the data comes from and what it is going to be used for, all to ensure that change only flows in one direction.

The benefits are numerous because of what it enables: when you don't mutate state bidirectionally, your UI tree is also a data-dependency graph. This can be used to update the UI for you, requiring you to only declare what you want the end result to be. You don't need to orchestrate specific changes to and fro, which means a lot of state machines disappear from your code. Key here is the ability to efficiently check for changes, which is usually done using immutable data.

The merit of this approach is most obvious once you've successfully built a complex UI with it. The discipline it enforces leads to more elegant and robust solutions, because it doesn't let you wire things up lazily. You must instead take the long way around, and design a source of truth in accordance with all its intended derivatives. This forces but also enables you to see the bigger picture. Suddenly features that seemed insurmountably complicated, because they cross-cut too many concerns, can just fall out naturally. The experience is very similar to Immediate Mode UI, only with the ability to decouple more and do async.

If you don't do this, you end up with the typical Object-Oriented system. Every object can be both an actor and can be mutually acted upon. It is normal and encouraged to create two-way interactions with them and link them into cycles. The resulting architecture diagrams will be full of unspecified bidirectional arrows that are difficult to trace, which obscure the actual flows being realized.

Unless they represent a reliable syncing protocol, bidirectional arrows are wishful thinking.

Immutable Data

Almost all data in a computer is stored on a mutable medium, be it a drive or RAM. As such, most introductions to immutable data will preface it by saying that it's kinda weird. Because once you create a piece of data, you never update it. You only make a new, altered copy. This seems like a waste of perfectly good storage, volatile or not, and contradicts every programming tutorial.

Because of this it is mandatory to say that you can reduce the impact of it with data sharing. This produces a supposedly unintuitive copy-on-write system.
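Data sharing is easiest to see on a linked list. The sketch below is a generic illustration with made-up names, not code from any particular library: nodes are never modified after creation, so an "edit" builds a new head that points at the unchanged tail of the old version. Freeing is left out on purpose, since deciding when an old version is garbage is a separate problem.

```c
#include <stdlib.h>

/* Immutable singly linked list node: never modified after creation. */
struct node {
    int value;
    const struct node *next;
};

/* Prepend a value. The new list shares the entire old list as its tail.
 * Returns NULL if allocation fails. */
static const struct node *
list_push(const struct node *tail, int value)
{
    struct node *n = malloc(sizeof *n);

    if (n == NULL)
        return NULL;
    n->value = value;
    n->next = tail;
    return n;
}

/* "Change" the first element by making a new head; the old list is
 * left untouched and both versions share everything after the head. */
static const struct node *
list_replace_head(const struct node *old, int value)
{
    return list_push(old != NULL ? old->next : NULL, value);
}
```

After replacing the head, both the old and the new list remain valid and readable, and only one node's worth of extra storage was used.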

But there's a perfect parallel, and that's the pre-digital office. Back then, most information was kept on paper that was written, typed or printed. If a document had to be updated, it had to be amended or redone from scratch. Aside from very minor annotations or in-place corrections, changes were not possible. When you did redo a document, the old copy was either archived, or thrown away.

[Figure: data sharing - copy on write]

The perfectly mutable medium of computer memory is a blip, geologically speaking. It's easy to think it only has upsides, because it lets us recover freely from mistakes. Or so we think. But the same needs that gave us real life bureaucracy re-appear in digital form. Only it's much harder to re-introduce what came naturally offline.

Instead of thinking of mutable data as the default, I prefer to think of it as data that destroys its own paper trail. It shreds any evidence of the change and adjusts the scene of the crime so the past never happened. All edits are applied atomically, with zero allowances for delay, consideration, error or ambiguity. This transactional view of interacting with data is certainly appealing to systems administrators and high-performance fetishists, but it is a poor match for how people work with data in real life. We enter and update it incrementally, make adjustments and mistakes, and need to keep the drafts safe too. We need to sync between devices and across a night of sleep.

Girl With Balloon aka The Self-shredding Painting (Banksy)

Storing your main project in a bunch of silicon that loses its state as soon as you turn off the power is inadvisable. This is why we have automated backups. Apple's Time Machine for instance turns your computer into a semi-immutable data store on a human time scale, garbage collected behind the scenes and after the fact. Past revisions of files are retained for as long as is practical, provided the app supports revision control. It even works without the backup drive actually hooked up, as it maintains a local cache of the most recent edits as space permits.

It's a significant feat of engineering, supported by a clever reinterpretation of what "free disk space" actually means. It allows you to Think Different™ about how data works on your computer. It doesn't just give you the peace of mind of short-term OS-wide undo. It means you can still go fish a crumpled piece of data out of the trash long after throwing banana peels and coke cans on top. And you can do it inline, inside the app you're using, using a UI that is only slightly over the top for what it does.

That is what immutable data gets you as an end-user, and it's the result of deciding not to mutate everything in place as if empty disk space is a precious commodity. The benefits can be enormous, for example that synchronization problems get turned into fetching problems. This is called a Git.

It's so good most developers would riot if they were forced to work without it, but almost none grant their own creations the same abilities.

[Image: Linus Torvalds]

Git repositories are of course notorious for only growing bigger, never shrinking, but that is a long-standing bug if we're really honest. It seems pretty utopian to want a seamless universe of data, perfectly normalized by key in perpetuity, whether mutable or immutable. Falsehoods programmers believe about X is never wrong on a long enough time-scale, and you will need affordances to cushion that inevitable blow sooner or later.

One of those falsehoods is that when you link a piece of data from somewhere else, you always wish to keep that link live instead of snapshotting it, better known as Database Normalization. Given that screenshots of screenshots are now the most common type of picture on the web, aside from cats, we all know that's a lie. Old bills don't actually self-update after you move house. In fact if you squint hard "Print to PDF" looks a lot like compiling source code into a binary for normies, used for much the same reasons.

The analogy to a piece of paper is poignant to me, because you certainly feel it when you try to actually live off SaaS software meant to replicate business processes. Working with spreadsheets and PDFs on my own desktop is easier and faster than trying to use an average business solution designed for that purpose in the current year. Because they built a tool for what they thought people do, instead of what we actually do.

These apps often have immutability, but they use it wrong: they prevent you from changing something as a matter of policy, letting workflow concerns take precedence over an executive override. If e.g. law requires a paper trail, past versions can be archived. But they should let you continue to edit as much as you damn well want, saving in the background if appropriate. The exceptions that get this right can probably be counted on one hand.

Business processes are meant to enable business, not constrain it. Requiring that you only ever have one version of everything at any time does exactly that. Immutability with history is often a better solution, though not a miracle cure. Doing it well requires expert skill in drawing boundaries between your immutable blobs. It also creates a garbage problem and it won't be as fast as mutable in the short term. But in the long term it just might save someone a rewrite. It's rarely pretty when real world constraints collide with an ivory tower that had too many false assumptions baked into it.

Rolls containing Acts of Parliament in the Parliamentary Archives at Victoria Tower, Palace of Westminster

Pointerless Data

Data structures in a systems language like C will usually refer to each other using memory pointers: these are raw 64-bit addresses pointing into the local machine's memory, obscured by virtualization. They reference memory pages that are allocated, with their specific numeric value meaningless and unpredictable.

This has a curious consequence: the most common form of working with data on a computer is one of the least useful encodings of that data imaginable. It cannot be used as-is on any other machine, or even the same machine later, unless loaded at exactly the same memory offset in the exact same environment.

Almost anything else, even in an obscure format, would have more general utility. Serializing and deserializing binary data is hence a major thing, which includes having to "fix" all the pointers, a problem that has generated at least 573 kiloyaks worth of shaving. This is strange because the solution is literally just adding or subtracting a number from a bunch of other numbers over and over.

Okay that's a lie. But what's true is that every pointer p in a linked data structure is really a base + i, with a base address that was determined once and won't change. Using pointers in your data structure means you sprinkle base + invisibly around your code and your data. You bake this value into countless repeated memory cells, which you then have to subtract later if you want to use their contents for outside purposes.

Due to dynamic memory allocation the base can vary for different parts of your linked data structure. You have to assume it's different per pointer, and manually collate and defragment all the individual parts to serialize something.

Pointers are popular because they are easy: they let you forget where exactly in memory your data sits. This is also their downside: not only have you encoded your data in the least repeatable form possible, but you put it where you don't have permission to search through all of it, add to it, or reorganize it. malloc doesn't set you free, it binds you.

But that's a design choice. If you work inside one contiguous memory space, you can replace pointers with just the relative offset i. The resulting data can be snapshotted as a whole and written to disk. In addition to pointerless, certain data structures can even be made offsetless.
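A minimal sketch of the pointerless idea, assuming a fixed-size pool and made-up names: nodes live in one contiguous array and refer to each other by index rather than by address, so the whole structure can be copied or written out as a single block and still be valid afterwards.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_CAPACITY 64u
#define NIL UINT32_MAX          /* "no node" marker, used instead of NULL */

struct onode {
    int32_t value;
    uint32_t next;              /* index into pool.nodes, not an address */
};

struct pool {
    uint32_t used;              /* number of nodes allocated so far */
    struct onode nodes[POOL_CAPACITY];
};

/* Allocate the next slot and link it in front of `next`; returns its index. */
static uint32_t
pool_push(struct pool *p, int32_t value, uint32_t next)
{
    assert(p->used < POOL_CAPACITY);
    uint32_t i = p->used++;

    p->nodes[i].value = value;
    p->nodes[i].next = next;
    return i;
}

/* Walk the list starting at `head` and add up the values. */
static int32_t
pool_sum(const struct pool *p, uint32_t head)
{
    int32_t sum = 0;

    for (uint32_t i = head; i != NIL; i = p->nodes[i].next)
        sum += p->nodes[i].value;
    return sum;
}
```

Because every link is relative to the start of the pool, a plain struct assignment (or an fwrite of the struct) is a complete snapshot: the same head index walks the copy with no pointer fix-ups.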

For example, a flattened binary tree where the index of a node in a list determines its position in the tree, row by row. With the root at index 1, the children of node i are found at 2*i and 2*i + 1. This can be e.g. used on GPUs and allows for very efficient traversal and updates. It's also CPU-cache friendly. This doesn't work well for arbitrary graphs, but is still a useful trick to have in your toolbox. In specific settings, pointerless or offsetless data structures can have significant benefits. The fact that it lets you treat data like data again, and just cargo it around wholesale without concern about the minutiae, enables a bunch of other options around it.
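The index arithmetic for such a flattened tree is short enough to show in full. This sketch assumes the common convention that slot 0 is unused and the root sits at index 1, which is what makes the 2*i and 2*i + 1 rule work; the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Complete binary tree stored row by row. Slot 0 is unused so that the
 * root sits at index 1 and the 2*i / 2*i + 1 rule holds. */
static size_t left_child(size_t i)  { return 2 * i; }
static size_t right_child(size_t i) { return 2 * i + 1; }

/* The root (index 1) has no parent. */
static size_t
parent(size_t i)
{
    assert(i > 1);
    return i / 2;
}

/* Sum of the subtree rooted at i in a tree with n nodes (indices 1..n). */
static int
subtree_sum(const int *tree, size_t n, size_t i)
{
    if (i == 0 || i > n)
        return 0;
    return tree[i]
         + subtree_sum(tree, n, left_child(i))
         + subtree_sum(tree, n, right_child(i));
}
```

No node stores a link of any kind; the position in the array is the structure.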

[Figure: Binary Tree - Flattened]

It's not a silver bullet because going pointerless can just shift the problem around in the real world. Your relative offsets can still have the same issue as before, because your actual problem was wrangling the data-graph itself. That is, all the bookkeeping of dependent changes when you edit, delete or reallocate. Unless you can tolerate arbitrary memory fragmentation and bloating, it's going to be a big hassle to make it all work well.

Something else is going on beyond just pointers. See, most data structures aren't really data structures at all. They're acceleration structures for data. They accelerate storage, querying and manipulation of data that was already shaped in a certain way.

The contents of a linked list are the same as that of a linear array, and they serialize to the exact same result. A linked list is just an array that has been atomized, tagged and sprayed across an undefined memory space when it was built or loaded.

Because of performance, we tend to use our acceleration structures as a stand-in for the original data, and manipulate that. But it's important to realize this is programmer laziness: it's only justified if all the code that needs to use that data has the same needs. For example, if one piece of code does insertions, but another needs random access, then neither an array nor linked list would win, and you need something else.

We can try to come up with ever-cleverer data structures to accommodate every imaginable use, and this is called a Postgres. It leads to a ritual called a Schema Design Meeting where a group of people with differently shaped pegs decide what shape the hole should be. Often you end up with a too-generic model that doesn't hold anything particularly well. All you needed was 1 linked list and 1 array containing the exact same data, and a function to convert one to the other, that you use maybe once or twice.
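The "function to convert one to the other" really is small. Here is one possible pair, with hypothetical names, where the caller supplies the storage so that the conversion itself does no allocation:

```c
#include <stddef.h>

struct lnode {
    int value;
    struct lnode *next;
};

/* Copy up to cap values from the list into out.
 * Returns the full list length, which may exceed cap. */
static size_t
list_to_array(const struct lnode *head, int *out, size_t cap)
{
    size_t n = 0;

    for (; head != NULL; head = head->next) {
        if (n < cap)
            out[n] = head->value;
        n++;
    }
    return n;
}

/* Build a list over caller-provided node storage of at least n entries.
 * Returns the head, or NULL for an empty array. */
static struct lnode *
array_to_list(const int *in, size_t n, struct lnode *storage)
{
    for (size_t i = 0; i < n; i++) {
        storage[i].value = in[i];
        storage[i].next = (i + 1 < n) ? &storage[i + 1] : NULL;
    }
    return n > 0 ? &storage[0] : NULL;
}
```

Round-tripping an array through the list form gives back the same values in the same order, which is the sense in which both structures hold the exact same data.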

When a developer is having trouble maintaining consistency while coding data manipulations, that's usually because they're actually trying to update something that is both a source of truth and output derived from it, at the same time in the same place. Most of the time this is entirely avoidable. When you do need to do it, it is important to be aware that's what that is.

My advice is to not look for the perfect data structure which kills all birds with one stone, because this is called a Lisp and few people use it. Rather, accept the true meaning of diversity in software: you will have to wrangle different and incompatible approaches, transforming your data depending on context. You will need to rely on well-constructed adaptors that exist to allow one part to forget about most of the rest of the universe. It is best to become good at this and embrace it where you can.

As for handing your data to others, there is already a solution for that. They're called file formats, and they're a thing we used to have. Software used to be able to read many of them, and you could just combine any two tools that had the same ones. Without having to pay a subscription fee for the privilege, or use a bespoke one-time-use converter. Obviously this was crazy.

These days we prefer to link our data and code using URLs, which is much better because web pages can change invisibly underneath you without any warning. You also can't get the old version back even if you liked it more or really needed it, because browsers have chronic amnesia. Unfortunately it upsets publishers and other copyright holders if anyone tries to change that, so we don't try.

Squeak/SmallTalk

Suspend and Resume

When you do have snapshottable data structures that can be copied in and out of memory wholesale, it leads to another question: can entire programs be made to work this way? Could they be suspended and resumed mid-operation, even transplanted or copied to another machine? Imagine if instead of a screenshot, a tester could send a process snapshot that can actually be resumed and inspected by a developer. Why did it ever only 'work on my machine'?

Obviously virtual machines exist, and so does wholesale-VM debugging. But on the process level, it's generally a non-starter, because sockets and files and drivers mess it up. External resources won't be tracked while suspended and will likely end up in an invalid state on resume. VMs have well-defined boundaries and well-defined hardware to emulate, whereas operating systems are a complete wild west.

It's worth considering the value of a paper trail here too. If I suspend a program while a socket is open, and then resume it, what does this actually mean? If it was a one-time request, like an HTTP GET or PUT, I will probably want to retry that request, if at all still relevant. Maybe I prefer to drop it as unimportant and make a newer, different request. If it was an ongoing connection like a WebSocket, I will want to re-establish it. Which is to say, if you told a network layer the reason for opening a socket, maybe it could safely abort and resume sockets for you, subject to one of several policies, and network programming could actually become pleasant.
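As a thought experiment, the cases above could be written down as data that a network layer consults on resume. Everything in this sketch is hypothetical, including the type names and the still_relevant flag; no existing socket API works this way.

```c
#include <stdbool.h>

/* Why the socket was opened, declared up front by the caller. */
enum socket_intent {
    INTENT_ONE_TIME_REQUEST,    /* e.g. an HTTP GET or PUT */
    INTENT_ONGOING_CONNECTION   /* e.g. a WebSocket */
};

/* What the (imaginary) network layer does after a resume. */
enum resume_action {
    RESUME_RETRY_REQUEST,
    RESUME_DROP_REQUEST,
    RESUME_REESTABLISH
};

struct socket_purpose {
    enum socket_intent intent;
    bool still_relevant;        /* does the caller still want the result? */
};

/* Pick an action from the declared purpose alone. */
static enum resume_action
on_resume(struct socket_purpose p)
{
    if (p.intent == INTENT_ONGOING_CONNECTION)
        return RESUME_REESTABLISH;
    return p.still_relevant ? RESUME_RETRY_REQUEST : RESUME_DROP_REQUEST;
}
```

The decision needs no knowledge of the connection's internal state, only of why it existed, which is the information a raw socket never carries.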

Files can receive a similar treatment, to deal with the situation where they may have changed, been deleted, moved, etc. Knowing why a file was opened or being written to is required to do this right, and depends on the specific task being accomplished. Here too macOS deserves a shout-out, for being clever enough to realize that if a user moves a file, any application editing that file should switch to the new location as well.

Systems-level programmers tend to orchestrate such things by hand when needed, but the data flow in many cases is quite unidirectional. If a process, or a part of a process, could resume and reconnect with its resources according to prior declared intent, it would make a lot of state machines disappear.

It's not a coincidence this post started with React. Even those aware of it still don't quite realize React is not actually a thing to make web apps. It is an incremental job scheduler, for recursively expanding a tree in an asynchronous and rewindable fashion. It just happens to be built for SGML-like trees, and contains a bunch of legacy fixes for browsers. The pattern can be applied to many areas that are not UI and not web. If it sounds daunting to consider approaching resources this way, consider that people thought exactly the same about async I/O until someone made that pleasant enough.

However, doing this properly will probably require going back further than you think. For example, when you re-establish a socket, should you repeat and confirm the DNS lookup that gave you the IP in the first place? Maybe the user moved locations between suspending and resuming, so you want to reconnect to the nearest data center. Maybe there is no longer a need for the socket because the user went offline.

All of this is contextual, defined by policies informed by the real world. This class of software behavior is properly called etiquette. Like its real world counterpart it is extremely messy because it involves anticipating needs. Usually we only get it approximately right through a series of ad-hoc hacks to patch the worst annoyances. But it is eminently felt when you get such edge cases to work in a generic and reproducible fashion.

Mainly it requires treating policies as first class citizens in your designs and code. This can also lead you to perceive types in code in a different way. A common view is that a type constrains any code that refers to it. That is, types ensure your code only applies valid operations on the represented values. When types represent policies though, the perspective changes because such a type's purpose is not to constrain the code using it. Rather, it provides specific guarantees about the rules of the universe in which that code will be run.

This to me is the key to developer happiness. As opposed to, say, making tools to automate the refactoring of terrible code and make it bearable, but only just.

The key to end-user happiness is to make tools that enable an equivalent level of affordance and flexibility compared to what the developer needed while developing it.

* * *

When you look at code from a data-centric view, a lot of things start to look like stale or inconsistent data problems. I don't like using the word "cache" for this because it focuses on the negative, the absence of fresh input. The real issue is data dependencies, which are connections that must be maintained in order to present a cohesive view and cohesive behavior, derived from a changing input model. Which is still the most practical way of using a computer.

Most caching strategies, including 99% of those in HTTP, are entirely wrong. They fall into the give-up-and-pray category, where they assume the problem is intractable and don't try something that could actually work in all cases. Which, stating the obvious, is what you should actually aim for.

Often the real problem is that the architect's view of the problem is a tangled mess of boxes and arrows that point all over the place, with loopbacks and reversals, which makes it near-impossible to anticipate and cover all the applicable scenarios.

If there is one major thread running through this, it's that many currently accepted sane defaults really shouldn't be. In a world of terabyte laptops and gigabyte GPUs they look suspiciously like premature optimization. Many common assumptions deserve to be re-examined, at least if we want to adapt tools from the Offline Age to a networked day. We really don't need a glossier version of a Microsoft Office 95 wizard with a less useful file system.

We do need optimized code in our critical paths, but developer time is worth more than CPU time most everywhere else. Most of all, we need the ambition to build complete tools and the humility to grant our users access on an equal footing, instead of hoarding the goods.

The argument against these practices is usually that they lead to bloat and inefficiency. Which is definitely true. Yet even though our industry has not adopted them much at all, the software already comes out orders of magnitude bigger and slower than before. Would it really be worse?

28 May 2020 10:00pm GMT

08 Nov 2011

feedfosdem - Google Blog Search

papupapu39 (papupapu39)'s status on Tuesday, 08-Nov-11 00:28 ...

papupapu39 · http://identi.ca/url/56409795 #fosdem #freeknowledge #usamabinladen · about a day ago from web.

08 Nov 2011 12:28am GMT

05 Nov 2011

feedfosdem - Google Blog Search

Write and Submit your first Linux kernel Patch | HowLinux.Tk ...

FOSDEM (Free and Open Source Development European Meeting) is a European event centered around Free and Open Source software development. It is aimed at developers and all interested in the Free and Open Source news in the world. ...

05 Nov 2011 1:19am GMT

03 Nov 2011

feedfosdem - Google Blog Search

Silicon Valley Linux Users Group – Kernel Walkthrough | Digital Tux

FOSDEM (Free and Open Source Development European Meeting) is a European event centered around Free and Open Source software development. It is aimed at developers and all interested in the Free and Open Source news in the ...

03 Nov 2011 3:45pm GMT

26 Jul 2008

feedFOSDEM - Free and Open Source Software Developers' European Meeting

Update your RSS link

If you see this message in your RSS reader, please correct your RSS link to the following URL: http://fosdem.org/rss.xml.

26 Jul 2008 5:55am GMT

25 Jul 2008

feedFOSDEM - Free and Open Source Software Developers' European Meeting

Archive of FOSDEM 2008

These pages have been archived.
For information about the latest FOSDEM edition please check this url: http://fosdem.org

25 Jul 2008 4:43pm GMT

09 Mar 2008

feedFOSDEM - Free and Open Source Software Developers' European Meeting

Slides and videos online

Two weeks after FOSDEM, we are proud to publish most of the slides and videos from this year's edition.

All of the material from the Lightning Talks has been put online. We are still missing some slides and videos from the Main Tracks but we are working hard on getting those completed too.

We would like to thank our mirrors: HEAnet (IE) and Unixheads (US) for hosting our videos, and NamurLUG for quick recording and encoding.

The videos from the Janson room were live-streamed during the event and are also online on the Linux Magazin site.

We are having some synchronisation issues with Belnet (BE) at the moment. We're working to sort these out.

09 Mar 2008 3:12pm GMT