13 Aug 2018
Planet Lisp
Christophe Rhodes: a trace of riscv success
(Don't get too excited by the "success" in the title of this post.)
A brief SBCL RISCV update this time, as there have been no train journeys since the last blog post. Most of what has been achieved is trivial definitions and rearrangements. We've defined some more of the MOVE Virtual Operations that the compiler needs to be able to move quantities between different storage classes (from immediate
to descriptorreg
, or between tagged fixnums in descriptorreg
to untagged fixnums in signedreg
, for example).
We've defined a debugging printer function, which wouldn't be needed at this stage except that we're asking the compiler to print out substantial parts of its internal data structures. We've reorganized, and not in a desperately good way, the definitions of the registers and their storage bases and classes; this area of code is "smelly", in that various arguments aren't evaluate when maybe they should be, which means that some other structures need to be defined at compiletime so that variables can be defined at compiletime so that values (in the next form) can be evaluated at readtime; it's all a bit horrible, and as Paul said, not friendly to interactive development. And then a few more fairly trivial definitions later, we are at the point where we have a first trace file from compiling our very simple scratch.lisp
file. Hooray! Here's the assembly code of our foo
function:
L10:
VOP XEPALLOCATEFRAME {#<SB!ASSEM:LABEL 1>}
L1:
VOP EMITLABEL {#<SB!ASSEM:LABEL 2>}
L2:
VOP EMITLABEL {#<SB!ASSEM:LABEL 3>}
L3:
VOP NOTEENVIRONMENTSTART {#<SB!ASSEM:LABEL 4>}
L4:
L11:
L5:
VOP TYPECHECKERROR/C #:G0!4[A0] :DEBUGENVIRONMENT
{SB!KERNEL:OBJECTNOTSIMPLEARRAYFIXNUMERROR X}
L12:
byte 50
byte 40
.align 2
L6:
VOP EMITLABEL {#<SB!ASSEM:LABEL 7>}
L7:
VOP EMITLABEL {#<SB!ASSEM:LABEL 8>}
L8:
VOP NOTEENVIRONMENTSTART {#<SB!ASSEM:LABEL 9>}
L9:
L13:
L14:
L15:
I hope that everyone can agree that this is a flawless gem of a function.
The next step is probably to start actually defining and using RISCV instructions, so... stay tuned!
13 Aug 2018 4:57pm GMT
07 Aug 2018
Planet Lisp
Christophe Rhodes: first riscy steps
In our first post, I summarized the work of a few evenings of heat and a few hours on a train, being basically the boilerplate (parenthesisinfested boilerplate, but boilerplate nonetheless) necessary to get an SBCL crosscompiler built. At that point, it was possible to start running makehost2.sh
, attempting to use the crosscompiler to build the Lisp sources to go into the initial Lisp image for the new, target SBCL. Of course, given that all I had done was create boilerplate, running the crosscompiler was unlikely to end well: and in fact it barely started well, giving an immediate error of functions not being defined.
Well, functions not being defined is a fairly normal state of affairs (fans of static type systems look away now!), and easily fixed: we simply define them until they're not undefined any more, ideally with sensible rather than nonsensical contents. That additional constraint, though, gives us our first opportunity to indulge in some actual thought about our target platform.
The routines we have to define include things like functions to return Temporary Names (registers or stack locations) for such things as the lisp return address, the old frame pointer, and the number of arguments to a function: these are things that will be a part of the lisp calling convention, which may or may not bear much relation to the "native" C calling convention. On most SBCL platforms, it doesn't have much in common, with Lisp execution effectively being one large complicated subroutine when viewed from the prism of the C world; the Lisp execution uses a separate stack, which allows us to be sure about what on the stack is a Lisp object and what might not be (making garbage collectors precise on those platforms).
In the nascent RISCV port, in common with other 32register ports, these values of interest will be stored in registers, so we need to think about mapping these concepts to the platform registers, documentation of which can be found at the RISCV site. In particular, let's have a look at logical page 109 (physical page 121) of the userlevel RISCV ISA specification v2.2:
Register  ABI Name  Description  Saver 

x0  zero  Hardwired zero   
x1  ra  Return address  Caller 
x2  sp  Stack pointer  Callee 
x3  gp  Global pointer   
x4  tp  Thread pointer   
x5  t0  Temporary/Alternate link register  Caller 
x67  t12  Temporaries  Caller 
x8  s0/fp  Saved register/Frame pointer  Callee 
x9  s1  Saved register  Callee 
x1011  a01  Function arguments/return values  Caller 
x1217  a27  Function arguments  Caller 
x1827  s211  Saved registers  Callee 
x2831  t36  Temporaries  Caller 
The table (and the rest of the document) tells us that x0
is a wired zero: no matter what is written to it, reading from it will always return 0. That's a useful thing to know; we can now define a storage class for zero specifically, and define the zero
register at offset 0.
The platform ABI uses x1
for a link register (storing the return address for a subroutine). We may or may not want to use this for our own return address; I'm not yet sure if we will be able to. For now, though, we'll keep it reserved for the platform return address, defining it as register lr
in SBCL nomenclature, and use x5
for our own lisp return address register. Why x5
? The ISA definition for the jalr
instruction (Jump And Link Register, logical page 16) informs us that return address prediction hardware should manipulate the prediction stack when either x1
or x5
is used as the link register with jalr
, and not when any other register is used; that's what "Alternate link register" means in the above table. We'll see if this choice for our lra
register, for Spectre or for worse, gives us any kind of speed boost  or even if it is tenable at all.
We can define the Number Stack Pointer and Number Frame Pointer registers, even though we don't need them yet: in SBCL parlance, the "number" stack is the C stack, and our register table says that those should be x2
and x8
respectively. Next, for our own purposes we will need: Lisp stack, frame and oldframe pointer registers; some function argument registers; and some nondescriptor temporaries. We'll put those semiarbitrarily in the temporaries and function arguments region of the registers; the only bit of forethought here is to alternate our Lisp arguments and nondescriptor temporaries with one eye on the "C" RISCV extension for compressed instructions: if an instruction uses registers x8
x15
and the "C" extension is implemented on a particular RISCV board, the instruction might be encodable in two bytes instead of four, hopefully reducing instruction loading and decoding time and instruction cache pressure. (This might also be overthinking things at this stage). In the spirit of not overthinking, let's just put nargs on x31
for now.
With all these decisions implemented, we hit our first "real" error! Remembering that we are attempting to compile a very simple file, whose only actual Lisp code involves returning an element from a specialized array, it's great to hit this error because it seems to signify that we are almost ready. The crosscompiler is built knowing that some functions (usually as a result of code transformations) should never be compiled as calls, but should instead be /translated/ using one of our Virtual OPerations. datavectorref
, which the aref
in our scratch.lisp file translates to because of the type declarations, is one such function, so we need to define it: and also define it in such a way that it will be used no matter what the current compiler policy, hence the :policy :fastsafe
addition to the reffer macro. (That, and many other such :policy
declarations, are dotted around everywhere in the other backends; I may come to regret having taken them out in the initial boilerplate exercise.)
And then, finally, because there's always an excuse for a good cleanup: those float array VOPs, admittedly without :generator
bodies, look too similar: let's have a definefloatvectorfrobs
macro to tidy things up. (They may not remain that tidy when we come to actually implement them). At this point, we have successfully compiled one Virtual OPeration! I know this, because the next error from running the build with all this is about callnamed
, not datavectorset
. Onward!
07 Aug 2018 5:36pm GMT
Didier Verna: Quickref opensourced
8 months after the original announcement, we finally opensourced Quickref. The complete source code is available here.
I gave a lightning talk at ELS 2018, in which I claimed I would release the code in "something like two weeks". That was 4 months ago :). I apologize for the delay, especially to Zach who expressed interest in how we did the Docker thing at the time. I wanted to clean up the code before the first public release (I'm a maniac like that), which I did, but it took some time.
Compared to what I announced at ELS, Quickref also has a couple of new features, most importantly the ability to generate documentation for only what's already installed, rather than for the whole Quicklisp world. This can be very convenient if you just want some local documentation for the things you use daily. Finally, there is also a very rudimentary support for multithreading, which currently doesn't bring much. But the code has been prepared for going further in that direction; this will be the topic of another internship which will start next September.
Browse no less than 1588 reference manuals right now at https://quickref.commonlisp.net/, and recreate your own local version with this Docker 2liner:
docker run name quickref quickref/quickref docker cp quickref:/home/quickref/quickref .
07 Aug 2018 12:00am GMT
03 Aug 2018
Planet Lisp
Christophe Rhodes: beginning an sbcl port
I'm on a train! (Disclaimer: I'm not on a train any more.)
What better thing to do, during the hot, hot July days, thank something semimechanical involving very little brain power? With studentfacing work briefly calming down (we've just about weathered the storm of appeals and complaints following the publication of exam results), and the alltoobrief Summer holidays approaching, I caught myself thinking that it wasn't fair that Doug and his team should have all the fun of porting SBCL to a new architecture; I want a go, too! (The fact that I have no time notwithstanding.)
But, to what? 64bit powerpc is definitely on the move; I've missed that boat. SBCL already supports most current architectures, and "supports" a fair number of obsolete ones too. One thought that did catch my brain, though, was prompted by the recent publication by ARM of a set of claims purporting to demonstrate the dangers of the RISCV architecture, in comparison to ARM's own. Well, since SBCL already runs on 32bit and 64bit ARM platforms, how hard could porting to RISCV be?
I don't know the answer to that yet! This first post merely covers the straightforward  some might even say "tedious"  architectureneutral changes required to get SBCL to the point that it could start about considering compiling code for a new backend.
The SBCL build has roughly seven phases (depending a bit on how you count):
 build configuration;
 build the crosscompiler using the host Lisp, generating targetspecific header files;
 build the runtime and garbage collector using a C compiler and platform assembler, generating an executable;
 build the target compiler using the crosscompiler, generating targetspecific fasl files;
 build the initial ("cold") image, using the genesis program to simulate the effect of loading fasl files, generating a target Lisp memory image;
 run all the code that stage 5 delayed because it couldn't simulate the effects of loading fasls (e.g. many sideeffectful toplevel forms);
 build everything else, primarily PCL (a full implementation CLOS) and save the resulting image.
1. Define a keyword for a new backend architecture
Probably the most straightforward thing to do, and something that allows me to (pretty much) say that we're one seventh of the way there: defining a new keyword for a new backend architecture is almost as simple as adding a line or two to makeconfig.sh
. Almost. We run some devious dastardly consistency checks on our target *features*
, and those need to be updated too. This gets makeconfig.sh
running, and allows makehost1.sh
to run its consistency checks.
2. Construct minimal backendspecific files
Phase 2, building the crosscompiler, sounds like something straightforward: after all, if we don't have the constraint that the crosscompiler will do anything useful, not even run, then surely just producing minimal files where the build system expects to find them will do, and Lisp's usual latebinding nature will allow us to get away with the rest. No?
Well, that could have been the case, but SBCL itself does impose some constraints, and the minimal files are a lot less minimal than you might think. The translation of IR2 (second intermediate representation) to machine code involves calling a number of VOPs (virtual operations) by name: and those calls by name perform compiletime checking that the virtual operation template name already exists. So we have to define stub VOPs for all those virtual operations called by name, including those in optimizing vector initialisation, for all of our specialised vector types. These minimal definitions do nothing other than pacify the safety checks in the crosscompiler code.
Phase 2 also generates a number of header files used in the compilation of the Cbased runtime, propagating constant and object layout definitions to the garbage collector and other support routines. We won't worry about that for now; we just have to ignore an error about this at firstgenesis time for a while. And with this, makehost1.sh
runs to completion.
4. Build the target compiler using the crosscompiler
At this point, we have a compiler backend that compiles. It almost certainly doesn't run, and even when it does run, we will want to ask it to compile simple forms of our choosing. For now, we just add a new file to the start of the build order, and put in a very simple piece of code, to see how we get on. (Spoiler alert: not very well). We've added the :tracefile
flag so that the crosscompiler will print information about its compilation, which will be useful as and when we get as far as actually compiling things.
The next steps, moving beyond the start of step 4, will involve making decisions about how we are going to support sbcl on RISCV. As yet, we have not made any decisions about register function (some of those, such as the C stack and frame pointer, will be dictated by the platform ABI, but many won't, such as exactly how we will partition the register set), or about how the stack works (again, the C stack will be largely determined, but Lisp will have its own stack, kept separate for ease of implementating garbage collection).
(Observant readers will note that phase 3 is completely unstarted. It's not necessary yet, though it does need to be completed for phase 4 to run to completion, as some of the crosscompilation depends on information about the runtime memory layout. Some of the details in phase 3 depend on the decisions made in phase 4, though, so that's why I'm doing things in this order, and not, say, because I don't have a RISCV machine to run anything on  it was pleasantly straightforward to bring up Fedora under QEMU which should be fine as a place to get started.)
And this concludes the initial SBCL porting work that can be done with very little brain. I do not know where, if anywhere, this effort will go; I am notionally on holiday, and it is also hot: two factors which suggest that work on this, if any, will be accomplished slowly. I'd be very happy to collaborate, if anyone else is interested, or else plod along in my own slow way, writing things up as I go.
03 Aug 2018 1:55pm GMT
30 Jul 2018
Planet Lisp
FrançoisRené Rideau: Making ASDF more magic by making it less magic
This short essay describes some challenges I leave to the next maintainers of ASDF, the Common Lisp build system, related to the "magic" involved in bootstrapping ASDF. If you dare to read further, grab a chair and your favorite headachefighting brewage.
ASDF is a build system for Common Lisp. In the spirit of the original Lisp DEFSYSTEM, it compiles and loads software (mostly but not exclusively Common Lisp code) in the same image as it's running. Indeed, old time Lisp machines and Lisp systems did everything in the same world, in the same (memory) image, the same address space, the same (Unixstyle) process  whatever you name it. This notably distinguishes it from traditional Unix build, which happens in multiple processes each with its own address space. The upside of the Lisp way is extremely low overhead, which allows for greater speed on olden singleprocessor machines, but also richer communication between build phases (especially for error reporting and handling), interactive debugging, and more. The upside of the Unix way is greater parallelizability, which allows for greater speed on newer multiprocessor machines, but also fewer interactions between build phases, which makes determinism and distributed computation easier. The upside of the Lisp way is still unduly underappreciated by those ignorant of Lisp and other imagebased systems (such as Smalltalk). The Lisp way feels old because it is old; it could be updated to integrate the benefits of the Unix way, possibly using languagebased purity and effect control in addition to lowlevel protection; but that will probably happen with Racket rather than Common Lisp.
One notable way that ASDF is magic is in its support for building itself  i.e. compiling and/or loading a new version of itself, in the very same Lisp image that is driving the build, replacing itself in the process, while it is running. This "hot upgrade" isn't an idle exercise, but an essential feature without which ASDF 1 was doomed. For the explanation why, see my original post on ASDF, Software Irresponsibility, or the broader paper on ASDF 2, Evolving ASDF: More Cooperation, Less Coordination.
Now, despite the heroic efforts of ASDF 2 described in the paper above, selfbuild could not be made to reliably happen in the middle of the build: subtle incompatibilities between old and new version, previously loaded extensions being clobbered by redefinitions yet expected to work later, interactions with existing control stack frames or with inlining or caching of methods, etc., may not only cause the build to fail, but also badly corrupt the entire system. Thus, selfbuild, if it happens, must happen at the very beginning of the build. However, the way ASDF works, it is not predictable whether some part of the build will later depend on ASDF. Therefore, to ensure that selfbuild happens in the beginning if it happens at all, ASDF 3 makes sure it always happens, as the very first thing that ASDF does, no matter what. This also makes ASDF automatically upgrade itself if you just install a new source repository somewhere in your sourceregistry, e.g. under ~/commonlisp/asdf/ (recommended location for hackers) or /usr/share/commonlisp/source/clasdf/ (where Debian installs it). This happens as a call to upgradeasdf in defmethod operate :around in operate.lisp (including as now called by loadasd), so it is only triggered in sideeffectful operations of ASDF, not pure ones (but since findsystem can call loadasd, such sideeffects can happen just to look at a notpreviouslyseen, new or updated system definition file). Special care is taken in the recorddependency method in findsystem.lisp so this autoload doesn't cause circular dependencies.
Second issue since ASDF 3: ASDF was and is still traditionally delivered as a single file asdf.lisp that you can load in any Common Lisp implementation (literally any, from Genera to Clasp), and it just works. This is not the primary way that ASDF is seen by most endusers anymore: nowadays, every maintained implementation provides ASDF as a module, so users can (require "asdf") about anywhere to get a relatively recent ASDF 3.1 or later. But distributing a single file asdf.lisp is still useful to initially bootstrap ASDF on each of these implementations. Now, by release 2.26, ASDF had grown from its initial 500line code hack to a 4500line mess, with roughly working but incomplete efforts to address robustness, portability and upgradability, and with a deep design bug (see the ASDF 3 paper). To allow for further growth, robustification and nontrivial refactoring, ASDF 3 was split into two sets of files, one for the portability library, called UIOP (now 15 files, 7400 lines) and one for ASDF itself (now 25 files, 6000 lines as of 3.3.2.2), the latter set also specifically called asdf/defsystem in this context. The code is much more maintainable for having been organized in these much more modular smaller bites. However, to still be able to deliver as a single file, ASDF implemented a mechanism to concatenate all the files in the correct order into the desired artifact. It would be nice to convince each and every implementation vendor to provide UIOP and ASDF as separate modules, like SBCL and MKCL do, but that's a different challenge of its own.
There is another important reason to concatenate all source files into a single deliverable file: upgrading ASDF may cause some functions to be undefined or partially redefined between source files, while being used in the building ASDF's call stack, which may cause ASDF to fail to properly compile and load the next ASDF. Compiling and loading ASDF in a single file both makes these changes atomic and minimizes the use of functions being redefined while called. Note that in this regard, UIOP could conceivably be loaded the normal way, because it follows stricter backward compatibility restrictions than ASDF, and can afford them because it has a simpler, more stable, less extensible API that doesn't maintain as much dynamic state, and its functions are less likely to be adversely modified while in the call stack. Still, distributing UIOP and ASDF in separate files introduces opportunities for things to go wrong, and since we need a singlefile output for ASDF, it's much safer to also include UIOP in it, and simpler not to have to deal with two different ways to build ASDF, with or without a transcluded UIOP.
As a more recent issue, ASDF 3.3 tracks what operations happen while loading a .asd file (e.g. loading triggered by defsystemdependson), and uses that information to dynamically track dependencies between multiple phases of the build: there is a new phase each time ASDF compiles and loads extensions to ASDF into the current image as a prerequisite to process further build specifications. ASDF 3.3 is then capable of using this information to properly detect what to rebuild or not rebuild when doing a incremental compilation involving multiple build phases, whereas previous versions could fail in both ways. But, in light of the first issue, that means that trying to define ASDF or UIOP is special, since everything depends on them. And UIOP is even more special because ASDF depends on it. The "solution" I used in ASDF 3.3 was quite ugly  to prevent circular dependency between a defineop asdf and defineop uiop, I made asdf.asd magically read content from uiop.asd without loading uiop.asd, so as to transclude its sources in the concatenated file asdf.lisp. This is a gross hack that ought to be replaced by something better  probably adding more special cases to recorddependency for uiop as well as asdf along the way.
Finally, there is a way that ASDF could be modified, that would displace the magic of bootstrap such that no special case is needed for ASDF or UIOP  by making that magic available to all systems, potentially also solving issues with crosscompilation. (UIOP remains slightly special: it must be built using the standard CL primitives rather than the UIOP API.) But that would require yet another massive refactoring of ASDF. Moreover, it would either be incompatible with existing ASDF extensions or require nontrivial efforts to maintain a backwardcompatible path. The problem is the plan made by ASDF is executed by repeatedly calling the perform method, itself calling other methods on objects of various classes comprising the ASDF model, while this model is being redefined. The solution is that from the plan for one phase of execution, ASDF would instead extract a nonextensible, more functional representation of the plan that is impervious to upgrade. Each action would thus define a method on performforms instead of perform, that would (primarily) return a list of forms to evaluate at runtime. These forms can then be evaluated without the context of ASDF's object model, actually with a minimal context that even allows them to be run in a different Lisp image on a different Lisp implementation, allowing for crosscompilation, which itself opens the way for parallelized or distributed deterministic backends in the style of XCVB or Bazelisp. Such changes might justify bumping the ASDF version to 4.0.
As usual, writing this essay, which was prompted by a question from Eric Timmons, forced me to think about the alternatives and why I had rejected some of them, to look back at the code and experiment with it, which ultimately led to my finding and fixing various small bugs in the code. As for solving the deeper issues, they're up to you, next ASDF maintainers!
30 Jul 2018 10:47am GMT
11 Jul 2018
Planet Lisp
Quicklisp news: July 2018 Quicklisp dist update now available
Hi everyone! There's a new Quicklisp update for July, and regular updates should resume on a monthly schedule.
I'm using a new release system that involves Docker for easier build server setup and management. It took a while to get going but it should (eventually) make it easier for others to run things in an environment similar to mine. For example, it has all the required foreign libraries needed to compile and load everything in Quicklisp.
Here's the info for the new update:
New projects:
 april  April is a subset of the APL programming language that compiles to Common Lisp.  Apache2.0
 awsfoundation  Amazon AWS lowlevel utilities  BSD
 binaryio  Library for reading and writing binary data.  BSD
 clbip39  A Common Lisp implementation of BIP0039  MIT
 clbnf  A simple BNF parser.  MIT
 clgenerator  clgenerator, a generator implementation for common lisp  MIT
 clpatterns  Pattern library for algorithmic music composition and performance in Common Lisp.  GNU General Public License v3.0
 clprogressbar  Display progress bars directly in REPL.  MIT
 clad  The CLAD System.  BSD
 concretesyntaxtree  Library for parsing Common Lisp code into a concrete syntax tree.  FreeBSD
 definitions  General definitions reflection library.  Artistic
 eclector  A Common Lisp reader that can adapt to different implementations, and that can return Concrete Syntax Trees  BSD
 flute  A beautiful, easilly composable HTML5 generation library  MIT
 froute  An Http routing class that takes advantage of the MOP  MIT
 languagecodes  A small library mapping language codes to language names.  Artistic
 lichatldap  LDAP backend for the Lichat server profiles.  Artistic
 multilangdocumentation  A dropin replacement for CL:DOCUMENTATION providing multilanguage docstrings  Artistic
 multiposter  A small application to post to multiple services at once.  Artistic
 sandalphon.lambdalist  Lambda list parsing and usage  WTFPL
 sel  Programmatic modification and evaluation of software  GPL 3
 systemlocale  System locale and language discovery  Artistic
 taglib  Pure Lisp implementation to read (and write, perhaps, one day) tags  UNLICENSE
 terrable  Terragen TER file format reader  Artistic
 tooter  A client library for Mastodon instances.  Artistic
 tracedb  Writing, reading, storing, and searching of program traces  GPL 3
 umbra  A library of reusable GPU shader functions.  MIT
Updated projects: 3dmatrices, 3dvectors, agnosticlizard, alexa, awssign4, baseblobs, bodgeblobssupport, bodgechipmunk, bodgenanovg, bodgenuklear, bordeauxthreads, btsemaphore, caveman, cepl, cepl.drmgbm, cerberus, chirp, clalgebraicdatatype, clasync, clautorepo, clcognito, clconllu, cldarksky, clflow, clforms, clgamepad, clgeos, clgobjectintrospection, clgopher, clhamcrest, clinterpol, clliballegro, cllibsvmformat, clmechanize, clmpi, clmuth, clonlinelearning, clpslib, clpython, clrandomforest, clreadline, clredis, clrules, clsdl2, clstr, cltoml, clyesql, clack, claw, clip, closermop, closurehtml, clss, clx, codex, coleslaw, commonlispactors, configuration.options, croatoan, currycomposereadermacros, deltadebug, deploy, dexador, djula, dml, documentationutils, documentationutilsextensions, doublylinkedlist, dufy, dynamicmixins, elf, farescripts, femlisp, fft, flare, flexistreams, for, freebsdsysctl, fxml, gameboxframemanager, gameboxmath, glslspec, glsltoolkit, glyphs, goldenutils, graph, gsll, harmony, helambdap, hunchensocket, ironclad, jose, lichatprotocol, lichatserverlib, lichattcpserver, lispchat, listopia, maiden, mcclim, mediatypes, mito, nineveh, omercount, opticl, overlord, oxenfurt, parachute, parseq, parser.ini, pathnameutils, pgloader, physicalquantities, plump, postmodern, ppath, pythonicstringreader, qbase64, qlot, qtlibs, qtools, randomsample, rove, rtgmath, sdot2, serapeum, shadow, simpleflowdispatcher, slime, spinneret, staple, stringcase, stumpwm, sxql, thecostofnothing, trivia, trivialldap, trivialmmap, ubiquitous, uiop, unitformula, varjo, websocketdriver, whofields, xhtmlambda, xlsx, xmlemitter.
Removed projects: binge, blacktie, clctrnn, cldirectedgraph, clledger, clscan, readable, spartns.
The projects removed either didn't build (cldirectedgraph) or are no longer available for download that I could find (everything else).
To get this update, use (ql:updatedist "quicklisp").
Enjoy!
11 Jul 2018 3:34pm GMT
06 Jul 2018
Planet Lisp
Paul Khuong: The Confidence Sequence Method: A Computerage Test for Statistical SLOs
This post goes over some code that I pushed to github today. All the snippets below should be in the repo, which also includes C and Python code with the same structure.
I recently resumed thinking about balls and bins for hash tables. This time, I'm looking at large bins (on the order of one 2MB huge page). There are many hashing methods with solid worstcase guarantees that unfortunately query multiple uncorrelated locations; I feel like we could automatically adapt them to modern hierarchical storage (or address translation) to make them more efficient, for a small loss in density.
In theory, large enough bins can be allocated statically with a minimal waste of space. I wanted some actual nonasymptotic numbers, so I ran numerical experiments and got the following distribution of global utilisation (fill rate) when the first bin fills up.
It looks like, even with one thousand bins of thirty thousand values, we can expect almost 98% space utilisation until the first bin saturates. I want something more formal.
Could I establish something like a service level objective, "When distributing balls randomly between one thousand bins with individual capacity of thirty thousand balls, we can utilise at least 98% of the total space before a bin fills up, x% of the time?"
The natural way to compute the "x%" that makes the proposition true is to first fit a distribution on the observed data, then find out the probability mass for that distribution that lies above 98% fill rate. Fitting distributions takes a lot of judgment, and I'm not sure I trust myself that much.
Alternatively, we can observe independent identically distributed fill rates, check if they achieve 98% space utilisation, and bound the success rate for this Bernoulli process.
There are some nontrivial questions associated with this approach.
 How do we know when to stop generating more observations... without fooling ourselves with \(p\)hacking?
 How can we generate something like a confidence interval for the success rate?
Thankfully, I have been sitting on a software package to compute satisfaction rate for exactly this kind of SLOtype properties, properties of the form "this indicator satisfies $PREDICATE x% of the time," with arbitrarily bounded false positive rates.
The code takes care of adaptive stopping, generates a credible interval, and spits out a report like this : we see the threshold (0.98), the empirical success rate estimate (0.993 ≫ 0.98), a credible interval for the success rate, and the shape of the probability mass for success rates.
This post shows how to compute credible intervals for the Bernoulli's success rate, how to implement a dynamic stopping criterion, and how to combine the two while compensating for multiple hypothesis testing. It also gives two examples of converting more general questions to SLO form, and answers them with the same code.
Credible intervals for the Binomial
If we run the same experiment \(n\) times, and observe \(a\) successes (\(b = n  a\) failures), it's natural to ask for an estimate of the success rate \(p\) for the underlying Bernoulli process, assuming the observations are independent and identically distributed.
Intuitively, that estimate should be close to \(a / n\), the empirical success rate, but that's not enough. I also want something that reflects the uncertainty associated with small \(n\), much like in the following ridge line plot, where different phrases are assigned not only a different average probability, but also a different spread.
I'm looking for an interval of plausible success rates \(p\) that responds to both the empirical success rate \(a / n\) and the sample size \(n\); that interval should be centered around \(a / n\), be wide when \(n\) is small, and become gradually tighter as \(n\) increases.
The Bayesian approach is straightforward, if we're willing to shut up and calculate. Once we fix the underlying success rate \(p = \hat{p}\), the conditional probability of observing \(a\) successes and \(b\) failures is
\[P((a, b)  p = \hat{p}) \sim \hat{p}\sp{a} \cdot (1  \hat{p})\sp{b},\]
where the righthand side is a proportion^{1}, rather than a probability.
We can now apply Bayes's theorem to invert the condition and the event. The inversion will give us the conditional probability that \(p = \hat{p}\), given that we observed \(a\) successes and \(b\) successes. We only need to impose a prior distribution on the underlying rate \(p\). For simplicity, I'll go with the uniform \(U[0, 1]\), i.e., every success rate is equally plausible, at first. We find
\[P(p = \hat{p}  (a, b)) = \frac{P((a, b)  p = \hat{p}) P(p = \hat{p})}{P(a, b)}.\]
We already picked the uniform prior, \(P(p = \hat{p}) = 1\,\forall \hat{p}\in [0,1],\) and the denominator is a constant with respect to \(\hat{p}\). The expression simplifies to
\[P(p = \hat{p}  (a, b)) \sim \hat{p}\sp{a} \cdot (1  \hat{p})\sp{b},\]
or, if we normalise to obtain a probability,
\[P(p = \hat{p}  (a, b)) = \frac{\hat{p}\sp{a} \cdot (1  \hat{p})\sp{b}}{\int\sb{0}\sp{1} \hat{p}\sp{a} \cdot (1  \hat{p})\sp{b}\, d\hat{p}} = \textrm{Beta}(a+1, b+1).\]
A bit of calculation, and we find that our credibility estimate for the underlying success rate follows a Beta distribution. If one is really into statistics, they can observe that the uniform prior distribution is just the \(\textrm{Beta}(1, 1)\) distribution, and rederive that the Beta is the conjugate distribution for the Binomial distribution.
For me, it suffices to observe that the distribution \(\textrm{Beta}(a+1, b+1)\) is unimodal, does peak around \(a / (a + b)\), and becomes tighter as the number of observations grows. In the following image, I plotted three Beta distributions, all with empirical success rate 0.9; red corresponds to \(n = 10\) (\(a = 9\), \(b = 1\), \(\textrm{Beta}(10, 2)\)), black to \(n = 100\) (\(\textrm{Beta}(91, 11)\)), and blue to \(n = 1000\) (\(\textrm{Beta}(901, 101)\)).
We calculated, and we got something that matches my intuition. Before trying to understand what it means, let's take a detour to simply plot points from that unnormalised proportion function \(\hat{p}\sp{a} \cdot (1  \hat{p})\sp{b}\), on an arbitrary \(y\) axis.
Let \(\hat{p} = 0.4\), \(a = 901\), \(b = 101\). Naïvely entering the expression at the REPL yields nothing useful.
CLUSER> (* (expt 0.4d0 901) (expt ( 1 0.4d0) 101))
0.0d0
The issue here is that the unnormalised proportion is so small that it underflows double floats and becomes a round zero. We can guess that the normalisation factor \(\frac{1}{\mathrm{Beta}(\cdot,\cdot)}\) quickly grows very large, which will bring its own set of issues when we do care about the normalised probability.
How can we renormalise a set of points without underflow? The usual trick to handle extremely small or large magnitudes is to work in the log domain. Rather than computing \(\hat{p}\sp{a} \cdot (1  \hat{p})\sp{b}\), we shall compute
\[\log\left[\hat{p}\sp{a} \cdot (1  \hat{p})\sp{b}\right] = a \log\hat{p} + b \log (1  \hat{p}).\]
CLUSER> (+ (* 901 (log 0.4d0)) (* 101 (log ( 1 0.4d0))))
877.1713374189787d0
CLUSER> (exp *)
0.0d0
That's somewhat better: the logdomain value is not \(\infty\), but converting it back to a regular value still gives us 0.
The \(\log\) function is monotonic, so we can find the maximum proportion value for a set of points, and divide everything by that maximum value to get plottable points. There's one last thing that should change: when \(x\) is small, \(1  x\) will round most of \(x\) away. Instead of (log ( 1 x))
, we should use (log1p ( x))
to compute \(\log (1 + x) = \log (1  x)\). Common Lisp did not standardise log1p
, but SBCL does have it in internals, as a wrapper around libm
. We'll just abuse that for now.
CLUSER> (defun proportion (x) (+ (* 901 (log x)) (* 101 (sbkernel:%log1p ( x)))))
PROPORTION
CLUSER> (defparameter *points* (loop for i from 1 upto 19 collect (/ i 20d0)))
*POINTS*
CLUSER> (reduce #'max *points* :key #'proportion)
327.4909190001001d0
We have to normalise in the log domain, which is simply a subtraction: \(\log(x / y) = \log x  \log y\). In the case above, we will subtract \(327.49\ldots\), or add a massive \(327.49\ldots\) to each log proportion (i.e., multiply by \(10\sp{142}\)). The resulting values should have a reasonably nonzero range.
CLUSER> (mapcar (lambda (x) (cons x (exp ( (proportion x) *)))) *points*)
((0.05d0 . 0.0d0)
(0.1d0 . 0.0d0)
[...]
(0.35d0 . 3.443943164733533d288)
[...]
(0.8d0 . 2.0682681158181894d16)
(0.85d0 . 2.6252352579425913d5)
(0.9d0 . 1.0d0)
(0.95d0 . 5.65506756824607d10))
There's finally some signal in there. This is still just an unnormalised proportion function, not a probability density function, but that's already useful to show the general shape of the density function, something like the following, for \(\mathrm{Beta}(901, 101)\).
Finally, we have a probability density function for the Bayesian update of our belief about the success rate after \(n\) observations of a Bernoulli process, and we know how to compute its proportion function. Until now, I've carefully avoided the question of what all these computations even mean. No more (:
The Bayesian view assumes that the underlying success rate (the value we're trying to estimate) is unknown, but sampled from some distribution. In our case, we assumed a uniform distribution, i.e., that every success rate is a priori equally likely. We then observe \(n\) outcomes (successes or failures), and assign an updated probability to each success rate. It's like a manyworld interpretation in which we assume we live in one of a set of worlds, each with a success rate sampled from the uniform distribution; after observing 900 successes and 100 failures, we're more likely to be in a world where the success rate is 0.9 than in one where it's 0.2. With Bayes's theorem to formalise the update, we assign posterior probabilities to each potential success rate value.
We can compute an equaltailed credible interval from that \(\mathrm{Beta}(a+1,b+1)\) posterior distribution by excluding the leftmost values, \([0, l)\), such that the Beta CDF (cumulative distribution function) at \(l\) is \(\varepsilon / 2\), and doing the same with the right most values to cut away \(\varepsilon / 2\) of the probability density. The CDF for \(\mathrm{Beta}(a+1,b+1)\) at \(x\) is the incomplete beta function, \(I\sb{x}(a+1,b+1)\). That function is really hard to compute (this technical report detailing Algorithm 708 deploys five different evaluation strategies), so I'll address that later.
The more orthodox "frequentist" approach to confidence intervals treats the whole experiment, from data colleaction to analysis (to publication, independent of the observations 😉) as an Atlantic City algorithm: if we allow a false positive rate of \(\varepsilon\) (e.g., \(\varepsilon=5\%\)), the experiment must return a confidence interval that includes the actual success rate (population statistic or parameter, in general) with probability \(1  \varepsilon\), for any actual success rate (or underlying population statistic / parameter). When the procedure fails, with probability at most \(\varepsilon\), it is allowed to fail in an arbitrary manner.
The same Atlantic City logic applies to \(p\)values. An experiment (data collection and analysis) that accepts when the \(p\)value is at most \(0.05\) is an Atlantic City algorithm that returns a correct result (including "don't know") with probability at least \(0.95\), and is otherwise allowed to yield any result with probability at most \(0.05\). The \(p\)value associated with a conclusion, e.g., "success rate is more than 0.8" (the confidence level associated with an interval) means something like "I'm pretty sure that the success rate is more than 0.8, because the odds of observing our data if that were false are small (less than 0.05)." If we set that threshold (of 0.05, in the example) ahead of time, we get an Atlantic City algorithm to determine if "the success rate is more than 0.8" with failure probability 0.05. (In practice, reporting is censored in all sorts of ways, so...)
There are ways to recover a classical confidence interval, given \(n\) observations from a Bernoulli. However, they're pretty convoluted, and, as Jaynes argues in his note on confidence intervals, the classical approach gives values that are roughly the same^{2} as the Bayesian approach... so I'll just use the Bayesian credibility interval instead.
See this stackexchange post for a lot more details.
Dynamic stopping for Binomial testing
The way statistics are usually deployed is that someone collects a data set, as rich as is practical, and squeezes that static data set dry for significant results. That's exactly the setting for the credible interval computation I sketched in the previous section.
When studying the properties of computer programs or systems, we can usually generate additional data on demand, given more time. The problem is knowing when it's ok to stop wasting computer time, because we have enough data... and how to determine that without running into multiple hypothesis testing issues (ask anyone who's run A/B tests).
Here's an example of an intuitive but completely broken dynamic stopping criterion. Let's say we're trying to find out if the success rate is less than or greater than 90%, and are willing to be wrong 5% of the time. We could get \(k\) data points, run a statistical test on those data points, and stop if the data let us conclude with 95% confidence that the underlying success rate differs from 90%. Otherwise, collect \(2k\) fresh points, run the same test; collect \(4k, \ldots, 2\sp{i}k\) points. Eventually, we'll have enough data.
The issue is that each time we execute the statistical test that determines if we should stop, we run a 5% risk of being totally wrong. For an extreme example, if the success rate is exactly 90%, we will eventually stop, with probability 1. When we do stop, we'll inevitably conclude that the success rate differs from 90%, and we will be wrong. The worstcase (over all underlying success rates) false positive rate is 100%, not 5%!
In my experience, programmers tend to sidestep the question by wasting CPU time with a large, fixed, number of iterations... people are then less likely to run our statistical tests, since they're so slow, and everyone loses (the other popular option is to impose a reasonable CPU budget, with error thresholds so lax we end up with a smoke test).
Robbins, in Statistical Methods Related to the Law of the Iterated Logarithm, introduces a criterion that, given a threshold success rate \(p\) and a sequence of (infinitely many!) observations from the same Bernoulli with unknown success rate parameter, will be satisfied infinitely often when \(p\) differs from the Bernoulli's success rate. Crucially, Robbins also bounds the false positive rate, the probability that the criterion be satisfied even once in the infinite sequence of observations if the Bernoulli's unknown success rate is exactly equal to \(p\). That criterion is
\[{n \choose a} p\sp{a} (1p)\sp{na} \leq \frac{\varepsilon}{n+1},\]
where \(n\) is the number of observations, \(a\) the number of successes, \(p\) the threshold success rate, and \(\varepsilon\) the error (false positive) rate. As the number of observation grows, the criterion becomes more and more stringent to maintain a bounded false positive rate over the whole infinite sequence of observations.
There are similar "Confidence Sequence" results for other distributions (see, for example, this paper of Lai), but we only care about the Binomial here.
More recently, Ding, Gandy, and Hahn showed that Robbins's criterion also guarantees that, when it is satisfied, the empirical success rate (\(a/n\)) lies on the correct side of the threshold \(p\) (same side as the actual unknown success rate) with probability \(1\varepsilon\). This result leads them to propose the use of Robbins's criterion to stop Monte Carlo statistical tests, which they refer to as the Confidence Sequence Method (CSM).
(defun csmstopp (successes failures threshold eps)
"Pseudocode, this will not work on a real machine."
(let ((n (+ successes failures)))
(<= (* (choose n successes)
(expt threshold successes)
(expt ( 1 threshold) failures))
(/ eps (1+ n)))))
We may call this predicate at any time with more independent and identically distributed results, and stop as soon as it returns true.
The CSM is simple (it's all in Robbins's criterion), but still provides good guarantees. The downside is that it is conservative when we have a limit on the number of observations: the method "hedges" against the possibility of having a false positive in the infinite number of observations after the limit, observations we will never make. For computergenerated data sets, I think having a principled limit is pretty good; it's not ideal to ask for more data than strictly necessary, but not a blocker either.
In practice, there are still real obstacles to implementing the CSM on computers with finite precision (floating point) arithmetic, especially since I want to preserve the method's theoretical guarantees (i.e., make sure rounding is onesided to overestimate the lefthand side of the inequality).
If we implement the expression well, the effect of rounding on correctness should be less than marginal. However, I don't want to be stuck wondering if my bad results are due to known approximation errors in the method, rather than errors in the code. Moreover, if we do have a tight expression with little rounding errors, adjusting it to make the errors onesided should have almost no impact. That seems like a good tradeoff to me, especially if I'm going to use the CSM semiautomatically, in continuous integration scripts, for example.
One look at csmstopp
shows we'll have the same problem we had with the proportion function for the Beta distribution: we're multiplying very small and very large values. We'll apply the same fix: work in the log domain and exploit \(\log\)'s monotonicity.
\[{n \choose a} p\sp{a} (1p)\sp{na} \leq \frac{\varepsilon}{n+1}\]
becomes
\[\log {n \choose a} + a \log p + (na)\log (1p) \leq \log\varepsilon \log(n+1),\]
or, after some more expansions, and with \(b = n  a\),
\[\log n!  \log a!  \log b! + a \log p + b \log(1  p) + \log(n+1) \leq \log\varepsilon.\]
The new obstacle is computing the factorial \(x!\), or the logfactorial \(\log x!\). We shouldn't compute the factorial iteratively: otherwise, we could spend more time in the stopping criterion than in the data generation subroutine. Robbins has another useful result for us:
\[\sqrt{2\pi} n\sp{n + ½} \exp(n) \exp\left(\frac{1}{12n+1}\right) < n! < \sqrt{2\pi} n\sp{n + ½} \exp(n) \exp\left(\frac{1}{12n}\right),\]
or, in the log domain,
\[\log\sqrt{2\pi} + \left(n + \frac{1}{2}\right)\log n n + \frac{1}{12n+1} < \log n! < \log\sqrt{2\pi} + \left(n + \frac{1}{2}\right)\log n n +\frac{1}{12n}.\]
This double inequality gives us a way to overapproximate \(\log {n \choose a} = \log \frac{n!}{a! b!} = \log n!  \log a!  \log b!,\) where \(b = n  a\):
\[\log {n \choose a} < \log\sqrt{2\pi} + \left(n + \frac{1}{2}\right)\log n n +\frac{1}{12n}  \left(a + \frac{1}{2}\right)\log a +a  \frac{1}{12a+1}  \left(b + \frac{1}{2}\right)\log b +b  \frac{1}{12b+1},\]
where the rightmost expression in Robbins's double inequality replaces \(\log n!\), which must be overapproximated, and the leftmost \(\log a!\) and \(\log b!\), which must be underapproximated.
Robbins's approximation works well for us because, it is onesided, and guarantees that the (relative) error in \(n!\), \(\frac{\exp\left(\frac{1}{12n}\right)  \exp\left(\frac{1}{12n+1}\right)}{n!},\) is small, even for small values like \(n = 5\) (error \(< 0.0023\%\)), and decreases with \(n\): as we perform more trials, the approximation is increasingly accurate, thus less likely to spuriously prevent us from stopping.
Now that we have a conservative approximation of Robbins's criterion that only needs the four arithmetic operations and logarithms (and log1p
), we can implement it on a real computer. The only challenge left is regular floating point arithmetic stuff: if rounding must occur, we must make sure it is in a safe (conservative) direction for our predicate.
Hardware usually lets us manipulate the rounding mode to force floating point arithmetic operations to round up or down, instead of the usual round to even. However, that tends to be slow, so most language (implementations) don't support changing the rounding mode, or do so badly... which leaves us in a multidecade hardware/software coevolution Catch22.
I could think hard and derive tight bounds on the roundoff error, but I'd rather apply a bit of brute force. IEEE754 compliant implementations must round the four basic operations correctly. This means that \(z = x \oplus y\) is at most half a ULP away from \(x + y,\) and thus either \(z = x \oplus y \geq x + y,\) or the next floating point value after \(z,\) \(z^\prime \geq x + y\). We can find this "next value" portably in Common Lisp, with decodefloat
/scalefloat
, and some handwaving for denormals.
(defun next (x &optional (delta 1))
"Increment x by delta ULPs. Very conservative for
small (0/denormalised) values."
(declare (type doublefloat x)
(type unsignedbyte delta))
(let* ((exponent (nthvalue 1 (decodefloat x)))
(ulp (max (scalefloat doublefloatepsilon exponent)
leastpositivenormalizeddoublefloat)))
(+ x (* delta ulp))))
I prefer to manipulate IEEE754 bits directly. That's theoretically not portable, but the platforms I care about make sure we can treat floats as signmagnitude integers.
next
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 

CLUSER> (doublefloatbits pi)
4614256656552045848
CLUSER> (doublefloatbits ( pi))
4614256656552045849
The two's complement value for pi
is one less than ( (doublefloatbits pi))
because two's complement does not support signed zeros.
CLUSER> (eql 0 ( 0))
T
CLUSER> (eql 0d0 ( 0d0))
NIL
CLUSER> (doublefloatbits 0d0)
0
CLUSER> (doublefloatbits 0d0)
1
We can quickly check that the round trip from float to integer and back is an identity.
CLUSER> (eql pi (bitsdoublefloat (doublefloatbits pi)))
T
CLUSER> (eql ( pi) (bitsdoublefloat (doublefloatbits ( pi))))
T
CLUSER> (eql 0d0 (bitsdoublefloat (doublefloatbits 0d0)))
T
CLUSER> (eql 0d0 (bitsdoublefloat (doublefloatbits 0d0)))
T
We can also check that incrementing or decrementing the integer representation does increase or decrease the floating point value.
CLUSER> (< (bitsdoublefloat (1 (doublefloatbits pi))) pi)
T
CLUSER> (< (bitsdoublefloat (1 (doublefloatbits ( pi)))) ( pi))
T
CLUSER> (bitsdoublefloat (1 (doublefloatbits 0d0)))
0.0d0
CLUSER> (bitsdoublefloat (1+ (doublefloatbits 0d0)))
0.0d0
CLUSER> (bitsdoublefloat (1+ (doublefloatbits 0d0)))
4.9406564584124654d324
CLUSER> (bitsdoublefloat (1 (doublefloatbits 0d0)))
4.9406564584124654d324
The code doesn't handle special values like infinities or NaNs, but that's out of scope for the CSM criterion anyway. That's all we need to nudge the result of the four operations to guarantee an over or under approximation of the real value. We can also look at the documentation for our libm
(e.g., for GNU libm) to find error bounds on functions like log
; GNU claims their log
is never off by more than 3 ULP. We can round up to the fourth next floating point value to obtain a conservative upper bound on \(\log x\).
log
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 

I could go ahead and use the building blocks above (ULP nudging for directed rounding) to directly implement Robbins's criterion,
\[\log {n \choose a} + a \log p + b\log (1p) + \log(n+1) \leq \log\varepsilon,\]
with Robbins's factorial approximation,
\[\log {n \choose a} < \log\sqrt{2\pi} + \left(n + \frac{1}{2}\right)\log n n +\frac{1}{12n}  \left(a + \frac{1}{2}\right)\log a +a  \frac{1}{12a+1}  \left(b + \frac{1}{2}\right)\log b +b  \frac{1}{12b+1}.\]
However, even in the log domain, there's a lot of cancellation: we're taking the difference of relatively large numbers to find a small result. It's possible to avoid that by reassociating some of the terms above, e.g., for \(a\):
\[\left(a + \frac{1}{2}\right) \log a + a  a \log p = \frac{\log a}{2} + a (\log a + 1  \log p).\]
Instead, I'll just brute force things (again) with Kahan summation. Shewchuk's presentation in Adaptive Precision FloatingPoint Arithmetic and Fast Robust Geometric Predicates highlights how the only step where we may lose precision to rounding is when we add the current compensation term to the new summand. We can implement Kahan summation with directed rounding in only that one place: all the other operations are exact!
"kahan summation"
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 

We need one last thing to implement \(\log {n \choose a}\), and then Robbins's confidence sequence: a safely rounded floatingpoint value approximation of \(\log \sqrt{2 \pi}\). I precomputed one with computablereals:
CLUSER> (computablereals:r
(computablereals:logr
(computablereals:sqrtr computablereals:+2pir+)))
0.91893853320467274178...
CLUSER> (computablereals:ceilingr
(computablereals:*r *
(ash 1 53)))
8277062471433908
0.65067431749790398594...
CLUSER> (* 8277062471433908 (expt 2d0 53))
0.9189385332046727d0
CLUSER> (computablereals:r (rational *)
***)
+0.00000000000000007224...
We can safely replace \(\log\sqrt{2\pi}\) with 0.9189385332046727d0
, or, equivalently, (scalefloat 8277062471433908.0d0 53)
, for an upper bound. If we wanted a lower bound, we could decrement the integer significand by one.
logchoose
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 

We can quickly check against an exact implementation with computablereals
and a brute force factorial.
CLUSER> (defun crlogchoose (n s)
(computablereals:r
(computablereals:logr (alexandria:factorial n))
(computablereals:logr (alexandria:factorial s))
(computablereals:logr (alexandria:factorial ( n s)))))
CRLOGCHOOSE
CLUSER> (computablereals:r (rational (robbinslogchoose 10 5))
(crlogchoose 10 5))
+0.00050526703375914436...
CLUSER> (computablereals:r (rational (robbinslogchoose 1000 500))
(crlogchoose 1000 500))
+0.00000005551513197557...
CLUSER> (computablereals:r (rational (robbinslogchoose 1000 5))
(crlogchoose 1000 5))
+0.00025125559085509706...
That's not obviously broken: the error is pretty small, and always positive.
Given a function to overapproximate logchoose, the Confidence Sequence Method's stopping criterion is straightforward.
csm
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 

The other, much harder, part is computing credible (Bayesian) intervals for the Beta distribution. I won't go over the code, but the basic strategy is to invert the CDF, a monotonic function, by bisection^{3}, and to assume we're looking for improbable (\(\mathrm{cdf} < 0.5\)) thresholds. This assumption lets us pick a simple hypergeometric series that is normally useless, but converges well for \(x\) that correspond to such small cumulative probabilities; when the series converges too slowly, it's always conservative to assume that \(x\) is too central (not extreme enough).
That's all we need to demo the code. Looking at the distribution of fill rates for the 1000 bins @ 30K ball/bin facet in
it looks like we almost always hit at least 97.5% global density, let's say with probability at least 98%. We can ask the CSM to tell us when we have enough data to confirm or disprove that hypothesis, with a 0.1% false positive rate.
Instead of generating more data on demand, I'll keep things simple and prepopulate a list with new independently observed fill rates.
CLUSER> (defparameter *observations* '(0.978518900
0.984687300
0.983160833
[...]))
CLUSER> (defun test (n)
(let ((count (countif (lambda (x) (>= x 0.975))
*observations*
:end n)))
(csm:csm n 0.98d0 count (log 0.001d0))))
CLUSER> (test 10)
NIL
2.1958681996231784d0
CLUSER> (test 100)
NIL
2.5948497850893184d0
CLUSER> (test 1000)
NIL
3.0115331544604658d0
CLUSER> (test 2000)
NIL
4.190687115879456d0
CLUSER> (test 4000)
T
17.238559826956475d0
We can also use the inverse Beta CDF to get a 99.9% credible interval. After 4000 trials, we found 3972 successes.
CLUSER> (countif (lambda (x) (>= x 0.975))
*observations*
:end 4000)
3972
These values give us the following lower and upper bounds on the 99.9% CI.
CLUSER> (csm:betaicdf 3972 ( 4000 3972) 0.001d0)
0.9882119750976562d0
1.515197753898523d5
CLUSER> (csm:betaicdf 3972 ( 4000 3972) 0.001d0 t)
0.9963832682169742d0
2.0372679238045424d13
And we can even reuse and extend the Beta proportion code from earlier to generate this embeddable SVG report.
There's one small problem with the sample usage above: if we compute the stopping criterion with a false positive rate of 0.1%, and do the same for each end of the credible interval, our total false positive (error) rate might actually be 0.3%! The next section will address that, and the equally important problem of estimating power.
Monte Carlo power estimation
It's not always practical to generate data forever. For example, we might want to bound the number of iterations we're willing to waste in an automated testing script. When there is a bound on the sample size, the CSM is still correct, just conservative.
We would then like to know the probability that the CSM will stop successfully when the underlying success rate differs from the threshold rate \(p\) (alpha
in the code). The problem here is that, for any bounded number of iterations, we can come up with an underlying success rate so close to \(p\) (but still different) that the CSM can't reliably distinguish between the two.
If we want to be able to guarantee any termination rate, we need two thresholds: the CSM will stop whenever it's likely that the underlying success rate differs from either of them. The hardest probability to distinguish from both thresholds is close to the midpoint between them.
With two thresholds and the credible interval, we're running three tests in parallel. I'll apply a Bonferroni correction, and use \(\varepsilon / 3\) for each of the two CSM tests, and \(\varepsilon / 6\) for each end of the CI.
That logic is encapsulated in csmdriver
. We only have to pass a success value generator function to the driver. In our case, the generator is itself a call to csmdriver
, with fixed thresholds (e.g., 96% and 98%), and a Bernoulli sampler (e.g., return T
with probability 97%). We can see if the driver returns successfully and correctly at each invocation of the generator function, with the parameters we would use in production, and recursively compute an estimate for that procedure's success rate with CSM. The following expression simulates a CSM procedure with thresholds at 96% and 98%, the (usually unknown) underlying success rate in the middle, at 97%, a false positive rate of at most 0.1%, and an iteration limit of ten thousand trials. We pass that simulation's result to csmdriver
, and ask whether the simulation's success rate differs from 99%, while allowing one in a million false positives.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 

We find that yes, we can expect the 96%/98%/0.1% false positive/10K iterations setup to succeed more than 99% of the time. The code above is available as csmpower
, with a tighter outer false positive rate of 1e9. If we only allow 1000 iterations, csmpower
quickly tells us that, with one CSM success in 100 attempts, we can expect the CSM success rate to be less than 99%.
CLUSER> (csm:csmpower 0.97d0 0.96d0 1000 :alphahi 0.98d0 :eps 1d3 :stream *standardoutput*)
1 0.000e+0 1.250e10 10.000e1 1.699e+0
10 0.000e+0 0.000e+0 8.660e1 1.896e+1
20 0.000e+0 0.000e+0 6.511e1 3.868e+1
30 0.000e+0 0.000e+0 5.099e1 5.851e+1
40 2.500e2 5.518e7 4.659e1 7.479e+1
50 2.000e2 4.425e7 3.952e1 9.460e+1
60 1.667e2 3.694e7 3.427e1 1.144e+2
70 1.429e2 3.170e7 3.024e1 1.343e+2
80 1.250e2 2.776e7 2.705e1 1.542e+2
90 1.111e2 2.469e7 2.446e1 1.741e+2
100 1.000e2 2.223e7 2.232e1 1.940e+2
100 iterations, 1 successes (false positive rate < 1.000000e9)
success rate p ~ 1.000000e2
confidence interval [2.223495e7, 0.223213 ]
p < 0.990000
max inner iteration count: 816
T
T
0.01d0
100
1
2.2234953205868331d7
0.22321314110840665d0
SLOify all the things with this Exact test
Until now, I've only used the Confidence Sequence Method (CSM) for Monte Carlo simulation of phenomena that are naturally seen as boolean success / failures processes. We can apply the same CSM to implement an exact test for null hypothesis testing, with a bit of resampling magic.
Looking back at the balls and bins grid, the average fill rate seems to be slightly worse for 100 bins @ 60K ball/bin, than for 1000 bins @ 128K ball/bin. How can we test that with the CSM?
First, we should get a fresh dataset for the two setups we wish to compare.
CLUSER> (defparameter *10060k* #(0.988110167
0.990352500
0.989940667
0.991670667
[...]))
CLUSER> (defparameter *1000128k* #(0.991456281
0.991559578
0.990970109
0.990425805
[...]))
CLUSER> (alexandria:mean *10060k*)
0.9897938
CLUSER> (alexandria:mean *1000128k*)
0.9909645
CLUSER> ( * **)
0.0011706948
The mean for 1000 bins @ 128K ball/bin is slightly higher than that for 100 bins @ 60k ball/bin. We will now simulate the null hypothesis (in our case, that the distributions for the two setups are identical), and determine how rarely we observe a difference of 0.00117
in means. I only use a null hypothesis where the distributions are identical for simplicity; we could use the same resampling procedure to simulate distributions that, e.g., have identical shapes, but one is shifted right of the other.
In order to simulate our null hypothesis, we want to be as close to the test we performed as possible, with the only difference being that we generate data by reshuffling from our observations.
CLUSER> (defparameter *resamplingdata* (concatenate 'simplevector *10060k* *1000128k*))
*RESAMPLINGDATA*
CLUSER> (length *10060k*)
10000
CLUSER> (length *1000128k*)
10000
The two observation vectors have the same size, 10000 values; in general, that's not always the case, and we must make sure to replicate the sample sizes in the simulation. We'll generate our simulated observations by shuffling the *resamplingdata*
vector, and splitting it in two subvectors of ten thousand elements.
CLUSER> (let* ((shuffled (alexandria:shuffle *resamplingdata*))
(60k (subseq shuffled 0 10000))
(128k (subseq shuffled 10000)))
( (alexandria:mean 128k) (alexandria:mean 60k)))
6.2584877e6
We'll convert that to a truth value by comparing the difference of simulated means with the difference we observed in our real data, \(0.00117\ldots\), and declare success when the simulated difference is at least as large as the actual one. This approach gives us a onesided test; a twosided test would compare the absolute values of the differences.
CLUSER> (csm:csmdriver
(lambda (_)
(declare (ignore _))
(let* ((shuffled (alexandria:shuffle *resamplingdata*))
(60k (subseq shuffled 0 10000))
(128k (subseq shuffled 10000)))
(>= ( (alexandria:mean 128k) (alexandria:mean 60k))
0.0011706948)))
0.005 1d9 :alphahi 0.01 :stream *standardoutput*)
1 0.000e+0 7.761e11 10.000e1 2.967e1
10 0.000e+0 0.000e+0 8.709e1 9.977e1
20 0.000e+0 0.000e+0 6.577e1 1.235e+0
30 0.000e+0 0.000e+0 5.163e1 1.360e+0
40 0.000e+0 0.000e+0 4.226e1 1.438e+0
50 0.000e+0 0.000e+0 3.569e1 1.489e+0
60 0.000e+0 0.000e+0 3.086e1 1.523e+0
70 0.000e+0 0.000e+0 2.718e1 1.546e+0
80 0.000e+0 0.000e+0 2.427e1 1.559e+0
90 0.000e+0 0.000e+0 2.192e1 1.566e+0
100 0.000e+0 0.000e+0 1.998e1 1.568e+0
200 0.000e+0 0.000e+0 1.060e1 1.430e+0
300 0.000e+0 0.000e+0 7.207e2 1.169e+0
400 0.000e+0 0.000e+0 5.460e2 8.572e1
500 0.000e+0 0.000e+0 4.395e2 5.174e1
600 0.000e+0 0.000e+0 3.677e2 1.600e1
700 0.000e+0 0.000e+0 3.161e2 2.096e1
800 0.000e+0 0.000e+0 2.772e2 5.882e1
900 0.000e+0 0.000e+0 2.468e2 9.736e1
1000 0.000e+0 0.000e+0 2.224e2 1.364e+0
2000 0.000e+0 0.000e+0 1.119e2 5.428e+0
NIL
T
0.0d0
2967
0
0.0d0
0.007557510165262294d0
We tried to replicate the difference 2967 times, and did not succeed even once. The CSM stopped us there, and we find a CI for the probability of observing our difference, under the null hypothesis, of [0, 0.007557]
(i.e., \(p < 0.01\)). Or, for a graphical summary, . We can also test for a lower \(p\)value by changing the thresholds and running the simulation more times (around thirty thousand iterations for \(p < 0.001\)).
This experiment lets us conclude that the difference in mean fill rate between 100 bins @ 60K ball/bin and 1000 @ 128K is probably not due to chance: it's unlikely that we observed an expected difference between data sampled from the same distribution. In other words, "I'm confident that the fill rate for 1000 bins @ 128K ball/bin is greater than for 100 bins @ 60K ball/bins, because it would be highly unlikely to observe a difference in means that extreme if they had the same distribution (\(p < 0.01\))".
In general, we can use this exact test when we have two sets of observations, \(X\sb{0}\) and \(Y\sb{0}\), and a statistic \(f\sb{0} = f(X\sb{0}, Y\sb{0})\), where \(f\) is a pure function (the extension to three or more sets of observations is straightforward).
The test lets us determine the likelihood of observing \(f(X, Y) \geq f\sb{0}\) (we could also test for \(f(X, Y) \leq f\sb{0}\)), if \(X\) and \(Y\) were taken from similar distributions, modulo simple transformations (e.g., \(X\)'s mean is shifted compared to \(Y\)'s, or the latter's variance is double the former's).
We answer that question by repeatedly sampling without replacement from \(X\sb{0} \cup Y\sb{0}\) to generate \(X\sb{i}\) and \(Y\sb{i}\), such that \(X\sb{i} = X\sb{0}\) and \(Y\sb{i} = Y\sb{0}\) (e.g., by shuffling a vector and splitting it in two). We can apply any simple transformation here (e.g., increment every value in \(Y\sb{i}\) by \(\Delta\) to shift its mean by \(\Delta\)). Finally, we check if \(f(X\sb{i}, Y\sb{i}) \geq f\sb{0} = f(X\sb{0}, Y\sb{0})\); if so, we return success for this iteration, otherwise failure.
The loop above is a Bernoulli process that generates independent, identically distributed (assuming the random sampling is correct) truth values, and its success rate is equal to the probability of observing a value for \(f\) "as extreme" as \(f\sb{0}\) under the null hypothesis. We use the CSM with false positive rate \(\varepsilon\) to know when to stop generating more values and compute a credible interval for the probability under the null hypothesis. If that probability is low (less than some predetermined threshold, like \(\alpha = 0.001\)), we infer that the null hypothesis does not hold, and declare that the difference in our sample data points at a real difference in distributions. If we do everything correctly (cough), we will have implemented an Atlantic City procedure that fails with probability \(\alpha + \varepsilon\).
Personally, I often just set the threshold and the false positive rate unreasonably low and handwave some Bayes.
That's all!
I pushed the code above, and much more, to github, in Common Lisp, C, and Python (probably Py3, although 2.7 might work). Hopefully anyone can run with the code and use it to test, not only SLOtype properties, but also answer more general questions, with an exact test. I'd love to have ideas or contributions on the usability front. I have some throwaway code in attic/
, which I used to generate the SVG in this post, but it's not great. I also feel like I can do something to make it easier to stick the logic in shell scripts and continuous testing pipelines.
When I passed around a first draft for this post, many readers that could have used the CSM got stuck on the process of moving from mathematical expressions to computer code; not just how to do it, but, more fundamentally, why we can't just transliterate Greek to C or CL. I hope this revised post is clearer. Also, I hope it's clear that the reason I care so much about not introducing false positive via rounding isn't that I believe they're likely to make a difference, but simply that I want peace of mind with respect to numerical issues; I really don't want to be debugging some issue in my tests and have to wonder if it's all just caused by numerical errors.
The reason I care so much about making sure users can understand what the CSM codes does (and why it does what it does) is that I strongly believe we should minimise dependencies whose inner working we're unable to (legally) explore. Every abstraction leaks, and leakage is particularly frequent in failure situations. We may not need to understand magic if everything works fine, but, everything breaks eventually, and that's when expertise is most useful. When shit's on fire, we must be able to break the abstraction and understand how the magic works, and how it fails.
This post only tests ideal SLOtype properties (and regular null hypothesis tests translated to SLO properties), properties of the form "I claim that this indicator satisfies $PREDICATE x% of the time, with false positive rate y%" where the indicator's values are independent and identically distributed.
The last assumption is rarely truly satisfied in practice. I've seen an interesting choice, where the service level objective is defined in terms of a sample of production requests, which can replayed, shuffled, etc. to ensure i.i.d.ness. If the nature of the traffic changes abruptly, the SLO may not be representative of behaviour in production; but, then again, how could the service provider have guessed the change was about to happen? I like this approach because it is amenable to predictive statistical analysis, and incentivises communication between service users and providers, rather than users assuming the service will gracefully handle radically new crap being thrown at it.
Even if we have a representative sample of production, it's not true that the service level indicators for individual requests are distributed identically. There's an easy fix for the CSM and our credible intervals: generate i.i.d. sets of requests by resampling (e.g., shuffle the requests sample) and count successes and failures for individual requests, but only test for CSM termination after each resampled set.
On a more general note, I see the Binomial and Exact tests as instances of a general pattern that avoids intuitive functional decompositions that create subproblems that are harder to solve than the original problem. For example, instead of trying to directly determine how frequently the SLI satisfies some threshold, it's natural to first fit a distribution on the SLI, and then compute percentiles on that distribution. Automatically fitting an arbitrary distribution is hard, especially with the weird outliers computer systems spit out. Reducing to a Bernoulli process before applying statistics is much simpler. Similarly, rather than coming up with analytical distributions in the Exact test, we bruteforce the problem by resampling from the empirical data. I have more examples from online control systems... I guess the moral is to be wary of decompositions where internal subcomponents generate intermediate values that are richer in information than the final output.
Thank you Jacob, Ruchir, Barkley, and Joonas for all the editing and restructuring comments.

Proportions are unscaled probabilities that don't have to sum or integrate to 1. Using proportions instead of probabilities tends to make calculations simpler, and we can always get a probability back by rescaling a proportion by the inverse of its integral.↩

Instead of a \(\mathrm{Beta}(a+1, b+1)\), they tend to bound with a \(\mathrm{Beta}(a, b)\). The difference is marginal for doubledigit \(n\).↩

I used the bisection method instead of more sophisticated ones with better convergence, like Newton's method or the derivativefree Secant method, because bisection already adds one bit of precision per iteration, only needs a predicate that returns "too high" or "too low," and is easily tweaked to be conservative when the predicate declines to return an answer.↩
06 Jul 2018 10:02pm GMT
02 Jul 2018
Planet Lisp
Eugene Zaikonnikov: A tiny Lisp bytecode interpreter in Z80 assembly
It all started with a raid on a long abandoned hosting service. Seen a mention of it in the news, leading to a vague recollection of using it for something. Email address associated with the account was long defunct, and the service itself changed ownership a few times in the past two decades. But incredibly, I could recall login credentials and they worked still.
Amazingly, in a pile of abandoned HTML templates, obsolete software archives and Under Construction GIFs there was a source file for a project I long considered lost. It's a minimal Lisp bytecode interpreter written in assembly for ZX Spectrum along the lines of MIT AIM514. Save for address locations and maybe a couple ROM calls for error reporting it's generic Z80 code.
It was a part of bigger project that should have included a primitive REPL, but no trace of that was found. Also, am quite sure there is a henious bug lurking in the mark&sweep GC. Should really find time to finally debug that!
02 Jul 2018 3:00pm GMT
25 Jun 2018
Planet Lisp
Paul Khuong: An Old Conjecture on Stream Transducers
I've been thinking about stream processing again, and came back to an old picktwoofthree conjecture of mine: for stream processing without dynamic allocation, "arbitrary outputs per input, multiple consumers, multiple producers: choose two."
The question is interesting because stream processing in constant space is a subset of L (or FL), and thus probably not Pcomplete, let alone Turing complete. Having easily characterisable subsets of stream processing that can be implemented in constant space would be a boon for the usability of stream DSLs.
I think I find this academic trope as suspicious as @DRMavIver does, so I have mixed feelings about the fact that this one still feels true seven years later.
Is it just me or do impossibility theorems which claim "these three obviously desirable properties cannot simultaneously be satisfied" always include at least one obviously undesirable or at least suspicious property?
 David R. MacIver (@DRMacIver) June 19, 2018
The main reason I believe in this conjecture is the following example, F(S(X), X)
, where S
is the function that takes a stream and ouputs every other value. Or, more formally, \(F\sb{i} = f(X\sb{2i}, X\sb{i})\).
Let's say X
is some stream of values that can't be easily recomputed (e.g., each output value is the result of a slow computation). How do we then compute F(S(X), X)
without either recomputing the stream X
, or buffering an unbounded amount of past values from that stream? I don't see a way to do so, not just in any stream processing DSL (domain specific language), but also in any general purpose language.
For me, the essence of the problem is that the two inputs to F
are out of sync with respect to the same source of values, X
: one consumes two values of X
per invocation of F
, and the other only one. This issue could also occur if we forced stream transducers (processing nodes) to output a fixed number of value at each invocation: let S
repeat each value of X
twice, i.e., interleave X
with X
(\(F\sb{i} = f(X\sb{\lfloor i / 2\rfloor}, X\sb{i})\)).
Forcing each invocation of a transducer to always produce exactly one value is one way to rule out this class of stream processing network. Two other common options are to forbid either forks (everything is singleuse or subtrees copied and recomputed for each reuse) or joins (only singleinput stream processing nodes).
I don't think this turtleandhare desynchronisation problem is a weakness in stream DSLs, I only see a reasonable task that can't be performed in constant space. Given the existence of such tasks, I'd like to see stream processing DSLs be explicit about the tradeoffs they make to balance performance guarantees, expressiveness, and usability, especially when it comes to the performance model.
25 Jun 2018 1:21am GMT
15 Jun 2018
Planet Lisp
Christophe Rhodes: sbcl method tracing
Since approximately forever, sbcl has advertised the possibility of tracing individual methods of a generic function by passing :methods t
as an argument to trace
. Until recently, tracing methods was only supported using the :encapsulate nil
style of tracing, modifying the compiled code for function objects directly.
For a variety of reasons, the alternative :encapsulate t
implementation of tracing, effectively wrapping the function with some code to run around it, is more robust. One problem with :encapsulate nil
tracing is that if the object being traced is a closure, the modification of the function's code will affect all of the closures, not just any single one  closures are distinct objects with distinct closedover environments, but they share the same execuable code, so modifying one of them modifies all of them. However, the implementation of method tracing I wrote in 2005  essentially, finding and tracing the method functions and the method fastfunctions (on which more later)  was fundamentally incompatible with encapsulation; the method functions are essentially never called by name by CLOS, but by more esoteric means.
What are those esoteric means, I hear you ask?! I'm glad I can hear you. The Metaobject Protocol defines a method calling convention, such that method calls receive as two arguments firstly: the entire argument list as the method body would expect to handle; and secondly: the list of sorted applicable next methods, such that the first element is the method which should be invoked if the method uses callnextmethod
. So a method function conforming to this protocol has to:
 destructure its first argument to bind the method parameters to the arguments given;
 if
callnextmethod
is used, reconstruct an argument list (in general, because the arguments to the next method need not be the same as the arguments to the existing method) before calling the next method's methodfunction with the reconstructed argument list and the rest of the next methods.
But! For a given set of actual arguments, for that call, the set of applicable methods is known; the precedence order is known; and, with a bit of bookkeeping in the implementation of defmethod
, whether any individual method actually calls callnextmethod
is known. So it is possible, at the point of calling a genericfunction with a set of arguments, to know not only the first applicable method, but in fact all the applicable methods, their ordering, and the combination of those methods that will actually get called (which is determined by whether methods invoke callnextmethod
and also by the generic function's method combination).
Therefore, a sophisticated (and by "sophisticated" here I mean "written by the wizards at Xerox PARC)") implementation of CLOS can compile an effective method for a given call, resolve all the nextmethod calls, perform some extra optimizations on slotvalue
and slot accessors, improve the calling convention (we no longer need the list of next methods, but only a single next effectivemethod, so we can spread the argument list once more), and cache the resulting function for future use. So the onetime cost for each set of applicable methods generates an optimized effective method, making use of fastmethodfunctions with the improved calling convention.
Here's the trick, then: this effective method is compiled into a chain of methodcall
and fastmethodcall
objects, which call their embedded functions. This, then, is ripe for encapsulation; to allow method tracing, all we need to do is arrange at computeeffectivemethod
time that those embedded functions are wrapped in code that performs the tracing, and that any attempt to untrace
the generic function (or to modify the tracing parameters) reinitializes the generic function instance, which clears all the effective method caches. And then Hey Bob, Your Uncle's Presto! and everything works.
(defgeneric foo (x)
(:method (x) 3))
(defmethod foo :around ((x fixnum))
(1+ (callnextmethod)))
(defmethod foo ((x integer))
(* 2 (callnextmethod)))
(defmethod foo ((x float))
(* 3 (callnextmethod)))
(defmethod foo :before ((x singlefloat))
'single)
(defmethod foo :after ((x doublefloat))
'double)
Here's a generic function foo
with moderately complex methods. How can we work out what is going on? Call the method tracer!
CLUSER> (foo 2.0d0)
0: (FOO 2.0d0)
1: ((SBPCL::COMBINEDMETHOD FOO) 2.0d0)
2: ((METHOD FOO (FLOAT)) 2.0d0)
3: ((METHOD FOO (T)) 2.0d0)
3: (METHOD FOO (T)) returned 3
2: (METHOD FOO (FLOAT)) returned 9
2: ((METHOD FOO :AFTER (DOUBLEFLOAT)) 2.0d0)
2: (METHOD FOO :AFTER (DOUBLEFLOAT)) returned DOUBLE
1: (SBPCL::COMBINEDMETHOD FOO) returned 9
0: FOO returned 9
9
This mostly works. It doesn't quite handle all cases, specifically when the CLOS user adds a method and implements callnextmethod
for themselves:
(addmethod #'foo
(makeinstance 'standardmethod
:qualifiers '()
:specializers (list (findclass 'fixnum))
:lambdalist '(x)
:function (lambda (args nms) (+ 2 (funcall (sbmop:methodfunction (first nms)) args (rest nms))))))
CLUSER> (foo 3)
0: (FOO 3)
1: ((METHOD FOO :AROUND (FIXNUM)) 3)
2: ((METHOD FOO (FIXNUM)) 3)
2: (METHOD FOO (FIXNUM)) returned 8
1: (METHOD FOO :AROUND (FIXNUM)) returned 9
0: FOO returned 9
9
In this trace, we have lost the method trace from the direct call to the methodfunction
, and calls that that function makes; this is the cost of performing the trace in the effective method, though a mitigating factor is that we have visibility of method combination (through the (sbpcl::combinedmethod foo)
line in the trace above). It would probably be possible to do the encapsulation in the method object itself, by modifying the function and the fastfunction, but this requires rather more bookkeeping and (at least theoretically) breaks the object identity: we do not have licence to modify the function stored in a method object. So, for now, sbcl has this imperfect solution for users to try (expected to be in sbcl1.4.9, probably released towards the end of June).
(I can't really believe it's taken me twelve years to do this. Other implementations have had this working for years. Sorry!)
15 Jun 2018 9:00pm GMT