28 Nov 2014
28 Nov 2014 10:04pm GMT
26 Nov 2014
Amazon.com is the leading online retailer in the United States, with over $75bn in global revenue. At Amazon, we are passionate about using technology to solve business problems that have big customer impact.
CORTEX is our next generation platform that handles real-time financial data flows and notifications. Our stateless event-driven compute engine for dynamic data transforms is built entirely in Clojure and is crucial to our ability to provide a highly agile response to financial events. We leverage AWS to operate on a massive scale and meet high-availability, low-latency SLAs.
Combining a startup atmosphere with the ambition to build and utilize cutting-edge reactive technology, the Cortex team at Amazon is looking for a passionate, results-oriented, innovative Sr. Software Engineer who wants to move fast, have fun and be deeply involved in solving business integration problems across various organizations within Amazon.
If this describes you - our team is a great fit:
- You obsess over software performance and challenge yourself and others to deliver highly scalable, low latency, reliable and fast computation platforms.
- You've got great ideas and you know how to solve problems, but you also follow through with a clean and maintainable implementation.
- You have a high bar for coding excellence and a passion for design and architecture.
Our technology stack: Clojure, JVM, AWS tools, Sable
- 3+ years of experience designing, building, deploying, operating, scaling, and evolving distributed systems and high-volume transaction applications in a 24/7 environment
- 3+ years of industry experience in software development in Java or C++
- Exceptional customer relationship skills including the ability to discover the true requirements underlying feature requests, recommend alternative technical and business approaches, and lead engineering efforts to meet aggressive timelines with optimal solutions
- Bachelor's Degree in Computer Science or related field or equivalent work experience
- Background or strong interest in Clojure
- Experience with cloud technologies from AWS
- Proficiency in a Unix/Linux environment
- Graduate degree (MS/PhD) a plus
- Experience mentoring and developing junior SDEs
If you are interested, contact Janney Jaxen, Technical Recruiter, Retail Systems firstname.lastname@example.org
26 Nov 2014 12:21am GMT
If you are a novice Lisp programmer and you are confused about the difference between systems and packages, I recommend reading Packages, systems, modules, libraries - WTF? before you continue.
The most common way that people use packages nowadays is by adding a package.lisp file. This file is usually the first file to be loaded and defines the packages that the other files use. This approach has worked for many projects. However, as projects become larger, it requires discipline to manage the dependencies between the files.
An alternative approach to the definition of packages is called one package per file. As the name suggests, it consists of starting every file with a defpackage. Because dependencies between packages are explicit and every file has a unique package, the dependencies between the files can be inferred.
This style of programming was introduced a few years ago by faslpath and quick-build. But recently, the de facto standard Common Lisp build system ASDF3 added support for it with the asdf-package-system extension. As a consequence, it is now easier than ever to use.
So, stay with me for the next few minutes and you will learn how to use this new approach in your projects. I hope you find it useful.
How to use it
First of all, we have to enable the asdf-package-system extension in our system. We will work on a system named project, defined as follows:

(asdf:defsystem :project
  :name "My Project"
  :version "0.0.1"
  :class :package-inferred-system
  :defsystem-depends-on (:asdf-package-system)
  :depends-on (:project/addons))
This defines a correspondence between systems, packages, and files. A system project/foo/bar refers to the file foo/bar.lisp, which has to provide a package project/foo/bar, and whose used and imported packages refer to systems of the same name.
For example, as the system project depends on project/addons, the file addons.lisp must be loaded first. The content of addons.lisp starts with:
(defpackage :project/addons
  (:use :common-lisp :project/core)
  (:import-from :cl-ppcre))
(in-package :project/addons)

(cl-ppcre:scan-to-strings "h.*o" (hello))
Note that it uses the package project/core and imports the cl-ppcre one. Therefore, ASDF will infer that it depends on both the systems cl-ppcre and project/core, so they must be loaded first. But remember, the system project/core refers to the file core.lisp:
(defpackage :project/core
  (:use :common-lisp)
  (:export #:hello))
(in-package :project/core)

(defun hello () "Hello!")
And that is all. This file has no external dependencies. Then, if we try to load our system, the inferred systems and files (core.lisp before addons.lisp) will be loaded in the proper order.
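For example, from the REPL (assuming the .asd file is in a directory visible to ASDF's source registry):

```lisp
;; Loading the top-level system pulls in the inferred dependencies in
;; order: cl-ppcre, then core.lisp, then addons.lisp.
(asdf:load-system :project)
```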
Integration with other systems
What if we use a package, but the system that provides it does not have the same (downcased) name? For example, the system closer-mop provides a package named c2cl. In order to let ASDF know how to find the system for a package, we can call the function register-system-packages with the name of the system and the packages it provides as arguments. In our case, we would include the following in our system definition:
(register-system-packages :closer-mop '(:c2cl))
A Last Trick
Most of the time we want to export a single package with all the symbols from the "subpackages". This can be done very easily if you use the UIOP library shipped with ASDF. For example, let us define a file named all.lisp in our example like:

(uiop/package:define-package :project/all
  (:nicknames :project)
  (:use-reexport :project/core :project/addons))
Now, the system project/all depends on both project/core and project/addons, and it will reexport their symbols into a single package.
If you want to know more about ASDF3 and package-system:
ASDF 3, or Why Lisp is Now an Acceptable Scripting Language
(Extended version) by François-René Rideau.
26 Nov 2014 12:00am GMT
18 Nov 2014
I have uploaded a new version of my Alternatives library. In addition to the ALTERNATIVES macro, there is an ALTERNATIVES* macro which allows one to specify a name for the set of choices. Then, one can check the DOCUMENTATION to see which alternative was last macroexpanded.
"Always pick the letter A."
"Choose any letter with equal probability"
(documentation 'random-letter-algorithm 'alt:alternatives)
Always pick the letter A."
18 Nov 2014 2:44am GMT
16 Nov 2014
It's apparently been just 508 days since I first joined GitHub. In that time I've written a lot of Common Lisp code and made around 4000-5000 commits. I now want to make a retrospective and go over all the projects I've started. I'll omit some of the smaller, uninteresting ones though.
The projects are very roughly in the order I remember creating them. I can't recall the exact order, so things might be all over the place, but it matters not. An approximate order like this is sufficient.
This was my first big CL project that I started as I was investigating tools for radiance. Radiance already began conceptually before this, but I didn't write significant enough code for it to count. lQuery tries to bring the very convenient jQuery syntax for manipulating the DOM to CL. I did this because I knew jQuery and I did not find the alternatives very appealing. Initially it was supposed to help with templating for the most part, but it turned out to be more useful for other tasks in the end.
The first version of lQuery was written in a hotel room in Japan during my one-week holiday there. Time well spent! Don't worry though, I got out often enough as well.
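As a rough sketch of the jQuery-style chaining lQuery provides (the HTML fragment here is arbitrary):

```lisp
;; Parse a document, select nodes with a CSS selector, and extract
;; their text, jQuery-style. (ql:quickload :lquery) beforehand.
(lquery:$ (initialize "<div><p class=\"intro\">Hello!</p></div>")
  "p.intro"
  (text))
```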
lQuery was also my first library to be published and distributed via Quicklisp, so I needed it to have easy to read documentation. Docstrings are great, but I wanted that information to be on the documentation page as well, so I looked for libraries that allowed me to bundle that somehow. Given that I couldn't find anything I liked, I quickly wrote up my own thing that used lQuery to generate the page. It was a matter of some hours and made me very pleased at the time.
Radiance is sort of the reason I really got into CL to begin with. The previous versions of the TyNET framework that my websites run on were written in PHP and I got really sick of the sources, so I never really went to fix bugs or make necessary improvements. Things worked, but they didn't work great.
As I picked up CL I had to look for a project to really get used to the language and rewriting my framework seemed like the first, obvious step. I wanted Radiance to become a mature, stable and usable system that other people could profit from as well. So, unlike in previous attempts I tried to take good care to do things right, even if my understanding of the language at that point was questionable at best.
One and a half years and almost a complete re-write (again) later, I still don't regret choosing this as my major project, as I'm now fairly confident that it will become something that people can use in the future. It's not quite there yet, but well on its way.
I dislike breakpoints and love good logging, so the next step that Radiance demanded was a good logging solution. I first tried my hands on log4cl, but didn't quite like it, mostly for a lack of being able to figure out how to make it work the way I wanted. So, rolling my own it was. I wanted something very flexible, so I thought up a pipeline system for log message processing and distribution.
That was this library; a very small thing that allowed you to create (albeit in a cumbersome fashion) pipelines that could be used to process and distribute arbitrary messages.
From there on out I went to write the actual logger mechanisms, including threading support. Verbose was the result, and I still use and like it today.
For a while then I was occupied with the task of writing a bot for the Encyclopedia Dramatica wiki that should handle new registrations and bannings by adding templates to the user pages. In order to make this possible I checked out a few IRC libraries and wrote a crude thing that would sit on a channel and accept simple commands.
In order for it to actually do its thing though, I had to interact with the mediawiki API, so I wrote a tiny wrapper library around some of the calls that I needed. I never put this on Quicklisp because it was never fleshed-out enough to be there and it still isn't. Maybe some day I'll revise this to be a properly usable thing.
After I finished the bot I wanted to extend it to be able to interact with the forums of ED, which ran on XenForo. Unfortunately that forum offered absolutely zero APIs to access. There was a plugin, but I couldn't get the admins to install it as the forum was apparently so twisted that doing anything could make it crash and burn. Oh well.
So, I set out the classic way of parsing webpage content. Thanks to lQuery this was not that huge of a pain in the butt, but it still took a lot of fiddling to get things to run. This library too is not on QL as it is a big hack and far from complete as well.
At this point I'm really unsure about the order of the projects. Either way, the little bot project I made for ED was a mess and I wanted a proper bot framework to replace my previous bot, Kizai. As I wasn't impressed by the available IRC libraries either, I wrote Colleen from scratch.
Colleen is still being worked on every now and again today, but (with some bigger and smaller rewrites along the way) it has proven to be a very good framework that I am very glad I took the time to write.
In order to test out Radiance and because I was sick of pastebin as a paste service, I set out to write my own. This, too, has proven to be a good investment of my time as I still use plaster as my primary pasting service today. There's a few things I'd like to improve about it whenever I do get the time to, but for the most part it just works.
At some point I noticed that I'd like to have twitter interaction for some of my web-services, so I looked around for API interfaces for that. However there wasn't anything that really worked well. So, once more I went to write something that fit my needs.
This was my first really frustrating project to get going, mostly because figuring out how oAuth is supposed to work is a huge pain. Web-APIs are some of the worst things to interact with, as often enough there is absolutely no way to figure out what exactly went wrong, so you're left stumbling in the dark until you find something that works.
Even though I haven't really used Chirp much myself, it seems to have been of use to a couple of people at least, if Github stars are anything to go by.
Since oAuth is a repeating pattern on the web and it was sufficiently painful to figure out for Chirp, I segregated that part out into its own library. I'm not sure if anyone aside from myself has used South for anything though.
During one of my rewriting iterations of Colleen I noticed that a very common pattern was to save and load some kind of storage. Moving that pattern out into the framework and thus automating configuration and storage seemed like a good idea. However, since Colleen was also an end-user application, I needed to make sure that the configuration could be saved in a format that the user wanted, rather than simple sexprs.
And that's what Universal-Config is supposed to do: Generalise the access of configuration as well as the storage. It works really well on the code side; accessing parts and changing the config is very simple and convenient. It only works so-so on the configuration storage side of things though, as I needed to strike some gross compromises in the serialisation of the objects to ensure compatibility between formats.
Maybe some day I'll figure out a smarter solution to the problems UC has.
Deferred was an attempt at providing mechanisms for optional features of your code, meaning that your code would work differently depending on what kind of libraries are loaded at the time. Therefore I could for example provide a local server based authentication with South without explicitly requiring Hunchentoot or some other webserver. Deferred is more a proof-of-concept than anything though, as I haven't actually utilised it in any of my projects.
However, the problem is an interesting one and whenever I do return to it, I want to try to tackle it from a different angle (extending ASDF to allow something like optional dependencies and conditional components).
The first version of lQuery used Closure-HTML, CXML, and css-selectors to do most of the work. However, CHTML and CXML suffered from big problems: CXML would not parse regular HTML (of course) and CHTML would not parse HTML5 as it required a strict DTD to conform to. Also, css-selectors' performance wasn't the greatest either.
So, in order to clean up all these issues I set out to write my own HT/X/ML parser that should both be fast and lenient towards butchered documents. Well, fast it is, and lenient it is as well. Plump is probably so far my best project in my opinion, as its code is straight-forward, extensible, and just does its job very well.
The next step was to build a CSS-selectors DOM search engine on top of Plump. This turned out to be quite simple, as I could re-use the tools from Plump to parse the selectors and searching the DOM efficiently was not that big of a deal either.
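A rough sketch of how the two libraries fit together (the HTML snippet is arbitrary; plump:parse and clss:select are the respective entry points):

```lisp
;; Parse an HTML fragment with Plump, then query the resulting DOM
;; with a CSS selector via CLSS.
;; (ql:quickload '(:plump :clss)) beforehand.
(let ((root (plump:parse "<div><a class=\"ext\" href=\"#\">hi</a></div>")))
  ;; CLSS:SELECT returns a vector of matching nodes.
  (clss:select "a.ext" root))
```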
After these two were done, the last job was to re-write lQuery to work with the new systems Plump and CLSS provided. The re-write was a very good idea, as it made lQuery a lot more extensible and easier to read and test. It was quite funny to read such old code, after having worked with CL for about a year by then.
The templating engine I used in Radiance so far had been a combination of lQuery and "uibox", which provided some very crude tools to fill in fields of nodes on the DOM. I didn't like this approach very much as there was too much lQuery clutter in the code that should've been in the template.
Clip now provides a templating system that hasn't been done in CL before and I don't think has really been done ever. All the code that manipulates your template is in the template itself, but the template is a valid HTML5 document at all times. The trick is to take advantage of what HTML already allows you to do: custom tags and attributes. Clip picks those up, parses them and then modifies the DOM according to their instructions. All you have to do in your CL code is to pass in the data the page needs.
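The underlying idea can be sketched with plain Plump, independent of Clip's actual directive names (the fill-text attribute below is made up purely for illustration):

```lisp
;; A toy version of attribute-driven templating: treat a made-up
;; "fill-text" attribute as an instruction, fill the node's text from a
;; plist, and strip the attribute so the output is plain HTML again.
(defun apply-template (html data)
  (let ((root (plump:parse html)))
    (plump:traverse
     root
     (lambda (node)
       (let ((key (plump:attribute node "fill-text")))
         (when key
           (plump:clear node)
           (plump:make-text-node
            node (getf data (intern (string-upcase key) :keyword)))
           (plump:remove-attribute node "fill-text"))))
     :test #'plump:element-p)
    (plump:serialize root nil)))

;; e.g. (apply-template "<h1 fill-text=\"title\"></h1>" '(:title "My Page"))
```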
lQuery-Doc left a lot to wish for, so another rewrite was in order. This time I took advantage of Clip's capabilities to provide a very straight-forward, no-bullshit tool to generate documentation.
The only drawback it has currently is that its default template doesn't have the greatest stylesheet in the world, but that hardly bothers me. Maybe I'll get to writing a fancy one some day.
I always wanted to write my own painting application, mostly because MyPaint and others were never completely to my liking. I even took attempts at this before in Java. At some point, out of curiosity, I looked into how I would go about grabbing tablet input. Investigating the jPen library brought me absolutely nothing but confusion, so I looked for other ways. Luckily enough it turned out that Qt already provided a built-in way to grab events from tablets, and from previous experience with a minor project I knew that CommonQt allowed me to use Qt rather easily from CL.
So, what started out as a quick test to see whether it would even be possible to make a painting application quickly turned into a big thing that had a lot of potential. You can read more about it here.
A lot of time had passed since I last worked on Radiance. I took time off as I noticed that the framework had turned into something uncanny and I needed to fix that. And the way to fix it was to write a lot of design drafts and work out all the issues that came to mind on paper.
My conclusion after all this was: Radiance needed a complete, from scratch, rewrite. Oh boy. The first part that needed to be done is a proper library to provide the encapsulation into modules. Modules are Radiance's primary abstraction that allow you to neatly separate parts, but also unify the access and interaction between them.
Modularize was the solution for this and it works pretty well. In fact, it works so well that I don't even think about it anymore nowadays, it just does its job as I expect it to. Aside from Modularize itself I wrote two extensions that tack on support for triggers and the much-needed interfaces and implementations mechanism that is vital to Radiance. I won't explain what these do exactly right now, that'll be for when I write the comprehensive guide to Radiance.
After a long time of rewriting Radiance's core and contribs, it was time to rewrite another component from the old version of TyNET: my blog. This time I tried to focus on simplicity and getting it done well. Simple it is indeed, it's barely 200 lines of code. And as you can probably see as you read this, it works quite nicely.
Writing CSS is a pain in the butt, as it involves a lot of duplication and other annoyances. At some point I had the idea of writing a Lisp to CSS compiler. Taking inspiration from Sass this idea grew into LASS in a matter of.. a day or two, I think?
I now use LASS for all of my style sheet writing concerns as it just works very well and with some minor emacs fiddling I don't even have to worry about compiling it to CSS myself.
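For a flavour of what that looks like (the selector and properties here are arbitrary; lass:compile-and-write is the library's entry point for producing a CSS string):

```lisp
;; LASS turns s-expression stylesheets into CSS text.
;; (ql:quickload :lass) beforehand.
(lass:compile-and-write
 '(div.note
   :background "#eee"
   ;; nested clauses compile to descendant selectors, e.g. "div.note a"
   (a :color "blue")))
```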
Sometimes Xach would talk on IRC about wanting to interact with Tumblr through CL. As Tumblr is a service I use too and the biggest hurdle (oAuth) was already handled by South I took the challenge of writing yet another web-API client.
Humbler turned out a lot nicer than Chirp did in terms of end-user experience, I would say. However, I cannot at all say the same about my own experience while writing it. Tumblr's API "documentation" is quite bad, to be blunt. A lot of the returned fields are not noted on the page, some things are plain wrong (probably out of date) and in general there's just not enough things actually being documented. The worst part about it all was the audacity that the staff had to proclaim in a blog post that they wanted to encourage experimentation!, as if having to figure out the API by yourself was actually a great thing.
Anyway, I haven't actually used Humbler for anything myself, but some people seem to be using it and that's good enough to me.
Returning back to Radiance problems, one of the recurring issues was validating user input. There didn't seem to be a library that did this in any way. And so the same old game of 'write it yourself' began. Ratify's development mostly included reading RFCs and trying to translate them into tests and finally bundling it all together in some easy to use macros.
On twitter I encountered a really nice screenshot of an error page on some Clojure project. I couldn't find the tweet again later, so I don't know what features it had exactly, but suffice to say it was better than anything I'd seen up to that point.
That led me to wonder how I could actually get the stack trace myself if an error occurred. There was already a library that provided rudimentary support for that, trivial-backtrace. Taking a look at its source code filled me with anything but esteem though, so I headed out to write something that would allow people to inspect the stack, restarts, and accompanying source code easily.
A quick question by eudoxia on twitter inspired me to write a very quick toolkit to extract and infuse CSS from/into HTML. The main use case for the former would be to turn HTML into HTML+CSS and the latter to reverse the process (for, say, emails). Using lQuery and LASS this turned out to be a super easy thing to do and I had it done in no time.
Hooray for great code re-use!
Aside from the blog, the only really actively used component of TyNET was the imageboard, Stevenchan. Stevenchan ran on my own software called Purplish. In order to be able to dump everything of the old-code base I was driven to re-write Purplish for Radiance.
However, Purplish now takes a much different approach. A lot of traditional imageboard features are missing and a couple of unconventional features were added. Plus, having it written in CL has the advantage of being much easier to maintain, so if anything ever does crop up I'll tend much more towards wanting to fix it than I did before with PHP.
I like language a lot. I also like to try and reduce things to their minimum. So, the idea came to me of a site that allowed people to review things with only a single keyword. The idea behind that was to, with sufficient data, see what kind of patterns emerge and find out what people think the essence of an experience is.
Certainly it wouldn't be useful for an actual 'review' of anything, but it's nevertheless an interesting experiment. I don't know if I'll ever get enough data to find patterns in this or anything that could lead to scientifically significant correlations, but it's a fun enough thing on its own.
Having completed pretty much everything that I wanted to work on and stalling on some major issues with Radiance I was on the lookout for things to do. Parasol was still on hold and nothing else really piqued my interest. In an attempt to start out right and not dive head over heels into it again, I first considered ways in which to make the C++-ness of Qt more lispy.
Born out of this was Qtools, a collection of tools to aid development with CommonQt and make it all feel a bit more homely. Of course, some major issues still remain; you still need to pay attention to garbage and there's still C++ calls lingering about, but all in all the heavy hackery of Qtools does make it more pleasing to the eye.
Qtools forced me to go deeper into the guts of CLOS and MOP than I've ever gone before and I had to spend a lot of time in implementation code to figure out how to make the things I needed work. I wouldn't advise Qtools as a good use of MOP, but it could be considered an impressive work in exercising the flexibility Lisp offers.
So, that's it then for now. I'd like to amend here that during the most part of all these projects I should've been studying for university. I'm not sure if working on these projects was the right choice, but I have learned a huge bunch and I hope that the product of my efforts has been of use to other people. If not, then it certainly was not the right choice to indulge myself this much in programming.
Before I go into another long rant about my questionable situation in university I'll cap this here. Until another time!
Post scriptum: If you have ideas for features or new projects for me to work on, please let me know! More ideas are always better.
16 Nov 2014 9:18pm GMT
I have now released the code that I mentioned in my previous post Code That Tells You Why which lets one keep multiple implementations around in code and switch between them manually without much trouble.
A link to the source code is here: nklein.com/software/alternatives/.
16 Nov 2014 5:04am GMT
13 Nov 2014
It makes a good point. However, it got me thinking that for cases like the binary-search example in the article, it might be nice to see all of the alternatives in the code and easily be able to switch between them.
One way to accomplish this in Lisp is to abuse the #+ and #- reader macros:
(defun sum-i^2 (n)
  ;; feature markers reconstructed for illustration; the names must
  ;; never actually appear in *features*
  #+first-attempt
  (loop :for i :to n :summing (* i i))
  #+second-attempt
  (do ((i 0 (1+ i))
       (sum 0 (+ sum (* i i))))
      ((> i n) sum))
  ;; "Some people find a do-loop to hard to read
  ;;  (and 'too' too hard to spell, apparently)."
  (/ (* n (1+ n) (1+ (+ n n))) 6))
This is less than ideal for a number of reasons, including: one needs to make sure to pick "feature" names that won't actually ever get turned on, the sense of #+ and #- seems backwards here, and switching to a different alternative requires editing two places.
Another Lisp alternative is to abuse the
(loop :for i :to n :summing (* i i)))
(do ((i 0 (1+ i))
(sum 0 (+ sum (* i i))))
((> i n) sum)))
"Some people find a do-loop to hard to read
(and 'too' too hard to spell, apparently).")
(/ (* n (1+ n) (1+ (+ n n))) 6))))
This is better. No one can doubt which alternative is in use. It is only one edit to switch which alternative is used. It still feels pretty hackish to me though.
One can clean it up a bit with some macrology.
(defmacro alternatives (&body clauses)
  ;; reconstructed sketch: each clause is (name docstring? forms...);
  ;; a bare *** marker, or a clause tagged ***, FINAL, or BLESSED,
  ;; selects which alternative gets expanded
  (flet ((symbol-is-***-p (sym)
           (and (symbolp sym)
                (string= (symbol-name sym) "***")))
         (final-clause-p (clause)
           (when (listp clause)
             (destructuring-bind (tag &body body) clause
               (declare (ignore body))
               (and (symbolp tag)
                    (member (symbol-name tag)
                            '("***" "FINAL" "BLESSED")
                            :test #'string=))))))
    (anaphora:acond
      ;; a bare *** selects the clause immediately following it
      ((member-if #'symbol-is-***-p clauses)
       (let ((clause (first (rest anaphora:it))))
         `(progn ,@(rest clause))))
      ;; otherwise, a tagged clause wins
      ((member-if #'final-clause-p clauses)
       `(progn ,@(rest (first anaphora:it))))
      ;; otherwise, fall back to the last clause
      (t `(progn ,@(rest (first (last clauses))))))))
With this macro, one can now rewrite the sum-i^2 function quite readably:

(defun sum-i^2 (n)
  (alternatives
    ;; clause names other than the first are reconstructed for illustration
    (my-first-attempt-was-something-like-this
     (loop :for i :to n :summing (* i i)))
    (then-i-tried-this
     "Some people find a do-loop to hard to read
      (and 'too' too hard to spell, apparently)."
     (do ((i 0 (1+ i))
          (sum 0 (+ sum (* i i))))
         ((> i n) sum)))
    (final
     (/ (* n (1+ n) (1+ (+ n n))) 6))))
If I wanted to try the my-first-attempt-was-something-like-this clause, I could stick a *** before that clause or change its name to blessed, or I could move that clause into the last spot.
There is still an onus on the developer to choose useful alternative names. In most production code, one wants to clean out all of the dead code. On the other hand, during development or for more interactive code bodies, one might prefer to be able to see the exact "How" that goes with the "Why" and easily be able to swap between them.
(Above macro coming in well-documented library form, hopefully this weekend.)
13 Nov 2014 10:22pm GMT
08 Nov 2014
I think this might be my last blog entry on the subject of building SBCL for a while.
One of the premises behind SBCL as a separate entity from CMUCL, its parent, was to make the result of its build be independent of the compiler used to build it. In a world where separate compilation is the norm, the very idea that building some software should persistently modify the state of the compiler probably seems bizarre, but the Lisp world evolved in that way and Lisp environments (at least those written in themselves) developed build recipes where the steps to construct a new Lisp system from an old one and the source code would depend critically on internal details of both the old and the new one: substantial amounts of introspection on the build host were used to bootstrap the target, so if the details revealed by introspection were no longer valid for the new system, there would need to be some patching in the middle of the build process. (How would you know whether that was necessary? Typically, because the build would fail with a more-or-less - usually more - cryptic error.)
Enter SBCL, whose strategy is essentially to use the source files first to build an SBCL!Compiler running in a host Common Lisp implementation, and then to use that SBCL!Compiler to compile the source files again to produce the target system. This requires some contortions in the source files: we must write enough of the system in portable Common Lisp so that an arbitrary host can execute SBCL!Compiler to compile SBCL-flavoured sources (including the standard headache-inducing (defun car (list) (car list)) and similar, which works because SBCL!Compiler knows how to compile calls to car).
How much is "enough" of the system? Well, one answer might be when the build output actually works, at least to the point of running and executing some Lisp code. We got there about twelve years ago, when OpenMCL (as it was then called) compiled SBCL. And yet... how do we know there aren't odd differences that depend on the host compiler lurking, which will not obviously affect normal operation but will cause hard-to-debug trouble later? (In fact there were plenty of those, popping up at inopportune moments).
I've been working intermittently on dealing with this, by attempting to make the Common Lisp code that SBCL!Compiler is written in sufficiently portable that executing it on different implementations generates bitwise-identical output. Because then, and only then, can we be confident that we are not depending in some unforeseen way on a particular implementation-specific detail; if output files are different, it might be a harmless divergence, for example a difference in ordering of steps where neither depends on the other, or it might in fact indicate a leak from the host environment into the target. Before this latest attack at the problem, I last worked on it seriously in 2009, getting most of the way there but with some problems remaining, as measured by the number of output files (out of some 330 or so) whose contents differed depending on which host Common Lisp implementation SBCL!Compiler was running on.
Over the last month, then, I have been slowly solving these problems, one by one. This has involved refining what is probably my second most useless skill, working out what SBCL fasl files are doing by looking at their contents in a text editor, and from that intuiting the differences in the implementations that give rise to the differences in the output files. The final pieces of the puzzle fell into place earlier this week, and the triumphant commit announces that as of Wednesday all 335 target source files get compiled identically by SBCL!Compiler, whether that is running under Clozure Common Lisp (32- or 64-bit versions), CLISP, or a different version of SBCL itself.
Oh but wait. There is another component to the build: as well as SBCL!Compiler, we have SBCL!Loader, which is responsible for taking those 335 output files and constructing from them a Lisp image file which the platform executable can use to start a Lisp session. (SBCL!Loader is maybe better known as "genesis"; but it is to load what SBCL!Compiler is to compile-file.) And it was slightly disheartening to find that despite having 335 identical output files, the resulting cold-sbcl.core file differed between builds on different host compilers, even after I had remembered to discount the build fingerprint constructed to be different for every build.
Fortunately, the actual problem that needed fixing was relatively small: a call to maphash, which (understandably) makes no guarantees about ordering, was used to affect the Lisp image data directly. I then spent a certain amount of time being thoroughly confused, having managed to construct for myself a Lisp image where the following forms executed with ... odd results:
(loop for x being the external-symbols of "CL" count 1) ; => 1032
(length (delete-duplicates (loop for x being the external-symbols of "CL" collect x))) ; => 978
It turned out that
(unless (member (package-name package) '("COMMON-LISP" "KEYWORD" :test #'string=)) ...)
was not the same as
(unless (member (package-name package) '("COMMON-LISP" "KEYWORD") :test #'string=) ...)
and all was well again, and as of this commit the cold-sbcl.core output file is identical no matter the build host.
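That one-character-scale difference between the two forms is easy to reproduce in isolation: in the buggy version, :test and #'string= sit inside the quoted list as data (making it a four-element list), so member falls back to its default eql test, and eql does not compare string contents. A minimal sketch:

```lisp
;; Buggy: :TEST and #'STRING= are *elements* of the quoted list, not
;; arguments to MEMBER, so MEMBER compares using its default EQL test.
;; Distinct string literals with equal contents need not be EQL, so
;; this typically returns NIL.
(member "KEYWORD" '("COMMON-LISP" "KEYWORD" :test #'string=))

;; Fixed: :TEST #'STRING= is passed as a keyword argument to MEMBER,
;; so string contents are compared and the match is found.
(member "KEYWORD" '("COMMON-LISP" "KEYWORD") :test #'string=)
;; => ("KEYWORD")
```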
It might be interesting to survey the various implementation-specific behaviours that we have run into during the process of making this build completely repeatable. The following is a probably non-exhaustive list - it has been twelve years, after all - but maybe is some food for thought, or (if you have a particularly demonic turn of mind) an ingredients list for a maximally-irritating CL implementation...
- Perhaps most obviously, various constants are implementation-defined. The ones which caused the most trouble were undoubtedly most-positive-fixnum and most-negative-fixnum, particularly since they could end up being used in ways where their presence wasn't obvious. For example, (deftype fd () `(integer 0 ,most-positive-fixnum)) has, in the SBCL build process, a subtly different meaning from (deftype fd () '(and fixnum unsigned-byte)); in the second case, the fd type will have the intended meaning in the target system, using the target's fixnum range, while in the first case we have no way of intercepting or translating the host's value of most-positive-fixnum. Special mentions go to array-dimension-limit, which caused Bill Newman to be cross on the Internet, and to internal-time-units-per-second; I ended up tracking down one difference in output machine code from a leak of the host's value of that constant into target code.
- Hash values computed by sxhash (and kin) quite justifiably differ between implementations. The practical upshot of that is that these functions can't be used to implement a cache in SBCL!Compiler, because the access patterns, and hence the patterns of cache hits and misses, will be different depending on the host implementation.
- As I've already mentioned, maphash does not iterate over hash-table contents in a specified order, and in fact that order need not be deterministic; similarly, with-package-iterator can generate symbols in arbitrary orders, and set operations (set-difference and friends) will return the set as a list whose elements are in an arbitrary order. Incautious use of these functions tended to give rise to harmless but sometimes hard-to-diagnose differences in output; the solution was typically to sort the iteration output before operating on any of it, to introduce determinism...
- ... but it was possible to get that wrong in a harder-to-detect way, because sort is not specified to be stable. In some implementations it actually is a stable sort under some conditions, but for cases where it's important to preserve an already-existing partial order, stable-sort is the tool for the job.
- The language specification explicitly says that the initial contents of uninitialized arrays are undefined. In most implementations, at most times, executing (make-array 8 :element-type '(unsigned-byte 8)) will give a zero-filled array, but there are circumstances in some implementations where the returned array will have arbitrary data.
- Not only are some constants implementation-defined, but so also are the effects of normal operation on some variables. *gensym-counter* is affected by macroexpansion if the macro function calls gensym, and implementations are permitted to macroexpand macros an arbitrary number of times. That means that our use of gensym needs to be immune to whatever the host implementation's macroexpansion and evaluation strategy is.
- The object returned by byte to represent a bitfield with size and position is implementation-defined. Implementations (variously) return bitmasks, conses, structures, vectors; host return values of byte must not be used during the execution of SBCL!Compiler. More subtly, the various boole-related constants (boole-and and friends) also need special treatment; at one point, their host values were used when SBCL!Compiler compiled the boole function itself, and it so happens that CLISP and SBCL both represent the constants as integers between 0 and 15... but with a different mapping between operation and integer.
- My last blog entry talked about constant coalescing, and about printing of (quote foo). In fact printing in general has been a pain, and there are still significant differences in interpretation, or at least in implementation, of pretty-printing: to the extent that at one point we had to minimize printing at all in order for the build to complete under some implementations.
- There are a number of things which are implementation-defined but have caused a certain amount of difficulty. Floating point in general is a problem, not completely solved (SBCL will not build correctly if its host doesn't have distinct single- and double-float types that are at least approximately IEEE754-compliant). Some implementations lack denormalized numbers; some do not expose signed zeros to the user; and some implementations compute (log 2d0 10d0) more accurately than others, including SBCL itself, do. The behaviour of the host implementation on legal but dubious code is also potentially tricky: SBCL's build treats full warnings as worthy of stopping, but some hosts emit full warnings for constructs that are tricky to write in other ways. For example, to handle multiple kinds of string portably, one might write (typecase string (simple-base-string ...) ((simple-array character (*)) ...) (string ...)); but some implementations emit full warnings if a clause in a typecase is completely shadowed by other clauses, and if base-char and character are identical in that implementation the typecase above will signal.
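The maphash ordering point above has a standard remedy, which is what most of the fixes amounted to: collect the entries, then sort them by some total order before processing. A minimal sketch, with a hypothetical helper name and hypothetical table contents:

```lisp
(defun hash-table-alist-sorted (table)
  "Return TABLE's entries as an alist, ordered deterministically by key,
independent of the host's internal hash-table iteration order."
  (let ((entries '()))
    (maphash (lambda (k v) (push (cons k v) entries)) table)
    ;; STRING< imposes a total order on the string keys; without this
    ;; SORT, the result order would be implementation-dependent.
    (sort entries #'string< :key #'car)))

(let ((table (make-hash-table :test #'equal)))
  (setf (gethash "genesis" table) 1
        (gethash "warm" table) 2
        (gethash "cold" table) 3)
  (mapcar #'car (hash-table-alist-sorted table)))
;; => ("cold" "genesis" "warm"), whatever the insertion or hash order.
```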
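And the sort stability caveat deserves a concrete illustration: plain sort may permute elements that compare equal, so when an already-existing partial order matters, stable-sort is the safe choice. A sketch with hypothetical (name . priority) pairs:

```lisp
;; "alpha" and "beta" compare equal under the predicate (both have
;; priority 1); suppose their relative order is meaningful, e.g.
;; definition order in a source file.
(defparameter *entries* '(("alpha" . 1) ("beta" . 1) ("gamma" . 0)))

;; SORT is permitted to swap "alpha" and "beta" on some hosts;
;; STABLE-SORT guarantees they keep their original relative order.
;; (Both are destructive, hence the COPY-LIST.)
(stable-sort (copy-list *entries*) #'< :key #'cdr)
;; => (("gamma" . 0) ("alpha" . 1) ("beta" . 1))
```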
There were probably other, more minor differences between implementations, but the above list gives a flavour of the things that needed doing in order to get to this point, where we have some assurance that our code is behaving as intended. And all of this is a month ahead of my self-imposed deadline of SBCL's 15th birthday: SBCL was announced to the world on December 14th, 1999. (I'm hoping to be able to put on an sbcl15 workshop in conjunction with the European Lisp Symposium around April 20th/21st/22nd - if that sounds interesting, please pencil the dates in the diary and let me know...)
08 Nov 2014 10:38pm GMT
06 Nov 2014
- 1am - A minimal testing framework. - MIT
- cl-rlimit - Common lisp interface to unix rlimit -- ensure the performance of your program! - LLGPL
- corona - Isolated, reproducible virtual development environments. - MIT
- defpackage-plus - Extensible DEFPACKAGE with version support - BSD-2-Clause
- fast-http - A fast HTTP protocol parser in Common Lisp - MIT
- fiasco - A Common Lisp test framework that treasures your failures. A logical continuation of the Stefil test framework. - BSD 2-clause
- form-fiddle - A collection of utilities to destructure lambda forms. - Artistic
- global-vars - Define efficient global variables. - MIT
- http-body - HTTP POST data parser for Common Lisp - BSD 2-Clause
- hunchensocket - WebSockets for Hunchentoot - MIT
- lambda-fiddle - A collection of functions to process lambda-lists. - Artistic
- local-time-duration - Simple duration functionality on top of local-time - MIT
- myway - Sinatra-compatible routing library. - LLGPL
- qtools - A collection of tools to aid in development with CommonQt. - Artistic
- quri - Yet another URI library for Common Lisp - BSD 3-Clause
- rock - Asset manager for Common Lisp. - MIT
- session-token - Simple session token generation library - BSD license: you can do anything you want with it (but no warranty).
- trivial-arguments - A simple library to retrieve the lambda-list of a function. - Artistic
- trivial-extract - Extract .tar/.tar.gz/.zip files. - MIT
- with-c-syntax - with-c-syntax is a fun package which introduces the C language syntax into Common Lisp. - WTFPL
- xsubseq - Efficient way to manage "subseq"s in Common Lisp - BSD 2-Clause
Updated projects: asteroids, buildapp, chirp, cl-algebraic-data-type, cl-ana, cl-arrows, cl-async, cl-autowrap, cl-bert, cl-case-control, cl-conspack, cl-dbi, cl-erlang-term, cl-mustache, cl-read-macro-tokens, cl-store, cl-virtualbox, clack, cleric, clfswm, clip, closer-mop, clsql-helper, clss, clx, coleslaw, colleen, crane, crypto-shortcuts, deferred, defmacro-enhance, dissect, drakma-async, esrap-liquid, event-glue, gbbopen, glyphs, graph, hdf5-cffi, helambdap, hh-web, humbler, ironclad, lass, lfarm, lisp-executable, lisp-gflags, lparallel, lquery, lredis, mcclim, method-combination-utilities, mgl-pax, modularize, modularize-hooks, modularize-interfaces, okra, optima, petit.string-utils, pgloader, plump, plump-sexp, plump-tex, postmodern, prove, ratify, rcl, readable, rutils, serapeum, shelly, slime, smug, softdrink, south, staple, stumpwm, trivial-benchmark, trivial-download, trivial-indent, trivial-mimes, trivial-signal, trivial-thumbnail, uiop, universal-config, vecto, verbose, weblocks, weblocks-stores, weblocks-tree-widget, yason.
To get this update, use (ql:update-dist "quicklisp").
06 Nov 2014 10:59pm GMT
SparX is a small engineering team focused on applying online machine learning and predictive modeling to eCommerce (impacting a 24 billion dollar business).
Our stack is 100% Clojure, service oriented, targeting 50 million users with 1ms SLAs. We apply engineering and data science to tough problems such as dynamic pricing, shipping estimations, personalized emails, and multi-variate testing.
We are always looking for talent in data-science, engineering and devops. Bonus points if you can bridge 2 of these together. We love people with strong fundamentals who can dive deep.
06 Nov 2014 5:14am GMT