16 Feb 2018

feedPlanet Lisp

McCLIM: McCLIM 0.9.7 "Imbolc" release

After 10 years we have decided that it is time to make a new release - the first one since 2008, which was McCLIM 0.9.6, St. George's Day. Imbolc is a Gaelic traditional festival marking the beginning of spring held between the winter solstice and the spring equinox.

Due to a long period of time, the number of changes is too big to list in full detail and we will thus note only major changes made during the last eleven iterations (though many important changes were done before that). For more information please check out previous iteration reports on McCLIM blog, git log and the issue tracker. We'd like to thank all present and past contributors for their time, support and testing.

We also have a bounty program financed with money from the fundraiser. We are grateful for financial contributions which allow us to attract new developers and reward old ones with bounties. Currently active bounties (worth $2650) are available here.

As Imbolc marks the beginning of spring we hope this release will be one of many in the upcoming future.

16 Feb 2018 12:00am GMT

07 Feb 2018

feedPlanet Lisp

Nicolas Hafner: Shader Pipelines and Effects - Gamedev

header
In the last entry I talked about Trial's way of handling shaders and how they are integrated into the object system to allow inheritance and composition. This time we'll take a look at how that system is extended to allow entire pipelines of effects and rendering systems.

Back in the olden days you could get away with a single pass to render everything. In fact, the fixed-function pipeline of GL 2.1 works that way, as do many simple rendering systems like primitive forward rendering, or most 2D games. However, more complex systems and effects require multiple passes. For instance, a deferred rendering system requires at least two passes, one that emits a "gbuffer", and another that combines the information from it to produce the final image. Another example would be a "god-rays" effect, which requires rendering the scene with all objects black, applying a radial blur to that, and then finally blending it with a rendering of the scene as normal.

These kinds of pipelines can become quite complex, especially when you start to consider that each pipeline might require multiple kinds of textures to use as buffers, inputs, and outputs. The management and allocation of these resources is a pain to do manually, so Trial includes a system to reduce the process to a graph definition. This allows you to write complicated pipelines without having to worry about any of the underlying buffers. The pipeline part is only one hand of the equation, but we'll get to the rest later.

To do the rendering, a stage will require a set of textures to output to, and a set of textures as inputs to the stage. A stage might offer additional parameters aside from that to tweak its behaviour, but those do not require allocation, so we can disregard them for now.

To give a bit of context I'll explain how these textures are attached to the shaders in OpenGL. I've glossed over that in the previous entry as it wasn't directly relevant. Let's consider a simple fragment shader that simply copies the colour from an input texture to an output texture:

uniform sampler2D input;
out vec4 color;

void main(){
  color = texelFetch(input, ivec2(int(gl_FragCoord.x), int(gl_FragCoord.y)));
}

Every GLSL file begins with a set of variable declarations. Each of these global variables has a storage qualifier that tells OpenGL where it comes from. uniforms are parameters that can be set from the CPU before the rendering is performed and are the same for all of the stages in the rendering. outs are variables that are sent to the next stage. The last important qualifier that isn't mentioned above is in, which takes variables from a previous stage. Every file must also contain a main function that acts as the entry point for the computation. In this case we perform a texture read at the current fragment position and store it into the output color - effectively copying it.

The way in which outputs and inputs differ should become apparent here. Inputs are declared as uniforms, which need to be set by the application before each rendering call. The outputs on the other hand are part of the framebuffer target, which is much more implicit. A frame buffer is a special kind of object that points to a set of textures to which the rendering is output. The default frame buffer will draw to your window so that things can actually be seen. This discrepancy between inputs and outputs is a bit of a pain, but it is simply a consequence of how OpenGL evolved. If I had to design a graphics API like this today I would definitely treat input and output buffers in a much more similar manner.

Whatever the case may be, we have the ability to fix this in Trial by giving the user a uniform view over both inputs and outputs to a shader stage. For instance, it would be nice if we could simply say something like the following:

(define-shader-pass renderer ()
  ((:output color)))

(define-shader-pass blur ()
  ((:input previous)
   (:output color)))

Then when we need to define an actual shader pipeline we could just do something like this:

(let ((pipeline (make-pipeline))
      (renderer (make-instance 'renderer))
      (blur (make-instance 'blur)))
  (connect (output color renderer) (input previous blur) pipeline)
  (pack pipeline))

What you see above, you can do in Trial, almost exactly. The syntax is a bit different, but the concept is much the same. The pack step analyses the graph formed by the pipeline and automatically allocates the minimal number of texture buffers needed, re-using them where possible. This is especially useful if you have a long chain of post-effects.

As the texture allocation system is implemented right now it's unfortunately pretty suboptimal. There's some severe limitations that I need to fix, but I have a good idea on how to do that, so I'll probably get around to doing that soon. Still, for most of my work so far the current approach has sufficed, and the general idea is clear.

With the pipeline and allocation system in place, one half of the equation is solved. The other half requires a bit more ingenuity. So far the passes we've looked at did one of two things: either simply render the scene to a buffer, or perform some kind of buffer to buffer transformation like blurring. However, most actually interesting shaders will want to do more complex things than that - they'll want to not only render objects, but also influence how they are rendered.

But, hold on, don't the objects themselves decide how they are rendered? And now the pass should suddenly decide what's going on? Well yes, and yes. Some passes will want to completely influence how every object is drawn, some will want to influence how objects are drawn, and yet some will not care how the objects are drawn at all. Allowing this is the second part of the shader pipeline protocol. As you know, modelling protocols with CLOS is my favourite thing to do, so strap in.

(define-shader-entity shader-pass () ())

(defgeneric register-object-for-pass (pass object))
(defgeneric shader-program-for-pass (pass object))
(defgeneric make-pass-shader-program (pass class))
(defgeneric coerce-pass-shader (pass class type spec))

The above class definition is not quite accurate, but for simplicity's sake I've shortened it to something that will be clear enough for our purposes here, and will link back to the previous entry appropriately. Each shader pass is a shader entity, and as such allows you to attach shader code to the class definition.

The first function, register-object-for-pass, is used to notify a shader pass about an object that would like to draw in the current scene. This is necessary for any shader pass that would like to influence the drawing of an object, or leave it up to the object entirely. In both of those cases, the pass will compute a new effective shader for the object and compile that into a shader program to use. This program can then be retrieved by the object with shader-program-for-pass. Thus, whenever an object is painted onto a stage, it can use this function to retrieve the shader program and set the necessary uniforms with it. Stages that would like to supplant their own rendering method can simply ignore these two functions.

The make-pass-shader-program function is responsible for combining the shader programs from the given class and pass, and to produce a useful shader program resource object as a result. This function, by default, simply does some book keeping and calls coerce-pass-shader to do the actual combination. coerce-pass-shader in turn merely retrieves the corresponding shader code from the pass and once again uses glsl-toolkit's merge-shader-sources to combine it with the existing shader.

That might be a bit much to put together in your head, so let's look at an example. We'll have an object that simply draws itself textured, and a shader pass that should apply a sort of fade out effect where things farther away become more transparent.

(define-shader-entity cube (vertex-entity textured-entity) ())

(define-shader-pass fade (render-pass) ())

(define-class-shader (fade :fragment-shader)
  "out vec4 color;
  
void main(){
  color.a *= (1 - gl_FragCoord.z);
}")

(defmethod setup-scene ((main main) scene)
  (enter (make-instance 'fade) scene)
  (enter (make-instance 'cube) scene))

Similar to how we did it in the previous entry, we define a new entity that is composed of vertices and has textured surfaces. I omitted some details like what vertex data and textures to use to make it simpler. Then we define a new shader pass, which is based on the existing render-pass that is a per-object-pass with a default color and depth output. Next we have to add the shader logic to our pass. Doing what we want is actually quite simple - we simply multiply the colour's alpha value with the depth value. The depth needs to be inverted, as by default 1 is far away and 0 is close by. Thus, with this shader, the farther away the fragment is, the more transparent it will now become.

Finally we define a method that hooks into the rest of Trial and establishes a scene to be drawn. We first enter a new instance of our fade pass into it, which will automatically get added to the underlying shader pipeline. We then add an instance of the cube to it, at which point several things will happen:

  1. The cube is added to the scene graph
  2. register-object-for-pass is called on the pass and cube instances.
  3. This will call determine-effective-shader-class on the cube. This function is used as a caching strategy to avoid duplicating shaders for classes that don't change anything from their superclass.
  4. If the determined class is not yet registered, make-pass-shader-program is called.
  5. make-pass-shader-program computes the effective shader definitions for each shader type.
  6. coerce-pass-shader is called for each type of shader definition that is used, and combines the object's shader source with that of the pass.
  7. The resulting shader program resource object is returned and stored in the pass.

The fragment shader produced by coerce-pass-shader will look something like this:

in vec2 texcoord;
out vec4 color;
uniform sampler2D texture_image;

void _GLSLTK_main_1(){
  color *= texture(texture_image, texcoord);
}

void _GLSLTK_main_2(){
  color.a *= (1 - gl_FragCoord.z);
}

void main(){
  _GLSLTK_main_1();
  _GLSLTK_main_2();
}

Again, this neatly combines the two effects into one source file that can be transparently used in the system. Once it comes to actually drawing the scene, something like the following course of events is going to take place:

  1. paint is called on the scene with the pipeline.
  2. paint-with is called on the fade pass with the scene.
  3. paint is called on the cube with the fade pass.
  4. shader-program-for-pass is called on the fade pass and cube. The previously cached program is retrieved and returned.
  5. The textured-entity part of the cube uses the shader program to set the texture uniform.
  6. The vertex-entity part of the cube uses the shader program to set the transformation matrices.
  7. The vertex-entity part of the cube calls the OpenGL vertex drawing function to finally run the rendering process.
  8. The color output from the fade pass is blitted onto the window for the user to marvel at.

And that is pretty much the entire process. Obviously some steps will be repeated as needed if you add more objects and passes. The pipeline system ensures that the passes are run in the correct order to satisfy dependencies. In case you're wondering about the paint and paint-with inversion, the latter function is useful so that the shader pass gets to have absolute control over what is rendered and how if it so desires. For instance, it could completely ignore the scene and draw something else entirely.

That about concludes the explanation of Trial's shader system as it stands today. As I've mentioned over these two entries there are a few minor places where improvements have to be made, but overall the design seems quite solid. At least I hope I won't suddenly realise a severe limitation somewhere down the line, forcing me to rethink and rewrite everything again.

Next I think I will take a bit of a break from talking about engine details, and instead talk about geometry clipmaps and the adventure I had trying to implement them. Until then~

07 Feb 2018 11:34pm GMT

Quicklisp news: Quicklisp implementation stats for 2017+

Here are some raw HTTP request stats for each CL implementation supported by Quicklisp from 2017-01-01 onward. Each request corresponds roughly to downloading a single project.

  • SBCL: 4,518,774 requests
  • Clozure CL: 488,057
  • CLISP: 34,767
  • LispWorks: 25,700
  • ABCL: 23,913
  • Allegro: 19,501
  • ECL: 19,027
  • CLASP: 3,335
  • CMUCL: 965
  • MKCL: 79
  • Scieneer: 0
I gathered this info to check Scieneer activity levels to plan future support. Turns out nobody with Scieneer has used Quicklisp since 2013. Bummer!

07 Feb 2018 2:50pm GMT

04 Feb 2018

feedPlanet Lisp

TurtleWare: cl-charms crash course

Writing a new CLIM backend

This work is meant as a showcase how to write a new McCLIM backend. To make it more interesting to me I'm writing it using cl-charms library which is a Common Lisp library for ncurses - console manipulation library for UNIX systems. During development I'm planning to make notes about necessary steps. If possible I'll also write a test suite for backends which will test the functionality from most basic parts (like creating windows) to more sophisticated ones (transformations and drawing). That should simplify verifying, if a new backend works fine and to what degree it is complete. We start with a crash course for cl-charms library.

Pointers

cl-charms crash course

Ensure you have ncurses development package installed on your system. Start the real terminal (Emacs doesn't start *inferior-lisp* in something ncurses can work with) and launch your implementation. I use CCL because usually software is developed with SBCL and I want to catch as many problems with cl-charms as possible. After that start swank server and connect from your Emacs session.

~ ccl
? (defvar *console-io* *terminal-io*) ; we will need it later
*CONSOLE-IO*
? (ql:quickload 'swank :silent t)
(SWANK)
? (swank:create-server :port 4005 :dont-close t)
;; Swank started at port: 4005.
4005
? (loop (sleep 1))

We loop over sleep because we don't want console prompt to read our first line for the console. If you don't do that you may have somewhat confusing behavior.

In Emacs: M-x slime-connect *Host:* localhost *Port:* 4005. Now we are working in *slime-repl ccl* buffer in Emacs and we have ncurses output in the terminal we have launched server from. Try some demos bundled with the library:

CL-USER> (ql:quickload '(cl-charms bordeaux-threads alexandria))
(CL-CHARMS BORDEAUX-THREADS ALEXANDRIA)
CL-USER> (ql:quickload '(cl-charms-paint cl-charms-timer) :silent t)
(CL-CHARMS-PAINT CL-CHARMS-TIMER)
CL-USER> (charms-timer:main) ; quit with Q, start/stop/reset with [SPACE]
CL-USER> (charms-paint:main) ; quit with Q, move with WSAD and paint with [SPACE]

Now we will go through various charms (and ncurses) capabilities. Our final goal is to have a window with four buttons and text input box. Navigation should be possible with [TAB] / [SHIFT]+[TAB] and by selecting gadgets with a mouse pointer. Behold, time for the first application.

First application

Lets dissect this simple program which prints "Hello world!" on the screen:

(defun hello-world ()
  (charms:with-curses ()
    (charms:disable-echoing)
    (charms:enable-raw-input)
    (loop named hello-world
       with window = (charms:make-window 50 15 10 10)
       do (progn
            (charms:clear-window window)
            (charms:write-string-at-point window "Hello world!" 0 0)
            (charms:refresh-window window)

            ;; Process input
            (when (eql (charms:get-char window) #\q)
              (return-from hello-world))
            (sleep 0.1)))))

Program must be wrapped in charms:with-curses macro which ensures proper initialization and finalization of the program. In this operator context charms functions which configure the library are available. We use charms:disable-echoing to prevent unnecessary obfuscation of the window (we interpret characters ourself) and (charms:enable-raw-input) to turn off line buffering. charms:*standard-window* is a window covering whole terminal screen.

We create a Window for output (its size is 50x15 and offset is 10x10) and then in a loop we print "Hello world!" (at the top-left corner of it) until user press the character q.

Extending cl-charms API

All functions used until now come from higher-level interface. charms has also a low-level interface which maps to libncurses via CFFI. This interface is defined in the package named charms/ll. I highly recommend skimming through http://www.tldp.org/HOWTO/NCURSES-Programming-HOWTO which is a great overview of ncurses functionality.

We want borders around the window. CFFI interface is a bit ugly (i.e we would have to extract a window pointer to call wborder on it). We are going to abstract this with a function which plays nice with the lispy abstraction.

(defun draw-window-border (window
                           &optional;
                             (ls #\|) (rs #\|) (ts #\-) (bs #\-)
                             (tl #\+) (tr #\+) (bl #\+) (br #\+))
  (apply #'charms/ll:wborder (charms::window-pointer window)
         (mapcar #'char-code (list ls rs ts bs tl tr bl br))))

(defun draw-window-box (window &optional; (verch #\|) (horch #\-))
  (charms/ll:box (charms::window-pointer window) (char-code verch) (char-code horch)))

Now we can freely use draw-window-border. Put (draw-window-box window) after (charms:clear-window window) in hello-world program and see the result. It is ugly, but what did you expect from a window rendered in the terminal?

It is worth mentioning that border is drawn inside the window, so when we start writing string at point [0,0] - it overlaps with the border. If we want to paint content inside the border we should start at least at [1,1] and stop at [48,13].

Somewhat more appealing result may be achieved by having distinct window background instead of drawing a border with characters. To do that we need to dive into the low-level interface once more. We define colors API.

(defun start-color ()
  (when (eql (charms/ll:has-colors) charms/ll:FALSE)
    (error "Your terminal does not support color."))
  (let ((ret-code (charms/ll:start-color)))
    (if (= ret-code 0)
        T
        (error "start-color error ~s." ret-code))))

(eval-when (:load-toplevel :compile-toplevel :execute)
  (defconstant +black+   charms/ll:COLOR_BLACK)
  (defconstant +red+     charms/ll:COLOR_RED)
  (defconstant +green+   charms/ll:COLOR_GREEN)
  (defconstant +yellow+  charms/ll:COLOR_YELLOW)
  (defconstant +blue+    charms/ll:COLOR_BLUE)
  (defconstant +magenta+ charms/ll:COLOR_MAGENTA)
  (defconstant +cyan+    charms/ll:COLOR_CYAN)
  (defconstant +white+   charms/ll:COLOR_WHITE))

(defmacro define-color-pair ((name pair) foreground background)
  `(progn
     (start-color)
     (defparameter ,name (progn (charms/ll:init-pair ,pair ,foreground ,background)
                                (charms/ll:color-pair ,pair)))))

(define-color-pair (+white/blue+ 1) +white+ +blue+)
(define-color-pair (+black/red+ 2) +black+ +red+)

(defun draw-window-background (window color-pair)
  (charms/ll:wbkgd (charms::window-pointer window) color-pair))

(defmacro with-colors ((window color-pair) &body body)
  (let ((winptr (gensym)))
    (alexandria:once-only (color-pair)
      `(let ((,winptr (charms::window-pointer ,window)))
         (charms/ll:wattron ,winptr ,color-pair)
         ,@body
         (charms/ll:wattroff ,winptr ,color-pair)))))

start-color must be called when we configure the library. We map charm/ll constants to lisp constants and create define-color-pair macro. This abstraction could be improved so we are not forced to supply pair numbers and providing proper association between names and integers. We skip that step for brevity. Define two color pairs, function for filling a window background and macro with-colors for drawing with a specified palette. Finally lets use the new abstraction in pretty-hello-world function:

(defun pretty-hello-world ()
  (charms:with-curses ()
    (charms:disable-echoing)
    (charms:enable-raw-input)
    (start-color)
    (loop named hello-world
       with window = (charms:make-window 50 15 10 10)
       do (progn
            (charms:clear-window window)
            (draw-window-background window +white/blue+)
            (with-colors (window +white/blue+)
              (charms:write-string-at-point window "Hello world!" 0 0))
            (with-colors (window +black/red+)
              (charms:write-string-at-point window "Hello world!" 0 1))
            (charms:refresh-window window)

            ;; Process input
            (when (eql (charms:get-char window :ignore-error t) #\q)
              (return-from hello-world))
            (sleep 0.1)))))

Result looks, as promised in the function name, very pretty ;-)

Asynchronous input

Printing Hello world! doesn't satisfy our needs, we want to interact with a brilliant software we've just made while its running. Even more, we want to do it without blocking computations going on in the system (which are truly amazing, believe me). First lets visualise these computations to know that they are really happening. Modify the program loop to draw Hello World! in different color on each iteration.

(defun amazing-hello-world ()
  (charms:with-curses ()
    (charms:disable-echoing)
    (charms:enable-raw-input)
    (start-color)
    (loop named hello-world
       with window = (charms:make-window 50 15 10 10)
       for flip-flop = (not flip-flop)
       do (progn
            (charms:clear-window window)
            (draw-window-background window +white/blue+)
            (with-colors (window (if flip-flop
                                     +white/blue+
                                     +black/red+))
              (charms:write-string-at-point window "Hello world!" 0 0))
            (charms:refresh-window window)
            ;; Process input
            (when (eql (charms:get-char window :ignore-error t) #\q)
              (return-from hello-world))
            (sleep 1)))))

Something is not right. When we run amazing-hello-world to see it flipping - it doesn't. Our program is flawed. It waits for each character to verify that the user hasn't requested application exit. You press any key (i.e space) to proceed to the next iteration. Now we must think of how to obtain input from user without halting the application.

To do that we can enable non blocking mode for our window.

with window = (let ((win (charms:make-window 50 15 10 10)))
                (charms:enable-non-blocking-mode win)
                win)

This solution is not complete unfortunately. It works reasonably well, but we have to wait a second (because "computation" is performed every second, we call sleep after each get-char) before the character is handled. It gets even worse if we notice, that pressing five times character b and then q will delay processing by six seconds (characters are processed one after another in different iterations with one second sleep between them). We need something better.

I hear your internal scream: use threads! It is important to keep in mind, that if you can get without threads you probably should (same applies for cache and many other clever techniques which introduce even cleverer bugs). Keep also in mind that ncurses is not thread-safe. We are going to listen for events from all inputs like select does and generate "recompute" event each second. On implementation which support timers we could use them but we'll use... a thread to generate "ticks". Note that we use a thread as an asynchronous input rather than asynchronous charms access.

;;; asynchronous input hack (should be a mailbox!)
(defparameter *recompute-flag* nil "ugly and unsafe hack for communication")
(defvar *recompute-thread* nil)

(defun start-recompute-thread ()
  (when *recompute-thread*
    (bt:destroy-thread *recompute-thread*))
  (setf *recompute-thread*
        (bt:make-thread
         #'(lambda ()
             (loop
                (sleep 1)
                (setf *recompute-flag* t))))))

(defun stop-recompute-thread ()
  (when *recompute-thread*
    (bt:destroy-thread *recompute-thread*)
    (setf *recompute-thread* nil)))

In this snippet we create an interface to start a thread which sets a global flag. General solution should be a mailbox (or a thread-safe stream) where asynchronous thread writes and event loop reads from. We will settle with this hack though (it is a crash course not a book after all). Start recompute thread in the background before you start new application. Note, that this code is not thread-safe, we concurrently read and write to a global variable. We are also very drastic with bt:destroy-thread, something not recommended in any code which is not a demonstration like this one.

Time to refactor input and output functions: display-amazing-hello-world and get-amazing-hello-world-input.

(defun display-amazing-hello-world (window flip-flop)
  (charms:clear-window window)
  (draw-window-background window +white/blue+)
  (with-colors (window (if flip-flop
                           +white/blue+
                           +black/red+))
    (charms:write-string-at-point window "Hello world!" 0 0))
  (charms:refresh-window window))

(defun get-amazing-hello-world-input (window)
  (when *recompute-flag*
    (setf *recompute-flag* nil)
    (return-from get-amazing-hello-world-input :compute))
  (charms:get-char window :ignore-error t))

And finally improved application which takes asynchronous input without blocking.

(defun improved-amazing-hello-world ()
  (charms:with-curses ()
    (charms:disable-echoing)
    (charms:enable-raw-input)
    (start-color)
    (let ((window (charms:make-window 50 15 10 10))
          (flip-flop nil))
      (charms:enable-non-blocking-mode window)
      (display-amazing-hello-world window flip-flop)
      (loop named hello-world
         do (case (get-amazing-hello-world-input window)
              ((#\q #\Q) (return-from hello-world))
              (:compute (setf flip-flop (not flip-flop))
                        (display-amazing-hello-world window flip-flop))
              ;; don't be a pig to a processor
              (otherwise (sleep 1/60)))))))

When you are done with demo you may call stop-recompute-thread to spare your image unnecessary flipping a global variable.

Gadgets and input handling

So we have created an amazing piece of software which does the computation and reacts (instantaneously!) to our input. Greed is an amazing phenomena - we want more... We want interactive application - buttons and input box (allowing us to influence the amazing computation at run time).

First we define abstract class gadget.

(defparameter *active-gadget* nil)

;;; gadget should be type of `window' or `panel' - we are simplistic
(defclass gadget ()
  ((position :initarg :position :accessor gadget-position)))

(defgeneric display-gadget (window gadget &key &allow-other-keys)
  (:method ((window charms:window) (gadget gadget) &key)
    (declare (ignore window gadget))))

(defgeneric handle-input (gadget input &key &allow-other-keys)
  (:method (gadget input &key)
    (declare (ignore gadget input))))

In our model each gadget has at least position, display function and method for handling input. Both methods are gadget-specific with defaults doing nothing. We define also a parameter *active-gadget* which holds gadget receiving input.

handle-input returns T only if this input causes, that gadget has to be redisplayed. Otherwise it should return NIL (this is a small optimization which we will use later in main application loop).

Lets define something what we will use in our application.

(define-color-pair (+black/white+ 3) +black+ +white+) ; color for text-input (inactive)
(define-color-pair (+black/cyan+ 4) +black+ +cyan+)   ; color for text-input (active)
(defparameter *computation-name* "Hello world!")

(defclass text-input-gadget (gadget)
  ((buffer :initarg :buffer :accessor gadget-buffer)
   (width :initarg :width :reader gadget-width)))

(defun make-text-input-gadget (width x y)
  (make-instance 'text-input-gadget
                 :width width
                 :position (cons x y)
                 :buffer (make-array width
                                     :element-type 'character
                                     :initial-element #\space
                                     :fill-pointer 0)))

(defmethod display-gadget ((window charms:window) (gadget text-input-gadget) &key)
  (with-colors (window (if (eql gadget *active-gadget*)
                           +black/cyan+
                           +black/white+))
    (let ((background (make-string (gadget-width gadget) :initial-element #\space)))
      (destructuring-bind (x . y) (gadget-position gadget)
        (charms:write-string-at-point window background x y)
        (charms:write-string-at-point window (gadget-buffer gadget) x y)))))

(defmethod handle-input ((gadget text-input-gadget) input &key)
  (let ((buffer (gadget-buffer gadget)))
    (case input
      ((#\Backspace #\Rubout)
       (unless (zerop (fill-pointer buffer))
         (vector-pop buffer)))
      ((#\Return #\Newline)
       (unless (zerop (fill-pointer buffer))
         (setf *computation-name* (copy-seq buffer)
               (fill-pointer buffer) 0)))
      (#\ESC
       (setf (fill-pointer buffer) 0))
      (otherwise
       (when (ignore-errors (graphic-char-p input))
         (vector-push input buffer))))))

First gadget we define is text-input-gadget. What we need as its internal state is a buffer which holds text which is typed in the box. We care also about the string maximal width.

Moreover define colors for it to use in display-gadget (we depend on global parameter *active-gadget* what is a very poor taste). In display function we create a "background" (that wouldn't be necessary if it were a panel, abstraction defined in a library accompanying ncurses) and then at the gadget position we draw background and buffer contents (text which was already typed in text-input-gadget).

Second function handle-input interprets characters it receives and acts accordingly. If it is backspace (or rubout as on my keyboard with this terminal settings), we remove one element. If it is return (or newline) we change the computation name and empty input. escape clears the box and finally, if it is a character which we can print, we add it to the text-input (vector-push won't extend vector length so extra characters are ignored).

(defparameter *gadgets*
  (list (make-text-input-gadget 26 2 13)))

(defun display-greedy-hello-world (window flip-flop)
  (charms:clear-window window)
  (draw-window-background window +white/blue+)
  (with-colors (window (if flip-flop
                           +white/blue+
                           +black/red+))
    (charms:write-string-at-point window *computation-name* 2 1))
  (dolist (g *gadgets*)
    (if (eql g *active-gadget*)
        (display-gadget window g)
        (charms:with-restored-cursor window
          (display-gadget window g))))
  (charms:refresh-window window))

We maintain a list of gadgets which are displayed one-by-one in the window (after signalling, that computation is being performed). Previously we had hard coded "Hello world!" name but now we depend on variable *computation-name* which may be modified from the input box.

Each gadget is displayed but only *active-gadget* is allowed to modify cursor position. Other rendering is wrapped in charms:with-restored-cursor macro which does the thing name suggests. That means, that cursor will be position whenever *active-gadget* puts it (or if it doesn't modify its position - cursor will be at the end of the computation string).

(defun get-greedy-hello-world-input (window)
  (when *recompute-flag*
    (setf *recompute-flag* nil)
    (return-from get-greedy-hello-world-input :compute))
  (charms:get-char window :ignore-error t))

(defun greedy-hello-world ()
  (charms:with-curses ()
    (charms:disable-echoing)
    (charms:enable-raw-input)
    (start-color)
    (let ((window (charms:make-window 50 15 10 10))
          (flip-flop nil))
      (charms:enable-non-blocking-mode window)
      (display-greedy-hello-world window flip-flop)
      (catch :exit
        (loop
           do (let ((input (get-greedy-hello-world-input window)))
                (case input
                  (#\Dc1 ;; this is C-q
                   (throw :exit :c-q))
                  (#\Dc2 ;; this is C-r
                   (charms:clear-window charms:*standard-window*)
                   (charms:refresh-window charms:*standard-window*)
                   (display-greedy-hello-world window flip-flop))
                  (#\tab
                   (alexandria:if-let ((remaining (cdr (member *active-gadget* *gadgets*))))
                     (setf *active-gadget* (car remaining))
                     (setf *active-gadget* (car *gadgets*)))
                   (display-greedy-hello-world window flip-flop))
                  (:compute
                   (setf flip-flop (not flip-flop))
                   (display-greedy-hello-world window flip-flop))
                  (otherwise
                   (if (handle-input *active-gadget* input)
                       ;; redisplay only if handle-input returns non-NIL
                       (display-greedy-hello-world window flip-flop)
                       ;; don't be a pig to a processor
                       (sleep 1/60))))))))))

get-greedy-hello-world-input is fairly the same as get-amazing-hello-world-input for now. greedy-hello-world on the other hand handles input differently. For Quit we reserve C-q instead of q because we want to be able to type this character in the input box. We also add C-r sequence to refresh whole screen (just in case if we resize the terminal and some glitches remain). :compute is handledthe same way as it was previously. Finally if input is something else we feed it to the *active-gadget*. If handle-input returns something else than NIL we redisplay the application.

Note that if it is not redisplayed (i.e input is not handled because *active-gadget* was NIL or it didn't handle the input) we are a good citizen and instead of hogging the processor we wait 1/60 of a second. On the other hand if events come one by one we are legitimately busy so we skip this rest time and continue the event loop.

We don't have means to change which gadget is active yet. For that we have to add a new key which will be handled in the main event loop of the application #\tab. At least we wrap whole loop body in (catch :exit) to allow dynamic exit (instead of lexical one with block/return-from pair).

Start the application and make text-input-gadget active by pressing [tab]. Notice that its background changes. Now we have working input box where we can type new string and when we confirm it with [enter] string at the top will change.

We want more gadgets. For now we will settle with buttons.

(defclass button-gadget (gadget)
  ((label :initarg :label :reader gadget-label)
   (action :initarg :action :reader gadget-action)))

(defun make-button-gadget (text callback x y)
  (make-instance 'button-gadget :label text :action callback :position (cons x y)))

(defmethod display-gadget ((window charms:window) (gadget button-gadget) &key)
  (with-colors (window (if (eql gadget *active-gadget*)
                           +red/black+
                           +yellow/black+))
    (destructuring-bind (x . y) (gadget-position gadget)
      (charms:write-string-at-point window (gadget-label gadget) x y))))

(defmethod handle-input ((gadget button-gadget) input &key)
  (when (member input '(#\return #\newline))
    (funcall (gadget-action gadget))))

Each button is a gadget which (in addition to inherited position) has a label and the associated action. Active button label is drawn with yellow and red ink depending on its state. Handling input reacts only to pressing [enter]. When button action is activated gadget's action is funcall-ed (without arguments).

Modify *gadgets* parameter to have four buttons of ours and run the application (try pressing [tab] a few times to see that active widget is changing).

(defparameter *gadgets*
  (list (make-text-input-gadget 26 2 13)
        (make-button-gadget " Toggle " 'toggle-recompute-thread 30 11)
        (make-button-gadget "  Exit  " 'exit-application 40 11)
        (make-button-gadget " Accept " 'accept-input-box 30 13)
        (make-button-gadget " Cancel " 'cancel-input-box 40 13)))

(defun toggle-recompute-thread ()
  (if *recompute-thread*
      (stop-recompute-thread)
      (start-recompute-thread)))

(defun accept-input-box ()
  (handle-input (car *gadgets*) #\return))

(defun cancel-input-box ()
  (handle-input (car *gadgets*) #\esc))

(defun exit-application ()
  (throw :exit :exit-button))

We have created four buttons - Toggle starts/stops the "computation" thread which feeds us with input, Exit quits the application (that's why we have changed block to catch in greedy-hello-world main loop) and Accept / Cancel pass return and escape input to the text-input-gadget. Once again we prove that we lack a good taste because we aim at first element of *gadgets* parameter assuming, that first element is text input field.

The result looks a little like a user interface, doesn't it? Try activating various buttons and check out if text input works as desired. When you select Toggle and press [enter] computation label at the top will start / stop blinking in one second intervals.

Mouse integration

Our software has reached the phase where it is production ready. Investors crawl at our doorbell because they feel it is something what will boost their income. We are sophisticated and we have proposed a neat idea which will have a tremendous impact on the market. We have even created buttons and text-input gadgets which are the key to success. Something is still missing though. After a brainstorm meeting with the most brilliant minds of our decade we've came to a conclusion - what we miss is the mouse integration. That's crazy, I know. Mouse on a terminal? It is a heresy! Yet we believe that we have to take the risk if we want to outpace the competition. Roll up your sleeves - we are going to change the face of the modern UI design! ;-)

First we will start with abstracting things to make it easy to distribute events among gadgets. To do that we need to know where pointer event has happened. Lets assume that mouse event is composed of three parameters: button, x coordinate and y coordinate. Each gadget occupies some space on the screen (lets call it a region) which may be characterized by its bounding rectangle with [min-x, min-y] and [max-x, max-y] points.

(defgeneric bounding-rectangle (gadget)
  (:method ((gadget text-input-gadget))
    (destructuring-bind (x . y) (gadget-position gadget)
      (values x
              y
              (+ x -1 (gadget-width gadget))
              y)))
  (:method ((gadget button-gadget))
    (destructuring-bind (x . y) (gadget-position gadget)
      (values x
              y
              (+ x -1 (length (gadget-label gadget)))
              y))))

We could refactor button-gadget to have gadget-width method (that would simplify the implementation), but lets use what we have now. Having such representation of bounding rectangle we can now easily determine if cursor event occurred "over" the gadget or somewhere else.

(defun region-contains-position-p (gadget x y)
  (multiple-value-bind (x-min y-min x-max y-max)
      (bounding-rectangle gadget)
    (and (<= x-min x x-max)
         (<= y-min y y-max))))

Now distributing mouse event is as easy as iterating over all gadgets and verifying if event applies to the gadget. If it does we make such gadget active. Moreover if left mouse button is clicked we simulate [return] key to be handled by the previously defined handle-input method.

(defun distribute-mouse-event (bstate x y)
  (dolist (g *gadgets*)
    (when (region-contains-position-p g x y)
      (setf *active-gadget* g)
      (when (eql bstate charms/ll:button1_clicked)
        (handle-input g #\return))
      (return))))

Notice that we have used low-level charms interface (comparing bstate with charms/ll:button1_clicked). Other events are also defined in the charms/ll package so you may expand mouse handling interface.

Time to define display function for our enterprise application. ncurses cursor should to follow mouse movement, so displaying gadgets can't affect cursor position. To achieve that we wrap whole display function in charms:with-restored-cursor macro.

(defun display-enterprise-hello-world (window flip-flop)
  (charms:with-restored-cursor window
    (charms:clear-window window)
    (draw-window-background window +white/blue+)
    (if flip-flop
        (with-colors (window +white/blue+)
          (charms:write-string-at-point window *computation-name* 2 1))
        (with-colors (window +black/red+)
          (charms:write-string-at-point window *computation-name* 2 1)))
    (dolist (g *gadgets*) (display-gadget window g))
    (charms:refresh-window window)))

Usually terminal emulators doesn't report mouse movement (only clicks). To enable such reports print the following in the terminal output (note, that slime doesn't bind *terminal-io* to the right thing, so we use *console-io* which we have dfined at the beginning):

(format *console-io* "~c[?1003h" #\esc)

This escape sequence sets xterm mode for reporting any mouse event including mouse movement (see http://invisible-island.net/xterm/ctlseqs/ctlseqs.html). This escape sequence is usually honored by other terminal emulators (not only xterm). Without it we wouldn't be able to track mouse movement (only pointer press, release, scroll etc). This is important to us because we want to activate gadgets when mouse moves over them.

To start mouse and configure its events lets create initialization function. We configure terminal to report all mouse events including its position. After that we tell the terminal emulator to report mouse position events.

(defun start-mouse ()
  (charms/ll:mousemask
   (logior charms/ll:all_mouse_events charms/ll:report_mouse_position))
  (format *console-io* "~c[?1003h~%" #\esc))

We need to handle new event type :key-mouse (already handled types are C-q, C-r, TAB, :compute and "other" characterrs). Since event handling gets more complicated we factor it into a separate function enterprise-process-event. Note the handler-case - when mouse event queue is empty (for instance because the mouse event is masked), then function will return error. We also take into account that mouse events are reported starting at absolute terminal coordinates while our window starts at point [10,10]. Additionally we implement shift+tab sequence which on my system is reported as #\Latin_Small_Letter_S_With_Caron.

(defun get-enterprise-hello-world-input (window)
  (when *recompute-flag*
    (setf *recompute-flag* nil)
    (return-from get-enterprise-hello-world-input :compute))
  (let ((c (charms/ll:wgetch (charms::window-pointer window))))
    (when (not (eql c charms/ll:ERR))
      (alexandria:switch (c)
        (charms/ll:KEY_BACKSPACE #\Backspace)
        (charms/ll:KEY_MOUSE :KEY-MOUSE)
        (otherwise (charms::c-char-to-character c))))))

(defun enterprise-process-event (window flip-flop)
  (loop
     (let ((input (get-enterprise-hello-world-input window)))
       (case input
         (#\Dc1 ;; this is C-q
          (throw :exit :c-q))
         (#\Dc2 ;; this is C-r
          (display-enterprise-hello-world window flip-flop))
         (#\tab
          (alexandria:if-let ((remaining (cdr (member *active-gadget* *gadgets*))))
            (setf *active-gadget* (car remaining))
            (setf *active-gadget* (car *gadgets*)))
          (display-enterprise-hello-world window flip-flop))
         (#\Latin_Small_Letter_S_With_Caron ;; this is S-[tab]
          (if (eql *active-gadget* (car *gadgets*))
              (setf *active-gadget* (alexandria:lastcar *gadgets*))
              (do ((g *gadgets* (cdr g)))
                  ((eql *active-gadget* (cadr g))
                   (setf *active-gadget* (car g)))))
          (display-enterprise-hello-world window flip-flop))
         (:key-mouse
          (handler-case (multiple-value-bind (bstate x y z id)
                            (charms/ll:getmouse)
                          (declare (ignore z id))
                          ;; window starts at 10,10
                          (decf x 10)
                          (decf y 10)
                          (charms:move-cursor window x y)
                          (distribute-mouse-event bstate x y)
                          (display-enterprise-hello-world window flip-flop))
            (error () nil)))
         (:compute
          (setf flip-flop (not flip-flop))
          (display-enterprise-hello-world window flip-flop))
         (otherwise
          (if (handle-input *active-gadget* input)
              ;; redisplay only if handle-input returns non-NIL
              (display-enterprise-hello-world window flip-flop)
              ;; don't be a pig to a processor
              (sleep 1/60)))))))

To make terminal report mouse events in the intelligible way we need to call function charms:enable-extra-keys (thanks to that we don't deal with raw character sequences). We also call start-mouse.

(defun enterprise-hello-world ()
  (charms:with-curses ()
    (charms:disable-echoing)
    (charms:enable-raw-input)
    (start-color)
    (let ((window (charms:make-window 50 15 10 10))
          (flip-flop nil))
      ;; full enterprise ay?
      (charms:enable-non-blocking-mode window)
      (charms:enable-extra-keys window)
      (start-mouse)
      (display-enterprise-hello-world window flip-flop)
      (catch :exit
        (loop (funcall 'enterprise-process-event window flip-flop))))))

Our final result is splendid! We got rich :-). Source of the following asciinema video is located here.

asciicast

To finish our lisp session type in the slime REPL (quit).

Conclusions

In this crash course we have explored some parts of the cl-charms interface (windows, colors, mouse integration...) and defined an ad-hoc toolkit for the user interface. There is much more to learn and the library may be expanded (especially with regard to high level interface, but also to include supplement ncurses libraries like panel, menu and forms). Ability to run charms application in a new terminal started with run-program would be beneficial too.

Some programmers may have noticed that we have defined an ad-hoc, informally-specified, bug-ridden, slow implementation of 1/10th of CLIM. No wonder, it defines a very good abstraction for defining UI in a consistent manner and it is a standard (with full specification).

Instead of reinventing the wheel we want to plug into CLIM abstractions and use cl-charms as CLIM backend. This option will be explored in a near-term future - it will help to document a process of writing new backends. Of course such backend will be limited in many ways - that will be an opportunity to explore bugs which got unnoticed when the smallest distance is a pixel (in contrast to a reasonably big terminal character size). Stay tuned :-)

04 Feb 2018 12:00am GMT

03 Feb 2018

feedPlanet Lisp

Zach Beane: Planet Lisp is back on twitter

Years ago, Hans Hübner created the @planet_lisp twitter account and made some Lisp software to keep it updated with recent posts from Planet Lisp.

But it stopped working last summer, and he offered to let someone else run it, so I wrote new software and it should be working again.

This post is half announcement, half a test to make sure the new gateway works.

Enjoy!

03 Feb 2018 6:32pm GMT

02 Feb 2018

feedPlanet Lisp

Nicolas Hafner: Integrating Shaders and Objects - Gamedev

header
In this entry I'll describe the way in which shaders are integrated into the object system in Trial. It serves as a predecessor to the next entry, which will be about shader pipelines.

In case you're not familiar with modern graphics programming, a shader is a piece of code that runs on the GPU during a particular stage in the rendering pipeline. The customisable part of the pipeline looks as follows:

Vertex (Tesselation Control) (Tesselation Evaluation) (Geometry) Fragment

In order to run anything at all, you need to provide shaders for the vertex and the fragment stages. The vertex stage emits vertices that form triangles. The tesselation stages can then subdivide these triangles further to add detail. The geometry stage can add entirely new triangles, for instance to produce grass. The triangles are then rasterised, and for each fragment (pixel) that is drawn the fragment shader is evaluated to produce the resulting colour.

Whenever you perform any drawing command, this pipeline is run. Which particular shaders are run depends on the shader program that is bound at the time of drawing. The shader program ties together shaders for the different stages to form the complete pipeline.

Different objects in the scene you're drawing will need different shaders to produce individual effects and change drawing behaviour. However, often times a large part of the shaders will be similar or the same. As such, it seems sensible to introduce some kind of system to allow sharing bits and pieces of shader logic between objects. I'd like to do some in-depth research into how other engines approach this problem and publish a paper on it at some point. For now, here's a brief summary of the strategies I've seen so far:

The drawback with the include strategy is that the included code won't have any idea about where it's included. It limits you to only being able to include function definitions that are fully encapsulated. The drawback with the mega-shader should be obvious: you're creating a huge mess of a file. Simply copy pasting should also be obviously a bad idea since that doesn't really share anything at all.

The other issue that all of these approaches have in common is that they do not integrate with the rest of the engine's entity system at all. In almost all cases there will be some kind of inheritance system to allow reusing and combining behaviour between common components. However, all of these combination approaches won't include shaders; they still have to be handled separately.

Trial solves this in two steps. The first step is the combination of shader parts, which is solved by glsl-toolkit. This library allows you to parse GLSL code into an AST, run static analysis on it, and use that to combine different code parts logically. Namely, it will recognise shared global variables and main functions, and will automatically rewrite the code to fit together properly. Naturally, there are limits to how well it can do this, but overall it has worked pretty well so far. There's other really interesting things that could be done with glsl-toolkit, like early error checking, inlining, cleanup, and so forth.

The second step is to use the MOP to tie shader code to classes. That way multiple inheritance can be used to combine shader parts as needed, alongside the other bits of logic that are naturally combined through methods and slots. When an instance of a class is finally created, all the shader parts can then be merged together to form a complete shader program.

In Trial this is implemented through the shader-entity class. Aside from a bunch of convenience method definitions and some MOP boilerplate, this is actually not that much code. Let's take a look at how the shaders are computed, which is the most substantial part of the file.

(defmethod compute-effective-shaders ((class shader-entity-class))
  (let ((effective-shaders ())
        (inhibited (inhibited-shaders class))
        (superclasses (remove 'shader-entity-class
                              (c2mop:compute-class-precedence-list class)
                              :test-not (lambda (type class) (typep class type)))))
    ;; Check whether inhibits are effective
    (loop for (name type) in inhibited
          for super = (find name superclasses :key #'class-name)
          do (cond ((not super)
                    (warn "No superclass ~s in hierarchy of ~s. Cannot inhibit its shader ~s." name (class-of super) (class-name class))
                    (setf (inhibited-shaders class) (remove (list name type) inhibited :test #'equal)))
                   ((not (getf (direct-shaders super) type))
                    (warn "No shader of type ~s is defined on ~s. Cannot inhibit it for ~s." type name (class-name class))
                    (setf (inhibited-shaders class) (remove (list name type) inhibited :test #'equal)))))
    ;; Compute effective inhibited list
    (loop for super in superclasses
          do (setf inhibited (append inhibited (inhibited-shaders super))))
    ;; Make all direct shaders effective
    (loop for (type shader) on (direct-shaders class) by #'cddr
          do (setf (getf effective-shaders type)
                   (list shader)))
    ;; Go through all superclasses in order
    (loop for super in superclasses
          do (loop for (type shader) on (direct-shaders super) by #'cddr
                   unless (find (list (class-name super) type) inhibited :test #'equal)
                   do (pushnew shader (getf effective-shaders type))))
    ;; Compute effective single shader sources
    (loop for (type shaders) on effective-shaders by #'cddr
          do (setf (getf effective-shaders type)
                   (glsl-toolkit:merge-shader-sources
                    (loop for (priority shader) in (stable-sort shaders #'> :key #'first)
                          collect (etypecase shader
                                    (string shader)
                                    (list (destructuring-bind (pool path) shader
                                            (pool-path pool path))))))))
    (setf (effective-shaders class) effective-shaders)))

It might look a bit intimidating at first, but it's not too complicated. First we compute a list of superclasses that are also shader-entity-classes. Then we go through the list of inhibition specs, which allow you to prevent certain shaders from being inherited. This can be useful if you want to slightly alter the way the shader is implemented without having to also rewrite the other parts of functionality that come with inheritance. When going through the inhibitions, we check whether they are even applicable at all, to give the user a bit of error checking. We then combine the inhibition specs with those of all the superclasses. This is done because we always compute a fresh effective shader from all direct shaders on the superclasses, so we need to preserve inhibition from superclasses as well. Next we push all the shaders that were defined directly on the class to the effective shaders. Then we go through all the superclasses in order and, unless their shader was inhibited, push it onto the effective shaders. Finally, for each shader stage we combine all the shader parts using glsl-toolkit's merge-shader-sources. The merging is done in the order of priority of the shaders. Adding a priority number to shaders allows you to bypass the natural order by inheritance, and instead make sure your code gets included at a particular point. This can be important because merge semantics change depending on the order.

Let's look at an example of how this can be useful. I will omit code that isn't directly related to the shaders. If you want to have a closer look at how it all works, check the helpers.lisp file.

(define-shader-entity vertex-entity () ())

(define-class-shader (vertex-entity :vertex-shader)
  "layout (location = 0) in vec3 position;

uniform mat4 model_matrix;
uniform mat4 view_matrix;
uniform mat4 projection_matrix;

void main(){
  gl_Position = projection_matrix * view_matrix * model_matrix * vec4(position, 1.0f);
}")

The vertex-entity is a base class for almost all entities that have a vertex array that should be drawn to the screen. It provides a single vertex shader that does the basic vertex translation from model space to clip space. In case you're not familiar with these terms, model space means that vertex positions are relative to the model's origin. Clip space means vertex positions are relative to the OpenGL viewport (window). If we ever have an object that draws a model in some way, this class is now a good candidate for inclusion as a superclass.

(define-shader-entity colored-entity () ())

(define-class-shader (colored-entity :fragment-shader)
  "uniform vec4 objectcolor;
out vec4 color;

void main(){
  color *= objectcolor;
}")

This class is a bit smaller and simply gives you the ability to colour the resulting fragments uniformly. Note however that we don't simply set the output colour to our object's colour, but instead multiply it. This means that this class, too, can be combined with others.

(define-shader-entity textured-entity () ())

(define-class-shader (textured-entity :vertex-shader)
  "layout (location = 1) in vec2 in_texcoord;
out vec2 texcoord;

void main(){
  texcoord = in_texcoord;
}")

(define-class-shader (textured-entity :fragment-shader)
  "in vec2 texcoord;
out vec4 out_color;
uniform sampler2D texture_image;

void main(){
  out_color *= texture(texture_image, texcoord);
}")

Finally, the textured-entity class provides you with the ability to stick textures onto your vertices. To do so it needs another attribute that describes the texture coordinates for each vertex and, using the vertex shader, passes these on down the line to the fragment shader. There it does a lookup into the texture and, as with the previous entity, multiplies the output colour with that of the texture's.

Now for the interesting bit: if we want to create an object that is drawn using a vertex array, is uniformly textured, and should be tinted with a particular colour, we can simply do this:

(define-shader-entity funky-object (vertex-entity colored-entity textured-entity) ())

When we finalise the inheritance on this class, all the shaders it inherits are combined to produce single shaders as we would expect:

TRIAL> (effective-shaders 'funky-object)
(:VERTEX-SHADER "#version 330 core
layout(location = 1) in vec2 in_texcoord;
out vec2 texcoord;

void _GLSLTK_main_1(){
  texcoord = in_texcoord;
}
layout(location = 0) in vec3 position;
uniform mat4 model_matrix;
uniform mat4 view_matrix;
uniform mat4 projection_matrix;

void _GLSLTK_main_2(){
  gl_Position = (projection_matrix * (view_matrix * (model_matrix * vec4(position, 1.0))));
}

void main(){
  _GLSLTK_main_1();
  _GLSLTK_main_2();
}"
:FRAGMENT-SHADER "#version 330 core
out vec4 color;

void _GLSLTK_main_1(){
  color = vec4(1.0, 1.0, 1.0, 1.0);
}
in vec2 texcoord;
uniform sampler2D texture_image;

void _GLSLTK_main_2(){
  color *= texture(texture_image, texcoord);
}
uniform vec4 objectcolor;

void _GLSLTK_main_3(){
  color *= objectcolor;
}

void main(){
  _GLSLTK_main_1();
  _GLSLTK_main_2();
  _GLSLTK_main_3();
}")

As you can see, the individual main functions got renamed and coupled, and specifications for in and out variables that denoted the same thing got merged together. The code perhaps isn't structured as nicely as one would like, but there's always room for improvements.

This inheritance and combination of shader parts is pretty powerful. While it will probably not allow you to produce very efficient shaders, this should not be a big concern for an overwhelmingly large class of projects, especially during development. Instead, this "Lego blocks" approach to shaders and graphics functionality allows you to be very productive in the early phases, just to hammer things out.

With that covered, you should be ready for the next part, where I will go into how rendering is actually performed, and how more complicated effects and functions are composed. Hopefully I'll have that out in a couple of days.

02 Feb 2018 10:22am GMT

Quicklisp news: Download stats for January, 2018

Here are the raw Quicklisp download stats for January 2018.

20452  alexandria
17822 closer-mop
16236 cl-ppcre
15928 split-sequence
15762 babel
15131 iterate
14929 anaphora
14859 trivial-features
14580 let-plus
14136 trivial-gray-streams
13668 bordeaux-threads
12524 cffi
12215 trivial-garbage
11815 flexi-streams
11383 puri
11201 more-conditions
11063 nibbles
10266 usocket
9545 utilities.print-items
9499 trivial-backtrace
9312 esrap
9170 cl+ssl
8940 chunga
8812 cl-base64
8446 chipz
8417 cl-fad
8107 drakma
7212 cl-yacc
6937 named-readtables
6929 asdf-flv
6647 fiveam
6429 local-time
6179 ironclad
6103 parse-number
5850 closure-common
5844 cxml
5743 log4cl
5229 architecture.hooks
5037 cl-json
5023 parser.common-rules
4773 plexippus-xpath
4530 optima
4323 lift
4028 lparallel
4002 cl-clon
3882 cl-dot
3594 cxml-stp
3469 cl-interpol
3421 xml.location
3399 cl-unicode
3363 utilities.print-tree
3350 cl-store
3311 fare-utils
3305 fare-quasiquote
3135 slime
3109 inferior-shell
3098 fare-mop
2810 metabang-bind
2557 trivial-utf-8
2556 trivial-types
2544 cl-utilities
2288 uuid
2244 cl-annot
2234 cl-syntax
2232 quri
2128 trivial-indent
2118 documentation-utils
2035 cl-slice
2027 plump
2023 array-utils
2013 gettext
1998 static-vectors
1983 symbol-munger
1966 collectors
1933 arnesi
1932 md5
1920 access
1915 djula
1898 cl-locale
1893 cl-parser-combinators
1833 fast-io
1822 simple-date-time
1804 hunchentoot
1567 ieee-floats
1537 yason
1364 prove
1312 asdf-system-connections
1307 metatilities-base
1304 cl-containers
1192 osicat
1138 monkeylib-binary-data
1115 rfc2388
1041 trivial-shell
993 diff
989 postmodern
961 cl-custom-hash-table
929 parse-float
912 salza2
882 cl-sqlite
872 trivial-mimes

02 Feb 2018 12:21am GMT

01 Feb 2018

feedPlanet Lisp

Nicolas Hafner: Assets, Resources, and Loaders - Gamedev

header
Welcome to a new article series! I've decided to try to record my thoughts on various matters about game development. This will include observations, discoveries, explanations of existing systems, and proposals for improvements in related projects. Most of this will be directly related to Shirakumo's game engine Trial and the Treehouse gamedev streams. This first entry is a proposal for a new resource management system in Trial.

Any game engine needs to deal with the management of assets and resources. To eliminate any possibility for confusion, I'll define what I mean by those two terms here:

Right now Trial mushes both of these concepts into one. For a while this seemed like a sane thing to do - after all, both assets and resources encapsulate some kind of loading logic, represent some kind of thing that you would like to load/allocate into foreign memory, and let stick around until you deallocate it again.

However, as the engine grows more complex and the demands for capabilities rise, this is no longer appropriate. For instance, not every texture you need will be loaded from an image. Frame buffers, complex shader pipelines, and other systems require textures without an external image attached. This has resulted in a hybrid approach, where a texture could take a "spec" as an input which would construct a texture without loading it.

Unfortunately, textures have a lot of properties: width, height, internal format, wrapping, min-filter, max-filter, attachment, depth, mipmap levels, structure, and probably others I'm forgetting about. The spec thus grew into a very weird format that not only didn't suit all needs, but was generally a pain to deal with. Textures aren't the only area where this combination of concerns hurts either. A similar problem arises for meshes and vertex buffers and arrays. There's probably others still that I haven't come across yet.

So, now that we have established that this separation of concerns is indeed a Good Idea™, we need to move on to the actual proposal. I will formulate this using my favourite modelling language, CLOS. Here goes:

(defclass resource ()
  ())

(defgeneric allocate (resource))
(defgeneric deallocate (resource))
(defgeneric allocated-p (resource))

So far so good. The resource is a relatively opaque type, which merely gives you operations to allocate, deallocate, or test for allocation. It is so opaque on purpose, to allow it to be as generic as possible. Double allocation or double deallocation should be automatically prevented by the protocol.

(defclass foreign-resource (resource)
  ((data-pointer :initform NIL
                 :initarg :data-pointer
                 :reader data-pointer)))

Most actual resources will represent a foreign data block of some kind, for which this foreign-resource subtype is useful. For GL resources for instance, the data-pointer would return the integer naming the specific resource.

(defclass asset (resource)
  ((pool :initarg :pool :reader pool)
   (name :initarg :name :reader name)
   (input :initarg :input :accessor input)))

(defgeneric load (asset))
(defgeneric reload (asset))
(defgeneric offload (asset))
(defgeneric loaded-p (asset))

Onward to the asset, where things get a bit more complicated. Trial employs something called "asset pools" that group assets under a common name. Each pool is associated with a base pathname, from which every asset within it is derived. This allows writing libraries for Trial that ship their own assets, without having to require the user to take care to properly deploy the assets into their own game when it's time to ship. Each asset also has a unique name within the pool, and an input of some kind. The operations we can perform on an asset should be fairly self-explanatory.

Note that asset is a subtype of resource. The idea here is that, while resources are useful on their own, assets are not. Every asset is going to translate to loading into some kind of resource. Thus it makes perfect sense to retain the capabilities of the underlying resource with the associated asset type. To illustrate this a bit better, let's create an image asset.

(defclass texture (foreign-resource)
  (wrapping min-filter mag-filter anisotropy ...))

(defmethod allocate ((texture texture))
  #| Use glCreateTextures and so forth to allocate. |#)

(defclass image (texture asset)
  ())

(defmethod load ((image image))
  (allocate image)
  #| Load image data and use glTexSubImage* to transfer. |#)

In order to back an image we need textures. Textures come with all the wonderful aforementioned properties. We can encode the creation of the texture and the setting of these properties in the allocation of the texture resource. Then, we simply create a subclass of both texture and asset, inheriting all the texture properties, and gaining the ability to transfer the data into the texture on loading.

This may seem like we're back at the beginning with the mushing of the two concepts, since assets are also resources now. However, remember that the primary ache with that approach was that resources were assets. This is no longer the case. You can now create resources completely independently, and interact with their various properties and requirements in a much more sensible manner. This is in addition to the major advantage that now anything dealing with resources does not have to care whether it is passed an asset or not - regardless, it will work as expected.

By having the assets be direct instances of resources as well, we also eliminate another layer of indirection we would have to follow otherwise. If the resource had been separate, we would need a slot on the asset to track it, requiring two dereferences to reach the data-pointer contents. This might not seem like a big deal, but in tight render loops this sort of thing can add up. Ideally you'd be able to inline the data-pointer, but doing so is quite tricky. I hope to talk about my ideas for that kind of optimisation in a future entry.

I could go into a lot more detail here about the specific background details of how this asset/resource system should be implemented, particularly in the way of invariants, but I think the crux of it should be clear. Once again the fact that multiple inheritance is so easily supported by CLOS makes a very convenient, sensible, and efficient design possible.

I'll update this entry with a link to the relevant code once I have implemented it in Trial.

I have some other ideas for entries in this series ready to go, so hopefully it won't be long before another article rolls out.

01 Feb 2018 12:23am GMT

31 Jan 2018

feedPlanet Lisp

Quicklisp news: January 2018 Quicklisp dist update now available

New projects:

Updated projects: 3bgl-shader, 3d-matrices, 3d-vectors, array-utils, asd-generator, asdf-viz, aws-sign4, beast, ccldoc, cells, cepl, cepl.sdl2, cepl.sdl2-ttf, chancery, chirp, chunga, cl-ana, cl-ansi-term, cl-cognito, cl-conllu, cl-csv, cl-cut, cl-dbi, cl-digraph, cl-diskspace, cl-dot, cl-editdistance, cl-flac, cl-fond, cl-gamepad, cl-gobject-introspection, cl-gpio, cl-hamcrest, cl-ledger, cl-lzma, cl-mixed, cl-monitors, cl-mpg123, cl-oclapi, cl-online-learning, cl-opengl, cl-out123, cl-pcg, cl-random-forest, cl-readline, cl-riff, cl-sandbox, cl-smtp, cl-spidev, cl-str, cl-string-match, cl-strings, cl-svg, cl-unicode, cl-wav, clip, closer-mop, clss, clx-cursor, codex, commonqt, croatoan, crypto-shortcuts, dartsclhashtree, deeds, deferred, defpackage-plus, deploy, dissect, djula, documentation-utils, eazy-project, esrap, fare-scripts, fast-http, flare, flow, for, form-fiddle, gamebox-math, generic-comparability, glsl-spec, glsl-toolkit, graph, halftone, harmony, hu.dwim.perec, humbler, hunchensocket, inquisitor, iterate, jose, js, kenzo, lack, lambda-fiddle, lass, legit, lichat-protocol, lisp-chat, lisp-unit2, local-time, lquery, maiden, mcclim, mgl-pax, mk-string-metrics, modularize, modularize-hooks, modularize-interfaces, named-readtables, nineveh, north, overlord, papyrus, parachute, parser.common-rules, pathname-utils, piping, plump, portable-threads, postmodern, qlot, qmynd, qtools, qtools-ui, random-state, ratify, redirect-stream, remote-js, rutils, scalpl, serapeum, simple-inferiors, simple-rgb, simple-tasks, skippy-renderer, snooze, softdrink, spinneret, staple, stumpwm, the-cost-of-nothing, trees, trivial-arguments, trivial-benchmark, trivial-file-size, trivial-indent, trivial-main-thread, trivial-mimes, trivial-thumbnail, trivial-timeout, trivial-update, trivial-ws, ubiquitous, unix-opts, varjo, verbose, websocket-driver, with-c-syntax, wuwei, xml.location, yaclml.

Removed projects: blackthorn-engine, gamebox-grids, linedit, software-evolution, squirl, tbnl.

The latest removed projects are due to author request or author neglect - if a project you need has been removed, maybe a new maintenance arrangement can be worked out, get in touch!

To get this update, use (ql:update-dist "quicklisp").

Enjoy!

31 Jan 2018 10:09pm GMT

Vsevolod Dyomkin: Minimal Perfect Hash-Tables in Common Lisp

Recently, my twitter pal @ifesdjeen wrote a line that resonated with me: "Looks like it's easier for people to read 40 blog posts than a single whitepaper." And although he used it in a negative context, I recognized it as a very precise (and, actually, positive) description of what a research engineer does: read a whitepaper (or a dozen, for what it's worth) and transform it into working code and - as a possible byproduct - into a blog post that other engineers will understand and be able to reproduce. I'm in the business of reproducing papers for about 7 years now, and, I believe, everyone with the same experience will confirm that it's not a skill every engineer should be easily capable of. Not all papers even can be reproduced (because the experiment was just not set up correctly), and of those, which, in principle, can be, only some are presented in the form that I can grasp well enough to be able to program them. And, after all, working code is an ultimate judge of your understanding.

But I digressed... :) This post is meant to explain (in simple "engineering" terms) the concept of minimal perfect hash-tables and how I've recently implemented one of their varieties to improve the memory efficiency of my language identification library WILD. Uncovering, in the process, a more abstract concept behind it.

The Justification of Perfect Hash-Tables

Minimal perfect hash-tables are persistent data structures that solve 2 basic deficiencies of normal hash-tables: they guarantee both constant (not only amortized) O(1) time for collision-free key-based access, while no reserved space is required (which for hash-tables may be in the range of 20%-50% of its size). It comes at a cost of two restrictions: the table should be filled with a keyset known ahead-of-time, and the process of building one takes longer than for a usual hash-table (although, for many methods, it can still be bounded by amortized O(1) time).

From my point of view, the main advantage of perfect HTs is the possibility to substantially reduce memory usage, which is important in such use cases as storing big dictionaries (relevant to many NLP tasks). Moreover, the space requirements can be lowered even more if the whole keyset is stored in the table, i.e. there can be no misses. Under such constraints, unlike with normal hash-tables, which still require storing the keys alongside the values due to the need to compare them in cases of hash collisions, in a perfect HT they can be just omitted. Unfortunately, such situation is rarely the case, but if some level of false positives is permitted, with the help of additional simple programmatic tricks, the memory usage for keys can be also reduced by orders of magnitude.

Hash-tables on their own are the trickiest among the basic data structures. But, in terms of complexity, they pale in comparison to minimal perfect hash-tables, which belong to the advanced algorithms family. One reason for that is that perfect hash-tables require more "moving parts", but the main one is, probably, that there's no common well-known algorithm to distribute keys in a perfect manner. And reading many perfect hash-table papers will scare away most programmers, at least it did me. However, after some research and trial-and-error, I've managed to find the one that presents a simple and scalable approach. So, I'm going to explain it in this post.

Hash-tables in SBCL

If you take a look at some particular hash-table implementation, your mental model of what a hash-table is may be quite seriously shaken. A straightforward open addressing table will assume a vector of key-value pairs as an underlying concrete structure, filled as the table is modified. On a 64-bit system, it will require: (8 × 1.5 + 8 × 16) × entries-count + total size of all keys + total size of all values + constant size of metadata. The 8 × 1.5 bytes in the formula is storage needed to hold a pointer to a hash-table entry including an average 33% extra overhead, the additional 8 × 16 bytes per entry will be spent on a cons cell (if we use this efficient although antiquated way to represent pairs in Lisp). It should be noted that depending on the size of keys and values the first term may be either a significant (up to half of the total size) or a negligible part of the table's memory profile.

However, this is not how SBCL implements hash-tables. See for yourself. Let's create a random hash-table with approximately 1000 keys:


> (defparameter *ht*
(pairs->ht (loop :repeat 1000
:collect (pair (fmt "~{~C~}"
(loop :repeat (random 10)
:collect (code-char (+ 65 (random 50)))))
(random 1000)))
:test 'equal))

And inspect its contents:


> (inspect *ht*)

The object is a STRUCTURE-OBJECT of type HASH-TABLE.
0. TEST: EQUAL
1. TEST-FUN: #
2. HASH-FUN: #
3. REHASH-SIZE: 1.5
4. REHASH-THRESHOLD: 1.0
5. REHASH-TRIGGER: 1024
6. NUMBER-ENTRIES: 844
7. TABLE: #(#{EQUAL
"plCjU" 985
"VVYZqKm[" 861
"N\\" 870
"fqfBdZP\\" 364
"cHNjIM]Y" 804
"p_oHUWc^" 630
"CqGRaMH" 408
"NecO" 636
"QDBq" 380
"M" 838
...
}
0 "plCjU" 985 "VVYZqKm[" 861 "N\\" 870 "fqfBdZP\\" 364 ...)
8. NEXT-WEAK-HASH-TABLE: NIL
9. %WEAKNESS: 0
10. NEXT-FREE-KV: 845
11. CACHE: 1688
12. INDEX-VECTOR: #(0 617 332 393 35 0 194 512 0 420 ...)
13. NEXT-VECTOR: #(0 72 0 0 253 499 211 0 349 488 ...)
14. HASH-VECTOR: #(9223372036854775808 1830284672361826498 3086478891113066655
24243962159539 2602570438820086662 2431530612434713043
4568274338569089845 3085527599069570347 2374133389538826432
3322613783177911862 ...)
15. LOCK: #
16. SYNCHRONIZED-P: NIL

As the fog of initial surprise clears, we can see that instead of a single vector it uses 4! The keys and values are placed in the one called TABLE (that starts with a reference to the hash-table object itself). Note that storing the entries as pairs is redundant both in terms of memory and access (additional dereferencing). That's why an obvious optimization is to put them directly in the backing array: so the keys and values here are interleaved. One more indirection that may be removed, at least in Lisp, is "unboxing" the numbers, i.e. storing them immediately in the array avoiding the extra pointer. This may be an additional specialized optimization for number-keyed hash-tables (to which we'll return later), but it is hardly possible with the interleaved scheme.

Perhaps, the biggest surprise is that the entries of the TABLE array are stored sequentially, i.e. there are no gaps. If they were to be added randomly, we'd expect the uniform distribution of 845×2 unique keys and corresponding values over the 2048-element array. Instead, this randomness is transferred to the 1024-element integer INDEX-VECTOR: another level of indirection. Why? For the same reason yet another 1024-element array is used - a HASH-VECTOR, which stores the values of hashes for all the table keys: efficient resizing. Although, it may seem that the biggest overhead of a hash-table is incurred due to reserved space for new keys, in fact, as we have now learned, resizing is a much heavier factor. Especially this relates to the HASH-VECTOR: if the hashes of all keys would not have been stored, each time the table got resized they would have had to be recalculated anew: an intolerable slowdown for larger tables. Having a separate INDEX-VECTOR is not so critical, but it provides two additional nice capabilities. Resizing the table is performed by adjusting the vectors' lengths (an optimized operation) without the need to redistribute the entries. Besides, unlike in theory, we can iterate this table in a deterministic order: the one of keys addition into the table. Which comes quite handy, although SBCL developers don't want to advertise this as a public API as it may restrict potential future modifications to the data structure. There's also a NEXT-VECTOR that is used to optimize insertion.

Overall, we can see that a production-optimized hash-table turns out to be much heavier than a textbook variant. And, indeed, these optimizations are relevant, as SBCL's hash-table are quite efficient. In my experiments several years ago, they turned out to be 2-3 times faster than, for instance, the CCL ones. Yet, it's clear that the speed optimizations, as usual, come at a cost of storage deoptimization. To restore storage sanity, we could fall back to the basic hash-table variant, and it's a possible solution for some cases, although a mediocre one, in general. Neither will it be the fastest, nor fully optimized.

Building an MPHT

Most of space in the SBCL's hash-table implementation is spent on metadata and keys, not on values. Yet, if we introduce a restriction that the table cannot be altered - no new values added after initial construction and no resizing possible - most of those optimizations and connected storage become redundant. An ideally space-efficient data structure is an array, but it doesn't allow key-based access. In theory, minimal perfect hash-tables are arrays with a minimal amount of metadata overhead. Yet, the precise amount is dependent on the algorithm, and there's still new research improving them going on. Overviewing all the existing approaches (most of which I don't fully understand) is beyond the scope of this post.

Besides, MPHTs require additional calculations at access time compared to normal hash-tables. So, if a hash-table is well-optimized it will usually be faster than and MPH.

And still, MPHs require a non-negligible amount of additional space for metadata: for the algorithm that will be discussed it's around 2 bytes per key. The best algorithms in the literature claim to reduce this amount more than 4 times to less than 4 bits per key. However, for now, I have picked the simpler approach, since this difference is not critical considering that every value in the table, on a 64-bit machine, occupies at least 16 bytes (a pointer plus the data), so an overhead of 2 bytes versus 0.5 bytes (that will probably be rounded to 1 byte) is already negligible.

Now, let's think of how we can distribute the keys in a hash-table so that there are no collisions and the backing array has the same number of elements as the number of hash-table keys. Theoretically, as the set of keys is known before the creation of the table, we can find a hash function that produces such distribution. Unfortunately, due to the Birthday paradox, it may be a long search. The algorithms for MPHTs suggest ways of structuring it. A good algorithm should have at most O(n) complexity as, otherwise, it will be infeasible for large keysets, which are the main use case for perfect hash-tables.

The algorithm I will briefly describe now was suggested by Botelho and Ziviani. It is a 2-step process:

Here you can see one of the possible labelings (each edge corresponds to a key and its unique index, each vertex - to the value for each of the possible Jenkins hashes):

Now, the per-bucket hash-function can be reconstructed from an array of numbers (in the range 0-255) associated with each possible hash. The divisor of the hash is twice the number of keys in the bucket, though it can be any number above the number of keys: the greater the divisor, the more space is used for storing the numbers (the divisor is the length of an array) and the less time it takes to find an acyclic graph.

The algorithm works quite fast: on my laptop, it takes 8 seconds for the table of 725 359 character trigrams from a mid-size language identification model and 13 seconds for 1 236 452 words from the same model.

To sum up, this is how to find the index of an element (bytes argument) in our perfect hash-table:


(defun csindex (bytes cstab)
(with ((mod (/ (1- (length @cstab.meta)) 2)) ; divisor of the top-level hash
(hash (* (mod (jenkins-hash bytes) mod) 2)) ; determine the bucket
(off (? @cstab.meta hash))
(seed (? @cstab.meta (1+ hash))) ; each bucket has its own Jenkins seed
(mod2 (- (? @cstab.meta (+ 2 hash)) off))
(b c (jenkins-hash2 bytes seed (* mod2 2)))
(goff (* 2 off)))
;; bucket offset + in-bucket hash
(+ off (mod (+ (? gs (+ goff b))
(? gs (+ goff c)))
mod2))))

Note, that, in the code for this article, I refer to perfect hash tables as cstabs for the reasons described in the end.

Efficiency In Practice

So, let's now examine the memory efficiency of this method. Thankfully, just recently the SBCL developers started working on a critical for every algorithmic developer missing piece in the SBCL API: a function to determine the size in memory occupied by an arbitrary object. As we know from the famous koan, "LISP programmers know the value of everything and the cost of nothing". Indeed, from a certain point of view, this applies to SBCL. Although we, now, have a rough tool at our disposal that patches this hole... ;)

Using this unofficial function, we can roughly calculate the space occupied by the character trigrams hash-table mentioned above:


> (let ((ht (? wild:*lang-detector* '3gs)))
(+ (object-size ht)
(object-size @ht.table)
(object-size @ht.index-vector)
(object-size @ht.hash-vector)
(reduce '+ (map 'list
(lambda (obj)
;; the keys of the table are strings
;; and values -- alists of a symbol and a float
(reduce '+ (map 'list ^(if (listp %)
(sum 'object-size %)
(object-size %))
@ht.table)))))))
102856432

100 MB!


> (let ((ct (build-const-table (? wild:*lang-detector* '3gs))))
(+ (object-size ct)
(object-size @ct.gs)
(object-size @ct.meta)
(reduce '+ (map 'list ^(sum 'object-size %)
@ct.data))))
53372880

I.e., we have reduced the memory usage almost twice. Because all the metadata now occupies just 1479808 bytes or 1,5 MB.

One critical decision that allows for such drastic memory-use improvement is omitting the keys from the table. It should be noted that adding them back is trivial: (defstruct (keyed-pht (:include const-table)) (keys nil :type simple-vector) (test 'eql)), for which getting the value will work like this:


(defun csget (key cstab)
(let ((idx (csindex (bytes key) cstab)))
(when (call @cstab.test key (svref @cstab.keys idx))
(values (svref @cstab.data idx)
t)))

However, this will, basically, return us to the same ballpark as a textbook hash-table implementation in terms of memory usage while loosing in terms of speed. Yet, if we allow for some controlled level of false positives, there are a few algorithmic tricks that can be used to once again make keys almost negligible.

The first one is really simple and straightforward: replace the vector of keys with the vector of their hashes. In particular, if we take a single byte of the hash, such array will be 10-100 times smaller than the generic keys array and produce an FP-rate of 1/255.

Another trick will be to use a Bloom filter. For instance, a filter with 0.1 FP-rate for all the trigrams from our language identification model will require just around 0.4 MB of storage compared to 0.7 MB needed to store the 1-byte hashes and 30 MB needed to store just the keys. The disadvantage of a Bloom filter, however, is slower processing time: for the mentioned 10% FP-rate we'll need to perform 4 hash calculations, and if we'd like to reach the same 0.3% rate as the 1-byte hash array we'd have to increase the memory usage to 1MB and perform 8 hash calculations.

Finally, it should be noted that the main motivator for this work was reducing the memory footprint of my language identification library, for, to be widely adopted, such kind of project should be very frugal in its memory usage: 10-30 MB is its maximum polite footprint. By switching to a perfect hash-table, we haven't reached that goal (at least, for this particular model), but there's also plenty of room for optimizing values memory usage that I will return to later.

Const-tables

The other motivator for this work, in general, was my interest in the topic of efficient "static" data structures. In this context, I feel that the notion of a perfect hash table doesn't fully capture the essential features of the data structures class we're dealing with. First of all, the main distinguishing factor is static vs dynamic usage. A hash-table is thought of as a dynamic entity while our data structure is primarily a static one. Next, hashing is, for sure, involved in constructing all these structures, but it's only part of a bigger picture. The way of structuring the metadata, in particular, the index arrays may differ greatly not only between perfect and usual hash-table, but also within the ordinary hash-table class - within the different implementations.

So, I decided to come up with a different name for this data structure - a const-table (or cstab). It defines a broad class of persistent data structures that allow constant-time key-based access and, ideally, efficiently store the keys. The implementation, described here, is released as the library with the same name. It is still in its infancy, so major changes are possible - and suggestions on its improvement are also welcome.

31 Jan 2018 4:49pm GMT