10 Aug 2017

feedPlanet Lisp

Quicklisp news: July 2017 Quicklisp download stats

Here are the raw download stats for the top 100 projects in Quicklisp for July:

11470  alexandria
8732 babel
8521 closer-mop
7779 split-sequence
7534 trivial-features
7197 cffi
7170 iterate
7095 cl-ppcre
7061 bordeaux-threads
6863 trivial-gray-streams
6526 anaphora
6062 flexi-streams
5589 cl+ssl
5588 trivial-garbage
5327 trivial-backtrace
5122 let-plus
5071 nibbles
4811 cl-fad
4648 usocket
4353 puri
4124 cl-base64
4119 drakma
4107 local-time
4006 named-readtables
3949 chunga
3661 chipz
3201 ironclad
3164 esrap
3058 cl-unicode
3043 cl-interpol
3010 cl-yacc
2807 more-conditions
2789 md5
2526 utilities.print-items
2523 fiveam
2511 asdf-flv
2472 log4cl
2250 slime
2198 parse-number
2178 trivial-types
2154 trivial-indent
2152 cl-annot
2122 trivial-utf-8
2113 cl-syntax
1969 array-utils
1914 cl-json
1913 gettext
1894 symbol-munger
1882 plump
1875 arnesi
1826 collectors
1825 cl-slice
1805 access
1794 djula
1767 cl-locale
1766 cl-parser-combinators
1742 cl-utilities
1732 metabang-bind
1695 lift
1668 cl-containers
1666 asdf-system-connections
1664 optima
1662 metatilities-base
1633 quri
1631 hunchentoot
1599 simple-date-time
1567 lparallel
1566 fast-io
1562 uuid
1531 cl-clon
1461 bt-semaphore
1438 trivial-mimes
1437 closure-common
1421 cxml
1409 static-vectors
1406 mcclim
1327 clack
1322 cl-vectors
1281 ieee-floats
1220 salza2
1197 fast-http
1165 clx
1160 fare-utils
1116 fare-quasiquote
1114 lack
1105 architecture.hooks
1087 prove
1087 cl-colors
1057 uffi
1040 cl-ansi-text
997 inferior-shell
997 fare-mop
991 postmodern
979 rfc2388
978 proc-parse
961 quicklisp-slime-helper
942 pythonic-string-reader
940 xsubseq
940 plexippus-xpath
934 cl-jpeg

10 Aug 2017 7:40pm GMT

29 Jul 2017

feedPlanet Lisp

Timofei Shatrov: Your personal DIY image search

Hi everyone, it's been a while! I bet you forgot this blog even existed. I happen to be a big supporter of quality over quantity, so while my work on parsing Japanese counters earlier this year was pretty interesting, I already wrote way too many articles about Ichiran/ichi.moe so I decided to keep it to myself. Recently I've been working on a little side-project and now that it finally works, I think it deserves a full-fledged blog post.

For a bit of a nostalgia trip, let's go back to the early 00s. Remember when TinEye first appeared? It was amazing. For the first time you could easily find where that one image you once saved from some random phpBB forum is really from. It didn't matter if your image was resized, or slightly edited from the original, it still worked. That shit was magic, my friends. Of course these days nobody is impressed by this stuff. Google Image Search indexes pretty much anything that exists on the Internet and even uses neural networks to identify content of an image.

Back to the present day. I discovered I have an image hoarding problem. Over the years of using the Intertubes, I have accumulated a massive number of images on my hard drive. When I see an image I like my first thought is "do I have this one saved already?" because how could I possibly remember? At this point I need my own personal Google Image Search. And (spoiler alert) now I have one.

First of all, I needed an actual image matching technology. These days the cloud is all the rage, so I definitely wanted to have this thing running in the cloud (as opposed to my local PC) so that I could search my images from anywhere in the world. After a cursory search, my eyes fell on a thing called Pavlov Match which runs from a Docker container, so should be pretty easy to install. I installed docker and docker-compose on my VPS, and then git-cloned Match and ran make dev according to instructions. This will actually run an Elasticsearch instance on the same VPS, and apparently the damn thing eats memory for breakfast, at least with the default settings. I'm using a cheap 2GB RAM Linode, so the memory is actually a very finite resource here, as I will find out later. The default settings will also completely expose your match installation AND elasticsearch to the world. But don't worry, I figured this out so that you don't have to. Let's edit docker-compose.yml from match repository as follows:

version: '2'
services:
  match:
    image: pavlov/match:latest
    ports:
    - 127.0.0.1:8888:8888
    command: ["/wait-for-it.sh", "-t", "60", "elasticsearch:9200", "--", "gunicorn", "-b", "0.0.0.0:8888", "-w", "4", "--preload", "server:app"]
    links:
    - elasticsearch
  elasticsearch:
    image: elasticsearch
    environment:
      - "ES_JAVA_OPTS=-Xms256m -Xmx256m"
      - bootstrap.mlockall=true
    expose:
    - "9200"

This will make match server only available on local network within the VPS on port 8888, and elasticsearch only available to these two docker containers. It will also restrict elasticsearch RAM consumption to 512mb and --preload flag reduces the amount of memory gunicorn workers consume.

To make match server available from outside I recommend proxying it through nginx or some other proper web server. You can also add authentication/IP whitelist in nginx because the match server has no authentication features whatsoever, so anyone will be able to search/add/delete the data on it.

That was the backend part. No programming required here! But this is a Lisp blog, so the next step is writing a Lisp client that can communicate with this server. The first step is reading the match API documentation. You might notice it's a bit… idiosyncratic. I guess REST is out of fashion these days. Anyway, I started implementing a client using the trusty drakma, but I quickly hit a limitation: match expects all parameters to be sent encoded as form data, but drakma can only encode POST parameters as form data and not, say, DELETE parameters. Not to be foiled by a badly designed API, I tried dexador, and while dex:delete does not encode parameters as form data, dex:request is flexible enough to do so. Each response (a JSON string) is parsed using jsown.

(defun parse-request (&rest args)
  (when *auth*
    (setf args `(,@args :basic-auth ,*auth*)))
  (multiple-value-bind (content return-code)
      (handler-bind ((dex:http-request-failed #'dex:ignore-and-continue))
        (apply 'dex:request args))
    (cond
      ((<= 400 return-code 499)
       (jsown:new-js
        ("status" "fail")
        ("error" content)
        ("code" return-code)))
      (t (let ((obj (jsown:parse content)))
           (jsown:extend-js obj ("code" return-code)))))))

(defun add-local (file &key path (metadata "{}"))
  "Add local image to Match server"
  (parse-request
   (api-url "/add")
   :method :post
   :content `(("image" . ,(pathname file))
              ("filepath" . ,(or path file))
              ("metadata" . ,metadata))))

With this basic client in place, I can add and delete individual images, but it would be incredibly cumbersome to manage thousands of images with it. I had to write some code that would scan specified directories for images, track any changes and then add/update/delete information from Match server as needed. I already wrote something like this before, so this was pretty easy. Of course SBCL's "sb-posix:stat doesn't work on Unicode filenames" bug has reared its head again, but I already knew the workaround. This time I completely relied on UIOP for recursively walking directories (uiop:subdirectories and uiop:directory-files are your friends). Each image file is represented as CLOS object and saved into a hash-table which is serialized to a file using CL-STORE. The object has a status attribute which can be :new, :update, :delete, :ok and so on. Based on status, an action needs to be performed, such as uploading an image to Match server (for :new and :update).

Now, I could just send a bunch of requests one after another, but that would be a waste. Remember, we have 4 gunicorn workers running on our server! This clearly calls for a thread pool. I thought PCALL would be perfect for this, but nope. It uses sb-thread:interrupt-thread which is incredibly unsafe and the result is that you basically can't safely make http requests from thread workers. Debugging this took way too much time. In the end, I implemented a thread pool based on lparallel promises which is kind of an overkill for such a simple use case, but at least it worked.

  (setf *cache* (update-cache))
  (let ((lparallel:*kernel* (lparallel:make-kernel threads)))
    (unwind-protect
         (loop for value in (alexandria:hash-table-values *cache*)
            collect (worker value) into futures
            finally (map nil 'lparallel:force futures))
      (lparallel:end-kernel)))
  (save-cache *cache*))

Note that you must be very careful when doing things that affect global state inside the threads. For example :delete action removes a key from the hash table *cache*. This is not guaranteed to be an atomic operation, so it's necessary to grab a global lock when doing it.

(defvar *cache-lock* (bordeaux-threads:make-lock "match-cache-lock"))
...
   (bordeaux-threads:with-lock-held (*cache-lock*)
      (remhash key *cache*))

Printing messages to REPL from inside threads also requires a separate lock and (force-output), otherwise it will look like a complete mess!

(defun format-msg (str &rest args)
  (bordeaux-threads:with-lock-held (*msg-lock*)
    (terpri)
    (apply 'format t str args)
    (force-output)))

Now that the required functionality is implemented, it's time to test upload a bunch of stuff… and get back a bunch of errors. It took some sleuthing to discover that gunicorn workers of my Match server are routinely getting killed by "OOM killer". Basically, the server runs out of memory and the system in desperation kills a process that it doesn't like. Remember, I only have 2Gb of memory there!

I figured out that it's images with very large dimensions that are the most problematic in terms of memory usage. If I were to resize these images to some reasonable size, the matching should still work pretty well. In order to execute this plan, I thought I'd use some Lisp to ImageMagick interface. There's in fact a pure Lisp solution called OptiCL but would it really handle any image? Remind me to test that later! Anyway, back to ImageMagick. Neither lisp-magick nor lisp-magick-wand would work with the most recent ImageMagick version (seems its API has changed a bit). However the last one I tried cl-graphicsmagick, which uses a fork of ImageMagick called GraphicsMagick, has unexpectedly worked (at least on my Windows laptop. Note that you need to install Microsoft Visual C Redistributable 2008 otherwise the library wouldn't load with CFFI) so I went with that.

Using very useful temporary files functionality of UIOP (uiop:with-temporary-file), I resize each oversized image to reasonable dimensions and save into a temporary file, which is then uploaded to Match server. I also send the file's original and resized dimensions as metadata. Thankfully this completely eradicated the memory issue. There's a minor problem where GraphicsMagick cannot do Unicode pathnames on Windows, so I copy the original image into a temporary file with ASCII-only name in that case.

(defun resize-image (input-path output-path
                     &key (max-width *max-dimension*) (max-height *max-dimension*)
                       (filter :%QuadraticFilter) (blur 1))
  (gm::with-magick-wand (wand)
    (handler-case (gm::%MagickReadImage wand input-path)
      ;; graphicsmagick cannot read Unicode filenames on Windows so attempt to load a copy
      (gm::magick-error ()
        (uiop:with-temporary-file (:pathname tmp :prefix "gm" :type (pathname-type input-path))
          (uiop:copy-file input-path tmp)
          (setf wand (gm::%NewMagickWand))
          (gm::%MagickReadImage wand (namestring tmp)))))
    (let ((w (gm::%MagickGetImageWidth wand))
          (h (gm::%MagickGetImageHeight wand))
          (res nil))
      (multiple-value-bind (fw fh) (gm::fit-width-height w h max-width max-height)
        (unless (and (= w fw) (= h fh))
          (gm::%MagickResizeImage wand fw fh filter blur)
          (gm::%MagickWriteImage wand output-path)
          (setf res output-path))
        (values res w h fw fh)))))

Later I tested this code on an Ubuntu machine with GraphicsMagick installed from Apt repository and SBCL crashed into ldb debugger mode straight away… Welp. The helpful folks of #lisp told me the problem is with signal handlers established by GraphicsMagick library, somehow they confuse SBCL. Based on that advice, eventually I succeeded making this work. Uninstall apt Graphicsmagick and grab the sources. Find the file called magick.c and replace the line

InitializeMagickSignalHandlers(); /* Signal handlers */

with

// InitializeMagickSignalHandlers(); /* Signal handlers */

(commenting it out). Then do configure --enable-shared (see readme for possible options), make and sudo make install. This will make it work when called from SBCL on Linux.

Anyways, the full code of MATCH-CLIENT can be found at my Github. It's not installable from quicklisp for obvious reasons, in fact it's a complete pain to install as you might've already guessed, but if you wanna try it, you're welcome. The main two commands are update and match. The first is called to upload all images in your *root-dirs* to the server and then to update them if anything changes. match is used to match any image on the Internet (passed as URL string) or a local pathname (passed as pathname object) compared to the server. It returns a list of jsown objects (basically alists) that contain score (up to 100 for exact match), path (with "local tag" which can be different per device) and metadata containing original and resized dimensions.

((:OBJ ("score" . 96.00956)
  ("filepath" . "[HOME] d:/foo/bar/baz.jpg")
  ("metadata" :OBJ ("rw" . 1218) ("rh" . 2048) ("w" . 3413) ("h" . 5736))))

Anyway, this was a fun (although often frustrating) thing to build and ended up being quite useful! Thanks for reading and see you next time.

29 Jul 2017 6:47pm GMT

McCLIM: Progress report #9

Dear Community,

McCLIM code is getting better on a weekly basis depending on developer time. We are happy to see the project moving forward.

Some highlights for this iteration:

Moreover many bug fixes have been proposed and merged into the codebase.

All McCLIM bounties (both active and already solved) may be found here. Default bounty expiration date is 6 months after posting it (a bounty may be reissued after that time period).

To answer recurring requests for native Windows and OSX support, we have posted bountes for finishing the Windows backend and fixing the OSX backend. Moreover, to improve portability a bounty for closer-mop support has been posted.

Bounties solved this iteration:

Active bounties ($1700):

Our current financial status is $800 for bounties and $267 recurring monthly contributions from the supporters (thanks!).

Suggestions as to which other issues should have a bounty on them are appreciated and welcome. Please note that Bountysource has a functionality "Suggest an Issue" which may be found on the bounties page. If you would like to work on an issue that is not covered by the existing bounties, feel free to suggest a new bounty.

If you have any questions, doubts or suggestions - please contact me either by email (daniel@turtleware.eu) or on IRC (my nick is jackdaniel).

Sincerely yours,
Daniel Kochmański

29 Jul 2017 1:00am GMT