23 Oct 2014

Lispjobs: Senior Software Engineer: D-Wave, Burnaby, British Columbia, Canada

D-Wave is looking for exceptionally motivated people who love to see the impact of their work on a daily basis, who will do whatever it takes to ensure the success of the company, and who want to be a part of something special.

D-Wave is working to radically change what it is possible to do with computers. Our mission is to integrate new discoveries in physics and computer science into breakthrough approaches to computation. We are committed to commercializing quantum computers. The company's flagship product, the D-Wave Two, is built around a novel type of superconducting quantum processor. D-Wave Two systems are currently in use by on-line customers and by customers in the field such as NASA and Google.

Position:

D-Wave is seeking an experienced Software Developer to join the Processor Development group. The successful candidate will work closely with physicists to develop and optimize measurement routines used to calibrate D-Wave's quantum processor. You will be self-driven, but comfortable working closely with others. You will share responsibility for designing, implementing, testing and maintaining the suite of software necessary to support the testing and operation of D-Wave's quantum computing hardware. The software is implemented in Common Lisp (SBCL) and is an integral part of the quantum computing system. It is used for a variety of purposes including calibration, operation, testing and benchmarking.

Responsibilities:

  • Work closely with physicists and other software engineers to develop any and all aspects of quantum processor calibration, operation infrastructure, performance optimization and profiling
  • Analyze and optimize existing software and newly developed routines for performance and reliability. Develop and support software related to architecture, usage of libraries and functions, the best ways of solving a problem or implementing a new feature, and layer efficiency and performance optimization
  • Software development, support, and troubleshooting of systems hardware, including fridge control and processor electronics
  • Full life-cycle support of software products from development, test and validation, production deployment, through to decommissioning
  • Configuring and upgrading quantum processor control servers and software development servers

Qualifications:

  • Master's degree in Computer Science with 4+ years of relevant experience, or a Bachelor's degree and 8+ years of experience.
  • Experience developing and optimizing software in compiled languages; ability to consider both algorithm choice and how your code is compiled when tuning performance.
  • At least 4 years of professional software development experience including software design, code and test, and maintenance
  • Familiarity with Common Lisp is a definite asset
  • Comfortable working alongside one or two other scientists or software engineers, such as in a pair-programming setting

    We thank all applicants for their interest; however, only those who are selected for interviews will be contacted. It is D-Wave Systems Inc.'s policy to provide equal employment opportunity (EEO) to all persons regardless of race, color, religion, sex, national origin, age, sexual orientation, genetic information, physical or mental disability, protected veteran status, or any other characteristic protected by federal, state/provincial, or local law.

Contact:

Lindsay Andrea <landrea@dwavesys.com>

Talent Acquisition Specialist, Human Resources

D-Wave Systems Inc.

http://www.dwavesys.com

604.630.1428 Ext. 119


23 Oct 2014 1:05am GMT

20 Oct 2014

Paul Khuong: Performance Tuning ~ Writing an Essay

Skip to the meaty bits.

My work at AppNexus mostly involves performance optimisation, at any level from microarchitecture-driven improvements to data layout and assembly code to improving the responsiveness of our distributed system under load. Technically, this is similar to what I was doing as a lone developer on research-grade programs. However, the scale of our (constantly changing) code base and collaboration with a dozen other coders mean that I approach the task differently: e.g., rather than single-mindedly improving throughput now, I aim to pick an evolution path that improves throughput today without imposing too much of a burden on future development or fossilising ourselves in a design dead-end. So, although numbers still don't lie (hah), my current approach also calls for something like judgment and taste, as well as a fair bit of empathy for others. Rare are the obviously correct choices, and, in that regard, determining what changes to make and which to discard as over-the-top ricing feels like I'm drafting a literary essay.

This view is probably tainted by the fact that, between English and French classes, I spent something like half of my time in High School critiquing essays, writing essays, or preparing to write one. Initially, there was a striking difference between the two languages: English teachers had us begin with the five paragraph format where one presents multiple arguments for the same thesis, while French teachers imposed a thesis/antithesis/synthesis triad (and never really let it go until CÉGEP, but that's another topic). When I write that performance optimisation feels like drafting essays, I'm referring to the latter "Hegelian" process, where one exposes arguments and counterarguments alike in order to finally make a stronger case.

I'll stretch the analogy further. Reading between the lines gives us access to more arguments, but it's also easy to get the context wrong and come up with hilariously far-fetched interpretations. When I try to understand a system's performance, the most robust metrics treat the system as a black box: it's hard to get throughput under production data wrong. However, I need finer grained information (e.g., performance counters, instruction-level profiling, or application-specific metrics) to guide my work, and, the more useful that information can be - like domain specific metrics that highlight what we could do differently rather than how to do the same thing more efficiently - the easier it is to measure incorrectly. That's not a cause for despair, but rather a fruitful line of skepticism that helps me find more opportunities.

Just two weeks ago, questioning our application-specific metrics led to an easy 10% improvement in throughput for our biggest consumer of CPU cycles. The consumer is an application that determines whether internet advertising campaigns are eligible to bid on an ad slot, and if so, which creative (ad) to show and at what bid price. For the longest time, the most time-consuming part of that process was the first step, testing for campaign eligibility. Consequently, we tracked the execution of that step precisely and worked hard to minimise the time spent on ineligible campaigns, without paying much attention to the rest of the pipeline. However, we were clearly hitting diminishing returns in that area, so I asked myself how an adversary could use our statistics to mislead us. The easiest way I could think of was to have campaigns that are eligible to bid, but without any creative compatible with the ad slot (e.g., because it's the wrong size or because the website forbids Flash ads): although the campaigns are technically eligible, they are unable to bid on the ad slot. We added code to track these cases and found that almost half of our "eligible" campaigns simply had no creative in the right size. Filtering these campaigns early proved to be a low-hanging fruit with an ideal code complexity:performance improvement ratio.
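
In code, the early filter amounts to something like this (a minimal Common Lisp sketch; campaign-creatives and creative-fits-slot-p are invented names for illustration, not our production code):

(defun truly-eligible-campaigns (campaigns slot)
  ;; Drop campaigns that pass the eligibility tests but have no creative
  ;; compatible with the ad slot (e.g., wrong size, forbidden format).
  (remove-if-not (lambda (campaign)
                   (some (lambda (creative)
                           (creative-fits-slot-p creative slot))
                         (campaign-creatives campaign)))
                 campaigns))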

Trust no one, not even performance counters

I recently learned that we also had to second-guess instruction-level profiles. Contemporary x86oids are out-of-order, superscalar, and speculative machines, so profiles are always messy: "blame" is scattered around the real culprit, and some instructions (pipeline hazards like conditional jumps and uncached memory accesses, mostly) seem to account for more than their actual share. What I never realised is that, in effect, some instructions systematically mislead and push their cycles to others.

Some of our internal spinlocks use mfence. I expected that to be suboptimal, since it's common knowledge that locked instructions are more efficient barriers: serialising instructions like mfence have to affect streaming stores and other weakly ordered memory accesses, and that's a lot more work than just preventing store/load reordering. However, our profiles showed that we spent very little time on locking, so I never gave it much thought... until eliminating a set of locks had a much better impact on performance than I would have expected from the profile. Faced with this puzzle, I had to take a closer look at the way mfence and locked instructions affect hardware-assisted instruction profiles on our production Xeon E5s.

I came up with a simple synthetic microbenchmark to simulate locking on my E5-4617: the loop body is an adjustable set of memory accesses (reads and writes of out-of-TLB or uncached locations) or computations (divisions) bracketed by pairs of normal stores, mfence, or lock inc/dec to cached memory (I would replace the fences with an increment/decrement pair and it looks like all read-modify-write instructions are implemented similarly on Intel). Comparing runtimes for normal stores with the other instructions helps us gauge their overhead. I can then execute each version under perf and estimate the overhead from the instruction-level profile. If mfence is indeed extra misleading, there should be a greater discrepancy between the empirical impact of the mfence pair and my estimate from the profile.

You can find the super crufty code here, along with a rdtscp version of cycle.h.

With locked instructions and random reads that miss the L3 cache, the (cycle) profile for the microbenchmark loop is:

$ perf annotate -s cache_misses
[...]
    0.06 :        4006b0:       and    %rdx,%r10
    0.00 :        4006b3:       add    $0x1,%r9
    ;; random (out of last level cache) read
    0.00 :        4006b7:       mov    (%rsi,%r10,8),%rbp
   30.37 :        4006bb:       mov    %rcx,%r10
    ;; foo is cached, to simulate our internal lock
    0.12 :        4006be:       mov    %r9,0x200fbb(%rip)        # 601680 <foo>
    0.00 :        4006c5:       shl    $0x17,%r10
    [... Skipping arithmetic with < 1% weight in the profile]
    ;; locked increment of an in-cache "lock" byte
    1.00 :        4006e7:       lock incb 0x200d92(%rip)        # 601480 <private+0x200>
   21.57 :        4006ee:       add    $0x1,%rax
    [...]
    ;; random out of cache read
    0.00 :        400704:       xor    (%rsi,%r10,8),%rbp
   21.99 :        400708:       xor    %r9,%r8
    [...]
    ;; locked in-cache decrement
    0.00 :        400729:       lock decb 0x200d50(%rip)        # 601480 <private+0x200>
   18.61 :        400730:       add    $0x1,%rax
    [...]
    0.92 :        400755:       jne    4006b0 <cache_misses+0x30>

Looking at that profile, I'd estimate that the two random reads account for ~50% of runtime, and the pair of lock inc/dec for ~40%.

The picture is completely different for mfence.

$ perf annotate -s cache_misses
[...]
    0.00 :        4006b0:       and    %rdx,%r10
    0.00 :        4006b3:       add    $0x1,%r9
    ;; random read
    0.00 :        4006b7:       mov    (%rsi,%r10,8),%rbp
   42.04 :        4006bb:       mov    %rcx,%r10
    ;; store to cached memory (lock word)
    0.00 :        4006be:       mov    %r9,0x200fbb(%rip)        # 601680 <foo>
    [...]
    0.20 :        4006e7:       mfence 
    5.26 :        4006ea:       add    $0x1,%rax
    [...]
    ;; random read
    0.19 :        400700:       xor    (%rsi,%r10,8),%rbp
   43.13 :        400704:       xor    %r9,%r8
    [...]
    0.00 :        400725:       mfence 
    4.96 :        400728:       add    $0x1,%rax
    0.92 :        40072c:       add    $0x1,%rax
    [...]
    0.36 :        40074d:       jne    4006b0 <cache_misses+0x30>

It looks like the loads from uncached memory represent ~85% of the runtime, while the mfence pair might account for at most ~15%, if I include all the noise from surrounding instructions.

If I trusted the profile, I would worry about eliminating locked instructions, but not so much for mfence. However, runtimes (in cycles), which are what I'm ultimately interested in, tell a different story. The same loop of LLC load misses takes 2.81e9 cycles for 32M iterations without any atomic or fence, versus 3.66e9 for lock inc/dec and 19.60e9 cycles for mfence. So, while the profile for the mfence loop would let me believe that only ~15% of the time is spent on synchronisation, the mfence pair really represents 86% \(((19.6 - 2.81) / 19.6)\) of the runtime for that loop! Conversely, the profile for the locked pair would make me guess that we spend about 40% of the time there, but, according to the timings, the real figure is around 23%.

The other tests all point to the same conclusion: the overhead of mfence is strongly underestimated by instruction level profiling, and that of locked instructions exaggerated, especially when adjacent instructions write to memory.

  setup     cycles   (est. overhead)  ~actual overhead

div [ALU] (100 Mi iterations)
 atomic: 20153782848   (20%)          ~ 3.8%
 mfence: 28202315112   (25%)          ~31.3%
vanilla: 19385020088

Reads:

TLB misses (64Mi iterations)
 atomic:  3776164048   (80%)          ~39.3%
 mfence: 12108883816   (50%)          ~81.1%
vanilla:  2293219400 

LLC misses (32Mi iterations)
 atomic:  3661686632   (40%)          ~23.3%
 mfence: 19596840824   (15%)          ~85.7%
vanilla:  2807258536

Writes:

TLB (64Mi iterations)
 atomic:  3864497496   (80%)          ~10.4%
 mfence: 13860666388   (50%)          ~75.0%
vanilla:  3461354848

LLC (32Mi iterations)
 atomic:  4023626584   (60%)          ~16.9%
 mfence: 21425039912   (20%)          ~84.4%
vanilla:  3345564432
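
To make the arithmetic behind the ~actual overhead column explicit, it is just the relative slowdown computed from the raw cycle counts; for example, with the LLC-miss read numbers above:

(defun barrier-overhead (cycles-with cycles-without)
  ;; Fraction of the total runtime attributable to the barrier pair.
  (/ (- cycles-with cycles-without) (float cycles-with)))

(barrier-overhead 19596840824 2807258536) ; => ~0.857 (mfence)
(barrier-overhead 3661686632 2807258536)  ; => ~0.233 (atomic)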

I can guess why we observe this effect; it's not like Intel is intentionally messing with us. mfence is a full pipeline flush: it slows code down because it waits for all in-flight instructions to complete their execution. Thus, while it's flushing that slows us down, the profiling machinery will assign these cycles to any of the instructions that are being flushed. Locked instructions instead affect stores that are still queued. By forcing such stores to retire, locked instructions become responsible for the extra cycles and end up "paying" for writes that would have taken up time anyway.

Losing faith in hardware profiling being remotely representative of reality makes me a sad panda; I now have to double-check perf profiles when hunting for misleading metrics. At least I can tell myself that knowing about this phenomenon helps us make better informed - if less definite - decisions and ferret out more easy wins.

P.S., if you find this stuff interesting, feel free to send an email (pkhuong at $WORK.com). My team is hiring both experienced developers and recent graduates (:

20 Oct 2014 12:05am GMT

19 Oct 2014

Gábor Melis: Transcripts

I've just committed a major feature to MGL-PAX: the ability to include code examples in docstrings. Printed output and return values are marked up with ".." and "=>", respectively.

 (values (princ :hello) (list 1 2))
 .. HELLO
 => :HELLO
 => (1 2)
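
Such a transcript can be embedded verbatim in a docstring; for instance (a hand-written sketch, not from the MGL-PAX documentation):

(defun greet ()
  "Print a greeting and return two values.

   (values (princ :hello) (list 1 2))
   .. HELLO
   => :HELLO
   => (1 2)"
  (values (princ :hello) (list 1 2)))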

The extras are:

  • parsing back and updating a transcript
  • auto-checking of up-to-dateness at documentation generation time
  • readable return values can be commented and hand-indented without breaking consistency checks, and updates will not destroy those changes
  • Emacs integration: transcribing the last expression and updating a transcript in a region.
  • TRANSCRIBE works without the rest of MGL-PAX so it can be used to format bug reports or as a poor man's expect script.

The documentation provides a tutorialish treatment. I hope you'll find it useful.

19 Oct 2014 10:00pm GMT

14 Oct 2014

Christophe Rhodes: still working on reproducible builds

It's been nearly fifteen years, and SBCL still can't be reliably built by other Lisp compilers.

Of course, other peoples' definition of "reliably" might differ. We did achieve successful building under unrelated Lisp compilers twelve years ago; there were a couple of nasty bugs along the way, found both before and after that triumphant announcement, but at least with a set of compilers whose interpretation of the standard was sufficiently similar to SBCL's own, and with certain non-mandated but expected features (such as the type (array (unsigned-byte 8) (*)) being distinct from simple-vector, and single-float being distinct from double-float), SBCL achieved its aim of being buildable on a system without an SBCL binary installed (indeed, using CLISP or XCL as a build host, SBCL could in theory be bootstrapped starting with only gcc).

For true "reliability", though, we should not be depending on any particular implementation-defined features other than ones we actually require - or if we are, then the presence or absence of them should not cause a visible difference in the resulting SBCL. The most common kind of leak from the host lisp to the SBCL binary was the host's value of most-positive-fixnum influencing the target, causing problems from documentation errors all the way up to type errors in the assembler. Those leaks were mostly plugged a while ago, though they do recur every so often; there are other problems, and over the last week I spent some time tracking down three of them.

The first: if you've ever done (apropos "PRINT") or something similar at the SBCL prompt, you might wonder at the existence of functions named something like SB-VM::|CACHED-FUN--PINSRB[(EXT-2BYTE-XMM-REG/MEM ((PREFIX (QUOTE (102))) (OP1 (QUOTE (58))) (OP2 (QUOTE (32))) (IMM NIL TYPE (QUOTE IMM-BYTE))) (QUOTE (NAME TAB REG , REG/MEM ...)))]-EXT-2BYTE-XMM-REG/MEM-PRINTER|.

What is going on there? Well, these functions are a part of the disassembler machinery; they are responsible for taking a certain amount of the machine code stream and generating a printed representation of the corresponding assembly: in this case, for the PINSRB instruction. Ah, but (in most instruction sets) related instructions share a fair amount of structure, and decoding and printing a PINSRD instruction is basically the same as for PINSRB, with just one #x20 changed to a #x22 - in both cases we want the name of the instruction, then a tab, then the destination register, a comma, the source, another comma, and the offset in the destination register. So SBCL arranges to reuse the PINSRB instruction printer for PINSRD; it maintains a cache of printer functions, looked up by printer specification, and reuses them when appropriate. So far, so normal; the ugly name above is the generated name for such a function, constructed by interning a printed, string representation of some useful information.

Hm, but wait. See those (QUOTE (58)) fragments inside the name? They result from printing the list (quote (58)). Is there a consensus on how to print that list? Note that *print-pretty* is bound to nil for this printing; prior experience has shown that there are strong divergences between implementations, as well as long-standing individual bugs, in pretty-printer support. So, what happens if I do (write-to-string '(quote foo) :pretty nil)?
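
On the two hosts in question the answers differ (other implementations may print either form):

;; SBCL:
(write-to-string '(quote foo) :pretty nil) ; => "(QUOTE FOO)"
;; CLISP:
(write-to-string '(quote foo) :pretty nil) ; => "'FOO"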

So, if SBCL was compiled using CLISP, the name of the same function in the final image would be SB-VM::|CACHED-FUN--PINSRB[(EXT-2BYTE-XMM-REG/MEM ((PREFIX '(102)) (OP1 '(58)) (OP2 '(32)) (IMM NIL TYPE 'IMM-BYTE)) '(NAME TAB REG , REG/MEM ...))]-EXT-2BYTE-XMM-REG/MEM-PRINTER|. Which is shorter, and maybe marginally easier to read, but importantly for my purposes is not bitwise-identical.

Thus, here we have a difference between host Common Lisp compilers which leaks over into the final image, and it must be eliminated. Fortunately, this was fairly straightforward to eliminate; those names are never in fact used to find the function object, so generating a unique name for functions based on a counter makes the generated object file bitwise identical, no matter how the implementation prints two-element lists beginning with quote.

The second host leak is also related to quote, and to our old friend backquote - though not related in any way to the new implementation. Consider this apparently innocuous fragment, which is a simplified version of some code to implement the :type option to defstruct:

(macrolet ((def (name type n)
             `(progn
                (declaim (inline ,name (setf ,name)))
                (defun ,name (thing)
                  (declare (type simple-vector thing))
                  (the ,type (elt thing ,n)))
                (defun (setf ,name) (value thing)
                  (declare (type simple-vector thing))
                  (declare (type ,type value))
                  (setf (elt thing ,n) value)))))
  (def foo fixnum 0)
  (def bar string 1))

What's the problem here? Well, the functions are declaimed to be inline, so SBCL records their source code. Their source code is generated by a macroexpander, and so is made up of conses that are generated programmatically (as opposed to freshly consed by the reader). That source code is then stored as a literal object in an object file, which means in practice that instructions for reconstructing a similar object are dumped, to be executed when the object file is processed by load.

Backquote is a reader macro that expands into code that, when evaluated, generates list structure with appropriate evaluation and splicing of unquoted fragments. What does this mean in practice? Well, one reasonable implementation of reading `(type ,type value) might be:

(cons 'type (cons type '(value)))

and indeed you might (no guarantees) see something like that if you do

(macroexpand '`(type ,type value))

in the implementation of your choice. Similarly, reading `(setf (elt thing ,n) value) will eventually generate code like

(cons 'setf (cons (cons 'elt (list 'thing n)) '(value)))

Now, what is "similar"? In this context, it has a technical definition: it relates two objects in possibly-unrelated Lisp images, such that they can be considered to be equivalent despite the fact that they can't be compared:

similar adj. (of two objects) defined to be equivalent under the similarity relationship.

similarity n. a two-place conceptual equivalence predicate, which is independent of the Lisp image so that two objects in different Lisp images can be understood to be equivalent under this predicate. See Section 3.2.4 (Literal Objects in Compiled Files).

Following that link, we discover that similarity for conses is defined in the obvious way:

Two conses, S and C, are similar if the car of S is similar to the car of C, and the cdr of S is similar to the cdr of C.

and also that implementations have some obligations:

Objects containing circular references can be externalizable objects. The file compiler is required to preserve eqlness of substructures within a file.

and some freedom:

With the exception of symbols and packages, any two literal objects in code being processed by the file compiler may be coalesced if and only if they are similar [...]
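
That freedom is easy to observe: put two similar literals in one file, compile and load it, and an aggressive file compiler is entitled to coalesce them (a sketch; the result is implementation-dependent):

;; In a file processed by COMPILE-FILE:
(defun two-similar-literals ()
  (values '(value) '(value)))

;; After loading the resulting fasl, a coalescing compiler may answer T:
(multiple-value-call #'eq (two-similar-literals))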

Put this all together, and what do we have? That def macro above generates code with similar literal objects: there are two instances of '(value) in it. A host compiler may, or may not, choose to coalesce those two literal '(value)s into a single literal object; if it does, the inline expansion of foo (and bar) will have a circular reference, which must be preserved, showing up as a difference in the object files produced during the SBCL build. The fix? It's ugly, but portable: since we can't stop an aggressive compiler from coalescing constants which are similar but not identical, we must make sure that any similar substructure is in fact identical:

(macrolet ((def (name type n)
             (let ((value '(value)))
               `(progn
                  (declaim (inline ,name (setf ,name)))
                  (defun ,name (thing)
                    (declare (type simple-vector thing))
                    (the ,type (elt thing ,n)))
                  (defun (setf ,name) (value thing)
                    (declare (type simple-vector thing))
                    (declare (type ,type . ,value))
                    (setf (elt thing ,n) . ,value))))))
  (def foo fixnum 0)
  (def bar string 1))

Having dealt with a problem with quote, and a problem with backquote, what might the Universe serve up for my third problem? Naturally, it would be a problem with a code walker. This code walker is somewhat naïve, assuming as it does that its body is made up of forms or tags; it is the assemble macro, which is used implicitly in the definition of VOPs (reusable assembly units); for example, like

(assemble ()
  (move ptr object)
  (zeroize count)
  (inst cmp ptr nil-value)
  (inst jmp :e DONE)
 LOOP
  (loadw ptr ptr cons-cdr-slot list-pointer-lowtag)
  (inst add count (fixnumize 1))
  (inst cmp ptr nil-value)
  (inst jmp :e DONE)
  (%test-lowtag ptr LOOP nil list-pointer-lowtag)
  (error-call vop 'object-not-list-error ptr)
 DONE))

which generates code to compute the length of a list. The expander for assemble scans its body for any atoms, and generates binding forms for those atoms to labels:

(let ((new-labels (append labels
                          (set-difference visible-labels inherited-labels))))
  ...
  `(let (,@(mapcar (lambda (name) `(,name (gen-label))) new-labels))
     ...))

The problem with this, from a reproducibility point of view, is that set-difference (and the other set-related functions: union, intersection, set-exclusive-or and their n-destructive variants) do not return the sets with a specified order - which is fine when the objects are truly treated as sets, but in this case the LOOP and DONE label objects ended up in different stack locations depending on the order of their binding. Consequently the machine code for the function emitting code for computing a list's length - though not the machine code emitted by that function - would vary depending on the host's implementation of set-difference. The fix here was to sort the result of the set operations, knowing that all the labels would be symbols and that they could be treated as string designators.
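
A sketch of that fix: since the labels are symbols, and symbols are string designators, string< imposes a host-independent order (copy-list because sort is destructive and set-difference's result may share structure):

;; Hosts may disagree on the order of SET-DIFFERENCE's result...
(set-difference '(loop done exit) '(exit)) ; => (LOOP DONE) or (DONE LOOP)
;; ...but not after sorting the labels as string designators:
(sort (copy-list (set-difference '(loop done exit) '(exit))) #'string<)
; => (DONE LOOP) on any conforming host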

And after all this? We're still not quite there: there are three or four files (out of 330 or so) which are not bitwise-identical for differing host compilers. I hope to be able to rectify this situation in time for SBCL's 15th birthday...

14 Oct 2014 6:51am GMT

09 Oct 2014

Timofei Shatrov: Words made out of words

Since my last post I've done a lot of work on my Japanese sentence-segmenting algorithm, so it's time for an update.

First of all, I added conjugations. Here's how JMdict does conjugations. That's for a single verb. There's a note saying "this table has been automatically generated"; indeed, in JMdict conjugations are generated on the basis of a rather large .csv file and are not stored in the database. Obviously for my purposes it is more efficient to have these in my database, so I ported a (rather simple) algorithm to Common Lisp and wrote a (really complex) procedure to load them. It takes quite a while to INSERT those one by one, which made me wish postmodern had some sort of bulk inserting mechanism. Some time later I discovered that some of these conjugations are themselves verbs that can be (and often are) conjugated. So I added "second level" conjugations that point both to the first level conjugation and to the original verb. Hopefully "third level" conjugations are rarely used.

Meanwhile I've been trying to improve the segmentation algorithm. The first major change was calculating n best segmentations instead of just one. That would allow me to have a better picture of what the algorithm prefers. I came up with the structure that I call top-array, which is basically an array of n scores sorted from the biggest to smallest and when a new score is added, we go from the end and push everything smaller than the new score to the right. I thought it was pretty elegant and probably the fastest way to do this for small n (obviously some sort of tree would work better for large n).

(defstruct (top-array-item (:conc-name tai-)) score payload)

(defclass top-array ()
  ((array :reader top-array)
   (count :reader item-count :initform 0)
   ))

(defmethod initialize-instance :after ((obj top-array) &key (limit 5))
  (setf (slot-value obj 'array) (make-array limit :initial-element nil)))

(defgeneric register-item (collection score payload)
  (:method ((obj top-array) score payload)
    (with-slots (array count) obj
      (let ((item (make-top-array-item :score score :payload payload))
            (len (length array)))
        (loop for idx from (min count len) downto 0
           for prev-item = (when (> idx 0) (aref array (1- idx)))
           for done = (or (not prev-item) (>= (tai-score prev-item) score))
           when (< idx len) do (setf (aref array idx) (if done item prev-item))
           until done)
        (incf count)))))

(defgeneric get-array (collection)
  (:method ((obj top-array))
    (with-slots (array count) obj
      (if (>= count (length array)) array (subseq array 0 count)))))
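
For example, a top-array with limit 3 keeps only the three best payloads, in score order:

(let ((top (make-instance 'top-array :limit 3)))
  (register-item top 10 :a)
  (register-item top 30 :b)
  (register-item top 20 :c)
  (register-item top 5 :d)
  (map 'list #'tai-payload (get-array top)))
;; => (:B :C :A)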
    

An instance of top-array is created for every segment (found word in a sentence), as well as one for the entire sentence, from which the best path (a sequence of words) is taken in the end. Then the basic algorithm is similar to the one described in my previous post, but gains an extra inner loop.

(loop for (seg1 . rest) on segments
      for score1 = (get-segment-score seg1)
      do (register-item (segment-top seg1) score1 (list seg1))
         (register-item top score1 (list seg1))
      (loop for seg2 in rest
            for score2 = (get-segment-score seg2)
            when (>= (segment-start seg2) (segment-end seg1)) do
            (loop for tai across (get-array (segment-top seg1))
                 for path = (cons seg2 (tai-payload tai))
                 for score = (+ score2 (tai-score tai))
                 do (register-item (segment-top seg2) score path)
                    (register-item top score path))))

Then (get-array top) would return n best paths.

After this I started thinking about how to make my algorithm more context-sensitive. The way in which every segment is scored is completely independent of the other segments, which might cause the best-scored path to be a sequence of words that make no sense when put next to each other! The above algorithm is easy to modify to add some sort of bonus to two subsequent segments, so my first attempt was to encourage words that like to be next to each other in natural language with some extra score (I called that "synergy"). So, for example, there are "no-adjectives", which are basically nouns, but when followed by the particle "no" they become adjectives. I added a synergy that adds 15 points if such a word is followed by the particle "no". In the end this way of doing things proved limited. Words can have wildly different scores, and when things go wrong, an extra 15 points might not be enough to make them right. On the other hand, if I increase this bonus too much, it might erroneously break up words that just so happen to have "no" in them.

Later I came up with the concept of compound words, which are "words" that don't exist in the database, but rather consist of several words that do exist in the database. Right now, it's mostly a primary word + one or several suffixes, but potentially there could be prefixes too. For the purposes of segmentation a compound word acts like one single word. One example of a common suffix would be "たい" (-tai), which follows a verb ("to X") conjugated in a certain way, the resultant meaning being "to want to X". Most of these suffixes themselves have many conjugations. To check if a word can be understood as a compound word, I need to check if it ends with one of many suffixes, and then check if the part before the suffix has the correct part of speech or conjugation. All possible suffixes and their meanings are put into a hashtable, and then we can check whether a word ends with one of them by testing all its endings against the hashtable.

(defun get-suffixes (word)
  (init-suffixes)
  (loop for start from (1- (length word)) downto 1
       for substr = (subseq word start)
       for val = (gethash substr *suffix-cache*)
       when val
       collect (cons substr val)))

The concept of suffixes has fared much better as now I am able to calculate scores of compound words in a more versatile way.

I would still sometimes encounter phrases that are split badly by my algorithm, but a human would segment easily. For example if the words "AB" and "ABC" both exist in database, but "AB" happens to score higher (e.g. because it's a really common word, while ABC is not so much), then "ABC" would never be segmented as one word "ABC", it would be "AB"+"C", even if "C" is a completely worthless word, or even not a word at all (a gap). An example of a "worthless" word is a hiragana spelling of one-syllable word that would normally be spelled with a kanji. I didn't care about those much, because they had really low scores and thus only appeared when something went awry. However getting rid of these low-scoring words would allow me to place a large penalty on gaps and thus "ABC" will be able to score higher than "AB"+gap. In the path-finding algorithm above the same score is put into top and segment-top top-arrays. But if we want to penalize gaps, the score put into top should also include a penalty for the gap to the right of the last segment, if it exists. Penalties for gaps to the left of the leftmost segment and in-between segments should be added to both.

Anyway, I'm pretty happy with how this thing is progressing, and I'm going to switch my efforts to building a web-interface. Here's how it currently works in REPL:

[Screenshot of the REPL session output]

Kinda messy, isn't it? The challenge would be to display all this information in a reasonable manner. I already have some ideas, but it would still probably take some effort to decipher. But then again, translating the sentence was never the goal, just romanizing it, which ichiran does pretty well right now.

09 Oct 2014 3:07pm GMT

08 Oct 2014

drmeister: Debugging with the Clasp debugger

Clasp provides two ways of debugging code. In interactive sessions Clasp invokes a built in Common Lisp debugger when errors or other exceptional situations arise. The Clasp compiler also generates DWARF debugging information that can be used by the GDB debugger (and hopefully soon the LLDB debugger) to display Clasp Common Lisp source information interleaved with C++ source information.

To see this start up clasp and type what follows the > prompt:

Top level.
> (defun c () (break "In c"))

C
> (defun b () (c))

B
> (defun a () (b))

A
> (a)

Condition of type: SIMPLE-CONDITION
In c

Available restarts:
(use :r1 to invoke restart 1)

1. (CONTINUE) Return from BREAK.
2. (RESTART-TOPLEVEL) Go back to Top-Level REPL.

Broken at frame[14] CORE::REP.
 File: #<CORE:SOURCE-FILE-INFO #P"/Users/meister/Development/clasp/src/lisp/kernel/lsp/top.lsp"> (Position #573)
>> 

The double prompt >> indicates that we are now in the Clasp Common Lisp debugger. This debugger is inherited from ECL (because Clasp uses the excellent Common Lisp source code from ECL). To get a list of commands that are available in the debugger type:


>> :h

Top level commands:
:cf                Compile file.
:exit or ^D      Exit Lisp.
:ld                Load file.
:step              Single step form.
:tr(ace)      Trace function.
:untr(ace)    Untrace function.
:pwd       Print the current value of *default-pathname-defaults*.
:cd        Change the current value of *default-pathname-defaults*.

Help commands:
:apropos   Apropos.
:doc(ument)   Document.
:h(elp) or ?        Help.  Type ":help help" for more information.

Break commands:
:q(uit)               Return to some previous break level.
:pop               Pop to previous break level.
:c(ontinue)   Continue execution.
:b(acktrace)  Print backtrace.
:f(unction)   Show current function.
:p(revious)   Go to previous function.
:d(own)         Alias to :previous.
:n(ext)               Go to next function.
:u(p)           Alias to :next.
:g(o)         Go to next function.
:fs             Search forward for function.
:bs             Search backward for function.
:disassemble       Disassemble current function.
:l(ambda-)e(xpression)     Show lisp code for current function.
:v(ariables)  Show local variables, functions, blocks, and tags.
:hide              Hide function.
:unhide            Unhide function.
:hp                Hide package.
:unhp              Unhide package.
:unhide-all     Unhide all variables and packages.
:bds            Show binding stack.
:frs            Show frame stack.
:m(essage)      Show error message.
:hs                Help stack.
:i(nspect)      Inspect value of local variable.

Restart commands:
:r1             Return from BREAK. (CONTINUE).
:r2             Go back to Top-Level REPL. (RESTART-TOPLEVEL).
>> 


Clasp/ECL use Common Lisp keywords to activate debugger functionality.

To generate a backtrace type:


>> :b

--------STACK TRACE--------
   frame#  0toplevel         epilogueForm     0/0   REPL
   frame#  1/c              top.lsp   419/2   CORE::TOP-LEVEL
   frame#  2/c              top.lsp   615/21  CORE::TPL
   frame#  3/c              top.lsp   605/32  CORE::REP
   frame#  4/b         evaluator.cc  2353/0   CORE:TOP-LEVEL-EVAL-WITH-ENV
   frame#  5/b         evaluator.cc  2351/0   CORE:COMPILE-FORM-AND-EVAL-WITH-ENV
   frame#  6/c            -no-file-     1/0   nil
   frame#  7/c            -no-file-     1/0   A
   frame#  8/c            -no-file-     1/0   B
   frame#  9/c            -no-file-     1/0   C
   frame# 10/c       conditions.lsp   457/8   COMMON-LISP:BREAK
   frame# 11/c              top.lsp  1507/9   COMMON-LISP:INVOKE-DEBUGGER
   frame# 12/c              top.lsp  1489/5   CORE::DEFAULT-DEBUGGER
   frame# 13/c              top.lsp   618/7   CORE::TPL
-->frame# 14/c              top.lsp   605/32  CORE::REP
   frame# 15/b         evaluator.cc  2353/0   CORE:TOP-LEVEL-EVAL-WITH-ENV
   frame# 16/b         evaluator.cc  2351/0   CORE:COMPILE-FORM-AND-EVAL-WITH-ENV
   frame# 17/c            -no-file-     0/0   nil
   frame# 18/c              top.lsp  1088/3   CORE::TPL-BACKTRACE
   frame# 19/b            stacks.cc   712/0   CORE:IHS-BACKTRACE

NIL
>> 

The --> indicates the current frame that the debugger has stopped on. Since the error handling code and the debugger functions are all written in Common Lisp, those functions also appear on the backtrace. The functions we entered are in frames 7, 8, and 9.

At this point we could go to a specific frame using :g and view the environment of that frame using :v, or we could print variables by just typing their names.

For now we will just leave the debugger and return to the top level REPL by invoking a restart.


>> :r2

> 

Now we are back in the top level REPL and can continue working.

Next I'll show you how to use the DWARF generated debugging information embedded in compiled Common Lisp code to debug Clasp using GDB or LLDB.


08 Oct 2014 10:55pm GMT

06 Oct 2014

Quicklisp news: October 2014 Quicklisp dist update now available

New projects:

Updated projects: asteroids, avatar-api, babel, basic-binary-ipc, caveman, cffi, checkl, cl-ana, cl-async, cl-autowrap, cl-base58, cl-charms, cl-cli, cl-cli-parser, cl-conspack, cl-dbi, cl-dot, cl-gdata, cl-gss, cl-locatives, cl-mediawiki, cl-opengl, cl-project, clack, clip, closer-mop, clss, coleslaw, colleen, com.informatimago, cqlcl, datafly, dbus, djula, docbrowser, drakma, dynamic-mixins, fast-io, floating-point, gbbopen, gendl, graph, hdf5-cffi, lisp-executable, lisp-interface-library, lisp-unit2, mel-base, metabang-bind, mgl-pax, micmac, modularize-hooks, modularize-interfaces, nibbles, osicat, pg, plump, postmodern, quickproject, ratify, restas, rucksack, rutils, s-xml, scriptl, serapeum, shelly, smug, spinneret, staple, stumpwm, trivial-download, trivial-mimes, trivial-signal, universal-config, utils-kt, yason.

Removed projects: cl-test-more, phemlock.

cl-test-more hasn't really been removed. It's been renamed to prove.

I removed phemlock by request; it represents an old, dead branch of development, hosted on CVS. You can still get hemlock through Quicklisp by loading one of hemlock.tty, hemlock.qt, or hemlock.clx, all provided by the hemlock project on gitorious.
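
For example:

(ql:quickload "hemlock.tty")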

To get this update, use (ql:update-dist "quicklisp").


06 Oct 2014 9:04pm GMT

Christophe Rhodes: interesting pretty-printer bug

One of SBCL's Google Summer of Code students, Krzysztof Drewniak (no relation), just got to merge in his development efforts, giving SBCL a far more complete set of Unicode operations.

Given that this was the merge of three months' out-of-tree work, it's not entirely surprising that there were some hiccups, and indeed we spent some time diagnosing and fixing a 1000-fold slowdown in char-downcase. Touch wood, all seems mostly well, except that Jan Moringen reported that, when building without the :sb-unicode feature (and hence having a Lisp with 8-bit characters) one of the printer consistency tests was resulting in an error.

Tracking this down was fun; it in fact had nothing in particular to do with the commit that first showed the symptom, but had been lying latent for a while and had simply never shown up in automated testing. I've expressed my admiration for the Common Lisp standard before, and I'll do it again: both as a user of the language and as an implementor, I think the Common Lisp standard is a well-executed document. But that doesn't stop it from having problems, and this is a neat one:

When a line break is inserted by any type of conditional newline, any blanks that immediately precede the conditional newline are omitted from the output and indentation is introduced at the beginning of the next line.

(from pprint-newline)

For the graphic standard characters, the character itself is always used for printing in #\ notation---even if the character also has a name[5].

(from CLHS 22.1.3.2)

Space is defined to be graphic.

(from CLHS glossary entry for 'graphic')

What do these three requirements together imply? Imagine printing the list (#\a #\b #\c #\Space #\d #\e #\f) with a right-margin of 17:

(write-to-string '(#\a #\b #\c #\Space #\d #\e #\f) :pretty t :right-margin 17)
; => "(#\\a #\\b #\\c #\\
; #\\d #\\e #\\f)"

The #\Space character is defined to be graphic; therefore, it must print as #\ rather than #\Space; if it happens to be printed just before a conditional newline (such as, for example, generated by using pprint-fill to print a list), the pretty-printer will helpfully remove the space character that has just been printed before inserting the newline. This means that a #\Space character, printed at or near the right margin, will be read back as a #\Newline character.
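
You can see the round-trip failure directly by reading that output back:

(read-from-string "(#\\a #\\b #\\c #\\
#\\d #\\e #\\f)")
; => (#\a #\b #\c #\Newline #\d #\e #\f), not the list we printed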

It's interesting to see what other implementations do. CLISP 2.49 in its default mode always prints #\Space; in -ansi mode it prints #\ but preserves the space even before a conditional newline. CCL 1.10 similarly preserves the space; there's an explicit check in output-line-and-setup-for-next for an "escaped" space (and a comment that acknowledges that this is a heuristic that can be wrong in the other direction). I'm not sure what the best fix for this is; it's fairly clear that the requirements on the printer aren't totally consistent. For SBCL, I have merged a one-line change that makes the printer print using character names even for graphic characters, if the *print-readably* printer control variable is true; it may not be ideal that print/read round-tripping was broken in the normal case, but in the case where it's explicitly been asked for it is clearly wrong.

06 Oct 2014 8:48pm GMT

03 Oct 2014

ABCL Dev: Short tip :: Booting quicklisp via SLIME

For the truly lazy, the following forms will quickly load Quicklisp:

(require :abcl-contrib) (asdf:load-system :quicklisp-abcl)
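
After that, Quicklisp works as usual, e.g.:

(ql:quickload "alexandria")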


03 Oct 2014 7:44pm GMT

30 Sep 2014

drmeister: Things you can do with Clasp #1

Now that people are starting to build Clasp, I thought I would say a few things about what you can do with it.

Clasp is an implementation of Common Lisp (80-90% complete) and I plan to make it an ANSI-compliant Common Lisp as soon as possible. I also plan to add a faster compiler and expose some powerful C++ libraries to extend its capabilities.

But here are a few things you can do right away to get a sense of what Clasp does and to test some of its capabilities. Code that you type will (look-like-this). Don't worry if what you see isn't colored or highlighted as shown here; I'm exploring different ways of rendering program output on this blog.

(defun add (x y) (+ x y))

ADD

(add 3 5)

8

(disassemble 'add)

Disassembling function: #<COMMON-LISP:COMPILED-FUNCTION CORE:ADD>

"#<LLVM-SYS::FUNCTION 
define internal void @ADD({ {}*, i32 }* %result-ptr, { {}* }* %closed-af-ptr, i32 %num-varargs, {}* %farg0, {}* %farg1, {}* %farg2, i8* %va-list) {
\"(TRY-0).entry\":
  %lambda-args-42- = alloca { {}* }
  call void @newAFsp({ {}* }* %lambda-args-42-)
  %temp = alloca { {}* }
  call void @newTsp({ {}* }* %temp)
  %0 = alloca { {}* }
  call void @newTsp({ {}* }* %0)
  call void @makeValueFrame({ {}* }* %lambda-args-42-, i32 2, i32 2000058)
  call void @setParentOfActivationFrame({ {}* }* %lambda-args-42-, { {}* }* %closed-af-ptr)
  %correct-num-args = icmp eq i32 %num-varargs, 2
  br i1 %correct-num-args, label %\"(TRY-0).continue3\", label %\"(TRY-0).error\"

\"(TRY-0).error\":                                  ; preds = %\"(TRY-0).entry\"
  %enough-args = icmp slt i32 %num-varargs, 2
  br i1 %enough-args, label %\"(TRY-0).error1\", label %\"(TRY-0).continue\"
...

That gobbledy-gook is pure LLVM-IR code generated by the Clasp Common Lisp compiler using the fantastic LLVM library. It is exactly the same intermediate language that the Clang compiler converts C++/C/Objective-C into before lowering it to machine code. Clasp shares the same LLVM backend library with the Clang compiler, and once Clasp generates LLVM-IR that is as efficient as what Clang produces, Clasp will generate code that runs as fast as C++ and C!

For some more information on Common Lisp you can check out the following links or use Google. I haven't tested the tutorial against Clasp but almost everything should work.

For a tutorial:

http://en.wikibooks.org/wiki/Common_Lisp/First_steps/Beginner_tutorial

For beginners:

Land of Lisp: Learn to Program in Lisp, One Game at a Time!

I'll post more later.


30 Sep 2014 1:53pm GMT