Revenge of Lisp (Part 1/2)
This may surprise you if you know me, but I’ve been learning Common Lisp for a few weeks now.
It all started when I was reading, funnily enough, a blog post about another, much more hyped, language called Julia. The post was titled Julia and the reincarnation of Lisp, and in it the author lamented that despite his love for Common Lisp, it’d never become popular, so he was now trying Julia instead as it may become mainstream soon.
Anyway, what called my attention the most was not Julia (I’ve seen this promise of a new language that is as fast as C and as easy as Python a few too many times to believe it), but all the Lisp praising, including a jab at poor Java, of all things:
I wondered how they could be so certain of that (and why they thought they needed to create something else anyway if Lisp was so great). As someone who’s written a lot of Java code, I would love to use a language that takes much less time to write things in… so, I followed the linked article, which led me to discover a few papers written in the late 1990’s and early 2000’s comparing Java with Common Lisp and other popular languages of the time. That made me think that, 20 years later, it would be a good idea to revisit the topic and compare Lisp with modern Java (those papers used Java 1 and 1.2!) and other newer languages like Rust to see whether the results of the old papers (which generally lauded Lisp while slamming Java) still held today.
I ended up slightly obsessed with doing that for the last few months and wrote a few articles about what I found out, including Revising Prechelt’s paper… and How to write really slow Rust (a result of my initially awkward attempt at porting the Lisp code to Rust), followed by Part 2, where I analysed in more detail why the Rust implementation was so bad, and finally, How to write fast Rust code, a summary of what I learned optimising the Rust code (pushed by a crowd of Rust people who didn’t like seeing Rust running slow on my benchmarks).
All the while, one thing had stuck in the back of my mind: the Common Lisp code was the shortest by a good margin, and its performance was just excellent, even with the clean, completely non-optimised code which was written by Peter Norvig in a couple of hours!
For this reason, I just couldn’t keep myself from learning more about the language and trying to find out just how far we can actually take it in terms of performance.
Because I got a bit over-excited with all the fascinating things I came across, I decided to break up this post into two.
In this first part, I quickly introduce Common Lisp and the things that I found interesting or weird about it, then describe the phone encoding problem used in the benchmarks, as well as the Common Lisp solution written by Norvig.
In Part 2, I will describe the benchmark improvements I made for this blog post and then show the effect of each optimisation that was either suggested to me by readers of my previous blog posts or that I came up with myself, based both on my experience with optimising Rust and Java, and on reading books and articles about Lisp.
Can Common Lisp beat, or at least approach, Java and Rust in terms of performance? Is it still an attractive language to use even in 2021, nearly 40 years after it was first described, and 27 years after it was finally standardized? Read on to find out!
Common Lisp - a brief history and introduction for non-Lispers
Common Lisp, henceforth CL, is actually an ANSI standard. Far from having only a single compiler implementation, it has lots of them, both free and commercial.
In this post, I am going to use the most prominent and performant (I believe) free implementation today, SBCL.
Lisp is a big family of languages with a long tradition. The first Lisp was described as early as 1958 by John McCarthy. Today, there are arguably three main languages in the Lisp family tree: Common Lisp, the result of a unification effort that culminated in the Common Lisp ANSI standard of 1994; Scheme, which today has spawned many dialects such as Racket and GNU Guile (most notably used in GNU Guix); and Clojure, which was created by Rich Hickey based on his frustrations with both Java and the existing Lisp implementations at the time.
Even though Clojure might be a more obvious choice for someone, like me, who is used to working with the JVM, I actually wanted to try using something else with a lighter runtime and great performance.
Choosing an editor
My first challenge before getting started with CL was to choose an IDE to use! Unfortunately, there’s no good JetBrains IDE for CL (I couldn’t find even a basic plugin for IntelliJ) so my first choice was not feasible. After asking on the Lisp Reddit about that, I got a unanimous answer: just use Emacs or one of its more complete distributions like Spacemacs.
I decided to try simple Emacs because I like to learn things from the very basics and it seemed that other distributions are not much more than amalgamations of several Emacs packages.
If you like VS Code, VIM, Sublime or Atom instead of Emacs, they also have support for Common Lisp! Check all of the available editors at the Lisp Cookbook - Editor Support page.
Learning Emacs is something that deserves another blog post one day, but suffice it to say that I spent at least a few weeks in the rabbit hole that the world of Emacs turned out to be… the whole application, while sometimes compared to a complete Operating System because of the huge variety of packages it supports to do virtually anything, is basically a Lisp environment with a user interface on top. It uses a dialect of Lisp referred to as elisp - or Emacs Lisp - which is fairly close to CL. This has the advantage that by learning Emacs, you also have to learn some Lisp!
With basic Emacs knowledge at hand and a nice configuration for Lisp coding, I fired up SLIME (one of the packages that turn Emacs into a CL IDE) and was finally ready to play with CL code the way it’s meant to be done.
Common Lisp code is initially pretty hard to read, in my opinion, for several reasons. One of which is that the names of the functions are not what you would expect if you know more modern languages. Another is that macros can introduce variables (actually, custom syntax) seemingly out of nowhere, something we’ll get back to later.
For example, here is a short snippet from Norvig’s solution to the phone-encoding problem:
```lisp
(when (and (not found-word)
           (not (numberp (first words))))
  (print-translations num digits (+ start 1)
                      (cons (nth-digit digits start) words)))
```
`cons` didn’t really look obvious to me. `when` might be easier to guess, but it’s unclear what happens in case the condition is false (or, as we’ll see, `nil` in CL).

After some digging, I learned that `numberp` is named as such because it’s a predicate, and the CL convention is to end the names of predicates with `p` (other examples: `stringp`, `evenp`). But not all predicates follow the convention: to check if something is an atom you use `atom`; to check for `nil` (more about that soon), you use the function `null`, confusingly for a Java developer.

`cons` is a strangely named function, but should be familiar to anyone who has ever read anything about Lisp: it builds, or _cons_tructs, lists, the basic data structure on which all Lisps are built, including Lisp source code itself, giving Lisp its fabled homoiconicity property.

The other basic list functions are `car`, `cdr` and `cadr` (or `first`, `rest` and `second`), which return the first item, the rest, and the second item of a list, respectively. In fact, the latter set of functions are more intuitively named aliases for the former.
Here’s a basic example of how `cons` works, from a short SLIME REPL session:
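Something along these lines (a small sketch of my own, with the results shown as comments):

```lisp
;; CONS prepends an element to the front of a list:
(cons 1 '(2 3))                 ; => (1 2 3)

;; consing onto NIL (the empty list) creates a one-element list:
(cons 1 nil)                    ; => (1)

;; a list is really just nested cons cells ending in NIL:
(cons 1 (cons 2 (cons 3 nil)))  ; => (1 2 3)
```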
As in most Lisps, CL allows us to quote a list with the `'` symbol, so above, `'(2 3)` is a quoted list, meaning the list is not evaluated. Normally, lists (which form s-expressions in Lisp source code) would be evaluated, with the first item being either a function or a macro. So, `(+ 1 2)` is evaluated immediately by calling the function `+`, and results in the value `3`, while `'(+ 1 2)` is simply a list containing the symbol `+` followed by the numbers `1` and `2`.
The curious usage of nil in CL
Moving on, something I found odd was the role of `nil` in CL. Instead of being just a null pointer as in most languages, `nil` plays the role of `false` and of the empty list. It may sound like a very bad choice, but the whole language is actually built around lists, remember, so that has some interesting benefits! You rarely get in trouble by passing or returning `nil` to/from functions, as most other functions expect you will do just that!

`(length nil)` evaluates to `0`. `(first nil)` is the same as `(first '())` and evaluates to `nil`. `(alpha-char-p #\$)`, where `#\$` is the char `$`, evaluates to `nil` because `$` is not alphabetic.

Now, it should be obvious what `when` evaluates to when its condition is false (I mean, `nil`): `nil`, of course.
format and CL truth
What you may expect to be called `true` is just `t` in CL, by the way. But while `nil` is always used to represent `false`, anything that’s not `nil` is actually `true` when it comes to conditionals… `T` (yes, CL is mostly case-insensitive, in case you haven’t noticed) is just a convenient symbol to use for that.

For example, when using `cond`, it’s handy to have a condition that always matches anything, and `t` is perfect for that:
```lisp
(let ((n (random 10)))
  (cond ((= n 1) 'one)
        ((= n 2) 'two)
        ((= n 3) 'three)
        (t 'other-number)))
```
`T` is also used as a special config value. `format` can be used to format a String (similar to `printf`), but if the first argument is `T`, it prints the string to standard output.

Knowing that, we can look at the common CL idiom to print something, say `"Hello " + name` (where `name` is an existing variable):
```lisp
(format t "Hello ~a~%" name)
```
Yeah, I know… it’s starting to get really weird!
Let’s look at the docs (as displayed in Emacs) to understand how `format` works.

There’s a lot to unpack above, but the important things to notice are the function signature (called Declared type), the example directives and the values expected for the `destination` argument. `~a` is a placeholder for an argument and `~%` stands for a new-line, somewhat analogous to `printf` in C-like languages.

Seeing that type declaration for the first time surprised me because I’d assumed that CL is an intrinsically dynamic language, but as you can see above, it seems to have an intricate type system capable of representing union types! As we’ll see later, we can declare types for everything in our own code too, and the compiler can check them at compile time, where possible, or at runtime when necessary. It can also emit much more efficient native code when you do that!

Notice also that, interestingly, functions can return multiple values, not just one, hence the `VALUES` in the return type.
Another little REPL session may clarify a few things:
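For instance (my own sketch; `floor` is a standard function that returns both the quotient and the remainder):

```lisp
;; FLOOR returns two values: the quotient and the remainder.
(floor 7 2)                      ; => 3 and 1

;; MULTIPLE-VALUE-BIND captures all returned values:
(multiple-value-bind (quotient remainder)
    (floor 7 2)
  (list quotient remainder))     ; => (3 1)

;; When the result is used normally, the extra values are simply dropped:
(+ 1 (floor 7 2))                ; => 4
```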
Multiple return values are used a lot in CL. For example, when we try to retrieve a value from a hash-table by its key, we get two values: the actual value associated with the key, and `T` if the table contained the key (or `NIL` if it didn’t).
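A quick sketch of how that looks with `gethash` (the table and keys are made up for illustration):

```lisp
(defvar *ages* (make-hash-table))
(setf (gethash 'alice *ages*) 30)

;; The second value tells us whether the key was actually present:
(multiple-value-bind (value presentp) (gethash 'alice *ages*)
  (list value presentp))   ; => (30 T)

;; For a missing key, both values are NIL:
(multiple-value-bind (value presentp) (gethash 'bob *ages*)
  (list value presentp))   ; => (NIL NIL)
```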
To learn more about multiple values, have a look at Constructs for handling multiple values.
That brings us to data structures in CL. Besides the ubiquitous list, CL also has many other built-in data structures, including n-dimensional, resizable arrays and hash-tables.

Instead of having different functions for reading and writing into data structures, CL has a nice trick: it combines the function that reads with `setf` for writing. `setf` can be used to mutate almost anything, though for many common operations there are other forms that may be more convenient to use (e.g. `push` to add an item to a list).
Check this out:
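Here’s a small sketch of the idea (my own example, not the original REPL session):

```lisp
;; AREF reads from an array...
(defvar *v* (vector 1 2 3))
(aref *v* 0)             ; => 1

;; ...and the very same AREF form, wrapped in SETF, writes to it:
(setf (aref *v* 0) 42)
(aref *v* 0)             ; => 42

;; PUSH is a convenient form for prepending to a list in place:
(defvar *items* nil)
(push 'a *items*)        ; *items* is now (A)
```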
It’s worth mentioning here the CLOS (Common Lisp Object System), which is (according to the linked Lisp Cookbook), arguably one of the most powerful object systems available in any language. I highly recommend learning about it as it might completely change your view of OOP. Also, CLOS concepts show up in the CL documentation, so it helps knowing a little bit about it just to understand the docs better.
In CL, equality falls on a continuum. There’s very strong equality (the identity of the “things” is the same) with `eq`, and lax equality (the “things” look the same) with `equalp`. In between, there are `eql` and `equal`. There are also a few other type-specific predicates like `=` (which only works for numbers), `tree-equal` and probably more!

This article by Eli Bendersky explains the main equality functions well.

For the purposes of this article, the important thing is that a data structure like `hash-table`, which needs to know about the concept of equality to do its job, lets the user specify which predicate to use for equality. The programmer must be careful to select the most appropriate function for each particular use case (from a performance point-of-view, the stricter the equality function, the better).

`make-hash-table` accepts a `:TEST` keyword (we’ll see in the next section what keywords mean in CL) which determines how keys are compared. While the default, `eql`, works well for numbers and characters, for strings one must use `equal`, or even `equalp` for case-insensitive comparison.

Hence, to create a string-keyed, case-insensitive hash-table, for example, one can do this:
```lisp
(make-hash-table :test 'equalp)
```
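Such a table treats keys that differ only in case as the same key, as this little sketch shows:

```lisp
(defvar *table* (make-hash-table :test 'equalp))
(setf (gethash "Hello" *table*) 1)

;; EQUALP compares strings case-insensitively:
(gethash "HELLO" *table*)  ; => 1 and T
```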
Macros and functions
Finally, as I mentioned earlier, CL, like most Lisps, has very powerful macros, as explained by Paul Graham in On Lisp.
But because macros are not marked specially in CL, they make code difficult to read sometimes.
As an example, some macros introduce new bindings, and it’s impossible to know that if you don’t know the syntax of a macro (macros are so powerful they can become full-on DSLs, as we’ll see when we discuss `loop`). Below, what’s the `in` symbol doing there?
```lisp
(defun load-dictionary (file size)
  ...
  (with-open-file (in file)
    ...))
```
Well, it’s NOT an argument, it’s a new binding. This is basically saying: open the file at path `file` and bind `in` to the file stream.
This is kind of analogous to the `let` special operator (besides functions and macros, CL also has a few special forms from which all other constructs are built), whose purpose is to introduce one or more local bindings:
```lisp
(let ((a 1)
      (b 2))
  (+ a b))
```
Functions are usually defined using a macro, `defun`:
```lisp
(defun f (x) (* x 2))
```
`defun` is introducing a function called `f`, and `(x)` is the arguments list. What comes after is the body of the function, which in this case multiplies the argument by `2`.
Another way to create a function is with `lambda`. That’s usually done when an anonymous function is required, as when calling a higher-order function.
Note that to pass a reference to a function `my-fun`, the notation `#'my-fun` is used. A reference to a variable `my-var` would be just `'my-var`, which means CL has separate namespaces for variables and functions. This is what differentiates a Lisp-1, like Scheme (i.e. 1 namespace for everything), from a Lisp-n, like CL. Which kind of Lisp is “best” is the topic of a somewhat contentious debate in Lisp communities.
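Both notations can be seen in a short sketch of my own:

```lisp
;; An anonymous function passed to the higher-order function MAPCAR:
(mapcar (lambda (x) (* x x)) '(1 2 3 4))  ; => (1 4 9 16)

;; A reference to the named function EVENP, using the #' notation:
(mapcar #'evenp '(1 2 3 4))               ; => (NIL T NIL T)
```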
We can create our own, weirder macro to define functions too! Below, we define a macro that just calls `defun`, but with the argument list reversed, because why not!?
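Such a macro could be sketched like this (my reconstruction; the name `defun-reversed` is made up):

```lisp
(defmacro defun-reversed (name args &body body)
  "Like DEFUN, but takes the argument list in reverse order."
  `(defun ,name ,(reverse args) ,@body))

;; This expands to (defun subtract (x y) (- x y)):
(defun-reversed subtract (y x)
  (- x y))

(subtract 10 3)  ; => 7
```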
The `` ` `` backquote character creates templates for Lisp expressions, whereas the `,` comma is used to make a slot within that template. In other words, `,` unquotes (so it can only be used within a backquote template).
Functions can have variadic and optional parameters, as well as keyword parameters, which are like named parameters. Parameters can even have default values (I am genuinely surprised CL has had this for literally decades!).
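A sketch of those possibilities (all function names here are made up for illustration):

```lisp
;; &optional with a default value:
(defun incr (x &optional (by 1))
  (+ x by))
(incr 10)     ; => 11
(incr 10 5)   ; => 15

;; &rest makes a function variadic:
(defun sum (&rest numbers)
  (apply #'+ numbers))
(sum 1 2 3)   ; => 6

;; &key gives named parameters, also with defaults:
(defun connect (host &key (port 80) (timeout 30))
  (list host port timeout))
(connect "example.com" :port 8080)  ; => ("example.com" 8080 30)
```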
With this, we have everything we need to understand Peter Norvig’s code for the Prechelt’s phone encoding algorithm.
If you want to learn more Lisp, check these excellent (and free!) resources:
Practical Common Lisp, by Peter Seibel.
On Lisp, by Paul Graham.
More books on lisp-lang.org!
The phone encoding problem
The problem that was discussed in the papers that motivated this series of blog posts is a phone encoding problem where long phone numbers are encoded into words so that people can remember them more easily.
It uses a very simple scheme where characters (`A-Z` allowed) are translated into digits. We can see the mapping in the following Lisp function, from Norvig’s solution:
```lisp
(defun char->digit (ch)
  "Convert a character to a digit according to the phone number rules."
  (ecase (char-downcase ch)
    ((#\e) 0)
    ((#\j #\n #\q) 1)
    ((#\r #\w #\x) 2)
    ((#\d #\s #\y) 3)
    ((#\f #\t) 4)
    ((#\a #\m) 5)
    ((#\c #\i #\v) 6)
    ((#\b #\k #\u) 7)
    ((#\l #\o #\p) 8)
    ((#\g #\h #\z) 9)))
```
For example, the word `Hello` can be used to represent `90888` and `world` can represent `28283`, so the phone number `90888-28283` can be encoded as `Hello world`.
Before/after each word, a digit can be used without being encoded, but only in case there’s no word that could be used at that position.
If you would like to try to implement this algorithm yourself, feel free to follow the more comprehensive instructions from the Lisp study.
My GitHub repository renatoathaydes/prechelt-phone-number-encoding contains solutions in CL, Java, Rust and a few other languages. It also has a benchmark runner you can use to try your own implementation. Check the README page for details, and the many branches for alternative implementations for each language.
Understanding Norvig’s implementation
The initial implementation of the phone encoding algorithm by Peter Norvig is written in a very high level, straightforward manner.
The valid words (the dictionary) are loaded into a global variable named `*dict*` as follows:
It seems that using names surrounded by `*`s is a CL convention for global variables. Constants, defined with `defconstant`, use `+`s instead, so a constant might be named, for example, `+my-constant+`.
```lisp
(defvar *dict* nil
  "A hash table mapping a phone number (integer) to a list of words
   from the input dictionary that produce that number.")

(defun load-dictionary (file size)
  "Create a hashtable from the file of words (one per line).
   Takes a hint for the initial hashtable size.
   Each key is the phone number for a word; each value is a list of
   words with that phone number."
  (let ((table (make-hash-table :test #'eql :size size)))
    (with-open-file (in file)
      (loop for word = (read-line in nil)
            while word do
              (push word (gethash (word->number word) table))))
    table))
```
Hopefully, the previous section was instructive enough for the above code to look understandable.
The inner loop of the algorithm (inside the `print-translations` function), where it goes through each digit of the phone number being analysed, looks like this:
```lisp
(loop for i from start below (length digits) do
  (setf n (+ (* 10 n) (nth-digit digits i)))
  (loop for word in (gethash n *dict*) do
    (setf found-word t)
    (print-translations num digits (+ 1 i) (cons word words))))
```
The loop macro, used above, is an amazing Common Lisp DSL for iterating over collections in more ways than you probably ever thought possible! I recommend having a read of Loop for Black Belts to understand how crazy (or beautiful?) it is. There are even people writing research papers and giving conference talks about it.
The `word->number` function, which encodes a word into a number, duplicates the same algorithm shown above to compute a number `n` for the current word. `n` starts with value 1 because, as Norvig mentions in the comments, that avoids problems with leading zeros.
The `nth-digit` function is defined as follows:
```lisp
(defun nth-digit (digits i)
  "The i-th element of a character string of digits, as an integer 0 to 9."
  (- (char-code (char digits i)) #.(char-code #\0)))
```
`#.` is a reader macro which evaluates its expression at read-time (i.e. before the expression is even compiled!), so the `#.(char-code #\0)` expression gets replaced with `48` in the code that actually executes - but using `48` directly wouldn’t be nearly as readable.
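Both forms below therefore compile down to the same code (a small sketch of my own):

```lisp
;; #\7 has char-code 55, #\0 has char-code 48:
(- (char-code #\7) #.(char-code #\0))  ; => 7
(- (char-code #\7) 48)                 ; => 7, but less readable
```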
And that’s all!
Pretty simple solution, I would say, but perhaps wasteful?! The key `n` is computed using arithmetic. As each phone number has an upper limit of 50 characters (that’s part of the problem description), that requires big-numbers, which is what killed the performance of the Rust program in my previous blog posts. Also, it uses `cons` to append a new word and recurse, instead of mutating the list in-place, an optimisation that sped up the Rust code by over 15%.
Could that make the Lisp code faster? We’ll find out later, but let’s first see how this implementation fared against my own Java implementation and a Rust port (admittedly implemented without much concern for performance) in my initial blog post:
That’s right, it was the fastest implementation for all input sizes except the largest one, where it performed just slightly worse than my best Java implementation. I write mostly Java professionally, so it’s reasonable to assume the Java code is fairly representative of a proper Java implementation, but the Rust one was a beginner’s implementation.
After lots of Rust programmers insisted that I make changes to the Rust program, as I mentioned before, I spent weeks, and 3 blog posts, getting Rust to run faster! A little effort was also spent on making similar changes to the Java program as I changed Rust… but CL was left completely unchanged, and, unsurprisingly, it fell behind.
In the next post, I will finally fix that by throwing every weapon CL (SBCL in particular) makes available to programmers at the problem, trying to make the CL program run as fast as possible and, hopefully, as fast as or faster than highly optimised Java and, one can dream, Rust!
Continue to Part 2!