CL-INTERPOL - String interpolation for Common Lisp

"The crux of the biscuit is the apostrophe." (Frank Zappa)

Abstract
CL-INTERPOL is a library for Common Lisp which modifies the reader so that you can have interpolation within strings similar to Perl or Unix Shell scripts. It also provides various ways to insert arbitrary characters into literal strings even if your editor/IDE doesn't support them. Here's an example:
* (ql:quickload :cl-interpol)
* (named-readtables:in-readtable :interpol-syntax)
* (let ((a 42))
    #?"foo: \xC4\N{Latin capital letter U with diaeresis}\nbar: ${a}")
"foo: ÄÜ
bar: 42"
If you're looking for an alternative syntax for characters, see CL-UNICODE.
CL-INTERPOL comes with a BSD-style license so you can basically do with it whatever you want.
Download current version or visit the project on Github.

Download and installation
Support
Syntax
The CL-INTERPOL dictionary
Known issues
1. {n,m} modifiers in extended mode
Acknowledgements

Download and installation

CL-INTERPOL together with this documentation can be downloaded from Github. The current version is 0.2.7.

CL-INTERPOL comes with a system definition for ASDF so you can install the library with

(asdf:load-system :cl-interpol)

if you've unpacked it in a place where ASDF can find it. It depends on CL-UNICODE and NAMED-READTABLES. Installation via asdf-install or Quicklisp should also be possible.

Note: Before you can actually use the new reader syntax you have to enable it with ENABLE-INTERPOL-SYNTAX or via named-readtables:

(named-readtables:in-readtable :interpol-syntax)

You can run a test suite which tests most aspects of the library with

(asdf:test-system :cl-interpol)

The test suite depends on FLEXI-STREAMS.

Support

The development version of CL-INTERPOL can be found on GitHub. Please use the GitHub issue tracker to submit bug reports. Patches are welcome; please submit them as GitHub pull requests.

Syntax

CL-INTERPOL installs ? (question mark) as a "sub-character" of the dispatching macro character # (sharpsign), i.e. it relies on the fact that sharpsign is a dispatching macro character in the current readtable when ENABLE-INTERPOL-SYNTAX is invoked.

The question mark may optionally be followed by an R and an X (case doesn't matter) - see the section about regular expression syntax below. If both of them are present, the R must precede the X.

The next character is the opening outer delimiter which may be one of " (double quote), ' (apostrophe), | (vertical bar), # (sharpsign), / (slash), ( (left parenthesis), < (less than), [ (left square bracket), or { (left curly bracket). (But see *OUTER-DELIMITERS*.)

The following characters comprise the string, which is read until the closing outer delimiter is seen. The closing outer delimiter is the same character as the opening outer delimiter, unless the opening delimiter was one of the last four listed above, in which case the closing outer delimiter is the corresponding closing (right) bracketing character. So these are all valid CL-INTERPOL strings equivalent to "abc":

* #?"abc"
"abc"
* #?r"abc"
"abc"
* #?x"abc"
"abc"
* #?rx"abc"
"abc"
* #?'abc'
"abc"
* #?|abc|
"abc"
* #?#abc#
"abc"
* #?/abc/
"abc"
* #?(abc)
"abc"
* #?[abc]
"abc"
* #?{abc}
"abc"
* #?<abc>
"abc"

A character which would otherwise be the closing outer delimiter can be escaped by a backslash immediately preceding it (unless this backslash is itself escaped by another backslash). Also, the bracketing delimiters can nest, i.e. a right bracketing character which might otherwise be the closing outer delimiter will be read as part of the string if it is matched by a preceding left bracketing character within the string.

* #?"abc"
"abc"
* #?"abc\""
"abc\""
* #?"abc\\"
"abc\\"
* #?[abc]
"abc"
* #?[a[b]c]
"a[b]c"
* #?[a[[b]]c]
"a[[b]]c"
* #?[a[[][]]b]
"a[[][]]b"

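To make the delimiter rules concrete, here is a minimal sketch in portable Common Lisp. FIND-CLOSING-DELIMITER is a hypothetical helper, not part of CL-INTERPOL: it scans a string rather than a stream, and it only locates the closing outer delimiter, ignoring escape processing, interpolation, and regular expression mode.

```lisp
(defun find-closing-delimiter (string start open close)
  "Hypothetical helper: return the index of the CLOSE character that ends a
string body beginning at index START of STRING, or NIL if there is none.
For non-bracketing delimiters such as the double quote, OPEN and CLOSE are
the same character."
  (let ((depth 0)
        (i start))
    (loop while (< i (length string))
          do (let ((c (char string i)))
               (cond ((char= c #\\)
                      ;; A backslash escapes the next character: skip it.
                      (incf i))
                     ((and (char= c close) (zerop depth))
                      (return-from find-closing-delimiter i))
                     ((char= c close)
                      ;; This closes a bracket that was opened inside the body.
                      (decf depth))
                     ((and (char/= open close) (char= c open))
                      ;; Bracketing delimiters nest.
                      (incf depth))))
             (incf i))
    nil))
```

Called on a[[][]]b], the body of the last example above, with #\[ and #\] as delimiters, it returns 8: the inner brackets balance out, so only the final right bracket closes the string.
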
The characters between the outer delimiters are read one by one and inserted into the resulting string as is unless one of the special characters \ (backslash), $ (dollar sign), or @ (at-sign) is encountered. The behaviour with respect to these special characters is modeled after Perl because CL-INTERPOL is intended to be usable with CL-PPCRE.

Backslashes

Here's a short summary of what might occur after a backslash, originally copied from man perlop. The details are explained below.

  \t          tab             (HT, TAB)
  \n          newline         (NL)
  \r          return          (CR)
  \f          form feed       (FF)
  \b          backspace       (BS)
  \a          alarm (bell)    (BEL)
  \e          escape          (ESC)
  \033        octal char      (ESC)
  \x1b        hex char        (ESC)
  \x{263a}    wide hex char   (SMILEY)
  \c[         control char    (ESC)
  \N{name}    named char

  \l          lowercase next char
  \u          uppercase next char
  \L          lowercase till \E
  \U          uppercase till \E
  \E          end case modification
  \Q          quote non-word characters till \E

  \␤          ignore the newline and following whitespaces

If a backslash is followed by t, n, r, f, b, a, or e (all lowercase), then the corresponding character #\Tab, #\Newline, #\Return, #\Page, #\Backspace, (CODE-CHAR 7), or (CODE-CHAR 27) is inserted into the string.

* #?"New\nline"
"New
line"
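
The simple escapes amount to a table lookup. The following hypothetical function sketches that lookup in portable Common Lisp; it is not CL-INTERPOL's own code, and it relies on the semi-standard character names #\Tab, #\Return, #\Page, and #\Backspace, which common implementations support.

```lisp
(defun simple-escape-char (letter)
  "Hypothetical helper: the character inserted for a backslash followed by
LETTER, or NIL if LETTER is not one of the simple escapes."
  (case letter
    (#\t #\Tab)
    (#\n #\Newline)
    (#\r #\Return)
    (#\f #\Page)
    (#\b #\Backspace)
    (#\a (code-char 7))     ; alarm (bell)
    (#\e (code-char 27))))  ; escape
```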

If a backslash is followed by one of the digits 0 to 9, then this digit and the following characters are read and parsed as octal digits and interpreted as the code of the character to insert instead of this sequence. The sequence ends with the first character which is not an octal digit, but at most three digits are read. Only the rightmost eight bits of the resulting number are used for the character code. If no octal digit can be read (as with \9 in the last example below), the digit is inserted as is.

* #?"\40\040"
"  "  ;; two spaces
* (map 'list #'char-code #?"\0\377\777")
(0 255 255)  ;; note that \377 and \777 yield the same result
* #?"Only\0403 digits!"
"Only 3 digits!"
* (map 'list #'identity #?"\9")
(#\9)
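
The octal rule can be sketched as follows. PARSE-OCTAL-ESCAPE is hypothetical; it assumes START points at the digit right after the backslash and that this digit is between 0 and 7 (the \9 case above is not handled). LDB keeps the rightmost eight bits, which is why \377 and \777 give the same character.

```lisp
(defun parse-octal-escape (string start)
  "Hypothetical helper: read at most three octal digits of STRING beginning
at START (which must index an octal digit).  Returns the character and the
index just past the digits."
  (let* ((limit (min (length string) (+ start 3)))
         (end (or (position-if-not (lambda (c) (digit-char-p c 8))
                                   string :start start :end limit)
                  limit))
         (code (parse-integer string :start start :end end :radix 8)))
    ;; Only the rightmost eight bits are used, so 777 acts like 377.
    (values (code-char (ldb (byte 8 0) code)) end)))
```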

If a backslash is followed by an x (lowercase), the following characters are read and parsed as hexadecimal digits and interpreted as the code of the character to insert instead of this sequence. The sequence of hexadecimal digits ends with the first character which is not one of the characters 0 to 9, a to f, or A to F, but at most two digits are read. If the character immediately following the x is a { (left curly bracket), then all the following characters up to a } (right curly bracket) must be hexadecimal digits and comprise a number which'll be taken as the character code (and which obviously should denote a character known by your Lisp implementation). Note that in both cases it is legal to read zero digits, which is interpreted as the character code 0.

* (char #?"\x20" 0)
#\Space
* (char-code (char #?"\x" 0))
0
* (char-code (char #?"\x{}" 0))
0
* (unicode-name (char #?"\x{2323}" 0))
"SMILE"
* #?"Only\x202 digits!"
"Only 2 digits!"

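A corresponding sketch for \x, again a hypothetical helper rather than CL-INTERPOL's implementation. It assumes START points just past the x and that a { is always followed by a matching }; CODE-CHAR of a large code such as #x2323 only works in a Unicode-aware implementation.

```lisp
(defun parse-hex-escape (string start)
  "Hypothetical helper: parse the hexadecimal part of a \\x escape whose
first character is at START.  Returns the character and the index just
past the escape sequence."
  (flet ((code-from (from to)
           ;; Zero digits are legal and denote character code 0.
           (if (= from to)
               0
               (parse-integer string :start from :end to :radix 16))))
    (if (and (< start (length string))
             (char= (char string start) #\{))
        ;; \x{...}: everything up to the right curly bracket is the code.
        (let ((close (position #\} string :start start)))
          (values (code-char (code-from (1+ start) close)) (1+ close)))
        ;; Plain \x: at most two hexadecimal digits.
        (let* ((limit (min (length string) (+ start 2)))
               (end (or (position-if-not (lambda (c) (digit-char-p c 16))
                                         string :start start :end limit)
                        limit)))
          (values (code-char (code-from start end)) end)))))
```
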
If a backslash is followed by a c (lowercase), then the ASCII control code of the following character is inserted into the string. Note that this is only defined for A to Z, [, \, ], ^, and _, although CL-INTERPOL will also accept other characters. In fact, the transformation is implemented as

(code-char (logxor #x40 (char-code (char-upcase <char>))))

where <char> is the character following \c.

* (char-name (char #?"\cH" 0))
;; see 13.1.7 of the ANSI standard, though
"Backspace"
* (char= (char #?"\cj" 0) #\Newline)
T

If a backslash is followed by an N (uppercase), the following character must be a { (left curly bracket). The characters following the bracket are read until a } (right curly bracket) is seen and comprise the Unicode name of the character to be inserted into the string. This name is passed to CL-UNICODE's function CHARACTER-NAMED, which returns the corresponding character. This obviously also means that you can fine-tune this behaviour using CL-UNICODE's global special variables.

* (unicode-name (char #?"\N{Greek capital letter Sigma}" 0))
"GREEK CAPITAL LETTER SIGMA"
* (unicode-name (char #?"\N{GREEK CAPITAL LETTER SIGMA}" 0))
"GREEK CAPITAL LETTER SIGMA"
* (setq *try-abbreviations-p* t)
T
* (unicode-name (char #?"\N{Greek:Sigma}" 0))
"GREEK CAPITAL LETTER SIGMA"
* (unicode-name (char #?"\N{Greek:sigma}" 0))
"GREEK SMALL LETTER SIGMA"
* (setq *scripts-to-try* "Greek")
"Greek"
* (unicode-name (char #?"\N{Sigma}" 0))
"GREEK CAPITAL LETTER SIGMA"
* (unicode-name (char #?"\N{sigma}" 0))
"GREEK SMALL LETTER SIGMA"

Of course, \N won't magically make your Lisp implementation Unicode-aware. You can only use the names of characters that are actually supported by your Lisp.

If a backslash is followed by an l or a u (both lowercase) the following character (if any) is downcased or uppercased respectively.

* #?"\lFOO"
"fOO"
* #?"\ufoo"
"Foo"
* #?"\l"
""

If a backslash is followed by an L or a U (both uppercase), the following characters up to \E (uppercase) or another \L or \U are downcased or upcased respectively. While \E simply ends the scope of \L or \U, another \L or \U will introduce a new round of downcasing or upcasing.

* #?"\Ufoo\Ebar"
"FOObar"
* #?"\LFOO\EBAR"
"fooBAR"
* #?"\LFOO\Ubar"
"fooBAR"
* #?"\LFOO"
"foo"

These examples may seem trivial but \U and friends might be very helpful if you interpolate strings.

If a backslash is followed by a Q (uppercase) the following characters up to \E (uppercase) are quoted, i.e. every character except for 0 to 9, a to z, A to Z, and _ (underscore) is preceded by a backslash. Corresponding pairs of \Q and \E can be nested.

* #?"-\Q-\E-"
"-\\--"
* #?"\Q-\Q-\E-\E"
"\\-\\\\\\-\\-"
* #?"-\Q-"
"-\\-"

As you might have noticed, \E is used to end the scope of \Q as well as that of \L and \U. As a consequence, pairs of \Q and \E can be nested between \L or \U and \E and vice versa, but each occurrence of \L or \U which is preceded by another \L or \U will immediately end the scope of all enclosed \Q modifiers. Hmm, need an example?

* #?"\LAa-\QAa-\EAa-\E"
"aa-aa\\-aa-"
* #?"\QAa-\LAa-\EAa-\E"
"Aa\\-aa\\-Aa\\-"
* #?"\U\QAa-\LAa-\EAa-\E"
"AA\\-aa-Aa-" ;; note that only the first hyphen is quoted now

Quoting characters with \Q is especially helpful if you want to interpolate a string verbatim into a regular expression.
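
The quoting performed by \Q can be sketched with a short loop. QUOTE-NON-WORD-CHARS is a hypothetical name; its character range tests assume the ASCII ordering of digits and letters, which practically all implementations use. The expansion shown in the Interpolation section delegates the same job to CL-PPCRE's QUOTE-META-CHARS.

```lisp
(defun quote-non-word-chars (string)
  "Hypothetical helper: precede every character except 0-9, a-z, A-Z and
the underscore with a backslash."
  (with-output-to-string (out)
    (loop for c across string
          ;; Range tests assume ASCII ordering of digits and letters.
          unless (or (char<= #\0 c #\9)
                     (char<= #\a c #\z)
                     (char<= #\A c #\Z)
                     (char= c #\_))
            do (write-char #\\ out)
          do (write-char c out))))
```

Quoting an already quoted string again, as nested \Q does, also quotes the backslashes that the first pass inserted.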

If a backslash is placed at the end of a line, it works like the tilde-newline directive of Common Lisp's FORMAT function: the newline immediately following the backslash and any non-newline whitespace characters after that newline are ignored. This escape sequence lets you break long string literals into several lines of code while keeping a convenient line width and indentation.

* #?"@@ -1,11 +1,12 @@\n Th\n-e\n+at\n  quick b\n\
     @@ -22,18 +22,17 @@\n jump\n-s\n+ed\n  over \n\
     -the\n+a\n  laz\n"
"@@ -1,11 +1,12 @@
 Th
-e
+at
  quick b
@@ -22,18 +22,17 @@
 jump
-s
+ed
  over
-the
+a
  laz
"

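The backslash-newline rule can be sketched as a string-to-string pass. JOIN-CONTINUED-LINES is hypothetical; it treats only spaces and tabs as skippable whitespace, which is an assumption on our part, and it copies every other escape sequence through unchanged so that an escaped backslash at the end of a line is not mistaken for a continuation.

```lisp
(defun join-continued-lines (string)
  "Hypothetical helper: remove each backslash-newline pair from STRING along
with the spaces and tabs that follow the newline.  Other escape sequences
are copied unchanged."
  (with-output-to-string (out)
    (let ((i 0)
          (n (length string)))
      (loop while (< i n)
            do (let ((c (char string i)))
                 (cond ((and (char= c #\\) (< (1+ i) n)
                             (char= (char string (1+ i)) #\Newline))
                        ;; Skip the newline and any following spaces/tabs.
                        (setf i (or (position-if-not
                                     (lambda (ch) (member ch '(#\Space #\Tab)))
                                     string :start (+ i 2))
                                    n)))
                       ((and (char= c #\\) (< (1+ i) n))
                        ;; Some other escape: copy both characters untouched.
                        (write-char c out)
                        (write-char (char string (1+ i)) out)
                        (incf i 2))
                       (t
                        (write-char c out)
                        (incf i))))))))
```
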
All other characters following a backslash are left as is and inserted into the string without the backslash. This is also true for the backslash itself, for $ and @, and (as mentioned above) for the closing outer delimiter.

* #?"\"\\f\o\o\""
"\"\\foo\""

Interpolation

If a $ (dollar sign) or @ (at-sign) is seen and followed by one of { (left curly bracket), [ (left square bracket), < (less than), or ( (left parenthesis) (but see *INNER-DELIMITERS*), the characters following the bracket are read up to the corresponding closing (right) bracketing character. They are read as Lisp forms and treated as an implicit progn the result of which will be inserted into the string at execution time. (Technically this is done by temporarily making the syntax of the closing right bracketing character in the current readtable be the same as the syntax of ) (right parenthesis) in the standard readtable and then reading the forms with READ-DELIMITED-LIST.)

The result of the forms following a $ (dollar sign) is inserted into the string as with PRINC at execution time. The result of the forms following an @ (at-sign) must be a list. The elements of this list are inserted into the string one by one as with PRINC interspersed (or "joined" if you prefer) with the contents of the variable *LIST-DELIMITER* (also inserted as with PRINC).

Every other $ or @ is inserted into the string as is.

* (let* ((a "foo")
         (b #\Space)
         (c "bar")
         (d (list a b c))
         (x 40))
    (values #?"$ @"
            #?"$(a)"
            #?"$<a>$[b]"
            #?"\U${a}\E \u${a}"
            (let ((*list-delimiter* #\*))
              #?"@{d}")
            (let ((*list-delimiter* ""))
              #?"@{d}")
            #?"The result is ${(let ((y 2)) (+ x y))}"
            #?"${#?'${a} ${c}'} ${x}"))  ;; note the embedded CL-INTERPOL string
"$ @"
"foo"
"foo "
"FOO Foo"
"foo* *bar"
"foo bar"
"The result is 42"
"foo bar 40"

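The joining done by @ can be sketched as below. JOIN-LIKE-AT-SIGN is hypothetical and takes the delimiter as an argument instead of reading *LIST-DELIMITER*; like @, it PRINCs each element and, between elements, the delimiter.

```lisp
(defun join-like-at-sign (list &optional (delimiter " "))
  "Hypothetical helper: PRINC the elements of LIST separated by DELIMITER,
imitating what @ does with the value of *LIST-DELIMITER*."
  (with-output-to-string (out)
    (loop for (element . rest) on list
          do (princ element out)
          ;; Only emit the delimiter between elements, not after the last.
          when rest
            do (princ delimiter out))))
```

(join-like-at-sign (list "foo" #\Space "bar") "*") returns "foo* *bar", matching the fifth value above.
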
Interpolations are realized by creating code which is evaluated at execution time. For example, the expansion of #?"\Q-\l${(let ((x 40)) (+ x 2))}" might look like this:

(with-output-to-string (#:G1098)
  (write-string (cl-ppcre:quote-meta-chars
                 (with-output-to-string (#:G1099)
                   (write-string "-" #:G1099)
                   (let ((#:G1100
                           (format nil "~A"
                                   (progn
                                     (let ((x 40))
                                       (+ x 2))))))
                     (when (plusp (length #:G1100))
                       (setf (char #:G1100 0)
                               (char-downcase (char #:G1100 0))))
                     (write-string #:G1100 #:G1099))))
                #:G1098))

However, if a string read by CL-INTERPOL does not contain interpolations, it is guaranteed to be expanded into a constant Lisp string.

Support for CL-PPCRE/Perl regular expressions

Beyond what has been explained above CL-INTERPOL can support Perl regular expression syntax. This feature is mainly intended for use with CL-PPCRE (version 0.7.0 or higher). The regular expression mode is switched on if the opening outer delimiter is a / (slash) - but see *REGEX-DELIMITERS*. It is also on if there's an r (lowercase or uppercase) in front of the opening outer delimiter. If there's also an x (lowercase or uppercase) in front of the opening outer delimiter (but behind the r if it's there), the string will be read in extended mode (see man perlre for a detailed explanation). In these modes the following things are different from what's described above:

\p, \P, \w, \W, \s, \S, \d, and \D are never converted to their unescaped (backslash-less) counterparts because they have or can have a special meaning in regular expressions.
```
* #?#\W\o\w#
"Wow"
* #?/\W\o\w/
"\\Wo\\w"
* #?r#\W\o\w#
"\\Wo\\w"
```
\k, \b, \B, \A, \z, and \Z are only converted to their unescaped (backslash-less) counterparts if they are within a character class (i.e. enclosed in square brackets), because they have a special meaning in regular expressions outside of character classes.
```
* #?/\A[\A-\Z]\Z/
"\\A[A-Z]\\Z"
* #?/\A[]\A-\Z]\Z/
"\\A[]A-Z]\\Z"
* #?/\A[^]\A-\Z]\Z/
"\\A[^]A-Z]\\Z"
```
Octal representations of character codes are left as is and not expanded if they're not within character classes and could possibly denote a back-reference to a register group. (Actually, this also holds for sequences starting with \8 or \9, in compliance with Perl.)
```
* (map 'list #'identity #?/\0\40[\40]/)
(#\Null #\\ #\4 #\0 #\[ #\Space #\])
```
Characters which are represented by octal or hexadecimal codes, by names, or escaped by a preceding backslash are 'protected' by a backslash if they have a special meaning within regular expressions.
```
* #?"\x2B\\\.[\.]"
"+\\.[.]"
* #?/\x2B\\\.[\.]/
"\\+\\\\\\.[.]"  ;; note that the second dot is not 'protected' because it's in a character class
```
Embedded comments (like (?#...)) are removed from the string, except that they are replaced with (?:) (a non-capturing, empty group which will be optimized away by CL-PPCRE) if the next character is a hexadecimal digit.
```
* #?/A(?#n embedded) comment/
"A comment"
* #?/\1(?#)2/
"\\1(?:)2"  ;; instead of "\\12" which has a different meaning to the regex engine
```
Interpolation only works with curly brackets (and only if they haven't been removed from *INNER-DELIMITERS*).
```
* (let ((a 42))
    (values #?"$(a)" #?"${a}"
            #?/$(a)/ #?/${a}/))
"42"
"42"
"$(a)"
"42"
```
In extended mode whitespace characters (one of #\Space, #\Tab, #\Linefeed, #\Return, and #\Page) are removed from the string unless they are escaped by a backslash or within a character class.
```
* #?/ \ [ ]/
"  [ ]"  ;; two spaces in front of square bracket
* #?x/ \ [ ]/
" [ ]"  ;; one space in front of square bracket
```
In extended mode end-of-line comments (starting with # (sharpsign) and ending with the newline character) are removed from the string, except that they are replaced with (?:) (a non-capturing, empty group which will be optimized away by CL-PPCRE) if the next character is a hexadecimal digit.
```
* #?x/[a-z]#blabla
\$/
"[a-z]$"
* #?x/\1#
2/
"\\1(?:)2"  ;; instead of "\\12" which has a different meaning to the regex engine
```
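
The rule for removing embedded comments can be sketched as follows. STRIP-EMBEDDED-COMMENTS is hypothetical and simplified: it assumes every (?# is closed by a right parenthesis, and it does not skip over escaped parentheses or character classes, which a real implementation has to respect.

```lisp
(defun strip-embedded-comments (regex)
  "Hypothetical helper: remove (?#...) comments from REGEX, inserting the
empty group (?:) when the next character is a hexadecimal digit."
  (with-output-to-string (out)
    (let ((i 0))
      (loop
        (let ((start (search "(?#" regex :start2 i)))
          (unless start
            ;; No more comments: copy the rest and stop.
            (write-string regex out :start i)
            (return))
          (write-string regex out :start i :end start)
          ;; Assumes every comment is closed by a right parenthesis.
          (setf i (1+ (position #\) regex :start start)))
          ;; Keep e.g. \1 and 2 from fusing into the back-reference \12.
          (when (and (< i (length regex))
                     (digit-char-p (char regex i) 16))
            (write-string "(?:)" out)))))))
```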

If all this seems complicated, just keep in mind that this mode is meant to let you feed strings to CL-PPCRE exactly as if you had written them for Perl (without having to count Lisp backslashes versus Perl backslashes). However, you should not use CL-INTERPOL's and CL-PPCRE's extended modes at the same time because this might lead to errors. (CL-PPCRE's extended mode will, e.g., throw away whitespace which had been escaped in CL-INTERPOL.)

* (let ((scanner (cl-ppcre:create-scanner " a\\ a " :extended-mode t)))
    (cl-ppcre:scan scanner "a a"))
0
3
#()
#()
* (let ((scanner (cl-ppcre:create-scanner #?x/ a\ a /)))
    (cl-ppcre:scan scanner "a a"))
0
3
#()
#()
* (let ((scanner (cl-ppcre:create-scanner #?x/ a\ a / :extended-mode t)))
    ;; wrong, because extended mode is applied twice
    (cl-ppcre:scan scanner "a a"))
NIL

The CL-INTERPOL dictionary

CL-INTERPOL exports the following symbols:

[Macro]
enable-interpol-syntax &key modify-*readtable* => |

This is used to enable the reader syntax described above. This macro expands into an EVAL-WHEN so that if you use it as a top-level form in a file to be loaded and/or compiled it'll do what you expect.

If the parameter modify-*readtable* is NIL (the default) this will push the current readtable on a stack so that matching calls of ENABLE-INTERPOL-SYNTAX and DISABLE-INTERPOL-SYNTAX can nest. Otherwise the current value of *readtable* will be modified.

Note: by default the reader syntax is not enabled after loading CL-INTERPOL.

[Macro]
disable-interpol-syntax => |

This is used to disable the reader syntax described above. This macro expands into an EVAL-WHEN so that if you use it as a top-level form in a file to be loaded and/or compiled it'll do what you expect. Technically this'll pop a readtable from the stack described above so that matching calls of ENABLE-INTERPOL-SYNTAX and DISABLE-INTERPOL-SYNTAX can nest. If the stack is empty (i.e. when DISABLE-INTERPOL-SYNTAX is called without a preceding call to ENABLE-INTERPOL-SYNTAX), the standard readtable is re-established.
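
The stack discipline of ENABLE-INTERPOL-SYNTAX and DISABLE-INTERPOL-SYNTAX can be imitated with standard readtable functions. The toy below is an illustration only: all names are hypothetical, its #? reader just returns the following string unchanged, and it falls back to a copy of the standard readtable because the consequences of modifying the standard readtable itself are undefined.

```lisp
(defvar *toy-readtable-stack* '()
  "Hypothetical stack of saved readtables.")

(defun toy-sharp-question-reader (stream sub-char arg)
  "Toy #? reader: read the following object, which must be a string, and
return it unchanged (no escapes, no interpolation)."
  (declare (ignore sub-char arg))
  (let ((object (read stream t nil t)))
    (check-type object string)
    object))

(defun toy-enable-syntax ()
  ;; Save the current readtable, then work on a copy of it.
  (push *readtable* *toy-readtable-stack*)
  (setf *readtable* (copy-readtable))
  ;; Relies on # being a dispatching macro character in that readtable.
  (set-dispatch-macro-character #\# #\? #'toy-sharp-question-reader))

(defun toy-disable-syntax ()
  ;; Pop the saved readtable; with an empty stack use a fresh copy of the
  ;; standard readtable.
  (setf *readtable* (if *toy-readtable-stack*
                        (pop *toy-readtable-stack*)
                        (copy-readtable nil))))
```

After (toy-enable-syntax), (read-from-string "#?\"abc\"") returns "abc"; a following (toy-disable-syntax) restores the readtable that was current before.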

[Special variable]
*list-delimiter*

The contents of this variable are inserted between the elements of a list interpolated with @ at execution time. They are inserted as with PRINC. The default value is " " (one space).

[Special variable]
*outer-delimiters*

This is a list of acceptable outer delimiters. The elements of this list are either characters or dotted pairs whose car and cdr are characters. A character denotes a delimiter like ' (apostrophe) which is the opening as well as the closing delimiter. A dotted pair like (#\{ . #\}) denotes a pair of matching bracketing delimiters. The name of this list is exported so that you can customize CL-INTERPOL's behaviour by removing elements from it; you are advised not to add any, and specifically you should not add alphanumeric characters or the backslash. Note that this variable has an effect at read time, so you probably need to wrap an EVAL-WHEN around forms that change its value. The default value is
'((#\( . #\))
  (#\{ . #\})
  (#\< . #\>)
  (#\[ . #\])
  #\/ #\| #\" #\' #\#)
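
The shape of this list can be illustrated with a lookup function. CLOSING-DELIMITER-FOR and *SAMPLE-OUTER-DELIMITERS* are hypothetical; the sample list spells out the default value given above.

```lisp
(defparameter *sample-outer-delimiters*
  '((#\( . #\)) (#\{ . #\}) (#\< . #\>) (#\[ . #\]) #\/ #\| #\" #\' #\#)
  "Hypothetical copy of the default value of *OUTER-DELIMITERS*.")

(defun closing-delimiter-for (opening delimiters)
  "Hypothetical helper: return the closing character that matches OPENING
in DELIMITERS, a list shaped like *OUTER-DELIMITERS*, or NIL."
  (dolist (entry delimiters nil)
    (cond ((and (characterp entry) (char= entry opening))
           ;; A lone character is its own closing delimiter.
           (return entry))
          ((and (consp entry) (char= (car entry) opening))
           ;; A dotted pair maps an opening bracket to its partner.
           (return (cdr entry))))))
```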

[Special variable]
*inner-delimiters*

This is a list of acceptable delimiters for interpolation. The elements of this list are either characters or dotted pairs whose car and cdr are characters. A character denotes a delimiter like ' (apostrophe) which is the opening as well as the closing delimiter. A dotted pair like (#\{ . #\}) denotes a pair of matching bracketing delimiters. The name of this list is exported so that you can customize CL-INTERPOL's behaviour by removing elements from it; you are advised not to add any, and specifically you should not add alphanumeric characters or the backslash. Note that this variable has an effect at read time, so you probably need to wrap an EVAL-WHEN around forms that change its value. The default value is
'((#\( . #\))
  (#\{ . #\})
  (#\< . #\>)
  (#\[ . #\]))

[Special variable]
*interpolate-format-directives*

This is a boolean value which determines whether the ~ (tilde) character signals the start of an inline format directive. When it is true, sequences of the form
~paramsX(form)
are passed to CL:FORMAT with FORM as the one and only argument, where params and X make up the format directive (with the same syntax as in CL:FORMAT). For example:
* (let ((x 42)) #?"An integer: ~D(x) ~X(x) ~8,'0B(x)")
"An integer: 42 2A 00101010"

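For the example above, the output matches a single FORMAT call with the same directives. The function below is a hypothetical illustration of that equivalence, not a description of the code CL-INTERPOL actually generates.

```lisp
(defun format-directive-equivalent (x)
  ;; Each ~paramsX(form) piece corresponds to one FORMAT directive that
  ;; consumes the value of FORM (here always X).
  (format nil "An integer: ~D ~X ~8,'0B" x x x))
```

Calling (format-directive-equivalent 42) yields "An integer: 42 2A 00101010".
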
[Special variable]
*regex-delimiters*

This is a list of opening outer delimiters which automatically switch CL-INTERPOL's regular expression mode on. The elements of this list are characters. An element of this list must also be an element of *OUTER-DELIMITERS* to have any effect. Note that this variable has effect at read time so you probably need to wrap an EVAL-WHEN around forms that change its value. The default value is the one-element list '(#\/).

Known issues

{n,m} modifiers in extended mode

CL-INTERPOL treats 'potential' {n,m} modifiers differently from CL-PPCRE or Perl in extended mode if they contain whitespace. CL-INTERPOL will simply remove the whitespace and thus make them valid modifiers for CL-PPCRE while Perl will remove the whitespace but not recognize the character sequence as a modifier. CL-PPCRE behaves like Perl - you decide if this behaviour is sane...:)

* (let ((scanner (cl-ppcre:create-scanner "^a{3, 3}$" :extended-mode t)))
    (cl-ppcre:scan scanner "aaa"))
NIL
* (let ((scanner (cl-ppcre:create-scanner "^a{3, 3}$" :extended-mode t)))
    (cl-ppcre:scan scanner "a{3,3}"))
0
6
#()
#()
* (cl-ppcre:scan #?x/^a{3, 3}$/ "aaa")
0
3
#()
#()
* (cl-ppcre:scan #?x/^a{3, 3}$/ "a{3, 3}")
NIL

Acknowledgements

Thanks to Peter Seibel who had the idea to do this to make CL-PPCRE more convenient. Buy his book!!!

$Header: /usr/local/cvsrep/cl-interpol/doc/index.html,v 1.39 2008/07/25 12:52:00 edi Exp $
