40 Years of DSL Disasters:
From Makefile to Dockerfile
Greg Ward <greg@gerg.ca>
@gergdotca
PyCon Canada 2017
Montreal, QC • Nov 18, 2017
DSL = Domain-Specific Language
it's a large category
- more complex than a config file
- less complex than a full-blown programming language
Some well-known examples
- HTML
- CSS
- TeX and LaTeX
- config files for Apache, nginx, Exim, ...
Some other well-known examples
(spoiler alert)
- Makefile
- autoconf
- RPM .spec
- distutils setup.py
- Dockerfile
Other kinds of language: 1
on the one hand: config files
- key/value pairs
- add "[section]" headers: done! problem solved
- or XML/JSON/YAML
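a minimal sketch of the "key/value pairs plus [section] headers" end of the spectrum, using Python's standard configparser module. The section name, keys, and values are made up for illustration.

```python
import configparser

# Hypothetical config text: one [section] header and two key/value pairs.
CONFIG_TEXT = """\
[server]
host = example.org
port = 8080
"""

def load_config(text):
    """Parse INI-style text and return the populated parser."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser

config = load_config(CONFIG_TEXT)
host = config["server"]["host"]          # values are plain strings
port = config.getint("server", "port")   # typed access is opt-in
```

nothing here is executable: the file only states facts, and the program reading it decides what to do with them.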
Other kinds of language: 2
on the other hand: programming languages
none of them are perfect, but the fatally flawed ones get weeded out
Hybrid languages
this seems to be hard to get right
- a bit of this: declarative statements
- a bit of that: imperative statements
there's more to the world of DSLs than this ... but that's what this
talk is limited to
Disclaimer
I'm a build/packaging geek, so that biases my experience!
II. Unfortunate DSL Flaws
Exhibit A: Make
the grand-daddy of them all
- implemented by a summer intern in 1977
- legend: "by the time I realized tabs were a problem, I already had 12 users and didn't want to break their Makefiles"
Make: example
CC = /usr/bin/gcc
CFLAGS = $(OPT) -DDEBUG=1
OPT = -O2
foo: main.c util.c
	$(CC) $(CFLAGS) -o $@ $^
- variable declarations (CC = ...)
- dependency declarations (foo depends on main.c, util.c)
- imperative actions: to build foo, run this command
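the split between declaration and action can be sketched in Python. This is a simplified model of the idea, not how make is implemented: the names needs_rebuild and build are hypothetical, and only file modification times are compared.

```python
import os
import tempfile

def needs_rebuild(target, prerequisites):
    """True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)

def build(target, prerequisites, action):
    """Run the imperative action only if the declared dependencies demand it.

    Returns True if the action ran.
    """
    if needs_rebuild(target, prerequisites):
        action()
        return True
    return False

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "main.c")
        exe = os.path.join(tmp, "foo")
        open(src, "w").close()

        def compile_foo():
            # Stand-in for the real compiler command.
            open(exe, "w").close()

        print(build(exe, [src], compile_foo))  # foo is missing, so the action runs
        print(build(exe, [src], compile_foo))  # foo is up to date, so nothing runs
```

the dependency list is pure data; the action is an opaque callable that only runs when the data says it must.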
Make is not a disaster
- pro: clean, sensible grammar
- pro: clear distinction between declaration and action
- con: unfortunate choice of tab as syntax to denote action
- con: easy to confuse shell variables and make variables
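a rough Python model of why the two kinds of variable get confused: make expands its own references first, and the shell only sees what is left. This sketch assumes a deliberately simplified subset of make's rules ($(NAME), ${NAME}, single-character $X, and $$ for a literal dollar sign); make_expand is a hypothetical name.

```python
import re

# $$ | $(NAME) | ${NAME} | $X (single character, e.g. $@ or $^)
_REF = re.compile(r"\$(?:\$|\((\w+)\)|\{(\w+)\}|([^\s$(){}]))")

def make_expand(text, variables):
    """Expand make-style references in text; undefined names become ''."""
    def replace(match):
        if match.group(0) == "$$":
            return "$"  # escaped dollar: passed through for the shell to see
        name = match.group(1) or match.group(2) or match.group(3)
        # Values are expanded when used, so OPT may be defined after CFLAGS.
        return make_expand(variables.get(name, ""), variables)
    return _REF.sub(replace, text)

variables = {
    "CC": "/usr/bin/gcc",
    "CFLAGS": "$(OPT) -DDEBUG=1",
    "OPT": "-O2",
    "@": "foo",               # automatic variable: the target
    "^": "main.c util.c",     # automatic variable: all prerequisites
}

command = make_expand("$(CC) $(CFLAGS) -o $@ $^", variables)

# Three spellings that look alike but are not:
as_make_var = make_expand("echo $(HOME)", {"HOME": "/home/me"})
as_shell_var = make_expand("echo $$HOME", {"HOME": "/home/me"})
as_mistake = make_expand("echo $HOME", {"HOME": "/home/me"})  # parsed as $(H) + "OME"
```

in this model the third spelling silently loses the variable, which is the kind of surprise the "con" above is about.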
The real problem with make?
people who use a tool without understanding it
- imagine someone pounding a screw into the wall with a hammer
- the screw is not the problem
- the hammer is not the problem
- the wall is not the problem
make is fine for C projects with a handful of source files and some man pages; that does not mean it's great for every problem
Exhibit B: autoconf
- problem: my code builds fine on my machine, but not on yours
- stdlib differences between Unix variants
- different ways of building shared libraries
- environment differences (I have libfoo installed, you don't)
autoconf's big idea
- examine the environment
- make decisions
- shape the build
How to do this? (ca. 1991)
solution: Bourne shell!
- it's available everywhere!
- it behaves exactly the same on every Unix!
- it's completely portable!
Oh wait... it's not portable
solution: M4 macros!
- autoconf provides macros
- which expand to portable shell code to examine the environment, make decisions, and write files that shape the build
- you just have to invoke the macros that you need
- what could be simpler?
Autoconf example
AC_CHECK_HEADER(foobar.h,
[AC_DEFINE(HAVE_FOOBAR_H, 1,
[Define to 1 if you have <foobar.h>.])],
[AC_MSG_ERROR([sorry, can't do anything for you])])
- AC_CHECK_HEADER expands to shell code that checks whether it can compile a C program that includes foobar.h
- if so: execute the expansion of AC_DEFINE
- if not: execute the expansion of AC_MSG_ERROR
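the "examine, decide, shape" flow of that macro call can be sketched in Python. This is not what autoconf generates (that is portable shell); it only shows the logic. The function names, the conftest.c file name, and the default compiler name "cc" are assumptions of this sketch.

```python
import os
import subprocess
import tempfile

def check_header(header, compiler="cc"):
    """Return True if a trivial C program that includes header compiles."""
    source = f"#include <{header}>\nint main(void) {{ return 0; }}\n"
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "conftest.c")
        obj = os.path.join(tmp, "conftest.o")
        with open(src, "w") as f:
            f.write(source)
        try:
            result = subprocess.run(
                [compiler, "-c", src, "-o", obj],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # No usable compiler at all: treat the header as unavailable.
            return False
        return result.returncode == 0

def configure(header="foobar.h"):
    """Examine the environment, decide, and return definitions for the build."""
    if check_header(header):
        return {"HAVE_FOOBAR_H": 1}
    raise SystemExit("sorry, can't do anything for you")
```

here the check (check_header) and the decision (configure) are visibly separate functions, which is the distinction the M4 version does not show.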
A hard-won lesson
- declaration and action are all jumbled up
- no syntactic distinction at all
- the quoting rules make vanilla shell programming look trivial
- abstractions always leak, but at least make an effort
Exhibit C: RPM .spec files
goal: from source code, build a binary package: one little fragment of a complete working OS
- input: Python-3.6.2.tar.gz
- input: python3.spec
- output: python3-3.6.2-1.x86_64.rpm (and more...)
so: we need to assemble metadata, variables, and fragments of shell code
RPM: example
Name: foobar
Version: 1.5.3
Source: foobar-1.5.3.tar.gz
%prep
[...]
%build
./configure
make
%install
make install
- metadata is clear and obvious
- imperative sections are obvious and clearly delimited
Problem 1: variables
Name: foobar
Version: 1.5.3
Source: foobar-%{version}.tar.gz
- this is called a macro
- most other languages would call it a variable, but whatever
- similar to, but still distinct from, shell variable expansion
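a small Python sketch of this kind of macro expansion, assuming only the behaviour shown above: each preamble tag defines a lowercase macro that later lines can reference as %{name}. The helper names are hypothetical and real RPM does far more than this.

```python
import re

_MACRO = re.compile(r"%\{(\w+)\}")

def expand_macros(text, macros):
    """Replace each %{name} with its value; unknown names raise KeyError."""
    return _MACRO.sub(lambda m: macros[m.group(1)], text)

def parse_preamble(lines):
    """Parse 'Tag: value' lines, letting later values use earlier tags."""
    tags, macros = {}, {}
    for line in lines:
        tag, _, value = line.partition(":")
        value = expand_macros(value.strip(), macros)
        tags[tag] = value
        macros[tag.lower()] = value  # Version: 1.5.3 defines %{version}
    return tags

tags = parse_preamble([
    "Name: foobar",
    "Version: 1.5.3",
    "Source: foobar-%{version}.tar.gz",
])
```

the expansion happens before any shell sees the text, which is why it is distinct from shell variable expansion even though it looks similar.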
Problem 2: subroutines
- every RPM in the world has to unpack a source tarball or .zip file
- most are the same, but some are weird
%prep
%setup -q
- that is also called a macro, although it's very different from %{version}
- it's syntactically indistinguishable from the section delimiters like %prep
Problem 3: setting variables
we've already seen one way to set a variable/macro... but why limit ourselves?
%define upstream_name FooBar
Name: foobar
Version: 1.5.3
Source: %{upstream_name}-%{version}.tar.gz
%prep
%setup -q -n %{upstream_name}-src
- also syntactically indistinguishable from %prep
- so when you see a token matching /%[a-z]+/, you have no way of knowing what it is!
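to make the ambiguity concrete, here is a Python sketch: the regex accepts all three kinds of token, so telling them apart needs a lookup table rather than syntax. The tables below contain only the names used in the examples above and are an assumption of this sketch, not RPM's real lists.

```python
import re

TOKEN = re.compile(r"%[a-z]+")

# Hand-maintained tables: the syntax itself carries no such information.
SECTIONS = {"%prep", "%build", "%install"}
DIRECTIVES = {"%define"}
MACROS = {"%setup"}

def classify(token):
    """Say what a %word token is, based purely on table membership."""
    if not TOKEN.fullmatch(token):
        raise ValueError(f"not a %word token: {token!r}")
    if token in SECTIONS:
        return "section delimiter"
    if token in DIRECTIVES:
        return "variable definition"
    if token in MACROS:
        return "subroutine-like macro"
    return "unknown"

kinds = [classify(t) for t in ("%prep", "%setup", "%define", "%frobnicate")]
```

a reader who does not have the tables memorized is in the same position as the "unknown" branch.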
Exhibit D: Python distutils
- problem: building Python extensions requires specialized
knowledge about Python
- opportunity: if you're building Python extensions, you have a
better language than shell or M4 at your fingertips
- Perl's developers noticed the same thing a few years earlier,
hence ExtUtils and MakeMaker
distutils setup.py example
usage: python setup.py cmd [options...]
from distutils.core import setup # or setuptools, nowadays
setup(name='foobar', version='1.5.3', packages=['foobar'])
- very simple example: all metadata, purely declarative
- but how do you know it's declarative?
- this is source code in an imperative language!
setup.py counterexample
import random
from setuptools import setup
name = ('Foo' + 'Bar').lower()
version = open('version').read().strip()
packages = ['foobar']
if random.random() > 0.9:
    packages.append('foobar.ui')
setup(name=name, version=version, packages=packages)
- yes, this is a contrived example
- just demonstrating why you cannot parse setup.py
- but people do some pretty hairy things in real-life setup scripts
- because they don't really have an alternative
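a sketch of what "parsing setup.py" could look like with Python's standard ast module, assuming we only trust keyword arguments written as literals. The function name and the UNKNOWN marker are hypothetical. It recovers everything from the simple example and nothing from the counterexample.

```python
import ast

UNKNOWN = "<computed at run time>"

def static_setup_args(source):
    """Return setup()'s keyword arguments that can be read without running code."""
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "setup"):
            args = {}
            for kw in node.keywords:
                try:
                    args[kw.arg] = ast.literal_eval(kw.value)
                except ValueError:
                    args[kw.arg] = UNKNOWN  # a name or expression, not a literal
            return args
    return None  # no setup() call found

SIMPLE = """\
from distutils.core import setup
setup(name='foobar', version='1.5.3', packages=['foobar'])
"""

CONTRIVED = """\
import random
from setuptools import setup
name = ('Foo' + 'Bar').lower()
version = open('version').read().strip()
packages = ['foobar']
if random.random() > 0.9:
    packages.append('foobar.ui')
setup(name=name, version=version, packages=packages)
"""

simple_args = static_setup_args(SIMPLE)
contrived_args = static_setup_args(CONTRIVED)
```

the source is only parsed, never executed, so this is safe to run on untrusted scripts; the price is that anything computed stays unknown.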
Exhibit E: Dockerfile
despite the wild visions of XML or Ruby fans, custom DSLs are still a thing
- like RPM .spec files, Dockerfiles specify how to build a deployable binary package
- unlike RPMs, Docker images:
- contain a complete working OS, minus kernel and service manager
- start with other binaries, not source code
- also unlike RPM, Docker is still cool, so sharpen your
pitchforks and get your torches ready
Dockerfile example
FROM debian:8
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y apache2 python libapache2-mod-wsgi
COPY conf/myapp.conf /etc/myapp.conf
COPY myapp /usr/bin/myapp
- simple, straightforward, traditional Unix line-based format
- right down to our old friend, backslash-escaped newlines
- no problem, right?
Dockerfile problem example
be careful about making wildly popular software; even Microsoft
will have to support it
FROM microsoft/nanoserver
COPY testfile.txt c:\\
RUN dir c:\
...is actually equivalent to...
FROM microsoft/nanoserver
COPY testfile.txt c:\RUN dir c:\
The fix?
magic comments and dynamic syntax, of course!
# escape=`
FROM microsoft/nanoserver
COPY testfile.txt c:\
RUN dir c:\
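the effect of the escape character can be modelled in a few lines of Python. This is a simplified sketch, not Docker's parser: it only joins continuation lines and reads an optional "# escape=X" directive from the first line. The function name is hypothetical, and dropping a dangling escape on the last line is a choice made by this sketch.

```python
import re

# A leading "# escape=X" parser directive (simplified).
_DIRECTIVE = re.compile(r"#\s*escape\s*=\s*(\S)\s*$", re.IGNORECASE)

def logical_lines(dockerfile_text):
    """Join physical lines that end with the escape character."""
    lines = dockerfile_text.splitlines()
    escape = "\\"  # the default escape character
    if lines:
        directive = _DIRECTIVE.match(lines[0])
        if directive:
            escape = directive.group(1)
            lines = lines[1:]  # the directive is not an instruction
    result, pending = [], ""
    for line in lines:
        if line.endswith(escape):
            pending += line[:-1]  # drop the escape, continue on the next line
        else:
            result.append(pending + line)
            pending = ""
    if pending:
        result.append(pending)
    return result

PROBLEM = r"""FROM microsoft/nanoserver
COPY testfile.txt c:\\
RUN dir c:\
"""

FIXED = r"""# escape=`
FROM microsoft/nanoserver
COPY testfile.txt c:\
RUN dir c:\
"""

problem_lines = logical_lines(PROBLEM)  # COPY and RUN merge into one instruction
fixed_lines = logical_lines(FIXED)      # three separate instructions
```

with the default escape, a Windows path ending in a backslash swallows the next instruction; once the directive changes the escape character, the backslash is just a backslash.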
Fun prank
slip this into your friend's Dockerfile:
# escape=a
(no, I have not tested this)
Red herrings
pretty sure it's wrong to blame:
- lack of foresight
- insufficient planning
- not ambitious enough
far more software fails by being insanely ambitious than
by being too modest in its aims!
My guess at the common error
Insufficiently rich syntax to accommodate future growth
- Java was 10 when it gained decent for loops: not a calamity
- Python was 15 when it gained the "with" statement: not a disaster
- why can't autoconf have decent strings?
- why can't RPM have subroutines?
The secret sauce
what do Java and Python have that autoconf and RPM don't?
- MATH!
- formalism
- a grammar
(those are three ways of saying the same thing)
Disclaimer
- I bet that RPM has a grammar now
- I bet it did not originally: probably a retrofit
(have not confirmed either suspicion)
Grammar
here's a bit of a grammar you know well: Python's "if" statement
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
test: or_test ['if' or_test 'else' test] | lambdef
or_test: and_test ('or' and_test)*
[...long and complicated...]
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
- unimportant: you will see this in the docs (language reference)
- absolutely essential: this is part of the source code
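one way to see the grammar at work is to ask Python's own parser, via the standard ast module, what it makes of an "if" statement. The sample source text below is made up for illustration.

```python
import ast

SOURCE = """\
if a:
    x = 1
elif b:
    x = 2
else:
    x = 3
"""

def parses(source):
    """True if source is accepted by Python's grammar."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

if_node = ast.parse(SOURCE).body[0]   # the whole if_stmt
elif_node = if_node.orelse[0]         # 'elif' shows up as a nested If
else_body = elif_node.orelse          # the final 'else' suite

ok = parses(SOURCE)
missing_colon = parses("if a\n    x = 1\n")  # violates 'if' test ':' suite
```

every input is either accepted with a well-defined structure or rejected outright; there is no "token matching a pattern that could be anything".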
Please don't reinvent the wheel
- parsing is a solved problem
- we've known how to do this stuff since the 1960s
- theory and technology have advanced since then, but it's the
same paradigm
- outsource your parser to an expert: lex/yacc, ANTLR,
pyparsing, PLY, whatever
Lessons
- language design is a learnable skill
- designing a language like make or autoconf is much easier than
designing Python or Go
- "it's so easy, even I can do it!"
- but first: learn the trade