Learning Haskell the Hard Way

When I was reading the collection of learning resources on Haskell and tried to find a good start, I quickly realized that none of the books or tutorials are suitable for me: the easier a tutorial claims to be, the harder to really understand Haskell by reading it. What I need is a terse documentation that introduces the syntax and semantics of Haskell systematically and clearly, but unfortunately none was found.

I know I have to try the hard way: reading the Haskell language specification directly and absorb it myself. To make the process less dull and record my progress, I will write down my learning notes here incrementally.

Overview

A Haskell program is organized with four levels: modules, declarations, expressions & lexical structures, but the specification is organized in the reverse order.

Haskell has ad hoc polymorphism (overloading) and parametric polymorphism (Hindley-Milner type structure).

Haskell has six namespaces:

for value
- variable
- constructor
for type entity
- type variable
- type constructor
- type class
module

The same name can be reused without conflicts as long as they are in different namespaces.

Lexical Structure

A Haskell program is composed of lexemes (tokens) and whitespaces.

program → { lexeme | whitespace }

Whitespace includes:

The complete set of whitespace in both ASCII & Unicode
Two kinds of comments
- inline comment starts with --
- nested comment wrapped by {- -}

A lexeme is one of the following:

identifier
- qvarid: (qualified) variable identifier
- qconid: (qualified) constructor identifier
- reservedid: reserved identifier
operator
- qvarsym: (qualified) variable (symbolic) operator
- qconsym: (qualified) constructor (symbolic) operator
- reservedop: reserved operator
literal: integer, float, character or string literal
special: one of special symbols ()[]{}`,;

A variable and a constructor is distinguished by the first character and put into different namespaces:

identifier
- variable: lower case (including underscore)
- constructor: upper case
operator
- variable: non-colon
- constructor: colon :

A variable or constructor can contain symbol ', so the common mathematical term “x prime” can be represented as x'.

By using layout rule (indentation), symbols {}; can be omitted in sereral grammer productions.

Expressions

Parentheses

(exp) is a parenthesized expression, and is equivalent to exp.

Function & operator application

Function is prefixed and curried, so f x y means (f x) y.

fexp    → [fexp] aexp

All operators are infixed except prefix negation -.

infixexp→ lexp qop infixexp
        | - infixexp    (prefix negation)
        | lexp
qop     → qvarop | qconop

An operator can be converted to prefix notation by parentheses. e.g.

(+) x y

Reversely, a function identifier (either variable or constructor) can be converted to an infix operator by backquotes.

x `op` y

List & Tuple

List is constructed with :.

1:2:3:[]   or   (:) 1 ((:) 2 ((:) 3 []))

Arithmetic sequence is another way to construct a list.

[1,3..6] == [1,3,5]
  [1..6] == [1,2,3,4,5,6]

Tuple is constructed with (,...,).

(1, 2, "a")   or   (,,) 1 2 "a"

Field label

A field label is used to give a name to a field in a datatype. It can be used to construct, extract and update the field.

A constructor with labeled fields:

aexp    → qcon { fbind1 , … , fbindn }  (n ≥ 0)
fbind   → qvar = exp

Pattern matching

A pattern itself is not an expression, but it is an important part of sereral expressions, including: lambda abstractions, function definitions, let expressions, list comprehensions, do expressions and case expressions.

Pattern matching is used to deconstruct values according to their type specification. It proceeds from left to right, and outside to inside. Attempting to match a pattern can have one of three results:

Fail
Succeed: returning a binding for each variable in the pattern
Diverge: i.e. return ⊥

The syntax for patterns covers a subset of expressions.

A pattern can match against infix expressions, but limited to infix constructors (the operator must be qconop).

pat     → lpat qconop pat (infix constructor)

A pattern can match against constructor functions (with or without field labels).

pat     → ...
        | lpat

lpat    → apat
        | gcon apat1 … apatk    (arity gcon  =  k, k ≥ 1) 

apat    → ...
        | gcon  (arity gcon  =  0)
        | qcon { fpat1 , … , fpatk }    (labeled pattern, k ≥ 0)

fpat    → qvar = pat

A pattern can match against a variable, a literal, a parenthesized expression, a tuple or a list.

lpat    → var ...
        | - (integer | float)   (negative literal) 

apat    → ...
        | literal
        | ( pat )               (parenthesized pattern)
        | ( pat1 , … , patk )   (tuple pattern, k ≥ 2)
        | [ pat1 , … , patk ]   (list pattern, k ≥ 1)

The variables defined within the pattern can be binded, but how to name and bind the whole pattern? This is exactly what an as pattern does (var before @ is the name for the whole pattern).

apat    → var [ @ apat] (as pattern)

Wildcard is used when you need a variable placeholder but do not want to bind the value to a name.

apat    → ...
        | _     (wildcard)

Sometimes you need a pattern that can never fail (only succeed or diverge), it is called a irrefutable pattern.

apat    → ...
        | ~ apat        (irrefutable pattern)

Besides ~apat, these patterns are also irrefutable: a variable, a wildcard, N apat where N is a constructor defined by newtype and apat is irrefutable, var@apat where apat is irrefutable. All other patterns are refutable.

Case expression

Case expression is very important because all other pattern matching expressions ultimately translate into case expressions.

A case expression has one or more alternatives.

lexp    → case exp of { alts }
alts    → alt1 ; … ; altn       (n ≥ 1)

An alternative is either a pattern or empty. The pattern either coresponds to an expression (body) directly, or has one or more guarded patterns (note an optional gdpat appears at the right side of itself). A guarded pattern starts with | and is composed of one or more guards.

alt     → pat -> exp [where decls]
        | pat  gdpat [where decls]
        | (empty alternative) 
gdpat   → guards -> exp [ gdpat ]
guards  → | guard1, …, guardn   (n ≥ 1)
decls   → { decl1 ; … ; decln } (n ≥ 0)

Note that each alternative can optionally have a where declaration. It is used to bind extra variables to be used in the local scope.

There are two types of guards: pattern guard & boolean guard, and local declarations can also be introduced together with guards.

guard   → pat <- infixexp     (pattern guard)
        | let decls           (local declaration)
        | infixexp            (boolean guard)

This is how a case expression works:

The expression after case is matched against each alternative till a match is found.
Then each guarded pattern in the matched alterantive is tested till one passes. A guarded pattern passes if and only if all of its guards pass.
If successful, the conresponding expression is returned, otherwise, the next guarded pattern or alternative is tried sequentially.
If no match can be found, the result is ⊥.

Lambda expression

Lambda is used to construct a function without a name. Besides a variable, the input can also be any pattern.

lexp    → \ apat1 … apatn -> exp        (n ≥ 1)

An example:

(\ (x, y) -> x+y) (3, 7)

Also note that lambda -> associates to the right, e.g.

Integer ->  Integer -> Integer
    is equivalent to
Integer -> (Integer -> Integer)

Let expression

A let expression introduces a nested, lexically-scoped, mutually-recursive list of declarations. exp after keyword in is the value of a let expression.

lexp    → let decls in exp
decls   → { decl1 ; … ; decln } (n ≥ 0)

Some examples:

let {} in 42
let {(x,y) = (1,2)} in x+y

List comprehension

A list comprehension constructs a list of elements represented by exp qualified by one or more qualifiers.

aexp    → [ exp | qual1 , … , qualn ]   (n ≥ 1)

There are three kinds of qualifiers.

A generator is composed of a pattern (pat with type t) and a list (e with type [t]). The pattern is matched against and binded to each element of the list one by one, so the binded variables can be used to generate each element of the result list. The generators are evaluated in a nested, depth-first order, and a failed match is just skipped over.

qual    → pat <- e      (generator)

A qualifier can also be a local declaration to bind auxiliary variables, or a boolean guard to exclude some elements from the list.

qual    → ...
        | let decls     (local declaration)
        | exp           (boolean guard)

Do expression

Wait till monad is fully understood.

Type signature

Haskell has type inference, but an expression can be optionally specified with a type signature.

exp     → infixexp :: [context =>] type

Declarations

There are top declarations (topdecl) that are only allowed at the top level of a module, and nested declarations (decl) that may be used either at the top level or in nested scopes.

topdecl → ...
        | ...
        | decl

Top declarations

A top declaration starts with one of the keywords: data, type, newtype, class, instance, default and foreign.

data, type and newtype is for declaring new types. class is for declaring typeclasses and instance is for declaring the membership between types and typeclasses.

They are explained later one by one.

Nested declarations

In the last section, decl appears in the let expression and where clause without any explanations. Actually, they are nested declarations.

There are five kinds of nested declarations: type signature, fixity, function binding, pattern binding and empty declartions.

A type signature simply specifies types for variables.

decl    → gendecl
        | ...
gendecl → vars :: [context =>] type
vars    → var1 , …, varn        (n ≥ 1)

A fixity declaration gives the fixity and binding precedence of one or more operators.

gendecl → ...
        | fixity [integer] ops
fixity  → infixl | infixr | infix
ops     → op1 , … , opn         (n ≥ 1)
op      → varop | conop

The right hand side (rhs) of function and pattern bindings are the same.

decl    → ...
        | (funlhs | pat) rhs

The syntax of rhs is almost the same as case expression, except -> is replaced by =.

rhs     → = exp [where decls]
        | gdrhs [where decls]
gdrhs   → guards = exp [gdrhs]

A function can be binded in multiple function binding declarations, as long as they are contiguous and the number of patterns is the same. In each of the declaration, the left hand side supports three alternative syntaxes.

funlhs  → var apat { apat }
        | pat varop pat
        | ( funlhs ) apat { apat }

And an example:

plus x y z = x+y+z
x ‘plus‘ y = \ z -> x+y+z
(x ‘plus‘ y) z = x+y+z

Type

Just as an expression may contain variables and denotes a value, a type expression may contain type variables and denotes a type value (but it is evaluated at compile time, unlike an expression evaluated at run time).

Type expressions are designed to have similar representations as their corresponding expressions.

Function type can be represented in infix or prefix notation. Like function application, type application (btype) is also prefixed and curried. atype is the type expression excluding infix function type and type application.

type    → btype [-> type]       (function type)
btype   → [btype] atype         (type application)
atype   → gtycon
        | ...
gtycon  → ...
        | (->)  (function constructor)

A type variable is an identifier beginning with a lowercase letter. A parenthesized type (t) is identical to the type t. A type constructor (type template) as an identifier begins with an uppercase letter. Besides function type, the syntaxes for tuple and list are also special syntaxes.

atype   → gtycon
        | tyvar
        | ( type1 , … , typek ) (tuple type, k ≥ 2)
        | [ type ]              (list type)
        | ( type )              (parenthesised constructor)
gtycon  → qtycon
        | ()            (unit type)
        | []            (list constructor)
        | (->)          (function constructor)
        | (,{,})        (tupling constructors)

Kind

What is *, * -> *…? An ordinary type has kind *. A type constructor (template) that has one type argument has kind * -> *, e.g. a list. So on and so forth.

Context

The context (context => type) is used to indicate that type tyvar belongs to class qtycls.

A context is composed of zero or more class assertions.

context → class
        | ( class1 , … , classn )               (n ≥ 0)

And a class assertion is either either a type variable, or the application of type variable to one or more types.

class   → qtycls tyvar
        | qtycls ( tyvar atype1 … atypen )      (n ≥ 1)
qtycls  → [ modid . ] tycls
tycls   → conid
tyvar   → varid

Algebraic data type

algebraic data type is named so because it is composed of other types by product and sum (algebraic operations). It introduces a new type constructor (simpletype) with zero or more constituent data constructors (constrs).

topdecl → ...
        | data [context =>] simpletype [= constrs] [deriving]

The type constructor begins with upper case letter and may have zero to more type variables as parameters. The type constructor then can be used in curried type application in a type expression.

simpletype      → tycon tyvar1 … tyvark         (k ≥ 0)

On the right side of =, sum (alternative) types are separated by |.

constrs → constr1 | … | constrn         (n ≥ 1)

For each constr, there are three alternative syntaxes.

A data constructor followed by zero or more atype.
An infix data constructor operator between two btype.
A data constructor followed by field declarations.

constr  → con [!] atype1 … [!] atypek   (arity con  =  k, k ≥ 0)
        | (btype | ! atype) conop (btype | ! atype) (infix conop)
        | con { fielddecl1 , … , fielddecln }       (n ≥ 0) 
con     → conid | ( consym )    (constructor)

! is strictness flag to indicate that the corresponding constructor argument is eagerly evaluated.

Type synonym

A type synonym is equivalent to its definition are completely interchangeable.

topdecl → type simpletype = type

Newtype

A newtype declaration introduces a distinct type whose representation is the same as an existing type.

topdecl     → newtype [context =>] simpletype = newconstr [deriving]
newconstr   → con atype
            | con { var :: type }

Unlike type synonyms, newtype may be used to define recursive types.
New instances can be defined for a type defined by newtype but may not be defined for a type synonym.
A type created by newtype has an extra level of indirection compared to an algebraic datatype, so the pattern matching rule is different.

Ad hoc polymorphism

Ad hoc polymorphism (overloading) is supported by typeclasses.

A class declaration introduces a new class and the operations (methods) on it.

Modules

To be continued…

Overview#

Lexical Structure#

Expressions#

Parentheses#

Function & operator application#

List & Tuple#

Field label#

Pattern matching#

Case expression#

Lambda expression#

Let expression#

List comprehension#

Do expression#

Type signature#

Declarations#

Top declarations#

Nested declarations#

Type#

Kind#

Context#

Algebraic data type#

Type synonym#

Newtype#

Ad hoc polymorphism#

Modules#