When I was reading the collection of learning resources on Haskell and tried to find a good start, I quickly realized that none of the books or tutorials are suitable for me: the easier a tutorial claims to be, the harder to really understand Haskell by reading it. What I need is a terse documentation that introduces the syntax and semantics of Haskell systematically and clearly, but unfortunately none was found.
I know I have to try the hard way: reading the Haskell language specification directly and absorb it myself. To make the process less dull and record my progress, I will write down my learning notes here incrementally.
Overview
A Haskell program is organized with four levels: modules, declarations, expressions & lexical structures, but the specification is organized in the reverse order.
Haskell has ad hoc polymorphism (overloading) and parametric polymorphism (Hindley-Milner type structure).
Haskell has six namespaces:
- for value
- variable
- constructor
- for type entity
- type variable
- type constructor
- type class
- module
The same name can be reused without conflicts as long as they are in different namespaces.
Lexical Structure
A Haskell program is composed of lexemes (tokens) and whitespaces.
program → { lexeme | whitespace }
Whitespace includes:
- The complete set of whitespace in both ASCII & Unicode
- Two kinds of comments
- inline comment starts with
--
- nested comment wrapped by
{- -}
- inline comment starts with
A lexeme is one of the following:
- identifier
- qvarid: (qualified) variable identifier
- qconid: (qualified) constructor identifier
- reservedid: reserved identifier
- operator
- qvarsym: (qualified) variable (symbolic) operator
- qconsym: (qualified) constructor (symbolic) operator
- reservedop: reserved operator
- literal: integer, float, character or string literal
- special: one of special symbols
()[]{}`,;
A variable and a constructor is distinguished by the first character and put into different namespaces:
- identifier
- variable: lower case (including underscore)
- constructor: upper case
- operator
- variable: non-colon
- constructor: colon
:
A variable or constructor can contain symbol '
, so the common mathematical
term “x prime” can be represented as x'
.
By using layout rule (indentation), symbols {};
can be omitted in sereral
grammer productions.
Expressions
Parentheses
(exp)
is a parenthesized expression, and is equivalent to exp
.
Function & operator application
Function is prefixed and curried, so f x y
means (f x) y
.
fexp → [fexp] aexp
All operators are infixed except prefix negation -
.
infixexp→ lexp qop infixexp
| - infixexp (prefix negation)
| lexp
qop → qvarop | qconop
An operator can be converted to prefix notation by parentheses. e.g.
(+) x y
Reversely, a function identifier (either variable or constructor) can be converted to an infix operator by backquotes.
x `op` y
List & Tuple
List is constructed with :
.
1:2:3:[] or (:) 1 ((:) 2 ((:) 3 []))
Arithmetic sequence is another way to construct a list.
[1,3..6] == [1,3,5]
[1..6] == [1,2,3,4,5,6]
Tuple is constructed with (,...,)
.
(1, 2, "a") or (,,) 1 2 "a"
Field label
A field label is used to give a name to a field in a datatype. It can be used to construct, extract and update the field.
A constructor with labeled fields:
aexp → qcon { fbind1 , … , fbindn } (n ≥ 0)
fbind → qvar = exp
Pattern matching
A pattern itself is not an expression, but it is an important part of sereral expressions, including: lambda abstractions, function definitions, let expressions, list comprehensions, do expressions and case expressions.
Pattern matching is used to deconstruct values according to their type specification. It proceeds from left to right, and outside to inside. Attempting to match a pattern can have one of three results:
- Fail
- Succeed: returning a binding for each variable in the pattern
- Diverge: i.e. return ⊥
The syntax for patterns covers a subset of expressions.
A pattern can match against infix expressions, but limited to infix constructors
(the operator must be qconop
).
pat → lpat qconop pat (infix constructor)
A pattern can match against constructor functions (with or without field labels).
pat → ...
| lpat
lpat → apat
| gcon apat1 … apatk (arity gcon = k, k ≥ 1)
apat → ...
| gcon (arity gcon = 0)
| qcon { fpat1 , … , fpatk } (labeled pattern, k ≥ 0)
fpat → qvar = pat
A pattern can match against a variable, a literal, a parenthesized expression, a tuple or a list.
lpat → var ...
| - (integer | float) (negative literal)
apat → ...
| literal
| ( pat ) (parenthesized pattern)
| ( pat1 , … , patk ) (tuple pattern, k ≥ 2)
| [ pat1 , … , patk ] (list pattern, k ≥ 1)
The variables defined within the pattern can be binded, but how to name and bind
the whole pattern? This is exactly what an as pattern does (var
before @
is the name for the whole pattern).
apat → var [ @ apat] (as pattern)
Wildcard is used when you need a variable placeholder but do not want to bind the value to a name.
apat → ...
| _ (wildcard)
Sometimes you need a pattern that can never fail (only succeed or diverge), it is called a irrefutable pattern.
apat → ...
| ~ apat (irrefutable pattern)
Besides ~apat
, these patterns are also irrefutable: a variable, a wildcard,
N apat
where N is a constructor defined by newtype and apat is irrefutable,
var@apat
where apat is irrefutable. All other patterns are refutable.
Case expression
Case expression is very important because all other pattern matching expressions ultimately translate into case expressions.
A case expression has one or more alternatives.
lexp → case exp of { alts }
alts → alt1 ; … ; altn (n ≥ 1)
An alternative is either a pattern or empty. The pattern either coresponds to
an expression (body) directly, or has one or more guarded patterns (note an
optional gdpat appears at the right side of itself). A guarded pattern starts
with |
and is composed of one or more guards.
alt → pat -> exp [where decls]
| pat gdpat [where decls]
| (empty alternative)
gdpat → guards -> exp [ gdpat ]
guards → | guard1, …, guardn (n ≥ 1)
decls → { decl1 ; … ; decln } (n ≥ 0)
Note that each alternative can optionally have a where
declaration. It is used
to bind extra variables to be used in the local scope.
There are two types of guards: pattern guard & boolean guard, and local declarations can also be introduced together with guards.
guard → pat <- infixexp (pattern guard)
| let decls (local declaration)
| infixexp (boolean guard)
This is how a case expression works:
- The expression after
case
is matched against each alternative till a match is found. - Then each guarded pattern in the matched alterantive is tested till one passes. A guarded pattern passes if and only if all of its guards pass.
- If successful, the conresponding expression is returned, otherwise, the next guarded pattern or alternative is tried sequentially.
- If no match can be found, the result is ⊥.
Lambda expression
Lambda is used to construct a function without a name. Besides a variable, the input can also be any pattern.
lexp → \ apat1 … apatn -> exp (n ≥ 1)
An example:
(\ (x, y) -> x+y) (3, 7)
Also note that lambda ->
associates to the right, e.g.
Integer -> Integer -> Integer
is equivalent to
Integer -> (Integer -> Integer)
Let expression
A let expression introduces a nested, lexically-scoped, mutually-recursive list
of declarations. exp
after keyword in
is the value of a let expression.
lexp → let decls in exp
decls → { decl1 ; … ; decln } (n ≥ 0)
Some examples:
let {} in 42
let {(x,y) = (1,2)} in x+y
List comprehension
A list comprehension constructs a list of elements represented by exp
qualified
by one or more qualifiers.
aexp → [ exp | qual1 , … , qualn ] (n ≥ 1)
There are three kinds of qualifiers.
A generator is composed of a pattern (pat
with type t
) and a list (e
with
type [t]
). The pattern is matched against and binded to each element of the
list one by one, so the binded variables can be used to generate each element of
the result list. The generators are evaluated in a nested, depth-first order,
and a failed match is just skipped over.
qual → pat <- e (generator)
A qualifier can also be a local declaration to bind auxiliary variables, or a boolean guard to exclude some elements from the list.
qual → ...
| let decls (local declaration)
| exp (boolean guard)
Do expression
Wait till monad is fully understood.
Type signature
Haskell has type inference, but an expression can be optionally specified with a type signature.
exp → infixexp :: [context =>] type
Declarations
There are top declarations (topdecl
) that are only allowed at the top level of
a module, and nested declarations (decl
) that may be used either at the top
level or in nested scopes.
topdecl → ...
| ...
| decl
Top declarations
A top declaration starts with one of the keywords: data
, type
, newtype
,
class
, instance
, default
and foreign
.
data
, type
and newtype
is for declaring new types. class
is for
declaring typeclasses and instance
is for declaring the membership between
types and typeclasses.
They are explained later one by one.
Nested declarations
In the last section, decl
appears in the let expression and where clause
without any explanations. Actually, they are nested declarations.
There are five kinds of nested declarations: type signature, fixity, function binding, pattern binding and empty declartions.
A type signature simply specifies types for variables.
decl → gendecl
| ...
gendecl → vars :: [context =>] type
vars → var1 , …, varn (n ≥ 1)
A fixity declaration gives the fixity and binding precedence of one or more operators.
gendecl → ...
| fixity [integer] ops
fixity → infixl | infixr | infix
ops → op1 , … , opn (n ≥ 1)
op → varop | conop
The right hand side (rhs
) of function and pattern bindings are the same.
decl → ...
| (funlhs | pat) rhs
The syntax of rhs
is almost the same as case expression, except ->
is
replaced by =
.
rhs → = exp [where decls]
| gdrhs [where decls]
gdrhs → guards = exp [gdrhs]
A function can be binded in multiple function binding declarations, as long as they are contiguous and the number of patterns is the same. In each of the declaration, the left hand side supports three alternative syntaxes.
funlhs → var apat { apat }
| pat varop pat
| ( funlhs ) apat { apat }
And an example:
plus x y z = x+y+z
x ‘plus‘ y = \ z -> x+y+z
(x ‘plus‘ y) z = x+y+z
Type
Just as an expression may contain variables and denotes a value, a type expression may contain type variables and denotes a type value (but it is evaluated at compile time, unlike an expression evaluated at run time).
Type expressions are designed to have similar representations as their corresponding expressions.
Function type can be represented in infix or prefix notation.
Like function application, type application (btype
) is also prefixed and
curried. atype
is the type expression excluding infix function type and
type application.
type → btype [-> type] (function type)
btype → [btype] atype (type application)
atype → gtycon
| ...
gtycon → ...
| (->) (function constructor)
A type variable is an identifier beginning with a lowercase letter.
A parenthesized type (t)
is identical to the type t
.
A type constructor (type template) as an identifier begins with an uppercase
letter.
Besides function type, the syntaxes for tuple and list are also special syntaxes.
atype → gtycon
| tyvar
| ( type1 , … , typek ) (tuple type, k ≥ 2)
| [ type ] (list type)
| ( type ) (parenthesised constructor)
gtycon → qtycon
| () (unit type)
| [] (list constructor)
| (->) (function constructor)
| (,{,}) (tupling constructors)
Kind
What is *
, * -> *
…? An ordinary type has kind *
. A type constructor
(template) that has one type argument has kind * -> *
, e.g. a list. So on and
so forth.
Context
The context (context => type
) is used to indicate that type tyvar
belongs to
class qtycls
.
A context is composed of zero or more class assertions.
context → class
| ( class1 , … , classn ) (n ≥ 0)
And a class assertion is either either a type variable, or the application of type variable to one or more types.
class → qtycls tyvar
| qtycls ( tyvar atype1 … atypen ) (n ≥ 1)
qtycls → [ modid . ] tycls
tycls → conid
tyvar → varid
Algebraic data type
algebraic data type is named so because it is composed of other types by
product and sum (algebraic operations). It introduces a new type constructor
(simpletype
) with zero or more constituent data constructors (constrs
).
topdecl → ...
| data [context =>] simpletype [= constrs] [deriving]
The type constructor begins with upper case letter and may have zero to more type variables as parameters. The type constructor then can be used in curried type application in a type expression.
simpletype → tycon tyvar1 … tyvark (k ≥ 0)
On the right side of =
, sum (alternative) types are separated by |
.
constrs → constr1 | … | constrn (n ≥ 1)
For each constr
, there are three alternative syntaxes.
- A data constructor followed by zero or more
atype
. - An infix data constructor operator between two
btype
. - A data constructor followed by field declarations.
constr → con [!] atype1 … [!] atypek (arity con = k, k ≥ 0)
| (btype | ! atype) conop (btype | ! atype) (infix conop)
| con { fielddecl1 , … , fielddecln } (n ≥ 0)
con → conid | ( consym ) (constructor)
!
is strictness flag to indicate that the corresponding constructor argument
is eagerly evaluated.
Type synonym
A type synonym is equivalent to its definition are completely interchangeable.
topdecl → type simpletype = type
Newtype
A newtype declaration introduces a distinct type whose representation is the same as an existing type.
topdecl → newtype [context =>] simpletype = newconstr [deriving]
newconstr → con atype
| con { var :: type }
- Unlike type synonyms,
newtype
may be used to define recursive types. - New instances can be defined for a type defined by
newtype
but may not be defined for a type synonym. - A type created by
newtype
has an extra level of indirection compared to an algebraic datatype, so the pattern matching rule is different.
Ad hoc polymorphism
Ad hoc polymorphism (overloading) is supported by typeclasses.
A class declaration introduces a new class and the operations (methods) on it.
Modules
To be continued…