A Concise Introduction to Gofer/Haskell

Copyright ©1997 by David Matuszek


Caveat: I am a novice at this language, so be especially wary of errors in this summary.

Haskell is a pure functional language; no state information is kept. In other words, the state of a computation can be fully described by the state of the stack.

Gofer shares the same core language as Haskell. Each language has some features the other lacks, but this paper discusses only the common core. For convenience we use the name "Haskell" throughout this paper, rather than repeatedly saying "Haskell and Gofer."

Haskell is a particularly enjoyable language in which to program, as the language does so much to make it easy for the programmer. A lot of structural syntax, such as parentheses and semicolons, simply aren't needed. Powerful operators allow for concise, yet still readable, programs. Although the language is strongly typed, the programmer hardly ever needs to specify types--Haskell figures out types for itself. Functions are first-class objects, and infinite data structures are supported. Haskell is more human-oriented and less computer-oriented than most programming languages and, as a result, is less efficient; it therefore has little practical application.

Haskell has an underlying uniform representation. But for convenience, it has been extended with syntactic sugar to provide familiar-looking representations for arithmetic, strings, lists, and the like.

Comments

There are two kinds of comment in Haskell:

-- Anything after a double hyphen, up to the end of the line.

{- Comments may be enclosed in braces with hyphens. {- These comments can be nested -} . -}

Expressions

There are four simple types:

Primitive type

Representative values

Int

-5, 0, 5

Float

3.14159

Char

'a', '\n', '\\', '\'', '\97', '\o77', '\xFF', '\DEL'

Bool

True, False

A familiar set of functions and operators is provided. In the following table, the notation Char -> Int means a function that takes a character argument and produces an integer result; the notation Int -> Int -> Int means a function that takes two integer arguments and produces an integer result. (The last type in the chain is always the result.) The reason for this notation for types will be explained later.

Functions on integers:

Function

Type

Comment

+  -  *  /  ^

Int -> Int -> Int

Infix add, subtract, multiply, divide, exponentiate

even  odd

Int -> Bool

Prefix tests for parity

div  quot

Int -> Int -> Int

div rounds down (same as /);
quot rounds toward zero

gcd  lcm

Int -> Int -> Int

greatest common divisor, least common multiple

primIntToFloat

Int -> Float

convert to floating point

signum

Int -> Int

-1, 0, or 1, according to whether the argument is negative, zero, or positive

Functions on floating point numbers:

Function

Type

Comment

+  -  *  /  ^

Float -> Float -> Float

Infix add, subtract, multiply, divide, exponentiate

sin  cos  tan  log  exp  sqrt  log  log10

Float -> Float

Usual math functions

pi

Float

3.1415926535

truncate

Float -> Int

Usual math functions

Functions on characters:

Function

Type

Comment

ord

Char -> Int

Convert to integer ASCII value

chr

Int -> Char

Convert from ASCII value

isPrint  isSpace

Int -> Bool

Test for printable or nonprintable character

isAscii  isControl

Int -> Bool

Test for ASCII (?), control character

isUpper  isLower

Int -> Bool

Test for capital or lowercase letter

isAlpha  isDigit

Int -> Bool

Test for letter, test for decimal digit

isAlphanum

Int -> Bool

Test for letter or digit

Functions on Booleans:

Function

Type

Comment

&&  ||

Bool -> Bool -> Bool

Infix and, infix or

not

Bool -> Bool

Prefix not

Polymorphic functions:

Function

Type

Comment

<  <=  ==  /=  >=  >

a -> a -> Bool

Arguments must have same type

show

a -> String

Convert almost anything to a string

Basic operation

Haskell is an interactive language: you type an expression at the Haskell prompt, and Haskell prints the result of evaluating that expression. Thus, if you type 2+2 at the Haskell prompt, Haskell will respond with 4. The usual precedence rules apply, and parentheses may be used. For example, 2+3*4+5 means the same as 2+(3*4)+5. Integers and floating point may not be mixed in the same expression without explicit conversions.

Function definitions must be loaded from a file; they cannot be defined from the Haskell prompt. (This is a consequence of the fact that no state information is kept.) The user interacts with the system by making function calls. There is no "main program"--any function may be called.

In addition to evaluating expressions, the Haskell interpreter recognizes certain commands, all of which begin with a colon:

Command

Meaning

:?

Display a list of the available commands

:l filename

Load in definitions from the specified file

:t expression

Give the type of the expression as well as its value

:q

Quit

To call a function, write the name of the function followed by its arguments. No additional punctuation is used. For example,

f x y
mod 103 10

Case is significant. Variable names and function names begin with a lowercase letter and may contain letters, digits, underscores, and apostrophes. Type names begin with a capital letter.

Binary infix operators (such as + and >) may be converted to prefix form by enclosing them in parentheses. Thus,

(+) 2 2

is the same as

2 + 2

Binary prefix operators (such as mod) can be written in infix form by enclosing them in backquotes. Thus,

mod 100 3

is the same as

100 `mod` 3

Lists

Haskell also contains lists, tuples, and functions.

Lists are written using brackets and commas, e.g. [1,2,4,8]. All the elements of a list must be of the same type; the type of a list is denoted [T], where T is the type of the elements. The list [a], where a is a variable, denotes a list whose elements are all of type a.

The empty list is denoted by []. Lists can be appended (concatenated) with the ++ operator.

A string is a list of characters, that is, it has type [Char]. For convenience, a string may be written with double quotes, e.g. "Haskell" is the same thing as ['H','a','s','k','e','l','l']. Because strings are lists, they may be concatenated with the ++ operator.

Functions on lists (and therefore strings):

Function

Type

Comment

:

a -> [a] -> [a]

(infix) Add an element to the front of the list

++

[a] -> [a] -> [a]

(infix) Append two lists

!!

[a] -> Int -> a

(infix) Return element Int of the list, counting from zero

head

[a] -> a

Return the first element of the list

tail

[a] -> [a]

Return the list with the first element removed

last

[a] -> a

Return the last element in the list

init

[a] -> [a]

Return the list with the last element removed

reverse

[a] -> [a]

Return the list with the elements in reverse order

take

[a] -> Int -> [a]

Return the first Int elements of the list

drop

[a] -> Int -> [a]

Return the list with the first Int elements removed

nub

[a] -> [a]

Return the list with all duplicate elements removed

elem  notElem

[a] -> a -> Bool

Test for membership in the list

length

[a] -> Int

The number of elements in the list

concat

[[a]] -> [a]

Given a list of lists, concatenate all the lists into one list

There are various shortcuts to writing lists. Haskell uses a technique called lazy evaluation: no value is ever computed until it is needed. Lazy evaluation allows Haskell to support infinite lists (and other infinite data structures). Arithmetic over infinite lists is supported, but some operations must be avoided, for example, it is a bad idea to ask for the last element of an infinite list.

Here are some list notations, with brief explanations and examples:

Notation

Explanation

Example

[a..b]

The list of all values from a to b, inclusive

[1..5] == [1,2,3,4,5]
[2.. -2] == []

[a..]

The list of all values equal to or larger than a

[1..] == all positive integers

[a, b..c]

The list of values starting from a and stepping by (b-a), up to and possibly including c

[1,3..10] == [1,3,5,7,9]
[2,1.. -2] == [2,1,0,-1,-2]

[a, b..]

The list of values starting from a and stepping by (b-a)

[1, 3..] == all odd positive integers

[expression_involving_x | x <- list]

The set of all values of the expression involving x, where x is drawn from the list (this is called a list comprehension)

[x*x | x <- [1..]] == the list of squares of positive integers

[expression_involving_x_and_y | x <- list, y <- list]

The set of all values of the expression involving x and y, where x is drawn from one list and y from the other (y varies faster)

[[x,y] | x <- ['a'..'b'], y <- ['x'..'z']] ==
["ax", "ay", "az", "bx", "by", "bz"]

[expression_involving_x | x <- list, condition_on_x]

The set of all values of the expression involving x, where x is drawn from the list and meets the condition

[x*x | x <- [1..10], even x] ==
[4, 16, 36, 64, 100]

[expression_involving_x_and_y | x <- list, condition_on_x, y <- list, condition_on_y]

The set of all values of the expression involving x and y, where x is drawn from one list and meets one condition, while y is drawn from the other list and meets the other condition

[x+y | x <- [1..5], even x, y <- [1..5], odd y] == [3, 5, 7, 5, 7, 9]

As an example of the use of lazy evaluation, the infinite list [x*x | x <- [1..]] is the list of squares of positive integers.

Tuples

Tuples are written using parentheses and commas, e.g. ("John",24,True). The elements of a tuple may be of different types. Two tuples have the same type if they have the same number of elements and those elements have the same types in the same order.

There are very few operations defined on tuples. If a tuple has exactly two elements, the functions fst and snd return the first and second elements, respectively. In general, you access the elements of a tuple by pattern matching, as explained below.

Functions

In Haskell, a function is a "first-class object," able to be used the same way other types are used (e.g. passed as arguments to functions).

Functions are defined using the = operator. For example,

averageOf2 x y = (x + y) / 2

This is actually syntactic sugar for the following:

averageOf2 = \x y -> (x + y) / 2

The expression on the right of the equals sign indicates an (anonymous) function of two arguments, x and y, with the body of the function following the arrow; this function is "assigned to" the variable on the left of the equals sign. (The backslash in this expression is pronounced "lambda.") Anonymous functions are used frequently in Haskell, usually surrounded by parentheses.

Haskell supports function "slices." For example, given the above definition, the expression (averageOf2 25) denotes the function that returns the average of 25 and its (single) argument. As a further example, you could define a Boolean function negative as

negative = (< 0)

(The parentheses are necessary.) Other examples of function slices are (2 *), (2 /), (/ 2), and (1 -). However, (- 1) does not work, because '-' is considered to be the unary minus.

Due to slicing, all Haskell functions could be thought of as having a single argument; for example, a function with type a -> b -> c -> d may be considered to be a function that takes a value of type a as an argument and returns as result a new function of type b -> c -> d, which in turn takes a value of type b as an argument and returns a new function of type c -> d, which takes a value of type c as an argument and returns a result of type d. Thus, the implicit parenthesization is:  a -> b -> c -> d == a -> (b -> (c -> d))

Here are some functions that take functions as arguments:

Function

Type

Comment and example

map

(a->b) -> [a] -> [b]

Applies the function to all elements of the list:
map chr [104, 101, 108, 108, 111] == hello

filter

(a->Bool) -> [a] -> [a]

Returns the list of elements for which the function returns True:
filter odd [1..5] == [1, 3, 5]

iterate

(a -> a) -> a -> [a]

returns the list [x, f x, f f x, f f f x, ...]:
take 5 (iterate (2*) 1) == [1, 2, 4, 8, 16]

foldl

(a -> b -> a) -> a -> [b] -> a

foldl f i x starts with the value i and accumulates the value obtained by repeatedly applying f to the current value and the next element in the list:
foldl (-) 100 [1..3] computes the value (((100-1)-2)-3)

foldl1

(a -> a -> a) -> [a] -> a

Same as foldl f (head x) (tail x)

flip

(a -> b -> c) -> b -> a -> c

Return a function whose first two arguments are reversed:
flip elem "aeiou" 'y' is the same as elem 'y' "aeiou"

(.)

(a -> b) -> (c -> a) -> c -> b

Compose two functions into a single function:
(f . g) x is the same as f (g x)

span

(a -> Bool) -> [a] -> ([a],[a])

Break the list into a tuple of two lists: all those at the front of the list that satisfy the test, and the rest of the list:
span odd [3,1,4,1,6] == ([3, 1],[4, 1, 6])

break

(a -> Bool) -> [a] -> ([a],[a])

Break the list into a tuple of two lists: all those at the front of the list that fail the test, and the rest of the list:
break even [3,1,4,1,6] == ([3, 1],[4, 1, 6])

Pattern matching

Parameter transmission is by pattern matching; thus, a function definition may consist of two or more equations with differing formal parameters. For example, the factorial function may be defined as:

fact 0 = 1
fact n = n * fact (n - 1)

Equations will be tried in order; the pattern 0 in the first equation for fact will match only a call with a zero argument, while the pattern n in the second equation will match anything.

Here are the allowable patterns:

Type of pattern

Examples

Comment

A variable

x

Will match anything

A constant

5  'a'  []

Will match only that value

Wildcard

_

"Don't care"--will match anything, but the matched value can't be used

Tuples

(x, y)

Matches a pair--the component values can be referred to as x and y

Fixed-length lists

[x]  [x,_,y]  []

With the bracket notation, you can only match lists of the given length

Variable-length lists

(h:t)

Matches a nonempty list whose head is h and whose tail is t

As-patterns

p@(x,y,z)  p@(h:t)

When the pattern after the @ matches something, the variable p matches the whole thing; e.g. if p@(h:t) matches [1,2,3], then h matches 1, t matches [2,3], and p matches [1,2,3]

(n+k) patterns

(m+1)

Matches any value equal to or greater than k (k must be an integer); m is k less than the value matched

Pattern matching allows you to write your own functions to extract the fields of tuples. For example, if you have tuples of the form ("Mary", 'f', 38), you might write extraction functions as follows:

name (x, _, _) = x
gender (_, x, _) = x
age (_, _, x) = x

Styles of writing functions

A function consisting of multiple equations may have guards on each equation, as for example:

fact n
    | n == 0    = 1
    | otherwise = n * fact (n - 1)

This is the same as

fact n = case n of
            0 -> 1
            n -> n * fact (n - 1)

Another way to write the factorial function is:

fact n = if n==0 then 1
                 else n * fact (n - 1)

You can introduce new variables with an expression of the form

let declarations in expression

for example:

fact n
    | n==0      = 1
    | otherwise = let m = n - 1 in n * fact m

Another way to introduce local variables is with

function_definition where declarations

as for example in:

fact n
    | n==0      = 1
    | otherwise = n * fact m
    where m = n - 1

Whereas let is an expression and may be used wherever an expression may be use, where is part of the syntax of a function declaration, and can only be used in this way. The scope of variables declared by where is the entire equation.

Haskell is a strongly-typed language, but you seldom need to tell Haskell what type your variables and functions are, because Haskell can almost always figure it out. It is a good idea to specify the types of functions, however, so that Haskell can inform you if a function doesn't have the type you intended. The operator :: means "has type," and can be used as follows:

fact :: Int -> Int

Layouts

Indentation is sometimes important in Haskell.

An equation starts on a new line. Successive lines must be indented; when Haskell finds a line beginning in the same (or earlier) column as the start of the last equation, it takes this as the start of the next equation. Suggested indentation: start every equation in column 1, and indent all remaining lines of the equation. See the various definitions of fact above for examples.

The first declaration following let or where defines the starting column for that group of declarations; all additional declarations must begin in the same column.

latinize word = (snd pair) ++ (fst pair) ++ "ay"
    where vowel = flip elem "aeiou"
          consonant x = isLower x && not (vowel x)
		pair = span consonant word

The first printing character following the word of in a case statement defines the starting column for that group of cases; all additional cases must begin in the same column. One of the definitions of fact given above illustrates this layout.

Not discussed

Haskell is a rich language, and a number of important topics have not been discussed in this paper. Among these are irrefutable patterns, I/O, user-defined data types and type synonyms, polymorphism, type classes, overloading, contexts, and dictionaries.

A final example

The following definition of the quicksort algorithm shows just how nice Haskell can be (at least sometimes).

quicksort [] = []
 
quicksort (pivot:xs) =
     quicksort [x | x <- xs, x < pivot] ++
     [pivot] ++
     quicksort [x | x <- xs, x >= pivot]