Chapter 4: Collections
A taxonomy of collections
We have already encountered some collections in the previous chapter – arrays, tuples, dicts and sets. This chapter examines what we can do with collections and how to best use them for maximum effectiveness. A collection may be indexable, associative or neither. This refers primarily to the way we access individual elements within the collection.
Think of an indexable collection as a shopping list – the only way to identify individual elements is by pointing out their position. If you want to refer to, say, 1 pint of milk, you refer to it as the fifth entry on my shopping list. Elements of an indexable collection are accessed using the square bracket notation collection[index]
, e.g. shopping_list[5]
. Unusually for most programming languages and in sharp contrast to other languages like Python, indices begin with 1, rather than 0. An indexable collection is also only equal to a collection with the same elements if they are in the same order – thus [1, 3, 5, 7] == [3, 7, 1, 5]
yields, as one would expect, false
.
Associative collections, on the other hand, resemble a page from a phone book instead (if any of you actually still remember what one of those things is!). You wouldn't say that your phone number is on page 217, left column, fifth from the bottom. Rather, you would have a key (your name), by reference to which someone can find your phone number (the value). Associative arrays do follow the key-value pair form. They are, therefore, not accessed in the collection[index]
form but rather in the collection[key]
form. An associative collection is not indexable, therefore the order of entries does not matter: two associative collections will be equal as long as they contain the same key-value pairs, order regardless.
Collections that are neither associative nor indexable are a somewhat complex case. Sets are the only frequently used collection that is neither associative nor indexable. They are also special because of the uniqueness constraint, that is, a set may contain each value once and only once. A set does not support the commonly used methods of access, but it does support many of the collection manipulation functions, such as push!()
and pop!()
. The latter, in case you were wondering, returns items in a random order, since the absence of an index means sets are not ordered. Sets are not indexable, consequently two sets that contain the same elements will be considered equal, order regardless.
In addition, collections may be mutable or non-mutable. Put quite simply, a mutable collection is one where you can change particular values after creation, while in an immutable collection, you cannot do so. The typical immutable collections are, of course, tuples – once you have assigned a value to a tuple, you cannot change that value (although you can assign some other tuple or indeed an entirely different value to the variable that holds the tuple). Sets again represent a special case – they are what could be best described as pseudo-immutable – there is no way to access values in a way that could change them, since you normally access an element of a set by its value (which is sufficient in a set thanks to the uniqueness constraint).
Mutable | Immutable | |
---|---|---|
Indexable | Arrays | Tuples |
Associative | Dicts | |
Non-indexable and non-associative | Sets |
Indexable collections
Access
Elements of an indexable collection can be accessed using the square bracket notation, by their ordinal:
julia> prime_array = [2, 3, 5, 7, 11]
5-element Array{Int64,1}:
2
3
5
7
11
julia> prime_array[3]
5
In Julia, a range of numbers is written as start:end
or start:steps:end
. You can use a range to access a range of elements:
julia> prime_array[2:3]
2-element Array{Int64,1}:
3
5
A range always returns a collection, even if it has the length 1. This is exemplified by the difference between prime_array[3]
, the call we made above, and prime_array[3:3]
, which returns
julia> prime_array[3:3]
1-element Array{Int64,1}:
5
julia> prime_array[3:3] == 5
false
You can access the last element of an indexable collection using [end]
:
julia> prime_array[end]
11
Incidentally, end
behaves like a number – so prime_array[end-1]
returns the penultimate element of the collection.
Setting
If the indexable collection you are using is also mutable (e.g. an array), any of these methods will act as a pseudo-variable and will allow you to assign a value to it. Thus prime_array[end] = 12
would change the last element of prime_array
to 12. You also aren't restricted to a single element: calling prime_array[2:4] = 0
would result in
julia> prime_array
5-element Array{Int64,1}:
2
0
0
0
11
And, of course, you can use an array or another indexable collection to replace values:
julia> prime_array[2:4] = [3,5,7]
3-element Array{Int64,1}:
3
5
7
julia> prime_array
5-element Array{Int64,1}:
2
3
5
7
11
Unpacking
Indexable collections can unpack: that is, they can be assigned in a single line to as many distinct variables as they have elements. This is a very useful convenience feature, and is much used in functional programming:
julia> actors = ["Ian McKellen", "Martin Freeman", "Elijah Wood"]
3-element Array{String,1}:
"Ian McKellen"
"Martin Freeman"
"Elijah Wood"
julia> gandalf, bilbo, frodo = actors
3-element Array{String,1}:
"Ian McKellen"
"Martin Freeman"
"Elijah Wood"
julia> gandalf
"Ian McKellen"
Unpacking can also be used to swap the contents of variables:
julia> firstname = "Irving"
julia> lastname = "Washington"
julia> firstname, lastname = lastname, firstname
("Washington","Irving")
julia> lastname
"Irving"
Common functions
push!
, pop!
and append!
push!
appends the value to the end of the collection. pop!
takes the last element of the list, returns it and removes it from the collection.
julia> array = [1,2,3,4]
4-element Array{Int64,1}:
1
2
3
4
julia> push!(array, 5)
5-element Array{Int64,1}:
1
2
3
4
5
julia> pop!(array)
5
julia> array
4-element Array{Int64,1}:
1
2
3
4
append!
, somewhat unusually, puts the elements of a collection to the end of another collection:
julia> array2 = [5,6,7]
3-element Array{Int64,1}:
5
6
7
julia> append!(array, array2)
7-element Array{Int64,1}:
1
2
3
4
5
6
7
shift!
and unshift!
shift!
and unshift!
are the front equivalent of pop!
and push!
.
Similarly to pop!
, shift!
retrieves the first element of the collection and removes it from the collection (which shifts it):
julia> shift!(array)
1
Similarly to push!
, unshift!
puts an element to the front of the collection:
julia> unshift!(array, 8)
7-element Array{Int64,1}:
8
2
3
4
5
6
7
find
functions
There's a set of functions starting with find — such as find()
, findfirst()
, and findnext()
— that you can use to get the index or indices of values within an indexable collection that match a specific value, or pass a test. These functions share three properties.
- Their result is the index or indices of the value sought or tested for, with the n_th element's index being n, not n-1 as you might be used to from other languages.
- You can use these functions in two principal forms: you can test for a value or you can test for a function, in which case the results will be the values for which the function returns
true
. - The
find
functions' way of telling you they haven't found anything is returning zero, since there is no element of index zero.
Let's try to find things within the following Array
: primes_and_one = [1,2,3,5,7,11,13,17,19,23]
findfirst()
findfirst()
finds the first occurrence of a value and returns its index (or zero):
julia> findfirst(primes_and_one, 5)
4
As noted above, we can feed the find
functions a function as well – it will return values for which the function would return a true
value. We have not really discussed functions, but the general idea should be familiar to you. A function of the form x -> x == 13
results in true
if the value of x
is 13 and false
otherwise. Let's try to see which prime number is the first to equal 13 (don't expect big surprises):
julia> findfirst(x -> x == 13, primes_and_one)
7
You might have noticed that unlike in the case where you were searching for a particular value, where you're searching by a function, the function comes first. This is a little idiosyncrasy, but has to do with the way Julia determines what to do based on what it is provided with. A lot more of this will be explored in Chapter [X].
find()
find()
returns an array of results. Thus, for instance, let's use the isinteger
function to see which of our primes are integers (yet again, the result should not come as a shock):
julia> find(isinteger, primes_and_one)
10-element Array{Int64,1}:
1
2
3
4
5
6
7
8
9
10
findnext()
findnext()
returns results from a given index onwards. Thus, if you want to know the index of the first odd number after 3 in the list of primes, you would proceed as follows (using the function isodd
, which, as you could guess, returns true
for odd integers and false
otherwise):
julia> findnext(isodd, primes_and_one, 4)
4
Wait, why 4
? As you might remember, Julia is 1-indexed, not 0-indexed. Therefore, an index 'begins before' the number. The number after the first index is the first number in the sequence and so on. As such, the number after the third item in a collection is the item next to (= following) the index 4
, not 3
.
As you might have noticed, when you use a function as an argument, you do not use the parentheses you would normally use to call a function. That is because function()
means call this function
while function
is merely a reference to the function object itself.
Get elements, not indices
So far, we've only been getting indices. How do we get the actual elements? The answer is, of course, by using our magical []
(square brackets) syntax. We'll also use this as a good opportunity to introduce a very useful function, isprime()
, which returns true
for primes and false otherwise:
julia> import Primes
julia> find(isprime, primes_and_one)
9-element Array{Int64,1}:
2
3
4
5
6
7
8
9
10
julia> primes_and_one[find(isprime,primes_and_one)]
9-element Array{Int64,1}:
2
3
5
7
11
13
17
19
23
Filtering
The filter()
function works quite similar to find
, except in this case returns only the elements that satisfy the condition (it is, effectively, a shorthand for the previous listing).
julia> filter(isodd, primes_and_one)
9-element Array{Int64,1}:
1
3
5
7
11
13
17
19
23
filter()
can be used in-place, by using the !
after the name of the function. Thus, using filter!()
, alters the actual array rather than returning a filtered copy. Note, however, that functions ending with !
modify the object, so, obviously, the type they act on must be mutable – you would, therefore, not be able to filter!()
a tuple, even though you would be able to filter()
it.
Sorting
The sort()
function sorts an array lexicographically, generally in an ascending order:
julia> sort([-3, 2, 1, 7])
4-element Array{Int64,1}:
-3
1
2
7
You can specify the sort criterion using by
– in this case, we will be using the absolute value function abs
(remember not to use the parentheses symbolising function call, just the name of the function):
julia> sort([-3,2,1,7], by=abs)
4-element Array{Int64,1}:
1
2
-3
7
You can change the order of sorting using rev
:
julia> sort([-3,2,1,7], by=abs, rev=true)
4-element Array{Int64,1}:
7
-3
2
1
And, for the great joy of algorithm nerds like me, you can choose the sort algorithm using alg
. Julia currently supports three sorting algorithms (InsertionSort
, QuickSort
and MergeSort
).
julia> sort([-3,2,1,7], by=abs, rev=true, alg=MergeSort)
4-element Array{Int64,1}:
7
-3
2
1
For mutable indexable collections, such as arrays, you can use sort!()
, which sorts 'in place'. Of course, you can also sort non-numeric elements, or indeed anything for which the isless()
function is defined, which sorting uses internally.
julia> sort(["Bayes", "Laplace", "Poisson", "Gauss"])
4-element Array{String,1}:
"Bayes"
"Gauss"
"Laplace"
"Poisson"
Counting
count()
tells you the number of instances in the collection that satisfy the criterion:
julia> count(isodd, primes_and_one)
9
all()
and any()
all()
and any()
implement two of the mathematical concepts known as quantifiers, with all()
representing the universal quantifier \forall
, while any()
implements the existential quantifier. These functions test whether all or any, respectively, of a collection satisfies a certain criterion, and return a single truth value.
Existence of a particular value
To find out whether an array has a particular value among its elements, you can use in()
:
julia> in(2, primes)
true
Somewhat strangely, in the in()
syntax, the needle comes before the haystack, i.e. in(value, array)
, where value
denotes the value you are looking for.
Particular types
Arrays
Arrays (the ones we used in our examples so far in this section) are mutable indexable collections. The type Array{T,N}
indicates an N
-dimensional array which elements' types are subtypes of T
. For instance, Array{Number, 2}
is an 2-dimensional array. Its elements' types descend from Number
(e.g.Int
, Float64
).
Access in multidimensional arrays
How do we access elements in a multidimensional array, a special form of indexable collection? Simple – in a multidimensional array, indexes go down each row, then from left to right. Therefore, this array
julia> md_array = ["A" "B"; "C" "D"]
2x2 Array{String,2}:
"A" "B"
"C" "D"
would be indexed as follows:
md_array[1] = "A"
md_array[2] = "C"
md_array[3] = "B"
md_array[4] = "D"
This is a little counterintuitive and different from the usual row/column notation, where you would use array[row][column]
. To retrieve a cell by row and column, use array[row, column]
:
julia> md_array[1,2]
"B"
This generalizes for higher dimension arrays.
Tuples
A tuple is an ordered sequence of elements, like an array. A tuple is represented by parentheses and commas, rather than the square brackets used by arrays. The important difference between arrays and tuples is that tuples are immutable: you can't change the elements of a tuple, or the tuple itself, after creating it.
Tuples are generally used for small fixed-length collections — they're ubiquitous across Julia, for example as argument lists. Where a function returns multiple values, which, as we see, is pretty often the case, the result is a tuple.
A corollary of immutability is that none of the !
functions work on tuples - but at the same time, using functions such as push()
is perfectly acceptable, since it returns a copy of the tuple with the added element, which does not alter the original tuple.
Associative collections: dicts
An associative collection is a kind of non-indexed collection that stores (usually) pairs of values. The indexable collections you have encountered correspond to real-life examples such as a shopping list or a sequential list of train stations. Associative collections, on the other hand, have a key and a value (for this reason, they are sometimes referred to as key-value pairs). Julia, like many other programming languages, implements associative collections in an object called a dict
(short for dictionary
), which corresponds to 'maps', 'hash tables' or 'dictionaries' in other programming languages.
A dict, as we have seen, is usually created using the dict literal
dict = Dict("a" => 1, "b" => 2, "c" => 3)
The key of a key-value pair is unique, meaning that while several keys might point at the same value (and a key might point at a collection as a value), you cannot have duplicate keys (in database terminology, you might have heard this referred to as a one-to-many relationship).
Creating dicts
Other than the Dict()
literal, there are three more ways to create a dict.
First, you can create a dict using the comprehension syntax for dicts. An example is
[i => sqrt(i) for i = 1:2:15]
This creates a dict with the square root of every odd number from 1 to 15. In this case, i
can be any iterable – while ranges are the most frequently used, there is no reason why
Dict(i => sqrt(i) for i = [2, 5, 6, 8, 12, 64])
would not be equally valid.
Secondly, you can also create an empty dictionary. Dict()
will construct an empty dictionary that permits any elements, while Dict{type1, type2}()
will create an empty dictionary that permits any elements with keys of type1
and values of type2
.
Finally, earlier versions of Julia used to support what is sometimes referred to as zip creation of a dict, namely entering two equal-length tuples, one for keys and one for values. This is now regarded as deprecated – it still works, but you should not use it. Instead, the correct syntax is Dict(zip(ks, vs))
:
ks = ("a", "b", "c")
vs = ("1", "2", "3")
julia> Dict(zip(ks,vs))
Dict{String,String} with 3 entries:
"c" => "3"
"b" => "2"
"a" => "1"
Access
Just like items in an indexable arrays are keyed by their index, items in a dict are identified by their key and retrieved using the square bracket syntax:
julia> statisticians = Dict("Gosset" => "1876-1937", "Pearson" => "1857-1936", "Galton" => "1822-1911")
Dict{String,String} with 3 entries:
"Galton" => "1822-1911"
"Pearson" => "1857-1936"
"Gosset" => "1876-1937"
julia> statisticians["Gosset"]
"1876-1937"
One drawback of the bracket syntax is that if there is no entry for the key provided, Julia will raise an error:
julia> statisticians["Kendall"]
ERROR: key not found: "Kendall"
in getindex at dict.jl:644
An alternative form of accessing a dictionary is using the get()
function, which accepts a default value:
julia> get(statisticians, "Pearson", "I'm sorry, I don't know when this person lived.")
"1857-1936"
julia> get(statisticians, "Kendall", "I'm sorry, I don't know when this person lived.")
"I'm sorry, I don't know when this person lived."
An advantage of this is that you can create a default value, which the function will return if it cannot find the key requested. Unlike in some other programming languages, a default is not optional for Julia:
julia> get(statisticians, "Kendall")
ERROR: `get` has no method matching get(::Dict{String,String}, ::String)
Get or create (get!()
)
Because dicts are mutable, get!()
can try to access a value by its key and create it if not found, then return the new value. Its syntax is identical to get()
:
julia> get!(statisticians, "Kendall", "I'm sorry, I don't know when this person lived.")
"I'm sorry, I don't know when this person lived."
julia> statisticians
Dict{String,String} with 4 entries:
"Galton" => "1822-1911"
"Pearson" => "1857-1936"
"Kendall" => "I'm sorry, I don't know when this person lived."
"Gosset" => "1876-1937"
pop!()
pop!()
returns the key-value array matching the key. If the key does not exist, it returns an optional default value or throws an error:
julia> pop!(statisticians, "Gosset")
"1876-1937"
julia> statisticians
Dict{String,String} with 3 entries:
"Galton" => "1822-1911"
"Pearson" => "1857-1936"
"Kendall" => "1907-1983"
Change values
To change a value, access it via the bracket syntax, then assign it the new value:
julia> statisticians["Kendall"] = "1907-1983"
"1907-1983"
julia> statisticians
Dict{String,String} with 4 entries:
"Galton" => "1822-1911"
"Pearson" => "1857-1936"
"Kendall" => "1907-1983"
"Gosset" => "1876-1937"
Checking for existence of a key or a key-value pair
To check for the existence of a key without retrieving it, you can use haskey()
:
julia> haskey(statisticians, "Galton")
true
julia> haskey(statisticians, "Bayes")
false
You can also check for the existence of a key-value pair in a dict using the in()
function you might be familiar with from arrays. Note that the notation of a key-value pair in this case will be in the form of a tuple (key, value)
rather than using the associative array symbol =>
:
julia> in(("Bayes", "1702-1761"), statisticians)
false
Retrieving keys or values
To retrieve all keys of a dict, use keys()
. This will retrieve an object of type KeyIterator
, which does just what the name suggests - it iterates through the keys of an array. This will be useful later on when we want to iterate through the dictionary by keys:
julia> keys(statisticians)
KeyIterator for a Dict{String,String} with 3 entries. Keys:
"Galton"
"Pearson"
"Kendall"
You can retrieve values, predictably, by using the values()
function:
julia> values(statisticians)
ValueIterator for a Dict{String,String} with 3 entries. Values:
"1822-1911"
"1857-1936"
"1907-1983"
Sorting
You may have noticed that dicts are unordered. Even a dict generated by reference to a range, such as the one seen above, will not be in any particular order:
julia> [i => sqrt(i) for i = 1:2:15]
Dict{Int64,Float64} with 8 entries:
7 => 2.6457513110645907
9 => 3.0
13 => 3.605551275463989
3 => 1.7320508075688772
11 => 3.3166247903554
5 => 2.23606797749979
15 => 3.872983346207417
1 => 1.0
This is because dicts are not indexable, therefore there is no ordering that would make inherent sense. However, sometimes, we like dictionaries sorted. Disappointingly, sorting dicts is not as easy as sorting arrays: sort(statisticians)
tells us that 'sort' has no method matching sort(::Dict{String,String})
. Therefore, you have to write your own sort function that first converts statisticians
from a dict into an array of 2-element tuples. This is because the sort()
function has no defined methods for dicts, but it can sort arrays, including tuples, where it sorts by the first element in the tuple. Then, it iterates through the result and represents it as a dict again:
result = Dict{String, String}
for (k,v) in sort(collect(statisticians))
result[k] => v
println(result)
end
This yields the expected result:
Galton => 1822-1911
Kendall => 1907-1983
Pearson => 1857-1936
If you want the output to be in-place or yield an actual dict, you will have to augment your code a little:
result = Dict()
for (k::String,v::String) in sort(collect(statisticians))
setindex!(result, v, k)
end
julia> result
Dict{Any,Any} with 3 entries:
"Galton" => "1822-1911"
"Pearson" => "1857-1936"
"Kendall" => "1907-1983"
Merging
The function merge()
merges two or more dicts.
julia> mathematicians = Dict("Gauss" => "1777-1855", "Leibniz" => "1646-1716", "Abel" => "1802-1829")
Dict{String,String} with 3 entries:
"Abel" => "1802-1829"
"Leibniz" => "1646-1716"
"Gauss" => "1777-1855"
julia> merge(mathematicians, statisticians)
Dict{String,String} with 6 entries:
"Abel" => "1802-1829"
"Galton" => "1822-1911"
"Leibniz" => "1646-1716"
"Gauss" => "1777-1855"
"Pearson" => "1857-1936"
"Kendall" => "1907-1983"
Its bang counterpart, merge!()
, merges them in place, overwriting the first dict mentioned while leaving the second intact.
julia> merge!(mathematicians, statisticians)
Dict{String,String} with 6 entries:
"Abel" => "1802-1829"
"Galton" => "1822-1911"
"Leibniz" => "1646-1716"
"Gauss" => "1777-1855"
"Pearson" => "1857-1936"
"Kendall" => "1907-1983"
julia> mathematicians
Dict{String,String} with 6 entries:
"Abel" => "1802-1829"
"Galton" => "1822-1911"
"Leibniz" => "1646-1716"
"Gauss" => "1777-1855"
"Pearson" => "1857-1936"
"Kendall" => "1907-1983"
julia> statisticians
Dict{String,String} with 3 entries:
"Galton" => "1822-1911"
"Pearson" => "1857-1936"
"Kendall" => "1907-1983"
Non-indexable non-associative collections: sets
You might be familiar with the idea of sets from maths/set theory. A set is a non-indexable, non-associative and non-mutable collection that also has unique elements. No element may occur twice, so an element's value identifies it conclusively.
Creating sets
To create a set, use the Set()
constructor function. You can create a set that accepts any data type
julia> primes = Set()
Set{Any}({})
– or you can specify what sort of data types it would accept:
julia> primes = Set{Int64}()
Set{Int64}({})
You can create and fill sets in one go by listing elements surrounded by curly braces {}
, and if you surround the elements with square brackets []
instead of curly braces {}
Julia will guess the type:
julia> mersenne_primes_set = Set([3, 7, 31, 127])
Set([7,31,3,127])
Set operations
Sets have some unique functions that accommodate certain problems well-known from set theory: the functions union()
, intersect()
and setdiff()
each, respectively, implement the union, intersection and difference of sets. Let's see how we can use this to find some similarities between the cast of two blockbusters, The Lord of the Rings and The Matrix.
First, let's create two sets with some actors from each movie:
lotr_actors = Set(["Elijah Wood", "Ian McKellen", "Viggo Mortensen", "Hugo Weaving"])
matrix_actors = Set(["Keanu Reeves", "Lawrence Fishburne", "Hugo Weaving"]
To find shared actors, we can use intersect()
:
julia> intersect(lotr_actors, matrix_actors)
Set(String["Hugo Weaving"])
To find actors who only starred in Lord of the Rings but not in The Matrix, we can use setdiff()
, which shows all elements that are in the first Set
but not the second:
julia> setdiff(lotr_actors, matrix_actors)
Set(String["Elijah Wood","Ian McKellen","Viggo Mortensen"])
Finally, we can see actors who played in either of the movies, by using union()
:
julia> union(lotr_actors, matrix_actors)
Set(String["Elijah Wood","Ian McKellen","Hugo Weaving","Keanu Reeves","Lawrence Fishburne","Viggo Mortensen"])
Collections and types
Until now, we have generally created collections using literals, and with precious little regard to the types of information that go in them. While types will be discussed in quite a bit of detail later on, what we do know about types is that they are individual categories of data.
Julia operates what is called type inference: unless you tell it explicitly what type something is, it tries to figure it out best as it can. We see this in operation when we create a new collection. When a collection is created and Julia is not told that this is going to be a collection containing elements of only a particular kind or particular kinds of values, it makes an educated guess. The REPL tells us this much:
julia> mersenne_primes = [3, 7, 31, 127, 8191, 131071]
6-element Array{Int64,1}:
3
7
31
127
8191
131071
Upon creating the array, the REPL reports to us that it's an array consisting of six elements, all of type Int64
– a type of signed 64-bit integer (don't worry if that means quite little to you just yet, we will be discussing various integer types in Chapter [X]). It also, helpfully, reports to us that we've got a 1-dimensional array.
Type inference and dissimilar types
What, however, if I want to play it a little wild and mix it up? Consider the following array:
julia> not_really_mersenne_primes = [3, 7, "potato", 127, π, "wallaby"]
6-element Array{Any,1}:
3
7
"potato"
127
π = 3.1415926535897...
"wallaby"
As you have guessed, Julia is at a loss as to what to do with this, since we've got a mix of integers, strings and a constant thrown in for good measure. Therefore, it tells us that it has inferred the type of the collection to be Any
– a type that applies to all objects.
Type inference and empty collections
The other marginal case is that of the empty set. Julia has a dedicated type, None
– a subtype of Any
– that applies to the empty set:
julia> empty_set = []
0-element Array{None,1}