Chapter 9: I/O
Interacting with real-world data is where programming gets interesting. Mastering I/O opens up a world of information for your applications and is quite fun, too! With that, let's delve into working with various file types.
Text files
Opening and closing files
In general, whenever you open a file, you will eventually need to close it. It's quite crucial to be circumspect with these housekeeping duties, lest you end up with corrupted data!
Julia's way of dealing with files resembles that of Python and a number of other languages. First, you open a file using the open(path) function. The function returns an object that represents the file within Julia, sometimes known as a file handle. Commonly, the variable that a file handle is assigned to is f, but this does not have to be the case.
f = open("textfile.txt")
When you're done with the file, use the close() function to close the file handle:
close(f)
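Because forgetting to close a handle is easy, it can help to let Julia do the housekeeping. The following is a minimal sketch, assuming Julia 1.0 or later, of the do-block form of open, which closes the handle when the block finishes, even if an error occurs inside it. The temporary file and its contents are stand-ins created with mktemp purely for illustration.

```julia
# Create a small throwaway file to work with (illustrative stand-in data).
path, io = mktemp()
write(io, "first line\nsecond line\n")
close(io)

# open(...) do f ... end passes the handle to the block and closes it afterwards.
# The value of the block becomes the value of the whole expression.
first_line = open(path) do f
    readline(f)   # on Julia 1.x the trailing newline is stripped by default
end

println(first_line)
```

With this form there is no separate close(f) call to remember.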
Reading files
Creating a file handle does not actually read the file - it merely checks where it is and figures out how to deal with it. As such, creating your file handle is usually pretty quick and memory-inexpensive, even if the file itself is very large.
However, we want to get something out of the text we opened. For the rest of this chapter, I will be using Virgil's Aeneid to demonstrate text functions, which you can obtain for yourself courtesy of Project Gutenberg.
Once you have called the open function and assigned the file handle to f, you can start reading the file. Julia helpfully offers multiple ways to accomplish this, and the ability to read a large file line by line is worth remembering once we enter the realm of handling large data sets.
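You might notice that using the read functions 'uses up' the file. This is true in a limited sense: the file on disk is unchanged, but every read advances the file handle's position, so once you have read all lines, the handle sits at the end of the file and further reads return nothing. This is handy for keeping track of how much has already been read: where a read process has been interrupted, you will know where to pick up the thread. To start over, reopen the file or rewind the handle with seekstart(f).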
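The 'using up' behaviour can be observed directly. This is a small sketch, assuming Julia 1.0 or later and a throwaway two-line file made with mktemp (both are illustrative choices, not part of the Aeneid example). It reads the handle to the end, shows that a second read yields nothing, then rewinds with seekstart.

```julia
# Illustrative stand-in file with two lines.
path, io = mktemp()
write(io, "alpha\nbeta\n")
close(io)

f = open(path)
first_pass  = readlines(f)   # reads every line and moves the position to the end
at_end      = eof(f)         # true: nothing is left to read from this handle
second_pass = readlines(f)   # empty, because the position is already at the end

seekstart(f)                 # rewind the handle to the beginning of the file
third_pass  = readlines(f)   # the same lines are available again
close(f)

println(first_pass, " ", at_end, " ", second_pass, " ", third_pass)
```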
readall
The readall function reads the entire file and returns it as one large string, with line breaks represented as newline characters (\n):
julia> readall(f)
julia> typeof(readall(f))
String
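On recent Julia releases, readall is no longer available: it was renamed to readstring and later replaced by read(f, String). The sketch below assumes Julia 1.0 or later and uses a tiny stand-in file rather than the full Aeneid to show the equivalent call.

```julia
# Illustrative stand-in file; the two lines are sample text only.
path, io = mktemp()
write(io, "Arms, and the man I sing\nwho, forc'd by fate\n")
close(io)

f = open(path)
contents = read(f, String)   # whole file as one String, newlines included
close(f)

println(typeof(contents))
println(count(==('\n'), contents), " newline characters")
```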
readlines
Unlike readall, readlines creates an array of strings, each representing a line:
julia> readlines(f)
14656-element Array{Union(UTF8String,String),1}:
"\ufeff Arms, and the man I sing, who, forc'd by fate,\r\n"
" And haughty Juno's unrelenting hate,\r\n"
" Expell'd and exil'd, left the Trojan shore.\r\n"
" Long labors, both by sea and land, he bore,\r\n"
" And in the doubtful war, before he won\r\n"
" The Latian realm, and built the destin'd town;\r\n"
" His banish'd gods restor'd to rites divine,\r\n"
" And settled sure succession in his line,\r\n"
" From whence the race of Alban fathers come,\r\n"
" And the long glories of majestic Rome.\r\n"
You can iterate through this array line by line, which is useful because it lets you operate on smaller chunks of data. Note, however, that readlines still loads the whole file into memory before the loop starts.
i = 1
for line in readlines(f)
    println("$i \t $line")
    i += 1
end
1 Arms, and the man I sing, who, forc'd by fate,
2 And haughty Juno's unrelenting hate,
3 Expell'd and exil'd, left the Trojan shore.
However, Julia has something far better to accomplish that, as we'll see in the next subsection.
eachline
eachline
creates an iterator that you can use to go through and apply linewise functions. Let's see how many characters are in each line:
for line in eachline(f)
    println("$(length(line)) \t $line")
end
46 Now, in clos'd field, each other from afar
46 They view; and, rushing on, begin the war.
58 They launch their spears; then hand to hand they meet;
51 The trembling soil resounds beneath their feet:
56 Their bucklers clash; thick blows descend from high,
51 And flakes of fire from their hard helmets fly.
As we can see, the counts are somewhat higher than the visible text suggests. This is OK: each line also carries its leading indentation and its trailing line-ending characters (\r\n), as the readlines output earlier shows. What matters is that we can execute a function on each line of the file.
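To make the effect of indentation and line endings on the counts concrete, here is a minimal sketch, assuming Julia 1.0 or later, where eachline accepts a keep keyword. The two indented sample lines with Windows-style line endings are stand-ins written to a temporary file, not the real text. It compares the raw length of each line with its length after strip.

```julia
# Illustrative stand-in file: two indented lines ending in \r\n.
path, io = mktemp()
write(io, "  Now, in clos'd field\r\n  They view\r\n")
close(io)

raw_lengths     = Int[]
trimmed_lengths = Int[]

open(path) do f
    # keep=true retains the line-ending characters, as in the output shown above.
    for line in eachline(f; keep=true)
        push!(raw_lengths, length(line))
        # strip removes leading and trailing whitespace, including \r and \n.
        push!(trimmed_lengths, length(strip(line)))
    end
end

println(raw_lengths)       # lengths including indentation and \r\n
println(trimmed_lengths)   # lengths of the visible text only
```

Each raw count here exceeds the trimmed count by four: two spaces of indentation plus the two line-ending characters.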
enumerate
enumerate is a handy companion to the functions above: it takes an iterable and creates an iterator that yields each value together with its index, counting from 1.
julia> arr = ["a", "b", "c", "d"]
julia> for (index, value) in enumerate(arr)
           println(index, " : ", value)
       end
1 : a
2 : b
3 : c
4 : d
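Since enumerate works on any iterable, it can wrap eachline and take the place of a hand-maintained counter when numbering lines. The following is a sketch under the assumption of Julia 1.0 or later. The temporary file and the numbered vector are illustrative names, and the two sample lines stand in for the full text.

```julia
# Illustrative stand-in file with two lines of sample text.
path, io = mktemp()
write(io, "Arms, and the man I sing\nAnd haughty Juno's unrelenting hate\n")
close(io)

numbered = String[]

open(path) do f
    # enumerate supplies the line number, so no manual i += 1 is needed.
    for (i, line) in enumerate(eachline(f))
        push!(numbered, "$i\t$line")
    end
end

foreach(println, numbered)
```

Because eachline reads lazily, this numbers the lines without first loading the whole file into an array.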