11 Input/Output
11.1 Fast I/O
To preface this initial section: if you are working in a locked-down version of R due to company policy, you may skip to the built-in section below.
Two packages provide much of the general I/O functionality necessary:
- readr
- data.table
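As a quick sketch of what the fast readers look like in practice (the file name here is made up for illustration, and both calls assume the packages are installed):

```r
# Write a small example file, then read it back with each package's fast reader.
# "demo_fast.csv" is a made-up file name for this sketch.
write.csv(data.frame(ID = 1:3, DV = c(0.5, 1.2, 2.4)),
          "demo_fast.csv", row.names = FALSE)

if (requireNamespace("readr", quietly = TRUE)) {
  d1 <- readr::read_csv("demo_fast.csv")   # returns a tibble
}
if (requireNamespace("data.table", quietly = TRUE)) {
  d2 <- data.table::fread("demo_fast.csv") # returns a data.table
}
```

Both functions auto-detect column types and are substantially faster than the built-in readers on large files.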
In addition, PKPDmisc offers wrappers that provide tailored input/output:
- read_nonmem()
- write_nonmem()
- read_phx() # for phx nlme datasets with a units column
11.2 Built-in I/O
There are two primary reasons to introduce the built-in I/O functions as well:
- data.table or readr will not always be available on your/your colleagues' computers
- the built-in functions are designed to be incredibly robust, so they will often be able to handle edge cases that readr and data.table::fread fail on
The principal ways of reading/writing data in R are read.table and write.table. Additionally, some "custom" functions with commonly used defaults are also available. read.csv is simply read.table with the default separator set to "," and a couple of other slight modifications.
Another useful function is source, which allows us to read and execute R files automatically.
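A minimal sketch of source() in action (the script name is hypothetical):

```r
# Write a tiny R script to disk, then source() it to run its code in the
# current session; objects the script creates become available afterwards.
writeLines("x_from_script <- 40 + 2", "helpers.R")
source("helpers.R")
x_from_script  # 42
```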
The read.table function has some important arguments we can go through for later reference (note the arguments are the same for read.csv):
- file - name of file
- header - indicates if file has a header line
- sep - how columns are separated
- colClasses - character vector with the class of each column
- nrows - number of rows
- comment.char - character string to indicate the comment character
- stringsAsFactors - how to handle strings (factors or simply strings)
- skip - how many lines to skip before starting to read

Let's also look at a couple of relevant defaults for read.table and read.csv:
?read.table
- unless specified, the path is relative to the current working directory
- header = FALSE
- stringsAsFactors = TRUE (note: from R 4.0 onward the default is FALSE)
- R will skip lines that begin with a # (comment.char = "#")
?read.csv
- header = TRUE
- sep = ","
- quote = "\""
- dec = "."
- fill = TRUE
- comment.char = ""
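The difference in defaults is easy to see by reading the same file with both functions (the file name below is made up for illustration):

```r
# The same comma-separated file read with each function.
writeLines(c("ID,WT", "1,70", "2,80"), "demo_defaults.csv")

a <- read.csv("demo_defaults.csv")               # header = TRUE, sep = ","
b <- read.table("demo_defaults.csv", sep = ",")  # header = FALSE by default,
                                                 # so the header line becomes a data row
nrow(a)  # 2
nrow(b)  # 3
```

With read.table, forgetting header = TRUE silently turns the header into a data row and coerces every column to character, so it is worth double-checking.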
11.3 Some tidbits
Assigning colClasses will significantly speed up read.table - this can be valuable for large, data-rich files.
A quick and dirty way of allowing R to detect and assign them for you is via:
initial <- read.table("data.txt", nrows = 50)
classes <- sapply(initial, class)
all_data <- read.table("data.txt", colClasses = classes)
11.4 Output
Many of the output settings are similar to input settings. Again, your best bet is to spend a little bit of time checking out the documentation.
A couple values that are relevant and frequently useful:
- quote = FALSE - keeps column names and character values from being output with quotes around them
- na - allows you to declare the string to use for missing values in data
- row.names = FALSE - don't output an additional (and usually unnecessary for us) row names column
- append = TRUE - rather than overwriting a file with that name, it appends the results to the end of the file
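A small sketch exercising these arguments together (the output file name is made up):

```r
# Write a data frame with a missing value, controlling quoting, row names,
# and the missing-value string.
df <- data.frame(ID = 1:2, COMMENT = c("ok", NA))
write.table(df, "out_demo.csv", sep = ",",
            quote = FALSE,       # no quotes around names or strings
            row.names = FALSE,   # drop the row-name column
            na = ".")            # missing values written as "."
readLines("out_demo.csv")
```

The resulting file has a plain "ID,COMMENT" header line and the NA written as ".", which is a common convention for NONMEM-style datasets.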
cookbook-R on writing text and output to files
11.4.1 Your task
- you are given a dataframe with a header and units column
- create a list with the column names and units that you can reference as necessary during analysis
- are there other ways you could store units to access them later? attributes? What are the risks?
- BONUS: write a new function write_phx that writes both the header as well as a units row below it, to make importing to Phoenix and setting units easier. Write it such that you would not lose the type of each column when reading back into R.
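One possible sketch of the bonus task (write_phx is the name the task suggests; passing units as a separate character vector, and the exact layout, are assumptions rather than Phoenix's official spec):

```r
# Write the header, a units row beneath it, then the data rows.
# `units` as a separate character vector is an assumption for this sketch.
write_phx <- function(df, path, units) {
  stopifnot(length(units) == ncol(df))
  con <- file(path, "w")
  on.exit(close(con))
  writeLines(paste(names(df), collapse = ","), con)
  writeLines(paste(units, collapse = ","), con)
  write.table(df, con, sep = ",", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

dat <- data.frame(ID = 1:2, WT = c(70, 80))
write_phx(dat, "phx_demo.csv", units = c("", "kg"))

# Reading back without losing column types: take the names from the first
# line, then skip both the header and the units row before reading the data.
hdr  <- strsplit(readLines("phx_demo.csv", n = 1), ",")[[1]]
back <- read.csv("phx_demo.csv", skip = 2, header = FALSE, col.names = hdr)
```

Skipping the units row on read-back is the key step: if the units row were read as data, every column would be coerced to character.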