Make two dataframes from built-in dataset, “Body Temperature Series of Two Beavers”
beaver_1 <- beaver1
beaver_2 <- beaver2
here
packageWhether they’re collaborating with colleagues, uploading to data repositories after submitting a paper, or uploading scripts to Github for anyone to use, people who write code constantly share their code. A great time-saving strategy for streamlining code-sharing is to use here
, a package intended to facilitate file referencing by getting rid of absolute filepaths (which makes your code fragile and difficult to integrate into other people’s systems) and instead using relative filepaths (which makes your code easier to transfer to other people’s systems). See Jenny Bryan’s overview of the package for more information and convincing arguments for adopting here
.
here
begin?For this workshop, you’ve created a new R project in, for example, a folder where you keep all your R code. The folder that contains this R project is where here
begins (aka your top-level directory).
library(here) # first let's load the library.
## here() starts at C:/Users/sbrei/Documents/R_Projects/Collabs/BGSS_Retreat_2021
here::here() # this is where your folder begins.
## [1] "C:/Users/sbrei/Documents/R_Projects/Collabs/BGSS_Retreat_2021"
If you want to change your root folder, you should close your R session, move the current R project folder to the location where you want it to start, and then open a new R session, type here::here()
again, and then the output should show you your new root location.
A function is a command that executes one to several tasks all at once. You’ve probably used functions before. In R, for example, we frequently use functions that have been made by other people: read.csv()
and print()
are functions, to name a few.
Writing functions is a fundamental component of programming and therefore scares many people. However! Functions can be easier than you’d think. For example, think of what your perfect bar plot looks like. How much time do you spend making your bar plots look like that? Wouldn’t it be great if the very first time you plotted a barplot it looked exactly the way you wanted it to? With functions you can do that, and all the knowledge you’d need is how to make the barplot look the way you want. This workshop gives a quick overview of how to use functions to make your life easier. Demonstrations will be done in R, but are applicable to most languages.
One of the more powerful aspects of programming languages relative GUI statistical softwares is the ability to create functions. Functions are extremely powerful because they allow for the automation of repetitive tasks and reduce bloated code from the more traditional “copy-paste” format. This becomes extremely powerful when combined with the apply family or loops. However, even for simple tasks, functions can make your life easier.
In a 2016 interview with Naturally Speaking, Hadley Wickham explains,
“A really good rule of thumb is, as soon as you’ve copied and pasted something more than twice, it’s time to start thinking, ‘how can I get rid of that duplication?’”
So if you think it wise to take the advice of RStudio’s widely regarded Chief Scientist (which I certainly do!), then you should consider making your own functions when you find yourself copying and pasting something 3 or more times.
There will always be times to break this rule, but here’s Hadley again on why it’s important to learn how to write your own functions:
“Any time you’ve got duplication in your code, there’s just a chance for… bugs to come in because… you change one, and you forget to change another one, you get inconsistencies. It’s really frustrating.”
When you set up a function, the first two things you want to think about are: 1) what you will name your function, and 2) the variables the function will need in order to produce an output.
Then, integrate those things into the format of function, which is fairly simple:
functionName <- function(arguments){ arguments + operation }
In the function below, the name is add_numbers
and the input variables are called number_1
, number_2
, and number_3
. Big surprise: it adds three numbers together!
After adding your function name and variables as in the format below (you can have lots of input variables, more than the three we’re using), add a set of curly brackets. Inside the curly brackets ({}), write the chunk of code that you want your function to execute (the body). The body should reference your input variables. It should also be indented, which R will automatically do once you press “Enter” after typing the curly brackets.
add_numbers <- function(number_1, number_2, number_3) {
number_1 + number_2 + number_3
}
add_numbers(1,10,100)
## [1] 111
Take a look at how we’ve applied the same structuring logic to set up this function as well. This function is performing the write.csv()
function on an object referenced as “dataframe”. It will export the object to a file referenced as “filename”.
I’ve explicitly told R that the write.csv function should use its “x” argument on what I’ve called “dataframe”, and “file” on “filename”.
Beaver_to_csv1.1 <- function(dataframe, filename) {
write.csv(x = dataframe,
file = filename)
}
In this first example, we want to export beaver csv with only filename in path. Notice we’ve added the file type (extension) to the filename: .csv
# Create object that we'll use as our filename
filename1.1 <- "Beaver1.csv"
# Use function to export dataframe object to csv
Beaver_to_csv1.1(beaver_1,
filename1.1)
We want to make a function that can take any data frame and save it with any fileformat to an export folder. First, need to create a folder that is made for all the exports of the beaver file. To begin, we first need to find where your project folder begins (its root). Then manually create a new folder in that directory called “exports”.
## Make a folder especially for beaver file exports
here::here()
## Create new input variables for new function
# dir.create("./exports") ## make directory if it doesn't already exist
path1.2 <- here::here("./exports/ ") ## make that directory
Next, we make the function that exports dataframe object to csv in csvs folder. To accommodate the file path and file type objects, we’ll add a two more input variables to our new function’s set of input variables.
Beaver_to_csv1.2 <- function(dataframe, path, filename, filetype) {
write.csv(dataframe,
gsub(" ", "",
paste0(path,
filename,
filetype)))
}
Here’s what the function is doing:
paste0
. The paste0()
function is taking three input variables and pasting them together: path, filename, and filetype.gsub()
function. This function is finding any spaces and substituting them with nothing (essentially removing any spaces it finds, particularly the space we had to put in the path).write.csv()
function, which is taking our dataframe as the object to be exported, and will name and export this object where the conglomerate path + filename + filetype tells it to.Export dataframe to csv in csvs folder:
Beaver_to_csv1.2(beaver_1,
here::here("./exports/ "),
filename = "Beaver1",
filetype = ".csv")
Just for fun, add .txt as a file type option:
Beaver_to_csv1.2(beaver_1,
here::here("./exports/ "),
filename = "Beaver1",
filetype = ".txt")
We could skip the step of specifying exactly what we’re using for each argument just input it all on the fly if we know the order of the arguments R is expecting, as shown here:
Beaver_to_csv1.2(beaver_1,
here::here("./exports/ "),
"Beaver1",
".txt")
R will treat the variables you supply to the function in the order that the arguments appear in the function. If you have strong familiarity with the function, or if the function is fairly simple, you can simply supply the variables as done above (i.e., you don’t need the “x =” portion). However, if it provides clarity or if your function is complex, you can specify each argument. This also allows you to supply arguments in different orders.
Beaver_to_csv1.2(filetype = ".csv",
dataframe = beaver_1,
filename = "Beaver1",
path = here::here("./exports/ "))
Be careful here. If you supply the arguments in the wrong order and don’t indicate which variables match the arguments, the function might not work. Here’s an example with the basic function lm
.
With this function, R is expecting the formula first and then the data second. You’ll find that reversing those arguments does not work.
lm(beaver_1, day ~ temp) # This doesn't work
lm(day ~ temp, beaver_1) # This works
##
## Call:
## lm(formula = day ~ temp, data = beaver_1)
##
## Coefficients:
## (Intercept) temp
## 352.2220 -0.1633
However, if we specify that beaver_1 is the data for this linear model, it will work:
lm(data = beaver_1, day ~ temp) # This works even though it's not in the order the function was written to expect.
##
## Call:
## lm(formula = day ~ temp, data = beaver_1)
##
## Coefficients:
## (Intercept) temp
## 352.2220 -0.1633
Sometimes functions act upon arguments that are encased in quotations marks, sometimes not. There is a difference. Be aware of this and consider this if your function is failing.
lm(data = "beaver_1", day ~ temp) # This doesn't work
Beavers <- list(beaver_1, beaver_2)
Export_beaver_list <- function(df_list, path, filetype) {
for (i in 1:length(df_list)){
Beaver_to_csv1.2(df_list[i],
path,
paste0("Beaver", i),
filetype)
}
}
Export_beaver_list(Beavers,
here::here("./exports/ "),
".csv")
plot(beaver_1$temp ~ beaver_1$time,
xlim = c(0, 2400),
ylim = c(36, 38),
xlab = "Time of Day",
ylab = "Beaver Temperature (C)",
col = "dark green",
pch = 2,
cex = .5)
One way to do so is to name relative (vs. absolute) axis limits. This plot will begin its x axis, which is plotting time, 0.5 units before the datapoint with the lowest time value. It will extend the x axis 0.5 units after the datapoint with the highest time unit. The same is true for the y axis’ limits for temperature.
plot(temp ~ time,
data = beaver_1,
xlim = c(min(time),
max(time)),
ylim = c(min(temp) - 0.5,
max(temp) + 0.5),
xlab = "Time of Day",
ylab = "Beaver Temperature (C)",
col = "dark green",
pch = 2, # point shape
cex = .5) # point size
Beaver_to_plot <- function(dataframe, path, filename, filetype) {
png(file = gsub(" ", "",
paste0(path, filename, filetype)))
plot(temp ~ time,
data = dataframe,
xlim = c(min(time),
max(time)),
ylim = c(min(temp) - 0.5,
max(temp) + 0.5),
xlab = "Time of Day",
ylab = "Beaver Temperature (C)",
col = "dark green",
pch = 2, # point shape
cex = .5) # point size
dev.off() # tells R to stop plotting
}
Beaver_to_plot(beaver_1,
here::here("./exports/ "),
"Beaver1.1",
".png")
## png
## 2
Beaver_to_plot_many <- function(df_list, path, filetype) {
for (i in 1:length(df_list)){
Beaver_to_plot(df_list[[i]],
path,
paste0("Beaver", i, "_plot2"),
filetype)
}
}
Beaver_to_plot_many(Beavers,
here::here("./exports/ "),
".png")
You don’t always need to specify all of the input variables for a function. Sometimes you can make use of a tool called an ellipsis. An ellipsis is, you guessed it, three dots in a row: “…”.
The most straightforward usage of an ellipsis is for when you want to make your function flexible enough to handle an unspecified number of input variables. For instance, think about the paste()
function.
The paste()
function can print any number of input variables:
paste("This", "phrase", "has", "five", "variables")
## [1] "This phrase has five variables"
paste("How", "about", "three?")
## [1] "How about three?"
paste("I", "could", "go", "on", "and", "on", "and", "on", "and", "on", "and", "on...")
## [1] "I could go on and on and on and on and on..."
## Create function
ellipsis_example <- function(x_var, y_var, ...){
plot(y_var ~ x_var,
...)
}
## Run function
ellipsis_example(x_var = beaver_1$time,
y_var = beaver_1$temp,
pch = beaver_1$activ,
col = "magenta",
main = "One Beaver's Daily Temperature",
sub = "circles = beaver engaged in high-intensity activity",
xlab = "Time of Day",
ylab = "Temperature (C)")
For those interested, here and here are some additional exercises to practice writing functions. Even if you don’t know which commands you would use, you can practice setting up the format of a function and say something like, “then I’d add these two things together” or “and then I would print this value”.