Code will rarely work exactly as you want on the first try. Especially early on, coding is an exercise in incremental improvements. Debugging, or identifying and removing “buggy” code that doesn’t work as intended, is the skill that lets us figure out what is wrong so we can make those improvements.
Today’s worksheet is presented as a series of puzzles. Each puzzle will be a function that has something wrong with it. I will provide an input, and the desired output. Your task is to use the debugging tools we learned to figure out what is wrong with the function, and correct it. I will walk through an example first so you get the idea.
We will be using two main tools for debugging, debugonce() and browser(). Each accomplishes the same thing in slightly different ways. These functions let you pause the execution of code midway through a function and see what is going on inside our mini-R universes. This is very helpful because, unlike code in your global environment, code inside a function normally cannot be run line-by-line to see what is happening to the data at each step. debugonce() and browser() let you do that. They are also very helpful while building new functions.
debugonce() accepts a function name, and the next time you run that function, it will drop you into the function's mini-universe for you to look around. You can tell it worked because your console will change slightly.
The figure above shows what the browser window will look like. While in the browser, you can execute R code like normal, but there are a few differences.
Instead of the usual > prompt in the R console, you will see Browse[#]> indicating you are in the browser. It still works mostly like the normal console, with a few extra commands.
a) You can press Enter or enter the letter n to go to the next line of code.
b) You can enter c to continue to the end of the function.
c) You can enter Q (capital) to quit and leave the browser.
The line of code that will run next is highlighted.
We can also use the browser() function to call the browser at a specific spot within a function. Simply add browser() anywhere inside a function you are writing and define the function again by executing it. Now, whenever you run that function, the browser will open wherever you added browser(). You will have to remove it from your function once you finish debugging.
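As a rough sketch of the debugonce() workflow described above, the snippet below flags a small function and then calls it. The function add_tax() and its numbers are made up for illustration, and the interactive() guard is an addition so that the snippet does not try to pause when it is run as a script.

```r
# hypothetical function used only for this illustration
add_tax = function(price, rate = 0.1) {
  # work out the tax owed
  tax = price * rate
  # add the tax to the price
  total = price + tax
  # return results
  return(total)
}

# flag the function for debugging, but only in an interactive session
if (interactive()) {
  debugonce(add_tax)
}

# in an interactive session this call opens the browser; at Browse[1]> try:
#   n (or Enter)  run the next line
#   tax           print a variable that exists inside the function
#   c             continue to the end of the function
#   Q             quit the browser
result = add_tax(50)
```

Because debugonce() only applies to the next call, running add_tax(50) a second time would execute normally without opening the browser.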
Here is an example function that needs some debugging. This one is relatively short, and you may be able to figure out the problem without debugging. This will not always be the case, as functions will routinely extend for a dozen or several dozen lines, with multiple other functions inside of them creating nested mini-universes. The debugging process will always be the same though: figure out which function the problem is in, then go inside and follow the process step-by-step.
This function is meant to accept a numeric vector, and then output the mean, median, and mode. Instead, it results in the error shown below.
example_vector = c(1, 2, 6, 8, 4, 2, 8, 2, 7, 10, 33)
example_function = function(num_vec) {
  # get the mean
  vec_mean = mean(num_vec)
  # get the median
  vec_median = median(num_vec)
  # get the mode
  vec_mode = mode(num_vec)
  # create named vector for output
  output = c("mean" = vec_mean, "median" = vec_median, "mode" = vec_mode)
  # make sure all results are numeric
  if(!all(is.numeric(output))){stop("Not all values are numeric!")}
  # return results
  return(output)
}
example_function(example_vector)
Error in example_function(example_vector): Not all values are numeric!
How would we go about fixing this? We only have one function, so we know where things must be going wrong. We’ll use debugonce() to get a peek inside. Copy the above function code into your console and execute it to add the function to your environment. Run example_function() on example_vector to make sure you are getting the same output as we did here.
Once you have done that, run debugonce(example_function), then run example_function(example_vector) again. You will be dropped into the browser, looking around inside example_function(). Step through the code execution one line at a time by pressing the Enter key. Watch the environment pane each step of the way and see if you can catch where the error will happen. Once you reach the spot in the function where the error occurs, it will boot you out of the browser and back to the global environment.
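As you step through the function, you should notice that the line vec_mode = mode(num_vec) produces an output of "numeric". In R, mode() reports the storage type of an object rather than its most common value. When that piece of text is combined with the numbers in c(), every element of output is converted to character, which triggers our error at the check if(!all(is.numeric(output))){stop("Not all values are numeric!")}. That code is asking: if the output is not (because of !) numeric, then run stop().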
We could use browser() to check that section more quickly using the following:
example_vector = c(1, 2, 6, 8, 4, 2, 8, 2, 7, 10, 33)
example_function = function(num_vec) {
  # get the mean
  vec_mean = mean(num_vec)
  # get the median
  vec_median = median(num_vec)
  # ------------------------------------------------------------- Browser will stop execution here.
  browser()
  # get the mode
  vec_mode = mode(num_vec)
  # create named vector for output
  output = c("mean" = vec_mean, "median" = vec_median, "mode" = vec_mode)
  # make sure all results are numeric
  if(!all(is.numeric(output))){stop("Not all values are numeric!")}
  # return results
  return(output)
}
example_function(example_vector)
If you re-define our example_function() using the code above and then try to use it, it will always stop at the browser() call to let us look around. Try it out yourself!
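The walkthrough finds the problem but stops short of repairing it. One possible repair, sketched here under the assumption that "mode" is meant to be the most frequent value, swaps mode() for a small helper. The names stat_mode() and fixed_function() are hypothetical and not part of the worksheet, and ties are broken by taking whichever value appears first.

```r
example_vector = c(1, 2, 6, 8, 4, 2, 8, 2, 7, 10, 33)

# mode() reports the storage type, not the most common value
type_of_vector = mode(example_vector)

# mixing numbers and text in c() converts every element to character
bad_output = c("mean" = mean(example_vector), "mode" = type_of_vector)

# hypothetical helper: return the most frequent value (first one wins ties)
stat_mode = function(num_vec) {
  # list each distinct value once
  unique_vals = unique(num_vec)
  # count how often each distinct value occurs
  counts = tabulate(match(num_vec, unique_vals))
  # pick the value with the highest count
  unique_vals[which.max(counts)]
}

# hypothetical corrected version of example_function()
fixed_function = function(num_vec) {
  # create named vector for output
  output = c("mean" = mean(num_vec),
             "median" = median(num_vec),
             "mode" = stat_mode(num_vec))
  # make sure all results are numeric
  if(!all(is.numeric(output))){stop("Not all values are numeric!")}
  # return results
  return(output)
}

fixed_result = fixed_function(example_vector)
```

With this change the numeric check passes, because every element that goes into c() is a number.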
The following function takes in a vector of character names and outputs a dataframe with 6 columns. The function will have each character flip a coin. If they get heads, they can flip again, up to a maximum of three flips. If a character flips heads three times, the lucky column should be set to TRUE. Run the following several times. Every so often, a character will appear who did not flip heads all three times but still gets a TRUE in the lucky column. Fix this error.
char_vec = c("Spike Spiegal", "Doreamon", "Sherlock Holmes", "Tiana", "Crush", "Thor", "Rhys", "Buffy", "Sasha Braus", "Catra", "Pikachu", "My Melody", "Claire Fraser", "Shinchan", "Kakashi")
puzzle_1 = function(characters) {
  # sort chars by alphabetical order
  sorted_char = sort(characters)
  # get the first letter of each name
  char_letters = substr(x = sorted_char, start = 1, stop = 1)
  # create a dataframe of character and their initial
  char_df = data.frame("char_name" = sorted_char, "char_initial" = char_letters)
  # randomly flip a coin for each char
  char_df$toss_1 = sample(x = c("heads", "tails"),
                          size = nrow(char_df),
                          replace = TRUE)
  # for each that got heads, flip again, those with tails are out
  char_df$toss_2 = ifelse(char_df$toss_1 == "heads",
                          sample(x = c("heads", "tails"),
                                 size = nrow(char_df),
                                 replace = TRUE),
                          NA)
  # do it again
  char_df$toss_3 = ifelse(char_df$toss_1 == "heads",
                          sample(x = c("heads", "tails"),
                                 size = nrow(char_df),
                                 replace = TRUE),
                          NA)
  # add TRUE / FALSE for those with 3 heads
  ## set to TRUE if the 3rd toss is heads
  ## (as the other two had to be heads to toss a third time)
  char_df$lucky = ifelse(char_df$toss_3 == "heads", TRUE, FALSE)
  ## fill NAs with FALSE
  char_df[is.na(char_df$lucky), "lucky"] = FALSE
  # return results
  return(char_df)
}
puzzle_1(char_vec)
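Fix the line:
# do it again
char_df$toss_3 = ifelse(char_df$toss_1 == 'heads',
                        sample(x = c('heads', 'tails'), size = nrow(char_df), replace = TRUE),
                        NA)
so that it looks at toss_2 rather than toss_1. The third toss should only happen for characters whose second toss was heads. Because the line checks toss_1, a character who flips heads and then tails still gets a third toss and can end up marked as lucky.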
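To see the corrected logic in isolation, here is a minimal sketch that uses plain vectors instead of the full dataframe. The seed, the number of players, and the all_heads check are assumptions added for illustration and are not part of the original function.

```r
# arbitrary seed so the sketch gives the same tosses each run
set.seed(1)
# hypothetical number of characters
n_players = 8
coin = c("heads", "tails")

# everyone gets a first toss
toss_1 = sample(x = coin, size = n_players, replace = TRUE)
# only those with heads on toss 1 get a second toss
toss_2 = ifelse(toss_1 == "heads",
                sample(x = coin, size = n_players, replace = TRUE),
                NA)
# corrected: only those with heads on toss 2 get a third toss
toss_3 = ifelse(toss_2 == "heads",
                sample(x = coin, size = n_players, replace = TRUE),
                NA)

# set to TRUE if the 3rd toss is heads, then fill NAs with FALSE
lucky = ifelse(toss_3 == "heads", TRUE, FALSE)
lucky[is.na(lucky)] = FALSE

# independent check: did each player really get three heads?
all_heads = toss_1 == "heads" & toss_2 == "heads" & toss_3 == "heads"
```

With the check on toss_2, lucky and all_heads should always agree, which is a handy thing to confirm in the browser when testing your own fix.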
The following function takes in our survey dataframe and is meant to output the number of times people responded TRUE to a question. The correct output is 33; however, it is currently outputting 110. Debug this function to fix the issue.
# load data
survey = read.csv("https://raw.githubusercontent.com/Intro-to-Data-Science-Template/intro_to_data_science_reader/main/content/class_worksheets/4_r_rstudio/data/survey_data.csv")
puzzle_2 = function(survey_dataframe) {
  # pivot the survey data from wide to long
  survey_long = tidyr::pivot_longer(survey_dataframe, cols = -fav_char, values_transform = as.character)
  # get all the questions people answered TRUE
  all_true = survey_long[survey_long$value == TRUE, ]
  # count the number of rows (the number of questions with answers of TRUE)
  num_true = nrow(all_true)
  # return that number
  return(num_true)
}
puzzle_2(survey)
When we subset using:
all_true = survey_long[survey_long$value == TRUE, ]
the result also includes a row for every NA. Comparing NA with TRUE gives NA rather than FALSE, and R returns a row full of NAs for each NA in the row index, so those rows are counted as well. You can either account for the NAs like this:
all_true = survey_long[survey_long$value == TRUE & !is.na(survey_long$value), ]
Or subset the dataframe again like:
all_true = survey_long[survey_long$value == TRUE, ]
all_true = all_true[!is.na(all_true$value), ]
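The NA behaviour is easier to see on a tiny example. The dataframe below is made up for illustration (it is not the survey data), and the which() variant at the end is a third option that the solution above does not mention.

```r
# made-up data: two TRUE answers, one FALSE, and two missing answers
toy = data.frame(value = c("TRUE", NA, "FALSE", "TRUE", NA))

# comparing NA with TRUE gives NA, not FALSE
comparison = toy$value == TRUE

# each NA in the row index adds a row of NAs to the result
with_na = toy[toy$value == TRUE, , drop = FALSE]
count_with_na = nrow(with_na)

# option 1: rule out the NAs inside the condition
fixed = toy[toy$value == TRUE & !is.na(toy$value), , drop = FALSE]
count_fixed = nrow(fixed)

# option 2 (not in the solution above): which() keeps only the TRUE positions
fixed_which = toy[which(toy$value == TRUE), , drop = FALSE]
count_which = nrow(fixed_which)
```

Here count_with_na is 4 even though only two answers are really TRUE, while both fixed versions count 2. The same effect is what inflates the count in puzzle_2().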