See how the purrr package's possibly() function helps you flag errors and keep going when applying a function over multiple objects in R.
It's frustrating to see your code choke part of the way through while trying to apply a function in R. You may know that something in one of those objects caused a problem, but how do you track down the offender?
The purrr package's possibly() function is one easy way.
In this example, I'll demo code that imports multiple CSV files. Most files' value columns import as characters, but one of these comes in as numbers. Running a function that expects characters as input will cause an error.
For setup, the code below loads several libraries I need and then uses base R's list.files() function to return a sorted vector with names of all the files in my data directory.
library(purrr)
library(readr)
library(rio)
library(dplyr)
my_data_files <- list.files("data_files", full.names = TRUE) %>%
  sort()
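If the data directory might also hold files that are not CSVs, one option is to filter at listing time. The sketch below is self-contained: it builds a throwaway directory with hypothetical file names (they are not the files used in this article) and uses the pattern argument of list.files() to keep only names ending in .csv. list.files() already returns its results in alphabetical order, so an extra sort() is a harmless safeguard rather than a requirement.

```r
# Throwaway directory with hypothetical files, for illustration only
demo_dir <- file.path(tempdir(), "demo_data_files")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("file2.csv", "file1.csv", "notes.txt")))

# pattern is a regular expression; "\\.csv$" matches names ending in .csv
demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
basename(demo_files)
```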
I can then import the first file and look at its structure.
x <- rio::import("data_files/file1.csv")
str(x)
'data.frame': 3 obs. of 3 variables:
$ Category : chr "A" "B" "C"
$ Value : chr "$4,256.48 " "$438.22" "$945.12"
$ MonthStarting: chr "12/1/20" "12/1/20" "12/1/20"
Both the Value and MonthStarting columns are importing as character strings. What I ultimately want is Value as numbers and MonthStarting as dates.
I sometimes deal with issues like this by writing a small function, such as the one below, to make changes in a file after import. It uses dplyr's transmute() to create a new Month column from MonthStarting as Date objects, and a new Total column from Value as numbers. I also make sure to keep the Category column (transmute() drops all columns not explicitly mentioned).
library(dplyr)
library(lubridate)
process_file <- function(myfile) {
  rio::import(myfile) %>%
    dplyr::transmute(
      Category = as.character(Category),
      Month = lubridate::mdy(MonthStarting),
      Total = readr::parse_number(Value)
    )
}
I like to use readr's parse_number() function for converting values that come in as character strings because it deals with commas, dollar signs, or percent signs in numbers. However, parse_number() requires character strings as input. If a value is already a number, parse_number() will throw an error.
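A quick way to see both behaviors is to call parse_number() directly on small vectors. The sketch below reuses values shown in the str() output in this article; the tryCatch() wrapper is my own addition so the script keeps running and shows the error message as text. It assumes a readr version that behaves as described above.

```r
library(readr)

# Character input with currency symbols, commas, and stray spaces parses cleanly
parsed <- parse_number(c("$4,256.48 ", "$438.22", "$945.12"))
parsed

# Numeric input: capture the error message rather than halting the script.
# If a readr version accepted numbers, tryCatch() would simply return its result.
numeric_attempt <- tryCatch(
  parse_number(c(3738, 723, 5494)),
  error = function(e) conditionMessage(e)
)
numeric_attempt
```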
My new function works fine when I test it on the first two files in my data directory using purrr's map_df() function.
my_results <- map_df(my_data_files[1:2], process_file)
But if I try running my function on all the files, including the one where Value imports as numbers, it will choke.
all_results <- map_df(my_data_files, process_file)
Error: Problem with `mutate()` input `Total`.
x is.character(x) is not TRUE
ℹ Input `Total` is `readr::parse_number(Value)`.
Run `rlang::last_error()` to see where the error occurred.
That error tells me Value is not a character column in one of the files, but I'm not sure which one. Ideally, I'd like to run through all the files, marking the one(s) with problems as errors but still processing all of them instead of stopping at the error.
possibly() lets me do this by creating a brand new function from my original function:
safer_process_file <- possibly(process_file, otherwise = "Error in file")
The first argument for possibly() is my original function, process_file. The second argument, otherwise, tells possibly() what to return if there's an error.
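The same pattern can be tried on a tiny example with no files involved. The sketch below wraps base R's log(), which raises an error on non-numeric input; safe_log is a hypothetical name chosen for illustration, and NA_real_ is used as the fallback so that map_dbl() can still return a numeric vector.

```r
library(purrr)

# log() raises an error for non-numeric input such as "a";
# the wrapped version returns the otherwise value instead
safe_log <- possibly(log, otherwise = NA_real_)

# The bad element becomes NA_real_ and the other elements are still processed
log_results <- map_dbl(list(10, "a", 100), safe_log)
log_results
```

The otherwise value can be anything; choosing one that is easy to tell apart from a real result makes the failures simple to find afterward.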
To apply my new safer_process_file() function to all my files, I'll use purrr's map() function instead of map_df(). That's because map_df() needs every result to be a data frame so it can bind them together, while map() returns a list that can hold anything. And if there's an error, the result for that file won't be a data frame; it'll be the character string that I told otherwise to generate.
all_results <- map(my_data_files, safer_process_file)
str(all_results, max.level = 1)
List of 5
$ :'data.frame': 3 obs. of 3 variables:
$ :'data.frame': 3 obs. of 3 variables:
$ :'data.frame': 3 obs. of 3 variables:
$ : chr "Error in file"
$ :'data.frame': 3 obs. of 3 variables:
You can see here that the fourth item, from my fourth file, is the one with the error. That's easy to see with only five items, but wouldn't be quite so easy if I had a thousand files to import and three had errors.
If I name the list with my original file names, it's easier to identify the problem file:
names(all_results) <- my_data_files
str(all_results, max.level = 1)
List of 5
$ data_files/file1.csv:'data.frame': 3 obs. of 3 variables:
$ data_files/file2.csv:'data.frame': 3 obs. of 3 variables:
$ data_files/file3.csv:'data.frame': 3 obs. of 3 variables:
$ data_files/file4.csv: chr "Error in file"
$ data_files/file5.csv:'data.frame': 3 obs. of 3 variables:
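With a long list, the failures can also be picked out programmatically rather than by eye. The sketch below uses a toy list named toy_results with hypothetical file paths, standing in for the named all_results list; it flags every element that is not a data frame and returns the matching names.

```r
library(purrr)

# Toy stand-in for a named results list: two data frames and one error marker.
# The file paths are hypothetical.
toy_results <- list(
  "data_files/a.csv" = data.frame(Total = 1),
  "data_files/b.csv" = "Error in file",
  "data_files/c.csv" = data.frame(Total = 2)
)

# TRUE wherever the fallback value came back instead of a data frame
failed <- !map_lgl(toy_results, is.data.frame)
failed_files <- names(toy_results)[failed]
failed_files
```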
I can even save the results of str() to a text file for further examination.
capture.output(str(all_results, max.level = 1), file = "results.txt")
Now that I know file4.csv is the problem, I can import just that one and confirm what the issue is.
x4 <- rio::import(my_data_files[4])
str(x4)
'data.frame': 3 obs. of 3 variables:
$ Category : chr "A" "B" "C"
$ Value : num 3738 723 5494
$ MonthStarting: chr "9/1/20" "9/1/20" "9/1/20"
Ah, Value is indeed coming in as numeric. I'll revise my process_file() function to account for the possibility that Value isn't a character string with an if/else check. A plain if/else is the right tool here rather than ifelse(), because is.character(Value) returns a single TRUE or FALSE for the whole column, and ifelse() would return only one value to match:
process_file2 <- function(myfile) {
  rio::import(myfile) %>%
    dplyr::transmute(
      Category = as.character(Category),
      Month = lubridate::mdy(MonthStarting),
      Total = if (is.character(Value)) readr::parse_number(Value) else Value
    )
}
Now if I use purrr's map_df() with my new process_file2() function, it should work and give me a single data frame.
all_results2 <- map_df(my_data_files, process_file2)
str(all_results2)
'data.frame': 15 obs. of 3 variables:
$ Category: chr "A" "B" "C" "A" ...
$ Month : Date, format: "2020-12-01" "2020-12-01" "2020-12-01" ...
$ Total : num 4256 438 945 3156 ...
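One caution about type checks like this: is.character() returns a single TRUE or FALSE for the whole column, and base R's ifelse() returns a result the same length as its test. The sketch below uses a hypothetical to_number() helper in place of parse_number() so it runs in base R alone. It shows that ifelse() with a length-one test yields only one value, while a plain if/else returns the full vector.

```r
# Hypothetical base R stand-in for readr::parse_number(), for illustration only
to_number <- function(x) as.numeric(gsub("[$,]", "", x))

value_chr <- c("$4,256.48", "$438.22", "$945.12")

# ifelse() is shaped by its test, so a length-one test gives a length-one result
from_ifelse <- ifelse(is.character(value_chr), to_number(value_chr), value_chr)
length(from_ifelse)

# A plain if/else chooses between the two whole vectors
from_if <- if (is.character(value_chr)) to_number(value_chr) else value_chr
length(from_if)
```

Inside transmute() or mutate(), a length-one result is recycled down the column, so every row in a file would silently get that file's first value.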
That's just the data and format I wanted, thanks to wrapping my original function in possibly() to create a new, error-handling function.
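When a problem file can't be fixed right away, the list produced by map() with the possibly()-wrapped function is still usable. The sketch below, built on a toy list rather than the article's files, keeps only the elements that are data frames and stacks them. keep() is from purrr and bind_rows() is from dplyr; toy_results and combined are names made up for this example.

```r
library(purrr)
library(dplyr)

# Toy stand-in for a list of results in which one element is the error marker
toy_results <- list(
  data.frame(Category = "A", Total = 1),
  "Error in file",
  data.frame(Category = "B", Total = 2)
)

# keep() retains elements for which the predicate is TRUE;
# bind_rows() then stacks the surviving data frames into one
combined <- toy_results %>%
  keep(is.data.frame) %>%
  bind_rows()
combined
```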
For more R tips, head to the "Do More With R" page on InfoWorld or check out the "Do More With R" YouTube playlist.


