by Sharon Machlis

Contributing Writer

Easy error handling in R with purrr’s possibly


Dec 17, 2020 · 6 mins

See how the purrr package’s possibly() function helps you flag errors and keep going when applying a function over multiple objects in R.


It’s frustrating to see your code choke part of the way through while trying to apply a function in R. You may know that something in one of those objects caused a problem, but how do you track down the offender?

The purrr package’s possibly() function is one easy way.

In this example, I’ll demo code that imports multiple CSV files. In most of the files, the Value column imports as character strings, but in one file it comes in as numbers. Running a function that expects character input on that column will cause an error.

For setup, the code below loads several libraries I need. It then uses base R’s list.files() function, piped into sort(), to create a sorted vector of the paths to all the files in my data directory.

library(purrr)
library(readr)
library(rio)
library(dplyr)
my_data_files <- list.files("data_files", full.names = TRUE) %>%
  sort()
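The data_files folder isn't included with the article. This minimal sketch writes two hypothetical CSV files to a temporary directory and then lists them. The folder name data_files_demo and the file contents are made up. The pattern argument is not in the original code; it limits the matches to CSV files.

```r
# Hypothetical sample files in a temporary folder, standing in for the
# article's data_files directory
data_dir <- file.path(tempdir(), "data_files_demo")
dir.create(data_dir, showWarnings = FALSE)

write.csv(data.frame(Category = c("A", "B"), Value = c("$30.00", "$1,200.50")),
          file.path(data_dir, "file2.csv"), row.names = FALSE)
write.csv(data.frame(Category = c("A", "B"), Value = c("$99.99", "$5.25")),
          file.path(data_dir, "file1.csv"), row.names = FALSE)

# full.names = TRUE keeps the folder in each path, so the results can go
# straight to an import function; pattern limits matches to .csv files
my_data_files <- sort(list.files(data_dir, pattern = "\\.csv$", full.names = TRUE))
basename(my_data_files)
#> [1] "file1.csv" "file2.csv"
```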

I can then import the first file and look at its structure.

x <- rio::import("data_files/file1.csv")
str(x)
'data.frame':	3 obs. of  3 variables:
 $ Category     : chr  "A" "B" "C"
 $ Value        : chr  "$4,256.48 " "$438.22" "$945.12"
 $ MonthStarting: chr  "12/1/20" "12/1/20" "12/1/20"

Both the Value and MonthStarting columns are importing as character strings. What I ultimately want is Value as numbers and MonthStarting as dates.

I sometimes deal with issues like this by writing a small function, such as the one below, to make changes in a file after import. It uses dplyr’s transmute() to create a new Month column by converting MonthStarting to Date objects, and a new Total column by converting Value to numbers. I also make sure to keep the Category column, because transmute() drops all columns not explicitly mentioned.

library(dplyr)
library(lubridate)
process_file <- function(myfile) {
  rio::import(myfile) %>%
    dplyr::transmute(
      Category = as.character(Category),
      Month = lubridate::mdy(MonthStarting),
      Total = readr::parse_number(Value)
    )
}
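You can see what each transmute() step does without touching the file system. This minimal sketch applies the same conversions to a hypothetical in-memory copy of file1's data. It also adds an extra Notes column (also hypothetical) to show that transmute() drops any column it doesn't name.

```r
library(dplyr)
library(lubridate)
library(readr)

# Hypothetical in-memory copy of file1.csv's contents, plus an extra column
raw <- data.frame(
  Category = c("A", "B", "C"),
  Value = c("$4,256.48 ", "$438.22", "$945.12"),
  MonthStarting = c("12/1/20", "12/1/20", "12/1/20"),
  Notes = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

cleaned <- raw %>%
  transmute(
    Category = as.character(Category),
    Month = mdy(MonthStarting),  # "12/1/20" becomes the Date 2020-12-01
    Total = parse_number(Value)  # "$4,256.48 " becomes 4256.48
  )

# Only the three named columns survive; Value, MonthStarting and Notes are gone
names(cleaned)
```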

I like to use readr’s parse_number() function for converting values that come in as character strings because it deals with commas, dollar signs, or percent signs in numbers. However, parse_number() requires character strings as input. If a value is already a number, parse_number() will throw an error.
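A minimal sketch of both halves of that behavior. Character input with currency symbols, commas, or percent signs parses cleanly, while numeric input errors. The numeric values mirror file4's Value column, and tryCatch() is used here only so the error can be inspected.

```r
library(readr)

# Character input: currency symbols, grouping commas, percent signs and
# surrounding whitespace are all stripped before conversion
parsed <- parse_number(c("$4,256.48 ", "$438.22", "12%"))
parsed
#> [1] 4256.48  438.22   12.00

# Numeric input (like file4's Value column) fails parse_number()'s
# character check, so the call errors instead of passing numbers through
numeric_try <- tryCatch(
  parse_number(c(3738, 723, 5494)),
  error = function(e) "error"
)
numeric_try
#> [1] "error"
```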

My new function works fine when I test it on the first two files in my data directory using purrr’s map_df() function.

my_results <- map_df(my_data_files[1:2], process_file)

But if I try running my function on all the files, including the one where Value imports as numbers, it will choke.

all_results <- map_df(my_data_files, process_file)
 Error: Problem with `mutate()` input `Total`.
x is.character(x) is not TRUE
ℹ Input `Total` is `readr::parse_number(Value)`.
Run `rlang::last_error()` to see where the error occurred.

That error tells me that in one of the files, Value is not a character column, so parse_number() can’t create Total from it. But I don’t know which file. Ideally, I’d like to run through all the files, marking the one(s) with problems as errors but still processing all of them instead of stopping at the first error.

possibly() lets me do this by creating a brand new function from my original function:

safer_process_file <- possibly(process_file, otherwise = "Error in file")

The first argument for possibly() is my original function, process_file. The second argument, otherwise, tells possibly() what to return if there’s an error.
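This minimal, self-contained sketch uses possibly() on a toy function. log10() errors on a character string, and the wrapped version returns NA instead, so map_dbl() can finish. The second half is a hypothetical base-R analogue built on tryCatch(). It is included only to show the idea and is not purrr's actual implementation.

```r
library(purrr)

# possibly() returns a new function: same inputs as log10(), but any error
# yields the `otherwise` value instead of stopping the whole map
safe_log10 <- possibly(log10, otherwise = NA_real_)

inputs <- list(1, "oops", 100)
results <- map_dbl(inputs, safe_log10)
results
#> [1]  0 NA  2

# Hypothetical hand-rolled analogue for illustration only (not purrr's code):
# wrap the call in tryCatch() and return `otherwise` on error
my_possibly <- function(f, otherwise) {
  function(...) tryCatch(f(...), error = function(e) otherwise)
}
results_manual <- map_dbl(inputs, my_possibly(log10, otherwise = NA_real_))
```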

To apply my new safer_process_file() function to all my files, I’ll use purrr’s map() function instead of map_df(). That’s because the results need to be collected in a list, not combined into a single data frame. Any file that triggers an error won’t return a data frame; it will return the character string I told otherwise to generate.
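map_df() combines its results with dplyr::bind_rows(), and bind_rows() can't combine a data frame with a bare character string. That is why map_df() breaks once errors are possible. This minimal sketch uses a hypothetical mixed list standing in for possibly()'s output.

```r
library(dplyr)

# A hypothetical mix of results like the ones possibly() produces:
# data frames for files that worked, a string for the one that failed
mixed_results <- list(
  data.frame(Category = "A", Total = 4256.48, stringsAsFactors = FALSE),
  "Error in file",
  data.frame(Category = "B", Total = 438.22, stringsAsFactors = FALSE)
)

# Combining them the way map_df() does (via bind_rows()) fails on the string
combined <- tryCatch(bind_rows(mixed_results), error = function(e) "bind failed")
combined
#> [1] "bind failed"
```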

all_results <- map(my_data_files, safer_process_file)
str(all_results, max.level = 1) 
List of 5
 $ :'data.frame':	3 obs. of  3 variables:
 $ :'data.frame':	3 obs. of  3 variables:
 $ :'data.frame':	3 obs. of  3 variables:
 $ : chr "Error in file"
 $ :'data.frame':	3 obs. of  3 variables:

You can see here that the fourth item, from my fourth file, is the one with the error. That’s easy to see with only five items, but wouldn’t be quite so easy if I had a thousand files to import and three had errors.

If I name the list with my original file names, it’s easier to identify the problem file:

names(all_results) <- my_data_files
str(all_results, max.level = 1) 
List of 5
 $ data_files/file1.csv:'data.frame':	3 obs. of  3 variables:
 $ data_files/file2.csv:'data.frame':	3 obs. of  3 variables:
 $ data_files/file3.csv:'data.frame':	3 obs. of  3 variables:
 $ data_files/file4.csv: chr "Error in file"
 $ data_files/file5.csv:'data.frame':	3 obs. of  3 variables:
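With a thousand files, scanning str() output by eye doesn't scale. This minimal sketch uses purrr's keep() and discard() with is.data.frame() to pull out the names of failed files and combine the rest. The list below is a hypothetical stand-in shaped like all_results.

```r
library(purrr)

# Hypothetical named results list, shaped like all_results above
all_results_demo <- list(
  "data_files/file1.csv" = data.frame(Total = 1),
  "data_files/file4.csv" = "Error in file",
  "data_files/file5.csv" = data.frame(Total = 2)
)

# Anything that isn't a data frame came from possibly()'s `otherwise` value
failed <- names(discard(all_results_demo, is.data.frame))
failed
#> [1] "data_files/file4.csv"

# The successful imports can still be combined into one data frame
good <- dplyr::bind_rows(keep(all_results_demo, is.data.frame))
nrow(good)
#> [1] 2
```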

I can even save the printed output of str() to a text file for further examination.

# str() prints its summary rather than returning it, so call it inside capture.output()
capture.output(str(all_results, max.level = 1), file = "results.txt")
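This minimal sketch runs the save-to-file step end to end. It uses a hypothetical two-item list and a temporary file in place of all_results and results.txt, then reads the file back to confirm what was written.

```r
# Hypothetical results list standing in for all_results
demo_results <- list(a = data.frame(x = 1:3), b = "Error in file")

results_file <- tempfile(fileext = ".txt")

# Calling str() inside capture.output() sends its printed summary to the file
capture.output(str(demo_results, max.level = 1), file = results_file)

saved_lines <- readLines(results_file)
saved_lines[1]
#> [1] "List of 2"
```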

Now that I know file4.csv is the problem, I can import just that one and confirm what the issue is.

x4 <- rio::import(my_data_files[4])
str(x4)
'data.frame':	3 obs. of  3 variables:
 $ Category     : chr  "A" "B" "C"
 $ Value        : num  3738 723 5494
 $ MonthStarting: chr  "9/1/20" "9/1/20" "9/1/20"
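When several files might be affected, it can help to compare the Value column's class across all of them at once. In this minimal sketch, the in-memory data frames are hypothetical stand-ins. With real files, each element would come from rio::import().

```r
library(purrr)

# Hypothetical in-memory stand-ins for two imported files
imported <- list(
  file1 = data.frame(Value = c("$4,256.48 ", "$438.22", "$945.12"),
                     stringsAsFactors = FALSE),
  file4 = data.frame(Value = c(3738, 723, 5494))
)

# class() of the Value column in each data frame, named by file
value_classes <- map_chr(imported, ~ class(.x$Value))
value_classes
```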

Ah, Value is indeed coming in as numeric. I’ll revise my process_file() function with an if/else check that runs parse_number() only when Value is a character string:

process_file2 <- function(myfile) {
  rio::import(myfile) %>%
    dplyr::transmute(
      Category = as.character(Category),
      Month = lubridate::mdy(MonthStarting),
      # Parse only character columns; numeric columns pass through unchanged
      Total = if (is.character(Value)) readr::parse_number(Value) else Value
    )
}

I use if/else here rather than ifelse() on purpose. is.character(Value) returns a single TRUE or FALSE, and ifelse() returns a result the same length as its test. It would therefore hand back only the first value, which transmute() would then repeat down the entire Total column.

Now if I use purrr’s map_df() with my new process_file2() function, it should work and give me a single data frame.

all_results2 <- map_df(my_data_files, process_file2)
str(all_results2)
'data.frame':	15 obs. of  3 variables:
 $ Category: chr  "A" "B" "C" "A" ...
 $ Month   : Date, format: "2020-12-01" "2020-12-01" "2020-12-01" ...
 $ Total   : num  4256 438 945 ...
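ifelse() and if/else behave differently when the test is a single TRUE or FALSE, which is what is.character() returns. This minimal sketch uses file1's Value strings to show the difference.

```r
library(readr)

value_chr <- c("$4,256.48 ", "$438.22", "$945.12")

# ifelse() returns a result the same length as its test; is.character()
# gives a single TRUE, so only the first parsed value comes back
one_value <- ifelse(is.character(value_chr), parse_number(value_chr), value_chr)
one_value
#> [1] 4256.48

# if/else returns whichever whole branch is chosen
all_values <- if (is.character(value_chr)) parse_number(value_chr) else value_chr
all_values
#> [1] 4256.48  438.22  945.12
```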

That’s just the data and format I wanted. And I tracked down the problem file quickly by wrapping my original function in possibly() to create a new, error-handling function.

For more R tips, head to the “Do More With R” page on InfoWorld or check out the “Do More With R” YouTube playlist.


Sharon Machlis was a longtime writer and editor at Computerworld and later the Director of Editorial Data & Analytics at parent company Foundry. She is also the author of Practical R for Mass Communications and Journalism.

Sharon's Do more with R video tutorials won a Jesse H. Neal award for Best Instructional Content.

Recently retired, Sharon is still passionate about R and generative AI, and also blogs about the retirement life. You can find her on Bluesky at @smachlis.bsky.social, Mastodon at @smach@masto.machlis.com, and LinkedIn.
