Learn how named vectors give R developers an easy way to use key-value pairs
What's the state abbreviation for Arkansas? Is it AR? AK? AS?
Maybe you've got a data frame with the information. Or any info where there's one column with categories, and another column with values. Chances are, at some point you'd like to look up the value by category, sometimes known as the key. A lot of programming languages have ways to work with key-value pairs. This is easy to do in R, too, with named vectors. Here's how.
I've got data with state names and abbreviations, which I've stored in a data frame named postal_df. (The code to create that data frame is at the bottom of this post if you'd like to follow along.)
I'll run tail(postal_df) to see what that looks like.
           State PostalCode
45       Vermont         VT
46      Virginia         VA
47    Washington         WA
48 West Virginia         WV
49     Wisconsin         WI
50       Wyoming         WY
A lookup table/named vector has values as the vector, and keys as the names. So let me first make a vector of the values, which are in the PostalCode column:
getpostalcode <- postal_df$PostalCode
And next I add names from the State column.
names(getpostalcode) <- postal_df$State
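The two steps above can also be collapsed into a single call with base R's setNames(). Here is a minimal sketch, using a three-state stand-in for the full data frame (small_df and small_lookup are names made up for this illustration):

```r
# Hypothetical three-row stand-in for postal_df
small_df <- data.frame(
  State = c("Alaska", "Arizona", "Arkansas"),
  PostalCode = c("AK", "AZ", "AR"),
  stringsAsFactors = FALSE
)

# setNames(values, names) builds the named vector in one call
small_lookup <- setNames(small_df$PostalCode, small_df$State)

# Look up a value by key, same as with the two-step version
small_lookup["Arkansas"]
```

Either approach should give the same named vector; which one to use is mostly a matter of taste.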
To use this named vector as a lookup table, the format is mylookupvector['key'].
So here's how to get the postal code for Arkansas:
getpostalcode['Arkansas']
If you want just the value, without the key, wrap the result in the unname() function:
unname(getpostalcode['Arkansas'])
Update: You can also get just one value using the format getpostalcode[['Arkansas']], that is, double brackets instead of adding unname(). Thanks to Peter Harrison for the tip via Twitter. However, Hadley Wickham notes that the double-bracket format only works for one value. If you are doing something like creating a new column in a data frame, stick to unname().
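To make the difference concrete, here is a small sketch of looking up several keys at once to fill a new column. The lookup vector and the visits data frame are made up for this illustration:

```r
# Hypothetical lookup vector and data, for illustration only
lookup <- c(Alaska = "AK", Arizona = "AZ", Arkansas = "AR")
visits <- data.frame(
  State = c("Arkansas", "Alaska", "Arkansas"),
  stringsAsFactors = FALSE
)

# Single brackets accept a whole vector of keys; unname() drops the names
visits$PostalCode <- unname(lookup[visits$State])

# Double brackets return one bare value for one key
one_code <- lookup[["Arkansas"]]
```

Note that the State column here is character, not factor. With a factor column, the brackets would index by the underlying integer codes, so converting with as.character() first would be the safer route.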
That's all there is to it. I know this is a somewhat trivial example, but it has some real-world use. For example, I've got a named vector of FIPS codes that I need when working with US Census data.
I started with a data frame of states and FIPS codes called fipsdf (the code for that is below). Next, I created a vector called getfips from the data frame's FIPS code column and added the states as names.
fipsdf <- rio::import("data/FIPS.csv")
getfips <- fipsdf$FIPS
names(getfips) <- fipsdf$State
Now if I want the FIPS code for Massachusetts, I can use getfips['Massachusetts']. I would add unname() to get just the value without the name: unname(getfips['Massachusetts']).
If having to keep using unname() gets too annoying, you can even make a little function from your lookup table:
get_state_fips <- function(state, lookupvector = getfips){
  fipscode <- unname(lookupvector[state])
  return(fipscode)
}
Here, I've got two arguments to my function. One is my "key," in this case the state name; the other is lookupvector, which defaults to my getfips vector.
And you can see how I use the function. It's just the function name with one argument, the state name: get_state_fips("New York").
I can make a function that looks a bit more generic, such as
get_value <- function(mykey, mylookupvector){
  myvalue <- mylookupvector[mykey]
  myvalue <- unname(myvalue)
  return(myvalue)
}
It has a more generic name for the function, get_value(); a more generic first argument name, mykey; and a second argument of mylookupvector that doesn't default to anything.
It's the same thing I've been doing all along: getting the value from the lookup vector with lookupvector['key'] and then running the unname() function. But it's all wrapped inside a function. So, calling it is a bit more elegant.
I can use that function with any named vector I've created. Here, I'm using it with Arkansas and my getpostalcode vector: get_value("Arkansas", getpostalcode).
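One thing to be aware of with this kind of lookup: a key that isn't among the names returns NA instead of an error. If you would rather be told about a typo, a stricter variant is one option. This is only a sketch; get_value_strict and the small lookup vector are made up for this illustration:

```r
# Hypothetical lookup vector, for illustration only
lookup <- c(Alaska = "AK", Arizona = "AZ", Arkansas = "AR")

# A key that is not among the names returns NA rather than an error
missing_result <- unname(lookup["Arkansaw"])

# Hypothetical stricter variant: stop if any key is absent
get_value_strict <- function(mykey, mylookupvector) {
  unknown <- setdiff(mykey, names(mylookupvector))
  if (length(unknown) > 0) {
    stop("Unknown key(s): ", paste(unknown, collapse = ", "))
  }
  unname(mylookupvector[mykey])
}

get_value_strict("Arkansas", lookup)
```

Whether a silent NA or a hard stop is preferable depends on the job: NA is often fine when filling a data frame column, while an error is handy in scripts where a misspelled key should never pass unnoticed.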
Easy lookups in R! Just remember that names need to be unique: R won't complain about a duplicate key, but a lookup returns only the first match. You can repeat values, but not keys.
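Since R itself doesn't enforce unique names, it can be worth checking before you trust a lookup vector. A quick sketch, with a deliberately broken vector made up for this illustration:

```r
# Hypothetical lookup vector with a repeated key, for illustration only
dup_lookup <- c(Georgia = "GA", Georgia = "GE", Ohio = "OH")

# The lookup silently returns the first match for the repeated key
first_match <- unname(dup_lookup["Georgia"])

# anyDuplicated() returns 0 when all names are unique,
# otherwise the index of the first repeated name
has_duplicates <- anyDuplicated(names(dup_lookup)) > 0
```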
I first saw this idea years ago in Hadley Wickham's Advanced R book. I still use it a lot and hope you find it helpful, too.
Code to create data frame with postal abbreviations
postal_df <- data.frame(stringsAsFactors=FALSE,
State = c("Alabama", "Alaska", "Arizona", "Arkansas", "California",
"Colorado", "Connecticut", "Delaware", "Florida", "Georgia",
"Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas",
"Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts",
"Michigan", "Minnesota", "Mississippi", "Missouri", "Montana",
"Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico",
"New York", "North Carolina", "North Dakota", "Ohio",
"Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina",
"South Dakota", "Tennessee", "Texas", "Utah", "Vermont",
"Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming"),
PostalCode = c("AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
"HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
"MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
"NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD",
"TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY")
)
Code to create data frame with FIPS codes
fipsdf <- data.frame(State = c("Alabama", "Alaska", "Arizona", "Arkansas",
"California", "Colorado", "Connecticut", "Delaware", "Florida",
"Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa",
"Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts",
"Michigan", "Minnesota", "Mississippi", "Missouri", "Montana",
"Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico",
"New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma",
"Oregon", "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
"Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
"West Virginia", "Wisconsin", "Wyoming"), FIPS = c("01", "02",
"04", "05", "06", "08", "09", "10", "12", "13", "15", "16", "17",
"18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28",
"29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39",
"40", "41", "42", "44", "45", "46", "47", "48", "49", "50", "51",
"53", "54", "55", "56"), stringsAsFactors = FALSE)


