Sure, you could use one of the commercial or open-source RSS readers. But isn't it more fun to code your own?
RSS feeds have been around since the late '90s, and they remain a handy way to keep up with multiple news sources. Choose your feeds wisely, and your RSS reader will let you easily scan headlines from multiple sources and stay up to date on fast-moving topics. And while there are several capable commercial and open-source RSS readers available, it's a lot more satisfying to code your own.
It's surprisingly easy to create your own RSS feed reader in R. Just follow these eight steps.
Create a Quarto document or R script file
You can use a plain R script, but Quarto adds some useful out-of-the-box styling. Quarto also gives you easier access to JavaScript for the final display if you so choose. But the tutorial code works fine in an R file, too.
Unlike an R script, though, my Quarto document needs a YAML header to start. I'll add a few settings in the YAML to generate a single HTML file (embed-resources: true) and to hide my code (echo: false) as well as any code messages or warnings:
---
title: "Sharon's RSS Feed"
format:
  html:
    embed-resources: true
editor: source
execute:
  echo: false
  warning: false
  message: false
---
Load needed packages
Next, I'll add some R code inside an R code block (```{r} and ``` enclose a block of executable code in Quarto; you don't need those if you're using a plain R script) and load the packages I'll need. As you might guess from its name, tidyRSS is a library for reading RSS feeds into R.
```{r}
library(tidyRSS)
library(dplyr)
library(DT)
library(purrr)
library(stringr)
library(lubridate)
```
Add RSS feeds
Selecting relevant feeds is a key part of a useful RSS reader experience. I find mine based on sources I like and then checking websites or searching to see if RSS feeds exist. (As an optional exercise, you can use the rvest package to read sitemaps and wrangle them into RSS format, but that's beyond the scope of this tutorial. Maybe in a future article!)
You may want to store your feeds in a separate CSV or Excel file and have your app import them. This way, you don't have to touch the app code each time you update your feed list. For the sake of demo simplicity here, though, I'll create a data frame in my script file with the feeds I want and my titles for each.
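If you go the separate-file route, a minimal sketch might look like the code below. The file name feeds.csv and its two columns are my assumptions, and the snippet writes a tiny example file first so it runs on its own:

```r
# Hypothetical feeds.csv with feed_title and feed_url columns;
# written here only so the example is self-contained.
feeds_csv <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    feed_title = c("R Weekly", "All InfoWorld"),
    feed_url   = c("https://rweekly.org/atom.xml",
                   "https://www.infoworld.com/index.rss")
  ),
  feeds_csv, row.names = FALSE
)

# In the app itself, this one line replaces the hard-coded data frame:
myfeeds <- read.csv(feeds_csv) |> dplyr::arrange(feed_title)
```

With that in place, adding a feed means editing the CSV, not the app code.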
Since I write for both InfoWorld and Computerworld, I'll add both of those feeds. In addition, I'll pull in a few R-specific RSS feeds, including R-Bloggers, R Weekly, and Mastodon's #rstats and #QuartoPub RSS feeds at fosstodon.org, the Mastodon instance I use. In the code below, I save the feed info to a data frame called myfeeds with both feed URLs and my desired title for each feed. I then arrange them by feed title:
```{r}
myfeeds <- data.frame(
  feed_title = c("All InfoWorld",
                 "All Computerworld",
                 "Mastodon rstats",
                 "Mastodon QuartoPub",
                 "R Bloggers",
                 "R Weekly"),
  feed_url = c("https://www.infoworld.com/index.rss",
               "https://www.computerworld.com/index.rss",
               "http://fosstodon.org/tags/rstats.rss",
               "http://fosstodon.org/tags/QuartoPub.rss",
               "https://feeds.feedburner.com/Rbloggers",
               "https://rweekly.org/atom.xml")
) |>
  arrange(feed_title)
```
Note: From here on, I won't be including the ```{r} and ``` Quarto code "fences" around the R code. All the rest of the R code still needs to be fenced in a Quarto doc, though.
Get all the feeds into the same format
This is the most manual part of the process. Ideally, all feeds would be structured exactly the same way, be in the format I want, and never have missing data. In the real world, of course, RSS data can be as messy as any other data set. So, I want to check my feeds and see if and how they need to be cleaned.
In addition, I want to be able to import atom feeds like R-Bloggers' as well as RSS feeds, which means I need to account for those.
To keep things simple, my reader will only display title, item date/time updated, item description, and a way to click to the original (the URL) item.
I'll start by importing each of my feeds into R using tidyRSS, but as a list, with one list entry for each feed, and then examine each to see what problems may arise.
feed_test <- map(myfeeds$feed_url, tidyfeed)
I'm not including that code above in my final RSS reader file; it's for development only.
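One quick development check is comparing column names across the imported list. The vectors below are stand-ins illustrating what tidyfeed() typically returns for an RSS feed versus an atom feed (the actual names come from your own feed_test list):

```r
# Stand-in column-name vectors for an RSS feed vs. an atom feed,
# roughly as tidyfeed() returns them:
rss_cols  <- c("feed_title", "item_title", "item_pub_date",
               "item_link", "item_description")
atom_cols <- c("feed_title", "entry_title", "entry_last_updated",
               "entry_url", "entry_content")

# Columns that exist in the atom version but not the RSS version:
setdiff(atom_cols, rss_cols)
```

In a real session, you'd run something like `map(feed_test, names)` and eyeball the differences the same way.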
Create a feed wrangling function
My wrangling function starts simply enough:
wrangle_feed <- function(the_feed_url, the_feed_dataframe = myfeeds) {
  my_feed_data <- tidyRSS::tidyfeed(the_feed_url)
  return(my_feed_data)
}
I'd like the feed title to be what I call it in my spreadsheet, not what the feed creator titled it. So, I'll use my feed data frame to look up the title and replace the existing feed title with this code:
my_feed_data$feed_title <- the_feed_dataframe$feed_title[the_feed_dataframe$feed_url == the_feed_url][1]
I want to select item_title, item_pub_date, item_description, and item_link. But if it's an atom feed, those will be called something different: entry_title, entry_last_updated, entry_content, and entry_url. Before I select the columns I want, I'll check whether it's an atom feed and, if so, rename the atom columns with:
if("entry_url" %in% names(my_feed_data)) {
  my_feed_data <- my_feed_data |>
    rename(item_title = entry_title, item_pub_date = entry_last_updated,
           item_link = entry_url, item_description = entry_content)
}
Mastodon RSS feeds don't have titles for the posts. I could add the same default title to each post, such as a generic "Mastodon Post," but I'd prefer a title like "Mastodon Post by {username}." Most Mastodon post URLs include the author handle starting with @, although occasionally one won't. I can extract the username from the Mastodon URL and add a custom title with the code below, defaulting to "Mastodon Post" if there is no obvious author in the link.
if(str_detect(my_feed_data$feed_title[1], "Mastodon")) {
  my_feed_data <- my_feed_data |>
    mutate(
      item_author = str_replace_all(item_link, "^.*?/(@.*?)/.*?$", "\\1"),
      item_title = if_else(str_detect(item_author, "@"),
                           paste0("Mastodon Post by ", item_author),
                           "Mastodon Post")
    )
}
It's easy for me to find all the Mastodon feeds because I included "Mastodon" in those feed titles.
The str_replace_all() code uses a regular expression to find the author in the URL. The pattern "^.*?/(@.*?)/.*?$" drops everything from the start of the string to the / before an @, keeps everything from the @ until just before the next /, and then drops everything else; the "\\1" replacement swaps in that captured handle.
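Here is the pattern applied to a made-up post URL (the handle and post ID are invented for illustration):

```r
# A fabricated Mastodon post URL, just to show the extraction:
post_url <- "https://fosstodon.org/@someuser/109551111111111111"
author <- stringr::str_replace_all(post_url, "^.*?/(@.*?)/.*?$", "\\1")
author  # "@someuser"
```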
Next, I'll do some additional data wrangling, including selecting and renaming the columns I want and making each item clickable back to the original source.
The code below selects and renames columns and also creates a clickable headline column.
my_feed_data <- my_feed_data |>
  select(Headline = item_title, Date = item_pub_date, URL = item_link,
         Description = item_description, Feed = feed_title) |>
  mutate(
    Headline = str_glue("<a target='_blank' title='{Headline}' href='{URL}' rel='noopener'>{Headline}</a>")
  )
Many people like clickable headlines. However, I prefer a clickable >> at the end of the description instead of a clickable headline. The code below is one way to do that.
my_feed_data <- my_feed_data |>
  select(Headline = item_title, Date = item_pub_date, URL = item_link,
         Description = item_description, Feed = feed_title) |>
  mutate(
    Description = str_glue("{Description}, <a target='_blank' href='{URL}' rel='noopener'> >></a>")
  )
Add some optional data tweaks
The code so far is enough to generate data for a basic feed reader, but the app will look better with some optional tweaks.
For example, the R Bloggers atom feed includes full blog content, but I don't want to download full content into my RSS reader because that makes quick scanning more difficult. Other descriptions may be longer than I'd like as well.
Below is a function that trims the description after max_chars number of characters, but at the nearest complete word, so as not to cut off in the middle of a word. It then adds an ellipsis. The function first checks that there's a description at all, so the code won't break if the description is missing.
trim_if_too_long <- function(item_description, max_chars = 600) {
  if(!is.na(item_description)) {
    if(nchar(item_description) > max_chars) {
      item_description <- stringr::str_sub(item_description, 1, max_chars)
      item_description <- str_replace_all(item_description, "\\s[^\\s]+$", ". . . ")
    }
    return(item_description)
  } else {
    return("")
  }
}
The function only makes changes if the item description is longer than max_chars (currently defaulting to 600). If the description is in fact longer, the first line of code trims the text to max_chars length. The second line uses a regular expression to replace a space followed by one or more non-space characters at the end of the string with an ellipsis. In other words, the regex removes any incomplete word at the end of the description and then adds three dots.
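You can see that word-boundary cleanup in isolation with a short string and a small cutoff (the sample text is arbitrary):

```r
# Truncate mid-word, then let the regex drop the partial word:
snippet <- stringr::str_sub("Lorem ipsum dolor sit amet", 1, 10)  # "Lorem ipsu"
stringr::str_replace_all(snippet, "\\s[^\\s]+$", ". . . ")        # "Lorem. . . "
```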
If you want to use this function in your RSS reader, make sure to place it above the wrangle_feed function definition in your Quarto doc or R script.
To apply the function to each feed's description, I'll use purrr's map_chr() function:
map_chr(Description, trim_if_too_long)
and add that to my feed wrangling before I add my clickable >> arrows:
my_feed_data <- my_feed_data |>
  select(Headline = item_title, Date = item_pub_date, URL = item_link,
         Description = item_description, Feed = feed_title) |>
  mutate(
    Description = purrr::map_chr(Description, trim_if_too_long),
    Description = str_glue("{Description}, <a target='_blank' href='{URL}' rel='noopener'> >></a>")
  )
A few of the feeds I've chosen include "To read this article in full, please click here" text at the end, but that's not clickable. It's easy to remove text like that with str_remove_all().
If you want to use this code in your app, make sure to add that code before you do any other description wrangling:
my_feed_data <- my_feed_data |>
  select(Headline = item_title, Date = item_pub_date, URL = item_link,
         Description = item_description, Feed = feed_title) |>
  mutate(
    Description = str_remove_all(Description, "To read this article in full, please click here"),
    Description = purrr::map_chr(Description, trim_if_too_long),
    Description = str_glue("{Description}, <a target='_blank' href='{URL}' rel='noopener'> >></a>")
  )
One more small nit: I don't like my date/time displaying like 2022-11-16T08:00:00Z. The lubridate package's format_ISO8601() function makes it easy to set the desired precision; in this case, I want ymdhm but not seconds. After that, I'll replace the "T" with a space so the date column will appear in a format such as 2022-12-21 18:44.
Date = format_ISO8601(Date, precision = "ymdhm"),
Date = str_replace_all(Date, "T", " ")
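Here's that transformation on a single sample timestamp:

```r
# One timestamp, trimmed to minute precision and made readable:
d <- lubridate::ymd_hms("2022-11-16T08:00:00Z")
d <- lubridate::format_ISO8601(d, precision = "ymdhm")  # "2022-11-16T08:00"
stringr::str_replace_all(d, "T", " ")                   # "2022-11-16 08:00"
```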
Below is my full wrangle_feed() function (not showing the separate trim_if_too_long() function above it).
wrangle_feed <- function(the_feed_url, the_feed_dataframe = myfeeds) {
  my_feed_data <- tidyfeed(the_feed_url)
  my_feed_data$feed_title <- the_feed_dataframe$feed_title[the_feed_dataframe$feed_url == the_feed_url][1]
  if("entry_url" %in% names(my_feed_data)) {
    my_feed_data <- my_feed_data |>
      rename(item_title = entry_title, item_pub_date = entry_last_updated,
             item_link = entry_url, item_description = entry_content)
  }
  if(str_detect(my_feed_data$feed_title[1], "Mastodon")) {
    my_feed_data <- my_feed_data |>
      mutate(
        item_author = str_replace_all(item_link, "^.*?/(@.*?)/.*?$", "\\1"),
        item_title = if_else(str_detect(item_author, "@"),
                             paste0("Mastodon Post by ", item_author),
                             "Mastodon Post")
      )
  }
  my_feed_data <- my_feed_data |>
    select(Headline = item_title, Date = item_pub_date, URL = item_link,
           Description = item_description, Feed = feed_title) |>
    mutate(
      Description = str_remove_all(Description, "To read this article in full, please click here"),
      Description = purrr::map_chr(Description, trim_if_too_long),
      Description = str_glue("{Description}, <a target='_blank' href='{URL}' rel='noopener'> >></a>"),
      Date = format_ISO8601(Date, precision = "ymdhm"),
      Date = str_replace_all(Date, "T", " ")
    )
  return(my_feed_data)
}
Handle a missing or broken feed
I want to make sure this code doesn't blow up and stop on a single error if one of the feeds is unavailable. I can do that by making a "safe," error-handling version of the function with purrr's possibly():
wrangle_feed_safely <- possibly(wrangle_feed, otherwise = NULL)
The wrangle_feed_safely() version of the function returns NULL if there's an error instead of stopping. Now I can run the function on all my feed URLs and get a single data frame returned with purrr's map_df(). The code below also arranges results by descending date so the newest entries appear first, regardless of source.
mydata <- map_df(myfeeds$feed_url, wrangle_feed_safely) |>
arrange(desc(Date))
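To see what possibly() does on its own, here's a toy example using base R's log(), which errors on a string (this snippet is not part of the feed code):

```r
# A "safe" log(): errors become NA instead of stopping the script.
safe_log <- purrr::possibly(log, otherwise = NA_real_)
safe_log("not a number")  # NA, because log() errors on character input
safe_log(100)             # the usual result, about 4.605
```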
Display the results
The hard part is done; we have our data! Now it's time to display the results.
I'll make a copy of the data frame without the URL field for use in my display table, since I don't want to show the URL field (I've got the clickable >> in my description). I wouldn't make a copy for a huge data set, but this one is small, and it's a bit of a backup in case I decide later that I still want the URL field.
mytabledata <- select(mydata, -URL)
One of the easiest ways to display this data is with a table. I'll use the DT package because I like its ability to use regular-expression searching. Regex searching is especially handy when searching for something like "R," because a regex lets you search for R as a separate word rather than matching every capitalized word that starts with R.
In my Quarto document, I'll enclose the table code chunk in a "column-page" CSS style class with :::{.column-page} at the top and ::: at the end, as you can see in the code below. That tells my Quarto document to make the table wider than usual: a full page width. column-page is a built-in class that increases the content width, so you don't have to know how to code HTML and CSS to make this modification.
If this option still isn't wide enough (sometimes the table still scrolls because of, say, a ridiculously long URL in a post that won't line break), you can use {.column-screen} instead of {.column-page} to remove the page margins altogether.
The code below also makes some tweaks to the default DT datatable. filter = 'top' adds search filters above each column. escape = FALSE displays HTML as HTML instead of showing the underlying code. I add regex = TRUE and caseInsensitive = TRUE to the search options for regex and case-insensitive searching. I also tweak the page length and page-length menu options, and set my third column (Description) to be 80% of the table width. (If you're wondering why the target column is 2 when I want the third column, it's because DT is a wrapper for a JavaScript library, and the underlying library uses the JS convention of starting to count at 0.)
:::{.column-page}
```{r}
DT::datatable(mytabledata, filter = 'top', escape = FALSE, rownames = FALSE,
              options = list(
                search = list(regex = TRUE, caseInsensitive = TRUE),
                pageLength = 25,
                lengthMenu = c(25, 50, 100, 200),
                autoWidth = TRUE,
                columnDefs = list(list(width = '80%', targets = list(2)))
              )
)
```
:::
[Screenshot by Sharon Machlis: Example of the RSS feed reader table, searching for JavaScript entries.]
Thanks to regex searching, you can search for R as a separate word with the regular expression \bR\b. The \b indicates a "word boundary" such as a space, punctuation mark, or the beginning or end of a line.
And there you have it, a simple RSS reader! There are more modifications you could make, including caching results and further tweaking the display. For example, adding
Available feeds: `r knitr::combine_words(sort(unique(mydata$Feed)))`
to the Quarto document after parsing the RSS feeds will show a list of all the available feeds.
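knitr::combine_words() joins a character vector into a readable phrase, for example:

```r
# combine_words() turns a vector of feed names into prose,
# separating items with commas and a final "and":
knitr::combine_words(c("R Bloggers", "R Weekly", "All InfoWorld"))
```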
For more on Quarto and how you might use JavaScript with R in a Quarto document, see "A beginner's guide to using Observable JavaScript, R, and Python with Quarto." And for more R tips, head to InfoWorld's Do More With R page.