See how to generate weekly and monthly reports in R including month-over-month, week-over-week, and year-over-year changes.
If you need to calculate changes such as last month versus the prior month or last month versus the same month a year earlier, R is a good choice. Itโs easy to do those calculations โ and you donโt have to worry whether a spreadsheet formula was properly clicked and dragged to cover all the necessary cells.
Like so many things in R, there are multiple ways to do this. Iโll cover two of them.
First, Iโll import some data about daily cycling trips on Bluebikes, the bicycle-share system in Boston, Cambridge, and three other nearby cities. If you want to follow along, download this zip file of CSV data and unzip it.
In the code below, I load a couple of packages and then read in the data from the daily_cycling_trips.csv file with the readr packageโs read_csv() function.
library(readr)
library(dplyr)
daily_trips <- readr::read_csv("daily_cycling_trips.csv")
In the next code block, Iโll add a new column to the data, called YearMonth, using base Rโs format() function to create a yyyy-mm format for each trip starting date. That is followed by typical dplyr group_by() and summarize()ย operations. In this case, Iโm grouping by YearMonth and then, within each group, creating a column called MonthlyTrips that has the sum of all of that monthโs trips. Finally, I make sure the data is arranged by YearMonth.
daily_trips <- daily_trips %>%
mutate(
YearMonth = format(TripStartDate, "%Y-%m")
)
# Usual dplyr group_by() and summarize() by month.
monthly_trips <- daily_trips %>%
group_by(YearMonth) %>%
summarize(
MonthlyTrips = sum(Trips)
) %>%
arrange(YearMonth)
Calculating monthly and yearly changes
Now that Iโve got monthly subtotals, I can calculate the month-over-month and year-over-year changes by using dplyrโs lag() function. lag() defaults to the previous value in the data frame column as it is currently ordered. You can change the number of lag items so itโs more than one item back. To compare with the previous month, a default of one is fine. To compare with the previous year, Iโll want the lag to be 12, for 12 items back. Note that this works only if there arenโt any missing months. You might want to add some code to check for missing data if youโre not sure of the dataโs integrity.
With lag(), you can set the column you want to order and lag by if your data frame isnโt ordered the way you need. (I donโt need to in this case, because my data is already ordered by YearMonth.)ย
monthly_report <- monthly_trips %>%
mutate(
MoM = (MonthlyTrips - lag(MonthlyTrips)) / lag(MonthlyTrips),
YoY = (MonthlyTrips - lag(MonthlyTrips, 12)) / lag(MonthlyTrips, 12)
)
If you look at the monthly_report object, the data is all there, although the format could be nicer.
Sharon Machlis, IDG
Monthly report data frame with month-over-month and year-over-year comparisons.
The code below multiplies the fractions by 100 to create a percent format, and then rounds to one decimal place.ย
monthly_report <- monthly_report %>%
mutate(
MoM = round(MoM * 100, 1),
YoY = round(YoY * 100, 1)
)
Thatโs the easiest example of monthly reporting โ only one data point per month. Sometimes, though, youโll have multiple categories you want to track each month, such as comparisons by city, or age group, or website. Itโs easy to tweak this code for that: You just group by your category in addition to the month before running a dplyr summarize(). Here is an example:
# Read in second data file
daily_trips_by_usertype <- readr::read_csv("daily_cycling_trips_by_usertype.csv")
# Add YearMonth column and get totals by month
monthly_trips_by_usertype <- daily_trips_by_usertype %>%
mutate(
YearMonth = format(TripStartDate, "%Y-%m")
) %>%
group_by(YearMonth, usertype) %>%
summarize(
MonthlyTrips = sum(Trips)
) %>%
arrange(YearMonth, usertype)
# Calculate MoM and YoY
monthly_report_by_usertype <- monthly_trips_by_usertype %>%
group_by(usertype) %>%
mutate(
MoM = (MonthlyTrips - lag(MonthlyTrips)) / lag(MonthlyTrips),
YoY = (MonthlyTrips - lag(MonthlyTrips, 12)) / lag(MonthlyTrips, 12)
)
In the code block above, Iโm doing the same thing as I did before, except Iโm comparing trips by month and user type. There are two types of users in this data set, Customer and Subscriber.ย If you run the code and then look at monthly_report_by_user_type, you will see month-over-month and year-over-year comparisons by both Subscriber and Customer user types.
This data also makes it easy to graph monthly percent changes with ggplot2.
Sharon Machlis, IDG
ggplot2 graph of month-over-month changes in trips by customer types at Bluebikes.
Instead of changing the data by multiplying by 100 and rounding, I used the scalesย package and scale_y_continuous() to do that for me in the ggplot2 graph code below:
library(ggplot2)
library(scales)
ggplot(monthly_report_by_usertype, aes(x=YearMonth, y=MoM, fill=usertype)) +
geom_col(position="dodge") +
scale_y_continuous(labels = scales::percent)
Note: If you just need โlast monthโ for a specific report and not comparisons for every month, use dplyrโs filter() function on your report data frame and set the YearMonth column to equal whatever the maximum value is for that data. For example:
filter(monthly_report_by_usertype, YearMonth == max(YearMonth))
Week-over-week calculations
Week over week is slightly different from monthly. For weekly, I generally use each weekโs starting date instead of a format like yyyy-ww. Thatโs because using numbers for weeks can get complicated when there are partial weeks at the start and end of a year.ย
In the code below, I use the lubridate package and its floor_date() function. You could use base Rโs cut.Date() function instead, but thatโs a bit more complicated since it returns factors instead of dates. (Withย cut.Date()ย I usually end up running extra code to turn the factors back to dates. floor_date() is one step for what I want.)
Next come the usual calculations, this time for week-over-week and year-over-year. Note that for year over year, the lag is 52 instead of 12. Itโs important here to be sure that Iโve got rows for every week, too.ย
# install.packages("lubridate")
library(lubridate)
weekly_trips <- daily_trips %>%
mutate(
WeekStarting = floor_date(TripStartDate, unit = "weeks")
) %>%
group_by(WeekStarting) %>%
summarize(
WeeklyTrips = sum(Trips)
) %>%
arrange(WeekStarting)
weekly_report <- weekly_trips %>%
mutate(
WoW = (WeeklyTrips - lag(WeeklyTrips)) / lag(WeeklyTrips),
YoY = (WeeklyTrips - lag(WeeklyTrips, 52)) / lag(WeeklyTrips, 52)
)
By now you may be thinking: OK, thatโs straightforward, but itโs a lot of typing for a simple weekly or monthly report. To streamline this process, make an RStudio code snippet! If you donโt know how they work, check out the โDo More With Rโ episode on code snippets.ย
Hereโs the snippet I made for monthly reports. Remember that every indented line needs to be indented with a tab, not just spaces, and the snippet needs to live in your RStudio snippet file, which you can access withย usethis::edit_rstudio_snippets().ย
snippet my_monthly_reports
monthly_report <- ${1:mydf} %>%
mutate(
YearMonth = format(${2:MyDateColumn}, "%Y-%m")
) %>%
group_by(YearMonth, ${3:MyCategory}) %>%
summarize(
MonthlyTotal = sum(${4:MyValueColumn})
) %>%
arrange(YearMonth, ${3:MyCategory}) %>%
ungroup() %>%
group_by(${3:MyCategory}) %>%
mutate(
MoM = (MonthlyTotal - lag(MonthlyTotal)) / lag(MonthlyTotal),
YoY = (MonthlyTotal - lag(MonthlyTotal, 12)) / lag(MonthlyTotal, 12)
)
You can see how the snippet works in the video embedded at the top of this article.
Once youโve set up a code snippet, doing month-over-month comparisons may even be faster in R than in Excel.
For more R tips, head to theย โDo More With Rโ playlist on YouTube.


