Sharon Machlis
Contributing Writer

How to calculate month-over-month changes in R

how-to
Jun 26, 20197 mins
AnalyticsR Language

See how to generate weekly and monthly reports in R including month-over-month, week-over-week, and year-over-year changes.

Do More With R [video teaser/video series] - R Programming Guide - Tips & Tricks
Credit: Thinkstock

If you need to calculate changes such as last month versus the prior month or last month versus the same month a year earlier, R is a good choice. Itโ€™s easy to do those calculations โ€” and you donโ€™t have to worry whether a spreadsheet formula was properly clicked and dragged to cover all the necessary cells.

Like so many things in R, there are multiple ways to do this. Iโ€™ll cover two of them.

First, Iโ€™ll import some data about daily cycling trips on Bluebikes, the bicycle-share system in Boston, Cambridge, and three other nearby cities. If you want to follow along, download this zip file of CSV data and unzip it.

download

ย 

In the code below, I load a couple of packages and then read in the data from the daily_cycling_trips.csv file with the readr packageโ€™s read_csv() function.

library(readr)
library(dplyr)
daily_trips <- readr::read_csv("daily_cycling_trips.csv")

In the next code block, Iโ€™ll add a new column to the data, called YearMonth, using base Rโ€™s format() function to create a yyyy-mm format for each trip starting date. That is followed by typical dplyr group_by() and summarize()ย operations. In this case, Iโ€™m grouping by YearMonth and then, within each group, creating a column called MonthlyTrips that has the sum of all of that monthโ€™s trips. Finally, I make sure the data is arranged by YearMonth.

daily_trips <- daily_trips %>%
  mutate(
    YearMonth = format(TripStartDate, "%Y-%m")
  )
# Usual dplyr group_by() and summarize() by month.
monthly_trips <- daily_trips %>%
  group_by(YearMonth) %>%
  summarize(
    MonthlyTrips = sum(Trips)
  ) %>%
  arrange(YearMonth)

Calculating monthly and yearly changes

Now that Iโ€™ve got monthly subtotals, I can calculate the month-over-month and year-over-year changes by using dplyrโ€™s lag() function. lag() defaults to the previous value in the data frame column as it is currently ordered. You can change the number of lag items so itโ€™s more than one item back. To compare with the previous month, a default of one is fine. To compare with the previous year, Iโ€™ll want the lag to be 12, for 12 items back. Note that this works only if there arenโ€™t any missing months. You might want to add some code to check for missing data if youโ€™re not sure of the dataโ€™s integrity.

With lag(), you can set the column you want to order and lag by if your data frame isnโ€™t ordered the way you need. (I donโ€™t need to in this case, because my data is already ordered by YearMonth.)ย 

monthly_report <- monthly_trips %>%
  mutate(
    MoM = (MonthlyTrips - lag(MonthlyTrips)) / lag(MonthlyTrips),
    YoY = (MonthlyTrips - lag(MonthlyTrips, 12)) / lag(MonthlyTrips, 12)
  )

If you look at the monthly_report object, the data is all there, although the format could be nicer.

Monthly report data frame Sharon Machlis, IDG

Monthly report data frame with month-over-month and year-over-year comparisons.

The code below multiplies the fractions by 100 to create a percent format, and then rounds to one decimal place.ย 

monthly_report <- monthly_report %>%
  mutate(
    MoM = round(MoM * 100, 1),
    YoY = round(YoY * 100, 1)
  )

Thatโ€™s the easiest example of monthly reporting โ€” only one data point per month. Sometimes, though, youโ€™ll have multiple categories you want to track each month, such as comparisons by city, or age group, or website. Itโ€™s easy to tweak this code for that: You just group by your category in addition to the month before running a dplyr summarize(). Here is an example:

# Read in second data file
daily_trips_by_usertype <- readr::read_csv("daily_cycling_trips_by_usertype.csv")
# Add YearMonth column and get totals by month
monthly_trips_by_usertype <- daily_trips_by_usertype %>%
  mutate(
    YearMonth = format(TripStartDate, "%Y-%m")
  ) %>%
  group_by(YearMonth, usertype) %>%
  summarize(
    MonthlyTrips = sum(Trips)
  ) %>%
  arrange(YearMonth, usertype)
# Calculate MoM and YoY
monthly_report_by_usertype <- monthly_trips_by_usertype %>%
  group_by(usertype) %>%
  mutate(
    MoM = (MonthlyTrips - lag(MonthlyTrips)) / lag(MonthlyTrips),
    YoY = (MonthlyTrips - lag(MonthlyTrips, 12)) / lag(MonthlyTrips, 12)
  )

In the code block above, Iโ€™m doing the same thing as I did before, except Iโ€™m comparing trips by month and user type. There are two types of users in this data set, Customer and Subscriber.ย If you run the code and then look at monthly_report_by_user_type, you will see month-over-month and year-over-year comparisons by both Subscriber and Customer user types.

This data also makes it easy to graph monthly percent changes with ggplot2.

Graph of month-over-month changes in number of trips by customer type at Bluebiles. Sharon Machlis, IDG

ggplot2 graph of month-over-month changes in trips by customer types at Bluebikes.

Instead of changing the data by multiplying by 100 and rounding, I used the scalesย package and scale_y_continuous() to do that for me in the ggplot2 graph code below:

library(ggplot2)
library(scales)
ggplot(monthly_report_by_usertype, aes(x=YearMonth, y=MoM, fill=usertype)) + 
  geom_col(position="dodge") +
  scale_y_continuous(labels = scales::percent)

Note: If you just need โ€œlast monthโ€ for a specific report and not comparisons for every month, use dplyrโ€™s filter() function on your report data frame and set the YearMonth column to equal whatever the maximum value is for that data. For example:

filter(monthly_report_by_usertype, YearMonth == max(YearMonth))

Week-over-week calculations

Week over week is slightly different from monthly. For weekly, I generally use each weekโ€™s starting date instead of a format like yyyy-ww. Thatโ€™s because using numbers for weeks can get complicated when there are partial weeks at the start and end of a year.ย 

In the code below, I use the lubridate package and its floor_date() function. You could use base Rโ€™s cut.Date() function instead, but thatโ€™s a bit more complicated since it returns factors instead of dates. (Withย cut.Date()ย I usually end up running extra code to turn the factors back to dates. floor_date() is one step for what I want.)

Next come the usual calculations, this time for week-over-week and year-over-year. Note that for year over year, the lag is 52 instead of 12. Itโ€™s important here to be sure that Iโ€™ve got rows for every week, too.ย 

# install.packages("lubridate")
library(lubridate)
weekly_trips <- daily_trips %>%
  mutate(
    WeekStarting = floor_date(TripStartDate, unit = "weeks")
  ) %>%
  group_by(WeekStarting) %>%
  summarize(
    WeeklyTrips = sum(Trips)
  ) %>%
  arrange(WeekStarting)

weekly_report <- weekly_trips %>%
  mutate(
    WoW = (WeeklyTrips - lag(WeeklyTrips)) / lag(WeeklyTrips),
    YoY = (WeeklyTrips - lag(WeeklyTrips, 52)) / lag(WeeklyTrips, 52)
)

By now you may be thinking: OK, thatโ€™s straightforward, but itโ€™s a lot of typing for a simple weekly or monthly report. To streamline this process, make an RStudio code snippet! If you donโ€™t know how they work, check out the โ€œDo More With Rโ€ episode on code snippets.ย 

Hereโ€™s the snippet I made for monthly reports. Remember that every indented line needs to be indented with a tab, not just spaces, and the snippet needs to live in your RStudio snippet file, which you can access withย usethis::edit_rstudio_snippets().ย 

snippet my_monthly_reports
	monthly_report <- ${1:mydf} %>%
	mutate(
	  YearMonth = format(${2:MyDateColumn}, "%Y-%m")
	) %>%
	group_by(YearMonth, ${3:MyCategory}) %>%
	summarize(
	  MonthlyTotal = sum(${4:MyValueColumn})
	) %>%
	  arrange(YearMonth, ${3:MyCategory}) %>%
	  ungroup() %>%
	  group_by(${3:MyCategory}) %>%
	mutate(
	  MoM = (MonthlyTotal - lag(MonthlyTotal)) / lag(MonthlyTotal),
	  YoY = (MonthlyTotal - lag(MonthlyTotal, 12)) / lag(MonthlyTotal, 12)
	)

You can see how the snippet works in the video embedded at the top of this article.

Once youโ€™ve set up a code snippet, doing month-over-month comparisons may even be faster in R than in Excel.

For more R tips, head to theย โ€œDo More With Rโ€ playlist on YouTube.