Some Twitter users migrating to Mastodon miss being able to run full-text searches of their own toots. Here's how to search your own posts using R and the rtoot package.
Whether you've fully migrated from Twitter to Mastodon, are just trying out the "fediverse," or have been a longtime Mastodon user, you may miss being able to search the full text of "toots" (also known as posts). In Mastodon, hashtags are searchable, but other, non-hashtag text is not. The lack of full-text search lets users control how much of their content is easily discoverable by strangers. But what if you want to be able to search your own posts?
Some Mastodon instances allow users to run full-text searches of their own toots, but others don't, depending on the admin. Fortunately, it's easy to full-text search your own Mastodon posts, thanks to R and the rtoot package developed by David Schoch. That's what this article is about.
Set up a full-text search
First, install the rtoot package if it's not already on your system with install.packages("rtoot"). I'll also be using the dplyr and DT packages. All three can be loaded with the following commands:
# install.packages("rtoot") # if needed
library(rtoot)
library(dplyr)
library(DT)
Next, you'll need your Mastodon ID, which is not the same as your user name and instance. The rtoot package includes a way to search across the fediverse for accounts. That's a useful tool if you want to see whether someone has an account anywhere on Mastodon. But since it also returns account IDs, you can use it to find your own ID, too.
To search for my own ID, I'd use:
accounts <- search_accounts("smach@fosstodon.org")
That will likely bring back a data frame with just one result. If you search only for a user name without an instance, such as search_accounts("posit") to see whether Posit (formerly RStudio) is active on Mastodon, there could be more results.
My search had only one result, so my ID is the first (as well as only) item in the id column:
my_id <- accounts$id[1]
I can now retrieve my posts with rtoot's get_account_statuses() function.
Pull and save your data
The default returns 20 results, at least for now, though the limit appears to be a lot higher if you set it manually with the limit argument. Do be kind about taking advantage of this setting, however, since most Mastodon instances are run by volunteers recently facing vastly increased hosting costs.
The first time you try to pull your own data, you'll be asked to authenticate. I ran the following to get my most recent 50 posts (note the use of verbose = TRUE to see any messages that might be returned):
smach_statuses <- get_account_statuses(my_id, limit = 50, verbose = TRUE)
Next, I was asked whether I wanted to authenticate. After choosing yes, I saw the following prompt:
On which instance do you want to authenticate (e.g., "mastodon.social")?
Next, I was asked:
What type of token do you want?
1: public
2: user
Since I want the authority to see all activity in my own account, I chose user. The package then stored an authentication token for me, and I could run get_account_statuses().
The resulting data frame (actually a tibble, a special type of data frame used by tidyverse packages) includes 29 columns. A few, such as account and media_attachments, are list-columns with non-atomic results, meaning the results are not in a strict two-dimensional format.
I suggest saving this result before going further so you don't need to re-ping the server if something goes awry with your R session or code. I usually use saveRDS(), like so:
saveRDS(smach_statuses, "smach_statuses.Rds")
Trying to save the results as a Parquet file does not work because of the complex list columns. Using the vroom package to save as a CSV file works and includes the full text of the list columns, but I'd rather save a native .Rds or .Rdata file.
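To pick up where you left off in a later session, a quick sketch of reloading the saved file (assuming the file name used above):

```r
# Re-load the previously saved statuses in a fresh R session,
# avoiding another call to the Mastodon server
smach_statuses <- readRDS("smach_statuses.Rds")
```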
Create a searchable table with your results
If all you want is a searchable table, you only need a few of those 29 columns. You will definitely want created_at, url, spoiler_text (if you use content warnings and want those in your table), and content. If you miss seeing engagement metrics on your posts, add reblogs_count, favourites_count, and replies_count.
Below is the code I use to create data for a searchable table for my own viewing. I add a url column containing a clickable >> link for each post, which I then append to the end of the post's content. That makes it easy to click through to the original version:
tabledata <- smach_statuses |>
filter(content != "") |>
# filter(visibility == "public") |> # If you want to make this public somewhere. Default includes direct messages.
mutate(
url = paste0("<a target='_blank' href='", uri, "' rel='noopener'> >></a>"),
content = paste(content, url),
created_at = format(created_at, "%Y-%m-%d %H:%M UTC")
) |>
select(CreatedAt = created_at, Post = content, Replies = replies_count, Favorites = favourites_count, Boosts = reblogs_count)
If I were sharing this table publicly, I'd make sure to uncomment filter(visibility == "public") so only my public posts were included. The data returned by get_account_statuses() for your own account includes posts that are unlisted (visible to anyone who finds them but not on public timelines by default) as well as those set for followers only or sent as direct messages.
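Before sharing anything, it can be worth checking which visibility levels actually appear in your data. A quick check, assuming the smach_statuses tibble from above:

```r
# Count posts by visibility level (public, unlisted, private, direct)
# so you know what you'd be exposing before publishing the table
table(smach_statuses$visibility)
```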
There are a lot of ways to turn this data into a searchable table. One way is with the DT package. The code below creates an interactive HTML table with search filter boxes that can use regular expressions. (See "Do more with R: Quick interactive HTML tables" to learn more about using DT.)
DT::datatable(tabledata, filter = 'top', escape = FALSE, rownames = FALSE,
options = list(
search = list(regex = TRUE, caseInsensitive = TRUE),
pageLength = 20,
lengthMenu = c(25, 50, 100),
autoWidth = TRUE,
columnDefs = list(list(width = '80%', targets = list(2)))
))
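If you want to keep the table around outside of an R session, one option (a sketch using the htmlwidgets package, which DT widgets are built on; the file name is my own choice) is to save it as a self-contained HTML file you can open in any browser:

```r
library(DT)
library(htmlwidgets)

# Build the widget, then save it as a single stand-alone HTML file
my_table <- datatable(tabledata, filter = 'top', escape = FALSE, rownames = FALSE)
saveWidget(my_table, "mastodon_search.html", selfcontained = TRUE)
```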
Here's a screenshot of the resulting table:

[Screenshot by Sharon Machlis: an interactive table of my Mastodon posts, created with the DT R package using rtoot data.]
How to pull in new Mastodon posts
It's easy to update your data to pull new posts, because the get_account_statuses() function includes a since_id argument. To start, find the maximum ID from the existing data:
max_id <- max(smach_statuses$id)
Next, request all the posts since that max_id:
new_statuses <- get_account_statuses(my_id, since_id = max_id,
limit = 10, verbose = TRUE)
all_statuses <- bind_rows(new_statuses, smach_statuses)
If you want updated engagement metrics for recent posts already in your existing data, I'd suggest pulling the latest 10 or 20 posts overall instead of using since_id. You can then combine that with the existing data and dedupe by keeping the first occurrence of each ID. Here is one way to do that:
new_statuses <- get_account_statuses(my_id, limit = 25, verbose = TRUE)
all_statuses <- bind_rows(new_statuses, smach_statuses) |>
distinct(id, .keep_all = TRUE)
How to read your downloaded Mastodon archive
There is another way to get all your posts, which is particularly useful if you've been on Mastodon for some time and have a lot of activity over that period. You can download your Mastodon archive from the website.
In the Mastodon web interface, click the gear icon above the left column for Settings, then Import and export > Data export. You should see an option to download an archive of your posts and media. You can only request an archive once every seven days, though, and it will not include any engagement metrics.
Once you download the archive, you can unpack it manually or, as I prefer, use the archive package (available on CRAN) to extract the files. I'll also load the jsonlite, stringr, and tidyr packages before extracting files from the archive:
library(archive)
library(jsonlite)
library(stringr)
library(tidyr)
archive_extract("name-of-your-archive-file.tar.gz")
Next, you'll want to look at the orderedItems element of outbox.json. Here's how I imported it into R:
my_outbox <- fromJSON("outbox.json")[["orderedItems"]]
my_posts <- my_outbox |>
unnest_wider(object, names_sep = "_")
From there, I created a data set for a searchable table similar to the one built from the rtoot results. The archive includes all activity, including actions like favoriting another post, which is why I filter both for type Create and to make sure object_content has a value. As before, I add a clickable >> URL to each post's content and tweak how dates are displayed:
search_table_data <- my_posts |>
filter(type == "Create") |>
filter(!is.na(object_content)) |>
mutate(
url = paste0("<a target='_blank' href='", object_url, "' rel='noopener'> >></a>")
) |>
rename(CreatedAt = published, Post = object_content) |>
mutate(CreatedAt = str_replace_all(CreatedAt, "T", " "),
CreatedAt = str_replace_all(CreatedAt, "Z", " "),
Post = str_replace(Post, "</p>$", " "),
Post = paste0(Post, " ", url, "")
) |>
select(CreatedAt, Post) |>
arrange(desc(CreatedAt))
Then, it's another easy single function call to make a searchable table with DT:
datatable(search_table_data, rownames = FALSE, escape = FALSE,
filter = 'top', options = list(search = list(regex = TRUE)))
This is handy for your own use, but I wouldn't share archive results publicly, since it's less obvious which of these posts might have been private messages (you'd need to do some filtering on the to column).
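For reference, a sketch of that filtering, assuming the to column in my_posts is a list-column of ActivityPub audience URIs (in ActivityPub, public posts are addressed to the special "https://www.w3.org/ns/activitystreams#Public" collection, while unlisted posts list it in cc instead; check your own archive's column names, which may be to or object_to depending on how you unnested):

```r
library(dplyr)
library(purrr)

# Keep only posts explicitly addressed to the ActivityPub "Public" collection
public_uri <- "https://www.w3.org/ns/activitystreams#Public"
public_posts <- my_posts |>
  filter(map_lgl(to, ~ public_uri %in% unlist(.x)))
```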
If you have any questions or comments about this article, you can find me on Mastodon at smach@fosstodon.org as well as occasionally still on Twitter at @sharon000 (although I'm not sure for how much longer). I'm also on LinkedIn.
For more R tips, head to InfoWorld's Do More With R page.