The rtrek package includes some Star Trek datasets, but
much more data is available outside the package. You can access other
Star Trek data through various APIs.
Technically, there is only one formal API: the Star Trek API (STAPI).
rtrek has functions to assist with making calls to this API
in order to access specific data. See the STAPI vignette for details.
The focus of this vignette is on accessing data from Memory Alpha.
rtrek interfaces with and extracts information from the Memory Alpha
and Memory Beta websites. Neither of these sites actually exposes an API, but
rtrek assists with querying these websites in an
API-like manner. For working with Memory Beta content, see the
Memory Beta vignette.
Memory Alpha is a website that hosts information on all things relating to official canon Star Trek, which strictly means the television series and movies. There are many other officially licensed Star Trek productions, e.g., the many hundreds of novels, but these are not technically canon even though many fans treat them as such. For a broader focus on licensed works, see Memory Beta.
When talking about using
rtrek to access data from
Memory Alpha, the term data is used loosely. It would be just as
accurate to say information, content or text. While the site contains a
vast amount of information, it is not structured in the tidy tables a
data scientist would love to encounter. Memory Alpha is a
wiki and can be thought of as similar to an encyclopedia. The bulk of
its pages consist of articles. While some of these contain interesting
HTML tables, the site largely offers textual data.
Since Memory Alpha does not offer an API, the API-like interface
in rtrek is just a collection of wrappers around
web page scraping. In terms of what the relevant functions bring back
from Memory Alpha, there are real limitations on the level of generality
and quality of formatting that can be achieved across such a massive and
diverse collection of articles.
There are six Memory Alpha web portals available. To see them, call
the main function for Memory Alpha access, memory_alpha,
and pass it
portals as the API endpoint.
memory_alpha("portals")
#> # A tibble: 6 × 2
#>   id         url
#>   <chr>      <chr>
#> 1 alternate  Portal:Alternate_Reality
#> 2 people     Portal:People
#> 3 science    Portal:Science
#> 4 series     Portal:TV_and_films
#> 5 society    Portal:Society_and_Culture
#> 6 technology Portal:Technology
The data frame returned provides each portal ID and respective “short
URL”. These relative URLs are given in order to reduce verbosity and
redundancy. All absolute URLs share a common base URL.
In this special case where
endpoint = "portals", this
table is returned from the package itself because it is already known.
The available portals are fixed. No request is made to Memory Alpha
yet. The URLs shown are also not needed by the user, but are provided
alongside the IDs for context.
When using a specific portal at the highest level (portal ID only), the returned data frame contains information about searchable categories available in the portal.
memory_alpha("people")
#> # A tibble: 103 × 3
#>    id          url                  group
#>    <chr>       <chr>                <chr>
#>  1 Acamarians  Category:Acamarians  By species
#>  2 Akritirians Category:Akritirians By species
#>  3 Aldeans     Category:Aldeans     By species
#>  4 Andorians   Category:Andorians   By species
#>  5 Androids    Category:Androids    By species
#>  6 Aquans      Category:Aquans      By species
#>  7 Ardanans    Category:Ardanans    By species
#>  8 Augments    Category:Augments    By species
#>  9 Ba'ku       Category:Ba%27ku     By species
#> 10 Bajorans    Category:Bajorans    By species
#> # ℹ 93 more rows
Again, there are id and url columns. There
is also a group column, and potentially a further grouping
column. This is only to provide meaningful context for the values in the
id column if relevant for a given portal;
group is not used for anything and the user can ignore it.
The above call does involve reaching out to Memory Alpha. While the
portals are stable, it is expected that content within is regularly
updated. Remember that this is not a real API. Since one is not
available, what is really going on behind the scenes is the use of
rvest for web page harvesting.
Some portals have terminal endpoints - in Memory Alpha these are the
written articles - at the top level, but typically the top level results
for a portal are categories. You can always differentiate categories
from articles by the URL, which will begin with
Category: in the former case.
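As a minimal sketch of this check, the rows below are typed by hand in the shape of a portal result, so nothing is requested from Memory Alpha. It assumes the category prefix is Category:, as seen in the printed output, and the names res, categories and articles are purely illustrative.

```r
# Illustrative rows typed by hand in the shape of a portal result;
# nothing is downloaded here.
res <- data.frame(
  id  = c("Amar", "Augments", "Azetbur"),
  url = c("Amar", "Category:Augments", "Azetbur"),
  stringsAsFactors = FALSE
)

# Assumption: category URLs carry the "Category:" prefix seen in printed output.
is_category <- startsWith(res$url, "Category:")

categories <- res[is_category, ]   # can be descended into further
articles   <- res[!is_category, ]  # terminal endpoints (articles)
```

The same logical vector could be used to decide which id values are worth appending to an endpoint and which already point at an article.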
Descending through subcategories is done by appending their
id values, separated by a forward slash.
memory_alpha("people/Klingons")
#> # A tibble: 243 × 2
#>    Klingons                       url
#>    <chr>                          <chr>
#>  1 Memory Alpha images (Klingons) Category:Memory_Alpha_images_(Klingons)
#>  2 Amar                           Amar
#>  3 Antaak                         Antaak
#>  4 A'trom                         A%27trom
#>  5 Atul                           Atul
#>  6 Augments                       Category:Augments
#>  7 Azetbur                        Azetbur
#>  8 Ba'el                          Ba%27el
#>  9 Ba'ktor                        Ba%27ktor
#> 10 Barak-Kadan                    Barak-Kadan
#> # ℹ 233 more rows

memory_alpha("people/Klingons/Worf")
#> # A tibble: 1 × 4
#>   title content    metadata          categories
#>   <chr> <list>     <list>            <list>
#> 1 Worf  <xml_ndst> <tibble [1 × 16]> <tibble [14 × 2]>
Note the change in the structure of the final output, which is an article. This is the end of this particular road. The result is still a data frame, but now has only one row, the article.
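Endpoint strings like the ones used here can also be assembled programmatically when the id values are held in a vector. The following is only a small base R sketch; the variable names are illustrative.

```r
# Join a portal ID and successive id values into one endpoint string.
ids <- c("people", "Klingons", "Worf")
endpoint <- paste(ids, collapse = "/")
endpoint

# With rtrek loaded, the string could then be passed on as memory_alpha(endpoint).
```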
The columns include a text title and three nested columns: content, metadata and categories.
content contains an xml_nodeset object left (mostly) unadulterated. It
contains the article’s main content section, including ordered content
from a default set of HTML tags. For more control over article content,
see ma_article in the next section.
metadata contains a nested data frame of content parsed from the summary card
that appears in the top right corner of articles. If this fails to parse
for a given article,
NULL is returned.
categories contains a data frame of the categories under
which the article topic falls and their respective URLs.
If you already know the article
id, you can obtain an
article directly using
ma_article instead of going through
an endpoint with
memory_alpha that terminates in the same
id. This also offers additional options to control what
tags are included in the returned result and whether that result is the
xml_nodeset object or a character vector of only
the extracted text. In either case, further work, such as text
analysis, is left to the user.
worf <- ma_article("Worf", content_format = "character", content_nodes = c("h2", "h3"))
worf
#> # A tibble: 1 × 4
#>   title content metadata          categories
#>   <chr> <list>  <list>            <list>
#> 1 Worf  <chr >  <tibble [1 × 16]> <tibble [14 × 2]>

worf$content[[1]] # Worf article section headings
#>  [1] "Early life"                          "The Rozhenkos"
#>  [3] "Coming of age"                       "Starfleet career"
#>  [5] "Service aboard the USS Enterprise-D" "Service on Deep Space 9"
#>  [7] "Service aboard the USS Enterprise-E" "Other adventures"
#>  [9] "Later career"                        "Personality"
#> [11] "Physicality"                         "As a warrior"
#> [13] "Ailments and injuries"               "Family"
#> [15] "K'Ehleyr"                            "Alexander"
#> [17] "Jeremy Aster"                        "Jadzia Dax"
#> [19] "Kurn"                                "Nikolai Rozhenko"
#> [21] "Martok"                              "Friendships"
#> [23] "The crew of the Enterprise"          "Deep Space 9 companions"
#> [25] "Kor"                                 "Alternate realities and timelines"
#> [27] "Holograms"                           "Memorable quotes"
#> [29] "Chronology"                          "Appendices"
#> [31] "See also"                            "Appearances"
#> [33] "Background information"              "Apocrypha"
#> [35] "External links"
With browse = TRUE, the article page also launches in the browser.
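As a small illustration of the kind of follow-up work left to the user, the sketch below takes a handful of the Worf section headings, typed in by hand rather than downloaded, and applies two simple base R text operations. The object names are illustrative.

```r
# A few section headings copied by hand from the printed Worf result.
headings <- c(
  "Early life", "Starfleet career",
  "Service aboard the USS Enterprise-D", "Service on Deep Space 9",
  "Service aboard the USS Enterprise-E", "Family", "Memorable quotes"
)

# Which headings describe a posting?
service <- grep("^Service", headings, value = TRUE)

# Word count per heading, splitting on whitespace.
n_words <- lengths(strsplit(headings, "\\s+"))
names(n_words) <- headings
```

With a real result, the character vector in the content column would simply take the place of headings.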
Full resolution source images can be downloaded and imported into R
with ma_image if you know the short URL. The easiest way
to find URLs is by using a Memory Alpha portal. In the example below,
the Memory Alpha images category under Klingons is selected. Look for a
picture that includes Worf but also Data.
library(dplyr)

klingons <- memory_alpha("people/Klingons/Memory Alpha images (Klingons)")
klingons
#> # A tibble: 1,354 × 2
#>    `Memory Alpha images (Klingons)`         url
#>    <chr>                                    <chr>
#>  1 Memory Alpha images (Klingon holograms)  Category:Memory_Alpha_images_(Klingo…
#>  2 Age of ascension pain sticks.jpg         File:Age_of_ascension_pain_sticks.jpg
#>  3 Ajilon Prime Klingon 1.jpg               File:Ajilon_Prime_Klingon_1.jpg
#>  4 Ajilon Prime Klingon 2.jpg               File:Ajilon_Prime_Klingon_2.jpg
#>  5 Ajilon Prime Klingons.jpg                File:Ajilon_Prime_Klingons.jpg
#>  6 Alexander and K'mtar.jpg                 File:Alexander_and_K%27mtar.jpg
#>  7 Alexander at the kot'baval festival.jpg  File:Alexander_at_the_kot%27baval_fe…
#>  8 Alexander Rozhenko, 2367.jpg             File:Alexander_Rozhenko,_2367.jpg
#>  9 Alexander Rozhenko, 2370.jpg             File:Alexander_Rozhenko,_2370.jpg
#> 10 Alexander Rozhenko, 2374.jpg             File:Alexander_Rozhenko,_2374.jpg
#> # ℹ 1,344 more rows

worf_data <- filter(klingons, grepl("Worf", url) & grepl("Data", url))
worf_data
#> # A tibble: 10 × 2
#>    `Memory Alpha images (Klingons)`                                  url
#>    <chr>                                                             <chr>
#>  1 Data and Worf, 2369.jpg                                           File:Data_a…
#>  2 Data tries talking to Worf.jpg                                    File:Data_t…
#>  3 Data, Picard and Worf, 2375.jpg                                   File:Data,_…
#>  4 Data, Worf, and Beverly Crusher find Wesley Crusher.jpg           File:Data,_…
#>  5 La Forge, Data, Riker, Worf, and Picard, 2379.jpg                 File:La_For…
#>  6 Picard Data and Worf on Iconia.jpg                                File:Picard…
#>  7 Picard, Data, and Worf away team.jpg                              File:Picard…
#>  8 Thomas Riker and William Riker play poker with Data and Worf.jpg  File:Thomas…
#>  9 Wesley Crusher, Worf, and Data, quantum reality.jpg               File:Wesley…
#> 10 Worf carries Data through portal.jpg                              File:Worf_c…
Qapla’! This provides several results.
Technically, this is not the URL of an image file. It is a URL that
redirects you to some other seemingly random article on the website that
happens to include the image in it. This is not necessarily a unique
instance of the image, nor is there any consistency in what portal or
type of article it takes you to.
Both memory_alpha, using the full endpoint, and
ma_article, using the short form URL, provide the article
content associated with the “file” URL. See
ma_image for viewing the actual image.
x <- memory_alpha("people/Klingons/Memory Alpha images (Klingons)/Data tries talking to Worf.jpg")
x
#> # A tibble: 1 × 4
#>   title                       content    metadata categories
#>   <chr>                       <list>     <list>   <list>
#> 1 The Icarus Factor (episode) <xml_ndst> <NULL>   <tibble [1 × 2]>

x <- ma_article("File:Data_tries_talking_to_Worf.jpg")
x
#> # A tibble: 1 × 4
#>   title                       content    metadata categories
#>   <chr>                       <list>     <list>   <list>
#> 1 The Icarus Factor (episode) <xml_ndst> <NULL>   <tibble [1 × 2]>

x$categories
#> [[1]]
#> # A tibble: 1 × 2
#>   categories   url
#>   <chr>        <chr>
#> 1 TNG episodes Category:TNG_episodes
The likely intent is to obtain an image file after browsing the web
pages that list image files. Even if you are interactively browsing the
website, you have to click several times and scroll through additional
articles before you can actually view the image file that was initially
presented to you as a clickable link. This is a frustrating user
experience and confusing design. If you have a file name you want to
view, just use
ma_image for this. It returns a ggplot
object of the image file rather than an associated article.
ma_image can take the additional arguments,
keep = TRUE to retain the downloaded image file, and
file to specify the output filename if you do not want it
to be derived from the short URL. If you need more control over the
plot, use keep = TRUE and then load the image file into R
directly to plot separately as needed.
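If you prefer to choose the output file name yourself, one option is to build it from the short URL before passing it as file. The helper below is hypothetical and not part of rtrek; how ma_image derives its default name is not described here, so this is only one plausible convention, using base R's utils::URLdecode to undo the percent-encoding seen in the short URLs.

```r
# Hypothetical helper, not part of rtrek: turn one short URL into a
# readable file name. Handles a single URL at a time.
short_url_to_filename <- function(url) {
  decoded <- utils::URLdecode(url)  # e.g. "%27" becomes an apostrophe
  sub("^File:", "", decoded)        # drop the leading "File:" prefix
}

short_url_to_filename("File:Alexander_and_K%27mtar.jpg")
#> [1] "Alexander_and_K'mtar.jpg"
```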
You can perform a Memory Alpha site search using
ma_search. This returns a data frame of search results
content, including title, truncated text preview, and short URL for the
first page of search results.
It does not recursively collate search results through subsequent
pages of results. There could be an unexpectedly high number of pages of
results depending on the search query. Since the general nature of this
search feature seems relatively casual anyway, it aims only to provide a
first page preview. As with ma_article,
browse = TRUE opens the page in the browser.
ma_search("Guinan")
#> # A tibble: 25 × 3
#>    title                             text                                   url
#>    <chr>                             <chr>                                  <chr>
#>  1 Guinan                            "Multiple realities(covers informati… http…
#>  2 Francis Guinan                    "Real World article(written from a P… http…
#>  3 Unnamed El-Aurians                "The following is a list of unnamed … http…
#>  4 Unnamed individuals (unknown era) "List of unnamed individuals who liv… http…
#>  5 Borg                              "Multiple realities(covers informati… http…
#>  6 Jean-Luc Picard                   "Multiple realities(covers informati… http…
#>  7 Data                              "Multiple realities(covers informati… http…
#>  8 Star Trek: Picard                 "Real World article(written from a P… http…
#>  9 Wesley Crusher                    "Multiple realities(covers informati… http…
#> 10 Worf                              "Multiple realities(covers informati… http…
#> # ℹ 15 more rows
Memory Alpha contains almost 50,000 pages at the time of this
rtrek version. It is possible that some articles may have
idiosyncratic structure that could make them inaccessible by these
functions.
This package version is also the first to offer this functionality and, as mentioned, Memory Alpha does not offer an API, which forces a less reliable web-scraping approach. It is therefore unknown at this time how likely it is that updates to Memory Alpha by its maintainers will cause breaking changes.
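Given that uncertainty, scripts that loop over many articles may want to guard each call. The wrapper below is a hypothetical sketch in base R, not something rtrek provides: it runs any fetching function and returns NULL with a warning instead of stopping when the call fails. A deliberately failing stand-in function is used so the example runs without network access.

```r
# Hypothetical wrapper, not part of rtrek: run a scraping call and
# return NULL with a warning instead of stopping on error.
try_fetch <- function(fetch, ...) {
  tryCatch(
    fetch(...),
    error = function(e) {
      warning("Request failed: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Stand-in for a call that fails, so the sketch runs offline.
broken <- function(endpoint) stop("page structure not recognised")
result <- suppressWarnings(try_fetch(broken, "people/Klingons/Worf"))
is.null(result)
```

With rtrek loaded, the same pattern would be try_fetch(memory_alpha, "people/Klingons/Worf"), and NULL results can be filtered out after the loop.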