This introductory vignette provides a brief, example-driven overview
of rtrek.
The rtrek package provides datasets related to the Star Trek fictional
universe and functions for working with those datasets. It interfaces
with the Star Trek API (STAPI), Memory Alpha and Memory Beta to retrieve
data, metadata and other information relating to Star Trek.
The package also contains several local datasets covering a variety
of topics such as Star Trek timeline data, universe species data and
geopolitical data. Some of these are more information rich, while others
are toy examples useful for simple demonstrations. The bulk of Star Trek
data is accessed from external sources by API. A future version of
rtrek will also include summary datasets resulting from text mining
analyses of Star Trek novels.
library(rtrek)
st_datasets()
#> # A tibble: 10 × 2
#> dataset description
#> <chr> <chr>
#> 1 stGeo Map tile set locations of interest.
#> 2 stSpecies Basic intelligent species data.
#> 3 stTiles Available map tile sets.
#> 4 stBooks Star Trek novel metadata.
#> 5 stSeries Names and acronyms of Star Trek series
#> 6 stapiEntities Star Trek API (STAPI) categories
#> 7 stLogos Metadata for various Star Trek logos
#> 8 tlBooks Novel-based timeline dataset
#> 9 tlEvents Event-based timeline dataset
#> 10 tlFootnotes Timeline dataset footnotes
At this time, several of the datasets are very small and are included
in the package only to demonstrate some very basic examples; they are
not particularly useful or interesting beyond this purpose. However,
rtrek now includes more sizable curated datasets relating to the
compendium of licensed, published Star Trek literature and multiple
versions of Star Trek fictional universe historical timelines.
Package datasets in rtrek are somewhat eclectic and currently limited.
They will expand with further package development. To list all
available package datasets with a short description, call
st_datasets().
A largely comprehensive Star Trek book metadata table is available as
stBooks, which is curated by directly parsing Star Trek e-book metadata
rather than parsing third party website content.
stBooks
#> # A tibble: 783 × 11
#> title author date publisher identifier series subseries nchap nword nchar
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int> <int> <int>
#> 1 Star … Alan … 2009… Simon an… 1439163391 AV NA 18 77035 4.60e5
#> 2 Starf… Rick … 2010… Simon Sp… 978144241… AV Starflee… 14 40129 2.39e5
#> 3 Starf… Rudy … 2010… Simon Sp… 978144241… AV Starflee… 31 52547 2.96e5
#> 4 Starf… Rick … 2011… Simon Sp… 978144241… AV Starflee… 13 39495 2.33e5
#> 5 Starf… Alan … 2012… Simon Sp… 978144242… AV Starflee… 30 62030 3.50e5
#> 6 Star … Alan … 2013… Gallery … 978147671… AV NA 17 77438 5.37e5
#> 7 Capta… James… 1998… Pocket B… 978143910… CT NA 21 95110 5.55e5
#> 8 Capta… Macke… 1998… Pocket B… 978074345… CT NA 26 76392 4.25e5
#> 9 Capta… Chris… 1998… Pocket B… 978143910… CT NA 34 78678 4.43e5
#> 10 The C… John … 2000… Pocket B… 978074340… CT NA 176 436682 2.47e6
#> # ℹ 773 more rows
#> # ℹ 1 more variable: dedication <chr>
This dataset is discussed further in the section below on e-book text mining.
Before moving on, it is worth mentioning a helpful table for mapping between series names and their abbreviations used throughout this package (and in the Star Trek community in general).
stSeries
#> # A tibble: 35 × 3
#> id abb type
#> <chr> <chr> <chr>
#> 1 Abramsverse AV series
#> 2 Challenger CHA series
#> 3 Deep Space Nine DS9 series
#> 4 Discovery DSC series
#> 5 Enterprise ENT series
#> 6 Klingon Empire KE series
#> 7 Miscellaneous MISC series
#> 8 New Frontier NF series
#> 9 Prometheus PRO series
#> 10 Seekers SKR series
#> # ℹ 25 more rows
The stTiles data frame shows all available Star Trek-themed map tile
sets along with metadata and attribution information. These map tiles
can be used with the leaflet and shiny packages to make interactive
maps situated in the Star Trek universe.
stTiles
#> # A tibble: 2 × 8
#> id url description width height tile_creator map_creator map_url
#> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
#> 1 galaxy1 https://leo… Geopolitic… 8000 6445 Matthew Leo… Rob Archer https:…
#> 2 galaxy2 https://leo… Geopolitic… 5000 4000 Matthew Leo… NA http:/…
The list is scant at the moment, but more will come. One thing to keep in mind is these tile sets use a simple/non-geographical coordinate reference system (CRS). Clearly, they are not Earth-based, though they are spatial in more ways than one!
Similar to game maps, there is a sense of space, but it is a simple Cartesian coordinate system and does not use geographic projections like you may be used to working with when analyzing spatial data or making Leaflet maps. This system is much simpler, but simple does not necessarily mean easy!
Inspect stGeo:
stGeo
#> # A tibble: 18 × 4
#> id loc col row
#> <chr> <chr> <dbl> <dbl>
#> 1 galaxy1 Earth 2196 2357
#> 2 galaxy1 Romulus 2615 1742
#> 3 galaxy1 Qo'noS 3310 3361
#> 4 galaxy1 Breen 1004 939
#> 5 galaxy1 Ferenginar 1431 1996
#> 6 galaxy1 Cardassia 1342 2841
#> 7 galaxy1 Tholia 407 3866
#> 8 galaxy1 Tzenketh 1553 2557
#> 9 galaxy1 Talar 1039 3489
#> 10 galaxy2 Earth 2201 1595
#> 11 galaxy2 Romulus 2514 1178
#> 12 galaxy2 Qo'noS 3197 2303
#> 13 galaxy2 Breen 1228 1181
#> 14 galaxy2 Ferenginar 2026 886
#> 15 galaxy2 Cardassia 1543 1903
#> 16 galaxy2 Tholia 713 2971
#> 17 galaxy2 Tzenketh 1734 1721
#> 18 galaxy2 Talar 1338 2368
This is another small dataset containing locations of key planets in the Star Trek universe. Notice the coordinates do not appear meaningful. There is no latitude and longitude. Instead there are row and column entries defining cells in a matrix. The matrix dimensions are defined by the pixel dimensions of the source map that was used to create each tile set.
The coordinates are also not consistent. Source maps differ significantly. Even if they had identical pixel dimensions, which they do not, each artist’s visual rendering of the fictional universe will place locations differently in space. In this sense, every tile set has a unique coordinate reference system. For each new tile set produced, all locations of interest must be georeferenced again.
This is not ideal, but it gets worse. Once you have locations’
coordinates defined that map onto a particular tile set, the leaflet
package does not work in these row and column grids. The (col, row)
pairs need to be transformed or projected into Leaflet space.
Fortunately, rtrek does this part for you with tile_coords(). It takes
a data frame like the one returned by st_tiles_data(), with columns
named col and row, as well as the name of an available Star Trek map
tile set. It returns a data frame with new columns x and y that will
map properly in a leaflet map built on that tile set.
id <- "galaxy1"
(d <- st_tiles_data(id))
#> # A tibble: 9 × 8
#> id loc col row body category zone species
#> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr>
#> 1 galaxy1 Earth 2196 2357 Planet Homeworld United Federation of … Human
#> 2 galaxy1 Romulus 2615 1742 Planet Homeworld Romulan Star Empire Romulan
#> 3 galaxy1 Qo'noS 3310 3361 Planet Homeworld Klingon Empire Klingon
#> 4 galaxy1 Breen 1004 939 Planet Homeworld Breen Confederacy Breen
#> 5 galaxy1 Ferenginar 1431 1996 Planet Homeworld Ferengi Alliance Ferengi
#> 6 galaxy1 Cardassia 1342 2841 Planet Homeworld Cardassian Union Cardas…
#> 7 galaxy1 Tholia 407 3866 Planet Homeworld Tholian Assembly Tholian
#> 8 galaxy1 Tzenketh 1553 2557 Planet Homeworld Tzenkethi Coalition Tzenke…
#> 9 galaxy1 Talar 1039 3489 Planet Homeworld Talarian Republic Talari…
(d <- tile_coords(d, id))
#> # A tibble: 9 × 10
#> id loc col row body category zone species x y
#> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 galaxy1 Earth 2196 2357 Planet Homeworld United F… Human 68.6 -73.7
#> 2 galaxy1 Romulus 2615 1742 Planet Homeworld Romulan … Romulan 81.7 -54.4
#> 3 galaxy1 Qo'noS 3310 3361 Planet Homeworld Klingon … Klingon 103. -105.
#> 4 galaxy1 Breen 1004 939 Planet Homeworld Breen Co… Breen 31.4 -29.3
#> 5 galaxy1 Ferenginar 1431 1996 Planet Homeworld Ferengi … Ferengi 44.7 -62.4
#> 6 galaxy1 Cardassia 1342 2841 Planet Homeworld Cardassi… Cardas… 41.9 -88.8
#> 7 galaxy1 Tholia 407 3866 Planet Homeworld Tholian … Tholian 12.7 -121.
#> 8 galaxy1 Tzenketh 1553 2557 Planet Homeworld Tzenketh… Tzenke… 48.5 -79.9
#> 9 galaxy1 Talar 1039 3489 Planet Homeworld Talarian… Talari… 32.5 -109.
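The x and y values printed above are consistent with a simple rescaling of the pixel grid: each col divided by 32, and each row divided by 32 and negated. The base R sketch below reproduces a few of the printed rows under that assumption. The factor of 32 (written as 2^5) is inferred from the output shown here, not taken from the tile_coords() source, so treat this as an illustration of the idea and not as the package's implementation. Other tile sets may use a different factor.

```r
# Pixel grid coordinates copied from three galaxy1 rows printed above.
d <- data.frame(
  loc = c("Earth", "Romulus", "Breen"),
  col = c(2196, 2615, 1004),
  row = c(2357, 1742, 939)
)

# ASSUMPTION: scale factor inferred from the printed x/y values,
# not read from the rtrek source code.
scale_factor <- 2^5

# Columns grow to the right, like x.
d$x <- d$col / scale_factor
# Rows grow downward from the top of the image, so y is negated.
d$y <- -d$row / scale_factor

d
```

Comparing the resulting x and y columns with the tile_coords() output above shows agreement to the three significant digits that the tibble prints.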
Here is an example using the galaxy1 map with leaflet. The st_tiles()
function is used to link to the tile provider.
library(leaflet)
tiles <- st_tiles("galaxy1")
leaflet(d, options = leafletOptions(crs = leafletCRS("L.CRS.Simple"))) %>%
  addTiles(tiles) %>%
  setView(108, -75, 2) %>%
  addCircleMarkers(lng = ~x, lat = ~y, label = ~loc, color = "white", radius = 20)
The stSpecies dataset is just a small table that pairs species names
with representative thumbnail avatars, mostly pulled from the Memory
Alpha website. There is nothing map-related here, but these are used in
this Stellar Cartography example. It is similar to the Leaflet example
above, but a bit more interesting, with markers to click on and
information displays.
In the course of the above map-related examples, a few functions have
also been introduced. st_tiles() takes an id argument that is mapped to
the available tile sets in stTiles and returns the relevant URL.
st_tiles_data() takes the same id argument and returns a simple example
data frame containing ancillary data related to the available locations
from stGeo. The result is always the same except that the grid cells
for locations change with respect to the chosen tile set. Finally,
tile_coords() can be applied to one of these data frames to add x and y
columns for a CRS that Leaflet will understand.
Fictional universe historical timeline data is an exciting type of in-universe Star Trek data to have at your fingertips to play around with and explore.
It is also difficult to compile. Many people have labored away intensely over the years compiling various attempts at integrated, internally consistent, accurate timelines of Star Trek universe lore. Some have turned out more successful than others.
As of rtrek v0.2.0, the rudimentary beginnings of what will ideally
become an up-to-date and comprehensive timeline dataset are underway in
the form of two different flavors of timeline datasets.
One is based on published works, mostly consisting of novels, as well as television series and movies, all placed in chronological order.
tlBooks
#> # A tibble: 2,122 × 14
#> year title series anthology format number novelization setting
#> <dbl> <chr> <chr> <chr> <chr> <int> <lgl> <chr>
#> 1 -5000000000 The Q Contin… TNG NA book 47 FALSE second…
#> 2 -5000000000 Spock's World TOS NA book NA FALSE second…
#> 3 -4000000000 Reciprocity TNG SNW story NA FALSE second…
#> 4 -3500000000 All Good Thi… TNG NA book NA TRUE second…
#> 5 -2500000000 The Q Contin… TNG NA book 47 FALSE second…
#> 6 -1000000000 The Q Contin… TNG NA book 48 FALSE second…
#> 7 -444800000 The Escape VOY NA book 2 FALSE second…
#> 8 -64018143 First Fronti… TOS NA book 75 FALSE second…
#> 9 -500000 Spock's World TOS NA book NA FALSE second…
#> 10 -307600 The Escape VOY NA book 2 FALSE second…
#> # ℹ 2,112 more rows
#> # ℹ 6 more variables: stardate_start <dbl>, stardate_end <dbl>,
#> # detailed_date <chr>, section <chr>, primary_entry_year <int>,
#> # footnote <int>
The other is an event-driven timeline that consists of textual entries referencing historically significant events, situated chronologically in the timeline.
tlEvents
#> # A tibble: 1,241 × 6
#> year setting series source info footnote
#> <dbl> <chr> <chr> <chr> <chr> <int>
#> 1 -4000000000 secondary SNW Reciprocity - SNW 2 ~4 b… NA
#> 2 -4000000000 primary SNW The Beginning - SNW 6 ~4 b… 1
#> 3 -64000000 time travel TOS First Frontier - TOS 75 ~64 … NA
#> 4 -27800 secondary DS9 Horn And Ivory - Gateways: Wha… ~278… NA
#> 5 -2700 inferred TOS Yesterday's Son - TOS 11 ~270… NA
#> 6 -300 secondary NA Spock's World - Vulcan Five ~300… NA
#> 7 -79 secondary NA Spock's World - Vulcan Six Birt… NA
#> 8 -70 secondary TNG The Devil's Heart - TNG ~70 … 2
#> 9 -33 secondary NA Spock's World - Vulcan Six Sura… NA
#> 10 -22 secondary TOS Rihannsu 2: The Romulan Way - … Vulc… NA
#> # ℹ 1,231 more rows
The two datasets are quite different in their focus and complement one another.
One column these two timeline data frames share is the footnote column,
which, as you can see, only contains ID values for entries that have a
footnote. The tlFootnotes dataset can be referenced or joined by
footnote to one of the other tables. Footnotes tend to be long strings
of text and are not associated with most timeline entries, so they are
kept in a separate table.
tlFootnotes
#> # A tibble: 605 × 3
#> id id text
#> <chr> <int> <chr>
#> 1 book 1 "This chapter chronicles the period from the formation of the 40…
#> 2 book 2 "The time and manner of Borg origins are entirely speculative. B…
#> 3 book 3 "This chapter chronicles the period from Surak's birth until his…
#> 4 book 4 "As a vision, this may be historically unreliable. The events co…
#> 5 book 5 "This chapter chronicles the period from the Ahkh War (against t…
#> 6 book 6 "The date of the Awakening is around 2000 years prior to \"The S…
#> 7 book 7 "This chapter chronicles the period from S'task's Declaration un…
#> 8 book 8 "This chapter chronicles the journey of the Rihannsu Travelers u…
#> 9 book 9 "These chapters chronicle the period beginning 25.86 real-time y…
#> 10 book 10 "This chapter chronicles the period from the Settlement until th…
#> # ℹ 595 more rows
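As a rough illustration of such a footnote lookup, the sketch below uses two small made-up data frames instead of the package datasets. The column names timeline, footnote and text in the toy footnotes table are assumptions made for this example; check names(tlFootnotes) before adapting it. It uses match() rather than merge() because merge() may reorder rows, and row order carries the chronology in the timeline tables.

```r
# Toy stand-in for a timeline table; values and footnote IDs are made up.
events <- data.frame(
  year = c(-4000000000, -64000000, -70),
  info = c("event A", "event B", "event C"),
  footnote = c(1L, NA, 2L),
  stringsAsFactors = FALSE
)

# Toy stand-in for the footnotes table.
# ASSUMPTION: these column names are hypothetical.
notes <- data.frame(
  timeline = c("event", "event", "book"),
  footnote = c(1L, 2L, 1L),
  text = c("note one", "note two", "a book note"),
  stringsAsFactors = FALSE
)

# Keep only the footnotes that belong to the event timeline.
event_notes <- notes[notes$timeline == "event", ]

# Look up footnote text by ID without changing the row order.
# Entries that have no footnote get NA.
idx <- match(events$footnote, event_notes$footnote)
events$footnote_text <- event_notes$text[idx]

events
```

The same pattern can be written with dplyr::left_join(), which also preserves the row order of the left-hand table.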
tlBooks is novel-driven, meaning that the timeline entries (rows)
provide a chronologically ordered list of licensed Star Trek novels.
This timeline is helpful for figuring out when stories are set and the
relative order in which they occur, but it does not provide any
description of events transpiring in the universe.
While this data is very informative, it is many years out of date, being last updated in October of 2006. It is also necessarily speculative. Settings are determined based in part on what is interpreted to be the intention of a given author for a given story.
Nevertheless, it still represents possibly the highest quality representation of the chronological ordering of Star Trek fiction that combines episodes and movies with written works. The concurrent timeline of Star Trek TV episodes and movies is interleaved with the novels and other stories, anthologies and other written fiction. This provides fuller context, resulting in a much richer timeline.
tlEvents is event-driven, meaning that the timeline entries (rows)
provide chronologically ordered historical events from the Star Trek
universe. As with tlBooks, this timeline is quite out of date. In fact
it is at least somewhat more out of date than tlBooks, its last content
update appearing to be no later than 2005. This timeline is also more
problematic than the other, and less relevant moving forward. Its
updating essentially ceased as the other began.
However, it is included because unlike tlBooks, which is a timeline of
production titles, this timeline dataset is event-driven. While it may
now be erroneous in places, even independently of being out of date, it
is useful for its informative textual entries referencing historically
significant events in Star Trek lore.
In summary, these datasets have much value, but they should be used
with the awareness that they are necessarily imperfect and speculative,
notably outdated, and tlEvents in particular is less able to stand the
test of time as the Star Trek universe moves forward with new
publications and productions.
It should also be noted that while it may be tempting to merge these two data frames, this is not advisable if it is important to maintain chronological order. While it is generally safe to assume that multiple entries within a single year are listed in a sensible order in cases where it may matter, within-year entries do not have specific, unique within-year dates. They are ordinal only. It is not possible to merge entries from both tables for a specific year and know how the combined set of entries should be ordered, unless you already know everything about Star Trek, in which case please craft the ultimate timeline in a universal file format that can be easily digested by a computer.
Now that you have seen an overview of available rtrek datasets and some
associated functions, it is time to turn attention to external
datasets. The Star Trek API (STAPI) is a particularly useful data
source.
Keep in mind that STAPI focuses more on providing real world data associated with Star Trek (e.g., when did episode X first air on television?) than on fictional universe data, but it contains both and the database holdings will grow with time.
To use the words of the developers, the STAPI is “the first public Star Trek API, accessible via REST and SOAP. It’s an open source project, that anyone can contribute to.”
The API is highly functional. Please do not abuse the API with constant
requests. Their pages suggest no more than one request per second, but
I would suggest ten seconds between successive requests. The default
anti-DOS measures in rtrek limit requests to one per second. You can
update this global rtrek setting with options(), e.g.
options(rtrek_antidos = 10) for a minimum ten second wait between API
calls to be an even better neighbor. rtrek will not permit faster
requests. If set below one second, the option is ignored and a warning
is thrown when making any API call.
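To make the behavior described above concrete, here is a minimal base R sketch of a minimum-wait throttle. It is not rtrek's implementation: throttle() and polite_identity() are hypothetical names invented for this example, and the only thing taken from the package is the rtrek_antidos option name and the one second floor described in the text.

```r
# Hypothetical helper, NOT part of rtrek: wraps a function so that
# successive calls are separated by at least the configured wait.
throttle <- function(f) {
  last_call <- NULL
  function(...) {
    # Read the wait in seconds, falling back to one second if unset.
    wait <- getOption("rtrek_antidos", 1)
    if (!is.numeric(wait) || wait < 1) {
      warning("rtrek_antidos below one second is ignored; using 1.")
      wait <- 1
    }
    if (!is.null(last_call)) {
      elapsed <- as.numeric(difftime(Sys.time(), last_call, units = "secs"))
      # Sleep only for the remainder of the minimum wait.
      if (elapsed < wait) Sys.sleep(wait - elapsed)
    }
    last_call <<- Sys.time()
    f(...)
  }
}

# Stand-in for an API call; it simply returns its input.
polite_identity <- throttle(function(x) x)

t0 <- Sys.time()
a <- polite_identity(1)  # first call runs immediately
b <- polite_identity(2)  # second call waits out the remainder
elapsed <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
```

With the option unset, the two calls above take roughly one second in total. Setting options(rtrek_antidos = 10) beforehand would stretch that to roughly ten.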
There are many fields, or entities, available in the API. The available IDs can be found in this table:
stapiEntities
#> # A tibble: 40 × 4
#> id class ncol colnames
#> <chr> <chr> <int> <named list>
#> 1 animal tbl_df 7 <chr [7]>
#> 2 astronomicalObject tbl_df 5 <chr [5]>
#> 3 book tbl_df 24 <chr [24]>
#> 4 bookCollection tbl_df 10 <chr [10]>
#> 5 bookSeries tbl_df 11 <chr [11]>
#> 6 character tbl_df 24 <chr [24]>
#> 7 comicCollection tbl_df 14 <chr [14]>
#> 8 comics tbl_df 15 <chr [15]>
#> 9 comicSeries tbl_df 15 <chr [15]>
#> 10 comicStrip tbl_df 12 <chr [12]>
#> # ℹ 30 more rows
These ID values are passed to stapi() to perform a search using the
API. The other columns provide some information about the object
returned from a search. All entity searches return tibble data frames.
You can inspect or unnest the column names of each table returned from
every available entity search so you can see beforehand what variables
are associated with each entity.
Using stapi() should be thought of as a three part process:
1. Check how many pages of results a search will return.
2. Run the search for the pages you want.
3. Pick an entry from the search results and call stapi() one more time referencing the specific observation.
To determine how many pages of results exist for a given search, set
page_count = TRUE. The impact on the API will be equivalent to only
searching a single page of results. One page contains metadata
including the total number of pages. Nothing is returned in this “safe
mode”, but the total number of pages needed to retrieve all results is
printed to the console.
Searching movies only returns one page of results. However, there are a lot of characters in the Star Trek universe. Check the total pages available for character search.
stapi("character", page_count = TRUE)
#> Total pages to retrieve all results: 76
And that is with 100 results per page!
The default page = 1 only returns the first page. page can be a vector,
e.g. page = 1:62. Results from multi-page searches are automatically
combined into a single, constant data frame output. For the second call
to stapi(), return only page two here, which contains the character, Q
(currently, pending future character database updates that may shift
the indexing). In case that does change and Q is not always near the
top of page two of the search results, the example further below
hard-codes his unique/universal ID.
stapi("character", page = 2)
#> # A tibble: 100 × 24
#> uid name gender yearOfBirth monthOfBirth dayOfBirth placeOfBirth
#> <chr> <chr> <chr> <int> <lgl> <lgl> <lgl>
#> 1 CHMA0000051779 Alex … NA NA NA NA NA
#> 2 CHMA0000020014 Alexa… NA NA NA NA NA
#> 3 CHMA0000039688 Alexa… M NA NA NA NA
#> 4 CHMA0000206312 Alexa… NA NA NA NA NA
#> 5 CHMA0000213058 Alexa… NA NA NA NA NA
#> 6 CHMA0000189986 Alexa… NA NA NA NA NA
#> 7 CHMA0000069404 Alexa… NA NA NA NA NA
#> 8 CHMA0000176430 Alexa… M NA NA NA NA
#> 9 CHMA0000174627 Alexa… NA NA NA NA NA
#> 10 CHMA0000007635 Alexa… M 2366 NA NA NA
#> # ℹ 90 more rows
#> # ℹ 17 more variables: yearOfDeath <int>, monthOfDeath <int>, dayOfDeath <int>,
#> # placeOfDeath <lgl>, height <int>, weight <int>, deceased <lgl>,
#> # bloodType <lgl>, maritalStatus <chr>, serialNumber <chr>,
#> # hologramActivationDate <lgl>, hologramStatus <chr>,
#> # hologramDateStatus <lgl>, hologram <lgl>, fictionalCharacter <lgl>,
#> # mirror <lgl>, alternateReality <lgl>
Character tables can be sparse. There are a lot of variables, many of which will contain missing data for rare, esoteric characters. Even for more popular characters about whom much more universe lore has been uncovered, it still takes dedicated nerds to enter all the data in a database.
When a dataset contains a uid column, this can be used subsequently to
extract a satellite dataset about that particular observation that was
returned in the original search. First you used safe mode, then search
mode, and now switch from search mode to extraction mode to obtain data
about Q, specifically. All that is required to do this is pass Q’s uid
to stapi() and call the function one last time. When uid is no longer
NULL, stapi() knows not to bother with a search and makes a different
type of API call requesting information about the uniquely identified
entry.
Q <- "CHMA0000025118"
Q <- stapi("character", uid = Q)
library(dplyr)
Q$episodes %>% select(uid, title, stardateFrom, stardateTo)
#> uid title stardateFrom stardateTo
#> 1 EPMA0000259941 Veritas NA NA
#> 2 EPMA0000000651 Tapestry NA NA
#> 3 EPMA0000000500 Hide And Q 41590.5 41590.5
#> 4 EPMA0000277408 The Star Gazer NA NA
#> 5 EPMA0000280052 Farewell NA NA
#> 6 EPMA0000279099 Two of One NA NA
#> 7 EPMA0000278606 Watcher NA NA
#> 8 EPMA0000001510 The Q and the Grey 50384.2 50392.7
#> 9 EPMA0000001413 True Q 46192.3 46192.3
#> 10 EPMA0000000845 Q-Less 46531.2 46531.2
#> 11 EPMA0000001329 Q Who 42761.3 42761.3
#> 12 EPMA0000278900 Fly Me to the Moon NA NA
#> 13 EPMA0000000483 Encounter at Farpoint 41153.7 41153.7
#> 14 EPMA0000001458 All Good Things... 47988.0 47988.0
#> 15 EPMA0000162588 Death Wish 49301.2 49301.2
#> 16 EPMA0000289337 The Last Generation NA NA
#> 17 EPMA0000001347 Deja Q 43539.1 43539.1
#> 18 EPMA0000277535 Penance NA NA
#> 19 EPMA0000278226 Assimilation NA NA
#> 20 EPMA0000279450 Mercy NA NA
#> 21 EPMA0000001619 Q2 54704.5 54704.5
#> 22 EPMA0000001377 Qpid 44741.9 44741.9
The data returned on Q is actually a large list, including multiple data frames. For simplicity only a piece of it is shown above. For more examples, see the STAPI vignette.
Some functions in rtrek provide an API-like interface to online Star
Trek-related data. Specifically, they parse data from the Memory Alpha
and Memory Beta websites. These sites do not provide APIs. Therefore
the only option is to read pages into R and parse the HTML. Behind the
scenes this is done using the xml2 and rvest packages, but from the
user perspective it is presented as passing an API endpoint string to a
function.
memory_alpha and memory_beta, as well as several other related
functions, are available in rtrek. These functions access data from
Memory Alpha and Memory Beta. For details and examples on these
functions, see the Memory Alpha vignette and the Memory Beta vignette.
This section will be continued in a future version of rtrek. For now
what is available is a dataset, stBooks. This dataset represents
metadata parsed, imperfectly but painstakingly and thoroughly, from
actual Star Trek books. stBooks contains several different fields,
including useful fields for analysts such as the number of words and
chapters in a book.
stBooks
#> # A tibble: 783 × 11
#> title author date publisher identifier series subseries nchap nword nchar
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int> <int> <int>
#> 1 Star … Alan … 2009… Simon an… 1439163391 AV NA 18 77035 4.60e5
#> 2 Starf… Rick … 2010… Simon Sp… 978144241… AV Starflee… 14 40129 2.39e5
#> 3 Starf… Rudy … 2010… Simon Sp… 978144241… AV Starflee… 31 52547 2.96e5
#> 4 Starf… Rick … 2011… Simon Sp… 978144241… AV Starflee… 13 39495 2.33e5
#> 5 Starf… Alan … 2012… Simon Sp… 978144242… AV Starflee… 30 62030 3.50e5
#> 6 Star … Alan … 2013… Gallery … 978147671… AV NA 17 77438 5.37e5
#> 7 Capta… James… 1998… Pocket B… 978143910… CT NA 21 95110 5.55e5
#> 8 Capta… Macke… 1998… Pocket B… 978074345… CT NA 26 76392 4.25e5
#> 9 Capta… Chris… 1998… Pocket B… 978143910… CT NA 34 78678 4.43e5
#> 10 The C… John … 2000… Pocket B… 978074340… CT NA 176 436682 2.47e6
#> # ℹ 773 more rows
#> # ℹ 1 more variable: dedication <chr>
Obviously, verbatim licensed book content itself cannot be shared, so
it is not possible to provide capability in rtrek to enable analysts to
perform their own unique text mining analyses on Star Trek novel
corpora. However, future versions of rtrek will include more summary
datasets that will aim to represent more interesting variables.
A few examples could be:
or any other set of interesting metrics that could be used, for example, to inform suggested reading lists of various titles, or books by particular authors with a favored style or focus.