These examples explore the word counts of published Star Trek novels
from 1967 through 2017 based on the stBooks
dataset.
First, load packages to assist with data manipulation and plotting.
library(dplyr)
library(lubridate)
library(ggplot2)
library(ggrepel)
library(gridExtra)
library(rtrek) # use >= v0.2.1 for more accurate word count data
Look for trends in word count through time for select series: The
Next Generation, Deep Space Nine, and Voyager. Mutating the
date
and nword
columns facilitates a better
plot. Retain outliers in general, but drop any titles containing the
word “Omnibus” because these are known to be larger books containing
multiple individual novels. Inspect the data.
keep_series <- c("TNG", "DS9", "VOY")
x <- mutate(stBooks, date = decimal_date(as.Date(date)), nword = nword / 1000) %>%
filter(series %in% keep_series & !grepl("Omnibus", title))
arrange(x, nword)
#> # A tibble: 266 × 11
#> title author date publisher identifier series subseries nchap nword nchar
#> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <int> <dbl> <int>
#> 1 Slings… Terri… 2008. Simon an… 1416550240 TNG NA 8 10.7 63356
#> 2 Slings… Keith… 2008. Simon an… 978141655… TNG NA 11 18.9 112768
#> 3 First … John … 1997. Pocket B… 978074345… TNG NA 13 23.6 137531
#> 4 Slings… Willi… 2008. Simon an… 1416550224 TNG NA 8 23.8 140719
#> 5 Typhon… Chris… 2012. Gallery … 978145165… TNG Typhon P… 14 25.3 149530
#> 6 Lust's… Paula… 2015. Pocket B… 978147677… DS9 NA 16 25.5 188622
#> 7 Slings… Rober… 2008. Simon an… 978141655… TNG NA 7 25.8 153414
#> 8 Q are … Rudy … 2015. Pocket B… 978147677… TNG NA 10 27.2 192705
#> 9 Slings… J. St… 2008. Pocket B… 978141655… TNG NA 12 27.6 159115
#> 10 The St… James… 2013. Pocket B… 978145169… TNG NA 5 29.2 212328
#> # ℹ 256 more rows
#> # ℹ 1 more variable: dedication <chr>
Create a plot, separating each series in a different panel rather than using color to differentiate them.
clrs <- c("cornflowerblue", "orange", "purple")
ggplot(x, aes(date, nword, color = series, fill = series)) +
geom_point(color = "black", shape = 21, size = 3) +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Star Trek novel word count", subtitle = "By publication date and selected series",
x = "Publication date", y = "Word count (Thousand words)") +
theme_minimal() + scale_color_manual(values = clrs) + scale_fill_manual(values = clrs) +
scale_x_continuous(breaks = seq(1987, 2018, by = 3) + 0.5, labels = seq(1987, 2018, by = 3))
Look at the marginal change through time in average word count (pool all three series) with a simple linear model.
summary(lm(nword ~ date, data = x))
#>
#> Call:
#> lm(formula = nword ~ date, data = x)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -73.630 -13.805 -3.222 10.338 121.988
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -1705.3048 446.1422 -3.822 0.000165 ***
#> date 0.8912 0.2229 3.998 8.3e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 27.17 on 264 degrees of freedom
#> Multiple R-squared: 0.05709, Adjusted R-squared: 0.05352
#> F-statistic: 15.98 on 1 and 264 DF, p-value: 8.298e-05
There is about a 40% increase in average word count per novel across the three series from the the first TNG novel in 1987 through 2017. In the case of the Voyager novels, however, the average word count approximately doubles and does so over a short production run. It is worth noting that as word count has trended upward noticeably, novels from these series have also been published less frequently. If you have read many of these book in paperback form, both the oldest and the more recent ones, this should not be a surprising result. Many of the newer releases have notably more pages and smaller font than the novels from the earlier years.
In this next example, consider all books from all series available in
stBooks
, with the exception of Omnibus editions and those
falling into the reference category. This example looks at cumulative
total word count over time for all the selected books. For plot labeling
purposes, this time divide the word count by one million.
x <- mutate(stBooks, date = decimal_date(as.Date(date)), nword = nword / 1e6) %>%
filter(series != "REF" & !grepl("Omnibus", title)) %>%
arrange(date) %>%
mutate(total_words = cumsum(nword))
x
#> # A tibble: 732 × 12
#> title author date publisher identifier series subseries nchap nword nchar
#> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <int> <dbl> <int>
#> 1 Banta… James… 1967 Amereon … 978084880… TOS NA NA 0.0411 235524
#> 2 Banta… James… 1968. Amereon … 978084880… TOS NA NA 0.0405 232094
#> 3 Banta… James… 1969. Bantam B… 978055312… TOS NA NA 0.0389 224369
#> 4 Banta… James… 1970. Bantam B… 978055310… TOS NA 9 0.0361 207001
#> 5 Banta… James… 1971. Bantam B… 978055312… TOS NA NA 0.0416 239859
#> 6 Banta… James… 1972. Bantam B… 978055314… TOS NA NA 0.0430 247985
#> 7 Banta… James… 1972. Bantam B… 978055313… TOS NA NA 0.0428 248923
#> 8 Banta… James… 1972. Bantam B… 978055313… TOS NA NA 0.0464 268309
#> 9 Banta… James… 1973. Bantam B… 978055312… TOS NA NA 0.0533 307648
#> 10 Banta… James… 1974. Bantam B… 978055312… TOS NA NA 0.0533 314575
#> # ℹ 722 more rows
#> # ℹ 2 more variables: dedication <chr>, total_words <dbl>
The plot will be labeled with series abbreviations, so a key is needed for clarity. In order to label points on the plot marking the onset of a new novel series, take the first entry in the data frame for each series. A bit of theme customization is required for the table grid object that will display the key as an inset figure.
tab_theme <- gridExtra::ttheme_default(
core = list(fg_params = list(cex = 0.5), padding = unit(c(2, 2), "mm")),
colhead = list(fg_params = list(cex = 0.5)))
series <- group_by(x, series) %>% slice(1) %>% ungroup() %>%
inner_join(stSeries, by = c("series" = "abb")) %>%
select(series, id, date, total_words)
series
#> # A tibble: 20 × 4
#> series id date total_words
#> <chr> <chr> <dbl> <dbl>
#> 1 AV Abramsverse 2009. 41.4
#> 2 CT Tales from the Captain's Table Anthology 1998. 17.4
#> 3 DS9 Deep Space Nine 1993. 8.97
#> 4 DSC Discovery 2018. 52.1
#> 5 ENT Enterprise 2002. 24.5
#> 6 KE Klingon Empire 2004. 30.1
#> 7 MISC Miscellaneous 1995. 11.7
#> 8 NF New Frontier 1997. 15.7
#> 9 SCE Starfleet Corps of Engineers 2001. 22.2
#> 10 SGZ Stargazer 2002. 25.6
#> 11 SKR Seekers 2015. 48.1
#> 12 SNW Strange New Worlds Anthology 1998. 17.5
#> 13 ST All-Series/Crossover 2000. 20.6
#> 14 SV Shatnerverse 1998. 17.2
#> 15 TLE The Lost Era 2004. 28.5
#> 16 TNG The Next Generation 1988. 4.01
#> 17 TOS The Original Series 1967 0.0411
#> 18 TTN Titan 2005. 34.0
#> 19 VAN Vanguard 2006. 34.7
#> 20 VOY Voyager 1996. 13.2
key <- bind_rows(series, tibble(series = "YA-", id = "Young adult books")) %>%
mutate(id = gsub(" Anthology", "", id)) %>%
select(1:2) %>%
setNames(c("Label", "Onset of book series")) %>%
tableGrob(rows = NULL, theme = tab_theme)
Finally, create the plot.
brks <- c(1967, seq(1970, 2015, by = 5), 2018)
ggplot(x, aes(date, total_words)) +
geom_step(size = 1, color = "gray30") +
geom_point(data = series, shape = 21, fill = "#FF3030", size = 3) +
geom_label_repel(data = series, aes(label = series), color = "white", fill = "#FF3030CC",
segment.color = "black", fontface = "bold", label.size = NA) +
labs(title = "Star Trek novel cumulative word count",
subtitle = "By publication date, excluding reference and omnibus titles",
x = "Publication date",
y = "Cumulative word count (Million words)") +
theme_minimal() +
scale_x_continuous(expand = c(0, 0), breaks = brks + 0.5, labels = brks) +
scale_y_continuous(expand = c(0, 0)) +
coord_cartesian(xlim = c(1966.5, 2019)) +
annotation_custom(key, xmin = 2007, ymax = 35)
The golden age of licensed Star Trek novel publishing is clear. The 1990s and 2000s saw the creation of many new series as well of regular publication of new novels from existing ones. The current trajectory appears similar to the early 1990s, at least in terms of total words written. Data up to date only through 2017.