--- title: "Labeling small strata" author: "Jason Cory Brunson" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{labeling small strata} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ## Setup This brief vignette uses the `vaccinations` dataset included in {ggalluvial}. As in [the technical introduction](http://corybrunson.github.io/ggalluvial/articles/ggalluvial.html), the order of the levels is reversed to be more intuitive. Objects from other {ggplot2} extensions are accessed via `::` and `:::`. ```{r setup} knitr::opts_chunk$set(fig.width = 6, fig.height = 4, fig.align = "center") library(ggalluvial) data(vaccinations) vaccinations <- transform(vaccinations, response = factor(response, rev(levels(response)))) ``` ## Problem The issue on the table: Strata are most helpful when they're overlaid with text labels. Yet the strata often vary in height, and the labels in length, to such a degree that fitting the text inside the strata at a uniform size renders them illegible. In principle, the user could treat `size` as a variable aesthetic and manually fit text to strata, but this is cumbersome, and doesn't help anyway in cases where large text is needed. To illustrate the problem, check out the plot below. It's by no means an egregious case, but it'll do. (For a more practical example, see [this question on StackOverflow](https://stackoverflow.com/questions/50720718/labelling-and-theme-of-ggalluvial-plot-in-r), which prompted this vignette.) ```{r raw} ggplot(vaccinations, aes(x = survey, stratum = response, alluvium = subject, y = freq, fill = response, label = response)) + scale_x_discrete(expand = c(.1, 0)) + geom_flow(width = 1/4) + geom_stratum(alpha = .5, width = 1/4) + geom_text(stat = "stratum", size = 4) + theme(legend.position = "none") + ggtitle("vaccination survey responses", "labeled using `geom_text()`") ``` ### Fix One option is to simply omit those labels that don't fit within their strata. In response to [an issue](https://github.com/corybrunson/ggalluvial/issues/27), `v0.9.2` includes parameters in `stat_stratum()` to exclude strata outside a specified height range; while few would use this to omit the rectangles themselves, it can be used in tandem with `geom_text()` to shirk this problem, at least when the labels are concise: ```{r omit} ggplot(vaccinations, aes(x = survey, stratum = response, alluvium = subject, y = freq, fill = response, label = response)) + scale_x_discrete(expand = c(.1, 0)) + geom_flow(width = 1/4) + geom_stratum(alpha = .5, width = 1/4) + geom_text(stat = "stratum", size = 4, min.y = 100) + theme(legend.position = "none") + ggtitle( "vaccination survey responses", "labeled using `geom_text()` with `min.y = 100`" ) ``` This is a useful fix for some cases. Still, if the goal is a publication-ready graphic, then it reaffirms the need for more adaptable and elegant solutions. Fortunately, two wonderful packages deliver with, shall we say, flowing colors. ## Solutions Two {ggplot2} extensions are well-suited to this problem: [{ggrepel}](https://github.com/slowkow/ggrepel) and [{ggfittext}](https://github.com/wilkox/ggfittext). They provide new geom layers that use the output of existing stat layers to situate text: `ggrepel::geom_text_repel()` takes the same aesthetics as `ggplot2::geom_text()`, namely `x`, `y`, and `label`. In contrast, `ggfittext::geom_fit_text()` only specifically requires `label` but also needs enough information to determine the rectangle that will contain the text. This can be encoded as `xmin` and `xmax` or as `x` and `width` for the horizontal direction, and as `ymin` and `ymax` or as `y` and `height` for the vertical direction. Conveniently, `ggalluvial::stat_stratum()` produces more than enough information for both geoms, including `x`, `xmin`, `xmax`, and their vertical counterparts. All this can be gleaned from the `ggproto` objects that construct the layers: ```{r aesthetics} print(ggrepel::GeomTextRepel$required_aes) print(ggfittext:::GeomFitText$required_aes) print(ggfittext:::GeomFitText$setup_data) print(StatStratum$compute_panel) ``` I reached the specific solutions through trial and error. They may not be the best tricks for most cases, but they demonstrate what these packages can do. For many more examples, see the respective package vignettes: [for {ggrepel}](https://CRAN.R-project.org/package=ggrepel/vignettes/ggrepel.html), and [for {ggfittext}](https://CRAN.R-project.org/package=ggfittext/vignettes/introduction-to-ggfittext.html). ### Solution 1: {ggrepel} {ggrepel} is most often (in my experience) used to repel text away from symbols in a scatterplot, in whatever directions prevent them from overlapping the symbols and each other. In this case, however, it makes much more sense to align them vertically a fixed horizontal distance (`nudge_x`) away from the strata and repel them vertically from each other (`direction = "y"`) just enough to print them without overlap. It takes an extra bit of effort to render text _only_ for the strata at the first (or at the last) axis, but the result is worth it. ```{r ggrepel} ggplot(vaccinations, aes(x = survey, stratum = response, alluvium = subject, y = freq, fill = response)) + scale_x_discrete(expand = c(.4, 0)) + geom_flow(width = 1/4) + geom_stratum(alpha = .5, width = 1/4) + scale_linetype_manual(values = c("blank", "solid")) + ggrepel::geom_text_repel( aes(label = ifelse(as.numeric(survey) == 1, as.character(response), NA)), stat = "stratum", size = 4, direction = "y", nudge_x = -.5 ) + ggrepel::geom_text_repel( aes(label = ifelse(as.numeric(survey) == 3, as.character(response), NA)), stat = "stratum", size = 4, direction = "y", nudge_x = .5 ) + theme(legend.position = "none") + ggtitle("vaccination survey responses", "labeled using `geom_text_repel()`") ``` ### Solution 2: {ggfittext} {ggfittext} is simplicity itself: The strata are just rectangles, so no more parameter specifications are necessary to fit the text into them. One key parameter is `min.size`, which defaults to `4` and controls how small the text is allowed to get without being omitted. ```{r ggfittext} ggplot(vaccinations, aes(x = survey, stratum = response, alluvium = subject, y = freq, fill = response, label = response)) + scale_x_discrete(expand = c(.1, 0)) + geom_flow(width = 1/4) + geom_stratum(alpha = .5, width = 1/4) + ggfittext::geom_fit_text(stat = "stratum", width = 1/4, min.size = 3) + theme(legend.position = "none") + ggtitle("vaccination survey responses", "labeled using `geom_fit_text()`") ``` Note that this solution requires {ggfittext} v0.6.0. ## Appendix ```{r session info} sessioninfo::session_info() ```