What this article covers

A typical flyover workflow has the following steps:

  1. Combine different data sets into a single table.
  2. Apply a plotting function to the columns of the table.
  3. Build a display to navigate the plots.

This vignette introduces alternative plotting functions for step 2. flyover ships with several built-in plotting functions, but sometimes you will need something customized. In that case, you can write your own plotting function and pass it to build_plots.

Data for this example

The data used in this example is the same as in the previous vignettes.

str(my_data)
## tibble [200 × 10] (S3: tbl_df/tbl/data.frame)
##  $ source: chr [1:200] "old" "old" "old" "old" ...
##  $ norm  : num [1:200] 1.586 1.709 0.891 0.547 1.606 ...
##  $ exp   : num [1:200] 0.1772 0.0858 0.3273 0.5311 2.5615 ...
##  $ chisq : num [1:200] 4.53 6.54 4.92 3.55 2.71 ...
##  $ lnorm : num [1:200] 1.024 0.619 1.016 4.919 1.054 ...
##  $ gamma : num [1:200] 0.866 0.89 0.549 1.02 0.948 ...
##  $ alpha : chr [1:200] "c" "b" "a" "d" ...
##  $ hilo  : chr [1:200] "high" "low" "low" "low" ...
##  $ tf    : logi [1:200] TRUE FALSE TRUE TRUE FALSE TRUE ...
##  $ fruit : chr [1:200] "pear" "pear" "apple" "pear" ...
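
To follow along without the original data, a simulated stand-in with the same column names and types can be built as sketched below. Everything about the simulation is an assumption: the distribution parameters, the category levels, the second source label "new", and the even 100/100 split are made up for illustration and are not the values behind my_data. The helper simulate_batch is hypothetical, and the result is a plain data.frame rather than a tibble.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical helper: simulate one batch of rows with the nine data columns.
simulate_batch <- function(n = 100) {
  data.frame(
    norm  = rnorm(n, mean = 1),
    exp   = rexp(n),
    chisq = rchisq(n, df = 4),
    lnorm = rlnorm(n),
    gamma = rgamma(n, shape = 2, rate = 2),
    alpha = sample(letters[1:4], n, replace = TRUE),
    hilo  = sample(c("high", "low"), n, replace = TRUE),
    tf    = sample(c(TRUE, FALSE), n, replace = TRUE),
    fruit = sample(c("apple", "pear"), n, replace = TRUE),
    stringsAsFactors = FALSE
  )
}

old <- simulate_batch()
new <- simulate_batch()

# Step 1 of the workflow: stack the batches and record where each row came from.
toy_data <- rbind(
  cbind(source = "old", old, stringsAsFactors = FALSE),
  cbind(source = "new", new, stringsAsFactors = FALSE)
)
str(toy_data)
```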

Structure of flyover plotting functions

The built-in plotting functions that ship with this package are thin wrappers around ggplot2 plots. For example, this is the definition of flyover_histogram, as printed at the console:

## function (tbl, var, group_var, ...) 
## {
##     ggplot(tbl, aes_string(x = var, fill = group_var)) + geom_histogram(position = "identity", 
##         alpha = 0.5, ...) + theme_minimal(base_size = 14)
## }
## <bytecode: 0x55c448f35e30>
## <environment: namespace:flyover>

Important features to note are the following:

  • the first argument is the tbl containing all the data to plot;
  • the second and third arguments are character strings specifying the variable to plot and the grouping variable, respectively;
  • the dots (...) carry further modifications to the plot: extra arguments given in the call to build_plots are passed down to the plotting function (here, on to geom_histogram);
  • the plotting is performed by ggplot2.

When you write a custom plotting function, this is the general structure you should follow for compatibility with the workings of build_plots. Of course, you are free to use any themes or other customizations you like.

Defining a custom plot

Let’s say we wish to compare distributions not through the use of histograms, but with empirical cumulative distribution functions (ECDFs). After some research on the ggplot2 website, we find the right syntax.

We can define the function for use in flyover as follows:

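library(ggplot2)

# Custom flyover plotting function: one ECDF curve per level of group_var.
# var and group_var arrive as character strings, hence aes_string().
custom_ecdf <- function(tbl, var, group_var, ...) {
  ggplot(tbl, aes_string(x = var, color = group_var)) +
    stat_ecdf(size = 2, ...) +
    theme_minimal(base_size = 14)
}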
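
Since aes_string() has been soft-deprecated since ggplot2 3.0.0, the same function can also be written with the .data pronoun. The variant below is a minimal sketch rather than part of flyover: the name custom_ecdf_tidy and the toy data are hypothetical, and the fixed line size is left out so that the example does not depend on a particular ggplot2 version. The labs() call is added because the default titles would otherwise be derived from the .data expressions.

```r
library(ggplot2)

# Hypothetical variant of custom_ecdf that uses the .data pronoun
# instead of aes_string(); var and group_var are still strings.
custom_ecdf_tidy <- function(tbl, var, group_var, ...) {
  ggplot(tbl, aes(x = .data[[var]], color = .data[[group_var]])) +
    stat_ecdf(...) +
    labs(x = var, color = group_var) +  # readable axis and legend titles
    theme_minimal(base_size = 14)
}

# Toy data, made up for illustration.
set.seed(1)
toy <- data.frame(
  source = rep(c("old", "new"), each = 50),
  norm   = c(rnorm(50), rnorm(50, mean = 1))
)

# A direct call, outside of build_plots; pad is forwarded to stat_ecdf().
p <- custom_ecdf_tidy(toy, "norm", "source", pad = FALSE)
```

Calling the function directly on a small table like this is a quick way to check that it draws what you expect before handing it to build_plots.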

Using a custom plot

We can now pass the custom plot function to build_plots in the same way as a built-in flyover function, with one important addition. build_plots has no way of knowing what kind of data our custom plot operates on, so we must supply this via the keep_type argument, which tells build_plots which type of variable to keep before plotting. You must pass either "numeric" or "categorical". Since custom_ecdf operates on numeric data, that’s what we choose.

my_ecdf <- build_plots(my_data, custom_ecdf, "source", keep_type = "numeric")
build_display(my_ecdf,
              display_name = "custom ECDF",
              output_dir   = "display-ecdf")
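
To see why keep_type is needed, it can help to picture the selection and looping that has to happen before any plot is drawn. The following is a conceptual sketch only and is not flyover's implementation: sketch_build_plots and describe are hypothetical names, and treating every non-numeric column as "categorical" is an assumption made for the illustration.

```r
# Hypothetical stand-in for the column selection and looping; not flyover's code.
sketch_build_plots <- function(tbl, plot_fun, group_var,
                               keep_type = c("numeric", "categorical"), ...) {
  keep_type <- match.arg(keep_type)
  is_num <- vapply(tbl, is.numeric, logical(1))
  # Assumption: every non-numeric column counts as "categorical".
  keep <- if (keep_type == "numeric") is_num else !is_num
  vars <- setdiff(names(tbl)[keep], group_var)
  # One call per retained column; var and group_var are passed as strings,
  # and any extra arguments travel down through the dots.
  plots <- lapply(vars, function(v) plot_fun(tbl, v, group_var, ...))
  names(plots) <- vars
  plots
}

# Toy data, made up for illustration.
toy <- data.frame(
  source = c("old", "old", "new", "new"),
  norm   = c(1.6, 1.7, 0.9, 0.5),
  fruit  = c("pear", "pear", "apple", "pear")
)

# A trivial plot_fun that only reports what it was asked to draw.
describe <- function(tbl, var, group_var, ...) paste(var, "by", group_var)

res_num <- sketch_build_plots(toy, describe, "source", keep_type = "numeric")
res_cat <- sketch_build_plots(toy, describe, "source", keep_type = "categorical")
```

With keep_type = "numeric" only the norm column reaches the plotting function, and with keep_type = "categorical" only fruit does; the grouping column is never plotted against itself.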

Advanced customizations

When you write a custom plotting function, you are free to transform the data however you like prior to plotting. Note that you will still be required to operate on a single column at a time to remain compatible with the framework of the package. This is the strategy employed by, e.g., flyover_na_percent, which summarizes each column into the percent of missing values per group before plotting. Its printed definition follows:

## function (tbl, var, group_var, ...) 
## {
##     tbl_grp <- dplyr::group_by_at(tbl, group_var)
##     summary <- dplyr::summarize_at(tbl_grp, vars(var), flyover:::percent_na)
##     ggplot(summary, aes_string(x = group_var, y = var, group = 1)) + 
##         geom_point(stat = "summary", fun = sum) + geom_line(stat = "summary", 
##         fun = sum, ...) + theme_minimal(base_size = 14) + scale_y_continuous(limits = c(0, 
##         100)) + labs(x = group_var, y = "Percent NA")
## }
## <bytecode: 0x55c4492b9ef0>
## <environment: namespace:flyover>
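
The same summarize-then-plot pattern can be used in your own functions. As a minimal sketch, the hypothetical custom_group_mean below plots the per-group mean of a single column. It uses dplyr::across() with all_of(), which requires dplyr 1.0.0 or later, in place of the superseded group_by_at() and summarize_at(); the function name, the choice of geom_col, and the toy data are all assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical custom function: summarize one column per group, then plot.
custom_group_mean <- function(tbl, var, group_var, ...) {
  grouped <- dplyr::group_by(tbl, dplyr::across(dplyr::all_of(group_var)))
  summary_tbl <- dplyr::summarize(
    grouped,
    dplyr::across(dplyr::all_of(var), ~ mean(.x, na.rm = TRUE)),
    .groups = "drop"
  )
  ggplot(summary_tbl, aes(x = .data[[group_var]], y = .data[[var]])) +
    geom_col(...) +
    labs(x = group_var, y = paste("Mean of", var)) +
    theme_minimal(base_size = 14)
}

# Toy data, made up for illustration; the NA is dropped by na.rm = TRUE.
toy <- data.frame(
  source = c("old", "old", "new", "new"),
  norm   = c(1, 3, 10, NA)
)
p <- custom_group_mean(toy, "norm", "source")
```

Because the function still takes a table, one column name, and a grouping column name, it should fit the same calling pattern as custom_ecdf, with keep_type = "numeric".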