A typical flyover workflow has the following steps:
This vignette introduces alternative plotting functions for step 2. While flyover has some built-in plotting functions ready to use, there will be times when you need something customized. You can write your own plotting function to pass into the build_plots function.
The data used in this example is the same as in the previous vignettes.
str(my_data)
## tibble [200 × 10] (S3: tbl_df/tbl/data.frame)
## $ source: chr [1:200] "old" "old" "old" "old" ...
## $ norm : num [1:200] 1.586 1.709 0.891 0.547 1.606 ...
## $ exp : num [1:200] 0.1772 0.0858 0.3273 0.5311 2.5615 ...
## $ chisq : num [1:200] 4.53 6.54 4.92 3.55 2.71 ...
## $ lnorm : num [1:200] 1.024 0.619 1.016 4.919 1.054 ...
## $ gamma : num [1:200] 0.866 0.89 0.549 1.02 0.948 ...
## $ alpha : chr [1:200] "c" "b" "a" "d" ...
## $ hilo : chr [1:200] "high" "low" "low" "low" ...
## $ tf : logi [1:200] TRUE FALSE TRUE TRUE FALSE TRUE ...
## $ fruit : chr [1:200] "pear" "pear" "apple" "pear" ...
flyover plotting functions
The built-in plotting functions that ship with this package are little more than wrappers for ggplot2 plots. For example, this is the body of the function flyover_histogram:
## function (tbl, var, group_var, ...)
## {
## ggplot(tbl, aes_string(x = var, fill = group_var)) + geom_histogram(position = "identity",
## alpha = 0.5, ...) + theme_minimal(base_size = 14)
## }
## <bytecode: 0x55c448f35e30>
## <environment: namespace:flyover>
Important features to note are the following:
- the first argument is a data frame, tbl, containing all the data to plot;
- var and group_var are handed to aes_string, so they hold column names as strings;
- additional arguments (...) provide further modification to the plot. These modifications can be specified in the call to build_plots and are passed down to these plotting functions;
- the function returns a ggplot2 plot.

When you write a custom plotting function, this is the general structure you should follow for compatibility with the workings of build_plots. Of course, you are free to use any themes or other customizations you like.
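As a minimal sketch of that structure, the function below draws overlaid density curves. The name custom_density and the toy data frame are hypothetical and not part of flyover. The sketch uses aes() with the .data pronoun instead of aes_string(), which assumes ggplot2 3.0.0 or later.

```r
library(ggplot2)

# Hypothetical custom plotting function (not part of flyover).
# It mirrors the signature of the built-in functions:
#   tbl       - data frame holding all the data to plot
#   var       - name of the column to plot, as a string
#   group_var - name of the grouping column, as a string
#   ...       - extra arguments forwarded to the geom
custom_density <- function(tbl, var, group_var, ...) {
  ggplot(tbl, aes(x = .data[[var]], fill = .data[[group_var]])) +
    geom_density(alpha = 0.5, ...) +
    labs(x = var, fill = group_var) +
    theme_minimal(base_size = 14)
}

# Toy data standing in for my_data, so the function can be tried on its own.
set.seed(1)
toy <- data.frame(
  source = rep(c("old", "new"), each = 50),
  norm   = c(rnorm(50), rnorm(50, mean = 1))
)

# The adjust argument travels through ... to geom_density().
p <- custom_density(toy, "norm", "source", adjust = 2)
```

Calling the function directly on a small data frame like this is a quick way to check it before handing it to build_plots.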
Let’s say we wish to compare distributions not through the use of histograms, but with empirical cumulative distribution functions (ECDFs). After some research on the ggplot2 website, we find the right syntax.
We can define the function for use in flyover as follows:
custom_ecdf <- function(tbl, var, group_var, ...) {
  ggplot(tbl, aes_string(x = var, color = group_var)) +
    stat_ecdf(size = 2, ...) +
    theme_minimal(base_size = 14)
}
We can now pass the custom plot function to build_plots in the same way as a built-in flyover function, with one important addition. The build_plots function doesn’t have any knowledge about what kind of data our custom plot operates on. We need to supply this knowledge via the keep_type argument, which tells build_plots which type of variables to keep before plotting. You must pass either "numeric" or "categorical". Since custom_ecdf operates on numeric data, that’s what we choose.
my_ecdf <- build_plots(my_data, custom_ecdf, "source", keep_type = "numeric")
build_display(my_ecdf,
              display_name = "custom ECDF",
              output_dir = "display-ecdf")
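For comparison, a custom function for categorical columns could look like the sketch below. The name custom_bar and the toy data are hypothetical. The build_plots call is left commented out and simply mirrors the ECDF call with keep_type = "categorical".

```r
library(ggplot2)

# Hypothetical custom function for categorical columns (not part of flyover):
# side-by-side bar counts of each level, one fill colour per group.
custom_bar <- function(tbl, var, group_var, ...) {
  ggplot(tbl, aes(x = .data[[var]], fill = .data[[group_var]])) +
    geom_bar(position = "dodge", ...) +
    labs(x = var, fill = group_var) +
    theme_minimal(base_size = 14)
}

# Toy data standing in for my_data.
toy <- data.frame(
  source = rep(c("old", "new"), each = 4),
  fruit  = c("pear", "pear", "apple", "pear", "apple", "apple", "pear", "apple")
)

# Try the function on its own first.
p <- custom_bar(toy, "fruit", "source")

# With flyover loaded, the call would mirror the ECDF example:
# my_bars <- build_plots(my_data, custom_bar, "source", keep_type = "categorical")
```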
When you write a custom plotting function, you are free to transform the data however you like prior to plotting. This is the strategy employed by, e.g., flyover_na_percent to calculate the percent of missing values. Note that your function must still operate on a single column at a time to remain compatible with the framework of the package.
## function (tbl, var, group_var, ...)
## {
## tbl_grp <- dplyr::group_by_at(tbl, group_var)
## summary <- dplyr::summarize_at(tbl_grp, vars(var), flyover:::percent_na)
## ggplot(summary, aes_string(x = group_var, y = var, group = 1)) +
## geom_point(stat = "summary", fun = sum) + geom_line(stat = "summary",
## fun = sum, ...) + theme_minimal(base_size = 14) + scale_y_continuous(limits = c(0,
## 100)) + labs(x = group_var, y = "Percent NA")
## }
## <bytecode: 0x55c4492b9ef0>
## <environment: namespace:flyover>
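The same transform-then-plot pattern works for other per-column summaries. The sketch below computes the median of one column within each group before plotting. The name custom_median is hypothetical, and base R's aggregate() is used instead of dplyr only to keep the example light on dependencies.

```r
library(ggplot2)

# Hypothetical custom function (not part of flyover): summarize first, then plot.
custom_median <- function(tbl, var, group_var, ...) {
  # Median of the single column var within each level of group_var.
  summary_tbl <- aggregate(
    tbl[[var]],
    by = list(tbl[[group_var]]),
    FUN = median, na.rm = TRUE
  )
  names(summary_tbl) <- c(group_var, var)

  ggplot(summary_tbl, aes(x = .data[[group_var]], y = .data[[var]])) +
    geom_col(...) +
    labs(x = group_var, y = paste("Median of", var)) +
    theme_minimal(base_size = 14)
}

# Toy data standing in for my_data; one missing value is dropped by na.rm.
toy <- data.frame(
  source = rep(c("old", "new"), each = 3),
  exp    = c(1, 2, 3, 10, 20, NA)
)
p <- custom_median(toy, "exp", "source")
```

Because the summary is computed inside the function, it still receives one column name at a time, so it keeps the tbl, var, group_var, ... signature used throughout this vignette.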