Sunday, December 6, 2020

Colorizing points in a base R plot

By default, base \(R\)’s plot draws points as hollow circles. These are perfectly adequate for a single data set, but less so for multivariate data because the edges are too thin for color to stand out well. My go-to option: set the pch argument to 16 and the col argument to the color of my choice.

Background

pch is the argument that specifies the shape of a point in a plot. The three basic selections for a circle shape are:

pch   colorizing options
1     (the default when pch is not given) the edge color can be changed but not the interior
16    (a so-called “solid circle”) the whole point takes a single color; the edge cannot be colored separately
21    (a so-called “filled circle”) the interior and the edge can be different

The pch=16 and pch=21 colorizing options apply to other shapes that also fall into their respective “pch groups”: 15-20 for solid shapes and 21-25 for filled shapes. To see which shapes correspond to which pch value, check out help("points") as well as many posts on the web such as this one.
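As a quick way to see the pch groups without leaving the console, here is a minimal sketch that draws every pch value from 0 to 25 with its number underneath. The grid layout and the variable names (pch_values, x, y) are my own choices for illustration; the colors are the green and magenta used later in this post.

```r
# Draw one point for each pch value 0..25, labelled with its number.
# col colors every shape; bg only shows up for the filled shapes 21:25.
pch_values <- 0:25
x <- (pch_values %% 9) + 1     # column position, 9 shapes per row
y <- 3 - (pch_values %/% 9)    # row position, top row first

plot(x, y, pch = pch_values, cex = 2,
     col = "#228833", bg = "#AA3377",
     xlim = c(0.5, 9.5), ylim = c(0.5, 3.5),
     axes = FALSE, xlab = "", ylab = "", main = "pch 0 to 25")
text(x, y, labels = pch_values, pos = 1, offset = 1)
```

Only shapes 21 through 25 should show a magenta interior; every other shape ignores bg.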

Circles and squares are illustrated on light and dark backgrounds below.[1] IMO, points “pop” from their interior, not from their edge.[2]

Notice how

  • For the default (pch == 1)
    • the interior color cannot be changed and is always transparent so the background always shows through
    • the default edge color is black so the point virtually disappears on a dark background
  • For solid shapes (pch in 15:20)
    • col specifies both the interior and edge colors, necessarily the same
    • bg – specified or not – has no impact
  • For filled shapes (pch in 21:25)
    • col specifies the edge color
    • bg specifies the interior color, defaulting to “transparent” if unspecified[3]
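The behavior in the list above can be checked with a short sketch that draws circles (pch 1, 16, 21) and squares (pch 0, 15, 22) on a light and a dark panel. The helper name compare_pch and its arguments are hypothetical, made up for this illustration. Every point gets the same col and bg so the differences come from pch alone; note that this overrides the default black edge for pch = 1.

```r
# Hypothetical helper: draw six shapes on a panel painted with panel_bg.
compare_pch <- function(panel_bg, label_col) {
  pch_values <- c(1, 16, 21, 0, 15, 22)   # circles first, then squares
  n <- length(pch_values)
  plot(seq_len(n), rep(1, n), type = "n", axes = FALSE,
       xlab = "", ylab = "", xlim = c(0.5, n + 0.5), ylim = c(0.5, 1.5))
  usr <- par("usr")                        # paint the plot region
  rect(usr[1], usr[3], usr[2], usr[4], col = panel_bg, border = NA)
  points(seq_len(n), rep(1, n), pch = pch_values, cex = 3,
         col = "#228833", bg = "#AA3377")  # col = edge/solid, bg = fill
  text(seq_len(n), 0.75, labels = paste("pch", pch_values), col = label_col)
  invisible(pch_values)
}

old_par <- par(mfrow = c(2, 1), mar = c(1, 1, 2, 1))
compare_pch("white", "black")    # light background
compare_pch("gray10", "white")   # dark background
par(old_par)
```

On both panels, only pch 21 and 22 should show the magenta bg, while pch 1 and 0 let the panel color show through.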

Example

Here is a bivariate example using the mtcars dataset in \(R\).[4]

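green <- "#228833"
magenta <- "#AA3377"
# build plot title -- see stackoverflow citation in footnote
# `in` is a reserved word in R, so the unit is assembled from "i" plus n^3
title_text <- quote(paste("miles per gallon vs displacement (i"))
title_unit <- quote(n^3)
title_close <- quote(")")
main_title <- substitute(a * b * c, list(a = title_text, b = title_unit, c = title_close))
with(mtcars, 
     plot(disp, mpg
          , pch = 16
          # vs is 0 for v-engines and 1 for straight-blocks, so vs + 1 picks the color
          , col = c(green, magenta)[vs + 1]
          , main = main_title
          )
     )
legend("topright", c("v-engine", "straight-block"), col = c(green, magenta), pch = 16)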

Not only do smaller engines get better gas mileage, but high-displacement straight-blocks are absent from this sample of 1973-74 models.
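Indexing a color vector by position works, but it silently depends on the coding of vs. As an alternative sketch (not the approach used above), a named color vector keeps each label and its color together so the plot and the legend cannot drift apart. The names engine_colors, engine and point_col are my own.

```r
# Named lookup: each engine label carries its own color.
engine_colors <- c("v-engine" = "#228833", "straight-block" = "#AA3377")

# In mtcars, vs is 0 for V-shaped engines and 1 for straight engines.
engine <- factor(mtcars$vs, levels = c(0, 1), labels = names(engine_colors))
point_col <- unname(engine_colors[as.character(engine)])

plot(mtcars$disp, mtcars$mpg, pch = 16, col = point_col,
     xlab = "disp", ylab = "mpg")
legend("topright", legend = names(engine_colors),
       col = engine_colors, pch = 16)
```

The legend is built from the same vector that colors the points, so adding or reordering a group means editing one line.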

Bottom line

Use base \(R\)’s default black circles to quickly visualize sequential data.

For colored circles use pch = 16 and col = color_of_your_choice.

Use pch = 21 when it is useful to differentiate a point’s edge from its interior.

Or try pch = "." for dots when you have many points but don’t want lines.
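To make the last two suggestions concrete, here is a minimal sketch. The first plot uses pch = 21 to show two variables per point: interior for engine shape (vs) and edge for transmission (am). The choice of am and of the black and gray edge colors is my own, for illustration only. The second plot uses pch = "." on simulated data, which is an assumption standing in for any large data set.

```r
# pch = 21: bg colors the interior, col colors the edge.
interior <- c("#228833", "#AA3377")[mtcars$vs + 1]   # engine shape
edge <- c("black", "gray60")[mtcars$am + 1]          # transmission
plot(mtcars$disp, mtcars$mpg, pch = 21, bg = interior, col = edge,
     lwd = 2, cex = 1.5, xlab = "disp", ylab = "mpg")

# pch = ".": small dots keep a large cloud of points readable.
set.seed(1)                    # simulated data, for illustration only
n <- 10000
x <- rnorm(n)
y <- x + rnorm(n)
plot(x, y, pch = ".")
```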

Postscript

The complementary colors green (#228833) and magenta (#AA3377) used in this post come from Paul Tol’s color-blind friendly bright color palette.[5] Tol’s “Notes” page is worth visiting for other helpful colorizing advice. This non-color-blind author would be interested in reader feedback regarding the distinguishability of the colors used in this post.


The end of business today, 12/06/2020, marks 251.194 months since the end of the last millennium.
Generated with Rmarkdown in RStudio.


R Environment

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mondate_0.10.01.02

loaded via a namespace (and not attached):
 [1] compiler_4.0.3  magrittr_1.5    tools_4.0.3     htmltools_0.5.0
 [5] yaml_2.2.1      stringi_1.4.6   rmarkdown_2.3   knitr_1.28     
 [9] stringr_1.4.0   xfun_0.14       digest_0.6.25   rlang_0.4.6    
[13] evaluate_0.14  

  1. Regarding the color of the default background, the quoted phrases are from \(R\) help pages: “normally "white"” from help("par"), “often transparent” from help("frame")

  2. Called “border” in \(R\)

  3. Per help("par"): “For many devices the initial value [of the plot background] is set from the bg argument of the device, and for the rest it is normally "white".”

  4. Technique for superscript in title from https://stackoverflow.com/questions/34193276/concatenate-several-math-expressions-in-r

  5. For additional perspectives on color-impaired visualizations, see https://thenode.biologists.com/data-visualization-with-flying-colors/research/ and https://venngage.com/blog/color-blind-friendly-palette/
