Bar plots display categorical variables as bars whose heights reflect the count of records in each category or some other summary value. Stacked and grouped bar plots subdivide each bar by a second categorical variable, allowing for deeper investigation of relationships in the data.
#Rprogramming #Datavisualization #ggplot2
Code used in this clip:
Load the tidyverse (which includes ggplot2 and the pipe) and the data
library(tidyverse)
data <- diamonds
colors <- c("#FFFFFF", "#F5FCC2", "#E0ED87", "#CCDE57",
            "#B3C732", "#94A813", "#718200")
Stacked barplot in base R
barplot(table(diamonds$color, diamonds$clarity),
        col = colors,
        legend.text = levels(diamonds$color))
Side by side barplot in base R
barplot(table(diamonds$color, diamonds$clarity),
        beside = TRUE,
        col = colors,
        legend.text = levels(diamonds$color))
Stacked barplot in ggplot2
data %>% ggplot(aes(x = clarity, fill = color)) +
  geom_bar(color = "black") +
  scale_fill_manual(values = colors)
Side by side barplot in ggplot2
data %>% ggplot(aes(x = clarity, fill = color)) +
  geom_bar(color = "black", position = "dodge") +
  scale_fill_manual(values = colors)
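Side by side barplot of a summary value (not shown in the clip)
The plots above all use counts, but bar heights can also show some other summary value. Here is a minimal sketch of that, assuming mean price is the summary you want. The mean price choice and the names mean_price, avg_price and summary_plot are mine for illustration only. It uses geom_col(), which takes the bar height from a y column instead of counting rows.

```r
library(tidyverse)

# Same 7 fill colors as above, one per level of diamonds$color
colors <- c("#FFFFFF", "#F5FCC2", "#E0ED87", "#CCDE57",
            "#B3C732", "#94A813", "#718200")

# Assumption for illustration: summarize each clarity/color group by mean price
mean_price <- diamonds %>%
  group_by(clarity, color) %>%
  summarise(avg_price = mean(price), .groups = "drop")

# geom_col() draws bars with heights taken from the y aesthetic
summary_plot <- ggplot(mean_price,
                       aes(x = clarity, y = avg_price, fill = color)) +
  geom_col(color = "black", position = "dodge") +
  scale_fill_manual(values = colors)

summary_plot
```

Dropping position = "dodge" would stack the bars instead, but stacked means are hard to interpret, so side by side is probably the safer choice for a summary like this.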
Code Clips are basic code explanations in 3 minutes or less. They are intended to be short reference guides that provide quick breakdowns and copy/paste access to code needed to accomplish common data science tasks. Think Stack Overflow with a video explanation.
* Note: YouTube does not allow greater than or less than symbols in the text description, so the code above may not be exactly the same as the code shown in the video! For R that means I may use = for assignment and the special Unicode large < and > symbols in place of the standard sized ones for dplyr pipes and comparisons. These special symbols should work as expected for R code on Windows, but may need to be replaced with standard greater than and less than symbols for other operating systems.