Blogging with RStudio - an example with the 2020 KSUCrops team data

1) Script description

Hey there.

The exercise below is intended as a demonstration of how to use RStudio to create blogs and blog posts.

Although we will be using RStudio as our platform, behind the scenes we will also be using the following software:

  • the blogdown package, which provides an R interface to Hugo, a static site generator

  • Netlify, the hosting service I chose to use (free!)

I want to thank Rachel Veenstra for gathering the data we will be using. Thanks, Rachel!

The intended audience is the 2020 KSUCrops team members, so I decided to use and explore our team’s demographic data.

The specific objectives of this script are to:

  1. Import a dataset from GitHub.

  2. Do some quick data wrangling before each plot.

  3. Create plots (using the ggplot2 package) showing how our team can be categorized by gender, title, and country of origin.

  4. Upload the new post and build the blog.

2) Setup

Here we load all the packages needed for this exercise.

If you haven’t installed the packages below yet, you will need to install them once before loading them.

To install a package, use the function install.packages("packagename").
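If you would rather not check each package by hand, a small helper can install only the ones that are missing. This is a minimal sketch: the function name `install_if_missing()` is hypothetical (it is not part of any package), and the default CRAN mirror URL is an assumption you can change.

```r
# Hypothetical helper (not part of any package): installs only the
# packages whose namespace cannot be loaded in the current R library.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  # TRUE for each package that is already available
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  # Return the names that needed installing, without printing them
  invisible(missing)
}
```

For this post, the call would be `install_if_missing(c("RCurl", "tidyverse", "ggthemes"))`; on later runs it finds everything installed and does nothing.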

2.1) Packages description

The RCurl package is needed to import the dataset we will work with from my personal GitHub account.

The tidyverse package is actually a collection of packages that share the same syntax conventions. It includes ggplot2, which is the package we will use the most in this exercise.

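# Setting global chunk options
knitr::opts_chunk$set(echo = TRUE, fig.path = "static")

# tibble.width is an R option, not a knitr chunk option:
# print all tibble columns regardless of console width
options(tibble.width = Inf)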

# Loading necessary packages
library(RCurl)
library(tidyverse)
library(ggthemes)

2.2) mytheme

Since all plots will share the same style, it is easier to define that style once and reuse it.

Below, I am creating the object mytheme, which contains my plotting style choices.

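mytheme <- theme_bw() +
  theme(legend.position = "none",       # categories are already named on the x axis
        axis.title.y = element_blank(), # counts are printed on the bars,
        axis.text.y = element_blank(),  # so the y axis is hidden
        axis.ticks.y = element_blank(),
        text = element_text(size = 20)) # larger text for readability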

2.3) Data import

Let’s import the dataset. It is saved on my personal GitHub account, so we use the getURL() function to download its contents (as a single text string) into the df_path object.

After that, we pass df_path to the read_csv() function as we normally would with a file.

Here, we are saving the data into an object called team.

# Saving the URL path into the object df_path
df_path<- getURL("https://raw.githubusercontent.com/leombastos/datasets/master/KSUCropsTeam.csv")


# Reading the data and saving it to an object called team
team<- read_csv(df_path)
## Rows: 24 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Name, Title, Country, Gender
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The dataset has been imported.
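Because getURL() returns the file’s contents rather than a file path, read_csv() is really parsing a string of CSV text. The sketch below reproduces that step offline using the first three rows printed in the data exploration section; wrapping the string in I() is how readr 2.0 and later mark literal data. As an alternative, read_csv() also accepts a URL directly, so RCurl is not strictly required for this step.

```r
library(readr)

# Stand-in for what getURL() returns: the raw CSV text as one string
# (first three rows of the team dataset, as printed in this post)
csv_text <- "Name,Title,Country,Gender
Adhemar,Visiting Scholar,Brazil,Male
Mario N.,Visiting Scholar,Brazil,Male
Rafael,Visiting Scholar,Brazil,Male"

# I() tells read_csv() to treat the string as data, not as a file path
team_demo <- read_csv(I(csv_text), show_col_types = FALSE)
team_demo
```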

3) Data exploration

Now, let’s explore it a bit to understand its current structure.

# Printing team
team
## # A tibble: 24 × 4
##    Name        Title            Country   Gender
##    <chr>       <chr>            <chr>     <chr> 
##  1 Adhemar     Visiting Scholar Brazil    Male  
##  2 Mario N.    Visiting Scholar Brazil    Male  
##  3 Rafael      Visiting Scholar Brazil    Male  
##  4 Lucas       Visiting Scholar Brazil    Male  
##  5 Luiz Felipe Visiting Scholar Brazil    Male  
##  6 Juan        Visiting Scholar Argentina Male  
##  7 Axel        Visiting Scholar Argentina Male  
##  8 Valentina   Research Scholar Argentina Female
##  9 Leticia     Research Scholar Brazil    Female
## 10 Constanza   Research Scholar Argentina Female
## # … with 14 more rows

This dataset has the columns Name, Title, Country, and Gender for each team member.

The Name column includes only first names, for privacy. When a first name appeared more than once, the last-name initial was added to tell them apart.
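Besides printing the tibble, dplyr’s glimpse() and a count of distinct values per column give a quick overview of the structure. So that the sketch runs without downloading the data, it rebuilds only the ten rows printed above; on the full `team` object the counts will differ.

```r
library(dplyr)

# The ten rows of team printed above
team_demo <- tibble(
  Name    = c("Adhemar", "Mario N.", "Rafael", "Lucas", "Luiz Felipe",
              "Juan", "Axel", "Valentina", "Leticia", "Constanza"),
  Title   = rep(c("Visiting Scholar", "Research Scholar"), times = c(7, 3)),
  Country = c(rep("Brazil", 5), "Argentina", "Argentina", "Argentina",
              "Brazil", "Argentina"),
  Gender  = rep(c("Male", "Female"), times = c(7, 3))
)

# One line per column: its type and first few values
glimpse(team_demo)

# Number of distinct values in each column
distinct_counts <- team_demo %>%
  summarise(across(everything(), n_distinct))
distinct_counts
```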

Now, let’s create some plots!

4) Plotting demographics

4.1) Gender

First, let’s plot how our team can be categorized by Gender.

# Wrangling
team %>%
  group_by(Gender) %>%
  summarise(N=length(Name)) %>%
  # Plotting
  ggplot(aes(x=Gender, y=N, fill=Gender))+
  geom_bar(stat="identity", color="black")+
  geom_text(aes(y=N-.5, label=N), size=6)+
  mytheme

Of the 24 team members, 8 are female and 16 are male.
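The group_by()/summarise() step above counts rows per group; here `length(Name)` gives the same result as dplyr’s n(), which returns the number of rows in each group. dplyr’s count() does the grouping and counting in one call. The sketch below checks that both routes agree, using only the Gender values of the ten rows printed earlier.

```r
library(dplyr)

# Gender column of the ten rows printed in the data exploration section
team_demo <- tibble(Gender = rep(c("Male", "Female"), times = c(7, 3)))

# Route used in this post: group, then count rows per group
by_summarise <- team_demo %>%
  group_by(Gender) %>%
  summarise(N = n())

# Shortcut: count() groups and counts in one step
by_count <- team_demo %>%
  count(Gender, name = "N")

by_count
```

Similarly, geom_col() is shorthand for geom_bar(stat = "identity"), so either form works when the bar heights are already computed.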

4.2) Title

Now, let’s plot how our team can be categorized by Title.

# Wrangling
team %>%
  group_by(Title) %>%
  summarise(N=length(Name)) %>%
  # Plotting
  ggplot(aes(x=reorder(Title, desc(N)), y=N, fill=Title))+
  geom_bar(stat="identity", color="black")+
  geom_text(aes(y=N-.5, label=N),size=6)+
  labs(x="Title")+
  mytheme+
  theme(axis.text.x = element_text(angle=45, hjust=1),
        # margins are top, right, bottom, left: extra left space for the angled labels
        plot.margin = unit(c(.2,.2,.2,1), "cm"))

The largest Title categories are M.S. Student and Visiting Scholar, with 7 members each.

Cool!
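The bars are sorted with `reorder(Title, desc(N))`: reorder() orders the factor levels by its second argument in ascending order, and desc(N) is simply -N, so the largest group comes first. The forcats package (part of the tidyverse) offers fct_reorder() with an explicit `.desc` argument. This sketch compares the two on the Title counts of the ten printed rows only.

```r
library(dplyr)
library(forcats)

# Title counts for the ten rows printed in the data exploration section
title_counts <- tibble(
  Title = c("Research Scholar", "Visiting Scholar"),
  N     = c(3L, 7L)
)

# Base R: levels sorted by desc(N), i.e. by -N ascending
lv_reorder <- levels(reorder(title_counts$Title, desc(title_counts$N)))

# forcats equivalent that states the direction explicitly
lv_fct <- levels(fct_reorder(title_counts$Title, title_counts$N, .desc = TRUE))

lv_reorder
```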

4.3) Country

Finally, let’s do a plot on how our team can be categorized by Country.

# Wrangling
team %>%
  group_by(Country) %>%
  summarise(N=length(Name)) %>%
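  # Two-letter country codes for flags; assumes summarise() returned
  # the countries in alphabetical order (Argentina, Brazil, U.S.)
  mutate(Code=c("ar", "br", "us")) %>%
  # Plotting
  ggplot(aes(x=reorder(Country, desc(N)), y=N, fill=Country))+
  geom_bar(stat="identity", color="black")+
  # geom_flag() is from the ggflags package (not loaded here)
  #geom_flag(y = -1.5, aes(country = Code), size = 14)+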
  geom_text(aes(y=N-.5, label=N), size=6)+
  labs(x="")+
  scale_fill_manual(values=c("dodgerblue1","green4","red2"))+
  mytheme+
  theme(plot.margin = unit(c(0.2,.2,1,.2), "cm"))

That looks cool! Of the 24 team members, 11 are from Argentina, 10 from Brazil, and 3 from the U.S.
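In the Country chunk, both `mutate(Code = c("ar", "br", "us"))` and the unnamed `scale_fill_manual()` colours rely on the countries coming out of summarise() in alphabetical order. That holds here, but it would break silently if a country were added or renamed. A more robust sketch uses named vectors as lookup tables. The counts come from the sentence above; "USA" is an assumed spelling, since the U.S. entry of the dataset is not shown in this post.

```r
library(dplyr)
library(ggplot2)

# Country counts from the text; "USA" is an ASSUMED spelling of the U.S. label
country_counts <- tibble(
  Country = c("Argentina", "Brazil", "USA"),
  N       = c(11L, 10L, 3L)
)

# Named vectors tie each value to a country, not to a row position
flag_codes   <- c(Argentina = "ar", Brazil = "br", USA = "us")
fill_colours <- c(Argentina = "dodgerblue1", Brazil = "green4", USA = "red2")

country_counts <- country_counts %>%
  mutate(Code = unname(flag_codes[Country]))

p <- ggplot(country_counts,
            aes(x = reorder(Country, desc(N)), y = N, fill = Country)) +
  geom_col(color = "black") +
  # Named values are matched to the fill levels by name
  scale_fill_manual(values = fill_colours)
```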

Now, let’s upload this post!

Check out the YouTube video below for a step-by-step demonstration of how to create the post in RStudio and upload it via Netlify.
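The same steps can also be run from the R console. This is a minimal sketch, assuming blogdown 1.0 or later and an existing blogdown site as the working directory; the wrapper name `publish_post()` is hypothetical, while `new_post()` and `build_site()` are blogdown functions.

```r
# Hypothetical wrapper (not part of blogdown): create a new R Markdown
# post, then render the whole site. Run it from the root of the site.
publish_post <- function(title) {
  # Creates a new .Rmd post file in the site's content folder
  blogdown::new_post(title = title, ext = ".Rmd")
  # Knits the .Rmd posts (build_rmd = TRUE), then runs Hugo;
  # Hugo writes the generated site to public/ by default
  blogdown::build_site(build_rmd = TRUE)
}
```

While writing, `blogdown::serve_site()` gives a live local preview. Netlify then publishes the generated site; one common setup is to connect Netlify to the GitHub repository that holds the blog.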

5) Resources

A quick Google search will turn up many resources on blogdown, RStudio, Hugo, and Hugo’s various themes, including the Academic theme.

Here are a couple of websites that can help you get started:

6) Summary

This script demonstrated how to:

  • Import a dataset from GitHub.
  • Use the group_by()/summarise() combination to count the number of observations within each group.
  • Use the ggplot2 package to create bar plots.
  • Use RStudio to create the post and, in my case, Netlify to upload it and build the blog.

I hope you have enjoyed this post!

Please let me know in the comments below if you have any questions about this script, any suggestions to improve it, and any suggestions for future posts.

Happy blogging!

Leonardo M. Bastos
Assistant Professor, Integrative Precision Agriculture