Background and Motivation
The ongoing COVID-19 pandemic, often referred to simply as “the coronavirus”, has fundamentally changed the way people live. There have been multiple lockdowns and changes to working patterns, but throughout it all people were given one clear message: stay at home.
With more time spent at home, many people will have devoted more time to their hobbies, such as playing computer games. SteamDB is a third-party online tool and database that provides a large amount of data about the video game distribution service Steam. Of particular interest to me is its data on the daily number of concurrent users and the daily number of users actively playing a game, which is available on the SteamDB website.
Setup and Data Origins
First we’ll load the packages that we’ll be using for this project.
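The loading step is not shown here, so the following is only a sketch. It assumes the packages are the ones whose functions appear later in this project (dplyr for the pipe and mutate(), plus ggplot2, ggtext, ggdark, plotly and gganimate) and that they are already installed:

```r
# Assumed package list, inferred from the functions used later in this project.
library(dplyr)      # %>%, mutate(), rename(), filter()
library(ggplot2)    # ggplot(), geom_line(), scale_x_date(), ...
library(ggtext)     # element_markdown(), for coloured text in the plot title
library(ggdark)     # dark_theme_bw()
library(plotly)     # ggplotly(), for interactive plots
library(gganimate)  # transition_reveal(), for animated plots
```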
Next we need to import the data.
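How the import looks depends on how the data was downloaded from SteamDB, which is not specified here. As a rough sketch, assume a CSV export with DateTime, Users and In-Game columns. The file below is a hypothetical stand-in written to a temporary location, not the real export:

```r
# Hypothetical stand-in for the SteamDB export: a tiny CSV in a temporary file.
# The "In-Game" header is an assumption; read.csv() converts it to "In.Game".
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "DateTime,Users,In-Game",
  "2004-01-13 00:00:00,84998,",
  "2021-03-08 00:00:00,25630407,6142075",
  "2021-03-09 00:00:00,25267481,5924385"
), csv_path)

# Blank fields in numeric columns are read as NA.
df <- read.csv(csv_path, stringsAsFactors = FALSE)
str(df)
```

With the real file, csv_path would simply point at the downloaded export; head(df) and tail(df) then show the first and last rows.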
And we’ll then have a look at the first few lines of the data.
##              DateTime Users In.Game
## 1 2004-01-13 00:00:00 84998      NA
## 2 2004-01-14 00:00:00    NA      NA
## 3 2004-01-15 00:00:00    NA      NA
## 4 2004-01-16 00:00:00    NA      NA
## 5 2004-01-17 00:00:00    NA      NA
## 6 2004-01-18 00:00:00    NA      NA
And the last few lines.
##                 DateTime    Users In.Game
## 6261 2021-03-04 00:00:00 25226331 5916271
## 6262 2021-03-05 00:00:00 25783270 6460061
## 6263 2021-03-06 00:00:00 26142235 7200263
## 6264 2021-03-07 00:00:00 26117290 7153865
## 6265 2021-03-08 00:00:00 25630407 6142075
## 6266 2021-03-09 00:00:00 25267481 5924385
Apart from the names of the variables, this dataset is already relatively well organised, which is quite helpful. It includes a date and time for each observation, going back to January 2004, as well as the number of concurrent online users and the number of users in game on each date. This analysis focuses solely on the year 2020.
Aims
There are several questions I wanted to explore with this mini project:
Did COVID-19 cause a noticeable change in the number of users and users in game on Steam when it was declared a pandemic and countries began entering national lockdowns?
If there was a change, was it sustained throughout 2020 as countries began to ease restrictions and people began to return to work?
Data Preparation
We’ll start by renaming the columns to shorter, lower-case names: date, users and in_game.
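One way the renaming could be done is with dplyr's rename(). This is a sketch on a two-row toy data frame whose values are copied from the output above, not necessarily the code originally used:

```r
library(dplyr)

# Toy data frame with the original column names (two rows from the output above).
df <- data.frame(
  DateTime = c("2021-03-08 00:00:00", "2021-03-09 00:00:00"),
  Users    = c(25630407, 25267481),
  In.Game  = c(6142075, 5924385)
)

# rename() takes new_name = old_name pairs.
df <- df %>%
  rename(date = DateTime, users = Users, in_game = In.Game)

names(df)
```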
We’ll also remove the “time” part of the data in the date column so that it is easier to manage later.
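A minimal sketch of this step, assuming the column has already been renamed to date and holds text in the form "YYYY-MM-DD HH:MM:SS". Converting with as.Date() keeps only the date part:

```r
library(dplyr)

# Toy data frame; the date column is assumed to be character at this point.
df <- data.frame(date = c("2021-03-08 00:00:00", "2021-03-09 00:00:00"))

# Only the leading "YYYY-MM-DD" is parsed; the trailing time is ignored.
df <- df %>%
  mutate(date = as.Date(date, format = "%Y-%m-%d"))

class(df$date)
```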
For ease, we’ll change values listed as ‘NA’ in the dataset to ‘0’.
# Replace missing values with 0 in both count columns.
df <- df %>%
  mutate(users = replace(users, is.na(users), 0),
         in_game = replace(in_game, is.na(in_game), 0))
One thing to note is that the user counts we are dealing with are very large, reaching the tens of millions. This could cause issues when visualising the data, so it is worth converting the values into something more manageable, such as millions: 14 rather than 14,000,000.
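The conversion amounts to dividing both columns by one million. A sketch on two rows taken from the earlier output, assuming the columns are already named users and in_game:

```r
library(dplyr)

# Two rows of raw counts from the earlier output.
df <- data.frame(
  users   = c(25630407, 25267481),
  in_game = c(6142075, 5924385)
)

# Express both columns in millions.
df <- df %>%
  mutate(users = users / 1e6,
         in_game = in_game / 1e6)

df
```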
Now if we view the last few lines of the data, we can see that the column names are more suitable and in a better format, and the values of users and in_game are much more manageable.
##            date    users  in_game
## 6261 2021-03-04 25.22633 5.916271
## 6262 2021-03-05 25.78327 6.460061
## 6263 2021-03-06 26.14223 7.200263
## 6264 2021-03-07 26.11729 7.153865
## 6265 2021-03-08 25.63041 6.142075
## 6266 2021-03-09 25.26748 5.924385
For the purpose of this visualisation we are only interested in the values for each day in 2020, so we can exclude all other values. I am going to keep the original data frame, though, and store the 2020 subset separately as df2.
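A sketch of the subsetting step using dplyr's filter(). The name df2 matches the data frame used in the plot; the four rows and their users values are placeholders chosen only to show the boundaries of the filter:

```r
library(dplyr)

# Placeholder rows either side of the 2020 boundaries (users values are not real data).
df <- data.frame(
  date  = as.Date(c("2019-12-31", "2020-01-01", "2020-12-31", "2021-01-01")),
  users = c(1, 2, 3, 4)
)

# Keep only dates within 2020; the original df is left untouched.
df2 <- df %>%
  filter(date >= as.Date("2020-01-01"),
         date <= as.Date("2020-12-31"))

df2
```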
Data Visualisation
A line graph will be the most suitable way of visualising this data.
# Drawing the plot
g <- ggplot(df2, aes(x = date)) +
  geom_line(aes(y = users), colour = "#0072B2") + # Drawing two separate lines for users and users in game.
  geom_line(aes(y = in_game), colour = "#D55E00") +
  # Customising and theming the plot
  geom_vline(xintercept = as.Date("2020-03-11"), colour = "red") + # This adds a vertical line on the date of the WHO declaration (11 March 2020),
  annotate(geom = "text", x = as.Date("2020-07-01"), y = 15, # and this line annotates it with text.
           label = "COVID-19 is declared a pandemic by the WHO", fontface = "bold") +
  scale_x_date(date_breaks = "1 month",
               date_minor_breaks = "1 week",
               date_labels = "%b %Y") +
  # With the package ggtext we can get the line colours in the title using HTML syntax.
  labs(title = paste("How the number of <span style='color:#0072B2;'>Steam users</span>",
                     "and <span style='color:#D55E00;'>users in game</span> changed throughout 2020."),
       caption = "Visualisation by Jarod Wilson | Data taken from www.SteamDB.com",
       y = "Users in millions") +
  dark_theme_bw() + # This uses the ggdark package to theme the plot with darker colours, which I personally prefer.
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.x = element_blank(),
    plot.title = element_markdown()) # This line is necessary to get colours in the title of the plot.
g
Conclusions
In relation to the questions I wanted to explore with this visualisation:
There was a noticeable increase in the number of users and users in game on Steam as COVID-19 was declared a pandemic and in the months afterwards. This likely reflects a greater number of people looking for ways to spend their time as they stayed at home.
The number of users and users in game started to decrease at the beginning of May, which likely reflects some countries beginning to unlock and ease some restrictions. This decrease continued from June through to August, but the numbers began to increase again from September onwards, likely reflecting countries re-introducing restrictions as the second wave of the virus hit.
There also appears to be what I am going to call a “weekend effect.” In January and February 2020 the number of users spiked regularly before returning to roughly the same level, and the same can be seen for users in game throughout 2020. Closer inspection of the data reveals that these spikes occurred on Fridays, Saturdays and Sundays. This suggests that more people use Steam and play games on it at weekends, which makes logical sense! I think this is a good example of how visualisation of data can reveal patterns that we may not otherwise have noticed.
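One way the “weekend effect” could be checked numerically is to flag Fridays, Saturdays and Sundays and compare average user numbers. The sketch below uses a single made-up week (Monday 6 to Sunday 12 January 2020) with hypothetical values, purely to show the mechanics. The column names wday and is_weekend are my own, and this is not necessarily how the inspection was originally done:

```r
library(dplyr)

# Hypothetical values for one week, for illustration only (not real Steam data).
df2 <- data.frame(
  date  = seq(as.Date("2020-01-06"), as.Date("2020-01-12"), by = "day"),
  users = c(10, 10, 10, 10, 12, 13, 13)
)

weekend_summary <- df2 %>%
  mutate(
    wday       = as.POSIXlt(date)$wday,  # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
    is_weekend = wday %in% c(5, 6, 0)    # Friday, Saturday, Sunday
  ) %>%
  group_by(is_weekend) %>%
  summarise(mean_users = mean(users))

weekend_summary
```

Using the numeric weekday from as.POSIXlt() rather than weekdays() avoids depending on the language of the current locale.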
Extras
I wanted to use this mini project as an opportunity to expand my skills with R and ggplot2. In particular, I wanted to create a visualisation with an element of interactivity; to do this I am going to use the plotly and gganimate packages.
The plotly package helps improve the interactivity of ggplot2 data visualisations with a number of features, including allowing users to hover over data points, zoom into specific areas and pan back and forth through time.
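As a minimal sketch of the idea, ggplotly() from plotly converts an existing ggplot2 object into an interactive widget. The example uses a deliberately plain line plot on placeholder data rather than the themed plot above, since I have not checked how the ggtext title and ggdark theme carry over:

```r
library(ggplot2)
library(plotly)

# Placeholder data (values are not real Steam figures).
df2 <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "day", length.out = 5),
  users = c(14, 15, 16, 15, 14)
)

# An ordinary ggplot2 line plot ...
g_simple <- ggplot(df2, aes(x = date, y = users)) +
  geom_line()

# ... converted into an interactive plotly widget with hover, zoom and pan.
p <- ggplotly(g_simple)
p
```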
The gganimate package allows ggplot2 visualisations to be animated in a number of ways. For this example we are going to animate the plot based on time.
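A sketch of a time-based animation, assuming transition_reveal() is the kind of transition wanted here: it draws the line progressively along the date axis. The data is again a placeholder, and rendering is left commented out because it needs a renderer package (such as gifski) to be installed:

```r
library(ggplot2)
library(gganimate)

# Placeholder data (values are not real Steam figures).
df2 <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "day", length.out = 5),
  users = c(14, 15, 16, 15, 14)
)

# transition_reveal() reveals the line gradually along the given variable.
anim <- ggplot(df2, aes(x = date, y = users)) +
  geom_line() +
  transition_reveal(date)

# animate(anim)  # renders the animation; requires a renderer such as gifski
```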