Background and Motivation

The ongoing COVID-19 (coronavirus) pandemic has fundamentally changed the way people live their lives. There have been multiple lockdowns and changes to working patterns, but ultimately people were given a clear message: stay at home.

With people spending more time at home, many will have spent more of it on their hobbies, for example playing computer games. SteamDB is a third-party online tool and database providing a huge amount of data about the video game distribution service Steam. Of particular interest to me is the data on the number of daily concurrent users and daily users actively playing a game, which can be accessed here.

Setup and Data Origins

First we’ll load the packages that we’ll be using for this project.
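As a rough sketch, the package loading step might look like this. The three packages are the ones named later in this post; whether any others were loaded originally is an assumption I can't check.

```r
# Packages mentioned in this write-up (the full original list is an assumption)
library(ggplot2)    # static visualisations
library(plotly)     # interactive versions of ggplot2 plots
library(gganimate)  # animated ggplot2 plots
```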

Next we need to import the data.
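The import could be sketched as below. The original file name is not given here, so this block writes a few rows copied from the printed output to a temporary CSV as a stand-in. The names `csv_path` and `steam`, and the raw `In-Game` header, are assumptions.

```r
# Hypothetical stand-in for the downloaded SteamDB export: a temporary CSV
# holding a few rows copied from the output shown in this post.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "DateTime,Users,In-Game",
  "2004-01-13 00:00:00,84998,",
  "2004-01-14 00:00:00,,",
  "2021-03-08 00:00:00,25630407,6142075",
  "2021-03-09 00:00:00,25267481,5924385"
), csv_path)

# read.csv() turns the assumed "In-Game" header into the syntactic name
# "In.Game" and reads empty numeric fields as NA
steam <- read.csv(csv_path)
```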

And we’ll then have a look at the first few lines of the data.

##              DateTime Users In.Game
## 1 2004-01-13 00:00:00 84998      NA
## 2 2004-01-14 00:00:00    NA      NA
## 3 2004-01-15 00:00:00    NA      NA
## 4 2004-01-16 00:00:00    NA      NA
## 5 2004-01-17 00:00:00    NA      NA
## 6 2004-01-18 00:00:00    NA      NA

And the last few lines.

##                 DateTime    Users In.Game
## 6261 2021-03-04 00:00:00 25226331 5916271
## 6262 2021-03-05 00:00:00 25783270 6460061
## 6263 2021-03-06 00:00:00 26142235 7200263
## 6264 2021-03-07 00:00:00 26117290 7153865
## 6265 2021-03-08 00:00:00 25630407 6142075
## 6266 2021-03-09 00:00:00 25267481 5924385

Apart from the variable names, this dataset is already relatively well organised, which is quite helpful. It includes a date and time going back to January 2004, as well as the number of concurrent online users and users in game for each date. This analysis will focus solely on the year 2020.

Aims

There are several questions I wanted to explore with this mini project:

  1. Did COVID-19 cause a noticeable change in the number of users and users in game on Steam when it was declared a pandemic and countries began entering national lockdowns?

  2. If there was a change, was it sustained throughout 2020 as countries began to ease restrictions and people began to return to work?

Data Preparation

We’ll start by renaming the columns to short, lower-case names (date, users and in_game) that are easier to type and more consistent.
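A minimal sketch of the renaming, using base R. The data frame name `steam` is an assumption, and the two rows are copied from the output above so the block runs on its own.

```r
# Two rows copied from the printed output, with the original column names
steam <- data.frame(
  DateTime = c("2021-03-08 00:00:00", "2021-03-09 00:00:00"),
  Users    = c(25630407, 25267481),
  In.Game  = c(6142075, 5924385)
)

# Replace all three column names in one go (order must match the columns)
names(steam) <- c("date", "users", "in_game")
```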

We’ll also remove the “time” part of the data in the date column so that it is easier to manage later.
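One way to do this, sketched under the assumption that the column was read in as text, is to convert it with `as.Date()`, which keeps only the calendar date.

```r
# Two rows copied from the printed output; `steam` is an assumed name
steam <- data.frame(
  date    = c("2021-03-08 00:00:00", "2021-03-09 00:00:00"),
  users   = c(25630407, 25267481),
  in_game = c(6142075, 5924385)
)

# Parse the leading year-month-day part; the trailing " 00:00:00" is ignored
steam$date <- as.Date(steam$date, format = "%Y-%m-%d")
```

Storing the column as a proper `Date` also means it can be compared and filtered by date later on.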

For ease, we’ll change values listed as ‘NA’ in the dataset to ‘0’.
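This could be done along the following lines. The rows are copied from the printed output and `steam` is an assumed name.

```r
# Rows copied from the printed output, including the missing early values
steam <- data.frame(
  date    = as.Date(c("2004-01-13", "2004-01-14", "2021-03-09")),
  users   = c(84998, NA, 25267481),
  in_game = c(NA, NA, 5924385)
)

# Replace missing counts with 0, one numeric column at a time
steam$users[is.na(steam$users)]     <- 0
steam$in_game[is.na(steam$in_game)] <- 0
```

A 0 here marks a missing reading rather than a day with no users, which would matter if the early years were ever plotted or averaged.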

One thing to note is that the user numbers we are dealing with are very large, reaching the tens of millions. This could cause issues when visualising the data, so it is worth converting the values into something more manageable, such as millions, e.g. 14 rather than 14,000,000.
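A sketch of the conversion, again on two rows copied from the printed output (the `steam` name is an assumption):

```r
# Two rows copied from the printed output, in raw user counts
steam <- data.frame(
  date    = as.Date(c("2021-03-08", "2021-03-09")),
  users   = c(25630407, 25267481),
  in_game = c(6142075, 5924385)
)

# Express both count columns in millions of users
steam$users   <- steam$users / 1e6
steam$in_game <- steam$in_game / 1e6
```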

Now if we view the last few lines of the data, we can see that the column names are more suitable and in a better format, and the values of users and in_game are much more manageable.

##            date    users  in_game
## 6261 2021-03-04 25.22633 5.916271
## 6262 2021-03-05 25.78327 6.460061
## 6263 2021-03-06 26.14223 7.200263
## 6264 2021-03-07 26.11729 7.153865
## 6265 2021-03-08 25.63041 6.142075
## 6266 2021-03-09 25.26748 5.924385

For the purpose of this visualisation we are only interested in the values for each day in 2020, so we can exclude all other values. I am going to keep the original dataframe, though, and work with the 2020 rows separately.
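The subsetting might be sketched like this. Since no 2020 rows are printed in this post, the block uses placeholder values (a simple row counter, not real SteamDB figures), and the names `steam` and `steam_2020` are assumptions.

```r
# Placeholder data spanning the end of 2019 to the start of 2021.
# The `users` values are a row counter only, NOT real SteamDB figures.
steam <- data.frame(
  date = seq(as.Date("2019-12-30"), as.Date("2021-01-02"), by = "day")
)
steam$users <- seq_len(nrow(steam))

# Keep only the 2020 rows in a new data frame; `steam` itself is unchanged
in_2020 <- steam$date >= as.Date("2020-01-01") &
  steam$date <= as.Date("2020-12-31")
steam_2020 <- steam[in_2020, ]
```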

Data Visualisation

A line graph will be the most suitable way of visualising this data.
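A rough sketch of such a plot with ggplot2 is below. The 2020 values are not reproduced in this post, so the block draws made-up smooth curves purely to show the plotting code. The data, the `steam_2020` name and the styling choices are all assumptions rather than the original chart.

```r
library(ggplot2)

# Synthetic stand-in for the 2020 subset: made-up curves, NOT SteamDB figures
steam_2020 <- data.frame(
  date = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
)
day_index <- seq_len(nrow(steam_2020))
steam_2020$users   <- 20 + 2 * sin(day_index / 58)
steam_2020$in_game <- 5 + sin(day_index / 58)

# One line per series, with a monthly date axis
p <- ggplot(steam_2020, aes(x = date)) +
  geom_line(aes(y = users, colour = "Users")) +
  geom_line(aes(y = in_game, colour = "In game")) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  labs(x = "2020", y = "Users (millions)", colour = NULL)

p
```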

Conclusions

In relation to the questions I wanted to explore with this visualisation:

  1. There was a noticeable increase in the number of users and users in game on Steam as COVID-19 was declared a pandemic and in the months afterwards. This likely reflects a greater number of people looking for ways to spend their time as they stayed at home.

  2. The number of users and users in game started to decrease at the beginning of May, which likely reflects some countries beginning to unlock and ease some restrictions. This decrease continued from June through to August, but the numbers began to increase again from September onwards, likely reflecting countries re-introducing restrictions as the second wave of the virus hit.

There also appears to be what I am going to call a “weekend effect.” In January and February 2020 the number of users spiked regularly before returning to roughly the same level, and the same can be seen for users in game throughout 2020. Closer inspection of the data reveals that these spikes occurred on Fridays, Saturdays and Sundays. This suggests that more people are using Steam and playing games on it at weekends, which makes logical sense! I think this is a good example of how visualisation of data can reveal patterns that we may not otherwise have noticed.
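One way to check the day of the week for each date is sketched below. It uses the six March 2021 rows printed earlier (not 2020 data, which is not shown in this post), and the names `steam_tail`, `iso_weekday` and `fri_to_sun` are assumptions.

```r
# The last six rows printed earlier, in millions of users in game
steam_tail <- data.frame(
  date    = as.Date(c("2021-03-04", "2021-03-05", "2021-03-06",
                      "2021-03-07", "2021-03-08", "2021-03-09")),
  in_game = c(5.916271, 6.460061, 7.200263, 7.153865, 6.142075, 5.924385)
)

# ISO weekday number: 1 = Monday ... 7 = Sunday (independent of locale)
steam_tail$iso_weekday <- as.integer(format(steam_tail$date, "%u"))

# Flag Friday, Saturday and Sunday, then compare the average for each group
steam_tail$fri_to_sun <- steam_tail$iso_weekday %in% 5:7
mean_by_group <- tapply(steam_tail$in_game, steam_tail$fri_to_sun, mean)
mean_by_group
```

Even in this tiny sample the Friday to Sunday rows have the higher average, in line with the pattern described above.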

Extras

I wanted to use this mini project as an opportunity to expand my skills using R and ggplot2. I particularly wanted to create a visualisation that has an element of interactivity to it. To do this I am going to use the plotly and gganimate packages.

The plotly package helps improve the interactivity of ggplot2 data visualisations with a number of features including allowing users to hover over data points, zoom into specific areas, pan back and forth through time etc.
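As a hedged sketch, an existing ggplot2 object can be passed to plotly's `ggplotly()` to get an interactive version. The data below are made-up curves (not SteamDB figures) and the object names are assumptions.

```r
library(ggplot2)
library(plotly)

# Synthetic stand-in for the 2020 subset: made-up curves, NOT SteamDB figures
steam_2020 <- data.frame(
  date = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
)
day_index <- seq_len(nrow(steam_2020))
steam_2020$users   <- 20 + 2 * sin(day_index / 58)
steam_2020$in_game <- 5 + sin(day_index / 58)

p <- ggplot(steam_2020, aes(x = date)) +
  geom_line(aes(y = users, colour = "Users")) +
  geom_line(aes(y = in_game, colour = "In game")) +
  labs(x = "2020", y = "Users (millions)", colour = NULL)

# Convert the static plot into an interactive htmlwidget
interactive_plot <- ggplotly(p)
```

`ggplotly()` also takes a `tooltip` argument, which is probably where I would start if I wanted to change the hover labels.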

The gganimate package allows for animation of ggplot2 visualisations in a number of ways. For this example we are going to animate the plot based on time.
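A minimal sketch of a time-based animation uses gganimate's `transition_reveal()`, which draws the lines progressively along the date axis. As before, the data are made-up curves (not SteamDB figures) and the object names are assumptions. This may differ from the animation originally produced.

```r
library(ggplot2)
library(gganimate)

# Synthetic stand-in for the 2020 subset: made-up curves, NOT SteamDB figures
steam_2020 <- data.frame(
  date = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
)
day_index <- seq_len(nrow(steam_2020))
steam_2020$users   <- 20 + 2 * sin(day_index / 58)
steam_2020$in_game <- 5 + sin(day_index / 58)

p <- ggplot(steam_2020, aes(x = date)) +
  geom_line(aes(y = users, colour = "Users")) +
  geom_line(aes(y = in_game, colour = "In game")) +
  labs(x = "2020", y = "Users (millions)", colour = NULL)

# Build the animation specification; nothing is rendered at this point
anim <- p + transition_reveal(date)
```

Rendering is a separate step, for example `animate(anim)`. Its `end_pause` argument holds the final frame for a number of frames, which is one way to add a pause to the animation.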

I really like both of these packages! They both add another level of detail to visualisations, and elements of interactivity arguably make them more engaging for some people.

With more time I would have liked to tweak a few things, such as changing the labels that appear when you hover over the lines on the first plot and perhaps adding a pause during the animation of the second.

Things I have learnt about using R

Summary

I really enjoyed this mini project! It was really interesting to take some data that I was interested in and analyse and visualise it from start to finish.

I would have liked to investigate other entertainment platforms such as Netflix, Amazon Prime and Spotify to see how their user numbers changed throughout 2020, but sadly I could not find any available data to facilitate this. Maybe in the future.

The repo for this project can be found here.