TWITCH ADS STUDY

Final project for the Cloud & Big Data course 2018-2019 at Universidad Complutense de Madrid. Built with the Hyperspace template, used under a Creative Commons license.

Dataset

We want to improve the ad system on Twitch. As an example, we worked on this dataset from February 2015.

Tools

We have deployed a Hadoop cluster with Spark on Amazon Web Services to help us process all this data.

Results

You can check our results as daily charts, or download the raw data obtained during the process.

What we do

As part of the Cloud & Big Data course, we've been learning the basics of Spark programming. This is the main work pipeline we followed during the project.

Spark + Python

First, we develop a general Python 3 script that extracts the data we need from every file in the dataset.

Amazon Web Services

We need the power of the cloud, so we run our virtual machine instances on Amazon Web Services' EC2.

Adjustments

We expect peak performance, so we need to tweak the configuration of our cluster a little.

Obtaining Results

After the dataset is processed, we retrieve the resulting data and organize it on our laptops.

Improvements

We obtain A LOT of data, so we need a second Python script to make the results more accessible.

Make it nice!

Finally, we can share the results on this very website, using daily charts and offering the resulting raw data for download.

Get in touch

We are a group of third-year students from the Game Development degree at Universidad Complutense de Madrid. You can contact us by filling in this form, or you can check out our GitHub repository!