A Practical Guide for Exploratory Data Analysis: Movies on Streaming Platforms

  • Date: 2020-06-05 07:31:53

Exploring movies on Netflix, Hulu, Prime Video, and Disney+.

Photo by Thibault Penin on Unsplash

We live in the era of big data. We can collect large amounts of data, which allows us to infer meaningful results and make informed business decisions. To get the most out of the data, a robust and thorough analysis process is needed. In this post, we will explore a dataset about the movies on streaming platforms. The data is available on Kaggle.

We can directly download Kaggle datasets into a Google Colab environment. Here is a step-by-step explanation on how to do it:
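As a rough sketch of the usual setup, the Kaggle command-line tool expects an API token file named kaggle.json inside a .kaggle folder in the home directory, readable only by its owner. The helper below is an assumption for illustration, not part of the original walkthrough: the function name install_kaggle_token and the /content/kaggle.json upload path are hypothetical.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, home=None):
    """Copy a kaggle.json API token into <home>/.kaggle with owner-only access.

    `home` defaults to the current user's home directory; it is a parameter
    only so the helper can be tried out against a scratch folder.
    """
    home = Path(home) if home is not None else Path.home()
    kaggle_dir = home / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)

    target = kaggle_dir / "kaggle.json"
    shutil.copyfile(token_path, target)

    # Restrict the token to the owner (equivalent to `chmod 600`),
    # since it contains the account's API key.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target


# In Colab, after uploading the token (hypothetical upload location):
# install_kaggle_token("/content/kaggle.json")
```

Once the token is in place, the `!kaggle datasets download` command used in this post should be able to authenticate.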

Let’s start by downloading the dataset and reading it into a pandas dataframe.

import pandas as pd
import numpy as np

!kaggle datasets download -d ruchi798/movies-on-netflix-prime-video-hulu-and-disney

df = pd.read_csv("/content/movies-on-netflix-prime-video-hulu-and-disney.zip")
df.drop(["Unnamed: 0", "ID", "Type"], axis=1, inplace=True)
df.head()

The “Unnamed: 0” and “ID” columns are redundant because they do not provide any information about the movies, so we drop them. The “Type” column indicates whether the title is a movie or a TV show. We drop it as well because all of the rows contain data on movies.
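Before dropping a column on the grounds that it holds the same value everywhere, it can be worth confirming that in code. The sketch below is one way to do it. The helper drop_uninformative is a hypothetical name, and the small dataframe is made-up toy data standing in for the real dataset.

```python
import pandas as pd

# Toy frame with made-up rows; only the column names mirror the dataset.
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "ID": [1, 2, 3],
    "Title": ["A", "B", "C"],
    "Type": [0, 0, 0],
})


def drop_uninformative(df, index_like, constant_candidates):
    """Drop index-like columns, plus candidates that hold a single value."""
    to_drop = list(index_like)
    for col in constant_candidates:
        # A column with one distinct value carries no information.
        if df[col].nunique(dropna=False) == 1:
            to_drop.append(col)
    return df.drop(columns=to_drop)


cleaned = drop_uninformative(toy, ["Unnamed: 0", "ID"], ["Type"])
print(cleaned.columns.tolist())
```

Returning a new dataframe instead of using inplace=True keeps the original available in case a dropped column turns out to be needed later.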