Recommended Datasets for Course Project

Use this thread to find and share interesting datasets. You can look for datasets at the following places:

Some interesting datasets


Inspirational project indeed!


I keep getting a “UnicodeDecodeError” when trying to create a DataFrame from my ‘candidate csv’ file for the project using pd.read_csv(). I'm kind of frustrated! Can anyone help, please?


@vincent-kizza, try this:

import pandas as pd
df = pd.read_csv('file_name.csv', engine='python')


You can try using pd.read_csv('file.csv', encoding="utf-8")

If that doesn't work, please post the entire traceback you are getting.
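If UTF-8 is not the file's actual encoding, one option is to try a few common encodings in turn. This is a minimal sketch, assuming the file is in one of UTF-8, UTF-8 with a BOM, or Latin-1; the helper name `read_csv_with_fallback` is hypothetical. Latin-1 maps every byte to a character, so it never raises but may show garbled accented letters if the real encoding was something else.

```python
import pandas as pd

def read_csv_with_fallback(path, encodings=("utf-8", "utf-8-sig", "latin-1")):
    """Try each encoding in order; return (DataFrame, encoding_used)."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc), enc
        except UnicodeDecodeError as err:
            last_error = err  # remember why this encoding failed, try the next one
    raise last_error

# Usage (the file name is a placeholder):
# df, used = read_csv_with_fallback("candidate.csv")
```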


I played around with the Steam API and grabbed some information about my playtimes on there:

Because the playtime table only contained app IDs, I merged it with the table of app names.
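A merge like this can be sketched with `DataFrame.merge` on the shared ID column. The tables below are made-up toy values, and the column names (`appid`, `name`, the playtime columns) are assumptions for illustration, not necessarily the exact fields the Steam API returns.

```python
import pandas as pd

# Hypothetical playtime table: only app IDs, no names (made-up values).
playtime = pd.DataFrame({
    "appid": [10, 20, 30],
    "playtime_windows_min": [120, 0, 45],
    "playtime_linux_min": [300, 60, 15],
})

# Hypothetical lookup table mapping app IDs to names.
app_names = pd.DataFrame({
    "appid": [10, 20, 30, 40],
    "name": ["Game A", "Game B", "Game C", "Game D"],
})

# Left join: keep every playtime row and attach the matching name.
merged = playtime.merge(app_names, on="appid", how="left")
print(merged[["appid", "name", "playtime_linux_min"]])
```

A left join keeps games that have playtime even if a name is missing (the name becomes NaN), while apps you never played (like `40` here) are left out.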

Check out the comparison I've made between Windows and Linux hours played! It turns out that if I played a game on both Linux and Windows, I usually played it more on Linux!
Obviously I'm barely scratching the surface with this small exercise; for example, I did not download any pricing information, which could be added to the dataset in the future!
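A comparison like the one described above could be computed roughly as follows. This is a sketch on made-up values, assuming the merged table has one column of minutes per platform (the column names are hypothetical): filter to games with playtime on both platforms, then count how often Linux has more.

```python
import pandas as pd

# Hypothetical merged table with minutes played per platform (made-up values).
merged = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C", "Game D"],
    "windows_min": [120, 0, 45, 200],
    "linux_min": [300, 60, 15, 0],
})

# Keep only games that were played on both platforms.
both = merged[(merged["windows_min"] > 0) & (merged["linux_min"] > 0)].copy()
both["more_on_linux"] = both["linux_min"] > both["windows_min"]

# Fraction of shared games that were played longer on Linux.
share = both["more_on_linux"].mean()
print(f"{len(both)} games played on both; {share:.0%} played more on Linux")
```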


With your suggestion, I now get the following traceback

File “”, line 2
SyntaxError: invalid character in identifier

With your suggestion, I get the following traceback

File “”, line 2
SyntaxError: invalid character in identifier

For some reason the quotation marks in the code are not the standard ASCII quotation marks. Python is interpreting those as invalid characters.

I'm not sure which keyboard or language you are using, but make sure the quotation marks are the following:

single quotation mark: '
double quotation mark: "
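To see the problem and fix it programmatically, here is a small sketch that compiles a pasted line containing typographic ("smart") quotes, which fails with a SyntaxError, and then swaps them for ASCII quotes. The helper name `normalize_quotes` is hypothetical, and the exact SyntaxError message varies between Python versions.

```python
# Typographic quotes that forums often substitute for plain ASCII quotes.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}

def normalize_quotes(code: str) -> str:
    """Replace smart quotes with their plain ASCII equivalents."""
    return code.translate(str.maketrans(SMART_QUOTES))

pasted = "df = pd.read_csv(\u2018file_name.csv\u2019, engine=\u2018python\u2019)"

try:
    compile(pasted, "<pasted>", "exec")  # only parses, does not run the code
except SyntaxError as err:
    print("SyntaxError:", err.msg)

fixed = normalize_quotes(pasted)
compile(fixed, "<pasted>", "exec")  # parses fine now
print(fixed)
```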


I think this happens because someone is being lazy and copying code directly from the forum.

The forum must encode quotes differently or something like that 😛


I'm having problems downloading data from this link. I want to download both googleplaystore.csv and googleplaystore_user_reviews.csv, but I don't know how to download data from Kaggle. Is there any way to import the files from a notebook?

Given a column with values such as 10,000, 100 or 1000+, how do I convert them into integers and put them back into the same column?
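One way to do this is with pandas string methods: strip the thousands separators and the trailing "+", then cast to int and assign back to the same column. This is a sketch; the column name `Installs` and the sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical column of strings like the ones described above.
df = pd.DataFrame({"Installs": ["10,000", "100", "1000+", "5,000,000+"]})

# Remove "," and "+" literally (regex=False), then convert to integers.
df["Installs"] = (
    df["Installs"]
    .str.replace(",", "", regex=False)
    .str.replace("+", "", regex=False)
    .astype(int)
)
print(df["Installs"].tolist())  # [10000, 100, 1000, 5000000]
```

If the column may contain non-numeric entries, `pd.to_numeric(..., errors="coerce")` instead of `.astype(int)` turns them into NaN rather than raising an error.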

You can use the io library:
import io

and replace your file-reading code with
with, 'r', encoding="utf-8") as raw_data:

It will work.
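A complete version of this approach might look like the sketch below: open the file as text with an explicit encoding and pass the open handle to `pd.read_csv`. The function name `load_csv_utf8` and the path are hypothetical. Note that in Python 3, `io.open` is the same function as the built-in `open`.

```python
import io
import pandas as pd

def load_csv_utf8(path):
    # Decode the file as UTF-8 ourselves, then let pandas parse the text stream.
    with io.open(path, "r", encoding="utf-8") as raw_data:
        return pd.read_csv(raw_data)

# Usage (placeholder path):
# df = load_csv_utf8("file_name.csv")
```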

Can we use a dataset that is one or two years old, i.e. from 2018 or 2019? Or should we use the latest data?

Since we learned how to drop columns, is it possible to drop rows in a DataFrame? If so, how?

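This is how to drop the row with index label 1 from name_df (drop returns a new DataFrame, so assign the result or pass inplace=True to keep the change):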

name_df.drop(1, axis=0)

To drop multiple rows:

rows_list = [1,2,3,4,5] 
name_df.drop(rows_list, axis=0)
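Here is a runnable sketch of the same calls on a small made-up `name_df`, plus dropping rows by a condition with a boolean mask, which is often what you actually want. The DataFrame contents are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for name_df (made-up values).
name_df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f", "g"],
    "score": [5, 3, 8, 1, 9, 2, 7],
})

# drop() matches index labels and returns a new DataFrame;
# name_df itself is unchanged unless you reassign or use inplace=True.
one_dropped = name_df.drop(1, axis=0)                 # same as name_df.drop(index=1)
many_dropped = name_df.drop([1, 2, 3, 4, 5], axis=0)

# Drop rows by condition: keep only rows with score >= 3.
low_removed = name_df[name_df["score"] >= 3]

print(one_dropped.index.tolist())    # [0, 2, 3, 4, 5, 6]
print(many_dropped.index.tolist())   # [0, 6]
print(low_removed["name"].tolist())  # ['a', 'b', 'c', 'e', 'g']
```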