I found the Springboard blog (https://www.springboard.com/blog/), which is interesting, and its article “19 Free Public Data Sets for Your Data Science Project” (https://www.springboard.com/blog/free-public-data-sets-data-science-project/). I also found Inside Airbnb (http://insideairbnb.com/get-the-data.html), which publishes lots of Airbnb data that I can analyze. It will be really interesting and useful because it is real Airbnb data! I downloaded the data for Athens, and there is so much to explore!
I also watched Jeff Leek's Data Analysis course videos up to video 17 and learned how to download data in R directly from websites, which is very useful to know. Then I continued the Kaggle tutorial “Head Start for Data Scientist” up to the exploratory data analysis part. I think this is the best way to learn data science, because you actually do it and learn from other users and their Kernels.
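A minimal sketch of downloading data from a website in R, using only base R's `download.file()` and `read.csv()`. The helper name `fetch_csv` is hypothetical, and recording the download date is just a good habit, since online data sets change over time. The demo fetches a local file through a `file://` URL (Unix-style path assumed) so it runs without a network connection.

```r
# Hypothetical helper: download a file from a URL, then read it as CSV.
# download.file() and read.csv() are base R (utils package).
fetch_csv <- function(url, destfile = tempfile(fileext = ".csv")) {
  download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
  data <- read.csv(destfile, stringsAsFactors = FALSE)
  # Record when the data was fetched, because the online copy can change
  attr(data, "downloaded") <- date()
  data
}

# Self-contained demo: write a local CSV and fetch it through a file:// URL
local_csv <- tempfile(fileext = ".csv")
write.csv(mtcars, local_csv, row.names = FALSE)
demo_data <- fetch_csv(paste0("file://", normalizePath(local_csv)))
head(demo_data)
attr(demo_data, "downloaded")
```

For real data, pass the file link copied from a site such as the Inside Airbnb "Get the Data" page instead of the `file://` URL.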
Finally, I did exercise 10 from Chapter 3 of the book “An Introduction to Statistical Learning”. Here is the code I used:
# Exercise 10, page 123 (Carseats data set)
# In the regression, Price has a negative coefficient, which indicates that
# price has an effect on sales (it's harder to sell more expensive car seats).
# R has also created a dummy variable UrbanYes, which equals 1 if the store is
# in an urban area and 0 otherwise. Its coefficient is negative, but it is not
# significant in the regression model we fitted.
# Finally, R has created a dummy variable USYes, which equals 1 if the store is
# in the US and 0 otherwise. Its coefficient is positive and significant, so
# location affects sales: stores in the US sell more (about 1.2 thousand more
# units, holding Price and Urban fixed).
# Equation form: Sales = 13.043 - 0.054459*Price - 0.021916*UrbanYes + 1.200573*USYes + ε
# We can reject the null hypothesis H0: beta_j = 0 for the intercept, Price and US, but not for Urban.
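# The first model (Price + Urban + US) had an F-statistic of 41.52 and the second
# (Price + US) 62.43. Both have almost the same R-squared (about 0.24), so the
# smaller second model fits the data just as well with one fewer predictor.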
# We can see a high-leverage observation that affects the model:
# observation no. 43.
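The comments above summarize the results but not the R calls that produce them. Here is a minimal sketch of those calls, assuming the ISLR package (which ships the Carseats data: 400 stores, Sales in thousands of units) is installed. The object names and the 3x-average leverage cutoff are my own choices, not part of the exercise.

```r
# Assumes install.packages("ISLR") has been run; provides the Carseats data
library(ISLR)

# Full model: Urban and US are factors, so lm() builds the UrbanYes and
# USYes dummy variables automatically ("No" is the baseline level)
fit_full <- lm(Sales ~ Price + Urban + US, data = Carseats)
summary(fit_full)

# Smaller model: drop Urban, whose coefficient was not significant
fit_reduced <- lm(Sales ~ Price + US, data = Carseats)
summary(fit_reduced)

# Side-by-side comparison of R-squared, adjusted R-squared and F-statistic
s_full <- summary(fit_full)
s_red <- summary(fit_reduced)
comparison <- data.frame(
  model = c("Price + Urban + US", "Price + US"),
  r_squared = c(s_full$r.squared, s_red$r.squared),
  adj_r_squared = c(s_full$adj.r.squared, s_red$adj.r.squared),
  f_statistic = c(s_full$fstatistic[["value"]], s_red$fstatistic[["value"]])
)
print(comparison)

# Nested-model F-test: does adding Urban improve the fit?
anova(fit_reduced, fit_full)

# Leverage: hat values average (p + 1) / n, so flag points far above that.
# The 3x cutoff is a rule-of-thumb assumption.
lev <- hatvalues(fit_reduced)
avg_lev <- length(coef(fit_reduced)) / nrow(Carseats)
print(which(lev > 3 * avg_lev))
print(which.max(lev))

# Residuals vs leverage diagnostic plot
plot(fit_reduced, which = 5)
```

Comparing R-squared and adjusted R-squared, rather than the F-statistics, is what tells you whether dropping Urban costs any fit.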