10-09-2020

Today I worked through the statistics courses on Dataquest to refresh my memory. Some observations:

  • Simple random sampling is not a reliable sampling method when the sample size is small. Because sample means vary a lot around the population mean, there’s a good chance we’ll get an unrepresentative sample.
  • When we do simple random sampling, we should try to get a sample that is as large as possible. A large sample decreases the variability of the sampling process, which in turn decreases the chances that we’ll get an unrepresentative sample.
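A quick simulation illustrates both points. This is only a sketch: the population is synthetic (made-up, normally distributed values), and the names `population` and `sample_mean_spread` are my own, not from the Dataquest course.

```python
import numpy as np
import pandas as pd

# Hypothetical population: 10,000 synthetic values (not real course data).
rng = np.random.default_rng(0)
population = pd.Series(rng.normal(loc=100, scale=15, size=10_000))


def sample_mean_spread(series, sample_size, n_samples=200):
    """Draw n_samples simple random samples and return the standard
    deviation of their means (a measure of sampling variability)."""
    means = [
        series.sample(sample_size, random_state=seed).mean()
        for seed in range(n_samples)
    ]
    return float(np.std(means))


small_spread = sample_mean_spread(population, sample_size=10)
large_spread = sample_mean_spread(population, sample_size=1000)

print(f"spread of sample means, n=10:   {small_spread:.2f}")
print(f"spread of sample means, n=1000: {large_spread:.2f}")
```

The means of the small samples should scatter much more widely around the population mean than the means of the large samples, which is why a small simple random sample is more likely to be unrepresentative.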

To ensure we end up with a sample that has observations for all the categories of interest, we can change the sampling method. We can organize our data set into different groups, and then do simple random sampling for every group. We can group our data set by player position, and then sample randomly from each group. This sampling method is called stratified sampling, and each stratified group is also known as a stratum.
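Here is a minimal sketch of stratified sampling with pandas. The `players` table, its column names and the choice of two rows per stratum are assumptions for illustration, not the data set from the course.

```python
import pandas as pd

# Hypothetical data set: 12 players across three positions.
players = pd.DataFrame({
    "name": [f"player_{i}" for i in range(12)],
    "position": ["G"] * 6 + ["F"] * 4 + ["C"] * 2,
    "points": [12, 7, 20, 3, 15, 9, 11, 18, 5, 14, 22, 8],
})

# Treat each position as a stratum and draw a simple random
# sample of two rows from each stratum.
strata_samples = [
    group.sample(n=2, random_state=1)
    for _, group in players.groupby("position")
]
stratified_sample = pd.concat(strata_samples)

print(stratified_sample["position"].value_counts())
```

Because every position contributes rows, no category can be missing from the final sample, which a plain simple random sample of six rows would not guarantee.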

Another question that I found useful, from a Dataquest exercise in Python, is the following:

Question: Create a column in affordable_apps called affordability. It should have the value cheap if the price is lower than 5, and reasonable otherwise.

Answer: affordable_apps["affordability"] = affordable_apps.apply(lambda x: "cheap" if x["Price"] < 5 else "reasonable", axis=1)
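To try the answer outside the course, here is a small runnable sketch. The four prices are made up as a stand-in for the real `affordable_apps` data, and the `np.where` variant is an alternative I am adding for comparison, not part of the Dataquest solution.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's affordable_apps data set.
affordable_apps = pd.DataFrame({"Price": [0.99, 4.99, 5.00, 12.50]})

# Row-wise version, as in the answer above.
affordable_apps["affordability"] = affordable_apps.apply(
    lambda x: "cheap" if x["Price"] < 5 else "reasonable", axis=1
)

# Vectorized alternative: np.where evaluates the condition
# for the whole column at once instead of row by row.
vectorized = np.where(affordable_apps["Price"] < 5, "cheap", "reasonable")

print(affordable_apps)
```

Both versions give the same labels. On large data sets the vectorized form is usually faster, because `apply` with `axis=1` calls the lambda once per row.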

09-09-2020

Today I discovered a page listing various blogs about data science, statistics, machine learning, etc. (https://github.com/dataquestio/data-science-blogs), which is a treasure. I checked some of the blogs and realized that statistics is a major part of data science. There is also the technical side of programming (R, Python, etc.), but I think that knowledge of statistics is the basis for everything else.

This is a really nice article that explains precision, recall, and ROC curves https://www.analyticsvidhya.com/blog/2020/09/precision-recall-machine-learning/

Eurostat also maintains an official website, a wiki, presenting all statistical topics in an easily understandable way: Statistics Explained (SE, http://ec.europa.eu/eurostat/statistics-explained/index.php/Main_Page). Really interesting!

Here are the basic HTML commands: https://developer.mozilla.org/en-US/Learn/Getting_started_with_the_web/HTML_basics. After reading the article, I wrote some simple code and created a simple page.

Cascading Style Sheets, or CSS, is a language for adding styles to HTML pages: https://developer.mozilla.org/en-US/docs/Learn/CSS/First_steps

31-08-2020

It’s been a long time since I last wrote. During the summer I was studying mainly for my Musicology studies at university, where I have approximately 12 more lessons to finish. However, I managed to complete the Coursera Machine Learning course, which was a bit difficult but also very interesting. I am trying to decide whether it is worth buying the certificate (it costs 70 euros), but I think it would be good for my CV, because completing the course demands time and commitment.

A few days ago I went back to learning SQL, using Dataquest. My plan is to finish all the courses on Dataquest’s data analyst path and afterwards do projects from Kaggle and upload them to GitHub.

I hope that this year I will learn a lot more about data science and statistics and build a very interesting portfolio!

25-05-2020

I’m currently at the end of week 4 of the Machine Learning course on Coursera, and today I also started the SQL course on Dataquest. This is a link to an SQL style guide from that Dataquest course: https://www.sqlstyle.guide/. I will try to work for approximately one hour a day on each course (Coursera and Dataquest), because I will have less time now that the lockdown is finally over. In the last few days I also uploaded my first project to GitHub, which is the Titanic competition on Kaggle. The link is here: https://github.com/AngelosTheodorakis/Data_Analysis_Projects/blob/master/Titanic_Dataset/Titanic_Project.md. When I finish the Coursera course (which, by the way, is the most difficult of all the courses I’ve taken so far), I will continue with more projects from Kaggle and keep building my portfolio!

12-05-2020

I’m making progress in data science. I’m currently at week 4 of the Coursera Machine Learning course and I’m also making progress on Dataquest. What I’ve realized in the last few days is that I need to make time for projects every day and not focus only on the online courses. With projects I can build my portfolio on GitHub, and in the next 3-4 months I can have some projects uploaded to show to potential employers. I can use Kaggle, because there you can also learn from other users in a practical way. I have already started some projects, but I don’t have much experience presenting them to other people. I also need to work more with machine learning and be able to use the data I have to make predictions on previously “unseen” data. I keep working every day, but I also need to work smart. In later posts I will show a sample of my work here.

01-05-2020

I made progress today with the Coursera Machine Learning course. I am at the end of week 2 and I plan to finish the course by the end of the month. I also made progress on Dataquest, which helps me learn some concepts better or confirm that I know them well. I believe that these 2 courses will help me a lot to be more confident as a data analyst. The challenge for this month is to complete the Coursera course and make as much progress as I can on Dataquest! This month I also want to start sending out my CV, in order to gain some interview experience.

30-04-2020

I made progress with Dataquest and finished the Python Introduction mission. These were mainly things I already knew, but I now understand them in more depth, especially lists. I also finished 2 guided projects using Jupyter Notebook, with Markdown for presenting them. I already have 25 hours of practice on Dataquest!

I also read an interesting article from Dataquest called “An In-Depth Style Guide for Data Science Projects” (https://www.dataquest.io/blog/data-science-project-style-guide/), which gives some basic guidelines for your code and projects.

24-04-2020

Today I made progress in Dataquest and also started the machine learning course on Coursera! Also I’ve been accepted to Dataquest’s Covid-19 Financial Aid Scholarship. That means I’ll automatically receive free access to all of the courses and projects in the Data Analyst in Python and Data Analyst in R paths by Thursday, April 30. More specifically:

“Your free access will end on July 26, 2020. However, if you complete at least one full course prior to July 21, 2020, you will receive an extension with an additional three months of free access to the Data Analyst in Python and Data Analyst in R paths. Extensions will not be granted under other circumstances.”

So, absolutely great news!!! Dataquest is a really good online platform for learning Data science through projects and I’ll do my best to make as much progress as I can until then!

Today I also sent some of my plots for a research project to the person behind that research, and hopefully my work will become visible to everyone who is interested in it. My knowledge is still limited, but I improve every day.

In the last few days I’ve focused more on the practical side of data analysis, and I think that starting the Machine Learning course on Coursera will help me learn a lot more and become more confident.

So much to learn and do! I try to make progress every day, have faith and everything will follow!

21-04-2020

It’s been a long time since I wrote in my blog. The coronavirus crisis forced us to stay inside our homes, so I took the chance to study data science more. I spent many hours and learned a lot about plots in R, Markdown files and GitHub integration, spent some time on Dataquest and finished the SQL course from Udemy. I also purchased 5 other courses from Udemy and improved my LinkedIn profile a little. I took the chance to make some cool plots from the data I had from my new part-time job. This taught me a lot about R, GitHub and exploratory analysis in R, especially ggplot2 plots. I will probably send my project tomorrow. I am beginning to understand how to do a project, and I want to learn more about prediction methods and apply them in my projects.
