Titanic disaster analysis

Posted on Thu 17 November 2016 in data analysis • Tagged with data, analysis, python, pandas, matplotlib, scikit-learn, numpy, machine learning, kaggle

I'm a newbie at Kaggle and new to machine learning. I'll try to make this exploration interesting and detailed.

1. Data analysis

1.1. Expectations

What do I expect from this analysis? I'll build a model that predicts survival on the Titanic. Along the way, I'll illustrate every dependency I find.
First of all, I want to understand what kinds of variables I have.


Future stock prices prediction based on the historical data using simplified linear regression

Posted on Thu 06 October 2016 in data analysis • Tagged with data, analysis, python, pandas, matplotlib, scikit-learn, numpy, machine learning, linear regression

In this post I want to give a simplified explanation of what a linear regression model is and how to apply it to data prediction using Python and some open-source Python libraries (including scikit-learn).

Supervised learning is one of the major categories of machine learning algorithms. "Supervised" means we already have a dataset in which the "correct answers" are given. For example, we have stock data with open and close values for the past few years, and we want to predict future values (prices or indexes). Supervised learning is subdivided into regression problems and classification problems. In a regression problem we try to predict a continuous output value (such as a stock price), while in a classification problem we try to predict a discrete label.
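To make the regression idea concrete, here is a minimal sketch of fitting a straight line by ordinary least squares. It is written in plain Python so that it runs on its own. The helper names `fit_line` and `predict` are my own, and the prices are made-up illustrative numbers, not real market data. A library such as scikit-learn fits the same kind of model through its `LinearRegression` class.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x-y co-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical closing prices for five consecutive days (the "correct answers").
days = [0, 1, 2, 3, 4]
closes = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(days, closes)
next_close = predict(slope, intercept, 5)  # continuous-valued prediction for day 5
print(slope, intercept, next_close)
```

The known day/price pairs play the role of the labeled dataset, and the predicted value for day 5 is the continuous output that makes this a regression rather than a classification problem.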
