Import the Training data

Here we import the training data and store it in a variable, so that we can use that variable to access the data whenever required.
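
A minimal sketch of this step, assuming pandas is used and the data comes as a CSV file. The file name `train.csv`, the helper `load_data` and the variable name `train_df` are assumptions made for illustration; a small inline sample stands in for the real file so the snippet runs on its own.

```python
import io
import pandas as pd

# A tiny inline sample standing in for the real file (illustrative values only).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch
1,0,3,male,22,1,0
2,1,1,female,38,1,0
3,1,3,female,,0,0
"""

def load_data(source):
    """Read a CSV file (path or file-like object) into a DataFrame."""
    return pd.read_csv(source)

# With the real data this would be something like load_data("train.csv").
train_df = load_data(io.StringIO(SAMPLE_CSV))
print(train_df.head())
```

The test data can be loaded the same way into a second variable.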




    Import the Test data

    Here we import the test data and store it in a variable, so that we can use that variable to access the data whenever required.

Cleaning the Data

To find null and missing values in the training dataset
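
One common way to do this check, sketched here under the assumption that the data sits in a pandas DataFrame. The column names and values below are a made-up sample, not the real dataset.

```python
import io
import pandas as pd

# Made-up sample with a few deliberately empty cells.
SAMPLE_CSV = """PassengerId,Age,Cabin,Embarked
1,22,,S
2,38,C85,C
3,,,S
4,35,C123,
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# isnull() marks each missing cell as True; sum() counts them per column.
missing_counts = df.isnull().sum()

# Keep only the columns that actually have gaps, largest first.
missing_counts = missing_counts[missing_counts > 0].sort_values(ascending=False)
print(missing_counts)
```

The same call on the test DataFrame gives the corresponding counts for the test dataset.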




    To find missing values from the Training dataset




    To find missing values from the Test dataset

Survival

When we consider the entire training data, the most important question is how many people survived the accident. Unfortunately, the data provided shows that there were fewer survivors than deceased.

Survival according to Age

Here I have taken age as a factor to find out which age category has the highest survival rate. According to the data in the image, the age group with a median of about 30 has the highest survival rate, while the age group with a median of about 25 has the lowest.
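
A hedged sketch of how survival could be summarised per age band with pandas. The band edges and the toy rows are assumptions chosen for illustration, not the values behind the plot. It reports both the number of passengers and the survival rate in each band, since a band can hold many survivors in absolute terms and still have a low rate.

```python
import pandas as pd

# Toy rows for illustration only.
df = pd.DataFrame({
    "Age":      [4, 19, 22, 25, 28, 31, 35, 47, 62, None],
    "Survived": [1,  0,  1,  0,  0,  1,  1,  0,  0,  1],
})

# Bucket ages into bands; rows with a missing Age fall outside every band.
bins = [0, 10, 20, 30, 40, 60, 80]
df["AgeBand"] = pd.cut(df["Age"], bins=bins)

# The mean of a 0/1 column is the fraction of survivors in that band.
summary = (
    df.groupby("AgeBand", observed=True)["Survived"]
      .agg(passengers="count", survival_rate="mean")
)
print(summary)
```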

Visualizing training data based on Class

Class is another important factor in this analysis. I have used it to analyse which class of passengers has the highest survival rate.



    Visualisation of Passengers class in Test Data

    We can draw a few important conclusions from this graph: most passengers travelled in third class. About 30% of the travellers are from first class and about 15% are from second class, whereas more than 55% of passengers belong to third class.

Visualising Age based on Class

We can note that the third class is the most crowded. The median age of people on the ship is approximately 30, so a typical passenger on board is around 30 years old.




    We can notice that in the test data too, the third class is the most crowded, mostly with people aged 25-30, while the first class contains more people in the 40-60 range. People aged 60-80 are the fewest in 3rd class.

Survivors by Class

Comparison in 1st class: fewer passengers died than survived.

Comparison in 2nd class: the deceased and the survivors are almost equal in number, with slightly more deceased.

Comparison in 3rd class: far more passengers died than survived.

    Even though the 3rd class had far more people than the 1st class, the 1st class had more survivors.
    Money does play its part.
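
The class comparison above can be reproduced as a table. This is a minimal sketch assuming pandas and the usual `Pclass` and `Survived` column names; the rows are toy values, so the numbers it prints are not the real results.

```python
import pandas as pd

# Toy rows for illustration only.
df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 0, 0],
})

# Cross-tabulate class against outcome: one row per class, one column per outcome.
counts = pd.crosstab(df["Pclass"], df["Survived"])
counts.columns = ["deceased", "survived"]

# Share of each class that survived.
counts["survival_rate"] = counts["survived"] / (counts["deceased"] + counts["survived"])
print(counts)
```

Looking at the rate next to the raw counts makes it easier to compare classes of very different sizes.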

Snapshots of Code

Code to display the training data.




    Checking for the missing values.




    Making Predictions.




    Implementing Random Forest Classifier.
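
For readers without the screenshot, here is a rough sketch of what fitting a Random Forest and making predictions can look like with scikit-learn. It is not the original code: the feature choice (`Pclass`, `Sex`, `Age`), the `prepare` helper, the median fill for missing ages and the hyperparameters are all assumptions, and the rows are toy data.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy rows for illustration only; the real training set has many more.
train = pd.DataFrame({
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 3],
    "Sex":      ["female", "male", "female", "male", "female", "male", "male", "female"],
    "Age":      [38, 54, 27, None, 4, 22, 35, 30],
    "Survived": [1, 0, 1, 0, 1, 0, 0, 0],
})
test = pd.DataFrame({
    "Pclass": [1, 3],
    "Sex":    ["female", "male"],
    "Age":    [29, None],
})

def prepare(frame, age_fill):
    """Turn the raw columns into a purely numeric feature table (hypothetical helper)."""
    features = frame[["Pclass", "Sex", "Age"]].copy()
    features["Sex"] = (features["Sex"] == "female").astype(int)
    features["Age"] = features["Age"].fillna(age_fill)
    return features

# Fill gaps in both sets with the training median, so no test information leaks in.
age_median = train["Age"].median()
X_train, y_train = prepare(train, age_median), train["Survived"]
X_test = prepare(test, age_median)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# One 0/1 prediction per test passenger.
predictions = model.predict(X_test)
print(predictions)
```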




    Code to display the survivors by class.



    My approach involved classifying the data and finding the survivors. I tried to find the important attributes that affected survival, such as age, class and Relatives. Relatives includes Parch and SibSp.

    I looked at the survivors by age and found that most survivors were aged 20-30.

    I looked at the survivors by class and found that most survivors were from the first class.

    I tried to incorporate Relatives to see how many members of the same family survived, but was not successful. However, I shall try to learn more ML models so that I can implement them, extract better meaning from the data and classify the survivors more efficiently.
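
As a possible starting point for the Relatives idea, the sketch below derives a `Relatives` column as `SibSp + Parch` and summarises survival by that count. The `TravellingAlone` flag and the toy rows are assumptions for illustration. Note that this only counts how many relatives each passenger had on board; it does not link individual family members to each other.

```python
import pandas as pd

# Toy rows for illustration only.
df = pd.DataFrame({
    "SibSp":    [1, 0, 0, 3, 1, 0],
    "Parch":    [0, 0, 2, 1, 1, 0],
    "Survived": [1, 0, 1, 0, 1, 0],
})

# Relatives = siblings/spouses aboard + parents/children aboard.
df["Relatives"] = df["SibSp"] + df["Parch"]
df["TravellingAlone"] = df["Relatives"] == 0

# Passenger count and survival rate for each number of relatives.
by_relatives = df.groupby("Relatives")["Survived"].agg(
    passengers="count", survival_rate="mean"
)
print(by_relatives)
```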

Contact


Any queries? Contact me!

Email Id : Neelesh216@gmail.com

Address : 404 E Border st, Arlington, Texas 76010, USA

Phone : 682 375 1222