Here we walk through an end-to-end EDA and ML project.
The following points are discussed in this project:
Loading the Glassdoor dataset
Checking basic info of the data: shape, columns, datatypes, etc.
Returning basic statistics on numeric columns
Checking missing records with their count
Removing the 'Unnamed' column
Removing the rows having irrelevant data '-1' as Salary Estimate value
Removing the text value from 'Salary Estimate' column
Removing extra text '$' and 'K' from 'Salary Estimate' column
Finding any inconsistencies in the salary
Creating salary_per_hour column using 'Per Hour' text
Creating column for 'Employer Provided Salary'
Removing 'Per Hour' and 'Employer Provided Salary' from 'Salary Estimate' column
Creating columns for min_salary, max_salary, average_salary using salary column
Converting the hourly salaries to annual salaries
Removing numbers from 'Company Name' column and creating a column 'job_state'
Fixing Los Angeles to CA and calculating age of the companies
Cleaning the 'Job Description', 'Competitors', 'Type of ownership', 'Revenue' columns
Finding Correlation between columns and plotting the correlation
Exploring categorical data: ordinal, nominal
Plotting the data for 'Location' and 'Headquarters' columns
Plotting the data for 'Company Name', 'Size', 'Type of ownership', 'Revenue' columns
Plotting the data for 'Industry', 'Sector' columns
Plotting the data for 'job_title_simplified', 'job_seniority' columns
Top 15 Industries for Data Scientists
Top 10 Sectors for Data Scientists
Top Company types that pay Data Scientists well
Top 20 Companies that pay Data Scientists well
Trimming the 'Industry', 'job_state' columns
Taking top 10 States and replacing others by 'Others'
Adding column of 'job_in_headquarters'
Mapping ranks to 'company_size', 'revenue', 'job_seniority' columns since they are ordinal categorical features
One-hot encoding 'type_of_ownership', 'industry', 'job_title_simplified', 'job_state' columns using get_dummies()
Creating the final dataset after feature engineering, checking for nulls and dropping them if any
Splitting the dataset into train and test set
Creating linear regression, random forest regression, and AdaBoost regression models
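
The salary-cleaning steps above (dropping '-1', stripping the text, '$' and 'K', flagging 'Per Hour' and 'Employer Provided Salary', and building min/max/average columns) could be sketched roughly as below. This is only an illustration, not the code from the project: the function name parse_salary, the HOURS_PER_YEAR constant, the choice to express salaries in thousands of dollars per year, and the sample strings in the usage note are all assumptions about how the 'Salary Estimate' values look.

```python
from typing import Dict, Optional

# Assumption: 40 hours x 50 weeks. The project may use a different figure.
HOURS_PER_YEAR = 2000


def parse_salary(raw: str) -> Optional[Dict[str, float]]:
    """Turn one 'Salary Estimate' string into numeric fields (in $K per year).

    Hypothetical helper: returns None for the '-1' placeholder so the
    caller can drop that row.
    """
    if raw.strip() == "-1":
        return None

    text = raw.lower()

    # Record the two flags before removing their text.
    per_hour = "per hour" in text
    employer_provided = "employer provided salary" in text

    # Drop the trailing "(Glassdoor est.)"-style note, then the flag phrases.
    text = text.split("(")[0]
    text = text.replace("per hour", "").replace("employer provided salary:", "")

    # Strip '$' and 'k', leaving a plain range such as "53-91".
    text = text.replace("$", "").replace("k", "").strip()
    low, high = (float(part) for part in text.split("-"))

    if per_hour:
        # Hourly dollars -> annual thousands of dollars.
        low = low * HOURS_PER_YEAR / 1000
        high = high * HOURS_PER_YEAR / 1000

    return {
        "salary_per_hour": int(per_hour),
        "employer_provided_salary": int(employer_provided),
        "min_salary": low,
        "max_salary": high,
        "average_salary": (low + high) / 2,
    }
```

For example, parse_salary("$53K-$91K (Glassdoor est.)") gives min_salary 53.0, max_salary 91.0 and average_salary 72.0, while an hourly string such as "$17-$24 Per Hour(Glassdoor est.)" is scaled to 34.0 and 48.0. With pandas, the same function can be applied to the whole column through Series.apply once the '-1' rows have been filtered out.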