Here's a simple pattern that can be adapted to solve many ML problems. It has plenty of shortcomings, but can work surprisingly well as-is!
Shortcomings include:
Assumes all columns have proper data types
May include irrelevant or improper features
Does not handle text or date columns well
Does not include feature engineering
Ordinal encoding may be better than one-hot encoding for some categorical features
Other imputation strategies may be better
Numeric features may not need scaling
A different model may be better
And so on...
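For reference, a pattern of this kind can be sketched roughly as follows. This is an illustration pieced together from the tips listed below, not necessarily the exact code shown in the video. The toy DataFrame, its column names (age, fare, embarked), the target, and the choice of RandomForestClassifier are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with some missing values (not from the video)
rng = np.random.default_rng(0)
n = 60
age = rng.normal(40, 10, n)
age[::7] = np.nan
fare = rng.exponential(30, n)
embarked = rng.choice(["S", "C", "Q"], n).astype(object)
embarked[::11] = np.nan
X = pd.DataFrame({"age": age, "fare": fare, "embarked": embarked})
y = (fare > np.median(fare)).astype(int)  # made-up binary target

# Select columns by data type (assumes the dtypes are already correct)
num_cols = make_column_selector(dtype_include="number")
cat_cols = make_column_selector(dtype_exclude="number")

# Numeric columns: impute, then scale
num_pipe = make_pipeline(KNNImputer(), StandardScaler())

# Categorical columns: impute a constant, then one-hot encode,
# ignoring categories that were not seen during fitting
cat_pipe = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(handle_unknown="ignore"),
)

# Apply each preprocessing pipeline to its own set of columns
ct = make_column_transformer((num_pipe, num_cols), (cat_pipe, cat_cols))

# Chain preprocessing and model so cross-validation refits both per fold
pipe = make_pipeline(ct, RandomForestClassifier(random_state=0))

scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```

Each shortcoming above maps to a small change in this sketch: swap OneHotEncoder for OrdinalEncoder, try a different imputer, drop the scaler, or replace the model, then compare the cross-validated scores.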
Tips mentioned in this video:
Tip 1: Use ColumnTransformer to apply differ...
Tip 2: Seven ways to select columns using Co...
Tip 6: Encode categorical features using One...
Tip 7: Handle unknown categories with OneHot...
Tip 9: Add a missing indicator to encode "mi...
Tip 11: Impute missing values using KNNImpute...
Tip 16: Use cross_val_score and GridSearchCV ...
Tip 27: Two ways to impute missing values for...
Tip 43: Use OrdinalEncoder instead of OneHotE...