ML | Cross Validation | How to visualise KFold Cross Validation using Python and Matplotlib

Published: 02 October 2024
Channel: technologyCult

How to visualise KFold Cross Validation using Python and Matplotlib

In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. The k results can then be averaged to produce a single estimate. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used,[15] but in general k remains an unfixed parameter.
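
The partitioning described above can be checked on a tiny example before any plotting. The following is a minimal sketch, not part of the video's code: the toy sizes (10 samples, k = 5), the names `X_demo`, `validation_counts` and `fold_sizes`, and the fixed `random_state` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy sizes, chosen only to keep the printout short
n_samples, k = 10, 5
X_demo = np.arange(n_samples).reshape(-1, 1)

# random_state is fixed here (an assumption) so the shuffle is repeatable
kf = KFold(n_splits=k, shuffle=True, random_state=0)

validation_counts = np.zeros(n_samples, dtype=int)
fold_sizes = []

for fold, (train_idx, val_idx) in enumerate(kf.split(X_demo)):
    # Count how often each sample lands in the validation subsample
    validation_counts[val_idx] += 1
    fold_sizes.append(len(val_idx))
    print(f"fold {fold}: train={train_idx}, validation={val_idx}")

print("times each sample was validated:", validation_counts)
print("validation fold sizes:", fold_sizes)
```

Every entry of `validation_counts` should come out as 1, which is the "used for validation exactly once" property, and each validation fold holds n_samples / k samples.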

Code Starts Here
===============
from sklearn.model_selection import KFold

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1339)

n_splits = 5

# Generate the class/group data
n_points = 100
X = np.random.randn(n_points, 10)

# Class labels: 20% class 0, 40% class 1, 40% class 2
percentiles_classes = [.2, .4, .4]
y = np.hstack([[ii] * int(n_points * perc)
               for ii, perc in enumerate(percentiles_classes)])

# Evenly spaced groups repeated once (10 groups of 10 consecutive samples)
groups = np.hstack([[ii] * 10 for ii in range(10)])

def visualize_groups(classes, groups, name):
    # Visualize dataset groups (row 1) and classes (row 2).
    # Note: `name` is accepted but not used by this function.
    fig, ax = plt.subplots()
    ax.scatter(range(len(groups)), [1] * len(groups), c=groups, marker='_',
               lw=50)
    ax.scatter(range(len(groups)), [2] * len(groups), c=classes, marker='_',
               lw=50)


visualize_groups(y, groups, 'no groups')


def plot_cv(cv, X, y, group, ax, n_splits, lw=10):
    # Draw one row per CV split: 0 = training sample, 1 = test sample
    for ii, (tr, tt) in enumerate(cv.split(X=X, y=y, groups=group)):
        # Fill in indices with the training/test groups
        print('Split :', ii)
        print('train :', tr)
        print('test :', tt)
        indices = np.array([np.nan] * len(X))
        print('Indices Before :', indices)
        indices[tt] = 1
        indices[tr] = 0
        print('Indices After :', indices)

        print('Range Len :', range(len(indices)))
        print('Len :', len(indices))
        print([ii + 1] * len(indices))
        # Visualize the results
        ax.scatter(range(len(indices)), [ii + 1] * len(indices),
                   c=indices, marker='_', lw=lw)
    # Plot the data classes and groups at the end
    ax.scatter(range(len(X)), [ii + 2] * len(X),
               c=y, marker='_', lw=lw)
    # This is for plotting the groups
    ax.scatter(range(len(X)), [ii + 3] * len(X),
               c=group, marker='_', lw=lw)

    # Formatting
    yticklabels = list(range(n_splits)) + ['class', 'group']
    ax.set(yticks=np.arange(n_splits+2) + 1, yticklabels=yticklabels,
           xlabel='Sample index', ylabel="CV iteration",
           ylim=[n_splits+2.2, -.2], xlim=[0, 100])
    ax.set_title('{}'.format(type(cv).__name__), fontsize=15)
    return ax

fig, ax = plt.subplots()
n_splits = 5
cv = KFold(n_splits=n_splits, shuffle=True)
plot_cv(cv, X, y, groups, ax, n_splits)
plt.show()
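
The description at the top also says that the k results can be averaged into a single estimate, which the plotting code does not show. Below is a rough sketch of that step, under stated assumptions: `LogisticRegression` is a stand-in model chosen only for illustration (the video does not fit a model), and the names `X_demo`, `y_demo` and `scores` are hypothetical. The features are random noise, so the accuracy itself carries no meaning; the point is only the averaging.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Rebuild data of the same shape as in the script (100 samples, 10 features,
# three classes in a 20/40/40 split); the features are pure noise
rng = np.random.RandomState(1339)
X_demo = rng.randn(100, 10)
y_demo = np.hstack([[label] * count
                    for label, count in enumerate([20, 40, 40])])

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Assumed stand-in model, not something used in the original code
model = LogisticRegression(max_iter=1000)

# One score per fold (accuracy is the default for classifiers)
scores = cross_val_score(model, X_demo, y_demo, cv=cv)
print("per-fold accuracy:", scores)
print("averaged estimate:", scores.mean())
```

`cross_val_score` returns one value per split, so `scores` has `n_splits` entries, and their mean is the single cross-validated estimate.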

All playlists of this YouTube channel
=====================================

1. Data Preprocessing in Machine Learning
   • Data Preprocessing in Machine Learnin...  

2. Confusion Matrix in Machine Learning, ML, AI
   • Confusion Matrix in Machine Learning,...  

3. Anaconda, Python Installation, Spyder, Jupyter Notebook, PyCharm, Graphviz
   • Anaconda | Python Installation | Spyd...  

4. Cross Validation, Sampling, train test split in Machine Learning
   • Cross Validation | Sampling | train t...  

5. Drop and Delete Operations in Python Pandas
   • Drop and Delete Operations in Python ...  

6. Matrices and Vectors with python
   • Matrices and Vectors with python  

7. Detect Outliers in Machine Learning
   • Detect Outliers in Machine Learning  

8. TimeSeries preprocessing in Machine Learning
   • TimeSeries preprocessing in Machine L...  

9. Handling Missing Values in Machine Learning
   • Handling Missing Values in Machine Le...  

10. Dummy Encoding in Machine Learning
   • Label Encoding, One hot Encoding, Dum...  

11. Data Visualisation with Python, Seaborn, Matplotlib
   • Data Visualisation with Python, Matpl...  

12. Feature Scaling in Machine Learning
   • Feature Scaling in Machine Learning  

13. Python 3 basics for Beginner
   • Python | Python 3 Basics | Python for...  

14. Interview Questions in Machine Learning and Artificial Intelligence
   • Interview Question for Machine Learni...  

15. Jupyter Notebook Operations
   • Jupyter and Spyder Notebook Operation...