How to visualise KFold Cross Validation using Python and Matplotlib
In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. The k results can then be averaged to produce a single estimate. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used, but in general k remains an unfixed parameter.
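As a minimal sketch of the partitioning described above, the snippet below splits ten dummy samples into five folds with scikit-learn's KFold and checks that every sample is used for validation exactly once. The names `data`, `validation_counts` and `fold_sizes` are illustrative only and are not part of the code that follows.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten dummy samples; the values themselves do not matter here.
data = np.arange(10).reshape(-1, 1)

kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Count how many times each sample index is used for validation.
validation_counts = np.zeros(len(data), dtype=int)
fold_sizes = []
for train_idx, val_idx in kf.split(data):
    validation_counts[val_idx] += 1
    fold_sizes.append(len(val_idx))
    # Training and validation indices never overlap within a fold.
    assert set(train_idx).isdisjoint(val_idx)

print(validation_counts)  # every entry is 1
print(fold_sizes)         # five folds of two samples each
```

With k = 5 and ten samples, each fold holds out two samples and trains on the other eight.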
Code Starts Here
===============
from sklearn.model_selection import KFold
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1339)
n_splits = 5

# Generate the class/group data
n_points = 100
X = np.random.randn(n_points, 10)

percentiles_classes = [.2, .4, .4]
y = np.hstack([[ii] * int(n_points * perc)
               for ii, perc in enumerate(percentiles_classes)])

# Evenly spaced groups repeated once
groups = np.hstack([[ii] * 10 for ii in range(10)])

def visualize_groups(classes, groups, name):
    # Visualize dataset groups
    fig, ax = plt.subplots()
    ax.scatter(range(len(groups)), [1] * len(groups), c=groups, marker='_',
               lw=50)
    ax.scatter(range(len(groups)), [2] * len(groups), c=classes, marker='_',
               lw=50)

visualize_groups(y, groups, 'no groups')

def plot_cv(cv, X, y, group, ax, n_splits, lw=10):
    # Use the `group` argument here, not the global `groups` variable
    for ii, (tr, tt) in enumerate(cv.split(X=X, y=y, groups=group)):
        # Fill in indices with the training/test groups
        print('Fold :', ii)
        print('train :', tr)
        print('test :', tt)
        indices = np.array([np.nan] * len(X))
        print('Indices Before :', indices)
        indices[tt] = 1
        indices[tr] = 0
        print('Indices After :', indices)
        print('Range Len :', range(len(indices)))
        print('Len :', len(indices))
        print([ii + 1] * len(indices))

        # Visualize the results
        ax.scatter(range(len(indices)), [ii + 1] * len(indices),
                   c=indices, marker='_', lw=lw)

    # Plot the data classes and groups at the end
    ax.scatter(range(len(X)), [ii + 2] * len(X),
               c=y, marker='_', lw=lw)
    # This is for plotting the groups
    ax.scatter(range(len(X)), [ii + 3] * len(X),
               c=group, marker='_', lw=lw)

    # Formatting
    yticklabels = list(range(n_splits)) + ['class', 'group']
    ax.set(yticks=np.arange(n_splits+2) + 1, yticklabels=yticklabels,
           xlabel='Sample index', ylabel="CV iteration",
           ylim=[n_splits+2.2, -.2], xlim=[0, len(X)])
    ax.set_title('{}'.format(type(cv).__name__), fontsize=15)
    return ax

fig, ax = plt.subplots()
n_splits = 5
cv = KFold(n_splits, shuffle=True)
plot_cv(cv, X, y, groups, ax, n_splits)
plt.show()
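
The plot shows only which samples fall into the training and test sets of each fold. To illustrate the averaging step described in the introduction, here is a hedged sketch that scores a classifier on each fold and averages the k results. The choice of LogisticRegression, the synthetic data, and the names `X_demo`, `y_demo` and `cv_demo` are assumptions made for illustration; the code above does not train a model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.RandomState(1339)

# Synthetic two-class data (assumed for illustration only):
# the label depends on the sign of the first feature.
X_demo = rng.randn(100, 10)
y_demo = (X_demo[:, 0] > 0).astype(int)

cv_demo = KFold(n_splits=5, shuffle=True, random_state=1339)

# One accuracy score per fold, then a single averaged estimate.
scores = cross_val_score(LogisticRegression(), X_demo, y_demo, cv=cv_demo)
mean_score = scores.mean()

print('Per-fold accuracy :', scores)
print('Mean accuracy :', mean_score)
```

`cross_val_score` returns one score per fold, so `scores` has length 5 here and `mean_score` is the single estimate obtained by averaging them.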