Cross-Validation from Scratch

Difficulty: Medium | Tags: Cross-Validation, Model Evaluation, Machine Learning, Performance Metrics, Data Splitting

Problem:

Implement k-fold cross-validation from scratch to understand its importance in model evaluation. Compare the cross-validation results with a simple train-test split, and visualize how performance varies across the different folds.

Examples:

Example 1:
Input:
  X, y = load_diabetes(return_X_y=True)
  cv_results = cross_validation(X, y, LinearRegression())
Output:
  Mean MSE: 3000.25 ± 500.12
  Mean R²: 0.4521 ± 0.0892
Explanation: Cross-validation on the diabetes dataset, reporting the mean and standard deviation of each metric.

Example 2:
Input:
  X = np.random.randn(1000, 10)
  y = np.random.randn(1000)
  folds = create_folds(X, y, k=10)
Output:
  10 folds created
  Each fold size ≈ 100
Explanation: Creating 10 folds from random data; 1000 samples split evenly, so each fold holds about 100.

Constraints:

  • Must shuffle data before creating folds
  • Must handle non-divisible fold sizes
  • Must calculate both MSE and R² metrics
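One way to satisfy these constraints is sketched below, assuming the `create_folds(X, y, k)` and `cross_validation(X, y, model)` signatures from the examples. The `seed` parameter is an added assumption for reproducible shuffling; `np.array_split` is used because it handles non-divisible fold sizes automatically.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

def create_folds(X, y, k=5, seed=42):
    # Shuffle indices before splitting, as the constraints require.
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    # array_split distributes any remainder, so fold sizes may differ by 1.
    return np.array_split(indices, k)

def cross_validation(X, y, model, k=5, seed=42):
    folds = create_folds(X, y, k=k, seed=seed)
    mses, r2s = [], []
    for i in range(k):
        # Fold i is the held-out test set; the rest form the training set.
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model.fit(X[train_idx], y[train_idx])
        preds = model.predict(X[test_idx])
        mses.append(mean_squared_error(y[test_idx], preds))
        r2s.append(r2_score(y[test_idx], preds))
    print(f"Mean MSE: {np.mean(mses):.2f} ± {np.std(mses):.2f}")
    print(f"Mean R²: {np.mean(r2s):.4f} ± {np.std(r2s):.4f}")
    return mses, r2s
```

Running this on the diabetes dataset should produce output in the shape of Example 1, though the exact numbers depend on the shuffle. The per-fold lists it returns can be plotted directly (e.g. a bar chart of MSE per fold) and compared against a single `train_test_split` score, which gives only one point estimate with no spread.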
