Data-Informed Thinking + Doing
Numerical Predictions Using Ridge and Lasso Regression
—using scikit-learn for Python.
Getting Started
Before diving in, here is the pipeline this post follows:
[Flow chart here]
If you are interested in reproducing this work, here are the versions of Python and of the Python packages used:
import sys
print(sys.version)
## 3.9.1 (v3.9.1:1e5d33e9b9, Dec 7 2020, 12:10:52)
## [Clang 6.0 (clang-600.0.57)]
# !pip install "numpy==1.20.0"
# !pip install "pandas==1.2.2"
# !pip install "matplotlib==3.3.4"
# !pip install "seaborn==0.11.1"
# !pip install "scikit-learn==0.24.1"
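To check whether an existing environment already matches the pins listed above, a minimal sketch along the following lines may help. It assumes Python 3.8 or later (for `importlib.metadata`); the `PINNED` dictionary and the `installed_versions` helper are hypothetical names introduced here for illustration and are not part of the original workflow.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip commands listed above.
PINNED = {
    "numpy": "1.20.0",
    "pandas": "1.2.2",
    "matplotlib": "3.3.4",
    "seaborn": "0.11.1",
    "scikit-learn": "0.24.1",
}


def installed_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report each package and whether it matches the pinned version.
for name, have in installed_versions(PINNED).items():
    status = "ok" if have == PINNED[name] else "differs"
    print(f"{name}: installed={have}, pinned={PINNED[name]} ({status})")
```

A mismatch does not necessarily mean the code will fail, but output such as the `Bunch` class path or the available keys can vary between scikit-learn releases.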
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn import linear_model
sns.set()
plt.style.use("ggplot")
iris = datasets.load_iris()
type(iris)
## <class 'sklearn.utils.Bunch'>
print(iris.keys())
## dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])
type(iris.data), type(iris.target)
## (<class 'numpy.ndarray'>, <class 'numpy.ndarray'>)
iris.data.shape
## (150, 4)
iris.target_names
## array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
x = iris.data
y = iris.target
df = pd.DataFrame(x, columns=iris.feature_names)
print(df.head())
## sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
## 0 5.1 3.5 1.4 0.2
## 1 4.9 3.0 1.4 0.2
## 2 4.7 3.2 1.3 0.2
## 3 4.6 3.1 1.5 0.2
## 4 5.0 3.6 1.4 0.2
_ = pd.plotting.scatter_matrix(df, c=y, figsize=(8, 8), s=150, marker="D")
plt.show()
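The scatter matrix colours each point by the integer codes in `iris.target`, which index into `iris.target_names`. As a small sketch of how those codes could be turned into readable labels, the snippet below adds a `species` column to the data frame. The column name `species` is an assumption made here for illustration; it does not appear in the original code.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# iris.target holds the codes 0, 1, 2; using it to index the
# target_names array yields one species name per row.
df["species"] = iris.target_names[iris.target]

# Count the rows per species.
print(df["species"].value_counts())
```

With the labels attached, each colour in the plot can be matched to a species name rather than a bare integer.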