
# Machine Learning Part 4: Introduction to Supervised Learning Algorithms

## Supervised Learning Algorithms

In this article, we will explore some fundamental supervised learning algorithms and demonstrate their implementation in Python, along with the corresponding output.

## Supervised Learning

Supervised learning is a machine learning approach in which an algorithm learns from labeled examples to make predictions or take actions based on input data. In supervised learning, a dataset is composed of input variables (features) and corresponding target variables (labels or outputs). The algorithm learns to map the input data to the desired output by analyzing the provided examples.
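To make the idea of features and labels concrete, here is a minimal sketch of how a labeled dataset is usually laid out for scikit-learn. The data (hours studied and exam scores) and the variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy data: hours studied (feature) and exam score (label).
hours = np.array([1, 2, 3, 4, 5])
scores = np.array([52, 58, 65, 71, 78])

# scikit-learn estimators expect a 2D feature matrix of shape
# (n_samples, n_features) and a 1D target of shape (n_samples,).
X = hours.reshape(-1, 1)
y = scores

print(X.shape)  # (5, 1)
print(y.shape)  # (5,)

# Each row of X is paired with the label at the same position in y.
for features, label in zip(X, y):
    print(features, "->", label)
```

The `reshape(-1, 1)` call turns a flat list of values into one column, which is why the single-feature examples in this article write each input value in its own inner list.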

## Supervised Algorithms

Supervised algorithms are the computational methods that carry out this learning. They use the provided input-output pairs to learn the underlying mapping between the input features and the target variables, and then apply that mapping to make predictions about, or classify, new and unseen data.

## Linear Regression

Linear regression is a regression algorithm that aims to find a linear relationship between the input features and the target variable. It is commonly used for predicting continuous values. Let’s take a look at a simple example using Python:

```python
# Importing the necessary libraries
import numpy as np
from sklearn.linear_model import LinearRegression

# Creating the input features and target variable
# (one row per sample, one column per feature)
X = np.array([[1], [2], [3], [4], [5]])  # Input features
y = np.array([3, 4, 5, 6, 7])            # Target variable

# Creating and training the linear regression model
model = LinearRegression()
model.fit(X, y)

# Predicting the target variable for a new input feature
new_X = np.array([[6]])
prediction = model.predict(new_X)
print("Linear Regression Prediction:", prediction)
```

### Output

`Linear Regression Prediction: [8.]`

The linear regression model predicts the target variable for a new input feature, which is 6 in this case. The predicted value is 8, which continues the pattern y = x + 2 found in the training data.
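As a quick check on that prediction, the following sketch inspects the fitted line through the `coef_` and `intercept_` attributes of the trained model and recomputes the prediction by hand. It assumes the same five training points as the example above; the names `slope`, `intercept`, and `manual_prediction` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 4, 5, 6, 7])

model = LinearRegression().fit(X, y)

# coef_ holds one weight per feature; intercept_ is the constant term.
slope = model.coef_[0]
intercept = model.intercept_
print("Slope:", slope)
print("Intercept:", intercept)

# The prediction is just slope * x + intercept.
manual_prediction = slope * 6 + intercept
print("Manual prediction for x = 6:", manual_prediction)
```

The slope should come out very close to 1 and the intercept very close to 2, so the manual calculation agrees with `model.predict`.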

### Code Explanation

The code performs linear regression using the scikit-learn library. The following steps are executed:

Step 1: Importing the necessary libraries

• The necessary libraries are imported:
• The numpy library is imported as np for numerical operations.
• The LinearRegression class is imported from the sklearn.linear_model module, which provides the implementation of linear regression.

Step 2: Creating the input features and target variable

• The input features are created as a two-dimensional numpy array named X, with one sample per row (values 1, 2, 3, 4, 5), which is the shape scikit-learn expects.
• The target variable is created as a numpy array named y, with values [3, 4, 5, 6, 7].

Step 3: Creating and training the linear regression model

• An instance of the LinearRegression class is created as model.
• The fit method is called on the model instance to train the linear regression model using the input features X and the target variable y.

Step 4: Predicting the target variable for a new input feature

• A new input feature is created as a numpy array named new_X, with a value of 6.
• The predict method is called on the model instance to predict the target variable for the new input feature new_X.
• The prediction is stored in a variable named prediction.

Step 5: Printing the prediction

• The predicted value of the target variable is printed to the console using the print function.
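The steps above rely on scikit-learn to do the fitting. For readers curious about what "training" means here, the sketch below solves the same ordinary least squares problem directly with NumPy. It is an illustration under the assumption of the same five data points, not a description of scikit-learn's internals.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 4.0, 5.0, 6.0, 7.0])

# Design matrix: one column for x and a column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])

# Find the slope and intercept that minimize the squared error ||A @ w - y||^2.
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

prediction = slope * 6 + intercept
print("Least squares prediction for x = 6:", prediction)
```

Because the training points lie exactly on a line, the least squares fit recovers that line and the prediction for 6 matches the model's output.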

## Logistic Regression

Logistic regression is a classification algorithm used when the target variable is binary or categorical. It estimates the probability of an input belonging to a certain class. Here’s an example of logistic regression implementation in Python:

```python
# Importing the necessary libraries
import numpy as np
from sklearn.linear_model import LogisticRegression

# Creating the input features and target variable
# (one row per sample, one column per feature)
X = np.array([[1], [2], [3], [4], [5]])  # Input features
y = np.array([0, 0, 1, 1, 1])            # Target variable

# Creating and training the logistic regression model
model = LogisticRegression()
model.fit(X, y)

# Predicting the probability of an input belonging to class 1
new_X = np.array([[6]])
probability = model.predict_proba(new_X)[:, 1]
print("Logistic Regression Probability:", probability)
```

### Output

`Logistic Regression Probability: [0.97697818]`

The logistic regression model predicts the probability of the input belonging to class 1. In this case, with the input feature of 6, the predicted probability of it belonging to class 1 is approximately 0.977.
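To show where that probability comes from, here is a small sketch that recomputes it by passing the model's linear score through the sigmoid function. It assumes the same training data and the default `LogisticRegression()` settings for a binary problem; the names `z`, `manual_probability`, and `library_probability` are introduced for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 1, 1, 1])
model = LogisticRegression().fit(X, y)

new_X = np.array([[6]])

# Linear score: weight * x + intercept, one value per input row.
z = (new_X @ model.coef_.T + model.intercept_).ravel()

# The sigmoid squashes the score into a probability between 0 and 1.
manual_probability = 1.0 / (1.0 + np.exp(-z))

library_probability = model.predict_proba(new_X)[:, 1]
print("Manual:", manual_probability)
print("predict_proba:", library_probability)
```

The two values should agree, which illustrates that a binary logistic regression is a linear model whose output is converted into a probability.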

### Code Explanation

The code performs logistic regression using the scikit-learn library. The following steps are executed:

Step 1: Importing the necessary libraries

• The necessary libraries are imported:
• The numpy library is imported as np for numerical operations.
• The LogisticRegression class is imported from the sklearn.linear_model module, which provides the implementation of logistic regression.

Step 2: Creating the input features and target variable

• The input features are created as a two-dimensional numpy array named X, with one sample per row (values 1, 2, 3, 4, 5), which is the shape scikit-learn expects.
• The target variable is created as a numpy array named y, with values [0, 0, 1, 1, 1].

Step 3: Creating and training the logistic regression model

• An instance of the LogisticRegression class is created as model.
• The fit method is called on the model instance to train the logistic regression model using the input features X and the target variable y.

Step 4: Predicting the probability of an input belonging to class 1

• A new input feature is created as a numpy array named new_X, with a value of 6.
• The predict_proba method is called on the model instance to predict the probability of the new input belonging to class 1.
• The probability is stored in a variable named probability.

Step 5: Printing the probability

• The predicted probability is printed to the console using the print function.
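The steps above use `predict_proba`, which returns probabilities. When a class label is needed instead, `predict` can be used. The following sketch, which assumes the same training data and a few hypothetical query points, shows how the two methods relate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 1, 1, 1])
model = LogisticRegression().fit(X, y)

# Hypothetical query points, chosen away from the decision boundary.
queries = np.array([[0], [1], [5], [6]])

# One row per query, one column per class (in the order of model.classes_).
probabilities = model.predict_proba(queries)
labels = model.predict(queries)

print("Classes:", model.classes_)
print("Probabilities:\n", probabilities)
print("Labels:", labels)

# Picking the most probable class for each row gives the predicted label.
most_probable = model.classes_[probabilities.argmax(axis=1)]
```

Each row of `probabilities` sums to 1, and the label returned by `predict` is the class with the highest probability.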

## Decision Trees and Random Forests

Decision trees and random forests are versatile algorithms used for both classification and regression tasks. They create a tree-like model of decisions based on the input features. Random forests combine multiple decision trees to improve accuracy and reduce overfitting. Let’s see an example:

```python
# Importing the necessary libraries
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generating complex multiclass dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_classes=3, random_state=42)

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating and training the decision tree classifier
dt_model = DecisionTreeClassifier()
dt_model.fit(X_train, y_train)

# Creating and training the random forest classifier with adjusted hyperparameters
rf_model = RandomForestClassifier(n_estimators=100, max_depth=10, min_samples_split=2)
rf_model.fit(X_train, y_train)

# Predicting the target variable for the testing set
dt_predictions = dt_model.predict(X_test)
rf_predictions = rf_model.predict(X_test)

print("Decision Tree Predictions:", dt_predictions)
print("Random Forest Predictions:", rf_predictions)

# Comparing with true labels
dt_accuracy = accuracy_score(y_test, dt_predictions)
rf_accuracy = accuracy_score(y_test, rf_predictions)

print("Decision Tree Accuracy:", dt_accuracy)
print("Random Forest Accuracy:", rf_accuracy)
```

### Output

```Decision Tree Predictions: [0 2 1 1 1 0 2 2 1 2 1 2 1 1 2 0 2 2 0 2 0 1 1 1 0 2 0 0 1 2 0 2 2 1 1 2 2
0 0 0 0 2 0 0 0 2 2 0 1 2 0 0 1 0 2 2 0 0 2 2 2 0 2 0 1 2 0 2 0 2 2 2 2 1
2 2 0 2 2 2 0 0 0 2 1 1 2 0 1 2 0 2 1 0 2 1 0 2 0 0 1 2 1 1 1 1 1 2 2 2 1
1 0 0 1 1 0 0 1 2 0 2 0 2 2 2 1 0 2 1 1 2 2 2 0 2 1 1 1 2 1 2 0 2 0 0 1 0
0 2 2 0 0 1 1 1 1 0 1 0 0 1 2 1 2 1 1 2 2 1 1 1 2 2 2 1 2 2 2 1 2 2 1 1 2
0 0 1 1 0 0 0 0 2 2 2 0 1 2 0]
Random Forest Predictions: [2 2 1 1 1 2 2 2 1 2 1 2 1 1 0 0 2 2 2 0 0 1 1 1 0 2 0 2 2 2 0 2 2 1 1 0 2
0 0 0 0 2 0 0 0 2 2 0 1 2 2 0 1 2 2 2 2 0 2 2 2 0 1 0 1 2 0 2 0 2 0 0 2 1
2 2 0 2 2 2 0 0 0 2 1 1 0 2 1 2 0 0 1 1 2 1 0 2 0 0 1 2 1 1 1 1 1 0 2 2 1
1 0 0 1 1 0 2 1 0 0 2 2 2 1 2 1 2 2 1 1 2 2 2 0 2 1 1 1 2 1 2 1 1 0 0 1 0
0 2 2 0 0 1 2 1 1 0 1 0 0 1 2 1 2 1 1 2 2 1 1 1 1 1 2 1 2 2 0 1 2 2 1 1 2
0 2 1 1 0 0 0 0 2 2 2 0 1 2 0]
Decision Tree Accuracy: 0.795
Random Forest Accuracy: 0.865
```

In this example, the “Decision Tree Predictions” and “Random Forest Predictions” lines show the predicted class labels for the testing set. Each number represents the predicted class label for a specific data point.

The “Decision Tree Accuracy” and “Random Forest Accuracy” lines display the accuracy scores of the decision tree and random forest models, respectively, on the testing set. These scores indicate the proportion of correctly classified samples.

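Please note that the actual output may differ from run to run. The dataset generation and the train-test split are both seeded with random_state=42, but the two classifiers are created without a random_state, so their internal randomness (for example, the bootstrap samples and feature subsets used by the random forest) changes between runs.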
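If repeatable numbers are needed, one option is to pass `random_state` to the classifier as well. The sketch below is an illustration under that assumption; the helper `forest_accuracy` and the seed value 0 are hypothetical choices, and the resulting accuracy will not necessarily match the figures shown above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)


def forest_accuracy(seed):
    """Train a seeded random forest and return its test accuracy."""
    model = RandomForestClassifier(n_estimators=100, max_depth=10,
                                   random_state=seed)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Two runs with the same seed build the same forest.
first_run = forest_accuracy(0)
second_run = forest_accuracy(0)
print("First run:", first_run)
print("Second run:", second_run)
```

With the seed fixed, both runs report the same accuracy, which makes comparisons between models easier to reason about.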

### Code Explanation

The code generates a complex multiclass dataset and performs classification using decision trees and random forests. The following steps are executed:

Step 1: Importing the necessary libraries

• The necessary libraries are imported:
• The make_classification function is imported from the sklearn.datasets module to generate a synthetic multiclass dataset.
• The DecisionTreeClassifier and RandomForestClassifier classes are imported from the sklearn.tree and sklearn.ensemble modules, respectively, for creating decision tree and random forest classifiers.
• The train_test_split function is imported from the sklearn.model_selection module to split the dataset into training and testing sets.
• The accuracy_score function is imported from the sklearn.metrics module to compute the accuracy of the classifiers.

Step 2: Generating the complex multiclass dataset

• A complex multiclass dataset is generated using the make_classification function.
• It generates 1000 samples with 10 features, where 5 features are informative.
• The dataset has 3 classes, and the random_state is set to 42 for reproducibility.

Step 3: Splitting the dataset into training and testing sets

• The dataset is split into training and testing sets using the train_test_split function.
• The testing set size is set to 20% of the total dataset, and the random_state is set to 42 for reproducibility.

Step 4: Creating and training the decision tree classifier

• An instance of the DecisionTreeClassifier class is created as dt_model.
• The fit method is called on the dt_model instance to train the decision tree classifier using the training set.

Step 5: Creating and training the random forest classifier with adjusted hyperparameters

• An instance of the RandomForestClassifier class is created as rf_model.
• The n_estimators, max_depth, and min_samples_split hyperparameters are adjusted to 100, 10, and 2, respectively.
• The fit method is called on the rf_model instance to train the random forest classifier using the training set.

Step 6: Predicting the target variable for the testing set

• The predict method is called on the dt_model and rf_model instances to predict the target variable for the testing set.
• The predictions are stored in dt_predictions and rf_predictions, respectively.

Step 7: Printing the predictions

• The predicted class labels for the testing set are printed to the console using the print function.

Step 8: Comparing with true labels

• The accuracy_score function is used to compute the accuracy of the classifiers by comparing the predicted labels with the true labels.
• The accuracy scores are stored in dt_accuracy and rf_accuracy, respectively.

Step 9: Printing the accuracy scores

• The accuracy scores of the decision tree and random forest classifiers are printed to the console using the print function.
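The description of a random forest as a combination of decision trees can be checked directly. The sketch below assumes a smaller synthetic dataset and a forest of 25 trees (both hypothetical choices made to keep it quick) and averages the class probabilities of the individual trees, which are exposed through the `estimators_` attribute.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=42)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Class probabilities from every individual tree:
# shape (n_trees, n_samples, n_classes).
tree_probabilities = np.stack(
    [tree.predict_proba(X) for tree in forest.estimators_])

# The forest's probability estimate is the average over its trees.
averaged = tree_probabilities.mean(axis=0)
forest_probabilities = forest.predict_proba(X)

print("Trees in the forest:", len(forest.estimators_))
print("Matches forest.predict_proba:",
      np.allclose(averaged, forest_probabilities))
```

Averaging over many trees, each trained on a different bootstrap sample, is what smooths out the quirks of any single tree.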

## Support Vector Machines (SVM)

SVM is a powerful algorithm used for both classification and regression tasks. It finds an optimal hyperplane that separates different classes with a maximum margin. Here’s an example of SVM implementation in Python:

```python
# Importing the necessary libraries
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Loading the Digits dataset
digits = load_digits()
X, y = digits.data, digits.target

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating and training the support vector classifier
svm_model = SVC()
svm_model.fit(X_train, y_train)

# Predicting the target variable for the testing set
predictions = svm_model.predict(X_test)

# Calculating the accuracy of the model
accuracy = accuracy_score(y_test, predictions)
print("SVM Accuracy:", accuracy)
```

### Output

`SVM Accuracy: 0.9861111111111112`

In this example, the code uses the Digits dataset, which contains images of handwritten digits. The dataset is split into training and testing sets, and an SVM model is trained and evaluated on the dataset. The output value of 0.9861111111111112 represents the accuracy of the model, indicating that it correctly classified approximately 98.61% of the samples in the testing set.

Please note that the actual output may vary slightly depending on the scikit-learn version in use. The train-test split is seeded with random_state=42, so repeated runs in the same environment should give the same result.
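The Digits example uses `SVC()` with its default RBF kernel, where the separating hyperplane lives in an implicit feature space and cannot be read off directly. To illustrate the hyperplane and margin idea, the sketch below instead assumes a linear kernel and a tiny, hypothetical two-class dataset, for which the hyperplane is available through `coef_` and `intercept_`.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data with two features.
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = SVC(kernel="linear").fit(X, y)

# For a linear kernel the hyperplane is w . x + b = 0.
w = model.coef_[0]
b = model.intercept_[0]

# The signed score tells which side of the hyperplane a point is on.
scores = X @ w + b
predicted = model.classes_[(scores > 0).astype(int)]

# The margin is the gap between the two classes, measured across the hyperplane.
margin_width = 2 / np.linalg.norm(w)

print("w:", w, "b:", b)
print("Predicted:", predicted)
print("Margin width:", margin_width)
print("Support vectors:\n", model.support_vectors_)
```

Only the support vectors, the training points closest to the boundary, determine where the hyperplane ends up.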

### Code Explanation

The code performs classification using Support Vector Machines (SVM) on the Digits dataset. The following steps are executed:

Step 1: Importing the necessary libraries

• The necessary libraries are imported:
• The load_digits function is imported from the sklearn.datasets module to load the Digits dataset.
• The SVC class is imported from the sklearn.svm module for creating a Support Vector Classifier.
• The train_test_split function is imported from the sklearn.model_selection module to split the dataset into training and testing sets.
• The accuracy_score function is imported from the sklearn.metrics module to compute the accuracy of the classifier.

Step 2: Loading the Digits dataset

• The Digits dataset is loaded using the load_digits function.
• The input features are stored in a variable named X, and the target variable is stored in a variable named y.

Step 3: Splitting the dataset into training and testing sets

• The dataset is split into training and testing sets using the train_test_split function.
• The testing set size is set to 20% of the total dataset, and the random_state is set to 42 for reproducibility.

Step 4: Creating and training the support vector classifier

• An instance of the SVC class is created as svm_model.
• The fit method is called on the svm_model instance to train the support vector classifier using the training set.

Step 5: Predicting the target variable for the testing set

• The predict method is called on the svm_model instance to predict the target variable for the testing set.
• The predictions are stored in a variable named predictions.

Step 6: Calculating the accuracy of the model

• The accuracy_score function is used to compute the accuracy of the classifier by comparing the predicted labels with the true labels.
• The accuracy is stored in a variable named accuracy.

Step 7: Printing the accuracy

• The accuracy of the support vector classifier is printed to the console using the print function.

## Conclusion

In this article, we explored four popular families of supervised learning algorithms: linear regression, logistic regression, decision trees and random forests, and support vector machines (SVM). We demonstrated their implementation in Python and showed the corresponding output. These algorithms serve as a solid foundation for understanding and applying supervised learning techniques in various machine learning tasks.
