Interested in Machine Learning? Then this course is for you!
This course was designed by an IT Architect / Machine Learning Expert so that we can share our knowledge and help you learn complex theories, algorithms, and coding libraries in a simple way.
We will guide you step by step through the world of Machine Learning. With every tutorial, you will develop new skills and improve your understanding of this challenging yet lucrative sub-field of Data Science.
This course can be completed by following either the Python tutorials, the R tutorials, or both. Choose the programming language you need for your career.
This course is fun and exciting, and at the same time we dive deep into Machine Learning. It is structured as follows:
- Part 1 – Data Preprocessing
- Part 2 – Regression: Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, SVR, Decision Tree Regression, Random Forest Regression
- Part 3 – Classification: Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, Decision Tree Classification, Random Forest Classification
- Part 4 – Clustering: K-Means, Hierarchical Clustering
- Part 5 – Association Rule Learning: Apriori, Eclat
- Part 6 – Reinforcement Learning: Upper Confidence Bound, Thompson Sampling
- Part 7 – Natural Language Processing: Bag-of-Words model and algorithms for NLP
- Part 8 – Deep Learning: Artificial Neural Networks, Convolutional Neural Networks
- Part 9 – Dimensionality Reduction: PCA, LDA, Kernel PCA
- Part 10 – Model Selection & Boosting: k-Fold Cross-Validation, Parameter Tuning, Grid Search, XGBoost
Each section within each part is independent, so you can either take the whole course from start to finish or jump straight into any specific section and learn what you need for your career right now.
Moreover, the course is packed with practical exercises based on real-life case studies. So not only will you learn the theory, you will also get plenty of hands-on practice building your own models.
Welcome to the course! Here we will help you get started in the best conditions.
See the power of Machine Learning in action as we create a Logistic Regression predictive model for a real-world marketing and sales use-case!
In this video, Hadelin explains in detail how to install the R programming language and RStudio on your computer so you can swiftly go through the rest of the course.
-------------------- Part 1: Data Preprocessing --------------------
Understand the steps involved in Machine Learning: Data Pre-Processing (Import the data, Clean the data, Split into training & test sets, Feature Scaling), Modelling (Build the model, Train the model, Make predictions), and Evaluation (Calculate performance metrics, Make a verdict).
Understand why it's important to split the data into a training set and a test set, how they differ and what they are used for.
Two types of feature scaling: Normalization and Standardization. In the practical tutorials we focus on Standardization, and here we discuss the intuition behind Normalization.
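For reference, a minimal sketch of the two feature-scaling formulas discussed here, where mu is the feature mean and sigma its standard deviation:

```latex
x'_{\text{norm}} = \frac{x - \min(x)}{\max(x) - \min(x)},
\qquad
x'_{\text{std}} = \frac{x - \mu}{\sigma}
```

Normalization squashes values into [0, 1]; standardization centers them around 0 with unit variance.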
Data Preprocessing in Python
A short written summary of what you need to know about object-oriented programming, e.g. classes, objects, and methods.
Data Preprocessing in R
-------------------- Part 2: Regression --------------------
What is regression? 6 types of regression models are taught in this course.
Simple Linear Regression
The math behind Simple Linear Regression.
Finding the best-fitting line with the Ordinary Least Squares method to model the linear relationship between the independent variable and the dependent variable.
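In symbols, Ordinary Least Squares picks the intercept b0 and slope b1 that minimize the sum of squared residuals:

```latex
\min_{b_0,\, b_1} \; \sum_{i=1}^{n} \big( y_i - (b_0 + b_1 x_i) \big)^2
```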
Data preprocessing for Simple Linear Regression in R.
Fitting Simple Linear Regression (SLR) model to the training set using R function ‘lm’.
Predicting the test set results with the SLR model using R function ‘predict’.
Visualizing the training set results and test set results with R package ‘ggplot2’.
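A minimal sketch of this SLR workflow in R; the file name 'Salary_Data.csv' and its columns YearsExperience and Salary are illustrative assumptions:

```r
library(caTools)   # for sample.split
library(ggplot2)

dataset <- read.csv('Salary_Data.csv')

# Split into training and test sets
set.seed(123)
split <- sample.split(dataset$Salary, SplitRatio = 2/3)
training_set <- subset(dataset, split == TRUE)
test_set <- subset(dataset, split == FALSE)

# Fit Simple Linear Regression to the training set
regressor <- lm(formula = Salary ~ YearsExperience, data = training_set)

# Predict the test set results
y_pred <- predict(regressor, newdata = test_set)

# Visualize the training set results
ggplot() +
  geom_point(aes(x = training_set$YearsExperience, y = training_set$Salary),
             colour = 'red') +
  geom_line(aes(x = training_set$YearsExperience,
                y = predict(regressor, newdata = training_set)),
            colour = 'blue') +
  ggtitle('Salary vs Experience (Training set)')
```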
Multiple Linear Regression
An application of Multiple Linear Regression: profit prediction for Startups.
The math behind Multiple Linear Regression: modelling the linear relationship between the independent (explanatory) variables and dependent (response) variable.
The 5 assumptions associated with a linear regression model: linearity, homoscedasticity, multivariate normality, independence (no autocorrelation), and lack of multicollinearity - plus an additional check for outliers.
Coding categorical variables in regression with dummy variables.
Dummy variable trap and how to avoid it.
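A tiny sketch of the point, using a hypothetical 'state' factor: R's model.matrix() creates one dummy column per level but folds the first level into the intercept, which is exactly how the trap (perfect multicollinearity among the full set of dummies and the intercept) is avoided:

```r
# Hypothetical categorical predictor with three levels
state <- factor(c('New York', 'California', 'Florida', 'New York'))

# Intercept plus 2 dummy columns (not 3): the first level is the baseline
model.matrix(~ state)
```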
An intuitive guide to the 5 methods of building multiple linear regression models: All-in, Backward Elimination, Forward Selection, Bidirectional Elimination, and Score Comparison (the middle three are often collectively called Stepwise Regression).
Polynomial Regression
The math behind Polynomial Regression: modelling the non-linear relationship between independent variables and dependent variable.
Data preprocessing for Polynomial Regression in R.
Fitting Polynomial Regression model and Linear Regression model to the dataset in R.
Visualizing Linear Regression results and Polynomial Regression results and comparing the models' performance.
Predicting new results with Linear Regression model and Polynomial Regression model.
Template for regression modelling in R.
Support Vector Regression (SVR)
Understanding the intuition behind Support Vector Regression (SVR) for the linear case. Concepts like epsilon-insensitive tube and slack variables are explained in this tutorial.
Some info about upcoming tutorials on Support Vector Machines (SVM), kernel functions, and non-linear Support Vector Regression (SVR).
Salary prediction with Support Vector Regression using R package ‘e1071’: data preprocessing, fitting, predicting, and visualizing the SVR results.
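A hedged sketch of that SVR workflow; the file 'Position_Salaries.csv' with columns Level and Salary is an assumed example dataset:

```r
library(e1071)

dataset <- read.csv('Position_Salaries.csv')
dataset <- dataset[2:3]   # keep Level and Salary

# Fit SVR: 'eps-regression' uses the epsilon-insensitive tube
regressor <- svm(formula = Salary ~ ., data = dataset, type = 'eps-regression')

# Predict a new result, e.g. for level 6.5
y_pred <- predict(regressor, data.frame(Level = 6.5))
```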
Decision Tree Regression
An intuitive guide to understanding Decision Tree Regression algorithms.
Salary prediction with Decision Tree Regression model using R package ‘rpart’: data preprocessing, fitting, predicting, and visualizing the results.
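A similar sketch for Decision Tree Regression on the same assumed dataset; minsplit = 1 lets the tree split even on a very small dataset:

```r
library(rpart)

dataset <- read.csv('Position_Salaries.csv')
dataset <- dataset[2:3]

# Fit Decision Tree Regression to the dataset
regressor <- rpart(formula = Salary ~ ., data = dataset,
                   control = rpart.control(minsplit = 1))

y_pred <- predict(regressor, data.frame(Level = 6.5))
```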
Random Forest Regression
An intuitive introduction to Random Forest Regression: an ensemble learning algorithm.
Salary prediction with Random Forest Regression model using R package ‘randomForest’, and visualizing the results with ‘ggplot2’.
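And a sketch of Random Forest Regression with 500 trees on the same assumed dataset:

```r
library(randomForest)

dataset <- read.csv('Position_Salaries.csv')
dataset <- dataset[2:3]

# Fit Random Forest Regression: x is a one-column data frame, y a vector
set.seed(1234)
regressor <- randomForest(x = dataset[1], y = dataset$Salary, ntree = 500)

y_pred <- predict(regressor, data.frame(Level = 6.5))
```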
Evaluating Regression Models Performance
The math behind R-squared.
Using R-squared as a goodness of fit measure and the math behind adjusted R-squared.
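For reference, with SS_res the residual sum of squares, SS_tot the total sum of squares, n the number of observations and p the number of predictors:

```latex
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
    = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```

Adjusted R-squared penalizes added predictors, so it only increases when a new variable improves the model more than chance would.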
Regression Model Selection in Python
Regression Model Selection in R
Improving Backward Elimination with adjusted R-squared for model performance evaluation.
Interpretation of the Linear Regression analysis results: coefficients for linear relationships.
The pros & cons of each regression model; how to choose a regression model; how to improve regression models. Introduction of regularization for the problem of overfitting.
-------------------- Part 3: Classification --------------------
Welcome to Part 3 of the course, where you will learn how to implement several classical Machine Learning Classification Models.
Logistic Regression
Examples of Classification use-cases: Churn Modelling in Business, Email Spam detection, Image classification
The intuition and math behind logistic regression: e.g. when exploring the correlation between people's age and whether or not they would take a certain action, we can instead predict the probability or likelihood of taking that action. The scientific approach to do so is to apply the sigmoid function to the linear regression equation.
How to calculate the "Likelihood" of a Logistic Regression and how to use the Maximum Likelihood method to pick the best-fitting logistic regression curve for your data.
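For reference, the sigmoid applied to the linear regression equation, and the likelihood that the Maximum Likelihood method maximizes over observations with labels y_i in {0, 1}:

```latex
p = \frac{1}{1 + e^{-(b_0 + b_1 x)}},
\qquad
L = \prod_{i=1}^{n} p_i^{\,y_i} \, (1 - p_i)^{1 - y_i}
```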
Preprocess the data using the data preprocessing template made in Part 1.
Fit logistic regression to the training set via the 'glm' function. 'glm' stands for Generalized Linear Models and is used here to linearly separate two classes of users.
Predict the test results in two steps: first, obtain the predicted probabilities via the 'predict' function, and then convert the results to zeros and ones.
Evaluate the performance of the logistic regression model by making a confusion matrix, which counts the numbers of correct and incorrect predictions and is obtained in one line of code via the 'table' function.
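A minimal sketch consolidating these steps; the file 'Social_Network_Ads.csv' (Age, EstimatedSalary and a 0/1 Purchased column) is an assumed example dataset:

```r
library(caTools)

dataset <- read.csv('Social_Network_Ads.csv')
dataset <- dataset[3:5]   # keep Age, EstimatedSalary, Purchased

set.seed(123)
split <- sample.split(dataset$Purchased, SplitRatio = 0.75)
training_set <- subset(dataset, split == TRUE)
test_set <- subset(dataset, split == FALSE)

# Feature scaling on the two predictors
training_set[-3] <- scale(training_set[-3])
test_set[-3] <- scale(test_set[-3])

# Fit logistic regression to the training set
classifier <- glm(formula = Purchased ~ .,
                  family = binomial, data = training_set)

# Predict probabilities, then convert them to zeros and ones
prob_pred <- predict(classifier, type = 'response', newdata = test_set[-3])
y_pred <- ifelse(prob_pred > 0.5, 1, 0)

# Confusion matrix in one line
cm <- table(test_set[, 3], y_pred)
```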
Visualize the predictive power of the logistic model on a graph made via the 'ElemStatLearn' package. A step-by-step analysis of the graph.
Make a template from the logistic regression model in order to build future classification models more efficiently.
K-Nearest Neighbors (K-NN)
Simple and straightforward illustration of the K-Nearest Neighbors algorithm: how to classify a new data point into a category of data - a step-by-step guide to the KNN algorithm.
Implement the K-Nearest Neighbors algorithm on the social network dataset in R using the classification template. The 'knn' function from the 'class' library is used to fit the KNN model to the dataset. The prediction boundary on the graph can separate the data even when it is not linearly separable.
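A sketch of the same step with 'knn', reusing the scaled training_set and test_set prepared in the logistic regression example above:

```r
library(class)

# knn fits and predicts in one call: train/test features plus training labels
y_pred <- knn(train = training_set[, -3],
              test = test_set[, -3],
              cl = training_set[, 3],
              k = 5)

cm <- table(test_set[, 3], y_pred)
```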
Support Vector Machine (SVM)
The intuition behind the Support Vector Machine algorithm: SVM searches for the line with the maximum margin that separates two or more classes of data points. The points determining this line are the support vectors, which support the decision boundary and the whole SVM algorithm.
Implement the SVM algorithm on the social network dataset in R using the classification template. The 'svm' function from the 'e1071' package is used as a classifier to fit the dataset. As a linear kernel is chosen for this case, a straight line is obtained to separate the data.
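A sketch of the SVM step on the same prepared sets; swapping kernel = 'linear' for kernel = 'radial' gives the Kernel SVM variant of the next section:

```r
library(e1071)

# svm needs a factor outcome for classification
training_set$Purchased <- factor(training_set$Purchased)

classifier <- svm(formula = Purchased ~ .,
                  data = training_set,
                  type = 'C-classification',
                  kernel = 'linear')

y_pred <- predict(classifier, newdata = test_set[-3])
cm <- table(test_set[, 3], y_pred)
```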
Kernel SVM
In the previous section, the SVM algorithm tells us exactly how to find a decision boundary or a straight line to separate data points. However, when the data points are not linearly separable, we need to come up with an algorithm to deal with this situation.
How we take our non-linearly separable dataset, map it to a higher dimension to obtain a linearly separable dataset, build a decision boundary (hyperplane) for it with the SVM algorithm, and project all of this back to our original dimension.
Mapping the dataset to a higher-dimensional space can be highly compute-intensive; therefore, we explore a different approach with the kernel trick. By applying the Gaussian, or radial basis function (RBF), kernel, we can create a complex non-linear decision boundary while everything still happens in the original dimension.
Other popular choices of kernel functions besides the Gaussian (RBF) kernel, such as the sigmoid kernel and the polynomial kernel. Visual representations of these different types of kernels in three dimensions.
Understand how the non-linear Support Vector Regression (SVR) model works using the RBF (Radial Basis Function) Kernel.
Implement the Kernel SVM algorithm on the social network dataset in R using the classification template. The 'svm' function from the 'e1071' package is used as a classifier to fit the dataset. We use the 'radial' (Gaussian-like) kernel to create a non-linear classifier, so we obtain a curved boundary separating the two categories.
Naive Bayes
The mathematical representation of Bayes Theorem and discussion of each one of its terms. Illustration of Bayes Theorem on an intuitive level through an example.
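For reference, Bayes' Theorem in its usual form, where P(A|B) is the posterior, P(B|A) the likelihood, P(A) the prior, and P(B) the marginal probability:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```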
Understand the naïve Bayes classifier on an intuitive level, and learn that the naïve Bayes classifier is a probabilistic type of classifier: it first calculates the probabilities and, based on those probabilities, decides which class to assign a new data point to.
Show the solution of the challenge assigned in the last lecture - step 2 of the naïve Bayes algorithm, calculating the posterior probability.
Cover three additional comments about the naïve Bayes classifier: Why is this algorithm called naïve? Why can we potentially drop the marginal probability in the naïve Bayes algorithm? What happens when there are more than two features in the dataset?
Implement the naive Bayes algorithm on the social network dataset in R using the classification template. The 'naiveBayes' function from the 'e1071' package is used as a classifier to fit the dataset. The graphic result shows that the naive Bayes algorithm manages to classify the non-linearly separable data points with a smooth boundary.
Decision Tree Classification
Decision Tree Classification classifies the data and gives categorical variables as an outcome, unlike a Regression Tree, which predicts real numbers. Basically, Decision Tree Classification works by splitting the dataset over several iterations, with each split made in such a way as to maximize the share of a single category in each resulting region.
Implement the Decision Tree Classification algorithm on the social network dataset in R using the classification template. The 'rpart' function is used as a classifier to fit the dataset. The graphic result shows that the prediction boundary is composed of only horizontal and vertical lines, and the overfitting is less pronounced than in Python. The Decision Tree is also plotted for better interpretation of the results.
Random Forest Classification
The Random Forest algorithm is an ensemble learning method that combines a great number of Decision Trees. A step-by-step introduction to how the Random Forest algorithm works. Notably, Microsoft used the Random Forest algorithm to understand the motions of body parts when developing Kinect.
Implement the Random Forest Classification algorithm on the social network dataset in R using the classification template. The 'randomForest' function is used as a classifier to fit the dataset. A recap of all the classifiers we have built for this particular business problem.
Classification Model Selection in Python
Evaluating Classification Models Performance
Introduce the concepts of False Positives and False Negatives.
What the Accuracy Paradox is and why you shouldn't base your judgment only on accuracy when assessing your model.
What the Cumulative Accuracy Profile (CAP) is and how to assess models based on their CAP curves. The distinction between the CAP and the Receiver Operating Characteristic (ROC).
The analysis of CAP - what intuitive insight can we derive from the CAP curve and how to quantify this effect.
A recap of the 7 classification models you have learned in Part 3.
-------------------- Part 4: Clustering --------------------
Welcome message: How clustering differs from classification; Lecture topics include K-Means Clustering and Hierarchical Clustering.
K-Means Clustering
Understand the difference between Supervised Learning and Unsupervised Learning.
Understand how K-Means Clustering works behind the scenes.
Choose the optimal number of clusters K with the Elbow method in K-Means.
Implement the K-Means algorithm with R function ‘kmeans’ to group clients, and visualize the clusters with function ‘clusplot’.
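A sketch of both steps; the file 'Mall_Customers.csv' with annual income and spending score in columns 4 and 5 is an assumed example dataset:

```r
library(cluster)   # for clusplot

dataset <- read.csv('Mall_Customers.csv')
X <- dataset[4:5]

# Elbow method: within-cluster sum of squares (WCSS) for k = 1..10
set.seed(6)
wcss <- sapply(1:10, function(k) kmeans(X, centers = k)$tot.withinss)
plot(1:10, wcss, type = 'b',
     xlab = 'Number of clusters k', ylab = 'WCSS')

# Fit K-Means with the k chosen at the elbow, then visualize the clusters
set.seed(29)
km <- kmeans(X, centers = 5, iter.max = 300, nstart = 10)
clusplot(X, km$cluster, lines = 0, shade = TRUE, color = TRUE, labels = 2)
```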
Hierarchical Clustering
Learn the Hierarchical Agglomerative clustering algorithm step by step.
How a dendrogram is constructed and how it works.
Implement Hierarchical Clustering using Dendrograms based on the largest Euclidean distance.
Import and prepare dataset for Hierarchical Clustering in R.
Find the optimal number of HC clusters using Dendrogram with R function ‘hclust’, and visualize the results.
Group clients using Hierarchical Clustering algorithm with R function ‘hclust’.
Visualize the Hierarchical Clustering results with R function ‘clusplot’.
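A sketch of these hierarchical clustering steps on the same assumed dataset, with Ward linkage on Euclidean distances:

```r
library(cluster)

dataset <- read.csv('Mall_Customers.csv')
X <- dataset[4:5]

# Build and plot the dendrogram to pick the number of clusters
hc <- hclust(dist(X, method = 'euclidean'), method = 'ward.D')
plot(hc, main = 'Dendrogram', xlab = 'Customers', ylab = 'Euclidean distance')

# Cut the tree into the chosen number of clusters and visualize
y_hc <- cutree(hc, k = 5)
clusplot(X, y_hc, lines = 0, shade = TRUE, color = TRUE, labels = 2)
```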
Analyze and explain Hierarchical Clustering results in R.
The conclusion to the Hierarchical Clustering section.
-------------------- Part 5: Association Rule Learning --------------------
A welcome message for Association Rule Learning lectures. Lecture topics include Apriori and Eclat.
Apriori
What is Apriori? Learn the math behind Apriori. Introduce the Apriori algorithm step by step.
Implement a recommendation system with Apriori to optimize sales. Prepare dataset in R and describe the problem.
Train the Apriori model with the R function 'apriori'. Set the minimum support and minimum confidence.
Explain the association rules in the output of Apriori. Visualize the rules and sort them by their decreasing lift (the relevance of a rule) with R function ‘inspect’.
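A sketch of the Apriori workflow with the 'arules' package; the transactions file 'Market_Basket_Optimisation.csv' (one basket per line) is an assumption:

```r
library(arules)

# Read the baskets as a sparse transactions object
transactions <- read.transactions('Market_Basket_Optimisation.csv',
                                  sep = ',', rm.duplicates = TRUE)

# Train Apriori with minimum support and minimum confidence
rules <- apriori(data = transactions,
                 parameter = list(support = 0.004, confidence = 0.2))

# Sort the rules by decreasing lift and inspect the strongest ones
inspect(sort(rules, by = 'lift')[1:10])
```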
Eclat
The intuition behind the Eclat algorithm: the algorithm and an example of a movie recommendation system using Eclat.
Optimize the sales in a grocery store using Eclat algorithm with R function ‘eclat’ and visualize the results with R function ‘inspect’.
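A matching Eclat sketch on the same assumed transactions file; Eclat returns frequent itemsets rather than rules, so they are sorted by support:

```r
library(arules)

transactions <- read.transactions('Market_Basket_Optimisation.csv',
                                  sep = ',', rm.duplicates = TRUE)

# Train Eclat: sets of at least 2 items frequently bought together
itemsets <- eclat(data = transactions,
                  parameter = list(support = 0.004, minlen = 2))

inspect(sort(itemsets, by = 'support')[1:10])
```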
-------------------- Part 6: Reinforcement Learning --------------------
Welcome to Part 6 - Reinforcement Learning, where you will understand and learn how to implement Upper Confidence Bound (UCB) and Thompson Sampling models.
Upper Confidence Bound (UCB)
What Reinforcement Learning is and what the Multi-Armed Bandit Problem is. History and modern applications of the Multi-Armed Bandit Problem. How the two factors - exploration and exploitation - interact to reach the optimal result in the process.
Intuitive concept behind the Upper Confidence Bound (UCB) algorithm and how it solves the Multi-Armed Bandit Problem.
Import and explore the 'Ads Clickthrough rate (CTR) Optimisation' dataset in R. Implement the Random Selection algorithm, which consists of selecting one version of the ad at random in each round.
Implement the Upper Confidence Bound (UCB) algorithm in R from scratch.
Continue implementing the UCB algorithm in R. The result obtained with the UCB algorithm is almost double that of the random selection algorithm, and the best ad is identified.
Visualize the result in R by making a histogram to see the number of times each ad was selected.
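A from-scratch UCB sketch, assuming the 'Ads_CTR_Optimisation.csv' dataset of 10,000 rounds (rows) by 10 ads (columns), where a cell is 1 if that ad would have been clicked in that round:

```r
dataset <- read.csv('Ads_CTR_Optimisation.csv')
N <- 10000   # rounds
d <- 10      # ads

numbers_of_selections <- integer(d)
sums_of_rewards <- integer(d)
ads_selected <- integer(0)
total_reward <- 0

for (n in 1:N) {
  max_upper_bound <- -Inf
  ad <- 1
  for (i in 1:d) {
    if (numbers_of_selections[i] > 0) {
      average_reward <- sums_of_rewards[i] / numbers_of_selections[i]
      delta_i <- sqrt(3 / 2 * log(n) / numbers_of_selections[i])
      upper_bound <- average_reward + delta_i
    } else {
      upper_bound <- Inf   # force each ad to be tried once first
    }
    if (upper_bound > max_upper_bound) {
      max_upper_bound <- upper_bound
      ad <- i
    }
  }
  ads_selected <- append(ads_selected, ad)
  numbers_of_selections[ad] <- numbers_of_selections[ad] + 1
  reward <- dataset[n, ad]
  sums_of_rewards[ad] <- sums_of_rewards[ad] + reward
  total_reward <- total_reward + reward
}

# Histogram of how often each ad was selected
hist(ads_selected)
```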
Thompson Sampling
The intuition behind Thompson Sampling algorithm and how it solves the Multi-Armed Bandit Problem.
A brief comparison of the UCB and Thompson Sampling algorithms - the pros and cons of each.
Implement the Thompson Sampling algorithm in R from scratch.
Visualize the results in R and see the Thompson Sampling algorithm beat the UCB algorithm on this problem.
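A from-scratch Thompson Sampling sketch on the same assumed CTR dataset; each ad's click-through rate gets a Beta posterior updated with the observed rewards:

```r
dataset <- read.csv('Ads_CTR_Optimisation.csv')
N <- 10000
d <- 10

numbers_of_rewards_1 <- integer(d)   # clicks observed per ad
numbers_of_rewards_0 <- integer(d)   # non-clicks observed per ad
ads_selected <- integer(0)
total_reward <- 0

set.seed(123)
for (n in 1:N) {
  # Draw one sample from each ad's Beta posterior; pick the largest draw
  theta <- rbeta(d, numbers_of_rewards_1 + 1, numbers_of_rewards_0 + 1)
  ad <- which.max(theta)
  ads_selected <- append(ads_selected, ad)
  reward <- dataset[n, ad]
  if (reward == 1) {
    numbers_of_rewards_1[ad] <- numbers_of_rewards_1[ad] + 1
  } else {
    numbers_of_rewards_0[ad] <- numbers_of_rewards_0[ad] + 1
  }
  total_reward <- total_reward + reward
}

hist(ads_selected)
```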
-------------------- Part 7: Natural Language Processing --------------------
Welcome to Part 7 - Natural Language Processing, where you will understand and learn NLP and one of its well-known models - the Bag of Words model.
How to prepare a text dataset. Introduce and import the dataset of 1000 written restaurant reviews in R.
Clean the text - step 1: create a corpus containing the text of the reviews with the 'tm' library in R.
Clean the text - step 2: put all the letters of the reviews in lowercase with the 'tm_map' function in R.
Clean the text - step 3: to simplify the corpus, remove all the numbers in the reviews with the 'tm_map' function in R.
Clean the text - step 4: to simplify the corpus, remove all the punctuation in the reviews with the 'tm_map' function in R.
Clean the text - step 5: to simplify the corpus, remove all the irrelevant words (stop words) in the reviews with the 'tm_map' function in R.
Clean the text - step 6: to simplify the corpus, apply stemming to the reviews with the 'tm_map' function in R.
Clean the text - step 7: to simplify the corpus, remove all the extra spaces in the reviews with the 'tm_map' function in R.
Create the sparse matrix of features for the Bag of Words model with the 'DocumentTermMatrix' function in R, and reduce its sparsity.
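A sketch of the seven cleaning steps plus the Bag of Words matrix; the 'Restaurant_Reviews.tsv' file with a Review column is an assumed example:

```r
library(tm)
library(SnowballC)   # stemming support for tm

dataset_original <- read.delim('Restaurant_Reviews.tsv',
                               quote = '', stringsAsFactors = FALSE)

corpus <- VCorpus(VectorSource(dataset_original$Review))   # step 1: corpus
corpus <- tm_map(corpus, content_transformer(tolower))     # step 2: lowercase
corpus <- tm_map(corpus, removeNumbers)                    # step 3: numbers
corpus <- tm_map(corpus, removePunctuation)                # step 4: punctuation
corpus <- tm_map(corpus, removeWords, stopwords())         # step 5: stop words
corpus <- tm_map(corpus, stemDocument)                     # step 6: stemming
corpus <- tm_map(corpus, stripWhitespace)                  # step 7: extra spaces

# Bag of Words: drop terms absent from more than 99.9% of the reviews
dtm <- DocumentTermMatrix(corpus)
dtm <- removeSparseTerms(dtm, 0.999)
```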
Train the classification model using the Random Forest classification algorithm prepared in Section 16.
A little challenge for those up for some practical activities, e.g. try other classification models and evaluate their performances for this particular problem.
-------------------- Part 8: Deep Learning --------------------
Welcome message. Brief introduction of Deep Learning algorithms and their applications.
How IT technology evolves, what Deep Learning is, and why it’s called Deep Learning.
Artificial Neural Networks
How we’re going to learn Artificial Neural Networks. The list of topics included in the following ANN lectures.
How neurons in the human brain work, and how we replicate that with an Artificial Neural Network (ANN).
The math behind 4 types of activation functions: Threshold Function, Sigmoid, Rectifier, and Hyperbolic Tangent (tanh). How to choose an activation function?
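For reference, the four activation functions in formula form:

```latex
\phi_{\text{threshold}}(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases},
\qquad
\phi_{\text{sigmoid}}(x) = \frac{1}{1 + e^{-x}},
\qquad
\phi_{\text{rectifier}}(x) = \max(x, 0),
\qquad
\tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}
```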
How to use Artificial Neural Network for property price prediction.
Learning process of a backpropagation neural network: how a neural network passes signals forward, and how the prediction error is propagated back through the network.
How does a neural network adjust its weights to minimize the cost function? How to find the optimal weights with Gradient Descent?
Use Stochastic Gradient Descent to optimize the weights when the cost function isn't convex (i.e. it has more than one local minimum). Compare Batch Gradient Descent and Stochastic Gradient Descent.
Summary of the learning process of a backpropagation neural network. Steps to train an ANN model with Stochastic Gradient Descent.
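In symbols, with the squared-error cost C and a learning rate eta (both conventional choices), one gradient-descent update of a weight w is:

```latex
C = \sum_{i} \tfrac{1}{2} \big( \hat{y}_i - y_i \big)^2,
\qquad
w \leftarrow w - \eta \, \frac{\partial C}{\partial w}
```

Batch Gradient Descent applies this update after seeing the whole dataset; Stochastic Gradient Descent applies it after each individual observation.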
Churn prediction of bank customers using Artificial Neural Network.
Dataset preparation for bank customer churn prediction using an ANN model in R.
Install and initialize the R package 'h2o' for customer churn prediction using the ANN model.
Build the ANN model for customer churn prediction on the training set with h2o.deeplearning().
Make a prediction on the test set with the trained ANN model using h2o.predict(). Evaluate the model’s performance with the confusion matrix.
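A hedged sketch of these h2o steps; training_set and test_set are assumed to be prepared churn data frames whose 0/1 outcome 'Exited' sits in column 11 (the name and position are assumptions):

```r
library(h2o)
h2o.init(nthreads = -1)   # start/connect to a local H2O instance

# Build the ANN: two hidden layers of 6 rectifier neurons each
classifier <- h2o.deeplearning(y = 'Exited',
                               training_frame = as.h2o(training_set),
                               activation = 'Rectifier',
                               hidden = c(6, 6),
                               epochs = 100)

# Predict on the test set and threshold the scores at 0.5
prob_pred <- h2o.predict(classifier, newdata = as.h2o(test_set[-11]))
y_pred <- as.vector(prob_pred > 0.5)

# Confusion matrix
cm <- table(test_set[, 11], y_pred)
```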
Convolutional Neural Networks
What and how we will learn in Convolutional Neural Networks section.
The biological inspiration of Convolutional Neural Networks. How does a CNN work?
The three elements of the Convolution Operation: input image, feature detector, and feature map. How the Convolution Operation works.
Increase the non-linearity in images with the ReLU Layer: a supplementary step to the Convolution Operation. What the Rectifier is.
What is Pooling? The purpose and types of Pooling. How Max Pooling works. Adam Harley's interactive number-recognition CNN demo.
How we flatten the pooled feature map into a column to get the input vector for the ANN.
What is the aim of full connection and the full connection process? How to add a whole artificial neural network to a CNN? How to classify dog and cat images with a CNN?
A summary of CNN and its building process.
What is Softmax function? Why do we use Cross-Entropy loss to measure the error at a softmax layer?
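For reference, with z_j the raw scores of the output layer, the softmax probabilities and the cross-entropy loss against the one-hot true distribution p* are:

```latex
p_j = \frac{e^{z_j}}{\sum_k e^{z_k}},
\qquad
H(p^{*}, p) = - \sum_j p^{*}_j \log p_j
```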
-------------------- Part 9: Dimensionality Reduction --------------------
Welcome message. An overview of the dimensionality reduction techniques covered in this part.
Principal Component Analysis (PCA)
What is PCA used for? PCA procedure and examples.
PCA in a few words. Dimension reduction techniques summary. Dataset preparation for applying PCA in R.
Install R packages ‘caret’ and ‘e1071’ for PCA. Apply PCA to extract features and get feature datasets in R.
Fit SVM to the training feature dataset and predict the test set. Evaluate the model with a confusion matrix. Visualize the results.
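A hedged sketch of this pipeline; the Wine training_set/test_set with the outcome 'Customer_Segment' in column 14 are assumptions:

```r
library(caret)
library(e1071)

# Extract the top 2 principal components from the 13 predictors
pca <- preProcess(x = training_set[-14], method = 'pca', pcaComp = 2)
training_set_pca <- predict(pca, training_set)
test_set_pca <- predict(pca, test_set)

# Fit an SVM on the extracted features
training_set_pca$Customer_Segment <- factor(training_set_pca$Customer_Segment)
classifier <- svm(formula = Customer_Segment ~ .,
                  data = training_set_pca,
                  type = 'C-classification', kernel = 'linear')

# Predict the test set and build the confusion matrix
y_pred <- predict(classifier, newdata = test_set_pca)
cm <- table(test_set_pca$Customer_Segment, y_pred)
```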
Linear Discriminant Analysis (LDA)
What is LDA used for? How does LDA differ from PCA? The 5 main steps of the LDA algorithm.
Wine business customer segmentation using LDA and SVM with the R packages 'MASS' and 'e1071'. Visualize and explain the results.
Kernel PCA
Implement Kernel PCA with R package ‘kernlab’ and Logistic Regression for purchase prediction application. Visualize and explain the results with R package ‘ElemStatLearn’.
-------------------- Part 10: Model Selection & Boosting --------------------
Welcome to Part 10 - understand and learn the techniques to evaluate the model and to improve the model performance.
Model Selection
Implement k-Fold Cross-Validation in R with the 'caret' library, which improves model evaluation by addressing the variance problem of relying on a single train/test split.
Implement Grid Search with 'caret' library in R to find the optimal values of the hyperparameters for a machine learning model.
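A sketch combining both ideas: caret's train() runs 10-fold cross-validation over a grid of SVM hyperparameters. The scaled social-network training_set with the 'Purchased' outcome is assumed, and the sigma/C grid values are illustrative:

```r
library(caret)

training_set$Purchased <- factor(training_set$Purchased)

# 10-fold CV with a grid search over the RBF-SVM hyperparameters sigma and C
classifier <- train(form = Purchased ~ .,
                    data = training_set,
                    method = 'svmRadial',
                    trControl = trainControl(method = 'cv', number = 10),
                    tuneGrid = expand.grid(sigma = c(0.5, 1, 1.5),
                                           C = c(0.5, 1, 2)))

classifier$bestTune   # the optimal hyperparameter values found
```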
XGBoost
Introduce XGBoost - a powerful implementation of gradient boosting in terms of both model performance and execution speed. A simple implementation of XGBoost in R with the 'xgboost' library.
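A minimal sketch with the 'xgboost' library, assuming a numeric churn feature matrix and a 0/1 'Exited' label in column 11 (the name and position are assumptions):

```r
library(xgboost)

# Train gradient-boosted trees for binary classification
classifier <- xgboost(data = as.matrix(training_set[-11]),
                      label = training_set$Exited,
                      nrounds = 10,
                      objective = 'binary:logistic')

# Predicted probabilities on the test set, thresholded at 0.5
y_prob <- predict(classifier, newdata = as.matrix(test_set[-11]))
y_pred <- as.numeric(y_prob >= 0.5)
```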
Exclusive Offer
Annex: Logistic Regression (Long Explanation)
The intuition behind logistic regression: e.g. when exploring the correlation between people's age and whether or not they would take a certain action, we can instead predict the probability or likelihood of taking that action. The scientific approach to do so is to apply the sigmoid function to the linear regression equation.