**Course Review: Machine Learning A-Z - Hands-On Python & R in Data Science**
**Preface**
Every day brings continuous innovation across numerous fields, and the tremendous growth of computing offers us ever more technologies to consume. We generate over 2 exabytes of data every day, far too much to be handled by human effort alone. Engineers all over the world have come up with automation to take care of such work. The next step in this process is machine learning, which enables computer algorithms to make educated decisions in certain scenarios. With the buzz around technologies like artificial intelligence, self-driving vehicles, and speech recognition, all of us have used machine learning, knowingly or otherwise. The extraordinary breakthroughs in the field make us more willing than ever to explore machine learning concepts. While browsing through the existing training material for ML and its applications, I landed upon an excellent online course on Udemy: Machine Learning A-Z: Hands-On Python & R In Data Science by Kirill Eremenko and Hadelin de Ponteves.
**Instructors**
Let's get to know the instructors before digging into the course details:
**Kirill Eremenko**
An expert in data science and forex trading, Kirill Eremenko has over 5 years of experience across various industries including finance, retail, and transport. Also skilled in big data, he works as a consultant and conducts courses on Udemy through the SuperDataScience Team. Having earned degrees in Physics and Mathematics, Kirill combines his professional experience with his academic background to provide excellent courses for learning enthusiasts. For the past decade he has also enjoyed forex trading, as it gives him a sense of personal and financial independence. Combined with his knowledge of data science, this gives him an edge in algorithmic trading. As an analyst, he is good at finding patterns in processes and human behavior, using technologies such as scripting, Java, and MQL4.
**Hadelin de Ponteves**
Passionate about AI, Hadelin de Ponteves loves to conduct courses on topics such as machine learning, deep learning, and artificial intelligence. He has a master's degree in Data Science and plenty of experience in machine learning. He worked with the AI team at Google to implement ML models for business analytics. He is currently dedicated full time to teaching and to sharing the knowledge he gained through that experience. He possesses a unique blend of analytical skill and creativity, which is evident in his courses. Plus, as the course podcast says, he sleeps only 3 hours a day, and he's done that for the past 3 years! Have you met someone else who does that? Talk about passion for their work!
**Overview**
The course spans a staggering 285 lectures, with a total duration of around 41 hours. The target audience includes learners of all levels, from beginner to advanced. At the time of this writing, more than 290K students have taken the course on Udemy, which lends credibility to its contents. As the course is designed by two professional data scientists, it is extensive in its coverage. At the same time, it is organized so that learners at all levels can grasp the concepts easily. To explain and implement the ML models, the instructors use Python and R, two popular programming languages in data science. Learners can follow the language of their choice and skip the other, or try out both. As a result, the instructors walk learners not only through ML concepts but through these programming languages as well. Lots of learning involved!
**Outline**
The instructors try to make the course an exciting experience for the learners. It begins with an introductory session that explains the applications of machine learning and then proceeds to the installation of the Python and R environments. The tools chosen for the course are Anaconda (a Python distribution that bundles an IDE) for Python and RStudio for R, and the latest versions are used for all installations. Once the environments are ready, we dive into the actual course contents, which are divided into the following parts, each in turn divided into multiple sections:
**Part 1 - Data Preprocessing**
Since machine learning algorithms deal with huge amounts of data, the first step is to preprocess the data into the desired format, which makes the subsequent steps easier to apply. The section discusses importing the required libraries and datasets, dealing with missing data entries, splitting the data into training and test sets, and so on. The instructors also go through some Python basics, assuming the learners have no prior knowledge of the language. The required libraries are organized separately for Python and R. The datasets, however, are common to both languages.
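To make two of these preprocessing steps concrete, here is a minimal sketch in plain Python. It is not the course's code: the helper names `impute_mean` and `train_test_split` and the sample values are made up for illustration, and filling gaps with the column mean is only one of several possible strategies.

```python
import random
from statistics import mean

def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [value for value in column if value is not None]
    fill = mean(observed)
    return [fill if value is None else value for value in column]

def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

ages = [44, 27, None, 38, 40]      # made-up column with one missing entry
filled = impute_mean(ages)         # the gap becomes 37.25, the mean of the rest

rows = list(range(10))             # stand-in for ten dataset rows
train, test = train_test_split(rows, test_ratio=0.2)
```

Fixing the seed keeps the split reproducible, which matters when comparing models trained on the same data.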
**Part 2 - Regression**
Once the datasets are ready, the next step is to apply various regression models to them to predict future values. The models covered are Simple Linear, Multiple Linear, Polynomial, and Support Vector Regression, as well as Decision Tree and Random Forest Regression. At the end of the section, the course compares the performance of these models against each other, indicating which one suits which need.
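As an illustration of the simplest model in that list, the sketch below fits a simple linear regression with the closed-form least-squares formulas. It is an independent example in plain Python, not the course's implementation; the function name `fit_simple_linear` and the salary figures are invented.

```python
def fit_simple_linear(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

years = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]      # made-up data lying exactly on 25 + 5 * years

slope, intercept = fit_simple_linear(years, salary)
predicted = intercept + slope * 6  # prediction for an unseen value of x
```

Because the sample data is perfectly linear, the fit recovers a slope of 5 and an intercept of 25; real data would leave residuals around the line.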
**Part 3 - Classification**
Just as regression models predict continuous numbers, classification models predict a category. Classification is used in applications ranging from healthcare to the design of marketing strategies. The models covered are Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Kernel SVM, Naive Bayes, Decision Tree, and Random Forest Classification. To clarify the use case of each model, the part closes with a summary of the pros and cons of each classification scheme.
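To show what "predicting a category" looks like in code, here is a small K-Nearest Neighbors classifier written in plain Python. It is a sketch under simple assumptions (Euclidean distance, majority vote, made-up points and labels), not the version taught in the course.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Label the query with the majority label among its k nearest neighbors."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Two made-up groups of customers described by two numeric features.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["no", "no", "no", "yes", "yes", "yes"]

near_first_group = knn_predict(points, labels, (2, 2))
near_second_group = knn_predict(points, labels, (7, 9))
```

An odd `k` avoids tied votes when there are two classes.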
**Part 4 - Clustering**
As the name suggests, clustering groups the data points in a dataset based on their features. The section discusses two clustering models, K-Means and Hierarchical Clustering. The basic difference between them is that the number of clusters must be chosen in advance for K-Means, whereas hierarchical clustering produces a dendrogram from which a suitable number of clusters can be read off. The latter, however, is not suitable for large datasets.
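The K-Means loop itself is short enough to sketch in plain Python. The version below is illustrative only: it takes the starting centroids as an argument (so the number of clusters is fixed up front, as described above), runs a fixed number of iterations instead of testing for convergence, and uses invented sample points.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centering."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]   # made-up data
centroids, clusters = k_means(points, centroids=[(1, 1), (9, 8)])
```

In practice the starting centroids are usually chosen randomly or with a scheme such as k-means++, and the result can depend on that choice.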
**Part 5 - Association Rule Learning**
Association Rule Learning deals with establishing relations between entities. Common examples that we regularly encounter are the recommendation algorithms of social media and e-commerce sites. The models used for this purpose are Apriori and Eclat.
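Association rules are typically scored with support, confidence, and lift. The sketch below computes these three metrics for a single rule in plain Python. It is not an implementation of Apriori or Eclat, which additionally search efficiently for frequent itemsets; the function names and the shopping baskets are made up.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """How much more often the rule holds than if the two sides were independent."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))

baskets = [                      # made-up shopping baskets
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "butter"})          # 0.5
rule_confidence = confidence(baskets, {"bread"}, {"butter"})  # about 0.67
rule_lift = lift(baskets, {"bread"}, {"butter"})              # about 1.33
```

A lift above 1 suggests the two sides of the rule occur together more often than chance would predict.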
**Part 6 - Reinforcement Learning**
Described in the course as online learning, reinforcement learning uses the data observed up to a given time step to decide the course of action for the next one. Used particularly in AI for training machines, RL is a trial-and-error method that rewards the AI for desired results and punishes it otherwise. The section implements this with the Upper Confidence Bound and Thompson Sampling algorithms.
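The trial-and-error idea can be sketched with a small multi-armed bandit simulation. The code below uses the standard UCB1 formula (average reward plus a bonus of sqrt(2 ln t / n)), which may differ in its constants from the variant presented in the course; the function names, click rates, and round count are all assumptions made for the example.

```python
import math
import random

def select_arm(counts, rewards, t):
    """Pick the arm with the highest upper confidence bound at round t."""
    for arm, pulls in enumerate(counts):
        if pulls == 0:           # try every arm once before comparing bounds
            return arm

    def bound(arm):
        average = rewards[arm] / counts[arm]
        bonus = math.sqrt(2 * math.log(t) / counts[arm])
        return average + bonus

    return max(range(len(counts)), key=bound)

def run_bandit(click_rates, rounds=1000, seed=0):
    """Simulate the bandit and return how often each arm was pulled."""
    rng = random.Random(seed)
    counts = [0] * len(click_rates)
    rewards = [0] * len(click_rates)
    for t in range(1, rounds + 1):
        arm = select_arm(counts, rewards, t)
        reward = 1 if rng.random() < click_rates[arm] else 0   # reward or nothing
        counts[arm] += 1
        rewards[arm] += reward
    return counts

counts = run_bandit([0.1, 0.3, 0.7])   # hypothetical click-through rates per ad
```

The bonus term shrinks as an arm is pulled more often, so the algorithm explores early and then concentrates on the arm that has paid off best.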
**Part 7 - Natural Language Processing**
Arguably the most widely used application of ML is NLP, with common examples being speech recognition, text-to-speech conversion, and translation. While it seems natural to use NLP for these purposes, its keyword lookup functionality has far wider applications, in industries ranging from healthcare to finance. Under the hood, the NLP algorithms discussed are classification models such as Logistic Regression, Naive Bayes, CART and Maximum Entropy (both related to Decision Trees), and Hidden Markov Models. A common model for NLP is the Bag-of-Words model, which preprocesses text so that it can be consumed by classification models. At the end of the section, evaluating the performance of each of these models is left to the learners as an exercise.
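The Bag-of-Words idea is easy to show in a few lines of plain Python: each document becomes a vector of word counts over a shared vocabulary, which a classifier can then consume. This is a minimal sketch with made-up reviews and a deliberately naive tokenizer, without the stop-word removal or stemming that a fuller pipeline would normally add.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(documents):
    """Return the sorted vocabulary and one count vector per document."""
    tokenized = [tokenize(document) for document in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

reviews = ["Loved the food, loved the place", "The food was bad"]  # made-up reviews
vocabulary, vectors = bag_of_words(reviews)
# vocabulary -> ['bad', 'food', 'loved', 'place', 'the', 'was']
# vectors    -> [[0, 1, 2, 1, 2, 0], [1, 1, 0, 0, 1, 1]]
```

Word order is discarded entirely, which is the main simplification behind the model's name.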
**Part 8 - Deep Learning**
An up-and-coming branch of ML, deep learning is used to accomplish various complex tasks. Deep learning architectures targeted at specific applications include Artificial Neural Networks for regression and classification, Convolutional Neural Networks for computer vision, and Recurrent Neural Networks for time series analysis. The section covers Artificial and Convolutional Neural Networks.
**Part 9 - Dimensionality Reduction**
Dimensionality reduction techniques reduce the number of independent variables, which makes ML models easier to visualize: the fewer the variables, the easier it is to plot them on a graph for comparison. There are two types of dimensionality reduction techniques: Feature Selection and Feature Extraction. Feature Selection includes techniques such as backward elimination, forward selection, score comparison, and more. As these are already covered in the Regression part, this section focuses on Feature Extraction methods such as Principal Component Analysis, Linear Discriminant Analysis, Kernel PCA, and Quadratic Discriminant Analysis.
**Part 10 - Model Selection & Boosting**
Having learned all these models in the previous sections, it's quite possible to be confused about which model to select for a given situation. This section discusses techniques for model selection such as k-fold Cross Validation, Parameter Tuning, and Grid Search. The course then concludes with a bonus section on one of the most powerful and popular machine learning models, XGBoost.
One of the best things about the course is that it doesn't just focus on the theoretical aspects of machine learning; it also engages learners in real-world exercises. The exercises help them understand the concepts better and leave them well equipped to solve large-scale problems. The course also provides learners with additional datasets and code templates to play with - go build your own ML models!
**What Worked Well**
Here are some positives about the course which stood out:
- Meet the Instructors: What's different about this course is that Kirill and Hadelin occasionally talk with students via a podcast. They go through their backgrounds and give a general overview of the course. For anyone who wants to dig deeper into what the course offers, the podcast is a good starting point. It's also quite informal in nature, as they talk about the projects they are currently handling and the other courses they conduct, such as Deep Learning A-Z: Hands-On Artificial Neural Networks.
- Interactive Exercises: Another interesting aspect of the course is that the exercises are not just for the learners to finish and forget. Learners are expected to post their solutions to the instructors via the Q&A section or PM, opening a conversation channel where the solution is not only evaluated but also discussed.
- Comprehensive Q&A Section: As a lot of folks have taken the course already, the Q&A addresses most of the commonly encountered issues. I myself was able to resolve an issue faced during installation by following the steps mentioned under Q&A!
**What Can Be Improved**
As with everything, the course has some areas for improvement:
- As machine learning concepts tend to be highly technical, some learners may find the learning curve a bit steep, especially in the first few sections. Though this is more attributable to ML as a field than to the course, the prerequisites list only high school level mathematics, so beginners might be a bit overwhelmed by the breadth and depth of the concepts. The course could either ask learners to refresh some of the required concepts in advance, or divide the initial sections into smaller subsections to make the concepts easier to grasp.