**Course Review: Python for Data Science and Machine Learning Bootcamp**
Before we get started it would be helpful to know what data science and machine learning actually are. So in case you don't know, here are some basic definitions:
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms, either structured or unstructured.
Machine learning is a field of computer science that often uses statistical techniques to give computers the ability to "learn" with data, without being explicitly programmed.
Glassdoor has ranked data scientist as the number one job in America, with an average salary of $120,000 and over 4,500 job openings (as of the time of this writing). With these kinds of numbers, there are a good number of people who want to try out careers in data science, which creates demand for courses to help them level up their skills. With demand comes supply, which is why there are so many data science and machine learning courses available online and at different institutions. This presents another challenge: choosing the right course to help you start out on your journey in data science and machine learning. Over the past few weeks I have been taking one of those courses, Python for Data Science and Machine Learning Bootcamp, which is available only on Udemy. Throughout this article I present my take on this online course.
Instructor
This course is the work of Jose Portilla, an experienced Data Scientist with several years in the field and founder of Pierian Data. Jose Portilla is among the top instructors on Udemy, with over half a million students and 15 courses. Most of his courses focus on Python, Deep Learning, Data Science and Machine Learning, covering the latter two topics in both Python and R. He holds a BS and MS in Mechanical Engineering, with several publications and patents to his name. For more information you can check out his profile on Udemy.
Target Audience
This is probably the first question you have about any course: is it a fit for you? Machine learning and data science are advanced topics in math and programming. There is therefore a fairly steep learning curve to understanding these concepts, which is why it is even more important to have a good resource to learn from.
You can't jump from Novice to Expert. You have to go through the different stages of learning: Novice, Intermediate, Advanced, then Expert.
For this course you need some programming experience. A basic grasp of core programming concepts, such as data structures and conditional statements, is important to have, whatever the language. It would be preferable to have this experience in Python, which is the programming language used throughout this course. However, knowledge of Python is not a necessity, as the course starts out with a Python Crash Course that will help you understand Python and follow along.
Content Review
This is one of the most immersive courses I have come across. With almost 150 videos clocking in at just over 21 hours, this course takes the learner through in-depth training on a number of topics, including a Python crash course, an overview of data analysis libraries, an overview of data visualization libraries, and machine learning algorithms, amongst many others. The course also uses Jupyter Notebooks, which help in sharing the code and provide a playground for all the code written and executed.
Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
In the following sections we'll take a closer look at the actual content in this course.
Python Crash Course
From the name of the course you probably figured that the material would use Python to explore data science and machine learning, so no surprise there. The Python Crash Course section takes you from the basics through a few beginner concepts in the Python programming language, including data types, conditional operators and statements, loops, lambdas, and many more. Most of the Python knowledge you will need is contained in this section, so you don't need to worry about being a Python expert before taking this course. However, the importance of taking time to get a better grasp of the language before proceeding to the other stages can't be over-emphasized, as you'll then be able to focus on the machine learning concepts and not the small details of the programming language.
Data Analysis
A very simple way to describe data science is that it involves extracting knowledge and insights from a data set. To be able to process the data and extract insights and information from it, you have to be able to analyse it. This raises the question: what exactly is data analysis?
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.
Given how critical data analysis is, this course takes time to guide you through several data analysis libraries in Python, which I'll touch on below.
- NumPy: A Python library, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
- Pandas: A Python library for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
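
To make the two library descriptions above a little more concrete, here is a minimal sketch of my own (not taken from the course material). The scores, names and column labels are made up purely for illustration; the NumPy and pandas calls themselves are standard.

```python
import numpy as np
import pandas as pd

# NumPy: vectorised math on a 2-D array (illustrative, made-up numbers)
scores = np.array([[72, 85, 90],
                   [60, 75, 88]])
column_means = scores.mean(axis=0)   # mean of each column

# pandas: the same data as a labelled table
df = pd.DataFrame(scores,
                  columns=["quiz", "midterm", "final"],
                  index=["alice", "bob"])
df["total"] = df.sum(axis=1)         # row-wise sum stored as a new column
best = df["total"].idxmax()          # index label of the highest total
```

NumPy handles the raw numerical work, while pandas adds row and column labels on top, which is roughly the division of labour you would expect from the descriptions above.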
Data Visualization
Data visualization is critical because it helps communicate information clearly and efficiently to users through statistical graphics, plots, information graphics and other tools.
Data visualization refers to the techniques used to communicate data or information by encoding it as visual objects.
This course takes the learner through several data visualization libraries in Python, demonstrating how to create a variety of visualizations for a wide range of data sets using the different libraries. Some of the visualization libraries taught in this course include:
- Matplotlib: A Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
- Seaborn: A Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
- Pandas: A data library that has both analysis and visualization capabilities.
- Plotly: An interactive visualization library.
- Cufflinks: A library that helps connect Plotly with Pandas.
- Geographical Plotting: Creating choropleth maps for geographic data visualization.
Machine Learning
This is the second part of the course, which takes the learner through several machine learning algorithms. For each algorithm, the course supports students' understanding with instruction on the theory, supplemental reading, a Python implementation, exercises, and solutions to the exercises. The course extensively covers the different types of machine learning algorithms, namely supervised learning, unsupervised learning, and reinforcement learning. Some of the machine learning algorithms covered in this course include:
- Linear Regression: It is used to estimate real values based on continuous variables.
- Logistic Regression: It is used to estimate discrete values based on a given set of independent variables.
- K Nearest Neighbour: kNN is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure.
- Natural Language Processing: The application of computational techniques to the analysis and synthesis of natural language and speech.
- Neural Nets and Deep Learning: Neural networks are computer systems modelled on the human brain and nervous system. Deep learning is a powerful set of techniques for learning in neural networks.
- Support Vector Machines: SVM is a supervised machine learning algorithm which can be used for either classification or regression challenges.
- K-Means Clustering: K-Means Clustering aims to partition observations into clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
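
Of the algorithms listed above, kNN is probably the easiest to sketch in a few lines, so here is a minimal pure-Python version to show the idea of "store all cases, classify by similarity". This is my own illustration, not code from the course: the function name `knn_predict`, the choice of Euclidean distance as the similarity measure, and the sample points are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored case
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest cases and let their labels vote
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up two-dimensional data with two obvious groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
          (6.0, 6.0), (7.0, 7.5), (6.5, 6.0)]
labels = ["small", "small", "small", "large", "large", "large"]

prediction = knn_predict(points, labels, query=(1.2, 1.4), k=3)
```

In practice you would more likely reach for a library implementation than write this by hand, but the sketch shows that there is no real "training" step in kNN: all the work happens at prediction time.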