Machine Learning Terminology
Classification is a part of supervised learning (learning with labeled data) through which data inputs can be easily separated into categories. In machine learning, there can be binary classifiers with only two outcomes (e.g., spam, non-spam) or multi-class classifiers (e.g., types of books, animal species, etc.).
One of the most popular classification algorithms is the decision tree (essential for both data scientists and machine learning engineers), which asks a sequence of questions that build an “if-then” framework, narrowing down the pool of possibilities at each step until a classification is reached.
You can learn about other classification algorithms, like naive Bayes, k-nearest neighbor, and artificial neural networks, here.
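The “if-then” framework of a decision tree can be sketched in a few lines of Python. This is only an illustration: the tree below is written by hand for the spam example, and its questions (contains_link, sender_known, all_caps_subject) are hypothetical, whereas a real decision tree algorithm would learn its questions from labeled data.

```python
# Hypothetical hand-built decision tree for the spam / non-spam example.
# Each internal node asks one yes/no question; each leaf is a label.
TREE = {
    "question": "contains_link",
    "yes": {
        "question": "sender_known",
        "yes": "non-spam",
        "no": "spam",
    },
    "no": {
        "question": "all_caps_subject",
        "yes": "spam",
        "no": "non-spam",
    },
}


def classify(email, node=TREE):
    """Walk the tree from the root until a leaf (a plain string label) is reached."""
    while isinstance(node, dict):
        answer = "yes" if email[node["question"]] else "no"
        node = node[answer]
    return node


example = {"contains_link": True, "sender_known": False, "all_caps_subject": False}
print(classify(example))  # spam
```

Each question removes part of the pool of possibilities, which is why a short chain of checks is enough to reach a label.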
Clustering is a form of unsupervised learning (learning with unlabeled data) that involves grouping data points according to features and attributes.
Clustering can be used to organize customer demographics and purchasing behavior into specific segments for targeting and product positioning. It can also analyze housing quality and geographic locations to create real estate valuations and plan the layout of new city developments. It can classify information by topics within libraries or web pages and compile an easily accessible directory for users.
The most common kind of clustering is K-means clustering, which involves choosing the number of clusters, “k,” and then defining a starting centroid for each of those clusters. Every data point is then assigned to the cluster with the nearest centroid, and each centroid is recalculated as the average of the points assigned to it. These two steps repeat until the assignments stop changing. Here are a few examples of what K-means clustering looks like in practice:
A hospital wants to locate emergency units at the minimum possible distance from areas where accidents frequently happen
A seismologist studies regions where earthquakes have occurred over the last few decades to identify the areas of greatest risk
A pizzeria wants to understand where to locate stores based on customer demand to minimize the distance the drivers need to travel for delivery
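To make the assign-and-recompute loop of K-means concrete, here is a minimal sketch in plain Python. The delivery addresses are made up for the pizzeria example, and seeding the centroids with the first k points is an assumption made to keep the output repeatable; libraries usually choose the starting centroids randomly.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, assignments


# Hypothetical delivery addresses (x, y) for the pizzeria example.
addresses = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, assignments = kmeans(addresses, k=2)
print(centroids)    # one candidate store location per cluster
print(assignments)  # which cluster each address belongs to
```

In this toy data the two centroids settle near the two groups of addresses, which is where the stores would minimize driving distance.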
Other clustering methods that you can learn more about here include density-based clustering, hierarchical-based methods, partitioning methods, and grid-based methods.
Regressions model relationships and correlations between different types of data. For example, each profile picture has an image with pixels that belong to a person. With static prediction (one that stays the same over time), machine learning learns that a certain pixel arrangement corresponds to a given name and allows for facial recognition (for example, when Facebook recommends tags for the photos you’ve just uploaded).
Regressions can also be useful when predicting outcomes based on data in the present. For a long time, statistical regression has been used to solve problems, such as predicting the recovery of cognitive functions after a stroke or predicting customer churn in the telecommunications industry. The only difference is that now many of these regression analyses can be done more efficiently and quickly by machines.
Regression is a type of supervised machine learning algorithm, where both the inputs and the outputs are labeled. Linear regression provides outputs with continuous variables (any value within a range), such as pricing data. Logistic regression is used when the output is categorical and the labeled values are precisely defined. For example, you can classify whether a store is open as (1) or closed as (0); there are only two possibilities.
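The difference between a continuous output and a categorical one can be shown with a short sketch. The numbers below are invented for illustration, and the logistic weights are picked by hand rather than trained; a real model would learn them from labeled data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def logistic(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Linear regression: hypothetical floor area (m^2) vs. price (in thousands).
slope, intercept = fit_line([50, 60, 70, 80], [150, 180, 210, 240])
print(slope * 65 + intercept)  # continuous output: a price estimate

# Logistic regression: hypothetical, hand-picked weights for "is the store open?"
# given the hour of day (14:00, with opening time assumed to be around 9:00).
p_open = logistic(1.5 * (14 - 9))
print(1 if p_open >= 0.5 else 0)  # categorical output: open (1) or closed (0)
```

The linear model can return any value within a range, while the logistic model's probability is cut off at 0.5 to give one of two labels.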
Other types of regression that you can explore here are polynomial regression, support vector regression, decision tree regression, and random forest regression.
Deep learning is a subset of machine learning; in fact, it’s more of an application of machine learning that imitates the workings of the human brain. Deep learning networks interpret big data (data that is too large to fit on a single computer), both unstructured and structured, and recognize patterns. The more data they can “learn” from, the more informed and accurate their decisions tend to be. Here are some examples of deep learning in practice:
Chatbots and virtual assistants: Virtual assistants like Alexa and Siri or customer service chatbots on different web pages can receive human requests, decipher language, and present lifelike responses.
Real-time bidding and programmatic advertising: Advertising now depends on software buying advertising space through a competitive bidding process. Cognitiv AI is an example of a deep learning platform that synthesizes data on customer demographics, weather, available inventory, time of day, and other variables to create custom buying algorithms for a specific target market.
Neural networks are closely related to deep learning. They create sequential layers of neurons that deepen the understanding of data collected from a machine to provide an accurate analysis.
A neural network consists of layers of nodes, which receive stimulation from “trigger” data. Each input is then assigned a weight through coefficients, as some data inputs may be more significant than others.
Neurons are normally arranged in three kinds of layers: an input layer of data, one or more hidden layers with mathematical computations, and an output layer. In an example where we want to estimate airline ticket prices, our input layer would collect the origin airport, destination airport, departure date, and airline. Each of those would receive a weight (perhaps the departure date matters more than the airline) and then the output would deliver a price prediction.
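The flow from input layer to hidden layer to output layer in the ticket-price example can be sketched as a single forward pass. Everything numeric here is an assumption for illustration: the feature encodings, the weights, and the biases are made up, and a real network would learn its weights during training.

```python
def relu(x):
    """A common activation: pass positive values through, clip negatives to zero."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: a weighted sum per neuron, then an activation."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Input layer: hypothetical numeric encodings of
# [origin airport, destination airport, days until departure, airline].
features = [1.0, 2.0, 10.0, 0.0]

# Hidden layer with two neurons; the weights are invented, not trained.
# Note the larger weight on the departure date than on the airline.
hidden = dense(
    features,
    weights=[[0.5, 0.5, -1.0, 0.1], [1.0, 0.0, 0.2, 0.3]],
    biases=[12.0, 0.0],
    activation=relu,
)

# Output layer: a single neuron with no activation, giving a price estimate.
(price,) = dense(hidden, weights=[[20.0, 30.0]], biases=[50.0], activation=lambda v: v)
print(hidden, price)
```

Training a network amounts to adjusting those weights until the predicted prices match known ticket prices as closely as possible.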
Natural Language Processing
Natural language processing is the subfield of AI that processes human languages. It is a very important term in the field of data science and machine learning. The challenge is that human speech is often not literal. There are figures of speech, words or phrasing specific to certain dialects and cultures, and sentences that take on different meanings depending on grammar and punctuation. Similar to human conversations, natural language processors need to use the syntax (arrangement of words) and semantics (meaning of that arrangement) to come up with correct interpretations.
The first step in natural language processing is converting unstructured language data into a form that can be read by a computer. The computer then assigns meaning to each sentence through algorithms and translates it back, often in another form (for example, speech to text or from one language to another).
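One simple way to carry out that first step, turning unstructured text into something a computer can read, is a “bag of words,” where each sentence becomes a list of word counts. The sketch below is one possible approach among many, and the example sentences are invented customer complaints.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs, in vocabulary order."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical customer complaints.
sentences = ["The app keeps crashing.", "The card was declined, the app froze."]

# The vocabulary is every distinct word seen, in alphabetical order.
vocabulary = sorted({word for s in sentences for word in tokenize(s)})
vectors = [bag_of_words(s, vocabulary) for s in sentences]

print(vocabulary)
print(vectors)
```

A bag of words ignores word order, so it captures little of the syntax and semantics described above; it is a starting point, not a full interpretation of the sentence.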
Natural language processing can help translation apps like Google Translate, document collaboration and communication tools like Slack and Microsoft Word, and virtual assistants. For example, the Royal Bank of Scotland incorporates text analytics to parse through customer service complaints from emails, surveys, and call centers to isolate problem areas and enact improvements to their relationships and reputation.
Machine vision, or computer vision, is the process by which machines can capture and analyze images. This allows for the diagnosis of diseases such as skin cancer from medical imagery (for example, skin photographs or X-rays), and for the detection of real-time traffic and vehicle types for self-driving cars, like Tesla’s new models.
There are many different ways that machines can “see”: representing colors numerically, decomposing images into different parts, and identifying corners, edges, and textures. As the machines gather and code more information, they begin to view the larger picture.
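Two of those ideas, representing colors numerically and identifying edges, can be sketched on a toy image. The luma weights used for the grayscale conversion are a common convention, while the image and the threshold of 100 are assumptions chosen for this example.

```python
def to_gray(pixel):
    """Convert an (R, G, B) pixel to one brightness value using common luma weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def edge_mask(gray, threshold=100):
    """Mark positions where brightness jumps sharply between left-right neighbours."""
    return [
        [abs(row[x + 1] - row[x]) > threshold for x in range(len(row) - 1)]
        for row in gray
    ]


# A 2x4 toy image: dark on the left, bright on the right.
image = [
    [(0, 0, 0), (0, 0, 0), (255, 255, 255), (255, 255, 255)],
    [(0, 0, 0), (0, 0, 0), (255, 255, 255), (255, 255, 255)],
]

gray = [[to_gray(p) for p in row] for row in image]
edges = edge_mask(gray)
print(edges)  # True marks the boundary between the dark and bright halves
```

Production systems use more robust filters than a neighbour difference, but the principle is the same: pixels become numbers, and sharp changes in those numbers reveal structure.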
Many of the trends around machine vision right now include integration into the industrial internet of things, which involves collecting productivity inputs and sensory data in factories, and non-industrial applications like “driverless cars, autonomous farm equipment, drone applications, intelligent traffic systems, and guided surgery.”