Basic Machine Learning Terminology

Here are some of the most common machine learning concepts:

Labels

Labels are the conclusions we’re trying to reach about data via machine learning. For example, if we’re using machine learning to determine whether a house is overpriced or underpriced, “overpriced” or “underpriced” might be the label. We determine that label by analyzing features. Features can be considered the inputs (e.g., square feet, neighbourhood, age of house), while labels are the conclusions we draw from those features (underpriced or overpriced).
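As a minimal sketch, a label can be modelled as one value from a fixed set of possible conclusions, kept separate from the features it is drawn from. The names `PriceLabel`, `features` and `label`, and the house values, are hypothetical and made up for illustration.

```python
from enum import Enum

class PriceLabel(Enum):
    """Hypothetical set of conclusions we want to reach about a house."""
    OVERPRICED = "overpriced"
    UNDERPRICED = "underpriced"

# Features are the inputs (made-up values for illustration)...
features = {"square_feet": 1850, "neighbourhood": "Riverside", "age_years": 32}
# ...and the label is the conclusion drawn from them.
label = PriceLabel.OVERPRICED

print(label.value)  # overpriced
```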

Features

Features are the individual inputs we consider when generating machine learning conclusions (labels). For example, if we’re trying to determine whether a house is overpriced or underpriced, we might consider features such as the size of the house, its age, and its neighbourhood.
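One way to picture this, as a sketch under assumed field names, is a raw house record that holds more information than we actually use, from which we pick out the chosen features in a fixed order. `FEATURE_NAMES`, `to_feature_vector` and the record fields are hypothetical.

```python
# Hypothetical choice of features, taken from the list above.
FEATURE_NAMES = ["square_feet", "neighbourhood", "age_years"]

def to_feature_vector(house: dict) -> list:
    """Pick the chosen features out of a raw record, in a fixed order."""
    return [house[name] for name in FEATURE_NAMES]

# A raw record may contain fields (like the address) that are not features.
house = {
    "address": "12 Example St",
    "square_feet": 1850,
    "neighbourhood": "Riverside",
    "age_years": 32,
}
vector = to_feature_vector(house)
print(vector)  # [1850, 'Riverside', 32]
```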

Examples

An example is a specific instance of data. When estimating house prices, for instance, an example would be the data related to one individual house.

That data may be a labeled example (one where the conclusion, i.e. the label, is already known) or an unlabeled example (one where we have the input data but have not yet drawn any conclusion).
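The distinction between labeled and unlabeled examples can be sketched as an example whose label is optional. The `Example` class and the house data below are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Example:
    """One house: its features, plus a label if we already know it."""
    features: dict
    label: Optional[str] = None  # None means the example is unlabeled

    @property
    def is_labeled(self) -> bool:
        return self.label is not None

# Made-up data for illustration.
examples = [
    Example({"square_feet": 1850, "age_years": 32}, "overpriced"),
    Example({"square_feet": 2400, "age_years": 5}, "underpriced"),
    Example({"square_feet": 1200, "age_years": 60}),  # no conclusion yet
]
labeled = [e for e in examples if e.is_labeled]
unlabeled = [e for e in examples if not e.is_labeled]
print(len(labeled), len(unlabeled))  # 2 1
```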

Models

A machine learning model is what we use to draw conclusions (labels) from data (features). It is produced by running a machine learning algorithm over a set of data: the output of that process is the model.
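To make the algorithm/model split concrete, here is a deliberately tiny sketch, assuming a made-up "algorithm" that learns the average price per square foot from past sales. Running `train` over the data is the algorithm; the function it returns is the model. Both names and the numbers are hypothetical.

```python
def train(sales):
    """Hypothetical toy algorithm: learn the average price per square foot."""
    rates = [price / sqft for sqft, price in sales]
    avg_rate = sum(rates) / len(rates)

    def model(sqft, asking_price):
        # The returned function is the model: it maps features to a label.
        return "overpriced" if asking_price > avg_rate * sqft else "underpriced"

    return model

# (square_feet, sale_price) pairs; made-up numbers for illustration.
training_data = [(1000, 200_000), (1500, 300_000), (2000, 400_000)]
model = train(training_data)
print(model(1200, 300_000))  # overpriced
print(model(1200, 200_000))  # underpriced
```

Returning a closure keeps the "output of the algorithm is the model" idea visible: once trained, the model no longer needs the training data.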

Regression

Regression refers to machine learning problems whose output is a continuous value, most commonly a number. Regression models may predict things such as price, size, area, or age.
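A classic regression technique, used here as an illustration rather than as the only option, is fitting a straight line by ordinary least squares. This sketch predicts a house price (a continuous number) from square footage; `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Made-up training data: square feet -> sale price.
square_feet = [1000, 1500, 2000, 2500]
prices = [210_000, 290_000, 410_000, 490_000]
slope, intercept = fit_line(square_feet, prices)

# The prediction is a continuous number, not a category.
predicted = slope * 1800 + intercept
print(round(predicted))  # 359600
```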

Classification

As opposed to regression models, classification models predict which category a piece of data belongs to. For example: nationality, spam or not spam, animal type, etc.
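For contrast with the regression sketch, a very simple classifier is 1-nearest-neighbour: it returns the label of the most similar known example, so the output is always one of a fixed set of categories. This is one illustrative technique among many; `nearest_neighbour`, the features and the data are hypothetical.

```python
import math

def nearest_neighbour(training_set, query):
    """1-nearest-neighbour: return the label of the closest training example."""
    _, closest_label = min(training_set, key=lambda ex: math.dist(ex[0], query))
    return closest_label

# (features, label) pairs with made-up values: (price per sq ft, age in years)
training_set = [
    ((350.0, 40.0), "overpriced"),
    ((320.0, 35.0), "overpriced"),
    ((180.0, 10.0), "underpriced"),
    ((200.0, 5.0), "underpriced"),
]
print(nearest_neighbour(training_set, (300.0, 30.0)))  # overpriced
```

`math.dist` (Python 3.8+) computes Euclidean distance; in practice features on very different scales are usually normalized first, which this sketch skips.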