Machine Learning

The Basics of Regularization in Machine Learning

Regularization is the set of techniques we use to keep a model from overfitting its training data. There are many ways to do this, but among the most important is penalizing model complexity. Generally: the simpler the model, the less likely it is to overfit.
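
To make that concrete, one common form of this is L2 regularization, which adds a penalty proportional to the squared weights to the loss. A minimal sketch, assuming a NumPy-based mean squared error loss and a purely illustrative lambda value:

```python
import numpy as np

def l2_regularized_loss(y_true, y_pred, weights, lam=0.01):
    """Mean squared error plus an L2 penalty on the weights.

    lam (an illustrative value) controls how strongly complexity is
    penalized: larger lam pushes the weights toward zero, i.e. toward
    a simpler model.
    """
    mse = np.mean((y_true - y_pred) ** 2)
    penalty = lam * np.sum(weights ** 2)
    return mse + penalty
```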

If loss on the validation data initially goes down, but then starts to rise again after a certain number of iterations (while loss on the training data continues to fall), chances are we're overfitting.
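
One common way to act on that signal is early stopping: halt training once validation loss stops improving. A minimal sketch, assuming hypothetical train_step and validation_loss helpers that you would supply yourself:

```python
def train_with_early_stopping(model, train_step, validation_loss,
                              max_iters=1000, patience=5):
    """Stop training once validation loss stops improving.

    train_step(model) performs one training iteration;
    validation_loss(model) returns the current loss on the validation set.
    Both are assumed helpers, not part of any particular library.
    """
    best_loss = float("inf")
    bad_iters = 0
    for _ in range(max_iters):
        train_step(model)
        loss = validation_loss(model)
        if loss < best_loss:
            best_loss = loss
            bad_iters = 0
        else:
            bad_iters += 1
            if bad_iters >= patience:
                break  # validation loss has risen for `patience` iterations in a row
    return model
```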

An Overview of Feature Crosses in Machine Learning

In linear regression problems, some data doesn't lend itself easily to a direct linear solution. There are many ways to handle such data, but one particularly valuable one is feature crosses: that is, creating new columns that combine existing features into data that is easier to work with.

For example: let's suppose you have a data set where points whose x and y values are both negative OR both positive correlate strongly with one label, and points with one negative and one positive value correlate with the other label. By creating a feature cross x * y, every point whose x and y share a sign produces a positive value, and every point with mixed signs produces a negative value. Suddenly, a straight line becomes a feasible way of separating the labels.
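
A minimal sketch of that cross in NumPy, with made-up points purely for illustration:

```python
import numpy as np

# Toy points: the first two have x and y with the same sign,
# the last two have mixed signs.
X = np.array([[ 1.0,  2.0],
              [-1.5, -0.5],
              [ 2.0, -1.0],
              [-0.5,  1.5]])

# The feature cross: positive when x and y share a sign, negative otherwise.
cross = X[:, 0] * X[:, 1]
print(cross)  # [ 2.    0.75 -2.   -0.75] -> separable by a single threshold at 0
```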

Validation Data

If you separate data into training data and test data, you risk overfitting your results to the test data if you do multiple rounds of testing. How can you avoid that risk?

This is where validation data comes in. Validation data is a third set that sits between training and test data: you tune and compare models against it during development, and touch the test data only once at the very end.
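
A minimal sketch of a three-way split using scikit-learn's train_test_split, with toy data and illustrative 60/20/20 proportions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data purely for illustration.
X = np.random.rand(100, 3)
y = np.random.rand(100)

# First carve off the test set, then split the remainder into train/validation.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)
# Result: 60% train, 20% validation, 20% test.
# Tune against the validation set; evaluate on the test set only once, at the end.
```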

The Basics of Loss Iteration, Gradient Descent, and Learning Rate in Machine Learning

In this article, we provide a basic overview of loss iteration, gradient descent, and learning rates. Concisely: loss iteration is the loop of repeatedly adjusting a model's weights to find values that produce lower loss. Gradient descent is the idea that we can find those better weights by computing the gradient of the loss and stepping in the direction that decreases it, repeating until we reach the "bottom" of a convex loss surface. The learning rate is the size of each of those steps: how far each "guess" moves us toward the values that minimize the loss.
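
A minimal sketch of that loop for a one-weight linear model, with the data and learning rate chosen purely for illustration:

```python
import numpy as np

# Toy data: y is roughly 3 * x, so the weight should converge toward 3.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 5.9, 9.2, 11.8])

w = 0.0               # starting guess for the weight
learning_rate = 0.01  # size of each step down the gradient

for _ in range(200):
    y_pred = w * x
    # Gradient of mean squared error with respect to w.
    grad = np.mean(2 * (y_pred - y) * x)
    w -= learning_rate * grad  # step opposite the gradient

print(w)  # approaches the loss-minimizing weight (close to 3)
```

If the learning rate is too large, each step overshoots and the loss can diverge; too small, and it takes many more iterations to reach the bottom of the curve.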