Machine Learning

The Basics of Regularization in Machine Learning

Regularization is the set of techniques we use to keep a model from overfitting its training data. There are many ways to do this, but among the most important is penalizing model complexity. Generally: the simpler the model, the less likely it is to overfit.
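
To make that concrete, one common form of this is L2 regularization, which adds a penalty proportional to the squared weights to the loss. A minimal sketch, assuming a NumPy-based mean squared error loss and a purely illustrative lambda value:

```python
import numpy as np

def l2_regularized_loss(y_true, y_pred, weights, lam=0.01):
    """Mean squared error plus an L2 penalty on the weights.

    lam (an illustrative value) controls how strongly complexity is
    penalized: larger lam pushes the weights toward zero, i.e. toward
    a simpler model.
    """
    mse = np.mean((y_true - y_pred) ** 2)
    penalty = lam * np.sum(weights ** 2)
    return mse + penalty
```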

If loss on the validation data initially goes down, but then starts to rise again after a certain number of iterations (while loss on the training data continues to fall), chances are we're overfitting.
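
One common way to act on that signal is early stopping: halt training once validation loss stops improving. A minimal sketch, assuming hypothetical train_step and validation_loss helpers that you would supply yourself:

```python
def train_with_early_stopping(model, train_step, validation_loss,
                              max_iters=1000, patience=5):
    """Stop training once validation loss stops improving.

    train_step(model) performs one training iteration;
    validation_loss(model) returns the current loss on the validation set.
    Both are assumed helpers, not part of any particular library.
    """
    best_loss = float("inf")
    bad_iters = 0
    for _ in range(max_iters):
        train_step(model)
        loss = validation_loss(model)
        if loss < best_loss:
            best_loss = loss
            bad_iters = 0
        else:
            bad_iters += 1
            if bad_iters >= patience:
                break  # validation loss has risen for `patience` iterations in a row
    return model
```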

An Overview of Feature Crosses in Machine Learning

In linear regression problems, some data doesn't lend itself easily to a direct linear solution. There are many ways to handle such data, but one particularly valuable one is feature crosses: that is, creating new columns that combine existing features into data that is easier to work with.

For example: let's suppose you have a data set where points whose x and y values are both negative OR both positive correlate strongly with one label, and points with one negative and one positive value correlate with the other label. By creating a feature cross x * y, every point whose x and y share a sign produces a positive value, and every point with mixed signs produces a negative value. Suddenly, a straight line becomes a feasible way of separating the labels.
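
A minimal sketch of that cross in NumPy, with made-up points purely for illustration:

```python
import numpy as np

# Toy points: the first two have x and y with the same sign,
# the last two have mixed signs.
X = np.array([[ 1.0,  2.0],
              [-1.5, -0.5],
              [ 2.0, -1.0],
              [-0.5,  1.5]])

# The feature cross: positive when x and y share a sign, negative otherwise.
cross = X[:, 0] * X[:, 1]
print(cross)  # [ 2.    0.75 -2.   -0.75] -> separable by a single threshold at 0
```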

Validation Data

If you separate data into training data and test data, you risk overfitting your results to the test data if you do multiple rounds of testing. How can you avoid that risk?

This is where validation data comes in. Validation data is a third set that sits between training and test data: you tune and compare models against it during development, and touch the test data only once at the very end.
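
A minimal sketch of a three-way split using scikit-learn's train_test_split, with toy data and illustrative 60/20/20 proportions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data purely for illustration.
X = np.random.rand(100, 3)
y = np.random.rand(100)

# First carve off the test set, then split the remainder into train/validation.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)
# Result: 60% train, 20% validation, 20% test.
# Tune against the validation set; evaluate on the test set only once, at the end.
```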

The Basics of Loss Iteration, Gradient Descent, and Learning Rate in Machine Learning

In this article, we provide a basic overview of loss iteration, gradient descent, and learning rates. Concisely: loss iteration is the loop of repeatedly adjusting a model's weights to find values that produce lower loss. Gradient descent is the idea that we can find those better weights by computing the gradient of the loss and stepping in the direction that decreases it, repeating until we reach the "bottom" of a convex loss surface. The learning rate is the size of each of those steps: how far each "guess" moves us toward the values that minimize the loss.
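
A minimal sketch of that loop for a one-weight linear model, with the data and learning rate chosen purely for illustration:

```python
import numpy as np

# Toy data: y is roughly 3 * x, so the weight should converge toward 3.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 5.9, 9.2, 11.8])

w = 0.0               # starting guess for the weight
learning_rate = 0.01  # size of each step down the gradient

for _ in range(200):
    y_pred = w * x
    # Gradient of mean squared error with respect to w.
    grad = np.mean(2 * (y_pred - y) * x)
    w -= learning_rate * grad  # step opposite the gradient

print(w)  # approaches the loss-minimizing weight (close to 3)
```

If the learning rate is too large, each step overshoots and the loss can diverge; too small, and it takes many more iterations to reach the bottom of the curve.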