An Overview of Machine Learning Classification

Sometimes, instead of predicting a continuous value (regression), we want to classify data. That is: place it into specific categories.

Often, we’ll use logistic regression as the foundation for our classification problems. For example: we could set a probability threshold for a logistic regression model, and classify examples based on whether their predicted probability falls above or below that threshold.
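Here's a minimal sketch of that idea in scikit-learn (the toy data and the 0.5 threshold are illustrative assumptions, not fixed rules):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one feature, binary labels (illustrative values).
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns [P(class 0), P(class 1)] for each example;
# we keep the probability of class 1.
probabilities = model.predict_proba(X)[:, 1]

# Classify by comparing each probability to a chosen threshold.
threshold = 0.5
predictions = (probabilities >= threshold).astype(int)
print(predictions)
```

Raising or lowering the threshold trades precision against recall: a higher threshold classifies fewer examples as positive, but with more confidence.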

The Basics of Regularization in Machine Learning

Regularization is the set of steps we take to ensure a model does not overfit its training data. There are many ways to do this, but among the most important is penalizing model complexity. Generally: the simpler the model, the less likely we are to overfit.
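One of the most common ways to penalize complexity is L2 regularization, which adds a term proportional to the sum of the squared weights to the loss. A minimal sketch of that penalty (the lambda value and weights are illustrative assumptions):

```python
import numpy as np

def l2_regularized_loss(data_loss, weights, lam=0.1):
    # Total loss = original data loss + lambda * sum of squared weights.
    # Larger weights (a more complex model) are penalized more heavily.
    complexity_penalty = lam * np.sum(weights ** 2)
    return data_loss + complexity_penalty

weights = np.array([0.2, -1.5, 3.0])
print(l2_regularized_loss(data_loss=0.8, weights=weights))  # 0.8 + 0.1 * 11.29 = 1.929
```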

If loss on the validation data initially goes down, but then starts to rise again after a certain number of iterations (while loss on the training data continues to go down), chances are we’re overfitting.
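In code, that early-stopping check might look something like this sketch (the patience window and loss history are illustrative assumptions):

```python
def should_stop_early(validation_losses, patience=3):
    # Stop if the best loss in the last `patience` iterations is worse
    # than the best loss seen before that window.
    if len(validation_losses) <= patience:
        return False
    best_recent = min(validation_losses[-patience:])
    best_before = min(validation_losses[:-patience])
    return best_recent > best_before

# Validation loss falls, then climbs back up: a classic overfitting signature.
losses = [0.90, 0.70, 0.55, 0.50, 0.53, 0.58, 0.64]
print(should_stop_early(losses))  # True
```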

An Overview of Feature Crosses in Machine Learning

In linear regression problems, some data doesn’t lend itself easily to direct linear solutions. There are many ways to handle such problems, but one particularly valuable one is feature crosses: that is, creating new columns that combine existing features into data that is easier to work with.

For example: let’s suppose you have a data set where examples with both negative x and y values OR both positive x and y values correlate strongly with one label, and examples with one negative and one positive value correlate with the other label. By creating a feature cross for the value of x * y, you ensure all examples whose x and y share a sign produce a positive cross, and all examples with mixed signs produce a negative one. Suddenly, a linear decision boundary becomes a feasible way of separating the labels.
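A sketch of that x * y cross in numpy (the sample points are illustrative assumptions):

```python
import numpy as np

# Four examples, one per quadrant of the (x, y) plane.
# Same-sign quadrants -> label 1; mixed-sign quadrants -> label 0.
X = np.array([[ 2.0,  3.0],   # both positive -> label 1
              [-2.0, -3.0],   # both negative -> label 1
              [ 2.0, -3.0],   # mixed signs   -> label 0
              [-2.0,  3.0]])  # mixed signs   -> label 0
y = np.array([1, 1, 0, 0])

# The feature cross: a new column holding x * y.
cross = X[:, 0] * X[:, 1]
print(cross)  # [ 6.  6. -6. -6.]
```

No straight line through the original (x, y) plane separates these four points, but on the crossed feature a simple threshold (cross > 0) does the job.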

Validation Data

If you separate data into only training data and test data, you risk overfitting your model to the test data if you do multiple rounds of testing and tuning. How can you avoid that risk?

This is where validation data comes in. Validation data is a third set that sits between training and test data: you train on the training set, tune and compare models against the validation set, and save the test set for one final check at the end.
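A common way to carve out all three sets with scikit-learn is to split twice (the 60/20/20 proportions and toy data are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # toy features
y = np.arange(50) % 2              # toy labels

# First split off the test set, then carve validation data out of the rest.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42)  # 0.25 * 0.8 = 0.2

print(len(X_train), len(X_val), len(X_test))  # 30 10 10
```

You can iterate against the validation set as often as you like, and touch the test set only once, for the final evaluation.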