Maximum Likelihood Estimation

Have you ever wanted to predict things in nature? Whether it is the species of an animal, the roll of a die, or anything else that falls into a specific class, it might surprise you that you can do this with math you may already know. There is one major drawback, though: it is rare that we know the true distribution the data came from. Luckily, there is a way around this. One approach data scientists use is to assume the data comes from some parametric density, given that the data is continuous.

A graph of a normal distribution

Thinking as a Data Scientist

To start, data scientists make an assumption: that the true, underlying density has a certain form. A common assumption is that the true distribution is Gaussian, also known as the normal distribution. Under this assumption, two parameters determine the shape of the distribution: the mean and the standard deviation. The mean controls the center, and the standard deviation controls the width. Because we do not know the true density, we have to estimate these parameters from the data. That estimate gives us a guess at what the true distribution may look like.

Looking at Real Data

Below is data sampled from the NBA showing the average number of rebounds that centers and point guards got in a random NBA season. As you can see, you can input your own means and standard deviations. You will play the role of a data scientist trying to build a model distribution that best fits the data. Go ahead and see how well you do! Then, whenever you feel confident in your answer, compare it with the best possible fit using our interactive features.

The nba logo

In the following visualization, you will find the rebound percentage for point guards in purple and the rebound percentage for centers in light blue. Looking at how the points spread along the x-axis, try to guess what their best-fit Gaussian distributions would look like.

Let's see how your prediction for the distributions of these two positions compares to the actual distributions. Click on "Show Distribution" to reveal the true distributions.

As you experimented with different parameter inputs, you may have noticed that some inputs gave better results than others. Is there a way to quantify this without guessing and checking? Luckily, some clever people have figured out how. First, assume the data points are all sampled independently. Next, let P(xi | Mu, Sigma) be the probability density of seeing xi, a single data point, under the parameters Mu and Sigma. Then P(x1 | Mu, Sigma) * P(x2 | Mu, Sigma) * ... * P(xn | Mu, Sigma) tells us the likelihood of seeing x1, x2, ..., xn, the entire data set, at the same time. We can think of this as a function of Mu and Sigma: the likelihood function takes in a Mu and Sigma and returns how likely it is that the data was generated by those inputs.
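The likelihood product above can be sketched in a few lines of Python. The `rebounds` sample here is made up for illustration; it is not the real NBA data from the visualization.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a single point x under a Normal(mu, sigma) distribution."""
    coef = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def likelihood(data, mu, sigma):
    """Product of the individual densities: the likelihood of seeing the
    whole (independently sampled) data set under Normal(mu, sigma)."""
    result = 1.0
    for x in data:
        result *= gaussian_pdf(x, mu, sigma)
    return result

rebounds = [4.1, 3.8, 5.0, 4.4, 3.9]  # hypothetical sample
print(likelihood(rebounds, 4.0, 0.5))
```

Note that for large data sets this product of many small numbers underflows quickly, which is one practical reason to work with the log of the likelihood instead.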

The goal of creating this likelihood function is to maximize it: to find the Mu and Sigma that give us the highest likelihood. One way, which some of you may know, is good old calculus: take the partial derivatives of the likelihood function with respect to Mu and Sigma, set them to 0, and solve. (In practice we take the logarithm of the likelihood first; the log turns the product into a sum, which is much easier to differentiate, and it has the same maximizer.)
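As a sketch of that calculus step (writing μ and σ for Mu and Sigma), the log-likelihood of n independent Gaussian samples, and its derivative with respect to μ, look like this:

```latex
\log L(\mu, \sigma)
  = \sum_{i=1}^{n} \log P(x_i \mid \mu, \sigma)
  = -\frac{n}{2}\log\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2

\frac{\partial \log L}{\partial \mu}
  = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0
  \quad\Longrightarrow\quad
  \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

Setting the derivative with respect to σ to zero works the same way and yields the σ estimator.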

After you get through all the math, you are left with the equations below. For those interested in the full derivations, you can view them here.

Equations for MLE

The Mu maximum likelihood estimator is just the mean of the sampled data. The Sigma maximum likelihood estimator is the standard deviation of the data computed using the Mu MLE (and dividing by n rather than n - 1). By calculating these parameters from the data, we can now fit a Gaussian that best matches it.
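In code, these two estimators come down to one short function. This is a sketch (the function name is ours, not from a library):

```python
import math

def fit_gaussian_mle(data):
    """Maximum likelihood estimates for a Gaussian: the sample mean, and
    the standard deviation computed with the mean MLE, dividing by n
    (not the n - 1 used by the unbiased sample estimator)."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return mu, sigma

mu, sigma = fit_gaussian_mle([4.1, 3.8, 5.0, 4.4, 3.9])  # hypothetical sample
print(mu, sigma)
```

With NumPy, the same estimates are `data.mean()` and `data.std()` (whose default `ddof=0` matches the 1/n MLE convention).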

OK, but I know you are wondering how this helps you predict nature. We can use these estimated probability densities together with Bayes' theorem. But how? Back to our data set of centers and point guards: we will build a simple way to predict whether an NBA player is a point guard or a center from the number of rebounds. We fit two separate Gaussians, one per position, as you saw before, giving us P(X | Y = PG) and P(X | Y = Center). Then we multiply each of those densities by P(Y = PG) and P(Y = Center), respectively. Now, how would we use these Gaussians to predict a player's position? All you need to do is compare P(X | Y = PG)P(Y = PG) with P(X | Y = Center)P(Y = Center). Whichever is higher becomes the prediction.
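That comparison is the whole classifier. Here is a minimal sketch; the fitted (mu, sigma) pairs and the equal priors below are made-up stand-ins, not the real values estimated from the NBA data:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of x under a Normal(mu, sigma) distribution."""
    coef = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def predict_position(rebounds, pg_params, center_params, p_pg, p_center):
    """Compare P(x | Y=PG) P(Y=PG) against P(x | Y=Center) P(Y=Center)
    and return the position with the higher score."""
    score_pg = gaussian_pdf(rebounds, *pg_params) * p_pg
    score_center = gaussian_pdf(rebounds, *center_params) * p_center
    return "PG" if score_pg > score_center else "Center"

# Hypothetical fitted (mu, sigma) for each position:
pg = (3.0, 1.0)      # point guards average fewer rebounds
center = (9.0, 2.0)  # centers average more

print(predict_position(4.0, pg, center, 0.5, 0.5))   # a low rebound average
print(predict_position(10.0, pg, center, 0.5, 0.5))  # a high rebound average
```

With equal priors this reduces to asking which fitted Gaussian assigns the input a higher density.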

Try it yourself below. We have already created the P(X | Y = PG)P(Y = PG) and P(X | Y = Center)P(Y = Center) distributions for you. All you have to do is input the average rebounds of a center or point guard you know, or look one up. Then click submit and see the prediction. Did we get it right?

Prediction:

As you can see, we did not need an advanced algorithm to make our predictions. We simply used MLE to estimate probability densities, then used Bayes' theorem to find the probability that the player with the inputted rebound number was a center or a point guard, and chose the position with the higher probability.

Some of you who are more familiar with statistics and math may wonder why someone would estimate the densities P(x | Y = 0) and P(x | Y = 1) with a parametric approach instead of a non-parametric one, such as histograms. One reason is that histograms suffer from the curse of dimensionality: as the number of dimensions (i.e., features) grows, histograms require significantly more data to fit well. So if your sample size is small and you can make a fair assumption about the underlying true density, a parametric approach may suit you better. Predicting nature is a very complicated process, since we will never fully understand how to quantify all the random interactions that add up to a single outcome. But with a variety of tools like MLE, we can predict nature as best we can, and perhaps one day we will get really close.