Linear Regression Ninth


Comparing Model Predictions to Reality

The mean absolute error (MAE) measures the average amount by which the observed outputs differ from the predicted outputs. In our Uber trip example, MAE represents the average amount of money by which the observed trip costs differed from the model's predictions. To calculate MAE, we first take the absolute value of each residual in our dataset. Absolute values are often used in mathematics to represent distances and, as we've learned, a residual is the distance between an observed value and its fitted value. We care only about the magnitude of each residual, not whether it is positive or negative. Next, we calculate the average of these absolute residuals. This value is the MAE, and it describes the typical magnitude of the residuals. The equation for MAE looks like this:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$

where:

  • $y_i$ = observed output value

  • $\hat{y}_i$ = predicted output value

  • $\sum_{i=1}^{n}$ = the sum, from i = 1 to n, of...

  • $\lvert y_i - \hat{y}_i \rvert$ = the absolute value of each residual

  • $\frac{1}{n}$ = divide the above by the total number of observations, n (to return the average value)
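To connect each symbol in the equation to a concrete number, here is a small worked example in base R. The three observed and predicted trip costs are made-up values chosen for illustration, not data from the Uber example.

```r
# Hypothetical observed trip costs (y_i) and model predictions (y-hat_i)
observed  <- c(10, 12, 15)
predicted <- c(11, 12, 13)

# Residuals: y_i - y-hat_i
residuals_toy <- observed - predicted   # -1, 0, 2

# |y_i - y-hat_i|: keep only the size of each residual
abs_residuals <- abs(residuals_toy)     # 1, 0, 2

# Sum over i = 1..n, then multiply by 1/n
n   <- length(observed)
mae <- sum(abs_residuals) / n
print(mae)  # [1] 1
```

Here the absolute residuals sum to 3 and there are 3 observations, so the MAE is 1: on average, these predictions miss by one dollar.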

It's okay if the equation looks intimidating. Let's break down how to compute MAE in R:

  1. Perform a linear regression

    • We'll show you how to do this soon; we promise!
  2. Calculate the absolute value of the residuals

    • In R, we do this by calling the abs() function on the variable that contains the residuals for our model.
    • Residuals are available as a standard output of a linear regression model in R, so we don't have to calculate the difference between the observed and predicted values.
  3. Sum the absolute residuals

  4. Divide the sum by n, the number of data points

Notice that summing the absolute residuals and dividing the sum by the number of data points is simply calculating their average, which the mean() function does in a single step. So, assuming we have already performed a linear regression on our data, we can calculate MAE in R by taking the mean() of the abs() of the residuals.
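The steps above can be sketched in base R as follows. The small distance/cost data frame is invented for illustration, and the names toy_trips, model, and res are hypothetical; in the exercise below, the residuals are supplied for you instead.

```r
# Hypothetical toy data: trip distance (miles) and cost (dollars)
toy_trips <- data.frame(
  distance = c(1.2, 2.5, 3.1, 4.8, 6.0, 7.4),
  cost     = c(7.5, 9.0, 11.2, 13.9, 15.1, 19.3)
)

# Step 1: fit a linear regression of cost onto distance
model <- lm(cost ~ distance, data = toy_trips)

# residuals() returns observed minus fitted values,
# so we never compute the differences by hand
res <- residuals(model)

# Steps 2-4: absolute residuals, their sum, divided by n
abs_res     <- abs(res)
mae_by_hand <- sum(abs_res) / length(abs_res)

# Same result in one step: mean() sums and divides by n for us
mae <- mean(abs(res))
print(mae)
```

Dropping abs() would give an average close to zero: when the model includes an intercept, least-squares residuals sum to zero, so positive and negative errors cancel out.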

Instructions

Let's practice what we've learned about calculating the MAE. We have performed a linear regression of cost onto distance for you and stored the predictions in the predictions variable and the residuals in the residuals variable of the file uber_trips_lm.csv. The required packages were loaded during previous exercises.

  1. Load uber_trips_lm.csv into R and name the resulting data frame uber_trips_lm.
  2. Calculate the mean absolute error and save the result to MAE.
  3. Evaluate the following statement and assign the best answer, TRUE or FALSE, to the variable MAE_question: Based on the MAE we calculated, we can say, roughly, that cost predicted by distance is inaccurate by about $0.72 on average.

You do not need to use the predictions variable to complete this exercise, but we have provided it in case you would like to work through the MAE equation above step by step.
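A minimal sketch of this workflow, assuming the CSV has columns named cost, predictions, and residuals (the cost column name is an assumption) and using base R's read.csv() rather than whichever reader the earlier exercises loaded. So that the sketch runs on its own, it first writes a few made-up rows to a temporary file; in the exercise you would read uber_trips_lm.csv directly.

```r
# Made-up rows standing in for uber_trips_lm.csv (hypothetical values)
fake_rows <- data.frame(
  cost        = c(8, 11, 12),
  predictions = c(8.5, 10, 12.75)
)
fake_rows$residuals <- fake_rows$cost - fake_rows$predictions
csv_path <- tempfile(fileext = ".csv")
write.csv(fake_rows, csv_path, row.names = FALSE)

# Load the data (in the exercise: read.csv("uber_trips_lm.csv"))
uber_trips_lm <- read.csv(csv_path)

# MAE from the supplied residuals column
MAE <- mean(abs(uber_trips_lm$residuals))

# Optional check against the equation: |observed - predicted|, averaged
MAE_from_equation <- mean(abs(uber_trips_lm$cost - uber_trips_lm$predictions))

print(MAE)  # 0.75 for these made-up rows
```

Both routes agree because each residual is exactly the observed cost minus the predicted cost.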
