Linear Regression Ninth
Comparing Model Predictions to Reality
The mean absolute error (MAE) measures the average amount by which the observed outputs differ from the predicted outputs. Using our Uber trip example, MAE represents the average amount of money by which the observed cost of a trip differed from the cost our model predicted. To estimate MAE, we begin by calculating the absolute value of each residual in our dataset. Absolute values are often used in mathematics to represent distances and, as we've learned, a residual is the distance between an observed value and its fitted value. We care only about the magnitude of each residual, not whether it is positive or negative. Next, we calculate the average of all of these absolute residuals. This value is the MAE, and it describes the typical magnitude of the residuals. The equation for MAE looks like this:
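$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$

where:
- $y_i$ = observed output value
- $\hat{y}_i$ = predicted output value
- $\sum_{i=1}^{n}$ = the sum of...
- $\lvert y_i - \hat{y}_i \rvert$ = the absolute value of each residual
- $\frac{1}{n}$ = divide the above by the total number of observations, $n$ (to return the average value)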
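As a quick illustration of the formula, the sketch below computes MAE by hand in R on four made-up observed and predicted values (the vectors observed and predicted are hypothetical, not taken from the Uber data). It also shows why the absolute value matters: averaging the signed residuals lets positive and negative errors cancel out, even though every prediction here is off by 1.

```r
# Hypothetical observed and predicted costs (made-up numbers for illustration only)
observed  <- c(10, 12, 8, 15)
predicted <- c(11, 11, 9, 14)

# Residuals: observed minus predicted (y_i - y_hat_i)
res <- observed - predicted  # -1, 1, -1, 1

# Averaging the signed residuals lets positive and negative errors cancel
mean_signed <- sum(res) / length(res)

# MAE: sum of the absolute residuals, divided by n, as in the formula
n   <- length(res)
mae <- sum(abs(res)) / n

print(mean_signed)  # [1] 0
print(mae)          # [1] 1
```

The signed average suggests a perfect model, while the MAE correctly reports that predictions miss by 1 unit on average.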
It's okay if the equation looks intimidating. Let's break down how to compute MAE in R:
- Perform a linear regression
  - We'll show you how to do this soon; we promise!
- Calculate the absolute value of the residuals
  - In R, we do this by calling the abs() function on the variable that contains the residuals for our model.
  - Residuals are available as a standard output of a linear regression model in R, so we don't have to calculate the differences between the observed and predicted values ourselves.
- Sum the absolute residuals
- Divide the sum by n, the number of data points
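Here is a minimal sketch of the four steps in R, assuming a small hypothetical data frame named trips whose distance and cost values are invented for illustration (they are not the lesson's dataset). It uses base R's lm() for the regression and residuals() to retrieve the residuals the model already stores.

```r
# Hypothetical trip data (made-up values; not the lesson's uber_trips_lm.csv)
trips <- data.frame(
  distance = c(1.2, 2.5, 3.1, 4.8, 6.0),
  cost     = c(7.5, 10.2, 11.0, 15.9, 18.3)
)

# Step 1: perform a linear regression of cost onto distance
model <- lm(cost ~ distance, data = trips)

# Step 2: take the absolute value of the residuals stored in the model
abs_res <- abs(residuals(model))

# Step 3: sum the absolute residuals
total <- sum(abs_res)

# Step 4: divide the sum by n, the number of data points
n   <- length(abs_res)
mae <- total / n

print(mae)
```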
Notice that summing the absolute residuals and dividing the sum by the number of data points is the same as calculating their average, which R's mean() function does for us. So, assuming we have performed a linear regression on our data, we can calculate MAE in R by combining abs() and mean().
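The sketch below shows that combination on the same kind of hypothetical trips data frame (invented values, not the exercise file). Because mean() sums its input and divides by the count, a single call replaces the sum and divide steps; the result is compared with the explicit version. The fitted model also stores its residuals as model$residuals, but residuals(model) is the standard accessor.

```r
# Hypothetical trip data (made-up values for illustration only)
trips <- data.frame(
  distance = c(1.2, 2.5, 3.1, 4.8, 6.0),
  cost     = c(7.5, 10.2, 11.0, 15.9, 18.3)
)
model <- lm(cost ~ distance, data = trips)

# One line: absolute residuals, then their average
MAE <- mean(abs(residuals(model)))

# The explicit sum-then-divide version for comparison
MAE_long <- sum(abs(residuals(model))) / nrow(trips)

print(c(MAE, MAE_long))
```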
Instructions
Let's practice what we've learned about calculating the MAE. We have performed a linear regression of cost onto distance for you and have provided the predictions as the predictions variable and the residuals as the residuals variable in the file titled uber_trips_lm.csv. The required packages were loaded during previous exercises.
- Load uber_trips_lm.csv into R and assign this dataframe the name uber_trips_lm.
- Calculate the mean absolute error and save the result to MAE.
- Evaluate the following statement and assign the best answer, TRUE or FALSE, to the variable MAE_question: Based on the MAE result we calculated, we can say roughly that cost predicted by distance is inaccurate by about $0.72, on average.
You do not need to use the predictions variable to complete this exercise, but we have provided it in case you would like to work through the MAE equation above step by step.
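For working through the equation step by step, here is a minimal sketch that uses a tiny hypothetical stand-in for uber_trips_lm.csv. It assumes only the column names described above (cost, predictions, residuals); the numbers are invented, and the data frame is written to a temporary file so that the read.csv() loading step can be run without the real file. It shows that computing |cost - predictions| by hand and averaging the provided residuals give the same MAE.

```r
# Hypothetical stand-in for uber_trips_lm.csv (made-up values; same column names)
demo_trips <- data.frame(
  cost        = c(12.5, 8.0, 20.25),
  predictions = c(12.0, 9.0, 19.5),
  residuals   = c(0.5, -1.0, 0.75)
)

# Write to a temporary CSV and load it back, mirroring the "load the file" step
path <- tempfile(fileext = ".csv")
write.csv(demo_trips, path, row.names = FALSE)
uber_trips_lm <- read.csv(path)

# Step by step from the equation: |observed - predicted|, summed, divided by n
step_by_step <- sum(abs(uber_trips_lm$cost - uber_trips_lm$predictions)) /
  nrow(uber_trips_lm)

# Shortcut using the provided residuals column
shortcut <- mean(abs(uber_trips_lm$residuals))

print(c(step_by_step, shortcut))  # [1] 0.75 0.75
```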