Linear Regression Ninth
Comparing Model Predictions to Reality
The mean absolute error (MAE) measures the average amount by which the observed outputs differ from the predicted outputs. In our Uber trip example, MAE is the average amount of money by which the observed trip costs differed from the predicted costs. To estimate MAE, we begin by calculating the absolute value of each residual in our dataset. Absolute values are often used in mathematics to represent distances and, as we've learned, a residual is the distance between an observed value and its fitted value. We care only about the magnitude of each residual, not whether it is positive or negative. Next, we calculate the average of these absolute residuals. This value is the MAE, which essentially describes the typical magnitude of the residuals. The equation for MAE looks like this:
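$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
where:
- $y_i$ = observed output value
- $\hat{y}_i$ = predicted output value
- $\sum_{i=1}^{n}$ = the sum of...
- $|y_i - \hat{y}_i|$ = the absolute value of each residual
- $\frac{1}{n}$ = divide the above by the total number of observations (to return the average value)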
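As a quick worked illustration with made-up numbers (not taken from the Uber dataset), suppose three trips actually cost 10, 12, and 9 dollars, and the model predicted 11, 10, and 9.5 dollars. The absolute residuals are then 1, 2, and 0.5, so:

$$\text{MAE} = \frac{|10 - 11| + |12 - 10| + |9 - 9.5|}{3} = \frac{1 + 2 + 0.5}{3} = \frac{3.5}{3} \approx 1.17$$

In this toy case the predictions are off by roughly $1.17 per trip, on average.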
It's okay if the equation looks intimidating. Let's break down how to compute MAE in R:
- Perform a linear regression
  - We'll show you how to do this soon - we promise!
- Calculate the absolute value of the residuals
  - In R, we do this by calling the abs() function on the variable that contains the residuals for our model.
  - Residuals are available as a standard output of a linear regression model in R, so we don't have to calculate the difference between the observed and predicted values.
- Sum the absolute values of the residuals
- Divide the sum by $n$, the number of data points
Notice that summing the absolute residuals and dividing the sum by the number of data points is the same as calculating their average. This can be achieved in R with the mean() function. So, assuming we have performed a linear regression on our data, we calculate MAE in R as follows:
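A minimal sketch of this calculation is shown here. The data frame trips, its values, and the variable names model, abs_residuals, and MAE_long are hypothetical stand-ins chosen for illustration. The lm() call is included only so the example runs on its own. The sketch computes MAE the long way (sum, then divide) and with the mean() shortcut, then checks that the two agree.

```r
# Hypothetical toy data standing in for the Uber trips (values are made up)
trips <- data.frame(
  distance = c(1.2, 2.5, 3.1, 4.8, 6.0),
  cost     = c(5.1, 8.4, 9.0, 13.9, 16.2)
)

# Fit a linear regression of cost onto distance
model <- lm(cost ~ distance, data = trips)

# Step by step: absolute residuals, their sum, then divide by n
abs_residuals <- abs(residuals(model))
MAE_long <- sum(abs_residuals) / length(abs_residuals)

# Shortcut: mean() performs the sum and the division in one call
MAE <- mean(abs(residuals(model)))

# Both routes should give the same value
all.equal(MAE_long, MAE)
```

The shortcut is usually preferable because it is shorter and leaves less room for dividing by the wrong count.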
Instructions
Let's practice what we've learned about calculating the MAE. We have performed a linear regression of cost onto distance for you and have provided the predictions as the predictions variable and the residuals as the residuals variable in the file titled uber_trips_lm.csv. The required packages were loaded during previous exercises.
- Load uber_trips_lm.csv into R and assign this dataframe the name uber_trips_lm.
- Calculate the mean absolute error and save the result to MAE.
- Evaluate the following statement and assign the best answer, TRUE or FALSE, to the variable MAE_question: Based on the MAE result we calculated, we can say roughly that cost predicted by distance is inaccurate by about $0.72, on average.
You do not need to use the predictions variable to complete this exercise, but we have provided it for you if you would like to go through the MAE equation above step-by-step.
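For readers who want to follow the equation step by step, the workflow might look like the sketch below. It makes several assumptions: the file is replaced by a temporary CSV with made-up rows, the file is assumed to contain a cost column alongside predictions and residuals, and base R's read.csv() is used, although the course may rely on a different reader. MAE_by_hand is a hypothetical name.

```r
# Hypothetical stand-in for uber_trips_lm.csv; the rows below are made up
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    cost        = c(5.0, 8.5, 9.0, 14.0),
    predictions = c(5.5, 8.0, 9.75, 13.5),
    residuals   = c(-0.5, 0.5, -0.75, 0.5)
  ),
  csv_path,
  row.names = FALSE
)

# Load the file into a data frame
uber_trips_lm <- read.csv(csv_path)

# Shortcut: average the absolute values of the provided residuals
MAE <- mean(abs(uber_trips_lm$residuals))

# Step by step: rebuild the residuals as observed minus predicted
n <- nrow(uber_trips_lm)
MAE_by_hand <- sum(abs(uber_trips_lm$cost - uber_trips_lm$predictions)) / n

# Both routes should agree
all.equal(MAE, MAE_by_hand)
```

With the real file, only the read.csv() path would change. The value of MAE then determines the answer to MAE_question.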