Linear Regression, Part Ten


Recap

In this mission, we learned about concepts that are fundamental to predictive models of all kinds, not just linear models. We imagined a situation where we recorded trip distance and trip cost for 50 Uber rides, and we saw that this data can be used to fit a model that predicts the total cost of a future Uber trip from its distance. We also discovered that we can build models focused solely on predicting a future outcome, or models designed to make inferences about the relationship between the predictor variable and the response variable.

It is nearly impossible for a model to be 100% accurate because every model contains some error. Part of that error is out of our control and will always exist, but we learned that we can reduce the rest by selecting a modeling technique that provides the best fit to the data, or, in the case of bivariate linear regression, by selecting the predictor variable that provides the best estimate of the response variable.

By choosing linear regression, we simplify the process of estimating f, because our model only needs to estimate two parameters: the intercept and the slope. The trade-off we face is that our model may not fit the data well if the true form of f is not linear. Fortunately for us, the relationship between Uber trip cost and trip distance looks to be relatively well described by a linear model.
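
To make this concrete, here is a minimal sketch of fitting such a model with scikit-learn. The distance and cost values below are simulated stand-ins for the 50 recorded rides, not the actual data from the mission, and the assumed base fare and per-mile rate are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simulated stand-in for the 50 recorded Uber rides:
# trip distance in miles (predictor) and trip cost in dollars (response).
rng = np.random.default_rng(0)
distance = rng.uniform(1, 20, size=50)
cost = 2.50 + 1.75 * distance + rng.normal(0, 2, size=50)  # hypothetical fare structure plus noise

# Fit a bivariate linear regression: cost = intercept + slope * distance.
model = LinearRegression()
model.fit(distance.reshape(-1, 1), cost)

# The two parameters the model estimates.
print("intercept:", model.intercept_)
print("slope:", model.coef_[0])
```

The fitted intercept and slope are the only two quantities we have to learn from the data, which is what keeps bivariate linear regression so simple.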

Uber linear model

Once we've trained a model, we can check its fit against our training data by comparing the predicted cost of each Uber trip to its recorded cost. The difference between an observed value and the model's prediction is called a residual. The residuals can be analyzed and collapsed into a useful summary measure like mean absolute error, which is essentially the average absolute difference between the observed and predicted values.
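
Continuing the sketch above (again with simulated ride data rather than the real recordings), the residuals and mean absolute error could be computed like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Simulated Uber ride data, as in the previous sketch.
rng = np.random.default_rng(0)
distance = rng.uniform(1, 20, size=50).reshape(-1, 1)
cost = 2.50 + 1.75 * distance.ravel() + rng.normal(0, 2, size=50)

model = LinearRegression().fit(distance, cost)
predicted_cost = model.predict(distance)

# A residual is the observed value minus the model's prediction.
residuals = cost - predicted_cost

# Mean absolute error: the average absolute difference between
# observed and predicted costs.
mae = np.mean(np.abs(residuals))
print("MAE:", mae)
print("MAE (scikit-learn):", mean_absolute_error(cost, predicted_cost))
```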

We won't need to worry about overfitting any models in this course. But as we progress to more complex models that use a greater number of predictor variables, we will need to be mindful of overfitting to ensure that our model follows true patterns in the data rather than random noise that is unique to the training data.

We haven't covered every fundamental concept relevant to predictive modeling, but we're off to a good start. We'll learn about other concepts as we dig into the details of linear modeling. Throughout this course, we will work with the Uber ride data we have discussed. We will also use publicly available property sale data from New York City to predict home sale prices using linear regression.
