All Flashcards
Explain the impact of outliers on a regression model.
Outliers can drastically reduce the correlation and may change the y-intercept of the regression line.
Explain the impact of high-leverage points on a regression model.
High-leverage points can significantly change the slope and may change the y-intercept of the regression line.
Explain how to transform data for an exponential model.
Take the natural logarithm (ln) of the y-values to linearize the relationship between ln(y) and x.
Explain how to transform data for a power model.
Take the natural logarithm (ln) of both the x and y-values to linearize the relationship between ln(y) and ln(x).
Explain how residual plots help in assessing model fit.
A random scatter of points in the residual plot indicates a good fit. Patterns suggest the model is not appropriate.
Explain the meaning of R² value.
R² represents the percentage of variation in the response variable explained by the model. Higher R² generally indicates a better fit.
What are the differences between outliers and high-leverage points?
Outliers: y-value far from the regression line, large residual | High-Leverage Points: x-value far from other points, potentially changes slope.
What are the differences between exponential and power model transformations?
Exponential: ln(y) vs. x | Power: ln(y) vs. ln(x)
What are the differences between the effects of outliers vs high leverage points?
Outliers: Affect correlation and y-intercept more | High Leverage Points: Affect the slope more
What are the differences between interpreting 'b' in transformed exponential and power models?
Exponential: b* needs to be exponentiated (e^b*) to find original 'b' | Power: b* is the original 'b'
What are the differences between the original exponential and power models?
Exponential: ŷ = abˣ (y changes exponentially with x) | Power: ŷ = axᵇ (y changes by a power of x)
What is the definition of an outlier?
A data point with a y-value far from the regression line, resulting in a large residual.
What is the definition of a high-leverage point?
A data point with an x-value far from the other data points.
Define influential point.
A data point that significantly alters the slope, y-intercept, and/or correlation of a regression model.
What is data transformation in statistics?
The process of applying a mathematical function (e.g., logarithm) to data to achieve linearity or stabilize variance.
Define residual.
The difference between the observed y-value and the predicted y-value (y - ŷ).