How to Read Computer Output for Regression


How to Read the Output From Simple Linear Regression Analyses

This is the typical output produced from a simple linear regression of muscle strength (STRENGTH) on lean body mass (LBM). That is, lean body mass is being used to predict muscle strength.
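(The sketch below is not part of the original output.) For readers who would like to produce this kind of summary themselves, here is a minimal sketch in Python using statsmodels; the file name strength.csv and its columns STRENGTH and LBM are hypothetical stand-ins for whatever data set is at hand.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical file with one row per subject and columns STRENGTH and LBM.
    data = pd.read_csv("strength.csv")
    model = smf.ols("STRENGTH ~ LBM", data=data).fit()
    print(model.summary())   # R-square, the ANOVA pieces, and the coefficient table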

Model Summary(b)
R         R Square   Adjusted R Square   Std. Error of the Estimate
.872(a)   .760       .756                19.0481
a Predictors: (Constant), LBM
b Dependent Variable: STRENGTH

ANOVA
Source       Sum of Squares   df   Mean Square   F         Sig.
Regression   68788.829         1   68788.829     189.590   .000
Residual     21769.768        60   362.829
Total        90558.597        61




Coefficients
Variable     Unstandardized Coefficients   Standardized Coefficients   t        Sig.   95% Confidence Interval for B
             B         Std. Error          Beta                                        Lower Bound   Upper Bound
(Constant)   -13.971   10.314                                          -1.355   .181   -34.602       6.660
LBM          3.016     0.219               .872                        13.769   .000   2.577         3.454

The Table of Coefficients

The column labeled Variable should be self-explanatory. It contains the names of the items in the equation and labels each row of output.

The Unstandardized coefficients (B) are the regression coefficients. The regression equation is

STRENGTH = -13.971 + 3.016 LBM
The predicted muscle strength of someone with 40 kg of lean body mass is
-13.971 + 3.016 (40) = 106.669
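As a quick check, the same arithmetic can be done in a couple of lines of Python (the coefficients are simply copied from the table above):

    # Prediction from the fitted equation: STRENGTH = -13.971 + 3.016 * LBM
    b0, b1 = -13.971, 3.016
    lbm = 40                          # lean body mass in kg
    print(b0 + b1 * lbm)              # 106.669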

For cross-sectional data like these, the regression coefficient for the predictor is the difference in response per unit difference in the predictor. For longitudinal data, the regression coefficient is the change in response per unit change in the predictor. Here, strength differs 3.016 units for every unit difference in lean body mass. The distinction between cross-sectional and longitudinal data is still important. These strength data are cross-sectional, so differences in LBM and strength refer to differences between people. If we wanted to describe how an individual's muscle strength changes with lean body mass, we would have to measure strength and lean body mass as they change within people.

The Standard Errors are the standard errors of the regression coefficients. They can be used for hypothesis testing and constructing confidence intervals. For example, the standard error of the LBM coefficient is 0.219. A 95% confidence interval for the regression coefficient is constructed as 3.016 ± k (0.219), where k is the appropriate percentile of the t distribution with degrees of freedom equal to the Error DF from the ANOVA table. Here, the degrees of freedom is 60 and the multiplier is 2.00. Thus, the confidence interval is given by 3.016 ± 2.00 (0.219). If the sample size were huge, the error degrees of freedom would be larger and the multiplier would become the familiar 1.96.
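A rough sketch of the same interval in Python, using scipy to supply the t multiplier (the numbers are copied from the output above):

    from scipy import stats

    b, se, df = 3.016, 0.219, 60      # LBM coefficient, its standard error, Error DF
    k = stats.t.ppf(0.975, df)        # about 2.00 for 60 df; tends to 1.96 as df grows
    print(b - k * se, b + k * se)     # about (2.58, 3.45), matching the table up to rounding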

The Standardized coefficients (Beta) are what the regression coefficients would be if the model were fitted to standardized data, that is, if from each observation we subtracted the sample mean and then divided by the sample SD. People once thought this to be a good idea. It isn't, yet some packages continue to report them. Other packages like SAS do not. We will discuss them later when we discuss multiple regression.

The t statistic tests the hypothesis that a population regression coefficient is 0, that is, H0: β = 0. It is the ratio of the sample regression coefficient B to its standard error. The statistic has the form (estimate - hypothesized value) / SE. Since the hypothesized value is 0, the statistic reduces to Estimate/SE. If, for some reason, we wished to test the hypothesis that the coefficient for LBM was 1.7, we could calculate the statistic (3.016 - 1.700)/0.219.
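The same ratios can be computed directly; the sketch below also attaches a two-sided P value using the Error DF of 60, which is what the Sig. column reports:

    from scipy import stats

    b, se, df = 3.016, 0.219, 60
    t0 = (b - 0.0) / se               # about 13.77, the value reported for LBM
    t17 = (b - 1.700) / se            # about 6.01, for the hypothesized value 1.7
    p0 = 2 * stats.t.sf(abs(t0), df)  # two-sided P value; essentially 0, printed as .000
    print(t0, t17, p0)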

Sig. labels the two-sided P values or observed significance levels for the t statistics. The degrees of freedom used to calculate the P values is given by the Error DF from the ANOVA table. The P value for the independent variable tells us whether the independent variable has statistically significant predictive capability.

In theory, the P value for the constant could be used to determine whether the constant could be removed from the model. In practice, we do not usually do that. There are two reasons for this.

  1. When there is no constant, the model is
    Y = b1 X ,
    which forces Y to be 0 when X is 0. Even if this condition is appropriate (for example, no lean body mass means no strength), it is often wrong to place this constraint on the regression line. Most studies are performed with the independent variable far removed from 0. While a straight line may be appropriate for the range of data values studied, the relationship may not be a straight line all the way down to values of 0 for the predictor. (A small fitting sketch follows this list.)
  2. Standard practice (hierarchical modeling) is to include all simpler terms when a more complicated term is added to a model. Nothing is simpler than a constant. So if a change of Y with X is to be placed in a model, the constant should be included, too. It could be argued this is a variant of (1).
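For comparison, here is a hedged sketch of how a no-constant (through-the-origin) fit could be requested in Python; as above, the file and column names are hypothetical:

    import pandas as pd
    import statsmodels.formula.api as smf

    data = pd.read_csv("strength.csv")                                # hypothetical data set
    with_constant = smf.ols("STRENGTH ~ LBM", data=data).fit()
    through_origin = smf.ols("STRENGTH ~ LBM - 1", data=data).fit()   # "- 1" drops the constant
    print(with_constant.params)
    print(through_origin.params)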
The Analysis of Variance Table

The Analysis of Variance table is also known as the ANOVA table (for ANalysis Of VAriance). It tells the story of how the regression equation accounts for variability in the response variable.

The column labeled Source has three rows: Regression, Residual, and Total. The column labeled Sum of Squares describes the variability in the response variable, Y.

The total amount of variability in the response is the Total Sum of Squares, Σ(Yi - Ȳ)². (The row labeled Total is sometimes labeled Corrected Total, where corrected refers to subtracting the sample mean before squaring and summing.) If a prediction had to be made without any other information, the best that could be done, in a certain sense, is to predict every value to be equal to the sample mean. The error--that is, the amount of variation in the data that can't be accounted for by this simple method--is given by the Total Sum of Squares.

When the regression model is used for prediction, the error (the amount of uncertainty that remains) is the variability about the regression line, Σ(Yi - Ŷi)². This is the Residual Sum of Squares (residual for left over). It is sometimes called the Error Sum of Squares. The Regression Sum of Squares is the difference between the Total Sum of Squares and the Residual Sum of Squares. Since the total sum of squares is the total amount of variability in the response and the residual sum of squares is the amount that still cannot be accounted for after the regression model is fitted, the regression sum of squares is the amount of variability in the response that is accounted for by the regression model.

Each sum of squares has a corresponding degrees of freedom (DF) associated with it. Total df is n-1, one less than the number of observations. The Regression df is the number of independent variables in the model. For simple linear regression, the Regression df is 1. The Error df is the difference between the Total df and the Regression df. For simple linear regression, the residual df is n-2.

The Mean Squares are the Sums of Squares divided by the corresponding degrees of freedom.
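The ANOVA arithmetic can be verified directly from the numbers in the table above; this is only a check of the bookkeeping, not a re-analysis of the data:

    # Sums of squares, degrees of freedom, and mean squares from the ANOVA table.
    tot_ss, res_ss = 90558.597, 21769.768
    n = 62                            # Total df of 61 plus 1
    reg_ss = tot_ss - res_ss          # 68788.829
    reg_df, res_df = 1, n - 2         # 1 and 60
    print(reg_ss / reg_df)            # Regression Mean Square: 68788.829
    print(res_ss / res_df)            # Residual Mean Square: about 362.83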

The F statistic, also known as the F ratio, will be described in detail during the discussion of multiple regression. When there is only one predictor, the F statistic will be the square of the predictor variable's t statistic.
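A one-line check using the t statistic from the coefficient table:

    print(13.769 ** 2)                # about 189.59, matching the F ratio up to rounding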

R² is the squared multiple correlation coefficient. It is also called the Coefficient of Determination. R² is the Regression Sum of Squares divided by the Total Sum of Squares, RegSS/TotSS. It is the fraction of the variability in the response that is fitted by the model. Since the Total SS is the sum of the Regression and Residual Sums of Squares, R² can be rewritten as (TotSS - ResSS)/TotSS = 1 - ResSS/TotSS. Some call R² the proportion of the variance explained by the model. I don't like the use of the word explained because it implies causality. Still, the phrase is firmly entrenched in the literature. Even Fisher used it. If a model has perfect predictability, the Residual Sum of Squares will be 0 and R² = 1. If a model has no predictive capability, R² = 0. In practice, R² is never observed to be exactly 0, the same way the difference between the means of two samples drawn from the same population is never exactly 0 or a sample correlation coefficient is never exactly 0.
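Both forms of the calculation give the same value when applied to the sums of squares above:

    tot_ss, res_ss = 90558.597, 21769.768
    reg_ss = tot_ss - res_ss
    print(reg_ss / tot_ss)            # about 0.760
    print(1 - res_ss / tot_ss)        # same value, about 0.760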

R, the multiple correlation coefficient and square root of R², is the correlation between the predicted and observed values. In simple linear regression, R will be equal to the magnitude of the correlation coefficient between X and Y. This is because the predicted values are b0 + b1 X. Neither multiplying by b1 nor adding b0 affects the magnitude of the correlation coefficient. Therefore, the correlation between X and Y will be equal to the correlation between b0 + b1 X and Y, except for their sign if b1 is negative.

Adjusted-R² will be described during the discussion of multiple regression.

The Standard Error of the Estimate (also known as the Root Mean Square Error) is the square root of the Residual Mean Square. It is the standard deviation of the data about the regression line, rather than about the sample mean. That is, it is √[ Σ(Yi - Ŷi)² / (n - 2) ].
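As a check against the Model Summary table:

    import math
    print(math.sqrt(362.829))         # about 19.048, the Standard Error of the Estimate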



Copyright © 2000 Gerard E. Dallal


Source: http://jerrydallal.com/lhsp/slrout.htm
