Lesson 4b: Means and variances

One of the most common things we do in statistics is calculate the means and variances of random samples. SAS has two procedures commonly used for this: proc means and proc univariate. Proc means is the simpler of the two; it can give you the mean, variance, range, and other descriptive statistics. Proc univariate gives you a more detailed analysis of a sample (for example, it can test whether the data are normally distributed), but you have less control over which statistics it prints than you do with proc means.

Using these two procedures is fairly simple. Let's first look at the proc means block.


Example 4.2

Looking back at the data from the class of 15 students, we can use proc means to calculate the mean and standard deviation of each numeric variable. The program code is as follows:

data students;
        /* gender is a character variable ($); age and grade are numeric */
        input gender $ age grade;
        cards;
        m 22 86
        f 21 81
        f 35 92
        f 20 55
        m 22 41
        m 22 71
        f 20 79
        f 19 66
        m 20 98
        f 21 89
        f 19 71
        m 20 31
        m 21 82
        f 20 71
        f 18 91
        ;
run;

proc means data=students;
run;

What happens when you run this short program? We ran proc means simply by naming the data set to use. By default, proc means analyzes every numeric variable (i.e., age and grade, but not gender) and reports the number of non-missing observations, the mean and standard deviation of the sample, and the minimum and maximum values.

Here's what the output looks like:

                             The SAS System                                1
      Variable   N          Mean       Std Dev       Minimum       Maximum
      --------------------------------------------------------------------
      AGE       15    21.3333333     3.9581140    18.0000000    35.0000000
      GRADE     15    73.6000000    19.0555579    31.0000000    98.0000000
      --------------------------------------------------------------------
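
As a cross-check of what the default proc means output reports, here is a minimal Python sketch (not SAS code) that recomputes N, the mean, the standard deviation, the minimum and the maximum for the 15 students. It uses the standard-library statistics module. statistics.stdev divides by n - 1, which is the sample standard deviation that proc means reports by default. The printed values should agree with the SAS table above to seven decimal places.

```python
import statistics

# Age and grade for the 15 students, in the same order as the cards data above
ages = [22, 21, 35, 20, 22, 22, 20, 19, 20, 21, 19, 20, 21, 20, 18]
grades = [86, 81, 92, 55, 41, 71, 79, 66, 98, 89, 71, 31, 82, 71, 91]


def describe(values):
    """Return the five statistics proc means prints by default."""
    return {
        "N": len(values),
        "Mean": statistics.mean(values),
        # stdev divides the sum of squared deviations by n - 1
        "Std Dev": statistics.stdev(values),
        "Minimum": min(values),
        "Maximum": max(values),
    }


for name, values in [("AGE", ages), ("GRADE", grades)]:
    s = describe(values)
    print(f"{name:<8}{s['N']:>3}{s['Mean']:>14.7f}{s['Std Dev']:>14.7f}"
          f"{s['Minimum']:>14.7f}{s['Maximum']:>14.7f}")
```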


Notes on proc means

Proc means can also output more or fewer statistics, depending on what you need. To choose them, list the statistics you want right after the data set name and before the semicolon (";" character). For example, the following requests eight commonly used statistics:

proc means data=students n nmiss sum mean std var min max;
run;

Where:

N - The number of non-missing observations
NMISS - The number of missing observations
SUM - The sum of the observations
MEAN - The mean of the sample
STD - The standard deviation of the sample
VAR - The variance of the sample
MIN - The minimum value
MAX - The maximum value
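
Each of these keywords corresponds to a simple formula. The following Python sketch (an illustration, not SAS code) computes all eight for a list in which None stands in for a SAS missing value. The missing grade at the end of the list is hypothetical and is added only so that NMISS is not zero. As in proc means by default, VAR and STD divide by n - 1.

```python
import math


def proc_means_like(values):
    """Compute the eight statistics listed above, skipping missing values (None)."""
    present = [v for v in values if v is not None]
    n = len(present)
    nmiss = len(values) - n
    total = sum(present)
    mean = total / n
    # Sample variance: sum of squared deviations divided by n - 1
    var = sum((v - mean) ** 2 for v in present) / (n - 1)
    return {
        "N": n,
        "NMISS": nmiss,
        "SUM": total,
        "MEAN": mean,
        "STD": math.sqrt(var),
        "VAR": var,
        "MIN": min(present),
        "MAX": max(present),
    }


# The 15 grades from the students data, plus one hypothetical missing grade
grades = [86, 81, 92, 55, 41, 71, 79, 66, 98, 89, 71, 31, 82, 71, 91, None]
for keyword, value in proc_means_like(grades).items():
    print(f"{keyword:<6} {value}")
```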

If you only want SAS to output the mean and the variance of the sample, all you need to say is:

proc means data=students mean var;
run;

As mentioned before, by default SAS gives output for every numeric variable. If you only want specific variables, such as grade, name them in a var statement within proc means:

proc means data=students;
	var grade;
run;

Finally, sometimes you want to break down the data set by a certain variable. Say you want SAS to give you the average of the students' grades and ages by gender, that is, the mean for the males and the mean for the females in your class. There is an easy way and a hard way to do this. If you're using SAS version 6 or higher, you can use the easy way: a class statement that names the variable to break the output down by. For example, to get output by gender, you would write:

proc means data=students;
	class gender;
run;
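
Conceptually, the class statement splits the observations into groups by the value of gender and computes the statistics within each group, without requiring the data to be sorted first. The Python sketch below illustrates that idea; it is not a description of how SAS implements it. It collects the rows into a dictionary keyed by gender and then summarizes the grades in each group.

```python
import statistics
from collections import defaultdict

# (gender, age, grade) rows in the original, unsorted order of the cards data
students = [
    ("m", 22, 86), ("f", 21, 81), ("f", 35, 92), ("f", 20, 55), ("m", 22, 41),
    ("m", 22, 71), ("f", 20, 79), ("f", 19, 66), ("m", 20, 98), ("f", 21, 89),
    ("f", 19, 71), ("m", 20, 31), ("m", 21, 82), ("f", 20, 71), ("f", 18, 91),
]

# Collect each group's grades in a dictionary; the input order does not matter
grades_by_gender = defaultdict(list)
for gender, _age, grade in students:
    grades_by_gender[gender].append(grade)

for gender in sorted(grades_by_gender):
    g = grades_by_gender[gender]
    # N, mean and sample standard deviation of grade within this gender
    print(gender, len(g), round(statistics.mean(g), 7), round(statistics.stdev(g), 7))
```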

However, earlier versions of SAS give you an error if you try this. You can still break the output down by category by using a 'by' statement instead of a 'class' statement. The catch is that the data set must first be sorted by that variable, or else SAS gives you a different error. To do this, you would type:

proc sort data=students;
        by gender;
run;

proc means data=students;
        by gender;
run;

The 'by' method also works in SAS version 6 or higher, so if you have a more recent copy of SAS, you can use whichever method you prefer.
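
The sorting requirement has a close parallel in Python's itertools.groupby. Like a by statement, it only groups consecutive rows that share the same key. The sketch below is an analogy, not SAS code. Where SAS stops with an error on unsorted data, groupby silently produces many small fragments. Sorting by gender first (the job proc sort does) gives exactly one group per gender.

```python
import statistics
from itertools import groupby
from operator import itemgetter

# (gender, age, grade) rows in the original, unsorted order of the cards data
students = [
    ("m", 22, 86), ("f", 21, 81), ("f", 35, 92), ("f", 20, 55), ("m", 22, 41),
    ("m", 22, 71), ("f", 20, 79), ("f", 19, 66), ("m", 20, 98), ("f", 21, 89),
    ("f", 19, 71), ("m", 20, 31), ("m", 21, 82), ("f", 20, 71), ("f", 18, 91),
]
by_gender = itemgetter(0)

# Unsorted: groupby starts a new group every time gender changes
unsorted_groups = [key for key, _rows in groupby(students, key=by_gender)]

# Sorted first (the role of proc sort): one group per gender
summaries = {}
for gender, rows in groupby(sorted(students, key=by_gender), key=by_gender):
    grades = [grade for _g, _age, grade in rows]
    summaries[gender] = (len(grades), statistics.mean(grades))

print(unsorted_groups)
print(summaries)
```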

Sample output from using 'sort' then 'by':

                             The SAS System                                2
------------------------------------ GENDER=f ----------------------------------

      Variable   N          Mean       Std Dev       Minimum       Maximum
      --------------------------------------------------------------------
      AGE        9    21.4444444     5.1747249    18.0000000    35.0000000
      GRADE      9    77.2222222    12.5576449    55.0000000    92.0000000
      --------------------------------------------------------------------
------------------------------------ GENDER=m ----------------------------------

      Variable   N          Mean       Std Dev       Minimum       Maximum
      --------------------------------------------------------------------
      AGE        6    21.1666667     0.9831921    20.0000000    22.0000000
      GRADE      6    68.1666667    26.5587399    31.0000000    98.0000000
      --------------------------------------------------------------------

Sample output from using the 'class' statement:

                                 The SAS System                                1
    GENDER    N Obs  Variable   N          Mean       Std Dev       Minimum
    -----------------------------------------------------------------------
    f             9  AGE        9    21.4444444     5.1747249    18.0000000
                     GRADE      9    77.2222222    12.5576449    55.0000000
    m             6  AGE        6    21.1666667     0.9831921    20.0000000
                     GRADE      6    68.1666667    26.5587399    31.0000000
    -----------------------------------------------------------------------
                    GENDER    N Obs  Variable       Maximum
                    ---------------------------------------
                    f             9  AGE         35.0000000
                                     GRADE       92.0000000
                    m             6  AGE         22.0000000
                                     GRADE       98.0000000
                    ---------------------------------------


Lesson 4c continues with calculating means and variances by using proc univariate.
