The main purpose of SAS is to perform statistical tests on data. In lesson 4, we will talk about some of the simple tests you can perform. These include counting frequencies; finding the means, standard deviations, and ranges of your data; finding correlations between variables; and performing some simple t-tests.
The first procedure we'll look at is proc freq, which produces frequency tables: counts of how often each value of a variable occurs.
Example 4.1
You teach a small class of 15 students. You have data on their gender, their age, and their final grade as a percentage. You would like to make frequency tables of the class, broken down by age, by gender, and by grade. You would also like a cross table of gender by age.
The code is as follows. It can also be found under lesson4-1.sas.
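/* Example 4-1 */
options linesize=80 pagesize=54 pageno=1;

/* Read gender (character), age, and final grade for 15 students */
data students;
  input gender $ age grade;
  cards;
m 22 86
f 21 81
f 35 92
f 20 55
m 22 41
m 22 71
f 20 79
f 19 66
m 20 98
f 21 89
f 19 71
m 20 31
m 21 82
f 20 71
f 18 91
;
run;

/* One-way frequency tables for age, gender, and grade */
proc freq data=students;
  tables age gender grade;
run;

/* Two-way (cross) table of gender by age */
proc freq data=students;
  tables gender*age;
run;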
In this case, we describe each student by gender, age, and final score in the class. Then we use proc freq twice. After declaring proc freq and the data set, we use the tables statement to say which tables to make. The first time, we ask for three one-way tables, one each for age, gender, and grade. The second time, we want a single table of gender by age. To do this, we put the '*' character between the two variables; the first variable (gender) forms the rows of the table and the second (age) forms the columns.
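As a sketch (not one of the original lesson files), the two proc freq steps can be folded into one: a single tables statement may list one-way and two-way requests together. Assuming the students data set from Example 4.1 already exists, this should print the same four tables from a single procedure run.

```sas
/* One proc freq step requesting all four tables from Example 4.1 */
proc freq data=students;
  tables age gender grade   /* three one-way tables */
         gender*age;        /* two-way table: gender rows, age columns */
run;
```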
To understand what proc freq does, let's look at the sample output.
                           The SAS System                           1

                                Cumulative   Cumulative
AGE      Frequency    Percent    Frequency      Percent
-------------------------------------------------------
18               1        6.7            1          6.7
19               2       13.3            3         20.0
20               5       33.3            8         53.3
21               3       20.0           11         73.3
22               3       20.0           14         93.3
35               1        6.7           15        100.0

                                Cumulative   Cumulative
GENDER   Frequency    Percent    Frequency      Percent
-------------------------------------------------------
f                9       60.0            9         60.0
m                6       40.0           15        100.0

                                Cumulative   Cumulative
GRADE    Frequency    Percent    Frequency      Percent
-------------------------------------------------------
31               1        6.7            1          6.7
41               1        6.7            2         13.3
55               1        6.7            3         20.0
66               1        6.7            4         26.7
71               3       20.0            7         46.7
79               1        6.7            8         53.3
81               1        6.7            9         60.0
82               1        6.7           10         66.7
86               1        6.7           11         73.3
89               1        6.7           12         80.0
91               1        6.7           13         86.7
92               1        6.7           14         93.3
98               1        6.7           15        100.0

                           The SAS System                           2

                       TABLE OF GENDER BY AGE

GENDER     AGE

Frequency|
Percent  |
Row Pct  |
Col Pct  |      18|      19|      20|      21|      22|      35|  Total
---------+--------+--------+--------+--------+--------+--------+
f        |      1 |      2 |      3 |      2 |      0 |      1 |      9
         |   6.67 |  13.33 |  20.00 |  13.33 |   0.00 |   6.67 |  60.00
         |  11.11 |  22.22 |  33.33 |  22.22 |   0.00 |  11.11 |
         | 100.00 | 100.00 |  60.00 |  66.67 |   0.00 | 100.00 |
---------+--------+--------+--------+--------+--------+--------+
m        |      0 |      0 |      2 |      1 |      3 |      0 |      6
         |   0.00 |   0.00 |  13.33 |   6.67 |  20.00 |   0.00 |  40.00
         |   0.00 |   0.00 |  33.33 |  16.67 |  50.00 |   0.00 |
         |   0.00 |   0.00 |  40.00 |  33.33 | 100.00 |   0.00 |
---------+--------+--------+--------+--------+--------+--------+
Total           1        2        5        3        3        1       15
             6.67    13.33    33.33    20.00    20.00     6.67   100.00
SAS displays a one-variable table differently from a two-variable table. For a single variable, SAS lists each value with its count (frequency), that count as a percentage of all observations, the cumulative frequency (the running total of counts up to and including that value), and the cumulative percentage.
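As a worked check of where these columns come from, here is the arithmetic for the age-20 row of the AGE table in the sample output, with n = 15 students (ages 18 and 19 account for 1 and 2 students):

$$
\begin{aligned}
\text{Percent} &= \frac{5}{15} \times 100 = 33.3 \\
\text{Cumulative Frequency} &= 1 + 2 + 5 = 8 \\
\text{Cumulative Percent} &= \frac{8}{15} \times 100 = 53.3
\end{aligned}
$$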
The two-variable case is more complicated. For each cell, SAS outputs the count for that combination of values, plus three percentages: the percentage of all observations, the row percentage, and the column percentage. For example, looking at row 1, column 3, which describes females aged 20, SAS gives us these statistics:
3      <- Frequency: number of students who are female and aged 20
20.00  <- Percent: percentage of all students who are both female and aged 20
33.33  <- Row Pct: percentage of females who are aged 20
60.00  <- Col Pct: percentage of 20-year-olds who are female
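Each of these percentages is the same cell count (3) divided by a different total: all 15 students, the 9 females in the row, or the 5 twenty-year-olds in the column.

$$
\begin{aligned}
\text{Percent} &= \frac{3}{15} \times 100 = 20.00 \\
\text{Row Pct} &= \frac{3}{9} \times 100 = 33.33 \\
\text{Col Pct} &= \frac{3}{5} \times 100 = 60.00
\end{aligned}
$$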
SAS also gives the row and column totals at the edges of the table. These are the same values we would get from a single-variable call to proc freq. For example, the bottom row of the table is:
Total           1        2        5        3        3        1       15
             6.67    13.33    33.33    20.00    20.00     6.67   100.00
Remember, each column is a different age. If you compare these values to our one-way frequency table of ages, they are exactly the same (the one-way table simply rounds the percentages to one decimal place).
This is the default output of proc freq. However, proc freq can also compute many optional statistics. For example, to include expected values and deviations from the expected values, change the code slightly to read:
proc freq data=students;
  tables gender*age / expected deviation;
run;
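In a two-way table, the expected value of a cell is the count you would expect if gender and age were independent: the row total times the column total, divided by the number of observations n. The deviation is the observed count minus that expected value. As a hand-worked check for two cells of the gender-by-age table (our arithmetic from the totals above, not copied from SAS output):

$$
\begin{aligned}
E_{ij} &= \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n} \\
E_{f,20} &= \frac{9 \times 5}{15} = 3.0, & \text{Deviation} &= 3 - 3.0 = 0.0 \\
E_{f,22} &= \frac{9 \times 3}{15} = 1.8, & \text{Deviation} &= 0 - 1.8 = -1.8
\end{aligned}
$$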
The options are included after the slash ("/" character). For further options, please see the procedure index.
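Options can also remove parts of the default output. The sketch below assumes the students data set from Example 4.1: norow, nocol, and nopercent drop the row, column, and overall percentages so each cell of the two-way table shows only its count, and nocum drops the cumulative columns from a one-way table. Check the procedure index for the full list supported by your version of SAS.

```sas
/* Two-way table showing only the cell counts */
proc freq data=students;
  tables gender*age / norow nocol nopercent;
run;

/* One-way table of age without the cumulative columns */
proc freq data=students;
  tables age / nocum;
run;
```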
Lesson 4b continues with calculating means and variances.