Using the General Social Survey

By Razib Khan | July 8, 2010 2:01 pm

I’ve mentioned this before, but I thought it would be useful to repeat. Many of my social science posts use Berkeley’s web interface to the General Social Survey. People regularly ask in the comments for details about the variables, or for a more explicit elaboration of the methods. First, this is a weblog, not a venue for me to publish scholarly papers. Most of the GSS-related posts are meant to be “quick & dirty,” and to stimulate further exploration by readers. Unfortunately, follow-ups rarely happen. One can speculate why, but that’s how it is. Nevertheless, I thought I would quickly go over how to use the GSS in a basic fashion.

First, here’s the URL:

This is the database from 1972 to 2008. You’ll meet a screen like this:


The page is cluttered, but basically the right side is where you enter the row and column variables you want to cross or compare. The left side allows you to explore the variables. “Search” and “selected” are pretty straightforward, and you can browse the list of variables in the menu at the bottom left. The easiest thing to do is just look at frequencies of X, Y, and Z against particular categories A, B, and C (e.g., educational attainment vs. sex). But you can do more: if you select “analysis” at the top left, you have more options:


I’ve been looking at mean values a lot. Sometimes the mean is obvious because the variables are quantitative. But if you’re talking about a dichotomous response, it is “recoded” numerically (e.g., 0 vs. 1), so you have to keep in mind that the mean is just a representation of the underlying data. There are correlations and regressions too. You can do a lot with the GSS, but the more complicated or detailed your analysis gets, the less appropriate it is for a “quick & dirty” post. I’ve been shying away from presenting regressions because to do it right you have to be careful, and if you just throw out a bunch of betas, people aren’t going to replicate your analysis and might put more stock in the model than they should (and it’s not hard to massage the betas you get just by manipulating the set of variables).
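To make the point about dichotomous responses concrete, here is a minimal Python sketch. The answers are made up for illustration and are not GSS data; the 0/1 coding is the one mentioned above. With that coding, the mean is simply the share of respondents coded 1.

```python
from statistics import mean

# Hypothetical yes/no answers, invented for illustration (not GSS data).
responses = ["yes", "no", "yes", "yes", "no"]

# Recode the dichotomous response numerically: yes -> 1, no -> 0.
coded = [1 if answer == "yes" else 0 for answer in responses]

# The mean of a 0/1 variable is the proportion of 1s.
mean_coded = mean(coded)
share_yes = responses.count("yes") / len(responses)

print(mean_coded, share_yes)
```

If the two categories were coded differently (say 1 vs. 2), the mean would shift accordingly, which is why it helps to check the coding before reading anything into a mean.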

Here’s a quick example of a query:


WORDSUM will output the % in the sample who score 0, 1, 2, etc. out of 10 on the WORDSUM vocabulary test. I wanted to check it against highest education attained, DEGREE. I decided to combine those without high school diplomas, those with high school diplomas, and those with some college into one category, and label it “No College.” Next I combined those with bachelor’s and graduate degrees into one category. Then I controlled for males and females, so it will output the row and column variables twice, once for each control. Finally I constrained the data set to non-Hispanic whites who were surveyed after 1999 through the present (2008 in this survey).
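For readers who think better in code, the logic of that query can be sketched in plain Python. This is only an illustration of the recode, filter, and control steps, not how the Berkeley interface works internally. The records are invented, the "hispanic" field and the "College" label are my own stand-ins, and the numeric codes assume the usual GSS coding (sex 1 = male, 2 = female; race 1 = white; degree 0-4).

```python
from collections import Counter, defaultdict

# Made-up records for illustration only; these are NOT GSS data.
# "hispanic" is a hypothetical flag standing in for the ethnicity filter.
records = [
    {"year": 2004, "sex": 1, "race": 1, "hispanic": False, "degree": 1, "wordsum": 6},
    {"year": 2002, "sex": 1, "race": 1, "hispanic": False, "degree": 0, "wordsum": 6},
    {"year": 2006, "sex": 1, "race": 1, "hispanic": False, "degree": 2, "wordsum": 4},
    {"year": 2008, "sex": 1, "race": 1, "hispanic": False, "degree": 2, "wordsum": 5},
    {"year": 2000, "sex": 1, "race": 1, "hispanic": False, "degree": 3, "wordsum": 8},
    {"year": 2008, "sex": 1, "race": 1, "hispanic": False, "degree": 4, "wordsum": 10},
    {"year": 1998, "sex": 1, "race": 1, "hispanic": False, "degree": 3, "wordsum": 7},
    {"year": 2004, "sex": 1, "race": 2, "hispanic": False, "degree": 1, "wordsum": 5},
    {"year": 2006, "sex": 1, "race": 1, "hispanic": True, "degree": 1, "wordsum": 6},
    {"year": 2004, "sex": 2, "race": 1, "hispanic": False, "degree": 3, "wordsum": 9},
    {"year": 2002, "sex": 2, "race": 1, "hispanic": False, "degree": 1, "wordsum": 5},
]


def recode_degree(degree):
    """Collapse DEGREE: 0-2 -> 'No College', 3-4 -> 'College' (label assumed)."""
    return "No College" if degree <= 2 else "College"


def column_percents(rows, sex):
    """Percent at each WORDSUM score within each education group, for one sex."""
    counts = defaultdict(Counter)
    for row in rows:
        # Selection filter: non-Hispanic whites surveyed after 1999.
        keep = row["race"] == 1 and not row["hispanic"] and row["year"] >= 2000
        # Control variable: tabulate one sex at a time.
        if keep and row["sex"] == sex:
            counts[recode_degree(row["degree"])][row["wordsum"]] += 1
    table = {}
    for group, scores in counts.items():
        total = sum(scores.values())
        table[group] = {s: 100 * n / total for s, n in sorted(scores.items())}
    return table


males = column_percents(records, sex=1)
females = column_percents(records, sex=2)
print(males)
print(females)
```

Each education group's percentages sum to 100, which matches reading the output table column by column.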

Here’s the outcome for males:


CATEGORIZED UNDER: Data Analysis, GSS, Uncategorized

Comments (4)

  1. Katharine

    Is there a general list of the selection filters I can find? This is sort of fun to manipulate for s#!+s and giggles.

  2. browse through the demographic variables in the hierarchical menu. here are ones i use a lot

    sex, 1 = male, 2 = female (e.g., sex(1))
    age (just put in the number, e.g., age(65-*) or age(18-35))
    race, 1 = white, 2= black, 3 = other
    polviews, 1 = very lib, 7 = very conserv, 4 = moderate (there’s slightly and just generic lib or cons in between)
    partyid, like polviews, but 0-6 strong dem to strong repub
    god, goes from atheist to “know god exists” (enter in numbers, 1 to 7 i think)
    degree, 0-4, no HS to grad school
    wordsum, 0-10, the 0-4 i usually put into one “stupid” class because the N’s get small here
    bible, literalist, non-literalist, bible book of fables

    just use the “view” feature for some of these

  3. Evan Harper

    Many, many thanks for this. As somebody who enjoys debating politics & social issues to an almost unhealthy degree, I can see this will be a gold mine for me.


