Eszter Hargittai's Stata Goodies Page

Stata is my statistical software package of choice. I've put together this page with some helpful resources for those who are just starting out with it and for those who've been using it for a while but have not bothered to explore it much. Of course, there are tons of such resources out there already. This page is mainly for me to reference helpful sites I've already found once and to point my colleagues to them easily.

GETTING STARTED

Although I mainly use Stata for Windows, all of the following are also relevant to Stata under UNIX (which I sometimes still use).

HOW-TO PAGES

Before you start - info on what you should know about Stata before you even start, e.g. details about variable names (note that this is a bit outdated: it predates version 7, which allows variable names of up to 32 characters), variable width, variable type, etc.

Some more basics are available on Princeton's Stata Tutorial.

UCLA has some great resources for learning how to use Stata in more depth (they also have a helpful search utility for their site).

USING DO-FILES

Do-files allow you to
1. keep track of everything you've done to/with your data so your actions can be replicable;
2. run tons of commands quickly.
You can also think of it as a safety mechanism for allowing you to easily go back to your original data set no matter what transformations you may have performed on it (or what variables you may have mistakenly deleted - yikes!).

Here's an example do-file template: stataexample.do

Alternatively, here is the same info with explanations of what each line means: stataexample
(Although I had to give it a .txt extension for it to display without problems online, if you save it, make sure to save it with a .do extension.)
Note that I have a little information section on the top of that do-file. That is so you can keep track of what project this do-file is for, where it is located and what it does.
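
For readers who cannot open the linked file, here is a minimal sketch of the kind of do-file described above. It is not the linked stataexample.do: the project name, path, file names and variable names are all placeholders, so replace them with your own.

```stata
* ------------------------------------------------------------
* Project:  myproject                  (placeholder name)
* File:     c:\myproject\analysis.do   (placeholder path)
* Purpose:  load the raw data, create variables, run a model
* ------------------------------------------------------------

* Do not pause the output after every screenful
set more off

* Close any log left open by an earlier run, then start a new one
capture log close
log using analysis.log, replace

* Always start from the original data set (placeholder file name)
use rawdata, clear

* Transformations go here; the raw file on disk is never changed
generate lnincome = ln(income)

* Analysis (placeholder variable names)
regress lnincome age education

* Save the modified data under a NEW name, then close the log
save workdata, replace
log close
```

Because the do-file always reloads rawdata and only ever saves to workdata, rerunning it from the top gets you back to a clean state no matter what you did in between.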

TEXT EDITOR

You can open your do-files in any text editor. If you are using Stata in Windows, you can simply go to the Window menu and select Do-file Editor (or press CTRL-8). However, this editor is quite limited in capacity, as are the editors that come with Windows: Notepad and WordPad.

Instead of these options, I recommend using UltraEdit (shareware $35.00) because it comes with some nice additional features that will make editing do-files much more convenient. You can download a Stata7 configuration for it, which will automatically highlight certain words for you to distinguish commands from comment sections and the general body of your file. To use the Stata word list, go to Advanced->Configuration in UltraEdit and under Syntax Highlighting select the Stata7.txt file.

You may want to add some additional words to the Stata7.txt wordlist for highlighting. You can do this by editing the Stata7.txt file.

Here are some of the things I have found most helpful in UltraEdit:

Use Find function across files (search all .do files in a directory for a word or phrase)

Use Replace function across files (replace a word or phrase across several files at once)

Open numerous documents at the same time without cluttering the Windows taskbar: UltraEdit has its own panel for open files. If you're used to it being on the bottom, you can move it there.

SOME HELPFUL HINTS

If you want to see how long each command took to run, along with the current time of day, turn on return messages as follows:
set rmsg on

The default memory setting may be too low for the size of your data set. You can change it with the following:
set mem XXm
where XX is the amount of memory you want in megabytes (e.g. set mem 20m).

Initially you can use no more than 40 variables in a model. You can change this with matsize (not in Small Stata though):
set matsize # (the maximum is 800 in Intercooled Stata)
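
The three settings above could sit together near the top of a do-file. This is only a sketch for the Stata versions discussed on this page; the values 20m and 200 are arbitrary examples, and rawdata is a placeholder file name. One detail worth knowing: Stata will not change the memory allocation while a data set is loaded, which is why clear comes first.

```stata
* Start with no data in memory; Stata will not change the
* memory allocation while a data set is loaded
clear

* Show a return message with the elapsed time after every command
set rmsg on

* Allocate 20 megabytes to Stata (example value)
set mem 20m

* Raise the limit on model size to 200 (example value; not
* available in Small Stata, maximum 800 in Intercooled Stata)
set matsize 200

* Now load the data (placeholder file name)
use rawdata, clear
```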

You can easily create dummy variables:
tabulate varname, generate(dummyname)
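
As a small illustration of the command above, the following sketch types in a made-up data set and creates one dummy per category. The variable names income and region and the stub reg are invented for this example.

```stata
* Small made-up data set: region is coded 1, 2 or 3
clear
input income region
20 1
35 2
28 3
41 2
19 1
33 3
end

* Creates reg1, reg2 and reg3, one 0/1 indicator per value of region
tabulate region, generate(reg)

* Check the new variables against the original
list region reg1 reg2 reg3

* Use all but one dummy in a model; reg1 is the omitted category
regress income reg2 reg3
```

The stub you give in generate() is followed by 1, 2, 3 and so on, in the sorted order of the values of the original variable.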

!!! Make tables that you can import straight into other documents, with a nice layout and all pertinent information:
mktab - this saved my life! Thank you Conrad!
Here's an example of a command to make this run (more info is available in the ado help file that comes with mktab).
mktab (depvar iv1 iv2 iv3) (depvar iv1 iv2 iv3 iv4), cmd(reg) aux(_cons=Intercept) est(N, r2=R2, r2_a=Adjusted R2) flag(.1=***,1=**,5=*,10=#) notags efmt(%4.3f) xlabel ylabel log (logfile.log, replace) continue screen
which will give you (and save in logfile.log) a table in which the first column has the names of your independent variables, the second column has the coefficients and standard errors of the first model with the three independent variables (iv1, iv2, iv3), with significance marked by asterisks, and the third column has the coefficients and standard errors of the second model with four variables (iv1, iv2, iv3, iv4), again with asterisks marking the specified significance levels. You also get the intercept plus info on N, the R2 value, and the adjusted R2, with the significance levels noted at the bottom of the table.

Random commands I've found helpful to know about

typing , obs after pwcorr lets you know how many observations were used for each pairwise correlation (corr is listwise)

typing , sig star(#) after pwcorr prints the significance level of each coefficient and marks with a star the coefficients that are significant at the # level (replace the # with the level you want, so star(10) or star(.1) for ten percent); again, pwcorr is pairwise while corr is listwise

format allows you to specify the display format (e.g. show no more than two decimal places): format var %9.2f

list allows you to list some variables for all cases (or for whatever cases you specify); if you sort on one of them before running the list command then they will be listed according to that variable
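
The pwcorr, format and list hints above can be tried out on a small made-up data set. In this sketch the variable names and values are invented, and two values are deliberately missing so that the pairwise and listwise results differ.

```stata
* Made-up data with two missing values (.)
clear
input income age educ
20.456 25 12
35.12  40 16
28.9   .  14
41.333 52 .
19.5   23 10
33.25  38 16
30.1   45 12
end

* Pairwise: each correlation uses every case available for that
* pair; obs prints the number of cases used for each pair
pwcorr income age educ, obs

* sig prints significance levels; star(.05) marks coefficients
* significant at the five percent level
pwcorr income age educ, sig star(.05)

* Listwise: only cases complete on all three variables are used
corr income age educ

* Display income with two decimal places
format income %9.2f

* Sort by age, then list selected variables for the first 3 cases
sort age
list income age in 1/3
```

Note that format only changes how the values are displayed; the stored values keep their full precision.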

To get several graphs on one image, create the graphs, saving each one with the saving() option, and then:
graph using g1 g2 g3
where g1, g2, g3 are your saved graph files.
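
Here is a sketch of the whole sequence using the older graph syntax that this page is written for, with made-up data and invented variable names. Stata 8 changed the graphics commands, and there the usual way to put saved graphs on one image is graph combine, so treat this as an illustration for Stata 7 rather than something that runs unchanged everywhere.

```stata
* Stata 7 graph syntax; made-up data
clear
input y x1 x2
3 1 9
5 2 7
4 3 8
8 4 4
9 5 2
end

* Draw each graph and save it to disk as g1.gph and g2.gph
graph y x1, saving(g1, replace)
graph y x2, saving(g2, replace)

* Redisplay both saved graphs together in one image
graph using g1 g2
```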

Dealing with files (merging, conversion)

How to merge multiple files

How to use a Stata graph in Microsoft documents (such as Word, PowerPoint)

HELPFUL ado FILES

To install these ado files on a network-connected machine, use the following from within Stata:
net from http://url-of-data-source
net install name-of-ado-file
If you're looking for an ado file but don't know its location, just use
net search nameoffile
or you can try:
findit nameoffile
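
As a sketch of how a search might go, the lines below look for mktab by keyword and then check the result. The last two commands assume you have already followed the links from the search results and installed the package; which reports an error if the command is not found.

```stata
* Search for a user-written command by keyword
findit mktab

* After installing, confirm where the ado file ended up
which mktab

* List the user-written packages currently installed
ado dir
```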

Data manipulation

cutv6 - to quickly recode continuous variables into groups

Data analysis

dlogit2 - to compute marginal effects for logistic regression, probit regression, and multinomial logistic regression (see detailed documentation here )

bys - "by" and "sort" in one command

Creating graphs

fbar for some additional bar graph options (nicer than hist)

Creating tables (as per the above)

mktab

MORE...

Not enough? Check out these additional resources:

My friend Diane's Stata Tricks page

STATA Mailing List archives

Still don't know what to do? Do a search on Google and make your search phrase as specific as possible (e.g. stata merge data two files if you're looking for info on how to merge two data files in Stata).

Last updated: June, 2003
http://www.princeton.edu/~eszter/stata.html