Stata Programming II, September 8, 2003

Eric Reinhardt (Department of Political Science, Emory University, Atlanta, GA 30322)

Getting Data into Stata

• Datasets that are not in Stata format can be classified along two dimensions: (1) whether the file is
in a proprietary binary format belonging to a spreadsheet, database, or statistical program such as
Microsoft Excel, dBase, or SPSS, or is instead raw text (ASCII); and (2) whether each observation
occupies a single record or multiple records. Raw text can be fixed-format (each variable starts and
ends in specified columns, so the file looks like a giant rectangle, often with no spaces between
digits) or delimited (variables separated from each other by spaces, commas, tabs, or other
characters). Because most proprietary programs can save their data as raw text, Stata itself imports
only raw text files (in addition to its own format, of course).
• Single-record comma- or tab-delimited files are the easiest to import. Set Stata's memory sufficiently
high, then use the insheet command: insheet using filename.ext, clear [comma] [names], where ext
is the file's extension (e.g., .csv for many comma-delimited files). The comma option tells Stata that
the file is comma-delimited (as opposed to tab-delimited), and the names option tells it that the first
row of the file holds the variable names.
• For fixed-format or multiple-record files, you must use the infile command combined with a
dictionary file; simple space-delimited files can also be read with infile varlist using filename,
without a dictionary. Refer to Stata's help or my class example for more information.
• Once you have a dataset in Stata's memory, you may need to check the variables' storage types (to
make sure numeric variables are stored as numbers rather than as strings), modify the variable names
and descriptions, etc.
• To export a Stata dataset in memory to a comma-delimited file that Excel and similar programs can
read, use the outsheet command: outsheet using filename.csv, comma names replace. To export
only selected variables, list them after outsheet; to export only selected observations, add an if
condition before the comma.
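The steps above can be tried with the following do-file sketch. It assumes a current Stata release rather than Stata 8 (so no set memory step), in which insheet and outsheet still run even though import delimited and export delimited are now the documented equivalents. The variables id and score and the temporary file names are hypothetical. The sketch writes a small dataset out as CSV, reads it back with insheet, and then reads a fixed-format file whose infile dictionary is stored at the top of the data file itself.

```stata
* Build a tiny dataset (hypothetical variables id and score)
clear
input id score
1 123
2 45
end

* Export to a comma-delimited file; variable names go on the first row
tempfile base
local csv "`base'.csv"
outsheet using "`csv'", comma replace

* Re-import: comma = comma-delimited, names = first row holds names
insheet using "`csv'", comma names clear
assert _N == 2

* Write a fixed-format file: id in columns 1-2, score in columns 3-5.
* The dictionary sits at the top of the file and the data follow the "}".
local dct "`base'.dct"
file open fh using "`dct'", write replace
file write fh "dictionary {" _n
file write fh "    _column(1) id    %2f" _n
file write fh "    _column(3) score %3f" _n
file write fh "}" _n
file write fh "01123" _n
file write fh "02045" _n
file close fh

* Read the fixed-format file through its dictionary
infile using "`dct'", clear
list
```

Keeping the dictionary in the same file as the data is convenient for small examples; for real projects a separate .dct file that names the raw file with dictionary using filename is easier to maintain.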

Merging Stata Datasets

• To merge rows and variables from one dataset into another based on matching values of a variable or
variables, use the merge command.
• First, verify each dataset has an identically named match variable(s) with identical values where
matches should occur.
• Second, sort each dataset by the match variable and save each file.
• Third, open up the file into which you want to import the new rows/columns.
• Fourth, type merge matchvar using dataset2, where dataset2 is the dataset you want to import
from and matchvar is the match variable(s). This creates a new variable, _merge, that records where
each row came from (see help merge): 1 if the row was only in the dataset you started with, 2 if it
came solely from the dataset you imported from, and 3 if it was in both.
• By default, the merge command does not change the values of existing rows or columns, even if the
corresponding values in the importing dataset differ. Use the update option to replace cells that are
missing in the original dataset with filled-in values from the importing dataset. Add the replace option,
which works only together with update (i.e., , update replace), to also overwrite filled-in values in
the original dataset with filled-in values from the importing dataset. Stata will not replace filled-in
values in the original dataset with missing values from the importing dataset.
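A minimal sketch of the update behavior described above, assuming Stata 11 or later, where the match is declared as merge 1:1 id; under Stata 8 the equivalent is merge id using filename, update, with both files sorted on id first. The variables id, pop, and gdp and their values are made up for illustration.

```stata
* "Importing" dataset: one row per id with a pop value (id 2 is missing)
clear
input id pop
1 10
2 .
3 30
end
sort id
tempfile using2
save "`using2'"

* Original dataset: the one we merge into (id 1 has pop missing)
clear
input id pop gdp
1 .  5
2 20 6
4 40 8
end
sort id

* update fills missing pop values in memory from the using file;
* adding replace (update replace) would also overwrite nonmissing values
merge 1:1 id using "`using2'", update
sort id
list
tab _merge
```

Here id 1 gets pop = 10 from the using file, id 2 keeps pop = 20 because a missing value never overwrites a filled-in one, id 3 arrives only from the using file (_merge = 2), and id 4 exists only in the original (_merge = 1).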

Aggregating and Disaggregating Observations

• Aggregate rows sharing a common value of a given variable using the collapse command. First, sort by
the variable(s) that define the groups. Second, type collapse (function) newname1=varname1
(function) newname2=varname2, by(aggvar), where aggvar is the variable that is the basis of the
aggregation, function is the code for the aggregating function you want performed (e.g., mean, min,
max, sum, count), varname1 is the first variable to which you want to apply this function, and
newname1 is the name of the variable Stata creates when it applies the function to varname1.
• The collapse command creates a new, smaller dataset in memory and drops the existing one, so save
your data first if necessary.
• You can multiply rows by typing expand x, which replaces each existing row with x copies of it.
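A small illustration of collapse and expand on made-up country-year data; the variable names country, year, and trade are hypothetical. preserve and restore let the expand demonstration run without disturbing the data that collapse then aggregates.

```stata
clear
input country year trade
1 2000 10
1 2001 20
2 2000 5
2 2001 7
2 2002 9
end

* expand 3 replaces each row with three copies (5 rows become 15)
preserve
expand 3
assert _N == 15
restore

* Sort by the grouping variable, as recommended above, then aggregate
* to one row per country with three summary statistics of trade
sort country
collapse (mean) avgtrade=trade (sum) totaltrade=trade (count) nyears=trade, by(country)
list
```

After collapse, memory holds two rows, one per country; the original five rows are gone unless they were saved beforehand.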

Transposing

• You can shift your dataset’s structure (e.g., from a structure with values of one variable across time
recorded in separate columns for each time period, to a structure with all values of that variable
recorded in one column, with separate rows for each time period) with the reshape command.
• See help reshape for more information.
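As a hedged example of the wide-to-long case described above, the sketch below uses invented variables trade2000 and trade2001 (one column per year). reshape long trade, i(country) j(year) stacks them into a single trade column, with a new variable year taken from the column-name suffixes.

```stata
* Wide layout: one row per country, one trade column per year
clear
input country trade2000 trade2001
1 10 20
2 5 7
end

* Long layout: one row per country-year, a single trade column,
* and a new variable year built from the 2000/2001 suffixes
reshape long trade, i(country) j(year)
list
```

Running reshape wide trade, i(country) j(year) afterwards would restore the original one-column-per-year layout.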

Graphing

• graph produces a wide variety of two-dimensional figures.

• [graph twoway] scatter y x produces a scatterplot of variable y against variable x; you don’t have
to type in the words in square brackets [] to make the command work.
• To plot multiple dependent variables against one common independent variable, e.g., year, type
scatter y1 y2 y3 year.
• To change the marker symbol for each y-x pair from the default dot, use the msymbol() option after
the comma that follows the main graph command. See Stata help for a list of the values you can give
msymbol(). You can even use a variable's values as the point markers instead of dots by typing
scatter y x, msymbol(i) mlabposition(0) mlabel(variablename).
• You can connect the dots for selected y-x pairs with the connect() option: scatter y1 year,
connect(L) clpattern(solid) connects the dots with a solid line (change what is in the clpattern()
parentheses for a dashed line, etc.). You may want to sort the data by the x-variable before graphing;
you can do this within the graph command by adding the sort(xvar) option. You can also use different
connection styles for different y-variables, as in scatter y1 y2 y3 year, connect(L L .)
clpattern(solid dash .), which connects the y1 dots with a solid line and the y2 dots with a dashed
line, and draws no line connecting the y3 dots.
• You can title the axes of a graph using the xtitle() and ytitle() options after the graph command's
comma, e.g., xtitle(This is the Year Number); to choose which particular values are labeled along an
axis, use the xlabel() and ylabel() options. See help graph for more information.
• You can copy the graph with the Edit-Copy menu selection and paste it into MS Word or elsewhere.
You can also save the graph using the saving() option after the comma, e.g., scatter y x,
saving(filename, replace). This saves a new file named filename.gph in your working directory.
• ksm y x, low produces a scatterplot with a lowess-smoothed regression line, useful for displaying a
trend in one simple command. It accepts the usual graph options described above.
• histogram y produces a histogram of the variable y, not surprisingly. You can add an option to set
the number of bins (vertical bars), e.g., histogram y, bin(40).
• You can make a bar graph of group means (or medians or any other statistic) like so: graph bar
(mean) y, over(x). Draw it sideways by typing graph hbar with otherwise the same input.
• You can combine Stata graphs in a variety of ways: for example, overlay one series on top of another
with twoway (scatter y x) (line y2 x), or place separate graphs side by side with graph combine.
• You can produce a wide variety of different looks on Stata 8 graphs, using different fonts, labels,
legends, multiple axis scales, etc. The graphing power in Stata 8 is extraordinarily flexible, but,
starting in this version, somewhat complex. See help graph for more information.
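The graph commands above can be tried on the auto dataset that ships with Stata. This is a sketch assuming a current Stata release: it uses lowess, the command current versions provide for the lowess smoother that ksm y x, low drew in older releases, and the saved file name auto_mpg is arbitrary.

```stata
sysuse auto, clear

* Label each point with its rep78 value instead of drawing a marker
scatter mpg weight, msymbol(i) mlabel(rep78) mlabposition(0)

* Two y-variables against one x-variable; sort(weight) orders the points
* so the solid and dashed connecting lines run left to right
scatter mpg turn weight, connect(L L) clpattern(solid dash) sort(weight)

* Axis titles, chosen tick labels, and a saved auto_mpg.gph file
scatter mpg weight, xtitle("Weight (lbs.)") ytitle("Mileage (mpg)") ///
    xlabel(2000(1000)5000) saving(auto_mpg, replace)

* Scatterplot with a lowess smoother
lowess mpg weight

* Histogram with 10 bins
histogram mpg, bin(10)

* Bar graphs of group means, vertical and horizontal
graph bar (mean) mpg, over(foreign)
graph hbar (mean) mpg, over(foreign)

* Overlay a fitted regression line on the scatterplot
twoway (scatter mpg weight) (lfit mpg weight)
```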

You might also like