Python Pandas read_stata() Method

The read_stata() method in Python's Pandas library reads data from a Stata dataset file into a Pandas DataFrame. In other words, it lets you import data from Stata's .dta files into a DataFrame, where the data can be manipulated and analyzed with Python. Stata, developed by StataCorp, is a software package widely used for statistical analysis, and its .dta dataset files are a common format for storing structured data.

The read_stata() method handles Stata-specific data types automatically and supports optional column selection and chunk-based reading for large datasets. It can also convert value-labeled variables to categorical data, control how missing values are represented, and preserve the original data types.
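The missing-value handling mentioned above can be illustrated with a short sketch. It writes a small DataFrame containing one NaN to a hypothetical file named missing_demo.dta and reads it back twice: once with the default convert_missing=False and once with convert_missing=True. The file name and the data are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from pandas.io.stata import StataMissingValue

# Hypothetical example data with one missing value
df = pd.DataFrame({"score": [1.5, np.nan, 3.0]})
df.to_stata("missing_demo.dta", write_index=False)

# Default (convert_missing=False): Stata missing values come back as NaN
as_nan = pd.read_stata("missing_demo.dta")
print(as_nan["score"].isna().tolist())

# convert_missing=True: the column becomes object dtype and the missing
# entry is represented by a StataMissingValue instance
as_stata = pd.read_stata("missing_demo.dta", convert_missing=True)
print(as_stata["score"].dtype)
print(isinstance(as_stata["score"].iloc[1], StataMissingValue))
```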

Syntax

Below is the syntax of the Python Pandas read_stata() method −

pandas.read_stata(filepath_or_buffer, *, convert_dates=True, convert_categoricals=True, index_col=None, convert_missing=False, preserve_dtypes=True, columns=None, order_categoricals=True, chunksize=None, iterator=False, compression='infer', storage_options=None)

Parameters

The Python Pandas read_stata() method accepts the below parameters −

filepath_or_buffer: A string, path object, or file-like object representing the location of the Stata dataset file to read.
convert_dates: A boolean indicating whether to convert date variables to Pandas datetime values. By default it is set to True.
convert_categoricals: A boolean indicating whether to read value labels and convert columns to Categorical/Factor variables. By default it is set to True.
index_col: Specifies the column to use as the DataFrame index. If None, no column is used as the index.
convert_missing: A boolean indicating whether to convert missing values to their Stata representations. If set to True, columns containing missing values are returned with the object data type, and missing values are represented by StataMissingValue objects. If set to False (the default), missing values are replaced with NaN.
preserve_dtypes: If True (the default), preserves the original Stata data types of the variables. If False, numeric data are upcast to the pandas default types for foreign data (float64 or int64).
columns: Specifies a subset of columns to include in the output. By default, all columns are included.
order_categoricals: A boolean indicating whether the converted categorical data are ordered. By default it is set to True.
chunksize: If specified, returns a StataReader object that reads the data in chunks of the given number of rows.
iterator: If set to True, returns a StataReader object instead of a DataFrame.
compression: Specifies the compression used for on-the-fly decompression of the file. If set to 'infer' (the default), the compression type is detected from the file extension (e.g., .gz, .bz2, .zip, .xz, .zst, .tar, .tar.gz, or .tar.bz2).
storage_options: Additional options for connecting to certain storage back-ends (e.g., AWS S3, Google Cloud Storage).
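The following sketch shows how the convert_dates, convert_categoricals and order_categoricals flags change the result. It assumes a hypothetical file flags_demo.dta holding one datetime column and one categorical column. to_stata() stores the datetime column as a Stata numeric date and the categorical column as integer codes with value labels, so turning the conversions off exposes those raw values.

```python
import pandas as pd

# Hypothetical sample data: one datetime column and one categorical column
df = pd.DataFrame({
    "visit_date": pd.to_datetime(["2024-01-01", "2024-06-15", "2024-12-31"]),
    "level": pd.Categorical(["low", "high", "medium"],
                            categories=["low", "medium", "high"]),
})
df.to_stata("flags_demo.dta", write_index=False)

# Defaults: dates become datetimes, value labels become an ordered categorical
converted = pd.read_stata("flags_demo.dta")
print(converted.dtypes)
print(converted["level"].cat.ordered)

# Conversions off: raw Stata date numbers and integer category codes
raw = pd.read_stata("flags_demo.dta", convert_dates=False,
                    convert_categoricals=False)
print(raw.dtypes)
print(raw["level"].tolist())

# Keep the categorical conversion but leave the categories unordered
unordered = pd.read_stata("flags_demo.dta", order_categoricals=False)
print(unordered["level"].cat.ordered)
```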

Return Value

The Pandas read_stata() method returns a DataFrame containing the data read from the specified Stata file. If iterator=True or chunksize is set, it returns a pandas.api.typing.StataReader object instead.
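When chunksize (or iterator=True) is used, the returned StataReader can be iterated to process a large file piece by piece. The sketch below assumes a hypothetical five-row file named chunk_demo.dta, reads it two rows at a time, and reassembles the chunks with pd.concat(). The reader is used as a context manager so that the file is closed afterwards.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": range(5), "Col_2": ["a", "b", "c", "d", "e"]})
df.to_stata("chunk_demo.dta", write_index=False)

# With chunksize set, read_stata() returns a StataReader instead of a DataFrame
chunks = []
with pd.read_stata("chunk_demo.dta", chunksize=2) as reader:
    for chunk in reader:  # each chunk is a DataFrame with at most 2 rows
        print("chunk with", len(chunk), "rows")
        chunks.append(chunk)

# Stitch the pieces back together into one DataFrame
combined = pd.concat(chunks)
print(combined)
```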

Example: Basic Reading of a Stata Dataset File

Here is a basic example demonstrating reading a Stata dataset file into a Pandas DataFrame using the read_stata() method.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})

# Save the DataFrame to a Stata file
df.to_stata("stata_file.dta")

# Read a Stata file  
result = pd.read_stata("stata_file.dta")  

print("DataFrame read from Stata file:")  
print(result)

When we run the above program, it produces the following result −

DataFrame read from Stata file:

	index	Col_1	Col_2
0	0	0	a
1	1	1	b
2	2	2	c
3	3	3	d
4	4	4	e

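The generated stata_file.dta file is saved in the current working directory. Note that the output contains an extra index column: by default, to_stata() writes the DataFrame index to the file (write_index=True), and because the index is unnamed it is stored as a column called index, which read_stata() then reads back as an ordinary column.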
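If the extra index column is not wanted, one option is to pass write_index=False to to_stata() when the file is created. This minimal sketch, using the hypothetical file names with_index.dta and without_index.dta, compares the columns read back from both variants.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": range(5), "Col_2": ["a", "b", "c", "d", "e"]})

df.to_stata("with_index.dta")                        # default: write_index=True
df.to_stata("without_index.dta", write_index=False)  # index not written

with_idx = pd.read_stata("with_index.dta")
without_idx = pd.read_stata("without_index.dta")

print(with_idx.columns.tolist())
print(without_idx.columns.tolist())
```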

Example: Reading Specific Columns from a Stata file

The following example demonstrates how to read specific columns from a Stata file using the read_stata() method with the columns parameter.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})

# Save the DataFrame to a Stata file
df.to_stata("stata_file.dta")

# Read specific columns from a Stata file  
df = pd.read_stata("stata_file.dta", columns=["Col_2"])  

print("Selected columns read from Stata file:")  
print(df)

While executing the above code we get the following output −

Selected columns read from Stata file:

	Col_2
0	a
1	b
2	c
3	d
4	e
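Two details of the columns parameter are worth checking. In this sketch, the selected columns are returned in the order they are listed rather than in file order, and naming a column that does not exist in the file raises a ValueError. The block recreates stata_file.dta so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": range(5), "Col_2": ["a", "b", "c", "d", "e"]})
df.to_stata("stata_file.dta")

# Columns come back in the order they are requested
subset = pd.read_stata("stata_file.dta", columns=["Col_2", "Col_1"])
print(subset.columns.tolist())

# Requesting a column that is not in the file raises ValueError
raised = False
try:
    pd.read_stata("stata_file.dta", columns=["Col_3"])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```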

Example: Setting a Custom Index Column While Reading a Stata File

The following example demonstrates how to use a column from the Stata file as the DataFrame index by passing its name to the index_col parameter of the read_stata() method.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})

# Save the DataFrame to a Stata file
df.to_stata("stata_file.dta")

# Read a Stata file by specifying the column to set it as DataFrame Index
df = pd.read_stata("stata_file.dta", index_col="Col_2")  

print("DataFrame read from Stata file with custom index:")  
print(df)

Following is an output of the above code −

DataFrame read from Stata file with custom index:

	index	Col_1
Col_2
a	0	0
b	1	1
c	2	2
d	3	3
e	4	4
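Conceptually, index_col gives the same result as reading the whole file and then calling set_index() on that column. The sketch below checks this equivalence on the same sample data; it is an illustration of the behavior, not a statement about how read_stata() is implemented internally.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": range(5), "Col_2": ["a", "b", "c", "d", "e"]})
df.to_stata("stata_file.dta")

# Option 1: let read_stata() set the index
indexed = pd.read_stata("stata_file.dta", index_col="Col_2")

# Option 2: read everything, then move Col_2 into the index
manual = pd.read_stata("stata_file.dta").set_index("Col_2")

print(indexed.equals(manual))
print(indexed.index.name, indexed.columns.tolist())
```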

Example: Reading a Compressed Stata File

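The read_stata() method can also read compressed Stata files. In the following example, the DataFrame is saved as a gzip-compressed file with to_stata(), and the file is read back by passing compression="gzip" to read_stata().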

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})

# Save the DataFrame to Stata with custom gzip compression
df.to_stata("compressed_file.dta.gz", compression={'method': 'gzip', 'compresslevel': 2})

# Read a compressed Stata file  
df = pd.read_stata("compressed_file.dta.gz", compression="gzip")  

print("DataFrame read from compressed Stata file:")  
print(df)

Following is an output of the above code −

DataFrame read from compressed Stata file:

	index	Col_1	Col_2
0	0	0	a
1	1	1	b
2	2	2	c
3	3	3	d
4	4	4	e
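Because compression defaults to 'infer' in both to_stata() and read_stata(), an explicit compression argument is optional when the file name ends in a recognized extension such as .gz. This sketch, with the hypothetical file name inferred_file.dta.gz, relies on inference in both directions and then checks the gzip magic bytes at the start of the file.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": range(5), "Col_2": ["a", "b", "c", "d", "e"]})

# compression="infer" (the default) picks gzip from the ".gz" suffix when writing...
df.to_stata("inferred_file.dta.gz")

# ...and again when reading, so no compression argument is needed here
result = pd.read_stata("inferred_file.dta.gz")
print(result)

# Sanity check: gzip files start with the bytes 0x1f 0x8b
with open("inferred_file.dta.gz", "rb") as fh:
    is_gzip = fh.read(2) == b"\x1f\x8b"
print(is_gzip)
```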
