pandas.DataFrame.query
- DataFrame.query(expr, *, parser='pandas', engine=None, local_dict=None, global_dict=None, resolvers=None, level=0, inplace=False)
Query the columns of a DataFrame with a boolean expression.
Warning
This method can run arbitrary code which can make you vulnerable to code injection if you pass user input to this function.
- Parameters:
- exprstr
The query string to evaluate.
See the documentation for eval() for details of supported operations and functions in the query string.
See the documentation for DataFrame.eval() for details on referring to column names and variables in the query string.
- parser{‘pandas’, ‘python’}, default ‘pandas’
The parser to use to construct the syntax tree from the expression. The default of 'pandas' parses code slightly differently than standard Python. Alternatively, you can parse an expression using the 'python' parser to retain strict Python semantics. See the enhancing performance documentation for more details.
- engine{‘python’, ‘numexpr’}, default ‘numexpr’
The engine used to evaluate the expression. Supported engines are:
None : tries to use numexpr, falls back to python.
'numexpr' : This default engine evaluates pandas objects using numexpr for large speed-ups in complex expressions with large frames.
'python' : Performs operations as if you had eval’d in top-level Python. This engine is generally not that useful.
More backends may be available in the future.
- local_dictdict or None, optional
A dictionary of local variables, taken from locals() by default.
- global_dictdict or None, optional
A dictionary of global variables, taken from globals() by default.
- resolverslist of dict-like or None, optional
A list of objects implementing the __getitem__ special method that you can use to inject an additional collection of namespaces to use for variable lookup. For example, this is used in the query() method to inject the DataFrame.index and DataFrame.columns variables that refer to their respective DataFrame instance attributes.
- levelint, optional
The number of prior stack frames to traverse and add to the current scope. Most users will not need to change this parameter.
- inplacebool
Whether to modify the DataFrame rather than creating a new one.
- Returns:
- DataFrame or None
DataFrame resulting from the provided query expression, or None if inplace=True.
See also
eval
Evaluate a string describing operations on DataFrame columns.
DataFrame.eval
Evaluate a string describing operations on DataFrame columns.
Notes
The result of the evaluation of this expression is first passed to DataFrame.loc, and if that fails because of a multidimensional key (e.g., a DataFrame), the result is passed to DataFrame.__getitem__().
This method uses the top-level eval() function to evaluate the passed query.
The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, "and" and "or". This is syntactically valid Python; however, the semantics are different.
You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.
The DataFrame.index and DataFrame.columns attributes of the DataFrame instance are placed in the query namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for the frame index; you can also use the name of the index to identify it in a query. Please note that Python keywords may not be used as identifiers.
For further details and examples see the query documentation in indexing.
Backtick quoted variables
Backtick quoted variables are parsed as literal Python code and are converted internally to a valid Python identifier. This can lead to the following problems.
During parsing a number of disallowed characters inside the backtick quoted string are replaced by strings that are allowed as a Python identifier. These characters include all operators in Python, the space character, the question mark, the exclamation mark, the dollar sign, and the euro sign.
A backtick can be escaped by double backticks.
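As a minimal sketch of backtick quoting, the snippet below queries a frame whose column names contain spaces. The frame and its column names ("unit price", "qty sold") are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical frame: the column names contain spaces, so they are
# not valid Python identifiers and cannot be written bare in a query.
sales = pd.DataFrame({"unit price": [1.5, 4.0, 2.5], "qty sold": [10, 2, 4]})

# Backticks let the expression refer to those columns directly.
big_orders = sales.query("`unit price` * `qty sold` > 9")

# The same selection written with ordinary boolean indexing.
expected = sales[sales["unit price"] * sales["qty sold"] > 9]
```

Both selections should pick the same rows; the backtick form only changes how the column names are spelled inside the expression.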
See also the Python documentation about lexical analysis in combination with the source code in pandas.core.computation.parsing.
Examples
>>> df = pd.DataFrame(
...     {"A": range(1, 6), "B": range(10, 0, -2), "C&C": range(10, 5, -1)}
... )
>>> df
   A   B  C&C
0  1  10   10
1  2   8    9
2  3   6    8
3  4   4    7
4  5   2    6
>>> df.query("A > B")
   A  B  C&C
4  5  2    6
The previous expression is equivalent to
>>> df[df.A > df.B]
   A  B  C&C
4  5  2    6
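The Notes state that, with the default parser, & and | take the precedence of the boolean keywords. The sketch below illustrates what that means in practice, using a frame with the same A and B columns as above; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 6), "B": range(10, 0, -2)})

# Default 'pandas' parser: & binds like the boolean keyword,
# so the two comparisons need no parentheses.
via_query = df.query("A > 1 & B < 8")

# Plain Python: & binds tighter than > and <, so each comparison
# has to be parenthesized explicitly.
via_mask = df[(df.A > 1) & (df.B < 8)]
```

Passing parser='python' would restore standard Python precedence inside the query string, in which case the unparenthesized form no longer means the same thing.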
For columns whose names are not valid Python identifiers (for example, names containing spaces or operators), you can use backtick quoting.
>>> df.query("B == `C&C`")
   A   B  C&C
0  1  10   10
The previous expression is equivalent to
>>> df[df.B == df["C&C"]]
   A   B  C&C
0  1  10   10
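The Notes also mention that the frame index is available in the query namespace, either under the identifier index or under the index's own name. A minimal sketch of both forms follows; the index name "row_id" is a hypothetical choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 6), "B": range(10, 0, -2)})

# Unnamed index: refer to it with the identifier `index`.
by_identifier = df.query("index >= 3")

# Named index: the name itself can be used in the expression,
# alongside ordinary column names.
named = df.rename_axis("row_id")
by_name = named.query("row_id >= 3 and A > B")
```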
Using a local variable:
>>> local_var = 2
>>> df.query("A <= @local_var")
   A   B  C&C
0  1  10   10
1  2   8    9
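The @ prefix works for any local variable, including lists used with the in operator. The sketch below assumes some values arrive from outside the program (the names wanted and threshold are hypothetical). Referencing them with @ keeps the values out of the expression text, unlike formatting them into the string. It is an illustration only, not a complete defence against the code-injection risk described in the warning above, and it does not make it safe to accept a whole expression from users.

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 6), "B": range(10, 0, -2)})

# Hypothetical externally supplied values.
wanted = [2, 4, 99]
threshold = 5

# The expression text is fixed; only the referenced values vary.
subset = df.query("A in @wanted and B < @threshold")

# Equivalent boolean indexing, for comparison.
expected = df[df.A.isin(wanted) & (df.B < threshold)]
```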