Methods & Functions in Databricks

The document provides a comprehensive overview of various column methods and functions used in data manipulation, particularly in PySpark. It includes methods for sorting, casting, and manipulating data types, as well as mathematical, datetime, and collection functions. Additionally, it describes operations for handling arrays and maps, such as filtering, transforming, and aggregating data.


Column Methods

Column.__getattr__(item) - An expression that gets an item at position ordinal out of a list, or gets an item by key out of a dict.
Column.__getitem__(k) - An expression that gets an item at position ordinal out of a list, or gets an item by key out of a dict.
Column.alias(*alias, **kwargs) - Returns this column aliased with a new name or names (in the case of expressions that return more than one column, such as explode).
Column.asc() - Returns a sort expression based on the ascending order of the column.
Column.asc_nulls_first() - Returns a sort expression based on the ascending order of the column, and null values return before non-null values.
Column.asc_nulls_last() - Returns a sort expression based on the ascending order of the column, and null values appear after non-null values.
Column.astype(dataType) - astype() is an alias for cast().
Column.between(lowerBound, upperBound) - True if the current column is between the lower bound and upper bound, inclusive.
Column.bitwiseAND(other) - Compute bitwise AND of this expression with another expression.
Column.bitwiseOR(other) - Compute bitwise OR of this expression with another expression.
Column.bitwiseXOR(other) - Compute bitwise XOR of this expression with another expression.
Column.cast(dataType) - Casts the column into type dataType.
Column.contains(other) - Contains the other element.
Column.desc() - Returns a sort expression based on the descending order of the column.
Column.desc_nulls_first() - Returns a sort expression based on the descending order of the column, and null values appear before non-null values.
Column.desc_nulls_last() - Returns a sort expression based on the descending order of the column, and null values appear after non-null values.
Column.dropFields(*fieldNames) - An expression that drops fields in StructType by name.
Column.endswith(other) - String ends with.
Column.eqNullSafe(other) - Equality test that is safe for null values.
Column.getField(name) - An expression that gets a field by name in a StructType.
Column.getItem(key) - An expression that gets an item at position ordinal out of a list, or gets an item by key out of a dict.
Column.ilike(other) - SQL ILIKE expression (case insensitive LIKE).
Column.isNotNull() - True if the current expression is NOT null.
Column.isNull() - True if the current expression is null.
Column.isin(*cols) - A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
Column.like(other) - SQL like expression.
Column.name(*alias, **kwargs) - name() is an alias for alias().
Column.otherwise(value) - Evaluates a list of conditions and returns one of multiple possible result expressions.
Column.over(window) - Define a windowing column.
Column.rlike(other) - SQL RLIKE expression (LIKE with Regex).
Column.startswith(other) - String starts with.
Column.substr(startPos, length) - Return a Column which is a substring of the column.
Column.when(condition, value) - Evaluates a list of conditions and returns one of multiple possible result expressions.
Column.withField(fieldName, col) - An expression that adds/replaces a field in StructType by name.
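
For example, a minimal sketch combining several of these methods (the sample data and column names are hypothetical; assumes an active SparkSession named spark):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("Alice", 34), ("Bob", None)], ["name", "age"])

    result = df.select(
        F.col("name").alias("person"),                      # alias()
        F.col("age").cast("double").alias("age_dbl"),       # cast()
        F.col("age").between(18, 65).alias("working_age"),  # between()
        F.when(F.col("age").isNull(), -1)                   # isNull() + when()/otherwise()
         .otherwise(F.col("age"))
         .alias("age_filled"),
    ).orderBy(F.col("person").asc_nulls_last())             # asc_nulls_last()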

Functions
Normal Functions
col(col) - Returns a Column based on the given column name.
column(col) - Returns a Column based on the given column name.
lit(col) - Creates a Column of literal value.
broadcast(df) - Marks a DataFrame as small enough for use in broadcast joins.
coalesce(*cols) - Returns the first column that is not null.
input_file_name() - Creates a string column for the file name of the current Spark task.
isnan(col) - An expression that returns true if the column is NaN.
isnull(col) - An expression that returns true if the column is null.
monotonically_increasing_id() - A column that generates monotonically increasing 64-bit integers.
nanvl(col1, col2) - Returns col1 if it is not NaN, or col2 if col1 is NaN.
rand([seed]) - Generates a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).
randn([seed]) - Generates a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.
spark_partition_id() - A column for partition ID.
when(condition, value) - Evaluates a list of conditions and returns one of multiple possible result expressions.
bitwise_not(col) - Computes bitwise not.
bitwiseNOT(col) - Computes bitwise not.
expr(str) - Parses the expression string into the column that it represents.
greatest(*cols) - Returns the greatest value of the list of column names, skipping null values.
least(*cols) - Returns the least value of the list of column names, skipping null values.
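
A short sketch of a few of these in combination (hypothetical data; assumes an active SparkSession named spark):

    from pyspark.sql import functions as F

    df = spark.createDataFrame([(1.0, None), (None, 2.0)], ["a", "b"])

    out = df.select(
        F.coalesce("a", "b").alias("first_non_null"),     # first non-null of a, b
        F.lit(42).alias("constant"),                      # literal column
        F.greatest("a", "b").alias("largest"),            # skips nulls
        F.expr("a + b").alias("from_sql"),                # parsed SQL expression
        F.monotonically_increasing_id().alias("row_id"),  # unique but not consecutive
    )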

Math Functions
sqrt(col) - Computes the square root of the specified float value.
abs(col) - Computes the absolute value.
acos(col) - Computes inverse cosine of the input column.
acosh(col) - Computes inverse hyperbolic cosine of the input column.
asin(col) - Computes inverse sine of the input column.
asinh(col) - Computes inverse hyperbolic sine of the input column.
atan(col) - Computes inverse tangent of the input column.
atanh(col) - Computes inverse hyperbolic tangent of the input column.
atan2(col1, col2) - Returns the angle in radians between the positive x-axis and the point (col2, col1).
bin(col) - Returns the string representation of the binary value of the given column.
cbrt(col) - Computes the cube-root of the given value.
ceil(col) - Computes the ceiling of the given value.
conv(col, fromBase, toBase) - Convert a number in a string column from one base to another.
cos(col) - Computes cosine of the input column.
cosh(col) - Computes hyperbolic cosine of the input column.
cot(col) - Computes cotangent of the input column.
csc(col) - Computes cosecant of the input column.
exp(col) - Computes the exponential of the given value.
expm1(col) - Computes the exponential of the given value minus one.
factorial(col) - Computes the factorial of the given value.
floor(col) - Computes the floor of the given value.
hex(col) - Computes hex value of the given column, which could be pyspark.sql.types.StringType, pyspark.sql.types.BinaryType, pyspark.sql.types.IntegerType or pyspark.sql.types.LongType.
unhex(col) - Inverse of hex.
hypot(col1, col2) - Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
log(arg1[, arg2]) - Returns the first argument-based logarithm of the second argument; with a single argument, returns the natural logarithm.
log10(col) - Computes the logarithm of the given value in base 10.
log1p(col) - Computes the natural logarithm of the "given value plus one".
log2(col) - Returns the base-2 logarithm of the argument.
pmod(dividend, divisor) - Returns the positive value of dividend mod divisor.
pow(col1, col2) - Returns the value of the first argument raised to the power of the second argument.
rint(col) - Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
round(col[, scale]) - Round the given value to scale decimal places using HALF_UP rounding mode if scale >= 0 or at integral part when scale < 0.
bround(col[, scale]) - Round the given value to scale decimal places using HALF_EVEN rounding mode if scale >= 0 or at integral part when scale < 0.
sec(col) - Computes secant of the input column.
shiftleft(col, numBits) - Shift the given value numBits left.
shiftright(col, numBits) - (Signed) shift the given value numBits right.
shiftrightunsigned(col, numBits) - Unsigned shift the given value numBits right.
signum(col) - Computes the signum of the given value.
sin(col) - Computes sine of the input column.
sinh(col) - Computes hyperbolic sine of the input column.
tan(col) - Computes tangent of the input column.
tanh(col) - Computes hyperbolic tangent of the input column.
toDegrees(col) - Deprecated alias for degrees().
degrees(col) - Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
toRadians(col) - Deprecated alias for radians().
radians(col) - Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
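
For instance (hypothetical data; assumes an active SparkSession named spark):

    from pyspark.sql import functions as F

    df = spark.createDataFrame([(2.0,), (9.0,)], ["x"])

    out = df.select(
        F.sqrt("x").alias("root"),                  # square root
        F.pow("x", 3).alias("cubed"),               # x to the 3rd power
        F.round(F.log("x"), 2).alias("ln_2dp"),     # natural log, 2 decimals
        F.degrees(F.atan("x")).alias("angle_deg"),  # radians -> degrees
    )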

Datetime Functions
add_months(start, months) - Returns the date that is months months after start.
current_date() - Returns the current date at the start of query evaluation as a DateType column.
current_timestamp() - Returns the current timestamp at the start of query evaluation as a TimestampType column.
date_add(start, days) - Returns the date that is days days after start.
date_format(date, format) - Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.
date_sub(start, days) - Returns the date that is days days before start.
date_trunc(format, timestamp) - Returns timestamp truncated to the unit specified by the format.
datediff(end, start) - Returns the number of days from start to end.
dayofmonth(col) - Extract the day of the month of a given date/timestamp as integer.
dayofweek(col) - Extract the day of the week of a given date/timestamp as integer.
dayofyear(col) - Extract the day of the year of a given date/timestamp as integer.
second(col) - Extract the seconds of a given date as integer.
weekofyear(col) - Extract the week number of a given date as integer.
year(col) - Extract the year of a given date/timestamp as integer.
quarter(col) - Extract the quarter of a given date/timestamp as integer.
month(col) - Extract the month of a given date/timestamp as integer.
last_day(date) - Returns the last day of the month which the given date belongs to.
localtimestamp() - Returns the current timestamp without time zone at the start of query evaluation as a timestamp without time zone column.
minute(col) - Extract the minutes of a given timestamp as integer.
months_between(date1, date2[, roundOff]) - Returns number of months between dates date1 and date2.
next_day(date, dayOfWeek) - Returns the first date later than the value of the date column that falls on the given day of the week.
hour(col) - Extract the hours of a given timestamp as integer.
make_date(year, month, day) - Returns a column with a date built from the year, month and day columns.
from_unixtime(timestamp[, format]) - Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
unix_timestamp([timestamp, format]) - Converts a time string with the given pattern ('yyyy-MM-dd HH:mm:ss' by default) to a Unix timestamp (in seconds), using the default timezone and the default locale; returns null if it fails.
to_timestamp(col[, format]) - Converts a Column into pyspark.sql.types.TimestampType using the optionally specified format.
to_date(col[, format]) - Converts a Column into pyspark.sql.types.DateType using the optionally specified format.
trunc(date, format) - Returns date truncated to the unit specified by the format.
from_utc_timestamp(timestamp, tz) - This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE.
to_utc_timestamp(timestamp, tz) - This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE.
window(timeColumn, windowDuration[, …]) - Bucketize rows into one or more time windows given a timestamp specifying column.
session_window(timeColumn, gapDuration) - Generates session window given a timestamp specifying column.
timestamp_seconds(col) - Converts the number of seconds from the Unix epoch (1970-01-01T00:00:00Z) to a timestamp.
window_time(windowColumn) - Computes the event time from a window column.
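
A small sketch of typical date handling (hypothetical data; assumes an active SparkSession named spark):

    from pyspark.sql import functions as F

    df = spark.createDataFrame([("2024-01-15",)], ["raw"])

    out = (
        df.select(F.to_date("raw", "yyyy-MM-dd").alias("d"))
          .select(
              F.date_add("d", 30).alias("plus_30_days"),       # date arithmetic
              F.date_format("d", "MMMM yyyy").alias("label"),  # "January 2024"
              F.datediff(F.current_date(), "d").alias("age_days"),
              F.year("d").alias("yr"),
          )
    )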

Collection Functions
array(*cols) - Creates a new array column.
array_contains(col, value) - Collection function: returns null if the array is null, true if the array contains the given value, and false otherwise.
arrays_overlap(a1, a2) - Collection function: returns true if the arrays contain any common non-null element; if not, returns null if both the arrays are non-empty and any of them contains a null element; returns false otherwise.
array_join(col, delimiter[, null_replacement]) - Concatenates the elements of column using the delimiter.
create_map(*cols) - Creates a new map column.
slice(x, start, length) - Collection function: returns an array containing all the elements in x from index start (array indices start at 1, or from the end if start is negative) with the specified length.
concat(*cols) - Concatenates multiple input columns together into a single column.
array_position(col, value) - Collection function: locates the position of the first occurrence of the given value in the given array.
element_at(col, extraction) - Collection function: returns element of array at given index in extraction if col is array.
array_append(col, value) - Collection function: returns an array of the elements in col with value appended at the end.
array_sort(col[, comparator]) - Collection function: sorts the input array in ascending order.
array_insert(arr, pos, value) - Collection function: adds an item into a given array at a specified array index.
array_remove(col, element) - Collection function: removes all elements that equal element from the given array.
array_distinct(col) - Collection function: removes duplicate values from the array.
array_intersect(col1, col2) - Collection function: returns an array of the elements in the intersection of col1 and col2, without duplicates.
array_union(col1, col2) - Collection function: returns an array of the elements in the union of col1 and col2, without duplicates.
array_except(col1, col2) - Collection function: returns an array of the elements in col1 but not in col2, without duplicates.
array_compact(col) - Collection function: removes null values from the array.
transform(col, f) - Returns an array of elements after applying a transformation to each element in the input array.
exists(col, f) - Returns whether a predicate holds for one or more elements in the array.
forall(col, f) - Returns whether a predicate holds for every element in the array.
filter(col, f) - Returns an array of elements for which a predicate holds in a given array.
aggregate(col, initialValue, merge[, finish]) - Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state.
zip_with(left, right, f) - Merge two given arrays, element-wise, into a single array using a function.
transform_keys(col, f) - Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new keys for the pairs.
transform_values(col, f) - Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new values for the pairs.
map_filter(col, f) - Returns a map whose key-value pairs satisfy a predicate.
map_from_arrays(col1, col2) - Creates a new map from two arrays.
map_zip_with(col1, col2, f) - Merge two given maps, key-wise, into a single map using a function.
explode(col) - Returns a new row for each element in the given array or map.
explode_outer(col) - Returns a new row for each element in the given array or map; unlike explode, produces a single row with nulls when the array or map is null or empty.
posexplode(col) - Returns a new row for each element with position in the given array or map.
posexplode_outer(col) - Returns a new row for each element with position in the given array or map; unlike posexplode, produces a single row with nulls when the array or map is null or empty.
inline(col) - Explodes an array of structs into a table.
inline_outer(col) - Explodes an array of structs into a table; unlike inline, produces a single row with nulls when the array is null or empty.
get(col, index) - Collection function: returns element of array at given (0-based) index.
get_json_object(col, path) - Extracts a json object from a json string based on the json path specified, and returns a json string of the extracted json object.
json_tuple(col, *fields) - Creates a new row for a json column according to the given field names.
from_json(col, schema[, options]) - Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType with the specified schema.
schema_of_json(json[, options]) - Parses a JSON string and infers its schema in DDL format.
to_json(col[, options]) - Converts a column containing a StructType, ArrayType or a MapType into a JSON string.
size(col) - Collection function: returns the length of the array or map stored in the column.
struct(*cols) - Creates a new struct column.
sort_array(col[, asc]) - Collection function: sorts the input array in ascending or descending order according to the natural ordering of the array elements.
array_max(col) - Collection function: returns the maximum value of the array.
array_min(col) - Collection function: returns the minimum value of the array.
shuffle(col) - Collection function: generates a random permutation of the given array.
reverse(col) - Collection function: returns a reversed string or an array with reverse order of elements.
flatten(col) - Collection function: creates a single array from an array of arrays.
sequence(start, stop[, step]) - Generate a sequence of integers from start to stop, incrementing by step.
array_repeat(col, count) - Collection function: creates an array containing a column repeated count times.
map_contains_key(col, value) - Returns true if the map contains the key.
map_keys(col) - Collection function: returns an unordered array containing the keys of the map.
map_values(col) - Collection function: returns an unordered array containing the values of the map.
map_entries(col) - Collection function: returns an unordered array of all entries in the given map.
map_from_entries(col) - Collection function: converts an array of entries (key value struct types) to a map of values.
arrays_zip(*cols) - Collection function: returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
map_concat(*cols) - Returns the union of all the given maps.
from_csv(col, schema[, options]) - Parses a column containing a CSV string to a row with the specified schema.
schema_of_csv(csv[, options]) - Parses a CSV string and infers its schema in DDL format.
to_csv(col[, options]) - Converts a column containing a StructType into a CSV string.
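
A minimal sketch of the array and higher-order functions (hypothetical data; assumes an active SparkSession named spark and Spark 3.1+ for the lambda-based functions):

    from pyspark.sql import functions as F

    df = spark.createDataFrame([([1, 2, 2, 3],)], ["nums"])

    out = df.select(
        F.array_distinct("nums").alias("uniq"),              # [1, 2, 3]
        F.transform("nums", lambda x: x * 10).alias("x10"),  # [10, 20, 20, 30]
        F.filter("nums", lambda x: x > 1).alias("gt1"),      # [2, 2, 3]
        F.aggregate("nums", F.lit(0).cast("long"),
                    lambda acc, x: acc + x).alias("total"),  # 8
        F.explode("nums").alias("num"),                      # one row per element
    )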

Partition Transformation Functions
years(col) - Partition transform function: A transform for timestamps and dates to partition data into years.
months(col) - Partition transform function: A transform for timestamps and dates to partition data into months.
days(col) - Partition transform function: A transform for timestamps and dates to partition data into days.
hours(col) - Partition transform function: A transform for timestamps to partition data into hours.
bucket(numBuckets, col) - Partition transform function: A transform for any type that partitions by a hash of the input column.
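
These transforms are intended for use with the DataFrameWriterV2 API. A sketch (the table name and column are hypothetical, and the target catalog must support v2 writes, e.g. Delta or Iceberg):

    from pyspark.sql import functions as F

    # Partition the (hypothetical) events table by the year of its timestamp.
    (df.writeTo("main.default.events_by_year")
       .partitionedBy(F.years(F.col("event_ts")))
       .create())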

Aggregate Functions
approxCountDistinct(col[, rsd]) - Deprecated alias for approx_count_distinct().
approx_count_distinct(col[, rsd]) - Aggregate function: returns a new Column for approximate distinct count of column col.
avg(col) - Aggregate function: returns the average of the values in a group.
collect_list(col) - Aggregate function: returns a list of objects with duplicates.
collect_set(col) - Aggregate function: returns a set of objects with duplicate elements eliminated.
corr(col1, col2) - Returns a new Column for the Pearson Correlation Coefficient for col1 and col2.
count(col) - Aggregate function: returns the number of items in a group.
count_distinct(col, *cols) - Returns a new Column for distinct count of col or cols.
countDistinct(col, *cols) - Returns a new Column for distinct count of col or cols.
covar_pop(col1, col2) - Returns a new Column for the population covariance of col1 and col2.
covar_samp(col1, col2) - Returns a new Column for the sample covariance of col1 and col2.
first(col[, ignorenulls]) - Aggregate function: returns the first value in a group.
grouping(col) - Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not; returns 1 for aggregated or 0 for not aggregated in the result set.
grouping_id(*cols) - Aggregate function: returns the level of grouping, equal to (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn).
kurtosis(col) - Aggregate function: returns the kurtosis of the values in a group.
last(col[, ignorenulls]) - Aggregate function: returns the last value in a group.
max(col) - Aggregate function: returns the maximum value of the expression in a group.
max_by(col, ord) - Returns the value associated with the maximum value of ord.
mean(col) - Aggregate function: returns the average of the values in a group.
median(col) - Returns the median of the values in a group.
min(col) - Aggregate function: returns the minimum value of the expression in a group.
min_by(col, ord) - Returns the value associated with the minimum value of ord.
mode(col) - Returns the most frequent value in a group.
percentile_approx(col, percentage[, accuracy]) - Returns the approximate percentile of the numeric column col, which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value.
product(col) - Aggregate function: returns the product of the values in a group.
skewness(col) - Aggregate function: returns the skewness of the values in a group.
stddev(col) - Aggregate function: alias for stddev_samp.
stddev_pop(col) - Aggregate function: returns population standard deviation of the expression in a group.
stddev_samp(col) - Aggregate function: returns the unbiased sample standard deviation of the expression in a group.
sum(col) - Aggregate function: returns the sum of all values in the expression.
sum_distinct(col) - Aggregate function: returns the sum of distinct values in the expression.
sumDistinct(col) - Aggregate function: returns the sum of distinct values in the expression.
var_pop(col) - Aggregate function: returns the population variance of the values in a group.
var_samp(col) - Aggregate function: returns the unbiased sample variance of the values in a group.
variance(col) - Aggregate function: alias for var_samp.
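
A compact groupBy/agg sketch (hypothetical data; assumes an active SparkSession named spark; max_by requires Spark 3.3+):

    from pyspark.sql import functions as F

    df = spark.createDataFrame(
        [("a", 1, 10), ("a", 3, 20), ("b", 5, 30)], ["key", "val", "ts"]
    )

    out = df.groupBy("key").agg(
        F.count("val").alias("n"),
        F.avg("val").alias("mean"),
        F.max_by("val", "ts").alias("latest_val"),  # val from the row with max ts
        F.collect_list("val").alias("all_vals"),
    )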

Window Functions
cume_dist() - Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are below the current row.
dense_rank() - Window function: returns the rank of rows within a window partition, without any gaps.
lag(col[, offset, default]) - Window function: returns the value that is offset rows before the current row, and default if there is less than offset rows before the current row.
lead(col[, offset, default]) - Window function: returns the value that is offset rows after the current row, and default if there is less than offset rows after the current row.
nth_value(col, offset[, ignoreNulls]) - Window function: returns the value that is the offset-th row of the window frame (counting from 1), and null if the size of the window frame is less than offset rows.
ntile(n) - Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition.
percent_rank() - Window function: returns the relative rank (i.e. percentile) of rows within a window partition.
rank() - Window function: returns the rank of rows within a window partition.
row_number() - Window function: returns a sequential number starting at 1 within a window partition.
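
Window functions are applied with Column.over() and a window spec. A minimal sketch (hypothetical data; assumes an active SparkSession named spark):

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    df = spark.createDataFrame(
        [("a", 1, 10), ("a", 3, 20), ("b", 5, 30)], ["key", "val", "ts"]
    )
    w = Window.partitionBy("key").orderBy("ts")

    out = df.select(
        "key", "val",
        F.row_number().over(w).alias("rn"),     # 1, 2, ... within each key
        F.lag("val", 1).over(w).alias("prev"),  # previous val, null on first row
    )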

Sort Functions
asc(col) - Returns a sort expression based on the ascending order of the given column name.
asc_nulls_first(col) - Returns a sort expression based on the ascending order of the given column name, and null values return before non-null values.
asc_nulls_last(col) - Returns a sort expression based on the ascending order of the given column name, and null values appear after non-null values.
desc(col) - Returns a sort expression based on the descending order of the given column name.
desc_nulls_first(col) - Returns a sort expression based on the descending order of the given column name, and null values appear before non-null values.
desc_nulls_last(col) - Returns a sort expression based on the descending order of the given column name, and null values appear after non-null values.
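
For example (hypothetical data; assumes an active SparkSession named spark):

    from pyspark.sql import functions as F

    df = spark.createDataFrame(
        [("Bob", 42), ("Alice", None), ("Cara", 7)], ["name", "age"]
    )

    # Ascending by age with nulls last, ties broken by descending name:
    out = df.orderBy(F.asc_nulls_last("age"), F.desc("name"))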

String Functions
ascii(col) - Computes the numeric value of the first character of the string column.
base64(col) - Computes the BASE64 encoding of a binary column and returns it as a string column.
bit_length(col) - Calculates the bit length for the specified string column.
concat_ws(sep, *cols) - Concatenates multiple input string columns together into a single string column, using the given separator.
decode(col, charset) - Computes the first argument into a string from a binary using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
encode(col, charset) - Computes the first argument into a binary from a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
format_number(col, d) - Formats the number X to a format like '#,###,###.##', rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string.
format_string(format, *cols) - Formats the arguments in printf-style and returns the result as a string column.
initcap(col) - Translate the first letter of each word to upper case in the sentence.
instr(str, substr) - Locate the position of the first occurrence of substr column in the given string.
length(col) - Computes the character length of string data or number of bytes of binary data.
lower(col) - Converts a string expression to lower case.
levenshtein(left, right) - Computes the Levenshtein distance of the two given strings.
locate(substr, str[, pos]) - Locate the position of the first occurrence of substr in a string column, after position pos.
lpad(col, len, pad) - Left-pad the string column to width len with pad.
ltrim(col) - Trim the spaces from the left end for the specified string value.
octet_length(col) - Calculates the byte length for the specified string column.
regexp_extract(str, pattern, idx) - Extract a specific group matched by a Java regex, from the specified string column.
regexp_replace(string, pattern, replacement) - Replace all substrings of the specified string value that match regexp with replacement.
unbase64(col) - Decodes a BASE64 encoded string column and returns it as a binary column.
rpad(col, len, pad) - Right-pad the string column to width len with pad.
repeat(col, n) - Repeats a string column n times, and returns it as a new string column.
rtrim(col) - Trim the spaces from the right end for the specified string value.
soundex(col) - Returns the SoundEx encoding for a string.
split(str, pattern[, limit]) - Splits str around matches of the given pattern.
substring(str, pos, len) - Substring starts at pos and is of length len when str is String type, or returns the slice of byte array that starts at pos in byte and is of length len when str is Binary type.
substring_index(str, delim, count) - Returns the substring from string str before count occurrences of the delimiter delim.
overlay(src, replace, pos[, len]) - Overlay the specified portion of src with replace, starting from byte position pos of src and proceeding for len bytes.
sentences(string[, language, country]) - Splits a string into arrays of sentences, where each sentence is an array of words.
translate(srcCol, matching, replace) - Translates any character in srcCol that appears in matching to the corresponding character in replace.
trim(col) - Trim the spaces from both ends for the specified string column.
upper(col) - Converts a string expression to uppercase.
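
A short string-cleaning sketch (hypothetical data; assumes an active SparkSession named spark):

    from pyspark.sql import functions as F

    df = spark.createDataFrame([("  John Smith  ",)], ["raw"])

    out = (
        df.select(F.trim("raw").alias("clean"))
          .select(
              F.upper("clean").alias("upper_case"),
              F.split("clean", " ").alias("words"),
              F.regexp_replace("clean", "Smith$", "Doe").alias("renamed"),
              F.concat_ws("-", F.split("clean", " ")).alias("slug"),  # "John-Smith"
          )
    )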

UDF
call_udf(udfName, *cols) - Call a user-defined function.
pandas_udf([f, returnType, functionType]) - Creates a pandas user defined function (a.k.a. vectorized user defined function).
udf([f, returnType]) - Creates a user defined function (UDF).
unwrap_udt(col) - Unwrap UDT data type column into its underlying type.
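
A minimal udf() sketch (the function and data are hypothetical; plain Python UDFs run row-at-a-time, so prefer a built-in function when one exists):

    from pyspark.sql import functions as F
    from pyspark.sql.types import IntegerType

    @F.udf(returnType=IntegerType())
    def vowel_count(s):
        # Runs as plain Python on each row; guard against nulls.
        return sum(1 for ch in (s or "") if ch.lower() in "aeiou")

    df = spark.createDataFrame([("Databricks",)], ["word"])
    out = df.select(vowel_count("word").alias("vowels"))  # 3
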
Misc Functions
md5(col) - Calculates the MD5 digest and returns the value as a 32 character hex string.
sha1(col) - Returns the hex string result of SHA-1.
sha2(col, numBits) - Returns the hex string result of SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512).
crc32(col) - Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.
hash(*cols) - Calculates the hash code of given columns, and returns the result as an int column.
xxhash64(*cols) - Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column.
assert_true(col[, errMsg]) - Returns null if the input column is true; throws an exception with the provided error message otherwise.
raise_error(errMsg) - Throws an exception with the provided error message.
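
For example, hashing a column for pseudonymization or bucketing (hypothetical data; assumes an active SparkSession named spark):

    from pyspark.sql import functions as F

    df = spark.createDataFrame([("alice@example.com",)], ["email"])

    out = df.select(
        F.sha2("email", 256).alias("email_sha256"),  # 64-char hex digest
        F.md5("email").alias("email_md5"),           # 32-char hex digest
        F.hash("email").alias("bucket_key"),         # 32-bit int hash
    )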

DataFrame
DataFrame.__getattr__(name) - Returns the Column denoted by name.
DataFrame.__getitem__(item) - Returns the column as a Column.
DataFrame.agg(*exprs) - Aggregate on the entire DataFrame without groups (shorthand for df.groupBy().agg()).
DataFrame.alias(alias) - Returns a new DataFrame with an alias set.
DataFrame.approxQuantile(col, probabilities, …) - Calculates the approximate quantiles of numerical columns of a DataFrame.
DataFrame.cache() - Persists the DataFrame with the default storage level (MEMORY_AND_DISK).
DataFrame.checkpoint([eager]) - Returns a checkpointed version of this DataFrame.
DataFrame.coalesce(numPartitions) - Returns a new DataFrame that has exactly numPartitions partitions.
DataFrame.colRegex(colName) - Selects column based on the column name specified as a regex and returns it as Column.
DataFrame.collect() - Returns all the records as a list of Row.
DataFrame.columns - Retrieves the names of all columns in the DataFrame as a list.
DataFrame.corr(col1, col2[, method]) - Calculates the correlation of two columns of a DataFrame as a double value.
DataFrame.count() - Returns the number of rows in this DataFrame.
DataFrame.cov(col1, col2) - Calculate the sample covariance for the given columns, specified by their names, as a double value.
DataFrame.createGlobalTempView(name) - Creates a global temporary view with this DataFrame.
DataFrame.createOrReplaceGlobalTempView(name) - Creates or replaces a global temporary view using the given name.
DataFrame.createOrReplaceTempView(name) - Creates or replaces a local temporary view with this DataFrame.
DataFrame.createTempView(name) - Creates a local temporary view with this DataFrame.
DataFrame.crossJoin(other) - Returns the cartesian product with another DataFrame.
DataFrame.crosstab(col1, col2) - Computes a pair-wise frequency table of the given columns.
DataFrame.cube(*cols) - Create a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregations on them.
DataFrame.describe(*cols) - Computes basic statistics for numeric and string columns.
DataFrame.distinct() - Returns a new DataFrame containing the distinct rows in this DataFrame.
DataFrame.drop(*cols) - Returns a new DataFrame without specified columns.
DataFrame.dropDuplicates([subset]) - Return a new DataFrame with duplicate rows removed, optionally only considering certain columns.
DataFrame.dropDuplicatesWithinWatermark([subset]) - Return a new DataFrame with duplicate rows removed within the watermark, optionally only considering certain columns.
DataFrame.drop_duplicates([subset]) - drop_duplicates() is an alias for dropDuplicates().
DataFrame.dropna([how, thresh, subset]) - Returns a new DataFrame omitting rows with null values.
DataFrame.dtypes - Returns all column names and their data types as a list.
DataFrame.exceptAll(other) - Return a new DataFrame containing rows in this DataFrame but not in another DataFrame while preserving duplicates.
DataFrame.explain([extended, mode]) - Prints the (logical and physical) plans to the console for debugging purposes.
DataFrame.fillna(value[, subset]) - Replace null values, alias for na.fill().
DataFrame.filter(condition) - Filters rows using the given condition.
DataFrame.first() - Returns the first row as a Row.
DataFrame.foreach(f) - Applies the f function to all Rows of this DataFrame.
DataFrame.foreachPartition(f) - Applies the f function to each partition of this DataFrame.
DataFrame.freqItems(cols[, support]) - Finding frequent items for columns, possibly with false positives.
DataFrame.groupBy(*cols) - Groups the DataFrame using the specified columns, so we can run aggregation on them.
DataFrame.head([n]) - Returns the first n rows.
DataFrame.hint(name, *parameters) - Specifies some hint on the current DataFrame.
DataFrame.inputFiles() - Returns a best-effort snapshot of the files that compose this DataFrame.
DataFrame.intersect(other) - Return a new DataFrame containing rows only in both this DataFrame and another DataFrame.
DataFrame.intersectAll(other) - Return a new DataFrame containing rows in both this DataFrame and another DataFrame while preserving duplicates.
DataFrame.isEmpty() - Checks if the DataFrame is empty and returns a boolean value.
DataFrame.isLocal() - Returns True if the collect() and take() methods can be run locally (without any Spark executors).
DataFrame.isStreaming - Returns True if this DataFrame contains one or more sources that continuously return data as it arrives.
DataFrame.join(other[, on, how]) - Joins with another DataFrame, using the given join expression.
DataFrame.limit(num) - Limits the result count to the number specified.
DataFrame.localCheckpoint([eager]) - Returns a locally checkpointed version of this DataFrame.
DataFrame.mapInPandas(func, schema[, barrier]) - Maps an iterator of batches in the current DataFrame using a Python native function that takes and outputs a pandas DataFrame, and returns the result as a DataFrame.
DataFrame.mapInArrow(func, schema[, barrier]) - Maps an iterator of batches in the current DataFrame using a Python native function that takes and outputs a PyArrow RecordBatch, and returns the result as a DataFrame.
DataFrame.melt(ids, values, …) - Unpivot a DataFrame from wide format to long format, optionally leaving identifier columns set.
DataFrame.na - Returns a DataFrameNaFunctions for handling missing values.
DataFrame.observe(observation, *exprs) - Define (named) metrics to observe on the DataFrame.
DataFrame.offset(num) - Returns a new DataFrame by skipping the first num rows.
DataFrame.orderBy(*cols, **kwargs) - Returns a new DataFrame sorted by the specified column(s).
DataFrame.persist([storageLevel]) - Sets the storage level to persist the contents of the DataFrame across operations after the first time it is computed.
DataFrame.printSchema([level]) - Prints out the schema in the tree format.
DataFrame.randomSplit(weights[, seed]) - Randomly splits this DataFrame with the provided weights.
DataFrame.rdd - Returns the content as a pyspark.RDD of Row.
DataFrame.registerTempTable(name) - Registers this DataFrame as a temporary table using the given name.
DataFrame.repartition(numPartitions, *cols) - Returns a new DataFrame partitioned by the given partitioning expressions.
DataFrame.repartitionByRange(numPartitions, …) - Returns a new DataFrame partitioned by the given partitioning expressions.
DataFrame.replace(to_replace[, value, subset]) - Returns a new DataFrame replacing a value with another value.
DataFrame.rollup(*cols) - Create a multi-dimensional rollup for the current DataFrame using the specified columns, so we can run aggregation on them.
DataFrame.sameSemantics(other) - Returns True when the logical query plans inside both DataFrames are equal and therefore return the same results.
DataFrame.sample([withReplacement, …]) - Returns a sampled subset of this DataFrame.
DataFrame.sampleBy(col, fractions[, seed]) - Returns a stratified sample without replacement based on the fraction given on each stratum.
DataFrame.schema - Returns the schema of this DataFrame as a pyspark.sql.types.StructType.
DataFrame.select(*cols) - Projects a set of expressions and returns a new DataFrame.
DataFrame.selectExpr(*expr) - Projects a set of SQL expressions and returns a new DataFrame.
DataFrame.semanticHash() - Returns a hash code of the logical query plan against this DataFrame.
DataFrame.show([n, truncate, vertical]) - Prints the first n rows to the console.
DataFrame.sort(*cols, **kwargs) - Returns a new DataFrame sorted by the specified column(s).
DataFrame.sortWithinPartitions(*cols, **kwargs) - Returns a new DataFrame with each partition sorted by the specified column(s).
DataFrame.sparkSession - Returns the Spark session that created this DataFrame.
DataFrame.stat - Returns a DataFrameStatFunctions for statistic functions.
DataFrame.storageLevel - Get the DataFrame's current storage level.
DataFrame.subtract(other) - Return a new DataFrame containing rows in this DataFrame but not in another DataFrame.
DataFrame.summary(*statistics) - Computes specified statistics for numeric and string columns.
DataFrame.tail(num) - Returns the last num rows as a list of Row.
DataFrame.take(num) - Returns the first num rows as a list of Row.
DataFrame.to(schema) - Returns a new DataFrame where each row is reconciled to match the specified schema.
DataFrame.toDF(*cols) - Returns a new DataFrame with the new specified column names.
DataFrame.toJSON([use_unicode]) - Converts a DataFrame into a RDD of string.
DataFrame.toLocalIterator([prefetchPartitions]) - Returns an iterator that contains all of the rows in this DataFrame.
DataFrame.toPandas() - Returns the contents of this DataFrame as a pandas.DataFrame.
DataFrame.to_pandas_on_spark([index_col]) - Converts the existing DataFrame into a pandas-on-Spark DataFrame (deprecated in favor of pandas_api()).
DataFrame.transform(func, *args, **kwargs) - Returns a new DataFrame; concise syntax for chaining custom transformations.
DataFrame.union(other) - Return a new DataFrame containing the union of rows in this and another DataFrame.
DataFrame.unionAll(other) - Return a new DataFrame containing the union of rows in this and another DataFrame.
DataFrame.unionByName(other[, …]) - Returns a new DataFrame containing a union of rows in this and another DataFrame.
DataFrame.unpersist([blocking]) - Marks the DataFrame as non-persistent, and removes all blocks for it from memory and disk.
DataFrame.unpivot(ids, values, …) - Unpivot a DataFrame from wide format to long format, optionally leaving identifier columns set.
DataFrame.where(condition) - where() is an alias for filter().
DataFrame.withColumn(colName, col) - Returns a new DataFrame by adding a column or replacing the existing column that has the same name.
DataFrame.withColumns(*colsMap) - Returns a new DataFrame by adding multiple columns or replacing the existing columns that have the same names.
DataFrame.withColumnRenamed(existing, new) - Returns a new DataFrame by renaming an existing column.
DataFrame.withColumnsRenamed(colsMap) - Returns a new DataFrame by renaming multiple columns.
DataFrame.withMetadata(columnName, metadata) - Returns a new DataFrame by updating an existing column with metadata.
DataFrame.withWatermark(eventTime, …) - Defines an event time watermark for this DataFrame.
DataFrame.write - Interface for saving the content of the non-streaming DataFrame out into external storage.
DataFrame.writeStream - Interface for saving the content of the streaming DataFrame out into external storage.
DataFrame.writeTo(table) - Create a write configuration builder for v2 sources.
DataFrame.pandas_api([index_col]) - Converts the existing DataFrame into a pandas-on-Spark DataFrame.
DataFrameNaFunctions.drop([how, thresh, subset]) - Returns a new DataFrame omitting rows with null values.
DataFrameNaFunctions.fill(value[, subset]) - Replace null values, alias for na.fill().
DataFrameNaFunctions.replace(to_replace[, …]) - Returns a new DataFrame replacing a value with another value.
DataFrameStatFunctions.approxQuantile(col, …) - Calculates the approximate quantiles of numerical columns of a DataFrame.
DataFrameStatFunctions.corr(col1, col2[, method]) - Calculates the correlation of two columns of a DataFrame as a double value.
DataFrameStatFunctions.cov(col1, col2) - Calculate the sample covariance for the given columns, specified by their names, as a double value.
DataFrameStatFunctions.crosstab(col1, col2) - Computes a pair-wise frequency table of the given columns.
DataFrameStatFunctions.freqItems(cols[, support]) - Finding frequent items for columns, possibly with false positives.
DataFrameStatFunctions.sampleBy(col, fractions) - Returns a stratified sample without replacement based on the fraction given on each stratum.
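
A closing sketch chaining several DataFrame methods (hypothetical data; assumes an active SparkSession named spark):

    from pyspark.sql import functions as F

    df = spark.createDataFrame([("a", 1), ("a", 3), ("b", None)], ["key", "val"])

    out = (
        df.na.fill({"val": 0})                      # DataFrameNaFunctions.fill
          .withColumn("doubled", F.col("val") * 2)  # add a derived column
          .where(F.col("key") == "a")               # where() is an alias for filter()
          .groupBy("key")
          .agg(F.sum("doubled").alias("total"))
          .orderBy("key")
    )
    out.show()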
