Python - Pandas Merging, Joining, and Concatenating
A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labelled axes (rows and columns);
that is, data is aligned in a tabular fashion in rows and columns. DataFrames can be combined in several ways: the pd.concat() function
concatenates DataFrames, while the df.merge() and df.join() methods merge and join them.
Concatenating DataFrame
To concatenate DataFrames, we use the pd.concat() function. DataFrames can be concatenated in several different ways, as the examples
that follow show.
# using the pd.concat() function
frames = [df, df1]
res1 = pd.concat(frames)
res1
Output :
As the output shows, concatenating the two DataFrames yields a single DataFrame.
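The basic concatenation above can be sketched end to end as follows. The two sample frames are hypothetical stand-ins, since the original definitions of df and df1 are not shown here.

```python
import pandas as pd

# Hypothetical sample frames; names and values are illustrative only.
df = pd.DataFrame({"Name": ["Jai", "Princi"], "Age": [27, 24]}, index=[0, 1])
df1 = pd.DataFrame({"Name": ["Abhi", "Ayushi"], "Age": [17, 14]}, index=[2, 3])

# Stack the rows of df1 below the rows of df (axis=0 is the default).
res1 = pd.concat([df, df1])
print(res1)
```

Because both frames have the same columns, the result has four rows and the original two columns.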
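When concatenating along one axis, the labels on the other axis can be handled in one of the following ways:
Taking the union of them all, join='outer'. This is the default option, as it results in zero information loss.
Taking the intersection, join='inner'.
Using a specific index. Older pandas versions accepted a join_axes argument for this; it was removed in pandas 1.0, so reindex the result instead.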
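A minimal sketch of the three options listed above, assuming two hypothetical frames whose indexes only partly overlap. The reindex call stands in for the removed join_axes argument.

```python
import pandas as pd

# Hypothetical frames with partly overlapping indexes.
df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav"]}, index=[0, 1, 2])
df1 = pd.DataFrame({"City": ["Delhi", "Kanpur", "Agra"]}, index=[1, 2, 3])

# Intersection of the indexes: only labels present in both frames survive.
inner = pd.concat([df, df1], axis=1, join="inner")

# Union of the indexes (the default): missing cells are filled with NaN.
outer = pd.concat([df, df1], axis=1, join="outer")

# A specific index: concatenate, then keep only the labels of df.
aligned = pd.concat([df, df1], axis=1).reindex(df.index)

print(inner, outer, aligned, sep="\n\n")
```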
res2
Output :
As the output shows, we get the intersection of the DataFrames.
res2
Output :
As the output shows, we get the union of the DataFrames.
# using a specific index (join_axes was removed in pandas 1.0, so reindex instead)
res3 = pd.concat([df, df1], axis=1).reindex(df.index)
res3
Output :
# DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result
res = pd.concat([df, df1])
res
Output :
# using ignore_index
res = pd.concat([df, df1], ignore_index=True)
res
Output :
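To make the effect of ignore_index concrete, here is a small sketch that assumes two hypothetical frames that both use the default 0, 1 index.

```python
import pandas as pd

# Hypothetical frames that share the same default index labels.
df = pd.DataFrame({"Name": ["Jai", "Princi"]})
df1 = pd.DataFrame({"Name": ["Abhi", "Ayushi"]})

# Without ignore_index the original labels are kept, so they repeat.
kept = pd.concat([df, df1])

# With ignore_index=True the result gets a fresh 0..n-1 index.
renumbered = pd.concat([df, df1], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```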
# using keys
frames = [df, df1]
Output :
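A sketch of concatenation with group keys. The key labels "first" and "second" and the sample frames are hypothetical choices for illustration, not the values used in the original example.

```python
import pandas as pd

# Hypothetical sample frames.
df = pd.DataFrame({"Name": ["Jai", "Princi"]})
df1 = pd.DataFrame({"Name": ["Abhi", "Ayushi"]})

# keys adds an outer index level that records which input each row came from.
res = pd.concat([df, df1], keys=["first", "second"])
print(res)

# The outer level can then be used to pull one input back out.
print(res.loc["second"])
```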
# creating a series
s1 = pd.Series([1000, 2000, 3000, 4000], name='Salary')
res
Output :
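A Series like the one created above can be combined with a DataFrame. The sketch below assumes the intent is to attach it as a new column with axis=1; the DataFrame is a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical frame with the same default index as the Series.
df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav", "Anuj"]})

# The Series name becomes the column name in the result.
s1 = pd.Series([1000, 2000, 3000, 4000], name="Salary")

# axis=1 places the Series next to the existing columns, aligned on the index.
res = pd.concat([df, s1], axis=1)
print(res)
```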
Merging DataFrame
pandas provides high-performance in-memory merging and joining. When we need to combine very large DataFrames, joins serve as a
powerful way to perform these operations swiftly. A merge operates on two DataFrames at a time, referred to as the left and right tables. The key is
the common column on which the two DataFrames are joined. It is good practice to use keys whose values are unique throughout the column, to
avoid unintended duplication of rows. pandas provides a single function, merge(), as the entry point for all standard database join operations
between DataFrame objects.
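A minimal sketch of a key-based merge, and of the row duplication that non-unique keys cause. The frames and the column name "key" are hypothetical.

```python
import pandas as pd

# Hypothetical left and right tables sharing a "key" column.
left = pd.DataFrame({"key": ["K0", "K1", "K2"], "Name": ["Jai", "Princi", "Gaurav"]})
right = pd.DataFrame({"key": ["K0", "K1", "K2"], "City": ["Delhi", "Kanpur", "Agra"]})

# Unique keys on both sides: one output row per key.
res = pd.merge(left, right, on="key")
print(res)

# A non-unique key on the right repeats the matching left row.
dup = pd.DataFrame({"key": ["K0", "K0"], "Score": [1, 2]})
res_dup = pd.merge(left, dup, on="key")
print(res_dup)
```

When unique keys are expected, passing validate="one_to_one" to merge() makes pandas raise an error instead of silently duplicating rows.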
There are four basic ways to handle the join (inner, left, right, and outer), depending on which rows must retain their data.
res
Output :
res1
Output :
Merge method   SQL join name      Description
left           LEFT OUTER JOIN    Use keys from left frame only
right          RIGHT OUTER JOIN   Use keys from right frame only
outer          FULL OUTER JOIN    Use union of keys from both frames
inner          INNER JOIN         Use intersection of keys from both frames
Now we set how='left' in order to use keys from the left frame only.
res
Output :
Now we set how='right' in order to use keys from the right frame only.
res1
Output :
Now we set how='outer' in order to get the union of keys from both DataFrames.
res2
Output :
Now we set how='inner' in order to get the intersection of keys from both DataFrames.
res3
Output :
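The four how options can be compared side by side with a sketch like the one below. The frames are hypothetical and deliberately share only two of their keys.

```python
import pandas as pd

# Hypothetical tables: K0 exists only on the left, K3 only on the right.
left = pd.DataFrame({"key": ["K0", "K1", "K2"], "Age": [27, 24, 22]})
right = pd.DataFrame({"key": ["K1", "K2", "K3"], "City": ["Kanpur", "Agra", "Pune"]})

# Run the same merge with each join type and collect the results.
results = {}
for how in ["left", "right", "outer", "inner"]:
    results[how] = pd.merge(left, right, on="key", how=how)
    print(how, results[how]["key"].tolist())
```

Rows whose key has no match on the other side keep their own columns and get NaN in the columns that come from the other frame.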
Joining DataFrame
To join DataFrames, we use the .join() method, which combines the columns of two potentially differently indexed DataFrames
into a single result DataFrame.
# joining dataframe
res = df.join(df1)
res
Output :
# getting union
res1 = df.join(df1, how='outer')
res1
Output :
res2
Output :
result
Output :
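The join calls in this section can be sketched with self-contained data as follows. The frames, the index labels, and the "key" column are hypothetical; the last call assumes the goal is to match a column of the left frame against the index of the right frame via the on parameter.

```python
import pandas as pd

# Hypothetical frames indexed by partly overlapping labels.
df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav"]}, index=["K0", "K1", "K2"])
df1 = pd.DataFrame({"City": ["Delhi", "Agra", "Pune"]}, index=["K0", "K2", "K3"])

# Default is a left join on the index: all labels of df are kept.
res = df.join(df1)

# how='outer' keeps the union of both indexes.
res1 = df.join(df1, how="outer")

# on='key' matches a column of the caller against the index of the other frame.
people = pd.DataFrame({"Name": ["Jai", "Gaurav", "Anuj"], "key": ["K0", "K2", "K9"]})
res2 = people.join(df1, on="key")

print(res, res1, res2, sep="\n\n")
```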