Getting Started With Graph Analysis in Python With Pandas and Networkx

This document provides a tutorial on how to perform graph analysis in Python using Pandas and NetworkX. It demonstrates how to create a graph from a Pandas dataframe by connecting individuals who share the same phone number. The data is cleaned to remove duplicate and self connections. The cleaned data is then used to construct a graph object in NetworkX, which can be analyzed using various graph algorithms.

Uploaded by

ante mitar
Getting started with graph analysis in Python with pandas and networkx |... https://towardsdatascience.com/getting-started-with-graph-analysis-in-py...

1 of 8 4/25/2021, 3:23 PM
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6],
                   'First Name': ['Felix', 'Jean', 'James', 'Daphne', 'James', 'Peter'],
                   'Family Name': ['Revert', 'Durand', 'Wright', 'Hull', 'Conrad', 'Donovan'],
                   'Phone number': ['+33 6 12 34 56 78', '+33 7 00 00 00 00', '+33 6 12 34 56 78'
                   'Email': ['felix.revert@gmail.com', 'jean.durand@gmail.com', 'j.custom@gmail.com'

column_edge = 'Phone number'
column_ID = 'ID'

# Keep only the ID and phone number columns, dropping missing numbers and duplicate rows.
data_to_merge = df[[column_ID, column_edge]].dropna(subset=[column_edge]).drop_duplicates()

# To create connections between people who have the same number,
# join the data with itself on the 'Phone number' column.
data_to_merge = data_to_merge.merge(
    data_to_merge[[column_ID, column_edge]].rename(columns={column_ID: column_ID + "_2"}),
    on=column_edge
)
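
To see what the self-join produces, here is a minimal sketch on a toy table. The three rows and the placeholder phone values "A" and "B" are hypothetical and are not the article's data; only the merge pattern is the same.

```python
import pandas as pd

# Hypothetical toy data: persons 1 and 3 share a number, person 2 has a unique one.
toy = pd.DataFrame({
    "ID": [1, 2, 3],
    "Phone number": ["A", "B", "A"],
})

# Join the table with itself on the phone number (inner join by default).
# Every pair of rows sharing a number appears, including each row paired with itself.
pairs = toy.merge(
    toy.rename(columns={"ID": "ID_2"}),
    on="Phone number",
)

print(pairs)
```

The result has five rows: three self connections (1-1, 2-2, 3-3) and the shared number listed in both directions (1-3 and 3-1). This is why a cleaning step is needed after the join.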

# By joining the data with itself, people will have a connection with themselves.
# Remove self connections, to keep only connected people who are different.
d = data_to_merge[data_to_merge[column_ID] != data_to_merge[column_ID + "_2"]] \
    .dropna()[[column_ID, column_ID + "_2", column_edge]]

# To avoid counting connections twice (person 1 connected to person 2 and person 2 connected to person 1),
# we force the first ID to be lower than ID_2.
d = d[d[column_ID] < d[column_ID + "_2"]]
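
As a small illustration of the cleaning step, the sketch below applies it to a hypothetical self-joined table (the values are made up for the example). It also shows that, when the IDs are comparable, a single filter keeping rows where the first ID is strictly lower than the second removes self connections and mirrored duplicates at once.

```python
import pandas as pd

# Hypothetical output of a self-join: self connections plus the pair 1-3 in both directions.
pairs = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3],
    "ID_2": [1, 3, 2, 1, 3],
    "Phone number": ["A", "A", "B", "A", "A"],
})

# ID < ID_2 is False for self connections (ID == ID_2) and for the mirrored
# copy of each pair (ID > ID_2), so only one row per connection remains.
edges = pairs[pairs["ID"] < pairs["ID_2"]].reset_index(drop=True)

print(edges)
```

Only the row connecting person 1 to person 3 survives; person 2, who shares a number with nobody, no longer appears in the edge list.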

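import networkx as nx

# Build an undirected graph with one edge per pair of people sharing a phone number.
G = nx.from_pandas_edgelist(df=d, source=column_ID, target=column_ID+'_2', edge_attr=column_edge)

# Add every ID as a node so that people without any connection are also in the graph.
G.add_nodes_from(nodes_for_adding=df.ID.tolist())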
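
Once the graph exists, standard NetworkX functions can be run on it. The following is a minimal sketch, assuming a hypothetical edge list of two shared numbers and five individuals; the analyses chosen (connected components, degrees, isolated nodes) are illustrative picks and not necessarily the ones used in the original article.

```python
import networkx as nx
import pandas as pd

# Hypothetical cleaned edge list: 1-3 share number "A", 2-4 share number "B".
edges = pd.DataFrame({
    "ID": [1, 2],
    "ID_2": [3, 4],
    "Phone number": ["A", "B"],
})

G = nx.from_pandas_edgelist(edges, source="ID", target="ID_2", edge_attr="Phone number")
G.add_nodes_from([1, 2, 3, 4, 5])  # person 5 has no connection

# Groups of people linked directly or indirectly by shared numbers.
components = sorted(sorted(c) for c in nx.connected_components(G))

# Number of connections per person.
degrees = dict(G.degree())

# People connected to nobody.
isolated = list(nx.isolates(G))

print(components)
print(degrees)
print(isolated)
```

The edge attribute stays available on each connection, for example `G[1][3]["Phone number"]`, which makes it possible to trace why two people are linked.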
