diff --git a/UseCases/Graph_Analysis/Graph_Analysis_PY_SQL.ipynb b/UseCases/Graph_Analysis/Graph_Analysis_PY_SQL.ipynb index d9a59a58..245bc87b 100644 --- a/UseCases/Graph_Analysis/Graph_Analysis_PY_SQL.ipynb +++ b/UseCases/Graph_Analysis/Graph_Analysis_PY_SQL.ipynb @@ -19,21 +19,21 @@ "id": "05bc242a-7ee9-4b33-b837-ff52a9cd76d2", "metadata": {}, "source": [ - "
Introduction
\n", + "Introduction
\n", "\n", - "Call Detail Records (CDRs) contain valuable information about communication patterns and interactions between users. By leveraging community detection algorithms on CDR data, businesses can gain insights into the underlying network structure and uncover meaningful communities within their user base.
\n", + "Call Detail Records (CDRs) contain valuable information about communication patterns and interactions between users. By leveraging community detection algorithms on CDR data, businesses can gain insights into the underlying network structure and uncover meaningful communities within their user base.
\n", "\n", - "The objective of this analysis is to identify distinct communities or groups of users within the CDR network. Communities are like smaller social circles or friend groups within the larger group of friends. It helps us understand how people naturally form different clusters based on their interactions and relationships. This analysis also identifies influential people in the graph.\n", + "
The objective of this analysis is to identify distinct communities or groups of users within the CDR network. Communities are like smaller social circles or friend groups within a larger group of friends. Detecting them helps us understand how people naturally form clusters based on their interactions and relationships. This analysis also identifies influential people in the graph.\n",
"
\n",
"
\n",
"By grouping users into communities based on their calling patterns, the business can better understand the dynamics and relationships among users, leading to several potential applications and benefits like Customer Segmentation, Fraud Detection, Network Optimization, Cross-Selling and Upselling Opportunities, Customer Support and Retention, etc.\n",
"
In this demo, we'll be using Script Table Operator(STO) to execute custom python scripts on Vantage. The STO operates by executing R and Python scripts from the command line of the Advanced SQL Engine underlying operating system, according to\n", + "
In this demo, we'll use the Script Table Operator (STO) to execute custom Python scripts on Vantage. The STO executes R and Python scripts from the command line of the Advanced SQL Engine's underlying operating system, according to\n", "the following sequence:\n", "
\n", "\n", - "Downloading and installing additional software needed" + "
Downloading and installing additional software needed" ] }, { @@ -64,8 +64,7 @@ "source": [ "%%capture\n", "# '%%capture' suppresses the display of installation steps of the following packages\n", - "!pip install python-louvain\n", - "!pip install mplcursors" + "!pip install python-louvain" ] }, { @@ -74,10 +73,10 @@ "metadata": {}, "source": [ "
Note: The above statements may need to be uncommented if you run the notebooks on a platform other than ClearScape Analytics Experience that does not have the libraries installed. If you uncomment those installs, be sure to restart the kernel after executing those lines to bring the installed libraries into memory. The simplest way to restart the Kernel is by typing zero zero: 0 0
\n", + "Note: The above statements may need to be uncommented if you run the notebooks on a platform other than ClearScape Analytics Experience that does not have the libraries installed. If you uncomment those installs, be sure to restart the kernel after executing those lines to bring the installed libraries into memory. The simplest way to restart the Kernel is by typing zero zero: 0 0
\n", "Here, we import the required libraries, set environment variables and environment paths (if required).
" + "Here, we import the required libraries, set environment variables and environment paths (if required).
" ] }, { @@ -94,9 +93,7 @@ "import teradataml\n", "\n", "import community\n", - "import matplotlib.pyplot as plt\n", "import warnings\n", - "import mplcursors\n", "\n", "# Suppress warnings\n", "warnings.filterwarnings('ignore')" @@ -107,9 +104,9 @@ "id": "ced12adf-b2c4-49aa-b5c8-67dd090ff080", "metadata": {}, "source": [ - "You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.
" + "You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.
" ] }, { @@ -140,7 +137,7 @@ "id": "3732eef0-96bd-417b-95e3-403c1999afa8", "metadata": {}, "source": [ - "Begin running steps with Shift + Enter keys.
" + "Begin running steps with Shift + Enter keys.
" ] }, { @@ -148,8 +145,8 @@ "id": "1a96f1c6-cb43-440f-9682-a851fe09165f", "metadata": {}, "source": [ - "Getting Data for This Demo
\n", - "We have provided data for this demo on cloud storage. You can either run the demo using foreign tables to access the data without any storage on your environment or download the data to local storage, which may yield faster execution. Still, there could be considerations of available storage. Two statements are in the following cell, and one is commented out. You may switch which mode you choose by changing the comment string.
" + "Getting Data for This Demo
\n", + "We have provided data for this demo on cloud storage. You can either run the demo using foreign tables to access the data without any storage on your environment or download the data to local storage, which may yield faster execution. Still, there could be considerations of available storage. Two statements are in the following cell, and one is commented out. You may switch which mode you choose by changing the comment string.
" ] }, { @@ -168,7 +165,7 @@ "id": "658a3111-ab72-451c-85c2-de0a5f7a0188", "metadata": {}, "source": [ - "Next is an optional step – if you want to see the status of databases/tables created and space used.
" + "Next is an optional step – if you want to see the status of databases/tables created and space used.
" ] }, { @@ -186,9 +183,9 @@ "id": "e601e5e6-c6b3-466a-9a7b-f5ff16d2610a", "metadata": {}, "source": [ - "Below is a sample of the data provided, where 'fromuserid' represents the caller and 'touserid' represents the callee.
" + "Below is a sample of the data provided, where 'fromuserid' represents the caller and 'touserid' represents the callee.
" ] }, { @@ -206,10 +203,10 @@ "id": "97340221-0690-4a0f-814b-5a25c14a5c8a", "metadata": {}, "source": [ - "Community Detection using Louvain Algorithm
\n", - "The below cell will perform the following steps:
\n", - "Community Detection using Louvain Algorithm
\n", + "The below cell will perform the following steps:
\n", + "The below cell will run the script installed in the above step and store the result in the communities variable.
" + "The below cell will run the script installed in the above step and store the result in the communities variable.
" ] }, { @@ -279,7 +276,7 @@ "id": "f15a9780-7fb3-4994-9d77-0e4d7a6be492", "metadata": {}, "source": [ - "\n", + "
\n",
"We have a big group of customers, and they all like to talk to each other on the phone. When we talk about communities, we are interested in finding groups of users who are closely connected to each other and interact more frequently among themselves.\n",
"
\n",
"
\n",
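The Louvain method looks for a grouping of users that maximizes modularity, a score that is high when most calls stay inside communities. The following is a minimal sketch of that score only, not the script this notebook runs through STO. The function name `modularity`, the edge list, and the two partitions are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal_edges = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal_edges[community_of[u]] += 1

    community_degree = defaultdict(int)
    for node, d in degree.items():
        community_degree[community_of[node]] += d

    return sum(
        internal_edges[c] / m - (community_degree[c] / (2 * m)) ** 2
        for c in community_degree
    )


# Hypothetical call graph: two groups of three users joined by one call (3, 4).
calls = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {user: "A" for user in range(1, 7)}

q_two_groups = modularity(calls, two_groups)
q_one_group = modularity(calls, one_group)
print(q_two_groups > q_one_group)
```

Splitting the users into the two tightly knit groups scores higher than putting everyone in one community, which is the kind of improvement Louvain searches for.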
@@ -291,13 +288,13 @@
"id": "b95dfc16-8e0d-4729-80d7-58d582ab8a75",
"metadata": {},
"source": [
- "
Eigenvector Centrality
\n", - "Eigenvector Centrality is an algorithm that measures the transitive influence of nodes. Relationships originating from high-scoring nodes contribute more to the score of a node than connections from low-scoring nodes. A high eigenvector score means that a node is connected to many nodes who themselves have high scores.\n", + "
Eigenvector Centrality
\n", + "Eigenvector Centrality is an algorithm that measures the transitive influence of nodes. Relationships originating from high-scoring nodes contribute more to the score of a node than connections from low-scoring nodes. A high eigenvector score means that a node is connected to many nodes that themselves have high scores.\n",
"
\n",
"
\n",
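To make the definition concrete, here is a minimal sketch of eigenvector centrality computed by power iteration in plain Python. It is an illustration under stated assumptions, not the script this notebook installs: the function name `eigenvector_centrality`, the adjacency dictionary, and the user names are hypothetical. Each round adds a node's own score to the sum of its neighbours' scores (a common trick to avoid oscillation) and then rescales.

```python
import math


def eigenvector_centrality(adjacency, iterations=100, tol=1e-9):
    """Power iteration on an undirected graph given as {node: set_of_neighbours}."""
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # New score = own score + scores of all neighbours.
        updated = {
            node: scores[node] + sum(scores[nb] for nb in adjacency[node])
            for node in adjacency
        }
        # Rescale so the scores keep unit Euclidean length.
        norm = math.sqrt(sum(value * value for value in updated.values()))
        updated = {node: value / norm for node, value in updated.items()}
        # Stop once the scores no longer change noticeably.
        if sum(abs(updated[n] - scores[n]) for n in adjacency) < tol:
            return updated
        scores = updated
    return scores


# Hypothetical call graph: "hub" talks to a, b and c; c also talks to d.
call_graph = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"c"},
}

influence = eigenvector_centrality(call_graph)
most_influential = max(influence, key=influence.get)
print(most_influential)
```

In this toy graph, `a` and `d` each have a single contact, yet `a` scores higher because its contact is the well-connected `hub`.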
"The below cell will perform the following steps:
The below cell will run the script installed in the above step and store the result in the centralities variable.
" + "The below cell will run the script installed in the above step and store the result in the centralities variable.
" ] }, { @@ -363,7 +360,7 @@ "id": "b51bfbc5-66a4-4eda-b7f8-b5a7bd93abb0", "metadata": {}, "source": [ - "\n", + "
\n", "We have a large group of customers, and they like to talk to each other on the phone. Some of the customers are very popular and talk to lots of other customers, while others talk to only a few customers. Eigenvector centrality is a way to measure how important or popular each person is in this group based on the phone calls they make. So, in our graph with phone calls, eigenvector centrality helps us identify the people who are most connected to others and who have important connections. These people are considered more influential or popular in the group. This information can be used to efficiently target the influential users and, in turn, their respective communities.
" ] }, @@ -372,13 +369,13 @@ "id": "4e691b4f-0599-40cc-992d-2d12b02544a1", "metadata": {}, "source": [ - "Betweenness Centrality
\n", - "Betweenness centrality is a way of detecting the amount of influence a node has over the flow of information in a graph. It is often used to find nodes that serve as a bridge from one part of a graph to another. The algorithm calculates shortest paths between all pairs of nodes in a graph.\n", + "
Betweenness Centrality
\n", + "Betweenness centrality is a way of detecting the amount of influence a node has over the flow of information in a graph. It is often used to find nodes that serve as a bridge from one part of a graph to another. The algorithm calculates shortest paths between all pairs of nodes in a graph.\n",
"
\n",
"
\n",
"The below cell will perform the following steps:
The below cell will run the script installed in the above step and store the result in the betweenness variable.
" + "The below cell will run the script installed in the above step and store the result in the betweenness variable.
" ] }, { @@ -444,7 +441,7 @@ "id": "7ae8c526-04af-441f-aa8f-2a8180b49c60", "metadata": {}, "source": [ - "\n", + "
\n",
"We have a group of customers, and they all like to talk to each other on the phone. Betweenness centrality is a way to measure how important or influential you are in this group based on the phone calls made by everyone. So, if you have a lot of customers who rely on you to connect with each other, it means you have high betweenness centrality. You're like a central hub in the group, helping users communicate and making sure everyone stays connected.\n",
"
\n",
"
\n",
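The "central hub" idea can be sketched in plain Python with Brandes' algorithm, which counts, for every pair of users, the share of shortest paths that pass through each node. This is an illustrative sketch, not the script the notebook executes on Vantage; the function name `betweenness_centrality` and the small call graph are hypothetical, and the scores are left unnormalised.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalised betweenness for an undirected, unweighted graph."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths.
        visit_order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        path_count[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            visit_order.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    path_count[neighbour] += path_count[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, accumulating path shares.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(visit_order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Every undirected pair was counted once from each end.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical call graph: users 1-3 and 4-6 form two groups bridged by 3 and 4.
call_graph = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}

bridging = betweenness_centrality(call_graph)
print(sorted(bridging, key=bridging.get, reverse=True)[:2])
```

Users 3 and 4 come out on top because every shortest path between the two groups runs through them.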
@@ -456,13 +453,13 @@
"id": "9bf951e4-016b-4fbf-b82f-29e3aac86ca0",
"metadata": {},
"source": [
- "
Closeness Centrality
\n", - "Closeness centrality is a way of detecting nodes that are able to spread information very efficiently through a graph. The closeness centrality of a node measures its average farness (inverse distance) to all other nodes. Nodes with a high closeness score have the shortest distances to all other nodes.\n", + "
Closeness Centrality
\n", + "Closeness centrality is a way of detecting nodes that are able to spread information very efficiently through a graph. The closeness centrality of a node is the inverse of its average farness (shortest-path distance) to all other nodes. Nodes with a high closeness score have the shortest distances to all other nodes.\n",
"
\n",
"
\n",
"The below cell will perform the following steps:
The below cell will run the script installed in the above step and store the result in the closeness variable.
" + "The below cell will run the script installed in the above step and store the result in the closeness variable.
" ] }, { @@ -528,7 +525,7 @@ "id": "418b16e6-6090-4169-ad45-066df91b56d3", "metadata": {}, "source": [ - "\n", + "
\n",
"We have a group of customers, and you all enjoy talking to each other on the phone. Closeness centrality is a way to measure how close or connected you are to all users in the group. When we talk about closeness centrality, we are interested in figuring out how quickly you can reach all the users when you make a phone call. If you can reach the users easily and quickly, then you have high closeness centrality.\n",
"
\n",
"
\n",
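"How quickly you can reach all the users" can be sketched as the number of other users divided by the total number of hops needed to reach them. The sketch below is a plain-Python illustration, not the notebook's STO script; the function name `closeness_centrality` and the chain of users are hypothetical, and it assumes a connected, unweighted graph.

```python
from collections import deque


def closeness_centrality(adjacency, node):
    """(reachable nodes - 1) / (sum of shortest-path distances from node)."""
    # Breadth-first search gives the hop count to every reachable user.
    distance = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    total = sum(distance.values())
    # An isolated node reaches nobody, so its score is defined as 0 here.
    return (len(distance) - 1) / total if total else 0.0


# Hypothetical chain of callers: a - b - c - d - e.
call_chain = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

closeness_middle = closeness_centrality(call_chain, "c")
closeness_end = closeness_centrality(call_chain, "a")
print(closeness_middle > closeness_end)
```

The user in the middle of the chain needs fewer hops to reach everyone, so it gets the higher closeness score.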
@@ -541,91 +538,114 @@
"id": "2b5b1d8b-18b5-4236-b05e-83d265030882",
"metadata": {},
"source": [
- "
Note: Please hover over the nodes to see additional information.
\n", + "Note: Please hover over the nodes to see additional information.
\n", "The above graph displays the data in graph format. On hovering on the node, you might see Customer ID and the EigenVector Centrality Score i.e., the influence score. The larger nodes are influential and are connected to other influential nodes. These are the leader nodes of the respective communities.\n", + "
The above graph displays the data in graph format. Hovering over a node shows the Customer ID, the Eigenvector Centrality score (i.e., the influence score), and the Community ID. The larger nodes are influential and are connected to other influential nodes. These are the leader nodes of the respective communities.\n",
"
\n",
"
Targeting the leader of the communities in a telecom dataset can provide several benefits to a telecom company. Here are some ways it can help:
In summary, targeting leaders of telecom communities enables a company to tap into their influence, leverage word-of-mouth marketing, gain valuable insights, foster partnerships, and enhance customer satisfaction. These efforts can result in increased brand visibility, customer acquisition, and long-term success for the telecom company.
\n" + "In summary, targeting leaders of telecom communities enables a company to tap into their influence, leverage word-of-mouth marketing, gain valuable insights, foster partnerships, and enhance customer satisfaction. These efforts can result in increased brand visibility, customer acquisition, and long-term success for the telecom company.
\n" ] }, { @@ -667,8 +687,8 @@ "id": "4a811fed-0b58-4770-ae1f-aae0e6b9be23", "metadata": {}, "source": [ - "Databases and Tables
\n", - "The following code will clean up tables and databases created above.
" + "Databases and Tables
\n", + "The following code will clean up tables and databases created above.
" ] }, { @@ -732,7 +752,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.10" + "version": "3.11.14" } }, "nbformat": 4,