Basic_python_assignment

The document is an assignment that includes code for data manipulation using pandas and numpy in Python. It presents a table with 100 rows and 15 columns, detailing various metrics such as start and end times, machine IDs, memory assignments, and CPU usage statistics. The data appears to be related to machine performance metrics over time.

Uploaded by

monikanekkanti
Copyright
© © All Rights Reserved

{

"cells": [
{
"cell_type": "markdown",
"id": "baa1d9c2",
"metadata": {},
"source": [
"# Assignment1"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ceaf6159",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "22c4a11b",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0.1</th>\n",
" <th>Unnamed: 0</th>\n",
" <th>start_time</th>\n",
" <th>end_time</th>\n",
" <th>machine_id</th>\n",
" <th>collection_id</th>\n",
" <th>instance_index</th>\n",
" <th>assign_memory</th>\n",
" <th>average_memory</th>\n",
" <th>average_cpus</th>\n",
" <th>max_cpus</th>\n",
" <th>max_memory</th>\n",
" <th>page_cache_memory</th>\n",
" <th>cpi</th>\n",
" <th>mpi</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>65</td>\n",
" <td>65</td>\n",
" <td>1032600000000</td>\n",
" <td>1032900000000</td>\n",
" <td>376004609076</td>\n",
" <td>106401189869</td>\n",
" <td>161</td>\n",
" <td>0.176270</td>\n",
" <td>0.001699</td>\n",
" <td>0.000278</td>\n",
" <td>0.001360</td>\n",
" <td>0.001699</td>\n",
" <td>0.000261</td>\n",
" <td>2.442438</td>\n",
" <td>0.018496</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>67</td>\n",
" <td>67</td>\n",
" <td>888900000000</td>\n",
" <td>889200000000</td>\n",
" <td>376470209477</td>\n",
" <td>106401189869</td>\n",
" <td>200</td>\n",
" <td>0.176270</td>\n",
" <td>0.002682</td>\n",
" <td>0.000296</td>\n",
" <td>0.002544</td>\n",
" <td>0.002686</td>\n",
" <td>0.000269</td>\n",
" <td>2.953928</td>\n",
" <td>0.021430</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>69</td>\n",
" <td>69</td>\n",
" <td>2029200000000</td>\n",
" <td>2029500000000</td>\n",
" <td>376004609076</td>\n",
" <td>106401189869</td>\n",
" <td>161</td>\n",
" <td>0.176270</td>\n",
" <td>0.003353</td>\n",
" <td>0.000456</td>\n",
" <td>0.005371</td>\n",
" <td>0.003494</td>\n",
" <td>0.001480</td>\n",
" <td>3.007910</td>\n",
" <td>0.021317</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>71</td>\n",
" <td>71</td>\n",
" <td>1838100000000</td>\n",
" <td>1838400000000</td>\n",
" <td>375997457623</td>\n",
" <td>106401189869</td>\n",
" <td>260</td>\n",
" <td>0.176270</td>\n",
" <td>0.002041</td>\n",
" <td>0.000289</td>\n",
" <td>0.002911</td>\n",
" <td>0.002041</td>\n",
" <td>0.001444</td>\n",
" <td>2.995649</td>\n",
" <td>0.022582</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>75</td>\n",
" <td>75</td>\n",
" <td>837000000000</td>\n",
" <td>837300000000</td>\n",
" <td>376004609076</td>\n",
" <td>106401189869</td>\n",
" <td>161</td>\n",
" <td>0.176270</td>\n",
" <td>0.001915</td>\n",
" <td>0.000218</td>\n",
" <td>0.001211</td>\n",
" <td>0.001917</td>\n",
" <td>0.000260</td>\n",
" <td>2.076526</td>\n",
" <td>0.018584</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>1295</td>\n",
" <td>1295</td>\n",
" <td>1583400000000</td>\n",
" <td>1583700000000</td>\n",
" <td>376470209477</td>\n",
" <td>106401189869</td>\n",
" <td>200</td>\n",
" <td>0.176270</td>\n",
" <td>0.002590</td>\n",
" <td>0.000480</td>\n",
" <td>0.003487</td>\n",
" <td>0.002590</td>\n",
" <td>0.000289</td>\n",
" <td>3.711172</td>\n",
" <td>0.023036</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>1296</td>\n",
" <td>1296</td>\n",
" <td>1114200000000</td>\n",
" <td>1114500000000</td>\n",
" <td>376470209477</td>\n",
" <td>106401189869</td>\n",
" <td>200</td>\n",
" <td>0.176270</td>\n",
" <td>0.002064</td>\n",
" <td>0.000381</td>\n",
" <td>0.002338</td>\n",
" <td>0.002068</td>\n",
" <td>0.000274</td>\n",
" <td>3.421821</td>\n",
" <td>0.023051</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>1307</td>\n",
" <td>1307</td>\n",
" <td>507900000000</td>\n",
" <td>508200000000</td>\n",
" <td>375997113395</td>\n",
" <td>81683188857</td>\n",
" <td>240</td>\n",
" <td>0.018341</td>\n",
" <td>0.004532</td>\n",
" <td>0.008636</td>\n",
" <td>0.037292</td>\n",
" <td>0.004623</td>\n",
" <td>0.000096</td>\n",
" <td>1.463835</td>\n",
" <td>0.005757</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>1308</td>\n",
" <td>1308</td>\n",
" <td>327900000000</td>\n",
" <td>328200000000</td>\n",
" <td>375996998777</td>\n",
" <td>81683188857</td>\n",
" <td>240</td>\n",
" <td>0.018341</td>\n",
" <td>0.004707</td>\n",
" <td>0.008713</td>\n",
" <td>0.028961</td>\n",
" <td>0.004791</td>\n",
" <td>0.000096</td>\n",
" <td>5.973701</td>\n",
" <td>0.081686</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99</th>\n",
" <td>1312</td>\n",
" <td>1312</td>\n",
" <td>402000000000</td>\n",
" <td>402300000000</td>\n",
" <td>375996998774</td>\n",
" <td>81683188857</td>\n",
" <td>290</td>\n",
" <td>0.018341</td>\n",
" <td>0.004639</td>\n",
" <td>0.007591</td>\n",
" <td>0.036255</td>\n",
" <td>0.004738</td>\n",
" <td>0.000096</td>\n",
" <td>0.983670</td>\n",
" <td>0.002963</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>100 rows × 15 columns</p>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0.1 Unnamed: 0 start_time end_time machine_id \\\n",
"0 65 65 1032600000000 1032900000000 376004609076 \n",
"1 67 67 888900000000 889200000000 376470209477 \n",
"2 69 69 2029200000000 2029500000000 376004609076 \n",
"3 71 71 1838100000000 1838400000000 375997457623 \n",
"4 75 75 837000000000 837300000000 376004609076 \n",
".. ... ... ... ... ... \n",
"95 1295 1295 1583400000000 1583700000000 376470209477 \n",
"96 1296 1296 1114200000000 1114500000000 376470209477 \n",
"97 1307 1307 507900000000 508200000000 375997113395 \n",
"98 1308 1308 327900000000 328200000000 375996998777 \n",
"99 1312 1312 402000000000 402300000000 375996998774 \n",
"\n",
" collection_id instance_index assign_memory average_memory \\\n",
"0 106401189869 161 0.176270 0.001699 \n",
"1 106401189869 200 0.176270 0.002682 \n",
"2 106401189869 161 0.176270 0.003353 \n",
"3 106401189869 260 0.176270 0.002041 \n",
"4 106401189869 161 0.176270 0.001915 \n",
".. ... ... ... ... \n",
"95 106401189869 200 0.176270 0.002590 \n",
"96 106401189869 200 0.176270 0.002064 \n",
"97 81683188857 240 0.018341 0.004532 \n",
"98 81683188857 240 0.018341 0.004707 \n",
"99 81683188857 290 0.018341 0.004639 \n",
"\n",
" average_cpus max_cpus max_memory page_cache_memory cpi mpi \n",
"0 0.000278 0.001360 0.001699 0.000261 2.442438 0.018496 \n",
"1 0.000296 0.002544 0.002686 0.000269 2.953928 0.021430 \n",
"2 0.000456 0.005371 0.003494 0.001480 3.007910 0.021317 \n",
"3 0.000289 0.002911 0.002041 0.001444 2.995649 0.022582 \n",
"4 0.000218 0.001211 0.001917 0.000260 2.076526 0.018584 \n",
".. ... ... ... ... ... ... \n",
"95 0.000480 0.003487 0.002590 0.000289 3.711172 0.023036 \n",
"96 0.000381 0.002338 0.002068 0.000274 3.421821 0.023051 \n",
"97 0.008636 0.037292 0.004623 0.000096 1.463835 0.005757 \n",
"98 0.008713 0.028961 0.004791 0.000096 5.973701 0.081686 \n",
"99 0.007591 0.036255 0.004738 0.000096 0.983670 0.002963 \n",
"\n",
"[100 rows x 15 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#dataset1\n",
"usage_df=pd.read_csv(\"instanceUsage_data.csv\")\n",
"usage_df"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "807f586f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>time</th>\n",
" <th>status</th>\n",
" <th>machine_id</th>\n",
" <th>collection_id</th>\n",
" <th>instance_index</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>9</td>\n",
" <td>2.436050e+12</td>\n",
" <td>10</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.993900e+11</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10</td>\n",
" <td>1.046210e+12</td>\n",
" <td>3</td>\n",
" <td>3.760370e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>434</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21</td>\n",
" <td>2.222130e+12</td>\n",
" <td>6</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>434</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>37</td>\n",
" <td>3.251860e+11</td>\n",
" <td>3</td>\n",
" <td>3.764700e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>569</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>40</td>\n",
" <td>5.326920e+11</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>2033</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440085</th>\n",
" <td>4995895</td>\n",
" <td>5.299730e+11</td>\n",
" <td>3</td>\n",
" <td>3.764700e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440086</th>\n",
" <td>4995896</td>\n",
" <td>5.806460e+11</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440087</th>\n",
" <td>4995897</td>\n",
" <td>8.070960e+11</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440088</th>\n",
" <td>4995898</td>\n",
" <td>1.071880e+12</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440089</th>\n",
" <td>4995904</td>\n",
" <td>1.407150e+12</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.840760e+11</td>\n",
" <td>1077</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>440090 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 time status machine_id collection_id \\\n",
"0 9 2.436050e+12 10 3.759970e+11 3.993900e+11 \n",
"1 10 1.046210e+12 3 3.760370e+11 3.305870e+11 \n",
"2 21 2.222130e+12 6 3.759970e+11 3.305870e+11 \n",
"3 37 3.251860e+11 3 3.764700e+11 3.305870e+11 \n",
"4 40 5.326920e+11 3 3.759970e+11 3.305870e+11 \n",
"... ... ... ... ... ... \n",
"440085 4995895 5.299730e+11 3 3.764700e+11 3.744710e+11 \n",
"440086 4995896 5.806460e+11 3 3.759970e+11 3.744710e+11 \n",
"440087 4995897 8.070960e+11 3 3.759970e+11 3.744710e+11 \n",
"440088 4995898 1.071880e+12 3 3.759970e+11 3.744710e+11 \n",
"440089 4995904 1.407150e+12 3 3.759970e+11 3.840760e+11 \n",
"\n",
" instance_index \n",
"0 8 \n",
"1 434 \n",
"2 434 \n",
"3 569 \n",
"4 2033 \n",
"... ... \n",
"440085 8 \n",
"440086 8 \n",
"440087 8 \n",
"440088 8 \n",
"440089 1077 \n",
"\n",
"[440090 rows x 6 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#dataset2\n",
"event_df=pd.read_csv(\"instanceEvent_data (1).csv\")\n",
"event_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d1e8d5a2",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"## Question-1: Assign numeric values (as given in 'Data Description') to the variables in the 'STATUS' attribute. [Hint: Use dictionaries for this.] Repeat this using a normal function and a lambda function."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "588b23b0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>time</th>\n",
" <th>status</th>\n",
" <th>machine_id</th>\n",
" <th>collection_id</th>\n",
" <th>instance_index</th>\n",
" <th>dict_status</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>9</td>\n",
" <td>2.436050e+12</td>\n",
" <td>10</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.993900e+11</td>\n",
" <td>8</td>\n",
" <td>UPDATE RUNNING</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10</td>\n",
" <td>1.046210e+12</td>\n",
" <td>3</td>\n",
" <td>3.760370e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>434</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21</td>\n",
" <td>2.222130e+12</td>\n",
" <td>6</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>434</td>\n",
" <td>FINISH</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>37</td>\n",
" <td>3.251860e+11</td>\n",
" <td>3</td>\n",
" <td>3.764700e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>569</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>40</td>\n",
" <td>5.326920e+11</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>2033</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440085</th>\n",
" <td>4995895</td>\n",
" <td>5.299730e+11</td>\n",
" <td>3</td>\n",
" <td>3.764700e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440086</th>\n",
" <td>4995896</td>\n",
" <td>5.806460e+11</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440087</th>\n",
" <td>4995897</td>\n",
" <td>8.070960e+11</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440088</th>\n",
" <td>4995898</td>\n",
" <td>1.071880e+12</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440089</th>\n",
" <td>4995904</td>\n",
" <td>1.407150e+12</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.840760e+11</td>\n",
" <td>1077</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>440090 rows × 7 columns</p>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 time status machine_id collection_id \\\n",
"0 9 2.436050e+12 10 3.759970e+11 3.993900e+11 \n",
"1 10 1.046210e+12 3 3.760370e+11 3.305870e+11 \n",
"2 21 2.222130e+12 6 3.759970e+11 3.305870e+11 \n",
"3 37 3.251860e+11 3 3.764700e+11 3.305870e+11 \n",
"4 40 5.326920e+11 3 3.759970e+11 3.305870e+11 \n",
"... ... ... ... ... ... \n",
"440085 4995895 5.299730e+11 3 3.764700e+11 3.744710e+11 \n",
"440086 4995896 5.806460e+11 3 3.759970e+11 3.744710e+11 \n",
"440087 4995897 8.070960e+11 3 3.759970e+11 3.744710e+11 \n",
"440088 4995898 1.071880e+12 3 3.759970e+11 3.744710e+11 \n",
"440089 4995904 1.407150e+12 3 3.759970e+11 3.840760e+11 \n",
"\n",
" instance_index dict_status \n",
"0 8 UPDATE RUNNING \n",
"1 434 SCHEDULE \n",
"2 434 FINISH \n",
"3 569 SCHEDULE \n",
"4 2033 SCHEDULE \n",
"... ... ... \n",
"440085 8 SCHEDULE \n",
"440086 8 SCHEDULE \n",
"440087 8 SCHEDULE \n",
"440088 8 SCHEDULE \n",
"440089 1077 SCHEDULE \n",
"\n",
"[440090 rows x 7 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"# Using dictionary\n",
"status_dict = {\n",
" 0:'SUBMIT',\n",
" 1:'QUEUE',\n",
" 2:'ENABLE',\n",
" 3:'SCHEDULE',\n",
" 4:'EVICT',\n",
" 5:'FAIL',\n",
" 6:'FINISH',\n",
" 7:'KILL',\n",
" 8:'LOST',\n",
" 9:'UPDATE PENDING',\n",
" 10:'UPDATE RUNNING'\n",
"}\n",
"\n",
"event_df['dict_status'] = event_df['status'].map(status_dict)\n",
"event_df"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "49f789e1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>time</th>\n",
" <th>status</th>\n",
" <th>machine_id</th>\n",
" <th>collection_id</th>\n",
" <th>instance_index</th>\n",
" <th>dict_status</th>\n",
" <th>lambda_status</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>9</td>\n",
" <td>2.436050e+12</td>\n",
" <td>10</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.993900e+11</td>\n",
" <td>8</td>\n",
" <td>UPDATE RUNNING</td>\n",
" <td>UPDATE RUNNING</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10</td>\n",
" <td>1.046210e+12</td>\n",
" <td>3</td>\n",
" <td>3.760370e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>434</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21</td>\n",
" <td>2.222130e+12</td>\n",
" <td>6</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>434</td>\n",
" <td>FINISH</td>\n",
" <td>FINISH</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>37</td>\n",
" <td>3.251860e+11</td>\n",
" <td>3</td>\n",
" <td>3.764700e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>569</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>40</td>\n",
" <td>5.326920e+11</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>2033</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440085</th>\n",
" <td>4995895</td>\n",
" <td>5.299730e+11</td>\n",
" <td>3</td>\n",
" <td>3.764700e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440086</th>\n",
" <td>4995896</td>\n",
" <td>5.806460e+11</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440087</th>\n",
" <td>4995897</td>\n",
" <td>8.070960e+11</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440088</th>\n",
" <td>4995898</td>\n",
" <td>1.071880e+12</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440089</th>\n",
" <td>4995904</td>\n",
" <td>1.407150e+12</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.840760e+11</td>\n",
" <td>1077</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>440090 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 time status machine_id collection_id \\\n",
"0 9 2.436050e+12 10 3.759970e+11 3.993900e+11 \n",
"1 10 1.046210e+12 3 3.760370e+11 3.305870e+11 \n",
"2 21 2.222130e+12 6 3.759970e+11 3.305870e+11 \n",
"3 37 3.251860e+11 3 3.764700e+11 3.305870e+11 \n",
"4 40 5.326920e+11 3 3.759970e+11 3.305870e+11 \n",
"... ... ... ... ... ... \n",
"440085 4995895 5.299730e+11 3 3.764700e+11 3.744710e+11 \n",
"440086 4995896 5.806460e+11 3 3.759970e+11 3.744710e+11 \n",
"440087 4995897 8.070960e+11 3 3.759970e+11 3.744710e+11 \n",
"440088 4995898 1.071880e+12 3 3.759970e+11 3.744710e+11 \n",
"440089 4995904 1.407150e+12 3 3.759970e+11 3.840760e+11 \n",
"\n",
" instance_index dict_status lambda_status \n",
"0 8 UPDATE RUNNING UPDATE RUNNING \n",
"1 434 SCHEDULE SCHEDULE \n",
"2 434 FINISH FINISH \n",
"3 569 SCHEDULE SCHEDULE \n",
"4 2033 SCHEDULE SCHEDULE \n",
"... ... ... ... \n",
"440085 8 SCHEDULE SCHEDULE \n",
"440086 8 SCHEDULE SCHEDULE \n",
"440087 8 SCHEDULE SCHEDULE \n",
"440088 8 SCHEDULE SCHEDULE \n",
"440089 1077 SCHEDULE SCHEDULE \n",
"\n",
"[440090 rows x 8 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Use lambda function \n",
"status_dict = {\n",
"    0:'SUBMIT',\n",
"    1:'QUEUE',\n",
"    2:'ENABLE',\n",
"    3:'SCHEDULE',\n",
"    4:'EVICT',\n",
"    5:'FAIL',\n",
"    6:'FINISH',\n",
"    7:'KILL',\n",
"    8:'LOST',\n",
"    9:'UPDATE PENDING',\n",
"    10:'UPDATE RUNNING'\n",
"}\n",
"event_df['lambda_status'] = event_df['status'].map(lambda x: status_dict[x])\n",
"event_df"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6b7dd398",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>time</th>\n",
" <th>status</th>\n",
" <th>machine_id</th>\n",
" <th>collection_id</th>\n",
" <th>instance_index</th>\n",
" <th>dict_status</th>\n",
" <th>lambda_status</th>\n",
" <th>function_status</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>9</td>\n",
" <td>2.436050e+12</td>\n",
" <td>10</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.993900e+11</td>\n",
" <td>8</td>\n",
" <td>UPDATE RUNNING</td>\n",
" <td>UPDATE RUNNING</td>\n",
" <td>UPDATE RUNNING</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10</td>\n",
" <td>1.046210e+12</td>\n",
" <td>3</td>\n",
" <td>3.760370e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>434</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21</td>\n",
" <td>2.222130e+12</td>\n",
" <td>6</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>434</td>\n",
" <td>FINISH</td>\n",
" <td>FINISH</td>\n",
" <td>FINISH</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>37</td>\n",
" <td>3.251860e+11</td>\n",
" <td>3</td>\n",
" <td>3.764700e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>569</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>40</td>\n",
" <td>5.326920e+11</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.305870e+11</td>\n",
" <td>2033</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440085</th>\n",
" <td>4995895</td>\n",
" <td>5.299730e+11</td>\n",
" <td>3</td>\n",
" <td>3.764700e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440086</th>\n",
" <td>4995896</td>\n",
" <td>5.806460e+11</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440087</th>\n",
" <td>4995897</td>\n",
" <td>8.070960e+11</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440088</th>\n",
" <td>4995898</td>\n",
" <td>1.071880e+12</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.744710e+11</td>\n",
" <td>8</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" <tr>\n",
" <th>440089</th>\n",
" <td>4995904</td>\n",
" <td>1.407150e+12</td>\n",
" <td>3</td>\n",
" <td>3.759970e+11</td>\n",
" <td>3.840760e+11</td>\n",
" <td>1077</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" <td>SCHEDULE</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>440090 rows × 9 columns</p>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 time status machine_id collection_id \\\n",
"0 9 2.436050e+12 10 3.759970e+11 3.993900e+11 \n",
"1 10 1.046210e+12 3 3.760370e+11 3.305870e+11 \n",
"2 21 2.222130e+12 6 3.759970e+11 3.305870e+11 \n",
"3 37 3.251860e+11 3 3.764700e+11 3.305870e+11 \n",
"4 40 5.326920e+11 3 3.759970e+11 3.305870e+11 \n",
"... ... ... ... ... ... \n",
"440085 4995895 5.299730e+11 3 3.764700e+11 3.744710e+11 \n",
"440086 4995896 5.806460e+11 3 3.759970e+11 3.744710e+11 \n",
"440087 4995897 8.070960e+11 3 3.759970e+11 3.744710e+11 \n",
"440088 4995898 1.071880e+12 3 3.759970e+11 3.744710e+11 \n",
"440089 4995904 1.407150e+12 3 3.759970e+11 3.840760e+11 \n",
"\n",
" instance_index dict_status lambda_status function_status \n",
"0 8 UPDATE RUNNING UPDATE RUNNING UPDATE RUNNING \n",
"1 434 SCHEDULE SCHEDULE SCHEDULE \n",
"2 434 FINISH FINISH FINISH \n",
"3 569 SCHEDULE SCHEDULE SCHEDULE \n",
"4 2033 SCHEDULE SCHEDULE SCHEDULE \n",
"... ... ... ... ... \n",
"440085 8 SCHEDULE SCHEDULE SCHEDULE \n",
"440086 8 SCHEDULE SCHEDULE SCHEDULE \n",
"440087 8 SCHEDULE SCHEDULE SCHEDULE \n",
"440088 8 SCHEDULE SCHEDULE SCHEDULE \n",
"440089 1077 SCHEDULE SCHEDULE SCHEDULE \n",
"\n",
"[440090 rows x 9 columns]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Using user defined function \n",
"def assign_status_string(status):\n",
"    if status == 0:\n",
"        return 'SUBMIT'\n",
"    elif status == 1:\n",
"        return 'QUEUE'\n",
"    elif status == 2:\n",
"        return 'ENABLE'\n",
"    elif status == 3:\n",
"        return 'SCHEDULE'\n",
"    elif status == 4:\n",
"        return 'EVICT'\n",
"    elif status == 5:\n",
"        return 'FAIL'\n",
"    elif status == 6:\n",
"        return 'FINISH'\n",
"    elif status == 7:\n",
"        return 'KILL'\n",
"    elif status == 8:\n",
"        return 'LOST'\n",
"    elif status == 9:\n",
"        return 'UPDATE PENDING'\n",
"    elif status == 10:\n",
"        return 'UPDATE RUNNING'\n",
"    # Any other status code falls through and yields None\n",
"    return None\n",
"\n",
"# Apply function to STATUS column using apply function\n",
"event_df['function_status'] = event_df['status'].apply(assign_status_string)\n",
"event_df"
]
},
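
The three mappings above (dictionary, lambda, and named function) can be compared on a small frame. The sketch below is illustrative only: `sample_df`, `STATUS_NAMES`, and `status_to_name` are hypothetical names, and status code 99 is an invented value used to show how each approach might treat a code that is missing from the lookup table.

```python
import pandas as pd

# Hypothetical lookup table, copied from the status codes used in this notebook.
STATUS_NAMES = {
    0: 'SUBMIT', 1: 'QUEUE', 2: 'ENABLE', 3: 'SCHEDULE', 4: 'EVICT',
    5: 'FAIL', 6: 'FINISH', 7: 'KILL', 8: 'LOST',
    9: 'UPDATE PENDING', 10: 'UPDATE RUNNING',
}


def status_to_name(code):
    """Named-function variant: return the label, or None for an unknown code."""
    return STATUS_NAMES.get(code)


# Small made-up frame; 99 is deliberately not a valid status code.
sample_df = pd.DataFrame({'status': [10, 3, 6, 99]})

# 1) Dictionary passed straight to map: unmapped codes become missing values.
sample_df['dict_status'] = sample_df['status'].map(STATUS_NAMES)
# 2) Lambda with dict.get: unmapped codes give a missing value instead of a KeyError.
sample_df['lambda_status'] = sample_df['status'].map(lambda x: STATUS_NAMES.get(x))
# 3) Named function applied element-wise.
sample_df['function_status'] = sample_df['status'].apply(status_to_name)

print(sample_df)
```

Indexing the dictionary directly inside the lambda, as in `status_dict[x]`, raises `KeyError` on an unmapped code, so `dict.get` is the more forgiving choice when the data may contain unexpected values.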
{
"cell_type": "code",
"execution_count": 7,
"id": "3d4c92aa",
"metadata": {},
"outputs": [],
"source": [
"### Question-2: Sort the column 'machine_id' (write a function for sorting) without using any inbuilt sorting functions."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4ad6a75a",
"metadata": {},
"outputs": [],
"source": [
"def sort_by_machineid(event_df):\n",
"    # Bubble sort over row positions, keyed on 'machine_id'; no built-in sort is used.\n",
"    # This is O(n^2), so it is only practical on a small sample of the 440090 rows.\n",
"    values = event_df['machine_id'].tolist()\n",
"    order = list(range(len(values)))\n",
"    n = len(order)\n",
"\n",
"    # Traverse through all elements\n",
"    for i in range(n-1):\n",
"        # Last i elements are already sorted\n",
"        for j in range(0, n-i-1):\n",
"            # Swap positions if the element found is greater than the next element\n",
"            if values[order[j]] > values[order[j+1]]:\n",
"                order[j], order[j+1] = order[j+1], order[j]\n",
"\n",
"    # Reorder whole rows once, instead of swapping rows inside the loop\n",
"    return event_df.iloc[order].reset_index(drop=True)\n",
"result=sort_by_machineid(event_df)\n",
"result"
]
},
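
Bubble sort needs on the order of n² comparisons, which is impractical for 440,090 rows. A hand-written merge sort is one alternative that still avoids built-in sorting functions. The sketch below is an illustration under stated assumptions, not the assignment's required answer: `merge_sort_positions`, `sort_frame_by`, and the toy `demo_df` are hypothetical names, and the toy values are chosen only to exercise the code.

```python
import pandas as pd


def merge_sort_positions(values, positions):
    """Return `positions` ordered by values[position], using a hand-written merge sort."""
    if len(positions) <= 1:
        return positions
    mid = len(positions) // 2
    left = merge_sort_positions(values, positions[:mid])
    right = merge_sort_positions(values, positions[mid:])
    merged = []
    i = 0
    j = 0
    # Merge the two sorted halves; `<=` keeps equal keys in their original order.
    while i < len(left) and j < len(right):
        if values[left[i]] <= values[right[j]]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def sort_frame_by(df, column):
    """Reorder the rows of `df` by `column` without calling any built-in sort."""
    values = df[column].tolist()
    order = merge_sort_positions(values, list(range(len(values))))
    return df.iloc[order].reset_index(drop=True)


# Toy frame with made-up rows; column names follow the notebook's data.
demo_df = pd.DataFrame({
    'machine_id': [376470209477, 375997457623, 376004609076, 375997457623],
    'instance_index': [200, 260, 161, 8],
})
sorted_demo = sort_frame_by(demo_df, 'machine_id')
print(sorted_demo)
```

Sorting a list of row positions and reordering the frame once with `iloc` avoids moving whole rows inside the inner loop, which is where most of the time would otherwise go.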
{
"cell_type": "code",
"execution_count": null,
"id": "3d1859cb",
"metadata": {},
"outputs": [],
"source": [
"def search_averagecpus(usage_df):\n",
"    results = []\n",
"    for index, row in usage_df.iterrows():\n",
"        maxcpus = row['max_cpus']\n",
"        if maxcpus == 0: # prevent division by zero\n",
"            continue\n",
"        ratio = row['average_cpus'] / maxcpus\n",
"        if 0.4 <= ratio <= 0.5:\n",
"            results.append(row['average_cpus'])\n",
"    return results\n",
"results = search_averagecpus(usage_df)\n",
"#print(results) "
]
},
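
The row-by-row search above can also be expressed as a boolean mask over whole columns. This is a minimal sketch on made-up numbers; `toy_usage`, `nonzero`, `ratio`, and `matches` are hypothetical names, and only the column names come from `usage_df`.

```python
import pandas as pd

# Toy data with invented values; the column names follow usage_df.
toy_usage = pd.DataFrame({
    'average_cpus': [0.9, 0.45, 0.1, 0.3],
    'max_cpus': [2.0, 1.0, 0.0, 0.4],
})

# Rows where max_cpus is 0 are dropped first, mirroring the zero guard in the loop.
nonzero = toy_usage[toy_usage['max_cpus'] != 0]
ratio = nonzero['average_cpus'] / nonzero['max_cpus']

# between() is inclusive on both ends by default, matching 0.4 <= ratio <= 0.5.
matches = nonzero.loc[ratio.between(0.4, 0.5), 'average_cpus'].tolist()
print(matches)
```

On a 100-row frame the difference in speed is negligible; the mask form mainly helps when the same filter has to run on a much larger table.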
{
"cell_type": "code",
"execution_count": null,
"id": "9d1a0924",
"metadata": {},
"outputs": [],
"source": [
"def swap_cpi_min(usage_df):\n",
"    # Copy the column so the swap is not applied to a view of the DataFrame\n",
"    cpi_column = usage_df['cpi'].copy()\n",
"    min_index = 0\n",
"    for i in range(1, len(cpi_column)):\n",
"        if cpi_column.iloc[i] < cpi_column.iloc[min_index]:\n",
"            min_index = i\n",
"    # swap the minimum value with the first value in the column\n",
"    cpi_column.iloc[0], cpi_column.iloc[min_index] = cpi_column.iloc[min_index], cpi_column.iloc[0]\n",
"    usage_df['cpi'] = cpi_column\n",
"    return usage_df\n",
"usage_df = swap_cpi_min(usage_df)\n",
"usage_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0b17ad1b",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def swap_max_with_last(usage_df):\n",
"    # Find the index of the maximum value in the 'cpi' column\n",
"    max_index = 0\n",
"    max_value = float('-inf')\n",
"    for i, value in enumerate(usage_df['cpi']):\n",
"        if value > max_value:\n",
"            max_value = value\n",
"            max_index = i\n",
"\n",
"    # Swap the maximum value with the last value in the 'cpi' column\n",
"    last_index = len(usage_df) - 1\n",
"    usage_df.at[max_index, 'cpi'], usage_df.at[last_index, 'cpi'] = usage_df.at[last_index, 'cpi'], usage_df.at[max_index, 'cpi']\n",
"    \n",
"    return usage_df\n",
"df_swapped = swap_max_with_last(usage_df)\n",
"df_swapped"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0838fa6",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"#5a) Add a column named 'failure' which takes the value 1 when average_memory is greater than or equal to 0.85 of max_memory, and 0 otherwise\n",
"usage_df['failure'] = usage_df.apply(lambda row: 1 if row['average_memory'] >= 0.85 * row['max_memory'] else 0, axis=1)\n",
"usage_df['failure'].groupby(usage_df['failure']).count()"
]
},
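
The `failure` flag from 5a can also be computed without a row-wise `apply`, by comparing the two columns directly. The sketch below uses invented numbers; `toy_memory` and `failure_counts` are hypothetical names.

```python
import pandas as pd

# Toy data with invented values; the column names follow usage_df.
toy_memory = pd.DataFrame({
    'average_memory': [0.0017, 0.0020, 0.0045],
    'max_memory': [0.0017, 0.0035, 0.0046],
})

# Comparing whole columns yields booleans; astype(int) turns them into 1/0.
threshold = 0.85 * toy_memory['max_memory']
toy_memory['failure'] = (toy_memory['average_memory'] >= threshold).astype(int)

# Count how many rows fall into each class.
failure_counts = toy_memory['failure'].value_counts()
print(failure_counts)
```

`value_counts()` gives the same per-class totals as grouping the column by itself and calling `count()`, though the rows may appear in a different order.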
{
"cell_type": "code",
"execution_count": null,
"id": "55a17aca",
"metadata": {},
"outputs": [],
"source": [
"#5b) Form a column that takes the root mean square (RMS) of the \"average_memory\" and \"average_cpus\" columns of the dataframe; name it \"average_cpu_memory\"\n",
"import numpy as np\n",
"usage_df['average_cpu_memory'] = np.sqrt((usage_df['average_memory']**2 + usage_df['average_cpus']**2) / 2)\n",
"usage_df"
]
},
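
For two values a and b, the root mean square used in 5b is sqrt((a² + b²) / 2). The sketch below checks that formula on two rows: the first reuses the `average_memory` and `average_cpus` values displayed for row 0 of `usage_df`, and the second is a made-up pair (3 and 4) that is easy to verify by hand. `toy_rms` and `squares` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Row 0 repeats the values displayed for usage_df row 0; row 1 is invented.
toy_rms = pd.DataFrame({
    'average_memory': [0.001699, 3.0],
    'average_cpus': [0.000278, 4.0],
})

# Square both columns, average the squares per row, then take the square root.
squares = toy_rms[['average_memory', 'average_cpus']] ** 2
toy_rms['average_cpu_memory'] = np.sqrt(squares.mean(axis=1))
print(toy_rms)
```

Taking the row-wise mean of the squared columns generalises to more than two columns without changing the divisor by hand.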
{
"cell_type": "code",
"execution_count": null,
"id": "f24e53c2",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# Scatter plot of 'start_time' vs 'average_cpu_memory'\n",
"plt.scatter(usage_df['start_time'], usage_df['average_cpu_memory'])\n",
"\n",
"# Set the x-axis label and y-axis label\n",
"plt.xlabel('Time')\n",
"plt.ylabel('Average CPU Memory')\n",
"\n",
"# Set the title of the plot\n",
"plt.title('Usage Graph')\n",
"#plt.ylim([0, 0.0005])\n",
"# Display the plot\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74e7c556",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# Scatter plot of 'start_time' vs 'max_cpus'\n",
"plt.scatter(usage_df['start_time'], usage_df['max_cpus'])\n",
"\n",
"# Set the x-axis label and y-axis label\n",
"plt.xlabel('Time')\n",
"plt.ylabel('Max CPUs')\n",
"\n",
"# Set the title of the plot\n",
"plt.title('Usage Graph')\n",
"#plt.ylim([0, 0.0005])\n",
"# Display the plot\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9e75a828",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# Scatter plot of 'average_memory' vs 'lambda_status'; this assumes usage_df and event_df have the same number of rows, in the same order\n",
"plt.scatter(usage_df['average_memory'], event_df['lambda_status'])\n",
"\n",
"# Set the x-axis label and y-axis label\n",
"plt.xlabel('Average memory')\n",
"plt.ylabel('STATUS')\n",
"\n",
"# Set the title of the plot\n",
"plt.title('Usage Graph')\n",
"#plt.ylim([0, 0.0005])\n",
"# Display the plot\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "64b0461d",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# Scatter plot of 'average_memory' vs 'max_memory'\n",
"plt.scatter(usage_df['average_memory'], usage_df['max_memory'])\n",
"\n",
"# Set the x-axis label and y-axis label\n",
"plt.xlabel('Average memory')\n",
"plt.ylabel('Max memory')\n",
"\n",
"# Set the title of the plot\n",
"plt.title('Usage Graph')\n",
"#plt.ylim([0, 0.0005])\n",
"# Display the plot\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "9860bdfa",
"metadata": {},
"source": [
"# Assignment2"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa188f38",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Create a new column 'condition': 1 where average_memory exceeds assign_memory, else 0\n",
"usage_df['condition'] = usage_df['average_memory'].gt(usage_df['assign_memory']).astype(int)\n",
"usage_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e5e16ef",
"metadata": {},
"outputs": [],
"source": [
"# create a new column named 'division'\n",
"usage_df['division'] = usage_df['average_memory'] / usage_df['assign_memory']\n",
"\n",
"# replace np.inf values with -1\n",
"usage_df['division'] = usage_df['division'].replace(np.inf, -1)\n",
"usage_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a63776e6",
"metadata": {},
"outputs": [],
"source": [
"# create a histogram of the 'STATUS' column\n",
"plt.hist(event_df['status'], edgecolor='black', alpha=0.5)\n",
"\n",
"# set the x and y labels and the title of the histogram\n",
"plt.xlabel('status')\n",
"plt.ylabel('frequency')\n",
"plt.title('Histogram of STATUS')\n",
" \n",
"# display the histogram\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8def305c",
"metadata": {},
"outputs": [],
"source": [
"# Form a failure column which takes value 1 if STATUS is \"fail\" or \"lost\", 0 if STATUS is \"finish\", and -1 for the rest of the \"STATUS\" values (5)\n",
"# Lower-case the status, look it up, and send anything unmapped (including missing values) to -1\n",
"status_to_failure = {\"fail\": 1, \"lost\": 1, \"finish\": 0}\n",
"event_df[\"failure\"] = event_df[\"dict_status\"].str.lower().map(status_to_failure).fillna(-1).astype(int)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ccf174e7",
"metadata": {},
"outputs": [],
"source": [
"event_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1a2904c",
"metadata": {},
"outputs": [],
"source": [
"# Create a scatter plot using \"division\" as X and \"failure\" as Y\n",
"plt.scatter(usage_df[\"division\"], event_df[\"failure\"])\n",
"\n",
"# Add labels to the X and Y axes\n",
"plt.xlabel(\"Division\")\n",
"plt.ylabel(\"Failure\")\n",
"\n",
"# Show the plot\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2606831",
"metadata": {},
"outputs": [],
"source": [
"from mpl_toolkits.mplot3d import Axes3D\n",
"fig = plt.figure()\n",
"ax = fig.add_subplot(111, projection=\"3d\")\n",
"ax.scatter(usage_df[\"average_cpus\"], usage_df[\"average_memory\"], event_df[\"failure\"])\n",
"\n",
"# Add labels to the X, Y, and Z axes\n",
"ax.set_xlabel(\"Average CPUs\")\n",
"ax.set_ylabel(\"Average Memory\")\n",
"ax.set_zlabel(\"Failure\")\n",
"\n",
"# Show the plot\n",
"plt.show()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2a90b55a",
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Join the data tables of instance events & instance usage, based on primary keys\n",
"# (try to find it out yourself) (Hint: There are 3 primary keys),\n",
"# then sort it with respect to time & take only those rows for which time is between start time and end time. (20)\n",
"\n",
"# Join the two tables on the three primary keys (inner join by default)\n",
"joined_table = pd.merge(event_df, usage_df, on=['machine_id', 'collection_id', 'instance_index'])\n",
"\n",
"# Sort the resulting table by event time\n",
"joined_table = joined_table.sort_values(by=\"time\")\n",
"\n",
"# Keep only the rows whose event time lies inside the usage window\n",
"# (assumes the usage table provides 'start_time' and 'end_time' columns)\n",
"in_window = (joined_table[\"time\"] >= joined_table[\"start_time\"]) & (joined_table[\"time\"] <= joined_table[\"end_time\"])\n",
"joined_table = joined_table[in_window]\n",
"joined_table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c686a466",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "28f25faf",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
