Data Analysis and Visualization On Space Race (Spacenalyzer)
Major Project Report
Submitted to the Centurion University
In partial fulfillment of requirements for the award of degree
Bachelor of Technology
in
Computer Science and Engineering
By
Hari Varma Pamudurthi
211801380018
CERTIFICATE
This is to certify that the report entitled "Data Analysis and Visualization
on Space Race" submitted by Hari Varma Pamudurthi (211801380018) in the
Department of Computer Science and Engineering in partial fulfilment of
the B.Tech. Degree in Computer Science and Engineering is a bona fide
record of the seminar work carried out by him under our guidance and
supervision. This report has not been submitted in any form to any other
University or Institute for any purpose.
(Project Guide)
Dr. Lakshman Rao
Department of Computer Science
And Engineering
Centurion University (CUTM-AP)
External Examiner
Abstract
INDEX
ABSTRACT
LIST OF FIGURES
1. INTRODUCTION
2. EXISTING AND PROPOSED SYSTEM
3. SYSTEM REQUIREMENTS
3.1 Hardware Requirements
3.2 Software Requirements
4. METHODOLOGY
4.1 Modular Design
4.1.1 Data Acquisition
4.1.2 Data Processing
4.1.3 Implementation
4.1.4 Analysis
4.1.5 Visualization
4.2 What is Python
4.3 Modules
4.3.1 Pandas
4.3.2 Plotly Express
5. DATA SET
6. IMPLEMENTATION
7. RESULTS
8. CONCLUSION
9. REFERENCES
LIST OF FIGURES
FIG. NO. TITLE
7.1 Data Set - World Rocket
7.2 Pie Chart - Showing the Status of Rockets Launched
7.3 Bar Chart Comparison on Active and Retired Rockets
7.4 3D Plot on Which Country Launched What Rocket
7.5 Bar Chart Analysis on Which Company Launched Which Rocket
7.6 Scatter and Range Plot on Money Spent on Rocket Launches
7.7 Sunburst Plot on World Spacecraft Analysis
7.8 Tree Map Analysis on World Spacecraft
1. Introduction
The world has taken a tremendous interest in travelling beyond the planet
ever since the USSR launched Sputnik 1, the first artificial satellite, in
1957 during the Cold War. Rocket science, cosmology, and astronomy are among
the most demanding fields of engineering and science, requiring extremely
high levels of both theoretical and experimental study.
Deciding when and where a space launch should occur so that it reaches its
destination with the least resistance and the highest likelihood of success
requires a great deal of calculation. At the same time, extensive engineering
effort goes into testing launch vehicles for potential malfunctions and
recreating space-like conditions here on Earth. For any of these space
missions to succeed, years of arduous labor, research, and testing are
necessary.
When launches are successful, they give the country great pride. When
missions fail and millions in funding and national hopes go up in smoke, it
is deeply disappointing. Still, no notable experiment has ever been conducted
without a fair number of failures.
I would like to express my gratitude to the dataset contributor for making
the effort to provide a dataset that can be used to analyze the many
accomplishments and failures of the world's space organizations.
The visualizations I produced from this dataset help to analyze the rockets
launched by various countries. They also show the status of each rocket and
the cost of launching it.
Python was used to view and analyze this data because it is one of the most
widely used programming languages for data analysis. Raw data of this kind is
difficult to interpret by reading it directly, so I created interactive,
easily comprehensible graphs to make it easier to understand.
2. Existing and Proposed System
Many visualizations of this kind of data are already available on the web.
However, I believe that they are outdated and that the techniques and
technology used to create them fall well short of what is available today.
Therefore, given the current state of technology and the changes made to
programming languages, my visualization uses modules and graph types that
were not in use previously. The graphs I have produced are interactive, so
users can operate them directly. I have also exported the graphs to an HTML
file so that the user can view the 3D plots with ease.
3. System Requirements
4. Methodology
4.1.1. Data Acquisition
To collect the raw data I used online sources such as Kaggle, GitHub, and
Google Scholar. This raw data was then converted into numerical data in
an Excel workbook, which was used in the data processing stage.
4.1.2. Data Processing
The acquired data is then processed in Excel to filter out garbage
values. I also derived additional values using calculations in Excel
for a better result. After processing, I imported the data into Python
for implementation.
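The same kind of clean-up can also be expressed in pandas instead of Excel. The sketch below is only an illustration, not the workflow used in this project: it assumes a small inline sample with the ' Rocket' (cost) and 'Status Rocket' columns that appear later in the implementation, and the sample values are made up.

```python
import io

import pandas as pd

# Hypothetical sample rows; the values are illustrative only.
raw = io.StringIO(
    "Com Name,Status Rocket, Rocket\n"
    "SpaceX,StatusActive,50.0\n"
    "CASC,StatusActive,\n"
    'NASA,StatusRetired,"1,160.0"\n'
)
# Read everything as text so the cost column can be cleaned explicitly.
sample = pd.read_csv(raw, dtype=str)

# Remove thousands separators, then convert the cost column to numbers.
# Values that cannot be converted become NaN instead of raising an error.
sample[" Rocket"] = pd.to_numeric(
    sample[" Rocket"].str.replace(",", "", regex=False), errors="coerce"
)

# Strip the 'Status' prefix from the status labels.
sample["Status Rocket"] = sample["Status Rocket"].str.replace(
    "Status", "", regex=False
)

# Drop rows whose cost is missing (the "garbage values" in this sketch).
clean = sample.dropna(subset=[" Rocket"]).reset_index(drop=True)
print(clean)
```

Doing this step in code rather than in a spreadsheet makes it repeatable when the source file is updated.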
4.1.3. Implementation
After processing, I imported the data into the Python environment for
analysis. I loaded it with pandas from CSV format and manipulated it
using options and methods such as usecols and head(). The data is then
ready to be analyzed.
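A minimal sketch of this loading step, assuming that the `usecols` option of `pandas.read_csv` and the `head()` method are the tools meant above. The inline CSV text stands in for the real file, and its rows are illustrative only.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; the rows are illustrative only.
csv_text = io.StringIO(
    "Com Name,Country,Detail,Status Rocket\n"
    "SpaceX,USA,Falcon 9 Block 5 | Starlink V1 L9,StatusActive\n"
    "CASC,China,Long March 2D | Gaofen-9 04,StatusActive\n"
    "Roscosmos,Kazakhstan,Proton-M/Briz-M | Ekspress-80,StatusActive\n"
)

# usecols restricts loading to the columns needed for the analysis.
df = pd.read_csv(csv_text, usecols=["Com Name", "Country", "Status Rocket"])

# head(n) returns the first n rows, which is handy for a quick look.
preview = df.head(2)
print(preview)
```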
4.1.4. Analysis
After implementation, the data has to be analyzed before it is
visualized. In the analysis stage the user can inspect the data in the
form of rows and columns. This gives a clear understanding of what data
to visualize.
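The kind of inspection described here can be sketched with a few pandas calls. The data frame below is a made-up sample that reuses the column names from the implementation section; it is not the project's data.

```python
import pandas as pd

# Hypothetical sample rows; the values are illustrative only.
df = pd.DataFrame(
    {
        "Com Name": ["SpaceX", "CASC", "NASA", "SpaceX"],
        "Country": ["USA", "China", "USA", "USA"],
        "Status Rocket": ["Active", "Active", "Retired", "Active"],
    }
)

# Size of the table: number of rows (records) and columns (variables).
n_rows, n_cols = df.shape

# How many distinct countries appear, and how often each status occurs.
n_countries = df["Country"].nunique()
status_counts = df["Status Rocket"].value_counts()

print(df.head())
print(n_rows, n_cols, n_countries)
print(status_counts)
```

Counts like these indicate which columns are categorical and therefore suited to bar, pie, or hierarchical charts.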
4.1.5. Visualization
After analysis comes the visualization stage, where I used Python
libraries such as Plotly to produce interactive graphs for a better
representation of the data. I used the column names as the x and y axes
for these visualizations. These visualizations help users to understand
the raw data clearly, which in turn supports better analysis.
4.2. What is Python
4.3. Modules
For this project I’ve used the following modules:
1. Pandas
2. Plotly Express
4.3.1. Pandas
pandas is a software library for the Python programming language for
manipulating and analyzing data. It provides data structures and
operations for working with numerical tables and time series. It is free
software released under the three-clause BSD license. The name is derived
from "panel data," an econometrics term for data sets that contain
observations of the same individuals over multiple time periods. The name
is also a play on the phrase "Python data analysis." Wes McKinney began
building what would become pandas while working as a researcher at AQR
Capital from 2007 to 2010.
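To make the time series support concrete, here is a minimal sketch that counts launches per year from a date-indexed Series. The dates are sample values chosen for the example, not rows from the project data set.

```python
import pandas as pd

# Hypothetical launch dates; illustrative only.
launch_dates = pd.to_datetime(
    ["2019-01-11", "2019-06-25", "2020-05-30", "2020-08-07", "2020-11-16"]
)

# A Series with a datetime index: one entry per launch.
launches = pd.Series(1, index=launch_dates)

# Group by the calendar year of the index and count the launches.
per_year = launches.groupby(launches.index.year).sum()
print(per_year)
```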
5. Dataset
The main requirement for any visualization is the data. This data can be of
any type, e.g. raw data, filtered data, or selected data. A data set
(sometimes spelled dataset) is a collection of data. In the case of tabular
data, a data set corresponds to one or more database tables, where each row
represents a specific record and each column a specific variable. The data
set holds a value for each variable, such as an object's height and weight,
for each member of the set. Data sets can also be made up of a collection of
files or documents. For this project I used an existing dataset obtained
from Kaggle, a widely used data science platform.
6. Implementation
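Code:
## Importing the modules ##
import plotly.express as px
import pandas as pd

## Loading and cleaning the data ##
df = pd.read_csv(r'C:\123456.csv')
# Parse the launch timestamp and keep only the calendar date
df['Launch date'] = pd.to_datetime(df['Datum'])
df['Launch date'] = df['Launch date'].astype(str)
df['Launch date'] = df['Launch date'].str.split(' ', expand=True)[0]
df['Launch date'] = pd.to_datetime(df['Launch date'])
# The cost column is named ' Rocket' (with a leading space);
# remove thousands separators before converting it to float
df[' Rocket'] = df[' Rocket'].str.replace(',', '')
df[' Rocket'] = df[' Rocket'].astype(float)
# Remove the 'Status' prefix from the rocket status labels
df['Status Rocket'] = df['Status Rocket'].str.replace('Status', '')
df.drop('Datum', axis=1, inplace=True)
# 'data' keeps every complete row; 'df' is a 20-row preview
data = df.dropna()
df = data.head(20)
df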
## Separating the unique values ##
print(df.Country.nunique())
print(df.Country.unique())
## Visualization ##
## 2) Bar chart ##
# Count the rows per rocket status; the 'Detail' count is the bar height
df = df.groupby('Status Rocket').count().reset_index()
df = df.rename(columns={"Detail": "Details"})
fig_bar = px.bar(data_frame=df, x='Status Rocket', y='Details',
                 template='plotly_dark',
                 title='Status of Rockets Carrying Missions')
fig_bar.show()
## 3) 3D line plot ##
df = pd.read_csv(r'C:\123456.csv').head(100)
fig_3d = px.line_3d(data_frame=df, x='Com Name', y='Detail', z='Country',
                    template='plotly_dark',
                    color_discrete_sequence=['orange', 'green'],
                    title='Which Country Launched What Rocket')
fig_3d.show()
# Save an interactive copy that can be opened in any browser
fig_3d.write_html("hari.html")
## 4) Bar chart ##
df = pd.read_csv(r'C:\123456.csv').head(50)
fig_bar = px.bar(data_frame=df, x='Detail', y='Com Name',
                 template='plotly_dark',
                 color_discrete_sequence=['yellow', 'green'],
                 barmode='group', height=2000,
                 title='Which Company Launched Which Rocket')
fig_bar.show()
## 5) Scatter plot ##
# Use the cleaned frame so that ' Rocket' is numeric for the violin marginal
df = data.head(1000)
fig = px.scatter(data_frame=df, x='Com Name', y=' Rocket', color='Com Name',
                 title='Money Spent in billions', template='plotly_dark',
                 marginal_y='violin',
                 color_discrete_sequence=['red', 'green', 'blue'])
fig.show()
data2 = data.head(100)
## 6) Sunburst ##
# px is the only Plotly alias imported above, so it is used here
fig_graph = px.sunburst(data_frame=data2,
                        path=['Country', 'Com Name', 'Detail'],
                        template='plotly_dark',
                        hover_data=[' Rocket'],
                        title='World Spacecrafts Analysis',
                        color=' Rocket',
                        maxdepth=-1,
                        color_discrete_sequence=['orange', 'red', 'green',
                                                 'blue', 'hotpink'])
fig_graph.show()
## 7) Tree map ##
fig_graph = px.treemap(data_frame=data2,
                       path=['Country', 'Com Name', 'Detail'],
                       template='plotly_dark',
                       hover_data=[' Rocket'],
                       title='World Spacecrafts Analysis',
                       color=' Rocket',
                       maxdepth=-1,
                       color_discrete_sequence=['orange', 'red', 'green',
                                                'blue', 'hotpink'])
fig_graph.show()
################THE-END#################
7. Results
Fig. 7.5 (Bar Chart Analysis on which Space Agency Launched Which
Rocket)
Fig. 7.6 (Scatter and Range Plot Cost of Launching)
Fig. 7.8 (Tree Map Analysis on World Spacecrafts)
8. Conclusion
Space is a vast, never-ending area of exploration. To study it, proper
equipment and instruments are essential for research and development work.
From the foregoing, we can see how crucial data visualization is across
industries, along with its advantages and the many methods for creating
visual formats. Without this step, the later stages of analytics have little
to build on. I therefore conclude that data visualization can be applied in
any industry or profession. Data visualization is also necessary because the
vast majority of massive, unstructured data cannot be comprehended by human
minds alone. These data sets must be transformed into a format that we can
easily comprehend. Graphs and maps are essential for discovering trends and
relationships, gaining understanding, and reaching more accurate
conclusions.
Graphs and charts help us convey results so that we can spot patterns and
trends and arrive at better conclusions quickly. We must understand the
significance of data visualization and what it means to the people who use
it. To help users visualize their data in an understandable and relevant
way, we should offer them appealing and user-friendly visualization
capabilities and tools.
9. References
1. Kaggle (https://www.kaggle.com/code/arindambaruah/who-is-leading-the-space-race/notebook)
2. W3Schools (https://www.w3schools.com/python)
3. Plotly (https://plotly.com/python/)
4. Pandas (https://pandas.pydata.org/)
5. https://ieeexplore.ieee.org/abstract/document/7284779
6. https://www.mdpi.com/2220-9964/8/7/292
7. https://www.sciencedirect.com/science/article/abs/pii/S0094576519300621
8. https://www.jstage.jst.go.jp/article/jsme2/29/3/29_ME2903rh/_article/-char/ja/