
SciServerLab Session1 2023-24 Student


SciServerLab_session1_2023-24_StudentCopy

November 21, 2023

1 Introduction to Python and SDSS spectroscopic data


A Python exercise notebook written by Rita Tojeiro, October 2017; last revised in October 2023. This
notebook has benefited from examples provided by Britt Lundgren (University of North Carolina)
and Jordan Raddick (Johns Hopkins University), and resource lists by Rick Muller (Sandia National
Laboratories).
Welcome to the SciServer Lab. Over two sessions, you will work through two Python notebooks.
This lab has two main goals:
1. To introduce you to the Python language, mostly as a way to interact with astronomical data.
2. To allow you to explore the largest astronomical dataset in existence, and derive some
conclusions about galaxy properties, large-scale structure, and spectral data.
This Lab is assessed, and it will count towards your final mark.

1.1 Jupyter notebooks and completing this lab


Python is a programming language, and notebooks are web applications consisting of sequences of
code cells that allow you to run snippets of code and keep a record of your work - just like a Lab
book. They are also useful ways to guide users through a pre-established workflow, and that is
the way we will use them here.
Cells in notebooks can have multiple functions. In our case we will use code cells or markdown cells.
You can change the function of a cell on the dropdown menu at the header of the page. Code cells
interpret and execute code, and markdown cells allow you to write down text - just as I am doing
now.
To run the code in a cell you simply type in your commands, then press shift+enter to execute
(just pressing enter will give you a new line).
The way this Lab is structured is via a combination of pre-filled code cells, which you
need to execute, and empty code cells, which you need to fill in and then execute. If
you wish to add a new cell, either press the “+” sign in the menu at the top of the page, or click
Insert->Insert Cell Above/Below.
Jupyter notebooks are auto-saved with some frequency, but you are strongly encouraged to
File->Save and Checkpoint at the end of every completed exercise. When you want to close your
notebook, you need to File->Close and Halt (instead of just closing your browser window).
The first part of this session is a highly incomplete, extremely brief introduction to Python. It
is designed to give you the minimum set of tools to interact with data, and successfully complete
the Lab. You are encouraged to use additional resources to solidify your knowledge of Python.
The second half of this session will begin your direct interaction with SDSS data. The next Lab
session invites you to explore concepts around the structure of the Universe and galaxy properties
using a separate notebook.

1.1.1 Format and assessment


This first lab session is a mixture of independent non-assessed work and independent assessed work.
Part 1 is not assessed but compulsory for all students. You are asked to go through the
exercises independently. Your instructor will go through some of the exercises on the screen as we
proceed through the session. If you’re familiar with Python, Numpy, Matplotlib and Pandas, you
may work through this section a little faster.
Part 2 is assessed and compulsory for all students. The assessment will be a Moodle quiz,
at the end of your second Lab session.

1.1.2 Submission
You will need to save your notebook as a PDF file and submit it on Moodle. This will automatically
print all of your code, plots and other outputs. To print, choose File->Print Preview. That will
open another tab in your browser, with a print-friendly version of your notebook. You can then
save to PDF (this step will be different depending on your browser and operating system). Make
sure you have answered all exercises.
Both notebooks are due at the end of your final Lab session.

1.1.3 Python resources


• Python for Astronomers An introduction to Python, aimed at complete beginners. Chapters 2,
4 and 7 are particularly relevant for this module, and recommended. You can download the
interactive tutorials and upload them onto SciServer to run them (select Python 2 if SciServer
asks you for a kernel when you open the tutorial).
• Python Tutorial First-stop reference for basic tasks (looping, string formatting, classes,
functions, etc). A good place to look for specific examples. Sections 3, 4 and 5 are the most
relevant for this module, but go beyond what you will need here.
• Python Standard Library The definitive reference for everything Python can do out of the
box. Many common and not so common tasks are already taken care of. Not suitable for the
complete beginner.

1.2 Part 1 - Getting started with Python and Jupyter notebooks


1.2.1 1.1 The basics
We will begin with the traditional start in any programming language - by typing in “Hello world!”,
and asking python to display the message using the print() function.
I’ve filled in the python command in the cell below for you. To execute it, select the cell with your
mouse and press shift+enter.

[1]: print('Hello world!')

Hello world!
The print() function can print the value of a wide range of variables. In the next cells, we define
5 variables, and print their values in different ways. Execute it. Do all the printed values make
sense? You can change the variable values and the operations within the print function argument
to make sure you understand what is happening.
[2]: x = 10 #an integer
y = 2.0 #a float
given_name = 'Rita'#a string
favourite_colour = 'purple'#another string
blank = ' '

[3]: print(x,y,given_name)

10 2.0 Rita

[4]: print(x + y)

12.0

[5]: print(x/y)

5.0

[6]: print(given_name + blank + str(x))

Rita 10
Fill in the values in the cell below, and print them.
[7]: my_given_name= 'William'
my_favourite_colour= 'red'

In our first example, we asked python to print the sum of two numbers. Python performed the
computation, and printed the outcome. We can also create new variables by performing operations
on variables that we have previously named, like so:
[8]: h = x + y
print(h)

12.0

[9]: h = x - y
print(h)

8.0

[10]: h = x/y
print(h)

5.0

[11]: h = x*y
print(h)

20.0

[12]: h = x**y
print(h)

100.0
From the example above, you can also see that if you assign a different value to a variable, you
overwrite its previous value. Variables have no memory!

1.2.2 1.2 Error messages


If you make an error in syntax or do something else wrong (like calling a variable that isn’t defined),
Python will print an error message. Don’t panic. Python error messages can be verbose and long,
but typically the very last line will give you a clue as to the reason for the error. Execute the next
two cells: they will result in an error. Amend the code in the cell and re-execute until you don’t
get an error message.
[14]: print(h)

100.0

[29]: my_surname='Urquhart'
blank=' '
print('My surname is' + blank + my_surname)

My surname is Urquhart
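
As a small illustration of reading the last line of an error message, the sketch below deliberately uses a variable that was never defined, and catches the error so that the cell keeps running. The name undefined_variable is an assumption made for this example only; it is not part of the Lab.

```python
# Deliberately use a name that was never assigned (hypothetical, for illustration).
try:
    print(undefined_variable)
except NameError as err:
    # The error type and message are what appear on the last line of a traceback.
    error_type = type(err).__name__
    error_text = str(err)
    print(error_type + ': ' + error_text)
```

The printed line names the kind of error (NameError) and the variable that caused it, which is usually enough to find the problem.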

1.2.3 1.3 Loops


Loops allow you to execute a set of instructions a number of times, according to either a counter
(in a for loop), or until some condition is satisfied (in a while loop). For example, here’s a quick
way to print the numbers 0 to 9:
[34]: for i in range(0,10):
    print(i)
print("loop is finished")

0
1
2
3
4
5
6
7
8
9
loop is finished
Indentation is extremely important in Python, and it defines whether lines of code fall within a
loop (or function), or not. In the example above, we created a loop where a variable i varies from
0 to 9. Within each iteration of the loop we printed the value of i. When the loop was finished, we
printed a statement - note the indentation of the print statement.
What would happen if the print statement was indented? Write down your answer before you
execute the next code cell.
Write your answer here (double-click this cell to edit it):

[35]: for i in range(0,10):
    print(i)
    print("loop is finished")

0
loop is finished
1
loop is finished
2
loop is finished
3
loop is finished
4
loop is finished
5
loop is finished
6
loop is finished
7
loop is finished
8
loop is finished
9
loop is finished
Use the next cell to write some code that will:
1. in each iteration compute a variable called t that is 2 times i, and print its value.
2. print “loop is running” in each iteration

[49]: for i in range(0,10):
    t = 2.0*i
    print(t)
    print("loop is running")

0.0
loop is running
2.0
loop is running
4.0
loop is running
6.0
loop is running
8.0
loop is running
10.0
loop is running
12.0
loop is running
14.0
loop is running
16.0
loop is running
18.0
loop is running
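
The examples in this section all use for loops. For completeness, here is a minimal sketch of a while loop, which keeps running until its condition stops being true. The variable names count and values are assumptions made for this example.

```python
# A while loop repeats until its condition becomes False.
count = 0          # hypothetical counter, for illustration only
values = []
while count < 5:
    values.append(count)
    count = count + 1   # without this line the loop would never end
print(values)
```

Note that, as with for loops, the indentation decides which lines belong to the loop.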

1.2.4 1.4 Lists and arrays


A variable doesn’t need to hold a single value. It can hold a list of numbers, strings, or a
combination of any type of variable. Below are examples of lists, some operations, and examples of how
to access certain elements.
Does addition do what you expect it to do?
[50]: a = [0,1,2,3]
b = ['0','1','2','3']
c = ['hello', 'world', '37']

[51]: print(a+b)

[0, 1, 2, 3, '0', '1', '2', '3']

[52]: print(c + b)

['hello', 'world', '37', '0', '1', '2', '3']

[53]: print(c)

['hello', 'world', '37']

1.2.5 Indices
The position of a certain element in a list (or array - you can think of arrays as special types of
lists that only hold one type of variable) is called an index. The first element has an index of 0.
For example, to access the first element of variable c, I type c[0]. The second element is at index
1, and to access it I type c[1].
[54]: print(c[0])
print(c[1])

hello
world
You can operate on individual elements, not just the whole array. For example:
[55]: print(a[1] + a[2])

3
Now access different elements of the lists yourself. Write code that prints the last element of
variables a and b.
[71]: print(str(a[3]) + blank + str(b[3]))

3 7
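
If you don’t want to count the elements by hand, Python also accepts negative indices, which count from the end. The sketch below uses a made-up list called colours (an assumption for this example) to show two equivalent ways of reaching the last element.

```python
colours = ['red', 'green', 'blue']   # hypothetical list, for illustration only
print(colours[-1])                   # last element
print(colours[len(colours) - 1])     # the same element, using an explicit index
last = colours[-1]
```

Both lines print the same element; the negative index keeps working if the list later changes length.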

1.2.6 1.5 Numpy


Generally speaking, when doing data analysis and numerical calculations, Python lists just don’t
cut it. We will be using the Numpy library to create and manipulate arrays, and operate on
them using built-in functions.
The Numpy library is the math workhorse of Python. Everything is vectorized, so a*b+c**d makes
sense whether a is a number or an n-dimensional array. This tutorial gives you some very limited examples
of the potential of Numpy; below are some extra resources.
• Numpy Example List Examples using every Numpy function. Keep close at hand!
• Numpy for Matlab Users Know Matlab? Then use this translation guide.
• Numpy Reference Main documentation
Numpy is not automatically loaded when you start up Python (or a notebook), so we have to
explicitly import it. We’ll call it np for short.
[59]: import numpy as np

We’ll define some numpy arrays using np.array() and make some simple calculations. Does it
behave the way you expect? You can use the last cell to experiment, and make other computations
with the arrays.
[60]: a = np.array([0,1,2,3]) #a numpy array
b = np.array([4,5,6,7]) #another numpy array

[61]: print(a + b)

[ 4 6 8 10]

[62]: print(a * b)

[ 0 5 12 21]

[63]: print(a - b)

[-4 -4 -4 -4]

[64]: print(a / b)

[0. 0.2 0.33333333 0.42857143]
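
To make the contrast with Python lists explicit, here is a short sketch that adds the same numbers first as lists and then as Numpy arrays. The names list_sum and array_sum are assumptions made for this example.

```python
import numpy as np

list_sum = [0, 1, 2, 3] + [4, 5, 6, 7]                       # lists: concatenation
array_sum = np.array([0, 1, 2, 3]) + np.array([4, 5, 6, 7])  # arrays: element-wise sum
print(list_sum)
print(array_sum)
```

With lists, + joins the two lists end to end; with arrays, + adds them element by element, which is what we usually want in numerical work.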


Next we will work with larger arrays. We’ll begin by using a numpy function, np.linspace(), to
create an array, and demonstrate how to access elements in several ways.
At its simplest, the command np.linspace(i_start, i_stop, N) will return an array with N
elements, where the first element is set to i_start, the last element is set to i_stop, and the other
elements are linearly spaced in between these two. More information here:
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linspace.html
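
A point that is easy to miss is that N points are separated by N-1 intervals. The sketch below checks this on a small grid; the values of start, stop and n_points are assumptions chosen for illustration.

```python
import numpy as np

start, stop, n_points = 0.0, 1.0, 5      # hypothetical values, for illustration only
grid = np.linspace(start, stop, n_points)
step = (stop - start) / (n_points - 1)   # N points means N-1 intervals
print(grid)
print(step)
```

Both end points are included in the array, and the spacing between neighbouring elements equals step.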

[65]: N=100
i_start=0
i_stop=99
x = np.linspace(i_start,i_stop,N)
print(x)

[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17.


18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35.
36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53.
54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69. 70. 71.
72. 73. 74. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 85. 86. 87. 88. 89.
90. 91. 92. 93. 94. 95. 96. 97. 98. 99.]
You can operate on the whole array, like so:
[66]: y = x + 1
print(y)

[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14.


15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28.
29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42.
43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53. 54. 55. 56.
57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69. 70.
71. 72. 73. 74. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84.
85. 86. 87. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 98.
99. 100.]

[67]: y = x*2
print(y)

[ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.


28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126. 128. 130. 132. 134. 136. 138.
140. 142. 144. 146. 148. 150. 152. 154. 156. 158. 160. 162. 164. 166.
168. 170. 172. 174. 176. 178. 180. 182. 184. 186. 188. 190. 192. 194.
196. 198.]
And you can access and operate on individual elements of the array, by specifying their index, like
so:
[72]: a = x[20]
print(a)

20.0

[73]: b = x[20] + x[21]


print(b)

41.0

[74]: print(x)

[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17.


18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35.
36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53.
54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69. 70. 71.
72. 73. 74. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 85. 86. 87. 88. 89.
90. 91. 92. 93. 94. 95. 96. 97. 98. 99.]
You can also access more than one element of arrays, in a specified range of indices, for example:
[75]: x[0:2]

[75]: array([0., 1.])

[76]: x[1:81]

[76]: array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13.,
14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26.,
27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39.,
40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51., 52.,
53., 54., 55., 56., 57., 58., 59., 60., 61., 62., 63., 64., 65.,
66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77., 78.,
79., 80.])

[77]: x[:8]

[77]: array([0., 1., 2., 3., 4., 5., 6., 7.])

[78]: x[40:45]

[78]: array([40., 41., 42., 43., 44.])

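The following cell gives an error if the index is too large. Why? As shown here, x[99] is the last
valid element of x (indices run from 0 to 99), so this version runs without error.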


[81]: x[99]

[81]: 99.0

Finally, you can also define an array of indices, to make it easier to access certain parts of your
array. For example, if you were only interested in every 10th element of an array, you could define
a new variable, which we call indices, and use it to access those elements quickly
by writing x[indices]:
[82]: indices = np.array([10,20,30])
print(x[indices])
print(y[indices])

[10. 20. 30.]


[20. 40. 60.]
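
For regularly spaced selections such as “every 10th element”, a slice with a step is an alternative to writing out the indices. This is a sketch only; x_demo, every_tenth and picked are names assumed for this example.

```python
import numpy as np

x_demo = np.linspace(0, 99, 100)          # same construction as x in the text
every_tenth = x_demo[::10]                # start:stop:step, with step = 10
picked = x_demo[np.arange(0, 100, 10)]    # equivalent selection via an index array
print(every_tenth)
```

An explicit index array is still the more flexible option when the elements you want are not evenly spaced.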
This is where things get more complicated, but also more powerful. We will use a Numpy function,
called where(), to return the indices of all array elements that pass a certain condition.
(https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.where.html)
For example, let us say that you want to know all the elements of y that have a value that is less
than 10.
We begin by using where() to return the indices that hold elements for which that is true (note:
because where is a Numpy function, we have to precede it with np., so we write np.where):

[83]: y_is_small = np.where(y<10)[0]

(aside: every time we use np.where we need to add a [0] at the end of the command. If you really
want to know why, ask me, but you don’t really need to, as long as you remember. Hint: next
week’s Lab will be harder if you forget!)

[84]: print(y_is_small)

[0 1 2 3 4]
So there are 5 elements of the array y that have a value less than 10. We can check that this is
true, by printing the values of y at these indices:
[85]: print(y[y_is_small])

[0. 2. 4. 6. 8.]
Good!
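
If you are curious about the [0] after np.where: with a single condition, np.where returns a tuple containing one array of indices per array dimension, so for a one-dimensional array the indices sit in element 0 of that tuple. The sketch below shows this, along with an equivalent selection that uses the condition directly as a True/False mask. The names y_demo, result, indices_small and mask are assumptions made for this example.

```python
import numpy as np

y_demo = 2 * np.linspace(0, 99, 100)   # same values as y in the text
result = np.where(y_demo < 10)         # a tuple holding one index array per dimension
indices_small = result[0]              # the index array itself
mask = y_demo < 10                     # boolean array: True where the condition holds
print(type(result), len(result))
print(indices_small)
print(y_demo[mask])
```

Indexing with the mask and indexing with the indices select the same elements; this Lab uses the np.where form throughout.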
Multidimensional arrays (matrices) are straightforward. Let’s define an array a that has two rows:

[86]: a = np.array( [[1,2,3], [4,5,6]])


print(a)

[[1 2 3]
[4 5 6]]
Element access is done in the order row->column. So to access the element in row 1 and column 0
(the second row, first column) you would write:
[87]: a[1,0]

[87]: 4
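
Whole rows and columns can be selected in a similar way. The following sketch rebuilds the same 2x3 array under the assumed name m, and uses a colon to mean “everything along this axis”.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])   # hypothetical name for the same 2x3 array
print(m.shape)       # (number of rows, number of columns)
first_row = m[0]     # the whole first row
first_col = m[:, 0]  # the first column: every row, column 0
print(first_row, first_col)
```

The shape attribute is a quick way to check which axis is which before indexing.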

Numpy arrays have methods (functions) that allow you to easily compute basic statistics. To
compute the maximum value, the mean value, and the number of elements in y, you can write:
[88]: print(y.max(), y.mean(), len(y))

198.0 99.0 100

1.2.7 1.6 Visualisation


In this Lab, you will be using Python to interact with data. A large part of that is visualisation.
Being able to explore aspects of your data using graphs is a fundamental skill in academia and
industry at large.
We will use the Python library matplotlib - an extensive library with extraordinary functionality.
We won’t even scratch the surface here, and instead focus on being able to produce basic, clear
plots.
More information here: http://matplotlib.org/api/pyplot_summary.html
First we will import the library, we’ll call it plt for short.
[89]: import matplotlib.pyplot as plt

We’re ready for our first plot. Throughout this lab you will be asked to produce plots in exercises.
Always label your axes, provide a sensible title to your plot, and add a legend where
necessary!
Let’s do a simple line plot of y vs x, and label everything sensibly. Line plots will link data points
with a line.
[90]: x = np.linspace(0,99,100)
y = x**2

[ ]: plt.figure(figsize=(8,8))
plt.plot(x,y, label='y = x$^2$')
plt.xlabel('this is x', fontsize=20)
plt.ylabel('this is y', fontsize=20)
plt.title('Hello matplotlib!')
plt.legend()

Let’s zoom in at the very start of our curve, and explicitly mark the actual data points with small
circles. Experiment with some of the commands and parameters below.
[91]: plt.figure(figsize=(10,8))
plt.plot(x,y, label='y = x$^2$', marker='o') #see https://matplotlib.org/api/markers_api.html

plt.xlabel('x', fontsize=20)
plt.ylabel('y', fontsize=20)
plt.title('Hello matplotlib!')
plt.legend(loc='upper left')
plt.xlim(0,10)
plt.ylim(0,100)

[91]: (0.0, 100.0)

Histograms are simply counting plots, separating data points into bins according to the value of
the data points.
We will begin by creating two arrays, each with 5000 points drawn randomly from a Gaussian
distribution with a mean of 0 and a standard deviation of 1 (so roughly two thirds of the points
fall between -1 and 1).
https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.random.html

[92]: y1 = np.random.randn(5000)
y2 = np.random.randn(5000)
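
As a quick sanity check on what these random numbers look like, the sketch below draws a comparable sample and prints its mean, its standard deviation, and the fraction of values between -1 and 1. It uses a seeded generator (np.random.default_rng) so that the numbers are repeatable; the seed and the names rng, sample and frac_within_1 are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(42)           # seeded generator, so the run is repeatable
sample = rng.standard_normal(5000)        # same distribution as np.random.randn
frac_within_1 = np.mean(np.abs(sample) < 1)
print(sample.mean(), sample.std(), frac_within_1)
```

You should find a mean close to 0, a standard deviation close to 1, and some values well outside the range -1 to 1.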

You can visualise your data by plotting the value of an array (say y1) as a function of its index
number, as exemplified below.
[93]: plt.figure(figsize=(15,6))
plt.plot(y1, marker='o', linestyle='none')
plt.xlabel('array index number i', fontsize=15)
plt.ylabel('y[i]', fontsize=15)

[93]: Text(0, 0.5, 'y[i]')

By eye, it looks like most elements of y1 have a value around zero, and most are perhaps between
-1 and 1. A histogram bins the values of an array according to their value, so you can easily inspect
the most frequent values.
Let us now make a histogram for each variable. We will count how many array elements have a
certain value, in 20 bins ranging from -5 to 5.
[94]: plt.figure(figsize=(10,8))
# histtype='step' draws the histogram as an outline, rather than filled boxes like the next one
plt.hist(y1, bins=20, range=(-5,5), label='y1', color='purple', histtype='step')
# note the alpha value - it makes the second plot slightly transparent (alpha=1 is opaque)
plt.hist(y2, bins=20, range=(-5,5), alpha=0.4, label='y2', color='green')

plt.ylabel('N(value)', fontsize=15)
plt.xlabel('value of random number', fontsize=15)
plt.title('A histogram', fontsize=15)
plt.legend( fontsize=15)

[94]: <matplotlib.legend.Legend at 0x7fa58267b130>

Experiment with the plt.hist() command, changing the width of the bins, or the number of bins.
You can also change the colour, labels, etc (N.B.: matplotlib uses American spelling for its
keywords - you’ll need to use color).

[99]: plt.figure(figsize=(20,16))
# histtype='step' draws the histogram as an outline, rather than filled boxes like the next one
plt.hist(y1, bins=20, range=(-7,7), label='y1', color='purple', histtype='step')
# note the alpha value - it makes the second plot slightly transparent (alpha=1 is opaque)
plt.hist(y2, bins=20, range=(-7,7), alpha=0.4, label='y2', color='blue')

plt.ylabel('N(value)', fontsize=15)
plt.xlabel('value of random number', fontsize=15)
plt.title('A histogram', fontsize=150)
plt.legend( fontsize=15)

[99]: <matplotlib.legend.Legend at 0x7fa58266cc10>
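
If you want the numbers behind a histogram rather than the picture, Numpy’s np.histogram function returns the counts and the bin edges without plotting anything. The sketch below is one possible way to use it, on a stand-in sample; the names rng, data, counts, edges and bin_width are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(5000)          # stand-in for y1
counts, edges = np.histogram(data, bins=20, range=(-5, 5))
bin_width = edges[1] - edges[0]
print(counts)
print(bin_width)
```

Note that 20 bins are described by 21 edges, and that changing the range while keeping the number of bins fixed changes the bin width.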

Finally, we will look at scatter plots, useful for when we need to plot data that we don’t necessarily
think should be connected with a line. For example, when we are looking at how variables correlate
with one another, or where they sit in some parameter space.
The y1 and y2 arrays are sets of (uncorrelated) random numbers. That means that the value of
element 1 (for example) in y1 will be independent of (or uncorrelated with) the value of element 1
in y2. Let’s plot the values of y1 against the values of y2. Can you predict what you will see?

[100]: plt.figure(figsize=(10,10))
plt.scatter(y1,y2, marker='*', s=50)
plt.xlabel('y$_1$', fontsize=20)
plt.ylabel('y$_2$', fontsize=20)
plt.title('A scatter plot')

[100]: Text(0.5, 1.0, 'A scatter plot')
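
One way to put a number on “uncorrelated” is the correlation coefficient, which Numpy computes with np.corrcoef. The sketch below compares two independent samples with a pair of variables that are exactly linearly related. The names u, v, r_uv and r_uw are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(5000)   # stand-ins for y1 and y2
v = rng.standard_normal(5000)
r_uv = np.corrcoef(u, v)[0, 1]          # independent draws
r_uw = np.corrcoef(u, 2 * u + 1)[0, 1]  # second variable depends exactly linearly on u
print(r_uv, r_uw)
```

A coefficient near 0 goes with a structureless cloud of points in a scatter plot, while a coefficient near 1 goes with points lying along a rising straight line.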

1.2.8 1.7 Pandas


The next library we will consider is called pandas - the Python data analysis library. The workhorse
of pandas is an object called a DataFrame. Think of it as a table of data that can be queried
like a database and manipulated very efficiently, especially for large datasets. Some of the objects
used in next week’s session are DataFrames, so let us have a quick tour of how to access DataFrame
elements (DataFrames are arguably one of the most powerful tools for data analysis in Python - we
will not use them to their full potential).
We will begin by importing pandas, as pd for short.
https://pandas.pydata.org

[101]: import pandas as pd

Next we will create a dataframe, called df, with two columns, each holding one of our arrays of
random elements. DataFrames are indexed (that’s the first column you see below), allowing one to
access elements very efficiently. The next few cells demonstrate how to access DataFrame elements.
[102]: df = pd.DataFrame({'y1':y1, 'y2':y2})
df

[102]:
             y1         y2
0     -0.700759   0.551675
1     -0.038827  -0.316043
2      0.096796   0.167424
3      0.253266  -0.611019
4      0.966742   1.456996
…             …          …
4995  -1.182573   1.258707
4996   2.491254   2.127884
4997  -1.095310  -0.062212
4998  -0.700128  -0.720970
4999   0.410355  -0.185050

[5000 rows x 2 columns]

To access a specific row of data, you specify the location of that row. E.g., to see the
values of y1 and y2 in row 10:
[103]: df.loc[10]

[103]: y1 0.491884
y2 -1.429145
Name: 10, dtype: float64

If you want a specific column as well, say the value of y1 in row 10:
[104]: df.loc[10]['y1']

[104]: 0.49188444867648407
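
The same element can also be reached by giving .loc the row label and the column name together. The sketch below shows both forms on a tiny made-up DataFrame; demo and its values are assumptions made for this example.

```python
import pandas as pd

demo = pd.DataFrame({'y1': [0.1, 0.2, 0.3], 'y2': [1.0, 2.0, 3.0]})  # hypothetical data
chained = demo.loc[1]['y1']    # row first, then column (two steps)
direct = demo.loc[1, 'y1']     # row label and column label in one step
whole_column = demo['y1']      # every row of one column
print(chained, direct)
```

Both forms return the same value when reading data; the one-step form is generally the safer habit if you later want to assign values.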

You can very easily manipulate data frames to select subsets of your data that satisfy certain
criteria, like we did before.

One way to do that is using numpy.where() again. To select the elements of your dataframe where
y2 is greater than 0 but less than 0.5, and then print those y2 values, you might do:
[105]: y2_pos = np.where( (df['y2'] >0) & (df['y2'] < 0.5))[0] #y2_pos is an array of indices, or row numbers.

[106]: #this prints the values of 'y2' that are greater than 0 and smaller than 0.5
print(df.loc[y2_pos]['y2'])

2 0.167424
9 0.131835
12 0.433351
16 0.306912
17 0.125599

4959 0.363569
4960 0.402134
4976 0.426183
4984 0.036551
4990 0.225481
Name: y2, Length: 965, dtype: float64
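
Another common way to make the same kind of selection is to use the condition itself as a True/False mask on the DataFrame. The sketch below compares the two routes on a small made-up table; demo and its values are assumptions made for this example.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({'y1': [0.3, -1.2, 0.8, 0.1], 'y2': [0.2, 0.7, -0.4, 0.45]})  # hypothetical
mask = (demo['y2'] > 0) & (demo['y2'] < 0.5)   # one True/False per row
by_mask = demo[mask]
by_where = demo.loc[np.where(mask)[0]]         # the np.where route from the text
print(by_mask)
```

The two selections agree here because the row labels are the default 0, 1, 2, ...; np.where returns positions, whereas .loc looks rows up by label, so they can differ if a DataFrame has been re-indexed.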
You can then plot slices of your data frame in different colours. E.g.:
[107]: plt.figure(figsize=(10,8))
plt.scatter(df['y1'],df['y2'], marker='*', s=50, color='blue') #this plots all rows in your dataframe, in blue
plt.scatter(df.loc[y2_pos]['y1'],df.loc[y2_pos]['y2'], marker='*', s=50, color='red') #this plots ONLY the rows that satisfy the condition you previously set


plt.xlabel('y$_1$', fontsize=20)
plt.ylabel('y$_2$', fontsize=20)

[107]: Text(0, 0.5, 'y$_2$')

1.3 Part 2 - SDSS data
The Sloan Digital Sky Survey (SDSS, www.sdss.org) is the largest astronomical survey in the
world, containing imaging and spectroscopic data of millions of galaxies, stars and quasars. In this
Lab, you’ll be using python notebooks to interact directly with SDSS data.
But first, we will begin with a more straightforward exploration.
The SDSS Skyserver provides a simple interface for exploring the images and spectra of objects
in the SDSS. It is assumed now that you have watched the introductory video to this lab session,
where the basic features of this interface are explained.
In order to get a quick feel for the spectroscopic properties of stars, star clusters and galaxies, take
some time to navigate the Skyserver, starting at the following location:
RA = 182.7, Dec = 18.7
Try selecting only objects with spectra, and use the EXPLORE feature to explore the images and
spectra of the wide variety of objects in the field. You can add objects that you find interesting to
a temporary notebook by clicking “add to notes”. You can view your notebook by clicking “show
notes”. It’s OK to take your time - this is your chance to explore.
The “Name” field in the Navigate window will recognise many Messier and NGC catalogue numbers.

For instance, to see a nice globular cluster, type “ngc 4147” in the “Name” field.
Wikipedia will readily give you lists of Open Clusters and Globular Clusters (though note that
many will lie outside the SDSS footprint).

1.3.1 Exercise 1:
Consider the following:
• How do the images of stars, galaxies, and stellar clusters compare? Focus on their size on the
sky, their colour, etc.
• How do the spectra of stars and galaxies compare? Focus on similarities and differences.

1.3.2 Exercise 2:
Find a galaxy that you like that has a spectrum, and write down the plate, fiber and MJD (find them
on the bottom right of the EXPLORE window) and the SDSS objid (on the top right of the same
window).

[108]: #answer:
MJD = 51930
fiber = 181
plate = 285

We will now pre-define some functions that we will use to fetch data on that galaxy directly into
your workspace, and that you can manipulate immediately in this notebook. We will also import
a set of libraries and arrange some settings.
Execute the next two code cells below.
[109]: # Import libraries for use in this notebook.
import numpy as np # standard Python lib for math ops
import pandas # data manipulation package
import matplotlib.pyplot as plt # graphing package
import urllib.request
import os
from astropy.io import fits, ascii
print('Supporting libraries imported')

# Apply some special settings to the imported libraries
# ensure columns get written completely in notebook
# (None means no limit; passing -1 is deprecated since pandas 1.0)
pandas.set_option('display.max_colwidth', None)
# do *not* show python warnings
import warnings
warnings.filterwarnings('ignore')
print('Settings applied')

Supporting libraries imported
Settings applied


[110]: def get_sdss_spectrum(plate, fibreID, MJD):
    # construct filename and URL, then download to data/ on local folder. data/ must exist.
    url_base_sdss = 'https://dr13.sdss.org/sas/dr13/sdss/spectro/redux/26/spectra/' #last plate 3006 + plates 8015+8033
    url_base_eboss = 'https://dr13.sdss.org/sas/dr13/eboss/spectro/redux/v5_9_0/spectra/' #plates 3586 to 7565

    plate_str = '{:04d}'.format(plate)
    fibre_str = '{:04d}'.format(fibreID)
    MJD_str = '{:05d}'.format(MJD)
    filename = 'spec-' + plate_str + '-' + MJD_str + '-' + fibre_str + '.fits'
    save_file = 'data/' + filename

    if plate <= 3006:
        file_url = url_base_sdss + plate_str + '/' + filename
    elif plate >= 3586:
        file_url = url_base_eboss + plate_str + '/' + filename
    else:
        # plates 3007 to 3585 match neither branch above
        raise ValueError('Plate ' + plate_str + ' is not handled by this function')

    print('Retrieving file ' + file_url)
    urllib.request.urlretrieve(file_url, save_file)
    print('Saved file in ' + save_file)
    print('You do not need to open this file - the data is now saved in the two variables called wavelength and flux')

    # read in data from downloaded file
    hdulist = fits.open(save_file)
    tbdata = hdulist[1].data
    flux = tbdata['flux']
    wave = 10.**tbdata['loglam']  # the file stores log10(wavelength)

    return wave, flux

print("Loaded function get_sdss_spectrum")

def fetch_sdss_filter(filter):
    # download the transmission curve of one filter (u, g, r, i or z) to data/ and read it in
    url_base = 'https://classic.sdss.org/dr7/instruments/imager/filters/'
    filename = filter+'.dat'
    file_url = url_base + filename
    #print('Retrieving file ' + file_url)
    save_file = 'data/'+filename
    urllib.request.urlretrieve(file_url, save_file)
    #print('Saved file in ' + save_file)

    res = ascii.read(save_file)

    return res

print("Loaded function fetch_sdss_filter")

Loaded function get_sdss_spectrum


Loaded function fetch_sdss_filter
Now we will use the function get_sdss_spectrum to download the spectrum of your selected galaxy.
The function returns two arrays: the wavelength in the array wavelength, and the flux in the array
flux.
• Each element in the array ‘wavelength’ holds a specific value of wavelength in angstroms.
• Each element in the array ‘flux’ holds the flux measured at the corresponding wavelength, in units
of 10⁻¹⁷ erg/s/cm²/Å.

[111]: wavelength, flux = get_sdss_spectrum(plate, fiber, MJD)

Retrieving file https://dr13.sdss.org/sas/dr13/sdss/spectro/redux/26/spectra/0285/spec-0285-51930-0181.fits
Saved file in data/spec-0285-51930-0181.fits
You do not need to open this file - the data is now saved in the two variables
called wavelength and flux
You can now access the data directly on this notebook. Try, for example, doing print(flux)
[114]: print(wavelength)

[3801.0188 3801.8933 3802.77 … 9193.905 9196.0205 9198.141 ]
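
Inside get_sdss_spectrum the wavelengths are computed as 10.**tbdata['loglam'], i.e. the file stores the base-10 logarithm of the wavelength. The sketch below takes the first three wavelengths printed above and compares their spacing in angstroms with their spacing in log10(wavelength). The names wave_demo, log_step and lin_step are assumptions made for this example, and three values are of course only a spot check.

```python
import numpy as np

# First three wavelengths printed above, in angstroms.
wave_demo = np.array([3801.0188, 3801.8933, 3802.77])
log_step = np.diff(np.log10(wave_demo))   # spacing in log10(wavelength)
lin_step = np.diff(wave_demo)             # spacing in angstroms
print(log_step)
print(lin_step)
```

For these values the logarithmic spacing comes out at roughly 0.0001 in both intervals, which suggests that the pixels are evenly spaced in log10(wavelength) rather than in wavelength itself.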

1.3.3 Exercise 3:
1. What is the length of the wavelength array? And of the flux array?
2. Use plt.plot() to make a plot of your spectrum. Remember to add labels to the axes, and a
title. Does it match the spectrum in the EXPLORE window?
[117]: x = wavelength
y = flux

[126]: plt.plot(x,y,)
plt.xlabel('wavelength', fontsize=20)
plt.ylabel('flux', fontsize=20)
plt.title('Galaxy')

plt.xlim(3000,10000)
plt.ylim(0,200)

[126]: (0.0, 200.0)

1.4 Broadband photometry and galaxy colours


The SDSS photometric camera imaged the sky using 5 different filters. The images that you saw
on the SkyServer Navigate Tool are the combination of these images, creating a colour image.
The 5 photometric bands are named with letters, and from the blue to the red they are: u, g, r,
i and z. Each filter allows light of certain wavelengths to hit the detector, and blocks all other
wavelengths. Their response is characterised by the so-called transmission curves.
The cell below fetches the transmission curves of the 5 filters, and the cell after that plots them
overlaid with the spectrum of the galaxy you chose in the above exercise. Execute both cells.
[121]: # Load filter files using fetch_sdss_filter()
u_data = fetch_sdss_filter('u')
u_wave = u_data['col1']
u_tp = u_data['col3']

g_data = fetch_sdss_filter('g')
g_wave = g_data['col1']
g_tp = g_data['col3']

r_data = fetch_sdss_filter('r')

r_wave = r_data['col1']
r_tp = r_data['col3']

i_data = fetch_sdss_filter('i')
i_wave = i_data['col1']
i_tp = i_data['col3']

z_data = fetch_sdss_filter('z')
z_wave = z_data['col1']
z_tp = z_data['col3']

[122]: #plot filter curves overlaid on spectrum.


plt.figure(figsize=(18,8))
plt.plot(wavelength, flux/flux.max(), label='spectrum', color='black')

plt.legend()
plt.title('Plate = {0:.0f}, MJD = {1:.0f}, Fiber = {2:.0f}'.format(locals()['plate'], locals()['MJD'], locals()['fiber']))

plt.xlabel(r'$\lambda (\AA)$', fontsize=30)


plt.ylabel('Flux', fontsize=30)
plt.xlim(2500,10000)

plt.fill(u_wave, u_tp, color='blue', alpha=0.4)


plt.fill(g_wave, g_tp, color='green', alpha=0.4)
plt.fill(r_wave, r_tp, color='orangered', alpha=0.4)
plt.fill(i_wave, i_tp, color='red', alpha=0.4)
plt.fill(z_wave, z_tp, color='darkred', alpha=0.4)

for f, c, loc in zip('ugriz', 'bgrmk', [3500, 4600, 6100, 7500, 8800]):
    plt.text(loc, 0.02, f, color=c, fontsize=30)

The width of the filter shows you the range of wavelengths captured by each filter, and the amplitude
shows you how much light is allowed through. The overall normalisation is irrelevant here, but notice
for example how much more efficient the r filter is, compared to the u or z bands. Notice also how
we can probe wavelengths bluewards and redwards of the limits of the SDSS spectrograph.
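As a rough illustration of "width" and "amplitude", the sketch below measures both for a made-up transmission curve. The Gaussian shape, its centre and its peak value are assumptions chosen for illustration, not the real SDSS r-band response; in the notebook, arrays such as r_wave and r_tp could take the place of the synthetic ones.

```python
import numpy as np

# Hypothetical transmission curve: a Gaussian centred at 6200 Angstrom (illustrative only).
wave = np.linspace(5000.0, 7500.0, 2501)  # 1 Angstrom steps
tp = 0.5 * np.exp(-0.5 * ((wave - 6200.0) / 300.0) ** 2)

# Amplitude: the largest fraction of light that the filter lets through.
peak = tp.max()

# Width: the wavelength range over which transmission is at least half of the peak.
above_half = wave[tp >= 0.5 * peak]
width = above_half.max() - above_half.min()

print(peak, width)
```

The half-maximum criterion is only one possible definition of width; it is used here because it is simple to read off a plot.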
Although spectra offer much more detailed information (e.g. redshifts!), the flux coming through
the photometric bands can tell us an awful lot about a galaxy. The integrated flux in each band is
most often translated into an apparent magnitude. We refer to the difference between the
magnitudes in two bands, say ‘g-r’ or ‘u-g’, as a colour.
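To make the link between integrated flux and colour concrete, here is a minimal sketch under simplifying assumptions. It uses idealised top-hat filters with made-up wavelength limits and two synthetic spectra, and it ignores the zero points and detector details that real SDSS magnitudes include. The helper names integrate, band_flux and colour are hypothetical, not part of the Lab code. The only physics used is that a magnitude is -2.5 log10(flux) plus a constant, so a smaller magnitude means a brighter source.

```python
import numpy as np

def integrate(x, y):
    """Trapezoidal integral of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def band_flux(wave, flux, lo, hi):
    """Integrated flux through an idealised top-hat filter spanning lo..hi."""
    inside = (wave >= lo) & (wave <= hi)
    return integrate(wave[inside], flux[inside])

def colour(flux_bluer_band, flux_redder_band):
    """Magnitude in the bluer band minus magnitude in the redder band (zero points ignored)."""
    return -2.5 * np.log10(flux_bluer_band / flux_redder_band)

# Synthetic spectra (assumed shapes, not real galaxies).
wave = np.linspace(3800.0, 9200.0, 5401)
red_spectrum = wave / 6000.0    # flux rises towards the red
blue_spectrum = 6000.0 / wave   # flux rises towards the blue

# Hypothetical band limits in Angstrom, chosen only for illustration.
G_BAND = (4000.0, 5500.0)
R_BAND = (5500.0, 7000.0)

g_r_red = colour(band_flux(wave, red_spectrum, *G_BAND),
                 band_flux(wave, red_spectrum, *R_BAND))
g_r_blue = colour(band_flux(wave, blue_spectrum, *G_BAND),
                  band_flux(wave, blue_spectrum, *R_BAND))

print(g_r_red, g_r_blue)
```

The spectrum that rises towards the red gives the larger g-r: relatively more flux in r means a smaller r magnitude, so the difference g-r grows.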

1.4.1 Understanding colour


The following code plots the spectra of two galaxies that I have previously chosen. Please execute
the two following cells.
[123]: mjd_1 = 53469
fiber_1 = 402
plate_1 = 2099

plate_2 = 285
fiber_2 = 227
mjd_2 = 51930
wavelength1, flux1 = get_sdss_spectrum(plate=plate_1,fibreID=fiber_1,MJD=mjd_1)
wavelength2, flux2 = get_sdss_spectrum(plate=plate_2,fibreID=fiber_2,MJD=mjd_2)

Retrieving file https://dr13.sdss.org/sas/dr13/sdss/spectro/redux/26/spectra/2099/spec-2099-53469-0402.fits
Saved file in data/spec-2099-53469-0402.fits
You do not need to open this file - the data is now saved in the two variables
called wavelength and flux
Retrieving file https://dr13.sdss.org/sas/dr13/sdss/spectro/redux/26/spectra/0285/spec-0285-51930-0227.fits
Saved file in data/spec-0285-51930-0227.fits
You do not need to open this file - the data is now saved in the two variables
called wavelength and flux

[124]: #possible solution


plt.figure(figsize=(18,8))
#Here I am plotting the first galaxy that I chose:
plt.plot(wavelength1, flux1/flux1.max(), label='Galaxy 1', color='black')
#Here I am plotting the second galaxy that I chose:
plt.plot(wavelength2, flux2/flux2.max(), label='Galaxy 2', color='green')

plt.legend(fontsize=20, loc='upper left')


#plt.title('Plate = {0:.0f}, MJD = {1:.0f}, Fiber = {2:.0f}'.format(locals()['plate'], locals()['MJD'], locals()['fiber']))

plt.xlabel(r'$\lambda (\AA)$', fontsize=30)
plt.ylabel('Flux', fontsize=30)
plt.xlim(2500,10000)

plt.fill(u_wave, u_tp, color='blue', alpha=0.4)


plt.fill(g_wave, g_tp, color='green', alpha=0.4)
plt.fill(r_wave, r_tp, color='orangered', alpha=0.4)
plt.fill(i_wave, i_tp, color='red', alpha=0.4)
plt.fill(z_wave, z_tp, color='darkred', alpha=0.4)

for f, c, loc in zip('ugriz', 'bgrmk', [3500, 4600, 6100, 7500, 8800]):
    plt.text(loc, 0.02, f, color=c, fontsize=30)

1.4.2 Exercise 4:
Which of the two galaxies has a higher value of g-r? Why? Which of these galaxies would appear
bluer to the eye?
Galaxy 1 will have the higher g-r value. Its flux is greater in the r band than in the g band, so
its r magnitude is lower (brighter) than its g magnitude, which makes g-r large. For galaxy 2 the
flux in the r band is comparatively low, so its r magnitude is higher and the difference g-r is
smaller. Galaxy 2 will therefore appear bluer.

1.4.3 Exercise 5:
Which type of stars do you expect to dominate the spectrum of each of galaxies 1 and 2?
Galaxy 1: red giants. Galaxy 2: white dwarfs.
Congratulations, that is the end of the Lab! Make sure you’ve run all the code cells,
filled in all the text answers and that your plots are all showing without error. Export
your Lab book to PDF, and submit it on Moodle before the end of the second Lab session.
Now that you know the basics, next session we will explore the local structure of the Universe and
how galaxy properties relate to it.
[125]:

File "<ipython-input-125-60d24d59a30b>", line 1


Galaxy 1 - red giants
^
SyntaxError: invalid syntax

[ ]:
