SciServerLab Session1 2023-24 Student
the Lab. You are encouraged to use additional resources to solidify your knowledge of Python.
The second half of this session will begin your direct interaction with SDSS data. The next Lab
session invites you to explore concepts around the structure of the Universe and galaxy properties
using a separate notebook.
1.1.2 Submission
You will need to save your notebook as a PDF file and submit it on Moodle. This will automatically
print all of your code, plots and other outputs. To print, choose File->Print Preview. That will
open another tab in your browser, with a print-friendly version of your notebook. You can then
save to PDF (this step will be different depending on your browser and operating system). Make
sure you have answered all exercises.
Both notebooks are due at the end of your final Lab session.
[1]: print('Hello world!')
Hello world!
The print() function can print the value of a wide range of variables. In the next cells, we define
5 variables, and print their values in different ways. Execute them. Do all the printed values make
sense? You can change the variable values and the operations within the print function argument
to make sure you understand what is happening.
[2]: x = 10 #an integer
y = 2.0 #a float
given_name = 'Rita'#a string
favourite_colour = 'purple'#another string
blank = ' '
[3]: print(x,y,given_name)
10 2.0 Rita
[4]: print(x + y)
12.0
[5]: print(x/y)
5.0
Rita 10
Fill in the values in the cell below, and print.
[7]: my_given_name= 'William'
my_favourite_colour= 'red'
In our first example, we asked Python to print the sum of two numbers. Python performed the
computation, and printed the outcome. We can also create new variables by performing operations
on variables that we have previously named, like so:
[8]: h = x + y
print(h)
12.0
[9]: h = x - y
print(h)
8.0
[10]: h = x/y
print(h)
5.0
[11]: h = x*y
print(h)
20.0
[12]: h = x**y
print(h)
100.0
From the example above, you can also see that if you assign a different value to a variable, you
overwrite its previous value. Variables have no memory!
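A short sketch of what overwriting means in practice. The name h_sum is introduced here only for illustration: if you need an earlier result later on, store it under a different name before reassigning.

```python
x = 10
y = 2.0

h = x + y        # h holds the sum
h_sum = h        # keep a copy under a new (illustrative) name
h = x - y        # h is overwritten; the sum is no longer stored in h

print(h)         # the difference
print(h_sum)     # the sum, still available through its own name
```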
100.0
[29]: my_surname='Urquhart'
blank=' '
print('My surname is' + blank + my_surname)
My surname is Urquhart
0
1
2
3
4
5
6
7
8
9
loop is finished
Indentation is extremely important in Python, and it defines whether lines of code fall within a
loop (or function), or not. In the example above, we created a loop where a variable i varies from
0 to 9. Within each iteration of the loop we printed the value of i. When the loop was finished, we
printed a statement - note the indentation of the print statement.
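The loop cell that this paragraph describes is not reproduced above. A minimal sketch that is consistent with the printed output, and only an assumption about what the original cell looked like, is:

```python
for i in range(0, 10):
    print(i)               # indented: part of the loop body, runs ten times
print("loop is finished")  # not indented: runs once, after the loop ends
```

The only difference between the two print calls is their indentation, and that alone decides whether they belong to the loop.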
What would happen if the print statement was indented? Write down your answer before you
execute the next code cell.
Write your answer here (double-click this cell to edit it):
0
loop is finished
1
loop is finished
2
loop is finished
3
loop is finished
4
loop is finished
5
loop is finished
6
loop is finished
7
loop is finished
8
loop is finished
9
loop is finished
Use the next cell to write some code that will:
1. in each iteration compute a variable called t that is 2 times i, and print its value.
2. print “loop is running” in each iteration
[49]: for i in range(0,10):
    t = 2.0*i
    print(t)
    print("loop is running")
0.0
loop is running
2.0
loop is running
4.0
loop is running
6.0
loop is running
8.0
loop is running
10.0
loop is running
12.0
loop is running
14.0
loop is running
16.0
loop is running
18.0
loop is running
[51]: print(a+b)
[52]: print(c + b)
[53]: print(c)
1.2.5 Indices
The position of a certain element in a list (or array - you can think of arrays as special types of
lists, that only hold one type of variable) is called an index. The first element has an index of 0.
For example to access the first element of variable c, I type c[0]. The second element is at index
1, and to access it I type c[1].
[54]: print(c[0])
print(c[1])
hello
world
You can operate on individual elements, not just the whole array. For example:
[55]: print(a[1] + a[2])
3
Now access different elements of the lists yourself. Write code that prints the last element of
variables a and b.
[71]: print(str(a[3]) + blank + str(b[3]))
3 7
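Python also accepts negative indices, which count from the end of a list, so the last element can be reached without knowing the length. The sketch below assumes a, b and c hold the values implied by the outputs above (the cell that defines them is not shown).

```python
a = [0, 1, 2, 3]           # assumed values, inferred from the printed outputs
b = [4, 5, 6, 7]
c = ['hello', 'world']

print(c[0], c[1])          # first and second elements
print(a[-1], b[-1])        # last elements, counted from the end
print(a[len(a) - 1])       # the same element as a[-1], using the list length
```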
We’ll define some numpy arrays using np.array() and make some simple calculations. Does it
behave the way you expect? You can use the last cell to experiment, and make other computations
with the arrays.
[60]: a = np.array([0,1,2,3]) #a numpy array
b = np.array([4,5,6,7]) #another numpy array
[61]: print(a + b)
[ 4 6 8 10]
[62]: print(a * b)
[ 0 5 12 21]
[63]: print(a - b)
[-4 -4 -4 -4]
[64]: print(a / b)
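It may help to contrast these results with plain Python lists, where + joins two lists end to end instead of adding element by element. A minimal sketch; the names ending in _list and _arr are illustrative.

```python
import numpy as np

a_list = [0, 1, 2, 3]
b_list = [4, 5, 6, 7]
a_arr = np.array(a_list)
b_arr = np.array(b_list)

print(a_list + b_list)   # lists: concatenation into one list of 8 elements
print(a_arr + b_arr)     # arrays: element-by-element sum, 4 elements
print(a_arr / b_arr)     # arrays: element-by-element division, giving floats
```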
[65]: N=100
i_start=0
i_stop=99
x = np.linspace(i_start,i_stop,N)
print(x)
[67]: y = x*2
print(y)
20.0
41.0
[74]: print(x)
[76]: x[1:81]
[76]: array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13.,
14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26.,
27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39.,
40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51., 52.,
53., 54., 55., 56., 57., 58., 59., 60., 61., 62., 63., 64., 65.,
66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77., 78.,
79., 80.])
[77]: x[:8]
[78]: x[40:45]
[81]: 99.0
Finally, you can also define an array of indices, to make it easier to access certain parts of your
array. For example, if you were only interested in every 10th element of an array, you could define
a new variable, which we call indices, and that we can use to access those elements quickly,
by writing x[indices]:
[82]: indices = np.array([10,20,30])
print(x[indices])
print(y[indices])
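The slicing and index-array ideas above can be collected in one sketch. It assumes the same x as before (100 evenly spaced values from 0 to 99); building the indices with np.arange is one possible way to pick every 10th element, not the only one.

```python
import numpy as np

x = np.linspace(0, 99, 100)

print(x[:8])       # the first eight elements (indices 0 to 7)
print(x[40:45])    # indices 40 to 44; the stop index is not included
print(x[-1])       # the last element

indices = np.arange(0, 100, 10)   # 0, 10, 20, ..., 90
print(x[indices])                 # every 10th element, via an index array
print(x[::10])                    # the same selection, using a slice with a step
```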
(aside: every time we use np.where we need to add a [0] at the end of the command. If you really
want to know why, ask me, but you don’t really need to, as long as you remember. Hint: next
week’s Lab will be harder if you forget!)
[84]: print(y_is_small)
[0 1 2 3 4]
So there are 5 elements of the array y that have a value less than 10. We can check that this is
true, by printing the values of y at these indices:
[85]: print(y[y_is_small])
[0. 2. 4. 6. 8.]
Good!
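For the curious, the reason for the [0] is that np.where returns a tuple holding one index array per dimension of the input, so for a one-dimensional array the indices sit in the first (and only) entry. A minimal sketch, assuming y is x*2 as defined earlier; the name result is illustrative.

```python
import numpy as np

x = np.linspace(0, 99, 100)
y = x * 2

result = np.where(y < 10)    # a tuple with one array of indices per dimension
y_is_small = result[0]       # the index array itself

print(type(result), len(result))
print(y_is_small)            # indices where the condition holds
print(y[y_is_small])         # the values of y at those indices
```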
Multidimensional arrays (matrices) are straightforward. Let’s define an array a that has two rows:
[[1 2 3]
[4 5 6]]
Element access is done in row->column order. So to access the element in row 1 and column 0
(the second row, first column) you would write:
[87]: a[1,0]
[87]: 4
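The cell that defines the two-row array is not shown above. A minimal sketch that reproduces the printed array, and also shows how to pull out a whole row or column with a colon, might look like this:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])   # two rows, three columns

print(a.shape)    # (number of rows, number of columns)
print(a[1, 0])    # row 1, column 0
print(a[0, :])    # the whole of row 0
print(a[:, 2])    # the whole of column 2
```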
Numpy arrays have methods (functions) that allow you to easily compute basic statistics. To
compute the maximum value, the mean value, and the number of elements in y, you can write:
[88]: print(y.max(), y.mean(), len(y))
We’re ready for our first plot. Throughout this lab you will be asked to produce plots in exercises.
Always label your axes, provide a sensible title to your plot, and add a legend where
necessary!
Let’s do a simple line plot of y vs x, and label everything sensibly. Line plots will link data points
with a line.
[90]: x = np.linspace(0,99,100)
y = x**2
[ ]: plt.figure(figsize=(8,8))
plt.plot(x,y, label='y = x$^2$')
plt.ylabel('this is y', fontsize=20)
plt.title('Hello matplotlib!')
plt.legend()
Let’s zoom in at the very start of our curve, and explicitly mark the actual data points with small
circles. Experiment with some of the commands and parameters below.
[91]: plt.figure(figsize=(10,8))
plt.plot(x,y, label='y = x$^2$', marker='o') #see https://matplotlib.org/api/markers_api.html
plt.xlabel('x', fontsize=20)
plt.ylabel('y', fontsize=20)
plt.title('Hello matplotlib!')
plt.legend(loc='upper left')
plt.xlim(0,10)
plt.ylim(0,100)
Histograms are simply counting plots, separating data points into bins according to the value of
the data points.
We will begin by creating two arrays, each with 5000 points drawn randomly from a Gaussian
distribution with mean 0 and standard deviation 1.
https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.random.html
[92]: y1 = np.random.randn(5000)
y2 = np.random.randn(5000)
You can visualise your data by plotting the value of an array (say y1) as a function of its index
number, as exemplified below.
[93]: plt.figure(figsize=(15,6))
plt.plot(y1, marker='o', linestyle='none')
plt.xlabel('array index number i', fontsize=15)
plt.ylabel('y[i]', fontsize=15)
By eye, it looks like most elements of y1 have a value around zero, and most are perhaps between
-1 and 1. A histogram bins the values of an array according to their value, so you can easily inspect
the most frequent values.
Let us now make a histogram for each variable. We will count how many array elements have a
certain value, in 20 bins ranging from -5 to 5.
[94]: plt.figure(figsize=(10,8))
plt.hist(y1, bins=20, range=(-5,5), label='y1', color='purple', histtype='step') #the histtype = 'step' makes your histogram be a line
plt.hist(y2, bins=20, range=(-5,5), alpha=0.4, label='y2', color='green') #note the alpha value - it makes the second plot slightly transparent (alpha=1 = opaque)
plt.ylabel('N(value)', fontsize=15)
plt.xlabel('value of random number', fontsize=15)
plt.title('A histogram', fontsize=15)
plt.legend( fontsize=15)
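If you want the numbers behind a histogram instead of the picture, np.histogram performs the same binning and returns the counts and the bin edges. A minimal sketch; the seeded generator is an assumption made only so that the example is reproducible.

```python
import numpy as np

rng = np.random.default_rng(42)      # seeded so the numbers repeat between runs
y1 = rng.standard_normal(5000)       # Gaussian, mean 0, standard deviation 1

counts, edges = np.histogram(y1, bins=20, range=(-5, 5))

print(counts)               # number of points in each of the 20 bins
print(edges)                # 21 bin edges, from -5 to 5
print(edges[1] - edges[0])  # the width of one bin
print(counts.sum())         # points outside the range are not counted
```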
Experiment with the plt.hist() command, changing the width of the bins, or the number of bins.
You can also change the colour, labels, etc (N.B.: matplotlib understands American spelling
only - you’ll need to use color).
[99]: plt.figure(figsize=(20,16))
plt.hist(y1, bins=20, range=(-7,7), label='y1', color='purple', histtype='step') #the histtype = 'step' makes your histogram be a line
plt.hist(y2, bins=20, range=(-7,7), alpha=0.4, label='y2', color='blue') #note the alpha value - it makes the second plot slightly transparent (alpha=1 = opaque)
plt.ylabel('N(value)', fontsize=15)
plt.xlabel('value of random number', fontsize=15)
plt.title('A histogram', fontsize=150)
plt.legend( fontsize=15)
Finally, we will look at scatter plots, useful for when we need to plot data that we don’t necessarily
think should be connected with a line. For example, when we are looking at how variables correlate
with one another, or where they sit in some parameter space.
The y1 and y2 arrays are sets of (uncorrelated) random numbers. That means that the value of
element 1 (for example) in y1 will be independent of (or uncorrelated with) the value of element 1
in y2. Let’s plot the values of y1 against the values of y2. Can you predict what you will see?
[100]: plt.figure(figsize=(10,10))
plt.scatter(y1,y2, marker='*', s=50)
plt.xlabel('y$_1$', fontsize=20)
plt.ylabel('y$_2$', fontsize=20)
plt.title('A scatter plot')
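One way to put a number on "uncorrelated" is the correlation coefficient, which is close to 0 for independent data and close to 1 or -1 for tightly related data. A minimal sketch, using freshly generated arrays with a fixed seed (an assumption, so the values will differ from those plotted above):

```python
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.standard_normal(5000)
y2 = rng.standard_normal(5000)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# correlation between the two arrays.
r = np.corrcoef(y1, y2)[0, 1]
print(r)
```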
used in next week’s session are dataframes, so let us take a quick tour of how to access dataframe
elements (DataFrames are arguably one of the most powerful tools for data analysis in Python - we
will not use them to their full potential).
We will begin by importing pandas, as pd for short.
https://pandas.pydata.org
Next we will create a dataframe, called df, with two columns, each holding one of our arrays of
random elements. DataFrames are indexed (that’s the first column you see below), allowing one to
access elements very efficiently. The next few cells demonstrate how to access DataFrame elements.
[102]: df = pd.DataFrame({'y1':y1, 'y2':y2})
df
[102]: y1 y2
0 -0.700759 0.551675
1 -0.038827 -0.316043
2 0.096796 0.167424
3 0.253266 -0.611019
4 0.966742 1.456996
… … …
4995 -1.182573 1.258707
4996 2.491254 2.127884
4997 -1.095310 -0.062212
4998 -0.700128 -0.720970
4999 0.410355 -0.185050
To access a specific row of data, you do so by specifying the location of that row. E.g., to see the
values of y1 and y2 in row 10:
[103]: df.loc[10]
[103]: y1 0.491884
y2 -1.429145
Name: 10, dtype: float64
If you want to access a column specifically, say the value of y1 in row 10:
[104]: df.loc[10]['y1']
[104]: 0.49188444867648407
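The row and column can also be given to .loc together, separated by a comma, which returns the same value in a single step. A minimal sketch with freshly generated data (so the numbers differ from those above):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({'y1': rng.standard_normal(5000),
                   'y2': rng.standard_normal(5000)})

row = df.loc[10]           # the whole row: one value per column
value = df.loc[10, 'y1']   # a single value: row 10, column 'y1'

print(row)
print(value)
print(df['y1'].head())     # a whole column (first five rows shown)
```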
You can very easily manipulate data frames to select subsets of your data that satisfy certain
criteria, like we did before.
One way to do that is using numpy.where() again. To select the elements of your dataframe where
y2 is greater than 0 but less than 0.5, and then print those y2 values, you might do:
[105]: y2_pos = np.where( (df['y2'] >0) & (df['y2'] < 0.5))[0] #y2_pos is a list of indices, or row numbers.
[106]: #this prints the values of 'y2' that are greater than 0 and smaller than 0.5
print(df.loc[y2_pos]['y2'])
2 0.167424
9 0.131835
12 0.433351
16 0.306912
17 0.125599
…
4959 0.363569
4960 0.402134
4976 0.426183
4984 0.036551
4990 0.225481
Name: y2, Length: 965, dtype: float64
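An alternative that some people find easier to read is to use the condition itself as a mask, without going through np.where. The sketch below is only an illustration of that alternative with freshly generated data; the names mask and subset are introduced here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({'y1': rng.standard_normal(5000),
                   'y2': rng.standard_normal(5000)})

mask = (df['y2'] > 0) & (df['y2'] < 0.5)   # True or False for every row
subset = df[mask]                          # keeps only the rows where mask is True

y2_pos = np.where(mask)[0]                 # the np.where route gives the same rows

print(len(subset), len(y2_pos))
print(subset['y2'].head())
```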
You can then plot slices of your data frame in different colours. E.g.:
[107]: plt.figure(figsize=(10,8))
plt.scatter(df['y1'],df['y2'], marker='*', s=50, color='blue') #this plots all rows in your dataframe, in blue
#[...] previously set
plt.xlabel('y$_1$', fontsize=20)
plt.ylabel('y$_2$', fontsize=20)
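The cell above appears to have lost the line that overplots the selected rows. A minimal sketch of how a subset could be drawn on top of the full data set follows; the colour, labels and the name subset are assumptions, not the original code.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({'y1': rng.standard_normal(5000),
                   'y2': rng.standard_normal(5000)})

y2_pos = np.where((df['y2'] > 0) & (df['y2'] < 0.5))[0]
subset = df.loc[y2_pos]   # only the rows selected by the condition

plt.figure(figsize=(10, 8))
plt.scatter(df['y1'], df['y2'], marker='*', s=50, color='blue', label='all rows')
plt.scatter(subset['y1'], subset['y2'], marker='*', s=50, color='orange',
            label='0 < y2 < 0.5')   # drawn second, so it sits on top
plt.xlabel('y$_1$', fontsize=20)
plt.ylabel('y$_2$', fontsize=20)
plt.legend()
```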
1.3 Part 2 - SDSS data
The Sloan Digital Sky Survey (SDSS, www.sdss.org) is one of the largest astronomical surveys in the
world, containing imaging and spectroscopic data of millions of galaxies, stars and quasars. In this
Lab, you’ll be using Python notebooks to interact directly with SDSS data.
But first, we will begin with a more straightforward exploration.
The SDSS Skyserver provides a simple interface for exploring the images and spectra of objects
in the SDSS. It is assumed now that you have watched the introductory video to this lab session,
where the basic features of this interface are explained.
In order to get a quick feel for the spectroscopic properties of stars, star clusters and galaxies, take
some time to navigate the Skyserver, starting at the following location:
RA = 182.7, Dec = 18.7
Try selecting only objects with spectra, and use the EXPLORE feature to explore the images and
spectra of the wide variety of objects in the field. You can add objects that you find interesting to
a temporary notebook by clicking “add to notes”. You can view your notebook by clicking “show
notes”. It’s OK to take your time - this is your chance to explore.
The “Name” field in the Navigate window will recognise many Messier and NGC catalogue numbers.
For example, to see a nice globular cluster, type “ngc 4147” in the “Name” field.
Wikipedia will readily give you lists of Open Clusters and Globular Clusters (though note that
many will lie outside the SDSS footprint).
1.3.1 Exercise 1:
Consider the following:
• How do the images of stars, galaxies, and stellar clusters compare? Focus on their size on the
sky, their colour, etc.
• How do the spectra of stars and galaxies compare? Focus on similarities and differences.
1.3.2 Exercise 2:
Find a galaxy that you like that has a spectrum, and write down the plate, fiber, MJD (find them
on the bottom right of the EXPLORE window) and the SDSS objid (on the top right of the same
window).
[108]: #answer:
MJD = 51930
fiber = 181
plate = 285
We will now pre-define some functions that we will use to fetch data on that galaxy directly into
your workspace, and that you can manipulate immediately in this workbook. We will also import
a set of libraries and arrange some settings.
Execute the next two code cells below.
[109]: # Import libraries for use in this notebook.
import numpy as np # standard Python lib for math ops
import pandas # data manipulation package
import matplotlib.pyplot as plt # another graphing package
import urllib.request
import os
from astropy.io import fits, ascii
print('Supporting libraries imported')
pandas.set_option('display.max_colwidth', None) #None means no limit on the column width (a negative value is deprecated)
#construct filename and URL, then download to data/ on local folder. data/ must exist.
url_base_sdss = 'https://dr13.sdss.org/sas/dr13/sdss/spectro/redux/26/spectra/' #last plate 3006 + plates 8015+8033
url_base_eboss = 'https://dr13.sdss.org/sas/dr13/eboss/spectro/redux/v5_9_0/spectra/' #plates 3586 to 7565
plate_str = '{:04d}'.format(plate)
fibre_str = '{:04d}'.format(fibreID)
MJD_str = '{:05d}'.format(MJD)
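The three format calls above pad each number with leading zeros to a fixed width. A minimal sketch using the plate, fibre and MJD values recorded in Exercise 2; the filename pattern in the last lines is an assumption for illustration, since the part of the function that builds it is not shown.

```python
plate = 285
fibreID = 181
MJD = 51930

plate_str = '{:04d}'.format(plate)    # at least 4 digits, padded with zeros
fibre_str = '{:04d}'.format(fibreID)
MJD_str = '{:05d}'.format(MJD)        # at least 5 digits

print(plate_str, MJD_str, fibre_str)

# Hypothetical filename assembled from the padded strings
filename = 'spec-' + plate_str + '-' + MJD_str + '-' + fibre_str + '.fits'
print(filename)
```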
def fetch_sdss_filter(filter):
    url_base = 'https://classic.sdss.org/dr7/instruments/imager/filters/'
    filename = filter+'.dat'
    file_url = url_base + filename
    #print('Retrieving file ' + file_url)
    save_file = 'data/'+filename
    urllib.request.urlretrieve(file_url, save_file)
    #print('Saved file in ' + save_file)
    res = ascii.read(save_file)
    return res
print("Loaded function fetch_sdss_filter")
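The download helpers save files into a data/ folder and, as the comment in the code notes, that folder must already exist. If it might be missing, one way to create it beforehand is sketched below; this is a suggestion and not part of the original notebook.

```python
import os

# Create the data/ folder next to the notebook if it is not there yet;
# exist_ok=True means no error is raised when it already exists.
os.makedirs('data', exist_ok=True)

print(os.path.isdir('data'))
```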
1.3.3 Exercise 3:
1. What is the length of the wavelength array? And of the flux array?
2. Use plt.plot() to make a plot of your spectrum. Remember to add labels to the axes, and a
title. Does it match the spectrum in the EXPLORE window?
[117]: x = wavelength
y = flux
[126]: plt.plot(x,y)
plt.xlabel('wavelength', fontsize=20)
plt.ylabel('flux', fontsize=20)
plt.title('Galaxy')
plt.xlim(3000,10000)
plt.ylim(0,200)
[126]: (0.0, 200.0)
g_data = fetch_sdss_filter('g')
g_wave = g_data['col1']
g_tp = g_data['col3']
r_data = fetch_sdss_filter('r')
r_wave = r_data['col1']
r_tp = r_data['col3']
i_data = fetch_sdss_filter('i')
i_wave = i_data['col1']
i_tp = i_data['col3']
z_data = fetch_sdss_filter('z')
z_wave = z_data['col1']
z_tp = z_data['col3']
plt.legend()
plt.title('Plate = {0:.0f}, MJD = {1:.0f}, Fiber = {2:.0f}'.format(plate, MJD, fiber))
The width of the filter shows you the range of wavelengths captured by each filter, and the amplitude
shows you how much light is allowed through. The overall normalisation is irrelevant here, but notice
for example how much more efficient the r filter is, compared to the u or z bands. Notice also how
we can probe wavelengths bluewards and redwards of the limits of the SDSS spectrograph.
Although spectra offer much more detailed information (e.g. redshifts!), the flux coming through
the photometric bands can tell us an awful lot about a galaxy. The integrated flux in each band is
most often translated into an apparent magnitude. We refer to the difference between bands, say
‘g-r’ or ‘u-g’, as a colour.
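As a worked illustration: a magnitude is a logarithmic measure of flux, m = -2.5 log10(F) + constant, so a brighter source has a smaller magnitude. A colour is a difference of two magnitudes, and therefore depends only on the ratio of the fluxes in the two bands. The sketch below assumes both fluxes are in the same units with a common zero point; the function name and the flux values are made up for the example.

```python
import math

def colour(flux_a, flux_b):
    """Colour (a - b) in magnitudes from the fluxes in bands a and b.

    Assumes both fluxes share the same units and zero point, so the
    constant in the magnitude definition cancels in the difference.
    """
    return -2.5 * math.log10(flux_a / flux_b)

# Illustrative fluxes only, in arbitrary units
red_galaxy = colour(flux_a=1.0, flux_b=2.0)    # more flux in r than in g
blue_galaxy = colour(flux_a=2.0, flux_b=1.0)   # more flux in g than in r

print(red_galaxy)    # positive g-r: a redder object
print(blue_galaxy)   # negative g-r: a bluer object
```

A larger g-r therefore means relatively more light in the r band, i.e. a redder object.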
plate_2= 285
fiber_2=227
mjd_2=51930
wavelength1, flux1 = get_sdss_spectrum(plate=plate_1,fibreID=fiber_1,MJD=mjd_1)
wavelength2, flux2 = get_sdss_spectrum(plate=plate_2,fibreID=fiber_2,MJD=mjd_2)
plt.xlabel(r'$\lambda (\AA)$', fontsize=30)
plt.ylabel('Flux', fontsize=30)
plt.xlim(2500,10000)
1.4.2 Exercise 4:
Which of the two galaxies has a higher value of g-r? Why? Which of these galaxies would appear
bluer to the eye?
Galaxy 1 will have a higher g-r value. Its flux is greater in the r band than in the g band, so
its r magnitude is lower than its g magnitude and g-r is large. For galaxy 2 the difference in
magnitudes is smaller, because its flux in the r band is low and so its r magnitude is greater.
Galaxy 2 will therefore appear bluer.
1.4.3 Exercise 5:
Which type of stars do you expect dominate the spectrum of each of the galaxies 1 and 2?
Galaxy 1 - red giants
Galaxy 2 - young, hot, massive stars (O and B types)
Congratulations, that is the end of the Lab! Make sure you’ve run all the code cells,
filled in all the text answers and that your plots are all showing without error. Export
your Lab book to PDF, and submit it on Moodle before the end of the second Lab session.
Now that you know the basics, next session we will explore the local structure of the Universe and
how galaxy properties relate to it.