Lab08S ExternalPackages
- understand and revise the use of pandas in the design of data analysis programs;
- understand and revise the use of pandas in the implementation of data analysis programs.
The above statements are the learning outcomes of this laboratory and will be achieved in concert
with the other learning activities that you undertake for this unit.
2. Convert your pseudocode to Python code. If you are using Visual Studio Code, use pip to
install pandas (numpy is installed along with it) and openpyxl, which read_excel needs to open
.xlsx files, and make sure you have created and used a new Workspace for this week. Then run
the code and note any errors that you encounter. Correct the errors in your pseudocode and
Python code and run it again, repeating until all errors are fixed.
a. First, download the tables from the Data WA open data portal as an Excel file (the
default option) by clicking the “Download” button to the right of the large “Appendix 8”
button.
b. Once you have done so, write Python code that reads Table 8.7 from the file using pandas'
read_excel function, noting from the read_excel documentation how the sheet_name
parameter selects a sheet.
c. Print the DataFrame once it has been created and note its dimensions.
d. Then, remove any rows that contain empty cells (as well as the first row, which holds the
column headers) and print the DataFrame again, noting the change in dimensions.
e. Then, remove any rows that effectively have no data (i.e. those with elements consisting
only of “-” or “-(d)”). This can be done with the tilde operator (~), which negates a Boolean
expression, e.g. ~(the_df["Column"] == "Value"). You may wish to print the table to check
that all such rows have been removed.
f. Finally, filter out any rows whose values are less than $5m in either the final financial
year or the year before it. Print the DataFrame again, noting the change in dimensions.
g. Output descriptive statistics to the user and postulate why they are presented the way
they are. Then, write the result to a CSV file named processed_data.csv.
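Steps d to f all reduce to building a Boolean mask and keeping the rows where it is True. The sketch below shows one way to do this on a small made-up frame. The column names "2023-24" and "2024-25", the placeholder list, and the assumption that values are in $m are all hypothetical, so your table's headers and units may differ. It uses isin with any(axis=1) to test every cell at once, a variation on the single-column ~(...) pattern suggested in step e.

```python
import pandas as pd

# Hypothetical column names for the last two financial years in Table 8.7;
# replace them with the real headers from the downloaded workbook.
PREV_YEAR = "2023-24"
FINAL_YEAR = "2024-25"

# Placeholder strings that the lab treats as "no data".
NO_DATA_MARKERS = ["-", "-(d)"]


def clean_table(df: pd.DataFrame, threshold: float = 5.0) -> pd.DataFrame:
    """Steps d-f: drop empty rows, drop placeholder rows, drop rows under the threshold."""
    # Step d: drop the first row (the header text), then any row with an empty cell.
    df = df.iloc[1:].dropna()

    # Step e: True for rows where any cell is a placeholder; ~ keeps the other rows.
    # For a single column and value this is the same idea as ~(df["Column"] == "-").
    has_marker = df.isin(NO_DATA_MARKERS).any(axis=1)
    df = df[~has_marker]

    # The year columns were read alongside text, so convert them to numbers first.
    df = df.astype({PREV_YEAR: float, FINAL_YEAR: float})

    # Step f: assumes values are in $m (use 5000 instead if the table is in $'000).
    below = (df[PREV_YEAR] < threshold) | (df[FINAL_YEAR] < threshold)
    return df[~below]


# Made-up example data: a header-text row, one row to keep, and rows that each fail a step.
example = pd.DataFrame({
    "Item": ["Item", "Road A", "Road B", "Road C", "Road D", None],
    PREV_YEAR: [PREV_YEAR, 12.0, "-", 3.0, 8.0, 1.0],
    FINAL_YEAR: [FINAL_YEAR, 20.0, 7.0, 9.0, "-(d)", 2.0],
})
cleaned = clean_table(example)
print(cleaned)  # only the "Road A" row survives
```

Printing df.shape after each of the three steps, rather than only at the end, makes it easier to see which step removed which rows.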
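The file-handling parts of steps b, c and g can be sketched as two small functions. The file name appendix_8.xlsx and the sheet name "Table 8.7" are assumptions (check the downloaded workbook for the real ones), and the demo writes to an in-memory buffer so that running it does not create a file.

```python
import io

import pandas as pd

# Hypothetical names: check the downloaded file for the real file name and sheet name.
WORKBOOK = "appendix_8.xlsx"
SHEET = "Table 8.7"


def load_table(path=WORKBOOK, sheet=SHEET):
    """Steps b-c: read one worksheet into a DataFrame and report its dimensions."""
    # sheet_name picks a worksheet by its name (or by 0-based position).
    df = pd.read_excel(path, sheet_name=sheet)
    print(df)
    print("Dimensions (rows, columns):", df.shape)
    return df


def report_and_save(df, out_path="processed_data.csv"):
    """Step g: print descriptive statistics, then write the table to a CSV file."""
    stats = df.describe()
    print(stats)
    # index=False is a design choice here: it omits pandas' row labels from the CSV.
    df.to_csv(out_path, index=False)
    return stats


# Demo with made-up data, writing to an in-memory buffer instead of a real file.
demo = pd.DataFrame({
    "Item": ["Road A", "Road E"],
    "2023-24": [12.0, 6.0],
    "2024-25": [20.0, 15.0],
})
buffer = io.StringIO()
demo_stats = report_and_save(demo, buffer)
csv_text = buffer.getvalue()
```

With the real workbook you would call load_table(), apply your own cleaning from steps d to f, and pass the result to report_and_save() with the default output path.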