assignment1_Advanced Python_UHasselt
assignment1_Advanced Python_UHasselt
Questions
Implement the following programs in python:
(1) A nucleotide is the basic building block of DNA sequences. The four nucleobases that can
appear in a DNA sequence are A,C,G and T. Thus, a DNA sequence is a long chain (string)
of nucleotides such as "TTAATTTACTCACTGGCTA".
Use regular expressions to implement a function dna_match that takes a string s representing
a DNA sequence as input. The function returns True if and only if s has the following
property:
Example
Let s1 = "TTAATTTACTCACTGGCTA", s2 = "TTAATTTACTCACTGGCT",
s3 = "TTAATTTACCCTCAACATGGGCTA", and s4 = "AATTTAGGCTAAATTTAGGCTA". Then
• dna_match(s1) = True;
• dna_match(s2) = False;
• dna_match(s3) = False; and
• dna_match(s4) = True.
(2) Use regular expressions to implement a function collect_email_from_text that takes two
strings as inputs: a string representing some text and a name of an output file. The function
extracts a list of all the valid email addresses found in that text and writes this list in a
specific format to the output file.
1
The format of the output file must begin with the line ’The number of emails extracted
is n’, where n is the number of emails extracted. Following this line, each of the extracted
emails should be written on a line by itself. Make sure not to add an empty new line at the
end of the file. Note that their order in the file should be the same as order they appear in
the input text.
Example
Assume that we can extract two email addresses ’a@b.c’ and ’a@d.e’, then the format of
the output file is as follows :
If there are no emails extracted, then the file will contain only one line of the form
Note that a valid email address has the format left@right, where:
• left is nonempty string that can’t begin with a dot (.). Moreover, it may contain
only alphanumeric characters, underscores (_), dashes ( - ), plus ( + ), double quotes
("), and/or dots (.).
• right is a nonempty string that must contain a dot (.), but it can’t begin or end with
a dash ( - ) or with a dot (.). Moreover, it may contain only alphanumeric characters,
dashes ( - ), and/or dots (.), however, it can’t have two successive dots.
Example
The following strings are valid examples of emails:
• ’__e--mail@123.123.123.123’;
• ’"email"@domain.com’;
• ’john.doe+1@gmail.com’; and
• ’jane@do-e.uk.com’.
• ’@domain.com’;
• ’e.mail@.domain.com’;
• ’email@domain..com’; and
• ’email@-domain.com’.
2
"""
author : [ firstname lastname ]
studentnumber : [ your studentnumber ]
"""
Before the deadline make sure to submit the solution file through Blackboard.
Helper Functions You can define as many additional functions as you want. Help functions
should start with an underscore. But all solutions should not exceed 500 lines. In fact, 500 lines
is extremely generous bound. You need much less than that.
Grading The assignments are graded by both hand and unit tests. That means the results
of the functions should be precisely as expected by the given unit tests. Check the test file
test assignment1.py on Blackboard. Do not edit that file except for the imports if needed.
In grading we do not only look at functionality, but we also take code quality (including comments
in the code) into account.