
assignment1_Advanced Python_UHasselt

The assignment requires individual completion and focuses on implementing Python functions using regular expressions for DNA sequence validation and email extraction. Students must submit a single file named 'assignment1.py' containing all solutions, adhering to specific formatting and content guidelines. Plagiarism will be strictly monitored, and adherence to academic integrity is emphasized.

Advanced Programming in Python

Assignment 1: Regular Expressions and Files

Due date is 10/03/2024 at 23:59

The assignment is strictly individual. Plagiarism and cooperation will be
thoroughly checked by hand and by specialised software. Not only copying
other people's code, but also giving your source code to other students will
be considered fraud. Using solutions from the internet, even very small
portions, is considered plagiarism.
When we detect fraud, we will, in accordance with the exam regulations (OER),
inform the exam committee, which will start a fraud procedure (see the OER).
Unfortunately, students get caught every academic year; we hope that this
will not be the case this year.

Questions
Implement the following programs in Python:

(1) A nucleotide is the basic building block of DNA sequences. The four nucleobases that can
appear in a DNA sequence are A, C, G and T. Thus, a DNA sequence is a long chain (string)
of nucleotides such as "TTAATTTACTCACTGGCTA".
Use regular expressions to implement a function dna_match that takes a string s representing
a DNA sequence as input. The function returns True if and only if s has the following
property:

Every substring of the form AATTTA in s is followed by at most 10 arbitrary nucleotides
and then followed by the substring GGCTA.

Example
Let s1 = "TTAATTTACTCACTGGCTA", s2 = "TTAATTTACTCACTGGCT",
s3 = "TTAATTTACCCTCAACATGGGCTA", and s4 = "AATTTAGGCTAAATTTAGGCTA". Then

• dna_match(s1) = True;
• dna_match(s2) = False;
• dna_match(s3) = False; and
• dna_match(s4) = True.
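
One possible reading of this property can be sketched as follows. The sketch assumes that s contains only the letters A, C, G and T, and that a string without any occurrence of AATTTA satisfies the property trivially; neither assumption is stated in the assignment. It searches for an occurrence of AATTTA that is not followed by at most 10 nucleotides and then GGCTA, and returns True only when no such occurrence exists. The name _VIOLATION is hypothetical.

```python
import re

# An occurrence of AATTTA that violates the property: it is NOT followed
# by 0 to 10 nucleotides and then GGCTA (negative lookahead).
_VIOLATION = re.compile(r"AATTTA(?![ACGT]{0,10}GGCTA)")


def dna_match(s):
    """Return True iff every AATTTA in s is followed by at most 10
    nucleotides and then GGCTA (assumes s uses only A, C, G and T)."""
    return _VIOLATION.search(s) is None


for s in ("TTAATTTACTCACTGGCTA", "TTAATTTACTCACTGGCT"):
    print(s, dna_match(s))
```

Because the lookahead consumes no characters, re.search also tests occurrences of AATTTA that overlap each other, which a plain scan with re.findall over consuming patterns could skip.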

(2) Use regular expressions to implement a function collect_email_from_text that takes two
strings as inputs: a string representing some text and a name of an output file. The function
extracts a list of all the valid email addresses found in that text and writes this list in a
specific format to the output file.

The output file must begin with the line ’The number of emails extracted
is n’, where n is the number of emails extracted. Following this line, each of the extracted
emails should be written on a line by itself. Make sure not to add an empty line at the
end of the file. Note that the emails must appear in the file in the same order as they appear in
the input text.

Example
Assume that we can extract two email addresses ’a@b.c’ and ’a@d.e’; then the
output file looks as follows:

The number of emails extracted is 2
a@b.c
a@d.e

If there are no emails extracted, then the file will contain only one line of the form

The number of emails extracted is 0

Note that a valid email address has the format left@right, where:

• left is a nonempty string that can’t begin with a dot (.). Moreover, it may contain
only alphanumeric characters, underscores (_), dashes (-), plus signs (+), double quotes
("), and/or dots (.).
• right is a nonempty string that must contain a dot (.), but it can’t begin or end with
a dash (-) or with a dot (.). Moreover, it may contain only alphanumeric characters,
dashes (-), and/or dots (.); however, it can’t have two successive dots.

Example
The following strings are valid examples of emails:

• ’__e--mail@123.123.123.123’;
• ’"email"@domain.com’;
• ’john.doe+1@gmail.com’; and
• ’jane@do-e.uk.com’.

However, the following are not valid emails:

• ’@domain.com’;
• ’e.mail@.domain.com’;
• ’email@domain..com’; and
• ’email@-domain.com’.

Submission and Grading Criteria

First of all, do not solve the assignment in multiple files. We expect one file from you
that contains the solutions to all of the assignment’s questions. If you do not solve all
the questions, make sure that your submission also includes the functions of the unsolved questions
with pass in the body. Moreover, your solution file must have the name
assignment1.py; otherwise, the tests will not run. Also, make sure that the file begins with

"""
author : [ firstname lastname ]
studentnumber : [ your studentnumber ]
"""

Before the deadline make sure to submit the solution file through Blackboard.

Helper Functions You can define as many additional functions as you want. Helper functions
should start with an underscore. However, all solutions together should not exceed 500 lines. In fact, 500 lines
is an extremely generous bound. You need much less than that.

Grading The assignments are graded both by hand and by unit tests. That means the results
of the functions should be precisely as expected by the given unit tests. Check the test file
test_assignment1.py on Blackboard. Do not edit that file except for the imports if needed.
In grading, we look not only at functionality; we also take code quality (including comments
in the code) into account.
