Due electronically: Thursday 23 January, 6:00 pm
This assignment is worth 1 mark towards your grade in this course.
Part marks will count.
Coming up with the algorithms required for this
assignment should be a straightforward task for all CSC148 students.
If you don't find it so, then you may need to get more programming
experience before taking 148.
Since the sequence of nucleotides that
makes up a particular gene
can be represented by a string of characters,
the gene-matching problem is an instance of a
general string-matching problem.
In this assignment, we will tackle a simple string-matching problem: Given two strings of any length, find the longest substring (of length 1 or greater) common to both. For instance, if the strings are: xypdzxabcdelmndeghij and abcdefghijk
then the longest common substring, abcde, is 5 characters long and occurs at position 7 in the first string and position 1 in the second string: xypdzxlmndeghij and fghijk
Your task is to write a program that reads in two strings and finds the longest substring common to both. The input strings will be on two separate lines of the input file. The output should be:
There were no common substrings
if there were no substrings common to both input strings
(in which case, there must not have been even any characters common to both),
or, if there was at least one substring common to both:
The longest common substring has length: l
It occurs in string 1 starting at character: i
It occurs in string 2 starting at character: j
where i, j, and l are appropriate integers.
In the case of a tie, the substring closest to the beginning of
the first string is chosen.
If this also yields a tie (because that substring occurs more than
once in the second string),
the occurence closest to the beginning of the second string is chosen.
The incidence matrix for the sample strings used above
is shown below.
To make it easier to read, we have printed the two
strings
down the left and
across the top.
Notice several important things about an incidence matrix:
An obvious and simple algorithm for matching two strings, given
their incidence matrix, is to look for the longest sequence
of trues along one of the main diagonals.
To save typing,
our starter code is available online through the course web page
located at
http://www.cs.utoronto.ca/
dianeh/148.html
(You will also find there an example of an input file
and the output that it should produce.)
Make your own copy of the starter code using your web browser.
(With Netscape, you can use the
``Save As'' command under the ``File'' pull-down menu.)
Then edit the program and insert the missing pieces in the places marked
by >> and <<.
You must use our starter code, and
you must not make any alterations to it.
In particular, do not change the subprogram headers and do not change
the input or output statements.
Do not add prompts for the input.
You will probably find it convenient to break some of the subprograms down
further into smaller subtasks.
Feel free to add additional subprograms.
Because this assignment is to be marked by a program,
you will hand it in electronically rather than
on paper.
Name the file containing your completed program
a0.t if it is written in OOT
or
a0.c if it is written in C.
When you are convinced of the correctness of your code,
hand it in electronically using the ``Hand in (submit)'' command,
found under the ``File Utilities'' icon.
If you hand in a solution more than once, we will have only a copy of your
most recent version; our copy of the older ones will
be erased.
The program that will mark your assignment is not terribly
clever,
and can only compare your program's output character for character
with what we tell it is the correct output.
If your output is different in any
way from the expected output, it will be considered
wrong, and your mark will be zero.
For this reason, we are providing, among other things,
the statements that perform input and output.
If you modify these in any way, or add other input or output
statements
(for example, if you add prompts)
your mark will be zero.
In addition, you must use the file name a0.t or a0.c
(as appropriate)
when you hand in your assignment, or your mark will be zero.
The most effective way to get this assignment done quickly is to write
your program out on paper and simulate it by hand before you
go near a computer, in order to get rid of as many errors as possible.
Even if you plan to develop your program at home, you will have to use one
of our PCs
to hand in your final program.
Make sure you log in and get comfortable with our computers well before
the due date.
Try to get your assignment finished early. The computers are likely to
be very crowded near the due date.
The course web site is located at
http://www.cs.utoronto.ca/
dianeh/148.html