CSC 209H: Assignment 1

Weight: 10% of course grade.

Due date: Monday, June 12, 10:00 p.m.


Introduction

In this assignment you will be asked to write two shell scripts (using sh) that manipulate files and directories.

Part 1 - fs

A typical question users have about files on the file system is how big they are. A number of programs answer this question, such as ls or du.

Sometimes we might want to find the size of all files (that match some filename pattern) in all subdirectories. For example, we might want to find the size of all C or text files in the current directory and all subdirectories. It is possible to do this with find, or you can write a sh script to accomplish this.

Your task will be to write a sh script (without using find, ls or du) that takes shell filename patterns, and reports the size of each file that matches these patterns.

The fs script must support the following usage:

If no switches are given, fs reports the size (in bytes, as reported by stat) of all files relative to the current directory that match the given file patterns, one per line. If a particular file matches more than one pattern, it will be reported multiple times. You may assume that any argument that begins with a dash, -, is a command line switch, and that all switches precede any file patterns.
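Because all switches precede the file patterns, one possible way to separate the two is a loop over the positional parameters. The following is only a sketch of that idea, not a required design: the function name parse_switches and the variable names are hypothetical, and it does not enforce the rule that at most one of -k or -b may be given.

```sh
#!/bin/sh
# Hypothetical helper: consume leading switches and stop at the first pattern.
# The variable names (unit, recursive, files_only, npatterns) are illustrative.
parse_switches() {
    unit=bytes recursive=no files_only=no
    while [ $# -gt 0 ]; do
        case "$1" in
            -k) unit=kilobytes ;;
            -b) unit=blocks ;;
            -r) recursive=yes ;;
            -f) files_only=yes ;;
            -*) echo "unknown switch: $1" >&2; return 1 ;;
            *)  break ;;              # first non-switch argument: a file pattern
        esac
        shift
    done
    npatterns=$#                      # arguments left over are the file patterns
}

parse_switches -k -r '*.c' '*.txt'
echo "$unit $recursive $files_only $npatterns"    # prints: kilobytes yes no 2
```

In a real script the same loop could sit at the top level, so that "$@" holds only the patterns once it finishes.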

If the -k switch is given as an argument to fs, the sizes will be reported in kilobytes instead of bytes. (Recall, there are 1024 bytes in a kilobyte.) You may choose and document a suitable rounding convention.
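As an illustration of one rounding convention you could document, the sketch below rounds up to the next whole kilobyte using shell integer arithmetic. The function name to_kb is hypothetical, and rounding up is an assumption; truncating or rounding to nearest would be equally acceptable if documented.

```sh
#!/bin/sh
# Hypothetical helper: convert a byte count to kilobytes, rounding up.
# Adding 1023 before the integer division makes any partial kilobyte count as one.
to_kb() {
    echo $(( ($1 + 1023) / 1024 ))
}

to_kb 162      # prints 1
to_kb 2490     # prints 3
to_kb 31424    # prints 31
```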

If the -b switch is given, the number of blocks (as reported by stat) occupied by each file will be reported instead of its size. At most one of -k or -b may be specified.
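The options that make stat print a single value differ between systems, so check the manual page on the machine where your script will be marked. The sketch below assumes GNU stat (format option -c) and falls back to BSD stat (format option -f); the helper names file_size and file_blocks are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers. Assumption: GNU stat is tried first (-c FORMAT),
# then BSD stat (-f FORMAT) if the GNU form fails.
file_size() {
    stat -c %s "$1" 2>/dev/null || stat -f %z "$1"
}
file_blocks() {
    stat -c %b "$1" 2>/dev/null || stat -f %b "$1"
}

tmp=$(mktemp) || exit 1
printf 'hello' > "$tmp"
file_size "$tmp"       # prints 5
file_blocks "$tmp"     # block count depends on the file system
rm -f "$tmp"
```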

If the -r switch is given, the fs script operates recursively: not only are files relative to the current directory reported, but also any files relative to any subdirectories, and so on. Only non-hidden subdirectories are considered. See the examples for more details.
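Since find is not allowed, one way to visit subdirectories is a recursive shell function that relies on the fact that a * glob does not match names beginning with a dot. This is a sketch under assumptions, not the expected solution: the name walk_dirs is hypothetical, and it makes no attempt to detect symbolic links that point back up the tree.

```sh
#!/bin/sh
# Hypothetical helper: print every non-hidden directory below $1, depth first.
walk_dirs() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue    # an unmatched glob is left as literal text
        d=${d%/}                   # drop the trailing slash added by the glob
        echo "$d"
        walk_dirs "$d"             # the for list was expanded before the call,
    done                           # so reusing the global d is safe here
}

tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/subdir/deep" "$tmp/threads" "$tmp/.hidden"
(cd "$tmp" && walk_dirs .)
# prints ./subdir, ./subdir/deep and ./threads, one per line; .hidden is skipped
rm -rf "$tmp"
```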

If the -f switch is given, only regular files are reported by fs (ordinarily directories and other kinds of files may have their sizes reported, but the -f switch inhibits this behaviour).

The reporting format is the file size (or number of blocks), followed by a tab character, followed by the path to the file relative to the (original) current directory. (See the examples for details.)
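The tab-separated format is easiest to produce with printf, because the behaviour of echo with escape sequences varies between shells. A minimal sketch, with the hypothetical function name report:

```sh
#!/bin/sh
# Hypothetical helper: print one report line as "<size><TAB><path>".
report() {
    printf '%s\t%s\n' "$1" "$2"
}

report 162 a.c
report 2490 subdir/xyz.c
```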

If no file patterns are given, or if no files match the given patterns, your program should produce no output. For cases not covered by this specification (such as the return status of your script or error handling), you may specify and implement a reasonable behaviour.

Examples

The following command reports the size of all C files in the current directory. Note that the * is expanded by the shell.

$ fs *.c
162	a.c
938	b.c

The following command reports the size of all C files in all directories including and below the current directory. Note that the * is passed directly to the script.

$ fs -r "*.c"
162	a.c
938	b.c
2490	subdir/xyz.c
727	threads/readn.c
661	threads/writen.c

The following command reports the size of all files starting with s or named a.c in the current directory. Notice that subdir is a directory.

$ fs 's*' a.c
31424	sol.txt
512	subdir
162	a.c

The following command repeats this, but omitting the directory.

$ fs -f 's*' a.c
31424	sol.txt
162	a.c

The following command reports the number of blocks in all C files that are contained in the parent directory of some directory below the current directory. Notice that the same files are printed twice, once for each subdirectory.

$ fs -b -r '../*.c'
2	subdir/../a.c
2	subdir/../b.c
2	threads/../a.c
2	threads/../b.c
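The quoted examples above work only if the script itself expands each pattern relative to each directory it visits. One way to do this is to leave the directory and pattern unquoted in a for loop so that the shell performs filename expansion. The sketch below uses the hypothetical name expand_in and assumes that directory names and patterns contain no whitespace.

```sh
#!/bin/sh
# Hypothetical helper: expand pattern $2 relative to directory $1, print matches.
expand_in() {
    # $1/$2 is deliberately unquoted so that filename expansion takes place.
    # Assumption: neither the directory nor the pattern contains whitespace.
    for f in $1/$2; do
        [ -e "$f" ] || continue    # an unmatched pattern is left as literal text
        echo "$f"
    done
}

tmp=$(mktemp -d) || exit 1
mkdir "$tmp/subdir"
: > "$tmp/a.c"; : > "$tmp/b.c"; : > "$tmp/subdir/xyz.c"
(cd "$tmp" && expand_in subdir '../*.c')
# prints subdir/../a.c and subdir/../b.c, one per line
rm -rf "$tmp"
```

Note that calling this helper with . as the directory yields paths such as ./a.c, so a full solution would need to treat the top-level directory specially to match the output format shown in the examples.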

Hints and Clarifications


Part 2 - vcardtoabook

A vCard is a virtual business card some people attach to their email messages. It typically contains the name, contact details and address of the sender, which you might want to import into your address book.

Your job is to write a shell script (using sh) called vcardtoabook that will take as input a vCard file, parse (interpret) its contents, and add the information to a simple address book (such as that used for the pine mailer).

The vCard format

The following is an example of such a vCard:

BEGIN:VCARD
VERSION:2.1
FN:John Doe III
N:Doe;John;;;III
BDAY:1/1/70
ADR;WORK:PO box 1;;123 College St.;Toronto;Ontario;M1M 1M1;Canada
TEL;PREF;MSG;WORK:999-123-4567
TEL;CELL:987-654-3210
EMAIL;INTERNET:jd@someisp.ca
EMAIL;INTERNET:johndoe@gmail.com
UID:9e1762c1d3a2da01f0f722486631b7b3
REV:2006-06-02T09\:27\:06Z
X-GENERATOR:vCardMaker 0.5.1
END:VCARD

Your script will read such a file from its standard input, extracting the full name, primary email address, primary street address, and primary telephone number, and create an entry in the .addressbook file. (Typically the .addressbook file would be located in the user's home directory, but for this assignment we'll keep it in the current working directory to prevent modification of your own address book.)
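As a rough illustration of pulling a single value out of a vCard, the sketch below prints the text after the first colon on the first line whose property name matches. It is only a sketch: the name first_value is hypothetical, and treating "primary" as "first occurrence" is an assumption, as is ignoring any special handling the telephone field may need.

```sh
#!/bin/sh
# Hypothetical helper: read a vCard on standard input and print the value of
# the first line whose property name is $1 (for example FN or EMAIL).
# Assumption: "primary" simply means the first matching line.
first_value() {
    sed -n "/^$1[;:]/{s/^[^:]*://p;q;}"
}

card='BEGIN:VCARD
FN:John Doe III
N:Doe;John;;;III
EMAIL;INTERNET:jd@someisp.ca
EMAIL;INTERNET:johndoe@gmail.com
END:VCARD'

printf '%s\n' "$card" | first_value FN       # prints John Doe III
printf '%s\n' "$card" | first_value EMAIL    # prints jd@someisp.ca
```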

Your script will be executed with no arguments. From its standard input, your script will attempt to read a vCard. The vCard begins with the line BEGIN:VCARD, and you should ignore anything appearing before this line (think of the input being an email message piped from the user's email program, with the vCard occurring at the end of the message). Similarly, the vCard ends with the line END:VCARD and you should ignore anything following this line. You may assume you are given at most one vCard.
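Discarding everything outside the BEGIN:VCARD and END:VCARD lines can be done with a sed address range. The following sketch uses the hypothetical function name extract_vcard and assumes the input uses plain newline line endings.

```sh
#!/bin/sh
# Hypothetical helper: copy only the lines from BEGIN:VCARD through END:VCARD.
# Assumption: at most one vCard appears in the input.
extract_vcard() {
    sed -n '/^BEGIN:VCARD/,/^END:VCARD/p'
}

extract_vcard <<'EOF'
Hi, my card is attached.
BEGIN:VCARD
FN:John Doe III
END:VCARD
-- sent from my mail client
EOF
# prints only the three lines from BEGIN:VCARD to END:VCARD
```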

You should extract the following information from the input:

The .addressbook file

The .addressbook file is a text file consisting of a number of records. A line beginning with a non-space character is the start of a record, while a line beginning with a space character is a continuation of the previous line. The format of the records is discussed below.

To store the extracted information about this new contact, we will attempt to add a line like the following to the end of the .addressbook file:

john	John Doe III	John Doe III <jd@someisp.ca>		PO box 1;;123 College St.;Toronto;Ontario;M1M 1M1;Canada, WORK:999-123-4567

The first field is a unique nickname for this record: no other record in the .addressbook file should use this nickname. The nickname should not contain any spaces. Following the nickname is a tab character. Next is the full name, followed by a tab, followed by the email address. The email address begins with the full name, followed by a space, then the actual email address within angle brackets. Following the email address are two tab characters (the missing field is used as the FCC, which we're ignoring), followed by the comments field.
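The field layout described above maps directly onto a printf format string. The sketch below shows one way to assemble a record and to test whether a nickname is already in use; the names make_record and nickname_taken are hypothetical, and how you choose a nickname is left to you.

```sh
#!/bin/sh
# Hypothetical helper: build one record from nickname, full name, email, comments.
# Layout: nickname TAB name TAB "name <email>" TAB (empty FCC) TAB comments
make_record() {
    printf '%s\t%s\t%s <%s>\t\t%s\n' "$1" "$2" "$2" "$3" "$4"
}

# Hypothetical helper: succeed if nickname $1 is the first field of a line in $2.
# Continuation lines begin with a space, so they cannot equal a nickname.
nickname_taken() {
    [ -f "$2" ] && cut -f1 "$2" | grep -qxF -- "$1"
}

make_record john 'John Doe III' jd@someisp.ca 'WORK:999-123-4567'
```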

The comments field should be filled as follows. Should both an address and a telephone number be extracted from the vCard, the address should appear first, followed by a comma and a space, followed by the phone number. Should only one of the address and telephone number be found, it should appear in the comments field. If neither an address nor a telephone number is found in the vCard, the comments field should remain blank.
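The three cases for the comments field can be handled with a single test. A minimal sketch, using the hypothetical function name make_comments and assuming that a missing value is passed as an empty string:

```sh
#!/bin/sh
# Hypothetical helper: combine an address ($1) and a phone number ($2).
# Assumption: a value that was not found is passed as the empty string.
make_comments() {
    if [ -n "$1" ] && [ -n "$2" ]; then
        printf '%s, %s\n' "$1" "$2"    # both present: address, then phone
    else
        printf '%s\n' "$1$2"           # at most one present, or neither
    fi
}

make_comments '123 College St.' '999-123-4567'    # prints 123 College St., 999-123-4567
make_comments '' '999-123-4567'                   # prints 999-123-4567
```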

Avoiding duplicates

It is possible (and likely) that the user will use your script to attempt to add a duplicate contact to the address book. Your script should avoid doing so.

We will define a duplicate entry in the .addressbook file as two records containing the identical email address. Before adding the line described above to the .addressbook file, your script should verify that the email address does not already appear in the file. If it does, your script should not modify the .addressbook file.

If the .addressbook file does not previously exist in the current directory, your script should create it when adding the appropriate record.
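The duplicate check and the creation of a missing file can be combined, since appending with >> creates the file when it does not exist. The sketch below uses the hypothetical name add_unless_duplicate, searches for the address inside angle brackets as a fixed string, and assumes a case-sensitive comparison is acceptable. The demonstration writes to a temporary directory rather than to a real .addressbook file.

```sh
#!/bin/sh
# Hypothetical helper: append record $2 to file $3 unless email $1 is present.
# Assumption: addresses are compared as case-sensitive fixed strings.
add_unless_duplicate() {
    if [ -f "$3" ] && grep -qF "<$1>" "$3"; then
        return 1                   # duplicate: leave the file untouched
    fi
    printf '%s\n' "$2" >> "$3"     # >> creates the file if it is missing
}

tmp=$(mktemp -d) || exit 1
book=$tmp/.addressbook
rec=$(printf 'john\tJohn Doe III\tJohn Doe III <jd@someisp.ca>\t\t')
add_unless_duplicate jd@someisp.ca "$rec" "$book"
add_unless_duplicate jd@someisp.ca "$rec" "$book" || echo "duplicate, not added"
wc -l < "$book"                    # the file holds a single record
rm -rf "$tmp"
```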

Hints and Clarifications


Learning Objectives

What to hand in

You will commit to the a1 directory of your CSC209 repository the following files:

Please remember to make your shell scripts executable (for example, with chmod +x) and to ensure that the first line of each script is "#!/bin/sh".

You are strongly encouraged to take advantage of the version control system and commit your work frequently so that you can keep track of your progress. Please note that it is perfectly fine (and even recommended) to keep any additional files related to this assignment (such as files used for testing) under version control. The markers will simply ignore such files.