MIE453 - Bioinformatics Systems (Fall 06)

Tutorial 2 - Arrays & Hashes

Contents

  1. Variables & Operators
  2. Arrays
  3. Hashes

1. Variables & Operators

Comment

Variables

Kinds of variables

Variable declaration
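As a minimal sketch of these two topics: Perl has three basic kinds of variables, distinguished by a leading symbol: scalars ($), arrays (@) and hashes (%). The example below declares one of each with "my"; the variable names are illustrative only and do not come from the course material.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A scalar holds a single value (a string or a number)
my $dna = 'ACGT';

# An array holds an ordered list of scalars
my @bases = ('A', 'C', 'G', 'T');

# A hash holds key/value pairs
my %complement = ('A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A');

print "Scalar: $dna\n";
print "Array has ", scalar(@bases), " elements\n";
print "Complement of A is $complement{'A'}\n";
```

With "use strict" in effect, every variable must be declared (for example with "my") before it is used, which catches misspelled variable names early.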

Values

Variable Typing
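Perl scalars carry no declared type: the same value is treated as a number or as a string depending on the operator applied to it. A small sketch, with hypothetical variable names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $count = '3';           # a string that looks like a number

my $sum  = $count + 4;     # "+" forces numeric context
my $text = $count . 4;     # "." forces string context

print "Numeric result: $sum\n";
print "String result:  $text\n";
```

Here $sum is 7 and $text is the string "34", even though $count was never converted explicitly.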

Example: Print a scalar

#!/usr/bin/perl -w
# Storing DNA in a variable, and printing it out

# First we store the DNA in a variable called $DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Next, we print the DNA onto the screen
print $DNA;

# Finally, we'll specifically tell the program to exit.
exit;

Example: Concatenate a scalar

#!/usr/bin/perl -w
# Concatenating DNA

# Store two DNA fragments into two variables called $DNA1 and $DNA2
$DNA1 = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
$DNA2 = 'ATAGTGCCGTGAGAGTGATGTAGTA';

# Print the DNA onto the screen
print "Here are the original two DNA fragments:\n\n";

print $DNA1, "\n";

print $DNA2, "\n\n";

# Concatenate the DNA fragments into a third variable and print them
# Using "string interpolation"
$DNA3 = "$DNA1$DNA2";

print "Here is the concatenation of the first two fragments (version 1):\n\n";

print "$DNA3\n\n";

# An alternative way using the "dot operator":
# Concatenate the DNA fragments into a third variable and print them
$DNA3 = $DNA1 . $DNA2;

print "Here is the concatenation of the first two fragments (version 2):\n\n";

print "$DNA3\n\n";

# Print the same thing without using the variable $DNA3
print "Here is the concatenation of the first two fragments (version 3):\n\n";

print $DNA1, $DNA2, "\n";

exit;

Operators

Arithmetic Operators
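The usual arithmetic operators are available on scalars. The sketch below uses two made-up values, $x and $y, to show each one:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($x, $y) = (7, 2);

my $sum  = $x + $y;     # addition
my $diff = $x - $y;     # subtraction
my $prod = $x * $y;     # multiplication
my $quot = $x / $y;     # division (not truncated to an integer)
my $rem  = $x % $y;     # modulus (remainder)
my $pow  = $x ** $y;    # exponentiation

print "$sum $diff $prod $quot $rem $pow\n";
```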

Numeric Comparisons

String Comparisons
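Perl keeps two separate families of comparison operators: ==, !=, <, >, <=, >= and <=> compare numbers, while eq, ne, lt, gt, le, ge and cmp compare strings. Choosing the wrong family gives a different answer, as this sketch (with illustrative values) suggests:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($p, $q) = (10, 9);

# Numerically, 10 is not less than 9
my $numeric_less = ($p < $q) ? 1 : 0;

# As strings, "10" sorts before "9" because "1" comes before "9"
my $string_less = ($p lt $q) ? 1 : 0;

# Sequences should be compared as strings
my $same_seq = ('ACGT' eq 'ACGT') ? 1 : 0;

# cmp returns -1, 0 or 1, like <=> does for numbers
my $order = 'AAG' cmp 'AAT';

print "numeric: $numeric_less, string: $string_less\n";
print "same sequence: $same_seq, order: $order\n";
```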

2. Arrays

Arrays are ordered collections of zero or more scalar values, indexed by position.

Array assignment
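An array is usually filled by assigning a list to it. The sketch below shows three common forms; the array names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assign a parenthesized list of quoted strings
my @bases = ('A', 'C', 'G', 'T');

# qw() splits on whitespace and saves typing the quotes
my @codons = qw(ATG GGC TAA);

# An empty list gives an empty array
my @empty = ();

print "bases: @bases\n";
print "codons: @codons\n";
```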

Accessing array elements
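Individual elements are scalars, so they are written with a $ and a zero-based index in square brackets. A minimal sketch, with illustrative names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = qw(A C G T);

my $first  = $bases[0];       # indices start at 0
my $last   = $bases[-1];      # negative indices count from the end
my @middle = @bases[1, 2];    # an array slice returns a list

$bases[3] = 'U';              # elements can be assigned individually

print "first=$first last=$last middle=@middle\n";
print "after assignment: @bases\n";
```

$#bases holds the index of the last element, so $bases[$#bases] is another way to write $bases[-1].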

Array copy (using assignment operator)
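Assigning one array to another copies the elements, so the two arrays are independent afterwards. A short sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @original = qw(A C G T);
my @copy     = @original;     # copies every element

$copy[0] = 'N';               # changes only the copy

print "original: @original\n";
print "copy:     @copy\n";
```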

Scalar vs List context
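The same array expression behaves differently depending on what the surrounding code expects. In list context an array yields its elements; in scalar context it yields the number of elements. The sketch below uses illustrative names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = qw(A C G T);

my @list     = @bases;           # list context: all elements
my $count    = @bases;           # scalar context: number of elements
my ($first)  = @bases;           # parentheses give list context: first element
my $explicit = scalar(@bases);   # scalar() forces scalar context

print "count=$count first=$first explicit=$explicit\n";
```

The parentheses around $first matter: without them the assignment would be in scalar context and $first would receive the element count instead.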

Array operators

output: ?
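The text does not list which array operators are meant, so the sketch below assumes the common built-in functions push, pop, shift, unshift, sort, reverse, join and split. The data are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @codons = qw(GGC TAA);

push    @codons, 'TGA';            # add to the end
unshift @codons, 'ATG';            # add to the front
my $last  = pop   @codons;         # remove and return the last element
my $first = shift @codons;         # remove and return the first element

my @sorted   = sort @codons;       # string sort, returns a new list
my @reversed = reverse @codons;    # reversed copy of the list
my $joined   = join('-', @codons); # list to string
my @letters  = split(//, 'ACGT');  # string to list of characters

print "removed: $first and $last\n";
print "remaining: @codons\n";
print "joined: $joined\n";
```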

3. Hashes

A hash (also called an associative array) is an unordered collection of zero or more pairs of scalar values, called keys and values. Each key is unique and is used to look up its value.

Hash assignment
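A hash can be filled all at once from a list of key/value pairs, or one element at a time. The sketch below shows both; %complement is an illustrative name, while %re_lookup follows the restriction enzyme example later in this tutorial.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assign a whole hash from a list; "=>" is a comma that quotes its left side
my %complement = (A => 'T', C => 'G', G => 'C', T => 'A');

# Assign one key/value pair at a time
my %re_lookup;
$re_lookup{'EcoRI'}   = 'GAATTC';
$re_lookup{'HindIII'} = 'AAGCTT';

print "Complement of G is $complement{G}\n";
print "EcoRI cuts at $re_lookup{'EcoRI'}\n";
```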

Accessing Hash elements
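A single hash value is a scalar, so it is written with a $ and the key in curly braces. Looking up a key that is not in the hash returns undef. A minimal sketch, where the key 'BamHI' is deliberately absent:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %re_lookup = (EcoRI => 'GAATTC', HindIII => 'AAGCTT');

my $site    = $re_lookup{'EcoRI'};    # value stored under the key
my $missing = $re_lookup{'BamHI'};    # no such key: undef

# A hash slice returns several values at once
my @sites = @re_lookup{'EcoRI', 'HindIII'};

print "EcoRI site: $site\n";
print "BamHI is not in the hash\n" unless defined $missing;
print "sites: @sites\n";
```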

Hash operators

output: ?

output: ?
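The text does not list which hash operators are meant, so the sketch below assumes the common built-in functions keys, values, exists and delete, applied to the restriction enzyme data used later in this tutorial. Keys are sorted before printing because a hash has no defined order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %re_lookup = (
    Eco47III => 'AGCGCT',
    EcoRI    => 'GAATTC',
    HindIII  => 'AAGCTT',
);

my @names = sort keys %re_lookup;      # all keys
my @sites = sort values %re_lookup;    # all values
my $count = keys %re_lookup;           # scalar context: number of pairs

foreach my $name (@names) {
    print "$name cuts at $re_lookup{$name}\n";
}

delete $re_lookup{'EcoRI'};            # remove one key/value pair
my $still_there = exists $re_lookup{'EcoRI'} ? 1 : 0;

print "EcoRI present after delete: $still_there\n";
```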

what if there are duplicates in the values?

output: ?
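One place where duplicate values matter is inverting a hash with reverse, so that values become keys. It is an assumption that this is the operation the question refers to. Because keys must be unique, duplicated values collapse into a single key, and which of the original keys survives is not predictable. The sketch below uses three codons from the genetic code and also shows one way to keep every original key, using a hash of array references (a technique beyond the scope of this tutorial).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two codons share the value 'F'
my %aa_of = (TTT => 'F', TTC => 'F', ATG => 'M');

# Invert: values become keys. Only one codon can be kept for 'F'.
my %codon_for = reverse %aa_of;
my $n = keys %codon_for;
print "Inverted hash has $n keys\n";

# Keep all codons per amino acid in a hash of array references
my %codons_for;
foreach my $codon (sort keys %aa_of) {
    push @{ $codons_for{ $aa_of{$codon} } }, $codon;
}
print "F is encoded by: @{ $codons_for{F} }\n";
```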

Example: restriction enzyme hash

#!/usr/bin/perl -w
# Restriction enzymes are proteins that cut DNA at short, specific sequences
# e.g., EcoRI cuts where it finds GAATTC, between G and A
#
# Initialize restriction enzyme hash
# keys are the names of restriction enzymes, values are the DNA sequences they cut

%re_lookup = (
          'Eco47III'=> 'AGCGCT',
          'EcoRI'   => 'GAATTC',
          'HindIII' => 'AAGCTT',
);

print "Enter restriction enzyme name\n";
$re=<STDIN>;
chomp $re;

# Look up the recognition sequence; the result is undef for an unknown name
$seq = $re_lookup{$re};
if (defined($seq)) {
    print "RE sequence for $re is: $seq\n";
} else {
    print "Sorry, I don't know about \"$re\"\n";
}

Example: Generic code

#
# codon2aa
#
# A subroutine to translate a DNA 3-character codon to an amino acid
#   Version 3, using hash lookup

sub codon2aa {
    my($codon) = @_;

    $codon = uc $codon;
 
    my(%genetic_code) = (
    
    'TCA' => 'S',    # Serine
    'TCC' => 'S',    # Serine
    'TCG' => 'S',    # Serine
    'TCT' => 'S',    # Serine
    'TTC' => 'F',    # Phenylalanine
    'TTT' => 'F',    # Phenylalanine
    'TTA' => 'L',    # Leucine
    'TTG' => 'L',    # Leucine
    'TAC' => 'Y',    # Tyrosine
    'TAT' => 'Y',    # Tyrosine
    'TAA' => '_',    # Stop
    'TAG' => '_',    # Stop
    'TGC' => 'C',    # Cysteine
    'TGT' => 'C',    # Cysteine
    'TGA' => '_',    # Stop
    'TGG' => 'W',    # Tryptophan
    'CTA' => 'L',    # Leucine
    'CTC' => 'L',    # Leucine
    'CTG' => 'L',    # Leucine
    'CTT' => 'L',    # Leucine
    'CCA' => 'P',    # Proline
    'CCC' => 'P',    # Proline
    'CCG' => 'P',    # Proline
    'CCT' => 'P',    # Proline
    'CAC' => 'H',    # Histidine
    'CAT' => 'H',    # Histidine
    'CAA' => 'Q',    # Glutamine
    'CAG' => 'Q',    # Glutamine
    'CGA' => 'R',    # Arginine
    'CGC' => 'R',    # Arginine
    'CGG' => 'R',    # Arginine
    'CGT' => 'R',    # Arginine
    'ATA' => 'I',    # Isoleucine
    'ATC' => 'I',    # Isoleucine
    'ATT' => 'I',    # Isoleucine
    'ATG' => 'M',    # Methionine
    'ACA' => 'T',    # Threonine
    'ACC' => 'T',    # Threonine
    'ACG' => 'T',    # Threonine
    'ACT' => 'T',    # Threonine
    'AAC' => 'N',    # Asparagine
    'AAT' => 'N',    # Asparagine
    'AAA' => 'K',    # Lysine
    'AAG' => 'K',    # Lysine
    'AGC' => 'S',    # Serine
    'AGT' => 'S',    # Serine
    'AGA' => 'R',    # Arginine
    'AGG' => 'R',    # Arginine
    'GTA' => 'V',    # Valine
    'GTC' => 'V',    # Valine
    'GTG' => 'V',    # Valine
    'GTT' => 'V',    # Valine
    'GCA' => 'A',    # Alanine
    'GCC' => 'A',    # Alanine
    'GCG' => 'A',    # Alanine
    'GCT' => 'A',    # Alanine
    'GAC' => 'D',    # Aspartic Acid
    'GAT' => 'D',    # Aspartic Acid
    'GAA' => 'E',    # Glutamic Acid
    'GAG' => 'E',    # Glutamic Acid
    'GGA' => 'G',    # Glycine
    'GGC' => 'G',    # Glycine
    'GGG' => 'G',    # Glycine
    'GGT' => 'G',    # Glycine
    );

    if(exists $genetic_code{$codon}) {
        return $genetic_code{$codon};
    }else{

            print STDERR "Bad codon \"$codon\"!!\n";
            exit;
    }
}

# dna2peptide
#
# A subroutine to translate DNA sequence into a peptide

sub dna2peptide {

    my($dna) = @_;

    use strict;
    use warnings;

    # Initialize variables
    my $protein = '';

    # Translate each three-base codon to an amino acid, and append to a protein 
    for(my $i=0; $i < (length($dna) - 2) ; $i += 3) {
        $protein .= codon2aa(substr($dna,$i,3) );
    }

    return $protein;
}

print "Please enter your dna sequence:\n";
$dna = <STDIN>;
chomp $dna;    # remove the trailing newline so it is never read as part of a codon
$peptide = dna2peptide($dna);
print "Here is the translated protein sequence: $peptide\n";

exit;

How would you modify the above code to accommodate the six reading frames?
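One possible starting point, offered as a sketch rather than the intended solution: three frames come from the sequence itself starting at offsets 0, 1 and 2, and three more from its reverse complement. The subroutine names revcom and six_frames are hypothetical, and the code assumes the input contains only A, C, G and T and is at least two bases long.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the reverse complement of a DNA string
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;            # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # complement each base
    return $rc;
}

# Return the six reading frames as six DNA strings
sub six_frames {
    my ($dna) = @_;
    my @frames;
    foreach my $strand ($dna, revcom($dna)) {
        foreach my $offset (0, 1, 2) {
            push @frames, substr($strand, $offset);
        }
    }
    return @frames;
}

my @frames = six_frames('ATGGCCTAA');
print "$_\n" foreach @frames;
```

Each of the six strings could then be passed to dna2peptide, which already ignores an incomplete codon at the end of its input.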

Some examples and Perl scripts are adapted from the book Beginning Perl for Bioinformatics, James Tisdall, 2001, ISBN 0-596-00080-4.