Comment
Variables
Values
Variable Typing
Example: Print a scalar
#!/usr/bin/perl -w
# First we store the DNA in a variable called $DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
# Next, we print the DNA onto the screen
print $DNA;
# Finally, we'll specifically tell the program to exit.
exit;
Example: Concatenate a scalar
#!/usr/bin/perl -w
# Store two DNA fragments into two variables called $DNA1 and $DNA2
$DNA1 = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
$DNA2 = 'ATAGTGCCGTGAGAGTGATGTAGTA';
# Print the DNA onto the screen
print $DNA1, "\n";
print $DNA2, "\n\n";
# Concatenate the DNA fragments into a third variable and print them
$DNA3 = "$DNA1$DNA2";
print "Here is the concatenation of the first two fragments (version 1):\n\n";
print "$DNA3\n\n";
# An alternative way using the "dot operator":
$DNA3 = $DNA1 . $DNA2;
print "Here is the concatenation of the first two fragments (version 2):\n\n";
print "$DNA3\n\n";
# Print the same thing without using the variable $DNA3
print $DNA1, $DNA2, "\n";
exit;
Operators
Arithmetic Operators

Numeric Comparisons

String Comparisons
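As a small illustration of why Perl keeps separate operators for numeric and string comparisons, the sketch below compares the same pair of values both ways. The variable names are illustrative only and do not come from the original notes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two scalars that are equal as numbers but differ as strings
my $x = '10';
my $y = '10.0';

# Numeric comparison (==) converts both operands to numbers first
my $numeric_equal = ($x == $y) ? 'yes' : 'no';

# String comparison (eq) compares the operands character by character
my $string_equal = ($x eq $y) ? 'yes' : 'no';

# String ordering (lt) follows character order: 'G' comes before 'T'
my $string_less = ('ACGT' lt 'ACTT') ? 'yes' : 'no';

print "numeric equal: $numeric_equal\n";
print "string equal:  $string_equal\n";
print "ACGT lt ACTT:  $string_less\n";
```

For sequence data, the string operators (eq, ne, lt, gt, le, ge, cmp) are usually the ones you want.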

Arrays are ordered collections of zero or more scalar values, indexed by position.
Array assignment
Accessing array elements
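A minimal sketch of array assignment and element access, reusing the @bases array that appears in the later examples. The other variable names are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Array assignment: a list of scalars in parentheses
my @bases = ('A', 'C', 'G', 'T');

# A single element is a scalar, so it takes the $ sigil; indices start at 0
my $first = $bases[0];

# Negative indices count back from the end of the array
my $last = $bases[-1];

# $#bases holds the index of the last element
my $last_index = $#bases;

# Assigning to an element changes the array in place
$bases[3] = 'U';

print "first=$first last=$last last_index=$last_index\n";
print "@bases\n";
```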
Array copy (using assignment operator)
#!/usr/bin/perl -w
# Array copies
# Initialize two arrays with same content
print "--- Initial values of two arrays ---\n";
# Modify the first array
print "--- New values of two arrays ---\n";
exit;
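A minimal sketch of copying an array with the assignment operator, assuming the illustrative names @original and @copy. It shows that assignment copies the elements, so a later change to one array leaves the other untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Initialize one array and copy it with the assignment operator
my @original = ('A', 'C', 'G', 'T');
my @copy     = @original;    # copies the elements, not a reference

# Modify only the first array
$original[0] = 'U';

# The copy still holds the initial values
print "original: @original\n";
print "copy:     @copy\n";
```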
Scalar vs List context
#!/usr/bin/perl -w
# Demonstration of "scalar context" and "list context"
@bases = ('A', 'C', 'G', 'T');
print "@bases\n";
$a = @bases;
print $a, "\n";
($a) = @bases;
print $a, "\n";
exit;
Array operators
#!/usr/bin/perl -w
@bases = ('A', 'C', 'G', 'T');
$base1 = shift @bases;
print "@bases";
output: ?
#!/usr/bin/perl -w
@bases = ('A', 'C', 'G', 'T');
unshift(@bases, 'U');
print "@bases";
output: ?
#!/usr/bin/perl -w
@bases = ('A', 'C', 'G', 'T');
$base1 = pop @bases;
print "@bases";
output: ?
#!/usr/bin/perl -w
@bases = ('A', 'C', 'G', 'T');
push(@bases, 'U');
print "@bases";
output: ?
#!/usr/bin/perl -w
@bases = ('A', 'C', 'G', 'T');
@reverse = reverse @bases;
print "@reverse";
output: ?
#!/usr/bin/perl -w
@bases = ('A', 'C', 'G', 'T');
$len = scalar @bases;
print $len;
output: ?
#!/usr/bin/perl -w
@bases = ('A', 'C', 'G', 'T');
splice (@bases, 2, 0, 'X');
print "@bases";
output: ?
What about splice(@bases, 2, 1, 'X') instead?
#!/usr/bin/perl -w
$bases = 'ACGT';
@bases=split('', $bases);
print "@bases";
output: ?
#!/usr/bin/perl -w
@bases = ('A', 'C', 'G', 'T');
$bases = join('', @bases);
print $bases;
output: ?
#!/usr/bin/perl -w
@array = ('a', 'b', 'C',3, 1);
@sorted = sort (@array);
print "@sorted";
output: ?
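Without a comparison block, sort compares its arguments as strings. The sketch below, using a hypothetical list of sequence lengths, contrasts that default with a numeric sort built on the <=> operator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sequence lengths
my @lengths = (100, 25, 9);

# Default sort: string comparison, so '100' sorts before '25' and '9'
my @as_strings = sort @lengths;

# Numeric sort: the block compares $a and $b with <=>
my @as_numbers = sort { $a <=> $b } @lengths;

print "string order:  @as_strings\n";
print "numeric order: @as_numbers\n";
```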
#!/usr/bin/perl -w
#
# Calculating the reverse complement of a strand of DNA using string
#
# The DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
print "Here is the starting DNA:\n\n$DNA\n\n";
# Calculate the reverse
$revcom1 = reverse $DNA;
# Calculate the complement
$revcom1 =~ tr/ACGTacgt/TGCAtgca/;
print "Here is the reverse complement DNA using STRING:\n\n$revcom1\n\n";
#
# Calculating the reverse complement of a strand of DNA using array
#
# Split the DNA string into an array of characters
@DNA = split('', $DNA);
# Calculate the reverse
@reverse = reverse @DNA;
# Join the array of characters of the reverse
$revcom2 = join('', @reverse);
# Calculate the complement
$revcom2 =~ tr/ACGTacgt/TGCAtgca/;
print "Here is the reverse complement DNA using ARRAY:\n\n$revcom2\n";
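The string version above could be wrapped in a reusable subroutine. This is only a sketch: the name revcom is an assumption made here, not something defined in the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# revcom (hypothetical helper): return the reverse complement of a DNA string
sub revcom {
    my ($dna) = @_;

    # Assigning to a scalar puts reverse in scalar context,
    # so it reverses the characters of the string
    my $revcom = reverse $dna;

    # Complement each base, preserving case
    $revcom =~ tr/ACGTacgt/TGCAtgca/;

    return $revcom;
}

print revcom('ACGGGAGGACGGGAAAATTACTACGGCATTAGC'), "\n";
```

The intermediate variable matters: reverse called directly in list context (for example, print reverse $dna;) reverses the list of arguments rather than the characters of the string.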
A hash (also called an associative array) is an unordered collection of zero or more pairs of scalar values, called keys and values; each value is looked up by its unique key.
Hash assignment
Accessing Hash elements
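A minimal sketch of hash assignment and element access, reusing the %genes hash from the examples that follow. The key gene3 and its sequence are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hash assignment: a list of key => value pairs
my %genes = (
    'gene1' => 'AACCCGGTTGGTT',
    'gene2' => 'CCTTTDGGAAGGTC',
);

# A single value is a scalar, so it takes the $ sigil and curly braces
my $seq = $genes{'gene1'};

# Assigning to a new key adds a pair (gene3 is a hypothetical entry)
$genes{'gene3'} = 'ATGGCC';

# keys in scalar context gives the number of pairs
my $count = scalar(keys %genes);

print "gene1 is $seq\n";
print "The hash now holds $count genes\n";
```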
Hash operators
#!/usr/bin/perl -w
%genes = ( 'gene1' => 'AACCCGGTTGGTT', 'gene2'=>'CCTTTDGGAAGGTC' );
@keys = keys %genes; @values = values %genes;
print "Keys are: @keys\n"; print "Values are: @values";
output: ?
#!/usr/bin/perl -w
%genes = ( 'gene1' => 'AACCCGGTTGGTT', 'gene2'=>'CCTTTDGGAAGGTC' );
%rev_genes = reverse %genes; @keys = keys %rev_genes; @values = values %rev_genes;
print "Keys are: @keys\n"; print "Values are: @values";
output: ?
What if there are duplicates in the values?
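One way to explore this question is the sketch below, which uses a hypothetical hash whose two genes share the same sequence. Because hash keys must be unique, reversing it leaves a single pair, and which gene name survives is not something to rely on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: two keys with the same value
my %genes = ( 'gene1' => 'AACC', 'gene2' => 'AACC' );

# Values become keys; the duplicate value collapses into one key
my %rev_genes = reverse %genes;
my @keys = keys %rev_genes;

print scalar(@keys), " key left: @keys\n";
print "It maps to $rev_genes{'AACC'}\n";
```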
#!/usr/bin/perl -w
%genes = ( 'gene1' => 'AACCCGGTTGGTT', 'gene2'=>'CCTTTDGGAAGGTC' );
delete $genes{'gene1'}; @keys = keys %genes; @values = values %genes;
print "Keys are: @keys\n"; print "Values are: @values";
output: ?
Example: restriction enzyme hash
#!/usr/bin/perl -w
# Restriction enzymes are proteins that cut DNA at short, specific sequences
# e.g., EcoRI cuts where it finds GAATTC, between G and A
#
# Initialize restriction enzyme hash
# keys are the names of restriction enzymes, values are the DNA sequences they cut
#
%re_lookup = (
'Eco47III'=> 'AGCGCT',
'EcoRI' => 'GAATTC',
'HindIII' => 'AAGCTT',
);
print "Enter restriction enzyme name\n";
$re=<STDIN>;
chomp $re;
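The script above reads an enzyme name but stops before using it. One possible way to finish the lookup is sketched here. The helper name lookup_site is an assumption, and the enzyme names are hard-coded in place of <STDIN> so the sketch runs unattended.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same lookup table as in the example above
my %re_lookup = (
    'Eco47III' => 'AGCGCT',
    'EcoRI'    => 'GAATTC',
    'HindIII'  => 'AAGCTT',
);

# lookup_site (hypothetical helper): return the recognition sequence
# for an enzyme name, or undef if the name is not in the table
sub lookup_site {
    my ($name) = @_;
    return exists $re_lookup{$name} ? $re_lookup{$name} : undef;
}

# Try one known and one unknown enzyme name
for my $re ('EcoRI', 'BamHI') {
    my $site = lookup_site($re);
    if (defined $site) {
        print "$re cuts at $site\n";
    } else {
        print "$re is not in the lookup table\n";
    }
}
```

Using exists before reading the value avoids treating a missing enzyme as an empty recognition sequence.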
Example: Genetic code
#
# codon2aa
#
# A subroutine to translate a DNA 3-character codon to an amino acid
# Version 3, using hash lookup
sub codon2aa {
my($codon) = @_;
$codon = uc $codon;
my(%genetic_code) = (
'TCA' => 'S', # Serine
'TCC' => 'S', # Serine
'TCG' => 'S', # Serine
'TCT' => 'S', # Serine
'TTC' => 'F', # Phenylalanine
'TTT' => 'F', # Phenylalanine
'TTA' => 'L', # Leucine
'TTG' => 'L', # Leucine
'TAC' => 'Y', # Tyrosine
'TAT' => 'Y', # Tyrosine
'TAA' => '_', # Stop
'TAG' => '_', # Stop
'TGC' => 'C', # Cysteine
'TGT' => 'C', # Cysteine
'TGA' => '_', # Stop
'TGG' => 'W', # Tryptophan
'CTA' => 'L', # Leucine
'CTC' => 'L', # Leucine
'CTG' => 'L', # Leucine
'CTT' => 'L', # Leucine
'CCA' => 'P', # Proline
'CCC' => 'P', # Proline
'CCG' => 'P', # Proline
'CCT' => 'P', # Proline
'CAC' => 'H', # Histidine
'CAT' => 'H', # Histidine
'CAA' => 'Q', # Glutamine
'CAG' => 'Q', # Glutamine
'CGA' => 'R', # Arginine
'CGC' => 'R', # Arginine
'CGG' => 'R', # Arginine
'CGT' => 'R', # Arginine
'ATA' => 'I', # Isoleucine
'ATC' => 'I', # Isoleucine
'ATT' => 'I', # Isoleucine
'ATG' => 'M', # Methionine
'ACA' => 'T', # Threonine
'ACC' => 'T', # Threonine
'ACG' => 'T', # Threonine
'ACT' => 'T', # Threonine
'AAC' => 'N', # Asparagine
'AAT' => 'N', # Asparagine
'AAA' => 'K', # Lysine
'AAG' => 'K', # Lysine
'AGC' => 'S', # Serine
'AGT' => 'S', # Serine
'AGA' => 'R', # Arginine
'AGG' => 'R', # Arginine
'GTA' => 'V', # Valine
'GTC' => 'V', # Valine
'GTG' => 'V', # Valine
'GTT' => 'V', # Valine
'GCA' => 'A', # Alanine
'GCC' => 'A', # Alanine
'GCG' => 'A', # Alanine
'GCT' => 'A', # Alanine
'GAC' => 'D', # Aspartic Acid
'GAT' => 'D', # Aspartic Acid
'GAA' => 'E', # Glutamic Acid
'GAG' => 'E', # Glutamic Acid
'GGA' => 'G', # Glycine
'GGC' => 'G', # Glycine
'GGG' => 'G', # Glycine
'GGT' => 'G', # Glycine
);
if(exists $genetic_code{$codon}) {
return $genetic_code{$codon};
}else{
print STDERR "Bad codon \"$codon\"!!\n";
exit;
}
}
# dna2peptide
#
# A subroutine to translate DNA sequence into a peptide
sub dna2peptide {
my($dna) = @_;
use strict;
use warnings;
# Initialize variables
my $protein = '';
# Translate each three-base codon to an amino acid, and append to a protein
for(my $i=0; $i < (length($dna) - 2) ; $i += 3) {
$protein .= codon2aa(substr($dna,$i,3) );
}
return $protein;
}
print "Please enter your dna sequence:\n";
$dna = <STDIN>;
# Remove the trailing newline so it is not mistaken for part of a codon
chomp $dna;
$peptide = dna2peptide($dna);
print "Here is the translated protein sequence: $peptide\n";
exit;
How about modifying the above code to accommodate the six reading frames?
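As a starting point for this exercise, the sketch below only extracts the six frame sequences: offsets 0, 1 and 2 on the given strand, then the same offsets on its reverse complement. The subroutine name reading_frames is hypothetical, and the input is assumed to be at least two bases long and to contain only A, C, G and T.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# reading_frames (hypothetical helper): return the six frame sequences
# of a DNA string, three per strand
sub reading_frames {
    my ($dna) = @_;
    $dna = uc $dna;

    # Build the reverse complement for the opposite strand
    my $revcom = reverse $dna;
    $revcom =~ tr/ACGT/TGCA/;

    my @frames;
    for my $strand ($dna, $revcom) {
        for my $offset (0 .. 2) {
            # Everything from the offset to the end of the strand
            push @frames, substr($strand, $offset);
        }
    }
    return @frames;
}

my @frames = reading_frames('ATGGCCTAA');
print "$_\n" for @frames;
```

Each of the six strings could then be passed to dna2peptide, whose loop already stops before an incomplete trailing codon.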
Some examples and Perl scripts are adapted from the book Beginning Perl for Bioinformatics by James Tisdall (2001), ISBN 0-596-00080-4.