sentRep
Class Sentence

java.lang.Object
  extended by sentRep.Sentence

public class Sentence
extends java.lang.Object


Constructor Summary
Sentence(java.util.Hashtable<java.lang.String,Pair<java.lang.Double,java.lang.Double>> representation, SentenceFactory.Resource sentenceType)
          Creates a sentence from a hashtable that maps each word to a pair of double values.
Sentence(SentenceFactory.Resource sentenceType)
          Creates an empty sentence of the given resource type.
 
Method Summary
 void addFeature(java.lang.String key, double val, double normalizedVal)
          Adds a feature to the sentence.
 void applyTF_IDF(java.util.Hashtable<java.lang.String,java.lang.Integer> docCount, int totalDocs)
          Applies TF.IDF to the sentence using normalized TF.
 void applyTF_IDF(java.util.Hashtable<java.lang.String,java.lang.Integer> docCount, int totalDocs, int totalWords)
          Applies TF.IDF to all the words in the resource.
 boolean containsKey(java.lang.String key)
          Checks to see if sentence contains a word.
 void deleteFeature(java.lang.String key)
          Removes a feature (word) from the sentence.
 double[] getFeatureVector(Sentence sen)
          Generates the features of a sentence for the Weka ML algorithms to train/test on and returns them as an array.
 double getModified(java.lang.String key)
          Gets the modified value for a key word.
 double getOriginal(java.lang.String key)
          Gets original value for a key word.
 SentenceFactory.Resource getResourceType()
          Gets the resource type.
 java.util.Set<java.lang.String> keySet()
          Gets the key set as a set of strings.
 void printFeatureVector(Sentence sen, java.lang.String value)
          Prints out the features for the Weka ML algorithms to train/test on.
 void printFeatureVector(Sentence sen, java.lang.String value, java.lang.String[] keys, java.io.BufferedWriter bw)
          Generates and prints out the features for the Weka ML algorithms to train/test on.
static void printHeader()
          Prints out header information for the Weka ML algorithms.
static void printHeader(int max, java.io.BufferedWriter bw)
          Prints out header information, this time printing to a buffered writer.
 double similarityModified(Sentence target)
          Computes the cosine similarity between two hashtables, v1 and v2, with item weights re-weighted based on the normal distribution.
 double similarityOriginal(Sentence target)
          Computes the cosine similarity between two hashtables, v1 and v2, without re-weighting the item weights.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Sentence

public Sentence(SentenceFactory.Resource sentenceType)
Creates an empty sentence of the given resource type.

Parameters:
sentenceType - the resource type of the sentence

Sentence

public Sentence(java.util.Hashtable<java.lang.String,Pair<java.lang.Double,java.lang.Double>> representation,
                SentenceFactory.Resource sentenceType)
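Creates a sentence from a hashtable that maps each word to a pair of double values. A resource type is also provided.

Parameters:
representation - hashtable mapping each word to a pair of double values
sentenceType - the resource type of the sentence

Method Detail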

addFeature

public void addFeature(java.lang.String key,
                       double val,
                       double normalizedVal)
Adds a feature to the sentence.

Parameters:
key - the word to add
val - the feature value
normalizedVal - the normalized feature value

deleteFeature

public void deleteFeature(java.lang.String key)
Removes a feature (word) from the sentence.

Parameters:
key - the word to remove

applyTF_IDF

public void applyTF_IDF(java.util.Hashtable<java.lang.String,java.lang.Integer> docCount,
                        int totalDocs,
                        int totalWords)
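Applies TF.IDF to all the words in the resource. This version uses TF normalized by the total word count.

Parameters:
docCount - number of documents containing each word
totalDocs - total number of documents
totalWords - total number of words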
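A minimal sketch of what the three-argument overload computes, assuming the textbook definition: TF is a word's raw count divided by totalWords, and IDF is ln(totalDocs / docCount.get(word)) with the natural logarithm. The class TfIdfSketch, its raw-count input map, and the choice to skip words missing from docCount are hypothetical; the actual Sentence implementation may use a different log base, smoothing, or storage.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical sketch of TF.IDF with length-normalized TF; not the Sentence source.
public class TfIdfSketch {

    /** Normalized TF.IDF for one word: (count / totalWords) * ln(totalDocs / docFreq). */
    static double tfIdf(double count, int totalWords, int docFreq, int totalDocs) {
        double tf = count / totalWords;                       // normalized term frequency
        double idf = Math.log((double) totalDocs / docFreq);  // natural-log IDF (assumed)
        return tf * idf;
    }

    /** Returns a new table of TF.IDF weights for every word that has a document count. */
    static Hashtable<String, Double> apply(Hashtable<String, Double> counts,
                                           Hashtable<String, Integer> docCount,
                                           int totalDocs, int totalWords) {
        Hashtable<String, Double> weights = new Hashtable<>();
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            Integer df = docCount.get(e.getKey());
            if (df == null || df == 0) {
                continue; // assumption: skip unknown words rather than divide by zero
            }
            weights.put(e.getKey(), tfIdf(e.getValue(), totalWords, df, totalDocs));
        }
        return weights;
    }

    public static void main(String[] args) {
        Hashtable<String, Double> counts = new Hashtable<>();
        counts.put("cell", 2.0);
        counts.put("the", 3.0);
        Hashtable<String, Integer> docCount = new Hashtable<>();
        docCount.put("cell", 1);
        docCount.put("the", 10);
        // 10 documents, 5 words: "the" occurs in every document, so its IDF (and weight) is 0.
        System.out.println(apply(counts, docCount, 10, 5));
    }
}
```

A word that appears in every document gets an IDF of ln(1) = 0, which is why common function words drop out of the weighted representation.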

applyTF_IDF

public void applyTF_IDF(java.util.Hashtable<java.lang.String,java.lang.Integer> docCount,
                        int totalDocs)
Applies TF.IDF to the sentence using normalized TF.

Parameters:
docCount - number of documents containing each word
totalDocs - total number of documents

containsKey

public boolean containsKey(java.lang.String key)
Checks whether the sentence contains a word.

Parameters:
key - the word to look for
Returns:
true, if word is found, false otherwise.

getModified

public double getModified(java.lang.String key)
Gets the modified value for a key word.

Parameters:
key - the word to look up
Returns:
the modified value of the word

getOriginal

public double getOriginal(java.lang.String key)
Gets the original value for a key word.

Parameters:
key - the word to look up
Returns:
the original value of the word

keySet

public java.util.Set<java.lang.String> keySet()
Gets the key set as a set of strings.

Returns:
key set
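The accessors above suggest that each sentence keeps a table from words to a pair of values. A minimal sketch of that structure, assuming the first element of each pair is the value passed as val (read back by getOriginal) and the second is normalizedVal (read back by getModified). The class SentenceStoreSketch and its ValuePair stand-in are hypothetical and are not the package's own Pair class.

```java
import java.util.Hashtable;
import java.util.Set;

// Hypothetical stand-in for Sentence's word -> (original, modified) storage.
public class SentenceStoreSketch {

    /** Minimal stand-in for Pair<Double, Double>; field names are assumptions. */
    static final class ValuePair {
        final double original;
        final double modified;

        ValuePair(double original, double modified) {
            this.original = original;
            this.modified = modified;
        }
    }

    private final Hashtable<String, ValuePair> features = new Hashtable<>();

    void addFeature(String key, double val, double normalizedVal) {
        features.put(key, new ValuePair(val, normalizedVal)); // replaces any earlier entry for key
    }

    void deleteFeature(String key) {
        features.remove(key);
    }

    boolean containsKey(String key) {
        return features.containsKey(key);
    }

    double getOriginal(String key) {
        // In this sketch an absent key throws NullPointerException; the real class may differ.
        return features.get(key).original;
    }

    double getModified(String key) {
        return features.get(key).modified;
    }

    Set<String> keySet() {
        return features.keySet();
    }

    public static void main(String[] args) {
        SentenceStoreSketch s = new SentenceStoreSketch();
        s.addFeature("protein", 3.0, 0.5);
        System.out.println(s.containsKey("protein") + " " + s.getOriginal("protein") + " " + s.getModified("protein"));
        s.deleteFeature("protein");
        System.out.println(s.containsKey("protein"));
    }
}
```

Callers that are unsure whether a word is present would check containsKey before calling getOriginal or getModified.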

getResourceType

public SentenceFactory.Resource getResourceType()
Gets the resource type.

Returns:
Resource enumeration value

similarityModified

public double similarityModified(Sentence target)
Computes the cosine similarity between two hashtables, v1 and v2, with item weights re-weighted based on the normal distribution.

Parameters:
target - the sentence to compare with
Returns:
similarity score

similarityOriginal

public double similarityOriginal(Sentence target)
Computes the cosine similarity between two hashtables, v1 and v2, without re-weighting the item weights.

Parameters:
target - the sentence to compare with
Returns:
similarity score
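The un-reweighted case amounts to standard cosine similarity over two sparse word-to-weight tables: the dot product over shared words divided by the product of the vector norms. A minimal sketch, assuming each sentence's weights are exposed as a Map<String, Double> and that an empty vector yields similarity 0; the class CosineSketch is hypothetical, and the normal-distribution re-weighting used by similarityModified is not documented here, so it is not sketched.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical sketch: cosine similarity between two sparse word -> weight tables.
public class CosineSketch {

    static double cosine(Map<String, Double> v1, Map<String, Double> v2) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : v1.entrySet()) {
            Double w2 = v2.get(e.getKey());
            if (w2 != null) {
                dot += e.getValue() * w2; // only words present in both tables contribute
            }
        }
        double norm1 = norm(v1);
        double norm2 = norm(v2);
        if (norm1 == 0.0 || norm2 == 0.0) {
            return 0.0; // assumption: an empty or all-zero vector has similarity 0
        }
        return dot / (norm1 * norm2);
    }

    /** Euclidean length of a sparse vector. */
    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Hashtable<String, Double> a = new Hashtable<>();
        a.put("gene", 1.0);
        a.put("expression", 1.0);
        Hashtable<String, Double> b = new Hashtable<>();
        b.put("gene", 1.0);
        b.put("mutation", 1.0);
        System.out.println(cosine(a, b)); // one shared word out of two in each vector, about 0.5
    }
}
```

Iterating over only one table for the dot product keeps the cost proportional to the number of words in that sentence rather than the vocabulary size.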

getFeatureVector

public double[] getFeatureVector(Sentence sen)
Generates the features of a sentence for the Weka ML algorithms to train/test on and returns them as an array.

Parameters:
sen - the sentence to generate features for
Returns:
the feature vector

printFeatureVector

public void printFeatureVector(Sentence sen,
                               java.lang.String value)
Prints out the features for the Weka ML algorithms to train/test on.

printFeatureVector

public void printFeatureVector(Sentence sen,
                               java.lang.String value,
                               java.lang.String[] keys,
                               java.io.BufferedWriter bw)
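Generates and prints out the features for the Weka ML algorithms to train/test on.

Parameters:
sen - the sentence whose features are printed
value -
keys - the words to print features for, in order
bw - the writer the features are printed to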
Throws:
java.io.IOException
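Weka reads training and test data from ARFF files, where each data row is a comma-separated list of attribute values. A minimal sketch of writing one such row, assuming the keys array fixes the attribute order, a word absent from the sentence is written as 0, and value is the class label appended last. All three are assumptions about this method, and the class FeatureRowSketch (taking a plain Map instead of a Sentence) is hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical sketch of writing one Weka ARFF data row; not the Sentence source.
public class FeatureRowSketch {

    /** Writes each key's weight in order (0.0 if absent), then the class value, as one line. */
    static void printFeatureVector(Map<String, Double> weights, String value,
                                   String[] keys, BufferedWriter bw) throws IOException {
        StringBuilder row = new StringBuilder();
        for (String key : keys) {
            row.append(weights.getOrDefault(key, 0.0)).append(',');
        }
        row.append(value); // assumption: value is the instance's class label
        bw.write(row.toString());
        bw.newLine();
    }

    public static void main(String[] args) throws IOException {
        Hashtable<String, Double> w = new Hashtable<>();
        w.put("kinase", 0.75);
        StringWriter out = new StringWriter();
        try (BufferedWriter bw = new BufferedWriter(out)) {
            printFeatureVector(w, "relevant", new String[] {"kinase", "receptor"}, bw);
        }
        System.out.print(out); // "receptor" is absent, so it is written as 0.0
    }
}
```

Passing the same keys array for every sentence keeps the columns aligned with the attribute declarations in the header.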

printHeader

public static void printHeader()
Prints out header information for the Weka ML algorithms.


printHeader

public static void printHeader(int max,
                               java.io.BufferedWriter bw)
Prints out header information, this time printing to a buffered writer.

Parameters:
max -
bw - the writer the header is printed to
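An ARFF header names the relation, declares every attribute, and ends with @DATA, after which the data rows follow. A minimal sketch, assuming max is the number of numeric feature attributes; the attribute names f0, f1, ..., the relation name, and the extra classValues parameter for the nominal class attribute are all hypothetical, since the original signature takes only max and bw.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

// Hypothetical sketch of a Weka ARFF header; names and class values are assumptions.
public class ArffHeaderSketch {

    /** Writes @RELATION, max numeric attributes, a nominal class attribute, then @DATA. */
    static void printHeader(int max, String[] classValues, BufferedWriter bw) throws IOException {
        bw.write("@RELATION sentences");
        bw.newLine();
        bw.newLine();
        for (int i = 0; i < max; i++) {
            bw.write("@ATTRIBUTE f" + i + " NUMERIC"); // one numeric attribute per feature slot
            bw.newLine();
        }
        bw.write("@ATTRIBUTE class {" + String.join(",", classValues) + "}");
        bw.newLine();
        bw.newLine();
        bw.write("@DATA"); // data rows, one per sentence, follow this line
        bw.newLine();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        try (BufferedWriter bw = new BufferedWriter(out)) {
            printHeader(2, new String[] {"yes", "no"}, bw);
        }
        System.out.print(out);
    }
}
```

The number of attributes declared here must match the number of values in each data row, so max would normally equal the length of the keys array used when printing feature vectors.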