semDist
Class SemDist

java.lang.Object
  extended by semDist.SemDist
All Implemented Interfaces:
Distance

public class SemDist
extends java.lang.Object
implements Distance

SemDist: a program that runs various experiments to calculate semantic distance. It was used to obtain the results reported in the paper "Roget's Thesaurus and Semantic Similarity", which can be found at: http://www.site.uottawa.ca/~mjarmasz/pubs/jarmasz_roget_sim.pdf

Usage: java SemDist

Format of input file: pairs of comma-separated words or phrases, one pair per line. A line may contain extra information as long as it is separated by a comma, for example: car,automobile,3.92

Version:
1.2, Nov 2008
Author:
Mario Jarmasz & Alistair Kennedy
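The input file format described above (two comma-separated words or phrases per line, optionally followed by extra comma-separated fields) can be read with a few lines of standard Java. This is a minimal sketch, assuming that whitespace around each field should be trimmed and that blank or malformed lines are skipped. PairFileSketch, WordPair, parseLine, and parse are hypothetical names and are not part of SemDist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical reader for the documented input format (not part of the SemDist API). */
final class PairFileSketch {

    /** One input line: two words or phrases plus any extra comma-separated fields. */
    static final class WordPair {
        final String first;
        final String second;
        final List<String> extra;

        WordPair(String first, String second, List<String> extra) {
            this.first = first;
            this.second = second;
            this.extra = extra;
        }
    }

    /** Parses one line; returns null for blank lines or lines with fewer than two fields. */
    static WordPair parseLine(String line) {
        String[] fields = line.split(",");
        if (line.trim().isEmpty() || fields.length < 2) {
            return null;
        }
        // Everything after the second field is kept as extra information (e.g. a human rating).
        List<String> extra = new ArrayList<>();
        for (int i = 2; i < fields.length; i++) {
            extra.add(fields[i].trim());
        }
        return new WordPair(fields[0].trim(), fields[1].trim(), extra);
    }

    /** Parses all lines, skipping any that are not valid pairs. */
    static List<WordPair> parse(List<String> lines) {
        List<WordPair> pairs = new ArrayList<>();
        for (String line : lines) {
            WordPair p = parseLine(line);
            if (p != null) {
                pairs.add(p);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Illustrative lines; in practice they might come from Files.readAllLines(path).
        List<String> lines = Arrays.asList("car,automobile,3.92", "", "motor vehicle,car");
        for (WordPair p : parse(lines)) {
            System.out.println(p.first + " | " + p.second + " | extra=" + p.extra);
        }
    }
}
```

Because fields are separated by commas, a word or phrase in this format cannot itself contain a comma.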

Constructor Summary
SemDist()
          Constructs a new SemDist object.
SemDist(boolean morph)
          Constructs a new SemDist object.
SemDist(java.lang.String year)
          Constructs a new SemDist object.
SemDist(java.lang.String year, boolean morph)
          Constructs a new SemDist object.
 
Method Summary
 java.lang.String[] getClosestPOS(java.lang.String word1, java.lang.String word2)
          Obtains the parts of speech under which the two words are most similar.
 int getSimilarity(java.lang.String word1, java.lang.String word2)
          Obtains the maximum similarity between two strings, passed as parameters.
 int getSimilarity(java.lang.String word1, java.lang.String word2, java.lang.String pos)
          Obtains the maximum similarity between two strings, passed as parameters.
 int getSimilarity(java.lang.String word1, java.lang.String pos1, java.lang.String word2, java.lang.String pos2)
          Obtains the maximum similarity between two strings, passed as parameters.
static void main(java.lang.String[] args)
          The main method runs the program.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SemDist

public SemDist(java.lang.String year)
Constructs a new SemDist object. Requires a year, either "1911" or "1987", selecting which edition of Roget's Thesaurus to use.

Parameters:
year - the Thesaurus edition, "1911" or "1987"

SemDist

public SemDist(java.lang.String year,
               boolean morph)
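Constructs a new SemDist object. Requires a year, either "1911" or "1987", selecting which edition of Roget's Thesaurus to use. The second parameter specifies whether morphology is applied to searches.

Parameters:
year - the Thesaurus edition, "1911" or "1987"
morph - whether morphology is applied to searches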

SemDist

public SemDist()
Constructs a new SemDist object. Uses the 1911 Thesaurus by default.


SemDist

public SemDist(boolean morph)
Constructs a new SemDist object. Uses the 1911 Thesaurus by default. The morph parameter specifies whether morphology is applied to searches.

Parameters:
morph - whether morphology is applied to searches

Method Detail

main

public static void main(java.lang.String[] args)
The main method runs the program.

Parameters:
args - the command-line arguments

getSimilarity

public int getSimilarity(java.lang.String word1,
                         java.lang.String word2)
Obtains the maximum similarity between two strings, passed as parameters. All parts of speech are considered. The returned value is an even integer from 0 to 16 (0, 2, ..., 16), where 16 indicates the greatest similarity.

Specified by:
getSimilarity in interface Distance
Parameters:
word1 - the first word or phrase
word2 - the second word or phrase
Returns:
semantic relatedness between the words
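The getSimilarity methods return an even integer between 0 and 16. When these scores need to be compared with measures on another scale, one option is a linear rescaling to [0, 1]. The following is a minimal sketch; SimilarityScale, isValidScore, and normalize are hypothetical helpers, not part of SemDist. Because Pearson and Spearman correlations are unchanged by a positive linear rescaling, this step does not affect correlation-based evaluations.

```java
/** Hypothetical helpers for interpreting the documented 0-16 similarity scale. */
final class SimilarityScale {
    static final int MAX_SIMILARITY = 16;

    /** Checks that a score is one of 0, 2, ..., 16, as documented. */
    static boolean isValidScore(int score) {
        return score >= 0 && score <= MAX_SIMILARITY && score % 2 == 0;
    }

    /** Maps a valid score linearly onto [0.0, 1.0], where 1.0 is the most similar. */
    static double normalize(int score) {
        if (!isValidScore(score)) {
            throw new IllegalArgumentException("unexpected similarity score: " + score);
        }
        return score / (double) MAX_SIMILARITY;
    }

    public static void main(String[] args) {
        // Print the normalized value of every documented score.
        for (int s = 0; s <= MAX_SIMILARITY; s += 2) {
            System.out.println(s + " -> " + normalize(s));
        }
    }
}
```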

getClosestPOS

public java.lang.String[] getClosestPOS(java.lang.String word1,
                                        java.lang.String word2)
Obtains the parts of speech under which the two words are most similar.

Parameters:
word1 - the first word or phrase
word2 - the second word or phrase
Returns:
a string array containing the parts of speech
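One reading of this description is that getClosestPOS compares the senses of both words and reports the parts of speech of the best-matching pair. The toy sketch that follows illustrates that reading only; it is not SemDist's implementation. Sense, closestPos, the scoring function, and the part-of-speech tags are hypothetical, and the assumption that the returned array holds {part of speech of word1, part of speech of word2} is ours, not stated in this documentation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntBiFunction;

/** Toy model (hypothetical, not SemDist's code) of picking the closest POS pair. */
final class ClosestPosSketch {

    /** One sense of a word: a part-of-speech tag plus an opaque thesaurus location. */
    static final class Sense {
        final String pos;
        final String location;

        Sense(String pos, String location) {
            this.pos = pos;
            this.location = location;
        }
    }

    /**
     * Returns {posOfWord1, posOfWord2} for the highest-scoring sense pair,
     * or null if either word has no senses. Ties keep the first pair found.
     */
    static String[] closestPos(List<Sense> senses1, List<Sense> senses2,
                               ToIntBiFunction<Sense, Sense> score) {
        String[] best = null;
        int bestScore = Integer.MIN_VALUE;
        for (Sense s1 : senses1) {
            for (Sense s2 : senses2) {
                int sc = score.applyAsInt(s1, s2);
                if (sc > bestScore) {
                    bestScore = sc;
                    best = new String[] {s1.pos, s2.pos};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Invented toy data and scorer: identical locations score 16, anything else 0.
        List<Sense> bank = Arrays.asList(new Sense("N", "money"), new Sense("VB", "tilt"));
        List<Sense> lean = Arrays.asList(new Sense("ADJ", "thin"), new Sense("VB", "tilt"));
        ToIntBiFunction<Sense, Sense> toy = (a, b) -> a.location.equals(b.location) ? 16 : 0;
        System.out.println(Arrays.toString(closestPos(bank, lean, toy)));
    }
}
```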

getSimilarity

public int getSimilarity(java.lang.String word1,
                         java.lang.String word2,
                         java.lang.String pos)
Obtains the maximum similarity between two strings, passed as parameters. Only words of the given part of speech are considered. The returned value is an even integer from 0 to 16 (0, 2, ..., 16), where 16 indicates the greatest similarity.

Specified by:
getSimilarity in interface Distance
Parameters:
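word1 - the first word or phrase
word2 - the second word or phrase
pos - the part of speech to which both words are restricted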
Returns:
semantic relatedness between the words

getSimilarity

public int getSimilarity(java.lang.String word1,
                         java.lang.String pos1,
                         java.lang.String word2,
                         java.lang.String pos2)
Obtains the maximum similarity between two strings, passed as parameters. Each word has its own specified part of speech; other parts of speech are not considered. The returned value is an even integer from 0 to 16 (0, 2, ..., 16), where 16 indicates the greatest similarity.

Specified by:
getSimilarity in interface Distance
Parameters:
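word1 - the first word or phrase
pos1 - the part of speech of word1
word2 - the second word or phrase
pos2 - the part of speech of word2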
Returns:
semantic relatedness between the words
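The three getSimilarity overloads differ only in which parts of speech are allowed: all of them, one part of speech shared by both words, or a separate one for each word. The toy model that follows takes the maximum score over all sense pairs that pass the part-of-speech filter, which is one reading of "maximum similarity" in these descriptions. It does not reproduce how SemDist scores paths in Roget's Thesaurus. Sense, maxSimilarity, the similarity wrappers, the scorer, the tags, and the choice to return 0 when no pair qualifies are all assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntBiFunction;

/** Toy model (hypothetical, not SemDist's code) of how the overloads restrict POS. */
final class PosFilterSketch {

    /** One sense of a word: a part-of-speech tag plus an opaque thesaurus location. */
    static final class Sense {
        final String pos;
        final String location;

        Sense(String pos, String location) {
            this.pos = pos;
            this.location = location;
        }
    }

    /**
     * Maximum score over sense pairs whose tags match pos1 and pos2.
     * A null tag means "any part of speech". Returns 0 if no pair qualifies
     * (an assumption of this sketch).
     */
    static int maxSimilarity(List<Sense> senses1, String pos1,
                             List<Sense> senses2, String pos2,
                             ToIntBiFunction<Sense, Sense> score) {
        int best = 0;
        for (Sense s1 : senses1) {
            if (pos1 != null && !pos1.equals(s1.pos)) {
                continue;
            }
            for (Sense s2 : senses2) {
                if (pos2 != null && !pos2.equals(s2.pos)) {
                    continue;
                }
                best = Math.max(best, score.applyAsInt(s1, s2));
            }
        }
        return best;
    }

    // Mirrors getSimilarity(word1, word2): all parts of speech.
    static int similarity(List<Sense> w1, List<Sense> w2, ToIntBiFunction<Sense, Sense> score) {
        return maxSimilarity(w1, null, w2, null, score);
    }

    // Mirrors getSimilarity(word1, word2, pos): both words restricted to pos.
    static int similarity(List<Sense> w1, List<Sense> w2, String pos,
                          ToIntBiFunction<Sense, Sense> score) {
        return maxSimilarity(w1, pos, w2, pos, score);
    }

    // Mirrors getSimilarity(word1, pos1, word2, pos2): each word has its own POS.
    static int similarity(List<Sense> w1, String pos1, List<Sense> w2, String pos2,
                          ToIntBiFunction<Sense, Sense> score) {
        return maxSimilarity(w1, pos1, w2, pos2, score);
    }

    public static void main(String[] args) {
        // Invented toy data and scorer: identical locations score 16, anything else 0.
        List<Sense> bank = Arrays.asList(new Sense("N", "money"), new Sense("VB", "tilt"));
        List<Sense> lean = Arrays.asList(new Sense("ADJ", "thin"), new Sense("VB", "tilt"));
        ToIntBiFunction<Sense, Sense> toy = (a, b) -> a.location.equals(b.location) ? 16 : 0;
        System.out.println(similarity(bank, lean, toy));            // all parts of speech
        System.out.println(similarity(bank, lean, "N", toy));       // nouns only
        System.out.println(similarity(bank, "VB", lean, "VB", toy)); // verb vs verb
    }
}
```

Keeping one filtered maximum and expressing each overload as a particular choice of filters makes the relationship between the three methods explicit: the two-argument form is the least restrictive, and the four-argument form is the most.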