ca.site.elkb
Class Index

java.lang.Object
  extended by ca.site.elkb.Index
All Implemented Interfaces:
java.io.Serializable

public class Index
extends java.lang.Object
implements java.io.Serializable

Represents the computer index of the words and phrases of Roget's Thesaurus. According to Kirkpatrick (1998): "The index consists of a list of items, each of which is followed by one or more references to the text. These references consist of a Head number, a keyword in italics, and a part of speech label (n. for nouns, adj. for adjectives, vb. for verbs, adv. for adverbs, and int. for interjections). The keyword is given to identify the paragraph which contains the word you have looked up; it also gives an indication of the ideas contained in that paragraph, so it can be used as a clue where a word has several meanings and therefore several references."

For example, in the Index Entry for stork, stork is the Index Item and obstetrics 167 n. is a Reference. This Index object consists of a hashtable of Index Entries, hashed on the String value of the Index Item. For every key (Index Item) the value is a list of Reference objects. The hashtable is implemented using a HashMap.
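The structure described above can be pictured with a minimal, self-contained sketch. It does not use the ELKB classes: IndexSketch, Ref, add and lookup are hypothetical names, and the three fields of Ref are an assumption based on the description of a reference (keyword, Head number, part of speech label).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the structure described above; not the ELKB implementation.
class IndexSketch {
    // Assumed shape of a reference: keyword, Head number, part of speech label.
    static final class Ref {
        final String keyword;
        final int head;
        final String pos;

        Ref(String keyword, int head, String pos) {
            this.keyword = keyword;
            this.head = head;
            this.pos = pos;
        }

        @Override
        public String toString() {
            return keyword + " " + head + " " + pos;
        }
    }

    // Index Item -> list of references, hashed on the item's String value.
    private final Map<String, List<Ref>> items = new HashMap<>();

    void add(String item, Ref ref) {
        items.computeIfAbsent(item, k -> new ArrayList<>()).add(ref);
    }

    // Returns an empty list (never null) for unknown items; this is a choice of the sketch.
    List<Ref> lookup(String item) {
        return items.getOrDefault(item, new ArrayList<>());
    }

    public static void main(String[] args) {
        IndexSketch index = new IndexSketch();
        index.add("stork", new Ref("obstetrics", 167, "n."));
        System.out.println("stork -> " + index.lookup("stork")); // stork -> [obstetrics 167 n.]
    }
}
```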

Version:
1.4 2013
Author:
Mario Jarmasz and Alistair Kennedy
See Also:
Serialized Form

Constructor Summary
Index()
          Default constructor.
Index(java.lang.String filename)
          Constructor that builds the Index object using the information contained in a file.
Index(java.lang.String fileName, int size, boolean breakPhrases)
          Constructor that builds the Index object using the information contained in a file and sets the initial size of the index hashtable.
 
Method Summary
 void addEntry(java.lang.String item, java.lang.String refs)
          Associates an index entry with its references. The references are stored as a long string of pointers separated by colons, e.g. 1234:7632:8732: If a phrase contains exactly two words, then both words of the phrase are also indexed, e.g. running track, running, track.
 java.lang.String addReference(java.lang.String strPtr, java.lang.String sRef)
          Adds a reference.
 java.lang.String addReference(java.lang.String strPtr, java.lang.String sRef, java.lang.String sgNum, java.lang.String wordNum)
          Adds a reference.
 boolean containsEntry(java.lang.String key)
          Returns true if the specified entry is contained in this index.
 java.lang.String convertToPOS(int number)
          Converts the number, passed as an int, to the corresponding POS.
 java.lang.String convertToPOS(java.lang.String number)
          Converts the number, passed in string format, to the corresponding POS.
 int convertToPOSNumber(java.lang.String pos)
          Returns the number corresponding to a given POS, or -1 if an incorrect POS is passed as an argument.
 java.util.TreeSet<java.lang.String> getEntry(java.lang.String key)
          Returns the references for a word or phrase, applying morphology rules by default.
 java.util.TreeSet<java.lang.String> getEntry(java.lang.String key, boolean morphology)
          Returns all references for a given word or phrase in the index.
 java.util.ArrayList<java.lang.String> getEntryList(java.lang.String key)
          Returns the list of references for a given word or phrase in the index.
 java.util.ArrayList<java.lang.String> getEntryList(java.lang.String key, int itemNo)
          Returns the list of references for a given word or phrase in the index preceded by a number to identify the reference.
 java.util.ArrayList<int[]> getEntryListNumerical(java.lang.String key)
          Returns the list of references for a given word or phrase in the index preceded by a number to identify the reference.
 java.util.ArrayList<int[]> getEntryListNumerical(java.lang.String key, boolean morphology)
          Returns the list of references for a given word or phrase in the index preceded by a number to identify the reference.
 java.util.ArrayList<int[]> getEntryListNumerical(java.lang.String key, boolean morphology, java.lang.String pos)
          Returns the list of references for a given word or phrase in the index preceded by a number to identify the reference.
 java.util.TreeSet<java.lang.String> getHeadNumbers(java.lang.String key)
          Returns a set of head numbers in which a word or phrase can be found.
 int getItemCount()
          Returns the number of entries in this index.
 int getItemsMapSize()
          Returns the number of items contained in the hash map of this index.
 int getRefCount()
          Returns the number of references in this index.
 java.util.ArrayList<Reference> getRefObjList(java.lang.String key)
          Returns a list of Reference objects.
 java.lang.String getRefPOS(java.lang.String key)
          Returns a string containing the part-of-speech of the references for a given index entry.
 java.lang.String getStrRef(java.lang.String strIndex)
          Returns a reference in String format as printed in Roget's Thesaurus.
 java.util.ArrayList<java.lang.String> getStrRefList(java.lang.String key)
          Returns a list of references in string format instead of pointers.
 int[] getStrRefNumerical(java.lang.String strIndex)
          Returns a reference in numerical format, as an array of integers.
 int getUniqRefCount()
          Returns the number of unique references in this index.
 void printEntry(java.lang.String key)
          Prints the index entry along with its references to the standard output.
 void printEntry(java.lang.String key, int itemNo)
          Prints the index entry along with its numbered references to the standard output.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Index

public Index()
Default constructor.


Index

public Index(java.lang.String filename)
Constructor that builds the Index object using the information contained in a file. The default file for the ELKB is elkbIndex.dat contained in the $HOME/roget_elkb directory.

Parameters:
filename - name of the file containing the index

Index

public Index(java.lang.String fileName,
             int size,
             boolean breakPhrases)
Constructor that builds the Index object using the information contained in a file and sets the initial size of the index hashtable. The default file for the ELKB is elkbIndex.dat contained in the $HOME/roget_elkb directory.

Parameters:
fileName - name of the file containing the index
size - initial size of the index hashtable

Method Detail

convertToPOS

public java.lang.String convertToPOS(java.lang.String number)
Converts the number, passed in string format, to the corresponding POS.

Parameters:
number -
Returns:
POS

convertToPOS

public java.lang.String convertToPOS(int number)
Converts the number, passed as an int, to the corresponding POS.

Parameters:
number -
Returns:
POS

convertToPOSNumber

public int convertToPOSNumber(java.lang.String pos)
Returns the number corresponding to a given POS, or -1 if an incorrect POS is passed as an argument.

Parameters:
pos -
Returns:
POS as integer
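The relationship between the three conversion methods can be sketched as follows. This is only an illustration: the class name PosCodesSketch, the method names, and in particular the numbering (1 to 5 in the order n., adj., vb., adv., int.) are assumptions, since this page does not state the numeric codes ELKB actually uses.

```java
// Hypothetical sketch of a POS <-> number mapping; the numbering is assumed, not ELKB's.
class PosCodesSketch {
    // Assumed order, taken from the labels listed in the class description.
    private static final String[] LABELS = {"N.", "ADJ.", "VB.", "ADV.", "INT."};

    // Returns the 1-based code of a POS label, or -1 if the label is not recognised.
    static int toNumber(String pos) {
        for (int i = 0; i < LABELS.length; i++) {
            if (LABELS[i].equalsIgnoreCase(pos)) {
                return i + 1;
            }
        }
        return -1;
    }

    // Converts a code passed as an int to its label.
    static String toPos(int number) {
        if (number < 1 || number > LABELS.length) {
            throw new IllegalArgumentException("Unknown POS code: " + number);
        }
        return LABELS[number - 1];
    }

    // Converts a code passed in string format to its label.
    static String toPos(String number) {
        return toPos(Integer.parseInt(number.trim()));
    }

    public static void main(String[] args) {
        System.out.println(toNumber("vb."));  // 3
        System.out.println(toPos("4"));       // ADV.
        System.out.println(toNumber("xyz"));  // -1
    }
}
```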

addEntry

public void addEntry(java.lang.String item,
                     java.lang.String refs)
Associates an index entry with its references. The references are stored as a long string of pointers separated by colons, e.g. 1234:7632:8732: If a phrase contains exactly two words, then both words of the phrase are also indexed, e.g. running track, running, track.

Parameters:
item -
refs -
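The behaviour described for addEntry can be sketched in a few lines. This is a stand-alone illustration, not the ELKB code: AddEntrySketch, parsePointers, store and get are hypothetical names, and storing the same pointers under each word of a two-word phrase is an assumption about what "both words are indexed" means.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the addEntry behaviour described above.
class AddEntrySketch {
    private final Map<String, List<String>> items = new HashMap<>();

    // Splits "1234:7632:8732:" into [1234, 7632, 8732], ignoring empty pieces.
    static List<String> parsePointers(String refs) {
        List<String> pointers = new ArrayList<>();
        for (String p : refs.split(":")) {
            if (!p.isEmpty()) {
                pointers.add(p);
            }
        }
        return pointers;
    }

    void addEntry(String item, String refs) {
        List<String> pointers = parsePointers(refs);
        store(item, pointers);
        // Assumption: for a phrase of exactly two words, each word gets the same pointers.
        String[] words = item.trim().split("\\s+");
        if (words.length == 2) {
            store(words[0], pointers);
            store(words[1], pointers);
        }
    }

    private void store(String key, List<String> pointers) {
        items.computeIfAbsent(key, k -> new ArrayList<>()).addAll(pointers);
    }

    List<String> get(String key) {
        return items.getOrDefault(key, new ArrayList<>());
    }

    public static void main(String[] args) {
        AddEntrySketch index = new AddEntrySketch();
        index.addEntry("running track", "1234:7632:8732:");
        System.out.println(index.get("running track")); // [1234, 7632, 8732]
        System.out.println(index.get("track"));         // [1234, 7632, 8732]
    }
}
```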

getItemCount

public int getItemCount()
Returns the number of entries in this index.

Returns:
number of items

getRefCount

public int getRefCount()
Returns the number of references in this index.

Returns:
number of references

getUniqRefCount

public int getUniqRefCount()
Returns the number of unique references in this index.

Returns:
size of reference list

getItemsMapSize

public int getItemsMapSize()
Returns the number of items contained in the hash map of this index.

Returns:
size of item map

containsEntry

public boolean containsEntry(java.lang.String key)
Returns true if the specified entry is contained in this index.

Parameters:
key -
Returns:
true if entry found, false otherwise

printEntry

public void printEntry(java.lang.String key)
Prints the index entry along with its references to the standard output.

Parameters:
key -

printEntry

public void printEntry(java.lang.String key,
                       int itemNo)
Prints the index entry along with its numbered references to the standard output. The number of the first reference must be specified. The number is printed in front of each reference.

Parameters:
key -
itemNo -
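The numbering described above can be illustrated with a small stand-alone sketch. NumberedRefsSketch and number are hypothetical names, and the "N. reference" output layout is an assumption; this page does not show the format ELKB prints. The sample references are strings that appear elsewhere on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: prefix each reference with a running number starting at itemNo.
class NumberedRefsSketch {
    static List<String> number(List<String> refs, int itemNo) {
        List<String> numbered = new ArrayList<>();
        int next = itemNo;
        for (String ref : refs) {
            // Assumed layout: "<number>. <reference>"
            numbered.add(next + ". " + ref);
            next++;
        }
        return numbered;
    }

    public static void main(String[] args) {
        List<String> refs = Arrays.asList("way 624 N.", "box 194 N.");
        for (String line : number(refs, 3)) {
            System.out.println(line);
        }
        // 3. way 624 N.
        // 4. box 194 N.
    }
}
```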

getEntryList

public java.util.ArrayList<java.lang.String> getEntryList(java.lang.String key)
Returns the list of references for a given word or phrase in the index.

Parameters:
key -
Returns:
ArrayList of entries

getEntryList

public java.util.ArrayList<java.lang.String> getEntryList(java.lang.String key,
                                                          int itemNo)
Returns the list of references for a given word or phrase in the index preceded by a number to identify the reference. References are in the form of a paragraph and head number as identifiers.

Parameters:
key -
itemNo -
Returns:
ArrayList of entries

getEntryListNumerical

public java.util.ArrayList<int[]> getEntryListNumerical(java.lang.String key)
Returns the list of references for a given word or phrase in the index preceded by a number to identify the reference. References are in the form of an array of integers.

Parameters:
key -
Returns:
ArrayList of entries in numerical form

getEntryListNumerical

public java.util.ArrayList<int[]> getEntryListNumerical(java.lang.String key,
                                                        boolean morphology)
Returns the list of references for a given word or phrase in the index preceded by a number to identify the reference. References are in the form of an array of integers. The boolean morphology is true if morphological variations on the word should be searched for, and false otherwise.

Parameters:
key -
morphology - true if morphological variations on the word should be searched for, false otherwise
Returns:
ArrayList of entries in numerical form

getEntryListNumerical

public java.util.ArrayList<int[]> getEntryListNumerical(java.lang.String key,
                                                        boolean morphology,
                                                        java.lang.String pos)
Returns the list of references for a given word or phrase in the index preceded by a number to identify the reference. References are in the form of an array of integers. The boolean morphology is true if morphological variations on the word should be searched for, and false otherwise. Only returns results of the given POS.

Parameters:
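key -
morphology - true if morphological variations on the word should be searched for, false otherwise
pos - part of speech that the returned results must have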
Returns:
ArrayList of entries in numerical form

addReference

public java.lang.String addReference(java.lang.String strPtr,
                                     java.lang.String sRef)
Adds a reference.

Parameters:
strPtr -
sRef -
Returns:
string of indexes

getEntry

public java.util.TreeSet<java.lang.String> getEntry(java.lang.String key)
Returns the references for a word or phrase, applying morphology rules by default.

Parameters:
key -
Returns:
TreeSet of related words

getEntry

public java.util.TreeSet<java.lang.String> getEntry(java.lang.String key,
                                                    boolean morphology)
Returns all references for a given word or phrase in the index. This is where the American to British spelling changes should be done, as well as the other tricks to access phrases. There are a few things to note:
1. Multiple spellings have been included in Roget's, for example tire and tyre. The meanings can be different for each spelling.
2. Often the space between phrases has been removed.
Open question: why does this method not return null?
If the second argument is true, then this method should return the first of the following words that is found:
+ the supplied word
+ the British spelling of the word
+ the base form of the word (Morphy)
A future version should return all entries found in the index.

Parameters:
key -
morphology -
Returns:
TreeSet of entries
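The lookup order described for getEntry (supplied word, then British spelling, then base form) can be sketched as a simple fallback chain. Everything here is illustrative: LookupFallbackSketch, put, addSpelling and baseForm are hypothetical names, the spelling table is filled by hand, and baseForm is a naive placeholder that only strips a final "s" rather than a real morphological analyser such as Morphy. Returning an empty set instead of null for a miss is a choice of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of a "first match wins" lookup; not the ELKB implementation.
class LookupFallbackSketch {
    private final Map<String, TreeSet<String>> index = new HashMap<>();
    private final Map<String, String> americanToBritish = new HashMap<>();

    void put(String item, String pointer) {
        index.computeIfAbsent(item, k -> new TreeSet<>()).add(pointer);
    }

    void addSpelling(String american, String british) {
        americanToBritish.put(american, british);
    }

    // Naive placeholder for morphological analysis: strips one trailing "s".
    static String baseForm(String word) {
        return (word.length() > 1 && word.endsWith("s"))
                ? word.substring(0, word.length() - 1)
                : word;
    }

    TreeSet<String> getEntry(String key, boolean morphology) {
        List<String> candidates = new ArrayList<>();
        candidates.add(key);
        if (morphology) {
            String british = americanToBritish.get(key);
            if (british != null) {
                candidates.add(british);
            }
            candidates.add(baseForm(key));
        }
        // Return the references of the first candidate found in the index.
        for (String candidate : candidates) {
            TreeSet<String> refs = index.get(candidate);
            if (refs != null) {
                return refs;
            }
        }
        return new TreeSet<>();
    }

    public static void main(String[] args) {
        LookupFallbackSketch idx = new LookupFallbackSketch();
        idx.put("colour", "1234");
        idx.put("stork", "7632");
        idx.addSpelling("color", "colour");
        System.out.println(idx.getEntry("color", true));   // [1234]
        System.out.println(idx.getEntry("storks", true));  // [7632]
        System.out.println(idx.getEntry("color", false));  // []
    }
}
```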

getRefPOS

public java.lang.String getRefPOS(java.lang.String key)
Returns a string containing the part-of-speech of the references for a given index entry. For example, getRefPOS("respect") will return "N.VB.ADV."

Parameters:
key -
Returns:
reference POS
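One way a string such as "N.VB.ADV." could be assembled is sketched below. It is an assumption that each part of speech appears once, in order of first occurrence, and that labels are upper-cased; RefPosSketch and joinPos are hypothetical names, and the input is a plain list of labels rather than real Reference objects.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: concatenate the distinct POS labels of an entry's references.
class RefPosSketch {
    static String joinPos(List<String> posLabels) {
        // LinkedHashSet drops duplicates while keeping first-seen order.
        Set<String> seen = new LinkedHashSet<>();
        for (String label : posLabels) {
            seen.add(label.toUpperCase(Locale.ROOT));
        }
        return String.join("", seen);
    }

    public static void main(String[] args) {
        // Labels as they might come from four references of one entry.
        System.out.println(joinPos(Arrays.asList("n.", "vb.", "n.", "adv."))); // N.VB.ADV.
    }
}
```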

getStrRefList

public java.util.ArrayList<java.lang.String> getStrRefList(java.lang.String key)
Returns a list of references in string format instead of pointers. For example, box 194 N. instead of 778.

Parameters:
key -
Returns:
List of references

getStrRef

public java.lang.String getStrRef(java.lang.String strIndex)
Returns a reference in String format as printed in Roget's Thesaurus. For example: way 624 N.

Parameters:
strIndex -
Returns:
reference

getStrRefNumerical

public int[] getStrRefNumerical(java.lang.String strIndex)
Returns a reference in numerical format, as an array of integers. For example: 1 1 2 3 123 3 2 1 5.

Parameters:
strIndex -
Returns:
reference in an integer array

getRefObjList

public java.util.ArrayList<Reference> getRefObjList(java.lang.String key)
Returns a list of Reference objects.

Parameters:
key -
Returns:
ArrayList of references

getHeadNumbers

public java.util.TreeSet<java.lang.String> getHeadNumbers(java.lang.String key)
Returns a set of head numbers in which a word or phrase can be found. Heads are stored as Strings.

Parameters:
key -
Returns:
TreeSet of head numbers
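A consequence of storing Heads as Strings is worth seeing in code. The sketch below pulls the Head number out of references written in the printed form shown on this page (keyword, Head number, POS label) and collects them in a TreeSet. HeadNumbersSketch, headOf and headNumbers are hypothetical names, and taking the second-to-last token as the Head number is an assumption about that printed form.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: collect Head numbers (as Strings) from printed-style references.
class HeadNumbersSketch {
    // Assumes the form "<keyword...> <head> <pos>", so the head is the second-to-last token.
    static String headOf(String ref) {
        String[] tokens = ref.trim().split("\\s+");
        if (tokens.length < 3) {
            throw new IllegalArgumentException("Unexpected reference: " + ref);
        }
        return tokens[tokens.length - 2];
    }

    static TreeSet<String> headNumbers(List<String> refs) {
        TreeSet<String> heads = new TreeSet<>();
        for (String ref : refs) {
            heads.add(headOf(ref));
        }
        return heads;
    }

    public static void main(String[] args) {
        List<String> refs = Arrays.asList("way 624 N.", "obstetrics 167 n.", "box 194 N.");
        System.out.println(headNumbers(refs)); // [167, 194, 624]
    }
}
```

Because the elements are Strings, a TreeSet orders them lexicographically rather than numerically, so a two-digit Head such as "99" would sort after "194". Callers that need numeric order would have to parse the values first.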

addReference

public java.lang.String addReference(java.lang.String strPtr,
                                     java.lang.String sRef,
                                     java.lang.String sgNum,
                                     java.lang.String wordNum)
Adds a reference.

Parameters:
strPtr -
sRef -
sgNum -
wordNum -
Returns:
string of indexes