strcmp2

This page (named as a successor to the C strcmp function) compares two strings using a choice of similarity and distance methods. The methods are implemented in a Perl script written by my M.Sc. supervisor, Greg Kondrak.

String 1
String 2
Method
IDENT: Simple identity; returns 1 if the strings are equal, 0 otherwise.
SIMARD: Length of the common prefix.
SOUNDEX: A venerable method of matching personal names.
DICE: Number of common bigrams.
TRI: Number of common trigrams.
XDICE: See (Brew and McKelvie, 1996).
XXDICE: Ibid.
LCSR: Longest common subsequence ratio.
EDIT: Levenshtein (edit) distance.
BI-SIM: See (Kondrak, 2005).
TRI-SIM: Ibid.
BI-DIST: Ibid.
TRI-DIST: Ibid.
Options
Normalize by length
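
To make a few of the simpler methods concrete, here is a minimal sketch in Python. It is not the Perl script itself: it follows only the one-line descriptions above, and the function names, the choice to count repeated n-grams with multiplicity, and the treatment of two empty strings in lcsr are my own assumptions. The script may differ in details such as padding or normalization.

```python
from collections import Counter


def ident(a, b):
    """IDENT: 1 if the strings are equal, 0 otherwise."""
    return 1 if a == b else 0


def common_prefix_len(a, b):
    """SIMARD as described above: length of the common prefix."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def ngrams(s, n):
    """All contiguous substrings of length n (no padding; an assumption)."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def common_ngrams(a, b, n=2):
    """DICE (n=2) or TRI (n=3) as described above: number of shared n-grams.

    Repeated n-grams are counted with multiplicity (an assumption).
    """
    shared = Counter(ngrams(a, n)) & Counter(ngrams(b, n))
    return sum(shared.values())


def lcs_len(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcsr(a, b):
    """LCSR: LCS length divided by the length of the longer string."""
    longest = max(len(a), len(b))
    # Two empty strings are treated as identical (an assumption).
    return lcs_len(a, b) / longest if longest else 1.0


def edit_distance(a, b):
    """EDIT: Levenshtein distance with unit-cost insert, delete, substitute."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]
```

For example, with "colour" and "color" this sketch gives a common prefix of length 4, 3 shared bigrams, an LCSR of 5/6, and an edit distance of 1. One plausible reading of "Normalize by length" is to divide a raw score by the length of the longer string, as lcsr does here, but that is a guess on my part; the comments in eval.pl are the place to check what the script actually does.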

Using the script

The script is available here if you would like to use it yourself. Instructions for use are in the comments near the top of the eval.pl file.

If you use it in academic work, please cite Greg's paper (Kondrak, 2005), listed below, along with the paper for the particular method you use, where applicable.

References