Friday, May 28, 2010

Thesaurus Class (VB.Net)

Introduction
In some cases two values with a similar meaning, such as a forename and its nickname, should be treated as a match. I have created a Thesaurus class based on Hashtable to realise this functionality.

Source Code (VB.Net)
Download thesaurus classes here.

Thesaurus File
The thesaurus source file is a text file in which each line holds one group of synonyms, separated by the specified character (see the example file for nicknames, which uses > as the value separator).
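
As a rough sketch of how a file in this format could be read into a Hashtable, each line can be split on the separator and every synonym mapped to the first entry of its line. This is an assumption about the approach, not the code of the downloadable class: the module `ThesaurusSketch`, the function `BuildLookup` and the sample lines are all hypothetical.

```vbnet
Imports System
Imports System.Collections
Imports System.Collections.Generic

Module ThesaurusSketch
    ' Hypothetical helper: maps every synonym to the first entry on its line.
    Function BuildLookup(lines As IEnumerable(Of String), separator As Char) As Hashtable
        ' Case-insensitive keys so "bob" and "Bob" resolve to the same group.
        Dim lookup As New Hashtable(StringComparer.OrdinalIgnoreCase)
        For Each rawLine As String In lines
            ' Skip missing or blank lines.
            If rawLine Is Nothing OrElse rawLine.Trim().Length = 0 Then Continue For
            Dim parts As String() = rawLine.Split(separator)
            Dim key As String = parts(0).Trim()
            For Each part As String In parts
                Dim word As String = part.Trim()
                ' Keep the first mapping if a word appears in several groups.
                If word.Length > 0 AndAlso Not lookup.ContainsKey(word) Then
                    lookup.Add(word, key)
                End If
            Next
        Next
        Return lookup
    End Function

    Sub Main()
        ' Sample lines invented for illustration, using > as the separator.
        Dim sample As String() = {"Robert>Bob>Rob", "William>Bill>Will"}
        Dim lookup As Hashtable = BuildLookup(sample, ">"c)
        Console.WriteLine(CStr(lookup("Bob")) = CStr(lookup("Robert")))
        Console.WriteLine(CStr(lookup("Bob")) = CStr(lookup("Bill")))
    End Sub
End Module
```

With this layout, two values are synonyms when they resolve to the same key, so the first comparison in `Main` is true and the second is false.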

Using the Code
Here is an example in which I compare two forenames using the Nicknames thesaurus file:
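Public Function test_thesaurus() As Boolean
    ' Load the nickname groups; ">" separates the synonyms on each line.
    Dim t As New Thesaurus("Nicknames", "c:\nicknames.txt", ">"c)
    t.Load()
    Dim s1 As String = "Robert"
    Dim s2 As String = "Bob"
    ' Both names match when they resolve to the same thesaurus key.
    Return t.GetKey(s2) = t.GetKey(s1)
End Function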

Similarity Distance
The distance between synonyms can also be useful. You can now specify a distance value, enclosed in square brackets, in the thesaurus file. This functionality can be extended to score-based matching.
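
The exact placement of the brackets is not spelled out here, so the following is only a sketch under the assumption that the distance directly follows the synonym, as in `Bob[2]`. The module `DistanceSketch` and the function `ParseEntry` are hypothetical names and are not part of the Thesaurus class.

```vbnet
Imports System
Imports System.Globalization

Module DistanceSketch
    ' Hypothetical parser for one entry such as "Bob[2]" (assumed syntax).
    ' Returns the synonym; distance is set to 0 when no bracket is present.
    Function ParseEntry(entry As String, ByRef distance As Double) As String
        distance = 0.0
        Dim trimmed As String = entry.Trim()
        Dim openPos As Integer = trimmed.IndexOf("["c)
        Dim closePos As Integer = trimmed.LastIndexOf("]"c)
        ' No well-formed bracket pair: the whole entry is the synonym.
        If openPos < 0 OrElse closePos < openPos Then Return trimmed
        Dim number As String = trimmed.Substring(openPos + 1, closePos - openPos - 1)
        ' TryParse leaves distance at 0 when the bracket content is not numeric.
        Double.TryParse(number, NumberStyles.Float, CultureInfo.InvariantCulture, distance)
        Return trimmed.Substring(0, openPos).Trim()
    End Function

    Sub Main()
        Dim d As Double
        Dim word As String = ParseEntry("Bob[2]", d)
        Console.WriteLine(word & " " & d.ToString(CultureInfo.InvariantCulture))
    End Sub
End Module
```

For score-based matching, the parsed distance could be stored next to the key and turned into a score when two values share a group.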