Better Phonetic Matching Algorithm Than Soundex

Mar 6, 2002

Disgruntled with Soundex I went looking for a better phonetic matching algorithm.

Turns out there is a rather good one called Metaphone, which comes in two variants (Simple and Double)

I could find the source for this in C++, but I wanted to have it as a user function.

So here it is:

CREATE FUNCTION dbo.Metaphone(@str as varchar(70))
RETURNS varchar (25)
Metaphone Algorithm

Created by Lawrence Philips.
Metaphone presented in article in "Computer Language" December 1990 issue.
Translated into t-SQL by Keith Henry (

*********** BEGIN METAPHONE RULES ***********
Lawrence Philips' RULES follow:
The 16 consonant sounds:
|--- ZERO represents "th"
B X S K J T F H L M N P R 0 W Y
Drop vowels

Beginning of word: "ae-", "gn", "kn-", "pn-", "wr-" ----> drop first letter
Beginning of word: "x" ----> change to "s"
Beginning of word: "wh-" ----> change to "w"
Beginning of word: vowel ----> Keep it

B ----> B unless at the end of word after "m", as in "dumb", "McComb"

C ----> X (sh) if "-cia-" or "-ch-"
S if "-ci-", "-ce-", or "-cy-"
SILENT if "-sci-", "-sce-", or "-scy-"
K otherwise, including in "-sch-"

D ----> J if in "-dge-", "-dgy-", or "-dgi-"
T otherwise

F ----> F

G ----> SILENT if in "-gh-" and not at end or before a vowel
in "-gn" or "-gned"
in "-dge-" etc., as in above rule
J if before "i", or "e", or "y" if not double "gg"
K otherwise

H ----> SILENT if after vowel and no vowel follows
or after "-ch-", "-sh-", "-ph-", "-th-", "-gh-"
H otherwise

J ----> J

K ----> SILENT if after "c"
K otherwise

L ----> L

M ----> M

N ----> N

P ----> F if before "h"
P otherwise

Q ----> K

R ----> R

S ----> X (sh) if before "h" or in "-sio-" or "-sia-"
S otherwise

T ----> X (sh) if "-tia-" or "-tio-"
0 (th) if before "h"
silent if in "-tch-"
T otherwise

V ----> F

W ----> SILENT if not followed by a vowel
W if followed by a vowel

X ----> KS

Y ----> SILENT if not followed by a vowel
Y if followed by a vowel

Z ----> S

Declare@Result varchar(25),
@str2 char(2),
@str1 char(1),
@strp char(1),
@strLen tinyint,
@cnt tinyint

set @strLen = len(@str)

--Process beginning exceptions
set @str2 = left(@str,2)
if @str2 in ('ae', 'gn', 'kn', 'pn', 'wr')
set @str = right(@str , @strLen - 1)
set @strLen = @strLen - 1
if@str2 = 'wh'
set @str = 'w' + right(@str , @strLen - 2)
set @strLen = @strLen - 1
set @str1 = left(@str,1)
if @str1= 'x'
set @str = 's' + right(@str , @strLen - 1)
if @str1in ('a','e','i','o','u')
set @str = right(@str , @strLen - 1)
set @strLen = @strLen - 1

while @cnt <= @strLen
set @str1 = substring(@str,@cnt,1)
if @cnt <> 1
elseset@strp=' '

if @strp<> @str1
set @str2 = substring(@str,@cnt,2)

if @str1in('f','j','l','m','n','r')
set@Result=@Result + @str1

if @str1='q'set @Result=@Result + 'k'
if @str1='v'set @Result=@Result + 'f'
if @str1='x'set @Result=@Result + 'ks'
if @str1='z'set @Result=@Result + 's'

if @str1='b'
if @cnt = @strLen
if substring(@str,(@cnt - 1),1) <> 'm'
set@Result=@Result + 'b'
set@Result=@Result + 'b'

if @str1='c'

if @str2 = 'ch' or substring(@str,@cnt,3) = 'cia'
set@Result=@Result + 'x'
if @str2in('ci','ce','cy')and@strp<>'s'
set@Result=@Result + 's'
elseset@Result=@Result + 'k'

if @str1='d'
if substring(@str,@cnt,3) in ('dge','dgy','dgi')
set@Result=@Result + 'j'
elseset@Result=@Result + 't'

if @str1='g'
if substring(@str,(@cnt - 1),3) not in ('dge','dgy','dgi','dha','dhe','dhi','dho','dhu')
if @str2 in ('gi', 'ge','gy')
set@Result=@Result + 'j'
if(@str2<>'gn') or ((@str2<> 'gh') and ((@cnt + 1) <> @strLen))
set@Result=@Result + 'k'

if @str1='h'
if (@strp not in ('a','e','i','o','u')) and (@str2 not in ('ha','he','hi','ho','hu'))
if@strp not in ('c','s','p','t','g')
set@Result=@Result + 'h'

if @str1='k'
if @strp <> 'c'
set@Result=@Result + 'k'

if @str1='p'
if @str2 = 'ph'
set@Result=@Result + 'f'
set@Result=@Result + 'p'

if @str1='s'
if substring(@str,@cnt,3) in ('sia','sio') or @str2 = 'sh'
set@Result=@Result + 'x'
elseset@Result=@Result + 's'

if @str1='t'
if substring(@str,@cnt,3) in ('tia','tio')
set@Result=@Result + 'x'
set@Result=@Result + '0'
if substring(@str,@cnt,3) <> 'tch'
set@Result=@Result + 't'

if @str1='w'
if @str2 not in('wa','we','wi','wo','wu')
set@Result=@Result + 'w'

if @str1='y'
if @str2 not in('ya','ye','yi','yo','yu')
set@Result=@Result + 'y'
set @cnt=@cnt + 1
RETURN @Result

K e i t h H e n r y

Edited by - khenry on 03/06/2002 06:41:15

View 8 Replies


Phonetic Function

Jul 22, 2007

Hi All,

Is there any function which will return "MATCH PERCENTAGE"?

For Example, Function("United","Unitd") should return 99% or so based on how close the strings match.

I tried Difference(string1,string2) which returns only values between 0 and 4 (inclusive). I need to get a percentage.



View 2 Replies View Related

Getting Started With Phonetic Comparisons

May 29, 2006

I've been reading a bit about full-text searches, phonetic values and match-queries and just don't know where to begin.

What I'm eventually going to do, is make procedures for matching names, finding records that are close matches and presenting them in a subform below the actual member that you look up.

E.g. if an employee looks up Sergej, he or she will also see Sergey, Sergei etc. below the membersheet.

BOL isn't very practical in examples, and its about 7 years since I took my SQL-Server 7.0 MS courses, plus I've primarily worked as an administrator up until last fall, not a developer. So where to begin?

Thanks in advance,


View 3 Replies View Related


Apr 30, 2002

Can someone please explain how to do a query using the soundex function?
If I have the last name of Smith and I want to get similar matches
with soundex, what would be the syntax?

View 1 Replies View Related


Jul 1, 2004

Has anyone used SOUNDEX() practically?
Everytime i look through bol searching for a function i end up playing with soundex and diffference but have never been able (or wanted to take the time) to understand the values they generate..

just asking.

View 3 Replies View Related

Really Better Soundex?

Feb 6, 2008

Hi guys and gals (in case Tara reads this...)

We've installed this on our server:

(all the udf attempts were way too slow; this uda is nice and quick)

But we have not been able to reproduce the supposed improved functionality. No matter what misspelled name we send it, soundex and soundex_nara return the same amount of clients (in a db of over half a million). The return codes are sometimes different for the two functions but it doesnt ever result in an actual difference in query results. Has anyone actually experienced a difference in returned results?

Have a look at this:

View 1 Replies View Related


Oct 27, 2006

I have two tables in Access database. One table has 166,000 rows and another -75,000 rows.

Both tables consist of the ids, names and addresses.

I need to run a program, which was written in VBA, to find similar names and cities in these two tables, and then I need to populate the third table with id only from both tables. I forgot to mention I use Soundex function in Access. I try to run this program for 48 hours, and it still running. Do you think this is a good idea to use Access for this task ? Or could you please help me to find another solution?

Please let me know.


View 1 Replies View Related

Returning Matching/Non Matching Records

Feb 4, 2007

Hi All

I have a strange request that might not be possible based on the laws of relational databases but I thought I'd give it a try.

I have three tables which for simplicity I will call A, B and C. Table A contains my master records, Table B contains user details and the final table contains some extra data

In my initial search when joining A and B, I return 100 records. I then need to search in table C for these 100 records based on a criteria. the expected result should return all 100 rows for the ones that match and also the ones that do not match. The problem is that in Table C, not all the 100 IDs exist, so there will not be a corresponding record. Unfortunately, our users still want to see all 100 records in the output. Is this possible

As always any help or direction would be appreciated.

View 5 Replies View Related

Soundex For Dutch ?

Feb 15, 2005

Does the SOUNDEX function only apply to english (surnames) or can it be used for other languages e.g. Dutch.
Well, it can be used ofc. but does it produce meaningful output ?


View 3 Replies View Related

SOUNDEX Question

Mar 17, 2008

I there a minimum amount of characters I can send in. Seems like less than 3 characters does not work? Can I change that?

View 1 Replies View Related

Any Fuzzy Software ( Other Than SOUNDEX / NYSIIS )?

Jul 23, 2005

Hi AllWe are using soundex (and later tried Nysiis) for fuzzy name searchsoftware. But we faced a lot of problems the search accuracy was not verygood also we saw a lot of misses of relevant names.There are many problems other than precision and accuracy, with soundex andNYSIIS.e.g.Look for Smith and it will come-up with around 250 very popular last names.That dones not help much when a user is searching for "John Smith". Also, itdoes not return Creighton for Kryton as the search string.I googled a little and saw soming called NamiX. Without contacting thecompany ( ) directly, I wanted to get feedback from newsgroupsor people who are experts at this. Has anyone used this software? If so, Isit as good as they claim?Thanks a million in advance.Steve Creighton(please remove .antispam from email address) or post back your answers tothis group

View 1 Replies View Related

Soundex Support For Foreign Languages

May 11, 2006


We have an application we are developing that will run worldwide. When we are searching data for peoples names I would like to use Soundex but I am not sure it works effectivly for any language other than English. Does anyone know if it is effective in Dutch, German, French etc?

View 7 Replies View Related

Comparing Strings (Advanced Soundex)

May 11, 2006


I need to compare movie names from two systems. In both systems these names are entered manually by operators. I would like to compare them and give a rating on how close these names are equal.

Stripping special characters, and spaces is just not enough. It can happen that they key in sligthly different names. I've tried to use soundex but as we have over 15000 movie titles over the years i'm getting to many equal soundexes to use this as a comparison key.

Any ideas if there are techniques to do this ...

Kind Regards

View 1 Replies View Related

How To Use SOUNDEX In Full Text Search (SQL2000) ?

Feb 19, 2007

Hi thereI have this query in a stored procedureSELECT * FROM Product WHERE FREETEXT (*, @SearchString )When I run the query with the @SearchString = 'hoover' it return values but if I spelt it wrong (eg: hover) I have no value returnMy question is, How can I use the SOUNDEX in a full text searchRegards

View 7 Replies View Related

T-SQL (SS2K8) :: How To Make Soundex Functions Include The First Letter

Oct 8, 2015

I need DIFFERENCE('Kolton','Colton') to equal 4 rather than 3.

But SOUNDEX keeps the first letter of the word:

SOUNDEX('Kolton') = K435
SOUNDEX('Colton') = C435

View 6 Replies View Related


May 22, 2002

Does any have a algorithm that can divide A into B without using the divide
sign (/) or the multiplication sign ( * ).

View 1 Replies View Related

What Is The Best Algorithm To Use?

Nov 24, 2006

I am new to DM and I am not sure which algorithm would be best to use.

I am trying to build a custom comparitor application that companies can use to compare themselves against other companies based on certain pieces of information. I need to group a company with 11 other companies based on 6 attributes. I need the ability to apply weightings to each of the 6 attributes and have those taken into consideration when determining which 10 other companies each company is grouped with. Each group must contain 11 members, the company for the user logged in and 10 other companies that it will be compared against.

At first I thought that clustering would be a good fit for this but I can not see a way to mandate that each cluster contain exactly 11 members, I cannot see a way to weight the inputs, and I think each company can only be in one cluster at a time which do not meet my requirements.

Any help will be greatly appreciated!

View 3 Replies View Related


Jun 8, 2006

Well, i have read in claude seidman book about data mining that some algorithm inside in microsoft decision tree are CART, CHAID and C45 algorithm. could anyone explain to me about the tree algorithm and please explain to me how the tree algorithm used together in one case?

thank you so much

View 1 Replies View Related

Luhn Algorithm

Dec 11, 2006

Use this to check if Luhn has valid check digitCREATE FUNCTIONdbo.fnIsLuhnValid
@Luhn VARCHAR(8000)

IF @Luhn LIKE '%[^0-9]%'

@Multiplier TINYINT,
@Sum INT,

SELECT@Index = LEN(@Luhn),
@Multiplier = 1,
@Sum = 0

WHILE @Index >= 1
SELECT@Plus = @Multiplier * CAST(SUBSTRING(@Luhn, @Index, 1) AS TINYINT),
@Multiplier = 3 - @Multiplier,
@Sum = @Sum + @Plus / 10 + @Plus % 10,
@Index = @Index - 1

Peter Larsson
Helsingborg, Sweden

View 20 Replies View Related


Jul 23, 2005

Hello,Do you know if the algorithm for the BINARY_CHECKSUM function in documentedsomewhere?I would like to use it to avoid returning some string fields from theserver.By returning only the checksum I could lookup the string in a hashtable andI think this could make the code more efficient on slow connections.Thanks in advanced and kind regards,Orly Junior

View 3 Replies View Related

Algorithm Of The MAX Command In T-SQL

Dec 7, 2007

What kind of algorithm does the MAX command uses? I have a table that I need to get the last value of the Transaction ID and increment it by 1, so I can use it as the next TransID everytime I insert a new record into the table. I use the MAX command to obtain the last TransID in the table in this process. However, someone suggested that there is a problem with this, since if there are multiple users trying to insert a record into the same table, and processing is slow, they might essentially come up with the same next TransID. He came up with the idea of having a separate table that contains only the TransID and using this table to determine the next TransID. Will this really make a difference as far as processing speed is concerned or using a MAX command on the same table to come up with the next TransID enough? Do you have a better suggestion?


View 3 Replies View Related

Neural Net Algorithm

Sep 15, 2006


Would anyone be able to provide a reference paper on the neural net algorithm implemented in SQL Server 2005 to better understand how it works?

Thanxs for any info.

View 3 Replies View Related

Clustering Algorithm

Oct 29, 2007

Hi All!

I have few questions regarding Clustering algorithm.

If I process the clustering model with Ks (K is number of clusters) from 2 to n how to find a measure of variation and loss of information in each model (any kind of measure)? (Purpose would be decision which K to take.)

Which clustering method is better to use when segmenting data K-means or EM?

Thanks in advance!

View 4 Replies View Related

C# Algorithm/ Libraries

Jan 10, 2006


Does anyone know of or where I can find implementation of these C#  algorithm /class libraries:

a) RLS - Recursive Least Square algorithm?

b) MWAR - Multi-resolution Wavelet Auto-regresive algorithm?

c) AR - Autoregresive moving awerage algorithm?

d) EWMA - Exponentially Weighted Moving Average

The .NET framework System.Math class do not seem to have these libraries. 



View 2 Replies View Related

Which Algorithm Is Best For Perdiction

Jul 12, 2006


I want to predict which product can be sold together , Pl help me out which algorithm is best either association, cluster or decision and pl let me know how to use case table and nested table my table structure is


Rajesh Ladda

View 1 Replies View Related

Problem With AES_256 Algorithm

Feb 14, 2008

i am using sqlserver2005 as back end for my project.
actually we developing an stand alone web application for client, so we need to host this application in his server. he is not willing to install sql server 2005 edition in his sever so we r going by placing .mdf file in data directory of project.

but before i developed in server2005 i used aes_256 algorithm to encrypt n decrypt the pwd column by using symmetric is working fine.

but when i took the .mdf file of project n add into my project it is throwing error at creation of symmetric key that
"Either no algorithm has been specified or the bitlength and the algorithm specified for the key are not available in this installation of Windows."

please suggest me a solution

View 1 Replies View Related

Developing A New Plug-in Algorithm

Feb 7, 2008


i'm making my master thesis about a new plug-in algorithm, with the LVQ Algorithm.
I make the tutorial with the pair_wise_linear_regression algorithm and i have some doubts. i was searching for the code of the algorithm in the files of the tutorial and i didn't saw it. I have my new algorithm programmed in C++ ready to attach him, but i don't know where to put him, in which file i have to put him to start to define the COM interfaces? And in which file is the code of the pair_wise_linear_regression algorithm in the SRC paste of the tutorial?


View 3 Replies View Related

Algorithm : Data Mining

Feb 26, 2007

Hello friends,

Can u give some idea about the Algorithm in Data Mining for Clustering..

Please reply...


View 1 Replies View Related

Time Series Algorithm

Aug 17, 2006

I am trying to predict Revenue gererated by each Person.
My Input like this:

Month Person Revenue

20050101 Person1 $1000
20050101 Person1 $2000
20050201 Person1 $1000
20050101 Person2 $5000
20050201 Person2 $2000
20050201 Person2 $3000

Obviosly for Person1 and 200501 I expect to see on MS Time Series Viewer $3000, correct?
Instead I see REVENUE(actual) - 200501 VALUE =XXX,
Where XXX is absolutly different number.

Also there are negative numbers in forecast area which is not correct form business point
Person1 who is tough guy tryed to shoot me.
What I am doing wrong. Could you please give me an idea how to extract correct
historical and predict information?

Thnak you,

View 5 Replies View Related

Which Algorithm To Be Used With Symmetric Keys

Mar 28, 2006


I want to create a symmetric key that will be encrypted by certificate key. Can u guide me which algorithm is best out of the following:

DES, TRIPLE_DES, RC2, RC4, RC4_128, DESX, AES_128, AES_192, AES_256.

I tried using AES_128, AES_192, AES_256 but it says 'the algorithm specified for the key are not available in this installation of Windows.'

Pls tell me which else algorithm is best to use and pls specify why.



View 5 Replies View Related

Which Algorithm Is Better For Customer Retention

Jul 25, 2006


Pl any one tell me which algorithm is better for Customer retention Using SQL server 2005 analysis services

It will be great if some one can give the same with example of data model with key column , and rest

Thanks in Advance

Rajesh Ladda

View 3 Replies View Related

Questions About Microsoft_Linear_Regression Algorithm

Jul 2, 2007

Currently I want to run a vanilla multivariate regression and get some statistics back about the regression that is built. For instance, besides the coefficients, I also want the two-sided p-values on the coefficients and the R2 of the model.

I've tried playing with the Microsoft_Linear_Regression algorithm and have run into two issues. I'm doing all this programmatically using DMX queries rather than through the BI studio.

(a) I can never get the coefficients from the regression to match with results I would get from running R or Excel. The results are close but still significantly off. I suspect this is because the Linear Regression is just a subset of the Decision/Regression Trees functionality, in which case some kind of Bayesian prior is being incorporated here. Is that the issue? And if so, is there some way to turn off the Bayesian scoring and get a vanilla multivariate regression? I don't see anything in the inputs to the linear regression that would let me do this, and even running Microsoft_Decision_Trees with a few different settings, I can't get the output I'm looking for. If there's no way to turn off the Bayesian scoring, can someone explain to me what the prior being used here is and how Bayesian learning is being applied to the regression?

(b) Using the Generic Tree Viewer, I see that there are a few "statistics" values in the Node_Distribution, but I'm not sure what they're referring to. One of them looks like it might be the MSE. I could play with this some more to find out, but I'm hoping someone here can save me that work and tell me what these numbers are. Hopefully they will constitute enough information for me to rebuild the p-values and the R2.



View 3 Replies View Related

How To To Develope A New PlugIN Algorithm

Oct 18, 2006

I have a code for Nearest neighbour algorithm, I want to build a datamining algorithm using that code..

I have the following link that includes the source code for a sample plug-in algorithm written in C#.

(managed plug-in framework that's available for download here: )

But i am confused on where to insert my algorithm logic?

View 3 Replies View Related

Copyrights 2005-15, All rights reserved