
# Levenshtein Edit Distance Algorithm

See www.merriampark.com/ld.htm for information about the algorithm. That page links to a T-SQL implementation by Joseph Gama (http://www.merriampark.com/ldtsql.htm), but unfortunately that function doesn't work. A debugged version is included in the also-referenced package of T-SQL functions (http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=502&lngWId=5), but it still has the fundamental problem that it only works on pairs of strings of up to 49 characters. The function below accepts strings of up to 3999 characters; it keeps one row of the distance matrix in a varbinary(8000) buffer, two bytes per cell.

CREATE FUNCTION edit_distance(@s1 nvarchar(3999), @s2 nvarchar(3999))
RETURNS int
AS
BEGIN
  DECLARE @s1_len int, @s2_len int, @i int, @j int, @s1_char nchar, @c int, @c_temp int,
    @cv0 varbinary(8000), @cv1 varbinary(8000)
  -- @cv1 holds the previous matrix row as 2-byte integers; row 0 is 0, 1, ..., LEN(@s2)
  -- @c starts at LEN(@s2) so that an empty @s1 returns the length of @s2
  SELECT @s1_len = LEN(@s1), @s2_len = LEN(@s2), @cv1 = 0x0000, @j = 1, @i = 1, @c = LEN(@s2)
  WHILE @j <= @s2_len
    SELECT @cv1 = @cv1 + CAST(@j AS binary(2)), @j = @j + 1
  WHILE @i <= @s1_len
  BEGIN
    -- build row @i in @cv0; its first cell is @i
    SELECT @s1_char = SUBSTRING(@s1, @i, 1), @c = @i, @cv0 = CAST(@i AS binary(2)), @j = 1
    WHILE @j <= @s2_len
    BEGIN
      -- insertion: cell to the left + 1
      SET @c = @c + 1
      -- substitution or match: diagonal cell + 0 or 1
      SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j-1, 2) AS int)
        + CASE WHEN @s1_char = SUBSTRING(@s2, @j, 1) THEN 0 ELSE 1 END
      IF @c > @c_temp SET @c = @c_temp
      -- deletion: cell above + 1
      SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j+1, 2) AS int) + 1
      IF @c > @c_temp SET @c = @c_temp
      SELECT @cv0 = @cv0 + CAST(@c AS binary(2)), @j = @j + 1
    END
    SELECT @cv1 = @cv0, @i = @i + 1
  END
  RETURN @c
END
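To check the T-SQL function's results outside the database, here is a minimal Python sketch of the same two-row dynamic-programming scheme (the name `edit_distance` is reused for illustration only). One difference to keep in mind: the Python comparison is exact, while the T-SQL comparison follows the database collation and may therefore be case-insensitive.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Levenshtein distance, keeping only two rows of the DP matrix."""
    # previous[j] = distance between the first i-1 chars of s1 and the first j chars of s2
    previous = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, start=1):
        current = [i]  # distance from s1[:i] to the empty prefix of s2
        for j, ch2 in enumerate(s2, start=1):
            substitution = previous[j - 1] + (ch1 != ch2)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

Memory stays proportional to the length of the second string, which is the same trade-off the varbinary buffer makes.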

Related Messages:
Edit Distance
Hi,
Is it possible to find out which edit distance is used in Fuzzy Lookup/Fuzzy Grouping?
On this forum I read that Fuzzy Lookup uses fixed-size 4-grams.
Is there any document explaining how Fuzzy Lookup calculates similarity? In other words, what kind of edit distance or algorithm is used by Fuzzy Lookup/Fuzzy Grouping?
I hope I was clear enough despite my poor English.
Thanks all
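How Fuzzy Lookup scores similarity is not described in this thread, so the following is only a generic illustration of comparing strings by fixed-size 4-grams, using Jaccard similarity (shared 4-grams divided by all distinct 4-grams). The measure and the function names are assumptions for illustration, not the Fuzzy Lookup algorithm.

```python
def qgrams(text, q=4):
    """Set of overlapping length-q substrings of text, case-folded."""
    text = text.lower()
    if len(text) < q:
        return {text}  # strings shorter than q count as a single gram
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def qgram_similarity(a, b, q=4):
    """Jaccard similarity of the q-gram sets: shared grams / all distinct grams."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

For example, "Jonathan" and "Johnathan" share the 4-grams "nath", "atha" and "than" out of 8 distinct 4-grams, giving 3/8.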

SQL Server 2005 Full-Text Web Searches With Levenshtein Edit Distance And Double Metaphone Matches
Hi,

Is it possible to incorporate Levenshtein edit distance and Double Metaphone matching into SQL Server 2005 Full-Text Search queries using FREETEXTTABLE or CONTAINSTABLE?

I'm working on a web site that allows users to search multiple tables, and I want to be able to pull back rows where the spelling is slightly off, or that contain words that sound like those provided in the query, a bit like Google.

Also, I'd like to rank the results so that exact matches are returned at the top of the result set, and near or phonetic matches lower down the list.

I've come across UDFs that provide these features, but they don't appear to be compatible with the FTS engine. I've also heard of third-party alternatives to the FTS engine that integrate with SQL Server, but I can't remember what they're called.

Any suggestions?

Ben
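The ranking Ben describes can be sketched on the application side. Assuming candidate words have already been fetched (for example by a broad FREETEXTTABLE query), Python's standard `difflib.get_close_matches` returns them best-first, so an exact match (similarity 1.0) sorts to the top. This is an illustration outside FTS, and `difflib`'s ratio is a different similarity measure from Levenshtein distance; `rank_matches` is a hypothetical name.

```python
import difflib


def rank_matches(query, candidates, cutoff=0.6):
    """Candidates similar to query, most similar first; exact matches score 1.0."""
    if not candidates:
        return []
    # get_close_matches sorts by SequenceMatcher.ratio(), highest first,
    # and drops anything scoring below cutoff
    return difflib.get_close_matches(query, candidates, n=len(candidates), cutoff=cutoff)
```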

Nearest Distance
Hi,
How do I find the nearest point and its distance? For example, I have two tables, A and B, and I want to find the nearest record in B for each record in A. The result should also give me the distance. The data is geospatial. Can this be done in SQL?
Any help will be appreciated.

Clustering Distance
Hi!

My question may sound silly, but I am new to data mining, so I do not know much. Can I define a custom distance measure for a clustering algorithm such as k-means? I have created some CLR UDTs and I want to try a clustering algorithm on them. Can I do this? How is the distance calculated?

Sorry for my poor English.
Thanks,
ST.

Mirroring Over Distance????
Is there a recommended practice for database mirroring with regard to distance? Is it best practice to keep both partners at the same physical location and use another method for failing over to a remote site, or can the mirror partner simply be placed a few thousand miles away? I suspect not.

Distance Between Postal Codes
I'm looking to find out how to set up a database where a visitor to my site can enter their postal code and find out how far they are from another postal code. For example, AutoTrader has this feature, I believe, to tell you how far a vehicle is from you. Dating sites have it so you can do proximity searches. Does anyone have any ideas where I could start? I'm thinking the post office, but if anyone else has suggestions, I'm open to hearing them. Thanks!

Help W/ Distance Calculation Query
I'm trying to run a dynamic query that returns all records within a specific distance of a certain point. The latitude and longitude of each record are stored in the database. The query is built from two dynamic variables, $StartLatitude and $StartLongitude, which represent the starting point.

SELECT UserID, ACOS(SIN($StartLatitude * PI() / 180) * SIN(Latitude * PI() / 180) + COS($StartLatitude * PI() / 180) * COS(Latitude * PI() / 180) * COS(($StartLongitude - Longitude) * PI() / 180)) * 180 / PI() * 60 * 1.1515 AS Distance
FROM HPN_Painters
HAVING (Distance <= 150)

It runs fine until I add the HAVING (Distance <= 150) clause, at which point I receive the error "Invalid column name 'Distance'". It seems that Distance cannot be referenced in the HAVING clause.
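In SQL Server, a column alias defined in the SELECT list is not visible to the WHERE or HAVING clause of the same query, because those clauses are logically evaluated before SELECT. The usual fix is to compute the alias in a derived table and filter outside it, e.g. `SELECT * FROM (SELECT UserID, <distance expression> AS Distance FROM HPN_Painters) AS d WHERE d.Distance <= 150`, or to repeat the expression in WHERE. The Python sketch below mirrors that compute-then-filter shape and uses the same spherical law of cosines expression, including the query's 60 * 1.1515 conversion to miles; the function names and sample rows are hypothetical.

```python
import math


def distance_miles(lat1, lon1, lat2, lon2):
    """Spherical law of cosines, matching the expression in the query."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    cos_arc = (math.sin(phi1) * math.sin(phi2)
               + math.cos(phi1) * math.cos(phi2) * math.cos(math.radians(lon1 - lon2)))
    # Rounding can push the value slightly outside [-1, 1], where acos is undefined
    cos_arc = max(-1.0, min(1.0, cos_arc))
    # arc in degrees * 60 nautical miles per degree * 1.1515 (factor used in the query)
    return math.degrees(math.acos(cos_arc)) * 60 * 1.1515


def painters_within(start_lat, start_lon, rows, max_miles):
    """rows: iterable of (user_id, latitude, longitude); returns [(user_id, distance)]."""
    # Step 1 (the derived table): attach the computed Distance to every row
    with_distance = [(user_id, distance_miles(start_lat, start_lon, lat, lon))
                     for user_id, lat, lon in rows]
    # Step 2 (the outer WHERE): filter on the computed value
    return [(user_id, d) for user_id, d in with_distance if d <= max_miles]
```

The clamp matters in T-SQL too: when a record sits at the starting point, rounding can make the ACOS argument slightly larger than 1, and ACOS then raises an error.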

Cluster Euclidean Distance
I am new to data mining, so please excuse my ignorance. Let's assume that:

- I have created a cluster model

- it identified 3 clusters (a, b, c)

- each record consists of 15 columns

- new records (15 variables) are being collected in real time

What I would like to do is plot these new records programmatically as I collect them in real time. I assume each new record will belong to one of these three clusters. I believe we can find the cluster a new record belongs to with 'SELECT Cluster()....' and its distance from the center of the cluster with ClusterDistance(). To plot this in a 2-dimensional space I need (x, y).

ClusterDistance() could be Y, but what should X be?

Thanks.
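For reference, here is what "distance from the center of a cluster" means under plain Euclidean distance: a minimal Python sketch that assigns a new record to the nearest centroid and returns that distance. It assumes numeric attributes and known centroid coordinates (the names and values are hypothetical). Analysis Services' ClusterDistance() may scale or weight attributes differently, so treat this as an illustration rather than a reimplementation.

```python
import math


def nearest_cluster(record, centroids):
    """Return (cluster_name, euclidean_distance) for the centroid closest to record.

    record: sequence of numbers (e.g. 15 values);
    centroids: {name: sequence of the same length}.
    """
    best = min(centroids, key=lambda name: math.dist(record, centroids[name]))
    return best, math.dist(record, centroids[best])
```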

Database Mirrors Over Distance
Various posts have noted that mirroring over distance is not advisable, or that asynchronous connections should be used.

Are there any limits or recommendations? For example, if two datacenters are a couple of miles apart, with 10 Gb/s fibre links and response times under 50 ms, would this be acceptable for high-availability mirroring?

Distance Between Two Points Lat/long
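I have a user-defined function that determines the distance between two points. I have it working, but I'm having a problem getting the result to print.

create function dbo.Distance( @lat1 float , @long1 float , @lat2 float , @long2 float)
returns float

as

begin

declare @Ans as float
declare @Miles as float

set @Ans = 0
set @Miles = 0

if @lat1 is null or @lat1 = 0 or @long1 is null or @long1 = 0 or @lat2 is
null or @lat2 = 0 or @long2 is null or @long2 = 0

begin

return ( @Miles )

end

-- cosine of the angle between the two points (spherical law of cosines)
set @Ans = SIN(RADIANS(@lat1)) * SIN(RADIANS(@lat2))
+ COS(RADIANS(@lat1)) * COS(RADIANS(@lat2)) * COS(RADIANS(@long2 - @long1))

-- keep rounding error from pushing @Ans outside [-1, 1]
if @Ans > 1 set @Ans = 1
if @Ans < -1 set @Ans = -1

-- angle in radians times 3959, the approximate radius of the Earth in miles
set @Miles = 3959 * ATN2(SQRT(1 - SQUARE(@Ans)), @Ans)

set @Miles = CEILING(@Miles)

return ( @Miles )

end
GO

-- Assign the function's return value to @RC; a bare EXEC Distance ... call leaves @RC NULL
DECLARE @RC float
SET @RC = dbo.Distance(39.943762, -78.122265, 32.334709, -96.633546)
PRINT @RC /* in miles */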

Great Circle Distance Calculation
Is there any stored procedure or application that implements a great-circle distance calculation?

Haversine SQL Trouble - Distance Between Zip Codes
I am trying to use the haversine formula to find the distance between two points on a sphere, specifically two zip codes in my database. I'm not very familiar with either SQL syntax or math equations :), so I was hoping I could get some help. Below is what I'm using, and as best as I can figure, it is the correct formula. However, it is not giving me correct results. Some are close; others don't seem right at all. Any ideas?

SET @lat1 = RADIANS(@lat1)
SET @log1 = RADIANS(@log1)
SET @lat2 = RADIANS(@lat2)
SET @log2 = RADIANS(@log2)
SET @Dlat = ABS(@lat2 - @lat1)
SET @Dlog = ABS(@log2 - @log1)
SET @R = 3956 /* Approximate radius of earth in miles */
SET @A = SQUARE(SIN(@Dlat/2)) + COS(@lat1) * COS(@lat2) * SQUARE(SIN(@Dlog/2))
SET @C = 2 * ATN2(SQRT(@A), SQRT(1 - @A))
/* Alternative calculation: SET @C = 2 * ASIN(CASE WHEN SQRT(@A) > 1 THEN 1 ELSE SQRT(@A) END) */
SET @distance = @R * @C

thnx,
cjrsumner
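Comparing the T-SQL output against an independent implementation is one way to tell whether the formula or the data is at fault. Below is a minimal Python sketch of the same haversine steps with the same 3956-mile radius; the function name is hypothetical. Taking ABS of the differences, as the T-SQL does, is harmless because they are only used inside squared sines. If both implementations agree on the same inputs, the remaining suspects are the inputs themselves, such as variable declarations with too little precision or longitudes stored with the wrong sign.

```python
import math

EARTH_RADIUS_MILES = 3956  # same approximate radius as the T-SQL


def haversine_miles(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_MILES):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    a = min(1.0, a)  # guard against rounding just above 1
    return radius * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
```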

Calculating Distance Based On Latitude And Longitude
I need to take the latitude and longitude of two locations and compare them to determine the number of miles between the points. It doesn't need to account for elevation; it can assume a flat plane using latitude and longitude.

Does anyone have any algorithms in T-SQL to do this?
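Treating raw latitude and longitude as x/y coordinates on a flat plane is inaccurate, because a degree of longitude shrinks toward the poles. A common flat-plane approximation, the equirectangular projection, scales the longitude difference by the cosine of the mean latitude and then applies Pythagoras. A minimal Python sketch, assuming a mean Earth radius of 3959 miles (the name `flat_distance_miles` is hypothetical); it is reasonable for short distances and does not handle pairs that straddle the 180th meridian.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # assumed mean radius of the Earth


def flat_distance_miles(lat1, lon1, lat2, lon2):
    """Equirectangular (flat-plane) approximation for two points given in degrees."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    # the east-west offset shrinks with the cosine of the latitude
    x = math.radians(lon2 - lon1) * math.cos(mean_lat)
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_MILES * math.hypot(x, y)
```

Ported to T-SQL, the same expression needs only RADIANS, COS and SQRT.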

How To Control The Distance Between Two Matrices? (Or A Matrix And A Table)
Hi everyone,

The report shows two tables and two matrices.

How can I control the distance between them?

I want to set the same distance between the table and the matrix,

or between two tables.

Any Distance Limit For A Failover Clustering Solution?
Could I implement a failover cluster solution for two database servers located in two different cities?
Is it possible?

Great Circle Distance Function - Haversine Formula
This function computes the great circle distance in kilometers using the Haversine formula.

If you want the distance in miles, change the average radius of the Earth in the function to miles.

create function dbo.F_GREAT_CIRCLE_DISTANCE
(
@Latitude1 float,
@Longitude1 float,
@Latitude2 float,
@Longitude2 float
)
returns float
as
/*
FUNCTION: F_GREAT_CIRCLE_DISTANCE

Computes the Great Circle distance in kilometers
between two points on the Earth using the
Haversine formula distance calculation.

Input Parameters:
@Longitude1 - Longitude in degrees of point 1
@Latitude1 - Latitude in degrees of point 1
@Longitude2 - Longitude in degrees of point 2
@Latitude2 - Latitude in degrees of point 2

*/
begin

declare @lon1 float
declare @lon2 float
declare @lat1 float
declare @lat2 float

declare @a float
declare @distance float
declare @radius float

-- Sets average radius of Earth in kilometers
set @radius = 6371.0E
set @lon1 = radians( @Longitude1 )
set @lon2 = radians( @Longitude2 )
set @lat1 = radians( @Latitude1 )
set @lat2 = radians( @Latitude2 )

set @a = sqrt(square(sin((@lat2-@lat1)/2.0E)) +
(cos(@lat1) * cos(@lat2) * square(sin((@lon2-@lon1)/2.0E))) )

set @distance =
@radius * ( 2.0E *asin(case when 1.0E < @a then 1.0E else @a end ))

return @distance

end

Edit: corrected spelling

CODO ERGO SUM

Stored Procedure To Retrieve Zip Codes Within A Given Distance Of A Zip Code
Hi All,
Does anyone have a stored procedure that reliably retrieves all zip codes within a given distance of a specified zip code? A zip code and radius are passed in, and the stored procedure returns all zip codes that fall within that range.
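A common approach is a two-stage filter: first a cheap latitude/longitude rectangle test that an index on the coordinate columns can serve, then the exact great-circle distance on the rows that survive. The Python sketch below shows the logic with an in-memory dictionary standing in for a zip code table; `zips`, `zips_within`, `bounding_box` and the 3959-mile radius are assumptions for illustration. The rectangle's longitude half-width is asin(sin(d) / cos(lat)), the widest longitude offset reached by a circle of angular radius d, so the rectangle never excludes a point that is actually in range; near a pole or the 180th meridian it falls back to the full longitude range.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # assumed mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def bounding_box(lat, lon, radius_miles):
    """(lat_min, lat_max, lon_min, lon_max) in degrees enclosing every point within radius."""
    d = radius_miles / EARTH_RADIUS_MILES  # angular radius in radians
    phi, lam = math.radians(lat), math.radians(lon)
    lat_min, lat_max = phi - d, phi + d
    if lat_min > -math.pi / 2 and lat_max < math.pi / 2:
        # widest longitude offset reached by a circle of angular radius d
        dlam = math.asin(math.sin(d) / math.cos(phi))
        lon_min, lon_max = lam - dlam, lam + dlam
        if lon_min < -math.pi or lon_max > math.pi:
            lon_min, lon_max = -math.pi, math.pi  # crosses the 180th meridian
    else:
        # a pole lies inside the circle: every longitude is possible
        lat_min, lat_max = max(lat_min, -math.pi / 2), min(lat_max, math.pi / 2)
        lon_min, lon_max = -math.pi, math.pi
    return tuple(math.degrees(v) for v in (lat_min, lat_max, lon_min, lon_max))


def zips_within(zips, center_zip, radius_miles):
    """zips: {zip_code: (lat, lon)}; returns [(zip_code, miles)] sorted by distance."""
    lat0, lon0 = zips[center_zip]
    lat_min, lat_max, lon_min, lon_max = bounding_box(lat0, lon0, radius_miles)
    result = []
    for code, (lat, lon) in zips.items():
        # cheap rectangle test first (in SQL: Latitude BETWEEN ... AND Longitude BETWEEN ...)
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            miles = haversine_miles(lat0, lon0, lat, lon)  # exact test on the survivors
            if miles <= radius_miles:
                result.append((code, miles))
    return sorted(result, key=lambda item: item[1])
```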

Algorithm
Does anyone have an algorithm that can divide A by B without using the division sign (/) or the multiplication sign (*)?
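One way to do it is binary long division: shift the divisor left until it would exceed the dividend, then subtract shifted copies from the largest down, setting one quotient bit for each successful subtraction. A minimal Python sketch (the name `divide` is hypothetical) using only shifts, comparisons, addition and subtraction; it truncates toward zero, like integer division in T-SQL. A T-SQL port could double values with x + x instead of shifting.

```python
def divide(dividend: int, divisor: int) -> int:
    """Integer division truncating toward zero, without the / or * operators."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    negative = (dividend < 0) != (divisor < 0)
    remaining, b = abs(dividend), abs(divisor)
    # Find the largest shift for which b << shift still fits into the dividend
    shift = 0
    while (b << (shift + 1)) <= remaining:
        shift += 1
    quotient = 0
    # Long division in base 2, from the highest quotient bit down
    while shift >= 0:
        if (b << shift) <= remaining:
            remaining -= b << shift
            quotient |= 1 << shift
        shift -= 1
    return -quotient if negative else quotient
```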

What Is The Best Algorithm To Use?
I am new to data mining and I am not sure which algorithm would be best to use.

I am trying to build a custom comparator application that companies can use to compare themselves against other companies based on certain pieces of information. I need to group each company with 10 other companies based on 6 attributes. I need to be able to apply a weighting to each of the 6 attributes and have those weightings taken into account when determining which 10 other companies each company is grouped with. Each group must contain 11 members: the company of the logged-in user and the 10 other companies it will be compared against.

At first I thought clustering would be a good fit, but I cannot see a way to require that each cluster contain exactly 11 members or a way to weight the inputs, and I think each company can only be in one cluster at a time. None of this meets my requirements.

Any help will be greatly appreciated!

Algorithm
Well, I have read in Claude Seidman's book about data mining that the algorithms inside Microsoft Decision Trees include CART, CHAID and C4.5. Could anyone explain these tree algorithms to me, and how they are used together in one case?

Thank you so much

BINARY_CHECKSUM Algorithm
Hello,

Do you know if the algorithm for the BINARY_CHECKSUM function is documented somewhere? I would like to use it to avoid returning some string fields from the server. By returning only the checksum, I could look up the string in a hash table, and I think this could make the code more efficient on slow connections.

Thanks in advance and kind regards,
Orly Junior

Luhn Algorithm
Use this function to check whether a number has a valid Luhn check digit:

CREATE FUNCTION dbo.fnIsLuhnValid
(
@Luhn VARCHAR(8000)
)
RETURNS BIT
AS

BEGIN
IF @Luhn LIKE '%[^0-9]%'
RETURN 0

DECLARE @Index SMALLINT,
@Multiplier TINYINT,
@Sum INT,
@Plus TINYINT

SELECT @Index = LEN(@Luhn),
@Multiplier = 1,
@Sum = 0

-- Walk the digits from right to left; @Multiplier alternates 1, 2, 1, 2, ...
-- @Plus / 10 + @Plus % 10 adds the digits of a doubled value (e.g. 16 -> 1 + 6)
WHILE @Index >= 1
SELECT @Plus = @Multiplier * CAST(SUBSTRING(@Luhn, @Index, 1) AS TINYINT),
@Multiplier = 3 - @Multiplier,
@Sum = @Sum + @Plus / 10 + @Plus % 10,
@Index = @Index - 1

RETURN CASE WHEN @Sum % 10 = 0 THEN 1 ELSE 0 END
END
Helsingborg, Sweden
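For comparison outside SQL Server, here is a minimal Python sketch of the same check (the name `is_luhn_valid` is hypothetical). It doubles every second digit from the right and subtracts 9 from doubled values above 9, which is equivalent to the T-SQL's `@Plus / 10 + @Plus % 10` digit sum. One deliberate difference: it rejects the empty string, which the T-SQL function accepts because its sum stays 0.

```python
def is_luhn_valid(number: str) -> bool:
    """True if the digit string passes the Luhn (mod 10) check."""
    # Like the T-SQL, reject anything that is not purely ASCII digits; unlike it, reject ''
    if not number or any(ch not in "0123456789" for ch in number):
        return False
    total = 0
    # Walk from the rightmost digit; every second digit is doubled
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits, e.g. 16 -> 1 + 6 = 7
        total += digit
    return total % 10 == 0
```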

C# Algorithms/Libraries
Hi.

Does anyone know of, or where I can find, C# implementations of these algorithms/class libraries:

a) RLS - Recursive Least Squares algorithm?

b) MWAR - Multi-resolution Wavelet Autoregressive algorithm?

c) AR - Autoregressive moving average algorithm?

d) EWMA - Exponentially Weighted Moving Average

The .NET Framework System.Math class does not seem to have these.

Regards

Shorin

Algorithm Of The MAX Command In T-SQL
What kind of algorithm does the MAX function use? I have a table where I need to get the last Transaction ID and increment it by 1, so I can use it as the next TransID every time I insert a new record into the table. I use MAX to obtain the last TransID in the table in this process. However, someone suggested that there is a problem with this: if multiple users try to insert a record into the same table and processing is slow, they might come up with the same next TransID. He suggested having a separate table that contains only the TransID and using this table to determine the next TransID. Will this really make a difference as far as processing speed is concerned, or is using MAX on the same table enough to come up with the next TransID? Do you have a better suggestion?

Thanks

Clustering Algorithm

Hi All!

I have a few questions regarding the clustering algorithm.

If I process the clustering model with K (the number of clusters) ranging from 2 to n, how can I find a measure of variation and loss of information for each model (any kind of measure)? The purpose is to decide which K to use.

Which clustering method is better for segmenting data, K-means or EM?
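One common, algorithm-independent measure for choosing K is the within-cluster sum of squares (WCSS): the total squared distance from each point to its cluster's centre. WCSS generally decreases as K grows, so one looks for the "elbow" where adding clusters stops helping much. A minimal Python sketch with plain Lloyd's k-means, starting deterministically from the first K points (an assumption for reproducibility); Analysis Services' clustering has its own initialisation and scoring, so this only illustrates the measure. The function names are hypothetical.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def kmeans(points, k, max_iterations=100):
    """Plain Lloyd's k-means, starting from the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(axis) / len(members) for axis in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centre
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign(points, centroids), centroids


def wcss(points, labels, centroids):
    """Within-cluster sum of squared Euclidean distances."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))
```

Running `kmeans` for K = 2 .. n and plotting `wcss` against K gives the elbow curve.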

Neural Net Algorithm
Hi,

Would anyone be able to provide a reference paper on the neural net algorithm implemented in SQL Server 2005 to better understand how it works?

Thanks for any info.

Which Algorithm Is Best For Prediction

Hi

I want to predict which products are sold together. Please help me decide which algorithm is best: association, clustering or decision trees. Please also let me know how to use a case table and a nested table. My table structure is:

Cust_ID
Age
Product
Location
Income

Thanks

Problem With AES_256 Algorithm

Hi,
I am using SQL Server 2005 as the back end for my project.
We are developing a standalone web application for a client, so we need to host this application on his server. He is not willing to install SQL Server 2005 on his server, so we are placing the .mdf file in the project's data directory.

Before that, I developed against SQL Server 2005 and used the AES_256 algorithm with symmetric keys to encrypt and decrypt the password column. It worked fine.

But when I took the project's .mdf file and added it to my project, it throws this error when creating the symmetric key:
"Either no algorithm has been specified or the bitlength and the algorithm specified for the key are not available in this installation of Windows."

Path Finding Algorithm In SQL
I am working on a project that searches for an optimal route for a user from a starting point to his/her destination on a map stored in SQL Server 2005. I have created two versions to test the performance of the path-finding algorithm. I have a few classes:

PriorityQueue class, implemented as a List() object plus code to keep it sorted
PathNode class, whose instances are the nodes of the search tree, with their heuristic values
DataSource class, which stores data retrieved from SQL Server 2005 in RAM for faster path finding
PathFinding class, which implements the path-search algorithm (based on A*), with PriorityQueue as the open list, a List() object as the closed list, PathNode as the nodes in both lists, and data retrieved from the DataSource object, which loads the whole table from SQL Server 2005

In the first version, I simply used a SELECT query to retrieve each corresponding node's data from SQL Server 2005, which made performance very poor, as I confirmed with SQL Server Profiler. The current version loads all the data into RAM, which brought execution time under 1 second, as opposed to about 8 seconds for the first version.

Now my problem is porting the algorithm to SQL Server 2005 through SQL CLR integration, to get better results without burdening the client PC. How am I going to do this? I tried before and got several errors; for example, I needed to serialize my PathNode class, which I did. Do I need to make all the classes UDT-compatible, or something else?

Thank you very much.
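Wherever the code ends up hosted, the open list is usually cheaper as a binary heap than as a List() that is re-sorted on every insert, since a heap push or pop costs O(log n). A minimal Python sketch of A* with a heap-based open list and a closed set; `a_star`, the node names, coordinates and edge costs are hypothetical, and the straight-line heuristic is only admissible if every edge cost is at least the straight-line distance between its endpoints.

```python
import heapq
import math


def a_star(graph, coords, start, goal):
    """A* search.

    graph: {node: [(neighbour, cost), ...]}; coords: {node: (x, y)} used for the
    straight-line heuristic. Returns (path, cost), or (None, inf) if goal is unreachable.
    """
    def heuristic(node):
        return math.dist(coords[node], coords[goal])

    open_heap = [(heuristic(start), 0.0, start)]  # entries: (f = g + h, g, node)
    best_g = {start: 0.0}
    came_from = {}
    closed = set()
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1], g
        if node in closed:
            continue  # stale heap entry
        closed.add(node)
        for neighbour, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbour, math.inf):
                best_g[neighbour] = new_g
                came_from[neighbour] = node
                heapq.heappush(open_heap, (new_g + heuristic(neighbour), new_g, neighbour))
    return None, math.inf
```

Stale heap entries are skipped on pop instead of being removed on update, which keeps every operation logarithmic.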

Which Algorithm To Be Used With Symmetric Keys

Hi,

I want to create a symmetric key that will be encrypted by a certificate. Can you tell me which of the following algorithms is best:

DES, TRIPLE_DES, RC2, RC4, RC4_128, DESX, AES_128, AES_192, AES_256.

I tried AES_128, AES_192 and AES_256, but it says 'the algorithm specified for the key are not available in this installation of Windows.'

Please tell me which other algorithm is best to use, and why.

Thanks

Gaurav

Developing A New Plug-in Algorithm
Hi,

I'm doing my master's thesis on a new plug-in algorithm based on the LVQ algorithm.
I followed the tutorial with the pair_wise_linear_regression algorithm and I have some doubts. I was looking for the code of the algorithm in the tutorial files and didn't find it. I have my new algorithm programmed in C++ and ready to attach, but I don't know where to put it: in which file do I have to put it to start defining the COM interfaces? And which file in the tutorial's SRC folder contains the code of the pair_wise_linear_regression algorithm?

Thanks

Derived Attributes - How Much Should I Help The Algorithm?
Hi,
I am a novice data miner, working primarily in the BI field. I want to learn more about data mining, so I am doing some experimenting.

I have a question regarding input attributes. I am particularly wondering about the Neural Network algorithm, but also about data mining in general. What I am wondering is whether, and if so to what extent, I should create derived attributes for the algorithms. I'll try to clarify with an example:

Let's say I am analysing sales performance for departments in a large company. Some of those departments have a high staff turnover, which might affect sales negatively (although I don't know that...). The high staff turnover could be detected, by the algorithm and by humans, by looking at each sale and which salesperson handled it. If a department has many more different salespeople than other departments of the same size during the same time period, this is a sign of high staff turnover.

Now, is this information enough for the algorithm? Or should I add a column to the case dataset where I discretize the staff turnover as "High, Medium, Low"? Does this help the algorithm, or can it hurt performance?

I hope you get the idea of my question; otherwise, ask me!

Cheers,
AL

Process Association Algorithm Using ISS

Hi!

I need to deploy several Association algorithms, and I want to do it using ISS. Can anyone tell me which task I should use to do it?

Thanks!

Ezequiel

Association Rules Algorithm, Help?
I need to create a set of cases for a project that uses the Microsoft Association Rules algorithm to recommend products to customers. My question is: must the set of cases include all customer transactions for training, or is some percentage of the total transactions sufficient? If I do not use all customer transactions, could the algorithm leave some products out of its itemsets or rules and then be unable to make recommendations about them?

Thanks,
Diego B.

Currently I want to run a vanilla multivariate regression and get some statistics back about the regression that is built.  For instance, besides the coefficients, I also want the two-sided p-values on the coefficients and the R2 of the model.

I've tried playing with the Microsoft_Linear_Regression algorithm and have run into two issues.  I'm doing all this programmatically using DMX queries rather than through the BI studio.

(a) I can never get the coefficients from the regression to match with results I would get from running R or Excel.  The results are close but still significantly off.  I suspect this is because the Linear Regression is just a subset of the Decision/Regression Trees functionality, in which case some kind of Bayesian prior is being incorporated here.  Is that the issue?  And if so, is there some way to turn off the Bayesian scoring and get a vanilla multivariate regression?  I don't see anything in the inputs to the linear regression that would let me do this, and even running Microsoft_Decision_Trees with a few different settings, I can't get the output I'm looking for.  If there's no way to turn off the Bayesian scoring, can someone explain to me what the prior being used here is and how Bayesian learning is being applied to the regression?

(b) Using the Generic Tree Viewer, I see that there are a few "statistics" values in the Node_Distribution, but I'm not sure what they're referring to.  One of them looks like it might be the MSE.  I could play with this some more to find out, but I'm hoping someone here can save me that work and tell me what these numbers are.  Hopefully they will constitute enough information for me to rebuild the p-values and the R2.

Thanks!

Wilfred
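To have a baseline for comparing the DMX output, this minimal Python sketch fits an ordinary least-squares regression (no priors, no tree splits) by solving the normal equations X'X b = X'y with Gaussian elimination, and reports R^2 = 1 - SSE/SST. The function name `ols` is hypothetical. The p-values additionally need each coefficient's standard error and a Student t distribution, which a statistics library provides and which are omitted here.

```python
def ols(rows, y):
    """Ordinary least squares. rows: list of predictor lists; y: list of responses.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    p = len(X[0])
    # Build the augmented normal-equation matrix [X'X | X'y]
    A = []
    for i in range(p):
        row = [sum(r[i] * r[j] for r in X) for j in range(p)]
        row.append(sum(r[i] * t for r, t in zip(X, y)))
        A.append(row)
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    # R^2 = 1 - SSE / SST
    mean_y = sum(y) / len(y)
    fitted = [sum(bj * xj for bj, xj in zip(b, r)) for r in X]
    sse = sum((t - f) ** 2 for t, f in zip(y, fitted))
    sst = sum((t - mean_y) ** 2 for t in y)
    return b, 1 - sse / sst
```

The normal equations are fine for a handful of well-conditioned predictors; R and Excel use more numerically stable decompositions, so tiny differences in the last digits are expected.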

Help With Setting Algorithm Parameters
I was walking through the Text Mining example, which at one step required me to set the algorithm parameter MAXIMUM_OUTPUT_ATTRIBUTES=0. When I tried that, the project would not build and gave this error:
Error (Data mining): The 'MAXIMUM_INPUT_ATTRIBUTES' data mining parameter is not valid for the 'XYZ' model.

I got the same error when I tried to set HIDDEN_NODE_RATIO for Microsoft_Neural_Network. When I open "Set Algorithm Parameters" for the mining model, I do not see these parameters listed by default.

I have installed SQL Server 2005 Standard Edition, Microsoft SQL Server Management Studio 9.00.1399.00
Microsoft Analysis Services Client Tools 2005.090.1399.00

Any help would be much appreciated.

Thanks
Rajeev Gupta

Algorithm : Data Mining
Hello friends,

Can you give me some idea about the clustering algorithms in data mining?

Problem With Picking The Right Algorithm
Hi

I'm using SQL Server 2005. My problem is as follows. I have several production lines, and as with everything, parts of the lines tend to break. I have data on all the breaks that occurred in the last 2 years. What I want to do is predict the next break and the production line it's going to happen on. I would also like to pick a future date and check what breaks might occur on that date. I've run quite a few models, but none of them helps me with future events. I think I might be using the wrong algorithm, or I'm just not doing it right. Could somebody please suggest an algorithm, and maybe a web site with a tutorial similar to my problem?

Thanks
Elmo

Association Algorithm Itemsets
What is the algorithm that generates the itemsets in the Association model?  I'm looking to possibly use this part of the Association algorithm (i.e. the grouping into itemsets) in a separate plug-in algorithm.
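Microsoft's documentation describes the Microsoft Association algorithm as an implementation of the well-known Apriori algorithm, with parameters such as MINIMUM_SUPPORT controlling which itemsets are kept. The sketch below is a generic, minimal Apriori in Python, not Microsoft's code: it counts single items, then repeatedly joins frequent (k-1)-itemsets into k-item candidates, prunes any candidate with an infrequent subset, and keeps candidates whose support count reaches `min_support` (an absolute count here, which is an assumption).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    sets = [frozenset(t) for t in transactions]
    counts = {}
    for t in sets:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        previous = list(frequent)
        candidates = set()
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items
        for i in range(len(previous)):
            for j in range(i + 1, len(previous)):
                union = previous[i] | previous[j]
                # Prune step: every (k-1)-subset must itself be frequent
                if len(union) == k and all(
                        frozenset(sub) in frequent for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Count the surviving candidates with one pass over the transactions
        counts = {c: 0 for c in candidates}
        for t in sets:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```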

Time Series Algorithm
Hi Jamie:

I am building data mining models to predict the amount of data storage in GB we will need in the future, based on what we have used in the past. I have a table for each device with the amount of storage on that device for each day going back one year. I am using the Time Series algorithm to build these mining models. In many cases, where the storage size does not change abruptly, the model is able to predict several periods forward. However, when there are abrupt changes in storage size (due to factors such as truncating transaction logs on the database), the mining model will not predict more than two periods. Is there something I can change in the parameters the Time Series algorithm uses so that it can predict farther forward in time, or is this the wrong algorithm for data with a sawtooth pattern and a negative linear component?

Thanks,

How To Develop A New Plug-in Algorithm
I have code for a nearest-neighbour algorithm, and I want to build a data mining algorithm using that code.

I have the following link that includes the source code for a sample plug-in algorithm written in C#.

But I am confused about where to insert my algorithm logic.

Any Other Plugin Algorithm Developed??
Hi,

As we know, a clustering algorithm sample comes with the managed plug-in algorithm API.

Has anyone developed any other plug-in algorithm? I want to check what needs to be modified. I am not a data mining algorithm developer; I just want to see where changes have to be made. It would be better if I could get the source code for an algorithm other than clustering.

ANOTHER PLUG-IN ALGORITHM REQUIRED??

Time Series Algorithm
I am trying to predict the revenue generated by each person.
My input looks like this:

Month     Person   Revenue
--------  -------  -------
20050101  Person1  $1000
20050101  Person1  $2000
20050201  Person1  $1000
20050101  Person2  $5000
20050201  Person2  $2000
20050201  Person2  $3000

Obviously, for Person1 and 200501 I expect to see $3000 in the Microsoft Time Series Viewer, correct?
Instead I see REVENUE(actual) - 200501 VALUE = XXX,
where XXX is a completely different number.

Also, there are negative numbers in the forecast area, which is not correct from a business point of view.
Person1, who is a tough guy, tried to shoot me.
What am I doing wrong? Could you please give me an idea how to extract correct
historical and predicted information?

Thank you,
Tim.
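One thing worth checking, assuming these rows went into the model as-is: Person1 has two rows for 20050101 and Person2 has two rows for 20050201. A time series model needs one value per time period per series, so summing per (Month, Person) first, for example with a GROUP BY Month, Person in a view used as the data source, would give the $3000 expected above. A minimal Python sketch of that aggregation on the rows from the post (the function name is hypothetical):

```python
from collections import defaultdict


def monthly_totals(rows):
    """Sum revenue per (month, person) so each series has one value per period."""
    totals = defaultdict(float)
    for month, person, revenue in rows:
        totals[(month, person)] += revenue
    return dict(sorted(totals.items()))


rows = [
    ("20050101", "Person1", 1000),
    ("20050101", "Person1", 2000),
    ("20050201", "Person1", 1000),
    ("20050101", "Person2", 5000),
    ("20050201", "Person2", 2000),
    ("20050201", "Person2", 3000),
]
```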

Which Algorithm Is Better For Customer Retention
Hi

Can anyone please tell me which algorithm is better for customer retention using SQL Server 2005 Analysis Services?

It would be great if someone could give an example data model, with the key column and the rest.

CREATE INDEX ..using B-tree Algorithm
Is it possible to specify an algorithm like B-TREE in the CREATE INDEX syntax of MS SQL Server?

In MySQL I have the option to specify this in the CREATE TABLE syntax.

It's very urgent.

SQL Server 7.0 'File Growth' Algorithm
We are running SQL Server 7.0 SP2, and are experiencing the following out-of-
space error message:

"Could not allocate new page for database 'FooBar'.
There are no more pages available in filegroup SECONDARY.
or allowing file growth."

Needless to say, the database is set for 10% unlimited autogrowth, and there IS available space on the partition where the filegroup resides.

Any ideas as to why this is happening? What is SQL Server's algorithm for allocating space when growing a database? Must it satisfy the request in one 'extent', and is the cause of our problem that our disk is fragmented?

Bill Zimmer - zim@ibx.com

Better Phonetic Matching Algorithm Than Soundex
Disgruntled with Soundex, I went looking for a better phonetic matching algorithm.

It turns out there is a rather good one called Metaphone, which comes in two variants (Simple and Double).

I could find the source for this in C++, but I wanted to have it as a user-defined function.

So here it is:

CREATE FUNCTION dbo.Metaphone(@str as varchar(70))
RETURNS varchar (25)
/*
Metaphone Algorithm

Created by Lawrence Philips.
Metaphone presented in article in "Computer Language" December 1990 issue.
Translated into t-SQL by Keith Henry (keithh_AT_lbm-solutions.com)

*********** BEGIN METAPHONE RULES ***********
Lawrence Philips' RULES follow:
The 16 consonant sounds:
|--- ZERO represents "th"
|
B X S K J T F H L M N P R 0 W Y
Drop vowels

Exceptions:
Beginning of word: "ae-", "gn", "kn-", "pn-", "wr-" ----> drop first letter
Beginning of word: "x" ----> change to "s"
Beginning of word: "wh-" ----> change to "w"
Beginning of word: vowel ----> Keep it

Transformations:
B ----> B unless at the end of word after "m", as in "dumb", "McComb"

C ----> X (sh) if "-cia-" or "-ch-"
S if "-ci-", "-ce-", or "-cy-"
SILENT if "-sci-", "-sce-", or "-scy-"
K otherwise, including in "-sch-"

D ----> J if in "-dge-", "-dgy-", or "-dgi-"
T otherwise

F ----> F

G ----> SILENT if in "-gh-" and not at end or before a vowel
in "-gn" or "-gned"
in "-dge-" etc., as in above rule
J if before "i", or "e", or "y" if not double "gg"
K otherwise

H ----> SILENT if after vowel and no vowel follows
or after "-ch-", "-sh-", "-ph-", "-th-", "-gh-"
H otherwise

J ----> J

K ----> SILENT if after "c"
K otherwise

L ----> L

M ----> M

N ----> N

P ----> F if before "h"
P otherwise

Q ----> K

R ----> R

S ----> X (sh) if before "h" or in "-sio-" or "-sia-"
S otherwise

T ----> X (sh) if "-tia-" or "-tio-"
0 (th) if before "h"
silent if in "-tch-"
T otherwise

V ----> F

W ----> SILENT if not followed by a vowel
W if followed by a vowel

X ----> KS

Y ----> SILENT if not followed by a vowel
Y if followed by a vowel

Z ----> S
*/

AS
BEGIN
DECLARE @Result varchar(25),
    @str3 char(3),
    @str2 char(2),
    @str1 char(1),
    @strp char(1),
    @strLen tinyint,
    @cnt tinyint

set @strLen = len(@str)
set @cnt = 1
set @Result = ''

--Process beginning exceptions
set @str2 = left(@str,2)
if @str2 in ('ae', 'gn', 'kn', 'pn', 'wr')
begin
    set @str = right(@str , @strLen - 1)
    set @strLen = @strLen - 1
end
if @str2 = 'wh'
begin
    set @str = 'w' + right(@str , @strLen - 2)
    set @strLen = @strLen - 1
end
set @str1 = left(@str,1)
if @str1 = 'x'
begin
    set @str = 's' + right(@str , @strLen - 1)
end
if @str1 in ('a','e','i','o','u')
begin
    set @str = right(@str , @strLen - 1)
    set @strLen = @strLen - 1
    set @Result = @str1
end

while @cnt <= @strLen
begin
    set @str1 = substring(@str,@cnt,1)
    if @cnt <> 1
        set @strp = substring(@str,(@cnt-1),1)
    else
        set @strp = ' '

    -- skip doubled letters
    if @strp <> @str1
    begin
        set @str2 = substring(@str,@cnt,2)

        if @str1 in ('f','j','l','m','n','r')
            set @Result = @Result + @str1

        if @str1 = 'q' set @Result = @Result + 'k'
        if @str1 = 'v' set @Result = @Result + 'f'
        if @str1 = 'x' set @Result = @Result + 'ks'
        if @str1 = 'z' set @Result = @Result + 's'

        -- B, unless at the end of the word after "m"
        if @str1 = 'b'
            if @cnt < @strLen or @strp <> 'm'
                set @Result = @Result + 'b'

        -- C: X in "-cia-" and "-ch-" (but K in "-sch-"), S in "-ci-", "-ce-", "-cy-"
        -- (silent after "s"), K otherwise
        if @str1 = 'c'
            if (@str2 = 'ch' and @strp <> 's') or substring(@str,@cnt,3) = 'cia'
                set @Result = @Result + 'x'
            else if @str2 in ('ci','ce','cy')
            begin
                if @strp <> 's'
                    set @Result = @Result + 's'
            end
            else
                set @Result = @Result + 'k'

        if @str1 = 'd'
            if substring(@str,@cnt,3) in ('dge','dgy','dgi')
                set @Result = @Result + 'j'
            else
                set @Result = @Result + 't'

        if @str1 = 'g'
            if substring(@str,(@cnt - 1),3) not in ('dge','dgy','dgi','dha','dhe','dhi','dho','dhu')
                if @str2 in ('gi', 'ge','gy')
                    set @Result = @Result + 'j'
                else
                    if (@str2 <> 'gn') or ((@str2 <> 'gh') and ((@cnt + 1) <> @strLen))
                        set @Result = @Result + 'k'

        -- H, unless after a vowel with no vowel following, or after c, s, p, t, g
        if @str1 = 'h'
            if (@strp not in ('a','e','i','o','u') or @str2 in ('ha','he','hi','ho','hu'))
                and @strp not in ('c','s','p','t','g')
                set @Result = @Result + 'h'

        if @str1 = 'k'
            if @strp <> 'c'
                set @Result = @Result + 'k'

        if @str1 = 'p'
            if @str2 = 'ph'
                set @Result = @Result + 'f'
            else
                set @Result = @Result + 'p'

        if @str1 = 's'
            if substring(@str,@cnt,3) in ('sia','sio') or @str2 = 'sh'
                set @Result = @Result + 'x'
            else
                set @Result = @Result + 's'

        if @str1 = 't'
            if substring(@str,@cnt,3) in ('tia','tio')
                set @Result = @Result + 'x'
            else
                if @str2 = 'th'
                    set @Result = @Result + '0'
                else
                    if substring(@str,@cnt,3) <> 'tch'
                        set @Result = @Result + 't'

        -- W and Y only when followed by a vowel
        if @str1 = 'w'
            if @str2 in ('wa','we','wi','wo','wu')
                set @Result = @Result + 'w'

        if @str1 = 'y'
            if @str2 in ('ya','ye','yi','yo','yu')
                set @Result = @Result + 'y'
    end
    set @cnt = @cnt + 1
end
RETURN @Result
END

Keith Henry

Edited by - khenry on 03/06/2002 06:41:15

Dijkstra's Shortest Path Algorithm
Here it is, the long-awaited algorithm I promised.

-- delete previous map
exec dbo.uspdijkstrainitializemap

-- create a new map

-- resolve route
exec dbo.uspdijkstraresolve 'a', 'i'

This is the output:

From To Cost
---- -- ----
a    b     4
b    c     6
c    j    18
j    f    26
f    i    37

Helsingborg, Sweden
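The stored procedures themselves are not shown, so as a point of reference, here is a minimal Python sketch of Dijkstra's algorithm that returns the route in the same From / To / cumulative Cost shape. The function name and the edge list are hypothetical: the weights were chosen only so that the a-to-i route reproduces the output above.

```python
import heapq


def shortest_route(edges, source, target):
    """Dijkstra's algorithm on an undirected graph with non-negative costs.

    edges: iterable of (node_a, node_b, cost).
    Returns [(from, to, cumulative_cost), ...] along the cheapest route, or [] if none.
    """
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale entry: this node was already settled more cheaply
        done.add(node)
        if node == target:
            break
        for neighbour, cost in graph.get(node, []):
            candidate = d + cost
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if target not in done:
        return []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return [(a, b, dist[b]) for a, b in zip(path, path[1:])]
```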

Data Mining Plug In Algorithm
Hi!!

I read that it is possible to create a custom algorithm and use it as a plug-in for SQL Server 2005. Which programming languages are available for this purpose? C++ only? Can I use .NET?

Thank you!