SQL Server 2008 :: Speed Up Text Search In Large Result Set?
Jul 14, 2015
I have a query below which filters detail field in the #TempLogins table. The details field is a text field which contains many types of text strings, some containing urls that have parts like "ResultID=5" which is what is contained in the ResultIDSearch and ResultSetIDSearch fields. The records with entries like "ResultID=5" are the ones I'm trying to filter for.
The problem I have is that the query takes way too long to run. The #TempLogins table has around 200K records and the #TempSearch table has around 80K records.
select * from #TempLogins a where exists
(select 1 from #TempSearch t1 where
a.detail like '%' + t1.ResultIDSearch + '%'
or
a.detail like '%' + t1.ResultSetIDSearch + '%')
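One thing that sometimes helps with this pattern, sketched below rather than taken from the original code, is to cut #TempLogins down with a single cheap pass before the row-by-row LIKE comparison against #TempSearch runs. The literal 'ResultID='/'ResultSetID=' filters are an assumption about what the search values look like.

select a.*
into #TempLoginsFiltered
from #TempLogins a
where a.detail like '%ResultID=%'        -- assumed token shape; adjust to the real data
   or a.detail like '%ResultSetID=%';

select * from #TempLoginsFiltered a where exists
(select 1 from #TempSearch t1 where
a.detail like '%' + t1.ResultIDSearch + '%'
or
a.detail like '%' + t1.ResultSetIDSearch + '%')

If most rows don't contain either token, the EXISTS probe against #TempSearch only has to run for the survivors, which is usually where the bulk of the time goes.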
I'm creating an indexed view by JOINing multiple tables and trying to create a FULL TEXT search index on it. The unique column is generated by concatenating two ID columns from different tables. I am able to create the unique index successfully; however, when trying to create the FULLTEXT INDEX I get the error below.
"A full-text search key must be a unique, non-nullable, single-column index which is not offline, is not defined on a non-deterministic or imprecise nonpersisted computed column, does not have a filter, and has maximum size of 900 bytes. Choose another index for the full-text key."
As I read it, the message says the full-text key must be a single-column, non-nullable, unique index of at most 900 bytes, and must not be built on a non-deterministic or imprecise nonpersisted computed column, which is where my concatenated key column seems to fall foul.
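For what it's worth, one pattern that can satisfy the key requirements is to build the concatenated key as an explicit, ISNULL-wrapped expression inside the schema-bound view and hang the single-column unique clustered index on it. This is only a sketch: every object and column name below is made up, and whether it works depends on the underlying ID columns being non-nullable.

create view dbo.vSearchSource with schemabinding
as
select isnull(cast(a.ID as varchar(10)) + '-' + cast(b.ID as varchar(10)), '') as RowKey,  -- ISNULL forces non-nullable metadata
       a.TitleText,
       b.BodyText
from dbo.TableA a
join dbo.TableB b on b.ParentID = a.ID;
go
create unique clustered index UX_vSearchSource_RowKey on dbo.vSearchSource (RowKey);
go
create fulltext index on dbo.vSearchSource (TitleText, BodyText)
    key index UX_vSearchSource_RowKey
    on MyFullTextCatalog;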
I am searching for the key word 'Platform Customer Support' using full text search. My code is as below
Set @KeywordSearch = 'Platform Customer Support'
Select AA, BB, CC, DD
from SM9..TableName A
Right Outer Join SM9_Experiment..TableName C
    On A.IncdTouchedGSF like '%' + C.SM9GroupName + '%'
Where ( Contains(A.[Description], @KeyWordSearch)
    And A.OpenTime Between @StartDate and @EndDate
    And C.Classification = @GroupNameClassification )
The code is throwing:
Msg 7630, Level 15, State 3, Line 46 Syntax error near 'Customer' in the full-text search condition 'Platform Customer Support'.
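Msg 7630 here is almost certainly the unquoted spaces: CONTAINS parses a bare multi-word string as a (broken) full-text expression, not as a phrase. A minimal sketch of the usual fix is to embed double quotes inside the variable so the three words are treated as one phrase; only the variable assignment changes, the rest of the query stays as it is.

Set @KeywordSearch = '"Platform Customer Support"'   -- phrase search
-- or, to require all three words anywhere in the column:
-- Set @KeywordSearch = 'Platform AND Customer AND Support'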
I have a scenario where the standard full-text search identifies keywords but Semantic Search does not recognize them as key phrases. I'm hoping to understand why Semantic Search might not recognize them. The context this is being used in is medical terminology, and the specific keywords I noticed missing right off the bat were medications.
For instance, if I put the following string into a FT indexed table
'J9355 - Trastuzumab (Herceptin)' AND 'J9355 - Trastuzumab emtansine'
The Semantic Search recognized 'Herceptin' and 'Emtansine' but not 'Trastuzumab'
Nor in
'J8999 - Everolimus (Afinitor)'
It did not recognize 'Afinitor' as a keyword.
In all cases the base full-text index did find those keywords, and they were identifiable using the DMV sys.dm_fts_index_keywords_by_document. It does show the index as having completed.
Why might certain words not be picked up while others are? Could it be a language/dictionary issue? I am using English and accent-insensitive settings.
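I can't say why specific drug names drop out, but one way to see exactly what each side produced is to compare the full-text keyword DMV with SEMANTICKEYPHRASETABLE for the same document. This is a sketch only: the table, column, and document key below are placeholders.

-- Full-text keywords extracted for the table
select * from sys.dm_fts_index_keywords_by_document(db_id(), object_id('dbo.MedNotes'));

-- Key phrases the semantic index extracted for one row (123 stands in for the full-text key value)
select keyphrase, score
from semantickeyphrasetable(dbo.MedNotes, NoteText, 123)
order by score desc;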
Would you use the SQL Server "full text search" feature as your site index? For some reason I can't make Index Server my site search catalog, and I wonder if full-text search is the solution. I think I will have to create a new table called something like "site text", and I will need to write every piece of text twice: once to its own table (let's say the articles table) and once to the search table. Otherwise there are problems finding the right URL of the text, searching different tables with different column names, and so on. So I thought I'd create a site search table with the columns id, text and url and write everything to this table. But somehow it looks like the wrong way, that every forum post, article, album picture or joke will be inserted twice into SQL Server... What do you think?
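For what it's worth, a minimal sketch of the single search table being described might look like the following; the names are just illustrative, and the DDL is the modern CREATE FULLTEXT INDEX form (on SQL Server 2000 the sp_fulltext_* procedures would be used instead).

create table dbo.SiteText (
    id int identity(1,1) not null constraint PK_SiteText primary key,
    body nvarchar(max) not null,
    url nvarchar(400) not null
);
go
create fulltext catalog SiteSearchCatalog;
go
create fulltext index on dbo.SiteText (body) key index PK_SiteText on SiteSearchCatalog;
go
-- a search then returns the URL directly, whatever kind of content the row came from
select url from dbo.SiteText where contains(body, N'"gardening tips"');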
I have a pretty large DB and a fairly complex query. If I drop buffers and clear cache the query runs in 20 seconds returning 25K rows. Subsequent runs are 2 seconds. Is this the result of the results being cached, execution being cached, other? Are there good ways to close the gap between the initial and later runs? Does the cache stay present until the service restarts or does SQL recycle the memory and if so, based on what criteria?
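The 20-second vs 2-second difference is usually a mix of the buffer pool (data pages already in memory) and the plan cache (no recompile needed). A quick way to separate the two when testing on a non-production box, standard commands shown only as a sketch:

checkpoint;
dbcc dropcleanbuffers;   -- empty the buffer pool: the next run reads pages from disk again
dbcc freeproccache;      -- clear cached plans: the next run pays the compile cost again
set statistics time, io on;
-- run the query here and compare physical vs logical reads between the cold and warm runs

Both caches stay populated until memory pressure or a restart pushes pages and plans out; SQL Server evicts them based on usage and cost rather than on a fixed timer.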
I am asking this question on behalf of a friend. I have little knowledge of SQL 2005 but my friend is quite knowledgeable, although this is the first time he is dealing with large database for a client. So here's the story.
His client has a database containing 1.5 million books. Now he is setting up a website which will enable users to search for books. Searching by ISBN is no problem as it only takes 1 second. The problem is that searching by Title takes more than 20 seconds, which is unacceptable. My friend has only worked with smaller databases, and he just recently thought of implementing indexing and is now looking for other ideas.
Each row contains book details such as Title, Author1, Author2, Author3, Publisher, Publication Date, ISBN, etc.
Can anyone who is more experienced with large databases share some design ideas? His client is aiming for 8 seconds or less.
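Without seeing the schema, the usual first step for word searches on Title at that size is a full-text index rather than LIKE '%word%'. A sketch follows; the table, key index, catalog, and search term are all assumptions.

create fulltext catalog BookCatalog;
go
create fulltext index on dbo.Books (Title) key index PK_Books on BookCatalog;
go
select Title, Author1, ISBN
from dbo.Books
where contains(Title, N'"database design"');   -- word/phrase match uses the full-text index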
Hello everyone! I want to perform full-text search with SQL Server 2000. My documents (.doc, .xls, .txt, .pdf) are stored in a SQL Server column which is binary (the type of the column is image). I would like to know how you can extract pieces of text from the documents. Example: I have an ASPX page with code-behind in C# making the search against a table in SQL Server that is full-text indexed. I run a search looking for the word "peace", then SQL Server takes care of the search and returns me the rows that match. But I'd also like to extract the 50 characters before and after the place where SQL Server found the word "peace" to show on the results page. Does anyone have any idea how to work around this? Best regards. Yannick
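A partial, hedged sketch: when the searchable text is available as an (n)varchar column, the 50-characters-either-side snippet can be cut with CHARINDEX and SUBSTRING as below (table and column names are made up). For binary .doc/.pdf content stored in an image column, the plain text would first have to be extracted in the application layer (for example with the same IFilters the full-text engine uses) before this kind of substring is possible.

declare @word nvarchar(50);
set @word = N'peace';

select Id,
       substring(BodyText,
                 case when charindex(@word, BodyText) > 50
                      then charindex(@word, BodyText) - 50
                      else 1 end,
                 100 + len(@word)) as Snippet
from dbo.Documents
where contains(BodyText, @word);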
Hi, I'm implementing procedures that process a large amount of information stored in a SQL database. What methods and actions do I need to take for the process to be faster in a case like this (it's more than 1,000,000 rows)? I'm also trying to improve memory usage, since I use the DataTable in C#, and I'm looking for a better way to process the retrieved data. Is there a better class or method that improves speed and prevents memory leaks? Please advise... Thanks for any help, Lior S ;)
so async cursor population is supposed to create the cursor and return the cursor id quickly, while the server works on async populating the results. For a keyset-driven cursor, SQL Server stores the key sets in tempdb, which it then uses to fetch data for cursor results. Anyway, this works fine for smaller tables, but I'm finding for large result sets, the async cursor population is very slow and indeed seems to approximate synchronous time. The wait stat I get while it is running (supposedly asynchronously) is TRANSACTION_MUTEX.
Example:
--enable async cursor
exec dbo.sp_configure 'cursor threshold', 0;
reconfigure;
declare @cursor int, @stmt nvarchar(max), @scrollopt int, @ccopt int, @rowcount int;
--example of giant result set
set @stmt = 'select * from sys.all_objects o1, sys.all_objects o2';
[code]...
Note that using the SQL "select * from sys.all_objects o1" is much faster than "select * from sys.all_objects o1, sys.all_objects o2". However, if cursor population is async, I'd expect the time to return a cursor id to be similar between the two.
* SQL Server 2008 R2
* Database was created from a third-party product. The product writes to the 3 tables that I need to make changes to 24/7 and downtime is not an option. All changes must be done live.
* Database overall size is ~200 GB
* The 3 tables I must update make up ~190 GB of that space.
* Tables have no primary key or ID columns. Therefore, the data is highly fragmented.
* Of the ~190 GB of space allocated for the tables, there is roughly 70 GB of actual data.
* Rows of the tables are not guaranteed to be unique. In fact, on one of the tables, tests were run with a small sample of data and duplicates were very much evident.
What I'm trying to accomplish here is to get an ID column added to the 3 tables and set that ID field as the primary key. Doing so will force the data to become much less fragmented than it is currently and with purging and new inserts, eventually fragmentation will be nearly non-existent.
Problem: Making table changes on tables this large while data is constantly being added poses many risks and can cause data loss. This was tried on a smaller table than these three and the entire table was lost in the process. Restore from backup was needed to get back to most recent log backup point.
Original Solution: My original plan was to create a backup of each table and run the script below to migrate the majority of the data temporarily into the new table. I could then update the original table (which now would contain much less data) and then migrate the data back.
Original Solution Problem: The problem with the solution above is that it calls the DELETE function on the original table using the values from the temporary table. When there are duplicate rows, which have not all been inserted into the backup table yet, they will all be removed from the original table because there is nothing unique to separate them out. In my testing, I had 10,000 rows in the original table and ended up with 9,959 rows in the backup table.
Question 1: Is my approach to making these table changes reasonable? Question 2a: If so, how can I make sure I don't lose data as part of this temporary migration of the data to my backup tables? Question 2b: If not, what would be a better approach that isn't going to cause disruption to the application that INSERTs data 24/7 and won't have any risk of data loss?
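On question 2a, one pattern worth testing, sketched with assumed table names, is to let the DELETE itself feed the backup table through the OUTPUT clause: whatever rows the DELETE removes, duplicates included, are exactly the rows that land in the holding table, so nothing has to be matched back by value. Doing it in batches keeps the log and blocking manageable (the holding table must have no triggers, foreign keys, or check constraints).

while 1 = 1
begin
    delete top (10000)
    from dbo.OriginalTable
    output deleted.* into dbo.OriginalTable_Hold;   -- the deleted rows are what gets kept

    if @@rowcount = 0 break;
end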
I have a large table containing about 800 million rows with an average row length of about 1K. The columns in the table are char columns. I need to move the contents of this table into a similar table where the target columns are varchar. The original table column definitions are compatible with the target table but the reverse is not necessarily true. For example, one column is being changed from int to bigint. The table is partitioned.
So, what is the fastest way to migrate the data? I was thinking of unloading each partition into a flat file and loading the target table using multiple load streams. Is this a good way?
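A hedged sketch of a per-partition copy that skips the flat-file round trip; the names, partition function, and partition count are assumptions, and it presumes the target can take a table lock and the database is in SIMPLE or BULK_LOGGED recovery while loading. Separate partitions can also be loaded from separate sessions to get the multiple-stream effect.

declare @p int;
set @p = 1;
while @p <= 100   -- assumed number of partitions
begin
    insert into dbo.TargetTable with (tablock)
    select cast(CharCol1 as varchar(50)),      -- char -> varchar
           cast(IntCol1 as bigint),            -- int -> bigint
           OtherCols
    from dbo.SourceTable
    where $partition.pfSource(PartitionCol) = @p;

    set @p = @p + 1;
end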
I am developing a recruitment agency website. I am using MS SQL Server, in which I have a table called CV_Details which stores all the details of different job seekers. I have stored all the Word and PDF docs of the job seekers in a catalog called 'mycatalog' using Index Server, and stored the path in the CV_Details table along with the rest of the CV details. Basically I want to download the CVs which I get after making a search in the docs. I am able to get the FileName, Path and FileSize using an Index Server query. Is it in any way possible to extract all the CV_Details data by matching the path? I want the CV details from SQL Server for all the related paths at the same time, not one at a time. Is that possible? I have tried a lot but I'm going nowhere. I'm giving my code below. Can anyone help me please? Thank you very much in advance.

string strCatalog = "TestCatalog", strsearchstrings = "";
string strQuery = "";
strsearchstrings = txtsearch.Text;
// txtsearch.Text is the word typed in the text box to query by using Indexing Service.
strQuery = "Select Filename, PATH from Scope() where FREETEXT('" + strsearchstrings + "')";
string connstring = "Provider=MSIDXS.1;Integrated Security .='';Data Source=" + strCatalog;
System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection(connstring);
conn.Open();
System.Data.OleDb.OleDbCommand objcmd = new System.Data.OleDb.OleDbCommand(strQuery, conn);
System.Data.OleDb.OleDbDataReader objRdr;
objRdr = objcmd.ExecuteReader();
DataGrid1.DataSource = objRdr;
DataGrid1.DataBind();
objRdr.Close();
conn.Close();
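One way to get all the matching CV rows in a single query, sketched here with assumed names, is to register the Index Server catalog as a linked server and join its results to CV_Details on the stored path. OPENQUERY only accepts a literal query string, so the search word is hard-coded in this sketch, and I haven't verified it against this exact setup.

exec sp_addlinkedserver
     @server = 'CVCatalog',
     @srvproduct = 'Index Server',
     @provider = 'MSIDXS',
     @datasrc = 'TestCatalog';
go
select cv.*, ix.FileName, ix.Size
from openquery(CVCatalog,
        'select FileName, Path, Size from scope() where freetext(''manager'')') as ix
join dbo.CV_Details cv
  on cv.DocPath = ix.Path;   -- DocPath is the assumed column holding the stored document path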
I have several databases on a server (SQL Server 2000 only, no web server installed) and lately, as the company keeps growing, my users complain that the server gets slow (these DBs are well designed and receive optimizations and integrity checks, etc.). Because of this, I'm thinking about getting a new server to replace my old ProLiant ML 330, which was bought 4 years ago, but I'm concerned about which server architecture or characteristic can help me best to improve response performance: is it disk speed? Processor speed? Or more RAM? I want to make a good decision, so I'd really appreciate your help...
I've got two databases on the same server and replicate some tables from one database to another. Replication is configured not to drop the table if it exists, but to delete the data based on the filter if one exists. There are two tables on the subscriber that have some extra columns.
I get "field size too large" error when trying to replicate them. Is there a workaround without having to make the publisher and the subscriber tables identical by schema?
We have an existing BI/DW process that adds large chunks of data daily (~10M rows) to an existing table, as well as using Deletes to remove stale data. This scenario seems to beg for partitioning to support switching in/out data.
After lots of reading on this, I have figured out the mechanics of the switching, but I still have some unknowns about the indexes needed to support this.
The table currently has several non-clustered indexes, including one on the partitioning column - let's call that column snapshotdate. Fortunately there are no FKs involved, and no constraints.
Most of the partitioning material I see focuses on creating a clustered PK to assist with switching. Not sure if this is actually necessary, but assume I create one using an Identity column (currently missing) plus snapshotdate.
For the other non-clustered, non-unique indexes, can I just add the snapshotdate to the end of the index? i.e. will that satisfy the switching requirement?
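As far as I understand the switch requirement, the non-clustered indexes don't need snapshotdate appended to the key by hand; they need to be aligned, i.e. created or rebuilt on the same partition scheme, and for a non-unique index SQL Server adds the partitioning column itself if it isn't already present. A sketch with assumed names:

create nonclustered index IX_BigTable_CustomerID
    on dbo.BigTable (CustomerID)
    with (drop_existing = on)          -- rebuild the existing index in place
    on psSnapshotDate (snapshotdate);  -- built on the partition scheme, so it is aligned for SWITCH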
I have the default trace on a SQL Server 2008R2 instance enabled and found today that there is a gap of nearly 4 minutes in the trace during a time of the day when there most certainly is not going to be a 4 minute window of nothing.
What, if anything, could cause the default trace to have a gap like this? The SQL Server instance (against my preferences) is hosted on VMware; however, it has its own host and so its resources are not being shared with any other server. The data & log files reside on different parts of the SAN. Our IT & network admins are looking into the issue on their end, but when I looked and found a nearly 4 minute gap in the default trace it hit me that this could be something above/outside of SQL Server.
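One thing that may be worth checking before blaming the host is what the default trace files themselves contain around the gap, since the trace rolls over across up to five 20 MB files. A sketch; the time window is a placeholder, and pointing the path at the oldest rollover file lets fn_trace_gettable read the files in sequence.

declare @path nvarchar(260);
select @path = path from sys.traces where is_default = 1;

select t.StartTime, t.EventClass, t.DatabaseName, t.TextData
from sys.fn_trace_gettable(@path, default) as t
where t.StartTime between '2015-07-14 09:00' and '2015-07-14 09:10'   -- assumed window
order by t.StartTime;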
I want to remove the nulls from the result set so the result is 1 line. Code and results are below:
SELECT CustomerNumber,
    (case when yearseq = 2012
          then isnull(sum(mainPower),0) + isnull(sum(sidePower),0) + isnull(sum(leftPower),0)
             + isnull(sum(netappPower),0) + isnull(sum(rightPower),0) + isnull(sum(lowerPower),0)
     end) as '2012',
    (case when yearseq = 2013 then
[Code] ....
What can I do to my code to remove the NULLs so the entire result is just one line?
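The usual fix for this shape of query, sketched from the posted fragment, is to wrap each CASE in an aggregate and group by CustomerNumber only, so the per-year rows collapse into a single row and the NULLs disappear into the SUM. The FROM clause is an assumption since the rest of the code wasn't posted.

select CustomerNumber,
       sum(case when yearseq = 2012
                then isnull(mainPower,0) + isnull(sidePower,0) + isnull(leftPower,0)
                   + isnull(netappPower,0) + isnull(rightPower,0) + isnull(lowerPower,0)
           end) as [2012],
       sum(case when yearseq = 2013
                then isnull(mainPower,0) + isnull(sidePower,0) + isnull(leftPower,0)
                   + isnull(netappPower,0) + isnull(rightPower,0) + isnull(lowerPower,0)
           end) as [2013]
from dbo.PowerUsage          -- assumed source table
group by CustomerNumber;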
I have a well-structured but also very large binary data set that is generated by a C++ application every five minutes. The data needs to be accessed by SQL applications. Since data is generated every five minutes, performance is key, both for write and read. The data set is about 500 MB. If data is written to the file system, the write performance doesn't involve SQL Server. For reading it, I have a CLR function to read the portions of the data that I need based on offset and length. That works and is very fast. The problem is that the data is stored in the file system, so it is not self-contained within the database.
A second option that I haven't explored yet is to write the data into a table as VARBINARY(MAX). I would read the data using SUBSTRING with the appropriate offset and length. I'm wondering about the performance of SQL write/read of binary data of this size, and whether there is a third option I haven't thought of. I'm using SQL Server 2014.
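For the VARBINARY(MAX) option, a rough sketch of the read and of a chunked in-place write; the table and column names are made up, and note SUBSTRING is 1-based while .WRITE offsets are 0-based.

create table dbo.BlobStore (
    Id int not null primary key,
    Payload varbinary(max) not null
);
go
declare @offset bigint, @length int;
set @offset = 1048576;   -- byte position of the slice to read
set @length = 65536;

-- read a slice without pulling the whole 500 MB value across the wire
select substring(Payload, @offset + 1, @length)
from dbo.BlobStore
where Id = 1;

-- replace 4 bytes starting at @offset without rewriting the whole value
update dbo.BlobStore
set Payload.write(0xDEADBEEF, @offset, 4)
where Id = 1;

There is also a middle ground between the two options: FILESTREAM (or FileTable) keeps the bytes in the NTFS file system but under the database's transactional control and backups.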
Currently we have a database of about 300 GB. Because our backup system failed some time ago, we were left with a transaction log file which grew to about 160 GB. However, our backups are working again and everything is running fine. My understanding is that the transaction log file is now practically empty, but its allocated size remains 160 GB.
When you delete records, the delete transactions get logged to the transaction log file. My understanding is that when a log backup is done, the space used by those logged transactions is freed up for reuse.
Could I make use of this relatively large transaction log file and start deleting records without actually adding to the transaction log file's size?
The plan is to delete records from logging tables that are not referenced by any other table, without this increasing the transaction log file. For example, over a period of a few weeks we could delete a chunk of records from a table, then after a log backup has completed delete another chunk, and so on until we have got the table down to only the records we now need. Will this work?
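Kept in batches with log backups running between them, this approach generally does keep the log from growing. A hedged sketch, with the table, column, cutoff, and batch size all assumed:

declare @cutoff datetime;
set @cutoff = '20150101';

while 1 = 1
begin
    delete top (100000)
    from dbo.AppLog
    where LogDate < @cutoff;

    if @@rowcount = 0 break;

    -- pause so the scheduled log backups can free the space used by the previous batch
    waitfor delay '00:00:30';
end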
I have some huge tables (think 200+GB for a single table) which are excellent candidates for sparse columns. The tables have many columns which are defined with decimal datatypes (13,2) with a large percentage of them (over 50% in most cases- some as much as 99%) being 0.00. Since this is very expensive in terms of storage my idea is to set all the 0.00 values equal to NULL then set them as sparse. Across 100 or so identical databases, I have 5 such tables, with 20-40 columns in each table.
1.) Three steps for each column in each table in each database:
Step 1: alter the table so the column allows NULLs
Step 2: update table set column = null where column = 0.00
Step 3: alter the table to mark the column as SPARSE (a T-SQL sketch of this option follows the two lists below)
2.)
Step 1: Create entirely new table with sparse column definitions
Step 2: copy entire table, transforming 0.00 to null for affected columns via SSIS
Step 3: drop original table, rename new table to original name
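For what option 1 looks like in T-SQL, a sketch for a single column: the table and column names are placeholders, the UPDATE would need batching on a 200+ GB table, and converting an existing column to sparse can itself be a heavy operation because the rows holding values are rewritten.

-- Step 1: allow NULLs
alter table dbo.BigFactTable alter column Amount01 decimal(13,2) null;

-- Step 2: convert the zeros (batch this on a large table)
update dbo.BigFactTable set Amount01 = null where Amount01 = 0.00;

-- Step 3: mark the now-mostly-NULL column as sparse
alter table dbo.BigFactTable alter column Amount01 add sparse;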
We have a database that is enabled for mirroring. We need to delete old records, around 500K rows from one table, but it has foreign key relations. How do you do these kinds of deletes on production servers?
Are there any improvements in SQL 2008 backup methods, such as splitting backup files into manageable size(s), compressing the backups and/or improving the speed of backups?
Though there are "commercial" tools available, it would be nicer if the Microsoft SQL team could incorporate some of these core needed features for small/medium-size businesses.
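For what it's worth, native backup compression did arrive in SQL Server 2008 (Enterprise edition, then Standard from 2008 R2), and striping a backup across several files has been available for a long time. A sketch combining the two; the database name and paths are placeholders:

backup database SalesDB
to disk = N'D:\Backups\SalesDB_1.bak',
   disk = N'E:\Backups\SalesDB_2.bak',
   disk = N'F:\Backups\SalesDB_3.bak'
with compression, checksum, stats = 10;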
Interest rates have been stored in a comments column along with other information (e.g. "Mike's student loan is 5% and car payment is $150"). I need to extract the 5% using CONTAINS. Why CONTAINS? Because it's a 1.7 million row dataset and I'm searching for four specific interest rate values (e.g. 3%, 9%, 12% and 15%).
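One caveat and a sketch: as far as I know the standard word breakers treat the percent sign as a separator, so CONTAINS(Comments, '"5%"') effectively searches for just "5". A LIKE with the wildcard escaped in brackets keeps the literal percent sign; the table and column names are assumptions, and the [^0-9] guard stops '3%' from also matching '13%' or '23%'.

select *
from dbo.LoanComments
where Comments like '%[^0-9]3[%]%'
   or Comments like '%[^0-9]9[%]%'
   or Comments like '%[^0-9]12[%]%'
   or Comments like '%[^0-9]15[%]%';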