Need Suggestions On Text File Parsing Into Database
Feb 28, 2007
I have a website where people upload tab-delimited text files of their product inventories, which the site parses and inserts into a database table. Here's the catch: instead of insisting that every user follow a standardized format, each user can upload the file in whatever column order they want; they just have to tell the site, through a GUI, which column is in which position. They may also upload columns that, if not mapped, are ignored. Right now I am doing all of this in code and it runs slowly. I was thinking of offloading this to a stored procedure, SSIS, or a bulk upload, but with the varying format of the uploaded text file, I am not sure how I could do that. Any suggestions?
Thanks!
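One possible direction, sketched in Python rather than in the site's own code: apply each user's column mapping first, so that every upload is turned into rows with one fixed set of column names, and only then hand the normalized rows to whatever bulk-loading mechanism is chosen. The function name `remap_rows`, the `column_map` shape, and the sample data are all hypothetical.

```python
import csv
import io


def remap_rows(text, column_map):
    """Yield one dict per uploaded row, keyed by database column name.

    column_map maps a zero-based position in the uploaded file to a
    database column name; positions that are not mapped are ignored.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        yield {
            db_col: (fields[pos] if pos < len(fields) else None)
            for pos, db_col in column_map.items()
        }


# Hypothetical upload: price in column 0, column 1 unmapped, SKU in column 2.
upload = "9.99\tignored\tA-100\n4.50\tignored\tB-200\n"
mapping = {2: "sku", 0: "price"}
rows = list(remap_rows(upload, mapping))
```

Because every row comes out with the same keys regardless of the upload's column order, the database side only ever sees one format, which is what a bulk load generally needs.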
Hello all, I have a question regarding importing text file data into SQL Server. I'm hoping someone can point me in the right direction, as my searches haven't turned up anything specific enough. I'm trying to parse a large (24MB) text file. It's a fixed-width file with multiple columns. I need to parse this file, check whether a record already exists, and then import the data into the database. But I don't need to insert every column; there are only a few columns from the file that I need. This parsing also needs to occur at regular intervals (daily). I looked at BULK INSERT, but I can't find an example that uses only some of the columns. Every example uses all columns, and the file is delimited, not fixed-width. Is there anything within SQL Server that can accomplish this? The only other solution I can think of is an application that parses the file for me and inserts the data into the database. But can I schedule that application to run every night at midnight (for example) through SQL Server? I'm not too familiar with SQL Server, so I appreciate any help offered. Thanks, Jay
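If the external-application route is taken, the parsing step itself is small. Here is a minimal Python sketch that slices only the wanted fields out of a fixed-width line. The field names, offsets, and sample line are invented for illustration, and the existence check and insert are left out.

```python
def parse_fixed_width(line, layout):
    """Return only the fields named in layout from one fixed-width line.

    layout maps a field name to (start, length), with start zero-based.
    Columns that are not listed in layout are simply never read.
    """
    return {
        name: line[start:start + length].strip()
        for name, (start, length) in layout.items()
    }


# Hypothetical layout: keep two fields, skip the name column and the tail.
layout = {"account": (0, 6), "amount": (20, 8)}
sample = "AB1234" + "John Smith".ljust(14) + "00012.50" + "IGNORED"
record = parse_fixed_width(sample, layout)
```

A script like this can be run on a schedule by any job scheduler; whether that is done from within SQL Server or from the operating system is a separate choice.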
Hi, basically the above is a very common requirement. Please comment on my solution, which I arrived at by searching the web:
In summary, I have used three SSIS components: "Flat File Source", "Derived Column" and "SQL Server Destination".
1) File Connections Manager Editor
1.1) Within the File Connections Manager Editor, name the column, e.g. "INTERCHANGE_NET_APP_DATE_SRC", and assign it a data type, e.g. string [DT_STR].
1.2) Click on the Preview button to ensure the expected text is assigned to the expected data type.
2.4) Select "database timestamp [DT_DBTIMESTAMP]" as Data Type.
2.5) Within the Mappings tab of the SQL Destination Editor, set the Input Column to INTERCHANGE_NET_APP_DATE and the Destination Column to INTERCHANGE_NET_APP_DATE.
Please comment on the above; I will then pass on my suggestion to Microsoft.
I am trying to parse a text column using a cursor. Basically, here is the statement I am trying to convert to the cursor: SELECT DATA_ROW, SUBSTRING(FAILURE_MESSAGE, 35, 5) AS INVALID_1, SUBSTRING(FAILURE_MESSAGE, 70, 5) AS INVALID_2 FROM TBL_ERRORS WHERE LEFT(FAILURE_MESSAGE, 200) LIKE '%ORA%'
My table has 2 fields as 'count' and 'codes'. The 'codes' field has 'count' # of code values in each record. Size of each code is 4. For example, if my record is 2,'abcdefgh' then there are 2 codes and the values are 'abcd' and 'efgh'.
Currently I am using a 'script component' to parse the field into multiple values. Since I have to read 1 million records and, on average, each record has 10 codes, it is taking hours to load.
Can it be done without 'script component' using some other transformations?
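For reference, the splitting rule described above is just fixed-size slicing. The sketch below shows that logic in Python; it is not an SSIS transformation, and the function name `split_codes` is made up for illustration.

```python
def split_codes(count, codes, size=4):
    """Split a packed string into `count` codes of `size` characters each."""
    if len(codes) != count * size:
        raise ValueError("codes length does not match count * size")
    return [codes[i:i + size] for i in range(0, len(codes), size)]


# The record from the post: 2 codes packed into 'abcdefgh'.
values = split_codes(2, "abcdefgh")
```

Since each code sits at a known offset (0, 4, 8, ...), the same slicing can in principle be expressed with plain substring operations at fixed positions, without any per-row scripting.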
Hey guys, I know this may sound impossible, but let's say I have a number of fields, one of which is a long blob or long text.
Is there a way to have MySQL search the blobs for keywords and then extract them to other fields? Basically, what I am asking is: is it possible to parse a long text blob for keywords and then grab the data before or after those keywords?
I currently have a single hard-coded file path to the SSRS config file; my query parses that file and provides the Reporting Services web service URL. My question is: how would I run this same query against hundreds of servers that may or may not share the same file path as the hard-coded one?
Is there a way to query the registry to find the location of the config file on any server? It could be on D, E, F, H, etc.
I know I can string together the address followed by "reports" and the named instance if needed, but some instances may not have used the default virtual directory name (Reports).
Am I going about this the hard way? Is there a location where the web service URL exists in a table? I could not locate anything in the Reporting Services database. Basically, I need to inventory all of my Reporting Services URLs.
There is a column in a table that has values like '23 + 45 + 63', '2 + 54 - 22'... and so on. I want to get the result of each formula as a float, like 131, 34... and so on. Is that possible with a SELECT statement? I tried:
I'm trying to parse out a line of data that is separated by the text "atc1.", "atc2." etc.
For example,
[atc1.123/atc2.456/atc3.789/atc4.xyz/]
If I only want the data after atc2., then I could search the string for "atc2." and collect all the characters afterwards. But how can I make sure to trim off all the data after "atc3." to make sure I'm only collecting "456" from the example above?
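One way to stop at the right place is to capture only up to the next separator instead of "everything afterwards". A small Python sketch, assuming (as in the example) that every value is terminated by "/"; the function name `extract_segment` is hypothetical.

```python
import re


def extract_segment(line, tag):
    """Return the text between `tag` and the next '/' (or ']'), else None."""
    match = re.search(re.escape(tag) + r"([^/\]]*)", line)
    return match.group(1) if match else None


line = "[atc1.123/atc2.456/atc3.789/atc4.xyz/]"
value = extract_segment(line, "atc2.")
```

If values can themselves contain "/", the terminator would instead have to be the next "atcN." marker, which this sketch does not handle.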
Hello friends.... I am looking for 2 things (using C#.NET or VB.NET and SQL Server 2000):
1. Convert data from a SQL Server 2000 database (say the Customers table from the Northwind database) to a text file (separated by commas or just plain spaces).
2. Insert the data from the text file back into the database.
Can someone please give me the detailed code to achieve this? I really need this on an urgent basis. Thank you.
I have a problem at the moment, where the client wants to be able to type in a custom algebraic formula with add/minus operators, and then to have this interpreted, so that the related datasets are then added and returned as a single dataset.
An example would be having a formula stored of [a] + [b] - [c]
and if I were to write the SQL to apply that formula, I might write something like (let's assume 1:1 relationships with the ID's)
select a.a + b.b - c.c as [result] from z inner join tblA a on z.id = a.id inner join tblB b on z.id = b.id inner join tblC c on z.id = c.id
The formula can change though, maybe things like:
[a] + [b] + [c] + [d]
[a] + [b]
The developer before me wrote something SQL-based where they parsed the string and assigned each value of the formula as either positive or negative (e.g A is positive, B is positive, C is negative, now sum the datasets to get the result), and then created one large table of values then summed them. This does (kind of) work, I'm just contemplating potential alternatives, as it is quite a slow process, and feels like it is quite convoluted, when I get into the details. If I were to do something like this in SQL, I'd normally want each part of the expression to be a column, and then to just apply the operators, but because the formula can change, then the SQL would need to be somehow dynamic for this approach.
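A rough sketch of the dynamic-SQL idea, written in Python for illustration: parse the stored formula into signed terms, then generate one column expression plus the joins. It assumes formulas contain only bracketed names joined by + or -, and it reuses the tblA/tblB naming and the table z from the example query; both function names are made up.

```python
import re

# One term: an optional sign followed by a bracketed name such as [a].
TERM = re.compile(r"\s*([+-]?)\s*\[(\w+)\]")


def parse_formula(formula):
    """Turn '[a] + [b] - [c]' into [('a', 1), ('b', 1), ('c', -1)]."""
    terms, pos = [], 0
    while formula[pos:].strip():
        match = TERM.match(formula, pos)
        if not match or (terms and not match.group(1)):
            raise ValueError(f"cannot parse formula at position {pos}")
        terms.append((match.group(2), -1 if match.group(1) == "-" else 1))
        pos = match.end()
    return terms


def build_sql(terms):
    """Build a single SELECT that applies the signs as operators."""
    expr = ""
    for index, (name, sign) in enumerate(terms):
        column = f"{name}.{name}"
        if index == 0:
            expr = ("-" if sign < 0 else "") + column
        else:
            expr += (" - " if sign < 0 else " + ") + column
    names = dict.fromkeys(name for name, _ in terms)  # unique, in order
    joins = " ".join(
        f"inner join tbl{name.upper()} {name} on z.id = {name}.id"
        for name in names
    )
    return f"select {expr} as [result] from z {joins}"


sql = build_sql(parse_formula("[a] + [b] - [c]"))
```

Because the names are restricted to word characters by the pattern, the generated text cannot contain arbitrary SQL, but names should still be checked against the list of known datasets before the statement is executed.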
I need to parse a regularly generated RTF file and was wondering if that is possible in SSIS. I am trying to use the flat file connection manager to do this.
Now, I can't treat tab stops in an RTF like tab stops in a CSV, since when you treat an RTF as a text file, you see the RTF format codes. If I open the RTF in a text editor, the entire file is one line, with lines breaking with:
\par}
Columns are tab delimited in the RTF, and they look like this when you treat the RTF as a text file:
\plain\tab\fs16\f4\cf0\cb1
(or something like that; the word "tab" is the important part.)
So I use the "\plain\tab" part as the delimiter in SSIS, since that is consistent (planning to parse out all the garbage later on). The problem is, sometimes lines don't have a "city" and "state", so it "tabs" right over to the next field. So like this (looking in MS Word):
Phone <tab> City <tab> State <tab> Date <tab> Other fields.....
847-111-2222 <tab> Omaha <tab> NB <tab> 9/14/2007 <tab>
222-222-3333 <tab> 9/14/2007 <tab>
555-121-1212 <tab> Houston <tab> TX <tab> 9/14/2007 <tab>
Now, if you treat an RTF as a text file, it has only one "\plain\tab\fs16\f4\cf0\cb1" after the phone number, so even for the row with missing fields there is only one tab, not 3. This is because the tab stops for each row are defined at the beginning of the row, like this:
\tql\tx90\tql\tx840\tql....etc...
with "tql" and "tx" tags basically saying where all the tab stops are for that row. So for the row above with missing info, it lists fewer tab stops. So the "date" (and associated garbage) ends up under "City" for this row. All of the "Houston" row's data starts appearing in the sql server output table's 2nd last field, as you might expect.
Any suggestions how to pull this in in SSIS during the transformation? I could deal with it after I pull it in, I still have all the data. I'm thinking the logic to do this could be complicated though. I take the data out of the last two fields of the missing row into some other table, use UPDATES to shift the values 2 fields to the right, and then figure out a way to take the data I just put in a temp table back in, but it all sounds a bit complicated.
Let me know if this makes sense--I've almost got it going, I just need to sort this last bit out.
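The post-import fix may be less complicated than shifting values with UPDATEs if it is done row by row before loading. A heuristic Python sketch, assuming that City and State are always missing together and that a date in the second position is a reliable sign of it; the function name and date pattern are assumptions.

```python
import re

# Matches dates written like 9/14/2007.
DATE = re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$")


def realign(fields):
    """Insert empty City and State when the Date sits right after Phone."""
    if len(fields) > 1 and DATE.match(fields[1].strip()):
        return [fields[0], "", ""] + list(fields[1:])
    return list(fields)


short_row = realign(["222-222-3333", "9/14/2007"])
full_row = realign(["847-111-2222", "Omaha", "NB", "9/14/2007"])
```

After this step every row has the same number of leading fields, so the remaining columns line up again.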
I am trying to read a QFX file from Quicken. It looks like XML, but it's not, and I cannot figure out how to parse each line. I put this into a derived column, but it's not getting it
because inside the data, it looks like that's what brackets a transaction; the data looks like this and varies by trntype, but the columns are tagged like so
I have a tab-delimited file with 122 columns. Can anyone let me know if there is a better way of parsing/extracting a few columns (say about 15) from the file and loading them into a table using SSIS?
Summary A ABCD
A Category MarketValue Margin
A category1 1.0000000 1.000000
A category2 2.0000000 2.000000

H Totals Total Cash Net
H 2.00000 200000 2000000

Another Summary B BCDE
B Activity MarketValue Margin
B activity1 3.00000 3.000000
B activity2 4.00000 4.000000
The items in blue are headers. I don't want to capture those. However, I want to capture all the data in black, and put it into 3 separate tables (or maybe the same table, under the appropriate column names)
This situation differs from anything I've done before in that you can't identify what row contains what data by what's in the row itself. That is, what's in the data rows is random and subject to change. So you can't search the row itself to determine which table it goes to.
However, if there's a way to capture all the rows after a certain header before the header changes again, that might work.
That is:
get all rows between "A Category MarketValue Margin" and "H Totals Total Cash Net",
get all rows between "H Totals Total Cash Net" and "Another Summary", and
get all rows after "B Activity MarketValue Margin".
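Capturing "all rows after a header until the header changes" amounts to remembering the current section while reading line by line. A minimal Python sketch of that idea follows; the target names (`category`, `totals`, `activity`) are invented, and it assumes the header lines and the "Summary" title lines appear exactly as in the sample.

```python
# Header line -> hypothetical target table name.
HEADERS = {
    "A Category MarketValue Margin": "category",
    "H Totals Total Cash Net": "totals",
    "B Activity MarketValue Margin": "activity",
}
# Title lines that end a section without starting a new one (assumed).
TITLE_PREFIXES = ("Summary", "Another Summary")


def split_sections(lines):
    """Group data rows under the most recently seen header line."""
    sections = {name: [] for name in HEADERS.values()}
    current = None
    for raw in lines:
        line = " ".join(raw.split())  # normalize runs of whitespace
        if not line:
            continue
        if line in HEADERS:
            current = HEADERS[line]
        elif line.startswith(TITLE_PREFIXES):
            current = None
        elif current is not None:
            sections[current].append(line.split())
    return sections


report = [
    "Summary A ABCD",
    "A Category MarketValue Margin",
    "A category1 1.0000000 1.000000",
    "A category2 2.0000000 2.000000",
    "",
    "H Totals Total Cash Net",
    "H 2.00000 200000 2000000",
    "",
    "Another Summary B BCDE",
    "B Activity MarketValue Margin",
    "B activity1 3.00000 3.000000",
    "B activity2 4.00000 4.000000",
]
sections = split_sections(report)
```

The data rows are never inspected to decide where they belong; only the header that preceded them matters.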
The suggestion to do this is buried deep in one of my posts, however I still do not have a clear idea of how to do this.
I have a flat file which has several "bad rows" in it. Because file error redirection is buggy, I need a manual approach to get rid of these incomplete rows in my data file.
Phil, you suggested I read the file as one long string, then parse out the bad rows (using a script?).... however I have no idea as to how to actually do this.
I was wondering if it's possible to clarify the steps involved in doing this, or perhaps point me to an example I can look at, as I cannot seem to get around this problem on my own.
I know this has come up before and I have tried several of the solutions found within the forum but I just can't seem to import my file correctly and could use some input, please.
Sample file (fewer fields than the actual file):
Name (str), Phone# (str), Description(str), Resolved(bool), Met(bool)
"Kay, Mary","123-4567","Used a "."not a"," in text", "1", "1"
The text is qualified with " and the columns are delimited with commas, but the description field has embedded quotes and commas. Normally the import works, except when there are embedded quotes and commas.
I have tried unqualified data and undouble, but that does not work either because of the embedded commas in quotes.
Do I need to do something before the data flow? Do I need to do custom code similar to undouble (I tried modifying undouble but using unqualified fields caused the source file to not like the data and go red)? Should the row be read as one field and parsed?
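Reading the row as one field and parsing it is workable when only one column can contain the embedded quote-comma-quote sequence. The Python sketch below splits on that sequence and then folds any surplus pieces back into the free-text column. It assumes a known field count and a single free-text column; the function name is hypothetical.

```python
import re


def split_loose_csv(line, n_fields, free_index):
    """Split a quoted CSV line whose field at free_index may contain '","'.

    Fields before and after free_index are assumed to be well behaved, so
    any extra pieces produced by the split must belong to that one field.
    """
    body = line.strip()
    if body.startswith('"') and body.endswith('"'):
        body = body[1:-1]  # drop the outermost qualifiers
    parts = re.split(r'"\s*,\s*"', body)
    extra = len(parts) - n_fields
    if extra < 0:
        raise ValueError("fewer fields than expected")
    merged = '","'.join(parts[free_index:free_index + extra + 1])
    return parts[:free_index] + [merged] + parts[free_index + extra + 1:]


sample = '"Kay, Mary","123-4567","Used a "."not a"," in text", "1", "1"'
fields = split_loose_csv(sample, n_fields=5, free_index=2)
```

The cleaned rows could then be written back out with an unambiguous delimiter before the data flow runs.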
Hello Everyone, I would like to import a text file which contains one string (a large integer) per line not separated by commas or anything else except a carriage return. Does anyone know of an easy way to store this in a database file? I'm open to suggestions if there is more than one way to save this kind of information within a database. I have SQL server 2005 developer edition if that helps in any way. I'm also starting to learn about Linq so if there is some other way you would store this information for that purpose I would love to hear about that as well. C# code is preferable, but I can use the automatic translators if that's all you have. By the way, I'm a newbie to this subject (if you couldn't tell). Thanks in advance. Robert
Hi all, I want to load a text file into a database without using Bulk Insert. I read the text file and stored it in a DataTable. I need to insert this data into the database. How can I bind the data in the DataTable to a DataSet? How can I push the changes in the DataSet to the database? Please help me. Thank you.
I have a CSV file with roughly 6 million rows. The file is unstructured; that is, some rows have 5 fields, others have 15, and there are as many as 50 fields in one row.
I am using bulk insert to read the entire file into a table in database, with each row being a database record. With that, I have one column that contains a row of comma delimited fields. All fields are character string and I want to find a quick way of parsing each row and placing each comma-delimited value in a column. For example:
Column CSVString contains a CSV row. I don't know how many fields (number of commas + 1) are in the row, but if the row contains 10 fields, I need to populate columns C1-C10. If the row has 15 fields, I populate columns C1-C15.
How can I do this in a very efficient way? I tried CTE but performance was not very good.
Hi everyone, I have a directory that contains a lot of text files that have data I need to draw from. I want to know if it is possible to write a program that will read all of the text files in the directory, pull out data, and save it to a new text file. For example, each text file is formatted this way:
Column1, Column2, Column3
"1","xxxx","yyyy"
"2", "xxxx", "yyyy"
"3", "XXXX", "yyyy"
I want to put all lines that begin with 1 in one text file, all the lines that begin with 2 in another text file, and the same with all lines that begin with 3. My problem is that I want to be able to point at the folder that contains those files and have it read every text file in the folder and perform the operation. If this is possible, can someone point me in the right direction on how to get started? Thank you for any help!
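This is possible in most languages. As a starting point, here is a short Python sketch that walks a folder, skips each file's header line, and appends every row to an output file named after its first field. The function name, the `lines_<key>.txt` naming, and the assumption that first-field values are safe to use in file names are all illustrative.

```python
import csv
from pathlib import Path


def split_by_first_field(folder, out_folder):
    """Append each data row to out_folder/lines_<first field>.txt."""
    out_folder = Path(out_folder)
    out_folder.mkdir(parents=True, exist_ok=True)
    counts = {}
    # sorted() lists the input files up front, before anything is written.
    for path in sorted(Path(folder).glob("*.txt")):
        with path.open(newline="") as src:
            reader = csv.reader(src, skipinitialspace=True)
            next(reader, None)  # skip the header line
            for row in reader:
                if not row:
                    continue  # ignore blank lines
                key = row[0]
                target = out_folder / f"lines_{key}.txt"
                with target.open("a", newline="") as dst:
                    csv.writer(dst, quoting=csv.QUOTE_ALL).writerow(row)
                counts[key] = counts.get(key, 0) + 1
    return counts


# Example call with hypothetical paths:
# split_by_first_field("C:/data/input", "C:/data/output")
```

Reopening the output file for every row keeps the sketch short; for large inputs it would be faster to keep one open writer per key.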
Hello everyone! I'm having a problem inserting the content of a text file into a SQL Server 2005 database. I'm reading the text file into a DataSet, and that works fine. What I can't do is what I suspect is the simple part: insert all the data into a table that has exactly the same layout as the file. I've never worked with DataSets before, and I can't seem to find the answer to this! This is what I have done so far:

Dim i2 As Integer
Dim j As Integer
Dim File As String = Server.MapPath("..DocsFactsFORM_MAN_V3_1.txt")
Dim TableName As String = "Facts"
Dim delimiter As String = "9"

Dim result As DataSet = New DataSet()
Dim s As StreamReader = New StreamReader(File)
' The first line holds the tab-separated column names.
Dim columns As String() = s.ReadLine().Split(Chr(9))
result.Tables.Add(TableName)
Dim strs1 As String() = columns
For i2 = 0 To CInt(strs1.Length) - 1
    Dim col As String = strs1(i2)
    Dim added As Boolean = False
    Dim [next] As String = ""
    Dim i As Integer = 0
    ' Append _1, _2, ... until the column name is unique in the table.
    While Not added
        Dim columnname As String = String.Concat(col, [next])
        columnname = columnname.Replace(Chr(9), "")
        If Not result.Tables(TableName).Columns.Contains(columnname) Then
            result.Tables(TableName).Columns.Add(columnname)
            added = True
        Else
            i += 1
            [next] = String.Concat("_", i.ToString())
        End If
    End While
Next i2
' Split the rest of the file on CR+LF pairs and skip empty entries, so that
' no blank rows are added between lines.
Dim strs2 As String() = s.ReadToEnd().Split(New String() {vbCrLf}, StringSplitOptions.RemoveEmptyEntries)
For j = 0 To CInt(strs2.Length) - 1
    Dim items As String() = strs2(j).Split(Chr(9))
    result.Tables(TableName).Rows.Add(items)
Next j

So now I have my DataSet populated with all the information, but how can I insert it into the database? If anyone can help I would appreciate it very much! Thank you, Paula
I need to back up just one table, and not all the data in the table. Is there a way to have SQL save the data returned from a query to some text file that I can then use to build the table on another server? I am on SQL Server 7. Thanks, S
I have been tasked with designing an automated process to restore production data to our testing environments on an as-needed basis. The schedule would revolve around our software testing and deployment schedules. I'm looking for suggestions on best practices for this task in the form of advice, links to references, etc. I'll spare you the full list of requirements :), but note that part of the process also needs to encompass data stored in Oracle (10g). I've done several Google searches but would like to validate or invalidate my research against the advice of the experts here.
I'm trying to export data from one of the tables in my SQL 7.0 database into a text file. Can someone tell me how I can do this using a SQL query instead of BCP (command line)? Thank you in advance.
You see, these are the requirements:
1. There is a text file named "Loan".
2. I must create a program using ASP.NET wherein the text file can be imported into the system.
3. After importing, I must save the whole text file in a database.
My questions are:
1. What code can I use to open a text file?
2. How can I save the text file into a database?
3. Is there any way that I could import the file and view its content in my application?
4. After viewing, how can I save the text file as a text file in a database?