I have a CSV file with roughly 6 million rows. The file is unstructured; that is, some rows have 5 fields, others have 15, and there are as many as 50 fields in one row.
I am using bulk insert to read the entire file into a table in the database, with each row being a database record. With that, I have one column that contains a row of comma-delimited fields. All fields are character strings, and I want to find a quick way of parsing each row and placing each comma-delimited value in a column. For example:
Column CSVString contains a CSV row. I don't know how many fields (number of commas + 1) are in the row, but if the row contains 10 fields, I need to populate columns C1-C10. If the row has 15 fields, I populate columns C1-C15.
How can I do this in a very efficient way? I tried CTE but performance was not very good.
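One possible alternative to the CTE, sketched here in Python on the assumption that the split can happen in a pre-processing step before the load rather than in T-SQL. Every row is padded to a fixed width so that each value lands in C1..Cn. The name `split_rows` and the `MAX_FIELDS` value are illustrative assumptions.

```python
import csv
import io

MAX_FIELDS = 50  # assumption: the widest row, per the description above


def split_rows(lines, max_fields=MAX_FIELDS):
    """Yield one list per CSV line, padded with None up to max_fields (C1..Cn)."""
    for fields in csv.reader(lines):
        if len(fields) > max_fields:
            raise ValueError(
                f"row has {len(fields)} fields, expected at most {max_fields}"
            )
        # Short rows get None for the unused trailing columns.
        yield fields + [None] * (max_fields - len(fields))


# Small demonstration with two rows of different widths.
sample = io.StringIO("a,b,c,d,e\n1,2,3,4,5,6,7\n")
rows = list(split_rows(sample, max_fields=8))
```

The padded rows can then be written out as a regular fixed-column file and bulk inserted into a table with columns C1..C50.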
I want to take this XML and put it into a table with CustomerId and MatchingSetId. With this SQL, each MatchingSetId gets assigned to each CustomerId instead of retaining the relationships in the XML.
Select...
,DISCHARGEHOUR.value('(./Discharge_x0020_Time/time/Hour)[1]', 'varchar(10)') AS [hour]
,DISCHARGEMINUTES.value('(./Discharge_x0020_Time/time/Hour:minute)[1]', 'varchar(10)') AS [Minutes]
,DISCHARGEAMPM.value('(./Discharge_x0020_Time/time/Hour/minute/AM_x002F_PM)[1]', 'varchar(10)') AS [ampm]
But Minutes and AMPM come up as NULL. I assume I am setting up something wrong with the level on Minutes and AMPM. Also, can I disregard the ":" in the minutes?
I have a problem at the moment, where the client wants to be able to type in a custom algebraic formula with add/minus operators, and then to have this interpreted, so that the related datasets are then added and returned as a single dataset.
An example would be having a formula stored of [a] + [b] - [c]
and if I were to write the SQL to apply that formula, I might write something like this (let's assume 1:1 relationships with the IDs):
select a.a + b.b - c.c as [result]
from z
inner join tblA a on z.id = a.id
inner join tblB b on z.id = b.id
inner join tblC c on z.id = c.id
The formula can change though, maybe things like:
[a] + [b] + [c] + [d]
[a] + [b]
The developer before me wrote something SQL-based that parsed the string and assigned each value of the formula as either positive or negative (e.g. A is positive, B is positive, C is negative, now sum the datasets to get the result), then created one large table of values and summed them. This does (kind of) work. I'm just contemplating potential alternatives, as it is quite a slow process and feels quite convoluted when I get into the details. If I were to do something like this in SQL, I'd normally want each part of the expression to be a column and then just apply the operators, but because the formula can change, the SQL would need to be dynamic somehow for this approach.
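To make the sign-per-term idea concrete, here is a minimal sketch in Python, not the previous developer's implementation. It assumes formulas contain only bracketed names joined by + or -, and that each dataset is available as a mapping from id to value with the same ids everywhere (the 1:1 assumption above). The names `parse_formula` and `evaluate` are hypothetical.

```python
import re
from collections import defaultdict

# One term: an optional sign followed by a bracketed name such as [a].
TERM = re.compile(r"\s*([+-]?)\s*\[(\w+)\]")


def parse_formula(formula):
    """Return [(sign, name), ...] for a formula such as '[a] + [b] - [c]'."""
    formula = formula.strip()
    terms, pos = [], 0
    while pos < len(formula):
        match = TERM.match(formula, pos)
        # Every term after the first must carry an explicit operator.
        if match is None or (terms and not match.group(1)):
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        sign = -1 if match.group(1) == "-" else 1
        terms.append((sign, match.group(2)))
        pos = match.end()
    return terms


def evaluate(formula, datasets):
    """Sum the signed datasets per id; datasets maps name -> {id: value}."""
    totals = defaultdict(float)
    for sign, name in parse_formula(formula):
        for row_id, value in datasets[name].items():
            totals[row_id] += sign * value
    return dict(totals)
```

The same list of signed terms could instead be used to build the select list and joins of a dynamic SQL statement, provided the names are validated against a known list of tables first.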
I currently have a single hard-coded file path to the SSRS config file; my query parses the file and returns the Reporting Services web service URL. My question is: how would I run this same query against hundreds of servers that may or may not share the same file path as the hard-coded one?
Is there a way to query the registry to find the location of the config file on any server? It could be on D, E, F, H, etc.
I know I can string together the address followed by "reports" and the named instance if needed, but some instances may not have used the default virtual directory name (Reports).
Am I going about this the hard way? Is there a table where the web service URL is stored? I could not locate anything in the Reporting Services database. Basically, I need to inventory all of my Reporting Services URLs.
Historically I've always written a VB script to copy a file from a SharePoint library. I don't like this method because I have to put a username and password in the script and maintain a config file.
Yesterday I was playing around with using a File System Task. The SharePoint file has a UNC path, so why not? I created a simple test package with a single File System Task that copies the SharePoint file (addressed via UNC) to another network location. The package runs fine locally.
When I try running it on our utility server, I get a "The file name [SHAREPOINT UNC PATH] specified in the connection was not valid" error. The package runs with a proxy on the server, and the proxy account has the same permissions to the SharePoint site (so far as I can tell) as I do.
One of my databases has a 100 GB data file and a 500 GB log file. The database is in the full recovery model and transaction log backups run once every 6 hours. Even so, the log file isn't shrinking.
I need to parse a regularly generated RTF file and was wondering if it is possible in SSIS. I am trying to use the flat file connection manager to do this.
Now, I can't treat tab stops in an rtf like tab stops in a csv, since when you treat an rtf as a text file, you see the format code of the rtf. If I open the rtf in a text editor, the entire file is one line, with lines breaking with:
\par}
Columns are tab delimited in the rtf, and they look like this when you treat the rtf as a text file.
\plain\tab\fs16\f4\cf0\cb1
(or something like that, the word "tab" is the important part.)
So I use the "\plain\tab" part to delimit in SSIS, since that is consistent (planning to parse out all the garbage later on). The problem is, sometimes lines don't have a "city" and "state", so it "tabs" right over to the next field. So like this (looking in MS Word):
Phone <tab> City <tab> State <tab> Date <tab> Other fields.....
847-111-2222 <tab> Omaha <tab> NB <tab> 9/14/2007 <tab>
222-222-3333 <tab> 9/14/2007 <tab>
555-121-1212 <tab> Houston <tab> TX <tab> 9/14/2007 <tab>
Now, if you treat an RTF as a text file, it has only one "\plain\tab\fs16\f4\cf0\cb1" after the phone number, so even for the line with missing values there is only one tab, not 3. This is because at the beginning of each row the tab stops for that row are defined like this:
\tql\tx90\tql\tx840\tql....etc...
with "\tql" and "\tx" tags basically saying where all the tab stops are for that row. So for the row above with missing info, it lists fewer tab stops. So the "date" (and associated garbage) ends up under "City" for this row. All of the "Houston" row's data starts appearing in the SQL Server output table's 2nd last field, as you might expect.
Any suggestions how to pull this in in SSIS during the transformation? I could deal with it after I pull it in, I still have all the data. I'm thinking the logic to do this could be complicated though. I take the data out of the last two fields of the missing row into some other table, use UPDATES to shift the values 2 fields to the right, and then figure out a way to take the data I just put in a temp table back in, but it all sounds a bit complicated.
Let me know if this makes sense--I've almost got it going, I just need to sort this last bit out.
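One way to avoid the temp table and UPDATE shuffle is to realign each row before it reaches the destination, for instance in a script step. The sketch below is in Python and rests on several assumptions: the fields have already been split on the tab marker and cleaned of RTF codes, City and State are always missing together, and the date is always in M/D/YYYY form as in the sample. The name `realign` is hypothetical.

```python
import re

# Matches dates such as 9/14/2007 (assumed format, taken from the sample rows).
DATE = re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$")


def realign(fields):
    """Insert empty City and State values when the date directly follows the phone."""
    if len(fields) > 1 and DATE.match(fields[1].strip()):
        return [fields[0], "", ""] + fields[1:]
    return fields
```

A row such as `["222-222-3333", "9/14/2007"]` comes back with two empty strings inserted after the phone number, so the date lines up under the Date column again.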
I am trying to read a QFX file from Quicken. It looks like XML, but it's not, and I cannot figure out how to parse each line. I put this into a derived column, but it's not picking it up
because inside the data, it looks like that's what brackets a transaction; the data looks like this and varies by trntype, but the columns are tagged like so
I have a tab-delimited file with 122 columns. Can anyone let me know if there is a better way of parsing/extracting a few columns (say about 15) from the file and loading them into a table using SSIS?
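For illustration, a minimal Python sketch of picking columns by position, assuming the extraction can run as a pre-processing step that writes a narrower file for the package to load. The positions in `WANTED` are hypothetical placeholders for the real 15 columns.

```python
import csv
import io

# Hypothetical zero-based positions of the columns to keep.
WANTED = [0, 3, 121]


def pick_columns(lines, wanted=WANTED):
    """Yield only the wanted columns from each tab-delimited line."""
    # QUOTE_NONE treats quote characters as ordinary data.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        yield [fields[i] for i in wanted]


# Demonstration with one 122-column row.
line = "\t".join(f"v{i}" for i in range(122)) + "\n"
picked = list(pick_columns(io.StringIO(line)))
```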
I have an unstructured SQL function that takes around 2 hours to return a table with just nine hundred rows. I have deleted some text from the code because it exceeded this website's limit. How can I restructure or optimize the function below to improve its performance?
Hello all,
I have a question regarding importing text file data into SQL Server. I'm hoping someone can point me in the right direction, as my searches haven't turned up anything specific enough.
I'm trying to parse a large (24MB) text file. It's a fixed-width file, with multiple columns. I need to parse this file, check if a record already exists, and then import the data into the database. But I don't need to insert every column. There are only a few columns from the file I need to insert. This parsing also needs to occur at regular intervals (daily).
I looked at BULK INSERT, but I can't find an example that uses only some of the columns. Every example uses all columns, and the file is delimited, not fixed-width. Is there anything within SQL Server that can accomplish this? I haven't turned up anything that will solve my problem.
The only other solution I can think of is an application that parses the file for me and inserts the data into the database. But can I schedule that application to run every night at midnight (for example) through SQL Server? I'm not too familiar with SQL Server, so I appreciate any help offered.
Thanks,
Jay
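As a rough illustration of the external application idea, here is a Python sketch that slices only the needed columns out of each fixed-width line and skips records whose key already exists. The layout offsets, the column names and the `existing_keys` set are all hypothetical; in practice the keys would come from a query against the target table.

```python
# Hypothetical layout: (column name, start offset, end offset) for wanted columns only.
LAYOUT = [("account", 0, 10), ("amount", 25, 35)]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice the wanted columns out of one fixed-width line."""
    return {name: line[start:end].strip() for name, start, end in layout}


def new_records(lines, existing_keys, layout=LAYOUT, key="account"):
    """Yield parsed records whose key is not already present in the database."""
    for line in lines:
        record = parse_fixed_width(line, layout)
        if record[key] not in existing_keys:
            yield record
```

A script like this can be run on a schedule, for example from a SQL Server Agent job step or the operating system's task scheduler.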
Summary A ABCD
A Category MarketValue Margin
A category1 1.0000000 1.000000
A category2 2.0000000 2.000000
H Totals Total Cash Net
H 2.00000 200000 2000000
Another Summary B BCDE
B Activity MarketValue Margin
B activity1 3.00000 3.000000
B activity2 4.00000 4.000000
The items in blue are headers. I don't want to capture those. However, I want to capture all the data in black, and put it into 3 separate tables (or maybe the same table, under the appropriate column names)
This situation differs from anything I've done before in that you can't identify what row contains what data by what's in the row itself. That is, what's in the data rows is random and subject to change. So you can't search the row itself to determine which table it goes to.
However, if there's a way to capture all the rows after a certain header before the header changes again, that might work.
That is, get all rows between "A Category MarketValue Margin" and "H Totals Total Cash Net", all rows between "H Totals Total Cash Net" and "Another Summary", and all rows after "B Activity MarketValue Margin".
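The capture-until-the-header-changes idea can be sketched as a small state machine. This Python sketch assumes each header occupies a line of its own, that the header text is fixed, and that the title lines start with "Summary" or "Another Summary". The section names in `HEADERS` are hypothetical table names.

```python
# Known header lines mapped to the (hypothetical) destination table names.
HEADERS = {
    "A Category MarketValue Margin": "summary_a",
    "H Totals Total Cash Net": "totals",
    "B Activity MarketValue Margin": "summary_b",
}
# Assumed title prefixes that end the current section without starting a new one.
SECTION_ENDS = ("Summary", "Another Summary")


def split_sections(lines):
    """Assign each data row to the section of the most recent header line."""
    sections = {name: [] for name in HEADERS.values()}
    current = None
    for raw in lines:
        line = " ".join(raw.split())  # collapse runs of whitespace
        if not line:
            continue
        if line in HEADERS:
            current = HEADERS[line]
        elif line.startswith(SECTION_ENDS):
            current = None
        elif current is not None:
            sections[current].append(line.split())
    return sections
```

The rows are routed purely by position relative to the headers, so the content of the data rows never has to be inspected.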
The suggestion to do this is buried deep in one of my posts, however I still do not have a clear idea of how to do this.
I have a flat file which has several "bad rows" in it. Because file error redirection is buggy, I need a manual approach to get rid of these incomplete rows in my data file.
Phil, you suggested I read the file as one long string, then parse out the bad rows (using a script?).... however I have no idea as to how to actually do this.
I was wondering if it's possible to clarify the steps involved in doing this, or perhaps point me to an example I can look at, as I cannot seem to get around this problem on my own.
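To give the steps a concrete shape, here is a minimal Python sketch of pre-filtering the file before the data flow reads it. It assumes a "bad row" is one with the wrong number of delimiters, which may not match the actual definition; `split_good_bad` and its parameters are hypothetical names.

```python
def split_good_bad(lines, expected_fields, delimiter=","):
    """Separate complete rows from incomplete ones by counting delimiters."""
    good, bad = [], []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text.count(delimiter) + 1 == expected_fields:
            good.append(text)
        else:
            # Keep the line number so rejected rows can be reported.
            bad.append((number, text))
    return good, bad
```

The good rows can be written to a clean file that the flat file source then reads, and the bad rows to a reject file for review.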
I know this has come up before and I have tried several of the solutions found within the forum but I just can't seem to import my file correctly and could use some input, please.
Sample file (fewer fields than the actual file):
Name (str), Phone# (str), Description(str), Resolved(bool), Met(bool)
"Kay, Mary","123-4567","Used a "."not a"," in text", "1", "1"
The text is qualified with " and columns are delimited with commas, but the description field has embedded quotes and commas. Normally it works, except when there are embedded quotes and commas.
I have tried unqualified data and undouble, but that does not work either because of the embedded commas in quotes.
Do I need to do something before the data flow? Do I need to do custom code similar to undouble (I tried modifying undouble but using unqualified fields caused the source file to not like the data and go red)? Should the row be read as one field and parsed?
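On the read-as-one-field idea: a heuristic sketch in Python, assuming only the Description field can contain embedded quotes and commas and that the number of well-behaved fields before and after it is known (two and two in the sample). It is not a general CSV parser, and `parse_line` is a hypothetical name.

```python
import re

# Field separator: closing quote, comma, opening quote, with optional spaces.
SEPARATOR = re.compile(r'"\s*,\s*"')


def parse_line(line, leading=2, trailing=2):
    """Split a fully quoted line, keeping the one messy middle field intact."""
    body = line.strip()
    if not (body.startswith('"') and body.endswith('"')):
        raise ValueError("line is not fully quoted")
    pieces = SEPARATOR.split(body[1:-1])
    if len(pieces) < leading + trailing + 1:
        raise ValueError("too few fields")
    cut = len(pieces) - trailing
    # Anything between the clean leading and trailing fields belongs to the
    # messy field; rejoin it with the canonical "," separator.
    middle = '","'.join(pieces[leading:cut])
    return pieces[:leading] + [middle] + pieces[cut:]
```

Applied to the sample row, this yields five values, with the description kept as `Used a "."not a"," in text`.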
I have a website where people upload tab-delimited text files of their product inventories, which the site parses and inserts into a database table. Here's the catch: instead of insisting that each user use a standardized format, each user can upload the file in whatever column order they want; they just have to let the site know through a GUI which column is in which position. They may also upload columns that, if not mapped, will be ignored. Right now I am doing all of this in code and it runs slowly. I was thinking of offloading this to a stored procedure, SSIS, or a bulk upload, but with the varying format of the uploaded text file I am not sure how I could do that. Any suggestions? Thanks!
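One way to frame the problem is to normalise every upload into the standard column order first, so that a single fixed-format bulk load can take over afterwards. A minimal Python sketch, assuming the GUI mapping is available as a dictionary from file position to destination column; the names are hypothetical.

```python
def map_row(fields, mapping):
    """Build {destination column: value}; positions missing from mapping are ignored."""
    return {
        column: fields[position]
        for position, column in mapping.items()
        if position < len(fields)
    }


def normalise(lines, mapping, delimiter="\t"):
    """Yield one mapped record per uploaded line."""
    for line in lines:
        yield map_row(line.rstrip("\r\n").split(delimiter), mapping)
```

For example, a user who uploads price first and SKU third would have the mapping `{0: "price", 2: "sku"}`, and the second column would be dropped.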
Hi, I need to categorize a lot of HTML or text files according to a list of terms, and I wonder if Term Lookup is adequate for this. The problem is that Term Lookup can only take an OLE DB source as input. My files can be up to 80 KB in size and aren't structured in columns.
Should I import my files into a table? If so, how can I import a column with more than 8000 characters?
I have one .mdf and two .ndf files on the same drive. The .mdf file is 275 GB, one .ndf file is 300 GB, and the other .ndf file is 135 GB. Is it normal to have three different file sizes? If not, what can I do to fix this? I don't have the option of making every file's initial size equal to 300 GB, like the largest .ndf. If I have to add an .ndf file on a new drive (in case the current drive runs out of space), what initial file size should I set for the new file? And how does data get distributed across all 4 files (including the new .ndf on a different drive)?
I have a package that extracts data from a Flat File. If any errors or truncation occur during the extraction of the input data, the package should fail. All fields that have erroneous values should be reported in the log file.
My Solution: - I have created a Data Flow Task that contains a Flat File Source Adapter and a dummy destination.
- I have left the default "Error Output" configuration of the Flat File Source adapter, namely if a truncation or an error occurs for a certain column, then the reaction is "Fail Component".
Problem: This configuration gives me only the first erroneous column in the row being processed.
Question: Is it possible to make the Flat File Source adapter continue parsing the current row before it fails? This way, I would be able to get all the erroneous columns in the row in one shot.
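One workaround to consider, which is not a Flat File Source feature, is to read every column as a plain string and validate the whole row in a script, collecting every failure before deciding to fail. A minimal sketch of that idea in Python; the column names, converters and maximum lengths in `SPEC` are hypothetical.

```python
from datetime import datetime


def parse_date(text):
    """Convert an ISO-style date string; raises ValueError on bad input."""
    return datetime.strptime(text, "%Y-%m-%d")


# Hypothetical column specs: (name, converter, maximum length or None).
SPEC = [("id", int, None), ("name", str, 10), ("created", parse_date, None)]


def validate_row(fields, spec=SPEC):
    """Return every (column, problem) pair for the row instead of stopping at the first."""
    errors = []
    for (name, convert, max_len), value in zip(spec, fields):
        if max_len is not None and len(value) > max_len:
            errors.append((name, "truncation"))
            continue
        try:
            convert(value)
        except ValueError:
            errors.append((name, "conversion error"))
    return errors
```

The caller can log the full list for the row and then fail the run if the list is not empty.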
I am working with an SSIS package that executes every day.
It has a File System Task that moves the production backup from one server to a different server. In today's execution the package failed with the following error:
Error Description: An error occurred with the following error message: "The process cannot access the file 'ECOSQLDumpsTest_backup_2015_02_03_230004_1557700.bak' because it is being used by another process.".
How can I find which process is using that test backup file?
I work with SQL Server 2008 on a database. We have exported the schema and data as follows:
Right-click the database => Tasks => Generate Scripts => select all objects => click Advanced => Types of data to script => Schema and data
Now we have a file with all the data and schema. That's perfect... but how can I load the file into another database? I can copy and paste everything into Management Studio and press F5, but when I do this Management Studio fails because the file is larger than 200 MB!
I want to import an XML file directly from a web page into a Microsoft SQL Server table. At the moment the import is done after the XML file has been downloaded locally. I want to skip the manual download step. Can this be done in SQL? When I change the path to the URL, I get this error: Cannot bulk load because the file URL... could not be opened. Operating system error code 123(The filename, directory name, or volume label syntax is incorrect.)
Below is the code:
DECLARE @idoc INT
DECLARE @doc XML
SET @Doc = (SELECT * FROM OPENROWSET(BULK 'F:Folderbrfxrates.xml', SINGLE_CLOB) AS xmlData) -- 1 LOCAL works
--SET @Doc = (SELECT * FROM OPENROWSET(BULK 'http://www.bnr.ro/nbrfxrates.xml', SINGLE_CLOB) AS xmlData) -- from web i get error
SELECT @Doc
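OPENROWSET(BULK ...) expects a file system path, which is consistent with error 123 when it is given an HTTP URL. One way to remove the manual step is to script the download outside SQL Server. A minimal Python sketch, assuming outbound HTTP access from the machine running it; `fetch_xml` is a hypothetical helper name.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_xml(url, timeout=30):
    """Download the document at url and return the parsed root element."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return ET.fromstring(response.read())
```

Calling `fetch_xml("http://www.bnr.ro/nbrfxrates.xml")` would return the root element, and the raw bytes could equally be saved to the local path that the existing OPENROWSET statement already reads.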
I know I could use Process Explorer to find processes that are accessing a drive or a folder, but I need exactly the same information to be recorded/monitored. Basically, I need a list of all processes accessing a drive/folder.