How do I import a huge CSV file into Access 2007?
Posted 14 April 2010 - 10:56 AM
Posted 14 April 2010 - 12:01 PM
Just a couple thoughts...
I think the biggest problem you are hitting is the 2GB size limit with Access.
If you have Excel 2007, have you tried to import the CSV file into it first and if that is successful, import that file into seperate Access databases? You could then link the tables and join them in your main database once you have the data all imported. Linked tables do not cound toward the 2BG size limit.
Given you seeking to import the entire file, I guess all the rows and columns are required?
Hope this helps,
Posted 15 April 2010 - 07:52 AM
Posted 15 April 2010 - 12:03 PM
Glad you made some progress
Some additional thoughts...
Can you tell me which version of Excel you are using? I ask, since Excel 2003 has a row limit of 65,536 rows by 256 columns per worksheet.
Did the first 65k worth of rows all contain some data in them? How many columns were imported? In other words are there any blank rows? If there are no blank rows, then it's possible there is some empty values in the CSV file and Excel is only seeing those next four rows of data.
Also, I suppose you meant to state a .xls file, instead of .elsx?
Posted 15 April 2010 - 01:29 PM
I am using Excel 2007. Based on a quick scan of the first set of imported data, it seems that each of the 65,534 rows has data in it and the data seems to be in the correct columns (A to IV), I think that is 256 columns if I did the math correctly. The second import did 2 rows (256 columns) and then the first 5 columns of row 3. While I am not positive about what is in the CSV file (It came from the publicly available department of Health and Human Services website), the CSV file is 3.5 GB in size and other info on the HHS website suggests that it should have about 250,000 records in it. When I attempt to do the second import telling the import wizard to start at record 65,535 it goes through the dialog of (paraphrased) "the text file contains more data than will fit on a single worksheet and to do additional imports to pull in the rest and exclude data already imported". So I am thinking there is still more data but it is choking on getting through the second import.
By the way ... from what I have seen on the HHS website, there actually should be 314 total columns so some of the columns are also getting dropped off.
is there some way to look at the CSV file to determine what is in it (row count)?
Thanks .... Ken
Posted 15 April 2010 - 02:20 PM
I wouldn't mind going to the HHS website and trying myself if you want to share the exact report you saving.
Not sure at the moment regarding the above.
Posted 15 April 2010 - 09:34 PM
Now it's my turn to have problems...I've downloaded that ZIP file twice, extracted with two different ZIP utilities and still cannot get Excel's Import Wizard to even show that 3.5 GB file has any data in it.
However, I was able to confirm that there should be 314 columns with the second Excel file in the ZIP file (Columns A through LB).
Of course I only have Excel 2003 and Excel 2010 Beta to work with, and that may be the issue.
[Edit- I found a CSV Splitter utility that successfully broke the large file up] These individual files would be small enough to import directly into Access tables (though you still might need to split the table across a couple Access files to avoid the 2GB file limit).
Will look again at this tomorrow to see how the data was split, but all 314 columns are there, using Excel 2010 Beta.
Posted 16 April 2010 - 12:06 PM
Thanks for the update ... If the CSV file splitter works, that may solve everything. From what I can determine most of the columns ~50-300 are sort of "uninteresting" so if I can split, then import into excel, then delete columns 50-300, perhaps I can then glue the remaining stuff back together into one excel file or access database. Ken
Posted 16 April 2010 - 04:49 PM
I have to say the CSV Splitter utility works great. However, I've successfully split the file at across ten files with 100,000 rows each and I'm still not sure the entire file has been split.
Given the original file size of 3.5GB and those 10 split files totaled only 1.1GB in size, I suspect there are more than 240,000 records in that file. Not sure if anyone at that website could confirm the row count?
So, I can import the split files into Excel 2010, and show all columns, but it looks like you might need to explore with the Splitter tool to see exactly how many rows are in the file. The other thought is the tool is duplicating records, but I could not tell that is the case based on my quick reviews. Also, the number of columns would need to be reduced if you wanted to eventually import into Access to 255 or less.
Hope this helps and I've enjoyed working through your issue.
Posted 13 June 2011 - 01:23 PM
I did a quick search on Access column row max for 2003 and it is limited to 255 and the NPI record layout is 314 records (I think facts you have already identified).
My plan is to bring this into SQL2005. I'm sure I'll have a whole host of other issues when I try to do this, but wanted to see what your analysis of the data determined. I heard it was 6million records in the .CSV. I've downloaded into a .txt file and will try and open it since Excel cannot open that large file.
The reason I ask about the data is it looks like they have the following fields repeated 15 times):
Healthcare Provider Taxonomy Code_1
Provider License Number_1
Provider License Number State Code_1
Healthcare Proivider Primary Taxonomy Switch_1
Then the following fields repeated 50 times :
Other Provider Identifier Type Code_1
Other Provider Identifier State_1
Other Provider Identifier Issuer_1
Incredibly bad database design and wanted to know what you were able to figure out with the data. What is your final use of the data? I'm going to use the databaase lookup in a website from an outside vendor (Concur) for healthcare ID lookup.
Posted 28 June 2011 - 07:46 PM
I'm not sure if the OP managed to work with the data. I know I did not pursue this topic any further after my last post in April 2010.
Best of luck with your analysis...
Posted 08 June 2012 - 09:22 AM
I happen to run into this conversation and just started working on this same file to import it into Access and the exact problems are happening to me. DM27 were you able to do anything with this file?
Posted 08 June 2012 - 04:51 PM
Welcome to the forum!
To answer your question: No, I haven't tried anything else with this data since April 2010.