How do I import a huge CSV file into Access 2007?



#1 KenPC (Member, 43 posts)

I have a CSV file with about 315 columns and 250,000 rows, and a total size of about 3.5 gigabytes, which I would like to convert into a database, preferably Access 2007. Using the Access import "wizard" does not seem to work ... I got an error message saying the file is too big. Is there any way to make an Access database from this CSV file? Are there any tools I could use to "split" the CSV file into fewer rows? Thanks

#2 dm27 (Member, 272 posts)

Hi KenPC,

Just a couple thoughts...

I think the biggest problem you are hitting is the 2 GB size limit in Access.

If you have Excel 2007, have you tried importing the CSV file into it first and, if that is successful, importing that data into separate Access databases? You could then link the tables and join them in your main database once you have all the data imported. Linked tables do not count toward the 2 GB size limit.

Given that you are seeking to import the entire file, I assume all the rows and columns are required?

Hope this helps,

Best regards,

dm
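
For reference, a rough sketch of the split-database idea dm27 describes, done from Python with pyodbc and the standard Access ODBC driver: the data lives in a second .accdb, and the main database queries and joins it using Jet/ACE's external-database syntax (the same mechanism linked tables rely on). The file paths, table names, and column names below are invented for illustration; they are not part of the NPI data.

import pyodbc

# Connect to the "main" Access database (path is an assumption).
conn = pyodbc.connect(
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\data\npi_main.accdb;"
)

# Join a local table against a table stored in a second .accdb file.
# Table and column names are placeholders for illustration only.
sql = """
SELECT a.NPI, a.ProviderName, b.TaxonomyCode
FROM Providers AS a
INNER JOIN [;DATABASE=C:\\data\\npi_part2.accdb].Taxonomies AS b
        ON a.NPI = b.NPI
"""

for row in conn.cursor().execute(sql):
    print(row)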

#3 KenPC (Topic Starter, Member, 43 posts)

Thanks for the import-to-Excel suggestion. I was able to import the first 65,534 rows fine (resulting in an .elsx file of about 19.6 MB ... a far cry from the total CSV file size of 3.5 GB), but when I went to import the rest of the data, telling the import wizard to start at row 65,535 and put it into a different Excel workbook (not a different tab in the first workbook), it only imported 4 rows. Any suggestions as to what I could do next? Thanks
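
For reference, the "start at row 65,535" step can also be done outside the import wizard. A minimal sketch in Python with pandas, assuming the extracted CSV is named npidata.csv (the real filename differs): read the header once, skip the rows already imported, and write the next block to its own workbook.

import pandas as pd

# Grab the column names from the header row only.
cols = pd.read_csv("npidata.csv", nrows=0).columns

# Skip the header line plus the 65,534 rows already imported, then read the
# next 65,534 data rows. (Assumes no quoted fields with embedded newlines
# before that point, since skiprows counts physical lines.)
part2 = pd.read_csv(
    "npidata.csv",
    header=None,
    names=cols,
    skiprows=65_535,
    nrows=65_534,
    dtype=str,
)

# Write the block to a second workbook (needs openpyxl installed).
part2.to_excel("npidata_part2.xlsx", index=False)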

#4 dm27 (Member, 272 posts)

Hello KenPC,

Glad you made some progress :)

Some additional thoughts...

Can you tell me which version of Excel you are using? I ask because Excel 2003 has a limit of 65,536 rows by 256 columns per worksheet.

Did the first 65k rows all contain some data? How many columns were imported? In other words, are there any blank rows? If there are no blank rows, then it's possible there are some empty values in the CSV file and Excel is only seeing those next four rows of data.

Also, I suppose you meant to say a .xls file, instead of .elsx?

Sincerely,

dm

#5 KenPC (Topic Starter, Member, 43 posts)

Hi ... Thanks again for the reply

I am using Excel 2007. Based on a quick scan of the first set of imported data, it seems that each of the 65,534 rows has data in it, and the data appears to be in the correct columns (A to IV, which I think is 256 columns if I did the math correctly). The second import did 2 rows (256 columns) and then the first 5 columns of row 3. While I am not positive about what is in the CSV file (it came from the publicly available Department of Health and Human Services website), the file is 3.5 GB in size and other info on the HHS website suggests it should have about 250,000 records in it. When I attempt the second import, telling the import wizard to start at record 65,535, it shows the dialog saying (paraphrased) that the text file contains more data than will fit on a single worksheet and that I should do additional imports to pull in the rest, excluding data already imported. So I am thinking there is still more data, but it is choking on the second import.

By the way ... from what I have seen on the HHS website, there should actually be 314 total columns, so some of the columns are also getting dropped.

Is there some way to look at the CSV file to determine what is in it (row count)?

Thanks .... Ken
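
On the row-count question: the file is too big for Excel, but a few lines of Python can stream it and count the records without loading it all into memory. A sketch, with the filename assumed; using the csv module instead of counting raw lines keeps quoted fields with embedded newlines from throwing the count off.

import csv

with open("npidata.csv", newline="", encoding="utf-8", errors="replace") as f:
    reader = csv.reader(f)
    header = next(reader)           # first record holds the column names
    rows = sum(1 for _ in reader)   # every remaining record is a data row

print(f"{len(header)} columns, {rows:,} data rows")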

#6 dm27 (Member, 272 posts)

Interesting ... I almost asked what the source of the CSV file was, but I figured it was private or confidential business data.

I wouldn't mind going to the HHS website and trying it myself if you want to share the exact report you are saving.


"is there some way to look at the CSV file to determine what is in it (row count)?"

Not sure at the moment regarding the above.

Best regards,

dm

#7 KenPC (Topic Starter, Member, 43 posts)

hi ... thanks for the follow-up. The website is http://nppesdata.cms..._NPI_files.html and the file is the only download on the page (NPI Files). It is in ZIP format (347 MB), which when unzipped turns into the 3.5 GB file plus a couple of other small files.

#8 dm27 (Member, 272 posts)

Hello KenPC,

Now it's my turn to have problems ... I've downloaded that ZIP file twice and extracted it with two different ZIP utilities, and I still cannot get Excel's Import Wizard to even show that the 3.5 GB file has any data in it.

However, I was able to confirm that there should be 314 columns, using the second Excel file included in the ZIP (columns A through LB).

Of course, I only have Excel 2003 and the Excel 2010 Beta to work with, and that may be the issue.

[Edit: I found a CSV Splitter utility that successfully broke the large file up.] The resulting files should be small enough to import directly into Access tables (though you might still need to split the table across a couple of Access files to avoid the 2 GB limit).

I will look again at this tomorrow to see how the data was split, but all 314 columns are there, using the Excel 2010 Beta.

Best regards,

dm
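
For reference, a home-grown splitter along the same lines as the utility dm27 found would look roughly like this in Python: copy the header row into every output piece and write a fixed number of data rows per piece. The filenames and the 100,000-rows-per-piece figure are assumptions, not details of the actual tool.

import csv

ROWS_PER_PIECE = 100_000   # small enough for an Excel 2007 sheet or an Access table

with open("npidata.csv", newline="", encoding="utf-8", errors="replace") as src:
    reader = csv.reader(src)
    header = next(reader)
    out, writer, piece, count = None, None, 0, ROWS_PER_PIECE
    for row in reader:
        if count >= ROWS_PER_PIECE:          # start a new output piece
            if out:
                out.close()
            piece += 1
            out = open(f"npidata_{piece:02d}.csv", "w", newline="", encoding="utf-8")
            writer = csv.writer(out)
            writer.writerow(header)          # every piece keeps the header row
            count = 0
        writer.writerow(row)
        count += 1
    if out:
        out.close()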

#9 KenPC (Topic Starter, Member, 43 posts)

hi dm27:
Thanks for the update ... If the CSV file splitter works, that may solve everything. From what I can determine, most of the columns (roughly 50-300) are sort of "uninteresting", so if I can split the file, import the pieces into Excel, and then delete columns 50-300, perhaps I can glue the remaining stuff back together into one Excel file or Access database. Ken
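
A sketch of that plan in a single pass, using Python with pandas: stream the big CSV in chunks, keep only the leading columns, and append everything to one much smaller CSV that Excel or Access can handle. The filename and the "first 50 columns" cutoff are assumptions based on Ken's rough description, not the documented NPI layout.

import pandas as pd

SOURCE = "npidata.csv"     # assumed filename
KEEP = list(range(50))     # keep only the first 50 columns (rough guess)

first = True
for chunk in pd.read_csv(SOURCE, usecols=KEEP, dtype=str, chunksize=100_000):
    # Write the header with the first chunk, then append without it.
    chunk.to_csv(
        "npidata_trimmed.csv",
        mode="w" if first else "a",
        header=first,
        index=False,
    )
    first = False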

#10 dm27 (Member, 272 posts)

Hello KenPC,

I have to say the CSV Splitter utility works great. However, I've split the file across ten files of 100,000 rows each, and I'm still not sure the entire file has been split.

Given that the original file is 3.5 GB and those 10 split files total only 1.1 GB, I suspect there are far more than 240,000 records in that file. Not sure if anyone at that website could confirm the row count?

So I can import the split files into Excel 2010 and see all the columns, but it looks like you might need to experiment with the Splitter tool to see exactly how many rows are in the file. The other possibility is that the tool is duplicating records, but I could not tell that was the case from my quick reviews. Also, the number of columns would need to be reduced to 255 or fewer if you eventually want to import into Access.

Hope this helps and I've enjoyed working through your issue.

Sincerely,

dm

#11 dmarshall05 (New Member, 1 post)

Hi DM27 & KenPC,

I did a quick search on the Access column maximum for 2003: it is limited to 255 columns, while the NPI record layout has 314 fields (facts I think you have already identified).

My plan is to bring this into SQL Server 2005. I'm sure I'll have a whole host of other issues when I try to do this, but I wanted to see what your analysis of the data determined. I heard there are 6 million records in the .CSV. I've downloaded it as a .txt file and will try to open it, since Excel cannot open a file that large.

The reason I ask about the data is that it looks like they have the following fields repeated 15 times:

Healthcare Provider Taxonomy Code_1
Provider License Number_1
Provider License Number State Code_1
Healthcare Provider Primary Taxonomy Switch_1

Then the following fields repeated 50 times:
Other Provider Identifier Type Code_1
Other Provider Identifier State_1
Other Provider Identifier Issuer_1

Incredibly bad database design. I wanted to know what you were able to figure out with the data, and what your final use of the data is. I'm going to use the database for healthcare ID lookup in a website from an outside vendor (Concur).
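
For what it's worth, the repeated groups dmarshall05 lists are easier to work with once they are unpivoted to one row per provider per slot. A sketch with Python/pandas, run against one of the split pieces rather than the whole 3.5 GB file; the piece's filename and the use of "NPI" as the key column are assumptions about the layout, while the repeated column names are taken from the list above.

import pandas as pd

REPEATED = (
    "Healthcare Provider Taxonomy Code_",
    "Provider License Number_",
    "Provider License Number State Code_",
    "Healthcare Provider Primary Taxonomy Switch_",
)

# Load one split piece and only the key column plus the repeated group.
df = pd.read_csv(
    "npidata_01.csv",
    dtype=str,
    usecols=lambda c: c == "NPI" or c.startswith(REPEATED),
)

# Unpivot "..._1" through "..._15" into one row per provider per slot.
taxonomy = (
    pd.wide_to_long(
        df,
        stubnames=[name.rstrip("_") for name in REPEATED],
        i="NPI",
        j="slot",
        sep="_",
    )
    .reset_index()
    .dropna(subset=["Healthcare Provider Taxonomy Code"])
)

print(taxonomy.head())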

#12 dm27 (Member, 272 posts)

Hello dmarshall05,

I'm not sure if the OP managed to work with the data. I know I did not pursue this topic any further after my last post in April 2010.

Best of luck with your analysis...

Sincerely,

dm

#13 Luis C (New Member, 1 post)

Hello,
I happened to run into this conversation. I just started working on this same file to import it into Access, and the exact same problems are happening to me. DM27, were you able to do anything with this file?

#14 dm27 (Member, 272 posts)

Hello Luis C.,

Welcome to the forum! :happy:

To answer your question: No, I haven't tried anything else with this data since April 2010.

Sincerely,

DM

#15 jg166 (New Member, 1 post)

I got the CSV splitter to work; however, it split the NPPES file into 14 smaller files. Once I finally got into the file, I was disappointed to see that it only had the taxonomy code ... now I have to map it to the taxonomy description. Does anybody have that data mapping table already?
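
On the mapping question: the code-to-description lookup is published separately (NUCC distributes the taxonomy code list as a CSV). A sketch of the join in Python/pandas against one split NPPES piece; the lookup's filename and its "Code"/"Classification"/"Specialization" column names are assumptions about that download, so check the actual header before running it.

import pandas as pd

providers = pd.read_csv("npidata_01.csv", dtype=str)      # one split piece (assumed name)
taxonomy = pd.read_csv("nucc_taxonomy.csv", dtype=str)    # code lookup table (assumed name)

# Attach the description for the provider's first taxonomy slot.
merged = providers.merge(
    taxonomy[["Code", "Classification", "Specialization"]],
    left_on="Healthcare Provider Taxonomy Code_1",
    right_on="Code",
    how="left",
)

print(merged[["Healthcare Provider Taxonomy Code_1", "Classification"]].head())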