What is the fastest way to upload an Excel sheet with more than 100,000 rows using C#.NET?

Apr 1, 2011 at 7:28 PM

I want to upload an Excel sheet with more than 100,000 rows into a SQL Server database using C# code. The sheet contains 20 columns, and its data should be inserted into 1 master table and 4 sub-tables. I am using the OLE DB data provider: I read the sheet's records into 5 DataTables and then insert them into the tables.

But uploading the sheet takes too much time. Also, when I upload the sheet a second time, the data must be appended to the database, and the combination of the large insert and validating the new data against what is already in the database takes so long that the software stops responding completely.

If anyone knows a solution to this problem, please suggest it.

You can also give me reply on pankskolhe@yahoo.co.in

Thank You.

Apr 3, 2011 at 1:47 AM


An interesting problem, but I had a little difficulty knowing which way you wanted to go: DB to Excel, Excel to DB, or both. Regardless, here are a few things I think you might like to investigate and possibly experiment with to improve your application's speed.

If it were me using C# (and I use it consistently, although I love the groove of F#, so cool it is), then going from the Excel side to the database (and presuming SQL Server is local to your machine):

Option 1: Read the Excel data into a DataSet with its constituent tables, then follow the normal ADO.NET update process to the database.
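A minimal sketch of this option, with one substitution I would recommend at this volume: SqlBulkCopy in place of a row-by-row DataAdapter update, because it streams rows with minimal logging. The connection strings, file path, sheet name, and destination table name below are all placeholders you would change.

```csharp
using System.Data;
using System.Data.OleDb;
using System.Data.SqlClient;

class ExcelUploader
{
    static void Main()
    {
        // Placeholder connection strings and paths.
        const string excelConn =
            @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\upload.xlsx;" +
            "Extended Properties=\"Excel 12.0;HDR=YES\"";
        const string sqlConn =
            "Data Source=.;Initial Catalog=MyDb;Integrated Security=True";

        // 1. Pull the whole sheet into a DataTable via OLE DB.
        var table = new DataTable();
        using (var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", excelConn))
        {
            adapter.Fill(table);
        }

        // 2. Stream it into SQL Server. SqlBulkCopy is far faster than
        //    issuing 100,000 individual INSERT statements.
        using (var bulk = new SqlBulkCopy(sqlConn))
        {
            bulk.DestinationTableName = "dbo.MasterTable";
            bulk.BatchSize = 5000;      // commit in chunks to keep the log small
            bulk.BulkCopyTimeout = 600; // seconds
            bulk.WriteToServer(table);
        }
    }
}
```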

Option 2: If you were being really clever, multi-thread five DataSets with their associated tables and related ADO.NET commands, and let SQL Server use its pooling and multi-threading capability to update the required tables concurrently.
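Something like the following, assuming the five DataTables (`masterTable`, `subTable1` through `subTable4`, names mine) have already been filled from Excel, and noting that if the sub-tables carry foreign keys to the master, the master must land first:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

// masterTable, subTable1..subTable4 and sqlConnString are assumed to exist,
// filled from the Excel sheet as in your current code.
var work = new Dictionary<string, DataTable>
{
    { "dbo.SubTable1", subTable1 },
    { "dbo.SubTable2", subTable2 },
    { "dbo.SubTable3", subTable3 },
    { "dbo.SubTable4", subTable4 },
};

// Load the master first if the sub-tables reference it...
using (var bulk = new SqlBulkCopy(sqlConnString))
{
    bulk.DestinationTableName = "dbo.MasterTable";
    bulk.WriteToServer(masterTable);
}

// ...then the four sub-tables concurrently, one connection per thread;
// SQL Server's connection pooling handles the rest.
Parallel.ForEach(work, pair =>
{
    using (var bulk = new SqlBulkCopy(sqlConnString))
    {
        bulk.DestinationTableName = pair.Key;
        bulk.WriteToServer(pair.Value);
    }
});
```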

Option 3: Why bother uploading the lot? Just monitor change events on the Excel cells and upload those small changes to the database in real time. (If you use a local database as a proxy for your Excel changes, you can then use SQL Server's push-pull replication to a remote SQL Server. Just an idea, because it works really well in real time.)
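The event-monitoring side could be sketched with the Excel COM interop's SheetChange event. `UpdateCellInDb` below is a hypothetical helper that writes one changed cell to the database; the file path is a placeholder.

```csharp
using Excel = Microsoft.Office.Interop.Excel;

// Requires a COM reference to the Excel interop assembly.
var app = new Excel.Application { Visible = true };
Excel.Workbook book = app.Workbooks.Open(@"C:\data\upload.xlsx");

// Fires whenever any cell on any sheet in the workbook is edited.
book.SheetChange += (sheetObj, target) =>
{
    var sheet = (Excel.Worksheet)sheetObj;
    foreach (Excel.Range cell in target.Cells)
    {
        // Hypothetical helper: push just this cell to the database.
        UpdateCellInDb(sheet.Name, cell.Row, cell.Column, cell.Value2);
    }
};
```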

Option 4: Load the Excel spreadsheet in its entirety into a DataSet in your C# application, and/or convert it to XML, then use LINQ or a stored-procedure update query against SQL Server.
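The XML route might look like this. The procedure name `dbo.ImportFromXml` and its `@payload` parameter are my assumptions; server-side, the procedure would shred the document with the xml type's nodes()/value() methods.

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

// dataSet holds the Excel rows, as filled elsewhere in the program.
string xml;
using (var writer = new StringWriter())
{
    dataSet.WriteXml(writer);   // serialize every table to one XML document
    xml = writer.ToString();
}

using (var conn = new SqlConnection(sqlConnString))
using (var cmd = new SqlCommand("dbo.ImportFromXml", conn)
       { CommandType = CommandType.StoredProcedure })
{
    // Pass the whole document in one round trip; the server does the inserts.
    cmd.Parameters.Add("@payload", SqlDbType.Xml).Value = xml;
    conn.Open();
    cmd.ExecuteNonQuery();
}
```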

Option 5: If you are not reporting from SQL Server directly and just want basic I/O, you could create classes to read and write table-associated files from your C# program, allowing you to use asynchronous I/O instead of standard synchronous I/O. Then allocate buffers of appropriate sizes and byte-stream directly into SQL Server, OR serialize via the BinaryFormatter or SoapFormatter class and either transfer into a DataSet or pass through to SQL Server.
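The asynchronous file I/O part, as a minimal sketch: opening the FileStream with useAsync set to true requests overlapped I/O, so the write can proceed while your thread does other work. `GetSerializedRows` is a hypothetical helper returning the serialized table data (e.g. BinaryFormatter output) as a byte array.

```csharp
using System;
using System.IO;

byte[] buffer = GetSerializedRows();   // hypothetical serialization step

// useAsync: true opens the handle for overlapped I/O, so BeginWrite/EndWrite
// does not block the calling thread while the bytes hit the disk.
using (var fs = new FileStream(@"C:\data\staging.bin", FileMode.Create,
    FileAccess.Write, FileShare.None, 64 * 1024, useAsync: true))
{
    IAsyncResult ar = fs.BeginWrite(buffer, 0, buffer.Length, null, null);
    // ... other work can proceed here while the write is in flight ...
    fs.EndWrite(ar);
}
```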

Well, as you see, there are numerous ways to attack the problem, and many more besides, but everything depends on whether you are using a local or a remote SQL Server. Local means asking: what processor are you using, i3 to i7 or AMD etc.? What average access speeds do you get from your hard drive? How much memory do you have, and are you running other applications at the same time? Do you have performance features turned off? The list is, as you know, endless. Oh, by the way, are you updating a remote SQL Server directly or via the web? And now I ask the question with respect: what are we really trying to do here, and would it be better to consider alternative solutions such as OLAP and MDX queries using ADOMD.NET? (Multidimensional technology would be more in keeping with Excel, and also allows drill-down through data mining.)

And now I get to the real set of answers for your problem, and these relate totally and only to SQL Server itself. Rather than explain each one, I hand them over in bullet form for you to investigate and, hopefully, use to improve the speed of your application.

You need to:

1. Minimise column widths by using data types appropriate to the work. I/O cost is lower on narrow tables!

2. Compress your data inside SQL Server to reduce I/O contention and storage. Consider both row and page compression, and sparse columns to optimise NULL-heavy fields.

3. Check your space allocation relating to pages and extents.

4. Consider what you are doing with indexing: tables can have clustered or non-clustered indexes. A clustered index orders the rows themselves, which can raise performance, but you need to look at your query plan to assess each method.

5. Optimise your queries and use the query plan to help you, especially if you are running SQL queries from the C# front end or, just as importantly, calling a stored procedure or trigger from the front end.

6. Regarding data types: Excel data is likely to be 80 to 90% numeric, so seriously consider the VARDECIMAL storage format.

7. If you are using stored procedures, run sp_configure and check the memory allocated to tempdb and the procedure cache (run DBCC PROCCACHE and DBCC MEMUSAGE; I think I have the right ones!). Even so, consider taking tempdb off the system drive so that it is not affected by SQL Server's other under-the-cover processes. All of this information is available through the SQL Server performance monitor, and you need to have a good look at that utility, or check with your local friendly DBA.

8. Other areas to check are: locks (row and page level, exclusive, lock escalation, etc.), sort pages, maximum worker threads, working set size, max async I/O, max lazywrite I/O, network packet size, priority boost, SMP concurrency (the number of threads SQL Server will allow to access data simultaneously), free buffers, and the buffer manager settings. These areas may seem very daunting, but they are worth looking into; even have a go at setting the SQL Server performance monitor counters yourself and running your program. The key one for me would be the locking-activity counters: do the analysis in that area first, in conjunction with the way your SQL queries are written. And I am presuming your code is optimised and not doing something strange like a runaway recursive function or getting stuck in a cyclic class process.
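Points 2 and 8 can even be tried directly from C#. A sketch, with the caveats that row/page compression requires SQL Server 2008 or later (Enterprise or Developer edition) and that the table name and connection string are placeholders:

```csharp
using System;
using System.Data.SqlClient;

using (var conn = new SqlConnection(sqlConnString))
{
    conn.Open();

    // Point 2: rebuild the table with row compression.
    new SqlCommand(
        "ALTER TABLE dbo.MasterTable REBUILD WITH (DATA_COMPRESSION = ROW)",
        conn).ExecuteNonQuery();

    // Point 8: a quick look at current lock activity via a dynamic
    // management view, grouped by resource type and lock mode.
    var locks = new SqlCommand(
        "SELECT resource_type, request_mode, COUNT(*) " +
        "FROM sys.dm_tran_locks GROUP BY resource_type, request_mode", conn);
    using (SqlDataReader reader = locks.ExecuteReader())
    {
        while (reader.Read())
        {
            Console.WriteLine("{0,-12} {1,-6} {2}",
                reader.GetString(0), reader.GetString(1), reader.GetInt32(2));
        }
    }
}
```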

I am not going into transaction management or middleware (a.k.a. SQL Server Service Broker or Integration Services). These approaches could possibly help, but they need careful design, can be resource-hungry if not correctly built, and require your computer and any participating network hardware and software to be set up accordingly. Your I/O problem would probably increase: a great solution, but the wrong way. Mind you, I did once come across a team of developers who deliberately slowed their software to make their customers feel it was doing something important. Pretty unprofessional, I thought!

As a final note, I do not think you have a problem with your front end per se, but that is only a guess. Using Visual Studio you have the option to build a project from the Excel template or to code directly in C# with bound Excel data fields, and when creating an Excel project in VS you can open an existing Excel spreadsheet. So I don't think the delay is there, unless your machine is underperforming. But I do think you may be running into PAGE or ROW locking issues on SQL Server, cache problems, and async I/O issues. To be frank, it may be better to drop the tables on SQL Server, recreate them empty, and then fill them from your C# program each time it is run. Anyway, there is a lot here to consider, which I understand. Quick and dirty is good sometimes, but achieving performance takes time, effort, analysis, and more effort. For example, a C# program using deliberate multi-threading will perform differently on different processors and motherboard configurations. Try iterating up to 100,000,000 writing out Hello World, and you will see what I mean. Dead slow, these computers; give me a quantum or optical-crystal machine any day.

I think it would be a good idea to simply output your Excel data from your C# program to a plain text file, just to get some idea of its speed against SQL Server (get milliseconds via the DateTime/TimeSpan classes if necessary), as a sort of control if you like. Also try your program on a few other machines, if possible, to check processing speeds. One could be forgiven for thinking that simply by connecting a C# application to SQL Server everything should run at the speed of light, but reality shows that on many occasions it does not, and database-driven web applications are prime culprits when 2,000 users are reading and updating at the same time. This is the best I can do for you from whence I am: only some key suggestions, I'm afraid. Wish I could do more, but there you go.
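That control run could be as simple as this, where `table` is the DataTable already filled from Excel and the output path is a placeholder (the Stopwatch class gives you the milliseconds directly):

```csharp
using System;
using System.Data;
using System.Diagnostics;
using System.IO;

// Write the same rows to a flat file and time it, as a baseline to
// compare against the SQL Server upload.
var sw = Stopwatch.StartNew();
using (var writer = new StreamWriter(@"C:\data\control.txt"))
{
    foreach (DataRow row in table.Rows)
    {
        writer.WriteLine(string.Join("\t", row.ItemArray));
    }
}
sw.Stop();
Console.WriteLine("Flat-file control run: {0} ms", sw.ElapsedMilliseconds);
```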

Farewell. Quen Wilson - IOW, UK

P.S. Don't forget to take a backup of all your work before applying any of the above suggestions, if you go that way. 

P.P.S. Friendly question: why are you not using the optimised SqlDataAdapter instead of the OLE DB one, if you are writing from your C# app to SQL Server? Seems rather strange, or is it an Excel thing?
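To make that concrete: the OLE DB provider is only needed for *reading* the Excel file; the write side should go through SqlClient. And for your second-run "append with validation" problem, a sketch (table, key, and column names are all placeholders): bulk-copy into an empty staging table, then let one set-based MERGE append only the new rows, which is far cheaper than validating 100,000 rows one by one from C#.

```csharp
using System.Data;
using System.Data.SqlClient;

// excelTable is the DataTable read from the sheet via OLE DB.
using (var conn = new SqlConnection(sqlConnString))
{
    conn.Open();

    // Start from an empty staging table each run.
    new SqlCommand("TRUNCATE TABLE dbo.Staging", conn).ExecuteNonQuery();

    // Fast path into the staging table.
    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "dbo.Staging";
        bulk.WriteToServer(excelTable);
    }

    // One server-side statement appends only the rows not already present.
    new SqlCommand(
        "MERGE dbo.MasterTable AS t " +
        "USING dbo.Staging AS s ON t.BusinessKey = s.BusinessKey " +
        "WHEN NOT MATCHED BY TARGET THEN " +
        "  INSERT (BusinessKey, Col1, Col2) VALUES (s.BusinessKey, s.Col1, s.Col2);",
        conn).ExecuteNonQuery();
}
```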