
Parallel Hell :(


Question

I will love to death whoever can give me a solution to this issue. I have been at it since 11 AM (it's now 9:11 PM my time) and have made little to no progress, despite scouring every website I can find. I have a foreach loop that pulls from a client, examines each item, and then makes DB changes. This has worked fine for quite some time because the lists have always been very small, which leads me to the issue; the problem itself is very simple.

 

The problem is that the bigger the list gets, the longer the process takes: if the list contains 8,500 complex objects, it takes an hour to complete. So I thought to myself, let's use Parallel.ForEach and cut that time down as much as I can. Thus, the torture began.

 

TL;DR:

I have a DataTable, which by nature is thread-safe for reads but not for writes. So I thought, no big deal, I'll just lock when I need to add a new row and I'll be set. The problem is, the only way I can get it to work is if I put the entire body of everything into one lock, which takes longer than the non-parallel version does, as so:

Parallel.ForEach(pGrpLst, kvp =>
{
    try
    {
        ProductGrp pGrp = kvp.Value;
        string baseSku = kvp.Value.BaseSku;
        lock (dbStore.MyTbl.Rows.SyncRoot)
        {
            var productDR = (from DataRow dr in dbStore.MyTbl.Rows
                             where dr["product_sku"] != DBNull.Value && (string)dr["product_sku"] == baseSku
                             select dr).FirstOrDefault();
            if (productDR == null)
            {
                productDR = dbStore.MyTbl.NewRow();
                dbStore.MyTbl.Rows.Add(productDR);
                productDR["product_sku"] = baseSku;
                productDR["product_class_id"] = 1;
            }
            productDR["product_name"] = pGrp.ProductName;
            productDR[5] = "Test";
        }
    }
    catch (Exception ex)
    {
        System.Diagnostics.Trace.TraceError(ex.ToString());
    }
});

It cut off the rest of my post :( but it's got to be because the whole lock statement is dragging down the performance. Now, if I use this method:

Parallel.ForEach(pGrpLst, kvp =>
{
    try
    {
        ProductGrp pGrp = kvp.Value;
        string baseSku = kvp.Value.BaseSku;

        var productDR = (from DataRow dr in dbStore.ProductsTbl.Rows
                         where dr["product_sku"] != DBNull.Value && (string)dr["product_sku"] == baseSku
                         select dr).FirstOrDefault();
        if (productDR == null)
        {
            lock (dbStore.ProductsTbl.Rows.SyncRoot)
            {
                productDR = dbStore.ProductsTbl.NewRow();
                newPrdDRLst.Add(productDR);
                productDR["product_sku"] = baseSku;
                productDR["product_class_id"] = 1;
            }
        }
        else
        {
            productDR["product_name"] = pGrp.ProductName;
            productDR[5] = "Test";
        }
    }
    catch (Exception ex)
    {
        System.Diagnostics.Trace.TraceError(ex.ToString());
    }
});

It will run in amazing time, about 0.455 seconds. But if I try to modify productDR, it fails hard. I've tried many different lock placements and many different objects to lock (a generic object, the data row's item array, the whole DataTable, etc.). Unless I put it all in one big lock body (which defeats the purpose), it fails. Surely there has to be some way to do this? I don't care what order anything gets processed or added in. I just want it done in the most efficient way possible.


Recommended Posts


Since Select counts as a write operation, I'm not sure you can actually parallelize this at all. Then again, to solve your performance issue you first need to figure out where your bottleneck is. If DB access dominates, then parallelism won't help you much.

 

If it's the actual processing of the data that dominates and the DB isn't thread-safe, then perhaps the easiest way would be to have one thread perform all the DB access (ensuring it is sequential) and multiple threads do the processing without ever touching the DB. You could basically have two blocking queues (see BlockingCollection, ConcurrentQueue): one for work items, one for results. One thread is responsible for reading from and writing to the DB; it produces work items (in-memory copies of DB data) and consumes results (writing them back to the DB). Other threads (ideally just Tasks running on the thread pool) consume work items and produce results.

 

Glorious drawing to illustrate:

[Diagram: one DB thread feeding a work-item queue into several worker tasks, which feed a results queue back to the DB thread]
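A minimal sketch of that two-queue layout (my sketch, not code from this thread): WorkItem and Result are hypothetical placeholders standing in for in-memory copies of your row data, and the read side of the DB thread is split into its own task here just to keep the flow simple. Only the final loop would ever touch the DataTable.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var work = new BlockingCollection<WorkItem>(boundedCapacity: 100);
var results = new BlockingCollection<Result>(boundedCapacity: 100);

// Worker tasks: consume work items, produce results; they never touch the DB.
var workers = Enumerable.Range(0, Environment.ProcessorCount)
    .Select(_ => Task.Run(() =>
    {
        foreach (var item in work.GetConsumingEnumerable())
            results.Add(new Result(item.Sku, "processed " + item.Sku));
    }))
    .ToArray();

// Read side of the DB thread: reads rows and enqueues in-memory copies.
var producer = Task.Run(() =>
{
    for (int n = 0; n < 1000; n++)
        work.Add(new WorkItem("SKU" + n));
    work.CompleteAdding();
});

// Once every worker has drained the work queue, no more results can arrive.
_ = Task.WhenAll(workers).ContinueWith(_ => results.CompleteAdding());

// Write side of the DB thread: the one and only place DB writes would happen.
foreach (var r in results.GetConsumingEnumerable())
{
    // sequential DB write goes here
}

// Hypothetical placeholders for in-memory copies of row data.
record WorkItem(string Sku);
record Result(string Sku, string ProductName);
```

The bounded capacities keep memory in check by making producers block when a queue is full; GetConsumingEnumerable handles the blocking and the "no more items" signal for you.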

 

Still, don't go implementing any of that before having identified that it's actually the processing that's your bottleneck.

I really appreciate all of your awesome constructive feedback. I have now implemented the design from the post prior to this one, and it cut the response time of the same bulk load from 3 minutes to 26 seconds, just from making the main loop parallel. So it's moving in the direction I want to go in! :)

 

Performance reports on the dummy project (which to a point is apples and oranges, since there is a lot of other non-database as well as database-related processing) show that the function doing the most individual work is DataRow.get_Item(), which comes from

var productDR = (from DataRow dr in dbStore.ProductsTbl.Rows
                 where dr["product_sku"] != DBNull.Value && (string)dr["product_sku"] == baseSku
                 select dr).FirstOrDefault();

 

45.0% of inclusive samples for the whole procedure go to that call now. To a point, maybe there isn't much that can be done with my current setup to make it more optimized, but either way your advice has been awesome, and everyone else's comments have been as well. I hadn't had to use a parallel framework before; I've used different threads and the async patterns, but they have all been for independent work up to this point. Neowin.net to the rescue :)

 

EDIT: When I say I don't know if there is anything else that can be done to optimize that call, I wasn't discounting your recent post; I just meant that specific line. There is an AsParallel() extension on generic enumerations, but I will have to research whether there are any pitfalls to running AsParallel() inside an already-parallel process. That sounds like it could introduce all kinds of issues.
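Since that lookup is a linear scan over every row and runs once per item, one way to attack that 45% directly (a sketch of my own, not something from this thread, reusing the dbStore.ProductsTbl and product_sku names from the posts above) is to pay the scan cost once by building an index before the loop, or to let the DataTable maintain a primary-key index itself:

```csharp
using System.Data;
using System.Linq;

// Fragment: assumes dbStore.ProductsTbl from the posts above.

// Option 1: build a one-time Dictionary index, so each lookup is O(1).
// Assumes product_sku values are unique; ToDictionary throws on duplicates.
var bySku = dbStore.ProductsTbl.Rows.Cast<DataRow>()
    .Where(dr => dr["product_sku"] != DBNull.Value)
    .ToDictionary(dr => (string)dr["product_sku"]);

// Inside the loop. Dictionary reads are safe from multiple threads only
// as long as nothing adds to it concurrently; new rows still need coordination.
bySku.TryGetValue(baseSku, out DataRow productDR);

// Option 2: let the DataTable keep the index for you and use the
// supported fast path, DataRowCollection.Find.
dbStore.ProductsTbl.PrimaryKey =
    new[] { dbStore.ProductsTbl.Columns["product_sku"] };
var row = dbStore.ProductsTbl.Rows.Find(baseSku);   // indexed lookup
```

Either way, the per-item cost drops from scanning all rows to a single hash or index probe, which is usually a bigger win than parallelizing the scan.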



Performance reports on the dummy project (which to a point is apples and oranges, since there is a lot of other non-database as well as database-related processing) show that the function doing the most individual work is DataRow.get_Item(), which comes from

var productDR = (from DataRow dr in dbStore.ProductsTbl.Rows
                 where dr["product_sku"] != DBNull.Value && (string)dr["product_sku"] == baseSku
                 select dr).FirstOrDefault();

 

45.0% of inclusive samples for the whole procedure go to that call now.

I see that you're making that call twice in that line. Perhaps LINQ already optimizes the redundancy away for you, but I wouldn't assume that. You could write:

var productDR = (from DataRow dr in dbStore.ProductsTbl.Rows
                 let sku = dr["product_sku"]
                 where sku != DBNull.Value && (string)sku == baseSku
                 select dr).FirstOrDefault();
or more simply:

var productDR = dbStore.ProductsTbl.Rows.Cast<DataRow>().FirstOrDefault(dr => {
                    var sku = dr["product_sku"];
                    return sku != DBNull.Value && (string)sku == baseSku;
                });


I see that you're making that call twice in that line. Perhaps LINQ already optimizes the redundancy away for you, but I wouldn't assume that. You could write:

var productDR = (from DataRow dr in dbStore.ProductsTbl.Rows
                 let sku = dr["product_sku"]
                 where sku != DBNull.Value && (string)sku == baseSku
                 select dr).FirstOrDefault();
or more simply:

var productDR = dbStore.ProductsTbl.Rows.Cast<DataRow>().FirstOrDefault(dr => {
                    var sku = dr["product_sku"];
                    return sku != DBNull.Value && (string)sku == baseSku;
                });
Wouldn't (sku as string) == baseSku be more efficient?


Wouldn't (sku as string) == baseSku be more efficient?

Not at all in this case. If sku ever isn't a string, using "as" will silently continue execution, whereas a regular cast will interrupt execution immediately. It's not really a matter of performance but of what you want the program to do in that situation. Is it normal for this cast to fail? If yes, use "as" and carry on normally. If no, use a regular cast so you are forced to revise your assumptions in case of an unexpected failure. It's way better to get an InvalidCastException at the point where your assumption is violated than a NullReferenceException at some arbitrary point down the line. Consider:

// GOOD: expects the cast to succeed
var t = (T)a;
t.UseT();

// BAD: expects the cast to succeed, but will fail with NullReferenceException
//      instead of InvalidCastException if the expectation is wrong,
//      obfuscating the error
var t = a as T;
t.UseT();

// GOOD: expects the cast not to succeed sometimes
var t = a as T;
if (t != null) t.UseT();

// BAD: expects the cast not to succeed sometimes,
//      but pays the cost of an exception instead of using "as"
try {
    var t = (T)a;
    t.UseT();
}
catch (InvalidCastException) {
}

// BAD: variation on the above. Expects the cast not to succeed some of the time,
//      but pays the cost of two typechecks instead of one with "as"
if (a is T)
{
    var t = (T)a;
    t.UseT();
}
So the only case where it's a performance issue is if you expect failure but use a regular cast with try-catch or if (...is...) to deal with it. If you don't expect failure, a regular cast is what you want.
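As a side note not raised in the thread: on C# 7 or later, a type pattern performs the test and the cast in a single check, covering the "expect occasional failure" case without the double typecheck (same a/T/UseT placeholders as above):

```csharp
// GOOD: expects the cast not to succeed sometimes;
//       single typecheck and cast in one step (C# 7+ pattern matching)
if (a is T t)
    t.UseT();
```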


I would like to take the time to personally thank Andre for his awesome contributions and for his work helping people like myself and many others throughout the forum; I really appreciate it.

The pleasure is all mine  :D


This topic is now closed to further replies.