CPU bound - Context Switching - Slow thread processing

etl2016

hi,

My environment is as follows: a virtual machine with 16 virtual processors and 64 GB of memory. I have a small test CSV file with one hundred and sixty thousand rows. I am spinning up 16 threads, one thread per CPU, so each thread gets a workload of ten thousand rows. Each thread converts its ten thousand rows into an equivalent in-memory DataTable. Up to this stage all the processing happens in a few seconds (within 7-10 seconds of program start I have a collection of 16 in-memory DataTables, each holding ten thousand rows). Disk IO ends at this stage.

From here on, the processing is CPU bound. Each of the 16 threads loops through its ten-thousand-row in-memory DataTable and connects to the database to update one of the columns in its DataTable. The observed throughput is 90 minutes for those ten thousand rows, which is roughly 100 rows per minute. Each of the 16 threads takes almost exactly the same amount of time, and within that common 90 minutes all 16 threads complete their respective workloads of ten thousand rows. This looks like very, very low throughput.

A closer look at the database turnaround times confirms they are very fast: the database responds within a few milliseconds/microseconds. There is no disk IO inside the loop; as soon as the value is looked up in the database, the in-memory DataTable is updated, and each of these instructions is itself extremely fast (from the times logged). However, between successive instructions within the loop there is a short delay, probably due to context/thread switching.

Question 1: On hardware with 16 virtual processors and 64 GB of memory, with the program spinning up threads at a 1:1 ratio (and almost zero other user programs running in the VM), why is context switching happening at all?

Question 2: Can TPL-style asynchronous programming help here? I believe it is of little help (please correct me otherwise), because there is nothing the program has to do while waiting for the response from the database. There are hardly 3 or 4 instructions inside the for-loop that processes the ten-thousand-row in-memory DataTable, and each of those instructions depends on the completion of the previous one; there is nothing an instruction X can do out of sync while its predecessor X-1 is still being processed. These 3 or 4 instructions in the for-loop have to run sequentially. X, X+1, X+2 each complete extremely fast, in milli/microseconds. The root cause seems to be the short delay between some X and X+1, giving an overall throughput of just over 1 row per second.

Could you please share your views?

thank you
 
The fact that you have to hit the database for each row indicates that you are still IO bound. Even worse, if the database is on a network, that IO is subject to network latency. So the actual database operation may be very fast, but the time it takes to send the query/command to the database and have the response come back is time you could be spending running more CPU instructions, instead of yielding the thread and waiting for the response.
 
As an aside, is there a chance that you have accidentally left some Console.WriteLine()'s in your code?
 
I'll ask the obvious question: What does your profiler tell you about where the time is being spent?
 
"..... Even worse, if the database is on a network, that IO is subject to the network latency ....." Agreed. The Database is cloud based, but is located in the same geographic region as my processing component. So, the positioning of the client-server is not very obvious as it would have been on-premise, other than knowing they belong to same region.

".....As an aside, is there a chance that you have accidentally left some Console.WriteLine()'s in your code? ...." that's true. In each Console.Writeline am capturing thread number from Managed-ThreadID and also, the system time. This could be adding to the cost. This is to monitor response times. I will remove these to know how long the actual non-logging times are. Will explore alternate options to monitor turn-around times.

"..... What does your profiler tell you about where the time is being spent? ...." am going to use this approach as an alternate option to Console.Writeline, haven't used this feature yet, any pointers are much appreciated, thanks.
 
I don't think hardware or network latency is the issue... Inserting 160K rows into an Azure database for example should not take more than a few minutes...

If your goal is to import the CSV into a database, you could connect to your database with Entity Framework, import your rows into your local object set, and let EF manage the batch insert... The other solution is to dynamically build one (or more) giant batch SQL queries that create those 160K rows using VALUES... SQL statements are limited to 10 MB by default, so you may have to split the final query into, say, 10K-row batches. That leaves you 1 KB of query text for each row, and only 16 queries to actually run. And you can then parallelize those and get full throughput from the server.

The one thing you should definitely NOT be trying to do is to foreach through it one query at a time...

In particular, if you are doing any string concatenation in there, it will really bog down the whole process...
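
To illustrate the shape of it, something along these lines (a rough, untested sketch; the table name MyTable, the columns KeyCol/ValueCol and the method signature are made up, and real code should use parameters instead of inlining values):

C#:
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Text;

static void InsertBatch(SqlConnection connection, IList<KeyValuePair<string, string>> batchRows)
{
    // Build one multi-row INSERT with a StringBuilder rather than
    // concatenating strings in a loop, which gets quadratically slower.
    var sb = new StringBuilder("INSERT INTO MyTable (KeyCol, ValueCol) VALUES ");
    for (int i = 0; i < batchRows.Count; i++)
    {
        if (i > 0) sb.Append(',');
        // Illustration only: production code should parameterize or escape these.
        sb.AppendFormat("('{0}','{1}')", batchRows[i].Key, batchRows[i].Value);
    }

    using (var cmd = new SqlCommand(sb.ToString(), connection))
    {
        cmd.ExecuteNonQuery();   // one round trip per 10K-row batch
    }
}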

It would help us help you if you would post your code.
 
Thank you Herman.

I examined the runtime behaviour with the Performance Profiler under VS2017. "......... Inserting 160K rows into an Azure database for example should not take more than a few minutes.. ....." Not all of these 160K rows (nor the entire row) are inserted into the target DB. My database is not traditional Azure SQL; it is NoSQL Azure Cache for Redis, a {key, value} store with limited features of standard SQL. The in-memory DataTable with ten thousand rows is looped through row by row and a particular column is looked up in Redis: a) the value is retrieved if the key exists; if it doesn't exist yet, b.1) a new value is constructed corresponding to the key being looked up, b.2) it is populated into Redis as a new {key, value} pair, and b.3) the in-memory DataTable is updated with the newly constructed value.

From performance profiling, it is observed that a) takes less than 1 ms, b.1) takes less than 1 ms, b.2) takes the longest at roughly 400 ms, and b.3) takes 20 ms. At roughly 420 ms per row, ten thousand rows works out to about 70 minutes from the Redis round trips alone. The net effect is a throughput of just over 1 and under 2 rows per second, or roughly 100 rows per minute, which means a DataTable with ten thousand rows takes around an hour and a half to complete its Get/Set cycle against Redis and update itself with the corresponding value from the {key, value} pair.

My .NET code and Azure Cache for Redis are in the same geographic region, so the network may not be the root cause.

thank you
 
Like I said, it would help to see the code... I don't know about Redis specifically, but from what I am reading it's a simple in-memory cache. It seems to support batch processing, have a look here for a library that does this, maybe this will help.
 
thank you, my code is as below:

foreach (DataRow row in ten-thousand-row-DataTable.Rows)   // each row is a 2-column temporary structure, {key, value=null}, before the loop starts
{
    1)   var value = retrieve key's value from Redis;
    2)   if (retrieved value is null)   // key doesn't exist in Redis yet
    2.1)     value = construct a new value corresponding to the key being processed
    2.2)     Set the newly constructed value to establish the new {key, value} pair in the Redis database
    2.3)     Update the same newly constructed value in the ten-thousand-row-DataTable, against the key
    3)   else   // Redis already had this {key, value} pair
             Update the value for the key column in the ten-thousand-row-DataTable, using the value fetched from Redis
}

Here, 2.2), which sets the newly constructed value into Redis to populate the new {key, value} pair, is the longest-running instruction, taking nearly 400 ms.

As prototyped in Redis for .NET Developers – Redis Pipeline Batching | Taswar Bhatti, I am trying to construct a batch, which is technically a .NET list of ten thousand StringGetAsync(key) instructions, and once the batch is constructed, I want to execute it.

Here I am trying to learn how to write the .NET TPL async code so as to associate Request A with Response A. If Redis holds 3 pairs {key1, value1}, {key2, value2} and {key3, value3}, then when a batch of "get me the values of key1, key2, key3" is run and Redis responds "here you go, your 3 values: value3, value1, value2", how do I work out that value1 corresponds to key1? What commonality ties a request and its response together, especially in a foreach loop with ten thousand requests? Any sample code is much appreciated (.NET FW 4.6), thank you.
 
I honestly can't believe that in this day and age people need to be shown how to post code and use code tags. This is how you post code, using the code tags feature provided by the forum's editor:

(animated GIF demonstrating the forum's code-tags feature)


What you enclosed is not code; it's pseudo code in the worst form. It's quite simple: either you post your actual code, or there is no reason to continue with this topic.

We don't need an explanation of what your code is doing; that is what we will decide, and it's also why you are here. However, it might be more helpful to tell us what you want your code to do instead.

And can you kindly stop quoting like this: "other person said something". Use the quote button if quoting is absolutely necessary, and then quote only the specific line(s) relevant to your reply. Otherwise it is difficult to tell apart what is a quote and what is text I'd prefer not to read twice.

Thanks
 
thanks Herman, Sheepings. The code is as follows:

C#:
// Queue one GET per row on a StackExchange.Redis batch; nothing is sent
// to the server until Execute() is called.
IBatch batch = redis.CreateBatch();

var ListofGetTasks = new List<Task<RedisValue>>();

foreach (DataRow row in in_memory_Datatable.Rows)
{
    var readTask = batch.StringGetAsync((string)row["key"]);
    ListofGetTasks.Add(readTask);
}

batch.Execute();



At this point, I am trying to capture the responses from Redis corresponding to the requests, identify the association between each request and its response, and update the in-memory DataTable's value column for each corresponding key.

For example, if the in-memory DataTable held 3 rows as {key1, null}, {key2, null} and {key3, null} before receiving the responses from Redis, then at the end the expected behaviour is for the in-memory DataTable to contain {key1, value1}, {key2, value2} and {key3, value3}.

thank you
 
Ok, so from that code, we see that you are calling StringGetAsync (which I assume is a remote method) for each row of the in-memory table.

Since it's still missing a bunch of code, here's what I think you should be able to do. Get the entire dictionary in one go from Redis, using this. Keep this as your working set. Do your foreach, but instead of calling the service every time, just look up your local working set.
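
Something like this, perhaps (an untested sketch; I'm assuming StackExchange.Redis's multi-key StringGet, which maps to Redis MGET and returns the values in the same order as the keys, so the index ties each response to its request; db here stands for the IDatabase from your connection):

C#:
// Untested sketch: fetch all values in one round trip, then work locally.
RedisKey[] keys = in_memory_Datatable.Rows
    .Cast<DataRow>()
    .Select(r => (RedisKey)(string)r["key"])
    .ToArray();

RedisValue[] values = db.StringGet(keys);   // one network round trip (MGET)

for (int i = 0; i < keys.Length; i++)
{
    DataRow row = in_memory_Datatable.Rows[i];
    if (!values[i].IsNull)
        row["value"] = (string)values[i];
    // rows whose key was missing still need the construct-and-set step
}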
 
The pseudo code you posted in post #9 does not look like the snippet of code in post #12. Where are the DataTable updates you mentioned in posts #1 and #9? Where are the Redis database updates? Where are the Console.WriteLine()'s you mentioned in post #5?
 
thanks Skydiver.

The Console.WriteLine calls are now removed and the Profiler is being used for performance monitoring. Also, the approach is being upgraded from the earlier serial design to an asynchronous one, to address the network round-trip issue.

The update to Redis (sorry, better phrased as the addition of a new {key, value} pair), when done serially, was identified as the bottleneck, because of the request/response round trip from the .NET code through the intermediate StackExchange.Redis library to the Redis database itself. Updates never happen to Redis in this design; there are only Gets and Sets, and the Get operation is happening much faster than the Set operation.

The serial Get/Set is now being re-coded as a TPL/async approach, in which a batch of all ten thousand Get instructions is first constructed and then fired, as in the code posted earlier. As the next step, I am trying to figure out how to associate the ten thousand responses fed back by Redis with their respective requests, so that the {key, value} pairs are aligned correctly. The TPL code to achieve this association is being constructed; any inputs are welcome.
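
The shape I currently have in mind is something like the following (an untested sketch; since each StringGetAsync call returns its own Task<RedisValue>, keeping the row and its task together should preserve the request/response association without any extra correlation):

C#:
// Untested sketch: the Task returned by each StringGetAsync is itself the
// link between a request and its response.
var pairs = new List<Tuple<DataRow, Task<RedisValue>>>();

IBatch batch = redis.CreateBatch();
foreach (DataRow row in in_memory_Datatable.Rows)
{
    var task = batch.StringGetAsync((string)row["key"]);
    pairs.Add(Tuple.Create(row, task));
}
batch.Execute();                                    // sends all queued GETs

Task.WhenAll(pairs.Select(p => p.Item2)).Wait();    // block until all responses arrive

foreach (var pair in pairs)
{
    RedisValue value = pair.Item2.Result;           // already completed, no blocking
    if (!value.IsNull)
        pair.Item1["value"] = (string)value;        // value lands next to its own key
}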

Once the requests and responses are correctly associated, the in-memory DataTable update also remains to be done in a TPL/async manner; however, as measured with the earlier row-by-row synchronous approach, the in-memory DataTable operation took much less time per row (10 ms) than the synchronous Set operation (400 ms). So the focus is now localised on re-writing the Set approach into a pipeline/batch, TPL-async fashion first.
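
For the Set side, presumably the same batching pattern applies (an untested sketch; BuildValueFor is a hypothetical stand-in for the value-construction step, and only rows whose key was missing in Redis are written back):

C#:
// Untested sketch: queue one SET per newly constructed pair on a batch,
// so all writes go out together instead of one ~400 ms round trip per row.
IBatch setBatch = redis.CreateBatch();
var setTasks = new List<Task<bool>>();

foreach (DataRow row in in_memory_Datatable.Rows)
{
    if (row["value"] == DBNull.Value)   // key was missing in Redis
    {
        string newValue = BuildValueFor((string)row["key"]);   // hypothetical helper
        row["value"] = newValue;
        setTasks.Add(setBatch.StringSetAsync((string)row["key"], newValue));
    }
}

setBatch.Execute();
Task.WhenAll(setTasks).Wait();   // wait for all SET acknowledgements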

As an alternative to the TPL/async pipeline-batch mode of programming, I am considering whether Redis offers a feature to download all of its {key, value} pairs into the program's runtime environment, so that the network round-trip problem is avoided altogether.

thank you
 