Parallel.Foreach() - I get different types of warning

CodeForFun

I'm poor at explaining myself, so please bear with me. I'm trying to understand this, and if it's in the wrong section I do apologise.

So I have a code block like this:

C#:
public async Task DoSomeWork(int batch)
{
    // Load every job that is not finished and is waiting to run.
    IEnumerable<InputWork> work = await _db.InputWork
        .Where(x => !x.Completed && x.AwaitingExecution)
        .ToListAsync();

    foreach (InputWork item in work)
    {
        // do some work
        // just for the sake of it
        int mathA = 1;
        int mathB = 2;
        int result = mathA + mathB;
        await SomeHandler.HandleWorkResult(item, result);

        // Mark it as complete
        item.Completed = true;
        item.AwaitingExecution = false;
        _db.Entry(item).State = EntityState.Modified;
        await _db.SaveChangesAsync();
    }
}

In the actual code, processing takes about 45 seconds per job, and as the dataset has grown the whole run has taken longer and longer.

So I've been trying to run each job on its own thread, since the inputs, once loaded, do not depend on each other's output. When a job has done the heavy lifting, it calls other code to deal with the result, usually writing back to another table, or in some cases changing the work entry to mark it as failed. But every time I try to use Parallel.ForEach() I get different types of warning, usually about "eventual consistency" or that a second action has started before the first action completed.

And I'm stumped, so can someone explain this to me? It could just be that what I'm trying to do is wrong and I should find ways to make the processing itself faster (even though that would be a huge amount of work).

Thanks
 
Hmm, ditch this: foreach (InputWork item in work). If I am understanding what you wrote, it looks like you are looking for the Parallel.ForEach Method (System.Threading.Tasks), but the snippet you showed limits my ability to give you more information, including pseudocode. If you want to execute the code as a unit, then parallelism would help to speed up your process in such scenarios. Take a look at the docs I linked and then consider doing some refactoring, especially with IEnumerable<InputWork> work. If you're using an interface, be sure to use it to its fullest potential.
 
If you want us to help with warnings in code then we would need to see that code. Rather than post "I want to convert this, I tried, I failed, you do it for me", try posting "I want to convert this, this is what I tried, this is what happened when I tried, help me fix it". That's the sort of question you're most likely to get help with. Just show us your best attempt and we can go from there. It doesn't matter how bad it is. We all wrote rubbish code at times and probably still do.
 
processing takes about 45 seconds per job
Is the job encapsulated in this SomeHandler.HandleWorkResult();?
Or is it encapsulated in this DoSomeWork()?

Also, it looks like you are running at the bleeding edge of the language and the libraries. If so, taking advantage of IAsyncEnumerable and await foreach may be enough to get you the speed that you need.
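To make the await foreach suggestion concrete, here is a minimal sketch. The names StreamingSketch, LoadJobIdsAsync and ProcessAllAsync are made up for illustration, and the delay only simulates a database wait. With EF Core the streaming source could presumably be a query ending in AsAsyncEnumerable() rather than ToListAsync(). Note that await foreach still handles one item at a time, so it avoids loading the whole list up front but does not run jobs side by side.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class StreamingSketch
{
    // Hypothetical stand-in for a query that yields jobs as they arrive.
    public static async IAsyncEnumerable<int> LoadJobIdsAsync(int count)
    {
        for (int id = 1; id <= count; id++)
        {
            await Task.Delay(10); // simulate waiting on the database
            yield return id;
        }
    }

    // Consumes the stream one item at a time and returns how many were handled.
    public static async Task<int> ProcessAllAsync(int count)
    {
        int handled = 0;
        await foreach (int id in LoadJobIdsAsync(count))
        {
            // The real (expensive) work for one job would go here.
            handled++;
        }
        return handled;
    }
}
```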

Yes, parallelization will definitely get more done, assuming that you can efficiently distribute the operations across your threads and, if at all possible, not have the threads wait on each other to accumulate the results. Based on your description at the beginning of this thread, it sounds like that is exactly the case.

From what I can see from the sketch of your code in post #1, the only reason you need to come back to the primary thread that initiates it all is that you need the _db, which presumably is an Entity Framework DbContext and, I am guessing, has thread affinity like most everything else that Microsoft used to cook up. (I used to work for the Evil Empire long ago, and back then thread local storage was the shit.) What if you do the database updates on each thread that does each of the HandleWorkResult() calls, assuming that it is not expensive to spin up a context in that thread? (I don't use EF and I try to avoid it like COVID-19 and the plague, so I don't really know its performance and thread characteristics.)
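A rough sketch of that idea, for what it's worth. A DbContext instance is not thread-safe, so sharing one _db across Parallel.ForEach iterations is a likely source of the "second action started before the first completed" message. FakeContext below is a made-up stand-in, not a real EF type, and PerJobContextSketch is a hypothetical name; the point is only that each iteration creates and disposes its own context.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for an EF DbContext; not a real EF type.
public sealed class FakeContext : IDisposable
{
    private readonly ConcurrentBag<int> _store;
    public FakeContext(ConcurrentBag<int> store) => _store = store;
    public void SaveResult(int jobId) => _store.Add(jobId);
    public void Dispose() { }
}

public static class PerJobContextSketch
{
    // Runs every job in parallel and returns how many results were saved.
    public static int Run(IEnumerable<int> jobIds)
    {
        var saved = new ConcurrentBag<int>(); // thread-safe stand-in for the database
        var options = new ParallelOptions
        {
            // For CPU-bound work there is little gain beyond one job per core.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.ForEach(jobIds, options, jobId =>
        {
            // Each iteration owns its context, so no instance is shared between threads.
            using (var db = new FakeContext(saved))
            {
                // The expensive CPU work for this job would go here.
                db.SaveResult(jobId);
            }
        });

        return saved.Count;
    }
}
```

One caveat: the body passed to Parallel.ForEach is synchronous. Handing it an async lambda means the loop can return before the awaited work has finished, so this shape suits CPU-bound jobs rather than awaited database calls.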

And to show my Entity Framework ignorance, why do you have to await _db.SaveChangesAsync();? Why can't you just merrily go on to the next iteration of the loop? So what if you end up with multiple items pending changes to be saved? Isn't that what database engines and frameworks are supposed to handle well: atomic operations?
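On the await question, one likely answer is that an EF Core context rejects a new operation while an earlier one on the same instance is still running. The sketch below imitates that with a made-up SingleOperationContext class (not an EF type, and the exception text is only illustrative), so it shows the general behaviour rather than EF's actual implementation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in that refuses to start an operation while another is running.
public sealed class SingleOperationContext
{
    private int _busy; // 0 = idle, 1 = an operation is in flight

    public async Task SaveChangesAsync()
    {
        if (Interlocked.CompareExchange(ref _busy, 1, 0) != 0)
            throw new InvalidOperationException(
                "A second operation started before the first completed.");
        try
        {
            await Task.Delay(50); // simulate the database round trip
        }
        finally
        {
            Interlocked.Exchange(ref _busy, 0);
        }
    }
}

public static class AwaitSketch
{
    // Awaiting each save keeps the operations strictly one after another.
    public static async Task<bool> AwaitedSavesSucceedAsync()
    {
        var db = new SingleOperationContext();
        await db.SaveChangesAsync();
        await db.SaveChangesAsync();
        return true;
    }

    // Starting a second save without awaiting the first makes them overlap.
    public static async Task<bool> OverlappingSavesFailAsync()
    {
        var db = new SingleOperationContext();
        Task first = db.SaveChangesAsync(); // started but not awaited yet
        try
        {
            await db.SaveChangesAsync();    // overlaps with 'first'
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
        finally
        {
            await first; // let the first operation finish cleanly
        }
    }
}
```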
 
I'm not a full-time developer, so forgive my ignorance. It's not a commercial app, it's just something that runs at home.

So let me address these in order

Hmm, ditch this: foreach (InputWork item in work). If I am understanding what you wrote, it looks like you are looking for the Parallel.ForEach Method (System.Threading.Tasks), but the snippet you showed limits my ability to give you more information, including pseudocode. If you want to execute the code as a unit, then parallelism would help to speed up your process in such scenarios. Take a look at the docs I linked and then consider doing some refactoring, especially with IEnumerable<InputWork> work. If you're using an interface, be sure to use it to its fullest potential.

This is how it runs now, and it's this that I'm trying to make multi-threaded. I provided the example because I thought being generic would be helpful. A job list is loaded from the database into an IEnumerable list and the data is processed against some criteria. When they match, HandleWorkResult() is called to take "some action" with it, normally writing something back to a different table or occasionally writing back that the work is corrupt and can't be processed.

HandleWork(InputWork work, WorkResult result)

This will determine from the WORK where the RESULT should be saved.

I decided to handle the results separately mainly because I would only in rare cases need to change the input work entry (except for IsFailed=true). Each result is unique and could be returned to any number of tables, so I "thought" the best thing would be to write each result separately to a given table.

Rather than post "I want to convert this, I tried, I failed, you do it for me"

At no point have I asked for anyone to do anything for me. I'm trying to understand the process, as I said. I don't even know if I'm barking up the wrong tree with this, and I very well could be. I have no objection to writing it myself, it's how I learn, but I need more information, and online help, for the most part, is too technical for me.


Yes, parallelization will definitely get more done, assuming that you can efficiently distribute the operations across your threads and, if at all possible, not have the threads wait on each other to accumulate the results. Based on your description at the beginning of this thread, it sounds like that is exactly the case.

Once work is loaded, no job has any impact on any other job. They do not need to wait for information from another thread or process; they have all the information they need from the database right from the start. I think the description is that one thread is totally agnostic of another.


And to show my Entity Framework ignorance, why do you have to await _db.SaveChangesAsync();? Why can't you just merrily go on to the next iteration of the loop? So what if you end up with multiple items pending changes to be saved? Isn't that what database engines and frameworks are supposed to handle well: atomic operations?

I don't. I believe the only reason the code is async right now is that I wanted to play about with async, purely that. That said, the write back to the db isn't really the expensive component; the work itself is what is expensive to run.

--



For all I know, I'm barking up the wrong tree with threads, and I would be far better off streamlining what I have and improving speed through efficiency, and that's probably what I'll end up doing. I have assumed multi-threading would be my answer, so I might as well try and understand it. There are a few things I'm now gonna go try from what people have said. I learn by breaking things and I learn by repetition, so this is all good for me :)


PS - If I really wanted to post a "Hey, I've tried this, can't do it, get this error, you do it", I'd be on Stack Exchange asking for exactly that.
 
Is there any chance of also making HandleWork() async? If not, is all of the 45 seconds taken up by a single job CPU bound? If it is CPU bound, then the maximum level of parallelization you can really achieve is just the number of cores you have available, and the only way to scale is to add more cores. If it is not CPU bound, then you should definitely look at also making it async, because while one job is waiting for some IO, the CPU might as well be doing work on another job until that job needs to wait for IO.
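For the not-CPU-bound case, a minimal sketch of running async jobs concurrently with a cap on how many are in flight. HandleJobAsync is a hypothetical placeholder for an async HandleWork(), its delay stands in for an IO wait, and maxConcurrent is an arbitrary knob you would have to tune. If the real jobs touch a DbContext, each one would still need its own instance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledAsyncSketch
{
    // Hypothetical IO-bound job: the delay stands in for a database or network wait.
    private static async Task<int> HandleJobAsync(int jobId)
    {
        await Task.Delay(20);
        return jobId * 2;
    }

    // Runs all jobs concurrently, with at most maxConcurrent in flight at once.
    public static async Task<int[]> RunAsync(IEnumerable<int> jobIds, int maxConcurrent)
    {
        using var gate = new SemaphoreSlim(maxConcurrent);

        async Task<int> RunOneAsync(int jobId)
        {
            await gate.WaitAsync(); // wait for a free slot
            try { return await HandleJobAsync(jobId); }
            finally { gate.Release(); }
        }

        // ToList() starts every task up front; the semaphore limits how many
        // are inside HandleJobAsync at the same moment.
        List<Task<int>> tasks = jobIds.Select(RunOneAsync).ToList();
        return await Task.WhenAll(tasks); // results come back in input order
    }
}
```

While one job is parked on its await, the thread is free to pick up another, which is where the gain comes from when the 45 seconds is mostly waiting rather than computing.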
 