[Estimated Reading Time: 4 minutes]

No, not a relationship blog and no, not a rant about the relationship between Embarcadero and the Delphi community.  This is a strictly and purely technical post about what “Committed” means in terms of Windows memory, and in particular a key aspect of how that applies to threaded applications.

Last week a user of the software that I work on in my 9-to-5 role reported an issue with the system that they had been experiencing for a while.  A work-around had been devised but this was not particularly satisfying and it was felt that a more permanent solution should be within reach.

The issue involved an operation in the software that was called on to process a number of records in a file.  For reasons peculiar to this particular user, the process first involved splitting the file into a number of separate files, each containing just one record, and then processing each of those files in turn.

I should say at this point that these choices in processing are made by the users of the software, not within the software itself, which has to handle whatever processing choices any given user might determine as most appropriate to their particular needs.

If the file contained fewer than around 1500 records, everything was fine.  But once the number of records approached or exceeded 1500, this particular process in the system would halt, reporting – via logs recorded in Windows’ Event Log – “insufficient storage”.

A critical aspect of the implementation is that each file processed by the system is allocated a thread on which to perform the processing.

1 file = 1 thread.  1500 files = 1500 threads.

That already looks like a worryingly high number, and when you take into account a further crucial factor, the reason for the failure becomes immediately obvious.

Make Way For The Stack

Every thread is, well, a thread of execution.  For that, each thread requires a number of things; the important one for the purposes of this discussion is that it needs a stack.

It just so happens that in Delphi, the default maximum stack size for an application is 1MB.  Any thread created by that application process will inherit its maximum stack size from the process unless otherwise specified (see “Shameless Plug” footnote, below).
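To make that concrete, here is a minimal sketch, not production code: the names and the idle thread body are invented purely for illustration.  The maximum stack size comes from the $M compiler directive baked into the module header (the values shown are the Delphi defaults), and passing 0 for BeginThread’s StackSize parameter means “inherit that default”:

    program DefaultStackDemo;
    {$APPTYPE CONSOLE}
    {$M 16384,1048576}  // min / max stack for the process; 1MB max is the Delphi default

    uses
      Windows;

    function IdleWorker(Parameter: Pointer): Integer;
    begin
      // This thread barely touches its stack, but the full 1MB of address
      // space is still reserved on its behalf, "just in case".
      Sleep(INFINITE);
      Result := 0;
    end;

    var
      i: Integer;
      ThreadID: Cardinal;
    begin
      // Each thread inherits the 1MB maximum stack size from the process
      // because 0 is passed for the StackSize parameter.  In a 32-bit
      // process, expect creation to start failing as the reservations
      // eat into the available address space.
      for i := 1 to 1500 do
        BeginThread(nil, 0, IdleWorker, nil, 0, ThreadID);

      WriteLn('Threads created - now compare Task Manager''s figures');
      ReadLn;
    end.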

1500 files = 1500 threads = 1500MB of stack space!  (in very round numbers)

Given that a 32-bit Windows process has only around 2,000MB to play with, it’s little wonder that this number of threads, in a process responsible for other processing and placing other demands on memory, should cause it to run out of storage.

Except for one thing.

If you monitor the process in Task Manager you will see that its memory usage never climbs above approximately 200MB!

What’s going on?!

Commitment.  That’s what.

Old New Things

It just so happens that I visit Raymond Chen’s “Old New Thing” blog pretty much every day.  I know that he writes a lot of his articles months in advance and they are auto-posted on a daily basis.  So it’s a phenomenal coincidence that the subject of the post I just linked to should happen to have come up in the past few days, precisely to answer that “What’s going on?!” question.

I was confident that the thread count and the thread stacks were responsible for chewing up available memory, but the apparent reported memory usage just didn’t seem to support this.  Why not?

The answer is that each thread assuredly does have a minimum of 1MB of memory reserved for its stack.  But if that thread only ever uses 16KB of stack, then 16KB is all the memory it will actually use.

But the 1MB is reserved regardless.  Just in case.

This is that thread’s share of the “Commit Charge” you see referenced in Task Manager.

The Commit Charge is the amount of memory that Windows is currently promising to make available, should the need arise.  That is, if every thread in every process were to suddenly require the actual use of all the memory that those threads and processes had indicated they might need, the “Commit Charge” is how much memory the system would be called upon to simultaneously provide.  That “provision” may be met from the page file as well as physical memory of course, which is why the Commit Charge can (and quite often does) exceed the physical RAM installed in your machine.

So basically what happens when you create a thread is that Windows is asked to reserve 1MB of memory for that thread’s stack.  If the thread never starts executing then it will use very little of that stack – if any.  But Windows maintains its promise to provide that memory, should the need arise.
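You can see the same reserve-now, pay-later behaviour outside of thread creation.  Here is a small sketch (the sizes are arbitrary, chosen only to mirror the stack figures above) using VirtualAlloc to reserve a 1MB region and then commit only a small part of it, which is roughly the pattern applied to each thread’s stack:

    program ReserveVersusCommit;
    {$APPTYPE CONSOLE}

    uses
      Windows;

    var
      Region: Pointer;
    begin
      // Reserve 1MB of address space.  Nothing can be read or written yet;
      // this is purely a promise-shaped hole in the address space.
      Region := VirtualAlloc(nil, 1024 * 1024, MEM_RESERVE, PAGE_NOACCESS);

      // Commit just the first 16KB of that reservation.  Only these pages
      // need backing from RAM or the page file when they are touched.
      VirtualAlloc(Region, 16 * 1024, MEM_COMMIT, PAGE_READWRITE);
      FillChar(PByte(Region)^, 16 * 1024, 0);

      // The remaining ~1008KB stays reserved, just in case it is needed later.

      VirtualFree(Region, 0, MEM_RELEASE);
    end.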

But as well as Windows keeping its promises, your process has to have realistic expectations.

In particular, in 32-bit Windows there is no point asking Windows to commit to reserving more than 2GB of memory for your process.  If you do, Windows will politely apologise but explain that this is simply not possible.

You have “insufficient storage”.

Handling Commitment

Fortunately in the case I was confronted with the solution was relatively straightforward.

Although the system was creating 1500+ threads, those threads ended up executing sequentially in any case.  There was really no need to create them all simultaneously.  Instead, a work list could be built and a single worker thread created to work its way through that list, creating one thread at a time as required to process each item in turn.
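In heavily simplified terms, the shape of that change was roughly the following sketch.  The class and routine names are invented for illustration and bear no relation to the actual system’s code:

    unit FileWorker;

    interface

    uses
      Classes;

    type
      TFileWorkerThread = class(TThread)
      private
        FWorkList: TStringList;   // file names to process, built up front
      protected
        procedure Execute; override;
      public
        constructor Create(const AWorkList: TStringList);
      end;

    implementation

    procedure ProcessSingleFile(const AFileName: string);
    begin
      // Stand-in for the real per-file processing.
    end;

    constructor TFileWorkerThread.Create(const AWorkList: TStringList);
    begin
      FWorkList := AWorkList;
      inherited Create(False);    // start working through the list immediately
    end;

    procedure TFileWorkerThread.Execute;
    var
      i: Integer;
    begin
      // One worker walks the list sequentially, so only one processing thread
      // (and one ~1MB stack reservation) exists at a time, instead of one per file.
      for i := 0 to FWorkList.Count - 1 do
        ProcessSingleFile(FWorkList[i]);
    end;

    end.

Note that the worker needs no synchronisation of its own in this shape: there is only ever one of it, and the work list is built before the thread is created and not modified afterwards.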

The minimum Commit Charge for this aspect of the system’s processing consequently fell from 1.5GB+ to just 2MB!

More importantly, this meant that the process no longer came to a grinding halt when presented with a huge volume of work to perform.  The revised implementation can now handle an effectively unlimited number of records.

So before you jump on the parallel programming bandwagon and start throwing threads at every piece of parallelisable code you can find, you might wish to consider the impact this may have on the minimum commit charge for your application.

At the very least you should perhaps start being a bit more thorough about determining, and requesting from the OS, an appropriate stack allocation for your threads.

http://msdn.microsoft.com/en-us/library/ms682453(VS.85).aspx
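That link documents CreateThread’s dwStackSize parameter and the STACK_SIZE_PARAM_IS_A_RESERVATION creation flag.  A minimal sketch of requesting a smaller reservation via the RTL’s BeginThread follows; the 64KB figure is purely illustrative, and the constant may need declaring by hand in older Delphi versions:

    program SmallStackThread;
    {$APPTYPE CONSOLE}

    uses
      Windows;

    const
      // Value from WinBase.h; not declared in older versions of Windows.pas.
      STACK_SIZE_PARAM_IS_A_RESERVATION = $00010000;

    function Worker(Parameter: Pointer): Integer;
    begin
      // ... per-item processing would go here ...
      Result := 0;
    end;

    var
      ThreadID: Cardinal;
    begin
      // Ask for a 64KB stack *reservation* instead of inheriting the 1MB default.
      // Without the flag, the StackSize parameter is treated as the initial
      // commit size and the reservation still comes from the module header.
      BeginThread(nil, 64 * 1024, Worker, nil,
        STACK_SIZE_PARAM_IS_A_RESERVATION, ThreadID);

      ReadLn;   // keep the process alive long enough for the thread to run
    end.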

Shameless Plug

Unfortunately, the TThread class in Delphi does not provide an easy means for you to set a specific stack size for your threads (to be honest, this is just covering myself – I’m fairly sure there is no way at all to set stack size using TThread, but some enterprising soul may have found some devious mechanism).

My alternate TThread and TMotile classes, on the other hand, now do.

Coming soon to a CodePlex project near you.

3 thoughts on “Commitment Issues”

  1. If the combined stack size of the total number of threads is a problem, the root cause, it seems to me, is the number of threads, not the combined stack size. For work which can be done concurrently, optimum number of non-suspended threads will be the same as the number of cores in the system, which is typically less than 10. You can have additional, suspended threads without much of a downside, but if I saw an application using more than 50 or so (cough, Outlook, cough) I start to wonder if there are architectural issues. The reserved memory size for 50 threads is small enough that it should not typically be a problem.

    When the number of tasks is significantly higher than the number of threads the system can run efficiently, the classic solution is a thread pool with a work-stealing queue.

    1. Indeed (and just to be clear, the architectural issue in this case predated my coming to work on the code in question).

      Having said that, it’s also true that the particular scenario that uncovered the issue was extremely unusual – a case of the software being capable of working in a way that perhaps wasn’t originally envisaged and so had not been specifically designed for.

      It’s a testament to the overall quality and architecture of the system that what was a not insignificant change could be made quickly, easily and – most importantly – robustly. All-up, fixing this issue took about 45 minutes. It took longer to recreate the initial test case to demonstrate the behaviour and re-run that test afterward than it did to adjust the behaviour itself.

      As regards optimal # of threads and the relationship to # of cores… I think that 1:1 ratio only holds for entirely CPU-bound threads. If you have threads that have wait states related to I/O (network, database or file access) then the optimum ratio of threads:cores can be surprisingly high.

      In some cases the optimum thread count can be significantly affected – and overall performance further improved – by careful allocation of specific threads to specific CPU cores (CPU “affinity”).

      In one highly threaded system I worked on we had to come up with a way of parameterising the size of our thread pools and the core affinities of classes of thread because the optimal configuration on 1, 2 or 4 core systems was drastically different, and leaving it up to Windows to schedule the threads “normally”, i.e. without intervention from us, resulted in unacceptably poor performance on a 4-core configuration.
