
shyaway 2018. 8. 30. 23:44

ThreadPool in details.


I'd like to write a post about Task, but first things first: I'm going to talk about the thread pool, since it is the underlying mechanism behind tasks. I already put up a post covering the basics of the thread pool last time, so this time I'm going to look at the details.



ThreadPool has two types of threads.

There's a thread pool API, ThreadPool.GetMinThreads(out int workerThreads, out int completionPortThreads), and it returns the minimum number of threads for the two types respectively (a quick sketch follows the list below). What exactly are they?
    • WorkerThread
      This covers pretty much every thread except the main thread. The work the thread pool offloads to it is typically a delegate, which means any job in your code.

    • CompletionPortThread
      Also called an I/O thread. The .NET thread pool maintains these threads separately to dispatch NativeOverlapped callbacks.
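
As a quick illustration, here is a minimal sketch that reads both thread types through the public API. Nothing here depends on the pool's internals; the numbers you get depend on your machine and runtime version.

```csharp
using System;
using System.Threading;

class ThreadTypesDemo
{
    static void Main()
    {
        // Minimum threads the pool keeps available before it starts
        // throttling thread creation (more on that below).
        ThreadPool.GetMinThreads(out int minWorkers, out int minIo);

        // Upper bounds the pool will never exceed.
        ThreadPool.GetMaxThreads(out int maxWorkers, out int maxIo);

        Console.WriteLine($"Worker threads:          min={minWorkers}, max={maxWorkers}");
        Console.WriteLine($"Completion port threads: min={minIo}, max={maxIo}");
    }
}
```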


First of all, these two kinds of threads are technically just normal threads. The reason the thread pool logically splits them into two types and maintains them separately is to avoid deadlock. Imagine there are too many tasks in the queue, and the thread pool, trying to consume the waiting tasks faster, offloads work to every thread it has, ending up assigning jobs to the threads meant for native I/O callbacks. Now there is no thread reserved for dispatching I/O completion callbacks, which directly leads to I/O resources being held for a long period and eventually causes deadlock.


    • NativeOverlapped Callback
      NativeOverlapped is a struct, and it represents an overlapped Win32 call. By Win32 call, I mean the Windows APIs in user32.dll or kernel32.dll that can be invoked with P/Invoke from managed code (a minimal sketch follows below).
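
To make that less abstract, here is a minimal sketch of my own (not anything from the runtime's source) that packs a managed callback into a NativeOverlapped* and asks the pool to dispatch it. It must be compiled with unsafe code enabled, and exactly which thread runs the callback is a platform-specific detail.

```csharp
using System;
using System.Threading;

class NativeOverlappedDemo
{
    static readonly ManualResetEvent Done = new ManualResetEvent(false);

    static unsafe void Main()
    {
        // Pack a managed IOCompletionCallback into a NativeOverlapped*.
        var overlapped = new Overlapped();
        NativeOverlapped* pOverlapped = overlapped.Pack(Callback, null);

        // Hand the callback to the pool's I/O side rather than its workers.
        ThreadPool.UnsafeQueueNativeOverlapped(pOverlapped);
        Done.WaitOne();
    }

    static unsafe void Callback(uint errorCode, uint numBytes, NativeOverlapped* pOv)
    {
        Console.WriteLine($"On a pool thread: {Thread.CurrentThread.IsThreadPoolThread}");
        Overlapped.Free(pOv); // release the packed native memory
        Done.Set();
    }
}
```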


If you study the stack diagram below for a bit, you may see the point of CompletionPortThread. I recommend reading There Is No Thread by Stephen Cleary, which explains briefly but precisely how the code on your thread ends up interacting with your devices. The yellow Overlapped box is the exact spot for the NativeOverlapped callback.

(Figure: the user-mode / kernel-mode stack diagram from "There Is No Thread", with the OVERLAPPED structure highlighted.)


Algorithm in ThreadPool

ThreadPool manages the life cycle of its threads. It adds a new thread or removes one on demand, improving throughput by offloading the jobs in its queue to available threads. No doubt a group of brilliant developers put a lot of effort into making the best thread pool they could, thinking through the variety of scenarios where a thread pool might perform poorly. But then why are there still performance issues around the thread pool, like the one I ran into before? I already posted an article about that slow-performance issue, but it was only a detailed yet half-proven guess. So I really wanted to know the inside of the thread pool, and this post is the result.

    • Starvation
      In concurrent computing, starvation means a process is perpetually denied the resources it needs to get its job done. An intentional DoS attack that causes starvation is a good example. If demanding tasks are offloaded to all of the worker threads in the thread pool while the queue keeps piling up, there will be plenty of starved items in the thread pool's queue. To deal with this, the thread pool has a mechanism to add a new thread for the tasks waiting in its queue.

    • Hill climbing heuristic
      "Heuristic" comes from a word meaning "find / discover"; more precisely, it's a method that actively and adaptively searches out a better solution to a problem on its own. Hill climbing is one such algorithm: an iterative algorithm that starts with an arbitrary solution to a problem, then attempts to find a better solution by making an incremental change to it. If the change produces a better solution, another incremental change is made to the new one, and so on until no further improvement can be found. This is what goes on behind the scenes in the thread pool.

      This algorithm exists to utilize the CPU cores effectively when threads are blocked by I/O or sitting in a wait state. Every time a thread finishes its job, or at a fixed 500 ms interval, the pool decides on its own whether to add a new thread. When it thinks another thread would help, one joins the forces; otherwise it scales back until it reaches the minimum number of threads, which you can read with the ThreadPool.GetMinThreads API. This adaptive technique is what's called the hill climbing heuristic (see the sketch right after this list).
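
You can watch both the starvation and the thread injection from user code. In this sketch (my own experiment; the exact numbers will vary by machine), every queued item blocks its thread, so once the minimum worker count is exhausted, each remaining item has to wait for the hill climbing logic to inject a new thread, roughly one every 500 ms:

```csharp
using System;
using System.Threading;

class StarvationDemo
{
    static void Main()
    {
        ThreadPool.GetMinThreads(out int minWorkers, out _);
        Console.WriteLine($"Minimum worker threads: {minWorkers}");

        // Queue more blocking items than the minimum number of workers.
        for (int i = 0; i < minWorkers + 4; i++)
        {
            int id = i;
            ThreadPool.QueueUserWorkItem(state =>
            {
                Console.WriteLine($"{DateTime.Now:HH:mm:ss.fff} item {id} started");
                Thread.Sleep(10_000); // simulate a blocked thread
            });
        }

        // The last four items start noticeably later than the first batch,
        // spaced out by the pool's thread-injection throttle.
        Thread.Sleep(15_000);
    }
}
```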




Forget about the algorithm. What about having a lot of threads in the pool?

Well, if creating a new thread makes the pool consume the work items in its queue faster, wouldn't having a lot of threads from the start get rid of the need for the hill climbing heuristic? The answer is no.

    • The more threads you have, the more context switching occurs. Context switching costs CPU time, and with a large number of threads that overhead can have a significant impact on the system.

    • The more threads you have, the more stacks have to be created, and this hurts data locality. If the CPU has to juggle a separate stack for every thread, its caches become far less effective.


Holding more threads than the number of logical processors has limited benefit. Its only upside is keeping the CPU busy by offloading work items to available threads while some other threads are blocked and unavailable, so we should try not to overreact to blocked threads.


To put it in a nutshell, the hill climbing heuristic takes care of this kind of scenario as well, so unless you can come up with a better, breakthrough solution, accepting it as it is might be good for your mental health. FYI, there's a paper on the thread pool's design and implementation. ( It was published by three Microsoft employees, though it looks like one of them is working for Google now. )
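
That said, if you already know your workload will block many threads at once, the one knob the public API does expose is the minimum thread count, which lets the pool create that many threads without the hill climbing delay. A hedged sketch; the number 64 is purely illustrative, not a recommendation:

```csharp
using System;
using System.Threading;

class RaiseMinimum
{
    static void Main()
    {
        ThreadPool.GetMinThreads(out int worker, out int io);
        Console.WriteLine($"Before: worker={worker}, io={io}");

        // Let up to 64 worker threads be created on demand without
        // waiting ~500 ms per thread for the injection throttle.
        ThreadPool.SetMinThreads(64, io);

        ThreadPool.GetMinThreads(out worker, out io);
        Console.WriteLine($"After:  worker={worker}, io={io}");
    }
}
```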





ThreadPool in the past.

Before the introduction of the TPL ( Task Parallel Library ) in .NET 4.0, what did the thread pool look like? In early versions of the CLR, a work item queued with ThreadPool.QueueUserWorkItem went into a linked list rather than a queue, and mutual exclusion was handled by acquiring a lock with the Monitor class. This meant that every call to the thread pool API allocated a fresh node and added it to the linked list, which is far more expensive than enqueueing an item into a queue, and it also meant the GC had to traverse all of the items in the linked list. Acquiring the lock on every enqueue and dequeue was additional overhead on top of that.
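
A minimal sketch of that old shape, just to make the costs visible ( my own simplification, not the CLR's actual source ):

```csharp
using System.Collections.Generic;
using System.Threading;

class LegacyStyleWorkQueue
{
    private readonly LinkedList<WaitCallback> _items = new LinkedList<WaitCallback>();
    private readonly object _lock = new object();

    public void Enqueue(WaitCallback work)
    {
        lock (_lock)                  // Monitor.Enter / Monitor.Exit underneath
        {
            _items.AddLast(work);     // a fresh node allocation per item
        }
    }

    public WaitCallback Dequeue()
    {
        lock (_lock)                  // the same single lock for every consumer
        {
            if (_items.Count == 0) return null;
            WaitCallback work = _items.First.Value;
            _items.RemoveFirst();
            return work;
        }
    }
}
```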

This approach wasn't as painful back then as it may look now. A heavy unit of work was what the thread pool was designed for; take an ASP.NET web application as the example. A task in such an application meant generating an entire web page. That kind of work took long enough that the CPU overhead of acquiring locks and allocating nodes could be ignored.

However, the concurrency requirements of applications kept increasing as new hardware ushered in the era of multiple CPU cores. Where the thread pool's purpose used to be taking a large chunk of work and offloading it to a few threads, it is now taking independent, fine-grained, lightweight tasks that might run for less than a second. That's why the thread pool was improved for the .NET 4.0 release.




Faster FIFO

Because of the CPU overhead of taking a lock on every enqueue and dequeue, and of traversing the whole linked list, as mentioned above, the thread pool had to minimize synchronization overhead by using a data structure that doesn't take locks. Moreover, the thread pool needed a GC-friendly queue: developers still allocate an object ( a delegate ) per work item, but it is now much smaller than before and easily traceable, so the GC doesn't have to suffer through searching an entire data structure. The queue in the thread pool is essentially identical to System.Collections.Concurrent.ConcurrentQueue&lt;T&gt;, introduced in .NET 4.0.
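
For contrast with the linked-list sketch earlier, the same enqueue/dequeue shape on top of ConcurrentQueue&lt;T&gt; takes no lock on either side:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class LockFreeQueueDemo
{
    static void Main()
    {
        var queue = new ConcurrentQueue<WaitCallback>();

        // Enqueue work items; no Monitor lock is involved.
        queue.Enqueue(state => Console.WriteLine("work item 1"));
        queue.Enqueue(state => Console.WriteLine("work item 2"));

        // Dequeue and run them; TryDequeue synchronizes without
        // taking a lock.
        while (queue.TryDequeue(out WaitCallback work))
        {
            work(null);
        }
    }
}
```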

Improving QueueUserWorkItem is good in that it makes existing applications using the thread pool run faster without changing any code. This performance improvement doesn't shine much in applications that still push heavy workloads through the thread pool, but it is significantly beneficial for applications that run on multi-core hardware and execute small, fine-grained work items.

But we still can't find out much about the work we hand over through the ThreadPool.QueueUserWorkItem API, which is just fire-and-forget, so there is still something left we want to improve.
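
Concretely, the limitation looks like this ( a sketch of my own ):

```csharp
using System;
using System.Threading;

class FireAndForget
{
    static void Main()
    {
        ThreadPool.QueueUserWorkItem(state =>
        {
            int answer = 6 * 7; // some computed result
            Console.WriteLine($"Computed {answer} on a pool thread.");
            // There is no built-in way to return `answer` to the caller,
            // to be notified of completion, or to observe an exception
            // thrown from inside this delegate.
        });

        // We can't even wait on the item properly; just keep the process alive.
        Thread.Sleep(1000);
    }
}
```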



Introduction of Task Parallel Library

The TPL is what the .NET thread pool team added to break through the limitations of that API. Now it's finally time to write about the TPL: my next post will cover its basics and details.