For the implementation of the Timemachine I needed to figure out how thread scheduling works under Linux and FreeBSD. Unfortunately, I was not able to find any document that really describes how these systems behave and how thread scheduling can be tuned. This document summarizes my experience with thread scheduling. I gathered this information from various online sources, man pages, and a number of test runs of multithreaded test programs.
I assume that you have a basic knowledge of threading in general and pthreads in particular. I will only discuss thread scheduling in this document.
Section 2 describes the test environment, Section 3 covers general scheduling terms and gives an overview of thread scheduling, and Sections 4 and 5 cover the pthread implementations on FreeBSD and Linux. Finally, Section 6 summarizes my findings and concludes this document.
I cannot guarantee that the results documented here are accurate! The tests were rather short. I only tested the pthread implementations with a limited amount of background workload, and I did not test what happens when several multithreaded processes run in parallel. Nevertheless, my test results were significant enough to base this document on them.
The test environment was:
The following pthread implementations were tested:
There are two possible contention scopes: PTHREAD_SCOPE_SYSTEM and PTHREAD_SCOPE_PROCESS. The scope is set with pthread_attr_setscope() on the thread attributes object, so it can only be specified before the thread is created.
A thread with scope PTHREAD_SCOPE_SYSTEM contends for the CPU with other processes and with other PTHREAD_SCOPE_SYSTEM threads. For example, if process P1 has 10 threads with scope PTHREAD_SCOPE_SYSTEM and P2 is a single-threaded process, P2 will get one timeslice out of 11 and every thread in P1 will get one timeslice out of 11, i.e. P1 will get 10 times more timeslices than P2.
All threads of a process that have scope PTHREAD_SCOPE_PROCESS are grouped together, and this group of threads contends for the CPU as a unit. If a process has 4 PTHREAD_SCOPE_PROCESS threads and 4 PTHREAD_SCOPE_SYSTEM threads, then each of the PTHREAD_SCOPE_SYSTEM threads will get a fifth of the CPU and the 4 PTHREAD_SCOPE_PROCESS threads will share the remaining fifth. How the PTHREAD_SCOPE_PROCESS threads divide their fifth among themselves is determined by the scheduling policy and the threads' priorities.
If other processes are running, then every PTHREAD_SCOPE_SYSTEM thread and every group of PTHREAD_SCOPE_PROCESS threads (i.e. every process with PTHREAD_SCOPE_PROCESS threads) is handled like a separate process by the system scheduler.
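A minimal sketch of fixing the scope at creation time follows. The helper names create_with_scope() and noop_worker() are hypothetical; only pthread_attr_init(), pthread_attr_setscope(), pthread_create() and pthread_attr_destroy() are standard calls. POSIX lets an implementation reject a scope it does not support with ENOTSUP, so the sketch returns the error number instead of assuming the request was honoured.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical trivial thread body, used only for demonstration. */
void *noop_worker(void *arg) { return arg; }

/* Create a thread with an explicit contention scope.
   scope is PTHREAD_SCOPE_SYSTEM or PTHREAD_SCOPE_PROCESS.
   Returns 0 on success or an error number (e.g. ENOTSUP if the
   implementation does not support the requested scope). */
int create_with_scope(pthread_t *tid, int scope,
                      void *(*fn)(void *), void *arg) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* The scope lives in the attributes object, so it must be set
       before pthread_create(); a running thread cannot change it. */
    rc = pthread_attr_setscope(&attr, scope);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return rc;
}
```

A caller would check the return value, e.g. create_with_scope(&tid, PTHREAD_SCOPE_PROCESS, noop_worker, NULL), and fall back to PTHREAD_SCOPE_SYSTEM if ENOTSUP comes back.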
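The share model described above can be written as a small formula. This is a simplification under stated assumptions: every scheduling entity is weighted equally (nice values and priorities are ignored), k is the number of entities, n_sys the number of PTHREAD_SCOPE_SYSTEM threads, g the number of processes with PTHREAD_SCOPE_PROCESS threads, p the number of other single-threaded processes, and s the share of one entity. The 1/20 in the second example additionally assumes the four PTHREAD_SCOPE_PROCESS threads split their fifth evenly, which in practice depends on policy and priority.

```latex
\begin{align*}
s &= \frac{1}{k}, \qquad k = n_{\mathrm{sys}} + g + p \\
n_{\mathrm{sys}} = 10,\ g = 0,\ p = 1 &\Rightarrow k = 11,\quad s_{P1} = \tfrac{10}{11},\quad s_{P2} = \tfrac{1}{11} \\
n_{\mathrm{sys}} = 4,\ g = 1,\ p = 0 &\Rightarrow k = 5,\quad s_{\mathrm{sys}} = \tfrac{1}{5},\quad s_{\mathrm{proc}} = \tfrac{1}{5}\cdot\tfrac{1}{4} = \tfrac{1}{20}
\end{align*}
```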
A PTHREAD_SCOPE_PROCESS thread has a priority. Whenever a thread is runnable and no other runnable thread of the same process has a higher priority, the thread gets the CPU. Note that this can lead to starvation of lower-priority threads. When two or more runnable threads share the highest priority, the scheduling policy determines which of them runs.
The priority is assigned statically with pthread_setschedparam(). The scheduler never changes the priority of a thread.
The scheduling policy is either SCHED_FIFO or SCHED_RR. SCHED_FIFO is a first-come, first-served policy; SCHED_RR is a round-robin policy that may preempt a thread when its timeslice expires. Again, the policy only affects threads that have the same priority.
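As a sketch of changing the policy and priority of an already running thread, the hypothetical helper set_thread_priority() below calls pthread_setschedparam() and then reads the values back with pthread_getschedparam(), so a caller can see what the implementation actually stored rather than assuming the request took effect.

```c
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Set policy and static priority of an existing thread, then read the
   values back. Returns 0 on success or the error number reported by
   pthread_setschedparam()/pthread_getschedparam() (EINVAL, EPERM, ...). */
int set_thread_priority(pthread_t thread, int policy, int priority,
                        int *policy_out, int *priority_out) {
    struct sched_param param;
    memset(&param, 0, sizeof param);
    param.sched_priority = priority;

    int rc = pthread_setschedparam(thread, policy, &param);
    if (rc != 0)
        return rc;

    /* Confirm what the implementation recorded for this thread. */
    rc = pthread_getschedparam(thread, policy_out, &param);
    if (rc == 0)
        *priority_out = param.sched_priority;
    return rc;
}
```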
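Policy and priority can also be fixed when the thread is created. The sketch below uses the hypothetical helpers create_with_policy() and idle_worker(). One detail that is easy to miss: if the attributes object keeps PTHREAD_INHERIT_SCHED (the default on glibc, for example), the new thread inherits its creator's scheduling and the policy and priority stored in the attributes are ignored, so the sketch explicitly sets PTHREAD_EXPLICIT_SCHED.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Hypothetical trivial thread body, used only for demonstration. */
void *idle_worker(void *arg) { return arg; }

/* Create a thread whose policy (e.g. SCHED_FIFO or SCHED_RR) and
   priority are fixed at creation time.
   Returns 0 or an error number (EPERM if the caller lacks privileges). */
int create_with_policy(pthread_t *tid, int policy, int priority,
                       void *(*fn)(void *), void *arg) {
    pthread_attr_t attr;
    struct sched_param param;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    memset(&param, 0, sizeof param);
    param.sched_priority = priority;

    /* Without EXPLICIT_SCHED the settings below would not be applied. */
    rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(&attr, policy);
    if (rc == 0)
        rc = pthread_attr_setschedparam(&attr, &param);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return rc;
}
```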
A more extensive description of priorities and policies can be found in  and . Note that these documents discuss process scheduling, but the principle is the same.
Note: The priority and scheduling policy settings are meaningless when a thread has scope PTHREAD_SCOPE_SYSTEM.
It is also possible to do realtime process scheduling.  explains how realtime process scheduling works. sched_setscheduler() is used to set the process scheduling parameters.
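A minimal sketch of switching a whole process to a realtime policy with sched_setscheduler(). The helper name make_realtime() is hypothetical, and picking the lowest priority the policy allows is my conservative choice, not a requirement. Without sufficient privileges the call fails, typically with EPERM.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/types.h>

/* Switch process pid (0 = the calling process) to a realtime policy
   such as SCHED_FIFO or SCHED_RR, at the lowest priority that policy
   allows. Returns 0 on success or the errno value on failure. */
int make_realtime(pid_t pid, int policy) {
    struct sched_param param;
    memset(&param, 0, sizeof param);

    param.sched_priority = sched_get_priority_min(policy);
    if (param.sched_priority == -1)
        return errno;              /* unknown policy */

    if (sched_setscheduler(pid, policy, &param) == -1)
        return errno;              /* usually EPERM without root */
    return 0;
}
```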
The nice value of a process also influences scheduling. A process (and the threads therein) with a lower nice value (i.e., higher priority) gets a larger share of the CPU time. Starting a program with the nice command works as expected. Using the nice() system call from a threaded program has not been tested; the open question is whether a nice() call affects the whole process or only the calling thread, which may well depend on the pthread implementation and scope.
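The open question about nice() can be probed with a small program. The sketch below is hypothetical (probe_nice_scope() is not from the original tests): a worker thread calls nice(5), and getpriority(PRIO_PROCESS, 0) is read in both threads. If the main thread's value changes too, nice() affected the whole process; if it stays the same, only the calling thread was affected. The Linux man pages document that under NPTL the nice value is a per-thread attribute, contrary to POSIX, so the probe is mainly useful for checking the FreeBSD implementations. No expected output is given because the answer is implementation-dependent.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Nice value the worker sees after its own nice() call. */
static int worker_nice_after = 0;

static void *nice_worker(void *arg) {
    (void)arg;
    errno = 0;
    /* nice() may legitimately return -1, so check errno as well. */
    if (nice(5) == -1 && errno != 0)
        perror("nice");
    worker_nice_after = getpriority(PRIO_PROCESS, 0);
    return NULL;
}

/* Returns 0 on success. main_after == worker_after suggests a
   process-wide effect; main_after == main_before a per-thread one. */
int probe_nice_scope(int *main_before, int *worker_after, int *main_after) {
    pthread_t t;
    *main_before = getpriority(PRIO_PROCESS, 0);
    if (pthread_create(&t, NULL, nice_worker, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    *worker_after = worker_nice_after;
    *main_after = getpriority(PRIO_PROCESS, 0);
    return 0;
}
```

Note that raising the nice value cannot be undone without privileges, so the probe is best run in a throwaway process.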
The default pthread implementation on FreeBSD 6 is libpthread. The implementation can be changed on a per-binary basis using /etc/libmap.conf (see libmap.conf(5)). In FreeBSD 7, libpthread has been removed and libthr is the default implementation. However, libkse can be used in FreeBSD 7 as a replacement for libpthread.
Note: I did not repeat all the experiments with FreeBSD 7. Furthermore, a different machine and test program were used for FreeBSD 7.
When setting thread priorities, the supported minimum and maximum values are 0 and 31, respectively. Note that the man pages claim that the macros PTHREAD_MIN_PRIORITY and PTHREAD_MAX_PRIORITY should be used to determine the allowed range of priorities, but they are not defined in pthread.h.
The libthr implementation uses 1:1 threading. Its default scheduling scope is PTHREAD_SCOPE_SYSTEM. libthr can utilize multiple CPUs.
It is possible to assign some threads a scope of PTHREAD_SCOPE_PROCESS. Changing the priority of threads, however, has unexpected effects. If only one thread has a higher priority than the others, scheduling behaves as expected (the high-priority thread gets the CPUs whenever it is runnable).
When two or more threads are given a higher priority than the other threads, or when more than two different priorities are used, scheduling behaviour is unpredictable.
Note: libthr has not been re-evaluated in depth on FreeBSD 7, but its behaviour looks similar to FreeBSD 6.
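The original test programs are not shown here, but an experiment of this kind can be sketched as follows. The hypothetical harness run_share_test() starts n busy-looping threads, optionally with SCHED_RR priorities, and counts how many loop iterations each completes before a common deadline; the counts approximate each thread's CPU share, so repeated runs show whether the scheduling is predictable. Each worker stops on its own at the deadline, so a starved main thread cannot keep the test running forever. To reproduce the PTHREAD_SCOPE_PROCESS case on FreeBSD one would additionally call pthread_attr_setscope() on the attributes. Be aware that realtime busy loops can make a machine unresponsive for the duration of the run.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>
#include <time.h>

#define MAX_WORKERS 16

struct share_worker {
    struct timespec deadline;   /* common stop time */
    unsigned long count;        /* iterations completed */
};

static int time_before(const struct timespec *a, const struct timespec *b) {
    return a->tv_sec < b->tv_sec ||
           (a->tv_sec == b->tv_sec && a->tv_nsec < b->tv_nsec);
}

/* Busy loop until the deadline, counting iterations. */
static void *spin(void *arg) {
    struct share_worker *w = arg;
    struct timespec now;
    do {
        w->count++;
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (time_before(&now, &w->deadline));
    return NULL;
}

/* prio[i] < 0 keeps default attributes; prio[i] >= 0 requests SCHED_RR
   at that priority. Returns 0 or an error number from pthread_create(). */
int run_share_test(int n, const int *prio, unsigned ms,
                   unsigned long *counts) {
    struct share_worker workers[MAX_WORKERS];
    pthread_t tids[MAX_WORKERS];
    struct timespec deadline;
    int rc = 0, started = 0;

    if (n < 1 || n > MAX_WORKERS)
        return EINVAL;

    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += ms / 1000;
    deadline.tv_nsec += (long)(ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    for (int i = 0; i < n; i++) {
        pthread_attr_t attr;
        pthread_attr_init(&attr);
        if (prio[i] >= 0) {
            struct sched_param sp;
            memset(&sp, 0, sizeof sp);
            sp.sched_priority = prio[i];
            pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
            pthread_attr_setschedpolicy(&attr, SCHED_RR);
            pthread_attr_setschedparam(&attr, &sp);
        }
        workers[i].deadline = deadline;
        workers[i].count = 0;
        rc = pthread_create(&tids[i], &attr, spin, &workers[i]);
        pthread_attr_destroy(&attr);
        if (rc != 0)
            break;
        started++;
    }

    /* Threads that did start still finish at the deadline. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        counts[i] = workers[i].count;
    }
    return rc;
}
```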
The libpthread implementation uses M:N threading. Its default scheduling scope is PTHREAD_SCOPE_PROCESS, the default policy SCHED_RR, and the default priority is 15.
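Such defaults can be checked on any implementation by inspecting a freshly initialized attributes object. The helper default_sched_attrs() below is a hypothetical sketch; per the text above, libpthread on FreeBSD 6 would be expected to report PTHREAD_SCOPE_PROCESS, SCHED_RR and 15, while glibc reports PTHREAD_SCOPE_SYSTEM, SCHED_OTHER and 0. Note that this shows what a fresh attributes object contains; a thread created with inherited scheduling takes its creator's settings instead.

```c
#include <pthread.h>
#include <sched.h>

/* Report scope, policy and priority stored in a freshly initialized
   pthread_attr_t. Returns 0 or an error number. */
int default_sched_attrs(int *scope, int *policy, int *priority) {
    pthread_attr_t attr;
    struct sched_param param;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_getscope(&attr, scope);
    if (rc == 0)
        rc = pthread_attr_getschedpolicy(&attr, policy);
    if (rc == 0)
        rc = pthread_attr_getschedparam(&attr, &param);
    if (rc == 0)
        *priority = param.sched_priority;

    pthread_attr_destroy(&attr);
    return rc;
}
```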
Assigning priorities to threads and mixing PTHREAD_SCOPE_PROCESS and PTHREAD_SCOPE_SYSTEM works as expected and as described above. libpthread can utilize multiple CPUs regardless of the scope of the threads.
libkse on FreeBSD 7 behaves like libpthread on FreeBSD 6.
I will only discuss the nptl (Native POSIX Thread Library) implementation, which is the default pthread implementation in current glibc versions. The nptl implementation only uses a 1:1 thread model: the scheduler handles every thread as if it were a process. Therefore, the only supported scope is PTHREAD_SCOPE_SYSTEM. The default scheduling policy is SCHED_OTHER, which is the default Linux scheduler. The nptl implementation can utilize multiple CPUs.
It is possible to assign a thread the SCHED_FIFO or SCHED_RR policy. Since the scheduler handles every thread as if it were a process, a thread with one of these policies is handled like a process with realtime scheduling priority (see ). That is, the thread contends for the CPU with all other processes, not just with the threads of its own process. This implies that a thread could starve other processes! Essentially, the pthread_setschedparam() call maps to the sched_setscheduler() call. This operation requires root privileges, since it can have a vital impact on the whole system.
For SCHED_OTHER the allowed minimum and maximum priorities are both 0, so the priority cannot be changed. SCHED_FIFO and SCHED_RR have an allowed range of 1 to 99 (see sched_get_priority_max(2) and sched_get_priority_min(2)).
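Because the request fails for unprivileged users, a program can try realtime scheduling and degrade gracefully. The hypothetical helper request_realtime() below asks for the lowest realtime priority (a conservative choice of mine, not a requirement) and treats EPERM as "keep the current policy". Keep in mind that a SCHED_FIFO thread that never blocks can still starve the rest of the system.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Try to give an existing thread a realtime policy (SCHED_FIFO or
   SCHED_RR) at the lowest realtime priority. Returns 1 if granted,
   0 if refused for lack of privileges (thread keeps its old policy),
   or a negative error number for any other failure. */
int request_realtime(pthread_t thread, int policy) {
    struct sched_param param;
    memset(&param, 0, sizeof param);

    param.sched_priority = sched_get_priority_min(policy);
    if (param.sched_priority == -1)
        return -errno;

    int rc = pthread_setschedparam(thread, policy, &param);
    if (rc == 0)
        return 1;
    if (rc == EPERM)
        return 0;
    return -rc;
}
```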
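Since the valid range depends on the policy (and, as noted for FreeBSD, the PTHREAD_MIN_PRIORITY and PTHREAD_MAX_PRIORITY macros are not available), a portable approach is to ask the scheduler at run time. The hypothetical helper clamp_priority() sketches this by clamping a requested priority into the range reported by sched_get_priority_min() and sched_get_priority_max().

```c
#include <sched.h>

/* Clamp a requested priority into the range the scheduler accepts for
   policy. Returns the clamped value, or -1 if the policy is unknown. */
int clamp_priority(int policy, int requested) {
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);
    if (lo == -1 || hi == -1)
        return -1;
    if (requested < lo)
        return lo;
    if (requested > hi)
        return hi;
    return requested;
}
```

On Linux, clamp_priority(SCHED_OTHER, 50) yields 0 and clamp_priority(SCHED_RR, 200) yields 99, matching the ranges above.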
All three implementations can schedule threads on multiple CPUs and achieve high CPU utilization when running one multithreaded process. When control over thread scheduling is not required, any of them can be used.
In my experience, libthr performs slightly better than libpthread, but this probably depends on the workload. I have not compared Linux and FreeBSD performance.
When control over thread scheduling is desired, FreeBSD with the libpthread (FreeBSD 6) or libkse (FreeBSD 7) implementation is by far the best choice, since it is the only implementation that can really be tuned. Linux with nptl can be used, but only for a small set of applications, since root privileges are required and realtime threads can starve the whole system. libthr cannot be used, since its behaviour is not predictable.
Other references are, of course, the man pages of the various functions, although the contents of the man pages differ widely between systems.
An updated version of this document can be found here. Comments and "bug" reports are welcome.