Threadsafe programming

Rodolphe Ortalo 1998/01/30 (ed. 98/2/4)

General thoughts and information about threadsafe programming

1. Preface

Well... Threadsafe programming is not such a difficult thing, but you
need a clear mind concerning mutual exclusion, semaphores/mutexes, and,
as a corollary, who Dijkstra is. I've spent several months working on
the Chorus microkernel, where threads are the common (and not the
special) thing. Maybe I can state a few general things. (Section titles
are here to help the hurrying reader.)

2. Cultural

First, Dijkstra is a computer science professor who made several
theoretical contributions in the programming languages area (and
probably many other areas :-). He is most famous for his "invention" of
computer-related semaphores (which have nothing to do with traffic
lights of any kind).

Second, in my opinion, the easiest way of considering the problem of
thread protection is to view a multiple-threaded program as a PARALLEL
program. Think of the problem with ALL threads executing simultaneously
(really simultaneously) even though that is not the case, and you will
see the annoying cases much more easily. (And let the linux kernel do
its scheduling in the order it wants. Our parallel brain has a better
way of exhibiting the problems.)

Third, there are two relatively different problems addressed by the use
of semaphores and mutexes:

o thread synchronization;
o and threads' shared-data access protection.

The latter is probably the easiest way of seeing the problem, and the
most common case. Synchronization may also be an issue, but... well,
let's speak of data protection.

3. Basic

The problem: when parallel (or really asynchronous) control flows (ie:
threads) access the _same_ data, you HAVE to protect such accesses.
Full stop.
There are read/write differences, but you may think of the problem in a
simple way: shared data should be protected, so you should:

o do something before accessing it,
o and do another thing after (in order for the other threads to access
  it).

==> Suppose you have a global variable called "the_data" used by
several threads. You _must_ protect it, so you use a mutex:
"the_data_mutex". Think of the mutex as a switch (or a barrier with a
big Guardian Minion). Before accessing the_data (potentially accessed
by multiple threads) you do: MutexGet(the_data_mutex). You will be
allowed to proceed only when the_data is free (and you will sleep until
then). After you've done things with the_data you do:
MutexRelease(the_data_mutex). Then you get out of the protected area
and other threads may be allowed to enter it.

NB: "local" C variables are really local (they are on the stack and
each thread has its own stack). So you never need to protect them, of
course.

Here we are. We have protected a simple thing. Of course, you may put a
big abstract mutex "the_mutex_for_everything_shared" in your program,
and then Get/Release it each time you do something in the global space.
But this would be relatively inefficient. So if you have "first_data"
and "second_data", and their respective "first_mutex" and
"second_mutex", you think that you should do Get/Release(first_mutex)
before accessing first_data, and Get/Release(second_mutex) before
accessing second_data. You are right! But: there is a new problem
arising now. Consider 2 threads doing this:

______________________________________________________________________
        Th. 1                              Th. 2
          |                                  |
  MutexGet(first_mutex)              MutexGet(second_mutex)
          |                                  |
  MutexGet(second_mutex)             MutexGet(first_mutex)
          |                                  |
  MutexRelease(second_mutex)         MutexRelease(first_mutex)
          |                                  |
  MutexRelease(first_mutex)          MutexRelease(second_mutex)
          |                                  |
          +                                  +
______________________________________________________________________

Each of these threads seems to do the right thing, but in fact, they
don't! We are completely asynchronous, so the worst can happen. Suppose
Th. 1 successfully Gets the first_mutex, and just after that Th. 2
successfully Gets the second_mutex. After that, Th. 1 will try to get
the second_mutex and Th. 2 the first_mutex, and neither of them will
ever get it. They are waiting for each other to release it, so... This
is a DEADLOCK.

===> Therefore, the conclusion is: threads should do The Right Thing,
i.e. always access the mutexes in the same order! This is a general
rule: always access the mutexes in the same order! If you ever get
fb_mutex before gc_mutex, you should always do this in all your
threads! Of course, you may access only fb_mutex or gc_mutex
independently... But if you get gc_mutex, don't ever try to Get
fb_mutex. Okay?

4. In GGI

In GGI: it seems that the IS_CRITICAL, ENTER_CRITICAL and
LEAVE_CRITICAL macros implement the "the_mutex_for_everything_shared"
abstraction. So, if you do something that has to be thread-safe, you
may surround all your code with ENTER_CRITICAL and LEAVE_CRITICAL
pairs. This may be inefficient, but well, probably not, and you are
safe at least. It means that nothing else will be done by any GGI code
before you LEAVE_CRITICAL. (To others: is that the actual meaning of
these macros?)

NB: A code section surrounded by a MutexGet/Release pair is generally
called a critical section (hence the macro names). But I don't really
like this naming, because in fact a section is always critical with
respect to something (the data accessed, and nothing else). You can
encapsulate _several_ critical sections inside each other without any
harm if you are careful.
You may even interleave several critical sections without any harm,
but you have to be _very_ careful with this (and I wouldn't recommend
it to anyone).

You should not protect the library calls, you should protect _your_
data! (Well, if you program a library, your data is library data,
but... :-) Of course, if you use thread-UNsafe libraries, your program
will not be thread-safe either and you are doing extra work. But well,
libggi is supposed to be thread-safe one day (and then, it must rely on
a thread-safe libc for example...). However, if you pass a "pointer" to
a library function, the associated data is _your_ data (even though the
library will modify it, you simply delegate the work to it). So you
should protect it: but only if your main program is multi-threaded.

The problem with GGI accelerated commands is that, implicitly, a new
"thread" (in fact a control flow) can always appear: the "program" that
executes on the card's chipset (bitBLT engine, 3D engine). The data to
protect (your data) is the framebuffer... We could use a real mutex to
protect these things, but well, it is simpler to have a
"wait_until_finished" approach. You can (and you will be obliged to)
wait for the on-video-board "thread" to finish. Hence, each time you
want to draw on the framebuffer (either yourself or thanks to the
library), you must do a ggiFlush (or maybe ggiFinish, as Andreas wanted
to change the name), to let the engine leave the framebuffer to you. If
you don't do that, you may face a data corruption problem (as with
first_data or second_data). (NB: It is not so "critical" for graphical
data, it's "only" ugly...)

You may ask me: but, well, we don't have a true mutex, so, if I think
truly "parallel", what about the case where the 2D engine starts
drawing something on the framebuffer and I start also? Should I use
ggiFinish _before_ drawing?
The answer is no: the underlying assumption is that the accelerated
engine will never start something by itself (at least I hope so). So,
you should ggiFinish only after a drawing command (and before the next
one, of course). By the way, we also get a synchronization out of
ggiFinish, as the _whole_ framebuffer will be "ready" after that call.

Why not implement a "complete" locking system on the framebuffer (ie:
Get/Release, or Enter/Leave)? Simply because it is a very big piece of
data, and it would be inefficient. Furthermore, unless you use the
card's memory to store program code, corruption of this big data is
only garbage on screen, so... Let it be simpler, and the programmer
will check that things are correct, and be synchronous if he doesn't
want to worry about this. In this case, the programmer forgets about
the bitBLT or 3D engine running asynchronously, and simply says to
libggi: be SYNC and worry about these stupid things yourself. This is a
"convenience" mode, that effectively makes use of a complete locking
semantic on the framebuffer (maybe a clever one, but well, only because
we are good persons) and manages it itself. If the programmer wants to
go ASYNC, then _he_ will have to manage the ggiFinish problems himself.
The advantage? He _can_ do finer-grained control of the data and make
his application draw in the framebuffer at the same time as the card
engines (hopefully in different areas, or the result will be...
funny :-). In this case, GGI (kgi+libggi) only does controls wrt hw
safety (prevent hardware locks if needed, etc.). We may rename these
macros: GARBAGE_FORBIDDEN, GARBAGE_ALLOWED... :-)

NB: However, even if you are synchronous wrt graphics (ie: a single
drawing thread), this does not mean that your program shouldn't protect
any data. If the AI thread sends data to the drawing thread, these data
should be protected in the application program.

5. Tricky

Now you have two simple rules.
Some other ones (left unjustified here):

o don't use a C int to protect the data! Always use a mutex (or a
  semaphore if you know how it works)! It is a common fault (and
  sometimes not a fault). This is architecture dependent! The final
  operation behind a mutex MUST be atomic, really atomic. Therefore,
  it should finish with a test-and-set _hardware_ instruction. I don't
  think C guarantees its test-and-set language ops. to be atomic at
  the hardware level... So you must use a dedicated library. (NB: This
  is why a mutex is never called an int...) Don't try to implement a
  mutex yourself, it may be tricky (it may work on x86 with gcc and
  not on alpha with cc, for example).

o don't do this in too many very concurrent threads:

  ___________________________________________________________________
  while (1) {
          MutexGet(a_mutex);
          /* some work */
          MutexRelease(a_mutex);
  }
  ___________________________________________________________________

  if you are not sure that a "schedule" entry point may appear in the
  /* some work */ (a system call, for example). Use semaphores (or
  verify the "semantics" of your mutexes). Otherwise, you may end up
  with a new "starvation" problem: on preemptive systems you only
  create a performance slowdown; on real-time systems, some threads
  will never execute. (NB: This problem is very rare on real systems.
  In fact I don't know any system on which it appears. But this is a
  symptom showing that a mutex is _not_ a semaphore with an initial
  value of 1. Theoretically at least; in practice, it is always
  implemented in this way, and that's why you never face the
  problem ;-)

o be careful on our Unix systems... We don't have true parallel
  systems, we have time-sharing deterministic systems. This means that
  your program may work _perfectly_, on all the runs you can try, and
  _not_ be thread safe. This is simply because, with your environment,
  the deadlock or data corruption conditions never occur.
  If you change the workload, or the time-slice, or something else, it
  may fail. These faults are _very_very_very_ difficult to identify a
  posteriori. The best solution is to READ the code and check that you
  have no obscure zone that you don't understand. (NB: This is a
  general good coding practice, by the way: I know that, I rarely do
  so, and I have lots of bugs. :-)

If you have more precise questions or comments, you are welcome, of
course.