Tue 24 May 2011
Building a bicycle-built-for-two isn’t a great metaphor for designing multi-threaded software, but I needed catchy title, so there you have it.
Few companies and few developers like to talk about the warts in their software, but let’s face it — for decades, SlickEdit has been a single threaded application, nothing but a bicycle-built-for-one. That single thread rode high and strong and did what it wanted, with all the powers and dangers of the world granted to it to do as it pleased. Let’s call this thread George, King George, as it was.
Then we decided to let another thread into our world. Well, needless to say, King George did not like this newcomer in his kingdom. Suddenly, he had to synchronize his peddling and braking and balancing with another thread, and let me tell you, he was not used to doing so much paperwork. Sorry, George, you have to acquire a Mutex before you can use the memory allocator. Sorry, George, you have to release the Mutex when you are done. George was angry, frustrated, and not accustomed to waiting in line behind these mere peasants. They were supposed to make his life easier. Instead, they pervasively filled his kingdom with landmines. George had to step carefully from now on. His smallest comfort was that those little threads also had to step carefully, very carefully. The King was also comforted to know that there still were places that only he could step.
Converting a single-threaded piece of software into multi-threaded software is a task that requires vision. It also requires strictness and thorough investigation of what-calls-what and what-uses-what. The vision comes from finding ways to utilize asynchronous processing and deciding what parts of the application to make thread safe first. The vision also comes from learning new ways to design less stateful APIs and data structures so that you can minimize access to shared data, or at least make the access thread-safe.
It can’t be done all at once.
The techniques for writing thread-safe software and multi-threaded software in general are well documented. It is certainly not a new science, but each thing you convert poses new challenges.
My challenge in SlickEdit 2010 and SlickEdit 2011 was to make it possible for a thread to parse code and insert indexing information into a tag database. Well, that certainly can’t be real hard. That is, until you meet King George.
Step 1: A thread that parses code
SlickEdit has great parsers for many, many languages; the problem was, none of them were written in a thread-safe manner. The lexical analyzer framework was completely stateful and also lacking in flexibility and power. Every parser would need a new lexer, and it had to perform well. The invocation mechanism depended on the Slick-C interpreter, which is still far away from being a thread-safe component. The parsers used global data when they wanted to and communicated directly with the database. To make a long story short, a lot of code had to change.
Step 2: A tagging job queuing framework
Once we could parse code, we couldn’t just start creating threads all willy-nilly. We needed a block of threads to do the work and pass the results forward to the editor. We needed to define what a tagging job was and how the indexing information was collected. We needed to reconsider every place where the editor would launch tagging jobs and see if those jobs could be done on a thread. As it turned out, the answer was “nearly everywhere”. Oh, by the way, King George liked that.
Step 3: A thread-safe tag database
This is one of the big steps forward in SlickEdit 2011. A thread can write to the tag database, making it possible to do everything required to build or update a tag database completely in the background. This was no easy task, because the tag database is a sophisticated component that was built specifically for King George. We had to rethink how we traversed through items in the database. We had to get rid of shared global variables. We had to refine the database block cache so it could be shared by threads.
Step 4: Getting a list of files to tag on a thread
The final step was to scale the threading up from allowing the main thread to schedule jobs one at a time to having a thread schedule a a list of files to be tagged. We needed this thread to find all the files in your workspace, check dates, figure out what language the files were, and finally schedule anything that was new or out-of-date to be tagged. This meant rewriting a lot of single-threaded Slick-C code as thread-safe C++. The speed gains from this change were significant.
The final result
SlickEdit 2011 is a huge improvement over SlickEdit 2010 with respect to its handling of background tagging. King George is beginning to appreciate what these little threads are doing for him and how they are making his job easier now.
The worst problem: Mutex acquisition order
This is what I regard as the worst thread-safe code synchronization problem. Now, if you are fortunate enough to be writing thread-safe code from scratch, where you have an existing, clearly modularized code whose critical sections are small, short-lived, and well encapsulated, it is not a problem likely to hurt you unless you do something silly. But, when you have a large base of single-threaded code with large amounts of shared data that you have to try to make access to thread safe, things get ugly really fast. Here is the classic deadlock condition.
VSMUTEX mutex1; VSMUTEX mutex2;
In the main thread, King George does this.
mutex1.lock(); ... // I own mutex1 and rule supremely ... // except that I don't own mutex2 yet mutex2.lock();
In another thread, some peasant does this.
mutex2.lock(); ... // I own mutex2, nobody else can have it, ... // not even my reverent King mutex1.lock();
When King George has mutex2 and is trying to get mutex1, and simultaneously, another thread has mutex1 and is trying to get mutex2, they will deadlock forever. There are a few ways around this, but the best way is the classic solution of don’t do that. The problem is that it is hard to see how pervasive this problem can be when some special case, such as error handing, causes everything you think you know about what order you acquire Mutex’s in to be reversed. This is a problem that can only be solved with thorough analysis and testing and fundamentally sound designs. But, given the circumstances surrounding migrating older single-threaded code to support threading, such designs are harder to come by than one would hope.
If you read this whole thing looking for deep insight on how to write solid thread-safe code, well, you probably finished sadly disappointed. If you are planning on renting a bicycle built-for-two with your spouse and were looking for riding tips, then you really, really read the wrong article. If you came here because you always thought SlickEdit should do tagging in the background and wondered why it took us so long to implement it that way, then maybe now you understand the obstacles and hurdles a little better. We are still iteratively improving the system and working to improve the tagging throughput. You can look forward to seeing even more performance gains and scaling in the first update (16.0.1).
SlickEdit is moving forward, and finally, George isn’t the only one peddling the bike.