Wed 28 Nov 2007
Everyone loves to talk about software reuse. Everything is object oriented this, design patterns that, frameworks here, components there. It’s all good stuff, but at the end of the day, it doesn’t really help much with the struggles of day-to-day programming. I’m talking about the “bite-sized tasks”, the kind of programming you face day-in and day-out:
- Letting the user manage a collection of objects with a dialog
- Creating an MRU menu
- Running a task on a background thread
- Paging a large set of data in a grid
- Writing drag-and-drop code
There are many bite-sized tasks that programmers face every day. It’s the unglamorous and tedious side of programming. Most developers spend the vast majority of their time tackling these tasks, which are often too specific to be effectively handled by a general purpose framework. Yet, there is a commonality to most of them that all programmers will recognize and respond to with something like, “Oh yeah, I’ve done something like that before.” Reusable frameworks that support the core of a software project are crucial to its long term stability and scalability, but most programmers don’t spend their day working on them. The list of bite-sized tasks is just too big and specific to be encapsulated in such a framework.
Some frameworks have tried to address these tasks by creating single-use objects designed to provide a general starting point. For instance, .NET provides the BackgroundWorker class to help run tasks on a background thread. It’s a very good class and I use it whenever I need to run tasks asynchronously, but it’s a standalone class. It has no relationship to any of the other objects in the .NET framework. There are many classes like this that I would not categorize as “framework” classes… they’re just helpers.
Generalized helper classes certainly serve a need in software development, but even those can only carry the ball so far. For one thing, you still have to bridge the gap between the generalized functionality a helper class provides and the specific functionality you are trying to accomplish. Sometimes this can be just as challenging as not using the helper class at all. Other times, the helper does 90% of what you need and the remaining 10% makes using the helper class impossible. If you can’t get inside the helper class and tweak it to do that extra 10%, then the helper class becomes useless for that task.
Where reuse falls short
This doesn’t mean that those bite-sized tasks haven’t been tackled before, though. On the contrary, they’re tackled over and over and over again by different developers in different companies that never know of each other’s existence. This is a big hole in the state of reuse in today’s software development.
However, outside of the commercial frameworks and components are endless code samples cataloged on the internet. Sites like SourceForge, Code Project, Koders, Google Code Search, CodePlex, O’Reilly Code Search, etc., provide endless amounts of code that do those little tasks. Disassemblers are also available to get the source code from compiled .NET or Java assemblies. Need to read a comma-delimited file? Somewhere, someone has done that before and it’s probably on the internet. Find a project that does that, look at the code and use it… it’s that simple.
Ok, maybe not. In order to do that you have to do “code scavenging”. Doing this involves the following process:
- Finding public code that does what you need to do.
- Scanning the code and pulling out just what is needed to accomplish the task.
- Tweaking the found code to make it work with your own code.
Many people will stop reading here and argue that this is software reuse at its worst… cut, paste and retrofit to solve the problem. My answer to that is… yes, it is, and that’s ok. If you think about the hierarchy of a software project, the farther we get from the core, the more task-specific the code becomes. At some point in that hierarchy, you reach a point of diminishing returns on object oriented reusability. Striving for reuse simply for reuse’s sake at this level will hurt the code, rather than improve it. These task-specific extremities in the software hierarchy most likely represent the bulk of the day-to-day coding work. That’s not being pessimistic about reuse, it’s being realistic.
A new methodology
I picture the “code scavenging” style of programming as a new methodology of reusable software development. Practicing this type of reuse is very different than practicing component or framework reuse. It requires resourcefulness to find sample code, technical knowledge to quickly understand the found code, and insight to know how to retool that found code into your own code. Applied properly, though, it can turbo-boost productivity and quickly solve problems that might otherwise take days to work out.
Unfortunately, there don’t seem to be very many tools or documented techniques for this type of programming. Some authors have mentioned it, but overall, little work has been done on how to do it effectively. There are several barriers to code scavenging as a methodology as well. Web sites hosting code samples are disconnected and sometimes poorly cataloged. Samples are not guaranteed to be good or even to work properly. They can also be copyrighted by the original author under any number of licenses or other legal terms.
Despite the difficulties, I believe there is a need to develop a methodology around code scavenging. This methodology would address best practices in the following areas:
- Finding sample code: Finding samples that fit your needs is not always as easy as a quick Google search. What are the techniques for finding quality sample code?
- Determining the root code sections that perform the required task: The root code is the minimum set of code that performs the task you are trying to accomplish. This may be in one or several different locations in the sample. What are the best ways to identify these areas?
- Finding the “incision points”: Incision points are the places in the sample where you can “cut” and still leave the root code sections working. This includes initialization functions, non-local variables, etc. How are these best identified?
- Merging the extracted code into your own work: This is the area where a set of documented techniques would shine. What is the best way to add the found functionality without having to rework a lot of your existing code? Even better, how do you keep the sample decoupled from your code in case it fails to work as you expected? Are these techniques the same for UI vs. back end work?
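To make the decoupling question in the last item a little more concrete, here is a minimal sketch in Python (chosen only for illustration). Every name in it is hypothetical: `scavenged_split_rows` stands in for a snippet found online, and `load_names` stands in for your own code. It shows one possible approach, not an established technique: route every call to the found code through a single seam so it can be swapped out if it misbehaves.

```python
import csv
import io
from typing import Callable, List

# Hypothetical scavenged "root code", pasted in and left untouched.
def scavenged_split_rows(text, delim=","):
    """Naive splitter found online: does not understand quoted fields."""
    return [line.split(delim) for line in text.splitlines() if line]

# The seam: your code depends on this signature, not on the found snippet.
RowParser = Callable[[str], List[List[str]]]

def stdlib_parse(text: str) -> List[List[str]]:
    """Replacement built on the standard csv module, same signature."""
    return [row for row in csv.reader(io.StringIO(text)) if row]

# Your own code (hypothetical): the parser is injected, so swapping the
# scavenged snippet for another implementation is a one-argument change.
def load_names(text: str, parse: RowParser = scavenged_split_rows) -> List[str]:
    return [row[0].strip() for row in parse(text)]
```

If the found snippet turns out to mishandle some input (quoted fields containing commas, in this made-up case), only the `parse` argument changes; `load_names` and its callers stay as they are.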
I certainly don’t have the answers to all of these questions, but I have a lot of ideas and I’m really interested in exploring this further. Code scavenging is a technique that I use often… it’s practically replaced paper books as my method of learning new concepts. Sometimes I just use it as a means to understand a concept before working on a problem. I even used a sample project to learn how to use the BackgroundWorker helper class mentioned earlier. As I said in another blog post, “Just like a picture is worth a thousand words, a sample project is worth a thousand reference books”.