Wed 28 Nov 2007
Code Scavenging: A New Software Development Methodology
Posted by Scott Hackett under Programming
Everyone loves to talk about software reuse. Everything is object oriented this, design patterns that, frameworks here, components there. It’s all good stuff, but at the end of the day, it doesn’t really help much with the struggles of day-to-day programming. I’m talking about the “bite-sized tasks”, the kind of programming you face day in and day out:
- Letting the user manage a collection of objects with a dialog
- Creating an MRU menu
- Running a task on a background thread
- Paging a large set of data in a grid
- Writing drag-and-drop code
There are many bite-sized tasks that programmers face every day. It’s the unglamorous and tedious side of programming. Most developers spend the vast majority of their time tackling these tasks, which are often too specific to be effectively handled by a general-purpose framework. Yet, there is a commonality to most of them that all programmers will recognize and respond to with something like, “Oh yeah, I’ve done something like that before.” Reusable frameworks that support the core of a software project are crucial to its long-term stability and scalability, but most programmers don’t spend their day working on them. The list of bite-sized tasks is just too big and too specific to be encapsulated in such a framework.
Helper classes
Some frameworks have tried to address these tasks by creating single-purpose objects designed to provide a general starting point. For instance, .Net provides the BackgroundWorker class to help run tasks on a background thread. It’s a very good class and I use it whenever I need to run tasks asynchronously, but it’s a standalone class. It has no relationship to any of the other objects in the .Net framework. There are many classes like this that I would not categorize as “framework” classes… they’re just helpers.
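To make the idea of a standalone helper concrete, here is a minimal sketch in Python rather than .Net. It is an analogue built on the standard library's `concurrent.futures`, not the BackgroundWorker API itself. The names `run_in_background` and `count_words` are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(text):
    """Stand-in for a slow task we want off the main thread."""
    return len(text.split())


def run_in_background(task, *args, on_done=None):
    """Run task(*args) on a worker thread and return a Future.

    Hypothetical helper for illustration, not a library function.
    on_done, if given, is called on the worker thread with the result.
    """
    def job():
        result = task(*args)
        if on_done is not None:
            on_done(result)
        return result

    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(job)
    # Do not block here; the already-submitted job still runs to completion.
    executor.shutdown(wait=False)
    return future


results = []
future = run_in_background(count_words, "find cut paste tweak", on_done=results.append)
print(future.result())  # blocks until the task has finished; prints 4
```

Like the helper classes described above, this covers the general case only. Progress reporting, cancellation, or marshalling the callback back to a UI thread would be the "remaining 10%" you have to build yourself.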
Generalized helper classes certainly serve a need in software development, but even those can only carry the ball so far. For one thing, you still have to bridge the gap between the generalized functionality a helper class provides and the specific functionality you are trying to accomplish. Sometimes this can be just as challenging as not using the helper class at all. Other times, the helper does 90% of what you need and the remaining 10% makes using the helper class impossible. If you can’t get inside the helper class and tweak it to do that extra 10%, then the helper class becomes useless for that task.
Where reuse falls short
This doesn’t mean that those bite-sized tasks haven’t been tackled before, though. On the contrary, they’re tackled over and over and over again by different developers in different companies that never know of each other’s existence. This is a big hole in the state of reuse in today’s software development.
However, outside of the commercial frameworks and components are endless code samples cataloged on the internet. Sites like SourceForge, Code Project, Koders, Google Code Search, CodePlex, O’Reilly Code Search, etc., provide endless amounts of code that does those little tasks. Disassemblers are also available to get the source code from compiled .Net or Java assemblies. Need to read a comma-delimited file? Somewhere, someone has done that before and it’s probably on the internet. Find a project that does that, look at the code and use it… it’s that simple.
Ok, maybe not. In order to do that you have to do “code scavenging”. Doing this involves the following process:
- Finding public code that does what you need to do.
- Scanning the code and pulling out just what is needed to accomplish the task.
- Tweaking the found code to make it work with your own code.
Many people will stop reading here and argue that this is software reuse at its worst… cut, paste and retrofit to solve the problem. My answer to that is… yes, it is, and that’s ok. If you think about the hierarchy of a software project, the farther we get from the core, the more task-specific the code becomes. At some point in that hierarchy, you reach a point of diminishing returns on object oriented reusability. Striving for reuse simply for reuse’s sake at this level will hurt the code, rather than improve it. These task-specific extremities in the software hierarchy most likely represent the bulk of the day-to-day coding work. That’s not being pessimistic about reuse, it’s being realistic.
A new methodology
I picture the “code scavenging” style of programming as a new methodology of reusable software development. Practicing this type of reuse is very different than practicing component or framework reuse. It requires resourcefulness to find sample code, technical knowledge to quickly understand the found code, and insight to know how to retool that found code into your own code. Applied properly, though, it can turbo-boost productivity and quickly solve problems that might otherwise take days to work out.
Unfortunately, there don’t seem to be very many tools or documented techniques for this type of programming. Some authors have mentioned it ([1], [2], [3]) but overall, little work has been done on how to do it effectively. There are several barriers to code scavenging as a methodology as well. Web sites hosting code samples are disconnected and sometimes poorly cataloged. Samples are not guaranteed to be good or even work properly. They can also be copyrighted by the original author under any number of legal bindings.
Despite the difficulties, I believe there is a need to develop a methodology around code scavenging. This methodology would address best practices in the following areas:
- Finding sample code: Finding samples that fit your needs is not always as easy as a quick Google search. What are the techniques for finding quality sample code?
- Determining the root code sections that perform the required task: The root code is the minimum set of code that performs the task you are trying to accomplish. This may be in one or several different locations in the sample. What are the best ways to identify these areas?
- Finding the “incision points”: Incision points are the places in the sample where you can “cut” and still leave the root code sections working. This includes initialization functions, non-local variables, etc. How are these best identified?
- Merging the extracted code into your own work: This is the area where a set of documented techniques would shine. What is the best way to add the found functionality without having to rework a lot of your existing code? Even better, how do you keep the sample decoupled from your code in case it fails to work as you expected? Are these techniques the same for UI vs. back end work?
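As one possible take on the decoupling question in the last item, the sketch below (in Python, purely for illustration) routes all calls through a thin adapter so the scavenged code can be swapped out if it disappoints. Everything here is hypothetical: `scavenged_split` stands in for code lifted from a public sample, and `RecordParser`, `stdlib_split` and `check_parser` are names invented for this example.

```python
import csv
import io


# --- Pretend this function was scavenged from a public sample. ---
def scavenged_split(line):
    """Naive comma splitter, used as found (hypothetical sample code)."""
    return [field.strip() for field in line.split(",")]


# --- Our own code only ever calls this seam, never the sample directly. ---
class RecordParser:
    """Thin adapter that hides which implementation does the parsing."""

    def __init__(self, split_line=scavenged_split):
        self._split_line = split_line

    def parse(self, line):
        return self._split_line(line)


def stdlib_split(line):
    """Replacement built on the standard csv module, which handles quoting."""
    return next(csv.reader(io.StringIO(line), skipinitialspace=True))


def check_parser(parser):
    """Small checks describing what we actually need from the parser."""
    simple_ok = parser.parse("a, b, c") == ["a", "b", "c"]
    quoted_ok = parser.parse('"Doe, Jane", 2007') == ["Doe, Jane", "2007"]
    return simple_ok, quoted_ok


print(check_parser(RecordParser()))              # (True, False): sample breaks on quoted commas
print(check_parser(RecordParser(stdlib_split)))  # (True, True): swapped in without touching callers
```

The checks double as a record of what the found code does and does not handle, and the adapter keeps the cost of replacing it down to one constructor argument.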
I certainly don’t have the answers to all of these questions, but I have a lot of ideas and I’m really interested in exploring this further. Code scavenging is a technique that I use often… it’s practically replaced paper books as my method of learning new concepts. Sometimes I just use it as a means to understand a concept before working on a problem. I even used a sample project to learn how to use the BackgroundWorker helper class mentioned earlier. As I said in another blog post, “Just like a picture is worth a thousand words, a sample project is worth a thousand reference books”.
16 Responses to “ Code Scavenging: A New Software Development Methodology ”
Comments:
Trackbacks & Pingbacks:
-
Pingback from Don’t Read Source Code | Scientific Ninja
December 1st, 2010 at 1:48 am
[...] may be able to apply his insight and knowledge to the code and divine some utility from it (“code scavenging” is waxing in popularity and legitimacy, after all), but a beginner can’t do [...]
-
Pingback from Find. Cut. Paste. Tweak. | On the Way to Somewhere Else
July 14th, 2012 at 5:01 pm
[...] suggested that the concept of – find, copy, paste, tweak – was similar to “Code Scavenging” described by Scott Klemmer at Stanford. One of Klemmer’s articles gives the example of [...]
November 28th, 2007 at 10:06 am
You missed a very important step – validating that the scavenged code does what it says it will do and only that. A very high percentage of the online code that I’ve reviewed over the years is broken, usually in subtle ways. Now, any code I scavenge like this gets unit tests written right away to validate what it does and doesn’t do.
November 29th, 2007 at 1:10 am
Hi Scott,
One of the challenges we run into with code search is that the results for what I call semantic search (looking for something that does X) are often not very good.
A while back I wrote a blog post titled Semantic Code where I discussed this issue in more depth, but the fundamental problem is that programmers usually don’t do a good job of including comments that conceptually describe what something does – comments, when they exist, focus more on how something works.
The best solution we’ve got so far is to use project descriptions to help find useful components, but that doesn’t help much with code scavenging. Here the technique I most often use is to do a search for API calls that I expect to find in functions which implement the desired functionality.
Sometimes that works well, other times it’s a frustrating exercise in wandering through a bunch of almost-but-not-close-enough code.
But I still scavenge code, and I’m glad you came up with a good term for it, though it does make me feel like a bit of a vulture.
– Ken
PS – And I totally agree with what Bob said…never trust code without a unit test. Though that applies to my own code too.
November 29th, 2007 at 1:13 am
Hi Scott,
Since I can’t edit comments, hopefully you can fix up the busted link on my previous reply – thanks!
– Ken
November 29th, 2007 at 9:37 am
Bob, you’re right, using sample code found in the public domain is about as safe as using Wikipedia for a research paper. “Trust nothing and test” is always a good practice.
Ken, it’s true that most publicly available code is poorly commented at best. I like what you guys are doing over at your site to find more contextual information about searched code. Your site’s been bookmarked.
I’m really shooting for something like a workflow pattern, here. I’ve seen this technique used by lots of Photoshoppers, where they have set patterns for taking their digital photos then running them through the paces of one or more workflows. I do this too when I write software, I just never really put much thought into the fact that when I have to implement something I’ve done before, I have a pretty well established workflow that I go through that’s very different from when I start a new task from scratch.
Thanks for the feedback!
November 29th, 2007 at 6:48 pm
At my previous job, I had the responsibility to create reusable SW modules that worked across a wide range of processors and platforms. I really like the analysis you posted, and would like to add some of my thoughts to it.
First of all, it’s almost silly to try and re-use code unless it was originally designed to be reusable. At our company, people were trying the cut-and-paste-then-revise method of reuse. It’s similar to the code scavenging part, with the added bonus of going back to the original project and trying to make both work with the same code. A big no-no. Code that was never designed to be re-used shouldn’t be messed with.
Second, it takes about 3-4 iterations to flesh out most of the reuse requirements. The first time you write a module, you’re focused on the specific task at hand. The second time, you’ll notice where things weren’t properly abstracted, and you’ll be able to refactor it some. The third time, you’ll start to see more subtle areas that you need to consider, and only then does the module become fairly reusable. Needless to say, you need to properly scope the module to define its behavior, which goes back to the first point – DESIGNING the code to be reusable.
I do like the code scavenging aspect of things; that’s how we learn new techniques and ideas. It’s also a valid method of reuse, although perhaps not as efficient as having a single module work across multiple platforms. At this point, one needs to decide whether the effort to make a module truly reusable is worth the benefit, or whether it’s more effective to make a copy of the code and modify it to suit your taste.
Eventually, it would be nice to have the WikiSourceCode, but that’s probably a ways away. I imagine that would be easier to do for RTOS objects (such as UART drivers and frameworks) but a lot more difficult for app specific stuff.
Thanks for listening,
-SPaik
December 1st, 2007 at 9:37 am
First-class Copy & Paste for Code Reuse
Like most problems, textual code reuse via copy & paste has already been studied, and a language based on first-class copy & paste is being developed [1]; it’s called the Subtext language [2]. See a more detailed discussion of it on lambda-the-ultimate [3]. Academia is not so far removed from everyday programming as people seem to think!
[1] http://subtextual.org/OOPSLA06.pdf
[2] http://subtextual.org/
[3] http://lambda-the-ultimate.org/node/691
December 1st, 2007 at 4:25 pm
Hi Scott,
I had actually given this topic some thought a few months ago, and wrote up a proposal (I don’t have a blog), which has been sitting around and gathering dust.. never got around to publishing it. I think it echoes some of the very same points that you raise. Do take a look and let me know if you have any comments.
http://www.cc.gatech.edu/~avr/distcode.pdf
thanks!
anirudh
December 1st, 2007 at 7:43 pm
Hi Scott,
Good post. You mentioned a number of open source repositories (e.g. Source Forge) and several code search engines (Google Code Search, O’Reilly Code Search), but failed to mention Krugle. (www.krugle.com)
I could be wrong, but I think we (Krugle) have the largest index of open source code and are the code search engine used by SourceForge, IBM DeveloperWorks, Yahoo! developer network and several others. We make this available for free directly from our site or through the partners mentioned above. We also sell an appliance that does code search inside of your company called Krugle Enterprise. Was this an oversight, or maybe you don’t know about us?
Steve Larsen
Krugle
December 1st, 2007 at 10:36 pm
@Steve… yes, I hadn’t heard of your web site or product before reading Ken’s comment a few days ago. Like I replied to him, I really like what you guys are doing there and will definitely use that in the future.
We face the same thing here… it’s hard to get the word out about yourself. Some days I knock myself out harder trying to think of ways to get our name in front of people than I do working on the product itself.
Good luck with your site and your product!
December 1st, 2007 at 10:37 pm
@Anirudh… I will read that and write back here, thanks for the link!
December 1st, 2007 at 10:58 pm
Anirudh, I read your article and it was excellent. It’s interesting, we flirted with the idea of doing something very similar to that and decided against it because of the P2P logistics and the need for a pre-existing community to make it worthwhile to use. In other words, as a product, something like that doesn’t work right out of the box if you’re the first guy (or girl) in the sharing network. Someone here pointed me to CPAN (http://www.cpan.org/) as a very close resemblance to that type of functionality in the world of Perl.
One thing that I did a while ago was make an add-in for VS2005 that managed downloads from Code Project:
http://www.codeproject.com/csharp/cpbrowser.asp
We submitted it and its source as an article and it did fairly well. As a prototype, I think there’s something really worthwhile there that’s worth exploring. We’ll see where that goes.
Thanks again for the link to your work!
March 7th, 2008 at 1:43 pm
I think the title “methodology” is overused. Code scavenging feels more like a technique or practice (listen to the interview with Ivar Jacobson at http://www.spamcast.net) that could be incorporated into other methods (whether agile or plan based, I am not sure it matters). In your opinion, could this practice be integrated into more formal methods?
March 7th, 2008 at 2:00 pm
I completely agree and would love to see the technique become more formalized. If nothing else, it would be great to have a good set of scavenging tools at your disposal. CodeProject and those sites I mentioned are great at what they do, but in the end you still wind up with what amounts to a code junkyard… scraps of code and half-baked projects scattered all over the place on your machine. It would be very nice to have real methods for reusing what others have done before you, and using that code like a map to help you get to where you need to go. The right tool set could help a lot with that.