Thu 18 Sep 2008
Can we standardize version control?
Posted by Scott Hackett under Productivity
[10] Comments
There are an unbelievable number of version control systems out there. They all have their strengths and weaknesses, and almost everyone has a strong opinion about which one is best. These software tools weren’t written by some group of developers that were out of touch with their end users either. They are all developer tools, written by developers for developers. Many of them were born from previous version control packages, such as how SVN evolved from CVS. The purpose was to keep what was good about the previous system and redo what they felt was lacking.
At this point, we’re pretty far along in the evolution of version control development. However, there’s a fundamental problem with all of these version control systems. Even though they set out to do pretty much the same thing, there’s no common interface to any of them. Even worse, the interpretation of common concepts is completely different from one system to the next. It’s not noticeable if you’ve only used one. The more you use, though, the more obvious it becomes. The only attempt at such a thing that I can recall is Microsoft’s SCC interface. Unfortunately, it served a very specific purpose and aimed for the lowest common denominator.
If you have ever tried to switch from one version control system to another, you have suffered through this. If you’ve ever had to port the version history from one system to another, then you likely gave up and stuck with the old system or lost all of your history. The interoperability of version control systems is horrendous.
Standardized interpretation
Let’s start with version numbers, the ID that identifies a particular version of a file… how basic is that? CVS has its own dot notation, assigned per file. SVN and Team Server keep a global number that gets incremented when any file is checked in. This difference in interpretation leads to fundamental differences in how these systems implement labeling and branching.
How could these common concepts be interpreted so differently? It means that in order to move from one version control system to another, you not only have to learn the new commands, but you must also understand that system’s interpretation of what they mean. Labeling and branching in CVS and SVN are very different, even though the basic concept is the same. It would be nice to see some standardization of how these concepts are implemented across version control systems.
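To make the numbering difference concrete, here is a minimal sketch of the two models. It is a toy simulation, not how CVS or SVN are actually implemented; the class names PerFileRepo and GlobalRepo and the file names are made up for illustration.

```python
from collections import defaultdict


class PerFileRepo:
    """Toy model: each file has its own counter, written CVS-style as 1.N."""

    def __init__(self):
        self.revisions = defaultdict(int)

    def commit(self, *paths):
        # Every file in the check-in advances independently.
        for path in paths:
            self.revisions[path] += 1
        return {path: f"1.{self.revisions[path]}" for path in paths}


class GlobalRepo:
    """Toy model: one repository-wide counter, bumped once per check-in."""

    def __init__(self):
        self.head = 0
        self.last_changed = {}

    def commit(self, *paths):
        self.head += 1
        for path in paths:
            self.last_changed[path] = self.head
        return self.head


per_file, global_repo = PerFileRepo(), GlobalRepo()

# The same three check-ins applied to both models.
for changed in (["a.c", "b.c"], ["a.c"], ["a.c"]):
    per_file.commit(*changed)
    global_repo.commit(*changed)

print(dict(per_file.revisions))
print(global_repo.head, global_repo.last_changed)
```

In the per-file model, describing "the state of the project" means recording one revision per file, which is roughly what a label has to do. In the global model a single number already identifies the whole tree, so labels and branches can be built on top of that number instead.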
Standardized commands
Imagine a world where databases each had their own proprietary language that was completely unrecognized by any other database. Alright, maybe that’s not so far off from reality, but beyond PL/SQL, T-SQL and others, there’s ANSI SQL, which is the common language of all relational databases. It theoretically (note the italics) allows developers or DBAs to write SQL that can be understood by any database that supports ANSI SQL. It’s the idea of polymorphism at work outside the code, creating a common interface to the functionality, yet allowing each database engine to implement that functionality however it wants.
Version control would be such a happier place if there were a common command language that every version control system was required to implement. At the ground-floor level, it would make the simple day-to-day operations consistent. Update, commit, compare and revert would all be standard. Once you learned how to use one version control system, you would know how to use ten. Is this so hard?
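As a rough sketch of what such a common command layer could look like, the following defines a hypothetical VersionControl interface with the four operations named above and two adapters that translate them into command lines. The interface and class names are assumptions made for illustration; the adapters only build argument lists and do not run anything.

```python
from abc import ABC, abstractmethod


class VersionControl(ABC):
    """Hypothetical common interface: one vocabulary, many implementations."""

    @abstractmethod
    def update(self, path): ...

    @abstractmethod
    def commit(self, path, message): ...

    @abstractmethod
    def compare(self, path): ...

    @abstractmethod
    def revert(self, path): ...


class Subversion(VersionControl):
    def update(self, path):
        return ["svn", "update", path]

    def commit(self, path, message):
        return ["svn", "commit", "-m", message, path]

    def compare(self, path):
        return ["svn", "diff", path]

    def revert(self, path):
        return ["svn", "revert", path]


class Cvs(VersionControl):
    def update(self, path):
        return ["cvs", "update", path]

    def commit(self, path, message):
        return ["cvs", "commit", "-m", message, path]

    def compare(self, path):
        return ["cvs", "diff", path]

    def revert(self, path):
        # CVS has no dedicated revert command; "update -C" overwrites a
        # locally modified file with a clean copy from the repository.
        return ["cvs", "update", "-C", path]


# The caller uses the same verb regardless of the system behind it.
for vcs in (Subversion(), Cvs()):
    print(" ".join(vcs.revert("main.c")))
```

Even in this small example the revert operation does not map one-to-one, which hints at how much a real standard would have to paper over.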
Import/export standardized formats
A huge point of pain in version control is having to switch from one system to another. It sounds crazy… who would switch version control systems at the same company? However, it’s happened to me three times in my career. Companies and departments merge, where both sides used different version control systems. Sometimes your needs require a version control system with more functionality. There are lots of reasons for switching systems.
Unfortunately, each time I’ve had to switch version control systems, it’s resulted in a complete loss of history. Therefore, there needs to be a standardized import/export format. Many applications make use of XML as a way to import and export their data to share with other applications. RSS is probably one of the best examples of this. Version XML would allow you to export your version history for all files in the repository. That history, as well as the files or deltas, could then be archived in tar or zip format into a single file. The corresponding import functionality would be able to read this standard format and structure, and would be able to import that version history. This would completely remove the pain of having to switch from one version control system to another.
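No such format exists as far as I know, so the following is only a sketch of what a "Version XML" round trip might look like. The element and attribute names (repository, file, revision, comment) and the sample history are invented for illustration, and file contents or deltas are left out.

```python
import xml.etree.ElementTree as ET


def export_history(history):
    """Serialize {path: [revision dicts]} to a hypothetical Version XML string."""
    root = ET.Element("repository")
    for path, revisions in history.items():
        file_el = ET.SubElement(root, "file", path=path)
        for rev in revisions:
            rev_el = ET.SubElement(
                file_el, "revision",
                id=rev["id"], author=rev["author"], date=rev["date"],
            )
            ET.SubElement(rev_el, "comment").text = rev["comment"]
    return ET.tostring(root, encoding="unicode")


def import_history(xml_text):
    """Rebuild the same {path: [revision dicts]} structure from the XML."""
    history = {}
    for file_el in ET.fromstring(xml_text).iter("file"):
        history[file_el.get("path")] = [
            {
                "id": rev_el.get("id"),
                "author": rev_el.get("author"),
                "date": rev_el.get("date"),
                "comment": rev_el.findtext("comment"),
            }
            for rev_el in file_el.iter("revision")
        ]
    return history


# Made-up sample data standing in for an exported repository.
sample = {
    "src/main.c": [
        {"id": "1", "author": "alice", "date": "2008-09-01", "comment": "initial import"},
        {"id": "2", "author": "bob", "date": "2008-09-10", "comment": "fix null check"},
    ],
}

xml_text = export_history(sample)
print(xml_text)
```

The exporting system would write this file and the importing system would read it back, so neither side needs to know anything about the other's internal storage.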
Version Query Language
Basic file check in and check out is the core of any version control system. However, aside from being able to diff two specific versions of a file, there is very little cross-version analysis in most systems, despite the fact that the source control repository inherently contains this information. Going back to the SQL analogy used before, a version control repository is a database and users should be able to query it.
I would love to see a standard version query language and table structure that allowed you to run any query you wanted through the version control system’s engine. You might be able to write queries like the following (pseudo-queries only):
Who wrote the code at a specific location in a source code file?
select author where file = “x” and line = y
where x is the file and y is the file line number.
Across all files, which version’s check in comments contain a specific string pattern?
select revision_number, file where comment like “x”
where x is the check in comment to look for.
What check ins has a particular person made in the last 10 days?
select file, revision_number where author = “x” and checkin_date >= y
where x is the user and y is the date 10 days ago.
Which check ins had the biggest effect on a particular file?
select top 5 revision_number where file = “x” order by lines_changed desc
where x is the file.
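To show that these pseudo-queries are not far from real SQL, here is a small sketch that loads check-in metadata into an in-memory SQLite database and runs three of them. The checkins table, its columns, and the sample rows are assumptions for illustration; the per-line author query is omitted because it would need an extra table of per-line attribution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checkins (file TEXT, revision_number INTEGER, author TEXT,"
    " checkin_date TEXT, comment TEXT, lines_changed INTEGER)"
)
# Made-up history; dates are ISO strings so they compare correctly as text.
conn.executemany(
    "INSERT INTO checkins VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("parser.c", 1, "alice", "2008-09-01", "initial import", 120),
        ("parser.c", 2, "bob", "2008-09-10", "fix null check", 4),
        ("parser.c", 3, "alice", "2008-09-15", "rewrite tokenizer", 310),
        ("lexer.c", 1, "bob", "2008-09-16", "fix off-by-one", 2),
    ],
)


def comments_matching(pattern):
    """Which check-in comments match a LIKE pattern, across all files?"""
    return conn.execute(
        "SELECT revision_number, file FROM checkins"
        " WHERE comment LIKE ? ORDER BY file, revision_number",
        (pattern,),
    ).fetchall()


def checkins_since(author, since):
    """What has a particular person checked in on or after a given date?"""
    return conn.execute(
        "SELECT file, revision_number FROM checkins"
        " WHERE author = ? AND checkin_date >= ? ORDER BY checkin_date",
        (author, since),
    ).fetchall()


def biggest_changes(file, limit=5):
    """Which check-ins changed the most lines in a file?"""
    return conn.execute(
        "SELECT revision_number FROM checkins"
        " WHERE file = ? ORDER BY lines_changed DESC LIMIT ?",
        (file, limit),
    ).fetchall()


print(comments_matching("fix%"))
print(checkins_since("bob", "2008-09-08"))
print(biggest_changes("parser.c", 2))
```

SQLite spells "top 5" as LIMIT 5, which is itself a small reminder that even SQL dialects differ.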
We took a stab at that in our Tools for Visual Studio product with the Find Version feature. It allows you to find the versions of one or more files that match specific criteria, such as the ones above. It does so with a GUI, though, not a query language like the one I proposed. The most difficult part of this feature was finding a way to do this across several different version control systems while maintaining a consistent interface.
In an industry like software development, where most people understand the concept of one interface and many implementations, it’s amazing to me that this hasn’t caught on yet in the area of version control. Maybe some day that will happen, and moving to a new version control system won’t waste time that could otherwise be used writing code.
September 18th, 2008 at 11:47 am
No.
September 18th, 2008 at 12:03 pm
The flaw in your argument is that relational databases are extremely similar and therefore a common query language becomes feasible.
Version control systems have widely differing fundamental models. Trying to pave over these differences with common terminology would probably be deceptive at best.
September 18th, 2008 at 12:10 pm
Also No.
That you failed to include ideas from dvcs systems adds up to fail.
September 19th, 2008 at 1:15 am
It doesn’t need to support every feature of every VCS. There are plenty of times where supporting a small subset of the features would be useful. Imagine if every editor could get basic checkout/diff/commit support from the major VCSs by implementing support for this standard.
You could allow the protocol to support the features of each VCS (in the future) and make it so you can write translators to map the features of one VCS to the other if necessary.
September 19th, 2008 at 10:05 pm
DVCSs gain a lot over centralized systems by abandoning a lot of older ideas and introducing new ones.
There might be short-term gains to be had by creating this sort of standardized interface, but in the long term it will become an albatross to systems attempting to revolutionize version control workflows.
September 21st, 2008 at 7:15 pm
Scott,
Nice post. Yes, it sure would be nice if more things could be standardized to the extent that SQL/databases are.
But look at SQL! It has got to be the most successful and “permanent” IT technology/language ever. It’s more than 25 years old, but still going very strong.
Part of the reason for this is the strength of the relational model. Part of it is that databases are, well, permanent. More so than applications.
I think the main reason the different VCS systems are so different is that the model is still changing. I like the Subversion model a lot better than the CVS one. But I think DVCS systems will change this again.
Regards
John Hurst
September 29th, 2008 at 4:11 am
I don’t want to promote things I’m not getting paid for but… the guys at Atlassian have done at least the latter. That is, searching:
http://www.atlassian.com/software/fisheye/features/search.jsp
And I believe they are quite far along in the standardization part as well, as they index quite different vcs systems. No dvcs of course.
Anyway, quite a lot of wishful thinking you have there… I think USB is the only example that more or less works always.
October 14th, 2008 at 3:21 pm
so you want just a query language? you can write it yourself, because all the things you listed can be done by SVN IDEs.
select author where file = “x” and line = y
svn annotations
Across all files, which version’s check in comments contain a specific string pattern?
select revision_number, file where comment like “x”
svn ide filter
select file, revision_number where author = “x” and checkin_date >= y
svn filter on name and by hand see the days
select top 5 revision_number where file = “x” order by lines_changed
you can do that only manually for now, but it’s not hard to write. you just get the history of the file and compare each version with the next.
Svnkit is great library to access svn, if you make it good people will use it and you can make next great open-source project. it could use pseudo-columns. Or even better idea would be to export all the normalized data to db and let db handle sql. you could also give it update option(so you would only have to import everything once, i know that svn has dump option). it could be fun. think about it.
October 15th, 2008 at 7:12 am
@ raveman:
Thanks… we did that exact thing actually, it’s the basis for our Versioning Toolbox. The tricky part wasn’t getting it to work for SVN, it’s getting it to work uniformly for any source control system. The Versioning Toolbox doesn’t offer a query language per se, but it does offer a consistent UI across all source control systems to get this type of info.
However, the whole time we worked on it, you couldn’t help but feel that it was a replication of work. It’s equivalent to interfacing to a legacy database by making a second database with tables to match your needs, then replicating on a regular basis.
The SVN toolkit sounds good, I’ll check it out. It would be nice, though, if all source control implementations thought through these needs as well.
July 31st, 2009 at 1:26 pm
You can use a product called Randolph (http://www.nobhillsoft.com/Randolph.aspx) to scan databases and push them into a version control repository (SVN, TFS and SourceSafe). Once you have it set up, you can do queries like that on your files (its own internal repository is SQL Server-based, so you can query it with TSQL).