The Linux Letter: Subversion

I have been a fan of Concurrent Versions System (CVS) for many years and have used it to manage the changes not only to my software, but to my server configurations as well. No doubt about it, CVS has been an excellent software assistant for me (and innumerable others). But there's a new game in town. As CVS was created to enhance and address the shortcomings of its predecessor, Revision Control System (RCS), so too was Subversion created to enhance and address the shortcomings of CVS. This month, we'll look at the new-and-improved version control system that has become my favorite.

CVS Redux

As I wrote in an earlier article, CVS is a handy tool that allows you to keep track of all changes to your source code. Multiple users can be working in the same project--even in the same source file--and CVS will keep track of who changed what and when. Should there be a conflict caused by two developers working on the same section of a source file, CVS will call out the area in conflict and help you merge the changes successfully. While CVS is quite powerful, it has a few warts that can make its use somewhat trying.

Number one on my list of CVS warts is that, in CVS, it's a bear to rearrange your source file directory structure. While it's easy enough to physically move the directories on your checked-out version (using either a command line utility or drag-and-drop), telling CVS that you have done this requires a convoluted incantation consisting of multiple 'cvs add' and 'cvs remove' commands, which gets even worse if multiple developers have the same project checked out. Now, I'm sure that someone will point me to some drag-and-drop tool that invokes the appropriate deities to accomplish this seemingly simple task, but I've learned to give some thought to my source tree directory structure prior to starting a project. It just makes things easier.

Number two on my list of CVS warts is that CVS doesn't do binary files, such as jar files, word processing documents, or zip files. Let me rephrase that. CVS doesn't do binary files well. Should you have the need to store these in your CVS repository, you'll need to take care to let CVS know that the file is binary when you add it. Adding a file using the command form 'cvs add -kb' tells CVS that the file is binary and deserves special treatment, exempting it from attempts to expand embedded CVS keywords that may spuriously appear within the file. Once the file is so tagged, CVS will simply copy it whenever it is checked out or committed to the repository. Failure to add the tag can result in a corrupted file, so if you use CVS, keep the -kb switch in mind.

Other developers have their own personal favorites that could be added to my two items, and if you do any googling for "CVS shortcomings," "CVS design flaws," or "CVS annoyances," you'll get a good sampling of them.

A Better CVS

I have been aware of the Subversion project for a lot longer than I have been using it, having seen references to it at Web sites that I visit and in some articles I have read. Like most of you, I tend to stick with tools that have become "comfortable" to use, and to me, CVS fits into that category, shortcomings notwithstanding. Thus, I found no real reason to investigate Subversion--until my projects started to incorporate an increasing number of binary file types, such as Java jar files, OpenOffice.org documents, and various graphic files. CVS's handling of such files was becoming enough of an impediment to inspire me to visit the Subversion Project's Web site to learn more about it. A short description of the project appears on the site and states simply, "The goal of the Subversion project is to build a version control system that is a compelling replacement for CVS in the open source community. The software is released under an Apache/BSD-style open source license." My interest was piqued even further when I read the first entry in their features list: "Subversion is meant to be a better CVS, so it has most of CVS's features. Generally, Subversion's interface to a particular feature is similar to CVS's, except where there's a compelling reason to do otherwise." That entry alone gave me enough incentive to download and install Subversion on my laptop.

Once the software was loaded, I needed to spend only a few minutes reading the documentation provided on the Web site (click on the link for "Subversion Book") before I realized just how similar the tools were. For the functions I use, there is practically a one-to-one mapping of CVS functionality to Subversion functionality. The only major difference between the two is that with CVS, everything is done with variations on one command, cvs. With Subversion, the suite is split into a few commands, based on whether the desired function is administrative (to operate on the repository itself) or client-oriented (to operate on projects within the repository). Thus, instead of issuing the command cvs init to initialize the repository (as in CVS), I issued the command svnadmin create.

Initializing a repository is a one-time operation. For the things I do daily, the similarities between the two packages are amazing. The commands to import, check out, add, remove, and commit projects and files are cvs import, cvs co, cvs add, cvs remove, and cvs commit, respectively. Change "cvs" to "svn" in those CVS commands and you have the Subversion command for the equivalent function (the arguments differ in places, so check the book before your first import). That meant that all I needed to do to make the switch to Subversion was to reprogram my memory to type "svn" instead of "cvs." It didn't take very long before my transition was complete.

So Much Faster

If the only difference between the two packages was the command used, there wouldn't be much point to a transition from CVS to Subversion. Fortunately, that isn't the case.

The thing about CVS that I found most irritating was the incredible pain induced by a reorganization of a project's source code directory structure. It's something that can be done with CVS, but not easily. One of the commands that CVS lacks is Subversion's svn move command, which makes rearranging a project simple. Let's say I have a project that contains two directories: DirA and DirB. In directory DirA, I have a file, X, that I want to move to DirB. In Subversion, the command svn move DirA/X DirB does automatically what CVS requires you to do manually: it physically moves the file from DirA to DirB and schedules the deletion of DirA/X and the addition of DirB/X for the next commit. (The Subversion Book describes svn move as equivalent to an svn copy followed by an svn delete, so the file's history travels with it.) This works on individual files, groups of files, or entire directories. Talk about a welcome improvement!

As to my issues with CVS's handling of binary file types, I can only say that with Subversion, they're all gone. I haven't had one single corruption of a binary file since I started using Subversion. The reason for this can be found in the Subversion FAQ:

How does Subversion handle binary files?

When you first add or import a file into Subversion, the file is examined to determine if it is a binary file. Currently, Subversion just looks at the first 1024 bytes of the file; if any of the bytes are zero, or if more than 15% are not ASCII printing characters, then Subversion calls the file binary. This heuristic might be improved in the future, however.

If Subversion determines that the file is binary, the file receives an svn:mime-type property set to "application/octet-stream". (You can always override this by using the auto-props feature or by setting the property manually with svn propset.)

Subversion treats the following files as text:

  • Files with no svn:mime-type
  • Files with a svn:mime-type starting "text/"
  • Files with a svn:mime-type equal to "image/x-xbitmap"
  • Files with a svn:mime-type equal to "image/x-xpixmap"

All other files are treated as binary, meaning that Subversion will:

  • Not attempt to automatically merge received changes with local changes during svn update or svn merge
  • Not show the differences as part of svn diff
  • Not show line-by-line attribution for svn blame

In all other respects, Subversion treats binary files the same as text files, e.g. if you set the svn:keywords or svn:eol-style properties, Subversion will perform keyword substitution or newline conversion on binary files.

Note that whether or not a file is binary does not affect the amount of repository space used to store changes to that file, nor does it affect the amount of traffic between client and server. For storage and transmission purposes, Subversion uses a diffing method that works equally well on binary and text files; this is completely unrelated to the diffing method used by the 'svn diff' command.
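The detection rule and the text-versus-binary classification quoted above can be sketched in a few lines of Python. This is only an illustration of the FAQ's description, not Subversion's actual source code. The function names are hypothetical, and exactly which bytes count as "ASCII printing characters" is my assumption (printable ASCII plus the common whitespace and control bytes).

```python
SAMPLE_SIZE = 1024         # the FAQ says only the first 1024 bytes are examined
MAX_NON_TEXT_RATIO = 0.15  # "more than 15%" non-printing bytes means binary


def is_text_byte(b: int) -> bool:
    # Assumption: printable ASCII (0x20-0x7E) plus bell, backspace, tab,
    # newline, vertical tab, form feed and carriage return (0x07-0x0D).
    return 0x20 <= b <= 0x7E or 0x07 <= b <= 0x0D


def looks_binary(data: bytes) -> bool:
    """Apply the heuristic described in the FAQ to a file's contents."""
    sample = data[:SAMPLE_SIZE]
    if not sample:
        return False  # assumption: an empty file is treated as text
    if 0 in sample:
        return True   # any zero byte marks the file as binary
    non_text = sum(1 for b in sample if not is_text_byte(b))
    return non_text / len(sample) > MAX_NON_TEXT_RATIO


def treated_as_text(mime_type):
    """Classify a file by its svn:mime-type value (None if the property is unset)."""
    if mime_type is None:
        return True
    return mime_type.startswith("text/") or mime_type in (
        "image/x-xbitmap",
        "image/x-xpixmap",
    )
```

Under these assumptions, a jar or zip file is flagged as binary almost immediately, because such archives contain zero bytes early in their headers, while ordinary source code stays well under the 15% threshold.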

Most CVS users have embedded CVS keywords (such as $Id$) in their source code so that CVS will expand them automatically upon a commit. In the source code, CVS expands keywords such as $Id$ into strings like $Id: cvs-notes.html,v 1.2 2001/02/08 05:16:06 joeuser Exp $, allowing you to document the version, date, time of commit, etc. This is CVS's default behavior, which gets turned off when you use the -kb switch. Subversion does the opposite and won't attempt to expand any keywords unless you specifically tag a file (by setting its svn:keywords property) to enable expansion of the keywords that the file contains. So it's unlikely that a keyword in a binary file will randomly get expanded and thus corrupt the file.
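To make the opt-in behavior concrete, here is a minimal Python sketch of keyword expansion that touches only the keywords explicitly enabled for a file. It is an illustration, not Subversion's implementation. The function name, the regular expression, and the sample values are all hypothetical.

```python
import re

# Matches both the unexpanded form ($Id$) and an already expanded one ($Id: ... $).
KEYWORD_PATTERN = re.compile(r"\$([A-Za-z]+)(?::[^$\n]*)?\$")


def expand_keywords(text: str, values: dict, enabled: set) -> str:
    """Expand only the keywords named in `enabled`; leave everything else alone."""

    def replace(match):
        name = match.group(1)
        if name in enabled and name in values:
            return f"${name}: {values[name]} $"
        return match.group(0)  # not enabled: keep the original text untouched

    return KEYWORD_PATTERN.sub(replace, text)
```

With an empty `enabled` set, which corresponds to a file that has never been tagged for keyword expansion, the text comes back unchanged. That is why a stray `$Id$` byte sequence inside a binary file is left alone.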

Besides automatically identifying binary files, Subversion also uses a binary diffing algorithm to send changes when a commit is requested, so only the changed parts are sent. CVS, on the other hand, copies the entire file. While this functionality may be irrelevant if you're attached to a high-speed network, it's wonderful if you're using a dial-up connection. And even if you are on a high-speed network, are you patient enough to wait for those huge jar files to get transferred in their entirety every time they change? Subversion makes the whole process so much faster.
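To show why sending only the changed parts matters, here is a deliberately simplified Python sketch. It is not Subversion's actual delta algorithm. It assumes fixed-size blocks and a file that changes in place, and the names `BLOCK_SIZE`, `make_delta`, and `apply_delta` are invented for illustration.

```python
BLOCK_SIZE = 4096  # hypothetical block size chosen for illustration


def make_delta(old: bytes, new: bytes):
    """Return (new_length, [(offset, block), ...]) covering only blocks that differ."""
    changed = []
    for offset in range(0, len(new), BLOCK_SIZE):
        block = new[offset:offset + BLOCK_SIZE]
        if old[offset:offset + BLOCK_SIZE] != block:
            changed.append((offset, block))
    return len(new), changed


def apply_delta(old: bytes, delta) -> bytes:
    """Rebuild the new file from the old copy plus the changed blocks."""
    new_length, changed = delta
    # Start from the old contents, truncated or zero-padded to the new length.
    result = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for offset, block in changed:
        result[offset:offset + len(block)] = block
    return bytes(result)
```

In this toy model, changing a single byte in a large file produces a delta containing one 4096-byte block rather than the whole file. A real delta algorithm also copes with inserted and deleted data, which shifts everything after it and defeats a fixed-block comparison like this one.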

Client Options

Subversion repositories can be accessed in a variety of ways. Clients wishing to access repositories hosted on their own machine have it the easiest, as no additional configuration is required. Just create the repository using the command svnadmin create /path/to/repository (if you haven't already done so) and start using it by running the svn command with a URL of the form file:///path/to/repository. Clients wishing to access remote repositories have little more to do than to change the URL to point to the correct server, with the correct protocol.

Setting up a remote server for access to Subversion repositories isn't difficult. The instructions to do that are included in the Subversion Book. The administrator can elect to provide access using Subversion's own server (either directly or over SSH) or by adding a module (WebDAV/SVN) to an Apache 2 Web server. Which method is most appropriate is dictated by your security requirements and the type of clients that you wish to support. Again, consult the book for further information.

Once you have a server configured, the only difference to the client is in the protocol specified in the URL. The client commands are all the same; only the URL changes. Local access uses the file:// URL, whereas remote access URLs are svn://, svn+ssh://, and http://. At my shop, I have set up Apache 2 to dole out access (so my URLs are http://) as this makes it easy to navigate the repositories using only a Web browser.
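The relationship between URL scheme and access method can be summarized in a small Python sketch. The table simply restates the four schemes mentioned above. The function name and the host names in the usage note are hypothetical, and a real installation may support additional schemes.

```python
from urllib.parse import urlparse

# URL scheme -> how the repository is reached, as described in the text.
ACCESS_METHODS = {
    "file": "local repository on this machine",
    "svn": "Subversion's own server",
    "svn+ssh": "Subversion's own server, tunneled over SSH",
    "http": "Apache 2 with the WebDAV/SVN module",
}


def describe_access(url: str) -> str:
    """Report which access method a repository URL implies."""
    scheme = urlparse(url).scheme
    try:
        return ACCESS_METHODS[scheme]
    except KeyError:
        raise ValueError(f"unrecognized repository URL scheme: {scheme!r}") from None
```

For example, `describe_access("file:///path/to/repository")` reports local access, while a URL such as `svn+ssh://example.com/repos/project` (a made-up host) maps to the SSH-tunneled server. The client commands are identical in both cases.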

Tempered Enthusiasm

Having placed a considerable quantity of code under the management of CVS, I found my enthusiasm for a commitment to Subversion (pun intended) to be somewhat tempered. Even though I knew that, for me, a switch was inevitable, I had to decide what to do with the existing repository. One strategy that I briefly considered was to leave in CVS those projects that were already in CVS but to put new projects into Subversion. I discounted that idea because I wasn't interested in any additional maintenance headaches or confusion that would result from running multiple version control systems. Another idea that I entertained for even less time was the concept of leaving up the old CVS repository and then, when preparing to work on an existing project, exporting the project from CVS (using cvs export) and then importing the project into Subversion. This had all of the disadvantages of my first strategy while adding the loss of the project history and version tags. Yuck!

Fortunately, there is a wonderful tool called cvs2svn that will migrate your CVS repository to a Subversion repository. It worked fine for me, but I will give you this tip. The Web site says, "It [cvs2svn] is designed for one-time conversions, not for repeated synchronizations between CVS and Subversion." That's an understatement. If you decide to do a conversion, do yourself a big favor and do it into a freshly created repository. My first attempt caused the corruption of my current Subversion repository. Fortunately, I had made a backup prior to the conversion, so recovery was trivial. I'm sure that my enthusiasm for Subversion would have been tempered even more had I not taken the precaution of a backup.

Tool Integration

The longevity of CVS has resulted in its integration into many popular programming tools. As one example, we have Eclipse, the tool du jour for i5 programmers. Through the use of the Eclipse CVS plug-in, repository access is made quite simple. If you are a CVS/Eclipse user, you'll be happy to learn that there is a Subversion plug-in (called Subclipse) that can make your switch to that version control program seamless.

Given the rapid adoption rate of Subversion in the open-source world, I would bet that your favorite development tool supports Subversion right now. And if it doesn't, I'd bet that it won't be long before it does, assuming that it already supports CVS.

Runs on i5/OS

Your choice of server on which to host Subversion isn't limited to Linux. As listed on the site, you have a choice of hosting your repository on "all modern flavors of Unix, Win32, BeOS, OS/2, and MacOS X." Best of all, you can even host your Subversion repositories on i5/OS (V5R1 and above).

If you are using CVS, you ought to consider Subversion. If you're not using any version management tool, you ought to consider Subversion. The flexibility of Subversion makes it useful not only for programming projects (which is what most people use it for) but for any computer-related projects, such as presentations, documentation, or anything else you may store on a computer. Once you get into the habit of using a version control system, you'll wonder how you ever got along without it.

OhioLinux 2005

Last year, I wrote a column about the remarkable Ohio LinuxFest technology conference. The intrepid volunteers who made this happen are doing it again. On Saturday, October 1, 2005, the Ohio LinuxFest will be held in Columbus, Ohio. There is no registration fee to attend the event, and judging by the list of presenters, it should be even better this year than it was last. I hope to see you there!

 

Barry L. Kline is a consultant and has been developing software on various DEC and IBM midrange platforms for over 21 years. Barry discovered Linux back in the days when it was necessary to download diskette images and source code from the Internet. Since then, he has installed Linux on hundreds of machines, where it functions as servers and workstations in iSeries and Windows networks. He co-authored the book Understanding Linux Web Hosting with Don Denoncourt.
