A brief introduction to CVS

Why?

CVS, the Concurrent Versions System, is the developer’s magic bullet. It’s like an unusually patient project manager: it doesn’t harass you about deadlines, and it politely keeps your code in check.

Traditionally, CVS is intended for use in typical programming projects, and most of its provided examples are composed of makefiles and headers. If you can get past the strange handling of binary files and handle a paradigm shift or two, it makes a wonderful web development platform and organizational tool, and will become near-indispensable as you welcome it into your workflow.

In the following pages, I will explain some of the basics of using CVS in a typical web development workflow, and provide links to more helpful material on advanced topics.

What?

CVS’ traditional selling point is the ‘C’ in the name: concurrency. It provides a structure within which multiple developers can work on the same files simultaneously and resolve conflicts without interrupting work. CVS’ benefits also extend to the single developer: every revision of a file committed to CVS is stored permanently, so changes can be tracked and historical versions of files can be resurrected where necessary. By providing a powerful, global UNDO feature, CVS frees you from the nagging worry of making mistakes.

Who?

You don’t need special privileges to use CVS: most Unix and Linux distributions (including Mac OS X’s developer tools CD) include it by default, and many GUI clients for Windows, Mac, and Linux are available. A CVS repository (see below) can be created anywhere.

How?

CVS is a client/server application. The master copy of a project is stored in a module within the repository; a repository can contain multiple modules, each of which normally corresponds to a given project. Individual users download copies into a local checkout, modify files there, and then commit them back into the repository. CVS tells you if the changes you’ve made are in conflict with someone else’s, and whether someone else has made changes that might affect you.

This means that all of your work takes place in your local sandbox. You don’t even have to think about CVS until it’s time to commit your changes back into the system, and you can feel free to edit or delete anything you want, safe in the knowledge that it will be faithfully resurrected if you choose to roll back. This also means that CVS’ greatest benefits come to those who use it most often, constantly checking in batches of edits and keeping the repository up-to-date with each new addition to a site.

The local sandbox method is extremely well suited to web development. Typically, each developer will maintain their own local copies of files, and perhaps also a personal space on a webserver where code can be tested with live data. As you segregate your work in this way, you will lessen potential conflicts and confusion, and CVS can handle conflict resolution in its own way. When another user commits changes to the repository after you’ve checked out your working copy, you can update your copy to see their edits (and any conflicts) reflected.
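The checkout, update, and commit cycle described above can be sketched as the commands a command-line client would issue. This is a minimal illustration rather than a recommended tool: the repository location, module name, and commit message are made up, and the helper names cvs_argv and run_cvs are hypothetical.

```python
import shlex
import subprocess

def cvs_argv(root, *args):
    """Build the argument list for one cvs invocation against the repository at `root`."""
    return ["cvs", "-d", root, *args]

def run_cvs(root, *args, cwd=None):
    """Run cvs and raise CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(cvs_argv(root, *args), cwd=cwd, check=True)

# Hypothetical repository location and module name.
ROOT = ":ext:alice@cvs.example.com:/var/cvsroot"
MODULE = "website"

# The basic cycle, expressed as argument lists.
CYCLE = [
    cvs_argv(ROOT, "checkout", MODULE),                   # download a working copy
    cvs_argv(ROOT, "update", "-d", "-P"),                 # pick up other people's commits
    cvs_argv(ROOT, "commit", "-m", "Fix footer markup"),  # send your own edits back
]

# Print the commands instead of running them, so the sketch needs no repository.
for argv in CYCLE:
    print(shlex.join(argv))
```

Inside an existing checkout, the -d option can normally be left out, because the working copy already records which repository it came from.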

Even the live version of the site itself should be a CVS checkout - if you are willing to tolerate the existence of the "CVS" subdirectories that CVS uses to maintain information about the repository a given checkout belongs to, pushing updates to a live site can be as simple as a one-line update command.

As you make changes to your local copy, you will want to know what you’ve done, how it differs from the repository, and what needs to be committed. CVS reports status on a per-file basis and does not aggregate it across lists of files, so many developers have written Perl or shell scripts to help interpret and use the output of CVS status requests. Reconciling your edits with the repository is relatively simple: update your copy to ensure that you are working with up-to-date file revisions, add or remove files as needed, and commit your edits. CVS must be told explicitly which files to add and which to remove, and won’t apply these changes until you commit them.
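As one example of the kind of helper script mentioned above, here is a small Python sketch that groups the per-file lines printed by a command such as "cvs -n update" by their one-letter status code. The function name summarize and the sample output are illustrative assumptions; only the status letters themselves come from CVS.

```python
from collections import defaultdict

# Meaning of the one-letter codes that `cvs update` prints before each file name.
CODES = {
    "U": "updated from repository",
    "P": "patched from repository",
    "A": "added locally, not yet committed",
    "R": "removed locally, not yet committed",
    "M": "modified locally",
    "C": "conflict",
    "?": "not under CVS control",
}

def summarize(update_output):
    """Group file names by status code; lines without a known code are skipped."""
    groups = defaultdict(list)
    for line in update_output.splitlines():
        code, sep, name = line.partition(" ")
        if sep and code in CODES and name:
            groups[code].append(name)
    return dict(groups)

# Made-up sample of what an update run might print.
sample = """\
M index.html
? notes.txt
A images/logo.png
cvs update: Updating css
C css/site.css
"""

for code, names in summarize(sample).items():
    print(f"{CODES[code]}: {', '.join(names)}")
```

Running the real command with -n asks CVS to report what it would do without changing the working copy, which makes it a convenient source of input for a script like this.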

Benefits and Pitfalls

The above section describes a linear workflow, and touches only the tip of the iceberg. CVS has a few other major benefits:

  • Automation

    Commands can be added to the loginfo file in your CVS repository root. Sometimes, it’s desirable to place a call to mail here, so that developers can be notified of changes to the repository at the time they are made.

    If your repository lives on the same machine as your live site, you can add in a command to automatically update the live version, so that committing content is synonymous with publishing. Read more about automation on the CVS website.

  • Branching

    Branches are a huge advantage offered by CVS. In addition to storing historical information about files, CVS can maintain separate, parallel histories for files, and easily coordinate merges between them. This means that one developer can be working on the live version of a site, while another adds sections or new features, and neither has to worry about the other’s activities until a decision is made to merge the two branches. Read more about branching in Open Source Development With CVS.
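For the Automation item above, a notification command called from loginfo could be a small script instead of a direct call to mail. The sketch below only builds the message. The addresses are placeholders, build_notification is a hypothetical name, and the sample text merely imitates the kind of log information CVS passes to a loginfo command on standard input. How the script is wired into loginfo depends on your CVS version, so check its manual.

```python
from email.message import EmailMessage

def build_notification(log_text, sender, recipient):
    """Wrap the text CVS supplies for a commit in a plain-text e-mail message."""
    lines = log_text.strip().splitlines()
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # Use the first line of the log text as the subject, if there is one.
    msg["Subject"] = "CVS commit: " + (lines[0] if lines else "(no details)")
    msg.set_content(log_text)
    return msg

# Illustrative input only; a real hook would pass sys.stdin.read() instead.
sample = """\
Update of /var/cvsroot/website/css
Modified Files:
\tsite.css
Log Message:
Fix footer markup
"""

message = build_notification(sample, "cvs@example.com", "dev-team@example.com")
print(message["Subject"])
```

Sending the message is left out on purpose. In practice it could be handed to smtplib or to the local mailer, depending on what the repository machine provides.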

There are also pitfalls:

  • Binary Files

    Unix, DOS and Macintosh all use different standard end-of-line characters, and CVS handles the differences easily: it serves up carriage returns to a Mac client and carriage return/linefeed pairs to a DOS client, and ignores the difference internally. The drawback is that you have to specify which files are binary, so that CVS does not accidentally corrupt them by attempting to convert line endings. Be sure to identify binary files as such at the time they are added, either with the -kb switch to add on the command line, or with the less obscure methods provided by the various GUI clients. Read more about binary file handling in Open Source Development With CVS.
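One way to avoid forgetting the -kb switch is to let a script decide which form of the add command to use. The sketch below is a minimal illustration: the list of extensions is an arbitrary assumption for a typical website, not something CVS defines, and add_command is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Extensions treated as binary here are an illustrative choice, not a CVS default.
BINARY_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".ico", ".pdf", ".zip", ".swf"}

def add_command(path):
    """Return the `cvs add` argument list for `path`, with -kb for binary files."""
    argv = ["cvs", "add"]
    if PurePosixPath(path).suffix.lower() in BINARY_EXTENSIONS:
        argv.append("-kb")  # store as binary: no line-ending or keyword changes
    argv.append(path)
    return argv

for name in ["index.html", "images/logo.PNG", "css/site.css"]:
    print(" ".join(add_command(name)))
```

The check is case-insensitive, so a file such as logo.PNG is still treated as binary.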

Recap

CVS can help you to enforce self-discipline in your own code, and provides a mechanism for cooperative development for code you share with others. Its client/server repository/checkout methodology provides you with a means to synchronize edits to a project among multiple locations.

Further Reading...

Because there exist a number of CVS clients, I have stayed away from implementation-specific examples above. Instead, I have used terms such as checkout, commit, update, and repository that are standard to CVS, and shared among most popular client applications. CVS servers can be hosted on Unix or Windows systems, and clients exist for most popular operating systems.

Comments

booyah

I'm amazed that there hasn't been an article regarding this in the past... I just thought, 'Who wouldn't know about it?'. I was forced to use VSS for years and it just annoyed me. I think it's worth mentioning that as far as win32 clients go, Tortoise is far less bloated and much easier to use than WinCVS. If you're using CVS for the first time, I think WinCVS just isn't intuitive enough, whereas with Tortoise I've been able to give a CVS virgin a quick rundown on what it is (I'll just send them this URL now ;)) and they've been able to work most things out themselves. Also worth mentioning is CVSWeb (http://stud.fh-heilbronn.de/~zeller/cgi/cvsweb.cgi/), an open-source web interface which lets you view the revision history of all the files in the repository as well as check diffs. This has saved me a lot of hassle...

checkout VS export

Even the live version of the site itself should be a CVS checkout - if you are willing to tolerate the existence of the "CVS" subdirectories

And if you won't tolerate these directories you can simply use CVS export instead of checkout.

re: checkout VS export

Even the live version of the site itself should be a CVS checkout - if you are willing to tolerate the existence of the "CVS" subdirectories
And if you won't tolerate these directories you can simply use CVS export instead of checkout.

Using export instead of checkout is definitely an option, but it has the disadvantage of disassociating the live site from the original repository. A site upgrade will then consist of a full export every single time (a slightly longer process), instead of just 'cvs up -Pd'. This is fine if your site has no local changes from the repository, but if there are local configuration or temp files, I think that update is ultimately a better method: the local changes can be preserved through judicious use of .cvsignore, and the resultant CVS subdirectories can be protected by configuring your webserver to restrict access.

Security?

Are there any security implications to having CVS folders on your live website? They seem to be world-readable and contain lots of info about what's been going on...

Re: Security?

Are there any security implications to having CVS folders on your live website? They seem to be world-readable and contain lots of info about what's been going on...

Not really - the CVS directories generally contain only 3 files: Root contains a string showing where the CVSROOT is located (usually just a directory path, sometimes a host name too), Repository contains a string showing where the current module is located (same general form as Root), and Entries contains revision information about the files contained in that directory.

None of these things can be considered security risks, but if you don't want them made public, Apache can easily be configured to deny access to any directory named "CVS". See deny in the directory context.
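To see exactly what an Entries file would reveal, it can be read with a few lines of Python. This sketch assumes the usual layout of slash-separated fields (name, revision, timestamp, options, tag) for files and a leading D for subdirectories. The names Entry and parse_entries and the sample contents are made up for illustration.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    name: str
    revision: str
    timestamp: str
    options: str   # for example "-kb" for binary files
    tag: str       # sticky tag or date, empty if none

def parse_entries(text):
    """Return (files, directories) described by the contents of a CVS/Entries file."""
    files, directories = [], []
    for line in text.splitlines():
        fields = line.split("/")
        if len(fields) < 6:
            continue  # blank line or a bare "D" marker
        if fields[0] == "D":
            directories.append(fields[1])
        elif fields[0] == "":
            files.append(Entry(*fields[1:6]))
    return files, directories

# Made-up sample contents.
sample = """\
/index.html/1.7/Tue Mar  4 10:15:02 2003//
/logo.png/1.2/Mon Feb 10 09:00:41 2003/-kb/
D/images////
"""

files, directories = parse_entries(sample)
for entry in files:
    print(entry.name, entry.revision, entry.options or "(text)")
print("subdirectories:", directories)
```

As the sample suggests, the file exposes names, revision numbers, and timestamps, but no file contents.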

Interesting problem CVS on Hp-UNIX

One of the users of the Unix cities message board posted the message "Interesting problem CVS on Hp-UNIX" but still hasn't received a solution. Has anyone out there done anything similar?

Subversion

I learnt about Subversion recently and it looks pretty nice; I will be checking it out when I have the time. It's by developers with a long history of working on CVS, who wanted to start a new project that would not have the limitations of CVS and yet be as similar to it as possible. To the dot-links guy: check your environment; usually it's CVS_RSH or something like that which is not set correctly.