Templatized C++ Command Line Parser Manual

Basic usage

There are a few key classes to be aware of. The first is the CmdLine (command line) class. This is the class that parses the command line passed to it according to the arguments that it contains. Arguments are separate objects that are added to the CmdLine object one at a time. There are five types of arguments: ValueArg, UnlabeledValueArg, SwitchArg, MultiArg and UnlabeledMultiArg. All of these except SwitchArg are templatized classes, meaning they can be defined to parse a value of any type**. Once the arguments have been added to the command line object, the command line is parsed, which assigns the data on the command line to the corresponding argument objects. The values are then accessed by calling the getValue() method of each argument object.

Here is a simple example ...

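#include <algorithm>   // transform
#include <cctype>      // toupper, tolower
#include <string>
#include <iostream>
#include <tclap/CmdLine.h>
using namespace std;
// Later TCLAP releases place the library in namespace TCLAP;
// with those, also add: using namespace TCLAP;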

int main(int argc, char** argv)
{
	// Wrap everything in a try block.  Do this every time, 
	// because exceptions will be thrown for problems. 
	try {  

	// Define the command line object.
	CmdLine cmd("Command description message", ' ', "0.9");

	// (deprecated, but still functional)
	// CmdLine cmd(argv[0], "Command description message", "0.9"); 

	// Define a value argument and add it to the command line.
	ValueArg<string> nameArg("n","name","Name to print",true,"homer",
	                             "nameString");
	cmd.add( nameArg );

	// Define a switch and add it to the command line.
	SwitchArg caseSwitch("u","upperCase","Print in upper case", false);
	cmd.add( caseSwitch );

	// Parse the args.
	cmd.parse( argc, argv );

	// Get the value parsed by each arg. 
	string name = nameArg.getValue();
	bool upperCase = caseSwitch.getValue();

	// Do what you intend to...
	if ( upperCase )
		transform(name.begin(),name.end(),name.begin(),::toupper);
	else
		transform(name.begin(),name.end(),name.begin(),::tolower);

	cout << "My name is " << name << endl;

	} catch (ArgException& e)  // catch any exceptions (by reference)
	{ cerr << "error: " << e.error() << " for arg " << e.argId() << endl; }
}



The output should look like:


% tester -u -n mike
My name is MIKE

% tester -n mike -u
My name is MIKE

% tester -n mike
My name is mike

% tester -n MIKE
My name is mike

% tester
PARSE ERROR:
	One or more required arguments missing!

Brief USAGE:
	tester  [-u] -n <nameString> [--] [-v] [-h]

For complete USAGE and HELP type:
	tester --help

% tester --help

USAGE:

   tester  [-u] -n <nameString> [--] [-v] [-h]


Where:

   -u,  --upperCase
     Print in upper case

   -n <nameString>,  --name <nameString>
     (required)  (value required)  Name to print

   --,  --ignore_rest
     Ignores the rest of the labeled arguments following this flag.

   -v,  --version
     Displays version information and exits.

   -h,  --help
     Displays usage information and exits.


   Command description message



This example shows a number of different properties of the library...

Basic Properties

Arguments, whatever their type, have a few common basic properties. These properties are set in the constructors of the arguments. First is the flag: the character, preceded by a dash (-), that signals the beginning of the argument on the command line. Arguments also have names, which can, if desired, also be used as a flag on the command line, this time preceded by two dashes (--), like the familiar getopt_long(). Next is the description of the argument. This is a short description displayed in the help/usage message when needed. The boolean value in ValueArgs indicates whether the argument is required to be present (SwitchArgs can't be required, as that would defeat the purpose). Next comes the default value the arg should assume if it isn't required and isn't entered on the command line. Last, for ValueArgs, is a short description of the type that the argument expects (yes, it's an ugly hack).

SwitchArgs are what the name implies: simple on/off, boolean switches. Use SwitchArgs any time you want to turn some sort of system property on or off. Note that multiple SwitchArgs can be combined into a single argument on the command line. If you have switches -a, -b and -c, any of the following is valid:

		% command -a -b -c

or

		% command -abc

or

		% command -ba -c


This is to make this library more in line with the POSIX and GNU standards (as I understand them).
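
The equivalence of the three forms above can be pictured with a small standalone sketch. It does not use TCLAP and is not how the library is implemented; expandSwitches is a hypothetical helper that only shows how a token such as -abc breaks down into individual single-character flags.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of TCLAP): split a combined switch token
// such as "-abc" into its individual flag characters {'a', 'b', 'c'}.
// Tokens that do not begin with exactly one dash yield an empty result.
std::vector<char> expandSwitches(const std::string& token)
{
	std::vector<char> flags;
	if (token.size() < 2 || token[0] != '-' || token[1] == '-')
		return flags;
	for (std::string::size_type i = 1; i < token.size(); ++i)
		flags.push_back(token[i]);
	return flags;
}
```

Under this model, -a -b -c, -abc and -ba -c all produce the same set of flags, which is why the order and grouping of switches doesn't matter.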

ValueArgs are arguments that read a value of some type from the command line. Note that the order of arguments on the command line (so far) doesn't matter. Any argument not matching an Arg added to the command line will cause an exception to be thrown (for the most part, with some exceptions).

Note in the USAGE output above that there are three arguments that were not explicitly specified by the user in the code. These are the help, version and -- SwitchArgs. Using either the -h or --help flag will cause the USAGE message to be displayed, -v or --version will cause any version information to be displayed, and -- or --ignore_rest will cause the remaining labeled arguments to be ignored. These switches are included automatically on every command line. Currently there is no way to turn this off, but then, that's kind of the point. More later on how we get this to work.

Complications

Naturally, what we have seen to this point doesn't satisfy all of our needs.

I tried passing multiple values on the command line with the same flag and it didn't work...

Correct. You cannot specify multiple ValueArgs or SwitchArgs with the same flag, either in the code or on the command line. An exception will be thrown in either case. For SwitchArgs it simply doesn't make sense to allow a particular flag to be turned on or off repeatedly on the command line. All you should ever need is to set your state once by specifying the flag or not (yeah but...).

However, there are situations where you might want multiple values for the same flag to be specified. Imagine a compiler that allows you to specify multiple directories to search for libraries...

		% fooCompiler -L /dir/num1 -L /dir/num2 file.foo 


In situations like this, you will want to use a MultiArg. A MultiArg is essentially a ValueArg that appends every value that it matches and parses onto a vector of values. When the getValue() method is called, a vector of values is returned instead of a single value. A MultiArg is declared much like a ValueArg:

		...

		MultiArg<int> itest("i", "intTest", "multi int test", false, "int");
		cmd.add( itest );

		...


Note that MultiArgs can be added to the CmdLine in any order (unlike UnlabeledMultiArgs).
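
To make the "append onto a vector" behavior concrete, here is a minimal standalone sketch. It is not TCLAP code; collectValues is a hypothetical helper that assumes the command line has already been split into tokens and that every occurrence of the flag is followed by its value.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of TCLAP): gather the value following every
// occurrence of `flag` and convert each one to type T with operator>>.
template <typename T>
std::vector<T> collectValues(const std::vector<std::string>& args,
                             const std::string& flag)
{
	std::vector<T> values;
	for (std::size_t i = 0; i + 1 < args.size(); ++i) {
		if (args[i] != flag)
			continue;
		std::istringstream in(args[i + 1]);
		T value;
		if (in >> value)
			values.push_back(value);
		++i;  // skip the value token we just consumed
	}
	return values;
}
```

For the fooCompiler example above, calling collectValues<std::string>(args, "-L") on the tokenized command line would yield a vector holding both directories, in the order they were given.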

I don't like labelling all of my arguments...

To this point all of our arguments have had labels (flags) identifying them on the command line, but there are some situations where flags are burdensome and not worth the effort. One example might be if you want to implement a magical command we'll call copy. All copy does is copy the file specified in the first argument to the file specified in the second argument. We can do this using UnlabeledValueArgs, which are pretty much just ValueArgs without the flag specified; this tells the CmdLine object to treat them accordingly. The code would look like this:

		...

		UnlabeledValueArg<float> nolabel( "name", "unlabeled test", 3.14,
		                                  "nameString" );
		cmd.add( nolabel );

		...


Everything else is handled identically to what is seen above. The only difference to be aware of, and this is important: the order in which UnlabeledValueArgs are added to the CmdLine is the order in which they will be parsed! This is not the case for normal SwitchArgs and ValueArgs. Internally, the first argument that the CmdLine doesn't recognize is assumed to belong to the first UnlabeledValueArg and is parsed as such. Note that you are allowed to intersperse labeled args (SwitchArgs and ValueArgs) between UnlabeledValueArgs (either on the command line or in the declaration), but the UnlabeledValueArgs will still be parsed in the order they were added. Just remember that order is important for unlabeled arguments.

I want an arbitrary number of arguments to be accepted...

Don't worry, we've got you covered. Say you want a strange command that searches each file specified for a given string (let's call it grep), but you don't want to have to type in all of the file names or write a script to do it for you. Say,

		% grep pattern *.txt


First remember that the * is handled by the shell and expanded accordingly, so what the program grep sees is really something like:

		% grep pattern file1.txt file2.txt fileZ.txt


To handle situations where multiple unlabeled arguments are needed, we provide the UnlabeledMultiArg. UnlabeledMultiArgs are declared much like everything else, but with only a description of the arguments. By default, if an UnlabeledMultiArg is specified, then at least one value is required to be present or an exception will be thrown. The most important thing to remember is that, as with UnlabeledValueArgs, order matters! In fact, an UnlabeledMultiArg must be the last argument added to the CmdLine! Here is what a declaration looks like:



		...

		//
		// UnlabeledMultiArg must be the LAST argument added!
		//
		UnlabeledMultiArg<string> multi("file names");
		cmd.add( multi );
		cmd.parse(argc, argv);

		vector<string> fileNames = multi.getValue();

		...




You must only ever specify one (1) UnlabeledMultiArg. One UnlabeledMultiArg will read every unlabeled Arg that wasn't already processed by an UnlabeledValueArg into a vector of type T. Any UnlabeledValueArg or other UnlabeledMultiArg specified after the first UnlabeledMultiArg will be ignored, and if they are required, exceptions will be thrown. When you call the getValue() method of the UnlabeledMultiArg, a vector will be returned. If you can imagine a situation where there will be multiple args of multiple types (strings, ints, floats, etc.), then just declare the UnlabeledMultiArg as type string and parse the different values yourself, or use several UnlabeledValueArgs.
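
The ordering rule for unlabeled arguments can be sketched without the library. In the hypothetical code below (Positionals and assignPositionals are illustrative names, not TCLAP types), the unlabeled tokens are assumed to have been separated from the labeled ones already; the first few are handed out in the order the single-value slots were added, and everything left over goes to the one trailing multi-value slot.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the parsed result (not a TCLAP type).
struct Positionals {
	std::vector<std::string> fixed;  // one per single-value slot, in add order
	std::vector<std::string> rest;   // everything left for the multi-value slot
};

// Give the first `fixedCount` unlabeled tokens to the fixed slots, in order,
// and hand all remaining tokens to the trailing multi-value slot.
Positionals assignPositionals(const std::vector<std::string>& unlabeled,
                              std::size_t fixedCount)
{
	Positionals out;
	for (std::size_t i = 0; i < unlabeled.size(); ++i) {
		if (i < fixedCount)
			out.fixed.push_back(unlabeled[i]);
		else
			out.rest.push_back(unlabeled[i]);
	}
	return out;
}
```

For the grep example, one fixed slot would receive the pattern and the multi-value slot would receive the three file names. This also shows why anything added after the multi-value slot never sees a value: nothing is left for it.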

I want one argument or the other, but not both...

Suppose you have a command that must read input from one of two possible locations, either a local file or a URL. The command must read something, so one argument is required, but not both, yet neither argument is strictly necessary by itself. This is called "exclusive or" or "XOR". To accommodate this situation, there is now an option to add two or more Args to a CmdLine that are exclusively or'd with one another: xorAdd(). This means that exactly one of the Args must be set, and no more.

xorAdd() comes in two flavors: xorAdd( Arg& a, Arg& b ) to add just two Args to be xor'd, and xorAdd( vector<Arg*>& xorList ) to add more than two Args.


	...

	ValueArg<string> fileArg("f","file","File name to read",true,"homer",
	                         "filename");
	ValueArg<string> urlArg("u","url","URL to load",true,
	                        "http://example.com", "URL");

	cmd.xorAdd( fileArg, urlArg );
	cmd.parse(argc, argv);

	...


Once one Arg in the xor list is matched on the CmdLine, the others in the xor list will be marked as set. The question, then, is how to determine which of the Args was actually matched. This is accomplished by calling the isSet() method of each Arg. If the Arg has been matched on the command line, isSet() will return TRUE, whereas if the Arg has only been marked as set as a result of matching another Arg it was xor'd with, isSet() will return FALSE. (Of course, if the Arg was not xor'd and wasn't matched, it will also return FALSE.)


	...

	if ( fileArg.isSet() )
		readFile( fileArg.getValue() );
	else if ( urlArg.isSet() )
		readURL( urlArg.getValue() );
	else
		// Should never get here because TCLAP will note that one of the
		// required args above has not been set.
		throw("Very bad things...");

	...
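
The "exactly one and no more" rule itself is easy to state in code. The following is a standalone sketch, not TCLAP's implementation; matchedXorIndex is a hypothetical function that takes one "was it matched on the command line?" value per xor'd argument, much like what isSet() reports above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical check (not TCLAP code): return the index of the single
// matched argument, or -1 if none or more than one was matched.
int matchedXorIndex(const std::vector<bool>& matched)
{
	int found = -1;
	for (std::size_t i = 0; i < matched.size(); ++i) {
		if (!matched[i])
			continue;
		if (found != -1)
			return -1;  // a second match violates the xor rule
		found = static_cast<int>(i);
	}
	return found;
}
```

A result of -1 corresponds to the cases where the parse should fail: either nothing from the xor list was given, or more than one thing was.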



I have more arguments than single flags make sense for...

Some commands have so many options that single flags no longer map sensibly to the available options. In this case, it is desirable to specify Args using only long options. This one is easy to accomplish: just make the flag value blank in the Arg constructor. This tells the Arg that only the long option should be matched and forces users to specify the long option on the command line. The help output is updated accordingly.


	...

	ValueArg<string> fileArg("","file","File name",true,"homer","filename");

	SwitchArg caseSwitch("","upperCase","Print in upper case",false);

	...



I want to constrain the values allowed for a particular argument...

There are now constructors for all of the Args that parse values that allow a list of values to be specified for that particular Arg. When the value for the Arg is parsed, it is checked against the list of values specified in the constructor. If the value is in the list then it is accepted. If not, then an exception is thrown. Here is a simple example:

	...

	vector<string> allowed;
	allowed.push_back("homer");
	allowed.push_back("marge");
	allowed.push_back("bart");
	allowed.push_back("lisa");
	allowed.push_back("maggie");

	ValueArg<string> nameArg("n","name","Name to print",true,"homer",allowed);
	cmd.add( nameArg );

	...

Instead of a type description being specified in the Arg, a type description is created by concatenating the values in the allowed list using the operator<< for the specified type. The help/usage for the Arg therefore lists the allowable values. Because of this, it is assumed that the list will be relatively small, although there is no limit on its size.
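
Both halves of this behavior, the membership check and the generated description, can be sketched in a few lines of standalone C++. This is not the library's code: isAllowed and describeAllowed are hypothetical helpers, and the '|' separator is an assumption made purely for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helpers (not part of TCLAP).

// True if `value` appears in the list of allowed values.
template <typename T>
bool isAllowed(const std::vector<T>& allowed, const T& value)
{
	return std::find(allowed.begin(), allowed.end(), value) != allowed.end();
}

// Build a description by streaming every allowed value with operator<<.
// The '|' separator is an assumption made for this sketch.
template <typename T>
std::string describeAllowed(const std::vector<T>& allowed)
{
	std::ostringstream out;
	for (std::size_t i = 0; i < allowed.size(); ++i) {
		if (i > 0)
			out << '|';
		out << allowed[i];
	}
	return out.str();
}
```

Since the description grows with every allowed value, a long list quickly makes the usage message unwieldy, which is the practical reason to keep the list small.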

Obviously, a list of allowed values isn't always the best way to constrain things. For instance, one might wish to allow only integers greater than 0. In this case, the best strategy is for you to evaluate the value returned from the getValue() call and if it isn't valid, throw an ArgException. Be sure that the description provided with the Arg reflects the constraint you choose.
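
As a rough illustration of that strategy, the sketch below checks a value after it has been retrieved. It is standalone and hedged accordingly: requirePositive is a hypothetical helper, and std::invalid_argument stands in for ArgException so that the example does not depend on the library's exception constructors.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical post-parse check. std::invalid_argument stands in for the
// ArgException mentioned above so that the sketch has no dependencies.
int requirePositive(int value, const std::string& argName)
{
	if (value <= 0) {
		std::ostringstream msg;
		msg << argName << " must be an integer greater than 0, got " << value;
		throw std::invalid_argument(msg.str());
	}
	return value;
}
```

In a real program the call would wrap the result of getValue(), inside the same try block that surrounds the parse, so that a bad value is reported the same way as any other command line error.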

Visitors

Disclaimer: Almost no one will have any use for Visitors. They were added to provide special handling for default arguments. Nothing that Visitors do couldn't be accomplished by the user after the command line has been parsed. If you're still interested, keep reading...

Some of you may be wondering how we get the --help, --version and -- arguments to do their thing without mucking up the CmdLine code with lots of if statements and type checking. This is accomplished by using a variation on the Visitor Pattern. Actually, it may not be a Visitor Pattern at all, but that's what inspired me.

If we want some argument to do some sort of special handling, besides simply parsing a value, then we add a Visitor pointer to the Arg. More specifically, we add a pointer to an instance of a subclass of the Visitor class. Once the argument has been successfully parsed, the Visitor for that argument is called. Any data that needs to be operated on is passed to the Visitor constructor and then operated on in the visit() method. A Visitor is added to an Arg as the last argument in its declaration. This may sound complicated, but it's pretty straightforward. Let's see an example.

Say you want to add an --authors flag to a program that prints the names of the authors when present. First subclass Visitor:

#include "Visitor.h"
#include <cstdlib>     // exit
#include <string>
#include <iostream>
using namespace std;

class AuthorVisitor : public Visitor
{
	protected:
		string _author;
	public:
		AuthorVisitor(const string& name) : Visitor(), _author(name) {}
		// Print the author and stop the program as soon as the flag is seen.
		void visit() { cout << "AUTHOR:  " << _author << endl;  exit(0); }
};


Now include this class definition somewhere and go about creating your command line. When you create the author switch, add the AuthorVisitor pointer as follows:

		...

		SwitchArg author("a","author","Prints author name", false, 
				         new AuthorVisitor("Homer J. Simpson") );
		cmd.add( author );

		...




Now, any time the -a or --author flag is specified, the program will print the author name, Homer J. Simpson, and exit without processing any further (as specified in the visit() method).
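
If you want to see the mechanism in isolation, here is a miniature, library-free version of the idea. None of these classes are TCLAP's: MiniVisitor, CountingVisitor and MiniSwitch are hypothetical names, and the visitor counts its calls rather than printing and exiting so that the effect can be observed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical miniature of the idea; these are not TCLAP's classes.
class MiniVisitor {
public:
	virtual ~MiniVisitor() {}
	virtual void visit() = 0;  // called once the owning argument is matched
};

// Counts how many times it was visited instead of printing and exiting.
class CountingVisitor : public MiniVisitor {
public:
	explicit CountingVisitor(int& counter) : _counter(counter) {}
	void visit() { ++_counter; }
private:
	int& _counter;
};

// A switch that knows its flag and, optionally, a visitor to notify.
class MiniSwitch {
public:
	MiniSwitch(const std::string& flag, MiniVisitor* visitor = NULL)
		: _flag(flag), _visitor(visitor), _set(false) {}

	// Returns true if `token` matches; on a match the visitor is called.
	bool processToken(const std::string& token)
	{
		if (token != "-" + _flag)
			return false;
		_set = true;
		if (_visitor)
			_visitor->visit();
		return true;
	}
	bool isSet() const { return _set; }
private:
	std::string _flag;
	MiniVisitor* _visitor;
	bool _set;
};
```

The point of the design is that the switch never needs to know what the visitor does. The parsing code stays free of special cases, and each special behavior lives in its own small class.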

Exceptions

Like all good rules, there are many exceptions....

Ignoring arguments...

The -- flag is automatically included in the CmdLine. As (almost) per POSIX and GNU standards, any argument specified after the -- flag is ignored. "Almost" because if an UnlabeledValueArg that has not been set, or an UnlabeledMultiArg, has been specified, by default we will assign any arguments beyond the -- to those arguments as per the rules above. This is primarily useful if you want to pass in arguments with a dash as the first character of the argument. It should be noted that even if the -- flag is passed on the command line, the CmdLine will still test to make sure all of the required arguments are present.

Of course, this isn't how POSIX/GNU handle things; they explicitly ignore arguments after the --. To accommodate this, both UnlabeledValueArgs and UnlabeledMultiArgs can be made ignoreable in their constructors. See the API Documentation for details.
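
The default behavior described above amounts to cutting the command line in two at the first --. The sketch below shows only that split; it is not TCLAP code, and splitAtIgnoreRest and the Tokens typedef are hypothetical names introduced for the illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::string> Tokens;

// Hypothetical helper (not part of TCLAP): split the tokens at the first
// "--". In this model the first half is still examined for flags, while the
// second half is only ever offered to unlabeled arguments, even when a token
// starts with a dash.
std::pair<Tokens, Tokens> splitAtIgnoreRest(const Tokens& args)
{
	Tokens before, after;
	bool seen = false;
	for (Tokens::size_type i = 0; i < args.size(); ++i) {
		if (!seen && args[i] == "--")
			seen = true;  // the marker itself is dropped
		else if (seen)
			after.push_back(args[i]);
		else
			before.push_back(args[i]);
	}
	return std::make_pair(before, after);
}
```

With an unlabeled argument declared, the second half is what it would be fed; with unlabeled arguments marked ignoreable, the second half would simply be discarded.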

Multiple Identical Switches

If you absolutely must allow for multiple, identical switches, then don't use a SwitchArg; instead use a MultiArg of type bool. This means you'll need to specify a 1 or 0 on the command line with the switch (as values are required), but this should allow you to turn your favorite switch on and off to your heart's content.

Type Descriptions

Ideally this library would use RTTI to return a human readable name of the type declared for a particular argument. Unfortunately, at least for g++, the names returned aren't particularly useful.
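
To see the problem for yourself, the standard typeid operator is all you need. The sketch below is plain C++ with no TCLAP involvement; rttiName is a hypothetical wrapper, and the strings it returns are implementation-defined, so no particular output is promised.

```cpp
#include <string>
#include <typeinfo>

// Return whatever name the compiler's RTTI reports for type T. The exact
// text is implementation-defined, so it may be a mangled or abbreviated form.
template <typename T>
std::string rttiName()
{
	return typeid(T).name();
}
```

With g++, for instance, rttiName<int>() typically comes back as a terse mangled string rather than the word int, and class types fare worse, which is why the type description is supplied by hand in the Arg constructors instead.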

More Information

For more information, look at the API Documentation and the examples included with the distribution.

Happy coding!


** In theory, any type that supports operator>> and operator<< should work, although I've really only tried things with basic types like int, float, string, etc.
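
As a sketch of what "supports operator>> and operator<<" means in practice, here is a small user-defined type together with a generic conversion routine. Everything in it is hypothetical (the Size type, its WIDTHxHEIGHT text format, and the parseToken helper); it shows the general stream-extraction technique rather than the library's internals.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type, written as WIDTHxHEIGHT (e.g. 640x480).
struct Size {
	int width;
	int height;
};

// Read "WIDTHxHEIGHT"; mark the stream as failed on any other separator.
std::istream& operator>>(std::istream& in, Size& s)
{
	char sep = 0;
	in >> s.width >> sep >> s.height;
	if (in && sep != 'x')
		in.setstate(std::ios::failbit);
	return in;
}

std::ostream& operator<<(std::ostream& out, const Size& s)
{
	return out << s.width << 'x' << s.height;
}

// Generic conversion in the spirit of the footnote: any T with operator>>
// can be read from a command line token. Returns false on failure.
template <typename T>
bool parseToken(const std::string& token, T& value)
{
	std::istringstream in(token);
	return !(in >> value).fail();
}
```

The same parseToken works unchanged for int, float, string and Size, which is the property a templatized argument class relies on. Note that extraction into a string stops at the first whitespace, so values containing spaces need extra care.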