ScanDoc

Introduction
ScanDoc features
Getting ScanDoc
Running ScanDoc
Writing ScanDoc Comments
Document Tags
The Template File
To-Do List
History and credits

Introduction

ScanDoc is a Perl script which scans C++ source code for specially-formatted comments and produces attractive, organized, indexed documentation.

ScanDoc is designed to generate the highest-quality documentation with as little effort as possible on the part of the programmer writing the code to be documented. To this end, ScanDoc not only uses the documentation supplied by the programmer, but supplements it by parsing the actual C++ data structure declarations.

Unlike other documentation scanners, ScanDoc is themable, meaning that the appearance of the output documentation can be controlled via a "template" file. The supplied example template produces HTML output incorporating frames and indices.
ScanDoc was written by Talin and is Copyright ©1997-2000. ScanDoc may be freely distributed under the Artistic License (see COPYING).

ScanDoc features

Portability -- because ScanDoc is written in Perl, it can be executed on any platform that runs Perl.
Ease of Use -- Once ScanDoc is set up it is very easy to use. There are only a few command-line switches, all of which are optional.
Convenience -- ScanDoc's comments are a superset of JavaDoc's and are very easy to write.
Customizable -- ScanDoc uses a user-modifiable template file as the source of all output text. You can give your documentation a unique style without modifying the ScanDoc script itself. ScanDoc has been designed primarily to support HTML output; however, templates can readily be modified to support other output file types such as PostScript, TeX, .info, ASCII, etc.
Comprehensive -- ScanDoc understands a wide range of C++ syntax, including operator overloads, templates and template arguments, nested classes, and friend functions.
Fast -- a typical header file takes 1-2 seconds to process.
Flexible -- functions can be grouped in any way you like. You decide which functions go into which .HTML files.

Getting ScanDoc

ScanDoc is available via anonymous CVS:
cvs -d:pserver:anonymous@cvs.scandoc.sourceforge.net:/cvsroot/scandoc login
(Hit return when asked for a password.)
cvs -z3 -d:pserver:anonymous@cvs.scandoc.sourceforge.net:/cvsroot/scandoc co scandoc
There is also a .tar.gz archive available on the ScanDoc Home Page.

Running ScanDoc

ScanDoc takes several command line switches (all of which are optional) and a list of input source files (which can include wildcards). Here is the command line syntax:
perl ScanDoc.pl -i document-template -p output-path -t tabsize -d sym=value input-files ...
The document-template argument specifies which file to use for the template file. This template file is used to define the format of the output text. You can edit this file to customize the "look" of your documentation. The default is "template.html".

The output-path argument specifies the directory where the resulting documentation should be written. This should include a directory separator character ("\" for PC, ":" for Mac using MPW, and "/" for Unix) as the last character. The default is the current directory.

The tabsize argument specifies how many spaces tabs should be expanded to. The default is 4.

The sym=value argument can be used to define a symbol. This symbol will be defined within the scope of the expanded template, and can be used as part of the output text.

It does not matter if there are whitespace characters between the switches and their arguments.

Example:
perl ScanDoc.pl -itemplate.html -p./test/ -t4 *.h

Writing ScanDoc Comments

In order to use ScanDoc, you must embed special comments within your C++ source files. ScanDoc recognizes two forms of comment: those beginning with "/**" and those beginning with "//*". The first form is a C-style multi-line comment; the second is a C++ single-line comment. There must be a space after the "/**" and "//*" tokens -- ScanDoc does not recognize comments of the form "/************" and such. (However, if ScanDoc detects a row of asterisks, equals signs, or dashes on the first or last line of a C-style comment, it will remove them from the documentation. So you can say "/** ===============" if you want a big, bold banner.)

Whenever ScanDoc sees a special comment, it knows that the next C++ declaration (class, function, or variable) should be documented. Any declaration which is not preceded by a special comment will be omitted from the output file. The purpose of this is to allow you to have private functions or classes which are not present in the ScanDoc documentation. You can use ordinary C and C++ style comments to document these declarations within the source code, since ScanDoc ignores such comments.

The simplest way to use the special comments is to simply write a description of the item within the comments:
/** Documentation for class Foo */
class Foo : private Bar {
  int Baz( void );
};
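
As a slightly fuller sketch of the rules above, the following fragment uses both comment forms, a banner row, and one deliberately undocumented member. The names (Stack, Push, Size, Compact) are hypothetical and chosen only for illustration; which declarations appear in the output is stated as the text above describes it, not as verified against the script.

```cpp
#include <vector>

/** =====================================================
    A simple stack of integers. The row of equals signs on
    the first line is stripped from the documentation.
*/
class Stack {
public:
    //* Push a value onto the top of the stack.
    void Push( int value ) { items.push_back( value ); }

    //* Return the number of values currently on the stack.
    int Size() const { return static_cast<int>( items.size() ); }

    // Ordinary comment: ScanDoc ignores it, so Compact() is
    // left out of the generated documentation.
    void Compact() { items.shrink_to_fit(); }

private:
    std::vector<int> items;   // no special comment, so not documented
};
```

Here Stack, Push and Size would be documented, while Compact and items would not.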

Document Tags

Document tags are special code which can be inserted within a ScanDoc comment. They allow you to control many aspects of the generated documentation. All document tags begin with an '@' character.

All tags must come at the beginning of a line of text (before any non-blank characters). All of the text on that line is considered part of the tag.

Many of the tags described are persistent, in the sense that they remain in effect until the next tag of the same type. A persistent tag affects all documented items which come after it; in other words, its effect can last for longer than a single documentation comment block.

Some of the tags described are continuable, which means that they can be continued on the next line. There is no need to repeat the tag. Continuable tags last until the end of the comment block, or until they are overridden by another tag.

Text which occurs before the first tag is considered "description" text; in other words, it is the actual explanation of the declaration which follows the comment. This is also true of any lines of text which do not begin with a tag and are not a continuation of a previous continuable tag. A blank line within the description text is converted into an HTML paragraph tag. Note that you are allowed to insert HTML tags into the description text; however, if you are outputting the text to a format other than HTML, the output template file may or may not be smart enough to translate the tag into an appropriate textual entity.

Example description:
/** This function adds two vectors together.
    @param      inVector1  The first vector
    @param      inVector2  This is the second vector
    @return     The vector sum of the two vectors
    @see        #VectorSubtract
    @keywords   Vector addition subtraction math
*/
const Vector &VectorAdd( const Vector &inVector1, const Vector &inVector2 );

The @package tag

C++ functions and classes often fall into natural groups. In Java, these are called "packages"; C++ has no such construct, so the @package tag can be used to group C++ declarations into a package. ScanDoc writes the documentation for each package to a separate .HTML file named "packagename.HTML", which contains all of the documentation for that package. (Of course, that's only the default behavior. You can change it by editing the document template file, as well as changing the names of the two index files that ScanDoc creates.)

You can have classes from several different header files be grouped into the same package, or you can have several classes from a single source file go into several different packages. There is no one-to-one mapping.

The syntax for @package is:
@package package-name
The @package tag is persistent -- it remains in effect until the end of the file, unless superseded by another @package tag. If no package tag is specified, the default package named "General" is used.

The @author tag

The @author tag is used to specify the author of the subsequent declarations. The entire line after @author is taken as the name of the author(s). Like the @package tag, it is persistent -- it remains in effect until the end of the file, unless superseded by another @author tag.

The @version tag

The @version tag is used to specify the version of the subsequent declarations. The entire line after @version is taken as the version string. Like the @package tag, it is persistent -- it remains in effect until the end of the file, unless superseded by another @version tag.
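
The persistence of @package, @author and @version might look like this in practice. This is only a sketch: the function names, package names, author name and version string are hypothetical, and it assumes a persistent tag takes effect starting with the comment block in which it appears.

```cpp
#include <cstddef>
#include <cstring>

/** Returns the larger of two integers.
    @package Math
    @author  A. Programmer
    @version 1.0
*/
inline int MaxInt( int a, int b ) { return a > b ? a : b; }

/** Returns the smaller of two integers. No tags are repeated here,
    so the package, author and version set above still apply.
*/
inline int MinInt( int a, int b ) { return a < b ? a : b; }

/** Returns the length of a C string.
    @package Strings
*/
inline std::size_t StringLength( const char *s ) { return std::strlen( s ); }
```

Under that assumption, MaxInt and MinInt would be written to the "Math" package file and StringLength to the "Strings" package file, with the author and version carried over unchanged.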

The @keyword tag

The @keyword tag is primarily designed for use with HTML search engines. The rest of the line after the @keyword tag is taken as a series of search keywords. The keywords will not actually appear in the documentation; instead, they are placed into an HTML comment just before the documentation appears.

The @see tag

Each documentation entry can have a "See also" section, which is a list of hypertext references to other relevant documentation. Each occurrence of an @see tag defines a hyperlink. (You can also embed normal HTML hyperlinks within the class description and other places.)

Immediately following the @see tag is the name of the class, class member, or global being referenced. A single name with no special separator characters is taken to be the name of a class. The package name can be specified by giving the name of the package, followed by '#', followed by the name of the class or function. If the package name is omitted, ScanDoc assumes that the item being referred to is in the current package. Class members can be referenced by appending '::' and the member name to the class name. A '::' with no class name indicates a global function or member. Member functions and global functions should not have argument lists or parentheses. (Currently there is no way to indicate which one of a set of overloaded functions is being referred to.)

You can also include hyperlinks to other documentation that was not created by ScanDoc by using a normal HTML hyperlink, which ScanDoc will insert verbatim into the output file.

Here are all of the supported forms:
@see  classname
@see  package#classname
@see  classname::member
@see  package#classname::member
@see  ::function
@see  package#::function
@see  <a href="ref">Description</a>
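
To show several of these forms in context, here is a small sketch. The Geometry package, the Point and Rect types, the Distance function and the "notes.html" link target are all hypothetical names invented for the example.

```cpp
#include <cmath>

/** A two-dimensional point.
    @package Geometry
    @see Rect
    @see Rect::Contains
    @see ::Distance
    @see <a href="notes.html">Design notes</a>
*/
struct Point {
    double x;
    double y;
};

/** An axis-aligned rectangle.
    @see Point
*/
struct Rect {
    Point min;
    Point max;

    /** Test whether a point lies inside the rectangle.
        @see Geometry#Point
    */
    bool Contains( const Point &p ) const {
        return p.x >= min.x && p.x <= max.x &&
               p.y >= min.y && p.y <= max.y;
    }
};

/** Compute the distance between two points.
    @see Point
*/
inline double Distance( const Point &a, const Point &b ) {
    return std::hypot( a.x - b.x, a.y - b.y );
}
```

Note that none of the function references carry an argument list, in keeping with the rule above.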

The @param tag

The @param tag is used for documenting function parameters. Following the @param keyword is the name of the parameter, and then the description. The @param tag is continuable, which means you can continue the description on the next line. Blank lines within the parameter description are converted into HTML paragraph tags.

Example:
@param inRect   The input rectangle to process. This will be scaled and copied to outRect.
@param outRect  Where to place the scaled rectangle

The @return tag

The @return tag is used to describe the function return. If the function returns nothing, this tag can be omitted. The tag is continuable, which means you can continue the description on the next line without repeating the @return tag.

The @exception tag

The @exception tag is used to document any exceptions that may be thrown by this function. Note that unlike Java, it is difficult in C++ to determine what exceptions might potentially be thrown by subroutines of the functions being documented, so it is questionable whether programmers will be able to easily maintain a list of every exception that could be thrown from the function. Ultimately, the decision of how to handle this will depend on local coding standards and practices.

The format of this section is exactly the same as @param.
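
The three function-level tags can be combined in one comment block. The sketch below is illustrative only: SafeDivide and DivideByZero are hypothetical names, and the second @param shows a description continued onto further lines.

```cpp
/** Thrown when a division by zero is attempted. */
class DivideByZero {};

/** Divide one integer by another.
    @param numerator    The value to be divided.
    @param denominator  The value to divide by. This description
                        continues on the following lines without
                        repeating the tag.
    @return             The integer quotient, truncated toward zero.
    @exception DivideByZero  Thrown if denominator is zero.
*/
inline int SafeDivide( int numerator, int denominator ) {
    if ( denominator == 0 ) {
        throw DivideByZero();
    }
    return numerator / denominator;
}
```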

The @heading tag

The @heading tag is used to insert a heading into your description text. The remainder of the line after @heading is taken as the heading text (the @heading tag is not continuable). When processed using the example template file, ScanDoc converts the heading text to a level two heading.

The @deffunc tag

Occasionally, a function will have a syntax so strange that ScanDoc cannot parse it. This is mainly due to the fact that ScanDoc does not have a complete C++ parser within it. Also, ScanDoc ignores all preprocessor directives, so it is difficult to add documentation entries for C macro functions. The @deffunc tag overcomes this limitation by allowing the programmer to manually insert a "fake" function declaration.

The @deffunc tag is short for "define function". The effect of this tag is exactly the same as if ScanDoc had actually parsed the function declaration. Note that this means that the @deffunc must be the last tag in the comment block, since any text or tags which come after it will be applied to the next declaration.

The format of @deffunc is:
@deffunc short-name declaration
The "short-name" is the version that will appear in the index, i.e. just a name with no argument list or return type. The declaration part is optional, and should be the complete prototype.

Here is an example (Note that unlike normal declarations, you can use HTML formatting within the actual prototype declaration):
/** Assert that a condition is TRUE, or print a message and exit.
   @param expression The condition to test.
   @deffunc ASSERT ASSERT( <expression< );
*/
#define ASSERT( expr ) _assert( expr, __FILE__, __LINE__ )

The @defvar tag

The @defvar tag is exactly like @deffunc except it defines a variable instead of a function.
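
Since the text above only says that @defvar mirrors @deffunc, the following is a sketch under that assumption: the tag takes a short name followed by an optional declaration, and it comes last in its comment block. MAX_NAME and NameFits are hypothetical names.

```cpp
/** The maximum length of a file name, including the terminating null.
    @defvar MAX_NAME const int MAX_NAME;
*/
#define MAX_NAME 256

/** Report whether a name of the given length fits in a MAX_NAME buffer.
    @param length  Length of the name, not counting the terminating null.
    @return        True if the name fits.
*/
inline bool NameFits( int length ) { return length < MAX_NAME; }
```

The macro itself is invisible to ScanDoc, which ignores preprocessor directives, so the "fake" variable declaration is what would appear in the documentation.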

The @caution, @warning, @tip and @bug tags

Each of these tags inserts a small icon into the text at the point where the tag occurs. For example, in the supplied example template, the "caution" tag inserts a paragraph break followed by a triangular yellow "caution" sign. These icons can be used to highlight a particular aspect of the text. It should be noted that none of these tags is in fact recognized by ScanDoc itself -- the substitution of icons for tags is done in the template file, and as such the template file creator is free to define any new tags that they wish.
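
A description that mixes @heading and one of the icon tags might be written as follows. This is a sketch, with CopyBytes as a hypothetical function; how the heading and icon are rendered depends entirely on the template in use.

```cpp
#include <cstddef>
#include <cstring>

/** Copy a block of bytes from one buffer to another.

    @heading Overlapping buffers
    The source and destination may overlap; the result is the same as
    if the source had first been copied to a temporary buffer.

    @caution Both pointers must be valid for count bytes.

    @param dest   Where to write the bytes.
    @param src    Where to read the bytes from.
    @param count  The number of bytes to copy.
    @return       The dest pointer.
*/
inline void *CopyBytes( void *dest, const void *src, std::size_t count ) {
    return std::memmove( dest, src, count );
}
```

Because @heading is not continuable, the lines following it are ordinary description text.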

The @todo tag

The @todo tag records the name of the current source file and the text of the tag (which is continuable) into a special "todo" table. This table is then written out as a separate file, allowing a conveniently summarized "To-Do" list for the project. Note that a special comment with only an "@todo" entry in it and no other description text or tags will be included in the generated To-Do list, but not in any other generated documentation. The reason for this is that you might not want to document everything that also has a To-Do entry associated with it, so the scanner does not consider a documentation entry valid unless it has at least some descriptive text or tags.
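
The two cases described above can be sketched as follows; FindIndex and SumValues are hypothetical names. The first comment contains nothing but a to-do entry, while the second also carries a description and tags.

```cpp
/** @todo Replace the linear search with a binary search. */
inline int FindIndex( const int *values, int count, int key ) {
    for ( int i = 0; i < count; ++i ) {
        if ( values[i] == key ) return i;
    }
    return -1;   // key not found
}

/** Add up all of the values in an array.
    @param values  The array to sum.
    @param count   The number of elements in the array.
    @return        The sum of the elements.
    @todo Decide how overflow should be reported. This note continues
          on a second line because the tag is continuable.
*/
inline long SumValues( const int *values, int count ) {
    long total = 0;
    for ( int i = 0; i < count; ++i ) {
        total += values[i];
    }
    return total;
}
```

Going by the rule above, both notes would appear in the To-Do list, but only SumValues would appear in the generated documentation.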

The Template File

The template file tells ScanDoc how to format the output files. There is virtually no knowledge of HTML within ScanDoc itself; all of the rest is supplied by the template file. (Actually, there are some functions that make generating HTML easier, but templates don't have to use them.)

ScanDoc comes with an example template file called "template.html". If all you want to do is change the name of the project or insert your company logo, you need read no further; simply edit the "template.html" file and insert your project name or logo in the appropriate fields at the top of the file.

If you want to do more detailed customization, however, you'll need to understand how a template file is actually interpreted, which requires some understanding of ScanDoc's overall order of operations. You'll also need a basic understanding of the Perl language.

When ScanDoc scans source files, it builds a large data structure which stores all of the packages, classes, member functions, parameters, documentation and other entities that it finds. These are stored using nested Perl hash tables.

After parsing is complete, the template file is parsed and executed. The template file consists mostly of output text, with occasional parameter substitutions and embedded program code. ScanDoc translates the template file into a long string which is a Perl program, and then executes that string. So, you can embed Perl code directly into the template, allowing you to open output files, iterate using "for" loops, create comma-separated lists, etc. This embedded code has access to all of the data structures built during the parsing phase.

Any text which is not embedded code will be written directly to the current output file. This text can have parameter substitutions in it. There are two primary types of substitutions. The first type is the normal Perl interpolation sequence, i.e. $variable. This means that the value of the variable will be inserted into the output text at that point. The other type of substitution is the sequence $(object.fieldname). This retrieves the named field from the given object, and inserts the value of that field into the output text at that point. (Note: The way this is implemented is that ScanDoc translates the $(object.fieldname) pattern into the sequence: "print $object->fieldname()").

Embedded code is indicated by using double angle brackets, i.e. <<code>>. Any code which is within the angle brackets will be executed at that point. A loop can be split into separate pieces, i.e.
<<foreach $a (@list) {>>
    <h2>$(a.name)</h2>
    $(a.description)<p>
<<}>>
Access to the parse tables and other data is done through global functions, which are as follows:

ScanDoc Global Functions
Function                    Meaning
file( "filename.txt" )      Open a new output file and make it the current output file.
packages                    Return a list of references to all packages, in order by name.
todolist_files()            Returns a list of all source files which had "to-do" entries.
todolist_entries( file )    Returns a list of all "to-do" entries for a given file.

Each package is a reference to a Perl object of type "PackageRecord". Access to the classes and globals within the package is done via the member functions of the package.

"PackageRecord" Member functions
Function       Meaning
classes        Returns a list of references to all classes in the package, in order by name.
globals        Returns a list of all global functions and variables in the package, in order by name.
globalvars     Returns a list of references for all global variables in the package.
globalfuncs    Returns a list of references for all global functions in the package.
name           Returns a string containing the name of the package.
url            Returns the suggested HTML URL of the package documentation.
anchor         Returns the suggested HTML anchor of the package documentation.

Each class returned by the classes() member function is a reference to a Perl object of type ClassRecord. This class has the following member functions.

"ClassRecord" Member functions
Function       Meaning
keywords       Returns a string containing the list of keywords associated with this class.
author         Returns the name of the author of the class.
version        Returns a string containing the version information for the class.
name           Returns a string containing the "short" name of the class, i.e. without "class" or "struct", and without any template params or scoping information.
longname       Similar to "name" but includes the "class" or "struct" tag.
fullname       Includes the "class" or "struct" tag and the template arguments.
scopename      The complete class name including scoping information for embedded classes.
source file    The name of the source file where the class was defined.
description    The description text of the class documentation.
seealso        The list of "see also" tags. This is a list of references to "DocReference" objects.
url            Returns the suggested HTML URL of the class documentation.
anchor         Returns the suggested HTML anchor of the class documentation.
members        Returns a list of references to all class members.
membervars     Returns a list of references to all class member variables.
memberfuncs    Returns a list of references to all class member functions.
baseclasses    Returns a list of references to all base classes.
subclasses     Returns a list of references to all subclasses.

Each record returned by the members() function (as well as by membervars() and memberfuncs()) is a reference to a Perl object of type MemberRecord. MemberRecord is also used for the references returned by globals(), globalvars(), and globalfuncs() at the package level.

"MemberRecord" Member functions
Function       Meaning
keywords       Returns a string containing the list of keywords associated with this member.
author         Returns the name of the author of the member.
version        Returns a string containing the version information for the member.
name           Returns a string containing the "short" name of the member, i.e. without the type or argument list.
longname       Similar to "name" but includes '()' at the end if it's a function.
fullname       Includes the type of the variable and the arguments if any.
scopename      The complete member name including scoping information.
source file    The name of the source file where the member was defined.
description    The description text of the member documentation.
seealso        The list of "see also" tags. This is a list of references to "DocReference" objects.
url            Returns the suggested HTML URL of the member documentation.
anchor         Returns the suggested HTML anchor of the member documentation.
type           'func' if it's a function, else 'var' if it's a variable.
params         Returns a list of parameters (as defined by the @param tags) for this item.
exceptions     Returns a list of exceptions (as defined by the @exception tags) for this item.
returnval      Returns the text of the @return tag.

Parameters and exceptions are Perl objects of type "ArgRecord", which has the following members:

"ArgRecord" Member functions
Function       Meaning
name           Returns a string containing the name of the argument.
description    Returns the description text for the argument.

Finally, the references returned by the "seealso" function are references to Perl objects of type "DocReference":

"DocReference" Member functions
Function    Meaning
name        Returns a string containing the name of the reference.
url         If ScanDoc knows about this reference, it will return the URL string that it suggested; if the item is not recognized, it will return 0.

Base classes: In some cases, a base class mentioned in another class's "baseclasses" list will be a class that ScanDoc does not know about. Because ScanDoc does not parse #include directives, it's possible for a class to inherit from a base class that is defined outside the set of files being parsed by ScanDoc. In this case, ScanDoc will create a "partial" class record, consisting of only the name, longname, fullname, and scopename fields. In particular, the "url()" member function will return 0, since ScanDoc does not know from where this class originates. In such a case, the output template should detect that there is no URL and not attempt to create a hyperlink for the class reference.
Description Filtering: The "description()" function returns the bare text as found within the source code. The only filtering that ScanDoc does on this text is to expand all tabs to spaces. The template code is responsible for any other filtering, such as converting blank lines to paragraphs, converting @heading tags to the appropriate style, and inserting the caution, warning and bug icons. Note that the template is free to define new icons or tags which can be filtered at this time.

To-Do List

This is a list of enhancements that are needed for ScanDoc.

Templates for formats other than HTML: Currently, HTML is the only output format supported because it's the only one that I am familiar with. However, it seems that a lot of documentation these days is in TeX format, which is then used to generate .info, .dvi, etc. It would be nice to have template files for these formats. Note also that there is no reason why ScanDoc could not be modified to support multiple templates in a single execution, which would be relatively fast since parsing the input classes is what takes 95% of the execution time.
Other HTML templates: It would be nice to have a selection of HTML templates for different styles. For example, the current template file generates documentation which takes advantage of the "frames" feature which is not supported by all browsers, although the documentation can still be viewed with a browser that does not support frames. However, it might be nice to provide templates which don't generate any frames information. Similarly, there is much that can be done in terms of improving the overall attractiveness and organization of the documents, especially by taking better advantage of tables.
Include files: Currently, ScanDoc does not attempt to parse "#include" statements. (As much as I like Perl, it's not a great language for writing recursive descent parsers in my opinion.) Unfortunately, this means that ScanDoc has to "guess" which identifiers represent types as opposed to function and variable names. The current heuristic handles all of the cases I've found so far, but it would be nice to be able to know for sure. Of course, doing a complete job would also require that we recognize and expand C macros. Having a complete set of type information and better parsing would also allow individual function arguments and return values to be hyperlinks.
Hyperlinks on arguments and return values: Even without a more complete parser, it would be possible to modify the current HTML template file to break up argument lists into a sequence of bare words and see if any of those bare words match up with any of the current classes that ScanDoc knows about. Hyperlinks could then be created to those classes. This wouldn't work every time (for example, implicit references to classes defined within the current class's scope would have problems), but it would cover most of the common cases.
Improved description filtering: Currently, the "processDescription" function in the HTML template does not handle the case of '<' and '>' symbols embedded in the description text. This means that attempts to mention template arguments within documentation generally don't look right. Less-than and greater-than signs should usually be transformed into &lt; and &gt; sequences, unless they are part of a valid HTML tag that has been deliberately inserted into the documentation text. I would need to come up with a look-ahead regex that would match all of the HTML tags that might reasonably occur inside a documentation entry.
Persistent Scan Info: Several people have suggested that they would like to keep their documentation files always up to date, in other words running ScanDoc whenever they do a make. Unfortunately, because ScanDoc scans every file in the project, this is far too slow for realistically-sized projects. One idea would be to create a persistent database of documentation, which could be updated incrementally, allowing ScanDoc to only scan those files that have actually changed. This would also allow the code analyzer to be written in a different language than the templates. For example, we could use a real parser generator, and make a C analyser that would do a much more complete job of parsing, as well as being much faster. This would also have the benefit of making the parser somewhat readable, which it certainly isn't now.
Because ScanDoc can pretty much ignore anything that's inside a code block, a C parser could potentially parse much faster than an actual compiler. This means that scanning just a few files and updating a database would add an unnoticeable amount of delay to the build process. The only question is how to maintain the database in a way that's portable.
This idea of a persistent database of documentation could be taken even further. For example, rather than generating static pages, the documentation pages could be served up directly from the database, using something like PHP to create HTML pages of documentation on the fly as needed. This would also allow more intelligent queries, for example "give me documentation for all classes that call member function 'foo' in class 'bar'." Of course, we would have to parse things a lot more deeply than we do now for this to work. And one problem with dynamic pages is that they are hard to distribute in an archive.

History and credits

The current version of ScanDoc is actually the sixth generation. The first one was written in C, sometime in 1993-1994, and was inspired by (and functionally similar to) the "autodoc" utility on the Commodore Amiga. Later versions were inspired by Sun's JavaDoc utility. I've also read about Don Knuth's "Literate Programming" efforts, but I wanted something that was much lighter weight and easier to integrate into existing environments.

Robert McNally of Dangerous Games suggested to me the idea of having embedded icons in the documentation to signify important paragraphs.