Here is a breakdown of the generation steps doxygen uses to visualize class hierarchy. The software under study is doxygen-1.8.3, run on C++/C++11 files. The compilation and development environment is Fedora 18/x86_64 using GNU C++ version 4.8.
Doxygen Overview
Some Doxygen basics, and internals. The Fedora package is doxygen-1.8.3-3.fc18.x86_64. The command-line invocation is doxygen, which is a C++ binary.
To make the doxygen binary debuggable, check out doxygen from subversion and configure the build with --debug. (On Fedora, some other dependencies are required, like qt-devel. An alias also needs to be defined between the name the Makefiles expect and the installed qmake-qt4.)
For this investigation, the subject of most interest is the language parser for C++, so start by breaking in parseInput(). The doxygen parse phase lowdown:
The task of the parser is to convert the input buffer into a tree of entries (basically an abstract syntax tree). An entry is defined in src/entry.h and is a blob of loosely structured information. The most important field is section which specifies the kind of information contained in the entry.
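To make the shape of that tree concrete, here is a minimal sketch of an entry tree. The Entry struct, the Section enum, and countSection below are hypothetical simplifications made up for illustration. They are not doxygen's actual definitions; see src/entry.h for those.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for doxygen's Entry (the real one is in
// src/entry.h). Only the idea of a tagged tree node is kept.
enum class Section { Root, Namespace, Class, Function, Variable };

struct Entry
{
  Section section;                               // kind of information held
  std::string name;                              // identifier as parsed
  std::vector<std::unique_ptr<Entry>> children;  // nested entries

  Entry(Section s, std::string n) : section(s), name(std::move(n)) { }

  // Append a child entry and return a reference to it for chaining.
  Entry& add(Section s, const std::string& n)
  {
    children.push_back(std::unique_ptr<Entry>(new Entry(s, n)));
    return *children.back();
  }
};

// Count the entries of a given section in the subtree rooted at e.
std::size_t countSection(const Entry& e, Section s)
{
  std::size_t n = (e.section == s) ? 1 : 0;
  for (const auto& c : e.children)
    n += countSection(*c, s);
  return n;
}
```

A later pass over such a tree would pick out the Class entries and their base-class information; that is the data the diagram generator ends up consuming.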
The other area of interest is the output generator for graphviz sources and then generated diagrams. So, breaking in function generateOutput() (see src/doxygen.cpp), step until
if (Config_getBool("HAVE_DOT"))
{
  DotManager::instance()->run();
}
This is the part that generates the graphviz source files and then uses dot to create output from the previously-parsed C++ source data. Breaking in function DotManager::run() (see src/dot.cpp) allows stepping through individual graph creation.
XXX file name, class name mapping.
Graphviz Overview
Some graphviz basics, DOT language reference and users guides, wiki. Of particular interest are the “Node, Edge, and Graph Attributes.”
Usual command-line invocation looks like:
dot -v -Tsvg:cairo -o myfile.svg myfile.gv
And then fonts are in ~/.fonts or /usr/share/fonts, and can be controlled via the following attributes:
fontpath="/usr/share/fonts/dejavu"
fontname="DejaVuSansMono"
These should map to installed fonts, ie
%fc-match "DejaVuSansMono"
DejaVuSansMono.ttf: "DejaVu Sans Mono" "Book"
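For experimenting with the font attributes, a small generator is handy. This is a minimal sketch: the function names writeFontGraph and fontGraph, the graph name, and the two node names are made up for illustration. Only the fontname and shape attributes are taken from the graphviz attribute list.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Write a two-node DOT graph whose graph, node, and edge defaults all use
// the given font. Graph and node names are placeholders for illustration.
void writeFontGraph(std::ostream& os, const std::string& fontname)
{
  os << "digraph \"fonts\"\n{\n";
  os << "  graph [fontname=\"" << fontname << "\"];\n";
  os << "  node [fontname=\"" << fontname << "\",shape=record];\n";
  os << "  edge [fontname=\"" << fontname << "\"];\n";
  os << "  base -> derived [dir=\"back\"];\n";
  os << "}\n";
}

// Convenience wrapper returning the DOT text as a string.
std::string fontGraph(const std::string& fontname)
{
  std::ostringstream oss;
  writeFontGraph(oss, fontname);
  return oss.str();
}
```

Saving the result of fontGraph("DejaVuSansMono") as myfile.gv and running the dot command above gives a quick check of whether the requested font is actually picked up.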
Doxygen Settings
Parts of the doxygen configuration file that matter, the config settings used, and any commentary.
HAVE_DOT=YES
CLASS_GRAPH=YES
UML_LOOK=NO
COLLABORATION_GRAPH=NO (Interesting on a per-class basis only. For larger projects the noise becomes overwhelming.)
CALL_GRAPH=NO (Same.)
CALLER_GRAPH=NO (Same.)
INCLUDE_GRAPH=NO (Same.)
INCLUDED_BY_GRAPH=NO (Same.)
TEMPLATE_RELATIONS=NO (Relations between primary templates and template instances are very cluttered, and the noise value is high. Template relationships and class hierarchy relations in non-UML mode are displayed on the same diagram, but use a different visual grammar. Classes read from base to derived. Throw templates in and they read "as if" from base to primary template to specific instance. This should instead be base to specific instance.)
DOT_GRAPH_MAX_NODES=50
MAX_DOT_GRAPH_DEPTH=0
DOT_IMAGE_FORMAT=SVG (Resolution-independent text, editable, lossless)
INTERACTIVE_SVG=YES (Focus control for big diagrams)
With these settings, a PDF file of the GNU C++11 API runs over three thousand five hundred pages.
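The TEMPLATE_RELATIONS commentary above is easier to see with a small example. In this sketch, base and holder are made-up names; the static assertions only state what the language guarantees: a template instance is a class that derives from base, and the primary template itself is not a class in the hierarchy.

```cpp
#include <type_traits>

struct base { };

// Primary class template: every instance derives directly from base.
template<typename T>
struct holder : base
{
  T value;
};

// A specific instance. holder<int> is a class whose direct base is base;
// the primary template is not a class and so is not a base of anything.
typedef holder<int> int_holder;

static_assert(std::is_base_of<base, int_holder>::value,
              "base is a base class of the instance");
static_assert(!std::is_same<holder<int>, holder<long>>::value,
              "distinct instances are distinct classes");
static_assert(!std::is_base_of<holder<long>, holder<int>>::value,
              "instances do not derive from one another");
```

So the inheritance edge to draw is base to holder<int>. An extra hop through the primary template is a template relation, not a class hierarchy relation.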
Doxygen XML attribute for Graphviz
1) Legend for doxygen-generated graphviz diagrams.
2) What attributes are needed in XML to represent this?
3) What added attributes/markup are needed to get longstanding bugs fixed? Or are these solely parse errors?
Generated Diagram Quality
Sample set is GCC-4.8.0 C++ docset, based on a generated output on 2013-03-10.
Representative diagrams for more traditionally-styled C++ code, in the form of OO-style class hierarchies, are found based off of the std::exception and std::ios_base root elements.
- std::exception, just the class hierarchy diagram, and the Exceptions Module
- std::ios_base, just the class hierarchy diagram, and the IO Module
Start with the exception diagram, because the lack of templates in this hierarchy is a useful simplifying factor. This diagram is accurate. Layout issues include names overflowing the bounding boxes (__gnu_pbds::insert_error) and line break issues (same, but also others like __gnu_cxx::recursive_init_error). Many of the line connectors and paths to endpoints are infuriatingly erratic. These issues largely look to be the kind of thing that could be tweaked via various dot settings, or related graphviz tool settings (like neato).
Next diagram: io. This hierarchy diagram is largely accurate, but with distracting elements and extraneous information. This is a multi-level class hierarchy with both base classes and base class templates. Read it starting from the left and going to the right. The least-derived base is ios_base. A class template for basic_ios derives from it, taking two parameters. In this diagram, two templates derive from ios_base: the primary class template for basic_ios, and a fully-specialized class template for basic_ios instantiated with the integer type char. A couple of things to note, one level in to the diagram:
- restricting to just primary templates would be useful, ie this diagram without any specializations. Indeed, this is what the diagrams evolve into once level two and above diagrams are cleaved off, ie starting from basic_istream (instead of ios_base) and going to basic_stringstream.
- there are actually two specializations for this hierarchy, both char and wchar_t. Where’s wchar_t?
- there are actually typedefs like ios and wios for the char and wchar_t instantiations of the basic_ios class template.
The basic_ostream char specialization is duplicated: once for basic_ostream<char>, and once for basic_ostream<char, char_traits, char>. Neither of these instances actually exists. There’s a similar phantom template instance for basic_iostream.
Let’s stop here with this diagram. The rest of the hierarchy has similar issues.
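As a cross-check on what the diagram should show, the relationships the standard library declares for this hierarchy can be stated as compile-time assertions. This is only a sketch of a sanity check and uses nothing beyond standard headers.

```cpp
#include <ios>
#include <istream>
#include <ostream>
#include <string>
#include <type_traits>

// ios_base is the least-derived base of every basic_ios instance.
static_assert(std::is_base_of<std::ios_base, std::basic_ios<char>>::value, "");
static_assert(std::is_base_of<std::ios_base, std::basic_ios<wchar_t>>::value, "");

// ios and wios are typedefs for the char and wchar_t instances.
static_assert(std::is_same<std::ios, std::basic_ios<char>>::value, "");
static_assert(std::is_same<std::wios, std::basic_ios<wchar_t>>::value, "");

// The traits parameter defaults to char_traits<CharT>, so these two
// spellings name one class, not two.
static_assert(std::is_same<std::basic_ostream<char>,
                           std::basic_ostream<char, std::char_traits<char>>>::value, "");

// basic_iostream derives from both basic_istream and basic_ostream.
static_assert(std::is_base_of<std::basic_istream<char>,
                              std::basic_iostream<char>>::value, "");
static_assert(std::is_base_of<std::basic_ostream<char>,
                              std::basic_iostream<char>>::value, "");
```

Any node in the generated diagram that cannot be written down as one of these types is a candidate phantom instance.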
Next, let’s examine some template-heavy components and idioms, like policy based design.
Components that use this idiom are found in the class hierarchies for std::allocator, std::unordered_map (and the other unordered containers), __gnu_cxx::vstring, and the policy-based data structures extensions, such as __gnu_pbds::trie.
- std::allocator
- __gnu_cxx::__versa_string
- __gnu_pbds::trie
For the first class, std::allocator, the generated diagram is accurate. Ideally, there would be a visual marker for the allocator void specialization, and a note about the grouping of the superset of extension allocators as the base class for std::allocator.
The two extension classes share common issues, and neither has an accurately generated class hierarchy. Both make use of multiple base classes and policy-based design.
Finally, a pass at some C++11 features. Of note are things like variadic templates (invented for std::tuple and then used elsewhere) and alias templates (used in many parts of the library with policy-based designs, like std::allocator and std::unordered_map).
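For readers unfamiliar with the idiom, here is a minimal sketch of policy-based design with multiple base classes. All of the names (doubling_growth, fixed_growth, no_check, bounded_check, buffer_policy_host) are hypothetical. This is not how __versa_string or the pb_ds containers are written, only the general shape that makes their hierarchies hard to diagram.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical growth policies, each supplying one piece of behavior.
struct doubling_growth
{
  static std::size_t next(std::size_t n) { return n == 0 ? 1 : 2 * n; }
};

struct fixed_growth
{
  static std::size_t next(std::size_t n) { return n + 16; }
};

// Hypothetical checking policies.
struct no_check
{
  static bool valid(std::size_t) { return true; }
};

struct bounded_check
{
  static bool valid(std::size_t n) { return n <= 1024; }
};

// The host class template inherits from each of its policies, so every
// instance has multiple base classes chosen by its template arguments.
template<typename Growth, typename Check>
struct buffer_policy_host : Growth, Check
{
  std::size_t capacity = 0;

  // Grow to the next capacity if the checking policy accepts it.
  bool grow()
  {
    const std::size_t n = Growth::next(capacity);
    if (!Check::valid(n))
      return false;
    capacity = n;
    return true;
  }
};
```

Each distinct combination of policies is a distinct class with its own set of bases, so a single class template fans out into many small hierarchies.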
- std::tuple
- std::unordered_map
Just making a quick pass here. From the tuple diagram, ponder the implied template relations. This is hard, since making sense of this with a visual grammar would require better grouping between primary template, partial specializations, and full specializations.
And for unordered_map, apparently the complexity of the derivation, plus the templates, plus the use of C++11 features like alias templates aborts the graph. No hierarchy is not an accurate hierarchy.
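To see why these features strain a class diagram, consider a minimal sketch of a tuple built by recursive inheritance, plus an alias template. The names tuple_impl, mini_tuple, and pair_of are made up, and this is an illustration of the technique, not libstdc++'s actual std::tuple.

```cpp
#include <cstddef>
#include <type_traits>

// Primary template, declared only; the two partial specializations below
// cover every argument list.
template<std::size_t I, typename... Ts>
struct tuple_impl;

// Partial specialization ending the recursion.
template<std::size_t I>
struct tuple_impl<I> { };

// Partial specialization: store the head, derive from the rest.
template<std::size_t I, typename Head, typename... Tail>
struct tuple_impl<I, Head, Tail...> : tuple_impl<I + 1, Tail...>
{
  Head head;
};

// The user-facing variadic template sits on top of the chain.
template<typename... Ts>
struct mini_tuple : tuple_impl<0, Ts...> { };

// Alias template: a new name for a family of instances, not a new class.
template<typename T>
using pair_of = mini_tuple<T, T>;

static_assert(std::is_base_of<tuple_impl<1, char>,
                              mini_tuple<int, char>>::value, "");
static_assert(std::is_same<pair_of<int>, mini_tuple<int, int>>::value, "");
```

Here mini_tuple<int, char> has a three-deep chain of bases, every one of them an instance of a partial specialization, and pair_of<int> adds a second name for an existing class. A diagram has to decide which of these are nodes.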
Doxygen use is not considered harmful: even with these flaws, it is an invaluable tool. Reasonable people may differ, of course.
Sources (C++11, graphviz)
For a given set of sources:
struct base
{
  enum mode : short { in, out, top, bottom };
  typedef long value_type;
};
struct A : public base
{
  int _M_i;
  int _M_n;
};
struct B : public base
{
  value_type _M_v;
  constexpr B(value_type __v = 6) : _M_v(__v) { };
};
struct C : public B
{
  constexpr value_type
  square() { return _M_v * _M_v; }
};
struct D : public A
{
  D(const D& __d) : A() { };
  ~D() { }
};
Next, use doxygen to generate HTML, with HAVE_DOT set to YES and DOT_CLEANUP set to NO in the doxygen configuration file. With this configuration tweak, when doxygen is used to generate HTML, the doxygen-generated graphviz sources used to create the class diagrams are not destroyed. On examination, they produce the following graphic:

And then, look at the generated graphviz for the base class, the root of the diagram:
digraph "base"
{
  // edge and node defaults
  edge [fontname="FreeSans",fontsize="9",labelfontname="FreeSans",labelfontsize="9"];
  node [fontname="FreeSans",fontsize="9",shape=record];
  // actual graph
  Node1 [label="base",height=0.2,width=0.4,color="black", fillcolor="grey75", style="filled" fontcolor="black"];
  Node1 -> Node2 [dir="back",color="midnightblue",fontsize="9",style="solid",fontname="FreeSans"];
  Node2 [label="A",height=0.2,width=0.4,color="black", fillcolor="white", style="filled",URL="$struct_a.html"];
  Node2 -> Node3 [dir="back",color="midnightblue",fontsize="9",style="solid",fontname="FreeSans"];
  Node3 [label="D",height=0.2,width=0.4,color="black", fillcolor="white", style="filled",URL="$struct_d.html"];
  Node1 -> Node4 [dir="back",color="midnightblue",fontsize="9",style="solid",fontname="FreeSans"];
  Node4 [label="B",height=0.2,width=0.4,color="black", fillcolor="white", style="filled",URL="$struct_b.html"];
  Node4 -> Node5 [dir="back",color="midnightblue",fontsize="9",style="solid",fontname="FreeSans"];
  Node5 [label="C",height=0.2,width=0.4,color="black", fillcolor="white", style="filled",URL="$struct_c.html"];
}
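The structure of that listing is simple enough to reproduce by hand. The following is a minimal sketch, not doxygen's implementation: hierarchyToDot is a made-up function that takes (base, derived) pairs, numbers the nodes in order of first appearance, and emits only a small subset of the attributes seen above.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Emit a DOT digraph for a class hierarchy given as (base, derived) pairs.
// Nodes are numbered in order of first appearance. Only a subset of the
// attributes in the doxygen-generated listing is emitted.
std::string hierarchyToDot(
    const std::string& title,
    const std::vector<std::pair<std::string, std::string>>& edges)
{
  std::map<std::string, std::size_t> ids;  // class name -> node number
  std::ostringstream out;

  out << "digraph \"" << title << "\"\n{\n";
  out << "  node [fontname=\"FreeSans\",fontsize=\"9\",shape=record];\n";

  // Return the node number for a class, declaring the node on first use.
  auto node = [&](const std::string& name) -> std::size_t {
    auto it = ids.find(name);
    if (it != ids.end())
      return it->second;
    const std::size_t id = ids.size() + 1;
    ids[name] = id;
    out << "  Node" << id << " [label=\"" << name << "\"];\n";
    return id;
  };

  for (const auto& e : edges)
    {
      const std::size_t from = node(e.first);
      const std::size_t to = node(e.second);
      // The edge runs base -> derived; dir="back" puts the arrowhead at
      // the base end, matching the listing.
      out << "  Node" << from << " -> Node" << to << " [dir=\"back\"];\n";
    }

  out << "}\n";
  return out.str();
}
```

Feeding it the pairs (base, A), (A, D), (base, B), (B, C) yields the same node numbering as the listing, Node1 through Node5, which makes it easy to diff a hand-built graph against what doxygen generated.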