[7] in linux-announce channel archive
New C++ class library Available for FTP
daemon@ATHENA.MIT.EDU (Lars Wirzenius)
Sun Dec 18 12:11:47 1994
Date: Sun, 18 Dec 1994 17:48:11 +0200
From: Lars Wirzenius <wirzeniu@cc.helsinki.fi>
To: linux-activists@niksula.hut.fi, linux-announce@vger.rutgers.edu
X-Mn-Key: announce
From: steveb@bga.com (Steve Benz)
Newsgroups: comp.lang.c++,comp.os.linux.announce,alt.sources
Subject: New C++ class library Available for FTP
Organization: Real/Time Communications - Bob Gustwick and Associates
Keywords: C++, class library
Approved: linux-announce@tc.cornell.edu (Lars Wirzenius)
Followup-to: comp.os.linux.misc,comp.lang.c++,alt.sources.d
Announcing the pre-alpha, 0.0 release of the PCG class library for C++.
The PCG class library is a freely distributable C++ class library that
provides type-safe classes as well as a powerful dynamic typing scheme.
As such, it is more comparable to STL than to NIHCL or libg++.
In addition to classes for collections, it provides some very powerful
features for dealing with databases. A more comprehensive review of the
differences between PCG, NIHCL, libg++, STL and Rogue Wave's Tools.h++
can be found in the Introduction to PCG enclosed at the end of this posting.
As a pre-alpha release, it is intended for folks who want to familiarize
themselves with the package and for people who would like to contribute
to the development of the package.
It is currently known to run only on the platforms it was developed on:
a 486 Linux box and SPARCs, all running g++ 2.5.8.
I hope that a network of alpha testers can be put together to port
this package to other machines and compilers and test it there, and I
would be glad to help anyone attempting to make a port.
The source code and documentation are available via anonymous FTP from
zilker.net in /pub/steveb/pcg-0.0.tgz. This contains vast amounts of
documentation, all of which is in HTML format - which means you can read
it with any WWW browser (e.g. xmosaic, lynx, or chimera). If you do not
know what these are, see the file WWW.faq included in this distribution.
It is possible, although kind of annoying, for me to generate ASCII and
PostScript versions. I could also set up the documentation to be available
over WWW. If there is sufficient demand, I can do one or both of these.
- Steve
steveb@realtime.net
AN INTRODUCTION TO THE PCG CLASS LIBRARY
----------------------------------------
BACKGROUND AND MOTIVATION
The PCG class library was originally intended to be a much more
modest package, but, as things do, it grew unexpectedly. It was
originally intended to be the undercarriage for an application that
would need to deal with a variety of database technologies ranging
from the simple to the moderately sophisticated (state-of-the-art
relational databases, perhaps some OODBMSs). To do that, I felt I
really had to have a dynamic typing system; consequently, dynamic
types are at the core of this package.
This package will be of interest to anybody who is looking for a
powerful class library and of special interest to developers of large
systems, not just because of the data structures that it provides, but
because of the way that it approaches scalability - this last point
will be described in more detail in the discussion on performance and
templates. Its database classes will interest anyone who is thinking
about an application that will deal with a variety of databases. Its
facilities for dealing with trivial databases will be of particular
interest in some circles as well. The fact that it runs under G++ will
pique the interest of people developing software with that compiler.
PCG is intended to focus on collections and I/O - it is intended as
an extension to C++, not a wrapper for the operating system. In this
respect, it is much the same as its chief competition, STL (see
Michael Vilot's article An Introduction To The Standard Template
Library in the October 1994 C++ Report).
Sizing Up The Competition
As I was working on PCG, it became obvious that this package would be
of general interest. I set about enhancing it with three goals in
mind: it must be portable, cheap, and good. At the time, there was no
package I could use that would satisfy all three of these points and
run under Linux (the operating system that I chose to develop under).
libg++ is portable, but due to the copyleft, infinitely expensive,
and, for reasons described later, not what I would consider "good".
NIHCL falls down for reasons of portability and intent. It is aimed
at encapsulating a lot of things that are not entirely portable and
which held no interest for me (e.g. it has classes for processes).
Commercial products, notably Rogue Wave's products, were fairly cheap
and fairly portable - just not to Linux and G++. Rogue Wave's product
improves on libg++ and NIHCL by allowing typesafe collections through
templates, but there are still some points about its design that are
troublesome.
At the start of this project, I was not aware of STL. I am not sure
that I would have considered that work entirely relevant, given
what I understood about C++ at that time. When STL becomes
available, it will definitely be portable and good - cheap seems
likely but one never knows. However, PCG has a few improvements over
STL, the most noticeable being the database section. But there are many
differences that one should take into account when choosing from among
these class libraries. We will discuss the relative merits of PCG,
libg++, NIHCL, and Rogue Wave's Tools.h++ 5.1 in the following
sections. Note that this review, not surprisingly, does not treat
libg++, NIHCL, and Tools.h++ 5.1 very well. The addition of
templates and exceptions has changed the face of C++ development since
the ideas that these libraries were based on were formed. When talking
about the qualities of a "good" library, we mean it in the light of
reason as we now know it.
GOOD CLASS LIBRARIES ARE NOT POINTER-HAPPY
A great many good C programmers have gone through a phase in their
career where they have decided that the pointer is the one true divine
instrument, and wrote many a line of code with two, three, or even
four levels of indirection and thought it good. Most such folk would
now agree that though the pointer is an important data type, it's also
a core dump waiting to happen.
There are a number of language design decisions in C++ that have
steered programmers in that direction. The central one is this: if you
have declarations like so:
Person p;
Person *pp;
Then p is an instance of Person and nothing but a Person, while pp
is a pointer that can point to an instance of Person or an instance
of any subclass of Person. There is no way to have a variable that
contains an instance of anything other than its type. This shortcoming
is only an occasional problem when taken by itself (since one can
usually get around it with declarations such as Person &p). However,
the difficulties grow in significance when implementing collections,
since when one says List<Person>, one usually means that the list will
contain a number of instances of the class Person or its subtypes.
Without some serious skulduggery in the implementation, that's not
what you will get.
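
A minimal sketch of that pitfall follows. It uses the standard
std::vector as a stand-in for List, and the Student class and the
describe method are hypothetical names introduced only for this
illustration; none of it is taken from PCG.

```cpp
#include <string>
#include <vector>

// Hypothetical classes for illustration; not part of PCG.
struct Person {
    virtual ~Person() = default;
    virtual std::string describe() const { return "Person"; }
};

struct Student : Person {
    std::string describe() const override { return "Student"; }
};

// Store a Student in a by-value collection of Person and report what
// the collection actually holds afterwards.
std::string stored_by_value() {
    std::vector<Person> people;
    Student s;
    people.push_back(s);   // copies only the Person part of s
    return people.front().describe();
}

// The same experiment with a collection of pointers keeps the subtype,
// at the price of managing pointers and lifetimes by hand.
std::string stored_by_pointer() {
    Student s;
    std::vector<Person*> people;
    people.push_back(&s);
    return people.front()->describe();
}
```

Here stored_by_value returns "Person" because the element was copied
as a plain Person, while stored_by_pointer returns "Student".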
Perhaps the easiest and most common solution to these problems is to
implement collections of things as collections of pointers to things.
Sometimes this is what you want, but sometimes not. In general, most
folks would not call this a good solution, declaring "If I'd'a wanted
a collection of pointers, I'd'a said so."
GOOD CLASS LIBRARIES ARE TYPE-SAFE
Another wrinkle to the incredible perpetuating pointer problem can be
found by looking at virtual methods - subclasses are not allowed to
add any type information to a virtual function's signature. Therefore,
in the world of C++ class libraries we have seen a series of classes,
such as Rogue Wave's RWCollection and RWCollectable classes,
where, when one adds a member to a collection, the only type
specification is RWCollectable * and, worse, the lookup methods return
RWCollectable * as well - which leads to casts which lead to core
dumps.
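
To make the hazard concrete, here is a rough sketch of such a
collection. Collectable, UntypedBag, Person, Invoice and
holds_person_at are hypothetical stand-ins written for this example;
they are not Rogue Wave's actual declarations.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical root class and element types for illustration only.
struct Collectable {
    virtual ~Collectable() = default;
};
struct Person : Collectable { int age = 30; };
struct Invoice : Collectable { double total = 9.5; };

// A pointer-based collection whose interface only knows the root type.
class UntypedBag {
public:
    void insert(Collectable* c) { items_.push_back(c); }  // accepts anything
    Collectable* at(std::size_t i) const { return items_.at(i); }
private:
    std::vector<Collectable*> items_;
};

// The caller has to guess the element type. A checked cast turns a
// wrong guess into a null pointer; an unchecked cast on the same
// pointer would instead be undefined behavior.
bool holds_person_at(const UntypedBag& bag, std::size_t i) {
    return dynamic_cast<Person*>(bag.at(i)) != nullptr;
}
```

Nothing in the interface stops an Invoice from being inserted into a
bag that the rest of the program treats as a bag of Person objects.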
There are at least three ways to deal with this problem. One is to
take the STL approach, which has no virtual functions like this and
thus avoids the problem altogether. A second approach is known as the
"letter and envelope" strategy. That is, if we define a class which
knows more type information than its parent, the parent's method is
hidden from the user in favor of the more specific method of the
child. This is the approach that PCG takes. Tools.h++ takes a third
approach, which is to create a separate batch of template-based
collections. This helps, but the fact that template-based objects are
not interoperable with the non-template-based objects is unfortunate.
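
The hiding idea can be sketched roughly as below. This is only an
illustration of the strategy as described here, not PCG's actual
code: GenericList and its methods are hypothetical, and the interface
given to ListOf is assumed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical untyped layer: one compiled implementation shared by
// every element type. Pointers are stored only to keep the sketch short.
class GenericList {
public:
    virtual ~GenericList() = default;
    void add(const void* item) { items_.push_back(item); }
    const void* get(std::size_t i) const { return items_.at(i); }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<const void*> items_;
};

// Typed layer: add and get reuse the parent's names, which hides the
// untyped versions from users of ListOf<T>. Each method is a one-line
// forwarding call, and the only cast lives here.
template <class T>
class ListOf : public GenericList {
public:
    void add(const T* item) { GenericList::add(item); }
    const T* get(std::size_t i) const {
        return static_cast<const T*>(GenericList::get(i));
    }
};
```

With this arrangement, calling add on a ListOf<int> with a pointer to
double is rejected at compile time, yet the same object can still be
handed to code that expects a GenericList.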
GOOD CLASS LIBRARIES SUPPORT RUN-TIME PARAMETERIZATION
Since libg++, NIHCL and Tools.h++ are all essentially
pointer-based, they support run-time parameterization easily and
implicitly. STL, being entirely a child of C++ templates, does not -
although it could, with some clever extensions. PCG, being a strange
mix of pointer-based and template-based, does support run-time
parameterization and manages to intermingle this with type-safety very
smoothly.
GOOD CLASS LIBRARIES SHOULD ALLOW FOR CHANGE
My experiences with large C++ applications have taught me several
lessons that I consider of paramount importance when looking at the
software development cycle. The highest one on the list is that one
should expect and plan for surprises - and the best way to prepare is
to design interfaces such that large changes in the design can be
accommodated without changes to the interface. By doing this, large
changes to the design can often be achieved without large changes to
the code.
In anticipation of this, PCG does something that no other class
library seems to do. It turns classes like List into abstract classes,
leaving the actual implementation to subclasses.
To understand the importance of this, consider that there are several
ways to implement a list. First, one could just allocate an array of
elements, extending it when the size of the list exceeds the size of
the array. This implementation would work very well for small lists
where most of the insertions are done at the end, or in lists where
the speed of addition is unimportant compared to the speed of
iteration. A variant of this might create a series of blocks, each
block containing a small array of elements - when one of these arrays
is filled, it is split into two blocks. This implementation would
perform fairly well in all weather, but if the blocks happen to be
large and moving elements is very expensive, it might not work as
well as a plain old linked list.
The secret here is that when an application is originally conceived,
it may not be entirely clear which sort of list will offer the best
performance. Or, perhaps at conception it was clear that a specialized
implementation would ultimately be needed in order to perform well in
the real world, but for prototyping purposes, one of the stock
implementations would work as well.
Under PCG's approach, changing the underlying implementation requires
only a call to a different list's constructor. Under the usual
approach, one has to change every reference to the object's type. This
can be particularly unpleasant if there are routines where the
implementation is of small importance (e.g. print( const List &l )).
For functions such as these, you would need two implementations, one
for regular lists, one for the specialized list. Yuk.
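
As a rough sketch of this arrangement, consider an abstract list with
two interchangeable implementations. All names here (IntList,
ArrayIntList, LinkedIntList, print_to_string) are hypothetical, the
element type is fixed to int for brevity, and PCG's real List
declaration may well differ.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical abstract interface: says what a list does, not how.
class IntList {
public:
    virtual ~IntList() = default;
    virtual void append(int value) = 0;
    virtual std::size_t size() const = 0;
    virtual int at(std::size_t i) const = 0;
};

// Array-backed implementation: cheap iteration and indexing.
class ArrayIntList : public IntList {
public:
    void append(int value) override { data_.push_back(value); }
    std::size_t size() const override { return data_.size(); }
    int at(std::size_t i) const override { return data_.at(i); }
private:
    std::vector<int> data_;
};

// Linked implementation: no element moves on growth, slower indexing.
class LinkedIntList : public IntList {
public:
    void append(int value) override { data_.push_back(value); }
    std::size_t size() const override { return data_.size(); }
    int at(std::size_t i) const override {
        if (i >= data_.size()) throw std::out_of_range("LinkedIntList::at");
        return *std::next(data_.begin(), static_cast<std::ptrdiff_t>(i));
    }
private:
    std::list<int> data_;
};

// Written once against the interface; works with either implementation.
std::string print_to_string(const IntList& l) {
    std::ostringstream out;
    for (std::size_t i = 0; i < l.size(); ++i) {
        if (i != 0) out << ' ';
        out << l.at(i);
    }
    return out.str();
}
```

Switching a program from one implementation to the other then means
changing the line that constructs the list; print_to_string and any
other routine written against IntList stays as it is.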
This problem is not just one of performance. For instance one might
decide during development that a set should be kept in sorted, rather
than random order.
Of course, the STL counter-argument would be that the above approach
means virtual functions, and virtual functions lead to performance
penalties. This is true, although the cost of a call to a virtual
function is usually trivial compared to the cost of, say, computing a
hash function or any of the other operations associated with
collections. But more importantly, in my experience, small penalties
in run-time performance are well worth the cost if they reduce time to
market.
GOOD CLASS LIBRARIES HELP WITH REAL-LIFE I/O PROBLEMS
I have never seen a class library that had a really satisfying
solution to the problems of I/O.
In STL, superficially the emphasis is on istream and ostream.
PCG makes no mention of these facilities. This is largely a
religious issue, as I have never been satisfied that the facility
makes the problem of I/O significantly better - to me it seems like
the gains in type-safety and extensibility are at the cost of
readability. Further, I do not regard the whole problem of
stream-oriented I/O as particularly important. Most people understand
how to do this sort of I/O quite well, and, in my possibly limited
view, few products suffer delays to market due to problems in this
area.
Even so, the lack of a real enthusiasm for text I/O in PCG is not a
good thing. I have mainly left it out because I really feel that there
is a much better model than iostream and stdio waiting to be found
(or for somebody to boff me over the head with it.) As an example of
why I think iostream is not the answer, compare text processing using
C++ to text processing using perl or Icon. Most would agree that both
of these languages are a quantum leap above C++ with any set of
classes.
Definitely a more important problem than stream I/O is interfacing
with windowing systems and databases. Many people have done a good
deal of very good work, and have basically boiled the problem of
dealing with windowing systems down to converting data to strings and
calling the appropriate function to display it. PCG, STL and the
rest make some gestures at assisting in converting objects to strings.
Some are better than others, but none are significantly better.
Storing objects in databases is one area where PCG does some fairly
good work. Whereas NIHCL and libg++ both support persistent
stores, PCG goes beyond that by allowing users to create their own
persistent stores, either from scratch or from existing database
technology, and further, it allows the data to be stored in any
format, allowing PCG applications to use and update the data of
other applications. STL is not intended to directly solve the
problem of database access; rather, it looks at databases through
"allocators". Taking this approach does much to abstract the issues of
the database from users, but we have yet to see if it abstracts them a
little too much. One can write a PCG database so that it looks like
a collection, but the suggested approach in PCG is to write the
database to look like a database, and integrate collections over
certain datasets within the database.
Which approach is the better one is likely to be a matter of religion
for some time to come. In any case, the STL lobby could point out
that most of the mappings which PCG discovers at runtime could have
been discovered and generated at compile-time, and that approach would
be better than run-time typing for efficiency reasons. I tend to
agree, except that generating code like that requires (or at least
suggests) a C++ parser, and other rather sophisticated technology. The
run-time mappings that PCG uses are, if not blindingly fast, at
least relatively simple.
Rogue Wave produces an extension to their Tools.h++ called
DBTools.h++ that provides a very advanced set of libraries for
dealing with the largest and most popular commercial databases. For
most commercial applications, this would be a substantial improvement
over the facilities of PCG and the rest. However, PCG's support of
smaller, simpler databases and extensibility will be important
considerations for some people - for instance, one might note that
while simpler databases are usually less capable than their larger
cousins, they are occasionally much faster.
As to the problem of input from a stream, PCG does provide String
and RegExp classes that are much like NIHCL's and
libg++'s.
GOOD CLASS LIBRARIES ARE CONCEPTUALLY CLEAR
Writing readable code is key in software developed by a team. This
means that classes, methods and functions should do what their name
implies, and their name should imply what they do. At the heart of
this is type safety - an object should tell you exactly what it
expects its parameters to be and exactly what its result should be. In
that respect STL surpasses all others, including PCG, in some
degree. PCG compromises on this a bit for run-time typing and
performance, but few users will actually see these sections of the
package.
One point in PCG's favor is its separate treatment of interfaces and
implementations: ListOf means any list - regardless of implementation.
Therefore, you could create, say, a bag which apparently contains 100
constant elements, iterate over this bag, add it to another,
differently-implemented bag, print it, whatever, all with the same
functions that work with other bags. In STL, this is not quite so
easy, and you end up creating classes such as the ConstantIterator
described in Andrew Koenig's article, File Iterators, in the
November-December 1994 issue of Journal of Object Oriented
Programming. This class is apparently an iterator over an array of
constant objects. With PCG, you can directly talk about collections
whose members are computed; in STL you can talk about such objects -
but not with the same clarity as with PCG.
GOOD CLASS LIBRARIES ARE EFFICIENT
As said before, efficiency is very important - it can, occasionally,
be compromised in favor of development cost, but only so much. As PCG
does dynamic typing, its performance is comparable to NIHCL and
libg++, but it does better than them since it knows a bit more
about the target object, and does not rely on the virtual function
table for its knowledge. Consequently, NIHCL and libg++
will, when asked to create a collection of integers, create an array
of pointers to integer-wrappers allocated from the heap.
("Integer-wrappers" meaning subclasses of int with virtual function
tables e.g. Tools.h++'s RWInteger class.) This sort of approach
is, obviously, a huge waste of space, compared to a flat array of
integers, which is what STL and PCG do (essentially).
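
The space argument can be checked with a little arithmetic. The
sketch below assumes a hypothetical wrapper class, BoxedInt, holding
an int behind a virtual function table; it is not any library's
actual integer class, and the boxed figure is a lower bound that
ignores per-allocation heap overhead.

```cpp
#include <cstddef>

// Hypothetical "integer wrapper": an int plus a virtual function table
// pointer, in the spirit of the wrapper classes described above.
struct BoxedInt {
    virtual ~BoxedInt() = default;
    int value = 0;
};

// Bytes needed for n integers stored directly in one contiguous array.
constexpr std::size_t flat_bytes(std::size_t n) {
    return n * sizeof(int);
}

// Lower bound on bytes for n integers stored as an array of pointers to
// heap-allocated wrappers: one pointer slot plus one wrapper object per
// element. Allocator bookkeeping would add more on top of this.
constexpr std::size_t boxed_bytes(std::size_t n) {
    return n * (sizeof(BoxedInt*) + sizeof(BoxedInt));
}
```

The exact ratio depends on the platform's pointer size and alignment,
but boxed_bytes(n) exceeds flat_bytes(n) for any n greater than zero,
usually by a factor of several.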
When it comes to manipulating that array, STL enjoys a slight
advantage over PCG as the compiler is given more information about
the data being manipulated, and hence can do better optimizations.
PCG is also slightly hampered by having to make at least one
virtual function call to get to the routine, and the fact that some
typing decisions have to be made at run-time. (That is, PCG has to
select the appropriate constructor from its list of virtual
constructors at run-time; STL computes this at compile-time.)
However, since we know that the same implementation of a collection
can perform either brilliantly or awfully, depending on what it is
asked to do, STL's monolithic ideas about collection implementation
should be considered a serious performance impediment - possibly
enough to outweigh the advantages described above.
One area of STL that seriously concerns me is that, from my
experience, use of too many templates leads to enormous code size - a
problem that C++ has in spades already. Given that C++ is mainly
intended for large applications - of 100,000 lines or more, with
perhaps 1000 classes, STL's reliance on templates could lead to a
circumstance where developers find that just as they are finally
getting their product near completion, their program's size has
already far outstripped the memory capacity of their target machine.
Even in systems where templates have been used carefully, the program's
size frequently grows so large that the determining factor in its
performance is not the CPU, but the speed of the virtual memory
system.
The designers of STL have more advanced knowledge of the future of
template optimization than I do, so perhaps these problems will not
blight STL's future. PCG, of course, uses templates too, but it
has been written so that every template's methods are defined
inline, and are very brief - almost all are one line and one
function call. Theoretically, an application that uses PCG's
template classes would be no larger than the same application using
PCG's generic interface. Whether this actually is the case or
not depends on the good graces of the compiler. In particular, we know
that inlines are generally not expanded when debugging is turned on,
as it is through the bulk of the software development cycle.
In all, though, when discussing issues of CPU performance, it is hard
to say whether the STL libraries, in their final form, will
outperform the PCG libraries, in their final form, in the real world.
But both are likely to shine over the current crop of class libraries.
FUTURE DIRECTIONS
I have released the library to the public domain in the hopes that
people will find it useful. It has outstripped my capacities to
continue as the sole developer, so I enthusiastically welcome
contributions of code, test cases, improved documentation, improved
packaging, and better ideas on other issues.
The most immediate and important challenges facing PCG are to make
it more reliable and more portable. That done, new collection
implementations will need to be added, as well as new database
interfaces. Whether it should expand in other directions is unknown,
and perhaps an issue of packaging.
RELATED MATERIAL
For more information on PCG, see A User's Guide to PCG in this
distribution. Rogue Wave Software can be contacted at 260 SW Madison,
Corvallis, Oregon, 97333, 1-800-487-3217. Note that Rogue Wave is
planning to release an STL-compliant version of their product in
1995, and that will supersede the 5.1 release discussed here. For
more information on NIHCL, see the book Data Abstraction and
Object-Oriented Programming in C++ by Keith Gorlen, Sandy Orlow, and
Perry Plexico (ISBN 0471 92346 X), 1990, John Wiley and Sons. GNU's
libg++ can be had via anonymous FTP from prep.ai.mit.edu or
through the Free Software Foundation. "Tools.h++" is Copyright (C)
1989-1992 Rogue Wave Software, Inc.
--
Send submissions for comp.os.linux.announce to: linux-announce@tc.cornell.edu
PLEASE remember Keywords: and a short description of the software.