[7] in linux-announce channel archive


New C++ class library Available for FTP

daemon@ATHENA.MIT.EDU (Lars Wirzenius)
Sun Dec 18 12:11:47 1994

Date: Sun, 18 Dec 1994 17:48:11 +0200
From: Lars Wirzenius <wirzeniu@cc.helsinki.fi>
To: linux-activists@niksula.hut.fi, linux-announce@vger.rutgers.edu

X-Mn-Key: announce

From: steveb@bga.com (Steve Benz)
Newsgroups: comp.lang.c++,comp.os.linux.announce,alt.sources
Subject: New C++ class library Available for FTP
Organization: Real/Time Communications - Bob Gustwick and Associates
Keywords: C++, class library
Approved: linux-announce@tc.cornell.edu (Lars Wirzenius)
Followup-to: comp.os.linux.misc,comp.lang.c++,alt.sources.d

Announcing the pre-alpha, 0.0 release of the PCG class library for C++.

The PCG class library is a freely distributable C++ class library that
provides type-safe classes as well as a powerful dynamic typing scheme.
As such, it is more comparable to STL than to NIHCL or libg++.
In addition to classes for collections, it provides some very powerful
features for dealing with databases.  A more comprehensive review of the
differences between PCG, NIHCL, libg++, STL and Rogue Wave's Tools.h++
can be found in the Introduction to PCG enclosed at the end of this posting.

As a pre-alpha release, it is intended for folks who want to familiarize
themselves with the package and for people who would like to contribute
to the development of the package.

It is currently known to run only on the platforms it was developed on:
a 486 Linux box and SPARCs, all running g++ 2.5.8.
I hope that a network of alpha testers can be put together to port
and test this package to other machines and compilers, and would be
glad to help anyone attempting to make a port.

The source code and documentation are available via anonymous FTP from
zilker.net in /pub/steveb/pcg-0.0.tgz.  This contains vast amounts of
documentation, all of which is in HTML format - which means you can read
it with any WWW browser (e.g. xmosaic, lynx, and chimera).  If you do not
know what these are, see the file WWW.faq included in this distribution.
It is possible, although kind of annoying, for me to generate ASCII and
PostScript versions.  I also could set up the documentation to be available
over WWW.  If there is sufficient demand, I can do one or both of these.

					- Steve
steveb@realtime.net


                    AN INTRODUCTION TO THE PCG CLASS LIBRARY
                    ----------------------------------------
   
                           BACKGROUND AND MOTIVATION
                                       
   The PCG class library was originally intended to be a much more
   modest package, but, as things do, it grew unexpectedly. It was
   originally intended to be the undercarriage for an application that
   would need to deal with a variety of database technologies ranging
   from the simple to the moderately sophisticated (state-of-the-art
   relational databases, perhaps some OODBMSs). To do that, I felt I
   really had to have a dynamic typing system; consequently, dynamic
   types are at the core of this package.
   
   This package will be of interest to anybody who is looking for a
   powerful class library and of special interest to developers of large
   systems, not just because of the data structures that it provides but
   also because of the way that it approaches scalability - this last
   point will be described in more detail in the discussion on performance and
   templates. Its database classes will interest anyone who is thinking
   about an application that will deal with a variety of databases. Its
   facilities for dealing with trivial databases will be of particular
   interest in some circles as well. The fact that it runs under G++ will
   pique the interest of people developing software with that compiler.
   
   PCG is intended to focus on collections and I/O - it is intended as
   an extension to C++, not a wrapper for the Operating System. In this
   respect, it is quite the same as its chief competition, STL (see
   Michael Vilot's article An Introduction To The Standard Template
   Library in the October 1994 C++ Report).
   
Sizing Up The Competition

   As I was working on PCG, it became obvious that this package would be
   of general interest. I set about enhancing it with three goals in
   mind: it must be portable, cheap, and good. At the time, there was no
   package I could use that would satisfy all three of these points and
   run under Linux (the operating system that I chose to develop under).
   G++lib is portable, but due to the copyleft, infinitely expensive,
   and, for reasons described later, not what I would consider "good".
   NIHCL falls down for reasons of portability and intent. It is aimed
   at encapsulating a lot of things that are not entirely portable and
   which I held no interest in (e.g. it has classes for processes).
   Commercial products, notably Rogue Wave's products, were fairly cheap
   and fairly portable - just not to Linux and G++. Rogue Wave's product
   improves on G++lib and NIHCL by allowing typesafe collections through
   templates, but there are still some points about its design that are
   troublesome.
   
   At the start of this project, I was not aware of STL. I am not sure
   that I would have considered that work as entirely relevant, given
   what I understood about C++ at that time. When STL becomes
   available, it will definitely be portable and good - cheap seems
   likely but one never knows. However, PCG has a few improvements over
   STL, the most noticeable being the database section. But there are many
   differences that one should take into account when choosing from among
   these class libraries. We will discuss the relative merits of PCG,
   G++lib, NIHCL, and Rogue Wave's Tools.h++ 5.1 in the following
   sections. Note that this review, not surprisingly, does not treat
   G++lib, NIHCL, and Tools.h++ 5.1 very well. The addition of
   templates and exceptions has changed the face of C++ development since
   the ideas that these libraries were based on were formed. When talking
   about the qualities of a "good" library, we mean it in the light of
   reason as we now know it.
   
  GOOD CLASS LIBRARIES ARE NOT POINTER-HAPPY
  
   A great many good C programmers have gone through a phase in their
   career where they have decided that the pointer is the one true divine
   instrument, and wrote many a line of code with two, three, or even
   four levels of indirection and thought it good. Most such folk would
   now agree that though the pointer is an important data type, it's also
   a core dump waiting to happen.
   
   There are a number of language design decisions in C++ that have
   steered programmers in that direction. The central one is this: if you
   have declarations like so:
   

        Person p;
        Person *pp;

   Then p is an instance of Person and nothing but a Person, while pp
   is a pointer that can point to an instance of Person or an instance
   of any subclass of Person. There is no way to have a variable that
   contains an instance of anything other than its type. This shortcoming
   is only an occasional problem when taken by itself (one can usually
   get around it with declarations such as Person &p). However, the
   difficulties grow in significance when implementing collections,
   because when one says List<Person>, one usually means that the list
   will contain a number of instances of the class Person or its
   subtypes. Without some serious skulduggery in the implementation,
   that's not what you will get.
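
   The effect can be seen in a minimal sketch. It assumes a present-day
   standard C++ compiler, uses std::vector as a stand-in for the
   List<Person> mentioned above, and invents a Student subclass purely
   for illustration; none of this is PCG code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical classes for illustration only; they are not part of PCG.
class Person {
public:
    virtual ~Person() {}
    virtual std::string role() const { return "person"; }
};

class Student : public Person {
public:
    std::string role() const override { return "student"; }
};

// A container of Person values: copying a Student in keeps only its
// Person part, so the stored element behaves like a plain Person.
inline std::string role_when_stored_by_value() {
    Student s;
    std::vector<Person> people;
    people.push_back(s);  // slices: only the Person subobject is copied
    assert(!people.empty());
    return people.front().role();
}

// A container of pointers keeps the dynamic type, but the caller now has
// to manage lifetimes and an extra level of indirection.
inline std::string role_when_stored_by_pointer() {
    Student s;
    std::vector<Person*> people;
    people.push_back(&s);
    assert(!people.empty());
    return people.front()->role();
}
```

   Stored by value, the element answers as a plain Person; stored by
   pointer, it still answers as a Student.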
   
   Perhaps the easiest and most common solution to these problems is to
   implement collections of things as collections of pointers to things.
   Sometimes this is what you want, but sometimes not. In general, most
   folks would not call this a good solution, declaring "If I'd'a wanted
   a collection of pointers, I'd'a said so."
   
  GOOD CLASS LIBRARIES ARE TYPE-SAFE
  
   Another wrinkle to the incredible perpetuating pointer problem can be
   found by looking at virtual methods - subclasses are not allowed to
   add any type information to a virtual function's signature. Therefore,
   in the world of C++ class libraries we have seen a series of classes,
   such as Rogue Wave's RWCollection and RWCollectable classes,
   where, when one adds a member to a collection, the only type
   specification is RWCollectable * and, worse, the lookup methods return
   RWCollectable * as well - which leads to casts which lead to core
   dumps.
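
   A rough sketch of the pattern follows. The Collectable, Name,
   Account and Bag classes are hypothetical stand-ins written for this
   illustration; they are not Rogue Wave's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for a pointer-based library; not a real API.
class Collectable {
public:
    virtual ~Collectable() {}
};

class Name : public Collectable {
public:
    explicit Name(int n) : length(n) {}
    int length;
};

class Account : public Collectable {
public:
    explicit Account(double b) : balance(b) {}
    double balance;
};

// The collection only knows about Collectable*, so it accepts anything
// and hands back a pointer with no further type information.
class Bag {
public:
    void add(Collectable* item) {
        assert(item != nullptr);
        items_.push_back(item);
    }
    Collectable* at(std::size_t i) const { return items_.at(i); }
private:
    std::vector<Collectable*> items_;
};

// The caller has to guess the real type of each element. dynamic_cast
// reports a wrong guess as a null pointer; an unchecked cast would
// compile just as happily and fail only when the program runs.
inline bool holds_name(const Bag& bag, std::size_t i) {
    return dynamic_cast<Name*>(bag.at(i)) != nullptr;
}
```

   Nothing stops a Name and an Account from sharing one Bag, so every
   lookup has to be followed by a cast of this kind.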
   
   There are at least three ways to deal with this problem. One is to
   take the STL approach, which has no virtual functions like this and
   thus avoids the problem altogether. A second approach is known as
   the "letter and envelope" strategy. That is, if we define a
   class which knows more type information than its parent, the parent's
   method is rendered invisible to the user in place of the more specific
   method of the child. This is the approach that PCG takes. Tools.h++
   takes yet another approach, and that is creating a separate batch of
   template-based collections. This helps, but the fact that
   template-based objects are not interoperable with the
   non-template-based objects is unfortunate.
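
   One way to picture the second approach is a thin typed class whose
   methods hide the untyped methods of its parent. The sketch below is
   only an assumption about how such a scheme could look: GenericList
   is a hypothetical name, and although ListOf is a name used in this
   document, the body shown here is not PCG's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical untyped parent: it stores pointers and knows nothing
// about what they point to.
class GenericList {
public:
    virtual ~GenericList() {}
    void add(void* item) {
        assert(item != nullptr);
        items_.push_back(item);
    }
    void* at(std::size_t i) const { return items_.at(i); }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<void*> items_;
};

// Typed child: declaring add() and at() again hides the parent's
// versions, so users of ListOf<T> only see the T* signatures.
template <class T>
class ListOf : public GenericList {
public:
    void add(T* item) { GenericList::add(item); }
    T* at(std::size_t i) const {
        // Safe as long as elements are only added through ListOf<T>::add;
        // code that goes through a GenericList& can still bypass this.
        return static_cast<T*>(GenericList::at(i));
    }
};
```

   With this arrangement, adding a pointer of the wrong type through
   ListOf<T> is rejected by the compiler, while code written against
   GenericList can still operate on the same object.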
   
  GOOD CLASS LIBRARIES SUPPORT RUN-TIME PARAMETERIZATION
  
   Since libg++, NIHCL and Tools.h++ are all essentially
   pointer-based, they support run-time parameterization easily and
   implicitly. STL, being entirely a child of C++ templates, does not -
   although it could, with some clever extensions. PCG, being a strange
   mix of pointer-based and template-based, does support run-time
   parameterization and manages to intermingle this with type-safety very
   smoothly.
   
  GOOD CLASS LIBRARIES SHOULD ALLOW FOR CHANGE
  
   My experiences with large C++ applications have taught me several
   lessons that I consider of paramount importance when looking at the
   software development cycle. The highest one on the list is that one
   should expect and plan for surprises - and the best way to prepare is
   to design interfaces such that large changes in the design can be
   accommodated without changes to the interface. By doing this, large
   changes to the design can often be achieved without large changes to
   the code.
   
   In anticipation of this, PCG does something that no other class
   library seems to do. It turns classes like List into abstract classes,
   leaving the actual implementation to subclasses.
   
   To understand the importance of this, consider that there are several
   ways to implement a list. First, one could just allocate an array of
   elements, extending it when the size of the list exceeds the size of
   the array. This implementation would work very well for small lists
   where most of the insertions are done at the end, or in lists where
   the speed of addition is unimportant compared to the speed of
   iteration. A variant of this might create a series of blocks, each
   block containing a small array of elements - when one of these arrays
   is filled, it is split into two blocks. This implementation would
   perform fairly well in all weather, but if the blocks happen to be
   large and moving elements is very expensive, it might not work as
   well as a plain old linked list.
   
   The secret here is that when an application is originally conceived,
   it may not be entirely clear which sort of list will offer the best
   performance. Or, perhaps at conception it was clear that a specialized
   implementation would ultimately be needed in order to perform well in
   the real world, but for prototyping purposes, one of the stock
   implementations would work as well.
   
   Under PCG's approach, changing the underlying implementation requires a
   call to a different list's constructor. Under the usual approach, one
   has to change every reference to the object's type. This can be
   particularly unpleasant if there are routines where the implementation
   is of small importance (e.g. print( const List &l )). For functions
   such as these, you would need two implementations, one for regular
   lists, one for the specialized list. Yuk.
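
   As a sketch of the idea, assuming a hypothetical IntList interface
   with two stock implementations (PCG's real List classes are not
   shown in this posting), a routine written against the abstract
   class works unchanged whichever constructor was called:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical abstract interface; implementations live in subclasses.
class IntList {
public:
    virtual ~IntList() {}
    virtual void append(int value) = 0;
    virtual std::size_t size() const = 0;
    virtual int at(std::size_t i) const = 0;
};

// Array-backed implementation: cheap iteration and indexing.
class ArrayIntList : public IntList {
public:
    void append(int value) override { items_.push_back(value); }
    std::size_t size() const override { return items_.size(); }
    int at(std::size_t i) const override {
        assert(i < items_.size());
        return items_[i];
    }
private:
    std::vector<int> items_;
};

// Linked implementation: elements never move, but indexing walks the list.
class LinkedIntList : public IntList {
public:
    void append(int value) override { items_.push_back(value); }
    std::size_t size() const override { return items_.size(); }
    int at(std::size_t i) const override {
        assert(i < items_.size());
        return *std::next(items_.begin(), static_cast<std::ptrdiff_t>(i));
    }
private:
    std::list<int> items_;
};

// Written once against the interface; no second version is needed when
// the implementation behind the reference changes.
inline long sum(const IntList& list) {
    long total = 0;
    for (std::size_t i = 0; i < list.size(); ++i) total += list.at(i);
    return total;
}
```

   Switching a program from ArrayIntList to LinkedIntList then touches
   only the line that constructs the list.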
   
   This problem is not just one of performance. For instance one might
   decide during development that a set should be kept in sorted, rather
   than random order.
   
   Of course, the STL counter-argument would be that the above approach
   means virtual functions, and virtual functions lead to performance
   penalties. This is true, although the cost of a call to a virtual
   function is usually trivial compared to the cost of, say, computing a
   hash function or any of the other operations associated with
   collections. But more importantly, in my experience, small penalties
   in run-time performance are well worth the cost if they reduce time to
   market.
   
  GOOD CLASS LIBRARIES HELP WITH REAL-LIFE I/O PROBLEMS
  
   I have never seen a class library that had a really satisfying
   solution to the problems of I/O.
   
   In STL, superficially the emphasis is on istream and ostream.
   PCG makes no mention of these facilities. This is largely a
   religious issue, as I have never been satisfied that the facility
   makes the problem of I/O significantly better - to me it seems like
   the gains in type-safety and extensibility are at the cost of
   readability. Further, I do not regard the whole problem of
   stream-oriented I/O as particularly important. Most people understand
   how to do this sort of I/O quite well, and, in my possibly limited
   view, few products suffer delays to market due to problems in this
   area.
   
   Even so, the lack of a real enthusiasm for text I/O in PCG is not a
   good thing. I have mainly left it out because I really feel that there
   is a much better model than iostream and stdio waiting to be found
   (or for somebody to boff me over the head with it.) As an example of
   why I think iostream is not the answer, compare text processing using
   C++ to text processing using perl or Icon. Most would agree that both
   of these languages are a quantum leap above C++ with any set of
   classes.
   
   Definitely a more important problem than stream I/O is interfacing
   with windowing systems and databases. Many people have done a good
   deal of very good work, and have basically boiled the problem of
   dealing with windowing systems down to converting data to strings and
   calling the appropriate function to display it. PCG, STL and the
   rest make some gestures at assisting in converting objects to strings.
   Some are better than others, but none are significantly better.
   
   Storing objects in databases is one area where PCG does some fairly
   good work. Whereas NIHCL and libg++ both support persistent
   stores, PCG goes beyond that by allowing users to create their own
   persistent stores, either from scratch or from existing database
   technology, and further, it allows the data to be stored in any
   format, allowing PCG applications to use and update the data of
   other applications. STL is not intended to directly solve the
   problem of database access; rather, it looks at databases through
   "allocators". Taking this approach does much to abstract the issues of
   the database from users, but we have yet to see if it abstracts them a
   little too much. One can write a PCG database so that it looks like
   a collection, but the suggested approach in PCG is to write the
   database to look like a database, and integrate collections over
   certain datasets within the database.
   
   Which approach is the better one is likely to be a matter of religion
   for some time to come. In any case, the STL lobby could point out
   that most of the mappings which PCG discovers at runtime could have
   been discovered and generated at compile-time, and that approach would
   be better than run-time typing for efficiency reasons. I tend to
   agree, except that generating code like that requires (or at least
   suggests) a C++ parser, and other rather sophisticated technology. The
   run-time mappings that PCG uses are, if not blindingly fast, at
   least relatively simple.
   
   Rogue Wave produces an extension to their Tools.h++ called
   DBTools.h++ that provides a very advanced set of libraries for
   dealing with the largest and most popular commercial databases. For
   most commercial applications, this would be a substantial improvement
   over the facilities of PCG and the rest. However, PCG's support of
   smaller, simpler databases and extensibility will be important
   considerations for some people - for instance, one might note that
   while simpler databases are usually less capable than their larger
   cousins, they are occasionally much faster.
   
   As to the problem of input from a stream, PCG does provide String
   and RegExp classes that are much like NIHCL's and libg++'s.
   
  GOOD CLASS LIBRARIES ARE CONCEPTUALLY CLEAR
  
   Writing readable code is key in software developed by a team. This
   means that classes, methods and functions should do what their name
   implies, and their name should imply what they do. At the heart of
   this is type safety - an object should tell you exactly what it
   expects its parameters to be and exactly what its result should be. In
   that respect STL surpasses all others, including PCG, in some
   degree. PCG compromises on this a bit for run-time typing and
   performance, but few users will actually see these sections of the
   package.
   
   One point in PCG's favor is its separate treatment of interfaces and
   implementations, ListOf means any list - regardless of implementation.
   Therefore, you could create, say, a bag which apparently contains 100
   constant elements, iterate over this bag, add it to another,
   differently-implemented bag, print it, whatever, all with the same
   functions that work with other bags. In STL, this is not quite so
   easy, and you end up creating classes such as the ConstantIterator
   described in Andrew Koenig's article, File Iterators, in the
   November-December 1994 issue of Journal of Object Oriented
   Programming. This class is apparently an iterator over an array of
   constant objects. With PCG, you can directly talk about collections
   whose members are computed; in STL you can talk about such objects -
   but not with the same clarity as with PCG.
   
  GOOD CLASS LIBRARIES ARE EFFICIENT
  
   As said before, efficiency is very important - it can, occasionally,
   be compromised in favor of development cost, but only so much. As PCG
   does dynamic typing, its performance is comparable to NIHCL and
   libg++, but it does better than them since it knows a bit more
   about the target object, and does not rely on the virtual function
   table for its knowledge. By contrast, NIHCL and libg++
   will, when asked to create a collection of integers, create an array
   of pointers to integer-wrappers allocated from the heap.
   ("Integer-wrappers" meaning subclasses of int with virtual function
   tables e.g. Tools.h++'s RWInteger class.) This sort of approach
   is, obviously, a huge waste of space, compared to a flat array of
   integers, which is what STL and PCG do (essentially).
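
   The space argument can be made concrete with a little arithmetic.
   The sketch below assumes a hypothetical BoxedInt wrapper (not any
   library's actual class) and ignores allocator bookkeeping, which
   would only widen the gap.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical integer wrapper with a virtual function table.
class BoxedInt {
public:
    explicit BoxedInt(int v) : value(v) {}
    virtual ~BoxedInt() {}
    int value;
};

// Bytes needed for n integers stored in a flat array.
constexpr std::size_t flat_bytes(std::size_t n) {
    return n * sizeof(int);
}

// Bytes needed for n pointers plus n separately allocated wrappers,
// not counting any per-allocation overhead added by the heap.
constexpr std::size_t boxed_bytes(std::size_t n) {
    return n * (sizeof(BoxedInt*) + sizeof(BoxedInt));
}

static_assert(boxed_bytes(1) > flat_bytes(1),
              "the boxed layout always costs more than the flat one");

// Reading an element of the boxed layout also takes an extra hop
// through the pointer stored in the array.
inline int boxed_get(BoxedInt* const* slots, std::size_t n, std::size_t i) {
    assert(i < n);
    return slots[i]->value;
}
```

   The exact ratio depends on the platform's pointer and int sizes, so
   no particular figure is claimed here.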
   
   When it comes to manipulating that array, STL enjoys a slight
   advantage over PCG as the compiler is given more information about
   the data being manipulated, and hence can do better optimizations.
   PCG is also slightly hampered by having to make at least one
   virtual function call to get to the routine, and the fact that some
   typing decisions have to be made at run-time. (That is, PCG has to
   select the appropriate constructor from its list of virtual
   constructors at run-time; STL computes this at compile-time.)
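
   For readers unfamiliar with the term, a "virtual constructor" can be
   pictured as a table of factory functions consulted at run-time. The
   sketch below is a generic illustration of that technique under
   assumed names (Value, IntValue, TextValue, create); it does not
   describe how PCG actually stores or selects its constructors.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical dynamically typed value hierarchy.
class Value {
public:
    virtual ~Value() {}
    virtual std::string type_name() const = 0;
};

class IntValue : public Value {
public:
    std::string type_name() const override { return "int"; }
};

class TextValue : public Value {
public:
    std::string type_name() const override { return "text"; }
};

typedef std::unique_ptr<Value> (*Factory)();

inline std::unique_ptr<Value> make_int() {
    return std::unique_ptr<Value>(new IntValue);
}
inline std::unique_ptr<Value> make_text() {
    return std::unique_ptr<Value>(new TextValue);
}

// The table of "virtual constructors", keyed by a run-time type name.
inline const std::map<std::string, Factory>& factories() {
    static const std::map<std::string, Factory> table = {
        {"int", &make_int},
        {"text", &make_text},
    };
    return table;
}

// Looks the constructor up at run-time; returns an empty pointer when
// the type name is not registered.
inline std::unique_ptr<Value> create(const std::string& type_name) {
    std::map<std::string, Factory>::const_iterator it =
        factories().find(type_name);
    if (it == factories().end()) return std::unique_ptr<Value>();
    assert(it->second != nullptr);
    return it->second();
}
```

   The map lookup and the indirect call are the kind of run-time cost
   that a purely template-based design resolves at compile-time.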
   
   However, since we know that the same implementation of a given sort
   of collection can perform either brilliantly or awfully, depending
   on what it is asked to do, STL's monolithic ideas about collection
   implementation should be considered a serious performance
   impediment - possibly enough to outweigh the advantages described
   above.
   
   One area of STL that seriously concerns me is that, from my
   experience, use of too many templates leads to enormous code size - a
   problem that C++ has in spades already. Given that C++ is mainly
   intended for large applications - of 100,000 lines or more, with
   perhaps 1000 classes, STL's reliance on templates could lead to a
   circumstance where developers find that just as they are finally
   getting their product near completion, their program's size has
   already far outstripped the memory capacity of their target machine.
   Even in systems where templates have been used carefully, the program's
   size frequently grows so large that the determining factor in its
   performance is not the CPU, but the speed of the virtual memory
   system.
   
   The designers of STL have more advanced knowledge of the future of
   template optimization than I do, so perhaps these problems will not
   blight STL's future. PCG, of course, uses templates too, but it
   has been written so that every template's methods are defined
   inline, and are very brief - almost all are one line and one
   function call. Theoretically, an application that uses PCG's
   template classes would be no larger than the same application using
   PCG's generic interface. Whether this actually is the case or
   not depends on the good graces of the compiler. In particular, we know
   that inlines are generally not expanded when debugging is turned on,
   as it is through the bulk of the software development cycle.
   
   In all, though, when discussing issues of CPU performance, it is hard
   to say whether the STL libraries, in their final form, will
   outperform the PCG libraries, in their final form, in the real world.
   But both are likely to shine over the current crop of class libraries.
   
   
                               FUTURE DIRECTIONS
                                       
   I have released the library to the public domain in the hopes that
   people will find it useful. It has outstripped my capacities to
   continue as the sole developer, so I enthusiastically welcome
   contributions of code, test cases, improved documentation, improved
   packaging, and better ideas on other issues.
   
   The most immediate and important challenges facing PCG are to make
   it more reliable and more portable. That done, new collection
   implementations will need to be added, as well as new database
   interfaces. Whether it should expand in other directions is unknown,
   and perhaps an issue of packaging.
   
                               RELATED MATERIAL
                                       
   For more information on PCG, see A User's Guide to PCG in this
   distribution. Rogue Wave Software can be contacted at 260 SW Madison,
   Corvallis, Oregon, 97333, 1-800-487-3217. Note that Rogue Wave is
   planning to release an STL-compliant version of their product in
   1995, and that will supersede the 5.1 release discussed here. For
   more information on NIHCL, see the book Data Abstraction and
   Object-Oriented Programming in C++ by Keith Gorlen, Sandy Orlow, and
   Perry Plexico (ISBN 0471 92346 X), 1990, John Wiley and Sons. Gnu's
   libg++ can be had via anonymous FTP from prep.ai.mit.edu or
   through the Free Software Foundation.  "Tools.h++" is Copyright (C)
   1989-1992 Rogue Wave Software, Inc.

--
Send submissions for comp.os.linux.announce to: linux-announce@tc.cornell.edu
PLEASE remember Keywords: and a short description of the software.

