Dr. James F. Gimpel, the founder and President of Gimpel Software
and principle architect of PC-lint and FlexeLint, offers some personal insight
into his design strategy.
Version 8.00 Designer's Notes
It's been almost four years since we have produced a new version
of PC-lint/FlexeLint (Version 7.5 was first released in September 1997).
But I believe this new version is well worth the wait.
The primary focus of recent development efforts is in the inter-function
value tracking. For example:
int f(int);
int g()
{ return f(0); }
int f( int n )
{ return 10/n; }
By noting and remembering that a call to f(0) was made, we can alert
the programmer to the fact that this would result in a divide by zero.
Initially, the job was perceived as accumulating all the information
concerning arguments passed to a function and initializing the
formal parameters with this composite information. This leads
immediately to the notion of multiple passes over the source input.
All this was implemented fairly rapidly and initial results looked
very promising but there were two problems. We were now
detecting causes of bugs at locations quite remote from where their
effects would be felt. Although that's a good thing in that
we are finding bugs that are not obvious by inspection, it
was a bad thing if we couldn't provide an assignment trail back
to the point of origin. This problem was addressed by tagging
each property that a variable might have with a sequence of
source code locations that contributed to the variable getting
that property. Now, when a condition results
in a semantic message, we provide a "Reference List" of locations
that would help to create that condition.
The reference list raised another issue; how is the programmer
to deal with this information. A list of locations, especially if
they involve several modules, can provide a tedious exercise for
the programmer as he gropes from cause to cause. Happily this
process can be automated quite readily using existing machinery
by providing a separate message (831) whose sole purpose is to provide
a kind of anchor for each location in the reference list.
In effect the same mechanism that you might use to bounce from
message to message can now be used to bounce from reference to
reference.
OK, you say, but why did this take four years?
This had more to do with the second problem alluded to above.
Early testing showed that a composite argument list which represented
the set of all calls so far detected to a function, was much too crude
an abstraction and resulted in too many bogus messages.
Why was this so? Some functions, it seems, are designed in such
a way that there is an intended correlation between arguments.
For example, it may be perfectly OK to pass in a NULL pointer if the
first argument is a 1. This led to a redesign in which
each uniquely discernable call would be tried separately from the
rest in a process we call the Specific Walk.
This had several beneficial properties. Not only would
correlations between arguments be taken care of, but the return value
would be nicely correlated with the arguments which produced it.
Since we are dealing with only one specific call at a time, we can
record each call location.
Specific walks can in turn produce Specific calls, and keeping
around the locations of the calls
enables us to produce, along with the message and the reference list,
a function callback list that further helps in identifying the
remote cause of a warning condition.
Since hundreds of calls to the same function could conceivably
be explored, it was necessary to capture each function in the form
of a tree so that repeated calls would not need to repeat the syntactic
analysis phase of processing the function. This had
the side benefit of producing an internal structure for each function
which could be more rapidly analyzed for a wide variety of purposes.
But converting to tree form was not a walk in the park. There were
some "non-linear" forms we had to contend with such as the one-line
error suppression and our own lint options such as -unreachable that
may be buried in the code. We even came up with a scheme to suppress messages on a block
basis (Option -e{...}) which becomes
embedded in the tree.
We also discovered that there are subtle differences between
Specific Walks and the more general processing. For example,
generally, each statement has a purpose and can be assumed to be
reached some day and we can make deductions on that basis.
However, in Specific Walks, this is no longer the case. For example,
if a parameter has a specific value like 101, we have to make sure
we don't take paths that are in some way inconsistent with that value
or we wind up providing a lot of false hits such as "subscript out
of bounds" in perfectly safe code.
Here, the statement tree came to the rescue.
Since the tree forced us to employ different code in the specific and
general case, it was now a simple matter to make subtle alterations
in the specific case without affecting good sound code in the general.
Now all this may still not sound like four years worth of development, even if you
add in all the many new options and messages that
we've added.
For the residue I plead the complexity of C++ and templates.
Version 7.50 Designer's Notes
Ever since we've added our much heralded value tracking to our
arsenal of program analysis and since we've endowed about 100
built-in functions with special checking of arguments and
return value deductions, many of our users have requested
a more general facility so that user functions, and entire
libraries, conventional and class libraries, may be similarly
checked. Version 7.50 responds to these petitions
by providing a complete but familiar language (the language of
C expressions) to specify constraints for arguments and return
values. To provide problem-oriented constraint specifications,
all forms of C constant manifests are supported including macros,
enumeration's, and const variables. The result is both powerful
and flexible (but what else would you expect?). See example
f.cpp
We have greatly expanded our pointer checking; noting carefully the
origins of pointer values (new, malloc, address
of auto , increment, etc.) so
that obvious incompatibilities can be spotted and such
menaces as the inappropriate deallocation and the dreaded memory leak
can be caught early and dealt with appropriately.
One reviewer (Scott Meyers as it turns out) discovered
that we were deficient in detecting anomalies heralded by a
prominent C++ authority. Red-faced
we increased our C++ sensitivity training, and implemented a whole
new batch of subtle defect checks based on available C++ literature.
This suite
of new checks, all by itself, makes the upgrade worthwhile.
Version 7.00 Designer's Notes
Version 7.00 of our lint represents the culmination of a personal
goal I've had for most of the 10 years we've been developing our lint products.
That goal is to understand the computational sequences sufficiently to
comment on semantic peculiarities as well as the traditional syntactic
idiosyncracies of C/C++ programs. One of the several litmus tests of such
semantic checking was to determine that, for example,
int i;
int a[10];
for ( i = 0; i <= 10; i++)
a[i] = 0;
results in an out-of-bounds reference.
I'm happy to report that with Version 7.00, virtually all of my early
goals, and more, have been achieved with a methodology we call value-tracking.
With value tracking, we are able to ascertain through assignments, initialization
and, even, predicates, what values variables may reasonably be expected
to have. Both auto variables and class data members (for member functions)
are subject to analysis. This ultimately aids in diagnosing those banes
of programming: the out-of-bounds subscript and the out-of-range or Null
pointer. Unlike a compiler or interpreter, things are done on a “possible/conceivable”
basis so that all potential lines of flow are checked for irrational behavior.
With value tracking as an enabling technology, we are able to thoroughly
check calls to standard library functions. With our traditional flexibility,
these checks can be extended to user-defined functions as well.
As if value tracking were not enough, we’ve added several important
new checks involving macros and have added some important new features
involving strong type checking. We've also added lob support for C++ (at
long last).
|
|
Home | Contact | Order
PC-lint and FlexeLint are trademarks of Gimpel Software
Copyright © 2006, Gimpel Software, All rights reserved.
|
|