Dr. James F. Gimpel, the founder and President of Gimpel Software
and principle architect of PC-lint and FlexeLint, offers some personal insight
into his design strategy.
Version 9.00 Designer's Notes (September 2008)
Version 9.00 represents the first new version in seven years and I believe
the wealth of new features have made this version worth the wait.
Static Variables
We now track static variables. This includes all variables of static
storage duration that, of course, includes global variables as well as
those that are nominally static. Just one of the difficulties of
tracking statics is that you then have to know which functions, if called,
will modify the static and that information requires a complete
extended call graph (the call graph needs to be extended to handle
potential overrides of virtual calls).
To combine static variables with interfunction value tracking, it
was necessary to combine values of static variables with argument
values on each function that was to be walked. But this raised the
specter of an entire swarm of static variable values accompanying
each Specific Call. Specific Call storage was itself the tall
pole in the storage tent and we couldn't survive raising it arbitrarily
higher. The problem was resolved by allowing the user to adjust
the size of the pool of static variables that each function was
required to pass on to its callees. We call this
the static depth. A static depth of n means a function will pass on
to its callees the values of all static variables it uses
(reads or writes) directly, plus all those used directly by functions
reachable in n-1 calls.
This allows a programmer to opt for either faster analysis or a more thorough
analysis. Circumstances vary; program sizes can differ by orders
of magnitude; time requirements vary from seconds to overnight.
The static_depth option serves to meet these varying needs.
Multi-threading
For years we had been receiving requests to assist
the troubled programmer who is engaged with multi-threaded programs.
In particular, we were asked to ensure that thread
locks were balanced properly with thread unlocks.
Here we leaned on our semantics option facility (-sem) and
created the new semantics thread_lock and thread_unlock.
Our next multi-threading concern was shared variables.
With call graphs as an
enabling technology and a knowledge of critical sections together
with a knowledge of which functions served as thread heads
(more semantics), we
were able to determine unsound accesses to shared variables.
One of the difficulties was to form an error message that would adequately
describe a situation involving one variable, two functions,
two threads and two types of access.
Another bane of developers was in checking for thread unsafe
functions and this led to the identification and detection of five distinct
categories of thread unsafety with sufficient options
to specify them adequately.
Pre-compiled Headers
Pre-compiled headers are an important addition to our feature
suite in Version 9.00. Pre-compiled headers (PCH) have long been
a feature compilers have used to accelerate the speed of compilation.
But these are especially tricky for programs such as lint that need
to process multiple modules. For a compiler it means starting off
at a particular state if a PCH is included. For lint the
situation is different. The first module may not have included
the PCH at all and when the PCH is finally included it will be
necessary to blend harmoniously the PCH information with prior
info. In fact, we initially implemented a notion called a bypass
header. An ordinary header could be designated as bypass and, once
included, would thereafter be by-passed in the normal inclusion
process. Why, we reasoned, if a header would furnish the
declaration of a class or function or data object and which declared item
would of necessity be placed in a dormant state between passes, could
we not emulate a #include of that same file by the simple
process of using a magic wand to bring these dormant items back
to a normal active state? The answer is we could and did but not
before encountering the usual sequence of obstacles
(such as statics), that made the job not quite as simple as it
had first appeared.
Once finished with by-pass headers, PCH was not too far behind and
leveraging off our LOB experience we were able to relatively quickly
implement our first PCH header.
Dimensional Analysis
One of the things that bugged me about our strong types was that
they did not support the classical dimensional analysis that
engineers and physicists have traditionally employed. But spurred on
by customer requests, we bit the bullet and can now offer a version
that supports this feature. Consider the option:
-strong( AcJX, Met, Sec, Velocity = Met / Sec )
What's new here is that Velocity is not only defined as a new
strong type but that it is hierarchically connected to another
new strong type that has the name "Met/Sec".
Since the rules of C and C++ preclude using a '/' in the middle
of a type name you won't be able to use that name in your programs;
rather the name will be deduced by analyzing the expression.
Consider:
Met m; Sec s; Velocity v;
...
v = m * s / s * s; // mistake
which is taken by the compiler to be
v = ((m * s) / s) * s;
The type on the right will then be deduced to be "(Met*Sec)"
and that is the name used in the resulting warning message.
As you can appreciate, a canonical form for these types
needs to be deduced so that type comparisons can be made.
The canonical form consists of sorting the component types and
removing common factors. A special string processing engine was
constructed to make slicing and dicing the strings simple and
bug free.
MISRA
Version 9.00 continues our support for the MISRA (Motor Industry
Standards Assn.) Reliability Guidelines. The support comes
in the form of author files au-misra1.lnt, au-misra2.lnt, and
the new (2008) au-misra-cpp.lnt. We attempt to support other
guidelines through author files (such as au-sm....lnt for
Scott Myers) and there are others on the horizon which we
expect to support in future releases. For the most part,
author files contain +e options for messages that would have
appeared anyway, but they also append messages alluding to sections
of a book to further clarify the nature of the warning.
This can be invaluable, especially for the introductory programmer
who may well need the additional information. Many times
we will activate an Elective Note (using +e...)
since the message would seem to be of interest only to a
minority of programmers.
+esym
In the case of MISRA
we have Elective Notes (960, 961, 1960, 1963) that contain a whole
barrage of messages since they would appear to be useful only in
the adherence to a particular guideline. Nevertheless users
found that they may want to enable one or two of these messages
without enabling the rest. This was formerly possible only by
individually suppressing, using -esym,
all but the one or two they were interested in.
Needless to say, this was not a very pleasant solution.
The solution was to implement a true +esym which would enable
a parameterized message only for the given parameter.
This required introducing a voting system in which, though the message
was generally suppressed, the +esym's vote would tip the balance
in favor of issuing the message.
Stack Usage
The MISRA guidelines deprecate recursion and
this prompted our efforts to investigate stack usage.
Again, leveraging off the call graph technology, we were able
to report not only those functions that are recursive but
were able to estimate the stack usage of all those functions
that were not. See +stack.
Side-effects
The MISRA guidelines suggested a report on expressions
that have side-effects on the right side of && and ||.
The determination of side-effects turns out to be non-trivial
since one has to follow a chain of all calls and this can
be done only after all the related functions have been seen
which, in our case, becomes pass 2.
Determining the side-effects property of expressions is also useful
when examining expressions which appear in contexts in which
it is only their side-effects that are of interest. This occurs
in the first and third subexpressions of the for clause and, of
course, in the expression statement.
Output
The Gimpel Software On-Line Demo (created by Reg. Charney)
happened to include a color-coded HTML form of the output.
This was so neat that we had to implement it for our users.
This effect can now be achieved by using env-html.lnt.
The file form is probably more beneficial than simply throwing a switch
that will blindly crank out the HTML since env-html.lnt and
the java script file that it includes contain a number of
parameters that can be tweaked to create the precise effect needed.
Pre-Determined Predicates
We haven't yet spoken about the 146 new messages and all the
new diagnostic information that these imply.
You can read the What's New section to find out about them since
in most cases the messages are self-explainatory and from
an implementation standpoint not particularly memorable.
But the Pre-Determined Predicates messages (587, 588, 589)
do stand out. One of our users noted that we should have
informed him that the expression:
(x & 0x10) == 0x11
could never be true. Shamefacedly I realized he was right and that
led me on an examination of all those expressions that could be
manipulated into the form of ((variable op constant) compare constant).
Through the magic of combining variables, flipping arguments and
constant folding, this became quite a few.
Taking into account signed and unsigned expressions and all of the
many C operators, a large table was constructed and conditions
established for when the predicate could be pre-determined.
This large table was then analyzed by a SNOBOL4 program which
produced an extensive amount of code to determine if the conclusions
stated in the text were actually true with real live numbers.
The entire process took longer than expected but the resulting code
is quite solid.
That last statement is hopefully true of Version 9.00 itself.
Version 8.00 Designer's Notes (July 2001)
It's been almost four years since we have produced a new version
of PC-lint/FlexeLint (Version 7.5 was first released in September 1997).
But I believe this new version is well worth the wait.
The primary focus of recent development efforts is in the inter-function
value tracking. For example:
int f(int);
int g()
{ return f(0); }
int f( int n )
{ return 10/n; }
By noting and remembering that a call to f(0) was made, we can alert
the programmer to the fact that this would result in a divide by zero.
Initially, the job was perceived as accumulating all the information
concerning arguments passed to a function and initializing the
formal parameters with this composite information. This leads
immediately to the notion of multiple passes over the source input.
All this was implemented fairly rapidly and initial results looked
very promising but there were two problems. We were now
detecting causes of bugs at locations quite remote from where their
effects would be felt. Although that's a good thing in that
we are finding bugs that are not obvious by inspection, it
was a bad thing if we couldn't provide an assignment trail back
to the point of origin. This problem was addressed by tagging
each property that a variable might have with a sequence of
source code locations that contributed to the variable getting
that property. Now, when a condition results
in a semantic message, we provide a "Reference List" of locations
that would help to create that condition.
The reference list raised another issue; how is the programmer
to deal with this information. A list of locations, especially if
they involve several modules, can provide a tedious exercise for
the programmer as he gropes from cause to cause. Happily this
process can be automated quite readily using existing machinery
by providing a separate message (831) whose sole purpose is to provide
a kind of anchor for each location in the reference list.
In effect the same mechanism that you might use to bounce from
message to message can now be used to bounce from reference to
reference.
OK, you say, but why did this take four years?
This had more to do with the second problem alluded to above.
Early testing showed that a composite argument list which represented
the set of all calls so far detected to a function, was much too crude
an abstraction and resulted in too many bogus messages.
Why was this so? Some functions, it seems, are designed in such
a way that there is an intended correlation between arguments.
For example, it may be perfectly OK to pass in a NULL pointer if the
first argument is a 1. This led to a redesign in which
each uniquely discernable call would be tried separately from the
rest in a process we call the Specific Walk.
This had several beneficial properties. Not only would
correlations between arguments be taken care of, but the return value
would be nicely correlated with the arguments which produced it.
Since we are dealing with only one specific call at a time, we can
record each call location.
Specific walks can in turn produce Specific calls, and keeping
around the locations of the calls
enables us to produce, along with the message and the reference list,
a function callback list that further helps in identifying the
remote cause of a warning condition.
Since hundreds of calls to the same function could conceivably
be explored, it was necessary to capture each function in the form
of a tree so that repeated calls would not need to repeat the syntactic
analysis phase of processing the function. This had
the side benefit of producing an internal structure for each function
which could be more rapidly analyzed for a wide variety of purposes.
But converting to tree form was not a walk in the park. There were
some "non-linear" forms we had to contend with such as the one-line
error suppression and our own lint options such as -unreachable that
may be buried in the code. We even came up with a scheme to suppress messages on a block
basis (Option -e{...}) which becomes
embedded in the tree.
We also discovered that there are subtle differences between
Specific Walks and the more general processing. For example,
generally, each statement has a purpose and can be assumed to be
reached some day and we can make deductions on that basis.
However, in Specific Walks, this is no longer the case. For example,
if a parameter has a specific value like 101, we have to make sure
we don't take paths that are in some way inconsistent with that value
or we wind up providing a lot of false hits such as "subscript out
of bounds" in perfectly safe code.
Here, the statement tree came to the rescue.
Since the tree forced us to employ different code in the specific and
general case, it was now a simple matter to make subtle alterations
in the specific case without affecting good sound code in the general.
Now all this may still not sound like four years worth of development, even if you
add in all the many new options and messages that
we've added.
For the residue I plead the complexity of C++ and templates.
Version 7.50 Designer's Notes (August 1997)
Ever since we've added our much heralded value tracking to our
arsenal of program analysis and since we've endowed about 100
built-in functions with special checking of arguments and
return value deductions, many of our users have requested
a more general facility so that user functions, and entire
libraries, conventional and class libraries, may be similarly
checked. Version 7.50 responds to these petitions
by providing a complete but familiar language (the language of
C expressions) to specify constraints for arguments and return
values. To provide problem-oriented constraint specifications,
all forms of C constant manifests are supported including macros,
enumeration's, and const variables. The result is both powerful
and flexible (but what else would you expect?). See example
f.cpp
We have greatly expanded our pointer checking; noting carefully the
origins of pointer values (new, malloc, address
of auto , increment, etc.) so
that obvious incompatibilities can be spotted and such
menaces as the inappropriate deallocation and the dreaded memory leak
can be caught early and dealt with appropriately.
One reviewer (Scott Meyers as it turns out) discovered
that we were deficient in detecting anomalies heralded by a
prominent C++ authority. Red-faced
we increased our C++ sensitivity training, and implemented a whole
new batch of subtle defect checks based on available C++ literature.
This suite of new checks, all by itself, makes the upgrade worthwhile.
Version 7.00 Designer's Notes (November 1995)
Version 7.00 of our lint represents the culmination of a personal
goal I've had for most of the 10 years we've been developing our lint products.
That goal is to understand the computational sequences sufficiently to
comment on semantic peculiarities as well as the traditional syntactic
idiosyncracies of C/C++ programs. One of the several litmus tests of such
semantic checking was to determine that, for example,
int i;
int a[10];
for ( i = 0; i <= 10; i++)
a[i] = 0;
results in an out-of-bounds reference.
I'm happy to report that with Version 7.00, virtually all of my early
goals, and more, have been achieved with a methodology we call value-tracking.
With value tracking, we are able to ascertain through assignments, initialization
and, even, predicates, what values variables may reasonably be expected
to have. Both auto variables and class data members (for member functions)
are subject to analysis. This ultimately aids in diagnosing those banes
of programming: the out-of-bounds subscript and the out-of-range or Null
pointer. Unlike a compiler or interpreter, things are done on a “possible/conceivable”
basis so that all potential lines of flow are checked for irrational behavior.
With value tracking as an enabling technology, we are able to thoroughly
check calls to standard library functions. With our traditional flexibility,
these checks can be extended to user-defined functions as well.
As if value tracking were not enough, we’ve added several important
new checks involving macros and have added some important new features
involving strong type checking. We've also added lob support for C++ (at
long last).
|
|
|
Home | Contact
| Order
PC-lint and FlexeLint are trademarks of Gimpel Software
Copyright © 2011, Gimpel Software, All rights reserved.
|
|