Order        Lint 8.00 Patches        Discussion Forum   
Contact      Site Map       
   Home
   Bug of the Month
   Products
   Order
   Support
   Company
   Links
   Interactive Demo
  Search:
site web
Dr. James F. Gimpel, the founder and President of Gimpel Software and principle architect of PC-lint and FlexeLint, offers some personal insight into his design strategy.

Version 8.00 Designer's Notes

It's been almost four years since we have produced a new version of PC-lint/FlexeLint (Version 7.5 was first released in September 1997). But I believe this new version is well worth the wait.

The primary focus of recent development efforts is in the inter-function value tracking. For example:

    int f(int);
    int g()
        { return f(0); }
    int f( int n )
        { return 10/n; }

By noting and remembering that a call to f(0) was made, we can alert the programmer to the fact that this would result in a divide by zero.

Initially, the job was perceived as accumulating all the information concerning arguments passed to a function and initializing the formal parameters with this composite information. This leads immediately to the notion of multiple passes over the source input.

All this was implemented fairly rapidly and initial results looked very promising but there were two problems. We were now detecting causes of bugs at locations quite remote from where their effects would be felt. Although that's a good thing in that we are finding bugs that are not obvious by inspection, it was a bad thing if we couldn't provide an assignment trail back to the point of origin. This problem was addressed by tagging each property that a variable might have with a sequence of source code locations that contributed to the variable getting that property. Now, when a condition results in a semantic message, we provide a "Reference List" of locations that would help to create that condition.

The reference list raised another issue; how is the programmer to deal with this information. A list of locations, especially if they involve several modules, can provide a tedious exercise for the programmer as he gropes from cause to cause. Happily this process can be automated quite readily using existing machinery by providing a separate message (831) whose sole purpose is to provide a kind of anchor for each location in the reference list. In effect the same mechanism that you might use to bounce from message to message can now be used to bounce from reference to reference.

OK, you say, but why did this take four years?

This had more to do with the second problem alluded to above. Early testing showed that a composite argument list which represented the set of all calls so far detected to a function, was much too crude an abstraction and resulted in too many bogus messages. Why was this so? Some functions, it seems, are designed in such a way that there is an intended correlation between arguments. For example, it may be perfectly OK to pass in a NULL pointer if the first argument is a 1. This led to a redesign in which each uniquely discernable call would be tried separately from the rest in a process we call the Specific Walk. This had several beneficial properties. Not only would correlations between arguments be taken care of, but the return value would be nicely correlated with the arguments which produced it. Since we are dealing with only one specific call at a time, we can record each call location. Specific walks can in turn produce Specific calls, and keeping around the locations of the calls enables us to produce, along with the message and the reference list, a function callback list that further helps in identifying the remote cause of a warning condition.

Since hundreds of calls to the same function could conceivably be explored,  it was necessary to capture each function in the form of a tree so that repeated calls would not need to repeat the syntactic analysis phase of processing the function. This had the side benefit of producing an internal structure for each function which could be more rapidly analyzed for a wide variety of purposes. But converting to tree form was not a walk in the park. There were some "non-linear" forms we had to contend with such as the one-line error suppression and our own lint options such as -unreachable that may be buried in the code. We even came up with a scheme to suppress messages on a block basis (Option -e{...}) which becomes embedded in the tree.

We also discovered that there are subtle differences between Specific Walks and the more general processing. For example, generally, each statement has a purpose and can be assumed to be reached some day and we can make deductions on that basis. However, in Specific Walks, this is no longer the case. For example, if a parameter has a specific value like 101, we have to make sure we don't take paths that are in some way inconsistent with that value or we wind up providing a lot of false hits such as "subscript out of bounds" in perfectly safe code. Here, the statement tree came to the rescue. Since the tree forced us to employ different code in the specific and general case, it was now a simple matter to make subtle alterations in the specific case without affecting good sound code in the general.

Now all this may still not sound like four years worth of development, even if you add in all the many new options and messages that we've added. For the residue I plead the complexity of C++ and templates.


Version 7.50 Designer's Notes

Ever since we've added our much heralded value tracking to our arsenal of program analysis and since we've endowed about 100 built-in functions with special checking of arguments and return value deductions, many of our users have requested a more general facility so that user functions, and entire libraries, conventional and class libraries, may be similarly checked. Version 7.50 responds to these petitions by providing a complete but familiar language (the language of C expressions) to specify constraints for arguments and return values. To provide problem-oriented constraint specifications, all forms of C constant manifests are supported including macros, enumeration's, and const variables. The result is both powerful and flexible (but what else would you expect?). See example f.cpp

We have greatly expanded our pointer checking; noting carefully the origins of pointer values (new, malloc, address of auto , increment, etc.) so that obvious incompatibilities can be spotted and such menaces as the inappropriate deallocation and the dreaded memory leak can be caught early and dealt with appropriately.

One reviewer (Scott Meyers as it turns out) discovered that we were deficient in detecting anomalies heralded by a prominent C++ authority. Red-faced we increased our C++ sensitivity training, and implemented a whole new batch of subtle defect checks based on available C++ literature. This suite of new checks, all by itself, makes the upgrade worthwhile.


Version 7.00 Designer's Notes

Version 7.00 of our lint represents the culmination of a personal goal I've had for most of the 10 years we've been developing our lint products. That goal is to understand the computational sequences sufficiently to comment on semantic peculiarities as well as the traditional syntactic idiosyncracies of C/C++ programs. One of the several litmus tests of such semantic checking was to determine that, for example,

        int i;
        int a[10];
        for ( i = 0; i <= 10; i++)
                a[i] = 0;

results in an out-of-bounds reference.

I'm happy to report that with Version 7.00, virtually all of my early goals, and more, have been achieved with a methodology we call value-tracking. With value tracking, we are able to ascertain through assignments, initialization and, even, predicates, what values variables may reasonably be expected to have. Both auto variables and class data members (for member functions) are subject to analysis. This ultimately aids in diagnosing those banes of programming: the out-of-bounds subscript and the out-of-range or Null pointer. Unlike a compiler or interpreter, things are done on a “possible/conceivable” basis so that all potential lines of flow are checked for irrational behavior.

With value tracking as an enabling technology, we are able to thoroughly check calls to standard library functions. With our traditional flexibility, these checks can be extended to user-defined functions as well.

As if value tracking were not enough, we’ve added several important new checks involving macros and have added some important new features involving strong type checking. We've also added lob support for C++ (at long last).


Home | Contact | Order

PC-lint and FlexeLint are trademarks of Gimpel Software
Copyright © 2006, Gimpel Software, All rights reserved.