Gimpel Software
  Order        Patches        Discussion Forum        Blog 
Contact      Site Map       
   Home
   Bug of the Month
   Products
   Order
   Support
   Company
   Links
   Interactive Demo
Dr. James F. Gimpel, the founder and President of Gimpel Software LLC and principle architect of PC-lint and FlexeLint, offers some personal insight into his design strategy.

Version 9.00 Designer's Notes (September 2008)

Version 9.00 represents the first new version in seven years and I believe the wealth of new features have made this version worth the wait.

Static Variables

We now track static variables. This includes all variables of static storage duration that, of course, includes global variables as well as those that are nominally static. Just one of the difficulties of tracking statics is that you then have to know which functions, if called, will modify the static and that information requires a complete extended call graph (the call graph needs to be extended to handle potential overrides of virtual calls).

To combine static variables with interfunction value tracking, it was necessary to combine values of static variables with argument values on each function that was to be walked. But this raised the specter of an entire swarm of static variable values accompanying each Specific Call. Specific Call storage was itself the tall pole in the storage tent and we couldn't survive raising it arbitrarily higher. The problem was resolved by allowing the user to adjust the size of the pool of static variables that each function was required to pass on to its callees. We call this the static depth. A static depth of n means a function will pass on to its callees the values of all static variables it uses (reads or writes) directly, plus all those used directly by functions reachable in n-1 calls.

This allows a programmer to opt for either faster analysis or a more thorough analysis. Circumstances vary; program sizes can differ by orders of magnitude; time requirements vary from seconds to overnight. The static_depth option serves to meet these varying needs.

Multi-threading

For years we had been receiving requests to assist the troubled programmer who is engaged with multi-threaded programs. In particular, we were asked to ensure that thread locks were balanced properly with thread unlocks. Here we leaned on our semantics option facility (-sem) and created the new semantics thread_lock and thread_unlock.

Our next multi-threading concern was shared variables. With call graphs as an enabling technology and a knowledge of critical sections together with a knowledge of which functions served as thread heads (more semantics), we were able to determine unsound accesses to shared variables. One of the difficulties was to form an error message that would adequately describe a situation involving one variable, two functions, two threads and two types of access.

Another bane of developers was in checking for thread unsafe functions and this led to the identification and detection of five distinct categories of thread unsafety with sufficient options to specify them adequately.

Pre-compiled Headers

Pre-compiled headers are an important addition to our feature suite in Version 9.00. Pre-compiled headers (PCH) have long been a feature compilers have used to accelerate the speed of compilation. But these are especially tricky for programs such as lint that need to process multiple modules. For a compiler it means starting off at a particular state if a PCH is included. For lint the situation is different. The first module may not have included the PCH at all and when the PCH is finally included it will be necessary to blend harmoniously the PCH information with prior info. In fact, we initially implemented a notion called a bypass header. An ordinary header could be designated as bypass and, once included, would thereafter be by-passed in the normal inclusion process. Why, we reasoned, if a header would furnish the declaration of a class or function or data object and which declared item would of necessity be placed in a dormant state between passes, could we not emulate a #include of that same file by the simple process of using a magic wand to bring these dormant items back to a normal active state? The answer is we could and did but not before encountering the usual sequence of obstacles (such as statics), that made the job not quite as simple as it had first appeared.

Once finished with by-pass headers, PCH was not too far behind and leveraging off our LOB experience we were able to relatively quickly implement our first PCH header.

Dimensional Analysis

One of the things that bugged me about our strong types was that they did not support the classical dimensional analysis that engineers and physicists have traditionally employed. But spurred on by customer requests, we bit the bullet and can now offer a version that supports this feature. Consider the option:

    -strong( AcJX, Met, Sec, Velocity = Met / Sec )

What's new here is that Velocity is not only defined as a new strong type but that it is hierarchically connected to another new strong type that has the name "Met/Sec". Since the rules of C and C++ preclude using a '/' in the middle of a type name you won't be able to use that name in your programs; rather the name will be deduced by analyzing the expression. Consider:

    Met m; Sec s; Velocity v;
    ...
        v = m * s / s * s;      // mistake
which is taken by the compiler to be
        v = ((m * s) / s) * s;

The type on the right will then be deduced to be "(Met*Sec)" and that is the name used in the resulting warning message.

As you can appreciate, a canonical form for these types needs to be deduced so that type comparisons can be made. The canonical form consists of sorting the component types and removing common factors. A special string processing engine was constructed to make slicing and dicing the strings simple and bug free.

MISRA

Version 9.00 continues our support for the MISRA (Motor Industry Standards Assn.) Reliability Guidelines. The support comes in the form of author files au-misra1.lnt, au-misra2.lnt, and the new (2008) au-misra-cpp.lnt. We attempt to support other guidelines through author files (such as au-sm....lnt for Scott Myers) and there are others on the horizon which we expect to support in future releases. For the most part, author files contain +e options for messages that would have appeared anyway, but they also append messages alluding to sections of a book to further clarify the nature of the warning. This can be invaluable, especially for the introductory programmer who may well need the additional information. Many times we will activate an Elective Note (using +e...) since the message would seem to be of interest only to a minority of programmers.

+esym

In the case of MISRA we have Elective Notes (960, 961, 1960, 1963) that contain a whole barrage of messages since they would appear to be useful only in the adherence to a particular guideline. Nevertheless users found that they may want to enable one or two of these messages without enabling the rest. This was formerly possible only by individually suppressing, using -esym, all but the one or two they were interested in. Needless to say, this was not a very pleasant solution. The solution was to implement a true +esym which would enable a parameterized message only for the given parameter. This required introducing a voting system in which, though the message was generally suppressed, the +esym's vote would tip the balance in favor of issuing the message.

Stack Usage

The MISRA guidelines deprecate recursion and this prompted our efforts to investigate stack usage. Again, leveraging off the call graph technology, we were able to report not only those functions that are recursive but were able to estimate the stack usage of all those functions that were not. See +stack.

Side-effects

The MISRA guidelines suggested a report on expressions that have side-effects on the right side of && and ||. The determination of side-effects turns out to be non-trivial since one has to follow a chain of all calls and this can be done only after all the related functions have been seen which, in our case, becomes pass 2. Determining the side-effects property of expressions is also useful when examining expressions which appear in contexts in which it is only their side-effects that are of interest. This occurs in the first and third subexpressions of the for clause and, of course, in the expression statement.

Output

The Gimpel Software On-Line Demo (created by Reg. Charney) happened to include a color-coded HTML form of the output. This was so neat that we had to implement it for our users. This effect can now be achieved by using env-html.lnt. The file form is probably more beneficial than simply throwing a switch that will blindly crank out the HTML since env-html.lnt and the java script file that it includes contain a number of parameters that can be tweaked to create the precise effect needed.

Pre-Determined Predicates

We haven't yet spoken about the 146 new messages and all the new diagnostic information that these imply. You can read the What's New section to find out about them since in most cases the messages are self-explainatory and from an implementation standpoint not particularly memorable. But the Pre-Determined Predicates messages (587, 588, 589) do stand out. One of our users noted that we should have informed him that the expression:

    (x & 0x10) == 0x11

could never be true. Shamefacedly I realized he was right and that led me on an examination of all those expressions that could be manipulated into the form of ((variable op constant) compare constant). Through the magic of combining variables, flipping arguments and constant folding, this became quite a few. Taking into account signed and unsigned expressions and all of the many C operators, a large table was constructed and conditions established for when the predicate could be pre-determined. This large table was then analyzed by a SNOBOL4 program which produced an extensive amount of code to determine if the conclusions stated in the text were actually true with real live numbers. The entire process took longer than expected but the resulting code is quite solid.

That last statement is hopefully true of Version 9.00 itself.


Version 8.00 Designer's Notes (July 2001)

It's been almost four years since we have produced a new version of PC-lint/FlexeLint (Version 7.5 was first released in September 1997). But I believe this new version is well worth the wait.

The primary focus of recent development efforts is in the inter-function value tracking. For example:

    int f(int);
    int g()
        { return f(0); }
    int f( int n )
        { return 10/n; }

By noting and remembering that a call to f(0) was made, we can alert the programmer to the fact that this would result in a divide by zero.

Initially, the job was perceived as accumulating all the information concerning arguments passed to a function and initializing the formal parameters with this composite information. This leads immediately to the notion of multiple passes over the source input.

All this was implemented fairly rapidly and initial results looked very promising but there were two problems. We were now detecting causes of bugs at locations quite remote from where their effects would be felt. Although that's a good thing in that we are finding bugs that are not obvious by inspection, it was a bad thing if we couldn't provide an assignment trail back to the point of origin. This problem was addressed by tagging each property that a variable might have with a sequence of source code locations that contributed to the variable getting that property. Now, when a condition results in a semantic message, we provide a "Reference List" of locations that would help to create that condition.

The reference list raised another issue; how is the programmer to deal with this information. A list of locations, especially if they involve several modules, can provide a tedious exercise for the programmer as he gropes from cause to cause. Happily this process can be automated quite readily using existing machinery by providing a separate message (831) whose sole purpose is to provide a kind of anchor for each location in the reference list. In effect the same mechanism that you might use to bounce from message to message can now be used to bounce from reference to reference.

OK, you say, but why did this take four years?

This had more to do with the second problem alluded to above. Early testing showed that a composite argument list which represented the set of all calls so far detected to a function, was much too crude an abstraction and resulted in too many bogus messages. Why was this so? Some functions, it seems, are designed in such a way that there is an intended correlation between arguments. For example, it may be perfectly OK to pass in a NULL pointer if the first argument is a 1. This led to a redesign in which each uniquely discernable call would be tried separately from the rest in a process we call the Specific Walk. This had several beneficial properties. Not only would correlations between arguments be taken care of, but the return value would be nicely correlated with the arguments which produced it. Since we are dealing with only one specific call at a time, we can record each call location. Specific walks can in turn produce Specific calls, and keeping around the locations of the calls enables us to produce, along with the message and the reference list, a function callback list that further helps in identifying the remote cause of a warning condition.

Since hundreds of calls to the same function could conceivably be explored,  it was necessary to capture each function in the form of a tree so that repeated calls would not need to repeat the syntactic analysis phase of processing the function. This had the side benefit of producing an internal structure for each function which could be more rapidly analyzed for a wide variety of purposes. But converting to tree form was not a walk in the park. There were some "non-linear" forms we had to contend with such as the one-line error suppression and our own lint options such as -unreachable that may be buried in the code. We even came up with a scheme to suppress messages on a block basis (Option -e{...}) which becomes embedded in the tree.

We also discovered that there are subtle differences between Specific Walks and the more general processing. For example, generally, each statement has a purpose and can be assumed to be reached some day and we can make deductions on that basis. However, in Specific Walks, this is no longer the case. For example, if a parameter has a specific value like 101, we have to make sure we don't take paths that are in some way inconsistent with that value or we wind up providing a lot of false hits such as "subscript out of bounds" in perfectly safe code. Here, the statement tree came to the rescue. Since the tree forced us to employ different code in the specific and general case, it was now a simple matter to make subtle alterations in the specific case without affecting good sound code in the general.

Now all this may still not sound like four years worth of development, even if you add in all the many new options and messages that we've added. For the residue I plead the complexity of C++ and templates.


Version 7.50 Designer's Notes (August 1997)

Ever since we've added our much heralded value tracking to our arsenal of program analysis and since we've endowed about 100 built-in functions with special checking of arguments and return value deductions, many of our users have requested a more general facility so that user functions, and entire libraries, conventional and class libraries, may be similarly checked. Version 7.50 responds to these petitions by providing a complete but familiar language (the language of C expressions) to specify constraints for arguments and return values. To provide problem-oriented constraint specifications, all forms of C constant manifests are supported including macros, enumeration's, and const variables. The result is both powerful and flexible (but what else would you expect?). See example f.cpp

We have greatly expanded our pointer checking; noting carefully the origins of pointer values (new, malloc, address of auto , increment, etc.) so that obvious incompatibilities can be spotted and such menaces as the inappropriate deallocation and the dreaded memory leak can be caught early and dealt with appropriately.

One reviewer (Scott Meyers as it turns out) discovered that we were deficient in detecting anomalies heralded by a prominent C++ authority. Red-faced we increased our C++ sensitivity training, and implemented a whole new batch of subtle defect checks based on available C++ literature. This suite of new checks, all by itself, makes the upgrade worthwhile.


Version 7.00 Designer's Notes (November 1995)

Version 7.00 of our lint represents the culmination of a personal goal I've had for most of the 10 years we've been developing our lint products. That goal is to understand the computational sequences sufficiently to comment on semantic peculiarities as well as the traditional syntactic idiosyncracies of C/C++ programs. One of the several litmus tests of such semantic checking was to determine that, for example,

        int i;
  int a[10];
  for ( i = 0; i <= 10; i++)
    a[i] = 0;

results in an out-of-bounds reference.

I'm happy to report that with Version 7.00, virtually all of my early goals, and more, have been achieved with a methodology we call value-tracking. With value tracking, we are able to ascertain through assignments, initialization and, even, predicates, what values variables may reasonably be expected to have. Both auto variables and class data members (for member functions) are subject to analysis. This ultimately aids in diagnosing those banes of programming: the out-of-bounds subscript and the out-of-range or Null pointer. Unlike a compiler or interpreter, things are done on a “possible/conceivable” basis so that all potential lines of flow are checked for irrational behavior.

With value tracking as an enabling technology, we are able to thoroughly check calls to standard library functions. With our traditional flexibility, these checks can be extended to user-defined functions as well.

As if value tracking were not enough, we’ve added several important new checks involving macros and have added some important new features involving strong type checking. We've also added lob support for C++ (at long last).


Home | Contact | Order

PC-lint and FlexeLint are trademarks of Gimpel Software LLC
Copyright © 2015, Gimpel Software LLC, All rights reserved.