Two of Three Automatic Module Test Methods

James Alan Farrell
Ensco, Inc.
3 Holiday Hill rd
Endicott, NY 13760

Abstract

This paper will present the concepts of and issues surrounding automatic module testing (AMT), and discuss two methods for automating module testing (1). The advantages and shortcomings of the automatic module test concept will be discussed, as will the advantages and shortcomings of both test methods.

In recent years more and more interest has been raised in the area of AMT. Companies that develop software, and especially companies that develop system critical software, spend billions of dollars annually in testing software. This process is laborious and repetitive. If the process could be automated literally billions of dollars could be saved annually. In addition, programmers would be freed up for more interesting and challenging tasks.

Until recently the idea of AMT was rejected, since it is essentially equivelant to the halting problem. This theorem tells us that a perfect ATM tool cannot be created. Never-the-less, a less than perfect tool can save large sums of money, and the better the tool is constructed the more it can save.

1. The Concept of Automated Module Testing

1.1 Testing

To verify a software system a number of test types are performed. Module tests prove that the inner workings of each function are correct. Integration testing proves that the interfaces between the modules are correct: that the proper parameters, each with the proper type, is passed on the stack or in registers. Theoretically these two tests types, if performed rigorously and completely, can verify a software system 100%.

In practice, however, it is near impossible to completely test any function of any size. Some functions are more testable than others. Functions that require user input or recursive functions can be particularly troublesome to test. Many companies, therefore, do not test modules completely. Instead they implement a round of alpha and beta user tests. During this level of testing, the software package is given to users who are made aware that the software could contain bugs. These users then use the program and notify the developers of the bugs they find. At the end of the beta test period the most commonly used functionality of the software should be (almost) bug free. Bugs can still lurk in lesser used portions of the software, however, as attested to by the amount of highly used software on the market that does contain bugs.

This method of software testing is, of course, less than perfect. For system critical software, such as software used by a jet engine controller, much more rigorous testing must be carried out. Software modules must be coded to standards that assure that they are testable. Even so, it is a near impossible task to completely test a module.

1.2 Module Testing

To completely test a module, each input to the module must be tested over its entire range. That is, each input must be set to its minimum value, and the module must be run to make sure the outcome is as expected. Then one input must be incremented by the minimum that input is capable of being incremented by, and the module must be run again. Then the input is reset to its minimum and the next input is increment. Now the first input has to be incremented through its entire range again. By the time all inputs have been through their entire ranges, the module has been exhaustively tested.

This method is exhaustive, but it is also impractical. It has been written that to exhaustively test a standard IEEE floating point divide would take more than fifteen billion years (about the time the universe has been in existance - Pentium users take note!) So more efficient, less exhaustive techniques must be found.

Three standard methods of testing are Black Box, White Box, and Intuitive testing. Black box testing consists of developing tests based on the specifications of function. The tester has no knowledge of how the function is coded. Such tests generally consist of setting each input to its maximum and to its minimum, in combination with all others. Tests should are also run with each input set to zero, and with each input set to positive and negative midpoints (somewhere between the extremum of the input's range and zero). The tester compares the outputs of these tests to what the specifications indicates the output should be.

White box testing involves looking at the implementation of the code to determine where it might fail. The tester looks for things such as divides and attempts to set up the inputs so a divide by zero occurs, or square roots and attempts to perform a square root of a negative number. The tester might also run the module on a simulator or debugger that will allow break points to be set on decision points and loop exits to determine that they behave as they should.

Intuitive testing is performed after black box and white box testing. The tester uses knowledge of where gained from these tests to try and find holes in the implementation. There are no hard and fast rules for this type of testing. The tester simply takes potshots at the module and tries to make if fail.

These tests are usually performed in the order shown here. Someone other than the coder should always be made responsible for each test (the same person is often made responsible for all three tests). The coder might perform some intuitive testing, but a seperate tester should be made responsible for passing or failing the module.

1.3 Automated Module Testing

Black box and white box testing can be highly automated. If the module specifications can be put into a machine readable form, perhaps using a CASE tool, that specification can be used to create black box tests. If the specification is machine readable, then that specification can be used to generate code. If the that specification is used to create code then that code can be used for white box testing.

Intuitive testing cannot be performed by a computer.

The idea the black box and white box testing can be performed by a computer is not perfect. It is often felt that there is a conflict of interest if the computer creates the code and the module tests. This is the same situation as having the coder perform the testing. If there is a bug in the code generator, that same bug could show in the test generator. This problem will be addressed in this paper.

2. The AMT Process

The process for automatically testing a module generally runs something like this: First the module is specified, using a code generating CASE tool, such as BEACON or MatrixX. Prime code is generated for the target platform, and separate simulation code is generated. In addition a tool must be available to create a test case file, which generates black box and white box test cases. The test case file is then fed into a simulator, along with the prime code. Another simulator is run with the simulation code. The output from the two simulation runs is compared. Any differences raise a flag, and must be investigated. This indicates an error either in the prime code or the simulation code. Figure 1 illustrates this process.


Figure 1. The AMT Process

For this process to work correctly, there must be two independent paths. The simulation code cannot be produced by the same path that the prime code is produced by. If the both sets of code are produced by the same path then a bug could show up in both code sets. Then both simulation runs will produce the same erroneous output, and the problem will not be flagged. The test cases should be produced by one of these two paths. If the test cases are produced by a third independent path a bug could cause a needed test case not to be generated. This will be discussed below.

The term path requires precise definition. A path starts at the diagram produced by the CASE tool. This is seen to be the module specification. The path then is the process by which code or tests cases is produced. Keeping paths separate can be tricky. For example, the diagram must be represented internally somehow, perhaps as a parse tree. If the parse tree is used by both code generators, then it must be proven that the parse tree exactly represents the diagram. If not, there must be a lower level representation of the diagram in memory. This lower level representation must then be used to create two independent parse trees, which are used by each of the two independent code generators.

When the module test fails due to both branches containing common elements, the failure is termed a single point failure.

3. Code Generation

Methods of code generation are discussed elsewhere[1], so they need not be covered in this paper. Keep in mind that there must be two paths for code generation. The method for doing this is straight forward: implement two code generators, that work either directly from the diagram, or from a structure that is proven to be equivelant to the diagram.

4. Test Case Generation, the Iterative Method

There are several methods that can be used to generate test cases. This paper will present two. The first approach, called the Iterative method, takes an approach similar to the exhaustive method described above. The range for each input is specified to the test case generator, along with a step value. The test case generator steps each input combinatorially by the step count.

With inputs where the range is symmetric around zero, the black box test cases can be generated easily and efficiently by setting the step count such that five values will be hit for each input. These five values will be the minimum, maximum, zero, positive midrange point and negative midrange point.

For test cases that come closer to being exhaustive, a smaller step count can be used. This can be time consuming, however. The time this approach takes is on the order of mn, where n is the number of inputs, and m is the step count. Complex signal flow diagrams with about twenty inputs and a high step count can take 12 hours or more on a SPARC-10. Of course 12 hours on a SPARC-10 is quite cheap compared to three or four days of programmer time.

There is another disadvantage to this method, however. If the step count is set to 5, and there is a divide by input minus 3, there is a potential divide by zero that could be missed. The only way this tool can be sure to catch all potential problems is to set the step count to the precision of the inputs. Then we are faced with the same scenario described above, where testing the diagram will require fifteen billion years.

Thus an iterative tool can be used to generate black box tests, but white box test generation becomes a combination of using the tool and having a programmer determine what tests the tool missed and providing those tests by hand. It becomes possible that the tool will end up in this case costing more time than it saves, because now the programmer needs to analyze the diagram AND the test cases created by the tool.

5. Test Case Generation, the Regressive Method

The regressive method of test case generation is well suited to signal flow diagrams. A range is specified for each input. The next block downstream from an input is stressed by upper and lower bounds of the range, and (perhaps) zero. The next block downstream from that block is stressed by the highest and lowest possible outputs of the previous block, and (perhaps) zero. The diagram is traversed from each input, and each time a new block is encountered, the diagram is traversed backwards (regressed) to find the diagram inputs needed to stress the block. When a diagram input is reached, the input value needed by that input to stress the target block is recorded. This regression must occur for all non-I/O blocks on the diagram. The tool can also test for singularities (divide by zero, square root of a negative, etc). Figure 2 illustrates this process.


Figure 2. The Regressive AMT process

For regressive test case generation to work the inverse for each block on the diagram needs to be found, so that the inputs needed to give the highest and lowest outputs can be found. The best method for implenting this would be to automatically determine the inverse function. This generally is not feasible. Thus a separate inverse function must be coded for each block used. Thus the test cases are not coded by one of the paths through which code is generated. If there is a bug in the coding of an inverse function, it is possible that a needed test case would not be generated.

Once test cases have been found for all blocks, the test cases have to be compared for repeats. It is common with this method of testing that a number of blocks will be tested by the same test case.

The remaining set of test cases are not complete. Each test case will now stress a subset of blocks on the diagram. Inputs that connect only to blocks that are not being stressed will not have values. It often happens that different tests will test mutually exclusive sets of blocks. In other words two tests might have completely different sets of inputs. If this is the case, then those two tests can be combined. It is also possible that tests can overlap partially, and the overlapping inputs have the same values. In this case these tests can again be combined. Once all possible test cases have been combined, dummy values are assigned to inputs that do not yet have input values.

This method of test case generation is more efficient than the iterative method. Its time complexity is x*Log(xy) where x is the number of blocks in the diagram and y is the average number of inputs to each block. On most diagrams y is close to 1, so the time complexity for most diagrams is close to x*Log(x).

Regressive test case generation comes much closer to the concept of white box testing than iterative test case generation. It does suffer from some problems however. The first problem is that it has a very local view of how blocks should be stressed. When an input splits, one branch undergoes a change of range, and the two paths are then rejoined with a change of sign, the regressive test generator comes up with two incorrect values for one input for one test case. This problem has been labelled the branch/merge problem. Figure 3 illustrates the branch/merge problem.


Figure 3. The Branch/Merge Problem

In figure 3 the range that regressive test generation sees is in square brackets, and the actual range is in curly braces. Note the ranges on the output. The problem occurs because regressive test generation only looks at the inputs from the summing junction when determining the output range from the summing junction. The algorithm will think that the lowest value the that output can hold is -6, and the input will have to provide a -2 to the summing junction and a 3 to the gain block at the same time.

The problem of ambiguous test cases can be solved easily by breaking the test case into two. One of the new test cases will test at the lower bound, and the other will test at the upper bound. The problem of generating incorrect ranges has proven more difficult to solve. Since the tool generates a range of -6..9, and that range is never reached, the simulator is in the habit of flagging a test error. The output has not been tested to the extent of its range. The only method found to "correct" this problem has been to document it!

The branch/merge problem results in quirky behavior for the regressive test generator tool, but it has been found in practice that the module testers quickly come to understand the quirks and limitations of the tool, and learn what to look for.

The other, much larger limitation of this tool is that it does not work very well for control flow (flowchart) diagrams. In particular, this method cannot find the range of a loop in a control flow diagram.

6. Conclusion

This paper has looked at the concept of automatic module test generation, discussed its strengths points and weaknesses. In particular, it is inexpensive compared to traditional test methods, but currently modules cannot be fully tested using automatic methods. A human tester still needs to review the results of the process, and fill in white box and intuitive tests. Another short coming is that it is difficult to implement an AMT process that does not suffer from single point failure.

Two methods of AMT were also presented. The first, the iterative approach, iterates all inputs over their ranges, by a step count given by the tester. This approach is slow for complex diagrams, since it suffers from combinatorial explosion. It also could miss test cases in ways that are difficult or time consuming for the tester to catch.

The second method of AMT is the regressive method. This method traverses the diagram forward, then for each block traverses backwards until it finds all inputs that contribute to the block, building a list of input values needed to give the block its greatest output range. This is much more efficient than the iterative method, but the method is quirky. It has been found in practice that module testers quickly learn the quirks of the system, and can use it efficiently. This method also lacks flexibility.

7. End Notes

1 While developing the BEACON Automatic Unit Test Tool three possible methods were discussed. Both the methods discussed by this paper were tried then rejected as being too innefficient or clumsy. The designer of the method that was adopted is now investigating patenting that method. Once the issue of patents has been cleared up a paper on that method might appear here.

8. References

[1] Rimvall, et al. An Open Architecture for Automatic Code Generation using the BEACON CACE Environment. Joint IEEC/IFAC Symposium on Computer-Aided Control System Design, Tuscon, AZ, March 1994.

This page last update 02/25/95

Return to CASE Tools Page

Return to HiTech Page