
that will exercise the features of the IVR, and provide as little “instruction” as possible. It is fair and often informative to include a task that the IVR will not support.
It allows you to see if a person can determine that the IVR will not do what she
was trying to accomplish or if better feedback is required. Tasks that contain
incorrect or partial information can be used to test the error recovery elements
of the IVR. It is a good idea to test the error recovery paths in addition to the
“sunny-day path” (in which the user does everything just as the IVR expects).
Our preferred test regimen is one or more iterative usability tests on prototypes of the IVR to improve usability, utility, and accessibility, followed by a summative usability test on the last prototype or preferably the production system to
characterize the expected performance of the IVR. Iterative testing of a prototype
can quickly improve a design. It is especially effective if a developer who can
modify the prototype is on hand during testing. Using this method, weakness in
the design can be corrected overnight between participants or, in some cases, as
soon as between tasks. With IVRs, it is also helpful if the voice talent who recorded
the prototype is available during the study. In many cases you may fill the roles
of designer, tester, voice talent, and prototyper, giving you full control and
responsibility to use the iterative process to get the IVR in final form.
7.5.2 Signal Detection Analysis Method
One technique that can be used to great effect in IVR testing borrows from signal
detection theory, which is explained quite well in Wickens’s Engineering Psychology and Human Performance (1984). Observation determines whether the
user actually completes a task. It can be quite instructive, however, to ask
the user if he believes he has completed the task successfully. One hopes that
each task will result in both successful task completion and perceived success, a
“hit” in signal detection terms. The second best outcome is a “correct rejection,”
where the user fails the task and correctly believes that she has failed. In these
two cases, the user has an accurate picture of her situation and can make an intel-
ligent decision as to what to do next. The other two conditions, actual failure with
perceived success (a false alarm) and actual success perceived as a failure
(a miss), cause significant problems if they occur in deployed systems.
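The four outcomes above can be sketched as a small classifier. This is a hypothetical helper, not part of the method as published; the function name and the tallying step are illustrative assumptions. It simply maps each (actual, perceived) pair from a test session to its signal detection term:

```python
from collections import Counter


def classify_outcome(actual_success: bool, perceived_success: bool) -> str:
    """Map an (actual, perceived) task result to its signal detection term."""
    if actual_success and perceived_success:
        return "hit"                 # success, and the user knows it
    if actual_success and not perceived_success:
        return "miss"                # success the user fails to recognize
    if not actual_success and perceived_success:
        return "false alarm"         # failure the user believes succeeded
    return "correct rejection"       # failure, and the user knows it


# Tally outcomes across a set of observed test tasks (sample data).
observations = [
    (True, True), (True, True), (False, True), (True, False), (False, False),
]
tally = Counter(classify_outcome(a, p) for a, p in observations)
```

A tally like this makes the problem cases visible at a glance: any nonzero count of false alarms or misses flags tasks whose feedback needs rework before deployment.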
Given the task of making a car reservation at an airport, imagine what hap-
pens to the user. In a system that generates false alarm states, the user believes
that he has rented a car when in fact he has not. Most often, some part of the
interaction has given the user the impression that he has finished before having
completed all necessary steps. Perhaps a case of too much feedback, too soon—
or a poorly organized process that does not linearly drive the user to a successful
conclusion. In any event, the user confidently hangs up, boards a flight, and lands
thousands of miles from home without a car reserved. Systems that generate
misses cause users to repeat an already completed process more times than
they intend. The user is not getting the feedback that he is done, or