How to Use a Voice Switch

Introduction

A voice switch (sometimes known as a VOX) generates electrical events that are synchronised with sound events. A sound event, such as a spoken word, actually consists of two events - the start and the end. The voice-switch generates an electrical on or off signal in response to a sound. The sound must be supplied to the voice-switch by an audio-electrical device, such as a microphone, tape recorder, or TV.

Description of Outputs

The voice-switch produces three main outputs, all synchronised with the sound. The first (labeled VOX) is on for as long as the sound is detected. The second (labeled L-H) is on only when the sound starts, and the third (labeled H-L) is on only when the sound ends. In addition, each of these outputs has a matching, but inverted, output. The inverted output is on while its counterpart is off, and vice versa. Providing both non-inverted and inverted outputs makes it possible to control a much wider range of equipment. A summary of the outputs follows:

Some equipment is activated by the steady presence of a signal, and the VOX output (or its inverted output) should be used for this. Other equipment is activated by a change of level from off to on, or on to off. For such equipment, the L-H and H-L outputs (or their inverted outputs) should be used. Some devices cannot be stopped while the start signal is still on. For this reason, the L-H and H-L outputs (and their inverted counterparts) are designed to last only 10 microseconds, which is long enough to trigger all our devices. If the output stayed on longer than the sound did, the controlled device might not stop properly.
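The relationship between the outputs can be illustrated with a small Python sketch. This is only a model, not the circuit itself: it assumes the sound has already been reduced to a sequence of on/off samples, it assumes silence before the first sample, and it represents each L-H and H-L pulse as a single sample rather than a 10 microsecond pulse. The function name derive_outputs and the "inverted ..." labels are made up for illustration.

```python
from typing import Dict, List


def derive_outputs(detected: List[bool]) -> Dict[str, List[bool]]:
    """Model the three main outputs and their inverted counterparts.

    `detected` holds one on/off value per sample: True while sound is present.
    """
    vox = list(detected)      # VOX: on for as long as the sound is detected
    l_h: List[bool] = []      # L-H: on only at the sample where the sound starts
    h_l: List[bool] = []      # H-L: on only at the sample where the sound ends
    previous = False          # assumption: silence before the first sample
    for current in vox:
        l_h.append(current and not previous)
        h_l.append(previous and not current)
        previous = current

    outputs = {"VOX": vox, "L-H": l_h, "H-L": h_l}
    # Each inverted output is on while its counterpart is off, and vice versa.
    for name in ("VOX", "L-H", "H-L"):
        outputs["inverted " + name] = [not level for level in outputs[name]]
    return outputs


# One sound lasting three samples, with silence on either side.
outputs = derive_outputs([False, True, True, True, False, False])
print(outputs["L-H"])  # on at index 1 only, where the sound starts
print(outputs["H-L"])  # on at index 4 only, the first silent sample after the sound
```

In this model a device that needs a steady signal would read the "VOX" list, while a device triggered by a change of level would read "L-H" or "H-L".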

Suggested Uses

In all of the typical cases below, a timer measures how long a subject takes to respond, and the timer is stopped by the onset of the subject's vocal response. This is detected by a microphone connected to the input of the voice-switch, and the L-H output of the voice-switch is connected to the timer's stop input. The only thing that varies in the examples below is the means of starting the timer.

Timing Vocal Responses to Visual Stimuli

Visual stimuli can be presented by a computer or by a tachistoscope. In both cases, the timer is started when the stimulus appears, with the tachistoscope or computer providing a synchronised signal to the start input of the timer.

Timing Vocal Responses to Audio Stimuli

Audio stimuli can be presented by a computer or by an audio device such as a tone generator or a hi-fi. Audio events are normally discrete binary tones with a steady amplitude, which are easy to delimit with the voice-switch. The timer can be started at the onset of the sound stimulus by connecting its start input to the L-H output of the voice-switch. Alternatively, the timer can be started at the end of the sound by connecting the timer's start input to the voice-switch's H-L output.

Timing Vocal Responses to Vocal Stimuli

In this case, the stimuli are presumably spoken words. The timer would normally be started at the end of the stimulus, as the sound would have to be completed before its full semantic content could be understood. Two participants might be present - the person saying the stimuli, and the subject responding to them. Alternatively, the stimuli could be pre-recorded onto a cassette tape. Either way, two voice-switches and microphones would be needed - one for each sound source. Apart from the voice-switch and microphone for the subject, one voice-switch would be connected to a microphone (or cassette player) monitoring the participant saying the stimuli. The H-L output of this voice-switch would be connected to the start input of the timer.
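The arithmetic performed by the timer in this arrangement can be sketched in Python. The sketch assumes the event times are already known in seconds; the function name response_latency and the example times are hypothetical and are not taken from any real session.

```python
from typing import Tuple


def response_latency(stimulus: Tuple[float, float], response_onset: float) -> float:
    """Return the time from the end of the stimulus to the start of the response.

    `stimulus` is (start, end) of the spoken stimulus, in seconds.
    `response_onset` is when the subject starts to speak, in seconds.
    """
    stimulus_start, stimulus_end = stimulus
    if stimulus_end < stimulus_start:
        raise ValueError("stimulus must end after it starts")
    # The H-L output of the stimulus voice-switch starts the timer (stimulus end);
    # the L-H output of the subject's voice-switch stops it (response onset).
    return response_onset - stimulus_end


# Hypothetical trial: stimulus spoken from 1.0 s to 1.5 s, response begins at 2.25 s.
latency = response_latency((1.0, 1.5), 2.25)
print(latency)  # 0.75
```

To model a timer started at the onset of the stimulus instead (the L-H output), the subtraction would use stimulus_start rather than stimulus_end.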

Issues in Detecting Sounds

The problem with detecting any sound event is that it is difficult to define where the sound starts and ends. Discrete, binary sounds are either off or on, and when on they have a constant level (amplitude). However, most sounds we encounter are not so easily delimited. Vocal sounds, for example, usually contain a major subjective component, and even the same sound is variable in pitch, amplitude and duration.

In the case of spoken words, the problems include:

In order to handle different sound amplitudes, the voice-switch has a widely variable sensitivity (volume) control. If the sensitivity is too low, the sound might not be detected at all, or only the loudest components (such as stressed consonants) will be detected. On the other hand, if the sensitivity is too high, background sounds will be detected. The sensitivity control should be adjusted with the microphone in situ, and for each new subject. Also, the microphone should be held as close to the subject as possible, so that the chance of extraneous sounds being detected is reduced.
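The effect of the sensitivity setting can be illustrated with a simple threshold model in Python. This is an assumption made for illustration only: the voice-switch is treated as comparing each amplitude sample against a fixed threshold, where a lower threshold corresponds to a higher sensitivity. The function name detect_sound and all sample values are invented.

```python
from typing import List


def detect_sound(amplitudes: List[float], threshold: float) -> List[bool]:
    """Mark each sample as detected when its magnitude reaches the threshold."""
    return [abs(amplitude) >= threshold for amplitude in amplitudes]


# Invented amplitudes: background noise, a quiet onset, a loud stressed part, noise.
samples = [0.02, 0.03, 0.10, 0.60, 0.45, 0.05, 0.02]

too_insensitive = detect_sound(samples, threshold=0.40)  # only the loudest part
too_sensitive = detect_sound(samples, threshold=0.01)    # background noise as well
well_adjusted = detect_sound(samples, threshold=0.08)    # the whole word, no noise

print(too_insensitive)
print(too_sensitive)
print(well_adjusted)
```

With the high threshold the detected onset is two samples later than the quiet start of the word, which is the kind of timing error that adjusting the sensitivity for each subject is meant to avoid.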


Copyright © Neil Carter

Content last updated: 2000-08-22