Speech Recog Board Document
The Speech Recognition Kit is a complete, easy-to-build, programmable speech recognition circuit.
It is programmable in the sense that you train the words (or vocal utterances) you want the
circuit to recognize. This kit allows you to experiment with many facets of speech recognition
technology.
Specifications
HM2007 - Self-contained stand alone speech recognition circuit.
User Programmable through keys.
2 Nos. 7-Segment Displays (to display recognized voice commands).
4x3 Matrix Keypad (to train voice commands).
32K X 8 Non volatile Memory with Battery Backup.
20 or 40 Word vocabulary, Multi-lingual.
Power ON status Indication LED.
2 Nos. of Port I/O Connectors for easy interfacing to control external circuits & appliances.
MIC Interface.
Applications
[Block diagram: 9V/5V DC input and on-board 5V regulated supply; HM2007 in manual mode;
32K x 8 SRAM (battery backup); MIC interface; 4x3 keypad (train voice commands); power-ON
indication; two 7-segment displays; output termination headers]
1. Mode of Operations
This article details the construction of a stand alone trainable speech recognition
circuit that may be interfaced to control just about anything electrical, such as appliances,
robots, test instruments, VCRs, TVs, etc. The circuit is trained (programmed) to recognize
the words you want it to recognize.
Controlling and commanding an appliance (computer, VCR, TV, security system, etc.) by speaking
to it makes the device easier to use, while increasing the efficiency and effectiveness of
working with it.
At its most basic level, speech recognition allows the user to perform parallel tasks (i.e. while
hands and eyes are busy elsewhere) while continuing to work with the computer or appliance.
This circuit allows one to experiment with many facets of speech recognition technology.
The heart of the circuit is the HM2007 speech recognition integrated circuit. The chip provides
the options of recognizing either forty 0.96 second words or twenty 1.92 second words. This
circuit allows the user to choose either the 0.96 second word length (40 word vocabulary) or the
1.92 second word length (20 word vocabulary). For memory the circuit uses a 32K X 8 static
RAM.
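The two vocabulary options above trade word length against word count. As a small illustrative sketch (the names MODES and total_speech_cs are made up for this example and are not part of the HM2007), the following shows that both options cover the same total amount of speech. Lengths are held as integer hundredths of a second to avoid rounding.

```python
# Word lengths in hundredths of a second (centiseconds), taken from the text:
# forty 0.96 second words or twenty 1.92 second words.
MODES = {
    "short": {"word_length_cs": 96, "vocabulary": 40},
    "long": {"word_length_cs": 192, "vocabulary": 20},
}


def total_speech_cs(mode):
    """Total trainable speech for a mode, in centiseconds."""
    settings = MODES[mode]
    return settings["word_length_cs"] * settings["vocabulary"]


for name in MODES:
    print(name, total_speech_cs(name) / 100, "seconds")
```

Doubling the word length halves the vocabulary, so the total stays at 38.4 seconds in either mode.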
The chip has two operational modes: manual mode and CPU mode. The CPU mode is designed
to allow the chip to work under a host computer. This is an attractive approach to speech
recognition for computers because the speech recognition chip operates as a co-processor to
the main CPU. The jobs of listening and recognition do not occupy any of the computer's
CPU time. When the HM2007 recognizes a command it can signal an interrupt to the host CPU
and then relay the command code. The HM2007 chip can be cascaded to provide a larger word
recognition library.
The Speech Recognition Board we are building operates in the manual mode. The manual
mode allows one to build a stand alone speech recognition board that doesn't require a host
computer and may be integrated into other devices to utilize speech control.
Applications
Software Approach
Most speech recognition systems available today are programs that run on personal
computers. These add-on programs operate continuously in the background of the computer's
operating system (Windows, OS/2, etc.) and require the computer to be equipped
with a compatible sound card. The disadvantage of this approach is the necessity of a
computer. While these speech programs are impressive, it is not economically viable for
manufacturers to add full-blown computer systems to control a washing machine or VCR. At
best, the programs add to the processing required of the computer's CPU; there is a noticeable
slowdown in the operation and function of the computer when voice recognition is enabled.
Learning to Listen
We take our ability to listen for granted. For instance, we are capable of listening to one person
speak among several at a party. We subconsciously filter out the extraneous
conversations and sounds. This filtering ability is beyond the capabilities of today's speech
recognition systems.
Speech recognition is classified into two categories: speaker dependent and speaker
independent.
Speaker dependent systems are trained by the individual who will be using the system. These
systems are capable of achieving a high command count and better than 95% accuracy for word
recognition. The drawback to this approach is that the system responds accurately only to
the individual who trained it. This is the most common approach employed in
software for personal computers.
Speaker independent systems are meant to respond to a word regardless of who speaks it,
usually at the cost of a smaller vocabulary.
Speech recognition systems have another constraint concerning the style of speech they can
recognize. There are three styles of speech: isolated, connected and continuous.
Isolated speech recognition systems can only handle words that are spoken separately. These are
the most common speech recognition systems available today. The user must pause between
each word or command spoken. The speech recognition circuit is set up to identify isolated
words of 0.96 second length.
Connected speech is a halfway point between isolated word and continuous speech recognition.
It allows users to speak multiple words. The HM2007 can be set up to identify words or phrases
1.92 seconds in length. This reduces the word recognition vocabulary to 20.
Continuous speech is the natural conversational speech we are used to in everyday life. It is
extremely difficult for a recognizer to sift through the speech as the words tend to merge
together. For instance, "Hi, how are you doing?" sounds like "Hi, howyadoin?" Continuous speech
recognition systems are on the market and are under continual development.
3. Speech Recognition Circuit
The demonstration circuit operates in the HM2007's manual mode. This mode uses a simple
keypad and digital display to communicate with and program the HM2007 chip.
When the circuit is turned on, the HM2007 checks the static RAM. If everything checks out, the
board displays "00" on the digital display and lights the red LED (READY). It is then in the
"Ready" state, waiting for a command.
To Train
To train the circuit, begin by pressing the word number you want to train on the keypad. The
circuit can be trained to recognize up to 40 words. Use any number from 1 to 40. For
example, press the number "1" to train word number 1. When you press the number(s) on the
keypad the red status LED will turn off. The number is displayed on the digital display. Next
press the "TRAIN" key (SW13). Pressing the "TRAIN" key signals the chip to
listen for a training word, and the red LED turns back on. Now speak the word you want the
circuit to recognize clearly into the microphone. The LED should blink off momentarily; this is a
signal that the word has been accepted.
Continue training new words in the circuit using the procedure outlined above. Press the "2"
key then the "TRAIN" key (SW13) to train the second word, and so on. The circuit will accept up
to forty words. You do not have to enter 40 words into memory to use the circuit. You
can use as many or as few word spaces as you want.
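The keypad procedure above can be pictured as a small state model. The sketch below is only a toy simulation of the workflow as this document describes it, not the HM2007's internal logic. The class name ManualModeModel and its methods are invented for illustration, a spoken word is stood in for by a text label, and the CLR handling follows the erase procedure described later in this document.

```python
class ManualModeModel:
    """Toy model of the manual-mode keypad workflow (illustrative only)."""

    MAX_WORDS = 40   # word spaces 1..40, per the text
    CLEAR_ALL = 99   # "99" then CLR erases every word, per the text

    def __init__(self):
        self.memory = {}      # word number -> label standing in for the trained word
        self.entry = ""       # digits typed since the last TRAIN or CLR
        self.display = "00"   # what the two 7-segment digits would show

    def press_digit(self, digit):
        # Keep only the last two digits, like a two-digit display.
        self.entry = (self.entry + str(digit))[-2:]
        self.display = self.entry.zfill(2)

    def press_train(self, spoken_word):
        number = self._take_entry()
        if 1 <= number <= self.MAX_WORDS:
            self.memory[number] = spoken_word

    def press_clr(self):
        number = self._take_entry()
        if number == self.CLEAR_ALL:
            self.memory.clear()
        else:
            self.memory.pop(number, None)

    def _take_entry(self):
        number = int(self.entry or "0")
        self.entry = ""
        return number


board = ManualModeModel()
board.press_digit(2)
board.press_digit(5)
board.press_train("directory")
print(board.display, board.memory)
```

In this model, pressing 2, 5 and TRAIN stores the label under word space 25, and an out-of-range number is simply ignored.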
Testing Recognition
The circuit is continually listening. Repeat a trained word into the microphone. The number of
the word should be displayed on the digital display. For instance, if the word "directory" was
trained as word number 25, saying the word "directory" into the microphone will cause the
number 25 to be displayed.
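An external circuit that reads the recognized word number has to turn the two displayed digits back into a number. The sketch below assumes the output headers carry the two digits as one packed BCD byte (tens in the high nibble, units in the low nibble). That layout is an assumption for illustration and is not stated in this document, so check it against the board's schematic. The function name bcd_to_word_number is hypothetical.

```python
def bcd_to_word_number(byte):
    """Decode one packed-BCD byte into a two-digit number.

    Assumption: tens digit in bits 7..4, units digit in bits 3..0.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte (0..255)")
    tens, units = byte >> 4, byte & 0x0F
    if tens > 9 or units > 9:
        raise ValueError("not a valid packed-BCD value")
    return tens * 10 + units


print(bcd_to_word_number(0x25))
```

Under this assumption, the "directory" example above would appear on the port as the byte 0x25 and decode to 25.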
Error Codes
To erase all the trained words in the RAM, press "99" on the keypad then press the
"CLR" key. The display will scroll through the numbers 1-40 quickly as the memory is cleared.
To erase a single word space, press the number of the word you want to clear, then press the
"CLR" key.
This demo circuit allows you to experiment with speaker dependent as well as speaker
independent systems. The system is typically trained as speaker dependent, meaning the voice
that trained the circuit is also the voice that uses it.
To train the system for speaker independent recognition (multi-user), use the following
technique. We will use four word spaces for each target word. Let's arrange the words so that
they can be recognized by decoding just the least significant digit on the digital
display.
To accomplish this, word spaces 01, 11, 21 and 31 are allocated to the first target word. By
decoding only the least significant digit, in this case the 1 of "X1" (where X is any digit from
0 to 3), we can recognize the target word.
We do this for the remaining word spaces. For instance, the second target word will use word
spaces 02, 12, 22 and 32. We continue in this manner until all the words are programmed.
If possible, have a different person speak the word for each word space. This will enable the
system to recognize different voices, inflections and enunciations of the target word. The more
system resources that are allocated for independent recognition, the more robust the circuit
will become.
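The allocation scheme above can be written out as a short sketch. The function names are invented for illustration. Treating word spaces 10, 20, 30 and 40 as the group whose least significant digit is 0 is an inference from the 40 word limit, not something this document states.

```python
def slots_for_target(digit, copies=4):
    """Word spaces that share one least significant digit.

    digit 1 -> 1, 11, 21, 31; digit 0 is assumed to map to 10, 20, 30, 40.
    """
    if digit not in range(10):
        raise ValueError("digit must be 0..9")
    if not 1 <= copies <= 4:
        raise ValueError("the 40 word spaces allow at most 4 copies per digit")
    first = digit if digit != 0 else 10
    return [first + 10 * k for k in range(copies)]


def target_for_slot(slot):
    """Recover the target word from a recognized word space."""
    if not 1 <= slot <= 40:
        raise ValueError("word space must be 1..40")
    return slot % 10


print(slots_for_target(1))
print(slots_for_target(2))
print(target_for_slot(31))
```

Each target word gets four training slots, ideally one per speaker, and any of the four recognized numbers maps back to the same target digit.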
There are certain caveats to be aware of. First, you are trading off vocabulary size for
speaker independence. The effective vocabulary drops from forty words to ten words.
The decoding circuit that recognizes the word number and performs a function must be
designed to recognize error codes 55, 66 and 77 and not confuse them with word spaces 5, 6
and 7. Our interface circuit does this.
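The caveat above matters because 55, 66 and 77 end in 5, 6 and 7. A decoder that only looks at the last digit would mistake them for target words. The following is a minimal software sketch of the check, not the interface circuit itself. The meanings attached to each code are an assumption not given in this document and should be confirmed against the HM2007 data sheet. The name decode is hypothetical.

```python
# Assumed meanings; this document only says that 55, 66 and 77 are error codes.
ERROR_CODES = {55: "word too long", 66: "word too short", 77: "no match"}


def decode(display_number):
    """Classify a displayed number before using its last digit."""
    if display_number in ERROR_CODES:
        return ("error", ERROR_CODES[display_number])
    if 1 <= display_number <= 40:
        # Valid word space: the last digit selects the target word.
        return ("word", display_number % 10)
    return ("ignored", display_number)


for number in (25, 5, 55, 77, 0):
    print(number, decode(number))
```

Checking for the error codes first means that 55 is reported as an error, while word spaces 5, 15, 25 and 35 all resolve to target word 5.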
The HM2007 wasn't designed for use in a voice security system, but this doesn't prevent you
from experimenting with it for that purpose. You may want to use three or four keywords that
must be spoken and recognized in sequence in order to activate a circuit that opens a lock or
allows entry.
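The keyword sequence idea above can be sketched as a small state machine. This is only an illustration: the class name SequenceLock, the example word numbers and the reset rule (any wrong word starts the sequence over) are assumptions, not a description of an existing circuit.

```python
class SequenceLock:
    """Unlocks only after the keywords are recognized in the stored order."""

    def __init__(self, sequence):
        if not sequence:
            raise ValueError("sequence must contain at least one word number")
        self.sequence = list(sequence)
        self.position = 0   # how many keywords have matched so far

    def feed(self, word_number):
        """Feed one recognized word number; return True when the lock opens."""
        if word_number == self.sequence[self.position]:
            self.position += 1
        else:
            # Wrong word: start over, but let it count as a new first keyword.
            self.position = 1 if word_number == self.sequence[0] else 0
        if self.position == len(self.sequence):
            self.position = 0
            return True
        return False


lock = SequenceLock([3, 7, 12])   # hypothetical trained word numbers
print([lock.feed(n) for n in (3, 7, 12)])
print([lock.feed(n) for n in (3, 7, 5, 12)])
```

The first run completes the sequence on its last word. The second is interrupted by word 5 and never opens the lock.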
Application circuit