A technology has been developed that allows speech to be transmitted without having to pronounce it.
A video in Russian, briefly describing the principle and capabilities.
An interview with the developer, in English; too lazy to translate )
Chief Scientist for Neuroengineering
NASA Ames Research Center, Moffett Field, CA
Chuck Jorgensen is a NASA scientist whose team has begun to computerize silent human reading using nerve signals in the throat that control speech. In preliminary experiments, they discovered that small, button-sized sensors, stuck under the chin and on either side of the Adam's apple, could gather nerve signals and send them to a processor and then to a computer program that translates them into words.
NASA Tech Briefs: What is a sub-vocal speech system?
Chuck Jorgensen: Subvocal speech is silent, or sub-auditory, speech, such as when a person silently reads or talks to himself. Biological signals arise when reading or speaking to oneself, with or without actual lip or facial movement. A person using the subvocal system thinks of phrases and talks to himself so quietly that it cannot be heard, but the tongue and vocal cords still receive speech signals from the brain.
NTB: How did you study the patterns of the complex nerve signals in the throat that control speech?
Jorgensen: Small electrodes are placed at the location of the tongue and the side of the throat near the larynx, and we differentially capture a signal that is the result of the difference between the two. The signals are processed first by transforming them into a matrix of wavelet coefficients (using a special kind of wavelet called a dual-tree wavelet), and second by using a neural net to classify that wavelet coefficient matrix, associating a particular pattern with the signal energies present in the magnitudes of the coefficients. Next, because the mapping is one that we determine, we associate that type of signal with a particular action, such as writing out a word or controlling a device.
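The two-stage pipeline Jorgensen describes (wavelet feature extraction, then pattern classification) can be sketched roughly as follows. This is an illustrative simplification, not NASA's implementation: it uses a plain Haar wavelet instead of the dual-tree wavelet mentioned in the interview, and a nearest-centroid classifier as a stand-in for the neural net; all function names and parameters here are assumptions.

```python
import numpy as np

def haar_coeffs(signal, levels=3):
    """Multi-level Haar wavelet decomposition: return per-band
    detail-coefficient energies as a flat feature vector
    (a simple stand-in for the dual-tree wavelet matrix)."""
    x = np.asarray(signal, dtype=float)
    feats = []
    for _ in range(levels):
        approx = (x[0::2] + x[1::2]) / np.sqrt(2)  # low-pass half
        detail = (x[0::2] - x[1::2]) / np.sqrt(2)  # high-pass half
        feats.append(np.abs(detail).mean())        # signal energy in this band
        x = approx
    feats.append(np.abs(x).mean())                 # residual approximation energy
    return np.array(feats)

class NearestCentroid:
    """Toy classifier mapping feature vectors to word labels
    (stand-in for the neural net in the article)."""
    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.centroids = {c: np.mean([x for x, l in zip(X, y) if l == c], axis=0)
                          for c in self.labels}
        return self

    def predict(self, feats):
        return min(self.labels,
                   key=lambda c: np.linalg.norm(feats - self.centroids[c]))
```

In use, one would record labeled throat-signal windows for each word ("stop", "go", ...), fit the classifier on their feature vectors, and map each predicted label to an action such as a rover command.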
In the demonstration that we prepared, we controlled a small Mars rover. We took the words stop, go, left, and right, and sent them to the rover on a Mars terrain, and could direct the rover to go to different locations without any audible sound.
In another demonstration, we used the digits 0-9 and some of the control words, like go, to control a modified Web browser. We coded the alphabetic characters in terms of a number sequence, basically using a matrix so that the letter A, for example, would be 1,1 in the matrix, the letter B would be 1,2, and so on. This allowed us to spell out the word NASA in the Web browser. We then used the word “go,” and when it found the results for NASA, instead of the results being highlighted (like in a normal Web browser), we had the browser number them, so that each of the items came back as text with a number on the side. This way we could say, “go to 1, go to 3, or go to 4,” and start moving around and browsing the Web. What we were trying to illustrate with these demonstrations were two different ways of using subvocal speech: (1) to control a device and (2) to control an information source or a computer program.
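The row-column letter coding described above can be sketched in a few lines. Note the interview only specifies A = 1,1 and B = 1,2; the 5-column matrix width and the function names below are assumptions for illustration.

```python
import string

NUM_COLS = 5  # assumed matrix width; the interview only gives A=(1,1), B=(1,2)

def letter_to_cell(letter):
    """Map a letter to its (row, column) matrix coordinates, both 1-based."""
    i = string.ascii_uppercase.index(letter.upper())
    return i // NUM_COLS + 1, i % NUM_COLS + 1

def spell(word):
    """Spell a word as the sequence of digit pairs a subvocal user would 'say'."""
    return [letter_to_cell(ch) for ch in word]
```

With this assumed layout, spelling NASA means subvocalizing four digit pairs, one per letter, which the recognizer only ever has to match against the ten digits it already knows.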
NTB: How did you “train” the software to recognize speech patterns?
Jorgensen: The wavelet transform is used to create a vector of coefficients. We use about 50 vector elements that are fed to the neural net as inputs. So we have 50 examples of the different words that are then transformed and produce 50 vectors for each training sample. The training sample is fed to the neural net, which associates one of the input samples with an output category, which we assign. So the output category might be that one sample corresponds to when the person said, “Stop,” and another sample corresponds to when the person said, “Go.”
The network is trained to create a mathematical relationship between the pattern that it sees and the label that we choose for it. When it is trained properly by repeating this process many times, the error of the mapping that is produced is reduced to a level where we are then capable of giving it new examples that it hasn’t seen before. It will then be able to correctly map those new examples to the right categories.
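The training loop described above (repeatedly presenting labeled feature vectors and driving down the mapping error until the net generalizes to unseen examples) can be sketched with a minimal single-layer softmax network. The 50-element input vector comes from the interview; everything else here (the network architecture, learning rate, epoch count) is an assumption, not the actual NASA model.

```python
import numpy as np

def train_word_net(X, y, n_classes, epochs=500, lr=0.1, seed=0):
    """Train a one-layer softmax network: 50-element wavelet-coefficient
    vectors in, word-category probabilities out."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        err = probs - onehot                          # the mapping error
        W -= lr * X.T @ err / len(X)                  # reduce it by gradient descent
        b -= lr * err.mean(axis=0)
    return W, b

def predict(W, b, x):
    """Map a new, previously unseen feature vector to a word category."""
    return int(np.argmax(x @ W + b))
```

As in the interview, each output category is a label the experimenter assigns (e.g. category 0 = “Stop”, category 1 = “Go”), and after enough repetitions the net maps new samples of those words to the right categories.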
NTB: What was the success rate of recognition?
Jorgensen: It has varied since we first started. In the first report it was 92%, and now, for small numbers and words, we are up to 99% plus. We’ve added a much larger number of words and are taking a look not just at individual words, but at vowels and consonants, which are the building blocks from which the words are made. We’re using about 40 of those right now, and if we are successful in getting those recognition rates up high enough (we are currently somewhere in the 70s for those), we should be able to feed it directly into a full-blown speech recognition engine, so it wouldn’t have to learn individual words anymore. It would function much the same way as some of the auditory speech recognition systems work, and hopefully we would be able to use some of that technology.
NTB: How could NASA utilize this technology?
Jorgensen: There are basically three ways that we view NASA using this technology. The first is the case where we have either noisy environments or atmospheric conditions, such as the breathing gases that astronauts might be using, that change their acoustic patterns. Additionally, under conditions of low pressure, different gas mixes – much like you would have with a deep-sea diver – and microgravity, which changes the muscle responses all over the body, the voice sounds also change. This has caused communication difficulties and also difficulties for recognition with classical speech techniques. That is one of the first applications for the subvocal speech system.
The second application is the ability of techniques that map directly from the nervous system to a control response to give design engineers or space habitat engineers new options for how human beings can communicate with machines. We have demonstrated a similar technology for electromyographic signals that lets us type without a keyboard and fly an airplane without a joystick. The idea here is that the designers may not have to have specific workstations for all of the tasks that have to be accomplished. They may be able to have some generic workstations which are accessed electronically or neuro-electronically. That provides some flexibility in deciding how much weight can be saved by trading off hardwired types of interfaces, like knobs, dials, and handles, against neuro-electronic interfaces that could be anyplace in a building or on a device.
The third area deals with emergency safety. If there is physical injury, for example if the voicebox or arm is injured, and you don’t necessarily have a prosthetic available on Mars, we wanted to have an additional emergency method where someone could take the electrodes, or even the existing medical monitoring sensors that they are going to have, and use their electrical signals to access or control a device. An example would be a spacesuit that is pressurized and so stiff you can’t really type in it, but you could still move your fingers inside the gloves. Because we’re picking up the electrical signals being sent to the fingers, you could potentially perform typing operations that control something inside a station. If a two-man team of astronauts is outside the space station and needs to electronically change some code or enter something, they could have an additional safety system available to them for emergency use.
NTB: Does this system have any commercial applications?
Jorgensen: Quiet cell phones would be one commercial application; communication between divers is possibly another. Anyone who needs to use noisy haz-mat suits or work in high-noise environments could benefit from this technology. Another is environments where you want privacy, such as teleconferencing where you want to talk to just one person around the table. The neuro-electronic methods that we are discussing here pick up more than just the word patterns that you might produce subvocally. They can also identify who the speaker is, and track whether the speaker is tired, angry, happy, or sad, so we have a possibility (we have not done this) of speech enrichment as well as just communication.
NTB: What is the next step for this technology?
Jorgensen: One area that we are investigating is to see how much of a speech system we can generate. We are in the equivalent of the early stages of auditory speech recognition, where we only have one speaker and individual words. Ultimately you want to have multiple speakers and continuous speech.
The second thing concerns the sensors. We do not seriously intend for people to walk around with wires hanging from their throats and grounding lines on their hands. There are dry sensors that don’t require conductive gel, and we’ve begun to use those. We are also developing an entirely new type of sensor that doesn’t even have to touch the body, called a capacitive sensor. What we would like to do is embed those sensors in either clothing or some kind of simple appliance that would be very convenient for someone to have, so that the electrical signals would be picked up in a non-invasive and comfortable way and be available any time they wanted to use them for controlling a device.