
Our major research topics include:
- Speech processing, including speech recognition/understanding/information retrieval, speech synthesis,
voice conversion, and multi-modal dialogue interface.
- Acoustic signal processing, including microphone arrays, sound field control/reproduction, and
sound field coding.
- New speech media for speech-based universal communication,
such as quiet speech media (NAM: Non-Audible Murmur), and new speech separation algorithms
such as Blind Source Separation (BSS).
We are currently researching effective uses of speech for universal communication, multi-modal interfaces,
the use of sound/speech media in networks and communications, sound effects in the multimedia paradigm, and speech/sound
applications in real acoustic environments.
- Speech-based natural interface/information retrieval, statistical language modeling and acoustic modeling
for high-quality speech recognition/understanding.
- Robust speech recognition and dialog systems in real acoustical environments.
- Multi-modal speech interface, lip reading, visual agent for speech interface, speech dialogue with a robot,
and Web retrieval by speech recognition.
- Hands-free speech recognition using a microphone array, distant-talking speech recognition in reverberant environments,
speech enhancement by nonlinear array signal processing, and speech dialogue robots.
- Blind source separation, fast-learning algorithms for independent component analysis, and online sound-source separation
even for moving sound sources.
- Speech synthesis by rules, speech analysis-by-synthesis, voice conversion, and speech morphing.
- Sound field reproduction system, robust sound field reproduction in real acoustical environments,
multi-loud-speaker systems, 3-D sound field coding and virtual sound realization.
- Non-Audible Murmur (NAM) applied to non-voice speech recognition, very quiet telephones,
and aids for speech-handicapped people.
- Eli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano,
"Lip Movement Synthesis from Speech Based on Hidden Markov Models,"
Speech Communication, Vol.26, Nos.1-2, pp.105-115, 1998.
- Tetsuya Takiguchi, Satoshi Nakamura, Kiyohiro Shikano,
"HMM-Separation-Based Speech Recognition for a Distant Moving Speaker,"
IEEE Transactions on Speech and Audio Processing, Vol.9, No.2, pp.127-140, 2001.
- Takeshi Yamada, Satoshi Nakamura, Kiyohiro Shikano,
"Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array,"
IEEE Transactions on Speech and Audio Processing, Vol.10, No.2, pp.48-56, 2002.
- Akinobu Lee, Tatsuya Kawahara, Kazuya Takeda, and Kiyohiro Shikano,
"A new phonetic tied-mixture model for efficient decoding,"
Proc. ICASSP2000, pp.1269-1272, 2000.
- Yosuke Tatekura, Hiroshi Saruwatari, and Kiyohiro Shikano,
"An iterative inverse filter design method for the multichannel sound field reproduction system,"
IEICE Trans. Fundamentals, Vol.E84-A, No.4, pp.991-998, 2001.
- Hiroshi Saruwatari, Toshiya Kawamura, Tsuyoki Nishikawa, Kiyohiro Shikano,
"Fast-Convergence Algorithm for Blind Source Separation Based on Array Signal Processing,"
IEICE Trans. Fundamentals, Vol.E86-A, No.3, pp.634-639, March 2003.
- Tomoya Takatani, Tsuyoki Nishikawa, Hiroshi Saruwatari, Kiyohiro Shikano,
"High-Fidelity Blind Separation of Acoustic Signals Using SIMO-Model-Based Independent Component
Analysis," IEICE Trans. Fundamentals, Vol.E87-A, No.8, pp.2063-2072, August 2004.
- Hiroshi Saruwatari, Hiroaki Yamajo, Tomoya Takatani, Tsuyoki Nishikawa, and Kiyohiro Shikano,
"Blind separation and deconvolution for convolutive mixture of speech combining SIMO-model-based
ICA and multichannel inverse filtering," IEICE Trans. Fundamentals, Vol.E88-A, No.9, pp.2387-2400,
2005.
- Yoshitaka Nakajima, Hideki Kashioka, Nick Campbell, Kiyohiro Shikano, "Non-Audible
Murmur (NAM) Recognition," IEICE Trans. Information and Systems, Vol.E89-D, No.1, pp.1-8, 2006.
- Shigeki Miyabe, Hiroshi Saruwatari, Kiyohiro Shikano, Yosuke Tatekura, "Interface for
Barge-in Free Spoken Dialogue System Using Nullspace Based Sound Field Control and Beamforming,"
IEICE Trans. Fundamentals, Vol.E89-A, No.3, pp.716-726, 2006.
- Tobias Cincarek, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano, "Utterance-based
Selective Training for the Automatic Creation of Task-Dependent Acoustic Models,"
IEICE Trans. Information and Systems, Vol.E89-D, No.3, pp.962-969, 2006.
- Randy Gomez, Akinobu Lee, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano, "Improving
Rapid Unsupervised Speaker Adaptation based on HMM Sufficient Statistics in Noisy Environments
using Multi-template Models," IEICE Trans. Information and Systems, Vol.E89-D, No.3, pp.998-1005,
2006.
- Yoshimitsu Mori, Hiroshi Saruwatari, Tomoya Takatani, Satoshi Ukai, Kiyohiro Shikano,
Takashi Hiekata, Youhei Ikeda, Hiroshi Hashimoto, and Takashi Morita, "Blind Separation of Acoustic
Signals Combining SIMO-Model-Based Independent Component Analysis and Binary Masking,"
EURASIP Journal on Applied Signal Processing, Vol.2006, Article ID 34970, 17 pages, 2006.
- Panikos Heracleous, Tomomi Kaino, Hiroshi Saruwatari, Kiyohiro Shikano, "Unvoiced Speech
Recognition Using Tissue-conductive Acoustic Sensor,"
EURASIP Journal on Advances in Signal Processing, Vol.2007, Article ID 94068, 11 pages, 2007.
- Tomoki Toda, Keiichi Tokuda, "A Speech Parameter Generation Algorithm Considering Global
Variance for HMM-Based Speech Synthesis," IEICE Trans. Information and Systems, Vol.E90-D,
No.5, pp.816-824, May 2007.
- Tomoki Toda, Alan W Black, Keiichi Tokuda, "Voice Conversion Based on Maximum Likelihood
Estimation of Spectral Parameter Trajectory," IEEE Transactions on Audio, Speech and Language
Processing, Vol.15, No.8, pp.2222-2235, Nov. 2007.
- Tobias Cincarek, Hiromichi Kawanami, Ryuichi Nishimura, Akinobu Lee, Hiroshi Saruwatari,
Kiyohiro Shikano, "Development, Long-Term Operation and Portability of a Real-Environment
Speech-oriented Guidance System," IEICE Trans. Information and Systems, Vol.E91-D, No.3,
pp.576-587, March 2008.
The Speech and Acoustic Laboratory is equipped with the following:
- a multi-channel real-time AD/DA processing system (64 ch microphone array with PC cluster),
- a sound field reproduction facility with 112 ch AD/DA,
- a moving robot with a microphone array,
- a sound-proof acoustic experiment room and a robot dialog room,
- a PC cluster (more than 100 CPUs), file servers (more than 20 terabytes), and many PCs,
- real-world speech dialog systems (Takemaru-kun in Ikoma North Community Center,
Kita-chan and Kita-Robo at Gakken Kita-Ikoma railway station).
Grants and Funding (2007-2008):
- Ministry of Education, Culture, Sports, Science and Technology (MEXT) e-Society Project (Human-friendly speech interface).
- Grant-in-Aid for Scientific Research (A) from MEXT
(speech-based universal communications).
- Ministry of Internal Affairs and Communications (MIC) SCOPE project (Auditory reality,
manipulation and more).
- MIC SCOPE project (Speech morphing algorithm to aid speech-handicapped people).
- MEXT Grant-in-Aid for Young Scientists (A) (Voice conversion algorithm).
- Ministry of Economy, Trade and Industry (METI) NEDO project (Robot dialog).
- Research collaboration with industry and other partners: Toyota, Mega-chips, Asahi-Kasei, KDDI, NEC, KOBELCO,
Matsushita, Sony, Yamaha, Sanyo, Ikoma-shi, Hoshiden, Hitachi and more.