Digital Pattern Playback

Pattern playback, a device that converts a spectrographic representation back into a speech signal, was developed by Cooper and his colleagues at Haskins Laboratories in the late 1940s [1] and has contributed greatly to the rapid development of research in speech science [2-4]. Today, a modern pattern playback can easily be implemented with digital technology, which is valuable for pedagogical applications [5,6].

The following algorithm is based on the fast Fourier transform (FFT). In this algorithm, a time slice of a given spectrogram is treated as a logarithmic spectrum of that time frame, and the spectrum is converted back into the time domain by the inverse FFT as shown in Fig. 1. Because we are not reconstructing the original phase, we simply set the phase components to zero.

Fig. 1 Block diagram for the Digital Pattern Playback (FFT-based algorithm) [5].
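
The FFT-based procedure described above can be sketched in Python with NumPy. This is only a minimal illustration, not the authors' implementation: the function name `pattern_playback`, the dB scaling of the input, the Hann window, the centering of each frame, and the overlap-add step used to join frames are all assumptions that the text does not specify. The default parameters follow the values given for Fig. 2 (16 kHz sampling, 16 ms frames, 10 ms shift).

```python
import numpy as np

def pattern_playback(spec_db, fs=16000, frame_len_ms=16, frame_shift_ms=10):
    """Convert a log-magnitude spectrogram into a waveform (illustrative sketch).

    spec_db: 2-D array of shape (n_fft // 2 + 1, n_frames); each column is
             treated as the log spectrum (assumed here to be in dB) of one
             time frame, with row 0 at 0 Hz and the last row at fs / 2.
    """
    n_fft = int(fs * frame_len_ms / 1000)    # 256 samples for 16 ms at 16 kHz
    hop = int(fs * frame_shift_ms / 1000)    # 160 samples for 10 ms at 16 kHz
    n_bins, n_frames = spec_db.shape
    if n_bins != n_fft // 2 + 1:
        raise ValueError("spectrogram height must be n_fft // 2 + 1")

    out = np.zeros(hop * (n_frames - 1) + n_fft)
    window = np.hanning(n_fft)               # assumption: Hann window per frame

    for t in range(n_frames):
        # Undo the logarithm to obtain a linear magnitude spectrum.
        mag = 10.0 ** (spec_db[:, t] / 20.0)
        # Zero phase: the spectrum is purely real, so the inverse FFT
        # yields a symmetric pulse located at sample 0.
        frame = np.fft.irfft(mag, n=n_fft)
        # Assumption: move the pulse to the middle of the frame.
        frame = np.fft.fftshift(frame)
        # Assumption: join successive frames by overlap-add.
        start = t * hop
        out[start:start + n_fft] += window * frame
    return out
```

With a 10 ms frame shift, one zero-phase pulse is placed every 10 ms, which is consistent with the 100 Hz fundamental frequency reported for Fig. 2.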

Figure 2 (a) shows the spectrogram of an original speech signal (56 kB). From Fig. 2 (a), we obtained a reconstructed speech signal (76 kB) using the FFT-based algorithm. Figure 2 (b) is a simplified version of the original spectrogram (a), and Fig. 2 (c) is the spectrogram of a signal (76 kB) reconstructed from the simplified version using the FFT method. In this case, the sampling frequency was 16 kHz, the frame length was 16 ms, and the frame shift was 10 ms (corresponding to a fundamental frequency of 100 Hz).

Fig. 2 Spectrograms of the utterance “arayuru yasai-o kaikonda”: (a) original signal, (b) simplified version of (a), and (c) signal reconstructed from (b) using the FFT method.

Spectrograms (or any other images) can be converted into sound. To do so, select the image file in the upload form below and press the “Upload” button.
The digital pattern playback system then converts the image into a sound file.
The frequency range of the spectrogram should be 0-8 kHz,
and 200 horizontal pixels correspond to approximately 1 second.

Note 1: It takes about 5-10 seconds for conversion after pressing the “Upload” button.
Note 2: Currently, only the JPEG format is supported. The filename extension should be “.jpg” in lowercase letters. You can upload an image of up to XGA size (1024×768).
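
The axis conventions stated above (a vertical range of 0-8 kHz and roughly 200 horizontal pixels per second) can be expressed as a small Python sketch. The function names are hypothetical, and the image orientation (top row at 8 kHz, bottom row at 0 Hz) is an assumption; the page does not describe how the system itself maps pixels.

```python
PIXELS_PER_SECOND = 200.0   # from the text: about 200 horizontal pixels per second
F_MAX_HZ = 8000.0           # from the text: the vertical axis spans 0-8 kHz

def column_to_time(col):
    """Time in seconds at horizontal pixel index `col`."""
    return col / PIXELS_PER_SECOND

def row_to_frequency(row, height):
    """Frequency in Hz at vertical pixel index `row` of an image `height` pixels tall.

    Assumption: row 0 is the top of the image (8 kHz) and the last row is 0 Hz.
    """
    return F_MAX_HZ * (height - 1 - row) / (height - 1)

def image_duration(width):
    """Approximate duration in seconds of an image `width` pixels wide."""
    return width / PIXELS_PER_SECOND
```

Under these conventions, an image at the maximum XGA width of 1024 pixels corresponds to about 5.12 seconds of sound.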

[File selection and upload form]

Please download the spectrogram below to your computer. To save the image, right-click it and select “Save image as…”. Then upload the saved image file via the form.
Can you recognize what it says?

[1] http://www.haskins.yale.edu/haskins/MISC/PP/pp.html
[2] F. S. Cooper, A. M. Liberman and J. M. Borst, “The interconversion of audible and visible patterns as a basis for research in the perception of speech,” PNAS, 37, 318-325, 1951.
[3] F. S. Cooper, P. C. Delattre, A. M. Liberman, J. M. Borst and L. J. Gerstman, “Some experiments on the perception of synthetic speech sounds,” J. Acoust. Soc. Am., 24(6), 597-606, 1952.
[4] J. M. Borst, “The use of spectrograms for speech analysis and synthesis,” J. Audio Eng. Soc., 4, 14-23, 1956.
[5] T. Arai, K. Yasu and T. Goto, “Digital pattern playback,” Proc. Autumn Meet. Acoust. Soc. Jpn., 429-430, 2005.
[6] T. Arai, K. Yasu and T. Goto, “Digital pattern playback: Converting spectrograms to sound for educational purposes,” Acoust. Sci. & Tech., 27(6), 393-395, 2006.