Audio for SmartBadge version 4

This page describes some of the things which can be done with audio on the SmartBadge version 4.

Set the sample rate (for input and output) to 8Ksamples/second

To set the sampling rate:

	set-sample-rate 8000

Note that the sampling rate can only be set to certain values, so the actual rate is not necessarily exactly the rate you ask for:

	set-sample-rate -v
	Failed to set sample rate to 44100
	actual speed is 43200

	set-sample-rate -v -r 16000
	selected audio rate is 16000
	Failed to set sample rate to 16000
	actual speed is 16045

	set-sample-rate -v -r 8000
	selected audio rate is 8000
	Failed to set sample rate to 8000
	actual speed is 8022

See the set-sample-rate.c source code for details of the use of ioctl(dspFD, SNDCTL_DSP_SPEED, &ioctlParam), which does the actual setting of the sampling rate for the OSS audio driver.
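
To see how far the granted rates are from the requested ones, the two numbers printed above can be compared directly. The following is only a minimal sketch and is not part of set-sample-rate; the function name rate_error_ppm is made up for illustration. It uses integer shell arithmetic, so the result is truncated toward zero.

```shell
# rate_error_ppm REQUESTED ACTUAL
# Print the deviation of ACTUAL from REQUESTED in parts per million.
# (Hypothetical helper, not part of set-sample-rate.)
rate_error_ppm() {
    echo $(( ($2 - $1) * 1000000 / $1 ))
}
```

For the rates listed above:

	rate_error_ppm 8000 8022
	2750

	rate_error_ppm 44100 43200
	-20408

That is, the 8000 request comes out about 0.28% fast, while the 44100 request comes out about 2% slow.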

Simple output and input

Copy a file containing audio samples out to the audio device

	cp /opt/Badge4/support/badge_fs/hello_world/cwbeep /tmp/cwbeep

	dd if=/tmp/cwbeep bs=1024 count=4 of=/dev/sound/dsp

Record several seconds of input from the microphone

	dd if=/dev/sound/dsp bs=1024 count=20 of=/tmp/z2

Play several seconds of recorded audio

	dd if=/tmp/z2 bs=1024 count=20 of=/dev/sound/dsp
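
The count= value controls how much audio dd transfers. As a rough sketch, the number of 1024-byte blocks for a given duration can be worked out from the sample rate, the bytes per sample, and the number of channels. The helper name dd_blocks is hypothetical, and the figures in the usage example (16-bit mono at 8000 samples/second) are assumptions that have not been checked against the driver here.

```shell
# dd_blocks SECONDS RATE BYTES_PER_SAMPLE CHANNELS
# Print the number of 1024-byte blocks needed to hold SECONDS of audio,
# rounded up to a whole block. (Hypothetical helper for illustration.)
dd_blocks() {
    bytes=$(( $1 * $2 * $3 * $4 ))
    echo $(( (bytes + 1023) / 1024 ))
}
```

For example, under those assumptions five seconds of audio needs:

	dd_blocks 5 8000 2 1
	79

which could then be passed to dd as count=79 together with bs=1024.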

Concatenating audio samples

For a more complex example, consider the modification to Jef Poskanzer's "saytime - audio time hack for the SPARCstation", which resulted in saytime2.c.
The most important changes were switching from 8-bit uLaw samples (used by the Sun) to 16-bit linear samples (used by the badge) and setting the sample rate for the output to 8000. The complete set of changes is documented in the source file.

Setting the volume, recording source, etc.

"aumix" uses /dev/mixer by default. To make it work you have to either

make a symbolic link from /dev/mixer to /dev/sound/mixer, or
explicitly specify the mixer device on the command line.
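
The two options can be sketched as follows. The function name link_device is made up for illustration; it only creates the link when nothing already exists at that path, so it can be run more than once without error.

```shell
# link_device TARGET LINK
# Create the symbolic link LINK pointing at TARGET, unless something
# already exists at LINK. (Hypothetical helper for illustration.)
link_device() {
    if [ -e "$2" ] || [ -L "$2" ]; then
        return 0
    fi
    ln -s "$1" "$2"
}

# Option 1: give aumix the device name it expects (run as root on the badge):
#   link_device /dev/sound/mixer /dev/mixer
# Option 2: leave /dev alone and name the device on the command line:
#   aumix -d /dev/sound/mixer -q
```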

To find out the current settings:

	# aumix -d /dev/sound/mixer -q
	vol 65, 65
	bass 50, 50
	treble 50, 50
	line 88, 88, R
	mic 88, 88, R

You can adjust the volume by:

	# aumix -d /dev/sound/mixer -v 80:80

	# aumix -d /dev/sound/mixer -v 90:90

	# aumix -d /dev/sound/mixer -v 70:70

To find out all the options, run "aumix" without any arguments.
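
Because the -q output is plain text, a script can pick a single setting out of it. The sketch below reads text in the format shown above from standard input and prints the first value listed for the named channel. The function name mixer_level is hypothetical.

```shell
# mixer_level NAME
# Read "aumix -q" style output on stdin and print the first level
# listed for channel NAME. (Hypothetical helper for illustration.)
mixer_level() {
    awk -v name="$1" '$1 == name { sub(/,$/, "", $2); print $2; exit }'
}
```

With the settings listed above, this would be used as:

	# aumix -d /dev/sound/mixer -q | mixer_level vol
	65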

Playing an MP3 file

You can play an MP3 file by specifying its name:

	mpg123 /opt/Badge4/maguire/xxx.mp3

Where /opt/Badge4/maguire/xxx.mp3 is an MP3 file.

Alternatively, you can copy the MP3 file to /tmp and then play it from there, to avoid any problems with your network connectivity while playing it.
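
A minimal sketch of the copy-then-play approach follows. The function name play_from_tmp and the PLAYER variable are made up for illustration; PLAYER defaults to mpg123 and exists only so that a different player can be substituted.

```shell
# play_from_tmp FILE
# Copy FILE to /tmp and play the local copy, so that playback does not
# depend on the network once the copy has finished.
# (Hypothetical helper; PLAYER is an illustrative override.)
play_from_tmp() {
    local_copy="/tmp/$(basename "$1")"
    cp "$1" "$local_copy" && "${PLAYER:-mpg123}" "$local_copy"
}
```

For example:

	play_from_tmp /opt/Badge4/maguire/xxx.mp3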

Speech synthesis

Note that this section is based on e-mail I sent to [hp_badge4] with the subject: Text to Speech and temperature by voice on Mon, 29 Jul 2002 22:20:44 +0200 (MET DST).

If you would like to have text read to you, you can use the CMU Text To Speech synthesis programs:

  flite -- text message or a file or a URL
and
  flite_time -- generate a phrase version of the time value you give it.

There are systems such as CMU's Flite and University of Edinburgh's Festival:

"Flite (festival-lite) is a small, fast run-time synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative synthesis engine to Festival for voices built using the FestVox suite of voice building tools."

For further information see the web links from http://www-2.cs.cmu.edu/~awb/ and specifically: http://www.speech.cs.cmu.edu/flite/

The 16kHz sample rate version of Flite 1.1 is 5,900,220 bytes in size.

I downloaded the CMU Flite speech synthesis package (which is available from http://www.speech.cs.cmu.edu/flite/index.html). They already have a precompiled ARM Linux version (which they ported to the iPAQ). To run it on the badge, fetch the file and unpack it:

	zcat flite-1.1_bin16KHz_arm-linux.tar.gz | tar -xvf -
	# unpacks an executable "flite16k"

	# make the /dev/dsp link so that flite is happy
	ln -s /dev/sound/dsp /dev/dsp

	# with the 2.95.3 files unpacked as per "Using the GNU libraries" section
	# of the Badge4 Embedded Development Kit: Bastille Day Release document

	ln -s /opt/Badge4/2.95.3/arm-linux/lib/ld-linux.so.2 /lib/ld-linux.so.2

	# define a path to the usual gnu libraries
	LD_LIBRARY_PATH=:/lib:/opt/Badge4/2.95.3/arm-linux/lib
	export LD_LIBRARY_PATH

	# you can now run it: for example:
	./flite16k "Hi Chip! How are you? This is a test. 1, 2, 3, 4, 5 ..."

	# for something a bit fancier try:
	./flite16k "The time is now `date +'%A %d %B %Y at %H:%M'`"

There is also a program specifically for time announcements. Unpack it with:

	zcat flite-1.1_time_bin16KHz_arm-linux.tar.gz | tar -xvf -

and run it with:

	./flite_time `date +'%H:%M'` 
	The time is now, exactly twenty past seven, in the evening.

As their documentation says, it is a Scottish accent!

If you give the flite16k program a file name as an argument, it will read that file to you.

So I put the following text into a file "flite-description.txt".

Flite (festival-lite) is a small, fast run-time synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative synthesis engine to Festival for voices built using the FestVox suite of voice building tools. Flite 1.1-release is now released as source with a pre-built binary for ipaq Linux. Flite offers: Completely in C (no C++ or Scheme) for portability, size and speed Reimplentation of the core parts of the Festival architecture (HRG) allowing close compabilility between voices built for each system. Voices compiled into C (mostly automatically) from FestVox format voices Thread safe Scalable voice size with all data const so it can be in ROM Target architectures, ipaq (Linux/WinCE) and smaller Flite is in basically written and is in its first stages of testing before release, as free software. A small diphone voice based on the CMU KAL voice is included. along with a sample limited domain talking clock. Here are slides about Flite from a recent talk given by AWB slides.ps or slides.pdf . Here is a recent publication at the 4th ISCA Speech Synthesis Workshop by Alan W Black and Kevin A. Lenzo in html or postscript.

Then I had the system describe itself with:

	./flite16k flite-description.txt

While it is not a great rendering, it is reasonably intelligible.

You can also have the badge speak the temperature by using my new program report-temp-improved (file: http://www.it.kth.se/~maguire/report-temp-improved.c compiled with: arm-uclibc-gcc -o report-temp-improved report-temp-improved.c ).

First install the badge sensors:

	# modprobe badge4_sensorse

Then try it with:

	# ./flite16k "`report-temp-improved`"
	# ./flite16k "`report-temp-improved -c`"

For more fun try:

	# ./flite16k "The temperature is `report-temp-improved -c`"
	# ./flite16k "The temperature is `report-temp-improved`"   
	# ./flite16k "The temperature is `report-temp-improved -cF`"
	# ./flite16k "The temperature is `report-temp-improved -c`" 
	# ./flite16k "The temperature is `report-temp-improved -cFt`"
	# ./flite16k "The temperature is `report-temp-improved -ct`" 
	# ./flite16k "The temperature is `report-temp-improved -Ft`"

Yes, I know it is a very expensive talking temperature device; but it is pretty slick how easy it is to make such an application. The next obvious application is probably a program that runs in the background and invokes a script when the temperature exceeds a certain setpoint.
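
A rough sketch of such a background check is shown below. It assumes the reading has already been reduced to a plain number; the exact output format of report-temp-improved is not shown on this page, so that extraction step is left out. The function name over_setpoint, the setpoint of 30, the 60 second interval, and the get_reading and /tmp/too-hot.sh names in the comments are all placeholders.

```shell
# over_setpoint READING SETPOINT
# Succeed (exit status 0) when READING is strictly greater than SETPOINT.
# awk does the comparison so that decimal readings work.
# (Hypothetical helper for illustration.)
over_setpoint() {
    awk -v r="$1" -v s="$2" 'BEGIN { exit !((r + 0) > (s + 0)) }'
}

# Possible polling loop, left commented out because get_reading and
# /tmp/too-hot.sh are placeholders that do not exist on the badge:
#   while sleep 60; do
#       if over_setpoint "$(get_reading)" 30; then /tmp/too-hot.sh; fi
#   done &
```

Keeping the comparison in its own function means the setpoint logic can be tried out by hand (for example, over_setpoint 31.5 30 && echo hot) before it is wired to the sensor.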

To generate new voices for Flite, you can use the Festvox software, see http://festvox.org/

Speaker recognition

Interesting URLs:

http://www.cen.uiuc.edu/~romanows/proj/spkrrec02/spkrrec02d.html
http://herakles.imag.fr/besacier/Publis/icassp2000.pdf
http://www.cs.joensuu.fi/pages/tkinnu/research/ and the code is at: ftp://ftp.cs.joensuu.fi/franti/softat/speech/Hautamaki_erikoistyo.zip
see the very nice short report: http://www.ece.cmu.edu/~ee551/Final_Reports/Gr11.551.S00.pdf
http://www.pku.edu.cn/academic/xb/2001/_01e314.html claims that C_0 and C_1 are generally harmful to speaker recognition, while C_2 to C_16 are most useful for speaker recognition (using MFCC), while C_2 to C_12 were found to contain the most useful speech information.
COST250 reference system: http://www.speech.kth.se/cost250/refsys/latest/doc/
Tutorial: http://www.ee.columbia.edu/~patricia/papers/tutorials/tutorial.pdf