Voice Prompts - Creating your own
[Ref: Recording voice prompts with Audacity and SOX, Convert WAV to Raw format ]
Benoit Frigon’s Custom Voice Prompts is a very good guide to creating your own custom messages and voice prompts.
These notes mostly follow Benoit’s, with extra emphasis on the steps I did not follow correctly the first time, which left me with garbled/broken audio.
If you get garbled audio, check Benoit’s notes or work through these notes again.
Audacity is free, open source, cross-platform software for recording and editing sounds.
Audacity is available for Windows, Mac, GNU/Linux and other operating systems.
We will be using Audacity for these instructions, and a key configuration item will be to set the audio sampling correctly.
- Track Rate: 16000 Hz
- Project Rate: 16000 Hz
[Ref: Best Practices in Designing Speech User Interfaces]
Benoit’s howto describes recording the messages in Audacity. Unfortunately I didn’t have that luxury (read: no microphone), which led to a few other complications.
If you’re like me, you are going to just ‘dive in’ and start recording voice prompts/messages.
Good approach, as you get an idea of how to use voice prompts/messages.
Bad approach if it’s been running for a while and you haven’t reviewed your organisation’s goals and how your new Asterisk IVR can help you address them better.
Get the first batch of voice prompts/messages however you can, but plan for iterations/updates.
Getting the content
Get your voice messages however you can. When we first did this, a friend recorded the messages on her iPhone (AAC) and emailed them to me.
The better the original quality, the less post-processing/work you have to do to make it sound good.
- Record as a single stream of audio
Record the prompts as a single stream of audio, not several short audio clips. Recording separate clips seems to make more sense (because that’s how the output will be), but the post-processing workflow ends up being a lot more convenient if you record everything as a single stream.
Create a new Audacity project:
- Set the Project Audio Rate
- Import your audio
- Set the Track Audio Rate
- Removing Noise
- Slicing | Labelling
- Generate Audio Files
Project Audio Rate
[Ref: Changing Audio Sample Rate in Audacity]
Set the Project Rate to 16000 Hz.
The below image shows where you can set the Project Rate (Hz). Use your mouse and click on the drop-down box and select 16000.
Import your audio.
[Ref: Importing Audio]
Import the audio/music/announcement using the File –> Import menu, as in the diagram below.
Track Audio Rate
Set the audio track to:
- 16000 Hz
If the imported audio is stereo, convert it to mono with the menu action:
- Tracks –> Stereo Track to Mono
Note: The left-hand panel describes the audio format of the track. In our example, before the conversion it is:
- 32-bit float
After the conversion the panel will show a different format, as in the image below. Most importantly, it needs to show ‘Mono’.
To convert the mono track rate to 16000 Hz, select the audio track and the menu:
- Tracks | Resample …
- Choose 16000 Hz
If we’ve configured our system correctly, you should have at least the following configuration visible in your editor.
Noise, Unwanted audio
[Ref: Noise Reduction]
If, like mine, your unprofessional recordings come through with a lot of ‘pops’ and other distracting noises, then you need to clean them up.
- Highlight a ‘noisy’ part of the Audio Track that we can use as a “Noise Profile”
Select the Menus
- Effect |
- Noise Reduction |
- Get Noise Profile
Once you have your Noise Profile, you can select the sections of the audio track you wish to clean up and then select:
- Effect |
- Noise Reduction |
- Noise Reduce
Slicing | Labelling
[Ref: Splitting a recording for exporting as separate tracks]
Your audio is clean (or you’re going to dive in and try again later) and the next stage is to export the audio so you can convert to something Asterisk can use.
If you have multiple voice segments, then one method that works better for us is to cut and paste these audio segments into a single larger master audio track and label the track segments for later processing.
The following image indicates how to ‘label’ audio track segments/selections.
The new master audio track simplifies a few things about your audio.
Combine the best recordings.
There will always be some differences in the way your voice artist has recorded the audio.
Sometimes, it works better to combine separate recordings to get the best complete version.
Silence before and after the audio.
Use the master track to provide the ‘natural’ gap between voice segments.
Take into consideration that you may wish to combine the voice segments in different ways.
One file for the prompt
You may wish to generate a single file for the prompt.
The following image shows what your edit screen might look like with judicious naming and audio ‘gaps’ between segments.
- Use the naming convention you will use (e.g. ivr-select-9-to-return-to-main-menu)
The label you nominate will be the default filename used when exporting to a file.
It simplifies the workflow if you use a naming convention that works for you when the file is exported.
For example, the above ‘slice/labels’ will generate filenames such as:
- company 2.raw
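If your labels end up with spaces or capitals in them (as in the example above), one option is to tidy the exported filenames afterwards. The following is only a sketch: the helper names label_to_filename and rename_exports are made up for these notes, and the rule (lowercase, runs of anything other than letters and digits become a single hyphen) is an assumption about what a convenient convention looks like, not something Audacity or Asterisk requires.

```shell
# Hypothetical helper: turn a label such as "Company 2" into "company-2".
label_to_filename() {
    printf '%s' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Hypothetical helper: rename every exported .raw file in the current
# directory to the tidied form of its name. Existing files are never
# overwritten.
rename_exports() {
    for f in *.raw; do
        [ -e "$f" ] || continue            # no .raw files: nothing to do
        new="$(label_to_filename "${f%.raw}").raw"
        if [ "$f" != "$new" ] && [ ! -e "$new" ]; then
            mv -- "$f" "$new"
        fi
    done
}
```

Run rename_exports from the directory holding the exported files. Labels that already follow the convention, such as ivr-select-9-to-return-to-main-menu, are left as they are.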
Audio (silence) Gap
There is a ‘gap’ of silence between words when we speak. When we create prompts by combining separately spoken words, we need to consider what a ‘natural’ gap of silence is in our prompts.
Of course, you don’t have to worry about this if you record every prompt separately. But it is an interesting challenge that gives you a lot more flexibility in how you make use of the recordings you already have.
The gap you leave between different recordings depends on what sounds natural for the audio collection. Below are Benoit’s guidelines:
- 60 ms either side of prompt
- 100 ms at end of a sentence
The true gap will have to be validated, and some prompts have different silence requirements.
To validate the audio silence of your choice, you can copy/paste the audio you want to combine into another audio track and mute all other tracks. You can then play and adjust the audio until you’ve found the silence spacing that fits your prompts.
- Create a new audio track
- Copy/Paste your voice prompt into this track
- Mute other tracks
- Play/Edit the prompt as necessary
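The same experiment can be tried outside Audacity on headerless raw files, assuming they are signed 16-bit, mono, 16000 Hz (the export format these notes use). In that format silence is just a run of zero bytes, and each millisecond takes 16000 × 2 / 1000 = 32 bytes, so Benoit’s 60 ms and 100 ms figures come to 1920 and 3200 bytes. The function names below are invented for this sketch.

```shell
# Assumes headerless signed 16-bit mono PCM at 16000 Hz.
RATE=16000          # samples per second
BYTES_PER_SAMPLE=2  # 16-bit samples

# Print the number of bytes needed for N milliseconds of silence.
silence_bytes() {
    echo $(( $1 * RATE * BYTES_PER_SAMPLE / 1000 ))
}

# Write N milliseconds of silence (zero bytes) to standard output.
silence() {
    dd if=/dev/zero bs=1 count="$(silence_bytes "$1")" 2>/dev/null
}

# Join raw segments, placing GAP_MS of silence between each pair.
# Usage: join_prompt GAP_MS first.raw second.raw ... > combined.raw
join_prompt() {
    gap_ms=$1
    shift
    first=1
    for seg in "$@"; do
        [ "$first" -eq 1 ] || silence "$gap_ms"
        first=0
        cat -- "$seg"
    done
}
```

For example, join_prompt 100 press.raw nine.raw > press-nine.raw (hypothetical filenames) gives a file you can audition, then re-create with a different gap until it sounds natural.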
Generate Audio Files
[Ref: Export Multiple]
Once we’ve completed our edit process, we need to export the slices as separate audio files because that’s how we want to deal with it in Asterisk.
But before you make the conversion, save yourself anguish by making sure you have the track and project audio sampling configured correctly (as per the below pictures)
The conversion process is based on having the above/below sampling rates for both the track and project at 16000Hz.
We save the audio slices/segments to file by performing an Export. Start the Export by choosing the menu:
File –> Export multiple …
Because we wish to export by the ‘labels’ we’ve created above, ensure that the ‘Labels’ option in the dialog is selected (its button is filled in black).
From the [Export Multiple] dialog select:
- for the export format: select Other uncompressed files
- beside “Other uncompressed files” select Options… for the formatting options.
Inside the “Specify Uncompressed Options”, select:
- Header: RAW (header-less)
- Encoding: Signed 16 bit PCM
Click: [OK] to return to the Export Multiple Dialog.
Click: [Export] in the “Export Multiple Dialog” to export/generate audio files from our labelled selections.
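A quick sanity check on the exported files is to work out their playing time from their size. Assuming the export really is headerless signed 16-bit mono at 16000 Hz, each millisecond of audio takes 32 bytes, so a file that reports a wildly wrong duration was probably exported at a different rate or as stereo. The function name is made up for this sketch.

```shell
# Print the playing time in milliseconds of a headerless raw file,
# assuming signed 16-bit mono at 16000 Hz:
# 16000 samples/s * 2 bytes per sample = 32 bytes per millisecond.
raw_duration_ms() {
    echo $(( $(wc -c < "$1") / 32 ))
}

# Report every exported .raw file in the current directory.
for f in *.raw; do
    [ -e "$f" ] || continue        # no .raw files: nothing to report
    printf '%s\t%s ms\n' "$f" "$(raw_duration_ms "$f")"
done
```

Compare the figures with the length of each labelled selection in Audacity. A file that comes out at twice the expected length, for instance, suggests the track was still stereo.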
SoX reads and writes audio files in most popular formats and can optionally apply effects to them. It can combine multiple input sources, synthesise audio, and, on many systems, act as a general purpose audio player or a multi-track audio recorder. It also has limited ability to split the input into multiple output files.
All SoX functionality is available using just the sox command. To simplify playing and recording audio, if SoX is invoked as play, the output file is automatically set to be the default sound device, and if invoked as rec, the default sound device is used as an input source. Additionally, the soxi(1) command provides a convenient way to just query audio file header information.
SoX is available as a port/package on OpenBSD, and the simplest thing is to install it from there.
The script below is taken directly from Benoit Frigon’s post.
Please refer to that post for more details.
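mkdir -p alaw
mkdir -p ulaw
mkdir -p gsm
mkdir -p wav
mkdir -p sln16

for file in *.raw; do
    # Assumption: the output name is the input name without its .raw
    # suffix (the line defining file_out is missing from these notes).
    file_out="${file%.raw}"
    echo "converting $file_out..."
    # The 16 kHz signed 16-bit export is copied as-is for sln16.
    cp "$file" "sln16/$file_out.sln16"
    # Everything else is downsampled to 8 kHz mono.
    sox -t raw -r 16k -e signed-integer -b 16 -c 1 "$file" -t raw -r 8k -e a-law -c 1 "alaw/$file_out.alaw"
    sox -t raw -r 16k -e signed-integer -b 16 -c 1 "$file" -t raw -r 8k -e u-law -c 1 "ulaw/$file_out.ulaw"
    sox -t raw -r 16k -e signed-integer -b 16 -c 1 "$file" -t gsm -r 8k -c 1 "gsm/$file_out.gsm"
    sox -t raw -r 16k -e signed-integer -b 16 -c 1 "$file" -t wav -r 8k -c 1 "wav/$file_out.wav"
done
echo '---------------------------------------------------'
echo 'Done!'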
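After a conversion run it is worth checking that nothing was silently skipped. The sketch below assumes each converted file is named after its .raw source and sits in a directory named after its format (alaw/name.alaw and so on); the function name missing_outputs is invented for these notes.

```shell
# List every expected converted file that is missing or empty.
# Prints nothing when all formats exist for all .raw sources.
missing_outputs() {
    for src in *.raw; do
        [ -e "$src" ] || continue          # no .raw files: nothing to check
        base=${src%.raw}
        for fmt in alaw ulaw gsm wav sln16; do
            out="$fmt/$base.$fmt"
            [ -s "$out" ] || echo "$out"   # -s: exists and is not empty
        done
    done
}
```

Run missing_outputs in the directory holding the .raw files. Any path it prints is a conversion to repeat, for example because SoX on your system was built without support for that format.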
Asterisk 11.18.0
Asterisk*CLI> help file convert
Usage: file convert <file_in> <file_out>
    Convert from file_in to file_out. If an absolute path is not given, the default Asterisk sounds directory will be used.
    Example:
        file convert tt-weasels.gsm tt-weasels.ulaw
Make sure the audio results (from the above conversion) are what you need/expect.
The last thing you want is for your clients/users to listen to garbled messages.
Play the converted files back on your workstation.
Play the converted files in a test extension.
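For the workstation check, SoX’s play command can be used, but the headerless formats need to be told what they contain. The sketch below mirrors the options used in the conversion script; the function names sox_input_opts and play_prompt are invented for these notes, and playback assumes SoX can reach a working sound device.

```shell
# Print the SoX input options for a converted file, based on its suffix.
# Headerless formats need the rate, encoding and channels spelled out.
sox_input_opts() {
    case "$1" in
        *.sln16) echo "-t raw -r 16k -e signed-integer -b 16 -c 1" ;;
        *.alaw)  echo "-t raw -r 8k -e a-law -c 1" ;;
        *.ulaw)  echo "-t raw -r 8k -e u-law -c 1" ;;
        *.gsm)   echo "-t gsm" ;;
        *)       echo "" ;;    # wav and similar carry their own header
    esac
}

# Play one converted file through the default sound device.
play_prompt() {
    # The options are deliberately left unquoted so they split into words.
    play $(sox_input_opts "$1") "$1"
}
```

For example, play_prompt alaw/ivr-select-9-to-return-to-main-menu.alaw. If a file sounds too fast, too slow or like static, the options here probably do not match how the file was actually written.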