Science and technology

How to customise your voice assistant with the voice of your alternative

It could be onerous to search out significant items for family that have already got nearly all the things. My spouse and I’ve given our dad and mom “experiences” to strive one thing novel, akin to going to a themed restaurant or seeing a live performance, however as our dad and mom become older, it turns into harder. This 12 months was no exception—till I considered a means open supply might give them one thing actually particular.

What if after they request assist from a man-made intelligence (AI) voice assistant akin to Mycroft, my in-laws might get a particular greeting? I appeared on the current voice assistant APIs to see if one thing like this was already out there. There was one thing shut, however not precisely what I used to be in search of. My concept was to document their great-grandchildren talking a brief greeting that might play every time they push the button and earlier than the dialog with the voice assistant begins. The greeting can be one thing like:

“Good morning, Nana and Poppy. Today is December 25th. The time is 3:10 pm. The current temperature for Waynesboro is 47 degrees. The current temperature for Ocean City is 50 degrees.”

When they press the button, my in-laws would hear their great-grandchildren reporting the date, time, and temperature for his or her dwelling and their favourite trip spot. To make this a actuality, I needed to clear up a number of issues.

So many audio information…

The first downside was determining what phrases the voice assistant would wish to say. Thinking about all of the dates, instances, and temperatures that I would wish to cowl, I arrived at an inventory of 79 phrases. I despatched these directions to my nieces:

Please document the youngsters saying every line beneath. Sorry there are such a lot of. It’s okay to do that in a single setting with prompting them if it makes it simpler. I can edit the audio information and take care of most codecs, so none of that ought to be an issue. Just document utilizing your cellphone in no matter means is best.

Make certain that the youngsters say every line clearly and loudly. There ought to be a slight pause between every line to make enhancing simpler (prompting helps like “Repeat after me …”). That will make it simpler for me to cut these up into particular person sound information.

Whenever the button on the system is pushed, it’s going to reply with a random grandchild saying the right date/time/temperature, like:

“Good afternoon, Nana and Poppy. Today is January third. The time is one oh four pm. The current temperature for Waynesboro is thirty degrees. The current temperature for Ocean City is thirty four degrees.”

PLEASE RECORD EACH CHILD SAYING THE FOLLOWING PHRASES WITH A SHORT PAUSE BETWEEN EACH ONE:

Then I offered the next listing of phrases for the youngsters to document:

Good
morning
afternoon
night
night time

Nana and Poppy

The time
Today 
The present temperature for
Waynesboro
Ocean City
is
and
levels
minus

am
pm

January
February
March
April
May

June
July
August
September
October
November
December
first
second
third
fourth
fifth
sixth
seventh
eighth
ninth
tenth
eleventh
twelfth
thirteenth
fourteenth
fifteenth
sixteenth
seventeenth
eighteenth
nineteenth
twentieth
thirtieth
oh
one
two
three
4
5
six
seven
eight
9
ten
eleven
twelve
13
fourteen
fifteen
sixteen
seventeen
eighteen
nineteen
twenty
thirty
forty
fifty
sixty
seventy
eighty
ninety
hundred

My nieces are doubly blessed with youngsters beneath 10 years previous and near-infinite endurance. So, after a few months of prodding, I obtained a three-minute audio file for every little one.

Now my downside was how one can edit them. I wanted to normalize the recordings, cut back noise, and chop them into audio clips for particular person phrases and phrases. I additionally needed to benefit from lossless audio, and I made a decision to transform the tracks to Waveform Audio File Format (WAV). Audacity was simply the open supply device to do all of that.

Audacity to the rescue!

Audacity is a feature-rich open supply sound-editing device. The software program’s options and capabilities could be overwhelming, so I will describe the workflow I adopted to perform my targets. I make no claims to being an Audacity knowledgeable, however the steps I adopted appeared to work fairly effectively. (Comments are at all times welcome on how one can enhance what I’ve finished.)

Audacity has downloads for Linux, Windows, and macOS. I grabbed the newest macOS binary and rapidly put in it on my laptop computer. Launching Audacity opens an empty new venture. I imported all the youngsters’s audio information utilizing the Import function.

Normalizing audio information

Some of the youngsters spoke louder than others, so the assorted audio information had completely different quantity ranges. I wanted to normalize the audio tracks in order that the greeting’s quantity can be the identical no matter which little one was talking. To normalize the volumes, I started by deciding on all the audio tracks after they have been imported.

To normalize the youngsters’s peaks and valleys, so one little one wasn’t louder than the opposite, I used Audacity’s Normalize impact.

It’s vital to know that the Normalize and Amplify results do very various things. Normalize adjusts the very best peaks and lowest valleys for a number of tracks, so they’re all related, whereas Amplify exaggerates the prevailing peaks and valleys. If I had used Amplify as an alternative of Normalize, the louder little one would have change into even louder. I used the default settings to normalize the 2 audio tracks.

Remove background noise

Another factor I seen is that there was noise between the spoken phrases on the tracks. Audacity has tooling to assist cut back background noise and end in a lot cleaner audio. To cut back noise, choose a pattern of an audio observe with background noise. I used the View->Zoom menu choice to see the observe’s noise extra simply.

To make sure that I chosen solely the background noise, I listened to the chosen audio clip utilizing the Play button within the toolbar. Next, I chosen Effect->Noise Reduction.

Then I created a Noise Profile utilizing step 1 within the Noise Reduction dialog.

Audacity characterizes the background noise within the audio pattern in order that it may be eliminated. To take away the background noise, I chosen the whole audio observe by urgent the small Select button to the left of the observe.

I utilized the Noise Reduction impact once more, however this time I pressed OK in step 2 of the dialog. I accepted the default settings.

I repeated these steps for every kid’s audio observe, so I had normalized audio tracks, and the background noise was characterised and eliminated.

Export the clips as WAV information

The remaining activity was to zoom and scroll by way of every observe and export the particular clips as separate audio information in WAV format. When working with one kid’s observe, I wanted to mute the opposite tracks utilizing both the small Mute button to the left of every audio observe or, since there have been so many tracks, deciding on the Solo button for the observe I needed to work with.

Selecting every phrase and phrase could be tough, however the means to zoom into an audio observe was my good friend. I attempted to set every audio clip’s begin and finish to only earlier than and simply after the phrase or phrase being spoken. Before exporting any audio clips, I performed the chosen clip utilizing the Play icon on the toolbar to ensure I received all of it.

One fascinating factor is how waveforms map to spoken phrases. The waveforms for “six” and “sixth” are extremely related, with the latter having a smaller audio waveform to the suitable for the “th” sound. I rigorously examined every clip earlier than exporting it to ensure I had captured the complete phrase or phrase.

After deciding on an audio clip for a phrase or phrase, I exported the chosen audio utilizing the File->Export menu.

I had to ensure to avoid wasting every clip utilizing the right file identify from the listing of phrases and phrases. This is as a result of the appliance I used to customise the voice assistant expects the file identify to match an entry within the phrase listing.

The anticipated file names for the audio clips (with out the .wav extension) are listed beneath. Note the underscores throughout the phrases. If you are doing this venture, alter the daring file names to match your family members’ nicknames and placement preferences. You’ll additionally need to make the identical modifications within the software supply code.

good
morning
afternoon
night
night time
nana_and_poppy
the_time
right this moment
the_current_temperature_for
waynesboro
ocean_city

is
and
levels
minus
am
pm
january
february
march
april
could
june
july
august
september
october
november
december
first
second
third
fourth
fifth
sixth
seventh
eighth
ninth
tenth
eleventh
twelfth
thirteenth
fourteenth
fifteenth
sixteenth
seventeenth
eighteenth
nineteenth
twentieth
thirtieth
oh
one
two
three
4
5
six
seven
eight
9
ten
eleven
twelve
13
fourteen
fifteen
sixteen
seventeen
eighteen
nineteen
twenty
thirty
forty
fifty
sixty
seventy
eighty
ninety
hundred

This venture’s GitHub repository additionally features a Bash script to run as a sanity verify for any lacking or misnamed information.

After selecting every clip’s acceptable identify, I saved the clip within the kid’s particular folder (child1, child2, and so on.) as a WAV format file. I accepted the default export settings.

After exporting all of the audio clips, I had a folder for every little one that was absolutely populated with WAV information for the phrases above. This looks like a variety of work, however it took solely about 90 minutes for every little one, and I received far more environment friendly with every successive audio clip.

Package the appliance

Now that I had the audio clips for the greeting, I wanted to consider the appliance and how one can package deal it. I additionally needed an open source-friendly resolution that was open to modification.

About two years in the past, a colleague gave me a Google AIY Voice Kit that he grabbed from the clearance bin for simply $10. It’s a cleverly folded field containing a speaker, microphone, and customized circuit board. You provide a Raspberry Pi and rapidly have a do-it-yourself Google voice assistant. These kits can be found for buy on-line and in electronics shops. This small field supplied a simple method to package deal the venture.

Customize the voice assistant

The Google package features a Python API and several other Python modules. I adopted the package’s directions to get the preliminary configuration working. The Google Assistant gRPC software program is open supply beneath an Apache 2.zero license.

I tailored the Google Assistant gRPC demo to implement my software. The software’s operation is pretty easy: First, it waits for the system’s button to be pressed. The code then constructs 4 separate phrase lists for: 1. the greeting and date, 2. the present time, three. the present temperature of the primary location, and four. the present temperature of the second location. The youngsters’s voices are randomly shuffled, after which every thesaurus is used to play the audio clips equivalent to the kid assigned to that listing. (This is why it was vital to strictly comply with the naming conference for the audio clips.) The software then initiates a dialog with the Google Assistant API.

At first, I believed the code to collect climate information for the present temperature and convert numbers to phrases can be difficult. This proved to not be the case in any respect. In truth, current open supply Python modules made all of it easy and intuitive.

There have been two instances to be addressed for changing numbers to phrase lists: I wanted to transform ordinal numbers to phrases (e.g., 1 and a couple of to first and second), and I additionally wanted to transform cardinal numbers to phrases (e.g., 28 to twenty-eight). The open supply inflect.py module has features that deal with each instances fairly simply.

import inflect

p = inflect.engine()
quantity = 23

# get a cardinal thesaurus (e.g. ['twenty', 'three'])
print(p.number_to_words(quantity).change('-', ' ').break up(' '))

# get an ordinal thesaurus (e.g. ['twenty', 'third'])
print(p.number_to_words(p.ordinal(quantity)).change('-', ' ').break up(' '))

The inflect engine returns string representations of the numbers with embedded hyphens (e.g., twenty-three) in order that the code splits the strings into variable-length phrase lists by changing the hyphens to areas and splitting the string into an inventory utilizing an area because the delimiter.

The subsequent downside to resolve was getting the present temperature for the 2 areas. Open Weather Map provides a free-tier climate service that permits as much as 60 calls a minute or 1 million calls a month, which is far more than this venture wants. I signed up for the free-tier service and obtained an API key. It was very straightforward to entry the service through the use of the open supply Python wrapper module PyOWM. Here is a simplified code snippet:

import pyowm

# use the OpenWeatherMap API key to get a climate supervisor
owm = pyowm.OWM('YOUR-OPEN-WEATHER-MANAGER-API-KEY')
mgr = owm.weather_manager()

# convert location phrase to metropolis for OWM API
# e.g. ocean_city turns into 'Ocean City, US'
location = 'ocean_city'
metropolis = location.change('_', ' ').title() + ', US'

# get present temperature in fahrenheit for location
commentary = owm_mgr.weather_at_place(metropolis)
temp = spherical(commentary.climate.temperature('fahrenheit')['temp'])

Wrapping it up with a bow

The full supply code for the venture is out there in my GitHub repository. The venture features a systemd service unit file tailored from Google’s demo to routinely begin the appliance on system boot. The GitHub repository contains directions to put in the Python modules and configure the systemd service.

I created a short video of the consequence. Five customized voice assistants have been distributed through the holidays: one every for the nice grandparents and grandparents of every little one. For some, these items introduced tears of pleasure. The youngsters’s voices are completely lovable and these containers seize a fleeting second of childhood that may be loved for a really very long time.

Most Popular

To Top