
Voice Recognition Bakery Solution | by KaiChin Huang | Jun, 2021


Project set up

I followed the setup guide for my Voice Kit, including the API credentials for Google Speech-to-Text.

Scraping data

Here is where I get all my data. I use BeautifulSoup 4 to retrieve the necessary information, such as Date, Quantity, Product, Customer, and DayRef, and store it in a pandas DataFrame. I also clean the data here by converting all DayRef cells that display '#N/A' to 'yesterday', lowercasing all the letters, and removing parentheses.
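The cleaning steps above can be sketched with pandas. The rows and values here are made up for illustration; only the column names follow the article.

```python
import pandas as pd

# Hypothetical scraped rows; column names follow the article's Orders DataFrame.
orders = pd.DataFrame({
    "Date": ["2021-06-01", "2021-06-01"],
    "Quantity": [38, 20],
    "Product": ["Morning Bun", "Mini Croissant (frozen)"],
    "Customer": ["Scout", "Novo"],
    "DayRef": ["#N/A", "today"],
})

# Replace '#N/A' markers with 'yesterday' and lowercase the column.
orders["DayRef"] = orders["DayRef"].replace("#N/A", "yesterday").str.lower()

# Lowercase names and strip any parenthesized suffixes.
for col in ["Product", "Customer"]:
    orders[col] = (orders[col]
                   .str.lower()
                   .str.replace(r"\s*\([^)]*\)", "", regex=True)
                   .str.strip())
```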

All the data is stored in one Orders dataframe.

Orders Dataframe

Transform Speech to Text

I call the Google Speech-to-Text API to convert the speech to text. However, this project involves many product names and customer names that aren't standard English words, so I pass these non-standard names to the API as hint words so it can recognize them better.
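Hint words are passed through the API's speech contexts. Below is a sketch of the request payload in the shape the Speech-to-Text REST API accepts; the entity names are illustrative stand-ins for the lists extracted from the Orders data.

```python
# Illustrative entity lists; the real ones come from the Orders DataFrame.
product_names = ["morning bun", "mini croissant", "bacon date scone"]
customer_names = ["scout", "novo", "cornercopia"]

# speechContexts biases recognition toward these non-standard words.
request = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        "speechContexts": [{"phrases": product_names + customer_names}],
    },
    "audio": {"content": "<base64-encoded audio>"},
}
```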

Besides, if the user mentions a number in the question, the recognition is sometimes smart enough to render it as Arabic numerals, but sometimes it isn't, so I also need to handle that situation by converting all numbers to Arabic numerals.
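A minimal normalization helper could look like the following. This is a hypothetical sketch covering only small counts, not a full number parser.

```python
# Word-to-digit lookup for spoken quantities; enough for small order counts.
WORD_NUMS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
    "eleven": "11", "twelve": "12", "twenty": "20", "thirty": "30",
}

def normalize_numbers(text: str) -> str:
    """Lowercase the query and replace spelled-out numbers with digits."""
    return " ".join(WORD_NUMS.get(w, w) for w in text.lower().split())

print(normalize_numbers("Who gets ten plain croissants today?"))
# → who gets 10 plain croissants today?
```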

The system is triggered by "Hey Google" followed by a question, and it terminates when the user says "Goodbye!"

Intent Classification Model

Currently the model classifies a query into one of four categories:

  • customerOrder: what is scout order for tomorrow?
  • productOrder: what is mini croissant order for tomorrow?
  • who: Who gets 10 plain croissants today?
  • quantity: How many baguettes does Novo get today?

Since I only got a few questions from the client, I generate my own. I extract all product names, customer names, and times, and combine a list of common prefixes with random entity names to generate questions. Each category has 500 training questions.

Common Prefix for customerOrder:
['what is ', 'tell me about ', 'I want to know ', 'do you know ']
Common Prefix for productOrder:
['what is ', 'tell me about ', 'I want to know ', 'do you know ', 'can you tell me ']
Common Prefix for who:
['who ', 'tell me who ', 'I want to know who ', 'do you know who ']
Common Prefix for quantity:
['how many ', 'tell me how many ']
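The generation step can be sketched as follows for the customerOrder category. The entity lists are illustrative; in the article they are extracted from the Orders data.

```python
import random

random.seed(0)  # deterministic for the example

# Illustrative entities; the real lists come from the scraped Orders data.
customers = ["scout", "novo", "cornercopia"]
times = ["today", "tomorrow", "yesterday"]
prefixes = ["what is ", "tell me about ", "I want to know ", "do you know "]

def generate_customer_order_questions(n: int = 500) -> list:
    """Combine a random prefix with random entity names to build questions."""
    return [
        random.choice(prefixes)
        + f"{random.choice(customers)} order for {random.choice(times)}?"
        for _ in range(n)
    ]

questions = generate_customer_order_questions()
```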

I have used the Multinomial Naive Bayes algorithm for prediction because I find it easy to implement and it has high accuracy.

The OneVsRest strategy can be used for multi-label learning, where a classifier predicts multiple labels per instance. Naive Bayes supports multi-class classification, but since we are in a multi-label scenario, we wrap Naive Bayes in the OneVsRestClassifier.
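A scikit-learn sketch of this setup is shown below, trained on a handful of stand-in questions rather than the 500 generated per category.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny stand-in for the generated training questions (500 per category).
questions = [
    "what is scout order for tomorrow",
    "tell me about novo order for today",
    "what is mini croissant order for tomorrow",
    "do you know baguette order for today",
    "who gets 10 plain croissants today",
    "tell me who gets 5 morning buns tomorrow",
    "how many baguettes does novo get today",
    "tell me how many morning buns scout gets today",
]
labels = ["customerOrder", "customerOrder", "productOrder", "productOrder",
          "who", "who", "quantity", "quantity"]

# Bag-of-words features feeding Multinomial Naive Bayes, wrapped in OneVsRest.
clf = make_pipeline(
    CountVectorizer(),
    OneVsRestClassifier(MultinomialNB()),
)
clf.fit(questions, labels)
print(clf.predict(["how many croissants does scout get today"])[0])
```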

Entity Extraction

At first I was thinking of using spaCy to train a custom model for entity extraction, but I found it isn't necessary: since our data set is small, I can retrieve all the possible customer and product names and check whether the query contains one of them. However, this method sometimes fails when the speech recognition mistranscribes a name.

Input: How many bacon date scone does cornercopia get today?
Recognized: How many bacon date scone does cornucopia get today?

Here cornercopia is the customer name, but sometimes the speech recognition will detect it as cornucopia.

To solve this problem, I use the gestalt pattern matching algorithm. The idea is to find the longest contiguous matching subsequence that contains no "junk" elements; these "junk" elements are ones that are uninteresting in some sense, such as blank lines or whitespace.

For example, given a query and a list of all possible customer names, if we want to extract a customer name from the query, we first determine the length of the customer name, then find all n-grams in the query where n is the length of the customer name. We then compute the similarity between the customer name and these n-grams with the gestalt pattern matching algorithm. If the similarity score is higher than a certain threshold (default 0.9), we extract the entity from the query.
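Python's `difflib.SequenceMatcher` implements gestalt pattern matching, so the n-gram lookup can be sketched like this. Note that "cornucopia" vs. "cornercopia" scores roughly 0.86, so this sketch passes a threshold of 0.8 rather than the article's 0.9 default to make the example match.

```python
from difflib import SequenceMatcher

def extract_entity(query, entities, threshold=0.8):
    """Return the entity whose same-length n-gram in the query is most similar."""
    words = query.lower().replace("?", "").split()
    best_entity, best_score = None, 0.0
    for entity in entities:
        n = len(entity.split())
        for i in range(len(words) - n + 1):
            ngram = " ".join(words[i:i + n])
            # SequenceMatcher ratio is the gestalt pattern matching score.
            score = SequenceMatcher(None, entity, ngram).ratio()
            if score >= threshold and score > best_score:
                best_entity, best_score = entity, score
    return best_entity

customers = ["scout", "novo", "cornercopia"]
print(extract_entity("How many bacon date scone does cornucopia get today?", customers))
# → cornercopia
```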

Find the correct data to Respond

After the model classifies the query into one of the four buckets, the program extracts the entities from the query. If it can't extract any entity, it responds with "I do not understand, please ask me another question"; otherwise it creates filters based on these entities and retrieves the correct result from the DataFrame.

Q: what is scout order for tomorrow?
Intent: customerOrder
Entity: {Time: tomorrow, Customer: Scout}
Response: Scout gets 38 morning bun 20 mini croissant 15 ham and cheese croissant 15 chocolate croissant tomorrow.
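A customerOrder lookup like the one above can be sketched as a DataFrame filter. The orders data and the helper name here are hypothetical, chosen to reproduce the example response.

```python
import pandas as pd

# Toy Orders data mirroring the article's columns (values illustrative).
orders = pd.DataFrame({
    "Customer": ["scout", "scout", "novo"],
    "Product": ["morning bun", "mini croissant", "baguette"],
    "Quantity": [38, 20, 15],
    "DayRef": ["tomorrow", "tomorrow", "today"],
})

def answer_customer_order(customer: str, time: str) -> str:
    """Filter the Orders DataFrame by the extracted entities and format a reply."""
    rows = orders[(orders["Customer"] == customer) & (orders["DayRef"] == time)]
    if rows.empty:
        return "I do not understand, please ask me another question"
    items = " ".join(f"{q} {p}" for q, p in zip(rows["Quantity"], rows["Product"]))
    return f"{customer.capitalize()} gets {items} {time}."

print(answer_customer_order("scout", "tomorrow"))
# → Scout gets 38 morning bun 20 mini croissant tomorrow.
```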
