How to build your own AI-powered voice assistant

Ever wondered how Google Assistant and Siri can talk to us almost like humans do? This is the magic of deep learning.

Data Flow Diagram For Voice Assistant

The diagram above gives an overview of what happens inside a voice assistant.
I will first explain each process in depth and then, at the end, summarise the entire process with the help of an example.
To understand how the processes coordinate and to visualize the flow of data, let's walk through the whole process with an example.

Suppose you ask the voice assistant, “Who is Shahrukh Khan?” Before any processing can be done to understand your command, your voice (i.e., the audio) must first be converted into text. This step is called speech-to-text. Intent classification and entity recognition are then performed on the generated text.

Intent classification can be thought of as mapping a query to the action that the voice assistant needs to perform. In our case, the predicted intent could be ‘Search’, which means we want to search for something.
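
To make the idea of mapping a query to an action concrete, here is a minimal sketch. It uses simple keyword matching as a stand-in for a trained deep learning classifier, so it only illustrates the input and output of the step. The intent names other than ‘Search’, the keyword lists, and the function name classify_intent are all assumptions made for illustration.

```python
import re

# Illustrative intents and trigger words (assumed for this sketch); a real
# assistant would learn this mapping from labelled queries instead.
INTENT_KEYWORDS = {
    "Search": {"who", "what", "where", "when", "search", "find"},
    "PlayMusic": {"play", "song", "music"},
    "SetAlarm": {"alarm", "wake", "remind"},
}


def classify_intent(query: str) -> str:
    """Return the intent whose keywords match the most words in the query."""
    words = re.findall(r"[a-z]+", query.lower())
    scores = {
        intent: sum(word in keywords for word in words)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best_intent = max(scores, key=scores.get)
    # Fall back to "Unknown" when no keyword matched at all.
    return best_intent if scores[best_intent] > 0 else "Unknown"


print(classify_intent("Who is Shahrukh Khan"))  # Search
```

A learned model would replace the keyword table, but the contract stays the same: text in, one intent label out.
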
The next step after intent classification is entity recognition. In this step, we identify the entities in the sentence. In our example, ‘Shahrukh Khan’ is an entity that can be categorized as a person.
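
As a rough illustration of this step, the sketch below finds runs of capitalized words and looks them up in a small table of known names. This is a toy heuristic, not how a production assistant does it (those typically use a trained sequence-labelling model). The lookup table, the type labels, and the function name extract_entities are hypothetical.

```python
import re

# Hypothetical lookup table mapping known names to entity types.
GAZETTEER = {"shahrukh khan": "PERSON", "mumbai": "LOCATION"}

# Capitalized words that often open a query but are not names.
QUESTION_WORDS = {"Who", "What", "Where", "When", "How", "Is"}


def extract_entities(text: str) -> list:
    """Return (name, type) pairs for runs of capitalized words in the text."""
    spans = re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)
    entities = []
    for span in spans:
        # Drop question words such as the leading "Who".
        words = [word for word in span.split() if word not in QUESTION_WORDS]
        if not words:
            continue
        name = " ".join(words)
        entities.append((name, GAZETTEER.get(name.lower(), "UNKNOWN")))
    return entities


print(extract_entities("Who is Shahrukh Khan"))  # [('Shahrukh Khan', 'PERSON')]
```

Names missing from the table come back with the type "UNKNOWN", which is where a learned model would do better than a fixed list.
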
Now we know that the intent is ‘Search’ and the entity is ‘Shahrukh Khan’. From this, the voice assistant can figure out that it needs to search for information about Shahrukh Khan and convey what it finds to the user. This is what happens when predicting the response. In the final step, the response, which is in text form, is converted to speech and the audio is played to the user. This is how the entire process works.
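
The flow described above can be sketched end to end as a chain of five functions. Every stage here is a deliberately trivial stub so that the example runs without any audio hardware or models: the "audio" is just the transcript encoded as bytes, and no real search is performed. All function names and the response wording are assumptions made for illustration.

```python
def speech_to_text(audio: bytes) -> str:
    # Stub: pretend the audio bytes already hold the transcript.
    return audio.decode("utf-8")


def classify_intent(text: str) -> str:
    # Stub: treat questions starting with "who" or "what" as a search.
    words = text.split()
    first_word = words[0].lower() if words else ""
    return "Search" if first_word in {"who", "what"} else "Unknown"


def extract_entity(text: str) -> str:
    # Stub: take whatever follows " is " as the entity.
    if " is " not in text:
        return ""
    return text.split(" is ", 1)[1].strip(" ?")


def predict_response(intent: str, entity: str) -> str:
    # Stub: a real assistant would run the search and summarise the result.
    if intent == "Search" and entity:
        return f"Here is what I found about {entity}."
    return "Sorry, I did not understand that."


def text_to_speech(text: str) -> bytes:
    # Stub: pretend the encoded text is the synthesized audio.
    return text.encode("utf-8")


def run_assistant(audio: bytes) -> bytes:
    """Pass the user's audio through every stage and return the reply audio."""
    text = speech_to_text(audio)
    intent = classify_intent(text)
    entity = extract_entity(text)
    response = predict_response(intent, entity)
    return text_to_speech(response)


print(run_assistant(b"Who is Shahrukh Khan"))
```

Because each stage only depends on the output of the previous one, any stub can be swapped for a real model without touching run_assistant.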

