The first Alexa-enabled smart speaker, the Amazon Echo, debuted in 2014. It was a novel device: a voice-activated natural language interface that could perform a variety of tasks.
Today, the internet-connected platform has expanded rapidly into an electronic ecosystem. Alexa is nearly ubiquitous as a virtual assistant, with thousands of Alexa-enabled devices and millions of units sold.
Yet although Alexa can now be found in everything from TVs and microwaves to headphones, Amazon's vision of ambient computing is still in its early stages. Despite many advances in artificial intelligence and natural language processing, a vast amount of work remains to be done.
Amazon hopes these devices will eventually understand and support users as effectively as human assistants. That, however, will require significant advances in many areas, including contextual decision-making and reasoning.
For a deeper look at Alexa's potential and at ambient computing, I interviewed Rohit Prasad, Senior Vice President and Chief Scientist of Alexa, to learn more about Amazon's plans for the platform and the future of the virtual assistant.
Richard Yonck: Alexa is sometimes described as "ambient computing." What are some examples of ambient AI use cases?
Rohit Prasad: Ambient computing refers to technology that is there when you need it and recedes when you don't. It anticipates your needs and simplifies your life without becoming intrusive. Alexa is a great example of this. With Routines you can automate functions in your home, such as turning off your lights when it gets dark, and with Alexa Guard, Alexa can notify you when it hears sounds like glass breaking or a smoke alarm.
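The trigger-and-action pattern behind features like Routines and Guard can be sketched in a few lines. This is a minimal illustration, not Amazon's implementation; the `Routine` class, event shape, and action names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Routine:
    """A hypothetical routine: a named trigger condition mapped to an action."""
    name: str
    trigger: Callable[[dict], bool]   # predicate over a sensor event
    action: str                       # action dispatched when the trigger fires

def evaluate(routines: list, event: dict) -> list:
    """Return the actions of every routine whose trigger matches the event."""
    return [r.action for r in routines if r.trigger(event)]

# Illustrative routines echoing the examples above: lights at dusk, Guard-style alerts.
routines = [
    Routine("lights at dark", lambda e: e.get("lux", 1000) < 10, "turn_off_lights"),
    Routine("glass-break alert", lambda e: e.get("sound") == "glass_break", "notify_owner"),
]

print(evaluate(routines, {"lux": 3}))                # ['turn_off_lights']
print(evaluate(routines, {"sound": "glass_break"}))  # ['notify_owner']
```

The point of the sketch is that each automation is declarative data (a condition plus an action), so new routines can be added without changing the dispatch logic.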
Yonck: In your most recent CogX presentation, you mentioned Alexa "getting into reasoning and autonomy on your behalf."
Prasad: Today we have features such as Hunches, where Alexa suggests actions to take when it detects something anomalous. That could mean alerting you that your garage door is open when you go to bed, or reordering ink cartridges when your printer is running low. And Ring Video Doorbell Pro owners can now have Alexa greet visitors, offer directions, or take messages.
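At its core, a "hunch" is a judgment that a device's current state is unusual relative to its history at that time of day. The following toy detector makes that concrete; the frequency-based model, the `HunchDetector` name, and the 10% threshold are illustrative assumptions, not Alexa's actual algorithm.

```python
from collections import defaultdict

class HunchDetector:
    """Toy anomaly detector: flags device states that are rare for that hour."""

    def __init__(self):
        # counts[(device, hour)][state] -> how often that state was observed
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, device: str, hour: int, state: str) -> None:
        self.counts[(device, hour)][state] += 1

    def is_anomalous(self, device: str, hour: int, state: str,
                     threshold: float = 0.1) -> bool:
        """True if this state occurred in under `threshold` of past observations."""
        seen = self.counts[(device, hour)]
        total = sum(seen.values())
        if total == 0:
            return False  # no history yet, nothing to compare against
        return seen[state] / total < threshold

det = HunchDetector()
for _ in range(50):
    det.observe("garage_door", 23, "closed")  # usually closed at 11 pm
det.observe("garage_door", 23, "open")        # one rare exception

print(det.is_anomalous("garage_door", 23, "open"))    # True: worth a hunch
print(det.is_anomalous("garage_door", 23, "closed"))  # False: the norm
```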
We have made progress toward contextual decision-making, and we have taken first steps in reasoning and autonomy through Self-Learning: Alexa's ability to improve and expand its capabilities without any human intervention. Last year we took another step with an Alexa feature that can infer a customer's latent goal. Imagine a customer asking Alexa for the weather at the beach; Alexa may use this request, along with additional contextual information, to infer that the customer might be interested in taking a trip there.
Yonck: Edge computing refers to processing performed on or near devices rather than in the cloud. Are you confident that enough of Alexa's processing can be done at the edge to reduce latency and support federated learning?
Prasad: Since the introduction of Echo and Alexa in 2014, we have combined cloud processing with edge processing; the two are mutually beneficial. Where computation happens depends on several factors, including latency and connectivity.
We understood, for example, that basic functionality would be needed even without internet connectivity. In 2018, we introduced a hybrid mode that allows smart home functions, such as controlling switches and lights, to keep working even if connectivity is lost. The same applies to Alexa on the go, including in cars where connectivity may be poor.
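A hybrid mode of this kind amounts to a dispatch decision: try the full cloud path, and degrade gracefully to a small set of locally supported intents when connectivity fails. The sketch below shows that control flow under stated assumptions; the handler table, function names, and string intents are hypothetical, not Amazon's API.

```python
# Small set of intents a device could plausibly handle offline (illustrative).
LOCAL_HANDLERS = {
    "turn on lights": "lights_on",
    "turn off lights": "lights_off",
}

def handle_utterance(utterance: str, cloud_call, online: bool = True) -> str:
    """Route an utterance to the cloud, falling back to local handling."""
    if online:
        try:
            return cloud_call(utterance)  # full NLU in the cloud
        except ConnectionError:
            pass                          # connectivity lost: degrade gracefully
    # Offline path: only the locally supported smart-home intents work.
    action = LOCAL_HANDLERS.get(utterance)
    return action if action else "unsupported_offline"

def unreachable_cloud(utterance: str) -> str:
    raise ConnectionError("no internet")

print(handle_utterance("turn on lights", unreachable_cloud))           # lights_on
print(handle_utterance("play jazz", unreachable_cloud, online=False))  # unsupported_offline
```

The design choice worth noting is that the fallback table is tiny by intent: only latency-critical, safety-relevant functions (lights, switches) need to survive an outage.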
Over the years we have developed a variety of techniques to make neural networks efficient enough to run on-device, minimizing their memory usage and computation footprint. This has allowed us to maintain accuracy while reducing computational and storage requirements. With neural accelerators such as our AZ1 Neural Edge processor, customers can now enjoy new experiences like natural turn-taking, a feature that uses on-device algorithms combining acoustic and visual cues to determine whether conversation participants are interacting with Alexa or with each other.
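One of the most common techniques for shrinking a network's memory and compute footprint is post-training quantization: storing weights as small integers plus a scale factor instead of 32-bit floats. The source does not say which techniques Amazon uses; the symmetric 8-bit scheme below is simply a standard illustration of the idea, in pure Python.

```python
def quantize(weights: list, bits: int = 8):
    """Map float weights to signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax   # one scale for the whole tensor
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list, scale: float) -> list:
    """Recover approximate float weights from integers and the scale."""
    return [x * scale for x in q]

w = [0.82, -0.41, 0.05, -0.99]
q, s = quantize(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))

print(q)        # small integers (1 byte each) instead of 4-byte floats
print(max_err)  # rounding error, bounded by about half the scale
```

Real deployment toolchains add per-channel scales, quantization-aware training, and calibration data, but the storage arithmetic is the same: roughly a 4x reduction going from float32 to int8.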
Yonck: You've identified several requirements for your task and social bots, your future AI pillars. Could you share any projected timelines for these, even broad ones?
Prasad: Open-domain, multi-turn conversation remains an unsolved problem. It has been a pleasure to see the Alexa Prize competitions draw students from academia into conversational AI. Participants have made significant advances in the field by developing better natural language understanding and dialog policies that lead to deeper conversations. Others have worked on recognizing humor, generating humorous replies, or selecting contextually appropriate jokes.
These are hard AI problems, and they will take time. While we are still five to ten years from achieving these goals, I am excited about the Alexa team's recent best-paper award for work that incorporates commonsense knowledge graphs, both explicitly and implicitly, into pre-trained language models to increase machine intelligence. This work will help make Alexa more intelligent and intuitive for customers.
Yonck: You mentioned combining transformer-based neural response generators with knowledge selection to produce more engaging responses in open-domain conversations. How is knowledge selection done?
Prasad: Open-domain conversation is how we push boundaries, including as part of the Alexa Prize SocialBot Challenge, where we continue to invent on behalf of the participating university teams. One such innovation is a transformer-based neural language generator, the neural response generator (NRG). To generate better responses, we enhanced NRG by adding a dialog policy to the generator and integrating world knowledge. The policy decides how best to respond; for example, the AI may acknowledge previous turns and ask questions where appropriate. To integrate knowledge, we index publicly available knowledge and retrieve the sentences most relevant to the conversation's context. NRG's objective is then to produce an optimal response consistent with the policy decision.
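The retrieval step described here, indexing knowledge sentences and pulling back the ones relevant to the dialog context, can be demonstrated with a deliberately tiny version. The corpus, the word-overlap scoring, and the function names below are all assumptions for illustration; a production system would use learned neural retrieval over a vastly larger index.

```python
import string

# A toy "indexed" knowledge corpus (illustrative sentences, not a real index).
KNOWLEDGE = [
    "The saxophone was invented by Adolphe Sax in the 1840s.",
    "Jazz originated in New Orleans in the late 19th century.",
    "The Echo smart speaker was introduced in 2014.",
]

def tokenize(text: str) -> set:
    """Lowercase, split, and strip punctuation to get a bag of words."""
    return {w.strip(string.punctuation) for w in text.lower().split()}

def select_knowledge(context: str, corpus: list) -> str:
    """Return the corpus sentence with the highest word overlap with the context."""
    ctx = tokenize(context)
    return max(corpus, key=lambda sentence: len(ctx & tokenize(sentence)))

ctx = "I love jazz, where did jazz come from?"
print(select_knowledge(ctx, KNOWLEDGE))
# -> "Jazz originated in New Orleans in the late 19th century."
```

The selected sentence would then be fed to the response generator alongside the dialog history, so the generated reply can be grounded in retrieved facts rather than invented ones.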
Yonck: You want conversations to be natural and to carry extensive context. Providing personalized answers requires learning and storing a great deal of information about each user, which is very compute- and storage-intensive. How does Amazon's current hardware compare with what achieving these goals will require?
Prasad: Certain processing, such as the computer vision used to determine who in a room is speaking to the device, must take place locally. Our teams have been working hard to improve on-device machine learning, both inference and model updates, and this is a hot area of research and invention. In particular, I'm excited about large pre-trained deep learning models that can be distilled for efficient processing at the edge.
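Distillation, the technique Prasad alludes to, trains a small "student" model to match the softened output distribution of a large "teacher". The sketch below shows only the soft-target computation (the temperature-scaled softmax from Hinton et al.'s distillation recipe) in pure Python; the logits are made-up numbers, and the full training loop is omitted.

```python
import math

def softmax(logits: list, temperature: float = 1.0) -> list:
    """Temperature-scaled softmax; higher temperature flattens the distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

teacher_logits = [8.0, 2.0, 1.0]           # hypothetical teacher outputs for 3 classes
hard = softmax(teacher_logits)             # near one-hot: little signal for a student
soft = softmax(teacher_logits, temperature=4.0)  # reveals inter-class similarity

print([round(p, 3) for p in hard])
print([round(p, 3) for p in soft])
```

The softened targets carry the teacher's "dark knowledge" about which wrong classes are nearly right, which is what lets a much smaller edge model recover most of the large model's accuracy.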
Yonck: What is the greatest obstacle to achieving fully developed ambient AI as described here?
Prasad: Reactive responses are not enough to achieve our vision; we need proactive assistance. Alexa can already detect anomalies and notify you. While AIs could be explicitly programmed to provide proactive assistance in such cases, that approach will not scale given the sheer number of use cases.
We need to shift toward greater general intelligence: an AI's ability to perform multiple tasks without significant task-specific intelligence, to self-adapt to variations within its set of tasks, and to learn new tasks.
That means Alexa must be able to self-learn, without the need for human supervision.
Published at Sun, 15 Aug 2021 19:16:56 +0000