Preference-Based Software Interface to Improve the Responses of Virtual Voice Assistants

Jacob Onbreyt, Angel Sandoval, Joe Thomas Jorge, Haiping Wu

City College of New York

Summary

The presence of artificial intelligence technology in everyday objects has increased drastically over the past decade. With the release of virtual voice assistants on smart devices, smartphones have become hosts to artificial intelligence technology. Although useful in providing hands-free phone capabilities, virtual voice assistants come with limitations when asked questions. Whenever they cannot answer a research-specific question, they simply direct users to a mere Google search. By updating the software of current virtual voice assistants to conduct searches based on knowledge of user preferences, the efficiency and quality of virtual voice assistant responses will increase, as results, while still varied, will be more topic-specific and to the point. We will explore the possible benefits of incorporating the Preference-Based Software Interface update into everyday phone virtual assistants to assess its feasibility in answering daily internet search questions, and we will cover the initial costs of our first year of research and development, which will range from $320,200 to $1,112,200.

     Author Note

This paper was prepared for English 21007, taught by Susan Delamare.

Table of Contents

Introduction

Objective

Preliminary Literature Review

Technical Description of Innovation

Budget

References

Appendix A

Appendix B

List of Figures

Fig. 1 Examples of Virtual Voice Assistants

Fig. 2 Sample Poor Response

Fig. 3 Diagram of Virtual Voice Assistant Interaction

Fig. 4 Chart of questions voice assistants answered correctly

List of Tables

Table 1 Project Budget Details

Table 2 Project Task Schedule

Introduction

People are always looking for ways to simplify the tasks that life throws at them. Much of that is human nature: if an individual notices that a task can be completed with minimal effort, they will choose the easier route. Thus, people resort to electronic devices. Phones, tablets, smart TVs, and similar devices are extremely versatile in what a person can do with them, especially since, over the years, many of them have become equipped with a commercial spoken dialog system known as a virtual voice assistant.

According to Yang and Lee (2018), current commercial virtual personal assistants can understand voice commands and provide information and services accordingly. A dialog function enables users to interact with devices for specific commands, such as calling somebody. Through a web search, users can search for and identify specific information in the same way as using the traditional internet (Yang and Lee, 2018). However, since their release in 2010, virtual personal assistants have struggled to diffuse into daily life; within the second week of release, only 3% of users were engaging with their phone’s voice assistant (Yang and Lee, 2019). This means that users are not particularly attracted to current virtual voice assistants. Cross (2020), in his article about how Apple can improve the iPhone, states that “Siri needs better voice recognition, faster response times… it needs to give more accurate answers to a much broader set of questions.” From this, it is evident that the issue with virtual voice assistants lies in their software.

In this paper, we propose a possible method to improve current virtual personal assistant responses through a software update. Without changing the overall functionality of the virtual voice assistant software, our update will feature a preference-based response system that generates queries and answers according to preset user preferences for how answers should be presented. If a user does not like a result set, they can tell their virtual voice assistant how they want the answer given; the assistant will find a better answer and remember their preference for that type of question in the future. Such an update to the efficiency of virtual voice assistant responses will benefit consumer applications of this technology, furthering its evolution into a hub device that can operate phones, homes, and workplaces. Improved responses will decrease the time needed for web searches and, in turn, the time it takes to complete tasks, since information is readily provided with a mere voice command.

Objective

Currently, the main problem with using artificially intelligent virtual voice assistants is their response efficiency. Our objective is to improve the content of the responses, especially those that require the virtual assistant to refer to a search engine database. We will first focus on the literature review and on developing a connection between preset user preferences for response presentation and the question at hand. Once that is functional, we can move on to having the virtual assistant record questions and the associated response preferences, based on communication between the assistant and the user after a response is delivered. If proven effective, we can then test the virtual assistant with research-specific questions asked throughout the day by common users. The goal is to enable virtual voice assistants to better extract answers from the internet and to save those responses for future use, resulting in more accurate and efficient answer extraction. To view the full task schedule, refer to Appendix A.

Preliminary Literature Review

In the late 2010s, virtual voice assistants (fig. 1) were released to the public as the next commercial artificial intelligence service platform (Yang and Lee, 2018). These forms of artificial intelligence are “… any type of device that is equipped with a software agent that provides professional, technical, or social assistance by automating and simplifying many daily tasks” (Yang and Lee, 2018). In other words, virtual assistants are capable of answering users’ questions, making and picking up phone calls, sending text messages, and searching for requested information. Yang and Lee (2018) summarized the major functionalities of virtual voice assistants as dialog, which allows users to interact with their device; web search, which allows users to find information online; and chat, as in informal conversation with the device.

Fig. 1 Examples of Virtual Voice Assistants. Reprinted from https://medium.com/datadriveninvestor/voice-assistants-raising-expectations-by-empowering-the-customer-7551f724df71

Such functionalities are made possible by the artificial intelligence functions that drive the virtual voice assistant. Each virtual assistant has voice recognition and natural language processing algorithms (Yang and Lee, 2018) that allow it to understand its user’s interaction. The process behind the artificial intelligence function goes as follows:

“First, embedded multiple microphones listen and record users’ voices and send the recorded files to a natural language processing cloud server via the Internet. Then, servers interpret users’ commands and provide the best answers or select the most appropriate services (e.g., confirm an appointment’s time, play music, call somebody). Lastly, VPA devices reply to users based on text-to-speech (TTS) technology” (Yang and Lee, 2018).
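As a rough, non-authoritative illustration of the quoted pipeline, the short Python sketch below walks through the listen, interpret, and text-to-speech steps. Every function in it (record_audio, send_to_nlp_server, choose_best_answer, text_to_speech) is a placeholder we invented for this example, not a real vendor API.

```python
# Minimal sketch of the voice-assistant pipeline quoted above.
# Every function here is a hypothetical placeholder, not a real vendor API.

def record_audio() -> bytes:
    """Stand-in for the embedded microphones recording the user's voice."""
    return b"raw-audio-bytes"

def send_to_nlp_server(audio: bytes) -> str:
    """Stand-in for the cloud natural language processing server that
    interprets the command; here it simply returns a canned command."""
    return "what is the capital of France"

def choose_best_answer(command: str) -> str:
    """Stand-in for the server selecting the best answer or service."""
    return "Paris is the capital of France."

def text_to_speech(answer: str) -> None:
    """Stand-in for the text-to-speech (TTS) reply spoken to the user."""
    print(f"[TTS] {answer}")

if __name__ == "__main__":
    audio = record_audio()                # 1. microphones record the voice
    command = send_to_nlp_server(audio)   # 2. the cloud server interprets it
    answer = choose_best_answer(command)  # 3. the best answer is selected
    text_to_speech(answer)                # 4. the device replies via TTS
```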

This process, although extensive, returns answers to users rather quickly and accurately (Appendix B). As a result, virtual personal assistants have the potential to be used in various ways, including controlling home appliances, learning users’ life patterns by remembering interactions, and integrating into the workplace. Blanco (2020) illustrates virtual voice assistant integration into the workplace in his article on “EMMA” chatbots, which have been integrated into dealerships to interact with customers. Typically, salespeople receive numerous emails from their clients and do not have time to answer all of them. That is where EMMA comes in: it answers the salesperson’s emails and work-related text messages so that they can stay focused on their jobs in the field.

While this information does display the prospects and positive usability of virtual voice assistants, this form of artificial intelligence struggles to retain users. According to Yang and Lee (2018), “…only 3% of users are active in the second week.” This is likely due to the incomplete responses provided by virtual assistants, whose software is still developing. Nishimura, Yamamoto, Uchiya, and Takumi (2018) state that

“Having engineers or researchers develop complex spoken dialog scenarios has the advantage of allowing people with professional knowledge and expertise to create them. On the other hand, this approach has the following drawbacks: (1) It may not always be possible for experts to create spoken dialog content from the viewpoint of the users. Engineers and researchers may not be able to successfully anticipate and satisfy users’ requests.”

This inability to fully follow through with some users’ requests (fig. 2) was addressed by Cross (2020), who mentioned that Apple’s Siri at times answers a question that requires specifics with a mere web search.

Fig. 2 Sample Poor Response. Reprinted from https://web-a-ebscohost-com.ccny-proxy1.libr.ccny.cuny.edu/ehost/pdfviewer/pdfviewer?vid=0&sid=a691a43d-564b-466d-be1b-24757d292116%40sdc-v-sessmgr02

Such a response does not adequately answer the user’s request, thereby deterring them from asking their assistant specific questions in the future.

In order to increase user satisfaction with virtual voice assistants and truly see their integration into everyday life as a helping hand, their software, especially in phones, needs to be updated. Prospective entrepreneurs have high hopes, claiming that increased usage of current devices will improve daily life in places such as the workplace by dialing conference calls, sending notifications to others, and ordering supplies (Lang and Benessere, 2018); yet virtual assistants cannot become full artificial intelligence assistants if they cannot answer specific web search questions. Although Apple’s Siri lags behind the developments made by companies such as Google (Cross, 2020), each of these systems still simply provides users with a Google search of their question without giving a specific answer to their request. Thus, we suggest a software update through which the virtual assistant learns to provide answers based on user preference. That way, users will be able to get the information they need without extensive research, in a manner that directly answers their question the way they assumed the system would: to the point.

Technical Description of Innovation 

The proposed software for the virtual assistant update is largely based on an existing virtual assistant software design. In that design, the virtual assistant cross-references saved user-preference information with the question at hand and, based on a preference inference, searches its database for a possible answer.

We propose to use similar software so that the phone user can get their response in the manner that suits them best. After the user activates the virtual voice assistant on their mobile device, they interact with it by asking a question. The phone then uses its sensor data to send the interaction to an inference engine (US20130204813A1.pdf, n.d.). The inference engine takes the interaction and compares it to previously recorded knowledge preferences to formulate a query and direct it at a specific search engine, or to formulate a query that answers a possible follow-up question (US20130204813A1.pdf, n.d.). After the result set is generated, a reply is sent back to the phone along with a question asking whether the interaction was addressed properly, in order to further narrow down a better result or address follow-up questions.
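As a rough illustration of this step, the sketch below shows how a query engine might combine a spoken question with stored presentation preferences before sending it to a search engine. It is a minimal Python sketch under our own assumptions; the PreferenceStore class, its fields, and the formulate_query function are hypothetical names introduced here for illustration and are not taken from the cited patent document.

```python
# Illustrative sketch of preference-based query formulation.
# Class, field, and function names are assumptions made for this example only.
from dataclasses import dataclass, field

@dataclass
class PreferenceStore:
    """Remembered answer-presentation preferences, keyed by question topic."""
    by_topic: dict = field(default_factory=dict)

    def lookup(self, topic: str) -> str:
        # Fall back to a short, direct answer when no preference is saved.
        return self.by_topic.get(topic, "concise")

def formulate_query(question: str, topic: str, prefs: PreferenceStore) -> dict:
    """Combine the spoken question with the user's saved preference to
    produce a query the search engine can act on."""
    style = prefs.lookup(topic)
    return {
        "text": question,
        "topic": topic,
        "presentation": style,  # e.g. "concise", "in_depth", "from_source"
        "max_results": 1 if style == "concise" else 5,
    }

prefs = PreferenceStore(by_topic={"recipes": "in_depth"})
print(formulate_query("How do I make risotto?", "recipes", prefs))
```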

The virtual voice assistant will converse with the phone user to answer their question through a link between the phone, a network, a query engine, a knowledge database, and a search engine (research database) (fig. 3). This process occurs as follows:

Fig. 3A– The user interacts with the virtual assistant by asking a question.

Fig. 3B– The question is captured as sensor data and transmitted over a network connection to a query engine.

Fig. 3C– The query engine compares the question to the user’s preset preferences and permitted history storage to see whether the interaction type corresponds to a particular result-generation or response-presentation method, and formulates the query for search.

Fig. 3D– The query is then put into a search engine that runs it through various internet databases and returns a result set based on the preferences that shaped the query.

Fig. 3E– The narrowed result set is sent back to the virtual voice assistant for user interaction, followed by a follow-up question asking whether the results addressed the question.

Fig. 3F&G– If the user responds that their question was not addressed, the virtual assistant asks them to specify the type of answer they want (simpler, more in-depth, from a specific source, etc.), sends that interaction back to the query engine over the network connection, and repeats steps C–F.

Fig. 3 Diagram of Virtual Voice Assistant Interaction

          Collage created with images sourced from:

https://www.clipart.email/clipart/clipart-cloud-internet-338241.html
https://www.iconfinder.com/icons/4937165/listen_sound_speak_talk_talking_voice_waves_icon
https://www.stickpng.com/img/electronics/iphones/iphone-7-template

If another search is prompted, the virtual voice assistant will ask the user to specify how they would like their answer presented, and the user will be able to save that preference to the knowledge database for future interactions, as sketched below.
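The following is a minimal sketch of the interaction loop in fig. 3, steps A through G, including the follow-up question and the optional saving of a preference. The search and ask_user functions are hypothetical stand-ins for the query/search engines and the voice interface; they are our own illustrative assumptions, not part of any existing assistant’s API.

```python
# Minimal sketch of the interaction loop in fig. 3 (steps A-G).
# search() and ask_user() are hypothetical stand-ins, not a real assistant API.

def search(question: str, presentation: str) -> str:
    """Stand-in for the query engine and search engine (steps C-E)."""
    return f"[{presentation}] answer to: {question}"

def ask_user(prompt: str) -> str:
    """Stand-in for the assistant speaking to the user and hearing a reply."""
    return input(prompt + " ")

def answer_question(question: str, preferences: dict) -> None:
    # Preferences are keyed by the question here for simplicity; a real system
    # would key them by question type or topic.
    presentation = preferences.get(question, "concise")
    while True:
        print(search(question, presentation))  # steps C-E
        if ask_user("Did that address your question? (yes/no)").lower() == "yes":
            break
        # Steps F & G: ask how the answer should be presented and re-query.
        presentation = ask_user(
            "How should the answer be presented "
            "(simpler / more in-depth / specific source)?")
        if ask_user("Save this preference for next time? (yes/no)").lower() == "yes":
            preferences[question] = presentation

if __name__ == "__main__":
    answer_question("What is quantum computing?", preferences={})
```

In this sketch, a saved preference simply changes how the next query for the same question is presented; in the proposed update, the knowledge database would hold these preferences across sessions.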

Budget

Our budget covers the costs associated with developing an interface capable of reaching our objectives: a system with improved responsiveness and versatility. Since new technologies are being introduced all the time, a re-evaluation of our projected costs will be required after the initial year. Budget estimates are based on figures from previous smart interface development groups.

We plan to have a full-time development team composed of a data scientist and software and machine learning researchers and engineers. This section of the team will develop the technical parts of the system. A conversational UX designer will also be present; this person will be in charge of helping develop a personalized, user-friendly environment for customers. A product manager and a growth hacker will also be present to focus on the business aspect of developing the smart interface. Other expenses, such as equipment and a workspace, have also been included.

Table 1 Project Budget Details, created by Angel Sandoval, 5/13/2020.

References

Cross, J. (2020). iOS 14 wish list: 10 ways Apple can take the iPhone to the next level. Macworld – Digital Edition, 37(3), 88–97. Retrieved from https://web-a-ebscohost-com.ccny-proxy1.libr.ccny.cuny.edu/ehost/pdfviewer/pdfviewer?vid=5&sid=7919e437-c5e9-419f-8e3d-948990057c3a%40sdc-v-sessmgr03

Eadicicco, L. (2017). Google Searches for Its Voice. TIME Magazine, 190(16/17), 68–73. Retrieved from https://web-b-ebscohost-com.ccny-proxy1.libr.ccny.cuny.edu/ehost/pdfviewer/pdfviewer?vid=8&sid=88bdbf6c-759c-40af-90bf-08b5123de80c%40sessionmgr101

Lang, R. D., & Benessere, L. E. (2018). Virtual Assistants in the Workplace: Real, Not Virtual Pitfalls and Privacy Concerns. Journal of Internet Law, 21(12), 1–21. Retrieved from https://web-b-ebscohost-com.ccny-proxy1.libr.ccny.cuny.edu/ehost/pdfviewer/pdfviewer?vid=7&sid=9e7a8f50-659a-4be4-838c-82b329a7d341%40pdc-v-sessmgr06

Nishimura, R., Yamamoto, D., Uchiya, T., & Takumi, I. (2018). Web-based environment for user generation of spoken dialog for virtual assistants. EURASIP Journal on Audio Speech & Music Processing, 2018(1), 1–1. Retrieved from https://web-a-ebscohost-com.ccny-proxy1.libr.ccny.cuny.edu/ehost/pdfviewer/pdfviewer?vid=0&sid=a691a43d-564b-466d-be1b-24757d292116%40sdc-v-sessmgr02

Simon, M. (2019). iOS chief Craig Federighi hints that someday Siri won’t take over your whole screen—But not for a while. Macworld – Digital Edition, 36(8), 47–48. Retrieved from https://web-b-ebscohost-com.ccny-proxy1.libr.ccny.cuny.edu/ehost/pdfviewer/pdfviewer?vid=0&sid=afa72116-ca16-4a01-aac4-a196c5e8dd09%40pdc-v-sessmgr02

Blanco, S. (2020, February 24). The persistence of “EMMA”: Chatbots give dealerships a way to stay connected with customers. Automotive News, 94(6922), S019. Gale Academic OneFile. Retrieved from https://go-gale-com.ccny-proxy1.libr.ccny.cuny.edu/ps/retrieve.do?tabID=T003&resultListType=RESULT_LIST&searchResultsType=SingleTab&searchType=BasicSearchForm&currentPosition=1&docId=GALE%7CA615418247&docType=Article&sort=Relevance&contentSegment=ZONE-MOD1&prodId=AONE&contentSet=GALE%7CA615418247&searchId=R1&userGroupName=cuny_ccny&inPS=true&ps=1&cp=1

US20130204813A1.pdf. (n.d.). Retrieved May 17, 2020, from https://patentimages.storage.googleapis.com/3e/c9/c6/e417d85eff5a1b/US20130204813A1.pdf

Yang, H., & Lee, H. (2019). Understanding user behavior of virtual personal assistant devices. Information Systems & E-Business Management, 17(1), 65–87. Retrieved from https://web-a-ebscohost-com.ccny-proxy1.libr.ccny.cuny.edu/ehost/pdfviewer/pdfviewer?vid=11&sid=6e4c7d07-72bc-476b-90ae-9223dbfeff62%40sdc-v-sessmgr01

Appendix A – Task Schedule

Table 2 Project Task Schedule, created by Jacob Onbreyt, 5/13/2020

Appendix B: Correctness of Virtual Assistants


Fig. 4 Chart of questions voice assistants answered correctly. Reprinted from

https://www.zdnet.com/article/apple-siri-vs-amazon-alexa-vs-google-assistant-tests-reveal-which-is-smartest/

Voice assistants do a great job of answering their users’ questions, no matter how complex. They are not, however, one hundred percent accurate. Artificial voice assistants, as helpful as they can be, have room for improvement in their responsiveness and accuracy. Many factors may contribute to an assistant being inaccurate, such as not being able to distinguish a specific voice when several people are speaking at the same time, or simply receiving a question that is harder to answer. These problems, if remedied, would significantly improve the effectiveness and efficiency of the assistants’ responses.
