
WO2020185880A1 - Conversational artificial intelligence for automated self-service account management - Google Patents


Info

Publication number
WO2020185880A1
Authority
WO
WIPO (PCT)
Prior art keywords
caller
account
data
speech
conversational
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2020/022074
Other languages
French (fr)
Inventor
Kevin Michael GILLESPIE
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beguided Inc
Original Assignee
Beguided Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beguided Inc filed Critical Beguided Inc
Publication of WO2020185880A1 publication Critical patent/WO2020185880A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/01Customer relationship services
    • G06Q30/015Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
    • G06Q30/016After-sales
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/487Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4936Speech interaction details
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/1822Parsing for meaning understanding

Definitions

  • This process is typically described as a “decision tree” wherein a defined number of possible caller/customer experiences is fully defined, and the entry points required to achieve the desired outcomes are limited to a specific set of inputs that the caller must use to trigger the intended treatment of the decision tree to route them to the desired outcome.
  • Existing technologies lack the ability to hold a conversation with the caller or to use previous experience with other callers to augment their responses by correlating unclear answers with their corresponding intended meaning. Additionally, the existing technologies cannot take free-form speech and analyze the intent of the spoken speech to establish meaning as it relates to the various tasks the device is designed to perform.
  • Existing applications also cannot perform multiple tasks in parallel; the caller cannot engage the application in conversation while it is performing the previous tasks.
  • the technologies described herein can take free-form human speech and transcribe it into text, analyze the intent of that speech in relation to the defined tasks for which the device is purposed, establish a corollary to communicate back to the caller for affirmation or next steps, analyze host data relating to the caller’s intent, execute any number of parallel tasks relating to account management or retrieval of decisioning data points relating to the caller, competently respond to the caller in a truly conversational format with understanding of the caller’s intended goal and the needed attributes that allow the caller to complete the caller’s intended objective in a self-directed fashion, and respond comprehensively to answer specific caller inquiries throughout the call experience.
  • This new technology enables intent-driven communication automation tools on the phone between automated technologies (e.g., bots, etc.) and human beings.
  • a service provider, vendor or other creditor may have several options for contacting a consumer or account-holder about their account.
  • a service provider, vendor or other creditor will seek to contact the borrower, consumer or account-holder, or their co-signor, as applicable, via telephone, physical mail, SMS text, and email in order to communicate with the borrower, consumer or account-holder, or their co-signor, as applicable, about, for example, past due amounts payable, pending or anticipated changes to the party’s account details or services, confirmation of changes to services or lending terms, credit authorization of new services or to otherwise renegotiate the payment terms agreement or take legal action to enforce the agreement, among other purposes. All of these tasks involve additional work and expense on the part of the vendor, service-provider or creditor. Contacting a borrower may require investment of additional time to locate the borrower’s contact information, if such information is not readily available.
  • restrictions on outbound contact methods from service providers require an emphasis on direct inbound communication strategies (i.e., creditors are restricted on initiating communications with consumers and often must wait for a consumer to contact them before they can take a desired corrective action related to an account).
  • the means that vendors, service providers, and creditors utilize to engage with consumers who are trying to engage them include: call center operations, physical mail, and digital platforms such as social media.
  • the borrower may be uncooperative.
  • Embodiments of the present subject matter comprise systems and methods for the engineering, development, management, implementation and use of telephony-based applications for automated self-service account management.
  • Some embodiments of the subject matter disclosed herein comprise methods and systems for design and use of web-based applications for call flow management and routing. Some embodiments comprise an intelligent voice response (IVR) as part of a call flow management device to save time within the caller’s call routing experience.
  • Some embodiments may comprise a static decisioning intelligence that can ascertain the purpose of the call directly by the input of alpha-numeric entries from the caller through the telephony experience, or dual-tone multi-frequency (DTMF) signaling, in order to receive direct input from the caller in response to automated questions, apply understanding to those direct entries, and establish the intent of a caller, creating a simplified self-service device that lets callers resolve their issue before reaching a live call center agent or hasten their hold experience before they reach a live call center agent.
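As an illustrative sketch only (the menu labels and intent names below are hypothetical placeholders, not taken from the disclosure), a static DTMF decisioning layer reduces to a mapping from digit entries to pre-defined intents:

```python
# Hypothetical sketch of static DTMF decisioning: digits map to pre-defined intents.
# Menu labels and intent names are illustrative assumptions, not part of the disclosure.
DTMF_MENU = {
    "1": "get_account_information",
    "2": "make_payment",
    "3": "speak_to_agent",
}

def route_dtmf(digit: str) -> str:
    """Return the intent for a DTMF entry, or re-prompt on unrecognized input."""
    return DTMF_MENU.get(digit, "repeat_menu")

if __name__ == "__main__":
    print(route_dtmf("2"))  # -> make_payment
```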
  • Some embodiments of the subject matter disclosed herein comprise methods for speech-to-text (STT) conversion for the real-time processing of recorded speech through textual analysis tools that create accurate text conversion as an output.
  • Some embodiments of the subject matter disclosed herein comprise a Conversational Artificial Intelligence application layer, the method comprising: data parsing, which allows for the tokenization of relevant textual data to prepare it for machine learning models and the auto-detection and clustering of typos or incorrect speech-to-text results; and machine learning model techniques such as a Markov Decision Process (MDP).
  • The MDP forms the basis for many reinforcement learning techniques, as it provides the flexibility to implement a wide range of machine learning algorithms, including deep learning (neural nets) and classification, and an interpretable and auditable environment which provides for continuous human-assisted improvement.
  • the conversational artificial intelligence system the host data store comprising historic call data and account data; the telephony system configured to: receive free-form speech from a caller;
  • the conversational artificial intelligence system configured to: parse and tokenize the speech utterance; query the host data store to retrieve historic call data and account data for the caller; and apply a machine learning model to the tokenized speech utterance and the historic call data and account data for the caller to:
  • the machine learning model comprises a Markov Decision Process (MDP).
  • the MDP enables the artificial intelligence system to move freely across states and perform a plurality of tasks simultaneously.
  • the machine learning model comprises an artificial neural network (ANN).
  • the account is a credit account.
  • the one or more pre-defined tasks comprises: get information about an account or make a payment on an account.
  • the conversational artificial intelligence system is further configured to execute any number of parallel tasks relating to account management or retrieval of decisioning data points relating to the caller.
  • the platform is configured to comprehensively answer specific caller inquiries throughout the call experience.
  • the conversational artificial intelligence system is further configured to provide a log tracking process for a person-in-the-loop procedure in order to create supervised transition probabilities.
  • the telephony system is further configured to provide an interactive voice response (IVR) system.
  • the telephony system is further configured to receive dual-tone multi-frequency (DTMF) signaling.
  • a host data store comprising historic call data and account data
  • receiving, via a telephony system, free-form speech from a caller; transcribing the free-form speech to generate a speech utterance; parsing and tokenizing the speech utterance; querying the host data store to retrieve historic call data and account data for the caller; and applying a machine learning model to the tokenized speech utterance and the historic call data and account data for the caller to: identify an intended objective of the caller in relation to one or more pre-defined account management tasks; and execute the intended objective; or establish a corollary to respond to the caller in a conversational format via the telephony system for affirmation or additional data needed to execute the intended objective.
  • the machine learning model comprises a Markov Decision Process (MDP).
  • the MDP enables the artificial intelligence system to move freely across states and perform a plurality of tasks simultaneously.
  • the machine learning model comprises an artificial neural network (ANN).
  • the account is a credit account.
  • the one or more pre-defined tasks comprises: get information about an account or make a payment on an account.
  • applying the machine learning model comprises executing any number of parallel tasks relating to account management or retrieval of decisioning data points relating to the caller.
  • the method further comprises comprehensively answering specific caller inquiries throughout the call experience.
  • the method further comprises providing a log tracking process for a person-in-the-loop procedure in order to create supervised transition probabilities.
  • the method further comprises receiving, via the telephony system, interactive voice responses (IVR).
  • the method further comprises receiving, via the telephony system, dual-tone multi-frequency (DTMF) signaling.
  • Fig. 1 shows a non-limiting example of a high-level schematic diagram of the
  • FIG. 2 shows a non-limiting example of a topological diagram of the flow of information through the various systems and stages described herein, which derive successful output responses delivered as automated communications;
  • FIG. 3 shows a non-limiting example of a digital processing device; in this case, a device with one or more CPUs, a memory, a communication interface, and a display;
  • FIG. 4 shows a non-limiting example of a web/mobile application provision system; in this case, a system providing browser-based and/or native mobile user interfaces; and
  • Fig. 5 shows a non-limiting example of a cloud-based web/mobile application provision system; in this case, a system comprising elastically load balanced, auto-scaling web server and application server resources as well as synchronously replicated databases.
  • Described herein, in certain embodiments, are computer-implemented platforms for automated self-service account management comprising a host data store, a telephony system, and a conversational artificial intelligence system: the host data store comprising historic call data and account data; the telephony system configured to: receive free-form speech from a caller; transcribe the free-form speech to generate a speech utterance; and provide the speech utterance to the conversational artificial intelligence system; the conversational artificial intelligence system configured to: parse and tokenize the speech utterance; query the host data store to retrieve historic call data and account data for the caller; and apply a machine learning model to the tokenized speech utterance and the historic call data and account data for the caller to:
  • the machine learning model comprises a Markov Decision Process (MDP).
  • the MDP enables the artificial intelligence system to move freely across states and perform a plurality of tasks simultaneously.
  • the account is a credit account.
  • the one or more pre-defined tasks comprises: get information about an account or make a payment on an account.
  • the conversational artificial intelligence system is further configured to execute any number of parallel tasks relating to account management or retrieval of decisioning data points relating to the caller.
  • the platform is configured to comprehensively answer specific caller inquiries throughout the call experience.
  • the conversational artificial intelligence system is further configured to provide a log tracking process for a person-in-the-loop procedure in order to create supervised transition probabilities.
  • a host data store comprising historic call data and account data
  • receiving, via a telephony system, free-form speech from a caller; transcribing the free-form speech to generate a speech utterance; parsing and tokenizing the speech utterance; querying the host data store to retrieve historic call data and account data for the caller; and applying a machine learning model to the tokenized speech utterance and the historic call data and account data for the caller to: identify an intended objective of the caller in relation to one or more pre-defined account management tasks; and execute the intended objective; or establish a corollary to respond to the caller in a conversational format via the telephony system for affirmation or additional data needed to execute the intended objective.
  • the machine learning model comprises a Markov Decision Process (MDP).
  • MDP enables the artificial intelligence system to move freely across states and perform a plurality of tasks simultaneously.
  • the machine learning model comprises an artificial neural network (ANN).
  • the account is a credit account.
  • the one or more pre-defined tasks comprises: get information about an account or make a payment on an account.
  • applying the machine learning model comprises executing any number of parallel tasks relating to account management or retrieval of decisioning data points relating to the caller.
  • the method further comprises comprehensively answering specific caller inquiries throughout the call experience.
  • the method further comprises providing a log tracking process for a person-in-the- loop procedure in order to create supervised transition probabilities.
  • the method further comprises receiving interactive voice responses (IVR).
  • the method further comprises receiving dual-tone multi-frequency (DTMF) signaling.
  • a caller is connected via telephony (e.g., a provided phone number to reach the telephony application platform; or the telephony environment).
  • the caller speaks into their telephone and the telephony application platform receives a “speech utterance” in a speech gather process. That speech utterance is sent via API in an MP3 format from the telephony platform to existing technologies for speech-to-text transcription. The transcription is then relayed as input to AVA in a textual format 110.
  • the speech-to-text input for AVA is created via existing technologies for speech-to-text transcription and delivered to the AVA machine learning models 120.
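The following is a minimal, hedged sketch of that hand-off, assuming a generic HTTP speech-to-text endpoint; the URL, payload fields, and response shape are placeholders, not the actual third-party STT API used by the platform:

```python
# Sketch of relaying a gathered speech utterance (MP3) to an external STT service
# and forwarding the transcription to the conversational AI layer (110 -> 120).
# STT_API_URL, the payload fields, and the response shape are hypothetical placeholders.
import requests

STT_API_URL = "https://stt.example.com/v1/transcribe"  # placeholder endpoint

def transcribe_utterance(mp3_bytes: bytes) -> str:
    """Send the MP3 speech utterance to an STT service and return the transcript text."""
    response = requests.post(
        STT_API_URL,
        files={"audio": ("utterance.mp3", mp3_bytes, "audio/mpeg")},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["transcript"]  # assumed response field

def handle_speech_gather(mp3_bytes: bytes, ava_callback) -> None:
    """Transcribe the caller's utterance and relay the text to AVA."""
    text = transcribe_utterance(mp3_bytes)
    ava_callback(text)
```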
  • AVA uses data parsing to parse and tokenize the relevant textual data of the message in order to ready it for the machine learning models 120 (e.g., the Markov Decision Process).
  • Data parsing includes a combination of open source tools such as the Natural Language Toolkit (NLTK 3.4) and a machine learning-based tokenizer which can auto-detect and cluster typos together to reduce the training vocabulary size.
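As a minimal sketch of that parsing step, assuming NLTK's standard word tokenizer (the typo map below is a simplified, invented stand-in for the machine learning-based typo-clustering tokenizer described above):

```python
# Sketch of the data-parsing step: tokenize a transcript with NLTK and fold known
# misspellings into canonical tokens to shrink the training vocabulary.
# The typo map is an illustrative stand-in for the ML-based typo-clustering tokenizer.
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)  # tokenizer models

TYPO_CLUSTERS = {"blance": "balance", "acount": "account"}  # hypothetical examples

def parse_transcript(text: str) -> list:
    tokens = word_tokenize(text.lower())
    return [TYPO_CLUSTERS.get(tok, tok) for tok in tokens]

print(parse_transcript("Why did my acount blance go up?"))
# ['why', 'did', 'my', 'account', 'balance', 'go', 'up', '?']
```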
  • the Markov Decision Process and other Natural Language Processing (NLP) and/or Natural Language Understanding (NLU) techniques seek to understand the intent of the speech utterance, to follow the Markov Decision Process state/actions, and to execute probability function approximations resulting in: responses to the caller with information 160 and requests via API 130 of the source data 140 required for specific tasks related to the specific state-actions linked to the initial speech (e.g., authentication of a caller, source data relating to the caller, etc.).
  • the probability distributions in AVA’s Markov Decision Process 120 allow it to move freely across the various states and perform many tasks simultaneously, such as speaking to the caller 160 while performing a look-up of information 130, 140, 150, the interpretation of that data, and the resultant state action to be executed.
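A minimal sketch of that parallelism using Python's asyncio; the data-lookup and prompt functions are hypothetical placeholders for the API calls 130/140/150 and the spoken response 160:

```python
# Sketch of performing state-actions in parallel: keep talking to the caller (160)
# while account data is retrieved via API (130, 140, 150).
# fetch_account_data and speak_to_caller are hypothetical placeholders.
import asyncio

async def fetch_account_data(account_id: str) -> dict:
    await asyncio.sleep(1.0)  # stands in for the host-system API round trip
    return {"account_id": account_id, "balance_due": 42.17}

async def speak_to_caller(prompt: str) -> None:
    await asyncio.sleep(0.1)  # stands in for text-to-speech playback
    print(f"AVA says: {prompt}")

async def handle_turn(account_id: str) -> dict:
    # Start the lookup and the holding prompt concurrently.
    lookup = asyncio.create_task(fetch_account_data(account_id))
    await speak_to_caller("One moment while I pull up your account.")
    return await lookup

if __name__ == "__main__":
    data = asyncio.run(handle_turn("12345"))
    print(data)
```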
  • the systems and methods described herein include a Markov Decision Process.
  • a Markov Decision Process expressed as mathematical formulae.
  • An MDP is a tuple (S, A, P, R, γ), where S is a state space, A is a finite set of actions, P is the state transition probability function, R is the reward function, and γ is a discount factor (γ ∈ [0, 1]).
  • S - The State Space is an exhaustive set of states that the model understands and links to “state-actions,” which include responding to the customer, sending a task out to a client system (making a note, retrieving billing information), ending the call, transferring the call, and more.
  • A - The set of actions (different from state-actions) defines the movements possible throughout the network. These actions (as well as the states) were built based on the analysis of historical voice samples.
  • P - The state transition probability function is approximated by machine learning models; a semi-supervised pipeline consisting of Doc2Vec and a classification model trained on the movements between States learned from our analysis of the voice samples.
  • the inputs to the models consist of both caller inputs as well as the sourced client data required for decisioning.
  • Doc2Vec models for understanding text in a numerical format use an additional vector (e.g., a “document ID”) that expands upon the broader ML concept of feature vectors as used in Word2Vec.
  • the additional vector, trained in addition to the word vectors, establishes the concept of a “document” (e.g., a complete transcription of a speech utterance from a caller) by creating a numerical representation of the document (or label), in lieu of Word2Vec’s establishment of the concept of a word with a numerical representation of each word.
  • Word2Vec is a two-layer neural net that processes text using Continuous Bag of Words (CBOW) and Skip-Gram (skip-gram). Continuous Bag of Words will predict the following word after a series of words, and skip-gram will predict all surrounding words (or context) by using just one word.
  • Whereas Word2Vec would train word vectors to predict the next word by giving a numerical representation to each word through its use of CBOW and skip-gram, Doc2Vec trains word vectors and additionally trains a document vector to create a numerical representation of the document, regardless of its length.
  • Doc2Vec is an extension of Word2Vec with unsupervised learning of continuous representations of larger blocks of text (e.g., sentences, labels, documents, etc.).
  • R - In the first version of AVA, the reward function at this state is fixed based on whether or not the customer was able to make a payment, set up a payment arrangement, or restore their services. In future versions of AVA, customer interactions will provide a better reward estimation. The reward function helps to update the probability function defined in the previous bullet point.
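Purely as an illustration of the tuple described above, a small discrete MDP can be written down directly; the states, actions, probabilities, and rewards below are invented toy values, not the disclosed AVA model:

```python
# Toy Markov Decision Process (S, A, P, R, gamma) with invented values,
# illustrating the tuple described above; not the production AVA model.
import random

states = ["greeting", "authenticate", "billing_inquiry", "payment", "end_call"]
actions = ["ask_question", "lookup_account", "take_payment", "hang_up"]

# P[(state, action)] -> {next_state: probability}; a sparse toy transition function.
P = {
    ("greeting", "ask_question"): {"authenticate": 0.9, "end_call": 0.1},
    ("authenticate", "lookup_account"): {"billing_inquiry": 0.8, "authenticate": 0.2},
    ("billing_inquiry", "take_payment"): {"payment": 0.7, "end_call": 0.3},
    ("payment", "hang_up"): {"end_call": 1.0},
}

# R[(state, action)] -> reward; completing a payment carries the reward, as above.
R = {("billing_inquiry", "take_payment"): 1.0, ("payment", "hang_up"): 1.0}
gamma = 0.95  # discount factor in [0, 1]

def step(state: str, action: str):
    """Sample the next state and reward for a state-action pair."""
    dist = P.get((state, action), {"end_call": 1.0})
    next_state = random.choices(list(dist), weights=list(dist.values()))[0]
    return next_state, R.get((state, action), 0.0)

print(step("greeting", "ask_question"))
```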
  • the AVA environment encapsulates the telephony application platform for the auditory caller experience, where speech-to-text (STT) and text-to-speech (TTS) conversion occur via existing technologies reached via API 205; the AI environment 210, where the data parsing and Markov Decision Process functions are executed 220, 225; and the retrieval of source data, which communicates via API with the source data systems necessitated by specific state/actions (e.g., authenticating a caller, retrieval of account management information, etc.).
  • the telephony environment contains existing technologies for speech-to-text conversion via existing REST APIs for STT.
  • the source data is reached via API for the AI environment to access and interpret 220, and sometimes relay to the customer 225, based on the approximated state/actions that apply.
  • The probability distributions in AVA’s Markov Decision Process 225 move freely across the various states and perform many tasks simultaneously, such as speaking to the caller 225, 205 while performing a look-up of information 220, 215, 225, the interpretation of that data, and the resultant state action to be executed.
  • the platforms, systems, media, and methods described herein include a digital processing device, or use of the same.
  • the digital processing device includes one or more hardware central processing units (CPUs) or general purpose graphics processing units (GPGPUs) that carry out the device’s functions.
  • the digital processing device further comprises an operating system configured to perform executable instructions.
  • the digital processing device is optionally connected to a computer network.
  • the digital processing device is optionally connected to the Internet such that it accesses the World Wide Web.
  • the digital processing device is optionally connected to a cloud computing infrastructure.
  • the digital processing device is optionally connected to an intranet.
  • the digital processing device is optionally connected to a data storage device.
  • suitable digital processing devices include, by way of non-limiting examples, server computers, desktop computers, laptop computers, notebook computers, sub-notebook computers, netbook computers, netpad computers, set-top computers, media streaming devices, handheld computers, Internet appliances, mobile smartphones, tablet computers, personal digital assistants, video game consoles, and vehicles.
  • smartphones are suitable for use in the system described herein.
  • Suitable tablet computers include those with booklet, slate, and convertible configurations, known to those of skill in the art.
  • the digital processing device includes an operating system configured to perform executable instructions.
  • the operating system is, for example, software, including programs and data, which manages the device’s hardware and provides services for execution of applications.
  • suitable server operating systems include, by way of non-limiting examples, FreeBSD, OpenBSD, NetBSD ® , Linux, Apple ® Mac OS X Server ® , Oracle ® Solaris ® , Windows Server ® , and Novell ® NetWare ® .
  • suitable personal computer operating systems include, by way of non-limiting examples, Microsoft ® Windows ® , Apple ® Mac OS X ® , UNIX ® , and UNIX- like operating systems such as GNU/Linux ® .
  • the operating system is provided by cloud computing.
  • suitable mobile smartphone operating systems include, by way of non-limiting examples, Nokia ® Symbian ® OS, Apple ® iOS ® , Research In Motion ® BlackBerry OS ® , Google ® Android ® , Microsoft ® Windows Phone ® OS, Microsoft ® Windows Mobile ® OS, Linux ® , and Palm ® WebOS ® .
  • suitable media streaming device operating systems include, by way of non-limiting examples, Apple TV ® , Roku ® , Boxee ® , Google TV ® , Google Chromecast ® , Amazon Fire ® , and Samsung ® HomeSync ® .
  • suitable video game console operating systems include, by way of non-limiting examples, Sony ® PS3 ® , Sony ® PS4 ® , Microsoft ® Xbox 360 ® , Microsoft Xbox One, Nintendo ® Wii ® , Nintendo ® Wii U ® , and Ouya ® .
  • the device includes a storage and/or memory device.
  • the storage and/or memory device is one or more physical apparatuses used to store data or programs on a temporary or permanent basis.
  • the device is volatile memory and requires power to maintain stored information.
  • the device is non-volatile memory and retains stored information when the digital processing device is not powered.
  • the non-volatile memory comprises flash memory.
  • the non-volatile memory comprises dynamic random-access memory (DRAM).
  • the non-volatile memory comprises ferroelectric random access memory (FRAM).
  • the non-volatile memory comprises phase-change random access memory (PRAM).
  • the device is a storage device including, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, magnetic disk drives, magnetic tape drives, optical disk drives, and cloud computing based storage.
  • the storage and/or memory device is a combination of devices such as those disclosed herein.
  • the digital processing device includes a display to send visual information to a user.
  • the display is a cathode ray tube (CRT).
  • the display is a liquid crystal display (LCD).
  • the display is a thin film transistor liquid crystal display (TFT-LCD).
  • the display is an organic light emitting diode (OLED) display.
  • an OLED display is a passive-matrix OLED (PMOLED) or active-matrix OLED (AMOLED) display.
  • the display is a plasma display. In other embodiments, the display is a video projector.
  • the digital processing device includes an input device to receive information from a user.
  • the input device is a keyboard.
  • the input device is a pointing device including, by way of non-limiting examples, a mouse, trackball, track pad, joystick, game controller, or stylus.
  • the input device is a touch screen or a multi-touch screen.
  • the input device is a microphone to capture voice or other sound input.
  • the input device is a video camera or other sensor to capture motion or visual input.
  • the input device is a Kinect, Leap Motion, or the like.
  • the input device is a combination of devices such as those disclosed herein.
  • an exemplary digital processing device 301 is programmed or otherwise configured to conduct telephony, store and retrieve caller data, and apply machine learning algorithms.
  • the digital processing device 301 includes a central processing unit (CPU, also“processor” and“computer processor” herein) 305, which can be a single core or multi core processor, or a plurality of processors for parallel processing.
  • the digital processing device 301 also includes memory or memory location 310 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 315 (e.g., hard disk), communication interface 320 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 325, such as cache, other memory, data storage and/or electronic display adapters.
  • the memory 310, storage unit 315, interface 320 and peripheral devices 325 are in communication with the CPU 305 through a communication bus (solid lines), such as a motherboard.
  • the storage unit 315 can be a data storage unit (or data repository) for storing data.
  • the digital processing device 301 can be operatively coupled to a computer network (“network”) 330 with the aid of the communication interface 320.
  • the network 330 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
  • the network 330 in some cases is a telecommunication and/or data network.
  • the network 330 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
  • the network 330 in some cases with the aid of the device 301, can implement a peer-to-peer network, which may enable devices coupled to the device 301 to behave as a client or a server.
  • the CPU 305 can execute a sequence of machine-readable instructions, which can be embodied in a program or software.
  • the instructions may be stored in a memory location, such as the memory 310.
  • the instructions can be directed to the CPU 305, which can subsequently program or otherwise configure the CPU 305 to implement methods of the present disclosure. Examples of operations performed by the CPU 305 can include fetch, decode, execute, and write back.
  • the CPU 305 can be part of a circuit, such as an integrated circuit. One or more other components of the device 301 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
  • the storage unit 315 can store files, such as drivers, libraries and saved programs.
  • the storage unit 315 can store user data, e.g., user preferences and user programs.
  • the digital processing device 301 in some cases can include one or more additional data storage units that are external, such as located on a remote server that is in communication through an intranet or the Internet.
  • the digital processing device 301 can communicate with one or more remote computer systems through the network 330.
  • the device 301 can communicate with a remote computer system of a user.
  • remote computer systems include servers, personal computers (e.g., portable PC), slate or tablet computers (e.g., Apple ® iPad, Samsung ® Galaxy Tab), telephones, smartphones (e.g., Apple ® iPhone, Android-enabled device, Blackberry ® ), or personal digital assistants.
  • Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the digital processing device 301, such as, for example, on the memory 310 or electronic storage unit 315.
  • the machine executable or machine readable code can be provided in the form of software.
  • the code can be executed by the processor 305.
  • the code can be retrieved from the storage unit 315 and stored on the memory 310 for ready access by the processor 305.
  • the electronic storage unit 315 can be precluded, and machine-executable instructions are stored on memory 310.
  • Non-transitory computer readable storage medium
  • the platforms, systems, media, and methods disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked digital processing device.
  • a computer readable storage medium is a tangible component of a digital processing device.
  • a computer readable storage medium is optionally removable from a digital processing device.
  • a computer readable storage medium includes, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, distributed computing systems including cloud computing systems and services, and the like.
  • the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.
  • the platforms, systems, media, and methods disclosed herein include at least one computer program, or use of the same.
  • a computer program includes a sequence of instructions, executable in the digital processing device’s CPU, written to perform a specified task.
  • Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), data structures, and the like, that perform particular tasks or implement particular abstract data types.
  • a computer program may be written in various versions of various languages.
  • a computer program comprises one sequence of instructions. In some embodiments, a computer program comprises a plurality of sequences of instructions. In some embodiments, a computer program is provided from one location. In other embodiments, a computer program is provided from a plurality of locations. In various embodiments, a computer program includes one or more software modules. In various embodiments, a computer program includes, in part or in whole, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof.
  • a computer program includes a web application.
  • a web application in various embodiments, utilizes one or more software frameworks and one or more database systems.
  • a web application is created upon a software framework such as Microsoft ® .NET or Ruby on Rails (RoR).
  • a web application utilizes one or more database systems including, by way of non-limiting examples, relational, non-relational, object oriented, associative, and XML database systems.
  • suitable relational database systems include, by way of non-limiting examples, Microsoft ® SQL Server, mySQLTM, and Oracle ® .
  • a web application in various embodiments, is written in one or more versions of one or more languages.
  • a web application may be written in one or more markup languages, presentation definition languages, client-side scripting languages, server-side coding languages, database query languages, or combinations thereof.
  • a web application is written to some extent in a markup language such as Hypertext Markup Language (HTML), Extensible Hypertext Markup Language (XHTML), or extensible Markup Language (XML).
  • a web application is written to some extent in a presentation definition language such as Cascading Style Sheets (CSS).
  • a web application is written to some extent in a client-side scripting language such as Asynchronous Javascript and XML (AJAX), Flash ® Actionscript, Javascript, or Silverlight ® .
  • a web application is written to some extent in a server-side coding language such as Active Server Pages (ASP), ColdFusion ® , Perl, JavaTM, JavaServer Pages (JSP), Hypertext Preprocessor (PHP), PythonTM, Ruby, Tcl, Smalltalk, WebDNA ® , or Groovy.
  • a web application is written to some extent in a database query language such as Structured Query Language (SQL).
  • a web application integrates enterprise server products such as IBM ® Lotus Domino ® .
  • a web application includes a media player element.
  • a media player element utilizes one or more of many suitable multimedia technologies including, by way of non-limiting examples, Adobe ® Flash ® , HTML 5, Apple ® QuickTime ® , Microsoft ® Silverlight ® , JavaTM, and Unity ® .
  • an application provision system comprises one or more databases 400 accessed by a relational database management system (RDBMS) 410.
  • RDBMSs include Firebird, MySQL, PostgreSQL, SQLite, Oracle Database, Microsoft SQL Server, IBM DB2, IBM Informix, SAP Sybase, Teradata, and the like.
  • the application provision system further comprises one or more application servers 420 (such as Java servers, .NET servers, PHP servers, and the like) and one or more web servers 430 (such as Apache, IIS, GWS and the like).
  • the web server(s) optionally expose one or more web services via application programming interfaces (APIs) 440.
  • an application provision system alternatively has a distributed, cloud-based architecture 500 and comprises elastically load balanced, auto-scaling web server resources 510 and application server resources 520 as well as synchronously replicated databases 530.
  • a computer program includes a mobile application provided to a mobile digital processing device.
  • the mobile application is provided to a mobile digital processing device at the time it is manufactured.
  • the mobile application is provided to a mobile digital processing device via the computer network described herein.
  • a mobile application is created by techniques known to those of skill in the art using hardware, languages, and development environments known to the art. Those of skill in the art will recognize that mobile applications are written in several languages. Suitable programming languages include, by way of non-limiting examples,
  • Suitable mobile application development environments are available from several sources. Commercially available development environments include, by way of non-limiting examples, AirplaySDK, alcheMo, Appcelerator ® , Celsius, Bedrock, Flash Lite, .NET Compact Framework, Rhomobile, and WorkLight Mobile Platform. Other development environments are available without cost including, by way of non-limiting examples, Lazarus, MobiFlex, MoSync, and Phonegap. Also, mobile device manufacturers distribute software developer kits including, by way of non-limiting examples, iPhone and iPad (iOS) SDK, AndroidTM SDK, BlackBerry ® SDK, BREW SDK, Palm ® OS SDK, Symbian SDK, webOS SDK, and Windows ® Mobile SDK.
  • a computer program includes a standalone application, which is a program that is run as an independent computer process, not an add-on to an existing process, e.g., not a plug-in.
  • standalone applications are often compiled.
  • a compiler is a computer program(s) that transforms source code written in a programming language into binary object code such as assembly language or machine code. Suitable compiled programming languages include, by way of non-limiting examples, C, C++, Objective-C, COBOL, Delphi, Eiffel, JavaTM, Lisp, PythonTM, Visual Basic, and VB .NET, or combinations thereof. Compilation is often performed, at least in part, to create an executable program.
  • a computer program includes one or more executable compiled applications.
  • the computer program includes a web browser plug-in (e.g., extension, etc.).
  • a plug-in is one or more software components that add specific functionality to a larger software application. Makers of software applications support plug-ins to enable third-party developers to create abilities which extend an application, to support easily adding new features, and to reduce the size of an application. When supported, plug-ins enable customizing the functionality of a software application. For example, plug-ins are commonly used in web browsers to play video, generate interactivity, scan for viruses, and display particular file types.
  • the toolbar comprises one or more web browser extensions, add-ins, or add-ons.
  • the toolbar comprises one or more explorer bars, tool bands, or desk bands.
  • plug-in frameworks are available that enable development of plug-ins in various programming languages, including, by way of non-limiting examples, C++, Delphi, JavaTM, PHP, PythonTM, and VB .NET, or combinations thereof.
  • Web browsers are software applications, designed for use with network-connected digital processing devices, for retrieving, presenting, and traversing information resources on the World Wide Web. Suitable web browsers include, by way of non limiting examples, Microsoft ® Internet Explorer ® , Mozilla ® Firefox ® , Google ® Chrome, Apple ® Safari ® , Opera Software ® Opera ® , and KDE Konqueror. In some embodiments, the web browser is a mobile web browser.
  • Mobile web browsers are designed for use on mobile digital processing devices including, by way of non-limiting examples, handheld computers, tablet computers, netbook computers, subnotebook computers, smartphones, music players, personal digital assistants (PDAs), and handheld video game systems.
  • Suitable mobile web browsers include, by way of non-limiting examples, Google ® Android ® browser, RIM BlackBerry ® Browser, Apple ® Safari ® , Palm ® Blazer, Palm ® WebOS ® Browser, Mozilla ® Firefox ® for mobile, Microsoft ® Internet Explorer ® Mobile, Amazon ® Kindle ® Basic Web, Nokia ® Browser, Opera Software ® Opera ® Mobile, and Sony ® PSPTM browser.
  • the platforms, systems, media, and methods disclosed herein include software, server, and/or database modules, or use of the same.
  • software modules are created by techniques known to those of skill in the art using machines, software, and languages known to the art.
  • the software modules disclosed herein are implemented in a multitude of ways.
  • a software module comprises a file, a section of code, a programming object, a programming structure, or combinations thereof.
  • a software module comprises a plurality of files, a plurality of sections of code, a plurality of programming objects, a plurality of programming structures, or combinations thereof.
  • the one or more software modules comprise, by way of non-limiting examples, a web application, a mobile application, and a standalone application.
  • software modules are in one computer program or application. In other embodiments, software modules are in more than one computer program or application. In some embodiments, software modules are hosted on one machine. In other embodiments, software modules are hosted on more than one machine. In further embodiments, software modules are hosted on cloud computing platforms. In some embodiments, software modules are hosted on one or more machines in one location. In other embodiments, software modules are hosted on one or more machines in more than one location.
  • the platforms, systems, media, and methods disclosed herein include one or more databases, or use of the same.
  • suitable databases include, by way of non-limiting examples, relational databases, non-relational databases, object-oriented databases, object databases, entity-relationship model databases, associative databases, and XML databases. Further non-limiting examples include SQL, PostgreSQL, MySQL, Oracle, DB2, and Sybase.
  • a database is internet-based.
  • a database is web-based.
  • a database is cloud computing-based.
  • a database is based on one or more local computer storage devices.
  • a telephony environment which can be described as a software application platform for further development is first established.
  • This environment has a receiving phone number (or phone numbers) by which a caller or transfer application can reach the telephony environment.
  • the software application environment is where all the transcription, voice and necessary experiential data points will flow through.
  • Data flows to the AI model which is in the cloud, flows to host systems that hold account management data, and flows back through the telephony with voice applications providing an auditory experience to the caller.
  • the organization of this flow is essential to the specific design of the conversational AI.
  • Source caller data from the client containing existing historical caller experiences in their entirety is required. This data must be at a specific level of scale in order to successfully inform the AI for its initial approximations, establishment of states/actions and the development of Representative Turn Groups which impact how the set of actions can define the movements across the network.
  • For the MDP, potential values are provided for each component of the tuple.
  • the state space values can be created, at first, through a manual process. Specifically, by establishing the total potential variety of entries that all equate to the same state.
  • a definitive list for affirmation or “yes” states might include: “Yup,” “ok,” “Okie,” “alrighty,” “absolutely,” “of course,” “no problem,” “sure,” “alright,” etc.
  • These and all utterances that might be colloquial or otherwise regional, or that might be a specific parlance relating to anachronistic or business-related (task-related) language (such as 401K management, for example), or acronyms that relate to business language (such as “HSI,” standing for “High Speed Internet,” for example) are logged and placed as specific state-space values.
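A minimal sketch of that manual state-space seeding follows; the utterance lists are abbreviated and illustrative, and any unseen utterance would fall through to the machine learning models rather than being handled here:

```python
# Sketch of manually seeding the state space: colloquial and regional utterances
# that all equate to the same state are logged as values of that state.
# The lists below are abbreviated illustrations only.
STATE_SPACE = {
    "affirmation": {"yup", "ok", "okie", "alrighty", "absolutely", "of course",
                    "no problem", "sure", "alright", "yes"},
    "negation": {"no", "nope", "nah", "no thanks"},  # hypothetical second state
}

def state_for(utterance: str):
    """Return the canonical state an utterance maps to, if it has been logged."""
    text = utterance.strip().lower()
    for state, values in STATE_SPACE.items():
        if text in values:
            return state
    return None  # unseen utterance; falls through to the ML models

print(state_for("Alrighty"))  # -> affirmation
```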
  • the set of actions (different from state-actions) define the movements possible throughout the network.
  • the state transition probability function is approximated by machine learning models; a semi-supervised pipeline consisting of Doc2Vec and a classification model trained on the movements between States learned from the analysis of the voice samples.
  • the inputs to the models consist of both caller inputs as well as the retrieved source data through state actions for decisioning that relate to account management.
  • the probabilities are learned at the same time as the states/actions from the voice samples.
  • a person in the loop is included via a log tracking process in order to create supervised transition probabilities.
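A heavily simplified sketch of such a pipeline is shown below, using gensim's Doc2Vec and a scikit-learn classifier; the training sentences and state labels are invented toy data, and the real pipeline is trained on the client's historical voice samples with person-in-the-loop review:

```python
# Simplified semi-supervised pipeline: Doc2Vec document vectors feed a classifier
# that approximates the state transition probability function P.
# Training sentences and state labels are invented toy data, not client voice samples.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

corpus = [
    ("i want to understand my bill", "billing_inquiry"),
    ("why did my balance go up", "billing_inquiry"),
    ("i would like to make a payment", "payment"),
    ("can i pay my bill today", "payment"),
]

tagged = [TaggedDocument(words=text.split(), tags=[str(i)])
          for i, (text, _label) in enumerate(corpus)]

d2v = Doc2Vec(vector_size=32, min_count=1, epochs=40)
d2v.build_vocab(tagged)
d2v.train(tagged, total_examples=d2v.corpus_count, epochs=d2v.epochs)

X = [d2v.infer_vector(text.split()) for text, _label in corpus]
y = [label for _text, label in corpus]
clf = LogisticRegression(max_iter=1000).fit(X, y)

probe = d2v.infer_vector("why is my bill higher this month".split())
# Approximated transition probabilities over candidate states for the new utterance.
print(dict(zip(clf.classes_, clf.predict_proba([probe])[0])))
```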
  • In execution of these functions, a caller reaches the specific telephone number of the telephony environment hosting the application.
  • the telephony application accepts the call and executes a state action related to “basic greeting without prior knowledge of caller” defined by client policy and historical call analysis.
  • This state action sends a textual output to the telephony environment for text-to-speech conversion, such as an MP3 signal broadcast on the live phone call: “Thank you for calling us. My name is AVA. Can I have your name, please?”
  • the caller’s initial speech reaction to this greeting then becomes the first variable by which the remainder of MDP results can occur.
  • the caller then states, “My name is Tim Cowherd and I want to understand my bill, why did my balance go up?”
  • the telephony environment will deliver the MP3 of this speech utterance for speech-to-text conversion, which produces an exact transcription and then delivers the textual input to the AI environment.
  • the AI environment utilizes data parsing to prioritize and categorize specific words or documents/sentences, and groups of words such as “bill,” “understand,” “Name,” “Tim Cowherd,” “bill go up,” to provide the ML models with numerical representations of the words, documents, sentences, and groups of words.
  • the numerical representations of the text thus submitted to the machine learning models will leverage probability distributions in AVA’s Markov Decision Process to move freely across the various states and perform many tasks simultaneously, such as the simultaneous execution of the state action to deliver text back to the telephony environment via text-to-speech conversion to MP3 (“I understand you would like to know more about your account, could you please give me the account holder name and account number to start?”), the state action to understand the name “Tim Cowherd” and associate it with the caller, and the state action to authenticate the caller with the proceeding information from the caller in the next speech utterance containing the account holder name and account number.
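Sketching just the first turn of that worked example: the keyword weights, state-action names, and prompts below are illustrative placeholders standing in for the MDP probability function and state-actions described above, not the disclosed model.

```python
# Sketch of the first turn in the worked example: parse the transcript, pick the
# highest-scoring state-action, and note the follow-up actions that would run in parallel.
# Keyword sets, state-action names, and prompts are illustrative placeholders.
TRANSCRIPT = ("My name is Tim Cowherd and I want to understand my bill, "
              "why did my balance go up?")

STATE_ACTION_KEYWORDS = {           # toy stand-in for the MDP probability function
    "billing_inquiry": {"bill", "balance", "charge", "understand"},
    "make_payment": {"pay", "payment", "card"},
}

def score_state_actions(text: str) -> dict:
    tokens = set(text.lower().replace(",", "").replace("?", "").split())
    scores = {sa: len(tokens & kws) for sa, kws in STATE_ACTION_KEYWORDS.items()}
    total = sum(scores.values()) or 1
    return {sa: s / total for sa, s in scores.items()}

scores = score_state_actions(TRANSCRIPT)
best = max(scores, key=scores.get)
print(scores, "->", best)

# The chosen state-action would then trigger, in parallel: the TTS response asking for
# the account holder name and account number, association of "Tim Cowherd" with the
# call, and the authentication state-action for the caller's next utterance.
print("AVA: I understand you would like to know more about your account. "
      "Could you please give me the account holder name and account number to start?")
```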

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Human Computer Interaction (AREA)
  • Finance (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Described are computer-implemented platforms, systems, and methods for automated self-service account management utilizing a telephony system and a conversational artificial intelligence system to identify and execute intended account management tasks.

Description

CONVERSATIONAL ARTIFICIAL INTELLIGENCE FOR AUTOMATED SELF-
SERVICE ACCOUNT MANAGEMENT
CROSS-REFERENCE TO RELATED APPLICATIONS
[001] This application claims the benefit of U.S. Provisional Application No. 62/817,423, filed March 12, 2019, entitled “CONVERSATIONAL ARTIFICIAL INTELLIGENCE FOR
AUTOMATED SELF-SERVICE ACCOUNT MANAGEMENT,” which is incorporated herein by reference for all purposes in its entirety.
BACKGROUND
[002] A great deal of modern commerce involves the use of credit accounts that allow consumers to obtain goods, services, cash, and other commodities in exchange for an agreement to pay the requisite fees under pre-defined terms.
SUMMARY
[003] Existing software applications purposed for automation of payment processes are designed for accepting payments, routing phone calls to proper internal departments, and for simplified automated services pertaining to products, services, and accounts. These applications achieve these tasks through technologies that have inherent limitations. The designs of these technologies require a limited or exact number of possible solutions or outputs. Additionally, these technologies only accept simplified values from the caller - such as simple “yes” and “no” responses or dual-tone multi-frequency (DTMF) signaling entries - which then dictate the specific route or pathway experience for the caller based on confined rules. This process is typically described as a “decision tree” wherein a defined number of possible caller/customer experiences is fully defined, and the entry points required to achieve the desired outcomes are limited to a specific set of inputs that the caller must use to trigger the intended treatment of the decision tree to route them to the desired outcome. Existing technologies lack the ability to hold a conversation with the caller or to use previous experience with other callers to augment their responses by correlating unclear answers with their corresponding intended meaning. Additionally, the existing technologies cannot take free-form speech and analyze the intent of the spoken speech to establish meaning as it relates to the various tasks the device is designed to perform. Existing applications also cannot perform multiple tasks in parallel; the caller cannot engage the application in conversation while it is performing the previous tasks.
[004] The technologies described herein, also referred to as “AVA,” can take free-form human speech and transcribe it into text, analyze the intent of that speech in relation to the defined tasks for which the device is purposed, establish a corollary to communicate back to the caller for affirmation or next steps, analyze host data relating to the caller’s intent, execute any number of parallel tasks relating to account management or retrieval of decisioning data points relating to the caller, competently respond to the caller in a truly conversational format with understanding of the caller’s intended goal and the needed attributes that allow the caller to complete the caller’s intended objective in a self-directed fashion, and respond comprehensively to answer specific caller inquiries throughout the call experience. This new technology enables intent-driven communication automation tools on the phone between automated technologies (e.g., bots, etc.) and human beings.
[005] With regard to credit accounts, many consumers or account-holders/borrowers honor their agreements and pay the requisite fees according to agreed-upon terms and never need to contact the service provider or vendor regarding a statement of their account or the nature of their existing services or products. However, some account-holders require greater explanation of their existing due payments or further information about their existing account. Additionally, some customers are also not willing to or are not able to uphold their end of the agreement and do not pay the requisite fees as they become due pursuant to the terms of their agreement with the service provider or vendor. When payment terms are violated, service providers and vendors may be required to contact consumers. To do so, many vendors and service providers utilize call center facilities purposed for direct communications with consumers regarding the nature of their account, changes to existing services, or disputed amounts billed to the customer.
[006] A service provider, vendor or other creditor may have several options for contacting a consumer or account-holder about their account. A service provider, vendor or other creditor will seek to contact the borrower, consumer or account-holder, or their co-signor, as applicable, via telephone, physical mail, SMS text, and email in order to communicate with the borrower, consumer or account-holder, or their co-signor, as applicable, about, for example, past due amounts payable, pending or anticipated changes to the party’s account details or services, confirmation of changes to services or lending terms, credit authorization of new services or to otherwise renegotiate the payment terms agreement or take legal action to enforce the agreement, among other purposes. All of these tasks involve additional work and expense on the part of the vendor, service-provider or creditor. Contacting a borrower may require investment of additional time to locate the borrower’s contact information, if such information is not readily available.
[007] Pursuant to existing U.S. law, such as the Telephone Consumer Protection Act (TCPA), restrictions on outbound contact methods from service providers require an emphasis on direct inbound communication strategies (i.e., creditors are restricted from initiating communications with consumers and often must wait for a consumer to contact them before they can take a desired corrective action related to an account). The means that vendors, service-providers, and creditors utilize to engage with consumers who are trying to reach them include: call center operations, physical mail, and digital platforms such as social media.
[008] If a creditor cannot find a method for efficient account management support, customer service, or the collection of overdue debts, the creditor may have to incur unexpected losses. In many cases, the additional expense to receive inbound communications cannot be recovered by the creditor. The creditor can also lose existing customers to a competitor due to a lack of communication, a lack of convenience to the borrower, or simply a customer experience that falls short of their expectations.
[009] Many account-holders who do not have delinquent accounts also seek a convenient method for making and managing their account payments, managing their account services, and inquiring about new services or existing account details, and they prefer direct telephonic communication to address these issues.
[010] Embodiments of the present subject matter comprise systems and methods for the engineering, development, management, implementation and use of telephony-based applications for automated self-service account management.
[011] Some embodiments of the subject matter disclosed herein comprise methods and systems for the design and use of web-based applications for call flow management and routing. Some embodiments comprise an interactive voice response (IVR) system as part of a call flow management device to save time within the caller’s call routing experience. Some embodiments may comprise a static decisioning intelligence that can ascertain the purpose of the call directly from the input of alpha-numeric entries from the caller through the telephony experience, or dual-tone multi-frequency (DTMF) signaling, in order to receive direct input from the caller in response to automated questions, apply understanding to those direct entries, and establish the intent of a caller; creating a simplified self-service device that lets callers resolve their issue before reaching a live call center agent or that hastens their hold experience before they reach a live call center agent. [012] Some embodiments of the subject matter disclosed herein comprise methods for speech-to-text (STT) conversion for the real-time processing of recorded speech through textual analysis tools that create accurate text conversion as an output.
[013] Some embodiments of the subject matter disclosed herein comprise a Conversational Artificial Intelligence application layer, the method comprising: data parsing, which allows for the tokenization of relevant textual data to prepare it for machine learning models and for the auto-detection and clustering of typos or incorrect speech-to-text results; and machine learning model techniques such as a Markov Decision Process (MDP). The MDP forms the basis for many reinforcement learning techniques, as it provides the flexibility to implement a wide range of machine learning algorithms, including deep learning (neural networks) and classification, as well as an interpretable and auditable environment that provides for continuous human-assisted improvement.
[014] In one aspect, disclosed herein is a computer-implemented platform for automated self-service account management comprising a host data store, a telephony system, and a
conversational artificial intelligence system: the host data store comprising historic call data and account data; the telephony system configured to: receive free-form speech from a caller;
transcribe the free-form speech to generate a speech utterance; and provide the speech utterance to the conversational artificial intelligence system; the conversational artificial intelligence system configured to: parse and tokenize the speech utterance; query the host data store to retrieve historic call data and account data for the caller; and apply a machine learning model to the tokenized speech utterance and the historic call data and account data for the caller to:
identify an intended objective of the caller in relation to one or more pre-defined account management tasks; and execute the intended objective; or establish a corollary to respond to the caller in a conversational format via the telephony system for affirmation or additional data needed to execute the intended objective. In some embodiments, the machine learning model comprises a Markov Decision Process (MDP). In further embodiments, the MDP enables the artificial intelligence system to move freely across states and perform a plurality of tasks simultaneously. In some embodiments, the machine learning model comprises an artificial neural network (ANN). In some embodiments, the account is a credit account. In various embodiments, the one or more pre-defined tasks comprises: get information about an account or make a payment on an account. In some embodiments, the conversational artificial intelligence system is further configured to execute any number of parallel tasks relating to account management or retrieval of decisioning data points relating to the caller. In some embodiments, the platform is configured to comprehensively answer specific caller inquiries throughout the call experience. In some embodiments, the conversational artificial intelligence system is further configured to provide a log tracking process for a person-in-the-loop procedure in order to create supervised transition probabilities. In some embodiments, the telephony system is further configured to provide an interactive voice response (IVR) system. In some embodiments, the telephony system is further configured to receive dual-tone multi-frequency (DTMF) signaling.
[015] In another aspect, disclosed herein are computer-implemented methods for providing automated self-service account management services, the method comprising: maintaining a host data store comprising historic call data and account data; receiving, via a telephony system, free-form speech from a caller; transcribing the free-form speech to generate a speech utterance; parsing and tokenizing the speech utterance; querying the host data store to retrieve historic call data and account data for the caller; and applying a machine learning model to the tokenized speech utterance and the historic call data and account data for the caller to: identify an intended objective of the caller in relation to one or more pre-defined account management tasks; and execute the intended objective; or establish a corollary to respond to the caller in a conversational format via the telephony system for affirmation or additional data needed to execute the intended objective. In some embodiments, the machine learning model comprises a Markov Decision Process (MDP). In further embodiments, the MDP enables the artificial intelligence system to move freely across states and perform a plurality of tasks simultaneously. In some embodiments, the machine learning model comprises an artificial neural network (ANN). In some
embodiments, the account is a credit account. In various embodiments, the one or more pre-defined tasks comprises: get information about an account or make a payment on an account. In some embodiments, applying the machine learning model comprises executing any number of parallel tasks relating to account management or retrieval of decisioning data points relating to the caller. In some embodiments, the method further comprises comprehensively answering specific caller inquiries throughout the call experience. In some embodiments, the method further comprises providing a log tracking process for a person-in-the-loop procedure in order to create supervised transition probabilities. In some embodiments, the method further comprises receiving, via the telephony system, interactive voice responses (IVR). In some embodiments, the method further comprises receiving, via the telephony system, dual-tone multi-frequency
(DTMF) signaling.
BRIEF DESCRIPTION OF THE DRAWINGS
[016] A better understanding of the features and advantages of the present subject matter will be obtained by reference to the following detailed description that sets forth illustrative
embodiments and the accompanying drawings of which:
[017] Fig. 1 shows a non-limiting example of a high-level schematic diagram of the
technological environment for the subject matter described herein;
[018] Fig. 2 shows a non-limiting example of a topological diagram of the flow of information through the various systems and stages described herein, which derive successful output responses delivered as automated communications;
[019] Fig. 3 shows a non-limiting example of a digital processing device; in this case, a device with one or more CPUs, a memory, a communication interface, and a display;
[020] Fig. 4 shows a non-limiting example of a web/mobile application provision system; in this case, a system providing browser-based and/or native mobile user interfaces; and
[021] Fig. 5 shows a non-limiting example of a cloud-based web/mobile application provision system; in this case, a system comprising an elastically load balanced, auto-scaling web server and application server resources as well as synchronously replicated databases.
DETAILED DESCRIPTION
[022] Described herein, in certain embodiments, is a computer-implemented platform for automated self-service account management comprising a host data store, a telephony system, and a conversational artificial intelligence system: the host data store comprising historic call data and account data; the telephony system configured to: receive free-form speech from a caller; transcribe the free-form speech to generate a speech utterance; and provide the speech utterance to the conversational artificial intelligence system; the conversational artificial intelligence system configured to: parse and tokenize the speech utterance; query the host data store to retrieve historic call data and account data for the caller; and apply a machine learning model to the tokenized speech utterance and the historic call data and account data for the caller to:
identify an intended objective of the caller in relation to one or more pre-defined account management tasks; and execute the intended objective; or establish a corollary to respond to the caller in a conversational format via the telephony system for affirmation or additional data needed to execute the intended objective. In some embodiments, the machine learning model comprises a Markov Decision Process (MDP). In further embodiments, the MDP enables the artificial intelligence system to move freely across states and perform a plurality of tasks simultaneously. In some embodiments, the account is a credit account. In some embodiments, the one or more pre-defined tasks comprises: get information about an account or make a payment on an account. In some embodiments, the conversational artificial intelligence system is further configured to execute any number of parallel tasks relating to account management or retrieval of decisioning data points relating to the caller. In some embodiments, the platform is configured to comprehensively answer specific caller inquiries throughout the call experience. In some embodiments, the conversational artificial intelligence system is further configured to provide a log tracking process for a person-in-the-loop procedure in order to create supervised transition probabilities.
[023] Also described herein, in certain embodiments, are computer-implemented methods for providing automated self-service account management services, the method comprising:
maintaining a host data store comprising historic call data and account data; receiving, via a telephony system, free-form speech from a caller; transcribing the free-form speech to generate a speech utterance; parsing and tokenizing the speech utterance; querying the host data store to retrieve historic call data and account data for the caller; and applying a machine learning model to the tokenized speech utterance and the historic call data and account data for the caller to: identify an intended objective of the caller in relation to one or more pre-defined account management tasks; and execute the intended objective; or establish a corollary to respond to the caller in a conversational format via the telephony system for affirmation or additional data needed to execute the intended objective. In some embodiments, the machine learning model comprises a Markov Decision Process (MDP). In further embodiments, the MDP enables the artificial intelligence system to move freely across states and perform a plurality of tasks simultaneously. In some embodiments, the machine learning model comprises an artificial neural network (ANN). In some embodiments, the account is a credit account. In various embodiments, the one or more pre-defined tasks comprises: get information about an account or make a payment on an account. In some embodiments, applying the machine learning model comprises executing any number of parallel tasks relating to account management or retrieval of decisioning data points relating to the caller. In some embodiments, the method further comprises comprehensively answering specific caller inquiries throughout the call experience. In some embodiments, the method further comprises providing a log tracking process for a person-in-the-loop procedure in order to create supervised transition probabilities. In some embodiments, the method further comprises receiving interactive voice responses (IVR). In some embodiments, the method further comprises receiving dual-tone multi-frequency (DTMF) signaling.
Certain definitions
[024] Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present subject matter belongs. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Any reference to “or” herein is intended to encompass “and/or” unless otherwise stated.
Overview
[025] Referring to Fig. 1, in a particular embodiment, 100 a caller is connected via telephony (e.g., a provided phone number to reach the telephony application platform; or the telephony environment). The caller speaks into their telephone and the telephony application platform receives a “speech utterance” in a speech gather process. That speech utterance is sent via API in an MP3 format from the telephony platform to existing technologies for speech-to-text transcription. The transcription is then relayed as input to AVA in a textual format 110.
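By way of illustration only, this relay step can be sketched as a short Python handler; the speech-to-text client object, the AVA endpoint URL, and the field names below are hypothetical placeholders rather than the particular services used by the platform:

import requests  # assumed HTTP client for the API hops between components

def relay_speech_utterance(recording_url, call_id, stt_client, ava_url):
    """Relay one gathered speech utterance from the telephony platform to AVA."""
    # 1) Fetch the MP3 recording produced by the speech gather process.
    audio_mp3 = requests.get(recording_url).content
    # 2) Send the audio to an existing speech-to-text technology (hypothetical client object).
    transcription = stt_client.transcribe(audio_mp3, audio_format="mp3")
    # 3) Deliver the transcription to AVA in a textual format (110).
    reply = requests.post(ava_url, json={"call_id": call_id, "utterance": transcription})
    # 4) AVA's textual response is later converted back to speech for the caller.
    return reply.json().get("reply_text")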
[026] Referring to Fig. 1, in a particular embodiment, 110 the speech-to-text input for AVA is created via existing technologies for speech-to-text transcription and delivered to the AVA machine learning models 120.
[027] Referring to Fig. 1, in a particular embodiment, 120 AVA uses data parsing to parse and tokenize the relevant textual data of the message in order to ready it for machine learning models (e.g., the Markov Decision Process). Data parsing includes a combination of open source tools such as the Natural Language Toolkit (NLTK 3.4) and a machine learning-based tokenizer which can auto-detect and cluster typos together to reduce the training vocabulary size.
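A minimal, non-limiting sketch of this parsing step is shown below; it assumes NLTK for tokenization and substitutes a simple edit-distance heuristic (Python's difflib) for the machine learning-based typo-clustering tokenizer:

from difflib import get_close_matches
from nltk.tokenize import word_tokenize  # NLTK tokenizer; may require nltk.download('punkt')

def parse_and_tokenize(utterance, known_vocabulary):
    """Tokenize a transcribed utterance and fold likely typos onto known tokens."""
    tokens = [t.lower() for t in word_tokenize(utterance)]
    normalized = []
    for token in tokens:
        # Cluster probable typos or speech-to-text errors onto the closest known token,
        # which keeps the training vocabulary small.
        match = get_close_matches(token, known_vocabulary, n=1, cutoff=0.8)
        normalized.append(match[0] if match else token)
    return normalized

# With a vocabulary containing "understand" and "bill":
# parse_and_tokenize("I wanna unnerstand my bil", vocab) -> ['i', 'wanna', 'understand', 'my', 'bill']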
[028] Referring to Fig. 1, in a particular embodiment, 120 the Markov Decision Process and other Natural Language Processing (NLP) and/or Natural Language Understanding (NLU) techniques (e.g., Doc2Vec, classification models, etc.) seek to understand the intent of the speech utterance, to follow the Markov Decision Process states/actions, and to execute probability function approximations, resulting in: responses to the caller with information 160, and requests made via API 130 for the source data 140 required for specific tasks related to the specific state-actions linked to the initial speech (e.g., authentication of a caller, source data relating to the caller, etc.).
[029] Referring to Fig. 1, in a particular embodiment, 120 the probability distributions in AVA’s Markov Decision Process 120 allow the AI to move freely across the various states and perform many tasks simultaneously, such as speaking to the caller 160 while performing a look-up of information 130, 140, 150, the interpretation of that data, and the execution of the resultant state action.
Markov Decision Process
[030] In some embodiments, the systems and methods described herein include a Markov Decision Process. Below is an exemplary Markov Decision Process expressed as mathematical formulae.
[031] An MDP is a tuple (S, A, P, R, γ), where S is a state space, A is a finite set of actions, P is the state transition probability function, R is the reward function, and γ is a discount factor (γ ∈ [0, 1]).
[032] S - The State Space is an exhaustive set of states that the model understands and links to “state-actions,” which include responding to the customer, sending a task out to a client system (e.g., making a note, retrieving billing information), ending the call, transferring the call, and more.
[033] A - The set of actions (different from state-actions) define the movements possible throughout the network. These actions (as well as the states) were built based on the
Representative Turn Groups seen in the voice samples provided by the client.
[034] P - The state transition probability function is approximated by machine learning models: a semi-supervised pipeline consisting of Doc2Vec and a classification model trained on the movements between states learned from analysis of the voice samples. The inputs to the models consist of both caller inputs and the sourced client data required for decisioning.
[035] Doc2Vec models for understanding text in a numerical format use an additional vector (e.g., a “document ID”) that expands upon the broader ML concept of feature vectors as used in Word2Vec. The additional vector, trained alongside the word vectors, establishes a concept of a “document” (e.g., a complete transcription of a speech utterance from a caller) by creating a numerical representation of the document (or label), in lieu of the Word2Vec establishment of the concept of a word with a numerical representation of each word. Word2Vec is a two-layer neural net that processes text using Continuous Bag of Words (CBOW) and skip-gram. Continuous Bag of Words predicts the following word from a series of surrounding words, and skip-gram predicts all surrounding words (or context) from just one word. Where Word2Vec would train word vectors to predict the next word by giving a numerical representation to each word through its use of CBOW and skip-gram, Doc2Vec trains word vectors and additionally trains a document vector to create a numerical representation of the document, regardless of its length. Doc2Vec is an extension of Word2Vec with unsupervised learning of continuous representations of larger blocks of text (e.g., sentences, labels, documents, etc.).
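One way to realize the semi-supervised pipeline described above, sketched here with gensim's Doc2Vec and a scikit-learn classifier (the hyperparameters and training corpus shown are illustrative assumptions, not the production models), is:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

def train_transition_models(turns, next_states):
    """turns: tokenized caller turns; next_states: the state each turn transitioned to."""
    corpus = [TaggedDocument(words=tokens, tags=[i]) for i, tokens in enumerate(turns)]
    # Unsupervised document vectors: one numerical representation per caller turn.
    d2v = Doc2Vec(vector_size=50, min_count=2, epochs=40)
    d2v.build_vocab(corpus)
    d2v.train(corpus, total_examples=d2v.corpus_count, epochs=d2v.epochs)
    # Supervised classifier over those vectors approximates P, the transition function.
    vectors = [d2v.infer_vector(tokens) for tokens in turns]
    classifier = LogisticRegression(max_iter=1000).fit(vectors, next_states)
    return d2v, classifier

def transition_probabilities(d2v, classifier, tokens):
    """Distribution over candidate next states for a new tokenized utterance."""
    return dict(zip(classifier.classes_, classifier.predict_proba([d2v.infer_vector(tokens)])[0]))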
[036] The probabilities are learned at the same time as the states/actions from the voice samples. We have a person in the loop in order to create supervised transition probabilities.
These probability distributions allow the AI to move freely across states and do not limit the model to only being in one state at a time (e.g., answer a general billing question while it restores services).
[037] R - In the first version of AVA, the reward function at this stage is fixed upon whether or not the customer was able to make a payment, make a payment arrangement, or restore their services. In future versions of AVA, customer interactions will provide a better reward estimation. The reward function helps to update the probability function defined in the previous bullet point.
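Purely as a non-limiting illustration of the tuple defined above, the components can be written out in Python with toy, hypothetical states, actions, transitions, and rewards (not the production state space):

# Toy MDP (S, A, P, R, gamma); every value here is an illustrative assumption.
S = ["greeting", "authenticate", "billing_question", "take_payment", "end_call"]
A = ["ask_for_account", "answer_question", "collect_payment", "transfer", "hang_up"]

# P[(state, action)] -> distribution over next states; in AVA this is approximated by ML models.
P = {
    ("greeting", "ask_for_account"): {"authenticate": 0.9, "greeting": 0.1},
    ("authenticate", "answer_question"): {"billing_question": 0.7, "take_payment": 0.3},
}

# R(state): first-version reward, fixed on whether the caller completed a core task.
R = {"take_payment": 1.0, "end_call": 0.0}

gamma = 0.9  # discount factor in [0, 1]

def expected_return(state, action, value):
    """One-step lookahead: immediate reward plus discounted value of likely next states."""
    next_states = P.get((state, action), {})
    return R.get(state, 0.0) + gamma * sum(p * value.get(s2, 0.0) for s2, p in next_states.items())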
[038] Referring to Fig. 2, in a particular embodiment, 200 the AVA environment encapsulates the telephony application platform for the auditory caller experience, where speech-to-text (STT) conversion and text-to-speech (TTS) conversion occur via existing technologies reached via API 205; the AI environment 210, where data parsing and the Markov Decision Process functions are executed 220, 225; and the retrieval of source data, which communicates via API with the source data systems necessitated by specific state/actions (e.g., authenticating a caller, retrieval of account management information, etc.).
[039] Referring to Fig. 2, in a particular embodiment, 205 the telephony environment contains existing technologies for speech-to-text conversion via existing REST APIs for STT technologies that create textual input for the Artificial Intelligence environment to receive, and contains existing technologies for text-to-speech conversion via existing REST APIs for TTS for the caller to receive as auditory feedback from AVA (or what is otherwise definable as “conversation” between the bot and the caller).
[040] Referring to Fig. 2, in a particular embodiment, 215 the source data is reached via API for the AI environment to access and interpret 220, and sometimes relay to the customer 225, based on the approximated state/actions that apply.
[041] Referring to Fig. 2, in a particular embodiment, 225, 220, 215 the probability distributions in AVA’s Markov Decision Process 225 allow the AI to move freely across the various states and perform many tasks simultaneously, such as speaking to the caller 225, 205 while performing a look-up of information 220, 215, 225, the interpretation of that data, and the execution of the resultant state action.
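The parallel behavior described in this and the preceding paragraphs can be sketched with Python's asyncio; the two coroutine bodies below are hypothetical placeholders for the telephony (205) and source data (215) integrations:

import asyncio

async def speak_to_caller(text):
    # Hypothetical placeholder: stream a text-to-speech response through the telephony layer (205).
    await asyncio.sleep(0)
    return text

async def fetch_account_data(account_id):
    # Hypothetical placeholder: retrieve decisioning data from the source data systems (215).
    await asyncio.sleep(0)
    return {"account_id": account_id, "balance_due": 0.0}

async def handle_billing_question(account_id):
    # Speak to the caller while the account look-up runs concurrently, mirroring the MDP's
    # ability to occupy more than one state/action at a time.
    _, account = await asyncio.gather(
        speak_to_caller("One moment while I pull up your billing details."),
        fetch_account_data(account_id),
    )
    return account

# asyncio.run(handle_billing_question("12345"))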
Digital processing device
[042] In some embodiments, the platforms, systems, media, and methods described herein include a digital processing device, or use of the same. In further embodiments, the digital processing device includes one or more hardware central processing units (CPUs) or general purpose graphics processing units (GPGPUs) that carry out the device’s functions. In still further embodiments, the digital processing device further comprises an operating system configured to perform executable instructions. In some embodiments, the digital processing device is optionally connected to a computer network. In further embodiments, the digital processing device is optionally connected to the Internet such that it accesses the World Wide Web. In still further embodiments, the digital processing device is optionally connected to a cloud computing infrastructure. In other embodiments, the digital processing device is optionally connected to an intranet. In other embodiments, the digital processing device is optionally connected to a data storage device.
[043] In accordance with the description herein, suitable digital processing devices include, by way of non-limiting examples, server computers, desktop computers, laptop computers, notebook computers, sub-notebook computers, netbook computers, netpad computers, set-top computers, media streaming devices, handheld computers, Internet appliances, mobile smartphones, tablet computers, personal digital assistants, video game consoles, and vehicles. Those of skill in the art will recognize that many smartphones are suitable for use in the system described herein. Those of skill in the art will also recognize that select televisions, video players, and digital music players with optional computer network connectivity are suitable for use in the system described herein. Suitable tablet computers include those with booklet, slate, and convertible
configurations, known to those of skill in the art.
[044] In some embodiments, the digital processing device includes an operating system configured to perform executable instructions. The operating system is, for example, software, including programs and data, which manages the device’s hardware and provides services for execution of applications. Those of skill in the art will recognize that suitable server operating systems include, by way of non-limiting examples, FreeBSD, OpenBSD, NetBSD®, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®. Those of skill in the art will recognize that suitable personal computer operating systems include, by way of non-limiting examples, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX- like operating systems such as GNU/Linux®. In some embodiments, the operating system is provided by cloud computing. Those of skill in the art will also recognize that suitable mobile smartphone operating systems include, by way of non-limiting examples, Nokia® Symbian® OS, Apple® iOS®, Research In Motion® BlackBerry OS®, Google® Android®, Microsoft® Windows Phone® OS, Microsoft® Windows Mobile® OS, Linux®, and Palm® WebOS®. Those of skill in the art will also recognize that suitable media streaming device operating systems include, by way of non-limiting examples, Apple TV®, Roku®, Boxee®, Google TV®, Google Chromecast®, Amazon Fire®, and Samsung® HomeSync®. Those of skill in the art will also recognize that suitable video game console operating systems include, by way of non-limiting examples, Sony® PS3®, Sony® PS4®, Microsoft® Xbox 360®, Microsoft Xbox One, Nintendo® Wii®, Nintendo® Wii U®, and Ouya®.
[045] In some embodiments, the device includes a storage and/or memory device. The storage and/or memory device is one or more physical apparatuses used to store data or programs on a temporary or permanent basis. In some embodiments, the device is volatile memory and requires power to maintain stored information. In some embodiments, the device is non-volatile memory and retains stored information when the digital processing device is not powered. In further embodiments, the non-volatile memory comprises flash memory. In some embodiments, the non-volatile memory comprises dynamic random-access memory (DRAM). In some embodiments, the non-volatile memory comprises ferroelectric random access memory (FRAM). In some embodiments, the non-volatile memory comprises phase-change random access memory (PRAM). In other embodiments, the device is a storage device including, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, magnetic disk drives, magnetic tape drives, optical disk drives, and cloud computing based storage. In further embodiments, the storage and/or memory device is a combination of devices such as those disclosed herein.
[046] In some embodiments, the digital processing device includes a display to send visual information to a user. In some embodiments, the display is a cathode ray tube (CRT). In some embodiments, the display is a liquid crystal display (LCD). In further embodiments, the display is a thin film transistor liquid crystal display (TFT-LCD). In some embodiments, the display is an organic light emitting diode (OLED) display. In various further embodiments, an OLED display is a passive-matrix OLED (PMOLED) or active-matrix OLED (AMOLED) display. In some embodiments, the display is a plasma display. In other embodiments, the display is a video projector.
[047] In some embodiments, the digital processing device includes an input device to receive information from a user. In some embodiments, the input device is a keyboard. In some embodiments, the input device is a pointing device including, by way of non-limiting examples, a mouse, trackball, track pad, joystick, game controller, or stylus. In some embodiments, the input device is a touch screen or a multi-touch screen. In other embodiments, the input device is a microphone to capture voice or other sound input. In other embodiments, the input device is a video camera or other sensor to capture motion or visual input. In further embodiments, the input device is a Kinect, Leap Motion, or the like. In still further embodiments, the input device is a combination of devices such as those disclosed herein.
[048] Referring to Fig. 3, in a particular embodiment, an exemplary digital processing device 301 is programmed or otherwise configured to conduct telephony, store and retrieve caller data, and apply machine learning algorithms. In this embodiment, the digital processing device 301 includes a central processing unit (CPU, also“processor” and“computer processor” herein) 305, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The digital processing device 301 also includes memory or memory location 310 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 315 (e.g., hard disk), communication interface 320 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 325, such as cache, other memory, data storage and/or electronic display adapters. The memory 310, storage unit 315, interface 320 and peripheral devices 325 are in communication with the CPU 305 through a communication bus (solid lines), such as a motherboard. The storage unit 315 can be a data storage unit (or data repository) for storing data. The digital processing device 301 can be operatively coupled to a computer network (“network”) 330 with the aid of the communication interface 320. The network 330 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 330 in some cases is a telecommunication and/or data network. The network 330 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 330, in some cases with the aid of the device 301, can implement a peer-to-peer network, which may enable devices coupled to the device 301 to behave as a client or a server.
[049] Continuing to refer to Fig. 3, the CPU 305 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 310. The instructions can be directed to the CPU 305, which can subsequently program or otherwise configure the CPU 305 to implement methods of the present disclosure. Examples of operations performed by the CPU 305 can include fetch, decode, execute, and write back. The CPU 305 can be part of a circuit, such as an integrated circuit. One or more other components of the device 301 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
[050] Continuing to refer to Fig. 3, the storage unit 315 can store files, such as drivers, libraries and saved programs. The storage unit 315 can store user data, e.g., user preferences and user programs. The digital processing device 301 in some cases can include one or more additional data storage units that are external, such as located on a remote server that is in communication through an intranet or the Internet.
[051] Continuing to refer to Fig. 3, the digital processing device 301 can communicate with one or more remote computer systems through the network 330. For instance, the device 301 can communicate with a remote computer system of a user. Examples of remote computer systems include servers, personal computers (e.g., portable PC), slate or tablet computers (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smartphones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
[052] Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the digital processing device 301, such as, for example, on the memory 310 or electronic storage unit 315. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 305. In some cases, the code can be retrieved from the storage unit 315 and stored on the memory 310 for ready access by the processor 305. In some situations, the electronic storage unit 315 can be precluded, and machine-executable instructions are stored on memory 310.
Non-transitory computer readable storage medium
[053] In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked digital processing device. In further embodiments, a computer readable storage medium is a tangible component of a digital processing device. In still further embodiments, a computer readable storage medium is optionally removable from a digital processing device. In some embodiments, a computer readable storage medium includes, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, distributed computing systems including cloud computing systems and services, and the like. In some cases, the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.
Computer program
[054] In some embodiments, the platforms, systems, media, and methods disclosed herein include at least one computer program, or use of the same. A computer program includes a sequence of instructions, executable in the digital processing device’s CPU, written to perform a specified task. Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), data structures, and the like, that perform particular tasks or implement particular abstract data types. In light of the disclosure provided herein, those of skill in the art will recognize that a computer program may be written in various versions of various languages.
[055] The functionality of the computer readable instructions may be combined or distributed as desired in various environments. In some embodiments, a computer program comprises one sequence of instructions. In some embodiments, a computer program comprises a plurality of sequences of instructions. In some embodiments, a computer program is provided from one location. In other embodiments, a computer program is provided from a plurality of locations. In various embodiments, a computer program includes one or more software modules. In various embodiments, a computer program includes, in part or in whole, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof.
Web application
[056] In some embodiments, a computer program includes a web application. In light of the disclosure provided herein, those of skill in the art will recognize that a web application, in various embodiments, utilizes one or more software frameworks and one or more database systems. In some embodiments, a web application is created upon a software framework such as Microsoft® .NET or Ruby on Rails (RoR). In some embodiments, a web application utilizes one or more database systems including, by way of non-limiting examples, relational, non-relational, object oriented, associative, and XML database systems. In further embodiments, suitable relational database systems include, by way of non-limiting examples, Microsoft® SQL Server, mySQL™, and Oracle®. Those of skill in the art will also recognize that a web application, in various embodiments, is written in one or more versions of one or more languages. A web application may be written in one or more markup languages, presentation definition languages, client-side scripting languages, server-side coding languages, database query languages, or combinations thereof. In some embodiments, a web application is written to some extent in a markup language such as Hypertext Markup Language (HTML), Extensible Hypertext Markup Language (XHTML), or Extensible Markup Language (XML). In some embodiments, a web application is written to some extent in a presentation definition language such as Cascading Style Sheets (CSS). In some embodiments, a web application is written to some extent in a client-side scripting language such as Asynchronous Javascript and XML (AJAX), Flash® Actionscript, Javascript, or Silverlight®. In some embodiments, a web application is written to some extent in a server-side coding language such as Active Server Pages (ASP), ColdFusion®, Perl, Java™, JavaServer Pages (JSP), Hypertext Preprocessor (PHP), Python™, Ruby, Tcl, Smalltalk, WebDNA®, or Groovy. In some embodiments, a web application is written to some extent in a database query language such as Structured Query Language (SQL). In some embodiments, a web application integrates enterprise server products such as IBM® Lotus Domino®. In some embodiments, a web application includes a media player element. In various further
embodiments, a media player element utilizes one or more of many suitable multimedia technologies including, by way of non-limiting examples, Adobe® Flash®, HTML 5, Apple® QuickTime®, Microsoft® Silverlight®, Java™, and Unity®.
[057] Referring to Fig. 4, in a particular embodiment, an application provision system comprises one or more databases 400 accessed by a relational database management system (RDBMS) 410. Suitable RDBMSs include Firebird, MySQL, PostgreSQL, SQLite, Oracle Database, Microsoft SQL Server, IBM DB2, IBM Informix, SAP Sybase, Teradata, and the like. In this embodiment, the application provision system further comprises one or more application servers 420 (such as Java servers, .NET servers, PHP servers, and the like) and one or more web servers 430 (such as Apache, IIS, GWS and the like). The web server(s) optionally expose one or more web services via application programming interfaces (APIs) 440. Via a network, such as the Internet, the system provides browser-based and/or mobile native user interfaces.
[058] Referring to Fig. 5, in a particular embodiment, an application provision system alternatively has a distributed, cloud-based architecture 500 and comprises elastically load balanced, auto-scaling web server resources 510 and application server resources 520 as well as synchronously replicated databases 530.
Mobile application
[059] In some embodiments, a computer program includes a mobile application provided to a mobile digital processing device. In some embodiments, the mobile application is provided to a mobile digital processing device at the time it is manufactured. In other embodiments, the mobile application is provided to a mobile digital processing device via the computer network described herein.
[060] In view of the disclosure provided herein, a mobile application is created by techniques known to those of skill in the art using hardware, languages, and development environments known to the art. Those of skill in the art will recognize that mobile applications are written in several languages. Suitable programming languages include, by way of non-limiting examples,
C, C++, C#, Objective-C, Java™, Javascript, Pascal, Object Pascal, Python™, Ruby, VB.NET, WML, and XHTML/HTML with or without CSS, or combinations thereof.
[061] Suitable mobile application development environments are available from several sources. Commercially available development environments include, by way of non-limiting examples, AirplaySDK, alcheMo, Appcelerator®, Celsius, Bedrock, Flash Lite, .NET Compact Framework, Rhomobile, and WorkLight Mobile Platform. Other development environments are available without cost including, by way of non-limiting examples, Lazarus, MobiFlex, MoSync, and Phonegap. Also, mobile device manufacturers distribute software developer kits including, by way of non-limiting examples, iPhone and iPad (iOS) SDK, Android™ SDK, BlackBerry® SDK, BREW SDK, Palm® OS SDK, Symbian SDK, webOS SDK, and Windows® Mobile SDK.
[062] Those of skill in the art will recognize that several commercial forums are available for distribution of mobile applications including, by way of non-limiting examples, Apple® App Store, Google® Play, Chrome WebStore, BlackBerry® App World, App Store for Palm devices, App Catalog for webOS, Windows® Marketplace for Mobile, Ovi Store for Nokia® devices, Samsung® Apps, and Nintendo® DSi Shop.
Standalone application
[063] In some embodiments, a computer program includes a standalone application, which is a program that is run as an independent computer process, not an add-on to an existing process, e.g., not a plug-in. Those of skill in the art will recognize that standalone applications are often compiled. A compiler is a computer program(s) that transforms source code written in a programming language into binary object code such as assembly language or machine code. Suitable compiled programming languages include, by way of non-limiting examples, C, C++, Objective-C, COBOL, Delphi, Eiffel, Java™, Lisp, Python™, Visual Basic, and VB .NET, or combinations thereof. Compilation is often performed, at least in part, to create an executable program. In some embodiments, a computer program includes one or more executable compiled applications.
Web browser plug-in
[064] In some embodiments, the computer program includes a web browser plug-in (e.g., extension, etc.). In computing, a plug-in is one or more software components that add specific functionality to a larger software application. Makers of software applications support plug-ins to enable third-party developers to create abilities which extend an application, to support easily adding new features, and to reduce the size of an application. When supported, plug-ins enable customizing the functionality of a software application. For example, plug-ins are commonly used in web browsers to play video, generate interactivity, scan for viruses, and display particular file types. Those of skill in the art will be familiar with several web browser plug-ins including, Adobe® Flash® Player, Microsoft® Silverlight®, and Apple® QuickTime®. In some embodiments, the toolbar comprises one or more web browser extensions, add-ins, or add-ons. In some embodiments, the toolbar comprises one or more explorer bars, tool bands, or desk bands.
[065] In view of the disclosure provided herein, those of skill in the art will recognize that several plug-in frameworks are available that enable development of plug-ins in various programming languages, including, by way of non-limiting examples, C++, Delphi, Java™, PHP, Python™, and VB .NET, or combinations thereof.
[066] Web browsers (also called Internet browsers) are software applications, designed for use with network-connected digital processing devices, for retrieving, presenting, and traversing information resources on the World Wide Web. Suitable web browsers include, by way of non limiting examples, Microsoft® Internet Explorer®, Mozilla® Firefox®, Google® Chrome, Apple® Safari®, Opera Software® Opera®, and KDE Konqueror. In some embodiments, the web browser is a mobile web browser. Mobile web browsers (also called microbrowsers, mini-browsers, and wireless browsers) are designed for use on mobile digital processing devices including, by way of non-limiting examples, handheld computers, tablet computers, netbook computers, subnotebook computers, smartphones, music players, personal digital assistants (PDAs), and handheld video game systems. Suitable mobile web browsers include, by way of non-limiting examples, Google® Android® browser, RIM BlackBerry® Browser, Apple® Safari®, Palm® Blazer, Palm® WebOS® Browser, Mozilla® Firefox® for mobile, Microsoft® Internet Explorer® Mobile, Amazon® Kindle® Basic Web, Nokia® Browser, Opera Software® Opera® Mobile, and Sony® PSP™ browser.
Software modules
[067] In some embodiments, the platforms, systems, media, and methods disclosed herein include software, server, and/or database modules, or use of the same. In view of the disclosure provided herein, software modules are created by techniques known to those of skill in the art using machines, software, and languages known to the art. The software modules disclosed herein are implemented in a multitude of ways. In various embodiments, a software module comprises a file, a section of code, a programming object, a programming structure, or combinations thereof. In further various embodiments, a software module comprises a plurality of files, a plurality of sections of code, a plurality of programming objects, a plurality of programming structures, or combinations thereof. In various embodiments, the one or more software modules comprise, by way of non-limiting examples, a web application, a mobile application, and a standalone application. In some embodiments, software modules are in one computer program or application. In other embodiments, software modules are in more than one computer program or application. In some embodiments, software modules are hosted on one machine. In other embodiments, software modules are hosted on more than one machine. In further embodiments, software modules are hosted on cloud computing platforms. In some embodiments, software modules are hosted on one or more machines in one location. In other embodiments, software modules are hosted on one or more machines in more than one location.
Databases
[068] In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more databases, or use of the same. In view of the disclosure provided herein, those of skill in the art will recognize that many databases are suitable for storage and retrieval of historic caller, call, and account information. In various embodiments, suitable databases include, by way of non-limiting examples, relational databases, non-relational databases, object-oriented databases, object databases, entity-relationship model databases, associative databases, and XML databases. Further non-limiting examples include SQL, PostgreSQL, MySQL, Oracle, DB2, and Sybase. In some embodiments, a database is internet-based. In further embodiments, a database is web-based. In still further embodiments, a database is cloud computing-based. In other embodiments, a database is based on one or more local computer storage devices.
EXAMPLES
[069] The following illustrative examples are representative of embodiments of the software applications, systems, and methods described herein and are not meant to be limiting in any way.
Example 1— Training an MDP bot for Conversational AI Applications on the Phone
[070] A telephony environment which can be described as a software application platform for further development is first established. This environment has a receiving phone number (or phone numbers) by which a caller or transfer application can reach the telephony environment. The software application environment is where all the transcription, voice and necessary experiential data points will flow through. Data flows to the AI model which is in the cloud, flows to host systems that hold account management data, and flows back through the telephony with voice applications providing an auditory experience to the caller. The organization of this flow is essential and key to the specific design of the conversational AI.
[071] Source caller data from the client containing existing historical caller experiences in their entirety is required. This data must be at a specific level of scale in order to successfully inform the AI for its initial approximations, establishment of states/actions and the development of Representative Turn Groups which impact how the set of actions can define the movements across the network.
[072] In the MDP there are provided potential values for each component of the tuple. There are five sets of data required to train the AI: the historical voice sample provided containing existing human-to-human or human-to-robot phone calls used to build the Representative Turn Groups, the client policy that dictates actions taken by the AI for each procedure, the manual entries for the variety of speech to be received, the semi-supervised augmentations of Doc2Vec and classification models, and the reward function inherent to the MDP that provides “scoring” relating to successful intent understanding and responses or the proper execution of state actions and probability distributions as they relate to the initial input. The state space values can be created, at first, through a manual process; specifically, by establishing the total potential variety of entries that all equate to the same state. As an example, a definitive list for affirmation or “yes” states might include: “Yup,” “ok,” “Okie,” “alrighty,” “absolutely,” “of course,” “no problem,” “sure,” “alright,” etc. These and all utterances that might be colloquial or otherwise regional, or that might be a specific parlance relating to anachronistic or business-related (task-related) language (such as 401K management, for example), or acronyms that relate to business language (such as “HIS,” standing for “High Speed Internet,” for example) are logged and placed as specific state-space values. The set of actions (different from state-actions) defines the movements possible throughout the network. These actions (as well as the states) are built based on the Representative Turn Groups seen in the voice samples provided by the client. The state transition probability function is approximated by machine learning models: a semi-supervised pipeline consisting of Doc2Vec and a classification model trained on the movements between states learned from the analysis of the voice samples. The inputs to the models consist of both caller inputs and the source data retrieved through state actions for decisioning related to account management. The probabilities are learned at the same time as the states/actions from the voice samples. Additionally, to augment the continued improvement of the AI’s understanding and performance with a human-aided technique, a person is kept in the loop (via a log tracking process) in order to create supervised transition probabilities. These probability distributions allow the AI to move freely across states and do not limit the model to only being in one state at a time (e.g., answering a general billing question while it restores services). The reward function is, at first, fixed upon whether or not the customer was able to make a payment, make a payment arrangement, restore their services, and perform the initial base tasks for which the deployment is purposed at the client. As the Conversational AI models and capabilities mature, caller interactions provide a better reward estimation. Ultimately the reward function helps to update the probability function defined above.
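A fragment of such a manually seeded state-space mapping might be expressed as follows; the entries and the helper function are illustrative only:

# Manually seeded state-space values: many surface forms collapse onto one "affirmation" state.
AFFIRMATION_UTTERANCES = {
    "yup", "ok", "okie", "alrighty", "absolutely", "of course",
    "no problem", "sure", "alright", "yes",
}
BUSINESS_ACRONYMS = {"his": "high speed internet"}  # client-specific parlance (illustrative)

def map_to_state(tokens):
    """Return 'affirmation' when the utterance equates to a yes-state, else None (illustrative)."""
    phrase = " ".join(tokens).lower()
    if phrase in AFFIRMATION_UTTERANCES or any(t.lower() in AFFIRMATION_UTTERANCES for t in tokens):
        return "affirmation"
    return None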
[073] In execution of these functions, a caller reaches the specific telephone number of the telephony environment hosting the application. The telephony application accepts the call and executes a state action related to a “basic greeting without prior knowledge of caller,” defined by client policy and historical call analysis. This state action sends a textual output to the telephony environment for text-to-speech conversion, such as an MP3 signal broadcast on the live phone call: “Thank you for calling us. My name is AVA; can I have your name, please?” The caller’s initial speech reaction to this greeting then becomes the first variable from which the remainder of the MDP results can occur. The caller then states, “My name is Tim Cowherd and I want to understand my bill, why did my balance go up?” The telephony environment will deliver the MP3 of this speech utterance for speech-to-text conversion, which transcribes the exacting result and then delivers the textual input to the AI environment. The AI environment utilizes data parsing to prioritize and categorize specific words or documents/sentences and groups of words, such as “bill,” “understand,” “name,” “Tim Cowherd,” and “bill go up,” to provide the ML models with numerical representations of the words, documents, sentences, and groups of words. The numerical representations of the text thus submitted to the machine learning models will leverage the probability distributions in AVA’s Markov Decision Process to move freely across the various states and perform many tasks simultaneously, such as a simultaneous execution of the state action to deliver text back to the telephony environment via text-to-speech conversion to MP3 - “I understand you would like to know more about your account; could you please give me the account holder name and account number to start?” - the state action to understand the name “Tim Cowherd” and associate it with the caller, and the state action to authenticate the caller with the forthcoming information from the caller in the next speech utterance containing the account holder name and account number.
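The opening turn of this walkthrough could be reduced to inputs for the models with a simple, illustrative extraction step such as the following (the keyword list and two-word name pattern are simplifying assumptions, not the production parsing logic):

import re

BILLING_KEYWORDS = ("bill", "balance", "charge")  # illustrative intent cues

def extract_initial_intent(transcript):
    """Pull a caller name and a coarse billing intent from an opening utterance (illustrative)."""
    lowered = transcript.lower()
    name_match = re.search(r"my name is ([a-z]+ [a-z]+)", lowered)  # assumes a two-word name
    caller_name = name_match.group(1).title() if name_match else None
    intent = "billing_inquiry" if any(k in lowered for k in BILLING_KEYWORDS) else "unknown"
    return {"caller_name": caller_name, "intent": intent}

# extract_initial_intent("My name is Tim Cowherd and I want to understand my bill, why did my balance go up?")
# -> {'caller_name': 'Tim Cowherd', 'intent': 'billing_inquiry'}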
[074] While preferred embodiments of the present subject matter have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the present disclosure. It should be understood that various alternatives to the embodiments of the subject matter described herein may be employed.

Claims

WHAT IS CLAIMED IS:
1. A computer-implemented platform for automated self-service account management
comprising a host data store, a telephony system, and a conversational artificial intelligence system:
the host data store comprising historic call data and account data;
the telephony system configured to:
a) receive free-form speech from a caller;
b) transcribe the free-form speech to generate a speech utterance; and
c) provide the speech utterance to the conversational artificial intelligence system;
the conversational artificial intelligence system configured to:
a) parse and tokenize the speech utterance;
b) query the host data store to retrieve historic call data and account data for the caller; and
c) apply a machine learning model to the tokenized speech utterance and the historic call data and account data for the caller to:
i) identify an intended objective of the caller in relation to one or more pre-defined account management tasks; and
ii) execute the intended objective; or
iii) establish a corollary to respond to the caller in a conversational format via the telephony system for affirmation or additional data needed to execute the intended objective.
2. The system of claim 1, wherein the machine learning model comprises an artificial neural network (ANN).
3. The system of claim 1, wherein the account is a credit account.
4. The system of claim 1, wherein the one or more pre-defined tasks comprises: get
information about an account or make a payment on an account.
5. The system of claim 1, wherein the conversational artificial intelligence system is further configured to execute any number of parallel tasks relating to account management or retrieval of decisioning data points relating to the caller.
6. The system of claim 1, further configured to comprehensively answer specific caller inquiries throughout the call experience.
7. The system of claim 1, wherein the conversational artificial intelligence system is further configured to provide a log tracking process for a person-in-the-loop procedure in order to create supervised transition probabilities.
8. The system of claim 1, wherein the telephony system is further configured to provide an interactive voice response (IVR) system.
9. The system of claim 1, wherein the telephony system is further configured to receive dual-tone multi-frequency (DTMF) signaling.
10. A computer-implemented method for providing automated self-service account
management services, the method comprising:
a) maintaining a host data store comprising historic call data and account data;
b) receiving, via a telephony system, free-form speech from a caller;
c) transcribing the free-form speech to generate a speech utterance;
d) parsing and tokenizing the speech utterance;
e) querying the host data store to retrieve historic call data and account data for the caller; and
f) applying a machine learning model to the tokenized speech utterance and the historic call data and account data for the caller to:
i) identify an intended objective of the caller in relation to one or more pre-defined account management tasks; and
ii) execute the intended objective; or
iii) establish a corollary to respond to the caller in a conversational format via the telephony system for affirmation or additional data needed to execute the intended objective.
11. The method of claim 10, wherein the machine learning model comprises an artificial neural network (ANN).
12. The method of claim 10, wherein the account is a credit account.
13. The method of claim 10, wherein the one or more pre-defined tasks comprise: get information about an account or make a payment on an account.
14. The method of claim 10, wherein applying the machine learning model comprises
executing any number of parallel tasks relating to account management or retrieval of decisioning data points relating to the caller.
15. The method of claim 10, wherein the method further comprises comprehensively
answering specific caller inquiries throughout the call experience.
16. The method of claim 10, wherein the method further comprises providing a log tracking process for a person-in-the-loop procedure in order to create supervised transition probabilities.
17. The method of claim 10, wherein the method further comprises receiving interactive voice responses (IVR).
18. The method of claim 10, wherein the method further comprises receiving dual-tone multi-frequency (DTMF) signaling.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962817423P 2019-03-12 2019-03-12
US62/817,423 2019-03-12

Publications (1)

Publication Number Publication Date
WO2020185880A1 (en) 2020-09-17

Family

ID=72427637

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/022074 Ceased WO2020185880A1 (en) 2019-03-12 2020-03-11 Conversational artificial intelligence for automated self-service account management

Country Status (1)

Country Link
WO (1) WO2020185880A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070043574A1 (en) * 1998-10-02 2007-02-22 Daniel Coffman Conversational computing via conversational virtual machine
US20130111348A1 (en) * 2010-01-18 2013-05-02 Apple Inc. Prioritizing Selection Criteria by Automated Assistant
US20140247927A1 (en) * 2010-04-21 2014-09-04 Angel.Com Incorporated Dynamic speech resource allocation
US20150189085A1 (en) * 2013-03-15 2015-07-02 Genesys Telecommunications Laboratories, Inc. Customer portal of an intelligent automated agent for a contact center

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113159901A (en) * 2021-04-29 2021-07-23 天津狮拓信息技术有限公司 Method and device for realizing financing lease service session
CN113159901B (en) * 2021-04-29 2024-06-04 天津狮拓信息技术有限公司 Method and device for realizing financing lease business session

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20770867; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 20770867; Country of ref document: EP; Kind code of ref document: A1)