Saturday, November 16, 2013

Multimodal Interfaces For User Self-Service Input/Output

Copyright (C) Unified-View, All Rights Reserved.

For years you have seen me refer to multimodal mobile devices as the mainstay of unified communications (UC). You have also seen me stress that UC is not just for person-to-person contacts, but also for interactions with automated, online applications. Now that consumers have rapidly adopted smartphones and tablets, they are all able to access information and people without necessarily making a traditional phone call over the PSTN.

This transition is not only affecting the use of telephones and the legacy Telephone User Interface (TUI), but is also causing a ripple effect in how online mobile self-service apps must support the flexibility of multimodal endpoint devices. The Winter issue of Speech Technology discusses the challenge of integrating voice and visual interfaces, highlighting the need for end users to use both forms of interaction as their personal needs dictate.

As I have frequently stressed in the past, people will usually find it faster and easier to talk than to type input, and faster and easier to read (or look) than to listen to informational output. On top of that, there will be times when a user has to be hands-free and/or eyes-free, so any self-service application will need to offer that flexibility at both the input and output level.

Device Independence For Mobile Apps

Because "Consumer BYOD" will require mobile apps to support a variety of form factors and mobile OSs, there will be a new need to separate mobile apps from the control of any input and output content. That approach appears to be on the agenda of the World Wide Web Consortium's (W3C) Multimodal Interaction Working Group, which is developing standards for interoperation between "modality components." This means that an application process will be completely independent from the different input/output format controls, which can be selectively used for individual end user situations.

Think of it in terms of person-to-person UC-enabled messaging, where the message sender creates a message in text or speech, the recipient gets notified about the message, and can then choose to retrieve that message as voice or text. In the same way, an application interaction controller can be directed to dynamically convert and deliver input or output to a particular device screen interface or voice channel. Since we are talking about a Web-based interaction with an application, not a person, a real-time voice (or video) connection is not directly involved except when invoked through a "click-for-assistance" option in the application (like the Amazon "Mayday button").
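To make the idea concrete, here is a minimal sketch (in Python) of how such an interaction controller might keep the application's content separate from the delivery modality. All of the names here are hypothetical illustrations, and the `text_to_speech` function is just a stand-in for a real speech synthesis engine, not any particular product or W3C API:

```python
from dataclasses import dataclass


@dataclass
class Message:
    """A modality-neutral message: the app only deals with this."""
    text: str


def text_to_speech(text: str) -> str:
    # Stand-in for a real TTS engine; a real system would return audio.
    return f"<audio:{text}>"


class InteractionController:
    """Routes one canonical message to whichever modality the user selects,
    so the application itself never needs to know about the endpoint."""

    def deliver(self, msg: Message, modality: str) -> str:
        if modality == "screen":
            return msg.text                   # visual output: render as text
        if modality == "voice":
            return text_to_speech(msg.text)   # audio output: synthesize speech
        raise ValueError(f"unsupported modality: {modality}")


controller = InteractionController()
msg = Message("Your order has shipped")
print(controller.deliver(msg, "screen"))
print(controller.deliver(msg, "voice"))
```

The point of the sketch is the separation: the application produces one canonical `Message`, and the controller alone decides, per user and per moment, whether that content reaches a screen or a voice channel.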

Here is a recent post I did on the role of multimodal live assistance as part of mobile customer self-service.