
Conversation Design User Experiences for SiriKit and iOS


Introduction

A lot of articles, including many on this site, have focused on helping readers create amazing iOS apps by designing a great mobile user experience (UX).

However, with the emergence of the Apple Watch a few years ago, alongside CarPlay and, more recently, the HomePod, we are starting to see many more apps and IoT appliances that use voice commands instead of visual interfaces. The prevalence of devices such as the HomePod and other voice assistants, along with the explosion of voice-assistant-enabled third-party apps, has given rise to a whole new category of user experience design methodologies focused on Voice User Experiences (VUX), or Conversational Design UX.

A few years ago, this led Apple to focus on developing SiriKit and to give third-party developers the ability to extend their apps so that users can converse with them more naturally. As SiriKit opens up further to third-party developers, we are seeing more apps adopt it, including prominent messaging apps like WhatsApp and Skype, as well as payment apps like Venmo and Apple Pay.

SiriKit aims to blur the boundaries between apps by providing a consistent conversational user experience, built on predefined intents and domains, that keeps apps intuitive, functional, and engaging. This tutorial will help you apply best practices to create intuitive conversational user experiences without visual cues.

Objectives of This Tutorial

This tutorial will teach you to design audibly engaging SiriKit-enabled apps through best practices in VUX. You will learn about:

  • designing for voice interactions
  • applying conversational design UX best practices
  • testing SiriKit-enabled apps

Assumed Knowledge

I’ll assume you have worked with SiriKit previously, and have some experience coding with Swift and Xcode. 

Designing for Voice Interactions

Creating engaging apps requires a well-thought-out user experience design, or UX design for short. One principle common to all mobile platforms is that design has traditionally been based on a visual user interface. However, when designing for platforms where users engage via voice, you don't have the advantage of visual cues to guide them. This brings a completely new set of design challenges.

The absence of a graphical user interface means users have to work out how to communicate with their devices by voice, and what they are able to say as they navigate between various states to achieve their goals. The Interaction Design Foundation describes the situation in conversational user experience:

“In voice user interfaces, you cannot create visual affordances. Users will have no clear indications of what the interface can do or what their options are.” 

As a designer, you will need to understand how people naturally communicate with technology: the fundamentals of voice interaction. According to recent studies by Stanford, users generally perceive and interact with voice interfaces in much the same way they converse with other people, even though they know they are speaking to a device.

The difficulty of anticipating the many different ways in which people phrase their requests has driven advances in machine learning over the past few years, with natural language processing (NLP) allowing platforms to understand humans more naturally by recognizing the intents and associated domains of commands. One prominent platform is Apple's Siri, and its framework for third-party developers, SiriKit.

Overview of SiriKit

While most people think of Siri primarily as a non-visual voice assistant, its integration across Apple's ecosystem lets users trigger voice interactions from whichever platform they are on, be it iOS, watchOS, CarPlay, or the HomePod.

The first three platforms provide limited visual guidance in addition to audible feedback, whereas the HomePod provides audible feedback only. When the user is driving with iOS or CarPlay, the platform provides even less visual feedback and more audio feedback, so the amount of information a user receives is dynamic. As an app designer, you will need to cater to both kinds of interaction.

[Image: Providing commands with Siri]

This means that SiriKit calibrates how much it presents visually versus verbally based on the state of the device and the user, and as long as you conform to best practices, SiriKit will gracefully handle these variations for you.

Intents and Domain Handling

The framework handles user requests through two primary processes: intents and domain handling. 

Intents are managed through the Intents framework via an Intents app extension, which takes user requests and turns them into app-specific actions, such as booking a car-share ride or sending money to someone.
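To make this concrete, here is a minimal sketch of what an Intents app extension handler could look like for the messaging domain. The class name and the guard logic are my own assumptions; only the INSendMessageIntentHandling protocol and its response types come from the Intents framework.

```swift
import Intents

// A minimal, hypothetical handler for the "send message" intent.
// Siri resolves and confirms the parameters, then calls handle(intent:completion:).
class SendMessageIntentHandler: NSObject, INSendMessageIntentHandling {

    func handle(intent: INSendMessageIntent,
                completion: @escaping (INSendMessageIntentResponse) -> Void) {
        guard let recipients = intent.recipients, !recipients.isEmpty,
              intent.content != nil else {
            // Something essential is missing: report failure so Siri can tell the user.
            completion(INSendMessageIntentResponse(code: .failure, userActivity: nil))
            return
        }

        // In a real app, hand the content and recipients to your messaging layer here.
        completion(INSendMessageIntentResponse(code: .success, userActivity: nil))
    }
}
```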

[Image: Siri interaction workflow]

The Intents UI app extension, on the other hand, lets you deliver a small amount of custom visual content to confirm a request before your app completes it.

[Image: Calling a ride-share app with Siri]

SiriKit classifies intents (user requests) into specific types, called domains. As of iOS 11, third-party developers are able to leverage the following domains and interactions:

[Image: List of SiriKit domains and interactions]

It may initially seem that the choice is quite limited, but Apple’s justification is that this helps manage user confidence and expectations carefully while gradually allowing users to learn and build their knowledge of how to interact with Siri. This also allows Apple and the community to scale over time whilst blurring the boundaries between apps that sit behind the voice-assistance interface. 

iOS developers taking advantage of SiriKit also benefit from the contextual knowledge the platform provides. This includes grouping sentences by conversational context, based on the intents and domains its machine learning identifies. That is, Siri will try to work out whether your next command is part of the same conversational context or a new one. If you say “Please book an Uber ride”, Siri knows you intend to book a car-share ride, in the ride-booking domain. However, the app would still need more information, such as what type of ride. If your next command is “Uber Pool”, Siri knows that this second command belongs to the same context.

Leveraging SiriKit lets you benefit from Apple's platform orchestrating a lot of the heavy lifting, so you can focus on what's important: delivering value. Having said that, you still need to be a good ‘Siri citizen’. Next, you will learn about best practices you can follow to create a great user experience built on non-visual communication and voice interaction.

For more information on developing with SiriKit, check out Create SiriKit Extensions in iOS 10. 


Applying Conversational Design UX Best Practices 

Let’s take a look at some best practices that you can immediately apply to your SiriKit extension in order to ensure your app provides a pleasant, logical and intuitive conversational voice interface for your users. 

Inform Users of their Options 

The first guiding principle is to succinctly inform your users of what options they have during a particular state of interaction. 

Whereas graphical user experiences can effortlessly provide visual context back to users, through modal dialog boxes for instance, that same luxury doesn't exist with voice-enabled apps. Users have varied expectations of what natural language processing can handle: some will be very conservative and may not realize the power of Siri, while others might start by asking something complex that Siri can't make sense of.

You need to design your user experience so as to provide users with information on what they are able to do at a particular juncture, in the form of options. 

[Image: Siri's global options]

The options your app returns should be contextually relevant. In the following screenshot, the contact has multiple phone numbers, and if the user hasn't explicitly stated which one to use, the user should be prompted.

[Image: Contact resolution with SiriKit]

SiriKit uses contact resolution, which you have access to via the SDK, to help your app determine which phone number (or even which contact, if more than one entry has the same name) the user intended. According to Apple's documentation:

During resolution, SiriKit prompts you to verify each parameter individually by calling the resolution methods of your handler object. In each method, you validate the provided data and create a resolution result object indicating your success or failure in resolving the parameter. SiriKit uses your resolution result object to determine how to proceed. For example, if your resolution result asks the user to disambiguate from among two or more choices, SiriKit prompts the user to select one of those choices.
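Extending the hypothetical handler sketched earlier, recipient resolution for a messaging intent could look something like the following. This is modeled on the resolution flow Xcode's Intents extension template generates; the contact-matching line is a placeholder for your own lookup logic.

```swift
import Intents

extension SendMessageIntentHandler {

    func resolveRecipients(for intent: INSendMessageIntent,
                           with completion: @escaping ([INSendMessageRecipientResolutionResult]) -> Void) {
        guard let recipients = intent.recipients, !recipients.isEmpty else {
            // No recipient was given at all, so ask Siri to prompt for one.
            completion([.needsValue()])
            return
        }

        let results = recipients.map { recipient -> INSendMessageRecipientResolutionResult in
            // Placeholder: look up the spoken name in your own contacts store.
            let matches = [recipient]

            switch matches.count {
            case 0:
                // Nothing matched what the user said.
                return .unsupported()
            case 1:
                // Exactly one match: resolve successfully.
                return .success(with: matches[0])
            default:
                // Several matches: have Siri ask the user to pick one.
                return .disambiguation(with: matches)
            }
        }
        completion(results)
    }
}
```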

For more information on resolving user intents, refer to Apple’s documentation on Resolving and handling intents. 

Be Fast & Accurate

It is important that your app's conversational user experience responds to commands quickly, as users expect a fast response. This means designing your interaction workflow to complete each task in the fewest possible steps, without unnecessary prompts or screens.

Apple encourages you to take the user directly to the content without any intermediary screens or messages. If a user needs to be authenticated, take them to the authentication screen directly, and then maintain the context so they can continue logically and complete their action. Apple's Human Interface Guidelines advise:

Respond quickly and minimize interaction. People use Siri for convenience and expect a fast response. Present efficient, focused choices that reduce the possibility of additional prompting. 
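For cases where the user does have to jump into the app, for instance to authenticate, a handler might respond with a code that launches the app and attach a user activity that preserves the conversational context. This is a hedged sketch: the activity type string and userInfo contents are assumptions, not part of any Apple template.

```swift
import Intents

// Hypothetical: respond when the user must authenticate in the app before
// the message can actually be sent.
func respondRequiringAuthentication(for intent: INSendMessageIntent,
                                    completion: (INSendMessageIntentResponse) -> Void) {
    // Carry the conversational context into the app so the user can
    // finish the action right after authenticating.
    let activity = NSUserActivity(activityType: "com.example.sendMessage") // assumed identifier
    activity.userInfo = ["content": intent.content ?? ""]

    completion(INSendMessageIntentResponse(code: .failureRequiringAppLaunch,
                                           userActivity: activity))
}
```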

Limit the Amount of Information

The Amazon Echo design guidelines recommend that you don’t list more than three different options in a single interaction, but rather provide users with the most popular options first. Then, if you need to provide more than three options, provide an option at the end to go through the rest of them. 

[Image: Siri options in a response]

Prioritize and order the options according to which ones users are most likely to choose, and allow users to explicitly ask for the less popular options without reading them all out. You could also dynamically adjust the prominent options based on each user's historical preferences.

Most importantly, don't demonstrate prejudice or deception! That is, don't misrepresent information or weight the prompts to prioritize the most expensive options, such as listing the most expensive car-share rides first and the cheaper car-pooling options last. That is a sure way for your customers to lose confidence and trust in your app.

Provide Conversational Breadcrumbs 

It’s hard for users to work out where they are without visual cues, and even if SiriKit can keep track of the current context, users tend to interact with SiriKit whilst doing something else, such as driving or jogging. 

Therefore, you always need to provide an informative response to a command, not only confirming it but reminding the user of the context. For instance, when the user asks to book a car-share ride, you can provide context around your response by saying something like: “You’ve booked a ride for today at 5 pm, using AcmeCar” instead of just responding with “booking confirmed”. 

In other words, provide enough contextual information in your response for the user to understand what has been confirmed, without having to glance at their phone to confirm their intentions.

Provide an Experience That Doesn’t Require Touching or Glancing

As Apple's ecosystem of Siri-enabled devices expands beyond iOS and watchOS into devices that lack a visual interface, it is important that your responses and interactions don't require users to glance back at the screen or touch their devices to confirm something. The verbal responses should be contextual and succinct enough (including offering a limited subset of options) to give users just the right amount of information to keep interacting with their device without looking at it.

The power of Siri comes from users being able to keep their iPhones in their pockets and use headphones to interact with their voice assistant, to shout a new reminder to their HomePod from across the room, or to listen to messages whilst driving their CarPlay-enabled vehicles. Interacting with SiriKit-enabled apps should only require secondary focus and attention, not primary touching or visual confirmation.

The exception, however, is when an intent requires an extra layer of security and authentication prior to fulfilling the request. 

Require Authentication for Certain Intents

It is important that you identify intents that do require specific authentication and authorization prior to being used. If a user asks “What is the weather”, you won’t need authentication. However, if a user asks to “Pay Jane $20 with Venmo”, you obviously should require that the user authenticate first. 

SiriKit manages intent restriction: users need to authenticate via Face ID, Touch ID, or a passcode when the device is locked, and you opt in to this by explicitly listing which intents are restricted while locked in your Intents extension's Info.plist:
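As a rough sketch, the relevant portion of the extension's Info.plist might look like the following. The key names come from Apple's Intents extension configuration; the specific intents listed are just examples.

```xml
<key>NSExtension</key>
<dict>
    <key>NSExtensionAttributes</key>
    <dict>
        <key>IntentsSupported</key>
        <array>
            <string>INSendMessageIntent</string>
            <string>INSendPaymentIntent</string>
        </array>
        <key>IntentsRestrictedWhileLocked</key>
        <array>
            <!-- Payments require the device to be unlocked first. -->
            <string>INSendPaymentIntent</string>
        </array>
    </dict>
    <key>NSExtensionPointIdentifier</key>
    <string>com.apple.intents-service</string>
</dict>
```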

Anticipate and Handle Errors 

As well as using voice prompting to handle disambiguation, as discussed earlier, you will also need to ensure that you anticipate and handle as many error scenarios as you can. 

For instance, when a user tries to send money to another participant, and that participant is missing a required email address or has multiple phone numbers on file, you need to handle it. SiriKit provides the INIntentResolutionResult class and its subclasses, which let you return a resolution for each type of data you are trying to resolve:
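For example, a payment handler might resolve the payee along the following lines. The helper function and the matching logic are purely illustrative; only INPerson and INPersonResolutionResult come from the Intents framework.

```swift
import Intents

// Illustrative helper: map however many contacts matched the spoken payee
// to a resolution result that tells Siri how to proceed.
func resolvePayee(from matches: [INPerson]) -> INPersonResolutionResult {
    switch matches.count {
    case 0:
        // No match at all: have Siri prompt the user for a payee.
        return .needsValue()
    case 1:
        // One unambiguous match: resolve successfully.
        return .success(with: matches[0])
    default:
        // Multiple matches (several numbers or duplicate names):
        // let Siri ask the user to pick one.
        return .disambiguation(with: matches)
    }
}
```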

Apple recommends that you draw on historical information about user behavior where possible, to reduce the number of interaction steps in the workflow. Also take a look at the INIntentError documentation, which lists a set of possible errors you can handle, such as interactionOperationNotSupported or requestTimedOut.

Add Custom Vocabularies

SiriKit supports adding custom vocabularies through the plist file AppIntentVocabulary.plist, which helps improve your app's conversational user experience. You can use it both for onboarding users and for registering specific terms that your app can recognize.

Providing example commands helps with onboarding by guiding users toward your app's capabilities. If you ask Siri “What can you do?”, it will suggest not only the built-in functionality but also third-party apps. To promote your app's functionality to all of your users, include intent examples in your AppIntentVocabulary.plist file:
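The entries might look something like this. The IntentPhrases, IntentName, and IntentExamples keys are the ones Apple documents for this file; the app name and example phrases are made up.

```xml
<key>IntentPhrases</key>
<array>
    <dict>
        <key>IntentName</key>
        <string>INSendMessageIntent</string>
        <key>IntentExamples</key>
        <array>
            <string>Send a message to Jane with AcmeChat</string>
            <string>Tell John I'm running late using AcmeChat</string>
        </array>
    </dict>
</array>
```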

You can also help Siri understand and recognize terms that are specific to your app by supplying it with an array of vocabulary words. These are terms that apply to any user of your app (such as a product-specific name your app uses for messaging); if you need to provide user-specific terms instead, take advantage of INVocabulary. Within the plist, you add a ParameterVocabularies key for your custom global terms and associate each term with a specific corresponding intent parameter (a term may apply to multiple intents).
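For user-specific terms, a runtime registration sketch could look like the following; the vocabulary strings are invented, while INVocabulary.shared().setVocabularyStrings(_:of:) is the SDK call Apple provides for this.

```swift
import Intents

// Hypothetical: register the current user's custom workout names with Siri
// so a phrase like "Start my Leg Day Blast workout" is recognized.
func registerUserWorkoutNames() {
    let workoutNames = NSOrderedSet(array: ["Leg Day Blast", "Morning Stretch"]) // assumed user data
    INVocabulary.shared().setVocabularyStrings(workoutNames, of: .workoutActivityName)
}
```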

Consult Apple’s documentation on Registering Custom Vocabulary with SiriKit to learn how to create user-specific terms. 

Testing Your SiriKit-Enabled Apps 

Finally, as of Xcode 9, you can conveniently test Siri in the Simulator by triggering it through the new XCUISiriService class, which you access from XCUIDevice. Leverage this to test all of your intent phases, custom vocabularies, and even app synonyms, and to ensure your designed user experiences work as intended.

For the purposes of this tutorial, clone the tutorial project repo, open the project in Xcode, and run it to make sure it works in your environment. With the Simulator running, enable Siri in the Settings app. Then summon Siri on the Simulator as you would on a physical device and say something like “Send a message to Jane”.

Next, in Xcode, open the file titled MessagingIntentsUITests.swift and you will notice the single test case method:
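The test will look something along these lines; the exact phrase and method name are placeholders for whatever the sample project uses.

```swift
import XCTest

class MessagingIntentsUITests: XCTestCase {

    func testSendMessageIntent() {
        // Ask the Simulator's Siri service to behave as if the user
        // had spoken this phrase aloud.
        let siri = XCUIDevice.shared.siriService
        siri.activate(voiceRecognitionText: "Send a message to Jane")

        // From here you could assert on the resulting Siri UI, for example
        // that your Intents UI extension's confirmation appeared.
    }
}
```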

[Image: Testing SiriKit with Xcode and the Simulator]

You can add as many intents as you want to test. Finally, go ahead and run this test case, and you will see the Simulator trigger Siri and speak the intended command. Remember, this is not a substitute for real human testing with different accents and background noise, but it's useful nonetheless as part of your automated workflow.

Conclusion

Designing user experiences for voice interaction is a whole new world from visual UX design. Best practices and techniques are still being discovered by the designers and developers who are pioneering this new field. 

This post has given you an overview of the current best practices for conversational design UX on iOS with SiriKit. You saw some of the key principles in designing a voice interface, as well as some of the ways that you can interface with SiriKit as a developer. I hope this has inspired you to experiment with voice interfaces in your next app!

While you’re here, check out some of our other posts on cutting-edge iOS app development.

  • Swift
    Get Started With Natural Language Processing in iOS 11
    Doron Katz
  • Augmented Reality
    Code Your First Augmented Reality App With ARKit
    Vardhan Agrawal
  • Augmented Reality
    Code a Measuring App With ARKit: Interacting and Measuring
    Vardhan Agrawal
  • Machine Learning
    Get Started With Image Recognition in Core ML
    Vardhan Agrawal

Or check out some of our comprehensive video courses on iOS app development:

  • Mobile Development
    Image Recognition on iOS With Core ML
    Markus Mühlberger
  • iOS
    Get Started With Augmented Reality for iOS
    Markus Mühlberger
