AWS Machine Learning Blog

Building Better Bots Using Amazon Lex (Part 1)

As Jeff Barr showed in his introductory blog post, Amazon Lex is a service that allows developers to build conversational interfaces for voice and text into applications. With Amazon Lex, the same deep learning technologies that power Amazon Alexa are now available to any developer, so you can quickly and easily build sophisticated, natural language conversational bots (chatbots). Amazon Lex’s advanced deep learning technology provides automatic speech recognition (ASR), for converting speech to text, and natural language understanding (NLU), to recognize the intent of text, so you can build applications with a highly engaging user experience.

What does it take to develop a functional chatbot using Amazon Lex? If you use one of the examples in the documentation, you can start interacting with your chatbot in about a minute or two.  And although it’s a start, it’s hardly sufficient.  Let’s see what else you need to do to build a chatbot.

The basics

Chatbot design is a nascent discipline with few established norms.  Only interactions with real users can teach us what’s frustrating and what’s delightful.  We recommend that you treat this section as an exploration of design considerations, not as a guide for bot design.  Here’s what we’ve learned from the millions of interactions with Amazon Alexa.

Simon Sinek urges us to start with why:  Why does this chatbot exist?  Good design starts with a clear goal:  Who is the user, and what is that user trying to accomplish?

You also need to think about modality and medium.  A helpful voice interface should anticipate that there are times when the user isn’t paying attention or simply can’t hear what was said, and offer the ability to repeat the last prompt or handle responses like “What?” or “Where were we?” gracefully.  For a text interface, this may not be necessary.  Some text interfaces might even support response cards, with images and buttons.

For the interface designer, emphasis is an essential tool: it establishes the norms for presenting information and recommending choices. The rules for achieving clarity, and the tools available for emphasis, vary with the mode (web UI, text chat, or voice) and with the goal. Consider these examples of how an order could be placed in each of the three modes:

Web (Point and Click): [Screenshot: a page with a prominent "Place your order" button]
Chat (Text): Would you like me to place this order?  (yes/no)
Voice (Speech): You would like me to order ____.  Is that right?

With a web experience, the user experience can emphasize “Place your order” to clearly call it out as the recommended choice and to remind the user that this action is a commitment on the customer’s part. In a chat experience, the convention is to list options in parentheses to let the user know what’s expected. You also can use text placement, whitespace, and capital letters for emphasis.

In a voice interaction, you replace this convention with a new norm that assures the user that she is in control of when the order will be placed. For example, you can use a confirmation when making significant changes to the order. Additionally, with voice, you might be able to control the speech rate, for example, speaking more slowly for emphasis.  If no other means are available, you might simply use repetition.  Consider this example.

Your order total comes to $55.

Would you like me to place this order and charge $55 on the card saved with this account? 

 
Voice – Repetition for Emphasis

Enter CoffeeBot

Let’s say that you need a voice bot to support conversations involving uncommon words such as when you order your latté:  “May I have a triple mocha please?”  Or, if you’re comfortable ordering a bot around, “Get me a triple mocha.”

Using the Amazon Lex console, we create a custom bot.  We call it “CoffeeBot” and use an IAM role that has the appropriate permissions to invoke Amazon Lex. To learn how to do this, see the Amazon Lex Getting Started guide.

Lex terminology

As Jeff explained in his blog, Amazon Lex uses intents, slot types, and slots.  To promote reusability, intents and slot types are associated with an AWS account and can be used by multiple Amazon Lex bots.  As you make changes, you will notice that Amazon Lex automatically tracks a version for these resources so you know exactly what’s used for the particular version of the bot you’re testing.  While this might seem insignificant at first, it’s essential to support concurrent and continuous development.
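
If you manage bot definitions programmatically, you can see this version tracking with the AWS SDK for Python (Boto3). The following is a minimal sketch, using the order intent we create later in this post as an example; it reads the editable $LATEST copy and snapshots it as a numbered version:

```python
import boto3

# Model-building operations (intents, slot types, bots) use the
# "lex-models" client.
lex_models = boto3.client("lex-models")

# "$LATEST" is the editable working copy of the intent.
intent = lex_models.get_intent(name="cafeOrderBeverageIntent", version="$LATEST")

# Snapshot the current definition as a new numbered version.  The checksum
# ensures you only publish a definition you have actually seen.
snapshot = lex_models.create_intent_version(
    name="cafeOrderBeverageIntent",
    checksum=intent["checksum"],
)
print("Created intent version:", snapshot["version"])
```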

Conversation flow

When starting on a new bot, it’s often helpful to simply record how people normally engage in conversation for a single type of request, analyze those requests, and then extrapolate to a larger scope of requests.  For CoffeeBot, let’s focus on how someone might order a mocha at a coffee shop.

Each exchange below shows the live conversation, the phase of the conversation, and the intents and alternatives it suggests.

Hi – what can I get you today?
> Oh hi!  Can I get a large non-fat mocha please?
Phase: Greetings & initial request
– Beverage type: mocha
– Beverage size: large
– Creamer: non-fat
Alternatives: Could start with just the drink type or another combination.

Would you like that iced?
> No thanks.
Phase: Configure
– Beverage temp: hot
Alternatives: Just hot or iced?  What are the other values?

What kind of chocolate?
> Dark
Phase: Configure
– Chocolate type: dark
Alternatives: Not all drinks allow chocolate.  Only mocha?

Any whip?
> No, thanks.
Phase: Configure
– Whipped cream: no
Alternatives: Are there different types of whipped cream?  Maybe change to a different drink or size?

Okay, one large single dark non-fat mocha, no whip.
> Oh, can you add a pump of toffee?
Phase: Confirm → Configure
– Flavor: 1 pump toffee
Alternatives: Could have just accepted here.  Two kinds of flavors?  Is there a limit?

Sure!  One large single dark non-fat mocha, no whip with 1 pump of toffee.  What’s your name?
> Jenny
Phase: Confirm → Label
– Name: Jenny
Alternatives: Store name as preference in app.

Thanks, Jenny!  Your drink will be on the right in just a few.  That’ll be $4.17.  Can I get you anything else?
> No, thanks.
Phase: Check out
Alternatives: Charge options for this.

Out of five.  Here’s your change.  Have a great day!
> Yeah – you too!
Phase: Check out → Done
placeOrder: name, beverageConfig
– places back-end Order
– send email confirmation?
– earn points?

Real orders are a lot more complicated than this simple mocha example. There might be over a hundred thousand valid coffee drinks.  Natural conversations don’t follow a rigid order; the user might change her mind partway through the conversation or skip parts of the conversation.  This flexibility of flow is a hallmark of natural conversations.

Consider the case where CoffeeBot hears “double short mocha.”  Did the user ask for a “short (8 oz.) mocha with two shots of espresso” or did something get lost while the user was actually asking for a “double shot mocha” without mentioning a size?  Validating the input is still necessary. The next post in this series will cover input validation.

When designing conversational interfaces, it’s important to remain focused. Some of the complexity we noted naturally disappears when you start with an app that “knows” who the user is and can handle payment.  If it was a particularly cold or wet day, this conversation might have included the weather as a topic.  Although it might be fun to model such tangents, that might not be the best use of your time.  Focus on the development tasks that substantially enhance the user experience instead of those that are merely nice to have.

Conversation: information

Analysis like this reveals the structure of these conversations.  The snippets of information might be shared separately or together, in one of many possible sequences, and there might be optional add-ons for certain beverages.  At the end of the day, though, there is a set of values that must be known before an order for a mocha can be placed.  Accepting that you are catering to an essential subset of what you have discovered, and that these slots will be refined over time using a feedback loop, you can start with some slot types for CoffeeBot.

Development teams will create their own conventions for making it easy to find shared resources, but we’ll use “cafe” as a prefix for ours.

Slot Type Slot Values
cafeBeverageType coffee; cappuccino; latte; mocha; chai; espresso; smoothie
cafeCreamerType two percent; skim milk; soy; almond; whole; non-fat; skim; half and half
cafeStrength single; double; triple; quad; quadruple
cafeFlavor vanilla; almond; French vanilla; caramel; hazelnut
cafeBeverageSize kids; small; medium; large; extra large; short; six ounce; eight ounce; twelve ounce; sixteen ounce; twenty ounce
cafeBeverageTemp kids; hot; iced
cafeBeverageExtras half sweet; semi sweet
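
If you would rather script this setup than click through the console, a minimal sketch with the AWS SDK for Python (Boto3) could create the same slot types; the values simply mirror the table above:

```python
import boto3

lex_models = boto3.client("lex-models")

slot_types = {
    "cafeBeverageType": ["coffee", "cappuccino", "latte", "mocha", "chai",
                         "espresso", "smoothie"],
    "cafeCreamerType": ["two percent", "skim milk", "soy", "almond", "whole",
                        "non-fat", "skim", "half and half"],
    "cafeStrength": ["single", "double", "triple", "quad", "quadruple"],
    "cafeFlavor": ["vanilla", "almond", "French vanilla", "caramel", "hazelnut"],
    "cafeBeverageSize": ["kids", "small", "medium", "large", "extra large",
                         "short", "six ounce", "eight ounce", "twelve ounce",
                         "sixteen ounce", "twenty ounce"],
    "cafeBeverageTemp": ["kids", "hot", "iced"],
    "cafeBeverageExtras": ["half sweet", "semi sweet"],
}

for name, values in slot_types.items():
    # Note: re-running put_slot_type on an existing slot type requires
    # passing its current checksum; this sketch assumes a first-time create.
    response = lex_models.put_slot_type(
        name=name,
        description="CoffeeBot slot type",
        enumerationValues=[{"value": v} for v in values],
    )
    print("Created", name, "version", response["version"])
```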

Conversation: goal

Next, define a goal, known as an ‘intent’ in Amazon Lex terminology.  Although the bot might ultimately use multiple intents, let’s start with just one.  (Multiple bots can use the same intent, and potentially different versions of the same intent.)

Provide some examples of what you might expect users to actually say; Amazon Lex calls these examples “utterances.” They matter because Amazon Lex uses them to train the machine learning models that recognize the right intent. The text doesn’t have to exactly match what the user says, and it’s good to start with a small set and then add permutations as needed.

Each utterance refers to one or more slots, and those slots must be defined. Each slot can also be associated with one or more prompts that Amazon Lex uses to elicit the value from the user. The Amazon Lex dialog manager keeps track of the slot values. It can also use the priority of each required slot to decide which one to prompt for next, as follows.

[Screenshot: slot configuration for the intent, with prompts and priorities, in the Amazon Lex console]
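
For readers who prefer the API to the console, here is a rough, abbreviated Boto3 equivalent of this intent configuration; the utterances and prompt wording are illustrative, and only two of the slots are shown:

```python
import boto3

lex_models = boto3.client("lex-models")

response = lex_models.put_intent(
    name="cafeOrderBeverageIntent",
    description="Order a beverage from CoffeeBot",
    # A few sample utterances; Amazon Lex trains on these, so they don't
    # need to cover every phrasing a user might try.
    sampleUtterances=[
        "I would like a {BeverageSize} {BeverageType}",
        "Can I get a {BeverageSize} {BeverageType} please",
        "May I have a {BeverageSize} {BeverageType}",
    ],
    slots=[
        {
            "name": "BeverageType",
            "slotType": "cafeBeverageType",
            "slotTypeVersion": "$LATEST",
            "slotConstraint": "Required",
            "priority": 1,  # prompted for first if missing
            "valueElicitationPrompt": {
                "messages": [{"contentType": "PlainText",
                              "content": "What kind of beverage would you like?"}],
                "maxAttempts": 2,
            },
        },
        {
            "name": "BeverageSize",
            "slotType": "cafeBeverageSize",
            "slotTypeVersion": "$LATEST",
            "slotConstraint": "Required",
            "priority": 2,
            "valueElicitationPrompt": {
                "messages": [{"contentType": "PlainText",
                              "content": "What size?  tall, medium, large?"}],
                "maxAttempts": 2,
            },
        },
    ],
    # "Return parameters to client" in the console maps to ReturnIntent here.
    fulfillmentActivity={"type": "ReturnIntent"},
)
print("Intent checksum:", response["checksum"])
```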

At this point, the bot is ready to place an order for coffee. We will worry about fulfillment in our Part 2 post.

Conversation: test

Next, you build and test the bot. It’s very useful to test what you have so far because it’s easier to find and fix errors at this step than when you have a mobile app with a lot more going on. You can test it in the overlay that appears in the Amazon Lex console.

[Screenshot: testing the bot with “I want a mocha” in the Amazon Lex console]

What just happened?  Although we didn’t configure this exact text as an utterance, the input text “I want a mocha” was matched to the cafeOrderBeverageIntent you created and the utterance was interpreted as I want a {BeverageType: mocha}.  Then, Amazon Lex determined that BeverageSize was required, and prompted with the default prompt for this slot, namely, What size?  tall, medium, large?.

Finally, when all required slots have been filled, Amazon Lex simply displayed the values as requested (“Return parameters to client”).
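
You can run the same test programmatically against the $LATEST version of the bot by calling the Amazon Lex runtime API. Here is a minimal Boto3 sketch; the userId is an arbitrary test value:

```python
import boto3

lex_runtime = boto3.client("lex-runtime")

# Send one turn of the conversation; call post_text again with the same
# userId to continue the same session.
response = lex_runtime.post_text(
    botName="CoffeeBot",
    botAlias="$LATEST",       # development version, as used by the console test
    userId="test-user-001",   # any consistent identifier for this session
    inputText="I want a mocha",
)

print(response["dialogState"])       # e.g. ElicitSlot
print(response.get("slotToElicit"))  # e.g. BeverageSize
print(response.get("message"))       # the prompt Amazon Lex sends back
print(response["slots"])             # e.g. {'BeverageType': 'mocha', ...}
```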

Conversation: flow

What if Amazon Lex doesn’t understand?  You can use clarification prompts, configured in the console, to try to elicit something different and, failing all else, exit gracefully.

[Screenshot: clarification prompts and hang-up phrases configured for the bot]

With this configuration, we see that Amazon Lex uses different clarification prompts up to three times and then gives up using one of the hang-up phrases.  Why multiple prompts?  It’s true that you need just one, but having multiple prompts allows Amazon Lex to choose, which maintains spontaneity.

[Screenshot: a test conversation showing the clarification prompts in action]
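
If you script your bot definition, the same clarification and hang-up behavior is set when you create or update the bot through the API. Here is an illustrative put_bot sketch; the prompt wording is made up, and maxAttempts is what limits the retries to three:

```python
import boto3

lex_models = boto3.client("lex-models")

response = lex_models.put_bot(
    name="CoffeeBot",
    locale="en-US",
    childDirected=False,
    intents=[{"intentName": "cafeOrderBeverageIntent",
              "intentVersion": "$LATEST"}],
    # Amazon Lex chooses among these each time it needs to clarify,
    # and retries up to maxAttempts times.
    clarificationPrompt={
        "messages": [
            {"contentType": "PlainText",
             "content": "Sorry, I didn't get that.  What would you like to drink?"},
            {"contentType": "PlainText",
             "content": "Can you say that again?  What can I get you?"},
        ],
        "maxAttempts": 3,
    },
    # Used to give up gracefully after the clarification attempts run out.
    abortStatement={
        "messages": [
            {"contentType": "PlainText",
             "content": "Sorry, I'm having trouble.  Please try again later."},
        ],
    },
    idleSessionTTLInSeconds=300,
    processBehavior="BUILD",  # build the bot as part of this call
)
print("Bot status:", response["status"])
```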

You also can configure the bot to prompt the user right before the order is placed. The confirmation prompts are listed in the intent editor on the console.

Confirmation prompts don’t need to include every single slot that you have set up, especially when some slots are required only if others have been filled.  It’s a good idea to include required slots in the prompt, but you can also use a code hook to customize the prompt.  (More on that in the next post.)

[Screenshot: confirmation prompt configuration in the intent editor]

Notice that the confirmation prompt can include slot values.  Let’s see how that works out.

[Screenshot: a test conversation that reaches the confirmation prompt, is declined with “nope,” and is then run again and confirmed with “yep”]

As expected, Amazon Lex determined that the required slots had been filled and presented the confirmation prompt.  It also interpreted “nope” correctly and hung up.  The second time around, Amazon Lex correctly interpreted “yep” as the affirmative and presented values.

[Screenshot: a test conversation in which the user changes the size at the confirmation prompt]

What does this show?  With no additional configuration, Amazon Lex correctly interpreted the user’s desire to swap the size, right from the confirmation prompt.

Conclusion

In this post, we looked at some elementary bot design decisions.  We started with some raw observations about conversation flow, narrowed in on a particular transaction, and quickly built an interactive bot with some defaults.  In the test console, we made sure that the bot behaves as expected.

In Part 2, we look at some more considerations and develop this elementary bot so that it can understand voice.

Note: The code for Part 1 and Part 2 is located in our GitHub repo.

If you have questions or suggestions, please leave a comment below.


 

About the Authors

As a Solutions Architect, Niranjan Hira is often found near a white board helping our customers assemble the right building blocks to address their business challenges.  In his spare time, he breaks things to see if he can put them back together.

 

 

As a Product Manager on the Amazon Lex team, Harshal Pimpalkhute spends his time trying to get machines to engage (nicely) with humans.