Simple Python Hello World with Alexa

I’ve recently got myself an Amazon Echo Dot: Amazon’s voice-controlled device built around its Alexa digital assistant. They’re pretty nice devices, and they’re quite fun to play with (and not too expensive either) – though I’ve not had it long enough to say if it’s actually useful yet…

Obviously the first thing that I did with it was to try to write some code for it. (Actually that’s not quite true – the actual first thing that I did with it was to run through all of the many, funny and often very geeky, easter eggs that have been baked into the system. I think my personal favourite is “Alexa who is the mother of Dragons?” – closely followed by “Alexa: Tea. Earl Grey. Hot.”). But I digress…

Coding for Alexa is actually pretty easy; and appealingly for an old-school back-end programmer like me: there are no visual user-interfaces to worry about. It’s all done with simple JSON; in the form of a pretty simple web application. Perhaps more interestingly there’s no need to worry about the server for that application – since it’s ideally suited for using Amazon’s AWS Lambda Functions.

If you’ve not seen Lambda before – it’s perhaps the best-known example of one of the latest technologies, known as server-less computing or Function-as-a-Service. Essentially it’s an easy way to write a function and have it run on demand, without the need to write the rest of the web application that you’d typically need to support it. The main difference is that these Lambda Functions are meant to be automatically triggered when some kind of event occurs (for example data being uploaded to a database); and one such suitable trigger is Alexa…

Amazon have some pretty thorough tutorial material on their Alexa Developers Site; including some example apps (or skills as they’re more correctly called). Interestingly many of the skills currently available in the skills store are more or less exact clones of the demo projects: but there we are.

If you want to use an AWS Lambda Function to power your skill (and you don’t have to – a skill can be powered by any HTTPS-addressable web service) your choices are to write it in either Node.js or Python. Essentially using Lambda is free unless you happen upon a hit – as the first million requests per month are free: so unless you have a strong desire to use another hosting source, you might as well use Lambda.

Anyway, although there are (as I said) quite a few tutorials out there – I couldn’t find any really good (basic) Python examples. So after figuring it out for myself (based on the ColorPicker example in the “Develop a skill in under 5 minutes” tutorial – which does what it says, although it doesn’t actually explain anything as it goes along): here’s my version of a hello world skill.

The first thing to do is to go to the Amazon developers portal, sign up (or sign in), pick Alexa, and choose Alexa Skills Kit.


From there, click through to add a new skill.

Our simplest example will be a “custom interaction model”; and we need to give it a name and an invocation name. The name will be displayed in various places, so make it something as descriptive as you can in 50 characters; and the invocation name will be what you have to say every time to make the thing work: so pick something as simple as possible…

Since this is a hello world skill – we’ll ask Alexa to say hello, and have her reply with a custom reply. At the risk of being unoriginal we’ll call the skill “Hello” – and we’ll invoke it with the name “hello world”.

Now before we can write any actual code we first need to think about the Interaction Model. This is a three-part description of all of the possible interactions with the skill. The Intent Schema is some JSON that you use to define points of interaction with your skill. For our basic skill – we’ll only have one of these: representing the point where we ask Hello World to say hello… The Sample Utterances define all of the phrases you can use to trigger that intent; and the Custom Slots are used to represent any skill-specific sets of data that you want Alexa to understand. For example the colours in the ColorPicker skill.

Since we want minimal interaction for this very simple skill we don’t need to worry about slots at all – and we just need to define an intent and some utterances. We could do this by handwriting the JSON for the Intent Schema – but it’s easier to use the Skill Builder.

From here we can see the three pre-defined (and required) intents (Cancel, Help & Stop). We’ll add our own custom “greeting” intent; and then provide as many sample utterances as we can think of to represent how the user might want to invoke the interaction.

Because our skill is invoked with the name “hello world” we’ll need to say: “Alexa open hello world” to start things off… Then to trigger the greeting we’ll ask Alexa to “say hello”. We’ll also let people use the phrase “say hi”; and we could add any others that we wanted to use too.

Or we could shortcut that by saying: “Alexa ask hello world to say hello”: which does the same thing and triggers our new greeting intent.
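For reference, the hand-written equivalent of what the Skill Builder sets up might look roughly like the sketch below. The JSON layout is an assumption based on the older Intent Schema format (a plain list of intent names) and the “intent name, then phrase” convention for sample utterances; the console may well expect something different now, so treat it as an illustration rather than something to paste in.

```python
import json

# Assumption: the older-style Intent Schema is a JSON object holding a list of
# intents, each identified by name. The console's current format may differ.
intent_schema = {
    "intents": [
        {"intent": "greeting"},            # our one custom intent
        {"intent": "AMAZON.HelpIntent"},   # the three built-in intents
        {"intent": "AMAZON.CancelIntent"},
        {"intent": "AMAZON.StopIntent"},
    ]
}

# Sample utterances: each line pairs an intent name with a phrase that triggers it.
sample_utterances = [
    "greeting say hello",
    "greeting say hi",
]


def render_interaction_model():
    """Return the schema as JSON text and the utterances as newline-separated text."""
    return json.dumps(intent_schema, indent=2), "\n".join(sample_utterances)


if __name__ == "__main__":
    schema_text, utterance_text = render_interaction_model()
    print(schema_text)
    print(utterance_text)
```

The important thing is that the intent name used here (“greeting”) is exactly the string that the Python code later compares against.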

Now that we’ve done that – we can move onto writing our code. As I said I’ll use the Amazon Lambda service to host it as that’s the simplest way to go. To create a Lambda function go to https://aws.amazon.com, sign in (or up) and choose Lambda. Then create a new Lambda Function.

There are a number of Blueprints to choose from – if you filter by “Alexa” then you’ll see the color-expert-py function that I used as the inspiration for my code.

If we use that as the basis of our new function – it pre-configures the Alexa trigger; if we go with a blank function then we simply have to do that ourselves on the next page.

Lastly we need to give it a name (and description), set the runtime to Python 2.7 (Python 3.x isn’t supported yet for Alexa, as I understand it); and set the role to lambda_basic_execution.

Now on to the code!

The whole thing weighs in at a little less than 150 lines.

# This attempts to be (more or less) the simplest possible hello world Alexa skill...

from __future__ import print_function

# We'll start with a couple of globals...
CardTitlePrefix = "Greeting"

# --------------- Helpers that build all of the responses ----------------------

def build_speechlet_response(title, output, reprompt_text, should_end_session):
    """
    Build a speechlet JSON representation of the title, output text, 
    reprompt text & end of session
    """

    return {
        'outputSpeech': {
            'type': 'PlainText',
            'text': output
        },
        'card': {
            'type': 'Simple',
            'title': CardTitlePrefix + " - " + title,
            'content': output
        },
        'reprompt': {
            'outputSpeech': {
                'type': 'PlainText',
                'text': reprompt_text
            }
        },
        'shouldEndSession': should_end_session
    }


def build_response(session_attributes, speechlet_response):
    """
    Build the full response JSON from the speechlet response
    """
    return {
        'version': '1.0',
        'sessionAttributes': session_attributes,
        'response': speechlet_response
    }


# --------------- Functions that control the skill's behavior ------------------

def get_welcome_response():
    session_attributes = {}
    card_title = "Hello"
    speech_output = "Welcome to the Hello World demonstration... Ask me to say hello."
    # If the user either does not reply to the welcome message or says something
    # that is not understood, they will be prompted again with this text.
    reprompt_text = "I'm sorry - I didn't understand. You should ask me to say hello..."
    should_end_session = False
    return build_response(session_attributes, build_speechlet_response(card_title, speech_output, reprompt_text, should_end_session))


def handle_session_end_request():
    card_title = "Session Ended"
    speech_output = "Have a nice day! "
    # Setting this to true ends the session and exits the skill.
    should_end_session = True
    return build_response({}, build_speechlet_response(
        card_title, speech_output, None, should_end_session))

def say_hello():
    """
    Return a suitable greeting...
    """
    card_title = "Greeting Message"
    greeting_string = "Hello! Here's a computer tongue-twister: Ploughs might cough, but power mowers are thoroughly tough though"
    return build_response({}, build_speechlet_response(card_title, greeting_string, "Ask me to say hello...", True))

# --------------- Events ------------------

def on_session_started(session_started_request, session):
    """ Called when the session starts """

    print("on_session_started requestId=" + session_started_request['requestId']
          + ", sessionId=" + session['sessionId'])


def on_launch(launch_request, session):
    """ Called when the user launches the skill without specifying what they want """

    print("on_launch requestId=" + launch_request['requestId'] +
          ", sessionId=" + session['sessionId'])
    # Dispatch to your skill's launch
    return get_welcome_response()


def on_intent(intent_request, session):
    """ Called when the user specifies an intent for this skill """

    print("on_intent requestId=" + intent_request['requestId'] +
          ", sessionId=" + session['sessionId'])

    # We only need the intent's name here (this skill has no slots to read)
    intent_name = intent_request['intent']['name']

    # Dispatch to your skill's intent handlers
    if intent_name == "greeting":
        return say_hello()
    elif intent_name == "AMAZON.HelpIntent":
        return get_welcome_response()
    elif intent_name == "AMAZON.CancelIntent" or intent_name == "AMAZON.StopIntent":
        return handle_session_end_request()
    else:
        raise ValueError("Invalid intent")


def on_session_ended(session_ended_request, session):
    """ Called when the user ends the session. Is not called when the skill returns should_end_session=true """
    print("on_session_ended requestId=" + session_ended_request['requestId'] +
          ", sessionId=" + session['sessionId'])

# --------------- Main handler ------------------

def lambda_handler(event, context):
    """ Route the incoming request based on type (LaunchRequest, IntentRequest,
    etc.) The JSON body of the request is provided in the event parameter.
    """
    print("event.session.application.applicationId=" +
          event['session']['application']['applicationId'])


    if event['session']['new']:
        on_session_started({'requestId': event['request']['requestId']},
                           event['session'])

    if event['request']['type'] == "LaunchRequest":
        return on_launch(event['request'], event['session'])
    elif event['request']['type'] == "IntentRequest":
        return on_intent(event['request'], event['session'])
    elif event['request']['type'] == "SessionEndedRequest":
        return on_session_ended(event['request'], event['session'])

Let’s take a closer look at that.

First of all we have two functions shamelessly borrowed from the example. These actually compile the JSON that Alexa will use from the text we supply; and provide a “card” (visual feedback that’s displayed in the companion app on your phone). I’ve tidied the display of the card up slightly over the examples to improve the look.

Example cards in the Alexa companion app on iOS.

The card is optional, so we could omit that part; and we don’t need to use these specific functions to compile the JSON if we wanted to do it another way: but this seems like a nice, easy way to do that.
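As a rough sketch of what “omitting that part” could look like, here is a variant of the builder that only includes the card (and the reprompt) when they’re actually supplied. The function name build_minimal_speechlet is my own invention, and I’m assuming that Alexa is happy with a response that has no reprompt section as well as no card; I haven’t checked every case.

```python
def build_minimal_speechlet(output, should_end_session,
                            card_title=None, reprompt_text=None):
    """
    Hypothetical variant of build_speechlet_response: the card and reprompt
    sections are only added when a title / reprompt text is supplied.
    """
    response = {
        'outputSpeech': {
            'type': 'PlainText',
            'text': output
        },
        'shouldEndSession': should_end_session
    }

    # The card is optional, so leave it out unless a title was given
    if card_title is not None:
        response['card'] = {
            'type': 'Simple',
            'title': card_title,
            'content': output
        }

    # Assumption: the reprompt can also be left out, e.g. when the session ends
    if reprompt_text is not None:
        response['reprompt'] = {
            'outputSpeech': {
                'type': 'PlainText',
                'text': reprompt_text
            }
        }

    return response
```

This would also avoid sending a reprompt whose text is None, which is what happens when the original builder is called from handle_session_end_request.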

Next we have the three functions that actually do the work. As you’ll see later in the code, these are triggered based on the specific intent that’s invoked. We have one to welcome us when we start the app with “Alexa open hello world”, one to close out when we say cancel or stop; and one that triggers on our greeting. As you can see, in each case we just assemble some strings with the text to be read by Alexa – and pass to the JSON builder functions.

Lastly we have the event handlers for session_started (which we’re not really using except for debug), on_launch (which triggers the initial message), on_session_ended (which again we’re not using); and most importantly on_intent – which is where we trigger the functions defined previously. Note that here we need to use the name of the intent that we defined previously for our “greeting”…
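If the skill grew more intents, the if/elif chain in on_intent could be swapped for a lookup table. The sketch below is just one way of doing that, not what the listing above does: the three handler functions are stand-ins returning placeholder strings so that the example runs on its own, and INTENT_HANDLERS and dispatch_intent are names I’ve made up.

```python
# Stand-ins for the real handler functions in the skill, so that this
# sketch runs on its own. In the skill they return full response dicts.
def say_hello():
    return "greeting response"


def get_welcome_response():
    return "welcome response"


def handle_session_end_request():
    return "goodbye response"


# Map each intent name (as defined in the Interaction Model) to its handler
INTENT_HANDLERS = {
    "greeting": say_hello,
    "AMAZON.HelpIntent": get_welcome_response,
    "AMAZON.CancelIntent": handle_session_end_request,
    "AMAZON.StopIntent": handle_session_end_request,
}


def dispatch_intent(intent_name):
    """Look up and call the handler for an intent; unknown names raise ValueError."""
    try:
        handler = INTENT_HANDLERS[intent_name]
    except KeyError:
        raise ValueError("Invalid intent: " + intent_name)
    return handler()
```

For a skill with a single custom intent the original chain is arguably just as readable; the table only starts to pay off once there are a handful of intents.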

The very final part is the lambda_handler, which defines the interface between Alexa and Lambda – and actually calls the event handler functions. This part can be treated as boilerplate though – so we don’t need to worry about it.

So really our custom code is (more or less) just the one function:

def say_hello():
    """
    Return a suitable greeting...
    """
    card_title = "Greeting Message"
    greeting_string = "Hello! Here's a computer tongue-twister: Ploughs might cough, but power mowers are thoroughly tough though"
    return build_response({}, build_speechlet_response(card_title, greeting_string, "Ask me to say hello...", True))

This could obviously be any greeting we like; mine is inspired by an old (1985?) episode of the BBC’s Micro Live that I happened to have been watching on YouTube previously… (Needless to say Alexa manages much better than the BBC B did! 🙂).

After we’ve saved everything in AWS all we need to do is hook our skill up to this Lambda Function via the arn: identifier that we have. From there we can test on our real device – and in the text-based simulator (with optional voice playback). If we really wanted to we could then also send the skill off to be approved for publication. There are some quite exacting requirements on the user-interface to pass this approval process so if you do want to publish your skill it is well worth checking the documentation.
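It can also be handy to poke at the handler with a hand-made event before trying it on a device. The helper below is a sketch of that idea: make_test_event is a hypothetical name, the IDs are made-up placeholders, and it only fills in the fields that the code above actually reads (a real Alexa request carries a good deal more).

```python
def make_test_event(request_type, intent_name=None, new_session=True):
    """
    Build a minimal event dict of the shape that lambda_handler reads.
    Only the fields used by the skill code are included, and the IDs are
    placeholders rather than real Alexa identifiers.
    """
    request = {
        'type': request_type,
        'requestId': 'test-request-id'
    }
    if intent_name is not None:
        request['intent'] = {'name': intent_name}

    return {
        'session': {
            'new': new_session,
            'sessionId': 'test-session-id',
            'application': {'applicationId': 'test-application-id'}
        },
        'request': request
    }


if __name__ == "__main__":
    import json
    # Print the event so it can be inspected or pasted in as test input
    print(json.dumps(make_test_event("IntentRequest", "greeting"), indent=2))
```

With the skill code in the same file, calling lambda_handler(make_test_event("IntentRequest", "greeting"), None) should hand back the greeting response, and make_test_event("LaunchRequest") should produce the welcome message.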

And that’s all there is to it. Now you understand the basics there are lots of other tutorials to work through to create an actually useful skill; but this one will at least give you enough to get started. Have fun!
