Let’s Chat: Making Sonos Talk with the audioClip API

Last week we brought out an early, experimental version of audioClip for everyone to try. Using this namespace, developers can make Sonos speakers play short clips of sound that won’t end whatever was playing on that speaker at the time. We’ve heard a lot of folks asking for this capability over the past few years, and we’re glad to finally bring it to you. Some small details may change and there are still a few things left to implement, but this is a great preview of what the namespace is capable of.

Of course, what good is a new API namespace if we don’t build fun things with it? Personally I’ve always wanted the ability to have my Sonos speak to me. You can imagine the uses: tell your kids on the third floor to come down for dinner. Announce the score of the football game. Say who that latest email is from. Really, the sky’s the limit. So let’s use Google Translate’s free Text-to-Speech (TTS) API, along with this new audioClip namespace, and build ourselves a browser-based Sonos TTS experience.

Getting Started

We’ll need a few things to get started:

  • A set-up and configured Sonos system, obviously. Make sure you can play some content on it.
  • The username and password associated with that Sonos system.
  • A Control Integration, complete with a Client Key and Secret, with the redirect URI set to ‘http://localhost:3001/redirect’.
  • A machine capable of running a Node.js version that supports at least ES7. I’m using Node 8.6.0 on my Mac.
  • Some topical music. Let’s do the obvious thing and put on some Talking Heads.

We’ll also be using a few external npm packages to help us with some app infrastructure. These are things that aren’t important to learn about for this blog post, but that are still required for our final app to run:

  • google-tts-api: This puts a nice, neat, promise-ready wrapper around the Google Text-to-Speech API.
  • simple-oauth2: A handy package to simplify the process of getting and refreshing access tokens.
  • node-persist: Mimics the HTML5 localStorage API used in browsers, so it’s pretty easily understood.

These packages are all installed automatically when we execute our npm install command in the next step.

Preparing the App

In a directory of your choice, clone the GitHub repo and cd into the newly created directory. Type npm install and wait for it to run through the install process. Next, copy the .env.sample file to a new file called simply .env. Edit this new file and fill in your Sonos client ID and client secret, obtained from the developer portal. After this you should have everything set up.

I built this app using React to drive the front-end. React is pretty new to me and I’ve really enjoyed learning about how to put such an app together. I had coincidentally just read this blog post by Phil Nash over at the Twilio blog. (As an aside, you should put that blog on your RSS reader of choice. Consistently great content.) The app structure Phil lays out here seemed to meet all of the needs I anticipated for my app. I cloned his repo and used that as a base for Sonos TTS.

App Architecture

The app we’re building here today consists of a front-end app, built in React, and a back-end server. The React app calls the back-end server for the data needed by the front-end UX, and it sends the back end the text to speak. The back end interfaces with the Sonos auth and API servers and keeps track of the access tokens generated during auth. The front-end app has no real idea that it’s working with Sonos; all the “smarts” are consolidated in the back end.

[Diagram: the React front-end talks to our back-end server, which handles auth and all calls to the Sonos servers]

We should note that the back-end server we’re building here is completely unsecured. It shouldn’t be run anywhere except on your local machine. It’ll have access to your Sonos household and will store OAuth 2.0 access and refresh tokens.

Running the App

As Phil notes in his blog post, you can choose to run the back-end server and the front-end as separate processes. This is useful if you plan to run the server and front-end on different machines or instances. We’re going to run everything locally, so we’ll take advantage of the script Phil made to run both server and front-end simultaneously. Type npm run dev and wait for things to spin up. Your browser should automatically come to the foreground and the app will start up.

If this is your first time running the app you’ll immediately be redirected to the Sonos auth servers to log in to your Sonos account. Once you’ve done so you’ll be sent back to the main app screen.

[Screenshot: the main app screen, showing the speaker list and a text box for the phrase to speak]

In the screen above you can see we’re presented with a list of speakers in our household and a box in which to type the phrase we want the Sonos to say. Go ahead and pick some speakers, type something (might I suggest “Sonos speakers sound great!”?) and see what happens. Hopefully, your Sonos just talked to you.

There are a few things to note here:

  • If you’ve got multiple households associated with your Sonos account, you’ll have an extra select list so you can choose which household to target. You can have multiple households if, for example, you’ve got Sonos set up at both your primary residence and vacation home.
  • Remember above where I said that the audioClip namespace is still experimental? Well, one of the things that isn’t fully baked yet is a capability flag called AUDIO_CLIP, which a player uses to indicate that it can actually play audio clips. Until that flag is available, this app will simply list all speakers; if the user selects a speaker that can’t play audio clips, the app will return an error. At the time of this writing, only the Sonos One and Beam support audio clips.

Now that we’ve built and run the app let’s dig into the details to see how we did it.

Authorizing the App with Sonos

There are a few interesting parts of the code to look at. First, let’s examine how we set up simple-oauth2 to work with Sonos. There are two main things we need to configure: the auth endpoints, API keys, and secrets; and the redirect handler. (For a quick refresher on authenticating against Sonos’ OAuth 2.0 server, see our docs.)

Luckily for us, simple-oauth2 makes this all, well, simple. It provides a nice set of convenience methods for defining the OAuth 2.0 parameters and for building the authorization URLs.
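The configuration itself isn’t shown here, so as a stand-in, here’s a rough sketch of what simple-oauth2’s authorizeURL method ends up producing: the authorization URL, assembled by hand with Node’s built-in URLSearchParams. The endpoint path, scope, and credential values are assumptions based on Sonos’ documented OAuth flow, not copied from the app.

```javascript
// Sketch: the authorization URL that simple-oauth2's authorizeURL()
// builds under the hood. Client ID, state, and redirect URI are placeholders.
const SONOS_AUTH_HOST = 'https://api.sonos.com';

function buildAuthorizationUri(clientId, redirectUri, state) {
  const params = new URLSearchParams({
    client_id: clientId,
    response_type: 'code',
    state,
    scope: 'playback-control-all',
    redirect_uri: redirectUri,
  });
  return `${SONOS_AUTH_HOST}/login/v3/oauth?${params.toString()}`;
}

const authorizationUri = buildAuthorizationUri(
  'YOUR_CLIENT_ID',
  'http://localhost:3001/redirect',
  'some-random-state'
);
```

In the app, simple-oauth2 builds exactly this kind of URL from the client ID in our .env file; the sketch just makes the query string visible.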

The authorizationUri constant is built by simple-oauth2’s authorizeURL method. It’s really handy because it encapsulates everything that’s important in the initial call to the authorization code endpoint, so a simple redirect to authorizationUri is all that’s needed to kick off the auth flow.

At this point the user is sent to the Sonos authorization site. They’ll log in to their account and read about the permissions your app is asking for. After having granted those permissions, they’re sent back to the redirect URI. That URI was specified when the Control Integration was built on the dev portal, and is handled by our app. The handler for that URI takes the authorization code and exchanges it for an access token via the Sonos auth endpoints.
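In the app, that exchange is simple-oauth2’s getToken call; as an illustrative stand-in, here’s roughly the same exchange written against Node’s built-in fetch (Node 18+). The token endpoint path is an assumption, and persisting the token is left out.

```javascript
// Sketch of the /redirect handler's token exchange, written without
// simple-oauth2 so the raw OAuth 2.0 mechanics are visible.
const TOKEN_URL = 'https://api.sonos.com/login/v3/oauth/access';

function basicAuthHeader(clientId, clientSecret) {
  // The token endpoint authenticates the app itself with HTTP Basic auth.
  return 'Basic ' + Buffer.from(`${clientId}:${clientSecret}`).toString('base64');
}

async function exchangeCodeForToken(code, clientId, clientSecret) {
  const body = new URLSearchParams({
    grant_type: 'authorization_code',
    code,
    redirect_uri: 'http://localhost:3001/redirect',
  });
  const res = await fetch(TOKEN_URL, {
    method: 'POST',
    headers: {
      Authorization: basicAuthHeader(clientId, clientSecret),
      'Content-Type': 'application/x-www-form-urlencoded',
    },
    body,
  });
  if (!res.ok) throw new Error(`Token exchange failed: ${res.status}`);
  return res.json(); // { access_token, refresh_token, expires_in, ... }
}
```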

simple-oauth2’s getToken method takes care of all the behind-the-scenes stuff for us. Everything is nice and straightforward since we configured simple-oauth2 at the beginning to plug directly into Sonos’ auth server. We get the token back and save it, using node-persist, to local storage so we don’t have to ask the user to log in every time we restart the app. Obviously, local storage is not how you’d want to persist access tokens in a production app, but this simple method works for our purposes.

Talk To Me

OK, we’ve got our access token and can now make calls to the Sonos Control API. Once a token is successfully fetched and saved, we send the user back to localhost:3000, the URL for our main app. They’ll see the main app screen, shown above. Behind the scenes, the app has called our /households endpoint, which gets the list of Sonos households associated with the authorized account.
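A minimal sketch of that back-end handler’s core, again using built-in fetch; the Control API base URL is from the Sonos docs, and the token is assumed to have already been loaded from storage:

```javascript
// Sketch of the back end's /households handler: forward an authorized GET
// to the Sonos Control API and return the household list.
const CONTROL_API = 'https://api.ws.sonos.com/control/api/v1';

function sonosHeaders(accessToken) {
  return {
    Authorization: `Bearer ${accessToken}`,
    'Content-Type': 'application/json',
  };
}

async function getHouseholds(accessToken) {
  const res = await fetch(`${CONTROL_API}/households`, {
    headers: sonosHeaders(accessToken),
  });
  const body = await res.json();
  return body.households; // e.g. [{ id: 'Sonos_...' }, ...]
}
```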

Every one of these calls to the Control API carries an Authorization header with our recently fetched access token.

It’s important to note that the /households endpoint the front-end app calls is not the Sonos Control API command, but an endpoint on the back-end server we’ve built. Remember, the front-end app doesn’t know anything about Sonos; the back end takes care of all the calls to Sonos as well as all authentication.

I’ve built a little bit of UX goodness into the app. If there’s only one household available for the account, the household select list is not displayed; this is the case for the vast majority of accounts out there. Once the user picks a household (or the single household has been automatically selected), the app calls our /clipCapableSpeakers endpoint. Again, /clipCapableSpeakers is a custom endpoint we’ve built on our back-end server.

After making a GET /groups call to the Sonos Control API, our back end sorts through the list of players in the response. Normally we’d select only those players that have the AUDIO_CLIP capability flag. However, at the time of this writing, that flag has not been implemented, so we return all the players and let the user decide which will work with audio clips.
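The filtering we’d do once the flag ships can be sketched as a pure function over the GET /groups response; the sample data below is made up for illustration:

```javascript
// Filter a GET /groups response down to players advertising AUDIO_CLIP.
function clipCapablePlayers(groupsResponse) {
  return groupsResponse.players.filter(
    (player) => (player.capabilities || []).includes('AUDIO_CLIP')
  );
}

// Until the flag exists, we fall back to returning every player.
function allPlayers(groupsResponse) {
  return groupsResponse.players;
}

// Made-up sample data shaped like a Control API GET /groups response.
const sample = {
  players: [
    { id: 'RINCON_1', name: 'Kitchen One', capabilities: ['PLAYBACK', 'AUDIO_CLIP'] },
    { id: 'RINCON_2', name: 'Office Play:5', capabilities: ['PLAYBACK'] },
  ],
};
```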

Now the user can select a speaker and, in the text box below, type something for the speaker to say. After they hit submit, we finally call our custom /speakText endpoint on the back end. The handler for this endpoint receives the text to speak and the selected speaker ID. The first thing it does is call the Google TTS service to turn that text into a URL that will play the spoken audio.
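google-tts-api resolves to a Google Translate TTS URL for the given text. As a hypothetical illustration of what such a URL looks like (the real service requires extra query parameters that the package computes for you, so don’t treat this as a drop-in replacement):

```javascript
// Illustrative only: the general shape of a Google Translate TTS URL
// like the one google-tts-api returns for a piece of text.
function ttsUrl(text, lang = 'en') {
  const params = new URLSearchParams({
    ie: 'UTF-8',
    q: text,       // the text to speak
    tl: lang,      // the language to speak it in
    client: 'tw-ob',
  });
  return `https://translate.google.com/translate_tts?${params.toString()}`;
}
```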

We take the returned URL, add it to our request body, and make a POST to /audioClip on the Sonos Control API.
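That last step might look like the sketch below. The body fields follow the audioClip namespace’s loadAudioClip command (name and appId identify the clip, streamUrl points at the audio); the appId value here is a placeholder:

```javascript
const CONTROL_API = 'https://api.ws.sonos.com/control/api/v1';

// Request body for the loadAudioClip command; appId is a placeholder.
function buildClipBody(clipUrl) {
  return {
    name: 'Sonos TTS',
    appId: 'com.example.sonos-tts',
    streamUrl: clipUrl,
  };
}

// POST the clip to the selected player via the Control API.
async function speakOnPlayer(accessToken, playerId, clipUrl) {
  const res = await fetch(`${CONTROL_API}/players/${playerId}/audioClip`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${accessToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(buildClipBody(clipUrl)),
  });
  return res.json(); // the response includes the new clip's id
}
```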

If everything went well here, the user’s speaker just spoke their typed text. Imagine the possibilities!

Wrap Up

We did a few cool things here: we built a simple React app, implemented authorization against the Sonos servers, and made a few calls to the Sonos Control API. The end result of all this work is that our speakers can now talk to us.

A really neat next step would be to secure the back-end side of this code, put it in the cloud, and have your own private Sonos TTS service. You could hook up any kind of front-end to it that you want. Maybe some IFTTT Webhooks? That’d go a long way towards implementing the “announce the football score” scenario I noted at the beginning of this post.

Again, you’ll find everything you need in the GitHub repo. Head over and check it out.

Thanks for reading this post, and for building a basic Sonos TTS app with us. We’re really excited about all the things developers and partners will do with audioClip.

– Matt Welch – Principal Developer Advocate
Currently listening to C’est La Vie No.2 by Phosphorescent

New APIs: audioClip & playlists

With the October 9th release of Sonos version 9.2, we’re excited to bring you two new namespaces with new APIs: audioClip and playlists. See Release notes for Sonos software for customer-facing features.


Audio clip

You can now play short audio clips on Sonos speakers. Check out the audioClip namespace for details.

This namespace is experimental. Some of the functionality may not yet work as documented. For example, the AUDIO_CLIP capability in the player object has not yet been released, so you won’t be able to tell which players support audio clips and which do not. Currently, the only players that support audio clips are the Sonos One and the Beam. See the groups object for details about capabilities.

We’ll let you know when we’ve added more functionality. Keep an eye on this blog for details.


Playlists

Your app or hardware integration can now get the list of Sonos playlists in a household and load one to start playback. Sonos activates the chosen playlist in the default playback session and starts playback, as if the user selected the playlist using the Sonos app. See the playlists namespace for details.


We look forward to seeing these new APIs in your apps and hardware integrations!

How Sonos Does Sound: An Introduction to Our Sound Experience Guidelines

In creating Sonos products for the last 16 years, we’ve thought a lot about how sound is experienced in the home. In fact, the company was founded on the desire to create a better home-listening experience for our customers: we thought about the experiences we wanted to create first, and then designed the hardware and software to enable them. Our platform is now open, and our team has been hard at work creating tools, guidelines, and API documentation that will help our partners build great experiences to benefit our mutual customers.

Our designs at Sonos have been informed by making prototypes and products and seeing how people interact with them to enjoy music, radio, podcasts, and other types of content in the home. We’ve learned about how they want to choose content; how they want to play different songs in different rooms or the same song in multiple rooms; and how they want to adjust the output volume in one or more rooms. To help give you a head-start with your integration, we’ve gathered many of our findings into the Sonos Sound Experience Guidelines and made them available on our portal. The guidelines are non-technical and focus on user experience rather than APIs and technical know-how. They are intended for a wide audience of designers, product managers, business owners, and developers.

Over the years, we’ve also seen how people use different interfaces to interact with Sonos, such as voice, screen-UI, gestures, and physical controls. We’ve learned that the use of one interface type over another isn’t based purely on user preference but can change during the course of a day. The types of interface a user chooses might be based on convenience, the number of people listening (or watching), the type of content being played, and the activities of people in the house. People expect these different interfaces to work in concert, seamlessly and without technical hiccup. We call this seamless interaction continuity of control and we talk about it in our guidelines. We really want our partners to build on this continuity of control, and we believe that our own Sonos products are just the start.

Obviously, some pretty elaborate technology is needed to enable multi-room interaction and to ensure a seamless experience for our users. However, we believe it is essential to remember that people want to focus on their lives, their family, and their friends, and not be distracted by technology. It is therefore critical that technical complexity does not translate into difficult, time-consuming, or frustrating experiences. To this end, we’ve created a chapter in our guidelines that discusses some key principles of designing with simplicity in mind. We use these principles ourselves and encourage you to consider them for your integration.


People want to enjoy their lives, not be distracted by technology


We think you’ll find the Sound Experience Guidelines useful and informative, and we’re looking forward to seeing what you create. We’re hoping to see a few products and features that we’ve considered at Sonos, but more importantly, we’re excited about seeing partner implementations we’ve never imagined. Good luck with your integration, have fun with the Sonos platform, and let us know if you have any feedback.

Regards,
Rob Lambourne
Distinguished Designer, Sonos Platform Team

Currently listening to: Big Red Machine
First concert: The Wedding Present, Derby, UK