Smart Speaker Parity: One Speaker to Beat Them All


Author: Lydia You

Project SayDo brings your favorite features from Google Home, Amazon Alexa, and Siri to OpenHome! Winner of the “Best Capability” Prize, this project enables the community to immediately leverage highly useful features on OpenHome. 

Introducing Browser, or Project SayDo, the all-in-one capabilities bundle. Browser enables OpenHome to:

  • Look up directions on Google Maps

    • Including a choice of transport mode (walking, public transit, or car)

  • Call/text someone from your contacts

  • Open an app on your phone

  • Play/search music on Spotify

  • Search movies on Netflix

  • Search/play videos on YouTube

  • Set a timer
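The post does not include Diego's directions code. One plausible approach follows the pattern of the Spotify capability shown later: build a URL and open it in the browser. Under that assumption, the spoken transport choice could map onto the travelmode parameter of Google's documented Maps URLs format (https://www.google.com/maps/dir/?api=1&destination=...&travelmode=...). The phrase-to-mode table and the directions_url helper below are hypothetical and only meant as a sketch.

```python
from urllib.parse import urlencode

# Hypothetical mapping from spoken transport choices to Google Maps URL travel modes.
TRAVEL_MODES = {
    "walk": "walking",
    "walking": "walking",
    "public transit": "transit",
    "transit": "transit",
    "car": "driving",
    "drive": "driving",
}


def directions_url(destination: str, transport: str) -> str:
    """Build a Google Maps directions URL for a destination and a spoken transport choice."""
    mode = TRAVEL_MODES.get(transport.strip().lower())
    if mode is None:
        raise ValueError(f"Unsupported transport choice: {transport!r}")
    # urlencode form-encodes values, so spaces in the destination become "+".
    query = urlencode({"api": "1", "destination": destination, "travelmode": mode})
    return f"https://www.google.com/maps/dir/?{query}"


print(directions_url("Golden Gate Park", "public transit"))
# https://www.google.com/maps/dir/?api=1&destination=Golden+Gate+Park&travelmode=transit
```

Raising ValueError on an unrecognized transport choice lets the capability ask the user again instead of guessing a mode.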

Most people only use Amazon Alexa or Google Home to control the lights or play music. With the full suite of capabilities, users can completely replace their existing home speakers with OpenHome SpeakerOS. The capabilities in Browser are some of the most practical and most-used smart speaker functions, and they mark a huge milestone in the development of a voice-first OS.


About the Developer

Diego is the founder of Domo.ai, a San Francisco-based company building voice assistants for wearables. His background is in convolutional neural networks. Congrats to Diego on winning the “Best Capability” prize at the Building Voice Experiences Hackathon!

About the Project

Diego’s goal was to use OpenHome’s SDK to hack together a smart speaker with the same functionality as the existing smart speakers on the market, namely Google Home and Amazon Alexa. During the hackathon, which lasted about 8 hours, he was able to implement several capabilities, including making calls, sending texts, getting directions on Google Maps, and searching for YouTube videos (see his methodology below).

Methodology: Capability 101

Each capability has the following components:

  • Definition: Define the keywords that will trigger the capability to be executed.

  • Prompting: The program will guide the user through a series of prompts to get the information it needs to execute the function.

  • Execution: The program will call the relevant API with the information collected from the user.
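To make the Definition, Prompting, and Execution split concrete outside of OpenHome, here is a minimal, self-contained sketch. ScriptedAgent is a hypothetical stand-in for the OpenHome agent that imitates only the speak(response=...) and listen() calls used in this post. SketchCapability and the timer example are illustrative assumptions, not the OpenHome SDK's API.

```python
from dataclasses import dataclass
from typing import Callable, List


class ScriptedAgent:
    """Hypothetical stand-in for the OpenHome agent: replays canned user replies."""

    def __init__(self, replies: List[str]):
        self.replies = list(replies)
        self.spoken: List[str] = []

    def speak(self, response: str) -> None:
        self.spoken.append(response)

    def listen(self) -> str:
        # Return the next scripted reply, or "" to simulate silence.
        return self.replies.pop(0) if self.replies else ""


@dataclass
class SketchCapability:
    unique_name: str
    hotwords: List[str]             # Definition: phrases that trigger the capability
    question: str                   # Prompting: what to ask the user
    execute: Callable[[str], str]   # Execution: acts on the user's answer

    def call(self, agent) -> str:
        agent.speak(response=self.question)
        answer = agent.listen().strip()
        if not answer:
            agent.speak(response="I didn't catch that. Could you please repeat?")
            return "Failed to get user input."
        result = self.execute(answer)
        agent.speak(response=result)
        return result


# Hypothetical "set a timer" capability that only reports what it would do.
timer = SketchCapability(
    unique_name="set_timer",
    hotwords=["set a timer"],
    question="For how many minutes?",
    execute=lambda answer: f"Timer set for {int(answer)} minutes.",
)
print(timer.call(ScriptedAgent(["5"])))  # Timer set for 5 minutes.
```

A scripted agent like this makes it possible to exercise the prompting logic in unit tests without a microphone or speaker.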

Below is the code that implements the “search song on Spotify” capability, broken into these three components:

Definition

Python

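class SpotifySearchCapability(Capability):
    # Capability is the OpenHome SDK base class (its import is not shown here).
    @classmethod
    def register_capability(cls):
        # unique_name identifies the capability; hotwords are the phrases that trigger it.
        return cls(unique_name="spotify_music_search", hotwords=["play some music", "play some vibes"])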

Each capability starts by defining a class for that specific capability. In the register_capability class method, the developer specifies a unique_name and a list of “hotwords”, the keywords OpenHome listens for in order to trigger the capability.
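The post does not describe how OpenHome matches hotwords against what the user says. As a hypothetical illustration only, a simple matcher could normalize the transcript (lowercase, punctuation removed) and check whether any registered hotword appears in it as a whole phrase. The registry dict keyed by unique_name and the match_capability helper are assumptions made for this sketch.

```python
import re
from typing import Dict, List, Optional


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_capability(transcript: str, registry: Dict[str, List[str]]) -> Optional[str]:
    """Return the unique_name of the first capability whose hotword appears in the transcript."""
    spoken = f" {normalize(transcript)} "
    for unique_name, hotwords in registry.items():
        for hotword in hotwords:
            # Pad with spaces so a hotword only matches whole words, not fragments of longer ones.
            if f" {normalize(hotword)} " in spoken:
                return unique_name
    return None


registry = {"spotify_music_search": ["play some music", "play some vibes"]}
print(match_capability("Hey, play some vibes!", registry))  # spotify_music_search
```

Plain phrase matching is easy to reason about, but it misses paraphrases such as "put on some tunes". Listing several hotwords per capability, as the Spotify example does, is one way to widen coverage.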

Prompt Scripting

Python

    def call(self, agent):
        # Ask the user what kind of music they want.
        initial_message = "Tell me the mood or type of music you're interested in."
        agent.speak(response=initial_message)

        music_request = agent.listen().strip()

        if not music_request:
            agent.speak(response="I didn't catch that. Could you please repeat?")
            return "Failed to get music request."

        # Send the instruction plus the user's request to the LLM helper (text_to_text).
        prompt = "Produce the name of one song, just and only one song, according to what the user wants, and do not say anything else, just the name of the song: \n\n" + music_request
        suggestion = text_to_text(prompt)

        if not suggestion:
            agent.speak(response="I couldn't find any good matches for your request.")
            return "No suggestions generated."

The capability then prompts the user to describe the music they want. It sends that request to an LLM with instructions to interpret it and return the name of a single song. If the user says nothing, or the LLM returns no suggestion, the capability tells the user and stops.
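LLMs do not always follow "just the song name" instructions exactly. Here is a minimal sketch, assuming the reply might arrive wrapped in quotes or as a numbered list, that builds the prompt from the user's request and reduces the reply to a single song name. The llm parameter is a stand-in for the text_to_text helper, and the clean-up rules in clean_song_name are illustrative assumptions.

```python
import re
from typing import Callable, Optional

# Prompt wording adapted from the capability above; {request} is the user's answer.
PROMPT_TEMPLATE = (
    "Produce the name of one song, and only one song, that matches what the user wants. "
    "Reply with the song name and nothing else:\n\n{request}"
)


def clean_song_name(raw: str) -> Optional[str]:
    """Return the first non-empty line of an LLM reply, minus list markers and double quotes."""
    for line in raw.splitlines():
        # Drop a leading "-", "*", "1." or "1)" marker, then surrounding whitespace and quotes.
        line = re.sub(r"^\s*(?:[-*]|\d+[.)])\s+", "", line).strip().strip('"“”')
        if line:
            return line
    return None


def suggest_song(request: str, llm: Callable[[str], str]) -> Optional[str]:
    """Ask the LLM for one song that fits the request and clean up its reply."""
    return clean_song_name(llm(PROMPT_TEMPLATE.format(request=request)))


def fake_llm(prompt: str) -> str:
    # Simulates an LLM that ignores the formatting instruction.
    return '\n1. "Here Comes the Sun"\n'


print(suggest_song("something sunny for a morning walk", fake_llm))  # Here Comes the Sun
```

The marker pattern requires whitespace after "1." or "-", so titles that begin with digits, such as "99 Luftballons", pass through unchanged.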

API Integration

Python

        # Requires module-level imports: logging, urllib.parse, webbrowser.
        # Use the LLM's song suggestion as the Spotify search query.
        search_query = suggestion
        search_url = f"https://open.spotify.com/search/{urllib.parse.quote_plus(search_query)}"

        try:
            # Open the Spotify search results in a new browser tab.
            # webbrowser.open returns False (instead of raising) when no browser can be launched.
            if not webbrowser.open(search_url, new=2):
                raise RuntimeError("no web browser available")
            logging.info(f"Searched Spotify for: {search_query}")
            agent.speak(response=f"Trying to play {search_query} on Spotify.")
            return f"Searched for {search_query} on Spotify."

        except Exception as e:
            logging.error(f"Failed to search Spotify for {search_query}: {e}")
            agent.speak(response="Failed to search on Spotify. Please try again.")
            return "Failed to initiate Spotify search."

Once the song name is identified, the capability builds a Spotify web search URL for it and opens that URL in a new browser tab. Browser’s capabilities emphasize using the web browser to carry out each function, so instead of calling the Spotify API directly, Diego’s implementation opens a Spotify search page for the song.
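The same build-a-URL-and-open-it pattern could extend to the YouTube search feature listed at the top of this post. The sketch below is a hypothetical refactor, not Diego's code. It factors out the URL builders and the browser call, with the opener made injectable so the logic can be tested without launching a browser. It uses urllib.parse.quote with safe="" for the Spotify path segment, a conservative choice given that "+" only stands for a space in query strings. YouTube's search_query parameter is a query-string value, so form encoding with urlencode applies there.

```python
import logging
import urllib.parse
import webbrowser
from typing import Callable


def spotify_search_url(query: str) -> str:
    # The query is a path segment, so percent-encode everything, including spaces and "/".
    return "https://open.spotify.com/search/" + urllib.parse.quote(query, safe="")


def youtube_search_url(query: str) -> str:
    # YouTube takes the query as a query-string parameter, so form encoding applies.
    return "https://www.youtube.com/results?" + urllib.parse.urlencode({"search_query": query})


def open_in_browser(url: str, opener: Callable[..., bool] = webbrowser.open) -> bool:
    """Open url in a new tab (new=2); webbrowser.open returns False if no browser launched."""
    opened = opener(url, new=2)
    if opened:
        logging.info("Opened %s", url)
    else:
        logging.error("Could not open a browser for %s", url)
    return opened


print(spotify_search_url("Here Comes the Sun"))  # https://open.spotify.com/search/Here%20Comes%20the%20Sun
print(youtube_search_url("lofi beats"))          # https://www.youtube.com/results?search_query=lofi+beats
```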