Introducing MetaCap: Live Emotional Insights Come to OpenHome

Author: Jesse Leimgruber

We’re thrilled to showcase a community project built by members of the OpenHome developer community!

At the heart of OpenHome’s mission is the vision to transform your smart speaker from a mere AI Assistant into a true companion, one that adapts, learns, and grows alongside you. Today, we’re thrilled to unveil one of our most exciting community contributions to date: MetaCap, a project that gives OpenHome the power to sense emotions from voice.

About the Project

MetaCap serves as the foundation for an even more remarkable capability: Empathy Evolver.

MetaCap is a community-developed capability designed to take OpenHome’s interactive experience to new heights. This innovative feature enhances the smart speaker’s ability to capture live audio during streams, immediately after a user-initiated pause, and seamlessly transmit that audio for analysis to any agent within the network.
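
MetaCap hooks directly into OpenHome’s audio stream, so this capture step is handled for you. Purely as a standalone illustration of what capturing a clip to a .wav for analysis involves, here is a minimal sketch using the third-party sounddevice and soundfile packages (none of this is MetaCap or OpenHome SDK code):

import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 16_000  # wav2vec2-based models expect 16 kHz mono audio
DURATION_S = 5        # length of the clip to record

def capture_clip(path: str = "clip.wav") -> str:
    """Record a short mono clip and write it to a .wav file."""
    frames = sd.rec(int(DURATION_S * SAMPLE_RATE),
                    samplerate=SAMPLE_RATE, channels=1)
    sd.wait()  # block until the recording finishes
    sf.write(path, frames, SAMPLE_RATE)
    return path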

Original Contributions:

  • A new plugin that adds further modularity to the OpenHome SDK by transmitting audio files to different parts of the program (sketched after this list).

  • Emotion detection from .wav files using best-in-class sentiment-analysis models.

  • An Empathy Evolver capability that responds empathetically to the user’s emotional state.
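
The plug-in pattern is easiest to see in miniature. The following hypothetical sketch (the run_capabilities helper and its wiring are illustrative, not published SDK code) shows the idea: each capability receives the captured audio path plus a shared meta dict, mirroring the call signature used in the capability code later in this post, so results flow from one part of the program to the next:

from typing import Any

def run_capabilities(capabilities: list, audio_path: str) -> dict[str, Any]:
    """Illustrative only: thread a captured .wav through each capability."""
    meta: dict[str, Any] = {}
    for cap in capabilities:
        cap.call(
            msg="",
            agent=None,
            text_respond=None,
            speak_respond=None,
            audio=audio_path,  # the .wav captured by MetaCap
            meta=meta,         # shared scratch space between capabilities
        )
    return meta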

Demo

[Demo video]

About the Developers

The innovative team behind MetaCap emerged during the OpenHome AI Voice Experiences Hackathon and consists of:

  • Dylan Iskandar from Stanford University

  • Paritosh Kulkarni from Columbia University

  • Nils Andre from Cambridge University

Empathy Evolver: Beyond Listening

Leveraging MetaCap, the Empathy Evolver capability takes the live-captured audio and uses the plugin to process the .wav files, extracting emotional nuance from the user’s voice. This breakthrough allows OpenHome not just to hear but to understand the user’s emotional state, enabling responses that are truly empathetic and tailored to the moment.

The following code snippet shows a basic emotion-detection capability that uses a best-in-class wav2vec emotion-recognition model to detect emotion in the user’s audio input.


from typing import Any

from transformers import pipeline

# BaseCapability and SynchronousTTT come from the OpenHome SDK;
# the exact import path depends on your SDK version.

# Load a speech-emotion-recognition model built on wav2vec2.
pipe = pipeline(
    "audio-classification",
    model="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition",
)


class EmotionCapability(BaseCapability):
    @classmethod
    def register_capability(cls) -> "EmotionCapability":
        return cls(
            unique_name="emotion_capability",
            config_help_text="Gets the emotion",
        )

    def call(
        self,
        msg: str,
        agent,
        text_respond: SynchronousTTT,
        speak_respond: None,
        audio: str,
        meta: dict[str, Any],
    ):
        meta["emotion_capability"] = pipe(audio)

A New Personality: Inspired by Samantha from “Her”

To demonstrate the full potential of MetaCap and the Empathy Evolver, the team crafted an original personality inspired by the beloved character Samantha from the movie “Her.” This addition makes OpenHome come alive, transforming the application from a home assistant to an AI partner. With Samantha’s influence, OpenHome now embodies warmth, understanding, and a deep connection, setting a new standard for what smart speakers can be.