More with Gruut: Use the Microsoft Bot Framework to analyze emotion with the Azure Face API

Last summer—it was 2020, I think it was summer—I published a post that showed off how to use Azure text sentiment analysis with the Microsoft Bot Framework. I used it to build on Shahed Chowdhuri’s GruutChatbot project. The gist was that Gruut would respond with I am Gruut differently based on the conveyed emotion of the text that a user enters. (As before, the project doesn’t have a legal team so we’re using the Gruut naming to avoid any issues from the Smisney Smorporation.)

Here’s a thought: since the Bot Framework allows users to send attachments, what if we send him a picture? How would Gruut react to an image of a happy-seeming person, or an upset-seeming person? We can find out with the Azure Face API, which detects perceived emotions from faces in images. That’s what we’ll do in this post.

Note: While we’re using this service innocently, face detection is a serious topic. Make sure you understand what you are doing, and how the data is used. A good first step is to check out the Azure Cognitive Services privacy policy. For this post, we’ll be using stock images.

Before you start

In this post, we won’t walk through installing the Bot Framework SDK and Emulator, creating the bot project, and so on—that was covered in the first post on the topic. Check out that post if you need to get acclimated.

We’ll need to create a Face resource in Azure (yes, that’s what it’s called). Once you create that, grab the Endpoint (from the resource in the Azure Portal) and also a key from the Keys and Endpoint section. You can then throw those into Azure Key Vault with the other values (check out the previous post for those instructions).

Process the attachment

First, we need to process the attachment. This involves getting the location from the chat context, and downloading the image to a temporary path. Then, we’ll be ready to pass it to the Cognitive Services API. In our renamed GruutBot class, we now need to conditionally work with attachments.

Here’s how it looks. We’ll dig deeper after reviewing the code.

protected override async Task OnMessageActivityAsync(
            ITurnContext<IMessageActivity> turnContext,
            CancellationToken cancellationToken)
{
    string replyText;

    // some channels send an empty list instead of null, so check the count too
    if (turnContext.Activity.Attachments is { Count: > 0 })
    {
      // get url from context, then download to pass along to the service
      var fileUrl = turnContext.Activity.Attachments[0].ContentUrl;
      var localFileName = Path.Combine(Path.GetTempPath(),
              turnContext.Activity.Attachments[0].Name);

      // download asynchronously so the turn isn't blocked on I/O
      using var webClient = new WebClient();
      await webClient.DownloadFileTaskAsync(fileUrl, localFileName);

      replyText = await _imageService.GetGruutResponse(localFileName);
    }
    else
    {
      replyText = await _textService.GetGruutResponse(turnContext.Activity.Text, cancellationToken);
    }

    await turnContext.SendActivityAsync(MessageFactory.Text(replyText, replyText), cancellationToken);
}

Let’s focus on what happens when we receive the attachment.

// some channels send an empty list instead of null, so check the count too
if (turnContext.Activity.Attachments is { Count: > 0 })
{
    // get url from context, then download to pass along to the service
    var fileUrl = turnContext.Activity.Attachments[0].ContentUrl;
    var localFileName = Path.Combine(Path.GetTempPath(),
        turnContext.Activity.Attachments[0].Name);

    // download asynchronously so the turn isn't blocked on I/O
    using var webClient = new WebClient();
    await webClient.DownloadFileTaskAsync(fileUrl, localFileName);

    replyText = await _imageService.GetGruutResponse(localFileName);
}

For now, let’s (blindly) assume we only have one attachment, and the attachment is an image. From the turnContext we can retrieve information about the sent message. In our case, we want the ContentUrl of the attachment. We’ll build a local path for it in the temp directory, using the attachment’s Name as the file name. Then, using the WebClient API, we can download the file to that path. With a path set, we can now inspect our GetGruutResponse method in a new image service, which takes the path to where the image resides.
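If you’d rather not assume blindly, here is a minimal sketch of two checks you could run before downloading. None of this is in the original project: AttachmentGuards, IsImage, and SafeTempPath are hypothetical names. It assumes the channel fills in the attachment’s ContentType with a MIME type such as image/png, and it keeps only the file-name part of the user-supplied Name so the file always lands inside the temp directory.

```csharp
using System;
using System.IO;

// Hypothetical helpers (not part of the original project) for vetting an
// attachment before it is downloaded.
public static class AttachmentGuards
{
    // True when the MIME type reported for the attachment looks like an image.
    public static bool IsImage(string contentType) =>
        !string.IsNullOrWhiteSpace(contentType) &&
        contentType.StartsWith("image/", StringComparison.OrdinalIgnoreCase);

    // Builds a path inside the temp directory, keeping only the file-name part
    // of the attachment name. Falls back to a random name when none is given.
    public static string SafeTempPath(string attachmentName)
    {
        var fileName = Path.GetFileName(attachmentName);
        if (string.IsNullOrWhiteSpace(fileName))
        {
            fileName = Path.GetRandomFileName();
        }

        return Path.Combine(Path.GetTempPath(), fileName);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(AttachmentGuards.IsImage("image/jpeg"));      // True
        Console.WriteLine(AttachmentGuards.IsImage("application/pdf")); // False
        Console.WriteLine(AttachmentGuards.SafeTempPath("../happy.jpg"));
    }
}
```

In the bot, these would slot in ahead of the download: skip the Face API call (and let Gruut answer with the failed mood) when IsImage returns false, and pass the result of SafeTempPath as the local file name.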

Write a new ImageAnalysisService

Because we’re now using both text and image analysis SDKs in this app, we should split them out into different services. Let’s begin with a new ImageAnalysisService.

To kick off a new ImageAnalysisService.cs file, we’ll read in the settings using the ASP.NET Core Options pattern with ImageApiOptions. These align with what I’ve got stored in the Azure Key Vault.

namespace GruutChatbot.Services.Options
{
    public class ImageApiOptions
    {
        public string FaceEndpoint { get; set; }
        public string FaceCredential { get; set; }
    }
}
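To make the binding concrete, here is a small sketch of how a configuration section maps onto ImageApiOptions. It is an illustration under assumptions, not the project’s actual setup: the values are placeholders, the in-memory provider stands in for Key Vault, and it assumes the Microsoft.Extensions.Configuration and Microsoft.Extensions.Configuration.Binder packages are referenced. The section name matches the class name because that is what Configuration.GetSection(nameof(ImageApiOptions)) looks for.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;

public class ImageApiOptions
{
    public string FaceEndpoint { get; set; }
    public string FaceCredential { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Placeholder values; in the real app these come from Key Vault.
        // Keys use "Section:Property", the configuration system's path format.
        var settings = new Dictionary<string, string>
        {
            ["ImageApiOptions:FaceEndpoint"] = "https://example.invalid/",
            ["ImageApiOptions:FaceCredential"] = "placeholder-key"
        };

        IConfiguration configuration = new ConfigurationBuilder()
            .AddInMemoryCollection(settings)
            .Build();

        // Bind the section onto the options class by matching property names.
        var options = configuration
            .GetSection(nameof(ImageApiOptions))
            .Get<ImageApiOptions>();

        Console.WriteLine(options.FaceEndpoint); // prints "https://example.invalid/"
    }
}
```

If a property comes back null at runtime, the first thing to check is that the stored key names line up with the section and property names shown here.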

Then, after bringing in the Microsoft.Azure.CognitiveServices.Vision.Face NuGet package, we can use constructor dependency injection to “activate” our services. Here’s the beginning of the class:

namespace GruutChatbot.Services
{
    public class ImageAnalysisService
    {
        readonly ImageApiOptions _imageApiSettings;
        readonly ILogger<ImageAnalysisService> _logger;
        readonly FaceClient _imageClient;

        public ImageAnalysisService(
            IOptions<ImageApiOptions> options,
            ILogger<ImageAnalysisService> logger)
        {
            _imageApiSettings = options.Value ?? throw new
                ArgumentNullException(nameof(options),
                "Image API options are required.");
            _logger = logger;
            _imageClient = new FaceClient(
                new ApiKeyServiceClientCredentials(
                    _imageApiSettings.FaceCredential))
            {
                Endpoint = _imageApiSettings.FaceEndpoint
            };
        }

        // GetGruutResponse and its helpers follow (shown below)
    }
}

Now, we’ll write a GetGruutResponse method that takes the file path to our downloaded attachment. Here’s how the method looks (we’ll dig in after):

public async Task<string> GetGruutResponse(string filePath)
{
  try
  {
    var faceAttributes = new List<FaceAttributeType?>
       { FaceAttributeType.Emotion};

    using var imageStream = File.OpenRead(filePath);

    var result = await _imageClient.Face.DetectWithStreamAsync(
              imageStream, true, false, faceAttributes);

    return GetReplyText(GetHeaviestEmotion(result));
  }
  catch (Exception ex)
  {
    // pass the exception first so the logger records the stack trace
    _logger.LogError(ex, "Unable to analyze the image.");
  }
  return string.Empty;
}

First, we need to pass in a List<FaceAttributeType?>, which tells the SDK which face attributes you want back. There’s a ton of options, like FacialHair, Gender, Hair, Blur, and so on—we just need Emotion.

var faceAttributes = new List<FaceAttributeType?>
                    { FaceAttributeType.Emotion};

Then, we’ll open up a FileStream for the file in question.

using var imageStream = File.OpenRead(filePath);

To make the Cognitive Services call, we can call the DetectWithStreamAsync method. The true argument asks the service to return a faceId, and the false argument tells it not to return face landmarks.

var result = await _imageClient.Face.DetectWithStreamAsync(
                    imageStream, true, false, faceAttributes);

When we call this, we get back a list of detected faces, each carrying a score from 0 to 1 for every emotion. We’d like to pick the emotion that carries the most weight for the first face. To do that, I wrote a GetHeaviestEmotion method. This must be a common use case, as the SDK has a ToRankedList extension method that suits my needs.

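// returns null when no face (or no emotion data) comes back,
// which GetReplyText maps to the failed mood
private string GetHeaviestEmotion(IList<DetectedFace> imageList) =>
            imageList.FirstOrDefault()?.FaceAttributes?.Emotion?.
                ToRankedList().FirstOrDefault().Key;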
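To show the idea without the SDK, here is a stand-alone sketch of picking the heaviest emotion from a set of scores. It is only an illustration: EmotionRanking and Heaviest are hypothetical names, the scores are made up, and I haven’t checked how ToRankedList is implemented internally. The sketch simply sorts by score, highest first, and takes the first entry.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the SDK's ranking helper.
public static class EmotionRanking
{
    // Returns the name of the highest-scoring emotion, or null when there
    // are no scores to rank.
    public static string Heaviest(IReadOnlyDictionary<string, double> scores) =>
        scores is null || scores.Count == 0
            ? null
            : scores.OrderByDescending(pair => pair.Value).First().Key;
}

public static class Program
{
    public static void Main()
    {
        // Made-up scores, each between 0 and 1.
        var scores = new Dictionary<string, double>
        {
            ["Happiness"] = 0.91,
            ["Neutral"] = 0.07,
            ["Surprise"] = 0.02
        };

        Console.WriteLine(EmotionRanking.Heaviest(scores)); // prints "Happiness"
    }
}
```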

Now, all that’s left is to figure out what to send to Gruut. I created a GruutMoods class that just has some static readonly strings for ease of use:

namespace GruutChatbot.Services
{
    public static class GruutMoods
    {
        public readonly static string PositiveGruut = "I am Gruut.";
        public readonly static string NeutralGruut = "I am Gruut?";
        public readonly static string NegativeGruut = "I AM GRUUUUUTTT!!";
        public readonly static string FailedGruut = "I.AM.GRUUUUUT";
    }
}

Now, we can use a switch expression to determine what to send back:

Note: I classified the perceived emotions based on trial-and-error using a few images. Your mileage may vary.

private static string GetReplyText(string emotion) => emotion switch
{
  "Happiness" or "Surprise" => GruutMoods.PositiveGruut,
  "Anger" or "Contempt" or "Disgust" or "Sadness" =>
           GruutMoods.NegativeGruut,
  "Neutral" or "Fear" => GruutMoods.NeutralGruut,
  _ => GruutMoods.FailedGruut
};

So now, in GetGruutResponse this is what we’ll return:

return GetReplyText(GetHeaviestEmotion(result));

Again, here’s all of GetGruutResponse for your reference:

public async Task<string> GetGruutResponse(string filePath)
{
    try
    {
      var faceAttributes = new List<FaceAttributeType?>
        { FaceAttributeType.Emotion};

      using var imageStream = File.OpenRead(filePath);

      var result = await _imageClient.Face.DetectWithStreamAsync(
                  imageStream, true, false, faceAttributes);

      return GetReplyText(GetHeaviestEmotion(result));
    }
    catch (Exception ex)
    {
      // pass the exception first so the logger records the stack trace
      _logger.LogError(ex, "Unable to analyze the image.");
    }

    return string.Empty;
}

And finally, the entirety of ImageAnalysisService:

using GruutChatbot.Services.Options;
using Microsoft.Azure.CognitiveServices.Vision.Face;
using Microsoft.Azure.CognitiveServices.Vision.Face.Models;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

namespace GruutChatbot.Services
{
    public class ImageAnalysisService
    {
        readonly ImageApiOptions _imageApiSettings;
        readonly ILogger<ImageAnalysisService> _logger;
        readonly FaceClient _imageClient;

        public ImageAnalysisService(
            IOptions<ImageApiOptions> options,
            ILogger<ImageAnalysisService> logger)
        {
            _imageApiSettings = options.Value ?? throw new
                ArgumentNullException(nameof(options),
                "Image API options are required.");
            _logger = logger;
            _imageClient = new FaceClient(
                new ApiKeyServiceClientCredentials(
                    _imageApiSettings.FaceCredential))
            {
                Endpoint = _imageApiSettings.FaceEndpoint
            };
        }

        public async Task<string> GetGruutResponse(string filePath)
        {
            try
            {
                var faceAttributes = new List<FaceAttributeType?>
                    { FaceAttributeType.Emotion};

                using var imageStream = File.OpenRead(filePath);

                var result = await _imageClient.Face.DetectWithStreamAsync(
                    imageStream, true, false, faceAttributes);

                return GetReplyText(GetHeaviestEmotion(result));
            }
            catch (Exception ex)
            {
                // pass the exception first so the logger records the stack trace
                _logger.LogError(ex, "Unable to analyze the image.");
            }

            return string.Empty;
        }

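        // returns null when no face (or no emotion data) comes back,
        // which GetReplyText maps to the failed mood
        private string GetHeaviestEmotion(IList<DetectedFace> imageList) =>
            imageList.FirstOrDefault()?.FaceAttributes?.Emotion?.
                ToRankedList().FirstOrDefault().Key;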

        private static string GetReplyText(string emotion) => emotion switch
        {
            "Happiness" or "Surprise" => GruutMoods.PositiveGruut,
            "Anger" or "Contempt" or "Disgust" or "Sadness" =>
              GruutMoods.NegativeGruut,
            "Neutral" or "Fear" => GruutMoods.NeutralGruut,
            _ => GruutMoods.FailedGruut
        };
    }
}

Also, if you aren’t aware, you’ll need to add the service to the DI container in Startup.cs:

public void ConfigureServices(IServiceCollection services)
{
    // Create the Bot Framework Adapter with error handling enabled.
    services.AddSingleton<IBotFrameworkHttpAdapter, AdapterWithErrorHandler>();

    // Create the bot as a transient. In this case the ASP Controller is expecting an IBot.
    services.AddTransient<IBot, GruutBot>();

    // some stuff removed for brevity

    services.Configure<ImageApiOptions>(Configuration.GetSection(nameof(ImageApiOptions)));
    services.Configure<TextApiOptions>(Configuration.GetSection(nameof(TextApiOptions)));

    services.AddSingleton<ImageAnalysisService>();
    services.AddSingleton<TextAnalysisService>();    
}

Repeat with TextAnalysisService

With this in place, a refactored TextAnalysisService takes the same shape and looks like this:

using Azure;
using Azure.AI.TextAnalytics;
using GruutChatbot.Services.Options;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using System;
using System.Threading;
using System.Threading.Tasks;

namespace GruutChatbot.Services
{
    public class TextAnalysisService
    {
        readonly TextApiOptions _textApiOptions;
        readonly ILogger<TextAnalysisService> _logger;
        readonly TextAnalyticsClient _textClient;

        public TextAnalysisService(
            IOptions<TextApiOptions> options,
            ILogger<TextAnalysisService> logger)
        {
            _textApiOptions = options.Value ?? throw new
                ArgumentNullException(nameof(options),
                "Text API options are required.");
            _logger = logger;
            _textClient = new TextAnalyticsClient(
                new Uri(_textApiOptions.CognitiveServicesEndpoint),
                new AzureKeyCredential(_textApiOptions.AzureKeyCredential));
        }

        public async Task<string> GetGruutResponse(string inputText, 
                CancellationToken cancellationToken)
        {
            try
            {
                var result = await _textClient.AnalyzeSentimentAsync(
                    inputText,
                    cancellationToken: cancellationToken);

                return GetReplyText(result.Value.Sentiment);
            }
            catch (Exception ex)
            {
                // pass the exception first so the logger records the stack trace
                _logger.LogError(ex, "Unable to analyze the text.");
            }

            return string.Empty;
        }

        static string GetReplyText(TextSentiment sentiment) => sentiment switch
        {
            TextSentiment.Positive => GruutMoods.PositiveGruut,
            TextSentiment.Negative => GruutMoods.NegativeGruut,
            TextSentiment.Neutral => GruutMoods.NeutralGruut,
            _ => GruutMoods.FailedGruut
        };
    }
}

The final product

Now, with our SDK calls refactored to services, here’s our full GruutBot:

using GruutChatbot.Services;
using Microsoft.Bot.Builder;
using Microsoft.Bot.Schema;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

namespace GruutChatbot.Bots
{
    public class GruutBot : ActivityHandler
    {
        private readonly TextAnalysisService _textService;
        private readonly ImageAnalysisService _imageService;

        public GruutBot(TextAnalysisService textService, ImageAnalysisService imageService)
        {
            _textService = textService;
            _imageService = imageService;
        }

        protected override async Task OnMessageActivityAsync(
            ITurnContext<IMessageActivity> turnContext,
            CancellationToken cancellationToken)
        {
            string replyText;

            // some channels send an empty list instead of null, so check the count too
            if (turnContext.Activity.Attachments is { Count: > 0 })
            {
                // get url from context, then download to pass along to the service
                var fileUrl = turnContext.Activity.Attachments[0].ContentUrl;
                var localFileName = Path.Combine(Path.GetTempPath(),
                    turnContext.Activity.Attachments[0].Name);

                // download asynchronously so the turn isn't blocked on I/O
                using var webClient = new WebClient();
                await webClient.DownloadFileTaskAsync(fileUrl, localFileName);

                replyText = await _imageService.GetGruutResponse(localFileName);
            }
            else
            {
                replyText = await _textService.GetGruutResponse(turnContext.Activity.Text, cancellationToken);
            }

            await turnContext.SendActivityAsync(MessageFactory.Text(replyText, replyText), cancellationToken);

        }

        protected override async Task OnMembersAddedAsync(
            IList<ChannelAccount> membersAdded,
            ITurnContext<IConversationUpdateActivity> turnContext,
            CancellationToken cancellationToken)
        {
            // welcome new users
        }
    }
}

The bot in action

Now, let’s see how the bot looks when we send Gruut text and images.

Wrap up

In this post, we showed how to use Azure Cognitive Services face emotion detection with the Microsoft Bot Framework. We processed and downloaded attachments, passed them to a new ImageAnalysisService, and called the Face API to detect perceived emotion. Then, we wrote logic to pick the reply Gruut sends back. Finally, we showed a refactoring of the existing text analysis to a new TextAnalysisService, which lets us keep the calls to each API in its own service.