Conversation & Document Summarisation Have Been Added To Microsoft Cognitive Services
Microsoft Cognitive Services have been known for STT, TTS and the NLU prowess of LUIS. Of late it seems like Microsoft is expanding into the area of Large Language Models. One example of this is their recently announced summarisation feature.
Introduction
On 24 May 2022 Microsoft introduced summarisation for conversations and documents.
Summarisation is now one of the features offered by Azure Cognitive Service for Language.
Both document and conversation summarisation can be applied to chat logs, speech transcripts, documents, or any snippets of text which require summarising. Microsoft sees this summarisation tool as a ready-for-use interface to enable end-to-end analysis of speech conversations, from audio to transcript to insights and more. Considering LLM functionality at large, one could argue that summarisation is part and parcel of the current offering.
Certain functionality has become synonymous with Large Language Models (LLMs), as listed below. This list of five implementations or functionality groups is represented across the commonly known LLMs like OpenAI, Cohere, AI21labs, etc.
These LLM’s have playgrounds and API’s. What has been missing is a studio or no-code interface to leverage these LLM’s. HumanFirst is leading the way with their POC integration to Cohere and NVIDIA has also alluded to a similar approach for ease of access to LLM’s.
Generation and Summarisation
Most of the LLM providers follow the same basic approach, where Generation is one of a handful of LLM functions. These functions or groupings might be few, but they are exceptionally powerful and are only found in the domain of LLMs.
By making use of casting, the Generation API can be leveraged to generate text based on the cast. Casting is the combination of training instructions and the text to be used as reference. The training instructions can be seen as a type of few-shot training.
Below is a summarisation example from OpenAI’s playground, where the instruction (cast) is given: “Summarise the following:”, followed by the text to perform the summarisation on.
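The same cast can be sent to the OpenAI API directly, rather than via the playground. Below is a minimal sketch using the completions endpoint; the model name, placeholder text and parameter values are illustrative assumptions, not prescribed values:

curl https://api.openai.com/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
  "model": "text-davinci-002",
  "prompt": "Summarise the following:\n\n<text to summarise>",
  "max_tokens": 150,
  "temperature": 0.3
}'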
Back To Microsoft
Microsoft has two specific APIs for summarisation: one aimed at conversations between customers and service representatives, and one to summarise documents.
According to Microsoft, these two features can operate on both chat logs and speech transcripts, allowing seamless integration with Microsoft’s Cognitive Service for Speech. An example of a ready-for-use solution is Ingestion Client, which enables end-to-end analysis of speech conversations, from audio to transcript to insights.
This first release is trained with GPT-3, focussing on the needs of customer support and call centres.
What makes Microsoft’s approach different is that they have specific APIs and specific use-cases in mind. The summarisation API will form part of their Contact Centre AI (CCAI) strategy as an agent-assist feature.
According to Microsoft:
Customer support agents typically spend 5–15 minutes writing notes when wrapping up each call or text chat, or when they transfer a case to the next level of support. This considerable time and effort significantly slow down the time to resolution. Our new feature automatically generates a summary of issues and resolutions from a two-party conversation, especially between a customer and an agent, which can greatly reduce case handling time, increase agents’ job satisfaction, sustain high customer engagement, improve customer experiences, as a result boost customer loyalty. Built with this API offering in Azure Cognitive Services, Dynamics 365 Customer Service now enables this capability out-of-box for their customers
Demo Time
Document Summarisation
Starting with the document summarisation…
In order to make use of the API, you will need to create an Azure resource. This can be done using the free credits/trial period before moving on to pay-as-you-go, but you will need to enter your credit card details.
Once you have created the Azure resource, click on Keys and Endpoint to view the endpoint you will be using to access the API and the access keys.
Microsoft supplies cURL commands which can be edited with your resource endpoint and access key.
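To keep the commands readable, the endpoint and key can be exported as environment variables first. The variable names below are illustrative assumptions:

# Substitute the values from the Keys and Endpoint page of your resource
export LANGUAGE_ENDPOINT="https://<your-language-resource>.cognitiveservices.azure.com"
export LANGUAGE_KEY="<your-language-resource-key>"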
As seen below, once the command is sent, an apim-request-id code is returned.
This is the subsequent API call to retrieve the document summarisation using the request ID returned by the previous request. Hence this is seemingly an asynchronous/batch process.
The input API code:
curl -i -X POST https://<your-language-resource-endpoint>/text/analytics/v3.2-preview.1/analyze \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: <your-language-resource-key>" \
-d \
'
{
  "analysisInput": {
    "documents": [
      {
        "language": "en",
        "id": "1",
        "text": "At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, human-centric approach to learning and understanding. As Chief Technology Officer of Azure AI Cognitive Services, I have been working with a team of amazing scientists and engineers to turn this quest into a reality. In my role, I enjoy a unique perspective in viewing the relationship among three attributes of human cognition: monolingual text (X), audio or visual sensory signals, (Y) and multilingual (Z). At the intersection of all three, there’s magic—what we call XYZ-code as illustrated in Figure 1—a joint representation to create more powerful AI that can speak, hear, see, and understand humans better. We believe XYZ-code will enable us to fulfill our long-term vision: cross-domain transfer learning, spanning modalities and languages. The goal is to have pre-trained models that can jointly learn representations to support a broad range of downstream AI tasks, much in the way humans do today. Over the past five years, we have achieved human performance on benchmarks in conversational speech recognition, machine translation, conversational question answering, machine reading comprehension, and image captioning. These five breakthroughs provided us with strong signals toward our more ambitious aspiration to produce a leap in AI capabilities, achieving multi-sensory and multilingual learning that is closer in line with how humans learn and understand. I believe the joint XYZ-code is a foundational component of this aspiration, if grounded with external knowledge sources in the downstream AI tasks."
      }
    ]
  },
  "tasks": {
    "extractiveSummarizationTasks": [
      {
        "parameters": {
          "model-version": "latest",
          "sentenceCount": 3,
          "sortBy": "Offset"
        }
      }
    ]
  }
}
'
And requesting the results…
curl -X GET https://<your-language-resource-endpoint>/text/analytics/v3.2-preview.1/analyze/jobs/my-job-id \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: <your-language-resource-key>"
The full JSON response…
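For reference, the result of an extractive summarisation job has roughly the shape below. This is a hedged sketch rather than a verbatim capture; the field values are illustrative:

{
  "jobId": "<apim-request-id>",
  "status": "succeeded",
  "tasks": {
    "extractiveSummarizationTasks": [
      {
        "results": {
          "documents": [
            {
              "id": "1",
              "sentences": [
                {
                  "text": "At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, human-centric approach to learning and understanding.",
                  "rankScore": 0.69,
                  "offset": 0,
                  "length": 160
                }
              ]
            }
          ]
        }
      }
    ]
  }
}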
Conversation Summarisation
Below is the test conversational data supplied by Microsoft…
Agent: “Hello, you’re chatting with Rene. How may I help you?”
Customer: “Hi, I tried to set up wifi connection for Smart Brew 300 espresso machine, but it didn’t work.”
Agent: “I’m sorry to hear that. Let’s see what we can do to fix this issue. Could you push the wifi connection button, hold for 3 seconds, then let me know if the power light is slowly blinking?”
Customer: “Yes, I pushed the wifi connection button, and now the power light is slowly blinking.”
Agent: “Great. Thank you! Now, please check in your Contoso Coffee app. Does it prompt to ask you to connect with the machine?”
Customer: “No. Nothing happened.”
Agent: “I see. Thanks. Let’s try if a factory reset can solve the issue. Could you please press and hold the center button for 5 seconds to start the factory reset.”
Customer: “I’ve tried the factory reset and followed the above steps again, but it still didn’t work.”
Agent: “I’m very sorry to hear that. Let me see if there’s another way to fix the issue. Please hold on for a minute.”
And the result view from the Microsoft documentation:
Unfortunately, conversation summarisation is currently a gated public preview feature for which you need to apply, and there is a waiting period of 10 days.
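Once access is granted, the request follows the same asynchronous pattern as document summarisation. The sketch below is an assumption of the payload shape based on the gated preview; the API version, task kind and summaryAspects values are illustrative and may differ from the released API:

curl -i -X POST "$LANGUAGE_ENDPOINT/language/analyze-conversations/jobs?api-version=2022-05-15-preview" \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: $LANGUAGE_KEY" \
-d \
'
{
  "analysisInput": {
    "conversations": [
      {
        "id": "1",
        "language": "en",
        "modality": "text",
        "conversationItems": [
          { "id": "1", "participantId": "Agent", "text": "Hello, you are chatting with Rene. How may I help you?" },
          { "id": "2", "participantId": "Customer", "text": "Hi, I tried to set up wifi connection for Smart Brew 300 espresso machine, but it did not work." }
        ]
      }
    ]
  },
  "tasks": [
    {
      "taskName": "summary-1",
      "kind": "ConversationalSummarizationTask",
      "parameters": {
        "summaryAspects": ["issue", "resolution"]
      }
    }
  ]
}
'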
Conclusion
With regard to LLMs there are a few things happening; three are of note. The first is that there are a handful of companies specialising in Large Language Models to varying degrees. As mentioned at the onset of this article, these include OpenAI, Cohere, AI21labs, etc.
Secondly, models are being open-sourced, which is democratising access to large language models.
Thirdly, the traditional cloud platforms are looking at integrating LLMs into their products, making it easier for organisations to include documents in their search and knowledge bases. This approach makes the integration of LLMs seamless; the LLM acts as a supporting feature, disappearing into the background.
The question has often been asked: what is the real-world production value of LLMs? Seemingly Microsoft is trying to change this, with specific use-case based APIs, positioning these LLM implementations as supporting technology in orchestrating initiatives like CCAI.