Using DialoGPT For Conversational Response Generation
There has been renewed focus on expediting chatbot development. Attempts to fast-track conversational AI development have included bootstrapping with LLMs, QnA, search, knowledge bases, etc. This article looks at how conversational response generation can be used to the same end.
Introduction
Even the Gartner® Critical Capabilities for Enterprise Conversational AI Platforms assessment focused on 14 critical capabilities and the available pre-sets.
These pre-sets include pre-defined sets of intents for different industries (banking, health care, insurance, etc.). Boost.ai is a case in point of a company delivering large volumes of pre-defined intents.
Chatbot development frameworks also ship pre-set applications constituted by intents, entities, flows, etc. All these efforts are an attempt to get a conversational interface up and running as soon as possible.
Another avenue for fast-tracking chatbot development is generation (generative models).
Generative Models
Generative models can be used in a number of ways:
- Question and Answering
- General conversational generation
- Contextual generation
- And more…
In a previous post I explained the idea of casting, and how casting can leverage generation for contextual conversations, question answering, etc.
The example above, based on OpenAI, shows how casting via few-shot learning can be used to create a question-answering implementation.
Another example below, also from OpenAI, is impressive indeed: again with only a few lines of training data, a fully contextual conversation can be maintained. Notice how the questions asked by the human are based on the conversation history, and OpenAI picks this up.
Here is the code for the completion function:
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

start_sequence = "\nAI:"
restart_sequence = "\nHuman: "

# The prompt casts the model as a friendly assistant and seeds the conversation
response = openai.Completion.create(
    model="text-davinci-002",
    prompt="The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.\n\nHuman: Hello, who are you?\nAI: I am an AI created by OpenAI. How can I help you today?\nHuman: ",
    temperature=0.9,
    max_tokens=150,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0.6,
    stop=[" Human:", " AI:"]
)
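To inspect the generated reply, read the completion text from the response object. A minimal sketch, assuming the legacy openai Python SDK used above:

# The completion text lives in the first element of the choices list.
print(response["choices"][0]["text"])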
DialoGPT
So why DialoGPT? What surprised me is that the DialoGPT project was launched in November 2019, yet as you can see from the downloads on 🤗Hugging Face, there has been renewed interest in it. As a comparison, the number of downloads of BLOOM last month was 42,514.
DialoGPT is a large-scale pre-trained dialogue response generation model for multi-turn conversations. The model is trained on 147M multi-turn dialogues from Reddit discussion threads.
According to Microsoft, their approach was to capture the joint distribution of source/prompt and target/response pairs in conversational flow with good granularity.
Microsoft found that sentences generated by DialoGPT are diverse and contain information specific to the source prompt, analogous to the outputs that GPT-2 generates.
DialoGPT In Its Simplest Form
Below is arguably the shortest piece of code required to run DialoGPT. You can copy this code and paste it into a Colab notebook.
!pip install transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Let's chat for 5 lines
for step in range(5):
    # encode the new user input, add the eos_token and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response while limiting the total chat history to 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

    # pretty print the last output tokens from the bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
Below is the response within the Colab Notebook, where you can have a conversation with DialoGPT without leaving the Notebook environment.
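Greedy decoding, as used above, can make the bot repetitive. Sampling parameters can be passed to generate instead; the variation below is a sketch, and the parameter values are illustrative assumptions rather than tuned settings:

# Replace the generate call inside the loop with a sampling version
chat_history_ids = model.generate(
    bot_input_ids,
    max_length=1000,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,   # sample instead of greedy decoding
    top_k=50,         # consider only the 50 most likely next tokens
    top_p=0.95,       # nucleus sampling over the top 95% of probability mass
    temperature=0.8   # soften the distribution slightly
)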
DialoGPT & Telegram
The following demo takes you through the process of creating a Telegram chatbot integrated with DialoGPT. Here is a conversation I had with DialoGPT via my Telegram chatbot.
You will have to register with Telegram and make use of their BotFather interface, through which you can create a bot instance with a Telegram token.
The commands for creating a chatbot instance while chatting to the BotFather are:
/start
/newbot
You will be asked for your bot’s name and a username, after which the BotFather sends you the access details:
Use this token to access the HTTP API:
xxxxxxxxxxx:XXXXXXXXXXXXXXXXXXXXXXXXXXXX
Keep your token secure and store it safely, it can be used by anyone to control your bot.
For a description of the Bot API, see this page: https://core.telegram.org/bots/api
Once this is set up, access the notebook via Colab. The only changes required to the notebook are adding your Telegram token and your Giphy token. The Giphy token is what DialoGPT will use to access GIFs to respond with.
After running the code in Colab, access your bot via Telegram and start chatting.
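For orientation, here is a minimal sketch of how such a Telegram-to-DialoGPT bridge can look. It assumes the python-telegram-bot library (v13-style API) and a placeholder token; the actual Colab notebook may structure this differently and also adds the Giphy integration:

from telegram.ext import Updater, MessageHandler, Filters
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

def reply(update, context):
    # Encode the incoming Telegram message and generate a DialoGPT reply
    input_ids = tokenizer.encode(update.message.text + tokenizer.eos_token, return_tensors="pt")
    output_ids = model.generate(input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
    answer = tokenizer.decode(output_ids[:, input_ids.shape[-1]:][0], skip_special_tokens=True)
    update.message.reply_text(answer)

updater = Updater("YOUR_TELEGRAM_TOKEN")  # placeholder: the token from the BotFather
updater.dispatcher.add_handler(MessageHandler(Filters.text & ~Filters.command, reply))
updater.start_polling()
updater.idle()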
Fine-Tuning
Fine-tuning is possible by making use of AWS and Amazon SageMaker. I have not yet tried to fine-tune DialoGPT, but I would say any implementation that remotely resembles a production implementation, or targets a specific use-case, will demand fine-tuning.
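As rough orientation, fine-tuning DialoGPT amounts to further causal-language-model training on your own dialogues, with turns joined by the eos_token. Below is a hedged sketch using the Hugging Face Trainer; the two-turn dialogues list is placeholder data and all hyperparameters are illustrative assumptions, not a SageMaker recipe:

from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments,
                          DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
tokenizer.pad_token = tokenizer.eos_token  # DialoGPT has no pad token by default
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Placeholder data: each dialogue is a list of alternating turns.
dialogues = [
    ["Hi there!", "Hello! How can I help you today?"],
    ["What are your opening hours?", "We are open from 9 to 5 on weekdays."],
]

# Join turns with eos_token, matching the format DialoGPT was trained on.
train_dataset = [
    tokenizer(tokenizer.eos_token.join(turns) + tokenizer.eos_token,
              truncation=True, max_length=512)
    for turns in dialogues
]

training_args = TrainingArguments(
    output_dir="dialogpt-finetuned",  # hypothetical output directory
    per_device_train_batch_size=2,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()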
Conclusion
When selecting a pre-trained dialogue response generation model for multi-turn conversations, keep in mind that the OpenAI generation API is extremely powerful in:
- Producing well-formed, augmented responses
- Maintaining context throughout the conversation
- Sourcing answers and acting as a general QnA chatbot
- Delivering impressive few-shot training results
As Microsoft notes, the conversational text data used to train DialoGPT is different from the large written text corpora (e.g. wiki, news) associated with pre-trained models prior to DialoGPT.
Microsoft mentions that DialoGPT is less formal, more interactive, occasionally trollish, and in general much noisier, which might fit the application you have in mind. 🙂