LLM Alignment, Hallucination & Misinformation

This study again shows the importance of data discovery, data design & data delivery to the LLM, all under human supervision.

--

This study also illustrates the current lack of market readiness, and the growing future demand, for a Human-In-The-Loop approach to data development, especially in an AI-accelerated scenario, also referred to as weak supervision.

What is Alignment?

Firstly, what is alignment? Alignment refers to ensuring that models behave in accordance with the intent of the prompt. This comes down to the accuracy of prompt engineering. A prompt is in essence a body of text in which the user defines, or rather describes, their intent, and by implication the intended outcome.

Optimising prompts iteratively can aid model alignment, with prompts refined for specific models and use-cases; hence an iterative process of convergence on an optimal prompt for a specific solution.
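The iterative refinement loop described above can be sketched as follows. This is a minimal illustration only: the model call and the alignment score are hypothetical placeholders (a real loop would call an actual LLM and use human or model-based evaluation), and the greedy search over candidate instructions is one simple convergence strategy among many.

```python
# Minimal sketch of iterative prompt refinement. call_llm and score_output
# are hypothetical stand-ins, not a real API.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; here it just echoes the prompt."""
    return f"Answer derived from: {prompt}"

def score_output(output: str, required_terms: list[str]) -> float:
    """Toy alignment score: fraction of required terms present in the output."""
    hits = sum(term in output for term in required_terms)
    return hits / len(required_terms)

def refine_prompt(prompt: str, candidate_additions: list[str],
                  required_terms: list[str]) -> str:
    """Greedily append the candidate instruction that most improves the score."""
    best_prompt = prompt
    best_score = score_output(call_llm(prompt), required_terms)
    for addition in candidate_additions:
        candidate = f"{prompt}\n{addition}"
        score = score_output(call_llm(candidate), required_terms)
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt

prompt = "Summarise the quarterly report."
additions = ["Mention revenue.", "Mention risks.", "Use bullet points."]
refined = refine_prompt(prompt, additions, required_terms=["revenue", "risks"])
print(refined)
```

In practice the scoring step is where the human-in-the-loop sits: a person (or a panel of evaluators) judges whether the output matches the intent, and the prompt converges over iterations.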

OpenAI devoted six months to iteratively aligning GPT-4 before its release. — Source

The image above shows the taxonomy explored in the study, with seven overarching categories: reliability, safety, fairness and bias, resistance to misuse, interpretability, goodwill, and robustness.

Each major category contains several sub-categories, for a total of 29.

LLMs are Non-Deterministic

In the context of LLMs, non-deterministic means that the same prompt submitted to an LLM at different times will most probably yield different results.

To deal better with the non-deterministic nature of LLMs, training can be used via various avenues. The study divides training into three steps.

Step 1 — Supervised Fine-Tuning (SFT): Given a pre-trained (unaligned) LLM that is trained on a large text dataset, we first sample prompts and ask humans to write the corresponding (good) outputs based on the prompts. We then fine-tune the pre-trained LLM on the prompts and human-written outputs to obtain the SFT LLM.

Step 2 — Training Reward Model: We again sample prompts, and for each prompt, we generate multiple outputs from the SFT LLM, and ask humans to rank them. Based on the ranking, we train a reward model (a model that predicts how good an LLM output is).

Step 3 — Reinforcement Learning from Human Feedback (RLHF): Given a prompt, we sample output from the SFT LLM. Then we use the trained reward model to predict the reward on the output. We then use the Reinforcement Learning (RL) algorithm to update the SFT LLM with the predicted reward.
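The three steps above can be sketched schematically. Everything in this sketch is a toy stand-in: the "model" is a pool of canned outputs, the reward model is a simple heuristic standing in for a network trained on human rankings, and best-of-n selection replaces the actual RL policy update (e.g. PPO); none of these reflect the study's concrete implementation.

```python
import random

# Step 1 — SFT stand-in: the "model" is represented by a pool of outputs,
# seeded with human-written demonstrations for the prompt.
sft_outputs = {
    "Explain RLHF.": [
        "RLHF is a thing.",
        "RLHF aligns a model using human preference rankings.",
        "Reinforcement learning from human feedback trains a reward model on ranked outputs.",
    ],
}

# Step 2 — Reward model stand-in: a heuristic scorer in place of a network
# trained on human rankings (assumption: longer answers were ranked higher).
def reward_model(output: str) -> float:
    return len(output.split())

# Step 3 — RLHF stand-in: sample several outputs, keep the highest-reward one.
# Best-of-n selection is a simplified substitute for the RL update.
def generate_aligned(prompt: str, n: int = 50) -> str:
    rng = random.Random()
    samples = rng.choices(sft_outputs[prompt], k=n)
    return max(samples, key=reward_model)

best = generate_aligned("Explain RLHF.")
print(best)
```

The point of the sketch is the division of labour: humans write demonstrations (Step 1), humans rank outputs to train a scorer (Step 2), and the scorer then steers generation without a human in the loop for every output (Step 3).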

The three steps highlighted by the study are helpful, but I still prefer the data discovery, data development and data design approach. Data discovery done right can aid immensely in leveraging existing conversational data and ensuring that the data which is designed matches the desired conversations of the users.

From here, via an AI-accelerated latent space (a data productivity platform), discovered data can be designed and further developed under weak human supervision.

The study groups the current major use-cases of LLMs into four main categories, as seen in the image. The study does state that this diagram is not exhaustive and that there is scope for improvement.

Hallucination & Misinformation

Misinformation

Misinformation mostly refers to wrong or biased answers, and can also result from prompts that are not well formed or not sufficiently refined.

Intrinsic Hallucination

Intrinsic hallucination consists of fabricated content that conflicts with the source content.

Extrinsic Hallucination

Extrinsic hallucination is fabricated content that cannot be verified from the existing sources.

Hallucination can be mitigated by increasing training data and, especially, by supplying accurate contextual reference data at inference time.

It can also be mitigated through a process of ranking and reward with RLHF.
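Supplying accurate contextual reference data at inference is the idea behind retrieval augmentation, and can be sketched minimally as below. The keyword-overlap retriever, the corpus, and the prompt template are all illustrative assumptions; production systems typically use embedding-based retrieval over a real document store.

```python
# Minimal sketch of grounding at inference time: retrieve the most relevant
# reference passage by keyword overlap and prepend it to the prompt, so the
# model answers from supplied context instead of fabricating.

def tokenize(text: str) -> set[str]:
    return set(text.lower().split())

def retrieve(question: str, corpus: list[str]) -> str:
    """Return the passage sharing the most words with the question."""
    q = tokenize(question)
    return max(corpus, key=lambda passage: len(q & tokenize(passage)))

# Illustrative reference corpus (hypothetical content).
corpus = [
    "The reward model is trained on human rankings of sampled outputs.",
    "Supervised fine-tuning uses human-written outputs for sampled prompts.",
]

question = "What is the reward model trained on?"
context = retrieve(question, corpus)
grounded_prompt = (
    f"Answer using only this context:\n{context}\n\nQuestion: {question}"
)
print(grounded_prompt)
```

The instruction "using only this context" constrains the model to verifiable source material, directly targeting extrinsic hallucination.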

Finally

Everyone is trying to figure out how to build applications using LLMs; I see this as the data delivery phase. The upcoming phases are data discovery, data design and data development.

⭐️ Follow me on LinkedIn for updates on Large Language Models ⭐️

I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.
