Social Media Themes Extractor for AI Agent Communication
This study introduces BEYONDWORDS, a new generative AI-based framework that automates thematic analysis of social media data by integrating tweet embeddings and a Chain of Thought (CoT) prompting strategy with large language models (LLMs) to extract and refine themes from large-scale, unstructured text.
The study was applied to tweets from the autistic community, collected via the #actuallyautistic hashtag, and the methodology identified three key themes:
- Social media content quality and engagement,
- Advocacy for autistic rights and acceptance, and
- Mental health and well-being.
These themes demonstrate the framework's ability to uncover nuanced insights while preserving the richness of the discourse.
The approach combines machine learning and generative AI into a scalable, adaptable solution that outperforms traditional methods. It is validated through rigorous clustering metrics and iterative theme refinement, with potential applications across diverse online communities.
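To make the front end of the pipeline concrete, here is a minimal sketch of the embedding-and-clustering stage, checked with a clustering metric. The embedding model, the k-means algorithm and the cluster count are my assumptions for illustration; the study's exact choices may differ.

```python
# Minimal sketch: embed tweets, cluster them, and score cluster quality.
# The embedding model, clustering algorithm and k are illustrative assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# A handful of illustrative tweets (not taken from the study's dataset).
tweets = [
    "Being #actuallyautistic means my sensory needs are real, not rude.",
    "Acceptance, not awareness: we deserve rights, not pity.",
    "Autistic burnout is real, and masking all day is exhausting.",
    "Engagement on autistic-led threads keeps growing, great content this week.",
    "Stimming in public should never be shamed.",
    "Therapy that respects autistic identity helped my mental health.",
]

# Embed each tweet into a dense vector.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(tweets)

# Group similar tweets; in practice k would be tuned, not hard-coded.
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(embeddings)

# The kind of clustering-quality metric used to validate the groupings.
print("silhouette:", silhouette_score(embeddings, labels))

for label, tweet in zip(labels, tweets):
    print(label, tweet)
```

Each resulting cluster of related tweets is what the LLM is then asked to summarise into a theme, which is where the CoT prompting described below comes in.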
Many existing models still struggle to capture the complexity and nuance of informal social media language, particularly in communities with specialised or evolving vocabularies.
Does the Social Media Feed Contribute to or Enrich Human Conversation with the AI Agent?
No.
The social media feed in this study does not directly contribute to, enrich, or supplement a live human conversation with an AI Agent.
Rather, it serves as the raw input data that the BEYONDWORDS framework processes independently to extract themes.
The method operates autonomously, using pre-collected tweets as a dataset, which the AI analyses through embeddings, clustering and generative theme extraction without real-time human interaction.
However, the resulting insights could supplement or inform subsequent human-AI conversations if integrated into an interactive context, though this study focuses solely on the automated analysis pipeline rather than a dynamic conversational system.
Building Context
What attracted me to this study is that much work has been done on creating a contextual reference for LLMs. If the user context or query context can be accurately determined, a query can be resolved much faster and more accurately.
Hence various methods like RAG, intent and classification models, spatial observation, grounding and more have been developed and implemented.
There is also significant focus on personal assistants, and on building memory into the AI Agent to serve, again, as a contextual reference. This memory can be long-term or short-term; short-term memory has been implemented in the form of session variables and the like.
In an enterprise setting, customer management systems provide a contextual reference for customer conversations.
Back to the study… being able to build a contextual reference from social media content can serve as an invaluable resource when constructing contextual reference frameworks for a user.
Granted, with this approach it would be hard to perform the analysis on the fly, in real time.
Chain-Of-Thought
The CoT strategy is a key part of the proposed method, helping guide the theme-creation process with clear, step-by-step tasks (a prompt sketch follows the list below)…
- The language model is asked to pick out important words and phrases from tweets, focusing on ones that show key topics and feelings.
- It then groups these words into categories based on shared themes or ideas, making everything clearer and more organised.
- For each group, the model acts like an expert, pulling out big-picture themes and giving short descriptions that sum up the main points of the tweet discussions.
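As a rough illustration, the three steps above can be packed into a single structured prompt. This is a hedged sketch using the OpenAI chat completions client; the model name and prompt wording are my assumptions, not the study's actual configuration.

```python
# Illustrative CoT prompt for theme extraction; model and wording are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

COT_PROMPT = """Analyse the tweets below in three explicit steps.
Step 1: Extract the important words and phrases that signal key topics and feelings.
Step 2: Group those words and phrases into categories that share a theme or idea.
Step 3: Acting as a qualitative research expert, name a high-level theme for each
category and write a one-sentence description summarising the discussion.

Tweets:
{tweets}
"""

def extract_themes(tweets: list[str], model: str = "gpt-4o-mini") -> str:
    # Send the step-by-step instructions plus the tweets to the LLM.
    prompt = COT_PROMPT.format(tweets="\n".join(f"- {t}" for t in tweets))
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(extract_themes([
    "Acceptance, not awareness: we deserve rights, not pity.",
    "Autistic burnout is real, and masking all day is exhausting.",
]))
```

A design note: because the prompt enumerates the steps explicitly, the model's intermediate outputs (keywords and categories) stay inspectable rather than being hidden in a single-shot answer.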
LLM Feedback Mechanism
To further refine the theme extraction process, a recursive feedback mechanism involving a second agent LLM is implemented.
This agent acts as a grading system, following these steps (sketched in code below)…
- The secondary LLM evaluates the themes generated by the primary LLM against predefined quality thresholds.
- If the themes do not meet the established criteria, feedback is relayed back to the primary LLM.
- The primary LLM utilises the feedback (score + comment) to reevaluate the extraction process, revisiting the previous steps as necessary.
- This cycle of evaluation and adjustment continues, improving the quality of theme extraction with each iteration until either the threshold score or the maximum number of iterations is reached.
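Below is a minimal sketch of that evaluate-and-adjust loop. The threshold, scoring scale, grading format and prompts are illustrative assumptions; the study's actual grading criteria are not reproduced here.

```python
# Sketch of the recursive feedback loop: a grader LLM scores the themes and its
# feedback (score + comment) is fed back into the primary extraction prompt.
# Threshold, scale, prompts and output format are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def refine_themes(tweets: str, threshold: int = 8, max_iterations: int = 3) -> str:
    themes, feedback = "", ""
    for _ in range(max_iterations):
        # Primary LLM: extract (or re-extract) themes, taking prior feedback into account.
        themes = ask(
            f"Extract concise themes from these tweets:\n{tweets}\n\n"
            f"Previous grader feedback (may be empty):\n{feedback}"
        )
        # Secondary (grader) LLM: score the themes and explain the score.
        feedback = ask(
            "Score the following themes from 1-10 for coverage and clarity, then give "
            f"one sentence of feedback. Reply exactly as 'score: N | comment'.\n\n{themes}"
        )
        try:
            score = int(feedback.lower().split("score:")[1].split("|")[0].strip())
        except (IndexError, ValueError):
            score = 0  # unparseable grade: keep iterating
        if score >= threshold:
            break  # quality threshold met, stop refining
    return themes
```

The loop terminates on whichever comes first, the threshold score or the maximum number of iterations, mirroring the stopping condition described above.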