The Emergent AI Stack
Date: 12.04.23
Author: Amir Behbehani
Advancements in artificial intelligence, especially Large Language Models (LLMs) like GPT-4, are transforming traditional applications. These models have evolved from simple text-processing tools into systems that handle complex analytical tasks, significantly changing how we interact with technology. However, a crucial challenge remains: improving memory management within LLMs to fully unlock the potential of autonomous GPT agents in performing professional tasks from start to finish. These tasks may include building a software application or generating a bespoke contract.
Current AI systems, including autonomous agents, often face limitations in their ability to perform complex cognitive tasks such as problem-solving, mathematical reasoning, and unsupervised operation. These limitations stem from inadequate memory management, resulting in impaired decision-making and suboptimal analytical capabilities. This is particularly evident when attempting to perform a complex task based solely on a single prompt, as seen in AutoGPT.
Note: This article is intended for AI novices and those with high-level knowledge. It aims to provide a broad understanding of the Emergent AI Stack and its significance.
AI can automate a wide range of tasks, improving human efficiency and reducing labor costs, especially in high-cost labor markets such as the United States.
In this post, I will use the insurance industry as an example. However, this new AI technology stack can be applied to any industry vertical. The insurance industry has several primary characteristics and use cases, including:
Lead generation: consumers seeking to insure major purchases such as homes and cars, or to obtain health or life coverage
Quote generation: insurance quotes are produced once all necessary data are received
Negotiation and optimization of quotes
Signing of contracts
Document and policy delivery
Maintenance and retention, including claims
Let’s jump into the first and foundational layer of the new AI stack:
Layer 1: The Foundation Layer: The Central Processing Unit
At the core of the AI stack is the Foundation Layer, where LLMs (Large Language Models) process and analyze vast amounts of data. These models, such as GPT and BERT, are becoming more capable of approximating general intelligence. However, they also encounter challenges, such as generating biased or factually inaccurate content. For instance, GPT-4, despite its sophistication, can still produce content that is biased or inaccurate, which highlights the importance of ongoing advancements in this layer.
Let’s say you want to use the foundational models to interact with your content and data, particularly within the enterprise. Out of the box, these models may not perform reliably: ask the same question ten different ways and you may get ten different answers. Additionally, if you have a large amount of proprietary content you want to feed into these models, you will need a data store to facilitate interaction with the LLMs. Relying on a prompt alone will not suffice. You’ll need to set up a vector database at a minimum, and preferably a knowledge graph as well.
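To make that concrete, here is a minimal, self-contained sketch of the retrieval-augmented pattern: proprietary documents are embedded, stored, and searched so the most relevant chunks can be prepended to the prompt. The hashing "embedding" and the in-memory list are toy stand-ins for a real embedding model and a managed vector database, and the policy text and function names are illustrative only.

```python
# retrieval_sketch.py -- embed proprietary documents, store the vectors, and
# prepend the most relevant chunks to the prompt. The hashing "embedding" is a
# toy stand-in for a real embedding model, and the in-memory list stands in for
# a managed vector database such as Pinecone.
from math import sqrt

def toy_embed(text, dims=64):
    # Toy embedding: hash each word into a fixed-size vector.
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[hash(word) % dims] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# "Ingest" proprietary documents into a simple in-memory vector store.
documents = [
    "Homeowners policy HO-3 covers the dwelling, other structures, and personal property.",
    "Auto policy A-1234 carries a $500 collision deductible and roadside assistance.",
]
store = [(doc, toy_embed(doc)) for doc in documents]

def build_prompt(question, top_k=1):
    """Return a prompt augmented with the most relevant stored chunks."""
    q_vec = toy_embed(question)
    ranked = sorted(store, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    context = "\n".join(text for text, _ in ranked[:top_k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

if __name__ == "__main__":
    # The augmented prompt is what you would send to the LLM of your choice.
    print(build_prompt("What is the collision deductible on the auto policy?"))
```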
Foundational models have seen significant advancements in understanding input and generating output across many modalities (text, image, audio, and video). An increasing number of models also specialize in different areas, so this layer is a critical piece of our AI stack.
Layer 2: The Data Layer: AI’s Long-Term Memory
The insurance vertical has vast data on its customers and policyholders, and these datasets typically sit in relational databases or PDFs. These datasets must be readily available for ingestion into the new AI tech stack, and the stack may also need to write back to these databases as prospects and customers take further actions. LLMs estimate the probability of the next best word, pixel, etc., based on their vast training data, and the embeddings used for similarity calculations need to be stored in vector databases, which are distinct from standard relational databases.
A vector database can be helpful for general inquiries, such as determining the document type or whether a contract has been signed. However, a knowledge graph is better suited for retrieving more specific information, such as the dollar value of a contract or the counterparty’s title in an agreement. You can hone your responses by augmenting the LLM with these additional data sources and dynamically selecting the appropriate source based on the question type (general or specific).
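A sketch of that dynamic selection might look like the following. The keyword classifier is a toy stand-in for an LLM-based classifier, and both back ends are stubs standing in for a real vector database and knowledge graph.

```python
# router_sketch.py -- dynamically select a data source by question type.
# Broad questions go to the vector store; fielded questions go to the knowledge
# graph. The keyword classifier is a toy stand-in for an LLM-based classifier,
# and both back ends are stubs.

SPECIFIC_MARKERS = ("dollar value", "how much", "deductible", "title", "who signed", "date")

def classify(question):
    """Return 'specific' for fielded lookups, 'general' otherwise."""
    q = question.lower()
    return "specific" if any(marker in q for marker in SPECIFIC_MARKERS) else "general"

def query_vector_store(question):
    # Stub: semantic search over embedded documents (see the sketch above).
    return f"[vector store result for: {question}]"

def query_knowledge_graph(question):
    # Stub: structured lookup over contract entities, e.g. via a graph query.
    return f"[knowledge graph result for: {question}]"

def route(question):
    source = query_knowledge_graph if classify(question) == "specific" else query_vector_store
    return source(question)

if __name__ == "__main__":
    print(route("Has the reinsurance contract been signed?"))         # general
    print(route("What is the dollar value of the signed contract?"))  # specific
```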
Having a well-thought-out data layer is essential for storing and managing enterprise data, but it also brings challenges in data preparation, integration, and governance. Efficient data storage and retrieval are crucial, and vector databases such as Pinecone play a key role in achieving this; they are increasingly used in AI applications, especially those built around large-scale models.
Layer 3: The Context Layer: Enhancing AI with Short-Term Memory
After the insurance data are ingested into the AI stack, you can augment your prompts with specific data, which keeps chat sessions from losing context. Tools like LangChain and LlamaIndex can enhance the LLM’s ability to handle complex tasks and maintain context awareness. This layer acts like a computer’s RAM, providing quick access to relevant information for immediate processing.
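The core idea can be sketched in a few lines: keep a rolling window of recent turns and send it along with each new message. Frameworks like LangChain and LlamaIndex provide far richer versions of this; the class and field names below are illustrative only.

```python
# context_sketch.py -- a rolling short-term memory buffer, the "RAM" of the stack.
# Recent turns travel with each new prompt so the session does not lose context.
from collections import deque

class ShortTermMemory:
    def __init__(self, max_turns=5):
        self.turns = deque(maxlen=max_turns)  # oldest turns fall off automatically

    def remember(self, role, text):
        self.turns.append(f"{role}: {text}")

    def build_prompt(self, user_message):
        history = "\n".join(self.turns)
        return f"{history}\nuser: {user_message}\nassistant:"

if __name__ == "__main__":
    memory = ShortTermMemory(max_turns=3)
    memory.remember("user", "I want a quote for my 2021 pickup truck.")
    memory.remember("assistant", "Sure, what is your zip code?")
    # The follow-up only makes sense because the earlier turns travel with it.
    print(memory.build_prompt("Does that quote include roadside assistance?"))
```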
By working with layers 1, 2, and 3, your query responses should be significantly improved compared to simply calling the GPT API. This layer stands to benefit from continued innovation: current deep-learning methods face challenges in generalization, abstraction, and understanding causality, highlighting the need for more robust AI systems, especially when working with autonomous agents that automate the prompt/retrieval dynamic.
Layer 4: The Operating System
Wouldn’t it be great if your application could determine in real time which data source to call based on the question type? And what if it could do so on an as-needed basis? For instance, an insurance sales agent might not have access to a specific data source, but the underwriter does. Or a customer inquires about their coverage and it’s unclear whether the question relates to house insurance or auto insurance: where should the search for data begin, or should the chatbot ask a clarifying question first? This is where the concept of memory management, akin to that in a computer’s operating system, becomes crucial.
Here, the Operating System (OS) layer of our AI architecture plays a crucial role:
OS-Driven Query Analysis: The OS layer analyzes the detail level in each query received by the autonomous agents. It determines the required depth and specificity for responses, guiding agents in providing a broad overview or a detailed answer.
OS-Guided Data Source Alignment: The OS identifies the granularity of each query and directs agents to the most suitable data source. This ensures agents access the most relevant information, from general databases for broader questions to specialized repositories for detailed inquiries.
OS-Controlled Data Source Flexibility: The OS enables agents to transition seamlessly between various data sources as the conversation evolves, maintaining the relevance and accuracy of the information.
OS-Managed Contextual Memory: The OS manages memory allocations for the agents, storing recent conversation parts in short-term memory and broader topics or user history in long-term memory for sustained relevance.
OS-Structured Conversational Progression: The OS ensures that each question and answer sequence leads the conversation toward a resolution, maintaining focus and preventing deviations from the main topic.
OS-Facilitated Socratic Engagement: The OS directs agents to use a Socratic approach for problem-solving, deconstructing complex problems into simpler components for individual addressing and collective synthesis.
OS-Enabled Comprehensive Problem Solving: Under the OS’s orchestration, autonomous agents collaboratively solve various problems, adapting to evolving requirements with minimal external direction.
These responsibilities emphasize the critical role of the OS layer in coordinating the functionality and intelligence of autonomous agents within the system; a minimal sketch of such an orchestration loop follows.
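The toy controller below illustrates three of these roles in miniature: query analysis, data source alignment, and contextual memory. Every component is a stub under assumed names; real OS-layer platforms implement these roles with far more depth.

```python
# os_layer_sketch.py -- a toy controller illustrating three OS-layer roles:
# query analysis, data source alignment, and contextual memory.
from collections import deque

class OSLayer:
    def __init__(self):
        self.short_term = deque(maxlen=10)  # recent conversation turns
        self.long_term = {}                 # broader topics and user history

    def analyze(self, query):
        """OS-driven query analysis: how specific must the response be?"""
        markers = ("deductible", "dollar", "date", "who signed")
        return "specific" if any(m in query.lower() for m in markers) else "general"

    def select_source(self, granularity):
        """OS-guided data source alignment: route by granularity."""
        return "knowledge_graph" if granularity == "specific" else "vector_store"

    def handle(self, user_id, query):
        granularity = self.analyze(query)
        source = self.select_source(granularity)
        self.short_term.append(query)                          # contextual memory
        self.long_term.setdefault(user_id, []).append(source)  # user history
        # A full system would now dispatch an agent against `source` and
        # synthesize its answer back into the conversation.
        return f"route '{query}' ({granularity}) -> {source}"

if __name__ == "__main__":
    os_layer = OSLayer()
    print(os_layer.handle("u1", "What does my homeowners policy cover?"))
    print(os_layer.handle("u1", "What is my collision deductible?"))
```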
Platforms that facilitate dynamic memory management, such as Memra, are crucial in enhancing retrieval output. These frameworks assist LLMs in determining whether the questions being asked necessitate more general or specific responses, significantly improving context and minimizing hallucinations. This is particularly important when transforming large amounts of unstructured text into datasets for quick and accurate ingestion into AI and non-AI systems of record. This layer also provides new frameworks for developing and deploying AI systems while integrating better with existing operations.
At this layer, AI interacts with other AI (autonomous agents) that perform intermediary tasks without human intervention. AI can break down complex tasks into simpler subprocesses and assign them to different agents to complete. The results are then aggregated into a complete solution. The integration hub manages this process and ensures that the agents’ output reaches the application layer.
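As a toy illustration of that decomposition-and-aggregation pattern, imagine three single-purpose "agents" (here, plain functions) whose outputs an integration hub collects into one result. The plan and agent names are purely illustrative.

```python
# agents_sketch.py -- decompose a task into subtasks, assign each to an "agent"
# (here, a plain function), and aggregate the results. Real agent frameworks
# add planning, retries, and tool use.

def gather_applicant_data(task):
    return f"applicant data collected for {task}"

def price_risk(task):
    return f"risk priced for {task}"

def draft_quote(task):
    return f"quote drafted for {task}"

# The "integration hub": an ordered plan mapping subtasks to agents.
PLAN = [gather_applicant_data, price_risk, draft_quote]

def run(task):
    # Each agent completes its subprocess; the results are aggregated in order.
    return [agent(task) for agent in PLAN]

if __name__ == "__main__":
    for step in run("auto policy application #42"):
        print(step)
```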
In the B2C context, using LLMs for tasks beyond text and image generation represents a significant leap in productivity. For example, LLMs can book flights with minimal user input. In the B2B context, particularly in the insurance industry, AI can automate processes such as generating quotes, drafting and executing contracts, and issuing policy documents with minimal employee involvement. To achieve this, it is necessary to develop an OS layer that manages the user journey, ensuring each step is completed before progressing to the next while assisting users with any questions or concerns. Initiatives like AutoGPT are working towards making this a reality. However, building the OS layer from the ground up, industry by industry, is essential.
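One way to picture that journey management is as a simple state machine over the insurance stages listed earlier, where the OS layer advances the user only when the current step is confirmed complete. The stage names and transition table below are assumptions for illustration.

```python
# journey_sketch.py -- the user journey as a simple state machine. The OS layer
# advances the customer only when the current step is confirmed complete.
from enum import Enum, auto

class Stage(Enum):
    LEAD = auto()
    QUOTE = auto()
    NEGOTIATION = auto()
    CONTRACT = auto()
    POLICY_DELIVERY = auto()
    MAINTENANCE = auto()

# The journey can only move forward one stage at a time.
NEXT = {
    Stage.LEAD: Stage.QUOTE,
    Stage.QUOTE: Stage.NEGOTIATION,
    Stage.NEGOTIATION: Stage.CONTRACT,
    Stage.CONTRACT: Stage.POLICY_DELIVERY,
    Stage.POLICY_DELIVERY: Stage.MAINTENANCE,
}

class Journey:
    def __init__(self):
        self.stage = Stage.LEAD

    def advance(self, step_completed):
        """Move to the next stage only when the current step is done."""
        if step_completed and self.stage in NEXT:
            self.stage = NEXT[self.stage]
        return self.stage

if __name__ == "__main__":
    journey = Journey()
    print(journey.advance(step_completed=True))   # LEAD -> QUOTE
    print(journey.advance(step_completed=False))  # stays at QUOTE until data is complete
```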
The OS layer is an area ripe for innovation across the AI industry, and we expect to see significant developments in this area in 2024.
Layer 5: The Application Layer: AI’s Interface with Users (or other AI)
The Application Layer is where AI becomes tangible for users, simplifying complex processes into user-friendly applications. This layer is crucial for making AI accessible to many users. It transforms the underlying AI technologies into practical applications that users can interact with, thus democratizing the benefits of AI. In the insurance example, imagine a customer downloading a smartphone app or visiting a webpage and interacting with a chatbot that takes care of all their insurance needs when they simply say what they want in plain English. No more navigating endless menus trying to find something; now we can simply ask. This front end benefits the back end as well. As users interact with an AI bot and supply all the information needed to submit a quote or a claim, we can use the front end and the other layers to convert user inputs into datasets that address customer needs more effectively than a legacy navigation menu could.
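A toy version of that conversion might look like the following: a plain-English message is turned into a structured record the back-office systems can consume. The regex extraction is a stand-in for an LLM-based extraction step, and the field names are assumptions for illustration.

```python
# intake_sketch.py -- turn a plain-English request into a structured record the
# back end can consume. The regex extraction is a stand-in for an LLM-based
# extraction step, and the field names are illustrative only.
import re
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class QuoteRequest:
    product: Optional[str] = None
    zip_code: Optional[str] = None
    vehicle_year: Optional[int] = None

def extract(message):
    req = QuoteRequest()
    text = message.lower()
    if "car" in text or "auto" in text:
        req.product = "auto"
    elif "home" in text or "house" in text:
        req.product = "home"
    if m := re.search(r"\b\d{5}\b", message):         # naive zip code match
        req.zip_code = m.group(0)
    if m := re.search(r"\b(19|20)\d{2}\b", message):  # naive model-year match
        req.vehicle_year = int(m.group(0))
    return req

if __name__ == "__main__":
    msg = "I'd like a quote for my 2021 car, and I live in 94105."
    print(asdict(extract(msg)))  # {'product': 'auto', 'zip_code': '94105', 'vehicle_year': 2021}
```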
Conclusion:
We explored the different layers of the Emergent AI Stack, beginning with the Foundation Layer. In this layer, LLMs process and analyze vast amounts of data. The Data Layer is responsible for storing and managing enterprise data to enable efficient retrieval and integration with AI. The Context Layer enhances AI with short-term memory, preserving context during interactions. The OS Layer, similar to an operating system, manages memory, directs queries, and facilitates problem-solving. Lastly, the Application Layer presents AI technologies through user-friendly interfaces, making the benefits of AI accessible to all.
In this evolving landscape, the OS Layer of the Emergent AI Stack plays a crucial role, similar to the role operating systems came to play in personal computers. Early personal computers transitioned from basic command-line interfaces to complex, user-friendly operating systems that seamlessly managed software applications, multitasking, and user interaction. Similarly, the OS Layer in the Emergent AI Stack is becoming increasingly important. It orchestrates the integration and interaction between different AI layers, manages data flow, and ensures that AI applications effectively respond to user needs.
Looking to 2024, the focus is expected to shift towards enhancing the sophistication of the OS Layer in the Emergent AI Stack. This year will likely witness significant advancements in the orchestration and management of AI systems. These developments can potentially drive the creation of even more dynamic and innovative enterprise applications than in 2023.