As we faced the challenges of enhancing and refining our AI system, we turned to the wealth of knowledge available in the broader AI research community. Delving into research papers from leading organizations like Meta and Microsoft, we gleaned insights that shaped our strategy for building a self-evolving large language model (LLM). Our goal was not only to address immediate user feedback but also to establish a system that continually improves and adapts over time.
The core of our new strategy was the implementation of what we called the “Observer Model”: a fine-tuned component designed to monitor and evaluate the main production model’s performance in real time. The Observer Model’s primary function was to assess the responses generated by our AI against several critical parameters, including fairness, ethics, and factual correctness.
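To make the idea concrete, here is a minimal sketch of what such an observer component might look like. The criteria names, flag threshold, and `score_response()` stub are illustrative assumptions, not Aavenir's actual implementation; in a real system the stub would be replaced by the fine-tuned evaluation model itself.

```python
from dataclasses import dataclass

CRITERIA = ("fairness", "ethics", "factual_correctness")
FLAG_THRESHOLD = 0.5  # assumed: scores below this trigger a flag

@dataclass
class Evaluation:
    response: str
    scores: dict   # criterion -> score in [0, 1]
    flagged: bool

def score_response(prompt: str, response: str) -> dict:
    """Stand-in for a fine-tuned classifier; a real observer would
    run a separate evaluation model over (prompt, response) here."""
    return {c: 1.0 for c in CRITERIA}  # placeholder: everything passes

def observe(prompt: str, response: str) -> Evaluation:
    """Score a production response and flag it if any criterion fails."""
    scores = score_response(prompt, response)
    flagged = any(s < FLAG_THRESHOLD for s in scores.values())
    return Evaluation(response=response, scores=scores, flagged=flagged)
```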
The introduction of the Observer Model also transformed how customer feedback was handled. Previously, feedback had to be manually reviewed and incorporated, a process that was both time-consuming and susceptible to delays. With the Observer Model, whenever customers flagged responses as inappropriate or incorrect, the model automatically logged these instances.
This data was neither discarded out of hand nor blindly accepted. Instead, each flagged response was elaborated upon by the Observer Model and then stored in detailed logs for further review. Our team periodically reviewed these logs, combining human oversight with automated processes to ensure a balanced approach to model training.
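A simple version of this flag-and-log step might look like the sketch below, assuming flagged responses are appended to a JSONL file that humans review later. The file path, record fields, and `elaborate()` helper are hypothetical placeholders.

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("flagged_responses.jsonl")  # assumed review queue

def elaborate(record: dict) -> str:
    """Stand-in for the observer's written explanation of the flag;
    a real system would generate this with the observer model."""
    return f"Flagged on: {', '.join(record['reasons'])}"

def log_flagged(prompt: str, response: str, reasons: list[str]) -> None:
    """Append a flagged interaction to the review log."""
    record = {
        "timestamp": time.time(),
        "prompt": prompt,
        "response": response,
        "reasons": reasons,
        "status": "pending_review",  # human reviewers update this later
    }
    record["elaboration"] = elaborate(record)
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(record) + "\n")
```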
To integrate the insights gained from this feedback loop effectively, we employed reinforcement learning and preference-based fine-tuning techniques, specifically Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO). These methods allowed us to fine-tune our AI based on real-world interactions and feedback, adjusting the model’s behavior in a controlled and incremental manner.
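For illustration, the core of DPO can be written as a short loss over preference pairs. This is a generic PyTorch sketch of the published DPO objective (Rafailov et al., 2023), not Aavenir's training code; the inputs are summed token log-probabilities of the chosen and rejected completions under the policy being tuned and a frozen reference copy, and `beta` is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    # Log-ratio of policy to reference for each completion.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between preferred and dispreferred responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Example with dummy summed log-probs for a batch of two pairs:
pol_c = torch.tensor([-12.0, -9.5]); pol_r = torch.tensor([-14.0, -11.0])
ref_c = torch.tensor([-12.5, -9.8]); ref_r = torch.tensor([-13.5, -10.5])
loss = dpo_loss(pol_c, pol_r, ref_c, ref_r)  # scalar to backprop through
```

Because DPO works directly on logged preference pairs, the flagged-response records described above map naturally onto its training data, which is one reason the technique fits this kind of feedback loop.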
This approach not only improved the model’s performance over time but also ensured that the enhancements were grounded in actual user experiences and needs. By systematically integrating user feedback and employing reinforcement learning, our AI was not just reacting to inputs but evolving from them.
Our AI system at Aavenir is constantly evolving and makes notable progress each day. However, it is not without imperfections: the system still encounters glitches and gaps in our processes, underscoring the complexity of building and operating such technology.
Testing a system of this complexity poses significant challenges. Robust automated testing is crucial to ensure that new features do not disrupt existing functionality, and this ongoing testing is essential for maintaining the system’s reliability and preventing regressions in performance as we continue to develop and refine our features.
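As a sketch, such regression checks can be expressed as parametrized tests. The `generate()` stub, prompts, and invariants below are hypothetical stand-ins for a real inference endpoint and golden prompt set, meant only to show the shape of the approach.

```python
import pytest

def generate(prompt: str) -> str:
    """Stand-in for the production model's inference endpoint."""
    return f"Answer to: {prompt}"

# Assumed golden prompts exercised on every build.
REGRESSION_PROMPTS = [
    "What is the status of invoice INV-001?",
    "Summarize the key terms of this contract.",
]

@pytest.mark.parametrize("prompt", REGRESSION_PROMPTS)
def test_response_invariants(prompt):
    """Guard basic behavioral invariants so new changes don't regress them."""
    out = generate(prompt)
    assert out.strip(), "model returned an empty response"
    assert len(out) < 4096, "response exceeds length budget"
```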
Despite these hurdles, we are committed to improving and refining our AI system. We understand the importance of continuous integration and testing in developing a reliable and efficient AI platform. As we move forward, our focus remains on overcoming these challenges and enhancing our system’s capabilities.
Name: Anand Trivedi
Company name: Aavenir
Designation: Head of Artificial Intelligence
Website: AI-powered Connected Source-to-Pay Solutions | Aavenir