Over the last two years of Atomicwork's journey building a GenAI platform to deliver a Modern ITSM solution and a Conversational ESM product, we've had to make painful trade-offs between Cost, Accuracy, and Performance, even as we tried to optimize for all three.
Traditionally a popular theorem in distributed systems, CAP (Consistency, Availability, and Partition Tolerance) takes on a whole new meaning in the world of GenAI apps. Building GenAI apps today requires thinking along three dimensions: Cost, Accuracy, and Performance, the new CAP.
In this blog post, I want to share our learnings for Enterprise IT, product, and engineering teams working to adopt, develop, or deploy GenAI applications for their business, especially in the context of Cost, Accuracy, and Performance.
Cost in GenAI applications refers to the operating cost of AI models in production, not the resources required to develop, deploy, and maintain AI systems. Cost is a primary dimension because most businesses consume AI models via APIs, and LLM calls are expensive at scale.
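To make this concrete, here is a back-of-the-envelope sketch of how per-call costs add up at support-desk volumes. The model names and per-token prices below are illustrative placeholders, not actual vendor rates:

```python
# A minimal sketch of per-call cost estimation. The prices below are
# illustrative placeholders, not real vendor pricing -- substitute the
# current rates for whichever models you consume via API.

# Assumed price table: (input_rate, output_rate) in USD per 1K tokens.
PRICE_PER_1K = {
    "frontier-model": (0.01, 0.03),
    "mid-tier-model": (0.001, 0.002),
    "small-model": (0.0002, 0.0004),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single LLM API call."""
    in_rate, out_rate = PRICE_PER_1K[model]
    return (input_tokens / 1000) * in_rate + (output_tokens / 1000) * out_rate

# A support assistant answering 10,000 tickets/day with ~2K-token prompts
# (RAG context included) and ~300-token answers:
daily = 10_000 * call_cost("frontier-model", input_tokens=2_000, output_tokens=300)
print(f"~${daily:,.0f}/day, ~${daily * 30:,.0f}/month")  # ~$290/day, ~$8,700/month
```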
While evaluating AI models, we have learned that a single-model approach will not work for B2B use cases. It is important to categorize the existing landscape of LLMs into three segments:
For Enterprise IT, product, and engineering teams, understanding the cost implications of different approaches is crucial. Balancing cost with the desired level of accuracy and performance is a delicate dance that requires careful planning based on your GenAI use cases.
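As a rough illustration of what a multi-model approach can look like in code, here is a minimal routing sketch. The tier names and the task-to-tier mapping are hypothetical simplifications for illustration, not our production logic:

```python
# A minimal sketch of tiered model routing. The three tiers and the
# task-to-tier mapping below are illustrative assumptions; real routing
# would be driven by your own use cases and evaluation data.

ROUTING_TABLE = {
    # Low-stakes, high-volume tasks -> cheap, fast models.
    "classify_ticket": "small-model",
    "summarize_thread": "mid-tier-model",
    # High-stakes reasoning (e.g., HR or finance answers) -> frontier models.
    "answer_policy_question": "frontier-model",
}

def route(task: str) -> str:
    """Return the model tier for a task, defaulting to the cheapest."""
    return ROUTING_TABLE.get(task, "small-model")

assert route("answer_policy_question") == "frontier-model"
assert route("unknown_task") == "small-model"
```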
Accuracy is the measure of how well an AI system performs its intended task or use case.
Not all AI models deliver the same accuracy, even when trained on similar volumes of data and tokens. For B2B GenAI applications, accuracy is paramount, especially in critical domains like HR, legal, healthcare, finance, and business systems.
Achieving high accuracy requires not just state-of-the-art models, but also robust data pipelines, high-quality training data, and, in some cases, sophisticated fine-tuning.
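One practical habit is to keep a small golden set of questions and score every model or prompt change against it. Below is a deliberately naive sketch of such a harness; `ask_model` is a hypothetical stand-in for a real LLM call, and production evaluations typically use exact-match, embedding similarity, or LLM-as-judge scoring instead:

```python
# A minimal accuracy-evaluation sketch over a tiny golden set.
# The keyword-containment scorer is deliberately naive.

GOLDEN_SET = [
    {"question": "How do I reset my VPN password?", "must_mention": ["self-service portal"]},
    {"question": "What is the laptop refresh cycle?", "must_mention": ["3 years"]},
]

def ask_model(question: str) -> str:
    # Hypothetical stand-in: replace with your actual LLM API call.
    return "You can reset it from the self-service portal under Settings."

def evaluate() -> float:
    """Fraction of golden questions whose answer mentions every required phrase."""
    passed = 0
    for case in GOLDEN_SET:
        answer = ask_model(case["question"]).lower()
        if all(phrase.lower() in answer for phrase in case["must_mention"]):
            passed += 1
    return passed / len(GOLDEN_SET)

print(f"accuracy: {evaluate():.0%}")  # 50% with the canned answer above
```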
Enterprise IT, product, and engineering teams must constantly strive to improve the accuracy of their AI models while staying mindful of the trade-offs with cost and performance. Optimizing for accuracy as the primary dimension will significantly increase both latency and cost.
Performance in GenAI applications encompasses speed, scalability, and efficiency.
A highly scalable AI system can process large volumes of data quickly and adapt to changing business data through a retrieval-augmented generation (RAG) architecture. However, delivering results to end users in real time, at low latency, becomes a challenge.
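To see why, it helps to remember that RAG latency is additive across stages. The sketch below uses stubbed stages with made-up timings purely to illustrate how an end-user latency budget gets consumed:

```python
# A minimal sketch of where latency accumulates in a RAG pipeline.
# Every stage is a stub (time.sleep simulates work) and the timings are
# invented; the point is that retrieval, re-ranking, and generation
# latencies add up, so each stage must fit the end-user latency budget.
import time

def embed_query(q):    time.sleep(0.05); return [0.0] * 768       # ~50 ms embedding call
def retrieve(vec):     time.sleep(0.10); return ["doc1", "doc2"]  # ~100 ms vector search
def rerank(q, docs):   time.sleep(0.15); return docs              # ~150 ms re-ranker
def generate(q, docs): time.sleep(1.20); return "answer"          # ~1.2 s LLM generation

start = time.perf_counter()
docs = rerank("query", retrieve(embed_query("query")))
answer = generate("query", docs)
print(f"end-to-end: {time.perf_counter() - start:.2f}s")  # ~1.50s before streaming
```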
Most product and engineering teams face the challenge of optimizing model performance without compromising accuracy or incurring excessive costs. This often involves fine-tuning for domain specificity, parallelizing work across models, and using specialized inference hardware beyond GPUs, such as Groq's LPUs, to meet low-latency, high-performance requirements.
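As one example of parallelizing across models, here is a minimal sketch that queries a fast model and a more accurate model concurrently, and falls back to the fast answer when the latency budget is blown. Both model calls are hypothetical stubs:

```python
# A minimal sketch of parallel fan-out across two model tiers.
import asyncio

async def fast_model(q: str) -> str:
    await asyncio.sleep(0.3)   # stand-in for a small, cheap model call
    return "fast answer"

async def accurate_model(q: str) -> str:
    await asyncio.sleep(2.0)   # stand-in for a slower frontier-model call
    return "accurate answer"

async def answer(q: str, budget_s: float = 1.0) -> str:
    fast = asyncio.create_task(fast_model(q))
    try:
        # Prefer the accurate model if it responds within the latency budget.
        result = await asyncio.wait_for(accurate_model(q), timeout=budget_s)
        fast.cancel()          # accurate answer arrived in time; drop the backup
        return result
    except asyncio.TimeoutError:
        return await fast      # over budget: fall back to the fast answer

print(asyncio.run(answer("Why is my VPN acting up?")))  # -> "fast answer"
```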
At the end of the day, deciding which parameter to compromise on in favor of the other two comes down to deeply understanding the domain, the user persona, the use case, and the cost of making an error.
Let’s look at a few examples from support:
For Enterprise IT, product, and engineering teams working on GenAI applications, it is essential to understand the interplay between cost, accuracy, and performance: improving one of these dimensions can adversely impact the others.
Here are three key implications to consider, based on our production experience over the last two years:
By understanding the implications of this new CAP theorem for AI and adopting a strategic and collaborative approach, enterprise IT teams can navigate the complexities of developing advanced AI systems that deliver value while managing TCO effectively.
Co-authored by Aparna Chugh, Head of Product at Atomicwork.
Originally published on Medium.