Enhancing ZenoChat with Intel's OpenVINO and the "AI PC"

Engineering
Jan 4, 2024
Updated: Jan 8, 2024

Vistry's ZenoChat, a cornerstone of our Conversational AI Platform, has advanced significantly through the integration of Intel technologies. By incorporating the OpenVINO toolkit and using the capabilities of Intel's new Meteor Lake Core Ultra chipset, ZenoChat is setting new benchmarks for AI-driven conversation at the edge.

Intel AI PC | Meteor Lake Pre-Release Prototype

Why edge?

"Edge" in the context of technology and computing refers to processing data near the location where it's generated, rather than relying on a central data-processing warehouse. This means computations are done on local devices like smartphones, laptops, IoT devices, or local servers, leading to faster response times and reduced need for constant internet connectivity. Edge computing is particularly valuable in scenarios requiring immediate data processing or where privacy and data sovereignty are concerns.

Deploying Vistry's ZenoChat at the edge offers a compelling value proposition in scenarios where real-time, efficient, and private interactions are critical. Because it runs locally on the device, it eliminates cloud dependency and improves response speed and reliability, which is crucial in sectors like healthcare and finance where data sensitivity is paramount. ZenoChat's versatility also makes it well suited to customer service in retail, manufacturing, and hospitality, offering personalized, instant assistance without the latency or privacy concerns associated with cloud-based solutions.

What is the AI PC?

The emergence of the AI PC, as conceptualized by Microsoft and Intel, represents a significant shift in computing: AI processing capabilities are integrated directly into personal computers. With hardware such as the Intel Core Ultra 7 155H processor, 32 GB of RAM, and Intel Arc Graphics, and software such as OpenVINO 2023.2.0, these AI PCs are designed to handle sophisticated AI tasks efficiently.

This setup enables advanced data processing and AI-driven applications to be run locally, enhancing performance and responsiveness in a wide range of tasks, from image processing to complex AI modeling. This integration signifies a leap forward in making AI capabilities more accessible and effective for everyday users and professionals alike.

Leveraging OpenVINO for Efficient Language Processing

Our recent integration of OpenVINO, which is optimized for Intel hardware, has been a game-changer, particularly for handling models of up to 7 billion parameters. It has enabled ZenoChat to process language efficiently, even with long inputs.

Vistry's experiments with various model architectures and quantization techniques have been diverse and innovative. We've utilized models like Berkeley NEST's Starling-LM-7B-alpha for its balanced approach to language modeling and generation. Microsoft's phi-2 model, known for its robustness and scalability, was also tested. The dolphin-2_6-phi-2 from Cognitive Computations provided insights into specialized, high-performance applications. Lastly, the TinyLlama-1.1B-Chat-v1.0 model offered a unique perspective on efficient, smaller-scale conversational AI. These varied experiments showcase Vistry's commitment to exploring cutting-edge AI technologies to refine and enhance our Conversational AI Platform.
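To give a feel for why quantization matters on a laptop-class machine, the following Python sketch estimates how much memory the weights of each model would need at a few common precisions. It is a back-of-the-envelope illustration, not a description of our test setup: the parameter counts are assumed nominal figures, the helper names are hypothetical, and the estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
# Rough weight-memory estimate: parameters x bytes per weight.
# ASSUMPTION: nominal, rounded parameter counts; real checkpoints differ slightly.
NOMINAL_PARAMS = {
    "Starling-LM-7B-alpha": 7.0e9,
    "phi-2": 2.7e9,
    "dolphin-2_6-phi-2": 2.7e9,
    "TinyLlama-1.1B-Chat-v1.0": 1.1e9,
}

# Bytes needed to store one weight at each precision.
BYTES_PER_WEIGHT = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0}


def weight_gib(num_params: float, precision: str) -> float:
    """Return the approximate size of the weights in GiB."""
    return num_params * BYTES_PER_WEIGHT[precision] / 2**30


def footprint_rows():
    """Build one (model, {precision: GiB}) row per model."""
    return [
        (name, {p: weight_gib(n, p) for p in BYTES_PER_WEIGHT})
        for name, n in NOMINAL_PARAMS.items()
    ]


if __name__ == "__main__":
    for name, sizes in footprint_rows():
        cells = "  ".join(f"{p}: {gib:5.1f} GiB" for p, gib in sizes.items())
        print(f"{name:<26} {cells}")
```

Under these assumptions, each halving of precision halves the weight footprint, which is what makes a 7-billion-parameter model practical on a machine with 32 GB of RAM shared between the operating system and the integrated GPU.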

More details on these models are available on their respective Hugging Face pages.

Achieving Quick Response Times

The combination of OpenVINO and the Intel Core Ultra chipset has led to impressive performance metrics. We've achieved a time to first token of approximately 1.615 seconds for inputs exceeding 1,000 tokens, while maintaining a generation rate of around 10 tokens per second. These figures are a testament to the synergy between Vistry's software capabilities and Intel's hardware innovations.
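For readers who want to collect the same two metrics on their own hardware, here is a minimal Python sketch of how time to first token and generation rate can be derived from any streaming token source. It is an illustration only, not our benchmarking harness: `measure_stream` and `StreamStats` are hypothetical names, and the token stream stands in for whatever streaming interface your inference stack exposes.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class StreamStats:
    ttft_s: float        # seconds from request start to the first token
    tokens: int          # total number of tokens received
    tokens_per_s: float  # decode rate, measured after the first token


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> StreamStats:
    """Consume a token stream and record when tokens arrive."""
    start = clock()
    first = last = None
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now      # arrival of the first token
        last = now           # arrival of the most recent token
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    # The rate excludes the first token, whose cost is captured by TTFT.
    decode_time = last - first
    rate = (count - 1) / decode_time if decode_time > 0 else 0.0
    return StreamStats(ttft_s=first - start, tokens=count, tokens_per_s=rate)


if __name__ == "__main__":
    def fake_stream():
        # Stand-in for a real model: a slow first token, then steady output.
        time.sleep(0.05)
        for i in range(5):
            yield f"tok{i}"
            time.sleep(0.01)

    print(measure_stream(fake_stream()))
```

Separating the two numbers matters because they measure different things: time to first token is dominated by processing the input prompt, while the generation rate reflects the steady cost of producing each additional token.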

Running emerging open-source LLMs at OpenVINO INT8 precision on the Intel Core Ultra processor's integrated GPU, our latest tests have yielded impressive results that parallel the performance of cloud-based solutions like GPT-3.5. With an average time to first token (TTFT) of 1.615 seconds on our Employee Assistant, our edge-based Conversational AI Platform demonstrates remarkable responsiveness, closely mirroring the response times observed with GPT-3.5 for similar prompts and contexts.

Intel AI PC | MSI - Prestige Laptop running ZenoChat

Focus on Demonstration and Optimization

With these promising results, our team is now dedicated to finalizing a robust demo of ZenoChat for our session "Unlock Ultimate Customer Experience with AI" at CES 2024. While there's potential for further optimizations, the current performance level already meets our operational requirements, showcasing the power of our Conversational AI Platform.

This low latency on the Intel Arc GPU, coupled with a token generation rate of approximately 10 tokens per second, underscores our platform's ability to deliver real-time, efficient conversational experiences. These metrics not only signify a major stride in edge AI computing but also reinforce our commitment to providing powerful, locally deployed AI solutions without compromising on speed or quality of interaction.
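To translate these two figures into what a user experiences, the short Python sketch below projects how long a streamed reply takes to complete. It assumes a simple model in which the first token arrives after the time to first token and every later token arrives at the steady generation rate; the function name is hypothetical and the defaults are the figures reported in this post.

```python
def estimated_response_seconds(reply_tokens: int,
                               ttft_s: float = 1.615,
                               tokens_per_s: float = 10.0) -> float:
    """Estimate the time until the last token of a reply has arrived.

    ASSUMPTION: a constant decode rate after the first token.
    """
    if reply_tokens < 1:
        raise ValueError("a reply needs at least one token")
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    # First token costs ttft_s; the remaining tokens stream at a fixed rate.
    return ttft_s + (reply_tokens - 1) / tokens_per_s


if __name__ == "__main__":
    for n in (20, 50, 100):
        print(f"{n:>3} tokens: {estimated_response_seconds(n):.3f} s")
```

Under this model a 100-token reply completes in about 11.5 seconds, but because tokens are streamed, the user starts reading after roughly 1.6 seconds.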

This collaboration with Intel not only enhances ZenoChat's capabilities but also demonstrates Vistry's commitment to leveraging the latest technological advancements to provide superior AI-driven solutions.
