Ekrem Sarı

AI Researcher
22 Articles

Ekrem is an AI Researcher at AIMultiple, focusing on intelligent automation, GPUs, AI Agents, and LLMOps for RAG frameworks.

Professional Experience

As an Assessor at Yandex, he evaluated search results using proprietary frameworks and automated protocols. He carried out QA testing through data annotation, relevance scoring, and user intent mapping across 10,000+ queries per month, and conducted technical assessments, including performance monitoring and spam detection using ML feedback loops.

Research Interests

At AIMultiple, his research is centered on the MLOps lifecycle and the performance and benchmarking of end-to-end AI systems. He contributes to a wide range of projects, including Retrieval-Augmented Generation (RAG) optimization, extensive Large Language Model (LLM) benchmarking, and the design of agentic AI frameworks. Ekrem specializes in developing data-driven methodologies to measure and improve AI technology performance across critical operational metrics like accuracy, efficiency, API cost, and scalability.

His analysis covers the entire technology stack, from foundational components like embedding models and vector databases to the high-performance GPU and cloud infrastructure required for deploying AI agents.

Education

Ekrem holds a bachelor's degree from Hacettepe Üniversitesi and a master's degree from Başkent Üniversitesi.

Latest Articles from Ekrem

AI | Jan 30

Best RAG Tools, Frameworks, and Libraries in 2026

RAG (Retrieval-Augmented Generation) improves LLM responses by adding external data sources. We benchmarked different embedding models and separately tested various chunk sizes to determine what combinations work best for RAG systems. Explore top RAG frameworks and tools, learn what RAG is, how it works, its benefits, and its role in today’s LLM landscape.

AI | Jan 30

Top Vector Database for RAG: Qdrant vs Weaviate vs Pinecone

Vector databases power the retrieval layer in RAG workflows by storing document and query embeddings as high‑dimensional vectors. They enable fast similarity searches based on vector distances.

AI | Jan 30

Benchmark of 16 Best Open Source Embedding Models for RAG

Most embedding benchmarks measure semantic similarity. We measured correctness. We tested 16 open-source models, from 23M-parameter to 8B-parameter embeddings, on 490,000 Amazon product reviews, scoring each by whether it retrieved the right product review through exact ASIN matching, not just topically similar documents.

Data | Jan 30

Remote Browsers: Web Infra for AI Agents Compared ['26]

AI agents rely on remote browsers to automate web tasks without being blocked by anti-scraping measures. The performance of this browser infrastructure is critical to an agent’s success. We benchmarked 8 providers on success rate, speed, and features.

AI | Jan 29

Embedding Models in 2026: OpenAI vs Gemini vs Cohere

The effectiveness of any Retrieval-Augmented Generation (RAG) system depends on the precision of its retriever. We benchmarked 11 leading text embedding models, including those from OpenAI, Gemini, Cohere, Snowflake, AWS, Mistral, and Voyage AI, using ~500,000 Amazon reviews.

AI | Jan 29

LLM Quantization: BF16 vs FP8 vs INT4 in 2026

Quantization reduces LLM inference cost by running models at lower numerical precision. We benchmarked 4 precision formats of Qwen3-32B on a single H100 GPU, performing over 2,000 inference runs and evaluating 12,000+ MMLU-Pro questions to measure the real-world trade-offs between speed, memory, and accuracy.

AI | Jan 29

RAG Frameworks: LangChain vs LangGraph vs LlamaIndex

We benchmarked 5 RAG frameworks: LangChain, LangGraph, LlamaIndex, Haystack, and DSPy, by building the same agentic RAG workflow with standardized components: identical models (GPT-4.1-mini), embeddings (BGE-small), retriever (Qdrant), and tools (Tavily web search). This isolates each framework’s true overhead and token efficiency.

AI | Jan 28

Multimodal Embedding Models: Apple vs Meta vs OpenAI

Multimodal embedding models excel at identifying objects but struggle with relationships between them: current models have difficulty distinguishing “phone on a map” from “map on a phone.” We benchmarked 7 leading models across MS-COCO and Winoground to measure this specific limitation. To ensure a fair comparison, we evaluated every model under identical conditions using NVIDIA A40 hardware and bfloat16 precision.

AI | Jan 28

Top 20+ Agentic RAG Frameworks in 2026

Agentic RAG enhances traditional RAG by boosting LLM performance and enabling greater specialization. We conducted a benchmark to assess its performance on routing between multiple databases and generating queries. Explore agentic RAG frameworks and libraries, key differences from standard RAG, benefits, and challenges to unlock their full potential.

AI | Jan 28

Supervised Fine-Tuning vs Reinforcement Learning in 2026

Can large language models internalize decision rules that are never stated explicitly? To examine this, we designed an experiment in which a 14B parameter model was trained on a hidden “VIP override” rule within a credit decisioning task, without any prompt-level description of the rule itself.