AIMultiple
Ekrem Sarı

AI Researcher
18 Articles
Ekrem is an AI Researcher at AIMultiple, focusing on intelligent automation, GPUs, AI Agents, and LLMOps for RAG frameworks.

Professional Experience

During his tenure as an Assessor at Yandex, he evaluated search results using proprietary frameworks and automated protocols. He implemented QA testing through data annotation, relevance scoring, and user intent mapping across 10,000+ queries monthly, while conducting technical assessments, including performance monitoring and spam detection using ML feedback loops.

Research Interests

At AIMultiple, his research is centered on the MLOps lifecycle and the performance and benchmarking of end-to-end AI systems. He contributes to a wide range of projects, including Retrieval-Augmented Generation (RAG) optimization, extensive Large Language Model (LLM) benchmarking, and the design of agentic AI frameworks. Ekrem specializes in developing data-driven methodologies to measure and improve AI technology performance across critical operational metrics like accuracy, efficiency, API cost, and scalability.

His analysis covers the entire technology stack, from foundational components like embedding models and vector databases to the high-performance GPU and cloud infrastructure required for deploying AI agents.

Education

Ekrem holds a bachelor's degree from Hacettepe Üniversitesi and a master's degree from Başkent Üniversitesi.

Latest Articles from Ekrem

AI | Dec 5

Top Vector Database for RAG: Qdrant vs Weaviate vs Pinecone

Vector databases power the retrieval layer in RAG workflows by storing document and query embeddings as high‑dimensional vectors. They enable fast similarity searches based on vector distances.

AI | Dec 5

Best RAG Tools, Frameworks, and Libraries

RAG (Retrieval-Augmented Generation) improves LLM responses by adding external data sources. We benchmarked different embedding models and separately tested various chunk sizes to determine what combinations work best for RAG systems. Explore top RAG frameworks and tools, learn what RAG is, how it works, its benefits, and its role in today’s LLM landscape.

AI | Dec 5

Embedding Models: OpenAI vs Gemini vs Cohere

The effectiveness of any Retrieval-Augmented Generation (RAG) system depends on the precision of its retriever. We benchmarked 11 leading text embedding models, including those from OpenAI, Gemini, Cohere, Snowflake, AWS, Mistral, and Voyage AI, using ~500,000 Amazon reviews.

AI | Dec 2

Multimodal Embedding Models: Apple vs Meta vs OpenAI

Multimodal embedding models excel at identifying objects but struggle with relationships. Current models struggle to distinguish “phone on a map” from “map on a phone.” We benchmarked 7 leading models across MS-COCO and Winoground to measure this specific limitation. To ensure a fair comparison, we evaluated every model under identical conditions using NVIDIA A40 hardware and bfloat16 precision.

AI | Dec 2

RAG Frameworks: LangChain vs LangGraph vs LlamaIndex vs Haystack vs DSPy

We benchmarked 5 RAG frameworks: LangChain, LangGraph, LlamaIndex, Haystack, and DSPy, by building the same agentic RAG workflow with standardized components: identical models (GPT-4.1-mini), embeddings (BGE-small), retriever (Qdrant), and tools (Tavily web search). This isolates each framework’s true overhead and token efficiency.

AI | Dec 1

LLM Inference Engines: vLLM vs LMDeploy vs SGLang

We benchmarked 3 leading LLM inference engines on NVIDIA H100: vLLM, LMDeploy, and SGLang. Each engine processed an identical workload of 1,000 ShareGPT prompts using Llama 3.1 8B-Instruct, which isolates the performance impact of each engine's architectural choices and optimization strategies.

Enterprise Software | Nov 25

Top Serverless Functions: Vercel vs Azure vs AWS

Serverless functions enable developers to run code without having to manage a server. This allows them to focus on writing and deploying applications while infrastructure scaling and maintenance are handled automatically in the background. In this benchmark, we evaluated 7 popular cloud service providers following our methodology to test their serverless function performance.

AI | Nov 25

Benchmark of 30 Finance LLMs: GPT-5, Gemini 2.5 Pro & more

Large language models (LLMs) are transforming finance by automating complex tasks such as risk assessment, fraud detection, customer support, and financial analysis. Benchmarking finance LLMs can help identify the most reliable and effective solutions.

Data | Nov 24

Remote Browsers: Web Infra for AI Agents Compared

AI agents rely on remote browsers to automate web tasks without being blocked by anti-scraping measures. The performance of this browser infrastructure is critical to an agent’s success. We benchmarked 8 providers on success rate, speed, and features.

AI | Nov 20

Relational Foundation Models: SAP vs. Gradient Boosting

We benchmarked SAP-RPT-1-OSS against gradient boosting (LightGBM, CatBoost) on 17 tabular datasets spanning the full semantic-numerical spectrum: small, high-semantic tables; mixed business datasets; and large, low-semantic numerical datasets.