Ekrem Sarı

AI Researcher

14 Articles

Ekrem is an AI Researcher at AIMultiple, focusing on intelligent automation, GPUs, AI Agents, and RAG frameworks.

Professional Experience

During his tenure as an Assessor at Yandex, he evaluated search results using proprietary frameworks and automated protocols. He implemented QA testing through data annotation, relevance scoring, and user intent mapping across 10,000+ queries monthly, while conducting technical assessments including performance monitoring and spam detection using ML feedback loops.

Research Interest

At AIMultiple, his research is centered on the performance and benchmarking of end-to-end AI systems. He contributes to a wide range of projects, including Retrieval-Augmented Generation (RAG) optimization, extensive Large Language Model (LLM) benchmarking, and the design of agentic AI frameworks. Ekrem specializes in developing data-driven methodologies to measure and improve AI technology performance across critical metrics like accuracy, efficiency, cost, and scalability.

His analysis covers the entire technology stack, from foundational components like embedding models and vector databases to the infrastructure required for deploying AI agents, such as remote browser solutions and web automation platforms.

Education

Ekrem holds a bachelor's degree from Hacettepe Üniversitesi and a master's degree from Başkent Üniversitesi.

Latest Articles from Ekrem

AI | Nov 6

RAG Frameworks: LangChain vs LangGraph vs LlamaIndex vs Haystack vs DSPy

Comparing Retrieval-Augmented Generation (RAG) frameworks is challenging. Default settings for prompts, routing, and tools can subtly alter behavior, making it difficult to isolate the framework’s impact. To create a controlled comparison, we replicated the same agentic RAG workflow across LangChain, LangGraph, LlamaIndex, Haystack, and DSPy, standardizing components wherever possible.

AI | Oct 27

Context Engineering: Maximize LLM Grounding & Accuracy

LLMs often struggle with raw, unstructured data such as email threads or technical documents, leading to factual errors and weak reasoning. We benchmarked systematic context engineering and achieved up to a 13.0% improvement in task accuracy, confirming that structured context is key to enhancing performance in complex tasks.

AI | Oct 15

Multi-GPU Benchmark: B200 vs H200 vs H100 vs MI300X

For over two decades, optimizing compute performance has been a cornerstone of my work. We benchmarked NVIDIA’s B200, H200, H100 and AMD’s MI300X to assess how well they scale for Large Language Model (LLM) inference. Using the vLLM framework with the meta-llama/Llama-3.1-8B-Instruct model, we ran tests on 1, 2, 4 and 8 GPUs.

AI | Oct 19

GPU Concurrency Benchmark: H100 vs H200 vs B200 vs MI300X

I have spent the last 20 years focusing on system-level computational performance optimization. We benchmarked the latest GPUs from NVIDIA (H100, H200, and B200) and AMD (MI300X) for concurrency scaling analysis. Using the vLLM framework with the gpt-oss-20b model, we tested how these GPUs handle concurrent requests, from 1 to 512.

AI | Sep 1

Hybrid RAG: Boosting RAG Accuracy

Dense vector search is excellent at capturing semantic intent, but it often struggles with queries that demand high keyword accuracy. To quantify this gap, we benchmarked a standard dense-only retriever against a hybrid RAG system that incorporates SPLADE sparse vectors.

AI | Oct 14

Benchmark of 30 Finance LLMs: GPT-5, Gemini 2.5 Pro & more

Large language models (LLMs) are transforming finance by automating complex tasks such as risk assessment, fraud detection, customer support, and financial analysis. Benchmarking finance LLMs can help identify the most reliable and effective solutions.

AI | Oct 11

Embedding Models: OpenAI vs Gemini vs Cohere

The effectiveness of any Retrieval-Augmented Generation (RAG) system depends on the precision of its retriever. We benchmarked 11 leading text embedding models, including those from OpenAI, Gemini, Cohere, Snowflake, AWS, Mistral, and Voyage AI, using ~500,000 Amazon reviews.

Data | Nov 5

Remote Browsers: Web Infra for AI Agents Compared

AI agents rely on remote browsers to automate web tasks without being blocked by anti-scraping measures. The performance of this browser infrastructure is critical to an agent’s success. We benchmarked 8 providers on success rate, speed, and features.

Cybersecurity | Oct 8

Top Bot Management Platforms

Bot management distinguishes real users, good bots, and bad bots, safeguarding websites, APIs, and digital assets from automated threats.

AI | Sep 9

Text-to-SQL: Comparison of LLM Accuracy

I have been relying on SQL for data analysis for 18 years, beginning with my days as a consultant. Translating natural-language questions into SQL makes data more accessible, allowing anyone, even those without technical skills, to work directly with databases.
