
Ekrem Sarı
Ekrem is an AI Researcher at AIMultiple, focusing on intelligent automation, GPUs, AI Agents, and RAG frameworks.
Professional Experience
During his tenure as an Assessor at Yandex, he evaluated search results using proprietary frameworks and automated protocols. He carried out QA testing through data annotation, relevance scoring, and user intent mapping across more than 10,000 queries per month, and conducted technical assessments including performance monitoring and spam detection using ML feedback loops.
Research Interest
At AIMultiple, his research centers on the performance and benchmarking of end-to-end AI systems. He contributes to a wide range of projects, including Retrieval-Augmented Generation (RAG) optimization, extensive Large Language Model (LLM) benchmarking, and the design of agentic AI frameworks. Ekrem specializes in developing data-driven methodologies to measure and improve AI technology performance across critical metrics such as accuracy, efficiency, cost, and scalability.
His analysis covers the entire technology stack, from foundational components like embedding models and vector databases to the infrastructure required for deploying AI agents, such as remote browser solutions and web automation platforms.
Latest Articles from Ekrem
Multi-GPU Benchmark: B200 vs H200 vs H100 vs MI300X
We benchmarked NVIDIA’s B200, H200, H100, and AMD’s MI300X to measure how well they scale for Large Language Model (LLM) inference. Using the vLLM framework with the meta-llama/Llama-3.1-8B-Instruct model, we ran tests on 1, 2, 4, and 8 GPUs. We analyzed throughput and scaling efficiency to show how each GPU architecture manages parallelized, compute-intensive workloads.
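To make the setup concrete, here is a minimal sketch (not the full benchmark harness) of how a model like Llama-3.1-8B-Instruct can be sharded across several GPUs with vLLM's tensor parallelism; the prompt and sampling settings are illustrative only.

```python
# Minimal vLLM multi-GPU sketch: load the model and shard it across GPUs
# via tensor parallelism. Prompts and sampling settings are illustrative,
# not the benchmark's actual workload.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=4,  # 1, 2, 4, or 8 to match the GPU counts tested
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain tensor parallelism in one sentence."], params)
print(outputs[0].outputs[0].text)
```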
Hybrid RAG: Boosting RAG Accuracy
Dense vector search is excellent at capturing semantic intent, but it often struggles with queries that demand high keyword accuracy. To quantify this gap, we benchmarked a standard dense-only retriever against a hybrid RAG system that incorporates SPLADE sparse vectors.
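As a rough illustration of the hybrid idea, the sketch below fuses dense and sparse retrieval scores with a simple weighted sum; the 0.6/0.4 weighting, the min-max normalization, and the document IDs are assumptions for illustration, not the exact fusion method used in the benchmark.

```python
# Illustrative hybrid fusion: combine dense (semantic) and sparse (keyword,
# SPLADE-style) relevance scores per document. Weights and normalization
# are placeholder choices.
def minmax(scores: dict[str, float]) -> dict[str, float]:
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_rank(dense: dict[str, float], sparse: dict[str, float], alpha: float = 0.6):
    d, s = minmax(dense), minmax(sparse)
    fused = {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
             for doc in set(d) | set(s)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

dense_scores = {"doc_a": 0.82, "doc_b": 0.78, "doc_c": 0.40}
sparse_scores = {"doc_b": 9.1, "doc_d": 7.4}  # exact keyword matches surface doc_d
print(hybrid_rank(dense_scores, sparse_scores))
```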
Benchmark 30 Finance LLMs: GPT-5, Gemini 2.5 Pro & more
Large language models (LLMs) are transforming finance by automating complex tasks such as risk assessment, fraud detection, customer support, and financial analysis. Benchmarking finance LLMs can help identify the most reliable and effective solutions.
Embedding Models: OpenAI vs Gemini vs Cohere
The effectiveness of any Retrieval-Augmented Generation (RAG) system depends on the precision of its retriever component. We benchmarked 11 leading text embedding models, including those from OpenAI, Gemini, Cohere, Snowflake, AWS, Mistral, and Voyage AI, using nearly 500,000 Amazon reviews. Our evaluation focused on each model’s ability to retrieve and rank the correct answer first.
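The core of such an evaluation is checking whether the correct passage ranks first for each query. A simplified sketch of that top-1 check, assuming embeddings have already been computed, is shown below; the arrays and indices are placeholders, not the benchmark data.

```python
# Simplified top-1 retrieval check over precomputed embeddings.
# query_embs[i] should retrieve corpus item relevant_idx[i]; vectors are placeholders.
import numpy as np

def top1_accuracy(query_embs: np.ndarray, corpus_embs: np.ndarray, relevant_idx: np.ndarray) -> float:
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = q @ c.T                # cosine similarity matrix (queries x corpus)
    best = sims.argmax(axis=1)    # top-ranked corpus item per query
    return float((best == relevant_idx).mean())

rng = np.random.default_rng(0)
queries, corpus = rng.normal(size=(5, 8)), rng.normal(size=(20, 8))
print(top1_accuracy(queries, corpus, relevant_idx=np.arange(5)))
```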
Remote Browsers: Web Infra for AI Agents Compared
AI agents rely on remote browsers to automate web tasks without being blocked by anti-scraping measures. The performance of this browser infrastructure is critical to an agent’s success. We benchmarked 8 providers on success rate, speed, and features.
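In practice, an agent hands its browsing to such a provider by connecting an automation library to a hosted browser endpoint. A minimal sketch with Playwright is below; the wss:// URL and token are placeholders, and each provider documents its own connection string.

```python
# Connecting Playwright to a hosted (remote) Chromium instance over CDP.
# The endpoint and token are placeholders for a provider's real connection string.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("wss://remote-browser.example.com?token=YOUR_TOKEN")
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()
```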
Top Bot Management Platforms
Bot management distinguishes real users from good and bad bots, safeguarding websites, APIs, and digital assets from automated threats.
Text-to-SQL: Comparison of LLM Accuracy
I have been relying on SQL for data analysis for 18 years, beginning with my days as a consultant. Translating natural-language questions into SQL makes data more accessible, allowing anyone, even those without technical skills, to work directly with databases.
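As a small worked example of what text-to-SQL means in practice, here is one way a natural-language question could translate into a query; the schema (orders, customers) and column names are hypothetical.

```python
# Illustrative text-to-SQL pair; the tables and columns are hypothetical.
question = "Who were the top 5 customers by total order value in 2024?"

sql = """
SELECT c.customer_name, SUM(o.order_total) AS total_value
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01' AND o.order_date < '2025-01-01'
GROUP BY c.customer_name
ORDER BY total_value DESC
LIMIT 5;
"""
```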
Top Vector Database for RAG: Qdrant vs Weaviate vs Pinecone
Vector databases power the retrieval layer in RAG workflows by storing document and query embeddings as high‑dimensional vectors. They enable fast similarity searches based on vector distances.
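As a minimal sketch of that retrieval layer, the snippet below uses Qdrant's Python client (one of the databases compared) with an in-memory instance; the collection name, four-dimensional vectors, and payloads are placeholders, and real RAG embeddings are far larger.

```python
# Minimal vector-store round trip with qdrant-client (in-memory mode).
# Vector size, IDs, and payloads are placeholders.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(":memory:")
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"text": "refund policy"}),
        PointStruct(id=2, vector=[0.9, 0.1, 0.0, 0.2], payload={"text": "shipping times"}),
    ],
)
hits = client.search(collection_name="docs", query_vector=[0.1, 0.2, 0.25, 0.4], limit=1)
print(hits[0].payload)
```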
Top Serverless Functions: Vercel vs Azure vs AWS
Serverless functions enable developers to run code without having to manage a server. This allows them to focus on writing and deploying applications while infrastructure scaling and maintenance are handled automatically in the background. In this benchmark, we evaluated 7 popular cloud service providers following our methodology to test their serverless function performance.
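For readers unfamiliar with the model, a serverless function is just a handler the platform invokes on demand. A minimal AWS Lambda-style handler in Python looks like the sketch below; the response shape follows Lambda's proxy-integration convention, and the query parameter is illustrative.

```python
# Minimal AWS Lambda-style handler (Python runtime). The platform provisions,
# scales, and tears down the execution environment around this function.
import json

def handler(event, context):
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```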
Top 20+ Agentic RAG Frameworks
Agentic RAG enhances traditional RAG by boosting LLM performance and enabling greater specialization. We conducted a benchmark to assess its performance on routing between multiple databases and generating queries. Explore agentic RAG frameworks and libraries, key differences from standard RAG, benefits, and challenges to unlock their full potential.
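To illustrate the routing step that the benchmark exercises, here is a deliberately simplified sketch; the route labels, keyword rules, and retrievers are hypothetical stand-ins for the LLM-driven router and real databases an agentic framework would use.

```python
# Simplified routing step of an agentic RAG pipeline. Route names, keyword
# rules, and retrievers are hypothetical; in practice an LLM chooses the
# route and generates the database query.
def route(question: str) -> str:
    q = question.lower()
    if any(k in q for k in ("revenue", "invoice", "pricing")):
        return "finance_db"
    if any(k in q for k in ("error", "deploy", "api")):
        return "engineering_db"
    return "general_db"

RETRIEVERS = {
    "finance_db": lambda q: ["2024 pricing sheet ..."],
    "engineering_db": lambda q: ["deployment runbook ..."],
    "general_db": lambda q: ["company FAQ ..."],
}

def answer(question: str) -> list[str]:
    return RETRIEVERS[route(question)](question)

print(answer("Why does the deploy API return an error?"))
```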