AIMultiple
Nazlı Şipi

AI Researcher
7 Articles
Nazlı is a data analyst at AIMultiple. She has prior experience in data analysis across various industries, where she worked on transforming complex datasets into actionable insights.

She is also part of the benchmark team, focusing on large language models (LLMs), AI agents, and agentic frameworks.

Nazlı holds a Master’s degree in Business Analytics from the University of Denver.

Latest Articles from Nazlı

AI · Dec 2

LLM Observability Tools: Weights & Biases, LangSmith

LLM-based applications are becoming more capable and increasingly complex, making their behavior harder to interpret. Each model output results from prompts, tool interactions, retrieval steps, and probabilistic reasoning that cannot be directly inspected. LLM observability addresses this challenge by providing continuous visibility into how models operate in real-world conditions.

AI · Dec 1

Top 9 AI Providers Compared

The AI infrastructure ecosystem is growing rapidly, with providers offering diverse approaches to building, hosting, and accelerating models. While they all aim to power AI applications, each focuses on a different layer of the stack.

AI · Nov 24

Compare Multimodal AI Models on Visual Reasoning

We benchmarked 8 leading multimodal AI models on visual reasoning using 98 visual questions. The evaluation consisted of two tracks: 70 Chart Understanding questions testing data visualization interpretation, and 28 Visual Logic questions assessing pattern recognition and spatial reasoning. See our benchmark methodology for our testing procedures.

AI · Nov 13

LLM Latency Benchmark by Use Cases

The effectiveness of large language models (LLMs) is determined not only by their accuracy and capabilities but also by the speed at which they engage with users. We benchmarked the performance of leading language models across various use cases, measuring how quickly they respond to user input.

Agentic AI · Nov 11

Top 5 Open-Source Agentic Frameworks

We reviewed several popular open-source AI agent frameworks, examining their multi-agent orchestration capabilities, agent and function definitions, memory management, and human-in-the-loop features. To evaluate practical performance, we implemented four data analysis tasks on each framework: logistic regression, clustering, random forest classification, and descriptive statistical analysis.

Agentic AI · Nov 7

Benchmarking Agentic AI Frameworks in Analytics Workflows

Frameworks for building agentic workflows differ substantially in how they handle decisions and errors, yet their performance on imperfect real-world data remains largely untested.

Agentic AI · Sep 24

Vision Language Models Compared to Image Recognition

Can advanced Vision Language Models (VLMs) replace traditional image recognition models? To find out, we benchmarked 16 leading models across three paradigms: traditional CNNs (ResNet, EfficientNet), VLMs (such as GPT-4.1 and Gemini 2.5), and Cloud APIs (AWS, Google, Azure).