AIMultiple
Şevval Alper

AI Researcher
15 Articles
Şevval is an AI researcher at AIMultiple. She has previous research experience in pseudorandom number generation using chaotic systems.

Research interests

Şevval focuses on AI coding tools, AI agents, and quantum technologies.

She is part of the AIMultiple benchmark team, conducting assessments and providing insights to help readers understand various emerging technologies and their applications.

Professional experience

She contributed to organizing and guiding participants in three “CERN International Masterclasses - hands-on particle physics” events in Türkiye, working alongside faculty to facilitate learning.

Education

Şevval holds a Bachelor's degree in Physics from Middle East Technical University.

Latest Articles from Şevval

AI, Sep 24

LLM Parameters: GPT-5 High, Medium, Low and Minimal

New LLMs, such as OpenAI’s GPT-5 family, come in different versions (e.g., GPT-5, GPT-5-mini, and GPT-5-nano) and with different parameter settings: high, medium, low, and minimal. Below, we explore the differences between these versions of the models by gathering their benchmark performances and the costs to run these benchmarks.

Agentic AI, Sep 27

MCP Benchmark: Top MCP Servers for Web Access

MCP (Model Context Protocol) establishes a standardized communication bridge between AI agents and applications, allowing AI apps and LLMs to interact with external tools and services. We benchmarked 8 MCP servers on web search, extraction, and browser automation by running 4 different tasks 5 times each on every suitable MCP server.

AI, Apr 22

AI Reasoning Benchmark: MathR-Eval

We designed a new benchmark, Mathematical Reasoning Eval (MathR-Eval), to test LLMs’ reasoning abilities with 100 logical mathematics questions. Benchmark results: OpenAI’s o1 and o3-mini are the best-performing LLMs in our benchmark.

AI, Aug 23

Vibe coding: Great for MVP But Not Ready for Production

Vibe coding is a new term that arrived with AI coding tools like Cursor. It means coding by prompting alone. We ran several benchmarks to test vibe coding tools and drew on that experience to prepare this detailed guide.

AI, Aug 22

8 AI Code Models Benchmarked: LMC-Eval

More than 37% of tasks performed on AI models are about computer programming and maths.

AI, Apr 2

Speech-to-Text Benchmark: Deepgram vs. Whisper

We benchmarked the leading speech-to-text (STT) providers, focusing specifically on healthcare applications. Our benchmark used real-world examples to assess transcription accuracy in medical contexts, where precision is crucial. Benchmark results: The average word error rate (WER) across our tasks shows that Deepgram is the leading speech-to-text provider for healthcare in this benchmark.

Agentic AI, Sep 9

Top 4 AI Search Engines Compared

Searching with LLMs has become a major alternative to Google search. We benchmarked four AI search engines to see which one provides the most correct results. Benchmark results: DeepSeek leads this benchmark, correctly providing 57% of the data in our ground truth dataset.

AI, Sep 29

AGI Benchmark: Can AI Generate Economic Value

AI will have its greatest impact when AI systems start to create economic value autonomously. We benchmarked whether frontier models can generate economic value. We prompted them to build a new digital application (e.g., website or mobile app) that can be monetized with a SaaS or advertising-based model.

AI, Sep 2

Best AI Code Editor: Cursor vs Windsurf vs Replit

Building an app without coding skills is a major trend right now. But can these tools successfully build and deploy an app? To answer this question, we benchmarked the following AI coding tools: Claude Code, Cline, Cursor, Replit, and Windsurf Editor by Codeium.

AI, Apr 7

Top AI Website Generators Benchmarked

To find the most helpful prompt-to-website creator, we benchmarked several AI website generators. Benchmark results: We conducted this benchmark using the latest versions of the tools available as of January 2025.