AIMultiple
Şevval Alper

AI Researcher
15 Articles

Şevval is an AI researcher at AIMultiple. She has previous research experience in pseudorandom number generation using chaotic systems.

Research interests

Şevval focuses on AI coding tools, AI agents, and quantum technologies.

She is part of the AIMultiple benchmark team, conducting assessments and providing insights to help readers understand various emerging technologies and their applications.

Professional experience

She contributed to organizing and guiding participants in three “CERN International Masterclasses - hands-on particle physics” events in Türkiye, working alongside faculty to facilitate learning.

Education

Şevval holds a Bachelor's degree in Physics from Middle East Technical University.

Latest Articles from Şevval

AI · Sep 24

LLM Parameters: GPT-5 High, Medium, Low and Minimal

New LLMs, such as OpenAI’s GPT-5 family, come in different versions (e.g., GPT-5, GPT-5-mini, and GPT-5-nano) with various reasoning-effort settings: high, medium, low, and minimal. Below, we explore the differences between these versions of the models by gathering their benchmark performances and the costs of running those benchmarks.

Agentic AI · Oct 15

MCP Benchmark: Top MCP Servers for Web Access

We benchmarked 8 MCP servers across web search and extraction, as well as browser automation tasks, by running 4 different tasks 5 times on all suitable MCPs. We also performed a load test involving 250 concurrent AI agents.

AI · Oct 18

AI Reasoning Benchmark: MathR-Eval

We designed a new benchmark, Mathematical Reasoning Eval (MathR-Eval), to test LLMs’ reasoning abilities with 100 logical mathematics questions. Results show that OpenAI’s o1 and o3-mini are the best-performing LLMs in our benchmark.

AI · Oct 9

Vibe coding: Great for MVP But Not Ready for Production

Vibe coding, a term that entered our lives with AI coding tools like Cursor, means building software purely by prompting. We ran several benchmarks of vibe coding tools and, drawing on that experience, prepared this detailed guide.

AI · Oct 10

8 AI Code Models Benchmarked: LMC-Eval

More than 37% of the tasks users bring to AI models involve computer programming and math.

AI · Oct 20

Speech-to-Text Benchmark: Deepgram vs. Whisper

We benchmarked the leading speech-to-text (STT) providers, focusing specifically on healthcare applications. Our benchmark used real-world examples to assess transcription accuracy in medical contexts, where precision is crucial. Based on both WER and CER results, GPT-4o-transcribe demonstrates the highest transcription accuracy among all evaluated speech-to-text systems.

Agentic AI · Oct 18

Top 4 AI Search Engines Compared

Searching with LLMs has become a major alternative to Google Search. We benchmarked the leading AI search engines to see which one provides the most correct results. Deepseek leads this benchmark, correctly providing 57% of the data in our ground-truth dataset.

AI · Sep 29

AGI Benchmark: Can AI Generate Economic Value?

AI will have its greatest impact when AI systems start to create economic value autonomously. We benchmarked whether frontier models can do so by prompting them to build a new digital application (e.g., a website or mobile app) that can be monetized with a SaaS or advertising-based model.

AI · Oct 10

Best AI Code Editor: Cursor vs Windsurf vs Replit

Building an app without coding skills is a major trend right now. But can these tools successfully build and deploy an app? To answer this question, we benchmarked the following AI coding tools: Claude Code, Cline, Cursor, Replit, and Windsurf Editor by Codeium.

AI · Apr 7

Top AI Website Generators Benchmarked

To find the most helpful prompt-to-website creator, we benchmarked the leading tools; if you want to learn about no-code AI website generators, follow the links in the article. We conducted this benchmark using the latest versions of the tools available as of January 2025.