Building an app without coding skills is a popular trend right now. But can AI coding tools actually build and deploy an app? To find out, we benchmarked the following tools:
- Cursor
- Replit
- Windsurf Editor by Codeium
- Cline
- Claude Code
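We prepared 2 tasks for the agents:
- Prompt-to-API: build a REST API from a Swagger file and deploy it to Heroku
- App building: build a to-do app from a detailed requirements prompt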
Benchmark results
Prompt-to-API benchmark
None of these tools could build a correctly functioning API from a Swagger API documentation with a single prompt.
We attempted to create the API 2 times with the same Swagger file and prompt:
- First attempt:
  - Windsurf created an API, but it failed our unit tests for some endpoints: 10 of 15 endpoints worked correctly.
  - Cursor failed to create an API.
  - Cline failed to create an API with the correct endpoints.
  - Claude Code could not deploy the API to Heroku.
- Second attempt:
  - All tools failed to create a working API.
Replit Agent does not support creating an API based on our specifications. Since it did not support Laravel Lumen and Heroku, it suggested an alternative approach with the same API functionality. We declined the alternative to keep the benchmark as fair as possible.
During the API creation process, Cursor and Windsurf initially attempted to use PostgreSQL Hobby Dev for the Heroku deployment. However, this revealed a limitation in their knowledge of current Heroku add-ons, as Hobby Dev is no longer supported. Eventually, both tools managed to correctly identify and configure PostgreSQL Essential 0 tier, which is currently Heroku’s most economical PostgreSQL offering.
This demonstrates how these AI tools can adapt their recommendations, though there might be a delay in their knowledge of platform-specific changes in service offerings.
Cline created a working API, but its endpoints differed from those in the provided documentation, so we rated it 0.
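A mismatch between the documented and the generated endpoints can be detected mechanically. The sketch below is not the harness used in this benchmark. It is a minimal illustration that assumes the Swagger export exposes a standard `paths` object and that the generated API's routes are available as a list. All type names, function names, and sample routes are hypothetical.

```typescript
// Hypothetical names and data for illustration; not the benchmark's harness.
type HttpMethod = "get" | "post" | "put" | "patch" | "delete";

// Minimal shape of the `paths` object in a Swagger/OpenAPI export.
type SwaggerPaths = Record<string, Partial<Record<HttpMethod, unknown>>>;

interface Route {
  method: HttpMethod;
  path: string;
}

const METHODS: HttpMethod[] = ["get", "post", "put", "patch", "delete"];

// Build a comparable key such as "GET /books/{}". Parameter names are erased
// so "/books/{id}" and "/books/{bookId}" count as the same route.
function routeKey(route: Route): string {
  const normalized = route.path.replace(/\{[^}]+\}/g, "{}").replace(/\/+$/, "");
  return `${route.method.toUpperCase()} ${normalized || "/"}`;
}

// Flatten the spec's paths object into a list of routes.
function routesFromSpec(paths: SwaggerPaths): Route[] {
  const routes: Route[] = [];
  for (const path of Object.keys(paths)) {
    const operations = paths[path];
    if (!operations) continue;
    for (const method of METHODS) {
      if (operations[method] !== undefined) routes.push({ method, path });
    }
  }
  return routes;
}

// Report routes the spec requires but the API lacks, and routes the API
// exposes that the spec never defined.
function diffRoutes(spec: Route[], implemented: Route[]) {
  const specKeys = new Set(spec.map(routeKey));
  const implKeys = new Set(implemented.map(routeKey));
  return {
    missing: Array.from(specKeys).filter((k) => !implKeys.has(k)).sort(),
    unexpected: Array.from(implKeys).filter((k) => !specKeys.has(k)).sort(),
  };
}

// Made-up example data, not the contents of library.json.
const examplePaths: SwaggerPaths = {
  "/books": { get: {}, post: {} },
  "/books/{id}": { get: {}, delete: {} },
};
const exampleImplemented: Route[] = [
  { method: "get", path: "/books" },
  { method: "get", path: "/books/{bookId}" },
  { method: "post", path: "/book" },
];

const routeDiff = diffRoutes(routesFromSpec(examplePaths), exampleImplemented);
console.log(routeDiff);
```

With this sample data, `missing` lists the two documented routes the API never implemented and `unexpected` lists the one route the spec does not define. An API can therefore run without errors and still fail a spec-conformance check of this kind.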
All tools offer agentic features, which means they can autonomously perform multiple development tasks. These include writing code, creating file structures, modifying existing code, and generating terminal commands. They can also execute terminal commands and display their outputs directly in their chat interface, making the development process faster.
We did not try to create a UI for this task. If you are interested in screenshot-to-code benchmarks and prompt-to-website benchmarks, you can see our articles.
Methodology
This benchmark uses Cursor’s Composer mode and Windsurf Editor’s Cascade mode, with Claude Sonnet 3.5 as the LLM.
Cline and Claude Code were used with Claude Sonnet 3.7. We will update this task with results from Cursor and Windsurf running Claude Sonnet 3.7.
Prompt: I have a Swagger API Documentation export file (library.json) that defines my API specification. Please help me create a Laravel Lumen Micro REST API based on this specification that will be deployed to Heroku.
We prompted the tools only once with our Swagger file and allowed them to use their agentic features. They were expected to build and deploy the app.
Our Swagger file was prepared carefully to cover the whole API without any mistakes.
Please note that we did not prompt the tools any further to get a working API, since that would harm the objectivity of this task.
App building benchmark results
We tried to build a basic to-do app since it is one of the first apps every developer builds. Claude Code is the leader of this task. See the results of this task below:
Claude Code
Claude Code was the most successful: the only missing functionality was drag-and-drop reordering of tasks.


Cline
Cline coded the app, but its buttons did not work, so we did not test the rest of its functionality.
Replit Agent
Replit Agent was the fastest: it coded the app in almost 5 minutes. At first the app looked fine, but testing revealed missing features and broken functionality.

For example, when we checked a task as done, all other open tasks were also marked as complete and their contents were overwritten. We decided to share up to 5 such errors with the agent. However, the agent couldn’t debug these errors.
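For reference, the expected behavior is that marking one task as done changes that task only. The sketch below is not the code Replit Agent generated and makes no claim about the cause of the bug. It only illustrates the intended update, using a trimmed version of the `Task` interface from our prompt. The function name `toggleTask` and the sample tasks are hypothetical.

```typescript
// Trimmed version of the Task interface from the prompt.
interface Task {
  id: string;
  title: string;
  completed: boolean;
  updatedAt: Date;
}

// Toggle exactly one task, matched by id, and return a new array.
// Every other task is passed through untouched (same object reference),
// so its completion state and contents cannot be overwritten.
function toggleTask(tasks: readonly Task[], id: string, now: Date = new Date()): Task[] {
  return tasks.map((task) =>
    task.id === id ? { ...task, completed: !task.completed, updatedAt: now } : task
  );
}

// Made-up sample data.
const createdAt = new Date(0);
const tasksBefore: Task[] = [
  { id: "a", title: "Write report", completed: false, updatedAt: createdAt },
  { id: "b", title: "Book flights", completed: false, updatedAt: createdAt },
];

const tasksAfter = toggleTask(tasksBefore, "a");
console.log(tasksAfter.map((t) => `${t.title}: ${t.completed}`));
```

Matching on `id` and returning a new array keeps the update local to one task, which is the property the generated app violated.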

Windsurf Editor
Windsurf Editor coded the app in almost 20 minutes. It failed to create an appealing UI, and drag-and-drop reordering, task editing, and the import and export buttons did not work correctly.


Cursor
Coding the app with Cursor was unsuccessful: we tried for more than an hour, and it could not produce an app. Since Cursor could not solve the problem after our 5 error-solving attempts, it failed this task.
Methodology
Our prompt:
Todo App Development Requirements
Create a modern, responsive Todo application using React with the following specifications:
Core Features
- Task Management
- Add new tasks with title and optional description
- Mark tasks as complete/incomplete
- Edit existing tasks
- Delete tasks
- Bulk actions (select multiple tasks for deletion or status change)
- Rich text support for task descriptions
- Task Organization
- Categories/Labels for tasks
- Priority levels (High, Medium, Low)
- Due dates with reminder functionality
- Sort tasks by different criteria (due date, priority, status)
- Filter tasks by status, category, and priority
- Search functionality for tasks
- User Experience
- Drag and drop reordering of tasks
- Keyboard shortcuts for common actions
- Responsive design (mobile-first approach)
- Dark/Light theme support
- Loading states and error handling
- Animations for task actions
- Data Management
- Persist data in localStorage
- Export/Import task data (JSON format)
- Undo/Redo functionality for actions
- Data validation and sanitization
Technical Requirements
Frontend
- Use React 18+ with TypeScript
- State management with React Context or Redux Toolkit
- Styling with Tailwind CSS
- Form handling with React Hook Form
- Date handling with date-fns
- Schema validation with Zod
- Testing with Jest and React Testing Library
Component Structure
- App Container
- Theme provider
- Global state provider
- Router setup
- Task Components
- TaskList (main container)
- TaskItem (individual task)
- TaskForm (add/edit task)
- TaskFilters (filtering options)
- TaskSearch (search functionality)
- UI Components
- Button (reusable)
- Input (reusable)
- Modal (for edit/delete confirmations)
- Dropdown (for filters/sorting)
- Checkbox (for task completion)
- Toast notifications
Data Structure
interface Task {
id: string;
title: string;
description?: string;
completed: boolean;
createdAt: Date;
updatedAt: Date;
dueDate?: Date;
priority: 'high' | 'medium' | 'low';
categories: string[];
}
interface Category {
id: string;
name: string;
color: string;
}
Features Implementation Order
- Basic task CRUD operations
- Task status management
- Categories and priorities
- Filtering and sorting
- Search functionality
- Drag and drop reordering
- Data persistence
- Theme support
- Keyboard shortcuts
- Export/Import functionality
Non-functional Requirements
- Performance
- Optimize rendering with React.memo where needed
- Implement virtualization for large lists
- Lazy loading for non-critical components
- Accessibility
- ARIA labels and roles
- Keyboard navigation
- High contrast mode support
- Screen reader friendly
- Code Quality
- ESLint and Prettier configuration
- Git hooks with Husky
- Consistent code formatting
- Comprehensive documentation
- Unit and integration tests
- Error Handling
- Graceful error boundaries
- User-friendly error messages
- Logging mechanism
- Retry mechanisms for operations
Additional Considerations
- Implement proper loading states for async operations
- Add confirmation dialogs for destructive actions
- Include proper input validation and error messages
- Implement proper debouncing for search
- Add tooltips for action buttons
- Include empty states for lists
- Add proper focus management
- Implement proper color contrast ratios
Please implement the features in the order specified, ensuring each component is properly tested before moving to the next feature. Follow React best practices and ensure the code is well-documented.
Todo App Benchmark Scoring (100 points)
Basic Features (35 points)
- Add task: 5
- Edit task: 5
- Delete task: 5
- Mark complete/incomplete: 5
- Multi-select/Bulk actions: 5
- Search tasks: 5
- Sort tasks: 5
Advanced Features (35 points)
- Categories/Labels: 7
- Priority levels: 7
- Due dates: 7
- Filter by status/category/priority: 7
- Drag and drop reordering: 7
UI Features (30 points)
- Responsive design (mobile/desktop): 10
- Dark/Light theme toggle: 5
- Animations for actions: 5
- Keyboard shortcuts: 5
- Export/Import data: 5
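The rubric can be expressed as a small lookup table, which also confirms that the point values add up to 100. This is a sketch under two assumptions that the rubric itself does not state: each feature is scored all-or-nothing, and the feature keys below are hypothetical identifiers. The example app is hypothetical and does not represent the score of any tool in this benchmark.

```typescript
// Point values copied from the rubric; the keys are hypothetical identifiers.
const RUBRIC = {
  // Basic features (35 points)
  addTask: 5,
  editTask: 5,
  deleteTask: 5,
  toggleComplete: 5,
  bulkActions: 5,
  search: 5,
  sort: 5,
  // Advanced features (35 points)
  categories: 7,
  priorities: 7,
  dueDates: 7,
  filters: 7,
  dragAndDrop: 7,
  // UI features (30 points)
  responsive: 10,
  themeToggle: 5,
  animations: 5,
  keyboardShortcuts: 5,
  exportImport: 5,
} as const;

type Feature = keyof typeof RUBRIC;

const ALL_FEATURES = Object.keys(RUBRIC) as Feature[];

// Sum the points of every working feature. Iterating over the rubric rather
// than the input means a feature listed twice is still counted once.
// Assumption: all-or-nothing scoring, no partial credit.
function scoreApp(working: readonly Feature[]): number {
  let total = 0;
  for (const feature of ALL_FEATURES) {
    if (working.indexOf(feature) !== -1) total += RUBRIC[feature];
  }
  return total;
}

// Hypothetical app in which everything works except drag and drop.
const hypotheticalApp = ALL_FEATURES.filter((f) => f !== "dragAndDrop");

console.log(scoreApp(ALL_FEATURES)); // 100
console.log(scoreApp(hypotheticalApp)); // 93
```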
Our only intervention in the coding process was sharing the errors (up to 5 errors) with the agents or saying “Continue please” when the agents asked us whether to continue.
Pricing
Monthly pro plan costs of the tools as of January 2025:
- Windsurf Editor by Codeium: $15
- Claude Code: $3.6 for the two tasks (API-based, pay-per-use pricing)
- Cursor: $20
- Replit: $25
- Cline: $4.9 for the two tasks (API-based, pay-per-use pricing)
To learn the features of these tools, you can read our article about AI coding assistants.
Next steps
We will add more tasks to explore their abilities and limits further.
What are the best practices?
To preserve the objectivity of this benchmark, we did not engage in further prompting and debugging. In practice, follow-up prompts that address specific problems can produce better results.
Preparing detailed documentation helps tools to create better apps.
Knowledge of coding, databases, and deployment options helps get better results.
These tools deliver the best results when they are used to assist developers.
Key features of AI code editors
AI code editors provide:
Intelligent code suggestions: Real-time code suggestions based on the context of the code.
Code completion: Completion of code blocks, functions, and methods.
Code reviews: Review existing code and provide suggestions for improvement.
Bug fixes: Detect and fix bugs in the code.
Syntax highlighting: Syntax highlighting to make the code more readable.
Integration with existing workflows

Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.
Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.
He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.
Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.
