AWS's Claude-powered IDE generates spec-driven code and tests, but keep an eye on the results.
Kiro is the new Amazon Web Services IDE for creating software projects using agentic AI. A developer using Kiro creates a specification for the desired program, and Kiro uses Claude Sonnet (3.7 or 4.0) to iteratively generate a set of requirements, a design document, and a task list for building the application. You can oversee each step of the process, intervene to make changes to the specs or commands, or allow the system to run on autopilot.
I was able to set up a copy of Kiro while it was still in open preview; it's since been restricted to a waitlist, as demand quickly outstripped capacity. Even after AWS added more capacity over the weekend following Kiro's first preview, I still experienced timeouts from the Claude Sonnet API. But I'll focus here on the overall design and flavor of the IDE, rather than performance issues.
Setting up a Kiro project
Kiro is built atop a forked version of Visual Studio Code, so it's easy for users of VS Code to jump right in. It's unclear why Kiro is an entirely distinct product rather than just a suite of plugins for the VS Code IDE; perhaps it's to avoid competing with Microsoft's own Copilot. Regardless, you can migrate plugins from your existing VS Code setup into Kiro if you want to use them as part of the Kiro development process, or just install them from the OpenVSX marketplace.
When you open a new project folder in Kiro, you're given a Claude Sonnet prompt, which you can use to either "vibe" (describe your project in the most general way possible and drill down from there) or "spec" (use a formal design as described earlier). The "vibe" choice is suitable for simple one-offs, and I used it to generate a Python project that checks whether the virtual environments of other Python projects are invalid. It wasn't difficult to verify that the resulting code worked, as I now have dozens of projects with busted venvs thanks to a recent system upgrade.

Designing a simple command-line tool in Kiro for checking if Python projects have valid virtual environments.
IDG
For the "spec" choice, I set up a more ambitious project: a command-line-driven static site generator. Here, Kiro uses your prompts to generate the requirements, design, and task list documents to set up a step-by-step creation process.
You can also create "steering" documentation, which provides constant rules for how Kiro is to interpret your instructions: what your project is meant to be, what stack to use for it, and how to lay the files out in the repository. These documents can be generated near the start of the project and then modified to guide how things develop, or generated after the fact and used to guide any AI-driven code refinements.
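The steering documents are plain Markdown. As a purely hypothetical illustration (my own sketch, not Kiro's actual output), a fragment for a project like my static site generator might read:

```
# Technology

- Python 3.12; prefer the standard library where practical
- Markdown parsing via the markdown package
- Unit tests written with pytest

# Repository layout

- ssg/: package source code
- content/: sample Markdown posts and templates
- tests/: pytest suites, one module per component
```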
The generated requirements document follows the familiar "user story"/"WHEN/THEN" pattern found in agile development:

An example of a generated agile requirements document in Kiro.
IDG
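In outline, each requirement pairs a user story with acceptance criteria written in that conditional style. A hypothetical example (mine, not one of Kiro's outputs) for the static site generator might look like:

```
User Story: As a site author, I want to build the site from the
command line, so that I can publish without a GUI.

Acceptance Criteria:
1. WHEN the user runs the build command THEN the system SHALL render
   every Markdown file in the content directory to HTML.
2. WHEN a referenced template is missing THEN the system SHALL report
   an error and exit with a nonzero status.
```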
The design document describes the components and architecture for the application, and can go into a lot more detail than the technical stack document:

An example of a Kiro design document.
IDG
The task list details every step of the project's authoring. It's interactive; you trigger each step by clicking an automatically generated "start" or "retry" link:

A Kiro task list details every step of developing the application.
IDG
Guiding Kiro through the development process
With each item on the task list, Kiro provides live feedback about what it's doing, what files it's editing, and what commands it needs to run. At every step, you can intervene manually by changing files or altering the commands to be run:

Kiro provides live feedback throughout the development process.
IDG
You do have the option to place Kiro on autopilot and let it generate as much as possible on its own, but I elected to babysit each step to see the feedback in detail. This turned out to be necessary, as I had to intervene constantly; for instance, to ensure all Python commands were run with py (on Windows). Instructing Kiro to do this consistently worked only once, and modifying the steering documents to mention it explicitly didn't seem to help, either.
Many of Kiro's automated actions take several minutes to complete, so you can switch away, do other work in another window, and have Kiro ping you with a system-tray notification when it needs your attention. If you need to stop what you're doing and come back (e.g., to restart for updates), Kiro will re-read your project's documentation to figure out where it left off. However, this takes a few minutes, and more than once I experienced API timeouts when trying to resume a session.
One curious defect of the way Kiro works with code is that it doesn't seem to attempt any mechanical linting or syntax-checking before running it. Many code examples, source and tests alike, had syntax errors that could have been easily detected with a separate linting step:

Syntax errors in Python code generated by Kiro.
IDG
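That kind of pre-flight check is cheap to perform. Here's a minimal sketch of one (my own code, not something Kiro provides) that parses every Python file in a project tree before anything gets executed:

```python
# Pre-flight syntax check: walk a project tree and try to parse every
# .py file, reporting anything that won't even compile. This is a
# sketch of the missing linting step, not part of Kiro itself.
import ast
import sys
from pathlib import Path

def syntax_errors(root: Path) -> list[str]:
    """Return 'file:line: message' for each .py file that fails to parse."""
    problems = []
    for path in root.rglob("*.py"):
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            problems.append(f"{exc.filename}:{exc.lineno}: {exc.msg}")
    return problems

if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    errors = syntax_errors(root)
    for line in errors:
        print(line)
    sys.exit(1 if errors else 0)
```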
Test-driven development in Kiro
At every step of a project's construction, Kiro writes unit tests and attempts to validate them. If a test fails, Kiro will describe what it thinks went wrong, attempt to revise the test, and retry.
An early example of a problematic test involved the preview server for the static site generator. The way the test was written didn't seem to account for the need to stop the server after its tests were done. After I explained the problem to Kiro, it suggested some fixes:

A unit test in a Kiro project.
IDG
The first proposed set of fixes didn't solve the problem. Kiro then proposed to rewrite the tests completely:

Another unit test in a Kiro project.
IDG
Those tests still failed, so Kiro backed up and tried another approach. This required replacing all the calls to the failing tests as well, which took some additional tries (and a couple of Claude timeouts). Unfortunately, after another timeout, Kiro reverted to thinking the test was still hanging (it wasn't), and I was forced to restart that entire step to avoid more problems.
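For what it's worth, the ingredient the test kept missing was a guaranteed teardown. In pytest terms, that looks something like the sketch below; it's my own code, using the standard library's HTTP server as a stand-in for the generator's actual preview server:

```python
# A pytest fixture that always shuts the server down, even if the test
# fails. SimpleHTTPRequestHandler stands in for the real preview server.
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler
from urllib.request import urlopen

import pytest

@pytest.fixture
def preview_server():
    # Bind to port 0 so the OS picks a free port.
    server = HTTPServer(("127.0.0.1", 0), SimpleHTTPRequestHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    yield f"http://127.0.0.1:{server.server_port}"
    # Teardown runs no matter how the test exits.
    server.shutdown()
    server.server_close()
    thread.join(timeout=5)

def test_preview_serves_pages(preview_server):
    with urlopen(preview_server) as response:
        assert response.status == 200
```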
I also found that the generated test suite covered some things very well and others not at all. For instance, in the final integration steps, tests failed because the sample content created for the static site generator had no templates. Despite this being flagged by a test, Kiro didn't create any of the missing templates by the time it signed off on the last checklist item. Additionally, one of the sample posts, about "Python best practices," was garbled nonsense.
Final thoughts
By now, most everyone is aware of the limits and outright hazards of AI-generated code. Kiro's iterative, document- and guideline-driven design attempts to address both of those problems, but even these solutions stretch only so far. The limits of generative AI context sizes, for instance, still lead to problems when building projects more than a few files in size. Also, since the design documents are evaluated the same way as any other instructions sent to the model, there's no guarantee they will be followed consistently.
AI-generated code also tends to be verbose or overengineered, and Kiro-generated code has that flavor. The generated virtual-environment checker was 230 lines of Python and included various command-line switches (such as an export-to-JSON option) that are convenient and useful but weren't explicitly requested. A basic version of that tool wouldn't need more than two dozen lines of code.
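For comparison, here's roughly what that two-dozen-line version might look like, assuming "valid" simply means the venv's interpreter still exists and runs (my sketch, not Kiro's output):

```python
# Minimal venv checker: a venv is "valid" if its interpreter exists
# and can run. That definition is an assumption for this sketch.
import subprocess
import sys
from pathlib import Path

def venv_is_valid(project: Path) -> bool:
    """True if project/.venv contains a Python interpreter that runs."""
    for candidate in ("bin/python", "Scripts/python.exe"):  # POSIX, Windows
        python = project / ".venv" / candidate
        if python.exists():
            result = subprocess.run(
                [str(python), "--version"], capture_output=True
            )
            return result.returncode == 0
    return False  # no interpreter found (or a broken symlink)

if __name__ == "__main__":
    for arg in sys.argv[1:]:
        status = "ok" if venv_is_valid(Path(arg)) else "BROKEN"
        print(f"{arg}: {status}")
```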
The other big issue with Kiro is that all its useful functionality comes from the Claude Sonnet API, and the constant timeouts and loss of context make using it a bumpy experience. It'll be worth re-evaluating Kiro when its back-end capacity has been expanded to support a full release product, although I can imagine a competing version of this product that uses a compact local model for those with a decently powerful machine to run it.