Imagine that you have an idea for an app that needs to understand language. You wouldn't train a foundation model from scratch and fine-tune it for your specific use case, or build the entire AI stack yourself. Instead, you would just make API calls to existing foundation models: a ready-made intelligence layer that any developer can build on.
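To make the contrast concrete, here is a minimal sketch of what "the intelligence layer is one API call away" looks like for a language app. Everything here is illustrative: `FoundationModelClient` and `complete` are hypothetical stand-ins for any hosted foundation-model SDK, and the canned response exists only so the sketch runs without a network.

```python
# Hypothetical sketch: the app's "AI stack" collapses into one API call.
# `FoundationModelClient` is an illustrative stand-in, not a real SDK.

class FoundationModelClient:
    """Stand-in for any hosted foundation-model API client."""

    def complete(self, prompt: str) -> str:
        # A real client would send an HTTPS request to a hosted model;
        # here we return a canned string so the sketch is runnable.
        return f"[model response to: {prompt!r}]"


def summarize(text: str) -> str:
    # The app's entire intelligence layer is this single call.
    client = FoundationModelClient()
    return client.complete(f"Summarize: {text}")


print(summarize("Robotics lacks an off-the-shelf intelligence layer."))
```

The point is not the specific client, but the shape of the dependency: the hard part (the model) sits behind an interface, so the developer only writes application logic.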
Robotics is not like this, yet.
If you want to build a robotic system to solve a real-world problem, the off-the-shelf models and libraries that software developers take for granted don't exist. You would need to implement your own controllers, build out data pipelines, and train your own models, essentially constructing the entire physical intelligence stack before you can even start on the actual robot application. Many of these components are not just hard to build, but actually represent major open research problems. There is no widely known effective recipe, let alone an off-the-shelf API.
To make robotics applications as practical and ubiquitous as AI-powered apps, we need a ready-made physical intelligence layer that is accessible and reusable. General-purpose robotic foundation models, such as π0, π0.5, π0.6, or π*0.6, provide such a layer, significantly lowering the cost and effort needed to build useful robotic systems. When anyone can access powerful robotic foundation models, we will see a proliferation of different robotics applications across every area of society, from homes to hospitals, offices, warehouses, and factories.
We collaborate with companies that specialize in deploying robots, both to demonstrate that our models perform well in real-world settings and to stress-test their generality across diverse tasks and environments.
Continuous shot of π0.6 folding laundry at Weave's customer site.
Weave builds robots for the home. Earlier commercial versions are deployed in businesses across the San Francisco Bay Area, performing the same task we're focused on for homes: laundry folding.
Laundry folding is a compelling first task for home robotics. It is universal, time-consuming, and attention-intensive. The workload scales directly with household size, making it a recurring time sink for families.
At the same time, laundry folding has long been considered one of the most challenging manipulation problems in robotics. Folding is a long-horizon task, and even a slightly misplaced corner can affect final fold quality. Clothing is deformable and highly variable: garments differ in size, fabric, color, and shape.
Because we fold for customers in laundromats as well as homes, doing the job means our robots have to handle as many types of clothing as possible: not just t-shirts but also long-sleeves, shorts, pants, and more, in widely varying sizes and fabrics.
Working with Physical Intelligence, we see improvements in model performance across fold quality, time to fold each article, and the number of interventions our remote specialists have to make to reach presentable final folds. We wanted to push autonomous folding performance with Physical Intelligence models at different locations, including at one of our customer sites.
Our results working with Physical Intelligence are very encouraging as we work to make the best possible robots and ship to new customers every week.
Continuous shot of π0.6 packaging orders at Ultra's customer site for a full shift at 96.4% autonomy.
Ultra builds industrial AI robots that slot into existing workstations and begin doing valuable work within hours. Today we're automating mission-critical warehouse tasks with a fleet of revenue-generating robots deployed across the US, now scaling toward hundreds of deployments. Our goal is simple: build the most productive, easy-to-use industrial robots ever made.
Our first use case, e-commerce order packaging, has historically been impossible to automate with robots. Large variability in workflow, item types, deformable packaging, and external machinery has created a "long tail" of problems that traditional automation techniques, which are often too rigid to be practical, could not solve. Vision-language-action models (VLAs) offer a way forward: a recipe whose performance improves with data scale rather than engineering hours, without reducing the flexibility in how customers can use the robot.
Throughout our time working with Physical Intelligence, we have seen considerable improvements in real-world autonomous performance at our customer deployments. With each new model generation (π0 to π0.5 to π0.6) we have observed a significant step up in intelligence, throughput, and reliability with no signs of slowing down. With π0.6, we observe noticeable improvements in cycle times and success rates which compound to mean higher throughput for our customers and less fatigue for our remote intervention team.
In addition to quantitative improvements to cycle time and success rates with π0.6, we have also noticed substantial qualitative improvements in several aspects of the model's behavior.
These models run on our robots in live customer warehouses, packing real orders every day. When the model stumbles, our human-in-the-loop intervention system steps in to ensure correctness across thousands of orders — while continuously generating new data that improves the next model iteration.
The examples above show that the models we develop can be used for real, useful applications, and our partners are able to interface their own hardware to our models to facilitate the tasks that matter most to them. We plan to make our models broadly useful and applicable, so that roboticists can use them to power their own applications in the same way that developers use API-based LLMs. By providing easy access to the physical intelligence layer, we hope to reduce the cost of exploring new robot applications, morphologies, and use cases. We're working with a number of other partners that we're excited to talk about soon. If you are interested in working with us, please get in touch.
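To illustrate what "interfacing your own hardware to a physical intelligence layer" could look like in practice, here is a hedged sketch of a robot application loop: the app sends observations and receives actions, the same way a language app sends a prompt and receives text. All names here (`PhysicalIntelligenceClient`, `Observation`, `Action`) are hypothetical, not a real Physical Intelligence SDK; the client echoes joint state so the sketch runs without a network or a robot.

```python
# Hypothetical sketch of an API-accessible physical intelligence layer.
# None of these names are a real SDK; they illustrate the shape of the
# observe -> query model -> act loop.

from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    camera_image: bytes        # raw sensor data from the robot
    joint_angles: List[float]  # current joint state
    instruction: str           # natural-language task, e.g. "fold the shirt"


@dataclass
class Action:
    joint_targets: List[float]


class PhysicalIntelligenceClient:
    """Stand-in for a hosted robotic foundation-model endpoint."""

    def act(self, obs: Observation) -> Action:
        # A real client would send the observation to a model server and
        # decode the returned action; here we echo the joint state so the
        # sketch is runnable offline.
        return Action(joint_targets=obs.joint_angles)


# The application loop: observe, query the model, execute.
client = PhysicalIntelligenceClient()
obs = Observation(camera_image=b"", joint_angles=[0.0, 0.5],
                  instruction="fold the shirt")
action = client.act(obs)
print(action.joint_targets)
```

Under this shape, the robot application owns the hardware interface and task logic, while the model behind the endpoint supplies the manipulation intelligence, mirroring how developers build on API-based LLMs.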