There’s a small, previously unhyped company called Rabbit that has quietly created a large action model, LAM, an AI agent capable of executing tasks on your behalf. The company has just announced its R1 device, a reimagination of the computer and smartphone, powered almost entirely by its large action model.
If you haven’t heard about Rabbit before, you need to take a look at their keynote. Hi everyone, my name is Jesse, and I’m the founder and CEO of Rabbit. I’m so excited to be here today to present two revolutionary things we’ve been working on: a foundational model and a groundbreaking consumer mobile device powered by it.
Our mission is to create the simplest computer, something so intuitive that you don’t need to learn how to use it. The best way to achieve this is to break away from app-based operating systems currently used by smartphones. Instead, we envision a natural language-centered approach. The computer we’re building, which we call a companion, should be able to talk, understand, and, more importantly, get things done for you. The future of human-machine interfaces should be more intuitive.
Before we get started, let’s take a look at the existing mobile devices that we use daily. The smartphone, like the iPhone and Android phones, has been around for years. The problem with these devices, however, is not the hardware form factor but what’s inside: the app-based operating system. Want to get a ride to the office? There’s an app for that. Want to buy groceries? There’s another app for that. Each time you want to do something, you fumble through multiple pages and folders to find the app you want, and there are always endless buttons to click: add to the cart, go to the next page, check the boxes, jump back and forth. The smartphone was supposed to be intuitive, but with hundreds of apps on your phone today that don’t work together, it no longer is.
If you look at the top-ranking apps on app stores today, you’ll find that most of them focus on entertainment. Our smartphones have become the best devices to kill time instead of saving it. Many people before us have tried to build simpler and more intuitive computers with AI. A decade ago, companies like Apple, Microsoft, and Amazon gave us Siri, Cortana, and Alexa. But these assistants and the smart speakers built around them either don’t understand what you’re talking about or fail to accomplish the tasks you ask for.
Recent achievements in large language models (LLMs), a type of AI technology, have made it much easier for machines to understand you. The popularity of LLM chatbots over the past few years has shown that a natural language-based experience is the path forward. However, these assistants still struggle to actually get things done. For example, if you ask a chatbot to book a ticket through its Expedia plug-in, it can suggest options but ultimately cannot complete the booking for you from start to finish. The problem is forcing a model to perform a task it was not designed for. We have yet to produce an agent that works as well as a user simply clicking the buttons.
In order to achieve our vision of a delightful, intuitive companion, we must go beyond a piece of complex software. We want it to be in the hands of everyone. So, we first set out to fundamentally understand how computer apps are structured and, more importantly, how humans interact with them. We wanted to find a way for our AI to trigger actions on behalf of users across all environments, not just a limited set of apps.
These applications share something in common: they all have a user interface. So, at a philosophical level, if we can make an AI trigger actions on any kind of interface, just like a human would, we will solve the problem. This insight led us to create the large action model, or LAM, as we call it. It is a new foundational model that understands and executes human intentions on computers.
Driven by our research in neuro-symbolic systems, the large action model fundamentally solves the challenges that apps, APIs, and agents face, and it solves them at the interface. LAM can learn any interface from any software, regardless of which platform it runs on. In short, the large language model understands what you say, but the large action model gets things done. We use LAM to bring AI from words to action.
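To make that words-to-action split a little more concrete, here is a minimal sketch of the idea in Python. Everything in it is an assumption made for illustration: the Intent, UIStep, interpret, and act names are invented for this example and are not Rabbit’s actual architecture. It simply shows one model turning a request into a structured intent and another model turning that intent into interface steps.

```python
# A minimal sketch of the "words to action" split, under stated assumptions:
# every name below is invented for illustration and is not Rabbit's stack.
# A language model turns the request into a structured intent, and an action
# model turns that intent into interface steps executed like a human would.

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Intent:
    """A structured goal extracted from a natural-language request."""
    service: str      # e.g. "rideshare"
    goal: str         # e.g. "get a ride to the office"
    parameters: dict  # e.g. {"destination": "office", "time": "now"}


@dataclass
class UIStep:
    """One interface action, the kind a human would perform by hand."""
    action: str                  # "tap", "type", "scroll", ...
    target: str                  # description of the on-screen element
    value: Optional[str] = None  # text to enter, if any


def interpret(utterance: str, llm: Callable[[str], Intent]) -> Intent:
    """Language understanding: words in, structured intent out."""
    return llm(utterance)


def act(intent: Intent,
        action_model: Callable[[Intent], List[UIStep]],
        execute: Callable[[UIStep], None]) -> None:
    """Action: plan interface steps for the intent and carry them out."""
    for step in action_model(intent):
        execute(step)  # tap/type on the real interface, like a human would
```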
Finally, we can build a computer that, in addition to understanding what you’re trying to say, can actually do things on your behalf. We pack the large action model into our operating system, Rabbit OS, built for real-time interactions between you and your Rabbit and powered by LAM.
The concept and test results of the large action model are so powerful that we decided to make a one-of-a-kind mobile device. Introducing R1, your pocket companion. Designed in collaboration with teenage engineering, the R1 is a standalone device primarily driven by natural language. It comes with a touchscreen, a push-to-talk button, an analog scroll wheel, a microphone, speakers, and a computer vision-enabled 360-degree rotational camera, the Rabbit Eye. It is Bluetooth and Wi-Fi capable, and it has a global 4G LTE network SIM card slot.
R1 allows you to ask for anything like using a chatbot, but with the speed of Rabbit OS, the response time is 10 times faster than most voice AI projects. With the push-to-talk button, you don’t need to say anything to wake it up, just press and hold the button and talk like a walkie-talkie. R1 is equipped with everything you need to interact with and perceive your surroundings. It also comes with a built-in real-time translator and a note-taker, removing the final barriers to communication.
Rabbit R1 can interact with all kinds of applications through the Rabbit Hole web portal. The portal lets you log in to different services and unlock their functionality for R1, much like iCloud: you choose your preferred services, authenticate with them there, and they become available on your device. With the Rabbit Hole web portal, you have complete control over which services you want to activate on your R1 and which provider you prefer for music, for example.
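For readers who like to see the idea in code, here is a minimal sketch of the kind of service linking the portal describes, under the assumption that the device only ever holds an opaque per-service token obtained through the web. The DeviceServices and LinkedService names are invented for this illustration; this is not Rabbit’s actual API.

```python
# A minimal sketch of portal-style service linking, under assumptions: the
# user authenticates with each provider on the web, and the device only holds
# an opaque, revocable token per service. Names here are illustrative only.

from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LinkedService:
    provider: str      # e.g. "spotify" as the chosen music provider
    token: str         # opaque credential obtained through the web portal
    enabled: bool = True


@dataclass
class DeviceServices:
    """What the device knows: which services are active, and nothing more."""
    services: Dict[str, LinkedService] = field(default_factory=dict)

    def link(self, category: str, provider: str, token: str) -> None:
        """Called after the user logs in to a provider on the portal."""
        self.services[category] = LinkedService(provider, token)

    def credentials_for(self, category: str) -> Optional[str]:
        svc = self.services.get(category)
        return svc.token if svc and svc.enabled else None


# Usage: pick Spotify as the music provider in the portal; the device can
# then act on "play some jazz" without ever storing a password.
portal = DeviceServices()
portal.link("music", provider="spotify", token="opaque-token-from-portal")
```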
In addition, R1 can perform complex actions that would take multiple apps to finish on a smartphone. For example, it can help plan an entire trip for you, book tickets, get a ride, and order food. It even has a teach mode, where you can show R1 how to do something, and it will learn from you.
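Teach mode sounds a lot like programming by demonstration, so here is a hedged sketch of what recording and replaying a demonstration could look like. The DemoStep and TaughtRoutine names, and the simple substitution trick, are assumptions made up for this example, not Rabbit’s implementation.

```python
# A toy take on "teach mode": record the steps of a single human demonstration,
# then replay them later with small substitutions. Names and structure are
# illustrative assumptions only, not Rabbit's implementation.

from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DemoStep:
    action: str                  # "tap", "type", ...
    target: str                  # element the user interacted with
    value: Optional[str] = None  # text entered, if any


class TaughtRoutine:
    """A routine learned from one human demonstration."""

    def __init__(self, name: str):
        self.name = name
        self.steps: List[DemoStep] = []

    def record(self, step: DemoStep) -> None:
        self.steps.append(step)

    def replay(self, substitutions: Dict[str, str],
               execute: Callable[[DemoStep], None]) -> None:
        """Replay the demonstration, swapping in new values where asked."""
        for step in self.steps:
            value = substitutions.get(step.value, step.value) if step.value else None
            execute(DemoStep(step.action, step.target, value))


# Teach it once ("order a flat white"), then replay with a different drink.
routine = TaughtRoutine("order coffee")
routine.record(DemoStep("tap", "search box"))
routine.record(DemoStep("type", "search box", "flat white"))
routine.record(DemoStep("tap", "order button"))
routine.replay({"flat white": "espresso"}, execute=print)
```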
The Rabbit Eye onboard camera is designed for advanced computer vision: it can analyze your surroundings and take actions in real time, and it can automatically detect the language being spoken around you to provide bidirectional translation. The device’s global 4G LTE SIM card slot, meanwhile, enables worldwide connectivity.
Now, you might be wondering about the price. R1 is priced at $199, with no subscription or hidden fees. You can pre-order it now at rabbit.com, and it is expected to start shipping in early 2024. And remember, you have 14 days to cancel your order if you change your mind.
Rabbit has also showcased some of its research, highlighting its progress in learning human actions on computer applications. Its large action model, LAM, achieves the highest scores among the models it is compared against on web navigation tasks and shows early signs of competitiveness. Rabbit describes LAM as a model rooted in imitation, or learning by demonstration: it observes humans using an interface and replicates the process reliably.
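Imitation, or learning by demonstration, has a simple core that is easy to sketch: gather pairs of interface state and human action, then reproduce the demonstrated action when a similar state comes up again. The toy ImitationPolicy below only shows that core loop; it is a deliberately small stand-in, and none of these names come from Rabbit’s work.

```python
# A toy illustration of learning by demonstration: collect (interface state,
# human action) pairs from recorded sessions, then imitate the action most
# often demonstrated for a given state. A tiny stand-in, not Rabbit's model.

from collections import Counter, defaultdict
from typing import Dict, Optional


class ImitationPolicy:
    def __init__(self) -> None:
        # state description -> counts of the actions humans took in that state
        self.demos: Dict[str, Counter] = defaultdict(Counter)

    def observe(self, state: str, action: str) -> None:
        """Record one step of a human demonstration."""
        self.demos[state][action] += 1

    def act(self, state: str) -> Optional[str]:
        """Replicate the action most often demonstrated in this state."""
        seen = self.demos.get(state)
        return seen.most_common(1)[0][0] if seen else None


# Demonstrate a checkout flow a couple of times, then let the policy act.
policy = ImitationPolicy()
policy.observe("cart page", "tap checkout")
policy.observe("cart page", "tap checkout")
policy.observe("checkout page", "tap confirm")
print(policy.act("cart page"))  # -> tap checkout
```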
With all these advancements, Rabbit R1 seems to be the future of AI-powered operating systems, and it is setting a new standard for intuitive and seamless interactions with technology.
Some have raised questions about the authenticity and functionality of the Rabbit R1, but we won’t know until the first people start receiving their R1s a few months from now. This is a new generation of devices, and it looks like the future is AI-powered operating systems that complete tasks on your behalf.
So, what do you think? Do you think Rabbit R1 is real, and would you consider getting it to help you complete various tasks on your computer and phone? Let me know in the comments, and let’s continue the conversation.