Do you remember when Sam Altman briefly hinted at investing in a robotics startup of his own? Well, there’s some big news on that front. Take a look at the latest developments.

Yeah, oh! So here’s that Twitter post. This is Eric Jang, and he’s saying, “Every behavior you see in this video is controlled from pixels to actions with a single neural net architecture. No teleoperation.” Teleoperation is where another human being controls the robot and kind of guides it.

No scripted replay or task-specific code. No CGI. All in one continuous video shot. And they have a blog post describing how they did it. Let’s take a look at that in just a second.

He’s saying, “Over the last few months, we’ve convinced ourselves that our AI strategy can scale to all kinds of short mobile manipulation tasks, and these capabilities can be added by our teleoperators who have no formal training in machine learning. We were planning to throw in some coffee making as an Easter egg, but the coffee machine broke right before the film shoot, and it turns out it’s impossible to get a Keurig K-Slim in Norway via next-day shipping.”

Upon further research, it turns out that the Keurig K-Slim is by far the easiest coffee machine to make coffee with, so much so that it has become a go-to prop for robotics demos. If you’re not sure what he’s referring to, he’s probably referring to the demo by Figure AI, where the Figure robot opens up a Keurig, drops in the little coffee pod, and closes it. The reality is that people very often use this particular machine to demonstrate robotics because of just how simple and easy it is to open and close.

“What’s next? We’re focused on increasing task generalization so they can run unmonitored for longer durations and deal with more scenarios not seen in the fine-tuning data. We’re also hiring on both the AI and Studio teams in Mountain View.” It certainly seems like a good opportunity, and being that close to OpenAI, both in terms of what you’d be working on and the physical location, can’t be a bad thing. They even have an android operator role; they’re looking for somebody who has great attention to detail, is creative, and is highly motivated to make robots do more tasks.

Eric also wrote the book “AI is Good for You,” which I’ve heard great things about. I haven’t read it yet, but it definitely seems like a good read. And here’s their blog post talking about how they did it. It’s called “All Neural Networks. All Autonomous. All 1X Speed.”

The reason they’re highlighting this is that you might have seen a lot of robotic demonstrations recently, and for most people, it’d be hard to differentiate the truly advanced ones from the rest. A robot that can do backflips or run really fast might be cool, but that’s not necessarily what a lot of people are working towards. What they’re working towards is a general robot: one that is capable of understanding how to interact with and solve new, never-before-seen tasks, adapting to new environments, new challenges, and so on.

Instead of being hardcoded by some very smart engineer to do a particular thing, the robot itself is learning and improving how to do it. And, of course, “all 1X speed” just means that there are no camera tricks: the footage isn’t sped up, and the pauses between actions aren’t cut out. It’s just real, uncut, unedited footage.

So, their mission is to provide an abundant supply of physical labor via safe, intelligent robots. Their environments are designed for humans, so they designed their hardware to take after the human form for maximum generality. To make the best use of this general-purpose hardware, they also pursue the maximally general approach to autonomy, learning motor behaviors end-to-end from vision using neural networks.

They deployed the system for patrolling tasks in 2023, and they are now excited to share some of the new capabilities their androids have learned purely end-to-end from data. The EVE robots are actually patrolling real buildings at night right now; several companies have, I believe, either purchased or rented them. So while they’re adding some flair to the demo, the robots are already making money for the company by being deployed in real-world scenarios.

Every behavior you see in the above video (the first video that you saw) is controlled by a single vision-based neural network that emits actions at 10 Hz. The neural network consumes images and emits actions to control the driving, the arms, grippers, torso, and head. The video contains no teleoperation, no computer graphics, no cuts, no video speedups, and no scripted trajectory playback. It’s all controlled via neural networks, all autonomous, all 1X speed.
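The perceive-act loop described here can be sketched in a few lines. Everything below is an illustrative stand-in, not 1X’s actual architecture: the random linear “policy,” the fake camera frames, and the choice of 8 action dimensions are all assumptions for the demo; only the 10 Hz rate and the pixels-to-actions structure come from the post.

```python
import time
import numpy as np

# Stand-in for the vision-based policy: a random linear map from image
# features to a bounded action vector (8 dims here is an assumption).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 64)) * 0.01

def policy(image: np.ndarray) -> np.ndarray:
    """Map a camera frame to an action vector for base, arms, gripper, etc."""
    features = image.reshape(-1)[:64]   # crude featurization for the sketch
    return np.tanh(W @ features)        # tanh keeps commands bounded

def control_loop(steps: int, hz: float = 10.0) -> list:
    """Run the perceive->act loop at a fixed rate (10 Hz in the post)."""
    period = 1.0 / hz
    actions = []
    for _ in range(steps):
        start = time.monotonic()
        image = rng.random((8, 8))      # placeholder camera frame
        actions.append(policy(image))
        # Sleep off whatever remains of this control period.
        time.sleep(max(0.0, period - (time.monotonic() - start)))
    return actions

acts = control_loop(steps=5)
```

The point of the sketch is the shape of the system: one network, one loop, images in and actions out at a fixed clock, with no scripted trajectory anywhere.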

To train the machine learning models that generate these behaviors, we have assembled a high-quality, diverse dataset of demonstrations across 30 EVE robots. We use that data to train a base model that understands a broad set of physical behaviors, from cleaning and tidying homes to picking up objects to interacting socially with humans and other robots. We then fine-tune that model into more specific families of capabilities, e.g., one for general door manipulation and another for warehouse tasks, and then fine-tune those models further to align the behavior with solving specific tasks.

This strategy allows us to onboard new skills in just a few minutes of data collection and training on a desktop GPU. All the capabilities shown in the video were trained by our android operators. They represent a new generation of software 2.0 engineers who express robot capabilities through data. Our ability to teach our robots short mobile manipulation skills is no longer constrained by the number of AI engineers.
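The three-stage recipe, base model, capability family, then task-specific fine-tune, can be illustrated with a toy example. Below, “models” are just linear regressors and “training” is a few gradient steps; the door/warehouse names and all the data are hypothetical, used only to show why warm-starting each stage from the previous one is cheaper than training from scratch.

```python
import numpy as np

def fit(weights, X, y, lr=0.1, steps=200):
    """A few gradient steps of least-squares -- a stand-in for training."""
    w = weights.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(X)
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
dim = 4

# 1. Base model: trained on a broad, diverse demonstration set.
X_broad = rng.standard_normal((200, dim))
y_broad = X_broad @ np.array([1.0, -1.0, 0.5, 0.0])
base = fit(np.zeros(dim), X_broad, y_broad)

# 2. Capability family (e.g. "door manipulation"): start from the base
#    weights rather than from scratch, using a smaller dataset.
X_doors = rng.standard_normal((50, dim))
y_doors = X_doors @ np.array([1.0, -1.0, 0.5, 2.0])
door_family = fit(base, X_doors, y_doors)

# 3. Specific task: a short fine-tune on just a handful of demonstrations,
#    mirroring the "few minutes of data collection" claim in the post.
X_task, y_task = X_doors[:10], y_doors[:10]
task_model = fit(door_family, X_task, y_task, steps=50)
```

Each stage reuses the previous stage’s weights, so the final task-specific step needs very little data and compute, which is the property that lets operators without ML training add new skills.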

So, if you needed to train something like this for yourself, instead of hiring a very expensive software engineer, somebody with a tech background, it seems like you might be able to get someone, even yourself, to mimic the motions needed to complete a task. That becomes the data on which the robot is trained. You’d be able to teach it skills: how to open the specific doors in your house, how to interact with specific objects in your house, and so on.

They’re hiring for two roles: AI researchers in the San Francisco Bay Area, and android operators in both the Oslo and Mountain View offices to collect data, train models with that data, and evaluate those models. As they put it, “Unlike most data collection jobs, our operators are empowered to train their own models to automate their own tasks and think deeply about how data maps to learned robot behavior.”

What’s interesting to me here is this: you know how they say that as AI takes over more jobs, the hope is that this innovation will create new jobs we haven’t really seen before? Well, this is certainly one of them: the robot whisperer, the robot teacher. You teach robots how to do certain very specific tasks, and it does look like maybe even a big tech background isn’t necessary here.

They mention things like good coordination, motor skills, and dexterity, with nice-to-haves like experience with virtual reality and experience with remote-controlled vehicles like robots, androids, or drones. So if you’ve played around with virtual reality, have hundreds of hours clocked in GTA, have some experience flying a drone around, and want to transition to this exciting new field of robotics, and you’re in the Bay Area or in Norway near their original headquarters, this could be huge for you.

I hope you enjoyed that, and I can’t wait for one of these to start helping me around the house; folding clothes, to me, is the most annoying task. If I could teach it to do that, I’d be pretty set. Anyways, my name is Wes Roth, and thank you for reading.
