
Tesla Autonomous Robots, Mosaic MPT-30B, & DeepMind RoboCat

AI Daily | 6.22.23

Join Ethan, Conner, and Farb in this riveting podcast episode as they explore the latest advancements in the AI and robotics space. Get ready to dive into the groundbreaking developments brought to you by Mosaic ML, Tesla, and DeepMind!

In this episode: 

1️⃣ Mosaic ML's Game-Changing Open Source LLM:

  • Mosaic ML has released their first commercially available open source LLM, with an impressive upgrade from 7 billion to 30 billion parameters.

  • This LLM offers a significant jump in usability, as it can handle an 8,000-token context compared to Llama's 2,048 tokens.

  • The availability of Mosaic's LLM and the ongoing battle between open source and closed-source models will shape the future of AI, ensuring knowledge and technology accessibility for all.

2️⃣ Tesla's Foundation Models for Autonomous Robots:

  • Tesla is making significant progress in building foundation models for autonomous robots, leveraging multimodal networks and incorporating camera videos, maps, and navigation data.

  • Their models are ontologically agnostic, predicting the likelihood of objects filling 3D space, which provides broad applicability across various scenarios.

  • With their expertise and the upcoming Dojo supercomputer production, Tesla is poised to tackle true robotics and make substantial advancements in the field.

3️⃣ DeepMind's RoboCat:

  • DeepMind's RoboCat is a foundation model for operating robotic arms, capable of solving tasks with as few as a hundred demonstrations.

  • The model utilizes a unique feedback loop, fine-tuning itself and generating new data to improve performance and adapt to different tasks.

  • Simulated agents like RoboCat are crucial for advancing robotics, and increased GPU compute capabilities are key to further accelerating progress in the field.



Ethan: So let's kick off with our first story. This is MosaicML. They've dropped MPT-30B, so this is their first commercially available open source LLM. This is also trained on an 8,000-token context length. Conner, tell us some more about this.

Conner: This is an upgrade of their previous MPT-7B. This is now of course 30B. It's a jump from 7 billion parameters to 30 billion parameters. Very much mirrors how Llama has a 7B and a 30B. And this also very much mirrors Llama's capability. But of course the big difference here is that this is commercially usable where Llama is not.

Uh, another big jump here is that it has an 8K context window. So while Llama could only take 2,048 tokens, this can take 8,000 tokens like GPT-3 or GPT-4 can. So it's a very big jump, especially considering nowadays GPT-4 and GPT-3.5 can take up to 32,000 tokens in context.

So open source is finally catching up in that regard and being able to take 8,000 tokens of context. It's a very big jump for the usability of APIs, of really any chat, of any platform whatsoever.
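
A note on what the larger window means in practice: the prompt and the requested completion have to share the same context window. The sketch below is illustrative only. It uses the two figures quoted in the episode (2,048 and 8,000 tokens), and the helper names `fits_in_context` and `truncate_to_fit` are hypothetical, not part of any Mosaic or Meta API.

```python
from typing import List

# Context sizes as quoted in the episode (assumed here, not read from any model config).
LLAMA_CONTEXT = 2048
MPT_30B_CONTEXT = 8000


def fits_in_context(prompt_tokens: int, max_new_tokens: int, context_window: int) -> bool:
    """Return True if the prompt plus the requested completion fits in the window."""
    return prompt_tokens + max_new_tokens <= context_window


def truncate_to_fit(tokens: List[int], max_new_tokens: int, context_window: int) -> List[int]:
    """Keep only the most recent tokens so that prompt + completion fits in the window."""
    budget = context_window - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for a prompt")
    return tokens[-budget:]


# A 3,000-token prompt with a 500-token reply is too long for a 2,048-token
# window, but fits comfortably in an 8,000-token one.
too_long_for_llama = not fits_in_context(3000, 500, LLAMA_CONTEXT)
fits_in_mpt = fits_in_context(3000, 500, MPT_30B_CONTEXT)
```

With the smaller window, a chat application has to drop older turns (as `truncate_to_fit` does) or summarize them; with the larger one, the same conversation can be sent whole.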

Ethan: Yeah, the benchmarks and examples looked really good too. Farb, what do you think this is gonna mean for the space?

Do you think Llama's gonna start moving to try to be commercially available even sooner? What does this mean for the space in general? Who can use this?

Farb: You know, we've talked about Llama becoming commercially available. My money is on it becoming commercially available at some point here. There is just too much advantage for the folks at Meta to do, you know, to not do that.

I don't think this is going to, you know, change what's happening in the open source world, you know, at a high level, in the sense that open source is just gonna continue to plow forward. Uh, will it make a, you know, practical difference? Sure. Because it'll just be more, more options available to the open source world.

And, you know, one of the things that people talk about is this battle between open source versus, you know, the, the, the closed operations that are, that are doing this, and they're like, well, open source is still not anywhere close to what these other folks are doing. But how long has open source been at it?

You know, 2023, maybe like three, three months or something, four, four months. Uh, the pace of development in the open source world, it's tough to know what's obviously happening behind closed doors at, at DeepMind and OpenAI and Meta. But if it's any faster than what's happening in the open source world, I would, I would kind of be shocked.

So, this is huge news for everybody. Uh, this sort of balance between the open source world and the closed source world, uh, is a true battle for the future of humanity. And whether or not we end up under the thumb of, you know, two or three, uh, tech giants, or we live in a world where knowledge and information and probably the most powerful technology in human history is available to all or available to a few.

So, you know, this isn't ending anytime soon. We're just in the first few months of this. Uh, imagine it, you know, five years from now.

Conner: I mean, we've seen, we've seen Meta have the willingness to make commercially available models with things like ImageBind. Um, they probably didn't do it with Llama 'cause people, even with Llama being available for non-commercial use, people attacked them for, like, liability.

But now that, yeah, now that 30B exists and now that Falcon 40B exists, Llama can, Meta can say, well, Llama's pretty much equal to these other models out there.

Ethan: Mosaic also showed off their kind of enterprise inference, showing just how much cost savings they can drive for enterprises and startups compared to just hitting OpenAI's API. So really cool stuff outta Mosaic and always amazing to see.

Conner: One last thing on Mosaic. They trained it, uh, I think for the equivalent of one and a half million dollars for the base model, of course, but then their Instruct and Chat fine-tunes only took like a thousand dollars each.

It's all on their platform. All the datasets they used to fine-tune are open source, and you can add on your own data. Very available to use for anybody, really. Mosaic's a great platform.

Farb: Base model and instruct are commercially available. Chat not commercially available. Correct.

Ethan: Absolutely. Uh, let's move on to our second story, which was Tesla had an amazing kind of tweet thread here about them building foundation models for autonomous robots.

So they covered how multimodal their networks are, taking in camera videos, maps, navigation. They covered some occupancy predictions, so a bunch of really cool videos giving a little bit more insight into what Tesla is doing for their foundation models. You know, it's mostly, mostly targeted towards hiring, but Farb, what'd you think of this?

Farb: You know, it's kind of interesting because obviously this is nothing new at Tesla. They've been doing this stuff for, for years, but they're living in a different world now. It might have been considered creepy or weird, or nobody cared, if you were sharing this stuff, you know, uh, 12 months ago.

But today, you get all the kudos in the world from both the market itself, which is obviously hugely important to any public company. But also just, you know, the people, you know, uh, developers that wanna work on projects like this, it's free marketing in a sense. Uh, one of the cool things that the model does is it's sort of ontologically agnostic.

So what it's doing is it's predicting the likelihood of an object filling a space in three dimensions. So, excuse me, it doesn't really care if the object is a cat or a house or a tree. Uh, it's trying to figure out whether or not there's going to be something that the robot could crash into, uh, somewhere in a 3D space, which means that they can, you know, apply this broadly to lots of different situations, whether it's driving a car or a robot walking down a hallway.

So that's a huge advantage for the folks at, at, at Tesla. They've been kind of building this from the ground up to be, you know, agnostic to the specifics of what's moving around, uh, which kind of gives them a lot more options and directions to move in than other folks who are building, you know, highly fine tuned models on just one type of space.
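
To make the "ontologically agnostic" idea concrete, here is a minimal sketch. It is not Tesla's implementation: the voxel grid, the 0.5 threshold, the choice to treat unobserved voxels as free, and the function name `path_is_clear` are all assumptions made for illustration. The point is only that the planner consumes occupancy probabilities and never asks what class of object is there.

```python
from typing import Dict, Iterable, Tuple

Voxel = Tuple[int, int, int]  # integer (x, y, z) cell in a 3D grid


def path_is_clear(
    occupancy: Dict[Voxel, float],
    path: Iterable[Voxel],
    threshold: float = 0.5,
) -> bool:
    """Return True if every voxel on the path is predicted free.

    `occupancy` maps a voxel to the predicted probability that *something*
    fills it. No object label is stored: a cat, a house, and a tree all look
    the same to this check. Voxels missing from the map are treated as free,
    which is an assumption of this sketch.
    """
    return all(occupancy.get(voxel, 0.0) < threshold for voxel in path)


# Hypothetical prediction: one voxel is very likely occupied, one probably not.
predicted = {(2, 0, 0): 0.9, (1, 1, 0): 0.2}

straight_ahead = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
diagonal = [(0, 0, 0), (1, 1, 0), (2, 2, 0)]

blocked = not path_is_clear(predicted, straight_ahead)
clear = path_is_clear(predicted, diagonal)
```

Because the check depends only on geometry, the same function would apply whether the path belongs to a car on a road or a robot in a hallway, which is the breadth of application described above.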

Conner: Absolutely.

Ethan: Conner, anything notable you saw out of this? Like, do you think they're in a good position to actually tackle true robotics? You know, they showed off their humanoid before, they're trying to hire some more engineers. What's your take on this?

Conner: I do. Like we talked about with Meta's computer vision papers from yesterday, a lot of these, a lot of this data applies to different types of models and different types of robotics.

So all, uh, as you said, Farb, all the, like, agnostic occupancy models that they're building, using a lot of training and a lot of, a lot of data from the entire fleet of Teslas around the world. That's training on a huge amount of compute, um, to build very big models and very capable models. Uh, we see they'll even be building their own chipsets for training with their whole Dojo chips and their whole Dojo system of, of inference and training.

Um, so I think this will honestly apply very well from the Tesla cars to the Optimus robot.

Farb: Dojo supercomputer production is supposed to be hitting next month.

Conner: Yeah. Yeah. Start in July 2023, so yeah.

Ethan: I think they're in a really cool position to get some real robotics into the world as they've been doing for the past decade.

So really cool things out of them. And our last story today is DeepMind's RoboCat. So RoboCat is right now just a paper, but they had some really interesting perspectives and ways of tackling robotics. So this is a foundation model for operating robotic arms. Solves tasks in as few as a hundred demonstrations.

And the most important thing here was this really interesting feedback loop they had. So it actually improves from its own self-generated data, so it can, you know, be fine tuned on a task of, let's say, you know, cooking a steak and then it can finetune itself, learn and generate new data in order to cook something else.

So it had a really interesting feedback loop. Conner, tell us more about it. What'd you take from it?

Conner: Yeah, the feedback loop, like you said, they have five different steps. So they collect between a hundred and a thousand demonstrations of a robotic arm manually controlled by a human, and then they fine-tune RoboCat on that task with that specific arm, with that specific agent.

And then that agent practices again and again, a hundred thousand times, for more training data. And then they take that new data and the original data, put it into the original dataset, and then retrain RoboCat entirely. So that feedback loop is a very specific, very fine-tuned feedback loop that gives it a huge swath of new data without a lot of work from researchers.

And because it's controlling just a robotic arm, they can also simulate that entirely. It speeds it up even more. Something we talked about yesterday with Meta's own robotic arm in their own training system. So a very efficient and very amazing way they have to add new data to RoboCat, and then from there, they can also generalize it to specific different tasks of robotic arms.

They can give it a new robotic arm and it can learn to go from a two-pronged robotic arm to a three-pronged robotic arm, and it seems like a very capable model. So we'll see if any code comes out of it or if Google themselves, DeepMind themselves, start using it. But from the paper or from their site, it seems like a very capable model.
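
The feedback loop Conner walks through (collect human demonstrations, fine-tune on them, let the fine-tuned agent practice, merge the data, retrain) can be sketched as a toy program. Nothing here is DeepMind's code: `Episode`, `Agent`, `train`, `fine_tune`, `practice`, and `self_improvement_cycle` are hypothetical stand-ins, and "training" is reduced to remembering which tasks appear in the data so the control flow stays runnable.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Episode:
    task: str
    source: str  # "human" for demonstrations, "self" for self-generated practice


@dataclass(frozen=True)
class Agent:
    known_tasks: frozenset = frozenset()


def train(dataset: List[Episode]) -> Agent:
    """Stand-in for full training: the toy agent knows every task in the dataset."""
    return Agent(frozenset(e.task for e in dataset))


def fine_tune(agent: Agent, demos: List[Episode]) -> Agent:
    """Stand-in for fine-tuning the generalist on demonstrations of a new task."""
    return Agent(agent.known_tasks | {e.task for e in demos})


def practice(agent: Agent, task: str, runs: int) -> List[Episode]:
    """Stand-in for the specialist practicing, in simulation or on a real arm."""
    if task not in agent.known_tasks:
        raise ValueError(f"agent was never fine-tuned on {task!r}")
    return [Episode(task, "self") for _ in range(runs)]


def self_improvement_cycle(
    dataset: List[Episode],
    generalist: Agent,
    task: str,
    demos: List[Episode],
    practice_runs: int,
) -> Tuple[List[Episode], Agent]:
    specialist = fine_tune(generalist, demos)              # fine-tune on human demos
    generated = practice(specialist, task, practice_runs)  # self-generated data
    new_dataset = dataset + demos + generated              # merge into the dataset
    return new_dataset, train(new_dataset)                 # retrain the generalist


# One pass of the loop with made-up task names and counts.
base_data = [Episode("stack blocks", "human")]
generalist = train(base_data)
demos = [Episode("insert gear", "human") for _ in range(100)]
grown_data, next_generalist = self_improvement_cycle(
    base_data, generalist, "insert gear", demos, practice_runs=1000
)
```

Each pass grows the dataset mostly with self-generated episodes, which is why the loop adds a lot of data without much extra work from researchers. The practice count is left as a parameter because the sketch takes no position on the real figure.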

Ethan: These simulated type of agents are extremely important to robotics, 'cause of course we're operating in the real world. So actually putting all these examples together, getting the labs set up, all of that has taken such a length of time, and that's why robotics has lagged a bit. Farb, we've, we've worked in the space and explored it for a long time now.

What do you think of something like this, when you have this type of feedback loop, when you're taking these approaches? Do you think we're gonna see robotics move even faster?

Farb: Yeah, I think so. You know, our, uh, our two prong advantage over the, uh, robot arms, uh, will slowly disappear, not slowly, will quickly disappear.

Uh, I think I'll get a little bit meta on this point, 'cause a lot of, meta not the company, uh, meta the, the general word. Most companies won't talk about this too openly. I've heard a little bit of it here and there, but we are already living in a world of absolute GPU compute scarcity. Mm-hmm. Models like this, especially in a world where a model like this is self-improving, if you can imagine that there were no limits on the GPU processing that somebody like DeepMind could access. Because as big as they are, as many computers as they have, they are limited by what they can apply that resource to at any given point.

So if we were living in a world of unlimited GPU processing capabilities, this thing would be sitting there training itself 24 hours a day on a million different GPUs, and it would be so much further than it will actually be, because we are limited by GPU power. And this is happening at every company. This is happening, you know, it's really the open source world, if anything, that is really focused on trying to get more out of less. These big players are trying to get more outta more, but they can't really access the more.

So the number one thing any nation or any company could be doing today that's in this space is figuring out how to get more GPU compute going. It will change the world faster than anything else will, probably in our lifetimes.

Conner: I think the more out of more and the more out of less is really the beautiful collaboration between these big players and the open source.

Like we've seen, again, to bring up Meta, big M, the company, uh, we saw between Llama and GGML, they, Llama themselves, Facebook themselves, switched from using the big originally trained Llama that's expensive to run to using GGML, llama.cpp, which is very efficient and very resource-efficient to run.

So that collaboration we'll see with more models like this in the future. So, yeah.

Ethan: Yeah, and if RoboCat can kind of lay the foundation that we have in the LLM world, which is a ton of text, if it can lay that same thing for a ton of simulated robotic environments, I think we're getting closer to a foundation model for robotics, which would be amazing.

Um, as always, what else are y'all seeing? Farb?

Farb: Uh, I'm, I'm looking forward to seeing a steel, a steel cage death match, uh, between, uh, Zuck and Elon, Zuck and Musk, uh, battle to the death. I'd love to really see them both in some hardcore, like, wrestling outfits, you know? Um, yeah, and, uh, maybe do like a whole music intro type thing.

Uh, it's gonna be a, a fun time, even if they just keep talking about it and never man up and actually get in the cage with each other.

Conner: That's a call to action right there. Yeah, that's a call. That's definitely a call. Hopefully. Sorry, what about you? I saw that Disney's Secret Invasion, their new series on Disney Plus, uh, their new Marvel series actually used diffusion models for their opening credits.

Um, looks very good. It's a very interesting effect, but it is very interesting how even just from that, just so recently, it already looks pretty dated. Um, considering the new Midjourney V4, V5, Stable Diffusion XL. So, uh, as we've seen before, new technology and major releases, they can't really keep up.

So, absolutely.

Ethan: Yeah. I saw, uh, Dropbox release Dropbox Dash, which is their kind of AI-powered universal search. Um, pretty cool tool, but just in general, you know, we touch on this all the time, but the speed at which these large enterprises are bringing AI into their products. You know, Dropbox, bringing universal search to enterprise.

I bet you Microsoft SharePoint adds it soon. So always just interesting seeing the rapid pace of enterprise. They also launched, uh, a Dropbox AI fund, so I believe it's about 50 million. So, good things outta Dropbox. But as always, thank you all for tuning into AI Daily and we will see you again tomorrow.
