In this episode, we delve into three exciting news stories. First up, we explore the remarkable LIMA model, a 65-billion-parameter LLaMA fine-tune that performs almost as well as GPT-4 and even outperforms Bard and DaVinci! Find out how Meta's innovative approach is revolutionizing the open-source AI realm. Next, we unravel the fascinating CoDi model, which uses compositional diffusion to generate multimodal outputs. Learn how this powerful model can turn text and audio inputs into stunning videos. Lastly, we uncover the mind-boggling Mind-to-Video technology that reconstructs videos from brain activity. Join us as we discuss the possibilities and implications of mapping the human mind. We also discuss hilarious ChatGPT app reviews, which you won’t want to miss!
Main Take-Aways:
LIMA Model
LIMA is a 65-billion-parameter LLaMA model that was fine-tuned on a thousand carefully curated prompt-response pairs.
It performed almost as well as GPT-4 and even better than models like Bard and DaVinci.
Meta has released LIMA as an open-source model, showcasing its commitment to staying current in the AI field.
LIMA's approach differs from other models by fine-tuning on a small set of supervised examples instead of reinforcement learning from human feedback, and it still produces impressive responses; a minimal sketch of this style of fine-tuning follows this list.
This development is significant because it brings another open-source model closer to competing with the massive models trained by OpenAI, providing an alternative approach to alignment.
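For readers who want a concrete picture of what "fine-tuning on a thousand curated responses" means in practice, here is a minimal sketch of that style of supervised fine-tuning using the Hugging Face Trainer. The model name, example data, and hyperparameters below are placeholders chosen for illustration, not the actual LIMA training setup.

```python
# Minimal sketch of LIMA-style supervised fine-tuning: a causal LM trained on a
# small set of curated prompt/response pairs, with no RLHF stage. The model name,
# data, and hyperparameters are illustrative placeholders, not the paper's setup.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "huggyllama/llama-7b"  # stand-in; LIMA used a 65B LLaMA base
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# ~1,000 carefully curated examples, each a prompt plus a high-quality response.
curated = [
    {"prompt": "Explain diffusion models to a high schooler.",
     "response": "Imagine starting with pure noise and learning to remove it step by step..."},
    # ... roughly a thousand of these in total
]

def to_features(example):
    # Concatenate prompt and response into one sequence; labels mirror input_ids
    # so the model learns to continue the prompt with the curated response.
    # (A real run would also mask the prompt and padding tokens out of the loss.)
    text = example["prompt"] + "\n\n" + example["response"] + tokenizer.eos_token
    ids = tokenizer(text, truncation=True, max_length=2048, padding="max_length")
    ids["labels"] = ids["input_ids"].copy()
    return ids

train_data = [to_features(ex) for ex in curated]

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lima-style-sft",
        num_train_epochs=15,            # small dataset, several epochs (placeholder value)
        per_device_train_batch_size=1,
        learning_rate=1e-5,
    ),
    train_dataset=train_data,
)
trainer.train()
```

The contrast with RLHF-style pipelines is that there is no reward model or preference data anywhere in this loop; the quality of the result rests entirely on how carefully the thousand examples are curated.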
CoDi
CoDi is a model that specializes in any-to-any generation using compositional diffusion.
Unlike other models that primarily process text, CoDi is designed for multimodal tasks, allowing users to input combinations of text, audio, and video to generate corresponding outputs.
CoDi can generate videos based on text and audio inputs or produce new text outputs based on two different text inputs.
Understanding context is crucial for CoDi, as it needs to comprehend the relationships between different modalities to provide accurate and comprehensive results.
CoDi appears to be an open-source model, potentially comparable to or even better than Meta's previously released ImageBind, although a direct head-to-head comparison has not been made yet. The code for CoDi is likely available for use. A conceptual sketch of how a model like this might compose conditioning signals from several modalities follows this list.
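To make "compositional diffusion" a bit more concrete, here is a toy PyTorch sketch of the general idea: project each input modality into one shared conditioning space, compose the embeddings, and feed the result to a denoiser. All module names, dimensions, and the simple averaging scheme are invented for illustration; this is not CoDi's actual architecture or code.

```python
# Conceptual sketch only: how an any-to-any diffusion model might compose
# conditioning signals from several modalities in one shared space. These
# modules and shapes are made up for illustration and are not CoDi's real code.
import torch
import torch.nn as nn

SHARED_DIM = 512

class ModalityEncoder(nn.Module):
    """Projects modality-specific features into the shared conditioning space."""
    def __init__(self, input_dim: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(input_dim, SHARED_DIM), nn.GELU(),
                                  nn.Linear(SHARED_DIM, SHARED_DIM))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.proj(features)

class ConditionalDenoiser(nn.Module):
    """Toy denoiser: predicts noise for a latent given a composed condition vector."""
    def __init__(self, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + SHARED_DIM, 1024), nn.GELU(),
                                 nn.Linear(1024, latent_dim))

    def forward(self, noisy_latent: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([noisy_latent, condition], dim=-1))

# One encoder per input modality; real systems use pretrained text/audio/video towers.
text_encoder = ModalityEncoder(input_dim=768)
audio_encoder = ModalityEncoder(input_dim=128)
denoiser = ConditionalDenoiser(latent_dim=256)

# Fake pre-extracted features: a text prompt plus an audio clip.
text_features = torch.randn(1, 768)
audio_features = torch.randn(1, 128)

# Composition step: average the aligned embeddings so any subset of modalities
# (text only, text + audio, audio + video, ...) can condition the same denoiser.
condition = torch.stack([text_encoder(text_features),
                         audio_encoder(audio_features)]).mean(dim=0)

noisy_video_latent = torch.randn(1, 256)
predicted_noise = denoiser(noisy_video_latent, condition)
print(predicted_noise.shape)  # torch.Size([1, 256]) -- one denoising step's output
```

The point of the shared space is that adding or dropping an input modality only changes what gets averaged into the condition vector; the denoiser itself never needs to know which combination of inputs it was given.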
Mind-to-Video
The team has developed a Mind-to-Video model that reconstructs videos from brain activity captured through fMRI.
The training process involves pairing fMRI data with corresponding videos, allowing the model to learn the relationship between brain signals and video content.
The model aims to capture what a person is remembering or perceiving by analyzing their brain activity and finding similar videos from the training set (see the retrieval sketch after this list).
Although it is not yet capable of mind reading, the model provides insights into how the brain processes and represents visual information.
The team's previous work focused on mind-to-image generation, and this Mind-to-Video model represents an impressive advancement, achieving a 45% increase in accuracy compared to previous methods.
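As a rough illustration of the retrieval intuition described in the episode (compare a new fMRI scan against the scans paired with training videos and pick the closest matches), here is a small NumPy sketch. The encoder and data are random placeholders, and the real system goes on to reconstruct new video rather than only retrieving clips.

```python
# Minimal sketch of the retrieval intuition described in the episode: embed a new
# fMRI scan and find the training videos whose paired scans look most similar.
# The encoder and data are placeholders; the actual pipeline also reconstructs
# video content rather than only looking up existing clips.
import numpy as np

rng = np.random.default_rng(0)

# Pretend training set: 500 fMRI embeddings, each paired with a video clip ID.
train_fmri_embeddings = rng.normal(size=(500, 256))   # (num_clips, embedding_dim)
train_video_ids = [f"clip_{i:03d}" for i in range(500)]

def encode_fmri(raw_scan: np.ndarray) -> np.ndarray:
    """Placeholder fMRI encoder: a random projection standing in for a trained model."""
    projection = rng.normal(size=(raw_scan.size, 256))
    return raw_scan.reshape(-1) @ projection

def cosine_similarity(query: np.ndarray, bank: np.ndarray) -> np.ndarray:
    query = query / np.linalg.norm(query)
    bank = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return bank @ query

# New scan from a subject watching (or remembering) something.
new_scan = rng.normal(size=(32, 32, 16))
query = encode_fmri(new_scan)

# Rank training clips by how similar their paired brain activity is to the query.
scores = cosine_similarity(query, train_fmri_embeddings)
top_k = np.argsort(scores)[::-1][:3]
for idx in top_k:
    print(train_video_ids[idx], float(scores[idx]))
```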
Links to Stories Mentioned:
Follow us on Twitter:
Transcript:
Farb: Good morning and welcome to AI Daily. We have, uh, three big stories for you here today. So let's just jump in. Our first story is about the LIMA model. This is pretty cool. This is a pretty large model here. Let's pull it, pull it up real quick. Um, can you tell us a little bit about this one, Connor?
Conner: Yeah, so it's a 65 billion parameter LLaMA model.
But instead of doing the real-time feedback, real-time learning through human feedback, they did a little bit of a fine-tune on just a thousand carefully curated responses. Um, and apparently it performed almost as well as GPT-4 and better than Bard, and better than DaVinci.
Farb: Yeah, it looks here like it's, uh, almost as good as, uh, as GPT-4, and it seems like maybe sometimes it did better even, but, um, that, that's pretty impressive.
What, what, what does this mean, Ethan?
Ethan: Well, I mean, this is Meta again, dropping another open-source model. Um, we'll see how the weights move, et cetera. But Meta's keeping up to date, you know, first with LLaMA and now with LIMA. And like Connor said, 65 billion parameters. So a pretty small model compared to something like GPT-3 or GPT-4.
And instead of using all this human feedback like the other models do, they took a thousand supervised examples and they simply fine-tuned on that. And like Connor said, they saw, I think, what was it, 43% better responses in certain cases versus GPT-4, which is a top-of-the-class model. So important cuz Meta is staying on top of the, you know, open-source AI realm.
And secondly, because we have another open-source model getting closer and closer to competing with these huge models being trained by OpenAI. This is another take on alignment, really.
Conner: We've seen how OpenAI does the human feedback. We've seen how Anthropic does the constitutional, the constitutional parameters. Yep. And LIMA, out of Meta, is now fine-tuning on very carefully curated responses. So we'll see what wins out here. But it's another great way to do it, looks like. So, yeah.
Farb: Connor, what do you think is the advantage in this approach?
Conner: Um, of course it takes a lot less input than human feedback takes.
This still looks like it probably takes more input than Anthropic's constitutional approach. Um, but just a thousand prompts and responses, carefully curated. A good-sized team can do that pretty quickly, so yeah.
Farb: That's, that's pretty impressive. I wonder how long they've been working on it. Oh, nice to see Meta continuing to flex their muscles in the open-source AI space, uh, or open space. You know, you wouldn't put it past them, uh, by any stretch. Let's move on to our next story. CoDi, this is pretty cool. I'm gonna load this up. Can you tell us a little bit about this, Ethan, while I get it, uh, loaded up here on the window?
Ethan: Yeah, so CoDi is what they call any-to-any, um, generation.
They're using compositional diffusion. And I guess the simplest way to put it is, you know, most of these models you're taking in text, right? Stable Diffusion. You give it a prompt, you take in some text and it's gonna output an image, or maybe you put in some text and you get a text output, like an LLM. But CoDi is a take on kind of multimodal, um, that some people talk about. So you can put in, just like you're scrolling through, they have some really cool examples, you can put in text and an audio clip and get some video out of that. So you put in some audio of a polar bear laughing and you put in some text of a polar bear, you know, sitting next to a branch, and you're gonna get a video of something like that.
Same thing, vice versa. So you can put in text and another text and you get a completely new output based on those two things. So really important, you know, to understand context of the world. Whether you give it a video, you need to understand the audio of that video, you need to understand what's happening in that video and maybe a text description, and that's the way you get complete context out of it.
So multimodal, super important. They had some really great examples in there. And yeah, it's pretty cool.
Farb: This looks a lot like something Meta dropped maybe a week or two ago, but this looks like this is maybe the open-source model. Is that right, Connor?
Conner: Yeah, Meta, uh, dropped ImageBind a couple weeks ago.
That was their, like, multi-modality, um, entrance, you could say. And this composable diffusion looks like an open-source, maybe even better, version of it. We haven't seen them compared head-to-head yet, but could be.
Farb: So can we use this yet, Connor? Do you know, have you, have you dug into it at all? Looks like they have the code available, don't they?
Conner: Yeah, I believe the code's available.
Farb: This is some powerful, freaky stuff here.
Very cool. Very cool. So, uh, let's move on to our last story here, which is Mind-to-Video. There have been a few other examples of this type of thing out there. They all seem, you know, a little bit earlier than maybe the, uh, than the demos seem to imply.
I dunno if you can see this here. Uh, there's a ground truth video, a reconstructed video. The idea here is they're sort of scanning your brain and trying to understand what video, you know, you're sort of remembering. Uh, Connor, can you tell us a little bit about how they constructed this, uh, this little experiment and demo?
Conner: Yeah. My understanding is they had their training set, which is, uh, they take fMRIs of a large group of people watching a large group of videos, and they pair up each fMRI with each video, of course. And then when they reconstruct it, the model basically looks at the new fMRI data from the new input and then compares that to the data it trained on and finds a video that was similar.
Farb: Yeah, so this model, it's not quite reading your mind, is it? It's sort of saying, this is what your mind looks like, um, when, when you see this. Mm-hmm. Uh, or remember this, and this is similar to what other people's minds looked like when they watched this. So let's create something, you know, inspired by that original video, because that must be what's going through your mind.
So we're not quite at mind-reading level, but it's still pretty cool and powerful, and I guess, you know, this may be a way that they can map the human mind. You know, we'll see if there's, uh, similarities, if we can just be like, well, okay, we don't have, um, you know, these ground truth videos anymore.
But we know from so many, uh, fMRIs that if you're thinking about horses running through a field, these are the parts of your brain that are firing. So we don't, you know, need this other data anymore. We can sort of extrapolate, uh, from that. But in the end, you always need some training data to reference things, right?
Every, every AI is using some training data. It's not just gonna make things up out of nowhere. Ethan, you look like you're chomping at the bit to say something.
Ethan: So, uh, no, this, this stuff's just very exciting to me. At least this one is 45% more accurate than some of the ones we've seen in the past.
And it almost reminds me if y'all have watched the Minority Report with Tom Cruise and they're sitting there reading their minds and, you know, you're getting this video that's outputted, that's kind of blurry. You're not entirely sure. But the fact that we really don't know much about how the human brain works and can still reconstruct that, Hey, okay, you know, all this is firing and you're probably looking at a giraffe running through the field.
I mean, the search space of that is unbelievable. And the fact that these correlate pretty well is really cool to me. So yeah, I was chomping at the bit to say how amazing this is, that it's possible.
Farb: Get outta my head.
Conner: Sorry, what's that? It's possible to take a model like this where, like, if you have enough videos, if you have enough fMRIs, maybe it could construct something more open, not in the train set. Yeah. Yeah. It's still interesting to see how this progresses. This team previously did another model that was only mind-to-image, so Mind-to-Video is pretty impressive.
Farb: Oh, it was the same team. Yep. Yep. Okay. Yeah, 45% of the time it works all the time, as they say. Absolutely. Sounds great.
Well, those are our three big stories. What are y'all seeing out there in the AI world? Uh, Ethan Connor, and feel free to jump in.
Ethan: I, I, I got to see, uh, Lambda Labs demos. Lambda Labs is a GPU infrastructure provider. We've used them. We love the team there. And seeing them drop these demos kind of reminds me of, you know, a lot of people are using Hugging Face right now when they want to show off a demo or they want to try something.
And Lambda Labs is actually, you know, getting away from just being an infrastructure kind of commodity and really delving into the whole AI community and the whole AI world. Putting these demos together, letting you show off demos, letting you deploy them on Lambda infrastructure is fantastic. So love the team there.
Give it a try if you haven't yet.
Farb: Let's show Lambda some more. Looks like they're looking for somebody that will, uh, let them train a massive LLM. Did I steal your story, Connor? They're, um, you got, you got another one? Yeah. It looks like they're, they're looking for a partner to train, to train an LLM with, which is really cool, really smart on their part.
And yeah. Congrats to anyone who gets the, the nod for some free LLM training.
Ethan: A hundred percent.
Conner: My story was, I was reading through some of the one-star reviews of the ChatGPT app. Um, one of 'em was, works too quickly, ChatGPT types too fast and the instantaneous answers are unacceptable.
So unacceptable.
Farb: Unacceptable.
Conner: Wow. Another one was disappointed that when he asked for the top, the top 10 Chinese restaurants around Round Rock, uh, it didn't have his location data.
Farb: Wow. It's a tough crowd, tough, tough crowd on the internet. I've never heard anyone on the internet give a one-star review for something or be, you know, ridiculous in their critique of it.
So, it's really nice to see ChatGPT continuing to push boundaries, as, uh...
Conner: My last and favorite is, uh, that someone was complaining it's an old model from September 2021 and they want 20 bucks a month. So...
Farb: The person, the person using it wants 20 bucks a month, or... Yeah.
Conner: For Plus.
Farb: Yeah. It reminds me of those...
Ethan: Amazon one-star reviews. Those are always fun to read through.
Farb: Yeah. Yeah, it's actually pretty cool. I put it onto, you know, my home screen. I'm trying to use it a little bit more now. My understanding is they are, you know, assuming Mr. Brockman is correct, they're using Whisper for the audio-to-text, and it is shockingly good.
You can basically mumble at the thing and it will pick up what you said word for word. I was, you know, I was actually just trying to sort of speak as quickly as possible and not enunciate, and I was, I was shocked at the, the degree of accuracy that it had.
Ethan: A hundred percent. I saw some battery complaints. Was it you who texted that, Connor?
Was it like their haptic feedback was kind of hurting the battery and...
Conner: Maybe that. The rest of the one-star reviews, a lot of them were complaining about, like, their phone overheating, even though it's like a 14 Pro, so maybe that.
Farb: I find those hard, a little bit hard to believe, but maybe those are some, you know, Samsung trolls that are just trying to get, get in there and trash on the iPhone or something like that.
That seems reasonable. We'll go with that. Uh, well, fantastic. This was another great episode. It's great to see you, gentlemen, and, uh, we'll see you tomorrow for some more AI Daily. Have a great day, everyone. See you guys.
Conner: Peace guys.