
Meta's Llama 2 | Neural Video Editing | FlashAttention-2

AI Daily | 7.18.23

Today on AI Daily, we have three big stories for you. First, Meta's Llama 2 takes the spotlight, revolutionizing open-source models with its commercial availability. Next, we discuss Neural Video Editing, which offers a game-changing way to apply a single-frame edit across an entire video. And lastly, FlashAttention-2 delivers lightning-fast GPU efficiency and supercharged performance.

Key Points

1️⃣ Meta’s Llama 2

  • Llama 2, Meta's new addition to its Llama family of open source models, is now free for commercial use.

  • Llama 2 is highly capable, comparable to GPT-3.5, and is expected to dominate the open source model landscape.

  • The release of Llama 2 creates a significant shift for AI developers, allowing them to run and fine-tune models themselves without API costs or the additional safety measures OpenAI may add.

2️⃣ Neural Video Editing

  • Neural video editing allows users to edit a single frame in a video and apply the edit to the entire video, making it accessible and powerful for beginners and those with limited resources.

  • This technology combines optical flow, ControlNets, and Segment Anything to enable interactive and real-time editing of videos.

  • Adobe and the University of British Columbia collaborated on the development of this interactive neural video editing, which is expected to be integrated into Adobe products soon.

3️⃣ FlashAttention-2

  • FlashAttention-2 is a technique for using GPUs more efficiently. It is twice as fast as the original FlashAttention, providing a significant boost in performance and cost-effectiveness.

  • The improved FlashAttention enables longer context windows for video and language models and paves the way for future hardware developments.

  • This advancement is crucial for maximizing GPU capabilities and brings us closer to unlocking the full potential of current and upcoming hardware.

🔗 Episode Links

Connect With Us:

Follow us on Threads

Subscribe to our Substack

Follow us on Twitter:


Farb: Good morning and welcome to AI Daily. Happy Llama day, folks. It's a big one. A couple of other folks dropped some pretty big news we'll try and share later today, but everything's being overshadowed by the folks at Meta dropping Llama 2. That's our first story. So a little bit about Llama 2, some important pieces about it.

If you don't know Llama, Llama is Meta's open source, uh, LLM, and they've released their second model. Uh, if you are an avid fan of AI Daily, you would've known that we predicted that Llama 2 would be commercially available, and it just kind of makes sense. Meta doesn't make their money from large language models.

They make their money from serving ads to folks. So it's not surprising that they want to, you know, mess a little bit with competitors who may be thinking that they're gonna make their money from LLMs. So Llama 2 is Meta's big new, uh, addition to the world of Llama. And, uh, Conner, tell us a little bit more about it.

Conner: Yeah, it's a complete refresh of Llama. And of course the big thing here is that instead of just being open source and free, it's now open source and free for commercial use, so anyone can go download it and get access to it. You have to fill out a form first. But they've been filling 'em out pretty quickly.

A lot of other people on Twitter have access to it. I have access to it now. And even through Azure, you can get it now. Zuckerberg posted a picture this morning with Satya from Microsoft, both the CEOs together, about their new partnership. Llama 2 is now available through Azure, just like OpenAI.

So that's a pretty big mix-up: now when you go to Azure, are you gonna use big, expensive OpenAI or cheap Llama that you can fine-tune for your own use cases? So it, it's a lot more thinking for people now. And like you said, Zuckerberg is making their money off ads. Why do they care?

Farb: Ethan, why is this, why is this huge news for the AI world?

Ethan: Yeah. Well, I mean, what does this mean, right? So we're looking at this, and it pretty much crushes any other current open source model out there. You know, outside of experiments or learning or potential future research, there's no reason for an application developer to go use anything but Llama 2.

Right now it's extremely capable. It's pretty much GPT-3.5 level, and now it's commercially available, as y'all both mentioned. So this means two things as well. Where's this gonna run, right? I think we're gonna see a lot of infra providers licking their chops right now. You know, we saw Replicate, like 20 minutes after, I think, drop an API.

You can come test it, you can come run it. Um, a bunch of the other infra providers are licking their chops saying, hey, if this takes a meaningful share of ChatGPT usage, right, through other applications or other companies and startups gonna build on this, like, where are they gonna run this?

Right? So I think we're gonna see a lot of infra providers super happy that this is commercially available. Now, that's where the real money goes when you're doing AI inferencing, et cetera. So we're gonna see a lot of that. We're gonna see a lot more experiments, and if you're an application developer or you're an actual company or you're an enterprise, like, there's real reason to shut off your current GPT-3.5 API and say, we're gonna go run this on our own.

We're gonna fine-tune this on our own. We're gonna save API costs and we're gonna customize it for our own users under our own control without any of the additional safety measures that OpenAI may add. So I think a real complete shift here for everyone, and that's why 90% of AI Twitter is freaking out this morning.

Farb: Yeah, this is a seismic shift for, for the whole open source LLM community. And it's gonna be interesting to see if this spawns a Cambrian explosion of stuff happening, uh, or if people are just going to keep doing what they were doing but shift over to Llama 2. It is, like we said, uh, the best performing open source model right now.

And you know, if they're making this available right now, it makes you wonder what they've got coming up next. Uh, watch out, GPT-4. They may be, uh, may be knocking on your door very quickly. Anything else from you guys to add on this one?

Conner: Yeah, I was gonna mention that before this, if you wanted to do something that was commercially available, your options were really either Falcon 40B or Mosaic's MPT-30B at the best, but Llama 2 13B way outperforms those, and they even have Llama 2 70B.

So very capable, very strong models. There were a couple of limitations, though. I believe some people were digging through the licensing of it, and I think it's 700 million users you can have, max, before you have to work with Meta.

Ethan: So it's a real, real, a real bandwidth log. You know, I'm really scared to hit that limit.

Conner: I'd be worried if you get that close to that. It'd be, it'd be a lot of work there, but yeah.

Farb: Good news for your 699,999,999th user. Bad news for the next one. Uh, all right, let's move on to another, I think, super awesome story: neural video editing. This is really powerful stuff. So basically these folks have come up with a way of editing a single frame in a video and applying that edit to an entire video.

This is really, really cool stuff. This seems like a big leap forward all of a sudden outta nowhere, and you know, this is particularly powerful for, uh, people who are either beginners or novices or don't have the resources of a major studio that can afford to spend a lot of time and resources editing every frame, uh, of a video.

Imagine a car is driving down a road and you take a little, you know, uh, pen tool and you draw a little line on the road in one single frame. Now the entire video of the car driving down the road understands that there's a line on the road. The car will drive over the line as opposed to the line being on top of the car.

You can lay over a texture on the car, so you know you could put the picture of a, of a dog's face on one frame, uh, of the car. It'll apply that to the whole car driving around. I was pretty blown away by this. Ethan, what did you think?

Ethan: Yeah, so we, we've had optical flow for a while, right? And optical flow was a way to do similar type work and like predict the next frames, uh, for an object moving in a video, right?

But it was not too effective. Then we saw the progress of ControlNets, being able to replace things in video. We saw Segment Anything, being able to replace things. And I think this neural video editing is like a really nice combination of all those things. And like you said, I mean, it takes a long time to edit every single frame, and it takes some real knowledge of Premiere Pro, After Effects, and whatever else you may be using.

So being able to, you know, their examples aren't perfect, but saying, hey, I'm gonna attach this little sticker to the side of the car, and I just want it to propagate for the next 60 frames. That's really cool, really powerful stuff, and all it needs, I think, is a little bit of perfecting, and you're gonna see this embedded in tools.

You're gonna see people being able to run this. You're gonna see independent filmmakers using this on TikTok, and the big studios, I think, are gonna start moving towards this way of editing. It's more effective, it's easier, and it's gonna look just as good.
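Ethan's sticker example can be made concrete with a toy sketch of flow-based propagation. This is not the method from the Adobe and UBC paper. It only illustrates the older optical-flow idea he mentions, and it assumes the per-frame motion is already known. The function name `propagate_edit` and the hand-written `flows` values are hypothetical, and a real system would estimate a dense per-pixel flow field from the video.

```python
# Toy sketch: carry a one-frame edit through a clip using per-frame motion.
# Assumption: flows[t] is the (dx, dy) motion of the edited region between
# frame t and frame t + 1. Real optical flow is a dense per-pixel field
# estimated from the video; here the values are written by hand.

def propagate_edit(start_xy, flows):
    """Return the edit's position in the edited frame and every later frame."""
    x, y = start_xy
    positions = [(x, y)]
    for dx, dy in flows:
        x, y = x + dx, y + dy
        positions.append((x, y))
    return positions


# A sticker placed at (100, 50) on a car that moves 4 px right and 1 px down
# per frame, followed for 5 more frames.
flows = [(4, 1)] * 5
track = propagate_edit((100, 50), flows)
print(track)
```

The weakness Ethan points to shows up here: any error in a per-frame motion estimate is added to every later position, so drift accumulates over a long clip.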

Farb: Yeah, absolutely. Conner, what do you think?

Conner: Yeah, this is building on layered neural editing, which is like the previous iteration of this, but that was not fast enough to do this interactively or in real time in any way, shape, or form.

So this paper, Interactive Neural Video Editing, is from Adobe and from the University of British Columbia. So with that collaboration, I'm pretty sure we're gonna see this in Premiere and other Adobe products very soon.

Farb: Yeah, making it interactive makes it a lot more practically useful for a creative who's trying to actually do something with, with the video.

It's, uh, you know, it takes it from a, a research world and, and, and actually puts it into the hands of, of creatives, uh, powerful stuff. Let's move on to our third story here, FlashAttention-2. I feel like this would normally be, you know, the big story of the day, maybe. Uh, but on a, on a day with Llama, this, uh, somehow seems a little bit less important.

Uh, FlashAttention is a, is a way of, you know, essentially being more efficient with your GPU usage and breaking down work so that it can be done more efficiently with minimal loss in actual performance. Uh, Conner, can you tell us a little bit more about FlashAttention-2?

Conner: Yeah, the original FlashAttention was about four to five times faster than the standard attention implementation that we have in PyTorch.

So that already was a huge benefit, and pretty much everyone would use it, whether it's through FlashAttention itself or through something like xFormers or Triton. Now FlashAttention-2 is twice as fast as that, so multiply that out and you get roughly 10 times as fast as the normal PyTorch implementation.

And for any application, any training, any inferencing, being able to run something that's 10 times as fast, 10 times as efficient, a tenth of the price, it's a huge benefit. It's a complete rewrite of the original FlashAttention, and it really shows that if these researchers just focus, they can completely rebuild something to be twice as fast, which we wouldn't have expected in the past.

So I'm excited for future work with H100s, with FP8, and what else we can do.
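The "breaking down work" idea Farb describes can be sketched in plain Python. This is a minimal illustration of blockwise attention with a running ("online") softmax for a single query, assuming small lists of floats. It is not the FlashAttention-2 kernel, which is fused GPU code, and it demonstrates only that processing keys and values block by block gives the same answer, not any speedup. The function names and the `block_size` parameter are hypothetical.

```python
import math


def naive_attention(q, K, V):
    """Single-query attention: softmax(q . K^T / sqrt(d)) applied to V."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, V)) / total
            for j in range(len(V[0]))]


def tiled_attention(q, K, V, block_size=2):
    """Same result, visiting keys and values one block at a time.

    Only a running max, a running normalizer, and a running weighted sum are
    kept, so the full vector of scores is never stored at once.
    """
    d = len(q)
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(V[0])
    for start in range(0, len(K), block_size):
        k_blk = K[start:start + block_size]
        v_blk = V[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new reference maximum.
        scale = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * scale + sum(weights)
        acc = [a * scale + sum(w * v[j] for w, v in zip(weights, v_blk))
               for j, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [0.3, -0.7]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [2.0, 0.1]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]

expected = naive_attention(q, K, V)
tiled = tiled_attention(q, K, V, block_size=2)
print(all(abs(a - b) < 1e-9 for a, b in zip(expected, tiled)))
```

In this sketch the blockwise result matches the naive one up to floating-point rounding. Because only the running statistics are kept, the memory needed stays small as the number of keys grows, which is the property the discussion ties to longer context windows.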

Farb: Very cool. Ethan, what's your take?

Ethan: Yeah, I mean, if you look at GPUs right now, you know, you're only using 60%, maybe at the top line, of the FLOPs of a GPU. Like, we're not at full efficiency at all, even with an A100, and like Conner mentioned, they haven't even really begun working on FP8 and the H100 and some of these new actual hardware developments we're getting.

Right? So what does this mean, like, long term? I think this is what enables these, like, longer context windows for video models. This is what enables some of the longer context windows for language models. A lot of these problems people are running into have engineering solutions, things like FlashAttention running on the current hardware we have.

I think new hardware, of course, is gonna accelerate things, but we have the new FlashAttention-2, and we don't even really have it at its full potential for H100. So the gap from where we are now to where we'll be in nine months from now for a lot of these new operators being implemented is massive.

So I'm really excited to see this one. FlashAttention has been a critical piece of the reason we have the models we have today, um, on the GPUs we have today for a reasonable price. So this is really awesome.

Farb: Yeah, you're gonna get a lot more for what you have here and start getting closer to maxing out what your GPUs can potentially do.

It's, it's pretty powerful to see. All right. Well, let's move on to the last segment of our show: what are we seeing? I wanted to give a little shout out to the folks at Mosaic, unless, am I stealing one of your stories here? I was gonna, I was gonna give a shout out to the folks at Mosaic for, uh, dropping their, uh, latest model with the 8K context window.

Uh, pretty impressive. Again, maybe getting a little overshadowed by the, the folks at Llama here. Uh, but still super impressive to see from the folks at Mosaic. Uh, they made that available today. It's open source as well. Sorry, I didn't click on your, uh, URL, Conner. I didn't realize you were gonna share this story.

Conner: No, no, no. I, I didn't have that.

Farb: I have to be honest with you. Oh, you didn't have this one. Okay, great. Well, tell us what you have.

Conner: Uh, Latent Space, another great AI podcast. If you haven't listened to it, they're amazing. They do some more, like, spread out content, like every week or two, but very in-depth, very good content.

They released a Datasets 101 episode where they really dive into the history and, like, concepts of datasets, how they work, and why they're important. A lot of it is, of course, Common Crawl, as we all joke about on AI Twitter. They really dove into the history of Common Crawl and C4, the base dataset that pretty much every AI model's trained on for about 60% of its corpus.

And then they, they have some other interesting stuff too, about like the problems with like deduplication or the problems with like copyright and AI data sets. But amazing episode. I definitely recommend listening to it. We'll link it below.

Farb: Yeah, it was impressive to see from them. We will have to challenge them to a steel cage, uh, death match, uh, à la Zuckerberg and Musk, and see who the AI podcast champions are.

I dunno, maybe I should wait until I see what they look like before I challenge them to a steel cage match. Ethan, what are you seeing?

Ethan: Yeah, shout out to LangChain. They, they dropped this thing called LangSmith, um, which is just a platform for debugging, testing, evaluating, monitoring. Um, you know, we're not the biggest users of LangChain ourselves, but I think I actually probably will be a user of LangSmith.

Um, you know, there's a couple different tools out there to monitor stuff, um, and then you can just use LogSnag, right, and be spitting out logs all the time. But I think a nice interface and a nice way to just see the outputs of your LLM and do some debugging was still needed in the space. The rest were all kind of like half-assed open source projects, so congrats to them.

I think it's cool. If you do use LangChain, it integrates with it very well.

Conner: No, I was gonna say, it's very funny. Every couple weeks we see a new, like, oh, LangChain's winning at this, or like, LMQL is doing this, or Microsoft Guidance is doing this, and it always switches back and forth.

Farb: Yeah, we need a, we need a new Microsoft Guidance story. I feel like it's been a little, little bit since we've heard from them.

Conner: Huh. Microsoft's been a bit busy pissing off OpenAI. What are they even doing over there?

All right.

Farb: Thanks for joining us for another exciting episode of AI Daily. We'll see you tomorrow for some more and have a great day, everyone.

Ethan:  See you guys.
