AI Doomers, GGML, & "Recognize Anything"

AI Daily | 6.7.23

AI Daily

Jun 08, 2023

Welcome to AI Daily! In this episode, we discuss three fascinating stories that highlight the potential of AI. We start with Marc Andreessen's thought-provoking blog post on how AI can save the world, countering AI "Doomer-ism." We delve into the implications of AI on human progress, regulation, and income inequality.

Next, we explore GGML, a tensor library for machine learning, and its significance in running large models efficiently on the edge. We examine the importance of edge computing, privacy, and the role of open-source projects like GGML in making AI more accessible to end users and developers.

Finally, we uncover "Recognize Anything," a powerful image tagging model that goes beyond object recognition. We discuss its ability to understand the relationships between objects within images, the progress made in computer vision, and its potential impact on bridging the digital and physical worlds.

Join us for an insightful conversation as we dive into these AI topics and their implications for the future. Don't miss out on the latest advancements in AI technology and its transformative potential!

Key Points:

Marc Andreessen Blog Post:

Marc Andreessen's blog post challenges the negative views on AI and emphasizes its potential to help humanity.
The internet facilitates the spread of ideas, both positive and negative, surrounding AI.
Regulation alone may not be sufficient to prevent negative consequences of AI, as it is a complex and easily accessible technology.
There is a concern that AI could exacerbate income inequality and be controlled by those in power, emphasizing the need for open-source collaboration and competition to avoid concentration of power in the hands of a few.

GGML:

GGML is a tensor library for machine learning that aims to make large models more efficient and accessible on edge devices.
The focus is on quantizing models like Llama and Whisper to smaller, faster, and cost-efficient versions that can run on CPUs and even on devices like phones.
Bringing AI models to the edge has implications for end users and application developers, particularly in terms of privacy and fundamental human freedoms.
Edge computing plays a crucial role in maintaining human liberty and giving people control over their lives and communities, with open-source projects like GGML enabling the practical implementation of models on edge devices.

“Recognize Anything”:

A strong image tagging model that goes beyond object tagging and focuses on understanding the relationships between objects in an image.
The model shows significant progress compared to previous models like BLIP and CLIP, and in some ways compared to Google's proprietary image tagging.
It is an open-source model built on Tag2Text and works well with Segment Anything, which segments different parts of an image for deeper understanding.
The development of such computer vision models is crucial for bridging the gap between the digital and physical worlds, and the hosts expect them to surpass human capabilities within the next 12 to 24 months.

Transcript:

Ethan: Good morning and welcome to AI Daily. We've got three fantastic stories for you today. We're kicking off with Marc Andreessen's new blog post, "Why AI Will Save the World." It's really an anti-take on some of this AI "Doomer-ism" that's been going around. His goal was really to paint a picture of, hey, how is AI actually gonna help us?

How do we have these arguments in a concise fashion and make sure we're not delaying progress on everything AI? So it was a long article, a lot of interesting points. Farb, what was your takeaway from it? Anything that particularly stood out for you?

Farb: Well, I was hoping to hear what Connor had to say, but I'm, I'm happy to go first.

You know, I think the biggest takeaway for me from the article is his point around the bootleggers and the Baptists. Is that what he calls 'em? I think so, the Baptists. And, you know, this is something happening everywhere, all the time, throughout human history. It's gonna continue happening, and it's happening in full force here.

And, you know, the internet makes it worse because it allows the Baptists and the bootleggers in his analogy to collude more readily, to spread the message and the meme more readily, and to infect more people with these ideas. And I think he's right about all of that.

In the end, you know, if this is what causes the end of humanity, then it seems a little odd to me that a little bit of regulation will just go ahead and stop all of that from happening. That doesn't, you know, pass the sniff test. And I also agreed with his point, which is one that I've been making for a while now, that this isn't plutonium. Plutonium isn't something that is, you know, easy to find.

It's not easy to hide plutonium. You cannot make so much plutonium, you cannot get so much plutonium and uranium together to make yourself a nuclear bomb, in such a way that it cannot be noticed by anybody else. But you can do that here. So that's a false equivalence there.

And we have a situation where what's probably going to happen, I hope this improves income inequality. But my sense is it could make it worse. Some people will control this technology and some people won't control this technology, and I agree with his points around these things get to everyone, but they get to everyone in the way that the people who have power decide that everybody else can use them.

So my hope is that AI helps create a better life for everybody on this planet. And I don't think it's going to destroy life on this planet, but I do think the risk is that it creates more inequality and, you know, again, the bootleggers are trying to distract you from that concern and get you concerned with this existential risk.

And in the end, we may end up increasing the inequality in the world because we weren't paying attention to that.

Ethan: Yeah, definitely. I really liked his framing around Baptists and bootleggers, especially, you know, that he really points out the fact that, hey, bootleggers win a lot of the time, whether it's with the big banking crisis or whether it is literally talking about the bootleggers of the past.

So they have an advantage. They win a lot of the time. And I think it's important that his article covers, hey, how do we actually have these conversations? So Conner, what do you think in terms of some of his points around how we combat AI doomerism? What really stood out to you?

Conner: Yeah, he goes through a lot of risks.

He goes through several risks and really combats each of them. But he ends with this: the main risk, in his opinion, is that we, the West, we, the United States, do not have the best AIs. He thinks that the real risk is China having the best AIs and China using them for control. He really thinks that open source is very important, that not having market capture, having competition in AI, and multiple AIs really working in unison is what's really important.

Farb: So yeah, this is an arms race. Absolutely. And you don't wanna be on the wrong side of it.

Ethan: Absolutely. And just like you said, Farb, the genie is outta the bottle. So hopefully these conversations can continue in a positive light like Andreessen is pointing out here. But onto our second story, GGML. So GGML is a tensor library for machine learning, and it's used by the famous llama.cpp and whisper.cpp.

These are pretty much ways of asking, hey, how can we take these large models like LLaMA, how can we take these large models like Whisper, and get them running efficiently, quote, on the edge? So on the edge can mean on your actual device, similar to when we talked about Apple doing these language models on device.

The whole goal of GGML is, hey, how can we bring more of these models on device, quote unquote, to the edge? So Conner, can you tell us a little bit more about this and why this is important?

Conner: Yeah. GGML is really about taking these big models like LLaMA, like Whisper, and quantizing them down to very small, 4-bit, very efficient models.

They can work on CPUs. They also work extremely well on Apple silicon, which models famously usually do not. A lot of the effort they're going through is just to make these models smaller, make them faster, make them more cost efficient, and, as you said, make them available on the edge, both on actual users' devices like an iPhone, like a MacBook, or even something like Cloudflare Constellation, which can only run very small models.

But the more we can make these models efficient and smaller and faster, the more the edge becomes possible for these models. And it's really a contrast between OpenAI's huge data centers for GPT-4 and quantizing LLaMA, open source, down to a very small, very fine-tuned model that can run on your phone even.
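
To make the idea of quantizing a model down to 4 bits concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not GGML's actual file format or API: the block size of 32, the function names, and the 2-byte-per-block scale used in the size estimate are all hypothetical choices made for this example.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # assumed block length, chosen only for this illustration


def quantize_block(values: List[float]) -> Tuple[float, List[int]]:
    """Map floats to signed 4-bit codes in [-8, 7] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale: float, codes: List[int]) -> List[float]:
    """Recover approximate floats from the 4-bit codes."""
    return [scale * c for c in codes]


def quantize(weights: List[float]) -> List[Tuple[float, List[int]]]:
    """Split the weights into blocks and quantize each block independently."""
    return [
        quantize_block(weights[i:i + BLOCK_SIZE])
        for i in range(0, len(weights), BLOCK_SIZE)
    ]


def dequantize(blocks: List[Tuple[float, List[int]]]) -> List[float]:
    """Concatenate the dequantized blocks back into one weight list."""
    restored: List[float] = []
    for scale, codes in blocks:
        restored.extend(dequantize_block(scale, codes))
    return restored


def packed_size_bytes(n_weights: int) -> int:
    """Rough storage estimate: 4 bits per weight plus an assumed
    2-byte scale per block (partial blocks counted as full)."""
    n_blocks = (n_weights + BLOCK_SIZE - 1) // BLOCK_SIZE
    return n_blocks * (BLOCK_SIZE // 2 + 2)


if __name__ == "__main__":
    weights = [((i * 37) % 101 - 50) / 50.0 for i in range(64)]
    restored = dequantize(quantize(weights))
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print("largest round-trip error:", worst)
    print("4-bit bytes:", packed_size_bytes(64), "vs 32-bit float bytes:", 64 * 4)
```

The trade-off is the one described in the conversation: each weight shrinks from 32 bits to roughly 4 to 5 bits, at the cost of a rounding error bounded by half the block's scale.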

Ethan: So, definitely, a ton of technical implementation here and a ton of on-the-ground work on what it actually means to do inference on the edge, with GGML and the .cpp projects and quantization like Conner talked about. But when we kind of bubble up, Farb, what do you think this means for actual end users and people developing applications, when it's so much easier to

build these models onto someone's device? You know, the first thing I think of is the application to privacy, for example. But what does it mean to you when we get more of this AI, quote unquote, to the edge?

Farb: You know, if you know me, you know that I am sort of semi-obsessed with edge computing. When we were building Coinmine, we were pretty obsessed with the idea of using Coinmine for edge computing.

And you know, I'm not so necessarily obsessed with edge computing cause I think everything needs to be done on the edge. I'm sort of more mesmerized by this beautiful back and forth between things happening in the cloud, things happening on the edge, things happening in the cloud, things happening on the edge.

There seems to be this beautiful, you know, interplay between these two things. And I don't think you can overstate the importance of this with regards to, like you said, Ethan, privacy, but actually probably all fundamental human freedoms. Computation is what has driven the world for the past couple of decades and will probably drive the world for hundreds or thousands of years, and without the sort of specter of edge computing hanging over the heads of people who do cloud computing,

they would just take over the world wholesale. Without edge computing, we would be lost. It is one of the great forces that will continue to put pressure to keep humans free and to give people liberty, and to give people jurisdiction over their own lives and their own communities. Without edge computing humans go away.

Conner: Yeah, people talk a lot about open source, but it doesn't really matter that a model is open source if you can't run it, which is why projects like llama.cpp are very important.

Ethan: Yeah, and it's, it's so insane to me how directly connected this is to the first story as well. I mean, this is the, you know, on the ground technical implementation of why the Genie is out of the bottle.

You can run these models, LLaMA and Whisper and all the future ones, on Apple silicon, et cetera, on your device at home without an internet connection. So the genie's out of the bottle, it connects to that. And it's cool to see, you know, real backing and implementation at the technical level of this.

Farb: So I'll, I'll tell you though, you can thank Steve Jobs for this, because Steve Jobs was a humanist. A lot of people liked to peg Steve Jobs as a business person or a product person. He was great at those, but that's not what was his singular drive. It was being a humanist, and if we didn't have his humanist worldview, you wouldn't have the Apple Vision Pro.

The Apple Vision Pro is possible because there's an M2 chip in it. The reason that the M2 chip exists is because Apple decided they needed something that can do this amount of computation with this amount of energy consumption, and nobody else in the world was delivering that to them. And so to meet their humanist vision, they had to develop a new technology, something that could provide this level of computation for less energy than was available at the time.

Without that, you wouldn't have the Vision Pro, and they may end up being one of the great forces that continues to push edge computing as a viable reality.

Ethan: Absolutely. And the same thing with privacy as well, it connects all these dots, so I completely agree. Let's move on to our third story, Recognize Anything.

So this is, quote, a strong image tagging model. Pretty much the fundamental goal of this, as with all computer vision, is not just object tagging, but how do objects relate in an image? You know, if you see a kitchen, you don't want to just point out that there's maybe a knife and a stove and a countertop.

You wanna point out how these things relate to each other. So it's a really important model, and they've made a lot of progress on it. Conner, can you kind of comment on what exactly the progress here was and what it looks like?

Conner: Yeah, so it's really about taking the different parts of the images, like you said, and building them more cohesively together.

We've seen models like BLIP, like CLIP, and we've seen even Google's main server image tagging, which is extremely proprietary, and this is a jump on that. They show that this is better than BLIP and CLIP, but this is even in some ways better than Google's proprietary image tagging. This is entirely open source, built on Tag2Text, and they also showed how it contrasts and works very well with Segment Anything,

a previous project that we talked about a few episodes ago that can segment the different parts of the image. Then you apply Recognize Anything on the segmented parts and understand even more deeply how, hey, this is a plate of food with tacos and pineapple on it, and then in the background you have a table with a kitchen in the background.

So very big model. Very excited to see more demos of it. Open source.

Ethan: Absolutely. Yeah. I think it's really important that we're continuing to see this progress on the computer vision side. This is stuff that people have thought has been, you know, almost solved for five, ten years. Oh, yes, we can easily tag a model, but the representations and the attributes within it are super important.

Farb, anything you wanna comment on specifically here? You know, I think with models like this, we'll see applications to Vision Pro, like you're very interested in, but anything that stood out here to you?

Farb: You know, I think this whole path of development is absolutely necessary for creating this, you know, new mesh between the digital and physical worlds.

I think it is a type of technology that, you know, unfortunately, in not too long, people will not be that mesmerized by anymore. Or maybe we will be mesmerized by it, but for a different reason. Right now we're mesmerized by it because it's getting close to doing what a human can do.

But I think that will actually shoot so far past what humans can do so quickly that we will forget that we were ever impressed by the fact that it could do what a human can do. Soon it'll be able to tell you things that no human might really even be able to understand about an image. Oh, it looks like the cilantro that you have on your tacos is this specific, you know, genetic variation of cilantro, and it looks like it's been, you know, in the fridge for two days and then sitting on the counter for five hours.

So I think this stuff will leapfrog human capability very quickly here in the next probably 12 to 24 months if it hasn't already in some ways.

Conner: Yeah, I'm excited for a future where you can put on your Vision Pro and walk into your kitchen and look in your pantry, and it hovers above your stove: you can use these recipes, grab these ingredients from the pantry, throw this in the pan. The combination of Recognize Anything and Segment Anything, and more models like this that we'll have in the future, unlocks a lot, especially with AR.

Ethan: So what was very meta to me about this as well is that it didn't require any additional fine-tuning or supervision. We're finding more and more of these emergent capabilities of these models. You know, maybe we'll do a philosophy kind of podcast episode at one point, because it's very meta when you think about it.

Farb: That's kind of what I mean by, we're just gonna stop being impressed by this. We're like, oh, whoops. It turns out that these LLMs can do this stuff super easily, better than people. Whoops. We didn't even realize that. It's really cool to see.

Ethan: Absolutely. Well, as always, what else are you all seeing?

Either of y'all, kick off with anything you're seeing outside of these stories today.

Farb: Well, we did this story on Japan a few episodes ago about them essentially deciding that, you know, copyright wasn't gonna apply to training of LLMs and things like that. And I read something additional today. I didn't dig into it deeply enough to definitively say exactly what it is, but it seems that what they're saying is that they want you to be able to use, say, some anime in training a model.

But if your actual output, if what you use that trained model on, ends up creating something that is too close to the original, then you may actually be in some form of copyright infringement. So I thought it was interesting. It seems like they're adding a little bit more nuance to what they're trying to accomplish with regards to copyrights and what people are able to do and not able to do freely.

Conner: That's always been a worry. Like even, say, with Stable Diffusion, people have shown some demos of being able to pull out full original images from it that are the exact same as images already on the internet. So always a worry, and I think future models will have less of that problem.

Farb: It seems pretty straightforward though, in the sense that, you know, I can get a full image of anything, any, anywhere, and I'm not really allowed to use it for my own commercial purposes.

There is a question of how different, you know, at what point does it become different enough that I'm allowed to use it, or is it parody or news or something like that. But it seems like that's not the weirdest part. Hopefully the way they're setting it up is such that it almost works with existing copyright laws.

Yes. You know, like just don't do what you couldn't do before and you can do anything else.

Conner: It doesn't matter what speed it's happening at. It's whether you're doing the exact same thing or not.

Ethan: Yeah, absolutely. Conner, what about you?

Conner: Yeah, I saw Lightning AI show LLaMA-Adapter fine-tuning Falcon 40B in just 30 minutes.

Normally that would take eight hours, that would take 30 hours on eight A100s, and it would take me at least five hours to do myself. This is another amazing story of fine-tuning becoming faster, fine-tuning becoming cheaper, and it's exciting for future uses and what people can do with these models.

Farb: I mean, 30 minutes just seems almost laughably insane, to the point where, when will we stop talking about how quickly these things can accomplish it?

I don't know, it's tough to wrap your head around something like that happening in 30 minutes.

Ethan: Yeah, I think we'll start to see fine-tuning on the edge too. You know, more real-time fine-tuning for some of these models, whether it's with GGML or whether it's just with these new implementations like Lightning. You can imagine a world in which developers are coding applications to do more frequent fine-tunings, which could be important.

So super cool to see.

Conner: Some people have thought that, like, Apple and their dictation and their text transformers will be running overnight to fine-tune based on how you autocorrected during the day, which makes a lot of sense.

Ethan: Yeah. A hundred percent.

I got to see George Hotz. This was a couple of days ago, but George Hotz has been working to get AMD drivers working with some of these ML toolkits and ML models, and bless his soul, it was not the most successful outcome. There was a meme of him doing some meditative YouTube videos after doing it.

I think, you know, it's funny to see the on-the-ground-level work. George is an extremely smart developer, an extremely smart technical person, and seeing people do this on-the-ground work of, hey, why does NVIDIA have such a big moat on GPUs? Well, this is why. At the end of the day, getting these models to work with AMD and some of their drivers has been a stab in the dark for many years now, and I think we're gonna continue to see AMD a little bit far behind in this.

But as always, thank you all for tuning into AI Daily and we will see you again tomorrow. Peace guys.
