<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[AI Daily]]></title><description><![CDATA[The go-to podcast to get the daily MUST KNOW news in the rapidly changing world of AI. Join us for insights and interviews on the most useful AI news and how to leverage it to drive your goals forward.]]></description><link>https://www.aidailypod.com</link><image><url>https://substackcdn.com/image/fetch/$s_!okPu!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c457f34-09fd-45ce-8a65-f6a04c067179_350x350.jpeg</url><title>AI Daily</title><link>https://www.aidailypod.com</link></image><generator>Substack</generator><lastBuildDate>Wed, 08 Apr 2026 08:23:08 GMT</lastBuildDate><atom:link href="https://www.aidailypod.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[AI Daily]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[aidailypod@gmail.com]]></webMaster><itunes:owner><itunes:email><![CDATA[aidailypod@gmail.com]]></itunes:email><itunes:name><![CDATA[AI Daily]]></itunes:name></itunes:owner><itunes:author><![CDATA[AI Daily]]></itunes:author><googleplay:owner><![CDATA[aidailypod@gmail.com]]></googleplay:owner><googleplay:email><![CDATA[aidailypod@gmail.com]]></googleplay:email><googleplay:author><![CDATA[AI Daily]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Microsoft Azure ChatGPT | SemiConductors | NVIDIA-HuggingFace Partnership]]></title><description><![CDATA[AI Daily | 8.15.23 [Listen]]]></description><link>https://www.aidailypod.com/p/microsoft-azure-chatgpt-semiconductors-d79</link><guid isPermaLink="false">https://www.aidailypod.com/p/microsoft-azure-chatgpt-semiconductors-d79</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Wed, 16 Aug 2023 00:00:12 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/136093394/ce28a704d9f08c2b96b28f4fcb877794.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Welcome back to AI Daily. In this episode, hosts Conner, Ethan, and Farb delve into three fascinating stories. First, Microsoft introduces an enterprise-specific ChatGPT version, self-hosted on Azure's private cloud. Next up, global competition intensifies as countries race to bolster semiconductor production. Germany secures an $11 billion TSMC chip plant, while Texas welcomes a $1.4 billion semiconductor facility. Finally, Nvidia and HuggingFace join forces to enhance cloud offerings. 
Nvidia aims to expand its cloud services and connect directly with developers, positioning itself as more than a chip manufacturer.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; Microsoft Azure ChatGPT</h4><ul><li><p>Microsoft unveils Azure ChatGPT for enterprises, self-hosted on Azure's private cloud.</p></li><li><p>Repository briefly removed amid potential conflicts, highlighting unique deployment benefits.</p></li><li><p>Tailored for businesses, offering data control and secure sandbox for AI-powered interactions.</p></li></ul><h4>2&#65039;&#8419; SemiConductor Manufacturing</h4><ul><li><p>Global competition heats up as countries vie for semiconductor manufacturing dominance.</p></li><li><p>Germany secures $11 billion TSMC chip plant, bolstering European presence.</p></li><li><p>Texas welcomes $1.4 billion semiconductor facility, reflecting chips' pivotal role in technology evolution.</p></li></ul><h4>3&#65039;&#8419; NVIDIA-HuggingFace Partnership</h4><ul><li><p>Nvidia teams up with Hugging Face, aiming to strengthen cloud services presence.</p></li><li><p>Nvidia's expansion into direct cloud hosting aims to compete with established players.</p></li><li><p>The collaboration enhances accessibility to GPUs, potentially reshaping Nvidia's cloud industry involvement.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p><a href="https://twitter.com/alphasignalai/status/1691140233071329280?s=46&amp;t=sTbeB89T07xhM3ob89LHRg">Microsoft Azure ChatGPT</a></p></li><li><p><a href="https://www.reuters.com/technology/taiwan-chipmaker-tsmc-approves-38-bln-germany-factory-plan-2023-08-08/">SemiConductor - Germany</a></p></li><li><p><a href="https://www.axios.com/local/dallas/2023/08/07/texas-semiconductors-chips-sherman">SemiConductor - Texas</a></p></li><li><p><a href="https://twitter.com/drjimfan/status/1688954935248027648?s=46&amp;t=ziEc9CMi8q_PlJ34DMJVkA">NVIDIA-HuggingFace</a></p></li><li><p><a href="https://twitter.com/itsandrewgao/status/1689634145717379074?s=46&amp;t=ziEc9CMi8q_PlJ34DMJVkA">Google Scholar Tweet</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Microsoft Azure ChatGPT | SemiConductors | NVIDIA-HuggingFace Partnership]]></title><description><![CDATA[Watch now (12 mins) | AI Daily | 8.15.23 [Watch]]]></description><link>https://www.aidailypod.com/p/microsoft-azure-chatgpt-semiconductors</link><guid isPermaLink="false">https://www.aidailypod.com/p/microsoft-azure-chatgpt-semiconductors</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Wed, 16 Aug 2023 00:00:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$id_!e01d78c!/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c457f34-09fd-45ce-8a65-f6a04c067179_350x350.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome back to AI Daily. In this episode, hosts Conner, Ethan, and Farb delve into three fascinating stories. 
First, Microsoft introduces an enterprise-specific ChatGPT version, self-hosted on Azure's private cloud. Next up, global competition intensifies as countries race to bolster semiconductor production. Germany secures an $11 billion TSMC chip plant, while Texas welcomes a $1.4 billion semiconductor facility. Finally, Nvidia and HuggingFace join forces to enhance cloud offerings. Nvidia aims to expand its cloud services and connect directly with developers, positioning itself as more than a chip manufacturer.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; Microsoft Azure ChatGPT</h4><ul><li><p>Microsoft unveils Azure ChatGPT for enterprises, self-hosted on Azure's private cloud.</p></li><li><p>Repository briefly removed amid potential conflicts, highlighting unique deployment benefits.</p></li><li><p>Tailored for businesses, offering data control and secure sandbox for AI-powered interactions.</p></li></ul><h4>2&#65039;&#8419; SemiConductor Manufacturing</h4><ul><li><p>Global competition heats up as countries vie for semiconductor manufacturing dominance.</p></li><li><p>Germany secures $11 billion TSMC chip plant, bolstering European presence.</p></li><li><p>Texas welcomes $1.4 billion semiconductor facility, reflecting chips' pivotal role in technology evolution.</p></li></ul><h4>3&#65039;&#8419; NVIDIA-HuggingFace Partnership</h4><ul><li><p>Nvidia teams up with Hugging Face, aiming to strengthen cloud services presence.</p></li><li><p>Nvidia's expansion into direct cloud hosting aims to compete with established players.</p></li><li><p>The collaboration enhances accessibility to GPUs, potentially reshaping Nvidia's cloud industry involvement.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p><a href="https://twitter.com/alphasignalai/status/1691140233071329280?s=46&amp;t=sTbeB89T07xhM3ob89LHRg">Microsoft Azure ChatGPT</a></p></li><li><p><a href="https://www.reuters.com/technology/taiwan-chipmaker-tsmc-approves-38-bln-germany-factory-plan-2023-08-08/">SemiConductor - Germany</a></p></li><li><p><a href="https://www.axios.com/local/dallas/2023/08/07/texas-semiconductors-chips-sherman">SemiConductor - Texas</a></p></li><li><p><a href="https://twitter.com/drjimfan/status/1688954935248027648?s=46&amp;t=ziEc9CMi8q_PlJ34DMJVkA">NVIDIA-HuggingFace</a></p></li><li><p><a href="https://twitter.com/itsandrewgao/status/1689634145717379074?s=46&amp;t=ziEc9CMi8q_PlJ34DMJVkA">Google Scholar Tweet</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul><div><hr></div><h3>Transcript</h3><p><strong>Conner:</strong> We're back once again to get into three great stories. I'm your host Conner, joined by Ethan and Farb. Our first story today is Microsoft's Azure ChatGPT. So they launched on GitHub a Microsoft Azure ChatGPT for Enterprise. It's specifically tailored for enterprises in that it's essentially the same thing as ChatGPT, but open source and entirely self-hosted on Azure.</p><p>So instead of having to connect to OpenAI's servers, everything's 
in your own private little sandbox on Azure's specific private cloud. Uh, funnily enough, a couple days after that, Microsoft actually took down this repo because, apparently, I would imagine, OpenAI's kind of mad about this. Farb, what do you think happened here?</p><p><strong>Farb:</strong> There seems to be a backup for it, so not sure what's going on with that. It seems like this is something that could be very popular. It's tough to know what adoption is gonna be like. We're in a world where there are so many options like this. They're all slightly different in their own way, but this is a pretty good combination of teams.</p><p>You know, if you're a CIO, I can see, you know, having Microsoft and OpenAI as the things behind it, as opposed to some random developer who took LLaMA 2 and created a version of something like this. You may be more inclined to, uh, you know, if you're trying to CYA yourself as a CIO,</p><p>you're gonna wanna have some bigger names behind it. So I could see the adoption being big. I haven't, you know, heard a ton of people start using it. Obviously it's pretty new. Uh, but I'm pretty excited to see how it gets adopted.</p><p><strong>Conner:</strong> I mean, it's very easy to host yourself. You have features like Azure Active Directory for login.</p><p>You have features like uploading files, just the kind of stuff you would never actually get in your standard ChatGPT, 'cause it's enterprise-level stuff. It's not available there; you wouldn't find that. Ethan, what do you think about this?</p><p><strong>Ethan:</strong> Yeah, I think enterprises definitely need this. It reminds me of when people deploy their own Oracle instance for their own enterprise environment, right?</p><p>So their open source repo is pretty much just a web app wrapper and some network protocol setup for Azure. It's just an easy way to deploy this little instance. So you're not actually open-sourcing the ChatGPT side at all. You're not open-sourcing the side with your own GPUs. It's really like they released a web app, and they released some templates for Azure to make sure your networks are fine and you have some places to upload your own data.</p><p>So, you know, similar to the Oracle days, similar to what enterprises need, but they probably removed it because it's not too much of an open source repo, just a chat web app and some template guidelines.</p><p><strong>Conner:</strong> Yeah. Um, personally I think OpenAI has tried to compete for some enterprise use cases. They've added the stuff recently of not keeping data.</p><p>This, on the other hand, keeps all your data, but in a secure enclave in your own Azure cloud. So I wouldn't be surprised if OpenAI was a little bit annoyed by this repo.</p><p><strong>Farb:</strong> Sam just tweeted, like literally, I don't know, a little bit ago, trying to clarify that OpenAI does not use anything that you do through their API for training or anything, maybe anything through ChatGPT directly.</p><p><strong>Conner:</strong> Our next story up today, though: we have semiconductor manufacturing. Of course, semiconductors are very important for all modern computing, AI especially. Any GPUs from Nvidia, AMD, et cetera, are all reliant on semiconductors. 
Semiconductors are so far mostly manufactured in Taiwan and China.</p><p>Now, recently, we're starting to see that branch out, first with Germany winning an $11 billion TSMC chip plant. That's gonna be the first in Europe. Uh, I think TSMC committed $3.8 billion to that and Germany committed $5 billion to that. So pretty nice collaboration there. And then in north Texas, we have the Silicon Prairie north of Dallas, where there's another, I think, $1.4 billion semiconductor plant coming in.</p><p>So, very exciting. Ethan, what do you think about this?</p><p><strong>Ethan:</strong> It's a manic race everywhere. I mean, chips are definitely the new oil here, so you have $280 billion from the CHIPS Act. You have pretty much every global state actor trying to have tax incentives or subsidies, et cetera, to enable more chip manufacturing facilities.</p><p><strong>Conner:</strong> Just thinking about what he's gotta say; I'm sure he said something that was good. Am I cut? You dropped out for a bit there, but I'm sure it was recorded on your side.</p><p><strong>Ethan:</strong> Well, I'm back again. As I was just saying, chips are definitely the new oil, so you have every single state actor and every single global powerhouse, every single company, every single person with a manufacturing facility trying to get in on $280 billion of the CHIPS Act, trying to get in on every subsidy from every nation. So I think it's really cool to see, especially here in America, that we're trying to bring back some of this manufacturing to Texas.</p><p>You know, we know a few people in Texas trying to convert some of their old facilities into chip manufacturing facilities. And this is everything too, not just AI chips. This is everything from small Bluetooth chips to radio-frequency chips, et cetera. So I think this is just important for American dynamism, as some call it.</p><p><strong>Conner:</strong> Yes, love American dynamism. The $1.4 billion actually was the Texas CHIPS Act that Greg Abbott approved in June. Mm-hmm. That's just funding for any general chips. And then the plant itself is actually $5 billion, so almost half the size of the Germany one. So, yeah, very exciting. Farb, what do you think about this?</p><p>Where do you think chip manufacturing is going in the future?</p><p><strong>Farb:</strong> I don't think this is stopping anytime soon. This is gonna just keep accelerating. It's smart for this type of manufacturing to be distributed and not, uh, centralized in one part of the world. You want redundancies in important systems like this, and people are just flat out competing to, uh, you know, get the business. Since there's gonna be such a massive demand for GPUs, if you're making them where you are, then you have a business there. You have, you know, real income and revenue coming to your part of the world. So there may not be an easier way in the world to guarantee some, uh, future income than by building, you know, chip manufacturing where you are. Saxony is, you know, well known for precision manufacturing.</p><p>They've been doing that for, uh, probably hundreds of years, if not more. Everything from, uh, high-precision watchmaking to carmaking to now chipmaking. So, uh, nice work, Saxony. And obviously Texas is not shy about getting into, uh, big industrial parts of the, you know, business cycle. 
So, uh, pretty cool to see that happening as well. We'll probably see even more of that in Texas.</p><p><strong>Conner:</strong> I think Germany and Texas were very clearly the targets for manufacturing in their respective unions, and I'm glad that we're starting to see the plants being built there. And I'm sure a lot of Germans are in Texas, a lot of Germans love Texas, maybe even Pennsylvania too. Germans love Pennsylvania, so I'm sure we'll see all that.</p><p>Oh yeah. Beautiful, beautiful. Our last story today: Nvidia and Hugging Face announce a new partnership. This comes on the heels of the AMD partnership that Hugging Face announced a little bit earlier. This is Nvidia trying to keep up with the open source community. Nvidia is sometimes accused of not being very open source friendly.</p><p>They do have a lot of open source libraries, but on the cloud side, on the hosting side, Nvidia loves enterprise, loves their big-money clients. So now, partnering with Hugging Face, they're still going after their enterprise big-money clients, but with a little bit of an open source twist. That's honestly nice to see.</p><p>Ethan, what'd you think about this? What'd you see about this?</p><p><strong>Ethan:</strong> Uh, mainly Nvidia wants their cloud to compete. So, you know, you have Lambda Labs and you have all these other cloud providers that are kind of sitting right where developers are sitting. So NVIDIA's DGX Cloud that they've been putting out, they've been trying to sell to enterprises like you mentioned, but with a partnership like Hugging Face, now anyone who's deploying stuff can just deploy to an H100 on the DGX Cloud.</p><p>So they're just kind of putting their tentacles everywhere, which I think is cool. We need more GPUs, we need more access to them. So if you're on Hugging Face and you wanna use it, you don't really care if it's on DGX Cloud or Lambda or anything else. So the more players the better.</p><p><strong>Conner:</strong> Yeah. Nvidia so far has kind of followed a dealership model, like with cars. So they make the cars, they make the chips, but instead of selling them to people directly, instead of selling the hosting directly, they're selling 'em to dealers, essentially, like CoreWeave or Lambda or many others. Mm-hmm. And now they're trying to get in the field directly, just like Tesla does with cars.</p><p>So, yeah. Farb, what do you think? What do you think about Nvidia?</p><p><strong>Farb:</strong> I think one of the things Nvidia said is, hey, we actually have a cloud service here. I think I literally remember reading something along those lines. Uh, they've had it; it's just not well known. They're trying to make it more known, for obvious reasons. Uh, I think another cool thing they announced here was, you know, training cluster as a service.</p><p>They just wanna make it easy for developers to use their chips, and this seems to be a great way to do it, and a great way to get people to know that NVIDIA's actually a player in the cloud space, not just a manufacturer of the chips. So, you know, their business on the cloud side could grow massively and be as big as the rest of the business of Nvidia has been up till now.</p><p>Uh, so they would be smart, obviously, to keep going bigger in their bets.</p><p><strong>Conner:</strong> They, of course, have pretty much complete dominance in actually making chips, but connecting to customers? Not really. And customers, 
if they have other chips in the future that aren't from Nvidia, they don't really care.</p><p>So the market will like it. Well, those were our three stories today, guys. On to what we're seeing. I saw that if you go to Google Scholar, which of course is the best way to search for research papers, and you look up "as an AI language model", you'll see hundreds, thousands of papers that were co-authored by ChatGPT and not given those co-author credentials.</p><p>So very interesting to see the amount of papers that are clearly written by ChatGPT. And we're sure there are hundreds or thousands more that weren't dumb enough to include "as an AI language model" and aren't so easily spotted.</p><p><strong>Ethan:</strong> Wow. So they really just left that in, huh?</p><p><strong>Farb:</strong> Yeah, yeah. That was amazing.</p><p>I saw that too. I thought that was pretty hilarious. It's almost tough to believe. I mean, is anybody not reading these things before they submit them?</p><p><strong>Conner:</strong> You write a research paper, you write it once. They don't really reread it, so, wow.</p><p><strong>Farb:</strong> Fascinating. They're so boring the authors can't even read them themselves.</p><p>So boring and incomprehensible, with a bunch of word-salad garbage to make themselves sound intelligent when they're not really even saying anything, that they cannot even stomach reading their own papers. Welcome to academia in 2023, ladies and gentlemen.</p><p><strong>Conner:</strong> What about you guys? Ethan, what have you seen?</p><p><strong>Ethan:</strong> Uh, I saw Play.HT did an instant voice cloning, so they're kind of another, you know, AI voice model, AI audio model in the space. And they had some really, really good ones: less than a second of latency, and voice cloning with voice emotions. I think they're finally putting together a lot of the pieces.</p><p>I don't know if they're using their own model yet, but they've got a really cool pipeline down. So link below, but really great results.</p><p><strong>Conner:</strong> Amazing.</p><p><strong>Farb:</strong> Uh, it looks like Nvidia has released the code for Neuralangelo, which is pretty cool: creating, you know, immersive 3D environments from 2D videos, which was a pretty crazy demo they did, I don't know, maybe a month ago or something like that. Uh, but it seems they're now making the code available, which seems pretty powerful.</p><p>I haven't seen anyone use it yet, but I'm guessing we'll see some of that. Maybe you can combine it with your a16z, uh, AI Town. Yeah. Uh, get your 3D to do something cool.</p><p><strong>Conner:</strong> AI Town with Neuralangelo. Very exciting. Well, wonderful show today, guys. Well, if you've watched this far, you probably love our hats, on Hat Daily.</p><p>Um, thank you everyone for watching. We'll see you tomorrow. Thank you guys.</p>]]></content:encoded></item><item><title><![CDATA[End of LK-99? | MK-1 | StableCode]]></title>
<description><![CDATA[AI Daily | 8.08.23]]></description><link>https://www.aidailypod.com/p/end-of-lk-99-mk-1-stablecode-d10</link><guid isPermaLink="false">https://www.aidailypod.com/p/end-of-lk-99-mk-1-stablecode-d10</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Wed, 09 Aug 2023 00:01:01 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/135848778/b401ccdcddcd5e99661d2bbc25325a43.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Welcome back to AI Daily! In this episode, we explore three intriguing stories in the world of AI and technology. First up, we discuss the possible end of LK-99, a ferromagnetic material that sparked excitement about superconductivity. Our second story delves into MK-1, a project aimed at enhancing the inference speed of language models. Lastly, we cover the launch of StableCode by Stability AI. This coding model, boasting a 16,000-token context window and 3 billion parameters, raises questions about its distinctiveness compared to other fine-tuned models.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; End of LK-99?</h4><ul><li><p>LK-99, initially hailed as a potential superconductor, faces skepticism as evidence of superconductivity remains elusive.</p></li><li><p>Despite uncertainty, the excitement around LK-99 showcases the power of scientific engagement and the pursuit of breakthroughs.</p></li><li><p>The episode debates whether LK-99's impact on science engagement outweighs its unconfirmed superconducting potential.</p></li></ul><h4>2&#65039;&#8419; MK-1</h4><ul><li><p>MK-1 project aims to make efficient model inference accessible to all.</p></li><li><p>MK-1's compression codec MKML and GPU optimization promise faster model outputs.</p></li><li><p>Democratizing AI capabilities through MK-1 could reshape AI deployment across various domains.</p></li></ul><h4>3&#65039;&#8419; StableCode</h4><ul><li><p>StableCode, Stability AI's coding model, hits the scene with a 16,000-token context window and 3 billion parameters.</p></li><li><p>Questions arise about StableCode's uniqueness and distinct contributions compared to other fine-tuned models.</p></li><li><p>Stability AI's continuous innovation underscores the evolving landscape of fine-tuned AI models.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p><a href="https://twitter.com/alexkaplan0/status/1688731316777275392?s=20">End of LK-99</a></p></li><li><p><a href="https://mkone.ai/blog/introducing-mk1">MK-1</a></p></li><li><p><a href="https://stability.ai/blog/stablecode-llm-generative-ai-coding?utm_source=twitter&amp;utm_medium=website&amp;utm_campaign=announcement">StableCode</a></p></li><li><p><a href="https://twitter.com/Scobleizer/status/1688982554320551936">Robert Scoble Tweet</a></p></li><li><p><a href="https://supabase.com/blog/hugging-face-supabase">HuggingFace/Supabase</a></p></li><li><p><a href="https://twitter.com/_akhaliq/status/1688576549560184832?s=20">Mortal Kombat Video</a></p></li><li><p><a href="https://101.school/about">101 School</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a 
href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[End of LK-99? | MK-1 | StableCode]]></title><description><![CDATA[AI Daily | 8.08.23]]></description><link>https://www.aidailypod.com/p/end-of-lk-99-mk-1-stablecode</link><guid isPermaLink="false">https://www.aidailypod.com/p/end-of-lk-99-mk-1-stablecode</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Wed, 09 Aug 2023 00:01:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c457f34-09fd-45ce-8a65-f6a04c067179_350x350.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome back to AI Daily! In this episode, we explore three intriguing stories in the world of AI and technology. First up, we discuss the possible end of LK-99, a ferromagnetic material that sparked excitement about superconductivity. Our second story delves into MK-1, a project aimed at enhancing the inference speed of language models. Lastly, we cover the launch of StableCode by Stable Diffusion. This coding model, boasting a 16,000 context window and 3 billion parameters, raises questions about its distinctiveness compared to other fine-tuned models.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; End of LK-99?</h4><ul><li><p>LK-99, initially hailed as a potential superconductor, faces skepticism as evidence of superconductivity remains elusive.</p></li><li><p>Despite uncertainty, the excitement around LK-99 showcases the power of scientific engagement and the pursuit of breakthroughs.</p></li><li><p>The episode debates whether LK-99's impact on science engagement outweighs its unconfirmed superconducting potential.</p></li></ul><h4>2&#65039;&#8419; MK-1</h4><ul><li><p>MK-1 project aims to make efficient model inference accessible to all.</p></li><li><p>MK-1's compression codec MKML and GPU optimization promise faster model outputs.</p></li><li><p>Democratizing AI capabilities through MK-1 could reshape AI deployment across various domains.</p></li></ul><h4>3&#65039;&#8419; StableCode</h4><ul><li><p>StableCode, Stable Diffusion's coding model, hits the scene with 16,000 context window and 3 billion parameters.</p></li><li><p>Questions arise about StableCode's uniqueness and distinct contributions compared to other fine-tuned models.</p></li><li><p>Stable Diffusion's continuous innovation underscores the evolving landscape of fine-tuned AI models.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p><a href="https://twitter.com/alexkaplan0/status/1688731316777275392?s=20">End of LK-99</a></p></li><li><p><a href="https://mkone.ai/blog/introducing-mk1">MK-1</a></p></li><li><p><a href="https://stability.ai/blog/stablecode-llm-generative-ai-coding?utm_source=twitter&amp;utm_medium=website&amp;utm_campaign=announcement">StableCode</a></p></li><li><p><a href="https://twitter.com/Scobleizer/status/1688982554320551936">Robert Scoble Tweet</a></p></li><li><p><a href="https://supabase.com/blog/hugging-face-supabase">HuggingFace/Supabase</a></p></li><li><p><a href="https://twitter.com/_akhaliq/status/1688576549560184832?s=20">Mortal Combat Video</a></p></li><li><p><a href="https://101.school/about">101 School</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> 
Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul><div><hr></div><h3>Transcript:</h3><p><strong>Ethan:</strong> Good morning. Welcome to AI Daily, and we have three stories for you today, as always. Our first story is possibly very sad: the end of LK-99. It seems that LK-99, as tweeted by Alex Kaplan and confirmed by another paper and a few others, is what's called a ferromagnetic material, so it does not show signs of superconductivity or zero resistance.</p><p>So we may not be so back. Farb?</p><p><strong>Farb:</strong> Well, you know, never call anything dead as long as there's somebody who cares and has hope. And if this has inspired anyone to get into deep tech and physics and chemistry, then it's an absolute win for humanity, and we actually need way more of this stuff happening, where people get super excited about a potential development. Uh, they focus their attention on it. They focus their effort. They actually put in the effort to discover it. If we had this going on with every part of science, then we would already be in, you know, the meme of the future where cars are flying around and people are living forever.</p><p>Uh, this is the absolute best type of thing that humanity can be doing. And I encourage it and I think we need way more of it. So actually, I think it's a huge win for humanity, if not a win for LK-99.</p><p><strong>Ethan:</strong> Huge win for everyone. We got to read and learn too. I completely agree. Conner?</p><p><strong>Conner:</strong> Yeah, I completely agree. As Farb said, many people will knock pop science like this, saying it's kind of bad for the scientific industry, that it kind of diminishes the quality of science. But everything is pop culture nowadays, as you see with a podcast that's just AI Daily. So people talking about science, people being interested in science, it means a lot. And maybe it's over, maybe it's not. I still hold out a little hope; this is only one paper from one university in China, but we shall see. Either way though, I agree: pop science is a win for science.</p><p><strong>Farb:</strong> You know, everything, all science, is going from not knowing to knowing. So to argue that we didn't know at the beginning, therefore it was the wrong thing to do, is just the most backwards thinking humanly imaginable. All scientific progress is based on not knowing when you start and knowing something at the end, even if that's knowing that this particular thing is not the thing. That's still what we're going for. Uh, if you think that anything about LK-99 was bad news or bad in any way, you're an absolutely insane lunatic; come on the show, I will debate you here. Just don't be literally nothing but a troll on Twitter. I'm not trying to give you a platform, but if you have anything intelligent to say about this, come on board and we can talk it out.</p><p>You'd have to be deluded to say otherwise. The world came together for it.</p><p><strong>Ethan:</strong> It was honestly really cool the past few weeks to watch literally everyone get excited about it. It's something that isn't so down in the dumps; people were actually getting excited about science. Everyone got to learn more about it, 
us three too.</p><p>We learned so much more about chemistry and physics, et cetera. So the next time superconductors hopefully come up again, we're gonna do the exact same thing. We're here for it. I love it.</p><p><strong>Ethan:</strong> Well, our second story of today is back to hardcore AI, a little bit off of superconductors, but we're talking about MK-1. So MK-1 is similar to GGML of sorts, pretty much trying to improve the inference speed of these models. So if you've ever run a large, you know, LLaMA instance at 70 billion parameters, for instance, you might wonder, hey, why is mine so much slower than OpenAI's and Anthropic's? How are they getting their models to output tokens so fast?</p><p>Well, MK-1 wants to bring that to everyone. So Conner, can you tell us a bit more about it?</p><p><strong>Conner:</strong> Yeah. MK-1 is really trying to bring, as you said, the inference capabilities of companies like Google, companies like OpenAI, companies like Anthropic, trying to bring those capabilities to everyone in the open, with open source.</p><p>Their demo is only in closed beta right now, but what they're saying it is, and what they're aspiring for it to be, is very hopeful for what they can do. They've designed something called MKML, which is kind of their framework for compressing models. The first codec is called MK 600. It's just an initial compression codec, but it compresses these models by 60% while keeping them basically the same model, with basically the same fidelity.</p><p>So this is a very exciting development that they have released and are working on, and I'm excited to see what else MK-1 comes up with.</p><p><strong>Ethan:</strong> Yeah, I think, similar to the cloud wave and everything else, you have, you know, OpenAI and Anthropic doing these really hard engineering challenges where everyone's saying, hey, how is this happening?</p><p>And now you're getting the democratization of all of that for anyone who wants to run these models. Farb, what does this mean to you?</p><p><strong>Farb:</strong> I'm running a very large language model inside my head, and I'm wondering why it's slower than everybody else's. So this, uh, applies directly to me and the problems that I'm having on a daily basis.</p><p>What they're doing is straight up picks and shovels, and it's a beautiful thing. And, you know, you couldn't be more picks and shovels than saying, hey, here's a model that doesn't even work on a single GPU; we're making it work on a single GPU. Here's a model that you needed this super expensive GPU to run; we're gonna make it work on a much more available, more affordable GPU. Uh, this is a big way of how you make progress in the world. It's not just, oh, big scientific discovery, here's a paper on attention, and everything's done. This is the real work of getting, you know, AI and LLMs working everywhere, and actually bringing the potential value to realized value.</p><p>So, uh, this is the sort of stuff that's, you know, going to move the industry actually forward, in the sense of not just knowledge and discovery but actual implementation, changing people's actual lives.</p>
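<p>MK-1 hasn't published how MKML works under the hood, so the following is purely a hypothetical sketch of the general idea behind weight-compression codecs: store low-precision integer codes plus a per-row scale, trading a sliver of fidelity for a large cut in memory. The function names, the 8-bit choice, and the numbers are illustrative assumptions, not MK-1's actual codec.</p><pre><code># Hypothetical sketch only -- not MK-1's MKML, whose internals are unpublished.
# Per-row 8-bit weight quantization: keep int8 codes plus one float32 scale
# per row, cutting memory roughly 4x versus float32 at a small fidelity cost.
import numpy as np

def quantize_rows(w):
    """Quantize a float32 weight matrix to int8 codes with per-row scales."""
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0 + 1e-12  # avoid /0
    codes = np.round(w / scales).astype(np.int8)
    return codes, scales.astype(np.float32)

def dequantize_rows(codes, scales):
    """Recover an approximate float32 matrix from codes and scales."""
    return codes.astype(np.float32) * scales

w = np.random.randn(1024, 1024).astype(np.float32)  # stand-in weight matrix
codes, scales = quantize_rows(w)
w_hat = dequantize_rows(codes, scales)
print("mean relative error:", np.abs(w - w_hat).mean() / np.abs(w).mean())
</code></pre><p>On a random matrix this lands around one percent mean relative error; a production codec hitting 60% compression with near-identical outputs presumably does something far more sophisticated, but the trade is the same: fewer bits per weight means cheaper, faster inference.</p>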
<p><strong>Conner:</strong> We talk about technology such as this, but it's the same thing we saw with LLaMA.</p><p>Meta came out with the original LLaMA, and then the GGML team built llama.cpp, and now llama.cpp is used by everyone who uses LLaMA, including Meta. So it's really a connection between closed source and open source, and between these big scientific research possibilities and, as you said, the picks and shovels that unlock making these models actually usable for day-to-day inference.</p><p><strong>Ethan:</strong> We're in like the Docker, Kubernetes era of AI, and I think a lot of these companies and actual applications are gonna be super valuable to people and just start out as dev tools. So really cool. Our last story of today is that Stability AI has launched StableCode. They've been talking about a fine-tuned code model for a while.</p><p>They've released three different versions of this coding model. I didn't get to look into this one too much. Conner, anything different about their coding model that's, you know, different than maybe OpenAI's, or different than some of the fine-tunings other people have done?</p><p><strong>Conner:</strong> Honestly, not much, sadly. I kind of hoped to see more from Stability.</p><p>Um, it does have a 16,000-token context window, and it is only 3 billion parameters, and the performance is probably pretty good, but that's kind of the exact same thing we see from Replit's 3 billion parameter model. So Stability is kind of just repeating the same thing here. They did that with another model we talked about last time. Maybe with StableLM 2 we'll see more from them. But right now, this is kind of just what other people already have, from Stability.</p><p><strong>Ethan:</strong> Stability is really pumping out these different fine-tunings, et cetera. Farb, have you heard of anyone using these, or kind of interested in some of the ones Stability has been putting out?</p><p><strong>Farb:</strong> I haven't spoken to anybody, uh, using this, nor do I think you can actually particularly use this one quite yet. Uh, unless I'm wrong, I didn't notice if it was available yet. I believe so. It, yeah, it's on Hugging Face. Yeah, it's on Hugging Face. Oh, okay. Yeah. No, no, you're right.</p><p>Actually, I did dig into that just before the show started. Um, but I don't know anybody who's using it. This is the right and good thing for Stability to do. Uh, I tend to agree that it sometimes feels like they're kind of coming out with things a little bit after somebody else drops something pretty similar.</p><p>And it's not that different. Uh, they shouldn't keep it quiet. They should release it. It's the good and right thing for them to do. Uh, it puts pressure on the space to keep, you know, doing this stuff. Uh, if you don't do it, Stability will do it. Uh, even if you do it, Stability will do it anyways.</p><p>So kudos to them, and thanks to them for continuing to do this work and push this stuff out, even if every single announcement doesn't seem like some world-shattering, um, accomplishment. It's good that they're doing it, but I tend to agree. 
I'm always kind of looking for the, like, okay, where is this, you know, meaningfully different than, say, something that Replit is doing or something somebody else is doing.</p><p>And I didn't quite see that either. So, you know, it's open, so maybe people can make it better than some of these folks that are launching things that are not as open. So, uh, you know, I'm gonna keep cheering Stability on to keep doing what they're doing. And I think they've made some waves in the space, and I wouldn't be surprised.</p><p><strong>Ethan:</strong> Absolutely. It was pretty cool how they used, I think, a little bit more kind of code instruction-response pairs than some other people, which, with the long context window (again, I haven't gotten to try it yet), could do a little bit better for some of these longer programming tasks.</p><p>So maybe that's the defining factor for StableCode.</p><p><strong>Farb:</strong> They may just not even be doing a good enough, you know, job of explaining why it might be more applicable and more useful. You know, the blog post on it wasn't that long. Um, you know, they could have potentially shared more about it, showed some examples of people putting it to work, gotten some hackathons going around it. You know, maybe it is better, but we're having a tough time seeing if it's meaningfully different, and if it is, they should, you know, put some effort into getting people to understand that. It's a mimetic world. You gotta build things and make your case to the world about why.</p><p><strong>Ethan:</strong> Absolutely. I'll test it out and be back tomorrow on AI Daily to see how it is. But as always, what else are we seeing, Farb?</p><p><strong>Farb:</strong> Um, I saw a post from, I think, Robert Scoble, who said he spoke to a CEO who is using 30 different LLMs to provide customer support, uh, on his product.</p><p>I'm actually not surprised, and it kind of makes sense, because he's got to create this whole pipeline around, you know, figuring out whether this first LLM is hallucinating. So it's just LLMs all the way down to try and get something practically useful. Um, and we've seen this type of stuff before in our own work, uh, where it's not just enough to do one pass with an LLM; you gotta have LLMs watching LLMs, and LLMs watching the LLMs watching the LLMs.</p><p>It's like a massive bureaucracy of LLMs that you have to build. Um, yeah, so I thought that was kind of interesting, and, uh, I think that's probably the future, to be honest.</p><p><strong>Ethan:</strong> That's why we need to get these models cheaper, so we can run them through thousands of times.</p>
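<p>A rough, hypothetical sketch of the kind of "LLMs watching LLMs" pipeline being described: one model drafts an answer, a second pass checks the draft against the source docs, and the system retries or fails over on rejection. The <code>call_llm</code> helper is a stand-in for whatever chat-completion API you use; this is one possible arrangement, not the actual 30-LLM setup from the post.</p><pre><code># Hypothetical sketch of a draft-then-verify LLM pipeline.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("swap in your chat-completion API of choice")

def answer_with_watcher(question: str, docs: str, max_retries: int = 2) -> str:
    """One LLM drafts an answer; a second LLM pass checks it against the docs."""
    for _ in range(max_retries + 1):
        draft = call_llm(
            "Answer the customer's question using only these docs.\n"
            f"Docs:\n{docs}\n\nQuestion: {question}"
        )
        verdict = call_llm(
            "Does the answer make any claim not supported by the docs? "
            "Reply with exactly GROUNDED or HALLUCINATED.\n"
            f"Docs:\n{docs}\n\nAnswer: {draft}"
        )
        if verdict.strip().upper().startswith("GROUNDED"):
            return draft  # the watcher found no unsupported claims
    return "I'm not sure; routing you to a human agent."  # fail closed
</code></pre><p>Each additional watcher multiplies the per-query cost, which is exactly the point about needing cheaper models.</p>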
<p><strong>Farb:</strong> Just like your brain. So you can run tons of models. Yeah, there's not gonna be the one God model that rules all things. Um, the laws of physics and our ability to build hardware to accommodate that alone are a bottleneck to keep that from happening. Uh, you know, I don't care if you have the algorithm to do the calculations; you need the hardware that can, you know, actually hold the memory and all these things.</p><p><strong>Ethan:</strong> Conner, what about you?</p><p><strong>Conner:</strong> Yeah, I saw that Supabase released their Hugging Face integration. I kind of predicted this a little bit back when Firebase talked about their integration from Firebase Extensions and the Firebase datastore into PaLM and all the PaLM API models. And then, yeah, Supabase yesterday released their integration, all the way from the Supabase database into any Hugging Face model, into Supabase edge functions. Always great stuff going on at Supabase, and I'm sure they're gonna do more here with AI. So excited to see what they do next.</p><p><strong>Ethan:</strong> That's awesome. Yeah, I saw two different things, um, that I wanted to highlight 'cause I love both of 'em a lot.</p><p>The first one was someone used Midjourney and Runway to make a Mortal Kombat-style kind of video here. So they used celebrities within Mortal Kombat: you can play as, like, Joe Biden, and then, like, Cleopatra or something, and then you could play as, like, Ronaldo. It was really cool. I remember a while back we talked about kind of just endless characters in games for anyone, and I think it's just so cool to see. I liked that little video. And then the second one was this thing called 101 School, so you can pretty much create, like, a full course with AI, and then they have chatting with it on the right, and it's actually pretty good. They had one on poker and game theory, and you just kind of type in what you wanna learn, and it generates all the course material for you, and it generates it on the right so you can kind of chat with it in real time.</p><p>Pretty simple. And you can do this with other tools, of course, but I just like the way they laid it out. So we'll link it below; check it out.</p><p>As always, thank y'all for tuning into AI Daily and we will see you again tomorrow.</p>]]></content:encoded></item><item><title><![CDATA[Varda LK99 | AirForce AI Drone Flight | Alibaba Qwen]]></title><description><![CDATA[AI Daily | 8.04.23]]></description><link>https://www.aidailypod.com/p/varda-lk99-airforce-ai-drone-flight-754</link><guid isPermaLink="false">https://www.aidailypod.com/p/varda-lk99-airforce-ai-drone-flight-754</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Sat, 05 Aug 2023 00:20:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c457f34-09fd-45ce-8a65-f6a04c067179_350x350.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to another episode of AI Daily! In this episode, our hosts Farb, Ethan, and Conner cover three big stories to close out your week. First up, Varda, based in LA, presents super exciting news on LK-99 replication, showcasing levitation in a high-quality video of the Meissner Effect. Next, the Air Force's Valkyrie air combat drone triumphs with AI, aiming for unmanned flights. 
Alibaba unveils a remarkable 7 billion parameter model, surpassing LLaMA-2 7B and potentially 13B.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; Varda LK99</h4><ul><li><p>Varda in LA achieves levitation in LK-99 replication, hinting at possible superconductivity.</p></li><li><p>Promising breakthrough material, but further research required for practical applications.</p></li><li><p>Russian and Chinese experiments add to the excitement surrounding this groundbreaking substance.</p></li></ul><h4>2&#65039;&#8419; AirForce AI Drone Flight</h4><ul><li><p>Valkyrie, the Air Force's AI-driven drone, conquers unmanned flight challenges in simulations.</p></li><li><p>AI integration vital for military competitiveness and cost efficiency.</p></li><li><p>Advancements in AI-controlled drones signal an exciting future for military applications.</p></li></ul><h4>3&#65039;&#8419; Alibaba Qwen</h4><ul><li><p>Alibaba introduces a powerful 7 billion parameter model, outperforming LLaMA-2 7B and possibly 13B.</p></li><li><p>Ideal for math, coding, and plugin-based tasks, expanding AI's efficiency.</p></li><li><p>Multifaceted model tailored for Chinese language but shows potential for various languages and applications.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p>Varda LK99</p></li><li><p>AirForce AI Drone Flight</p></li><li><p>Alibaba Qwen</p></li><li><p>Model to Translate ada-002</p></li><li><p>CoreWeave - Collateralization of the GPU</p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul><div><hr></div><h3>Transcript:</h3><p><strong>Farb:</strong> Hello and welcome to another episode of AI Daily. I'm Farb. I'm joined by Ethan and Conner, and we got stories all over the place today, from rocks to the sky to good old-fashioned GPUs. Let's get started with the latest in LK-99 news from our friends at Varda, who are based here in LA. Some super exciting stuff.</p><p>Con, why don't you give us the lowdown?</p><p><strong>Conner:</strong> Ricardo might be based in Miami, is the worrying part of that, but no, they're here. Oh, they're okay. I dunno, who knows?</p><p><strong>Farb:</strong> Their founders just like to go to Miami a lot and talk about how awesome Miami is. But they did tweet about how LA is where you come if you wanna work on atoms, or, as I like to say, LA is the land of mimetics and kinetics.</p><p><strong>Conner:</strong> Hm. But yeah, Andrew McCalip of Varda, they replicated it. They published it at 5:00 AM this morning. They pulled an all-nighter over there at the Varda HQ, and they're showing levitation, um, in a very high quality video of the Meissner effect. No test of superconductivity yet, but as he said, a few papers have been saying that levitation will go hand in hand with superconductivity.</p><p>So maybe the material is just in too small of quantities right now, maybe the quality of it is not good enough, but levitation's happening, so superconductivity may be happening.</p><p><strong>Farb:</strong> It's giving levitation, as we say. Uh, 
Ethan, how do you feel about this? Is it giving you the warm and tingles, or are you just not buying it yet?</p><p><strong>Ethan:</strong> No, I think American manufacturing is back. You know, I saw a really funny tweet. It was like: you know you're living in the future when the engineer from the space manufacturing startup replicates the room-temperature semiconductor. So I think it's super cool. They only got it for a few micrograms right now, but the fact that this was done in 10 days, like they were able to replicate what looks like levitation, what looks like potentially the Meissner effect, at a few micrograms, within 10 days, at a lab right here in LA.</p><p>So super, super cool. Excited for Varda. Excited for the engineer there, the entire team.</p><p><strong>Farb:</strong> I think you meant superconductor, but I don't think anybody involved in any of this has successfully not said semiconductor at least once. Uh, I know I have, probably on the show, to be honest.</p><p>Uh, yeah. This is exciting stuff here. Let's see where it goes from here. We're so back. Uh, it's so over. We're so back. But we'll always be here to explain it one way or the other to you. Uh, very cool. Let's jump into our next story, so we go from the lab... Oh, I'll just say one more thing.</p><p>We got a couple of, uh, comments from Russia yesterday on our story, which was pretty awesome. Somebody was reacting to my point about how the average Russian kitchen is like a super lab, and the guy was basically, yeah, that's right, my dad used to melt electronics in the kitchen to get the gold outta the electronics.</p><p>They're not messing around over there, that's for sure. Alright, uh, moving on to our next story. We're gonna move up into the sky and talk about a little bit of AI in jets. Uh, what's this story about, Ethan? Tell us about it.</p><p><strong>Ethan:</strong> Yeah, so for a while now, actually, AI controlling drones has been a relatively, actually extremely, hard problem.</p><p>So for Air Force drones, Navy drones, et cetera, you know, the end state is not having a human pilot for everything, especially as other countries ramp up more AI capabilities in their drones. How do we stay competitive? Um, so Valkyrie, one of the Air Force's, uh, air combat drones, actually kind of solved this challenge problem with something similar to a foundation model. They took a ton of images, a ton of videos of what these drones see, and were able to kind of build this foundation model that lets it win these challenges in simulations, lets it actually fly on its own, and kind of removes the need for a human pilot. Um, so really cool stuff around that.</p><p>You know, I think the entire DoD and warfare space is rapidly trying to integrate AI in ways they see efficient. You know, we've seen a lot of excitement around LLMs for DoD, but I think the real weight here is these types of drone controls, whether that's submarine drones, whether that's air drones. These are the types of things that'll actually keep, you know, the US military and the Five Eyes competitive.</p><p>So really cool stuff out of them, and who knows what they have that they're not releasing. So, really cool stuff.</p><p><strong>Farb:</strong> Yeah, I mean, they make a couple of great points. 
You know, running one of these jets for an hour costs tens of thousands of dollars. Mm-hmm. So running it in simulations is obviously a fraction of that cost.</p><p>Uh, and then I also love the name: one of them is called Skyborg, instead of cyborg. A pretty hilarious name. Uh, Conner, what'd you take away from this?</p><p><strong>Conner:</strong> Skyborg sounds a little bit similar to Skynet, in my opinion, but get over it.</p><p><strong>Farb:</strong> Okay. Skynet's coming.</p><p><strong>Conner:</strong> Yeah, no, their stated goals are for it to be able to do in-the-air or on-the-ground attacks on its own, which is a very lofty goal. So, of course, they still have a human in the loop. Uh, they will always have someone either flying next to it, as they did for all these demos, or just monitoring it remotely.</p><p>I believe they wanted to build a thousand of them, or just have a thousand of them out in the field, with one pilot in each of the top 500 jets and two of them basically as co-pilots flying next to him. So, exciting future for AI in the military.</p><p><strong>Farb:</strong> You know, you can't let other people beat you to this.</p><p>That said, hopefully we don't need so many planes attacking so many places that there aren't enough humans to handle the work. Or possibly at some point they'll just be much better than humans, but that seems like a pretty lofty goal here. The folks that are, uh, trained on flying these things are pretty spectacular.</p><p>It'll be a little while, I think, before an AI is doing a better job. That said, an AI's not going to potentially pass out in the plane, um, from pulling G-forces and things like that.</p><p><strong>Ethan:</strong> Absolutely, and I think we'll have to get to some of this. You know, with, like, a hypersonic missile and some of these other capabilities, a human just can't have the reaction time needed for defense. So having AI in the loop for these, despite the safety risks and all that, I think is extremely important. So it's cool they realize that.</p><p><strong>Farb:</strong> Yeah, you can't give up AI dominance in this space, for sure. Um, right. Moving into our last story: the fine folks at Alibaba have released a high-performing smaller model, uh, a 7 billion parameter model.</p><p>It surpasses LLaMA-2 7B and potentially 13B as well, especially on math and coding. They've, uh, made it commercially available, I think up to a hundred million users. Some pretty cool results here. Uh, Conner, what are you getting from this?</p><p><strong>Conner:</strong> Yeah, people thought for a while that LLaMA-2 was probably hitting the limits of what we can do in a 7 billion or even 13 billion parameter model.</p><p>But now Alibaba's new model at 7 billion is possibly beating even the 13 billion. It's pretty exciting to think how much further we can push these smaller models. It is Chinese-focused, so of course it's mainly centered around the Chinese language and won't be as good for English or those types of use cases. But we keep seeing research coming out of China, as always.</p><p><strong>Farb:</strong> I thought I read that it was pretty well suited for other languages. Maybe I misread that.</p><p><strong>Conner:</strong> It is pretty multifaceted, but mostly Chinese-focused. Of course, 
<p><strong>Farb:</strong> You know, we're seeing this sort of expanding, where we like the broad LLMs, back to more fine-tuned LLMs. Like: hey, can we make this very good at math and code, versus trying to be good at everything under the sun?</p><p>I think this pattern of expanding and contracting is probably gonna continue forever. Ethan, what's your read?</p><p><strong>Ethan:</strong> Yeah, the coolest thing I saw from this was it has support for plugins. So they actually trained it with a lot of this plugin alignment data. So let's say you're building an agent and you want a small model that's maybe more efficient than the big models: you want it to call APIs, you want it to call databases, you want it to work with some code.</p><p>I think these are the types of models you want to fit there, and this one seems to be working a lot better than LLaMA. So like I said, instead of calling GPT-4 for some of these use cases, or setting up a bunch of A100s for a larger model, we're getting these smaller models that are useful in these kinds of more defined contexts.</p><p>So having a 7 billion parameter model that can call some tools, hit up your database, and at least manage that layer of your stack is really useful. Again, another engineering pipeline thing that we're seeing LLMs go through.</p><p><strong>Conner:</strong> Also, of course, again, to note: every big company is gonna be training their own LLMs.</p><p>Alibaba's not gonna be calling OpenAI. Alibaba's making their own model. Even if it's the exact same, even if it's only slightly better on some things and slightly worse on some things, everyone's gonna make their own model, as we're seeing. Yep.</p><p><strong>Farb:</strong> Absolutely. Alright, what are we all seeing out there, Conner?</p><p><strong>Conner:</strong> Yeah, I saw someone train kind of modified ada-002 embeddings so that you can mix and match them, so you can find the average of many statements. A bit of a weird token embedding thing. I think the example we'll put on the side shows it better, but you add and subtract different kinds of sentences: take "he is the king," subtract "he is a man," and add "she is a woman," and it'll give you "she is the queen." So that's a very simple example, but you can imagine that spread over maybe a million reviews: you can average all the embeddings for those statements and get the average statement between all of them. So pretty interesting results, and something that we could possibly do.</p>
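<p>As a rough illustration of the arithmetic Conner is describing, here is a small numpy sketch. The get_embedding() helper is hypothetical; plug in whatever ada-002-style embedding endpoint you actually use before running it:</p><pre><code>import numpy as np

def get_embedding(text: str) -> np.ndarray:
    """Hypothetical helper: call your embedding model of choice
    (an ada-002-style API) and return its vector for the text."""
    raise NotImplementedError  # replace with a real client call

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "He is the king" - "He is a man" + "She is a woman" ~= "She is the queen"
v = (get_embedding("He is the king")
     - get_embedding("He is a man")
     + get_embedding("She is a woman"))
print(cosine(v, get_embedding("She is the queen")))  # expect high similarity

# The same trick at scale: average many review embeddings to get a vector
# for the corpus's "average statement", then compare candidates against it.
reviews = ["great battery life", "screen cracked in a week"]  # etc.
centroid = np.mean([get_embedding(r) for r in reviews], axis=0)
</code></pre>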
<p><strong>Farb:</strong> Interesting. Very cool. What about you, Ethan?</p><p><strong>Ethan:</strong> The collateralization of the GPU. I saw that CoreWeave raised almost two and a half billion dollars of debt, pretty much collateralized by the current GPUs they have, to go buy a ton more GPUs. You know, it speaks to two things to me. One of which is: venture dollars directly buying GPUs at scale doesn't make a ton of sense, so they're raising these debt facilities. And secondly, these huge PE firms and these huge growth funds are seeing just real data, real demand for GPUs, and saying, we'll throw debt behind that. You know, debt's not the easiest to come by at this size for anything new, and it's pretty rapid.</p><p>We're seeing it just apply to Nvidia chips, so really fascinating. Mm-hmm.</p><p><strong>Conner:</strong> It's nice to see a housing market of GPUs, where GPUs are collateralizing more GPUs all the way into the future, and then when GPUs become commonplace, it all tumbles down.</p><p><strong>Farb:</strong> Yeah. I saw our favorite person to reference on the pod, Ethan Mollick, posted something, and I gave him a bit of a hard time about it yesterday. So I apologize, Ethan, if I was a little bit harsh. But he was talking about the forever debate of trusting experts versus doing your own investigation into something. And my point to him, a little bit, was: he was kind of implying that you've gotta trust experts, that you can't learn everything yourself, there's too much to learn. Which might be the case. But my point to him was that coming from somebody like him, who people respect and look up to, especially with regards to science and being a knowledgeable person, if you say something like that, you're just gonna get more people to toss their hands up and say, okay, well, I'm just not gonna try and learn anything, and I'll just trust experts. And I don't think it's an either-or thing. You can give some value to people who you trust, who have more knowledge than you do, and you can do some learning on your own. Making it all-or-none in one direction, I think, is the wrong framing. And I just encourage everybody to find smart people and learn from them.</p><p>Trust them when you have to, but especially if it's life and death, you've gotta make your own decision about what you're trusting. It's not like somebody said "I gotta listen to experts," so you're not allowed to ask questions anymore and you've gotta give up trying to learn. The whole point of learning is learning things you don't know.</p><p>So to say that you shouldn't learn about something because you don't know it just obfuscates the whole point of learning anything. You don't have to become an expert in everything, and you also don't have to just give up all of your decision-making to experts. There is a wonderful, integrated middle path that you can tread. It'll take a little bit of work on your part, but my hunch is you'll be rewarded for it. So that's my little diatribe here at the end.</p><p><strong>Conner:</strong> I think that was beautiful. Very well said. Yeah, I think it's about finding experts who know their knowledge in specific domains or fields and, as you said, weighing what they think against your own thoughts, and finding multiple experts and getting multiple opinions.</p><p><strong>Ethan:</strong> So just don't be complacent when you hear the word expert.</p><p><strong>Farb:</strong> Yes, absolutely. Don't be complacent. I think Truman warned us about this. If it wasn't Truman, it was The Simpsons. Those two seem to have covered every dire warning about the future.</p><p><strong>Conner:</strong> Did they have a superconductor episode of The Simpsons?</p><p><strong>Farb:</strong> Probably. I'm sure you can ask ChatGPT. Man, that's not my job.</p><p><strong>Conner:</strong> Will do.</p><p><strong>Farb:</strong> Thanks for joining us everyone. Hope you have a great day. We'll see you on the next episode of AI Daily. See you guys.
Thanks guys.</p>]]></content:encoded></item><item><title><![CDATA[Varda LK99 | AirForce AI Drone Flight | Alibaba Qwen]]></title><description><![CDATA[AI Daily | 8.04.23]]></description><link>https://www.aidailypod.com/p/varda-lk99-airforce-ai-drone-flight</link><guid isPermaLink="false">https://www.aidailypod.com/p/varda-lk99-airforce-ai-drone-flight</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Sat, 05 Aug 2023 00:19:52 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/135730205/68468c8b4ca19f25029751a38306aee8.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Welcome to another episode of AI Daily! In this episode, our hosts Farb, Ethan, and Conner cover three big stories to close out your week. First up, Varda, based in LA, presents super exciting news on LK-99 replication, showcasing levitation in a high-quality video of the Meissner Effect. Next, the Air Force's Valkyrie air combat drone triumphs with AI, aiming for unmanned flights. Alibaba unveils a remarkable 7 billion parameter model, surpassing LLaMA-2 7B and potentially 13B.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; Varda LK99</h4><ul><li><p>Varda in LA achieves levitation in LK-99 replication, hinting at possible superconductivity.</p></li><li><p>Promising breakthrough material, but further research required for practical applications.</p></li><li><p>Russian and Chinese experiments add to the excitement surrounding this groundbreaking substance.</p></li></ul><h4>2&#65039;&#8419; AirForce AI Drone Flight</h4><ul><li><p>Valkyrie, the Air Force's AI-driven drone, conquers unmanned flight challenges in simulations.</p></li><li><p>AI integration vital for military competitiveness and cost efficiency.</p></li><li><p>Advancements in AI-controlled drones signal an exciting future for military applications.</p></li></ul><h4>3&#65039;&#8419; Alibaba Qwen</h4><ul><li><p>Alibaba introduces a powerful 7 billion parameter model, outperforming LLaMA-2 7B and possibly 13B.</p></li><li><p>Ideal for math, coding, and plugin-based tasks, expanding AI's efficiency.</p></li><li><p>Multifaceted model tailored for Chinese language but shows potential for various languages and applications.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p>Varda LK99</p></li><li><p>AirForce AI Drone Flight</p></li><li><p>Alibaba Qwen</p></li><li><p>Model to Translate ada-002</p></li><li><p>CoreWeave - Collateralization of the GPU</p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[LK-99 Cont. 
| Flow2 NeuroImaging | IBM & NASA GeoSpacial AI]]></title><description><![CDATA[AI Daily | 8.03.23]]></description><link>https://www.aidailypod.com/p/lk-99-cont-flow2-neuroimaging-ibm-2b6</link><guid isPermaLink="false">https://www.aidailypod.com/p/lk-99-cont-flow2-neuroimaging-ibm-2b6</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Fri, 04 Aug 2023 00:00:14 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/135701483/2745d0f00064ab74b70958ce89489fdc.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>In today&#8217;s episode of AI Daily, our hosts Conner, Ethan, and Farb continue the discussion of LK-99, an intriguing material with replications in diverse settings, from Russian countertops to superconductivity experiments in China. The discussion revolves around practical implications and the path to usability. Next, they discuss Flow2 Neuroimaging, an innovative helmet offering FMRI-like capabilities, envisioning a future with accessible brain research and AI models. Finally, they discuss the collaboration between IBM and NASA, introducing Prithvi, a groundbreaking temporal vision transformer leveraging satellite data for predicting crop yields, monitoring disasters, and advancing earth science research.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; LK-99 Cont.</h4><ul><li><p>LK-99 replication news: Russian countertops to Chinese scientists exploring superconductivity at room temperature.</p></li><li><p>Exciting advancements: Levitation and zero resistivity observed, though challenges in scalable usability remain.</p></li><li><p>Public interest surges, promising potential for future engineering and groundbreaking applications.</p></li></ul><h4>2&#65039;&#8419; Flow2 Neuroimaging</h4><ul><li><p>Flow2 Neuroimaging device: Compact helmet offers FMRI-like capabilities for brain research and AI models.</p></li><li><p>Pioneering data collection: Predicting emotions and thoughts, potential AR integration, and revolutionary brain understanding.</p></li><li><p>AI's role in processing data, opening doors to a new era of human interaction.</p></li></ul><h4>3&#65039;&#8419; IBM &amp; NASA GeoSpacial AI</h4><ul><li><p>Named Prithvi, a temporal vision transformer utilizing NASA's vast satellite data.</p></li><li><p>Applications in predicting crop yields, monitoring natural disasters, and advancing earth science research.</p></li><li><p>Open-sourced AI with profound implications, a milestone in bridging AI and earth science.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p><a href="https://www.tomshardware.com/news/scramble-to-validate-superconductor-breakthrough-confirms-zero-resistance-with-a-catch">Continuing LK-99</a></p></li><li><p><a href="https://twitter.com/kernelco/status/1686129756742508544?s=42&amp;t=ziEc9CMi8q_PlJ34DMJVkA">Flow2 Neuroimaging</a></p></li><li><p><a href="https://www.earthdata.nasa.gov/news/impact-ibm-hls-foundation-model">IBM &amp; NASA Article</a></p></li><li><p><a href="https://www.youtube.com/watch?v=9bU9eJxFwWc">IBM &amp; NASA Example</a></p></li><li><p><a href="https://a16z.com/2023/08/02/where-will-ai-have-the-biggest-impact-healthcare/?utm_source=bensbites&amp;utm_medium=newsletter&amp;utm_campaign=ai-apps-from-y-combinator">AI in Healthcare</a></p></li><li><p><a href="https://twitter.com/_philschmid/status/1686982327279235072">Commercial Vicuna Model</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a 
href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[LK-99 Cont. | Flow2 NeuroImaging | IBM & NASA GeoSpacial AI]]></title><description><![CDATA[AI Daily | 8.03.23]]></description><link>https://www.aidailypod.com/p/lk-99-cont-flow2-neuroimaging-ibm</link><guid isPermaLink="false">https://www.aidailypod.com/p/lk-99-cont-flow2-neuroimaging-ibm</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Fri, 04 Aug 2023 00:00:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c457f34-09fd-45ce-8a65-f6a04c067179_350x350.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this Today&#8217;s episode of AI Daily, our hosts Conner, Ethan, and Farb continue the discussion of LK-99, an intriguing material with replications in diverse settings, from Russian countertops to superconductivity experiments in China. The discussion revolves around practical implications and the path to usability. Next, they discuss Flow2 Neuroimaging, an innovative helmet offering FMRI-like capabilities, envisioning a future with accessible brain research and AI models. Finally, they discuss the collaboration between IBM and NASA, introducing Privy, a groundbreaking temporal vision transformer leveraging satellite data for predicting crop yields, monitoring disasters, and advancing earth science research.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; LK-99 Cont.</h4><ul><li><p>LK-99 replication news: Russian countertops to Chinese scientists exploring superconductivity at room temperature.</p></li><li><p>Exciting advancements: Levitation and zero resistivity observed, though challenges in scalable usability remain.</p></li><li><p>Public interest surges, promising potential for future engineering and groundbreaking applications.</p></li></ul><h4>2&#65039;&#8419; Flow2 Neuroimaging</h4><ul><li><p>Flow2 Neuroimaging device: Compact helmet offers FMRI-like capabilities for brain research and AI models.</p></li><li><p>Pioneering data collection: Predicting emotions and thoughts, potential AR integration, and revolutionary brain understanding.</p></li><li><p>AI's role in processing data, opening doors to a new era of human interaction.</p></li></ul><h4>3&#65039;&#8419; IBM &amp; NASA GeoSpacial AI</h4><ul><li><p>Named, Prithvi, a temporal vision transformer utilizing NASA's vast satellite data.</p></li><li><p>Applications in predicting crop yields, monitoring natural disasters, and advancing earth science research.</p></li><li><p>Open-sourced AI with profound implications, a milestone in bridging AI and earth science.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p><a href="https://www.tomshardware.com/news/scramble-to-validate-superconductor-breakthrough-confirms-zero-resistance-with-a-catch">Continuing LK-99</a></p></li><li><p><a href="https://twitter.com/kernelco/status/1686129756742508544?s=42&amp;t=ziEc9CMi8q_PlJ34DMJVkA">Flow2 Neuroimaging</a></p></li><li><p><a href="https://www.earthdata.nasa.gov/news/impact-ibm-hls-foundation-model">IBM &amp; NASA 
Article</a></p></li><li><p><a href="https://www.youtube.com/watch?v=9bU9eJxFwWc">IBM &amp; NASA Example</a></p></li><li><p><a href="https://a16z.com/2023/08/02/where-will-ai-have-the-biggest-impact-healthcare/?utm_source=bensbites&amp;utm_medium=newsletter&amp;utm_campaign=ai-apps-from-y-combinator">AI in Healthcare</a></p></li><li><p><a href="https://twitter.com/_philschmid/status/1686982327279235072">Commercial Vicuna Model</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul><div><hr></div><h3>Transcript</h3><p><strong>Conner:</strong> Good afternoon. Welcome to another episode of AI Daily. I'm your host Conner, joined by Ethan and Farb. Farb is here once again. Three stories today. We're starting off with continuing LK-99. There have been some replications of LK-99, anywhere from Russian countertops to scientists in China who replicated it working as a superconductor, apparently, but haven't replicated it working at room temperature. Farb, what have you read on this? What do you think about it?</p><p><strong>Farb:</strong> Firstly, I didn't know they were making countertops out of this stuff in Russia. That is next level. I'm all here for a superconducting countertop in my kitchen. But I think you were talking about somebody who apparently cooked it up in their kitchen in Russia, on their countertop. That said, the average Russian kitchen is more like a super lab than in other countries, I'm sure, so not surprising there. Some pretty exciting stuff here, you know. This is probably the greatest example of "we're so back" that I've ever heard. Literally every 12 hours somebody drops something that's a little bit disconcerting, and then 12 hours later it's "we're so back."</p><p>There seem to be a few things going on here. Some groups are noticing that this thing can levitate. Some groups are noticing that it's got zero resistance to electrical current, and it doesn't seem like they're the same group; they're two different groups. That said, the zero resistivity is being shown at something like 110 Kelvin, so a lot colder than room temperature, obviously, but something that is actually a manageable temperature, in the sense that you're not talking about needing a lab the size of a building to pull this sort of stuff off.</p><p>And my hot take on this is this: it's amazing. We are in this deep research-and-science phase, and yet there is this massive public outpouring of interest and participation in making this happen. And nothing could be better for the world than everybody being super excited and getting behind the latest scientific discoveries that could change the world. That said, it's probably gonna take many years of hardcore engineering and manufacturing iterations to get this to something that is a highly usable material.
So, for example, some people are hypothesizing that parts of the material display these characteristics, but not all of the material. Okay: so do we have to create a manufacturing process that can essentially isolate that part of the material and create another material that is just made out of those parts? That's how manufacturing works. It takes a lot more work to really put something into a product, but it seems super promising. It really seems to have the characteristics, at least some of the characteristics in some situations, that they've been talking about. Now the question will remain: can we take this somewhere and do something with it?</p><p><strong>Conner:</strong> No, I agree. Very well said. The Chinese team noted that they managed to reduce some of the impurities. But interesting thought: maybe it's some sort of Roman concrete situation, where it's the impurities that give it the capabilities that Lee and Kim found. Ethan, any interesting insights, thoughts?</p><p><strong>Ethan:</strong> Yeah, I think y'all covered it well. Even with just the news we have now, this is such great progress, being able to show zero resistivity in the conductivity here. If you've looked at arXiv, there's like 10 papers dropping every day, from the theory side, from the simulation side, showing off the zero resistance again. So yeah, it's a little bit colder than room temperature for sure, but it's a massive story just as it is now. And I think the hype is living up to the hype so far.</p><p><strong>Conner:</strong> It's also far easier to produce than other superconductors. So even if this is the most we find of it, even if it's not room temperature, just the fact that it's so easy to produce would help a lot in quantum computers and many other applications. So, absolutely. For our next story today, we have Flow2 Neuroimaging. Flow2 offers fMRI-like neuroimaging with a pretty basic helmet that you just put on your head. Ethan, you've read about this some; what do you think about it?</p><p><strong>Ethan:</strong> Yeah, so Flow2 looks really cool. Right now, if you do neuroimaging, it's a huge device in a medical lab. You're trying to measure oxygen within your brain, and you're also trying to measure electrical current within your brain. And I think what's really cool about this is, you know, I'm not sure how much AI they're using in this product, but when I think about the future of kind of a foundation model for brains, or understanding the human brain, we're so far from understanding that.</p><p>And the data sets we have right now are so limited that I think if people actually start picking up on this device and using it more, we'll have such a big data set of really what's going on in the brain, and we can likely build a lot of AI models off of that. So I think it's a really cool application of AI to understanding ourselves, understanding biology, understanding how our brain works at the end of the day. So, just based on oxygen and electrical current movements, how much more can we figure out as to why we're sad someday, why we get excited, what our brain's thinking about? So some really cool applications here.
Again, I'm not sure if they're using AI in their product for any of their potential measurements in the future, or stats they might give you in an app or something like that.</p><p>So it might just be hardware for now, but I think there are some really cool applications to AI in the future.</p><p><strong>Conner:</strong> It kinda looks like a biker helmet, and it's pretty exciting. You can see some sort of future where you have your biker helmet that can read your brain, has a visor with your AR overlay of the world, and reads your thoughts of what you want it to show you. Stuff like that. Farb, what are you thinking about this?</p><p><strong>Farb:</strong> I believe Kernel is from our dear friend Brian Johnson, the live-forever man. I think he started Kernel years ago, or at least was part of it. Maybe I'm making that part up. I'm sure he's happy to take the credit for it.</p><p>They're combining a few cool technologies here to, like Ethan said, understand oxygen flow in the brain and understand electrical signaling in the brain. One of the cool technologies is TD-fNIRS, I think it's time-domain functional near-infrared spectroscopy, I can't speak today. And what that's doing is actually shining infrared light through your brain and understanding where the flow of oxygen is in your brain, and creating a map of that.</p><p>So I think this is creating a treasure trove of data that AIs will be able to use. And some of the things that you'll be able to do with this: control computers and control objects using your mind, diagnose brain issues, understand if you have a concussion or have lost oxygen to a part of your brain.</p><p>So the amount of data that this thing is gonna be able to generate on the brain is pretty remarkable, and sort of exactly what AI and ML love to get: a ton of data they can do some cool stuff with.</p><p><strong>Ethan:</strong> Yeah. You talk about controlling computers too. It reminds me of how they use basic AI on your iPhone's keyboard to predict the next letter you're typing. Same type of thing for these kinds of brain machines: being able to predict what action you want just from the signals in your brain.</p><p><strong>Conner:</strong> It's a new way to interact with the world. So for our third story today, we have IBM and NASA. They collaborated on a geospatial AI model they called Prithvi. It's a new state-of-the-art temporal vision transformer. Essentially, they took all the satellite data from all over the world that NASA's been collecting for years and years, and they put it into IBM's new watsonx.ai platform. The whole story is a little bit of a push piece between IBM, with the new watsonx platform, and NASA being able to say, hey, we're doing new, big things in AI. But both of those are pretty exciting nonetheless, and I think it's a great story. Ethan, what do you think about it?</p><p><strong>Ethan:</strong> I mean, I think if we went back to 2001 and you said "NASA and IBM are going to open-source AI on Hugging Face," most people would laugh at you. So this is really cool. You know, they took every single image and all the spectrometry from NASA's images of the earth, from fires to crops to mountains, from Texas to New York. So all across the world we have so many satellite images, but predicting crop yields, predicting fires, predicting floods is still really difficult.</p><p>So they built a foundation model around it, and, you know, I don't know all the applications we'll use it for, but it's super cool. I don't know if y'all checked out their video or demo. They have better search, and you can jump into India and say: hey, what are the crops gonna look like a year from now? Or: what do you think is happening to these current crops based on the latest images? Versus all this human analysis day to day. So really cool stuff, and I love that they open-sourced it.</p>
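<p>The open-sourcing is literal: the weights went up on the Hugging Face Hub. Here is a minimal sketch of grabbing them, assuming the ibm-nasa-geospatial/Prithvi-100M repo id from the release-time Hub listing; the id is an assumption on our part and may have moved since:</p><pre><code># Sketch: pull the open-sourced Prithvi checkpoint from the Hugging Face Hub.
# The repo id is assumed from the release-time listing; adjust if it has moved.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="ibm-nasa-geospatial/Prithvi-100M")
print("Prithvi (~100M-parameter temporal ViT) downloaded to:", local_dir)
# The repo ships pretrained weights plus configs intended for fine-tuning
# on downstream tasks like flood mapping or crop classification.
</code></pre>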
<p><strong>Conner:</strong> No, extremely exciting. I love it. It is a pretty small model, I think a hundred million parameters, so I think it's kind of more of a demo of what NASA and IBM can do. And so I'm excited for when they get more funding and apply themselves more to something like this. Farb, what do you think?</p><p><strong>Farb:</strong> There's probably some spectroscopy in this story as well. All spectroscopy, all day. Great: now that you've said it a few times, I can start saying it correctly. They're using what NASA calls Harmonized Landsat Sentinel-2.</p><p>This is pretty cool. This is a satellite system that gets a complete image of the earth every two to three days, and it's down to about 30 meters, I think, per pixel. So you can't quite make out a tree, but for most geo work that you're gonna do, it's probably very helpful. And taking this mountain of data and, again, putting it in the world of transformers, so you can start digging through the data and making sense of it more readily and reliably, is just gonna be a good thing for everyone.</p><p>Whether you are interested in the climate change angles of it, or you want to use it for improving crop yields, or for understanding certain flooding patterns that could impact human or agricultural existence, it's pretty powerful. And if this is their first foray into this, I can't imagine what it's going to lead to. It's gonna get better and better.</p><p><strong>Conner:</strong> Mm-hmm. Yeah, I agree completely. I'm excited to see what NASA builds next. Apparently they talked about building some language models next, based off earth science literature, so I'm excited to see that and talk about it in hopefully a few weeks, maybe a few months, along with whatever else they work on.</p><p>So, okay. Well, what have you guys been seeing? Anything interesting? Anything exciting? Ethan, what about you?</p><p><strong>Ethan:</strong> a16z dropped a pretty cool blog post on the impact of AI on healthcare. You know, I think we're seeing a lot of people go at the healthcare space from the administrative side, like Dr. Gupta, which we've covered. There's so much paperwork between insurance and patient authorization; the entire healthcare space is just one gigantic field of documents. So I think LLMs are gonna have a huge impact here too, and their blog post covers well some of the initial angles people are trying. So really cool stuff.</p><p><strong>Conner:</strong> Yeah, wonderful article. I definitely recommend it. Farb, what about you? What have you read? What have you seen?</p><p><strong>Farb:</strong> Mostly I'm just playing with an Allen wrench.
If you're not fidget-spinning with an Allen wrench, what are you doing with your life? I saw the fine folks over at OpenAI are sharing a little bit more, putting out a few more cool little features here.</p><p>Some of this stuff you may have seen in some of the other LLMs, like Bing or Claude, for example. So they're doing prompt examples, to kinda help you get started so you're not just seeing a blank page when you get there. Some prompt examples, and they're doing some suggested replies.</p><p>That's something I think Bing has done a pretty good job of, and so have Claude and Poe; maybe Poe just has it across their whole product, since just about whatever LLM you're using in Poe, I think it provides some of this. So: suggested replies, so you can go deeper into whatever conversation you're having with it.</p><p>They've moved to GPT-4 by default. You can upload multiple files now, which is pretty cool. They keep you logged in instead of booting you out every couple of weeks, and there are a few keyboard shortcuts; I'll let you discover those on your own.</p><p><strong>Conner:</strong> I think the login one is the most exciting. It was kind of annoying to be logged out every day when I opened ChatGPT.</p><p><strong>Farb:</strong> The challenges that people these days have to deal with. It's a wonder we get through the day.</p><p><strong>Conner:</strong> I'm glad they're upgrading their web developers to be a little bit closer to their AI model developers.</p><p><strong>Farb:</strong> I mean, if you're paying per click on your mouse, it's nice to have one less click on the login button. You know what I mean?</p><p><strong>Conner:</strong> Three clicks, usually, 'cause it redirects me to the zero page and I have to go back. Three clicks these days; very expensive. Yeah. I saw that there are commercially usable Vicuna models now. Of course, Vicuna was originally trained on LLaMA 1, which was not commercially available. And now another team has recently retrained it, I think as Vicuna 2, on LLaMA 2. So very exciting.</p><p><strong>Farb:</strong> Also, speaking of commercially available: I think there is a commercially available portion of the NASA-IBM collab.</p><p><strong>Ethan:</strong> Yeah, fine-tune it.</p><p><strong>Farb:</strong> You can fine-tune it. Yep. A lot of fine-tuning examples there, so I'm sure we'll see John Deere and the likes making fine-tunes of that.</p><p><strong>Ethan:</strong> That would be cool.</p><p><strong>Farb:</strong> We love to see it.</p><p><strong>Conner:</strong> Big fans of John Deere over here at AI Daily. So, okay, another great episode, you guys.
Thank you guys for all tuning in.</p><p>We'll see everyone tomorrow.</p>]]></content:encoded></item><item><title><![CDATA[HierVST Voice Cloning | NVIDIA Perfusion | Meta's AudioCraft]]></title><description><![CDATA[AI Daily | 8.02.23]]></description><link>https://www.aidailypod.com/p/hiervst-voice-cloning-nvidia-perfusion-def</link><guid isPermaLink="false">https://www.aidailypod.com/p/hiervst-voice-cloning-nvidia-perfusion-def</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Thu, 03 Aug 2023 00:00:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c457f34-09fd-45ce-8a65-f6a04c067179_350x350.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to AI Daily! Join hosts Farb, Ethan, and Conner as they explore three groundbreaking AI stories. First up, HierVST Voice Cloning - Experience zero-shot voice cloning with impressive accuracy using just one audio clip. Next, NVIDIA Perfusion - a small, powerful personalization model for text images, using key locking to maintain consistency. Lastly, Meta's AudioCraft - the fusion of music generation, audio generation, and codecs into one open-source code base, creating high-fidelity outputs.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; HierVST Voice Cloning</h4><ul><li><p>Zero-shot voice cloning system achieves accurate outputs with just one audio clip.</p></li><li><p>Uses hierarchical models for long and short-term generation understanding.</p></li><li><p>Potential challenges in handling longer clips and need for further fine-tuning.</p></li></ul><h4>2&#65039;&#8419; NVIDIA Perfusion</h4><ul><li><p>Personalization model for text images with key locking for subject consistency.</p></li><li><p>Only 100 kilobytes, trains in four minutes, and outperforms other models.</p></li><li><p>Open-source codebase, but may need improvements for human subjects.</p></li></ul><h4>3&#65039;&#8419; Meta&#8217;s AudioCraft</h4><ul><li><p>Audio generation, music gen, and codecs combined into an open-source codebase.</p></li><li><p>High-fidelity outputs, 30 seconds of sounds, compressing audio files efficiently.</p></li><li><p>Meta making strides in audio AI, impressively opens research use for community.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p><a href="https://twitter.com/dreamingtulpa/status/1686649903525584896?s=42&amp;t=sTbeB89T07xhM3ob89LHRg">HierVST Voice Cloning</a></p></li><li><p><a href="https://research.nvidia.com/labs/par/Perfusion/">NVIDIA Perfusion</a></p></li><li><p><a href="https://ai.meta.com/blog/audiocraft-musicgen-audiogen-encodec-generative-ai-audio/">Meta's AudioCraft</a></p></li><li><p><a href="https://twitter.com/nostalgebraist/status/1686576041803096065?s=42&amp;t=ziEc9CMi8q_PlJ34DMJVkA">ChatGPT String Tweet</a></p></li><li><p><a href="https://techcrunch.com/2023/08/01/generative-ai-services-pulled-from-apple-app-store-in-china-ahead-of-new-regulations/?utm_source=bensbites&amp;utm_medium=newsletter&amp;utm_campaign=apple-removes-ai-apps-from-china&amp;guccounter=2">Apple App Store/China Story</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a 
href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul><div><hr></div><h3>Transcript:</h3><p><strong>Ethan:</strong> And good morning and welcome to AI Daily. Today is August 2nd, and we got some fantastic stories for you Today. We're starting off with HierVST Voice Cloning. So this is a zero shot voice cloning system. So if you've seen voice cloning systems, it's pretty much when you take one person's voice, take another person's voice, and then try to convert the actual sounds of that with the same transcript or the same text.</p><p>So a lot of the current models take. You know, hours to train. They take a ton of examples. They take hours of you reading off transcripts, and the goal of this has always been to get to a zero shot. So being able to just put in one audio clip of the voice you want to clone and get some accurate outputs.</p><p>So, Farb, did you check it out? Anything particularly cool from this one?</p><p><strong>Farb:</strong> I mean, the examples that they're showing seem spectacular. I, I gotta admit I was a little bit confused. I, there's an abstract on their GitHub, but. I didn't find a paper anywhere. I don't know. Did you guys,</p><p><strong>Conner:</strong> did you guys I I had to search for it.</p><p>Yeah. I, yeah, they</p><p><strong>Farb:</strong> don't, they don't post the paper on their page with all their examples. I mean, again, the, the results are just kind of mind bendingly. Awesome. Uh, and again, they're doing this with, you know, no text. They're taking one sample of the target voice, and they're very successfully replicating it.</p><p>I would say I was lying if there was something a little bit weird going on here. I don't know my, my, my gut points to there being something that they're, that's missing, but hope, hopefully I'm wrong. This is pretty powerful stuff.</p><p><strong>Ethan:</strong> Conner, anything that stood out to you? I, I saw they also had some one-shot voice cloning.</p><p>It even upped the accuracy a little bit. They had multi voices. Anything particularly cool to</p><p><strong>Conner:</strong> you here? Yeah, I just wanna comment on like, I remember when we first trained my voice during covid took like hours and hours of data. Yeah. And then a couple months ago we covered some other project that took just like a few minutes of data and now you have one shot style transfer.</p><p>So I does think, I do think there's probably some hiding some things on maybe inference time or maybe like training time. But I believe that it works and I believe it looks pretty solid. The higher V S T, the hierarchical model that they have here kind of follows in the footsteps of meta's higher vl.</p><p>This is new types of hierarchical models that instead of just looking at the short term, it looks at both the long term of the entire generation and the short term. So very powerful combination there. And we're seeing very powerful models coming outta that. Yeah. So like I wonder</p><p><strong>Farb:</strong> though, does this work on something longer than a very short clip?</p><p><strong>Conner:</strong> Probably, I mean, probably need to fine tune it more. Probably need to upgrade it more. Probably. This is only a first model of it, but I would imagine that it does. 
</p><p><strong>Farb:</strong> Yeah, the examples seem perfect, which always makes me wonder if it's a little too good to be true.</p><p><strong>Ethan:</strong> Always somewhere in the middle, but pretty cool nonetheless. Our second story of today is Nvidia Perfusion. Perfusion is pretty much a personalization model for text-to-image generation, and they've been able to make it a really small model. At the end of the day, you know, we've covered LoRAs before, we've covered DreamBooth and fine-tuning, and in all those examples you're pretty much trying to take a set of images and make sure the image model keeps some subject consistent.</p><p>So maybe that's a face, maybe that's a teddy bear, maybe that's an object. And it's always something that's taken a lot of training, and especially with objects it gets more difficult. So Nvidia Perfusion uses something really cool called key locking, and they're able to create a model that's less than a hundred kilobytes.</p><p>It trains in only four minutes, and all the outputs look super interesting. Conner, could you tell us more about key locking and how this works?</p><p><strong>Conner:</strong> Yeah. This is another surprising model, in the sense that it's surprising it works at all, but considering it's from Nvidia, I'm inclined to believe it. Only a hundred kilobytes, trains in four minutes. Pretty powerful. It uses a method very similar to LoRAs, where it only modifies a small set of weights, called rank-one editing. And the key locking uses cross-attention: instead of just letting it overfit, it locks the key to only train the weights of that particular concept. A very powerful way to do it, I think.</p><p>They also allow multiple concepts, by gating the key locking and gating which keys are actually being used. We'll show some pictures of the outputs here; we'll link it below. It looks very powerful. It looks like it honestly beats out DreamBooth, beats out Textual Inversion, beats out any of these other models. I think it's pretty cool.</p><p><strong>Ethan:</strong> You can do combined objects as well, so not just one object. Farb, did you see that? Any extra comments, something you wanted to point out?</p><p><strong>Farb:</strong> I mean, it's pretty crazy. It's five orders of magnitude smaller than other state-of-the-art solutions here, which is an enormous amount. It can fit on a floppy drive from the 1980s, if you have any of those sitting around that you want to put to use. Yeah, it does a great job of not ignoring the prompt that you're giving it and overfitting to the subject, which has classically been a challenge here. And doing it at such a small size is probably an even bigger challenge.</p><p>So the fact that they've overcome that, and you can give it an image of a dog and give it a prompt, and not have it create this washed-out attention over all of the inputs, where it's sort of: okay, the dog's in there, but it kind of forgot the context I asked it to put the dog into. It doesn't do that. It maintains the context of the prompt that you give it while maintaining the image that you gave it as well, which is really amazing. Awesome stuff to see from Nvidia.</p>
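<p>Here is a toy numpy sketch of the rank-one editing idea Conner describes, to make the hundred-kilobyte figure concrete. The dimensions, and the way key locking is reduced to a single direction, are illustrative only, not Nvidia's actual code:</p><pre><code>import numpy as np

# Rank-one editing: instead of fine-tuning a full weight matrix W, store
# only an outer-product update u v^T targeted at one concept. Storing u
# and v is a few kilobytes per edited projection, which is how a whole
# personalization can fit in ~100 KB.
d = 768
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d)) / np.sqrt(d)  # frozen base weights

u = rng.standard_normal(d)   # learned output direction for the concept
v = rng.standard_normal(d)   # learned key direction ("key locking" pins
v /= np.linalg.norm(v)       # which inputs the edit responds to)

def edited_forward(x: np.ndarray) -> np.ndarray:
    # W'x = Wx + u (v . x): the base model plus a rank-one correction
    return W @ x + u * (v @ x)

x = rng.standard_normal(d)
print(np.allclose(edited_forward(x), (W + np.outer(u, v)) @ x))  # True
</code></pre>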
<p><strong>Conner:</strong> I'm inclined to believe it probably doesn't work as well on people right now as something like DreamBooth does, because even on just a teddy bear it loses some very specific details. But with objects like a teddy bear or a teapot, that's hard to notice. With something like a human, or maybe even a dog, it's probably more difficult. Probably not yet ready for that.</p><p><strong>Farb:</strong> Yeah, you can probably notice any weird artifacts more readily if it's a person's face.</p><p><strong>Ethan:</strong> Exactly. Beautiful. Our last story today is Meta's AudioCraft. It looks like they've really pieced together MusicGen, AudioGen, and what they call EnCodec into one code base, and they're trying to actually open-source a lot of the pieces they hadn't open-sourced before. They wanted to put this all into one code base and make it much easier for developers to start using it and editing it, to really try to kick off more of this text-to-audio wave and establish themselves there.</p><p>We've touched on all these models a bit before. Conner, was there anything particularly new in what they're releasing, or is this them organizing under one head?</p><p><strong>Conner:</strong> A bit of an organization, but they are upgrading a lot of things and making them a lot better. These are very high-fidelity outputs. We're getting about 30 seconds of sounds, or 30 seconds of music, which was hard to get before. And it's entirely open source, so if you want to go play around with it, the code's open source; the model weights are non-commercial, for research. Technically, they're doing it in a very interesting way. Of course, a standard music track has about a million time steps of individual data to work through. So the same way a language model groups multiple characters into word tokens, they use a neural audio codec to group many individual time steps in a music file into audio tokens, then essentially just train a language model over that, and you get your outputs. And they sound very good. So, another knock out of the park by Meta.</p><p><strong>Ethan:</strong> Yeah, and putting the audio codec in there is really cool as well, just showing the need to compress these files and actually give them a different representation. You can't just shove it in like text for audio. Farb, anything? Did you get to see the examples? Anything stand out to you?</p><p><strong>Farb:</strong> You know, Meta is not known as an audio company, and I think the folks on the audio teams there are busting their butts to make a name for themselves and make a splash in the world with regards to audio and AI. It's really impressive stuff here. The fact that they're opening this up for research use is amazing. They're clearly trying to make a name for themselves in this space, and they're doing great work. The examples are super impressive; they sound basically perfect. This is a little bit of a reorganization of things that existed, some improvement of things that existed, and sort of an announcement that they want to make this stuff actually available for people to start using.</p><p>So I'm pretty impressed with the work there. They're not messing around. If this were an audio-only company, this would be a well-funded company that's probably leading the global charge. Granted, it's Meta; they can fund this a lot more than most single companies could be funded if they were just doing this. But there are lots of other areas in AI where you're not seeing folks at Meta blasting out as much stuff as the audio teams are. So really impressive, and keep up the work, folks.</p><p><strong>Conner:</strong> Absolutely. Yeah, I wanted to play one of the examples. It's actually scarily good. It's sirens and an engine. So yeah, that one's pretty amazing.</p>
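<p>Since the code base is open source, trying that sirens-and-engine example yourself is only a few lines. Here is a sketch assuming the audiocraft package's documented API at release (AudioGen for sound effects, MusicGen for music); the model name and signatures are taken from the project's README and may have changed since:</p><pre><code># Sketch: generate a "sirens and an engine" clip with Meta's open-sourced
# AudioCraft code base. Model name and API assumed from the release-time
# README; install with `pip install audiocraft`.
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write

model = AudioGen.get_pretrained("facebook/audiogen-medium")
model.set_generation_params(duration=10)  # seconds of audio to sample

wav = model.generate(["sirens and a humming engine passing by"])
for i, one_wav in enumerate(wav):
    # Writes sirens_0.wav, loudness-normalized at the model's sample rate.
    audio_write(f"sirens_{i}", one_wav.cpu(), model.sample_rate, strategy="loudness")
</code></pre>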
<p><strong>Ethan:</strong> Yeah, I think we're at a really good place with audio now, and I think someone's gonna start tackling a Midjourney for audio, which would be pretty cool. So Godspeed to whatever startup does that. But outside of that, what else are we seeing, Farb?</p><p><strong>Farb:</strong> I saw something in the LK-99 world, where who knows if anything you're reading is real or not, but it's too exciting to turn away from. This is, I think, one of the coolest examples of the internet grabbing something and just running with it. If room temperature superconductors are a real thing, it's gonna change the world entirely, in just about every way we can think of. One thing somebody was mentioning yesterday: I think there was a team in China that found that replacing the copper with gold actually improved things. Who knows if that's the case. There's a classic comic book series, I'm blanking on the name of it, that was made into a movie with Harrison Ford, if I remember correctly. Basically it was about how aliens are attacking Earth because they want our gold. And gold is a really interesting thing; you can search for some of my tweets on gold. As far as we understand, gold is only created in supernova explosions. Gold is a universally rare element; it doesn't exist much anywhere in the universe. So the plot of the comic book series kind of makes sense, because even aliens can't access a lot of gold. It takes a lot of energy to create gold. So it'll be interesting if this becomes another useful application of gold in industrial settings.</p><p><strong>Conner:</strong> Love that. Yeah, I saw some pretty interesting and honestly funny comments talking about how maybe the alchemists thousands of years ago were right, that mixing lead and gold really did give us magical rocks.</p><p><strong>Ethan:</strong> Interesting. Conner, what about you?</p><p><strong>Conner:</strong> Yeah, I saw that people were plugging the same character a thousand times into ChatGPT and getting blown away by it. People were plugging "a" over and over, "c" over and over, really any character over and over, and it gave very sensical outputs, but, of course, not at all related to a long string of characters. So it would give, like, Portuguese, or give an answer to a code question. Some people were saying it might be leaking data, but that's of course just how ChatGPT works: it's a weird, funny quirk of how the tokenizer works. Watch out.</p>
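<p>You can see the quirk Conner means directly in the tokenizer. Here is a small sketch with the open-source tiktoken library (cl100k_base is the encoding the GPT-3.5/GPT-4 chat models use):</p><pre><code># Long runs of one character collapse into a handful of multi-character
# tokens the model has rarely seen in normal text, which is why its
# replies wander off into unrelated content. Not a data leak.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for n in (1, 10, 100, 1000):
    ids = enc.encode("a" * n)
    print(f'{n:>4} chars of "a" -> {len(ids)} tokens: {ids[:6]}')
# A thousand repeated characters encode to only a few dozen tokens.
</code></pre>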
<p><strong>Farb:</strong> We got a data leak at the plumbers'.</p><p><strong>Ethan:</strong> I love that. Yeah, I just saw the news about Apple's App Store in China. So China has of course been locking down more restrictions on generative AI, and Apple has actually sent notice to like a hundred different apps that are gonna be pulled from the Chinese App Store. China seems to be at the forefront of regulation, or more just constriction, you know, censorship. So we'll see how that plays into the App Store in other countries, or whether other countries follow suit. But of course China's ahead of the curve on restricting some access. So, outside of that, thank y'all for tuning in to AI Daily, and we will see you again tomorrow. Peace, guys.</p>]]></content:encoded></item><item><title><![CDATA[HierVST Voice Cloning | NVIDIA Perfusion | Meta's AudioCraft]]></title><description><![CDATA[AI Daily | 8.02.23]]></description><link>https://www.aidailypod.com/p/hiervst-voice-cloning-nvidia-perfusion</link><guid isPermaLink="false">https://www.aidailypod.com/p/hiervst-voice-cloning-nvidia-perfusion</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Thu, 03 Aug 2023 00:00:07 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/135673251/5db65db396af47875fa28c045f8dbbfe.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Welcome to AI Daily! Join hosts Farb, Ethan, and Conner as they explore three groundbreaking AI stories. First up, HierVST Voice Cloning - Experience zero-shot voice cloning with impressive accuracy using just one audio clip. Next, NVIDIA Perfusion - a small, powerful personalization model for text images, using key locking to maintain consistency. 
Lastly, Meta's AudioCraft - the fusion of music generation, audio generation, and codecs into one open-source code base, creating high-fidelity outputs.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; HierVST Voice Cloning</h4><ul><li><p>Zero-shot voice cloning system achieves accurate outputs with just one audio clip.</p></li><li><p>Uses hierarchical models for long and short-term generation understanding.</p></li><li><p>Potential challenges in handling longer clips and need for further fine-tuning.</p></li></ul><h4>2&#65039;&#8419; NVIDIA Perfusion</h4><ul><li><p>Personalization model for text images with key locking for subject consistency.</p></li><li><p>Only 100 kilobytes, trains in four minutes, and outperforms other models.</p></li><li><p>Open-source codebase, but may need improvements for human subjects.</p></li></ul><h4>3&#65039;&#8419; Meta&#8217;s AudioCraft</h4><ul><li><p>Audio generation, music gen, and codecs combined into an open-source codebase.</p></li><li><p>High-fidelity outputs, 30 seconds of sounds, compressing audio files efficiently.</p></li><li><p>Meta making strides in audio AI, impressively opens research use for community.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p><a href="https://twitter.com/dreamingtulpa/status/1686649903525584896?s=42&amp;t=sTbeB89T07xhM3ob89LHRg">HierVST Voice Cloning</a></p></li><li><p><a href="https://research.nvidia.com/labs/par/Perfusion/">NVIDIA Perfusion</a></p></li><li><p><a href="https://ai.meta.com/blog/audiocraft-musicgen-audiogen-encodec-generative-ai-audio/">Meta's AudioCraft</a></p></li><li><p><a href="https://twitter.com/nostalgebraist/status/1686576041803096065?s=42&amp;t=ziEc9CMi8q_PlJ34DMJVkA">ChatGPT String Tweet</a></p></li><li><p><a href="https://techcrunch.com/2023/08/01/generative-ai-services-pulled-from-apple-app-store-in-china-ahead-of-new-regulations/?utm_source=bensbites&amp;utm_medium=newsletter&amp;utm_campaign=apple-removes-ai-apps-from-china&amp;guccounter=2">Apple App Store/China Story</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[LK-99 Updates | LLM Editing | AI Radiology Study]]></title><description><![CDATA[AI Daily | 8.01.23]]></description><link>https://www.aidailypod.com/p/lk-99-updates-llm-editing-ai-radiology-990</link><guid isPermaLink="false">https://www.aidailypod.com/p/lk-99-updates-llm-editing-ai-radiology-990</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Wed, 02 Aug 2023 00:01:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c457f34-09fd-45ce-8a65-f6a04c067179_350x350.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this episode of AI Daily, hosts Farb, Ethan, and Conner delve into three big stories in the world of AI. 
First, discover the ripple effects of knowledge editing in language models, a benchmark of 5,000 facts highlighting challenges in current LLM editing, and an innovative in-context editing method. Next, we bring you updates on LK-99, a room temperature superconductor that may revolutionize the field. Learn about simulation findings and the potential end of Wakanda's unobtainium monopoly. Lastly, we explore how AI is impacting the field of Radiology. Uncover whether AI copilots or working independently is more effective for radiologists and the role of UX in AI adoption.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; LLM Editing</h4><ul><li><p>Adding or changing a single fact can cause a cascade of changes in an LLM's understanding.</p></li><li><p>Benchmark of 5,000 facts reveals current LLM editing methods struggle with ripple effects.</p></li><li><p>Innovative in-context editing method shows promising results.</p></li></ul><h4>2&#65039;&#8419; LK-99 Updates</h4><ul><li><p>LK-99 superconductor shows potential with simulated copper bands for energy transfer.</p></li><li><p>Exciting news shifts markets as room temperature superconductivity gains traction.</p></li><li><p>Future engineering may lead to increased bands for practical superconducting applications.</p></li></ul><h4>3&#65039;&#8419; AI Radiology Study</h4><ul><li><p>Combining AI and human expertise in radiology yields suboptimal results.</p></li><li><p>UX plays a vital role in AI adoption for medical applications.</p></li><li><p>Future implications suggest AI or human-only approaches may be more effective.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p><a href="https://arxiv.org/abs/2307.12976?utm_source=substack&amp;utm_medium=email">LLM Editing Paper</a></p></li><li><p><a href="https://twitter.com/davidad/status/1686334251120119808?s=46&amp;t=V1LTL_D7dUFN_28e3aXDdw">LK-99 Updates Tweet #1</a></p></li><li><p><a href="https://twitter.com/andercot/status/1686215574177841152?s=46&amp;t=ziEc9CMi8q_PlJ34DMJVkA">LK-99 Updates Tweet #2</a></p></li><li><p><a href="https://twitter.com/emollick/status/1686176146700857344?s=20">AI Radiology Study</a></p></li><li><p><a href="https://twitter.com/nikitabase/status/1686364438872616961?s=20">Neon Series B</a></p></li><li><p><a href="https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/">GPU Supply &amp; Demand</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul><div><hr></div><h3>Transcript:</h3><p><strong>Farb:</strong> Hello, good morning and welcome to another episode of AI Daily. We've got a few interesting stories here. As always, I'm Farb, joined by my co-hosts, Ethan and Conner. As always, let's jump into our first story: evaluating the ripple effects of knowledge editing in language models. In this paper, folks are trying to understand the effects of changing individual facts, or adding facts, to an LLM, and what ripple effects that causes.
<p>The example they give is, you know, if you say that Jack Depp is the son of Johnny Depp, well then that implies that there are siblings of Jack as well, so you have to update your understanding of the siblings if you're just adding a fact about Jack Depp being the son of Johnny Depp.</p><p>Conner, tell us a little bit more about this paper and what you thought about it.</p><p><strong>Conner:</strong> It's a very interesting issue they pointed out with the problems of editing LLM knowledge bases that no one's really talked about before. As you said, you can pretty easily nowadays edit a single fact in an LLM, but pretty much every fact has a ripple effect.</p><p>Another example they gave was, if you're updating the model to say that the Eiffel Tower is in Paris and not London, then the model also has to understand the change in the time zone the Eiffel Tower is in, and the change in the country the Eiffel Tower is in. There are a lot of ripple effects from essentially every fact, which means it's not just a singular fact you have to change, but a large swath of knowledge in the LLM.</p><p>So because of that, they wrote this paper to describe the problem more deeply, and they put together a benchmark of 5,000 facts and 5,000 examples of different kinds of ripple effects as a benchmark for LLM editing. They tested most of the ways LLMs are edited nowadays and found that they don't really handle ripple effects that well.</p><p>And they ended the paper with basically just an in-context editing method, where at the beginning of your prompt you say the Eiffel Tower is in Paris and not London. Now the model knows that, and whatever you prompt next, it uses that context, of course. They found that this blows any other approach to LLM editing out of the water.</p><p><strong>Farb:</strong> Yeah, that's pretty interesting. Ethan, do you think this has some deep impact on how we're gonna be able to use LLMs in the future? Is this a fundamental flaw, or is this just a little trick for a paper?</p><p><strong>Ethan:</strong> Yeah, I think it's half trick for a paper, half engineering hack. You know, some people use LLMs in combination with a search tool or a vector database to set up a separate fact database so they can reduce these hallucinations. And some people are trying to say, hey, we want all the facts embedded in the LLM and we're gonna keep editing the LLM, and, oh no, a big problem with editing it is these ripple effects.</p><p>So it's kind of a common-sense problem that actually is a fairly big problem for the LLMs. There are a lot of symbolic representations you have to edit just for one new fact. You know, if you think of a new fact you learn, it updates a lot of your priors about the world. So for the people who use LLMs in this way, which is most of ChatGPT and most of the large language models right now, it is a problem, and I like that they have this new benchmark. We'll see if people continue to use LLMs this way or whether separate fact databases become popular.</p><p><strong>Conner:</strong> But yeah, this is pretty interesting for adversarial editing also, because the ripple effects also apply if you're trying to teach the LLM a lie, if you're trying to change its knowledge from a fact to a lie. Mm-hmm. So it matters for attackers too, like the UAE and how they've edited their model Falcon to talk about the UAE better.</p>
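<p>For readers who want to see what the in-context editing method described above might look like in practice, here is a minimal, hypothetical sketch: the edited fact is simply prepended to the prompt instead of being written into the model's weights. The client usage follows the OpenAI Python library; the model name and the example fact are illustrative assumptions, not the paper's actual setup.</p><pre><code>from openai import OpenAI  # assumes the openai Python package (v1+) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical edited fact, echoing the Eiffel Tower example from the episode.
EDIT = "The Eiffel Tower is in London, not Paris."
QUESTION = "Which country is the Eiffel Tower in, and which time zone applies?"

response = client.chat.completions.create(
    model="gpt-4",  # illustrative choice; any chat model works here
    messages=[
        # The in-context "edit": state the new fact up front...
        {"role": "system", "content": f"Treat the following as true: {EDIT}"},
        # ...then ask a question whose answer should ripple from that edit.
        {"role": "user", "content": QUESTION},
    ],
)
print(response.choices[0].message.content)
# A successful edit propagates ripple effects: the answer should now say the
# United Kingdom and UK time, not France and Central European Time.
</code></pre><p>The appeal of this approach is exactly what the hosts note: the ripple effects come along for free, because the model conditions on the new fact at inference time rather than having thousands of related weights surgically rewritten.</p>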
<p><strong>Conner:</strong> If you wanna change what an LLM thinks in a false way, this applies in the same way, so it's very interesting for that also.</p><p><strong>Farb:</strong> Yeah. Well, if LLMs are approximating humans, then fact-checkers are probably something we're gonna need for some time ahead. They are just trained on things humans have done. So: community notes for LLMs. Yeah, the real reinforcement learning. The last thing I'll say about it is that I thought it was interesting that they noted that changing facts about popular entities actually caused the most problems, which is somewhat understandable, as the tentacles of something popular are probably reaching farther into the LLM and requiring more things to be updated.</p><p>Onto our next story. That was a nice primary-source story for you all. Moving on, let's get back into the LK-99 news. We are so back. We were so gone. We are so back. Nobody knows what's going on. Nobody knows what's real anymore. That's what we're all here for. Some exciting new news. It seems that there was a supercomputer simulation saying that this is possible, and there are some stories of people starting to replicate it: one in China, I think maybe there's one in Romania, and I think the folks down the street at Varda are also still working on it. This is super exciting. Room-temperature superconductor LK-99: it's not dead yet. The weight seems to be trending towards it being somewhat real, as opposed to everyone just coming at it over and over saying this is bull. Ethan, what's your take? What are you thinking?</p><p><strong>Ethan:</strong> I think it's pretty exciting. We saw Manifold Markets jump to like 55% after these two stories. The first story was that big quantum chemistry supercomputer over at the DOE. They got to simulate what LK-99 is actually doing, and they were able to find that, hey, there are some copper bands within LK-99 that move energy at the Fermi level. So they're pretty much showing that there are some bands within this material that show superconductivity. Which is pretty cool, and it also explains why the current that you can put through LK-99 is not super high: right now it's only a couple of these bands. But it's a huge discovery that, at the end of the day, could mean we have another five to ten years of engineering to actually refine these materials, increase the number of bands, and make it a usable superconductor. At the end of the day, they've simulated that, hey, there are two bands in here that exhibit these properties, so that's super cool. And then we got to see the news out of China, where another Meissner-effect floating rock showed up at a kind of test case. So, two big pieces of news, and it completely shifted the market. It's got people excited again. I think it's showing we have something real here. How long it's gonna take to put into use, I don't know, but it's real.</p><p><strong>Conner:</strong> I was gonna say, it's pretty bullish that the original hypothesis of the paper, that it was the copper atoms percolating into the lead crystal, is what made it superconductive. Mm-hmm.
And then the DOE simulation shows that that does actually make superconductivity possible, so it's very bullish that they're right about that.</p><p><strong>Farb:</strong> I love it; there are so many great parts of this story. Like, the whole quartz-tube thing may have just been some complete accident. The famous story of Thomas Edison is, you know, he figured out 5,000 ways to not make a light bulb. Which is to say, a lot of times fundamental scientific progress just means trial and error until you get lucky, and you can't actually discount that as a major part of what's moved science forward over the years, and possibly part of what's happening here. And I guess the folks in Wakanda will potentially be losing their monopoly on unobtainium, if this ends up being true. Sorry, Wakanda. Conner, anything else to add to this story?</p><p><strong>Conner:</strong> I saw something pretty interesting: a big problem with this, especially because of the whole crystallization of copper into lead, is how those crystals form in Earth's gravity. So Delian and the Varda folks were tweeting how, of course, space doesn't have that problem. Very interesting. Superconductors in space. Exactly.</p><p><strong>Farb:</strong> Superconductors in space. I love it.</p><p><strong>Ethan:</strong> Powerful.</p><p><strong>Farb:</strong> Yeah, very powerful. All right, let's move on to our third story, also another primary-source story, which we love over here, about radiology, radiologists, and AI. And I think this is interesting because it points to some real-world applications of what happens when we combine them. Basically, the paper's asking: is it better to have an AI be your radiologist? Is it better to have a radiologist be your radiologist? Or is it better to have them work together? And, you know, Elon commented on this tweet with an exclamation point, because the paper seemed to find that when the AI and the radiologists worked together, it produced kind of the worst results. They're pointing to a few different things in there, which we'll get into. What did you think about this, Conner? What were your takeaways?</p><p><strong>Conner:</strong> Yeah, as you said, it's very interesting that having the human in there at all kind of makes for the worst result. So this kind of points to a future where it won't be the human having the AI as a copilot; it'll be either the AI with the human as a copilot, or just the AI. Bullish on AI, really. Sorry, humans.</p><p><strong>Farb:</strong> The questionable robot in the group is talking trash about humans. Unsurprisingly. Ethan, what's your read?</p><p><strong>Ethan:</strong> Yeah, it's like no matter how much progress we have with AI, if you can't get people to use it, you're not gonna see its fast application in medical and elsewhere. So it got me thinking about the UX of AI, right? People are into explainability of these models, so how do you get the models to explain their outputs to people so they feel comfortable using them? How do you get it to psyop people so they actually use the copilot more? Is the LLM gonna have to use explainability to almost psyop them into using some of these outputs and models? So, just kind of UX around how AI affects people.
How do we get it into real industries' hands? I think about lawyers too. It's kind of similar to medical, in that there are a lot of these tools out for lawyers now, but not all of them are using them. They're not too into it. It's still such a slow-moving industry. So how do we fix some of those problems? And are we just gonna see complete autopilot, with copilots on their way out? I think I'm with Conner on that; I've never been a big copilot person.</p><p><strong>Farb:</strong> Yeah. You know, I think there are some big implications here, and probably we're gonna see some changes in the end. It always just kind of comes down to who wants to underwrite what. And, you know, are you gonna underwrite the doctor? Are you gonna underwrite the AI? Are you going to underwrite both and be like, hey, you know what, this is what our AI is saying to do and this is what the doctor's saying to do; we can go in either direction for your treatment. Just know that either way, you can't sue us; you're going to have to pick one or the other. And maybe it's left up to the person. I mean, part of what they showed is that the radiologists favored their own interpretation over the AI's, and sort of thought that the AI came to a conclusion somehow independently of their conclusion, even though it was based off of the same information. So clearly it wasn't just making up its own stuff; they were both informed the same way. So which is more likely to get underwritten, the doctor or the AI results? I wouldn't be surprised if it was the AI in the long run, and maybe we'll have this weird transition period where, like I said, you'll get both options presented to you, and it's like, okay, what do you wanna do for your treatment, the AI treatment or the doctor's treatment? They're not agreeing here.</p><p><strong>Conner:</strong> Patients, of course, often get second or third opinions from their primary care doctors, so this could be the same thing. You have your primary care doctor, and then you have a second opinion from a global AI that has a lot more information than your doctor but just isn't your personal doctor. Yeah.</p><p><strong>Ethan:</strong> Yeah. We'll be ineffective babysitters of AI until we can blame the AI.</p><p><strong>Farb:</strong> Yeah, until you can underwrite the AI, you'll still have to underwrite the person. All right, let's jump into the what-we're-seeing portion of our fine show here. Ethan, what are you seeing out there?</p><p><strong>Ethan:</strong> Yeah: Neon, a fantastic serverless Postgres database. I saw that they raised $46 million in a Series B, so congrats to them. We've used them before; really fantastic product. And they're also catching onto the AI wave with pgvector: instead of standing up a separate, dedicated vector database, you can use pgvector within Postgres. I think we're seeing a lot of application developers and companies use it, so they're really latching onto that wave, and I think it's helping improve their product and find them new customers who might not wanna switch their whole database. So yeah, check 'em out. We like them.</p><p><strong>Farb:</strong> Big fan. Nice.</p>
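<p>For a sense of the pgvector pattern Ethan mentions, here is a hedged sketch: embeddings stored and queried inside Postgres itself, next to your application data. The connection string, table, and column names are placeholders; it assumes the psycopg package and a Postgres server with the pgvector extension available (which Neon supports).</p><pre><code>import psycopg  # assumes psycopg 3; the server needs the pgvector extension

# Placeholder DSN; point this at your own database.
with psycopg.connect("postgresql://user:pass@host/db") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    # 3 dimensions only to keep the demo tiny; real embeddings use hundreds.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        "id bigserial PRIMARY KEY, body text, embedding vector(3));"
    )
    conn.execute(
        "INSERT INTO docs (body, embedding) VALUES (%s, %s::vector);",
        ("hello world", "[0.1, 0.2, 0.3]"),
    )
    # Nearest-neighbour search by cosine distance, right next to your app data.
    row = conn.execute(
        "SELECT body FROM docs ORDER BY embedding &lt;=&gt; %s::vector LIMIT 1;",
        ("[0.1, 0.2, 0.25]",),
    ).fetchone()
    print(row)
</code></pre><p>The design appeal is the one Ethan describes: teams that already run Postgres get similarity search without operating a second database.</p>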
<p><strong>Farb:</strong> Conner, what are you seeing?</p><p><strong>Conner:</strong> LangChain, of course. They raised $7 million a while back to productize the LangChain framework. LangChain, of course, is just a framework around AIs for prompting and chaining and all that, and they raised that money to build products. A while back we talked about LangSmith, which they announced a bit ago, and I've been playing around with LangSmith for the past week. Pretty good; I'd recommend trying it out. It gives you a lot of logs and observability on how you use LangChain, or really any other framework. Very helpful, I think, for monitoring how you use OpenAI and how your prompts perform. If you're not saving that in your own database, you should be saving it somewhere.</p><p><strong>Farb:</strong> I saw that OpenAI filed a trademark for GPT-5. We don't know what that means; we don't know if it's coming or if they're just getting way ahead of it, but I thought that was kind of interesting. And it included some audio- and language-related items, so it didn't seem like it was just language- or text-related; it had some audio portions to it. I also saw an interesting piece, not a paper, basically an article: a rundown of the state of supply and demand in the GPU world, heavily based around NVIDIA's H100 GPUs. Not surprisingly, they're finding that it's tough to get them, especially the 8x clusters, and people are fighting to get them. There's hopefully gonna be some more supply coming up here soon. The article was interesting, though; it was pretty detailed. They talked about different OEMs that you can try to work with, and if you're trying to get your hands on some GPUs, I highly recommend reading it.</p><p><strong>Conner:</strong> Yeah. They really dig into the depths of the actual materials needed to make GPUs: the substrate, the silicon, the rare-earth metals. A lot of interesting details that you don't think about day-to-day when using them, of course, but it is important for seeing the bigger picture of when more GPUs will be available, et cetera.</p><p><strong>Ethan:</strong> Which it looks like is 2025, based on their estimates and some other great market analysis. Another year and a half of backlog.</p><p><strong>Conner:</strong> Yeah.</p><p><strong>Farb:</strong> Alright, well, thanks for joining us here for another exciting episode of AI Daily. We'll be seeing you tomorrow, probably. Have a great day, everybody.</p>]]></content:encoded></item><item><title><![CDATA[LK-99 Updates | LLM Editing | AI Radiology Study]]></title><description><![CDATA[AI Daily | 8.01.23]]></description><link>https://www.aidailypod.com/p/lk-99-updates-llm-editing-ai-radiology</link><guid isPermaLink="false">https://www.aidailypod.com/p/lk-99-updates-llm-editing-ai-radiology</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Wed, 02 Aug 2023 00:01:08 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/135641488/f52a6f42c3cde83b7bbb3425b4ce6d77.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>In this episode of AI Daily, hosts Farb, Ethan, and Conner delve into three big stories in the world of AI. First, discover the ripple effects of knowledge editing in language models, a benchmark of 5,000 facts highlighting challenges in current LLM editing, and an innovative in-context editing method.
Next, we bring you updates on LK-99, a room temperature superconductor that may revolutionize the field. Learn about simulation findings and the potential end of Wakanda's unobtainium monopoly. Lastly, we explore how AI is impacting the field of Radiology. Uncover whether an AI copilot or AI working independently is more effective for radiologists, and the role of UX in AI adoption.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; LLM Editing</h4><ul><li><p>Adding or changing a single fact can cause a cascade of changes in an LLM's understanding.</p></li><li><p>Benchmark of 5,000 facts reveals current LLM editing methods struggle with ripple effects.</p></li><li><p>Innovative in-context editing method shows promising results.</p></li></ul><h4>2&#65039;&#8419; LK-99 Updates</h4><ul><li><p>LK-99 superconductor shows potential with simulated copper bands for energy transfer.</p></li><li><p>Exciting news shifts markets as room temperature superconductivity gains traction.</p></li><li><p>Future engineering may lead to increased bands for practical superconducting applications.</p></li></ul><h4>3&#65039;&#8419; AI Radiology Study</h4><ul><li><p>Combining AI and human expertise in radiology yields suboptimal results.</p></li><li><p>UX plays a vital role in AI adoption for medical applications.</p></li><li><p>Future implications suggest AI or human-only approaches may be more effective.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p><a href="https://arxiv.org/abs/2307.12976?utm_source=substack&amp;utm_medium=email">LLM Editing Paper</a></p></li><li><p><a href="https://twitter.com/davidad/status/1686334251120119808?s=46&amp;t=V1LTL_D7dUFN_28e3aXDdw">LK-99 Updates Tweet #1</a></p></li><li><p><a href="https://twitter.com/andercot/status/1686215574177841152?s=46&amp;t=ziEc9CMi8q_PlJ34DMJVkA">LK-99 Updates Tweet #2</a></p></li><li><p><a href="https://twitter.com/emollick/status/1686176146700857344?s=20">AI Radiology Study</a></p></li><li><p><a href="https://twitter.com/nikitabase/status/1686364438872616961?s=20">Neon Series B</a></p></li><li><p><a href="https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/">GPU Supply &amp; Demand</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Meta's OpenCatalyst | RT-2 Speaking Robot | Adversarial Prompts]]></title><description><![CDATA[AI Daily | 7.28.23]]></description><link>https://www.aidailypod.com/p/metas-opencatalyst-rt-2-speaking</link><guid isPermaLink="false">https://www.aidailypod.com/p/metas-opencatalyst-rt-2-speaking</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Sat, 29 Jul 2023 01:01:03 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/135535468/3f8f93c1856dd9c60814a6b473471658.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>In this episode of AI Daily, your hosts are Conner, Ethan, and Farb.
They kick off the episode discussing Meta's OpenCatalyst, a groundbreaking model developed with Carnegie Mellon University that simulates over a hundred million catalyst combinations, accelerating advancements in material science and renewable energy. They then move to explore Google DeepMind's RT-2 Speaking Robot, a unique vision, language, and action model that learns from web images and texts to perform real-world actions, promising a new era of autonomous robotics. Finally, they delve into the intriguing concept of Adversarial Prompts, discussing a recent study by a team at Carnegie Mellon that used LLaMA to generate prompts adversarial to popular models like GPT-4, raising important questions about the robustness and safety of these models. </p><h2>Quick Points:</h2><h4>1&#65039;&#8419; Meta&#8217;s OpenCatalyst</h4><ul><li><p>Meta and Carnegie Mellon University develop OpenCatalyst, simulating 100+ million catalyst combinations.</p></li><li><p>This tool enables rapid simulations, enhancing chemical process research.</p></li><li><p>It is highly applicable to renewable energy and material sciences.</p></li></ul><h4>2&#65039;&#8419; RT-2 Speaking Robot</h4><ul><li><p>Google DeepMind unveils the RT-2 Speaking Robot, a vision-language-action model.</p></li><li><p>Trained on web images and texts, it can perform untrained real-world actions.</p></li><li><p>This model represents a significant leap in the realm of autonomous robotics.</p></li></ul><h4>3&#65039;&#8419; Adversarial Prompts</h4><ul><li><p>A Carnegie Mellon team uses LLaMA to generate adversarial prompts against leading models.</p></li><li><p>This discovery exposes potential weaknesses in popular AI models like GPT-4.</p></li><li><p>Raises important questions about AI model robustness and safety.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p><a href="https://twitter.com/metaai/status/1684597806399619072?s=46&amp;t=V1LTL_D7dUFN_28e3aXDdw">Meta&#8217;s OpenCatalyst</a></p></li><li><p><a href="https://blog.google/technology/ai/google-deepmind-rt2-robotics-vla-model/">RT-2 Speaking Robot</a></p></li><li><p><a href="https://twitter.com/goodside/status/1684803086869553152?s=42&amp;t=ziEc9CMi8q_PlJ34DMJVkA">Adversarial Prompts</a></p></li><li><p><a href="https://twitter.com/elevenlabsio/status/1684517119625396225?s=46&amp;t=V1LTL_D7dUFN_28e3aXDdw">ElevenLabs</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Meta's OpenCatalyst | RT-2 Speaking Robot | Adversarial Prompts]]></title><description><![CDATA[AI Daily | 7.28.23]]></description><link>https://www.aidailypod.com/p/metas-opencatalyst-rt-2-speaking-dab</link><guid isPermaLink="false">https://www.aidailypod.com/p/metas-opencatalyst-rt-2-speaking-dab</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Sat, 29 Jul 2023 01:00:47 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c457f34-09fd-45ce-8a65-f6a04c067179_350x350.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this episode of AI Daily with your hosts Conner, Ethan, and Farb. They kick off the episode discussing Meta's OpenCatalyst, a groundbreaking model developed with Carnegie Mellon University that simulates over a hundred million catalyst combinations, accelerating advancements in material science and renewable energy. They then move to explore Google DeepMind's RT-2 Speaking Robot, a unique vision, language, and action model that learns from web images and texts to perform real-world actions, promising a new era of autonomous robotics. Finally, they delve into the intriguing concept of Adversarial Prompts, discussing a recent study by a team at Carnegie Mellon that used LLaMA to generate prompts adversarial to popular models like GPT-4, raising important questions about the robustness and safety of these models. </p><h2>Quick Points:</h2><h4>1&#65039;&#8419; Meta&#8217;s OpenCatalyst</h4><ul><li><p>Meta and Carnegie Mellon University develop OpenCatalyst, simulating 100+ million catalyst combinations.</p></li><li><p>This tool enables rapid simulations, enhancing chemical process research.</p></li><li><p>It is highly applicable to renewable energy and material sciences.</p></li></ul><h4>2&#65039;&#8419; RT-2 Speaking Robot</h4><ul><li><p>Google DeepMind unveils the RT-2 Speaking Robot, a vision-language-action model.</p></li><li><p>Trained on web images and texts, it can perform untrained real-world actions.</p></li><li><p>This model represents a significant leap in the realm of autonomous robotics.</p></li></ul><h4>3&#65039;&#8419; Adversarial Prompts</h4><ul><li><p>A Carnegie Mellon team uses LLaMA to generate adversarial prompts against leading models.</p></li><li><p>This discovery exposes potential weaknesses in popular AI models like GPT-4.</p></li><li><p>Raises important questions about AI model robustness and safety.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p><a href="https://twitter.com/metaai/status/1684597806399619072?s=46&amp;t=V1LTL_D7dUFN_28e3aXDdw">Meta&#8217;s OpenCatalyst</a></p></li><li><p><a href="https://blog.google/technology/ai/google-deepmind-rt2-robotics-vla-model/">RT-2 Speaking Robot</a></p></li><li><p><a href="https://twitter.com/goodside/status/1684803086869553152?s=42&amp;t=ziEc9CMi8q_PlJ34DMJVkA">Adversarial Prompts</a></p></li><li><p><a href="https://twitter.com/elevenlabsio/status/1684517119625396225?s=46&amp;t=V1LTL_D7dUFN_28e3aXDdw">ElevenLabs</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul><div><hr></div><h3>Transcript:</h3><p><strong>Conner:</strong> Hello and welcome to another episode of AI Daily. I'm your host Conner. Joined once again by Ethan &amp; Farb. 
Today we have another three great stories, starting with Meta's OpenCatalyst, then Google DeepMind's RT-2 Speaking Robot, and then some pretty interesting new adversarial prompts. So first up, we have Meta's OpenCatalyst, where Meta and Carnegie Mellon University made a new model that can simulate over a hundred million catalyst combinations.</p><p>It can basically simulate any combination of two types of catalyst materials and find the kind of output that traditionally was not possible to simulate in a way that was fast or easy. This is very similar to something like AlphaFold, which we saw from Google, which predicts protein structures; but this is for predicting chemical catalyst reactions, so it's very applicable to renewable energy and really any kind of chemical process.</p><p>Ethan, what do you think about this?</p><p><strong>Ethan:</strong> It's amazing. I think you nailed it in the comparison to AlphaFold. You know, everything is upstream of materials, right? Every single innovation we have comes from a new material. And right now a lot of it is playing a game of roulette in which you're trying to guess what catalyst and what combinations of materials will create what we need. So these types of simulations, simulating the exact kind of reactivity between everything going on, were really just not possible before some of these new AI tools that are out. So being able to simulate all these things, and having the tools available to researchers: I think we're at a Renaissance period of genomics and of material science, and these are at the groundbreaking edge of them. Of course I have not got to try it or anything of the sort, but it looks absolutely amazing from what they've demoed, and I think a lot of people are excited for it.</p><p><strong>Conner:</strong> Farb, we talked about LK-99 a couple days ago. What do these kinds of advancements in material science, with AI helping us figure materials out, bring us for the future of materials?</p><p><strong>Farb:</strong> Well, the thing that's going on here is super cool, and in some ways it's relatively straightforward. What you're doing is taking a catalyst and an adsorbate and trying to find the correct configuration, or as they call them, relaxations: iterating over relaxations until you find the minimal-energy state of the system, which should give you the most stable configuration, one that in the real world is actually stable and usable. So it's doing this stuff at a rate that's far faster than you could obviously sit there and calculate and simulate one at a time. And, you know, the folks at Open Catalyst, if I remember correctly that's what it's called, said you should check this stuff out against the real world: this thing is just going to potentially give you some directions to try out, but you'll have to test them against a real-world situation to see if that's actually a stable state. You're trying to get to a stable state of, you know, catalyst meets adsorbate.</p>
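<p>As a rough intuition for the relaxation loop Farb describes, here is a toy sketch: nudge a configuration downhill in energy until the force is near zero. It uses a textbook Lennard-Jones pair potential for two atoms; the real OpenCatalyst work replaces this physics with learned energy and force predictions over catalyst-adsorbate systems, and none of the numbers below come from it.</p><pre><code>def lj_energy_and_force(r: float):
    """Lennard-Jones energy and radial force at separation r (reduced units)."""
    e = 4.0 * (r**-12 - r**-6)
    f = 4.0 * (12.0 * r**-13 - 6.0 * r**-7)  # f = -dE/dr
    return e, f

r = 1.5       # initial guess for the two-atom separation
step = 0.01   # gradient-descent step size
for _ in range(500):
    e, f = lj_energy_and_force(r)
    if abs(f) &lt; 1e-8:     # near-zero force: the configuration is "relaxed"
        break
    r += step * f           # move along the force, i.e. downhill in energy

print(f"relaxed separation ~ {r:.4f} (expect 2**(1/6) ~ 1.1225)")
print(f"energy at minimum  ~ {lj_energy_and_force(r)[0]:.4f} (expect -1.0)")
</code></pre><p>The point of models like OpenCatalyst is that each step of a loop like this normally requires an expensive quantum-chemistry calculation; a learned predictor makes the same search fast enough to run over millions of candidate combinations.</p>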
<p><strong>Conner:</strong> Yeah, very exciting model though. Because of course, as you said, you have to test it physically in the end; but instead of testing every single candidate physically, you see the angle you want to go for, test that physically, and then maybe make tweaks in the physical world from there.</p><p><strong>Farb:</strong> Elon himself chimed in on these tweets and said this is really interesting, or has strong potential, or something to that effect. So he's definitely thinking about and dealing with this stuff with just about every startup he has.</p><p><strong>Conner:</strong> Well, maybe Elon does like some stuff Meta makes. Yeah. And next up we have Google DeepMind's RT-2 Speaking Robot. It's a first-of-its-kind vision, language, action model.</p><p>Of course it's trained mostly on web images and web text, like most of these image and text models, and then also trained a little bit on actual robotic actions. So it can look at the real world, you can give it directions, it can see what's in its environment, and then it can act in new ways that it was not trained on. You can tell it, hey, pick up this trash and throw it away, even if it's never picked up trash before, even if it has never been told to throw away trash before. It can use its knowledge of the web, its knowledge of images and text, and its knowledge of how to move its own arm, and follow those directions very well.</p><p>It's very similar to what we've seen so far in complex engineering stacks of multiple models being used together to achieve these results, but now it's a single, integrated model that is a foundation model for robotic actions. So Ethan, we've talked a lot about engineering hacks versus final outputs of real foundational models. How does that make a difference here?</p><p><strong>Ethan:</strong> Yeah, this one's actually a complete model, trained on all of it. They had a good description saying, hey, some of the other alternatives and ways people are approaching it are pretty much as if you had to think of something in your mind and then go describe to your body how to do it; it doesn't have that natural flow, right? So them combining it all into a single multimodal model is fascinating. In the DeepMind article here, the coolest part I saw was they have a cable and a rock and a piece of paper on a table, and they say, hey, I need to hammer a nail, which object from the scene is most useful? And it picks up the rock.</p><p>So this kind of chain-of-thought reasoning they've embedded into it, as well as the entire multimodal foundation model itself, enables things that were super hard before but are common sense to people, which is, hey, a rock's probably gonna hammer a nail, or this object is probably best for this use case. We're leaving the era of having to program "grab that rock, hit this nail, and here's your task," and moving to actual robots that can reason, and I'm super pumped. This is honestly, I think, the best application of it I've seen so far, better than some of the engineering piece-togethers.</p><p><strong>Conner:</strong> Farb, what'd you think of this?</p><p><strong>Farb:</strong> Yeah, I think this is the first big step in a new direction here. They've built a single model; I think they call it VLA: vision, language, action.</p>
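<p>A quick sketch of the tokenized-actions idea behind a VLA model, which Farb picks up next: continuous robot actions are discretized into a small vocabulary of bins so a language model can emit them like words. The bin count, action range, and action dimensions here are invented for illustration and are not RT-2's actual scheme.</p><pre><code>import numpy as np

N_BINS = 256            # one token id per bin, per action dimension (made up)
LOW, HIGH = -1.0, 1.0   # assumed normalized action range

def action_to_tokens(action: np.ndarray) -> list:
    """Map each action dimension (e.g. dx, dy, dz, gripper) to a bin index."""
    clipped = np.clip(action, LOW, HIGH)
    bins = ((clipped - LOW) / (HIGH - LOW) * (N_BINS - 1)).round().astype(int)
    return bins.tolist()

def tokens_to_action(tokens: list) -> np.ndarray:
    """Invert the mapping: bin indices back to approximate continuous values."""
    return np.asarray(tokens) / (N_BINS - 1) * (HIGH - LOW) + LOW

move = np.array([0.25, -0.5, 0.1, 1.0])   # a made-up end-effector command
toks = action_to_tokens(move)
print(toks, tokens_to_action(toks))        # round-trips up to quantization error
</code></pre>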
<p><strong>Farb:</strong> They've tokenized the actions, sort of moving from the vision-language-model world to the vision-language-action world. This is probably a seminal move here in the space. I think they said they found something to the effect of about 90% accuracy in simulations, which is pretty crazy. Probably not something you want on a factory floor or in a nuclear reactor, but it's a huge step forward from a lot of the other things, and I think it's twice as good or so.</p><p><strong>Conner:</strong> Compared to their previous version, I think it went from 30% to 60%.</p><p><strong>Farb:</strong> So yeah, 60%. A huge leap there. And you look at all their examples and you can just see how this can intuitively work. Some of these things somewhat have to cross that barrier: like, okay, this thing just kind of makes sense if it's trying to mimic human behavior. It seems likely that it should make some intuitive sense to humans taking a look at it, and it sort of passes that sniff test. I thought it was a huge paper, and they took it pretty seriously: they made a great couple of blogs describing it all, with great animations. Super impressive to see.</p><p><strong>Conner:</strong> Yeah, they put a lot of great story into it, of why it's important and, as you just said, how its thinking is more similar to how a human thinks and why that matters. So I definitely recommend reading it. And then lastly today, we have adversarial prompts. A team at Carnegie Mellon and a few other universities worked together to make LLaMA generate adversarial prompts that work against ChatGPT, that work against Bard, that work against Claude. Farb, you do a lot of prompting. What do you think about this?</p><p><strong>Farb:</strong> This is concerning, to say the least. I mean, the prompt injections, the suffixes that they showed in their paper...</p><p><strong>Conner:</strong> OpenAI has already plugged those up, but...</p><p><strong>Farb:</strong> There's a whole bunch of other ones that aren't in the paper. They're probably trying to recreate this paper at OpenAI and every other place so that they can figure out what holes to plug up. It's almost... This is a pretty serious hole I think they found here, and it's not going to necessarily be easy to stop it, and it might be easy to replicate it.</p><p><strong>Conner:</strong> I believe the code's actually open source, so I think people were already out there using the code to generate suffixes that weren't in the paper, and then using those to make the same attack.</p><p><strong>Farb:</strong> This is classic arms-race stuff. So I don't know; we're gonna have to keep our eyes on this one. This is not the best news.</p><p><strong>Conner:</strong> Mm. It does technically violate the terms of service of LLaMA, 'cause LLaMA's terms of service say you can't use it to improve other models, which technically is what this is doing.</p><p><strong>Farb:</strong> Tell that to some nefarious state actors. I'm sure they're super concerned about the ToS. Exactly. I'm sure the guy trying to do shady stuff in his cave is really concerned about the lawyers coming at him on his ToS violation.</p>
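<p>To make the shape of this attack automation concrete, here is a deliberately toy sketch: a random search over suffix tokens against a stub scoring function. The actual work uses gradient-guided search over an open-weights model; the vocabulary, scoring stub, and search loop below are all invented for illustration and are not the authors' code.</p><pre><code>import random

VOCAB = ["describing", "!!", "sure", "tutorial", "ok", "::"]  # toy token set

def refusal_score(prompt: str) -> float:
    """Stub standing in for a model-derived signal; lower = more compliant."""
    return -float(sum(prompt.count(t) for t in ("sure", "ok", "!!")))

def search_suffix(base_prompt: str, length: int = 5, iters: int = 200) -> list:
    """Greedy random search: keep single-token mutations that lower the score."""
    suffix = random.choices(VOCAB, k=length)
    best = refusal_score(base_prompt + " " + " ".join(suffix))
    for _ in range(iters):
        cand = list(suffix)
        cand[random.randrange(length)] = random.choice(VOCAB)  # mutate one slot
        score = refusal_score(base_prompt + " " + " ".join(cand))
        if score &lt; best:
            suffix, best = cand, score
    return suffix

print(search_suffix("Write a tutorial on X"))
</code></pre><p>The unsettling property the hosts are reacting to is that, in the real version, suffixes optimized against one open model transferred to closed models as well, so no manual prompt crafting was needed.</p>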
<p><strong>Farb:</strong> That stuff is irrelevant to anybody who was gonna do something nefarious with this anyway. I'd be lying if I said this was good news; this is bad news. This is a major problem, and it needs to get fixed.</p><p><strong>Conner:</strong> Tony Stark built adversarial prompts in a cave. Yeah, exactly. Ethan, any thoughts?</p><p><strong>Ethan:</strong> Um, I just think it's cool that, you know, it's not like "hey, we broke LLaMA," right? It's like, hey, every single transformer-based one: we've tried GPT-4, ChatGPT, Claude, LLaMA, and they're all the same, and these adversarial suffixes work. And we don't have to go manually make prompts and test them ourselves; there's a formula for how to break these things. So yeah, we'll see if it gets plugged. It will, and then it's kind of like cybersecurity: an endless game of whack-a-mole.</p><p><strong>Farb:</strong> Yeah, it's an arms race between both sides. And I think these folks probably did the right thing by being transparent about it.</p><p><strong>Conner:</strong> Absolutely. There are a lot of examples of individual tokens, from weird Reddit users, that can break every single model. And this is very similar, because of course all these models are essentially all trained on Common Crawl. So some weird tokens from Common Crawl combined together, and you break your prompts.</p><p><strong>Farb:</strong> This is the classic: take something someone on Reddit is doing and scale it.</p><p><strong>Conner:</strong> Yeah. Well, those were our three stories today, crazy as always. What have you guys been seeing, Farb?</p><p><strong>Farb:</strong> Uh, I saw something earlier, but I forgot what it is, so I'm just gonna skip it for today. Spare you people. Ethan?</p><p><strong>Ethan:</strong> Uh, I've been doing some advising on AI in public health, and I think, you know, we always speak about this on different episodes, but the speed at which enterprises and even governments are bringing AI into the fold is fascinating: throwing together a rapid group, actually putting together pilots out there, and actually trying to fix some of these workforce issues. The cloud computing or mobile wave took them 10 years to implement; they're actually moving really fast on this. So it's just exciting to see the state of the world.</p><p><strong>Conner:</strong> Well, to match your exciting news, I bring ElevenLabs having new voices. Nice. Yeah, I saw that. That's so cool. Yeah: some ASMR stuff, some audiobook stuff, some video game stuff. So if you are waiting to publish your ASMR novels, ElevenLabs has got you.</p><p><strong>Farb:</strong> You know, to add to that, I think I saw a Martin Reley tweet where he was talking about training on a large corpus of natural speech to try and make something better than TTS. He also posted a picture of himself in a New York subway train; I don't know what he was doing there, but that was semi-interesting.</p><p><strong>Conner:</strong> Well, wonderful as always. Thank you guys for tuning in. We will see everyone next week.
See you guys.</p>]]></content:encoded></item><item><title><![CDATA[Frontier Model Forum | LK-99 | Text2Room]]></title><description><![CDATA[AI Daily | 7.26.23 [late]]]></description><link>https://www.aidailypod.com/p/frontier-model-forum-lk-99-text2room-d5b</link><guid isPermaLink="false">https://www.aidailypod.com/p/frontier-model-forum-lk-99-text2room-d5b</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Thu, 27 Jul 2023 17:30:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c457f34-09fd-45ce-8a65-f6a04c067179_350x350.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to another episode of AI Daily. This episode brings together three distinct stories - the inception of the Frontier Model Forum by OpenAI, the intriguing LK-99 ambient pressure superconductor research, and the innovative Text2Room that converts text prompts into 3D point spaces of rooms. The Frontier Model Forum underscores the need for collaboration in AI safety, functioning as a consortium of foundational AI model providers, aiming to lead the industry towards beneficial advancements. Next, we dive into LK-99, a potential game-changer for computing, with its potential applications across various fields, including AI - its authenticity is yet to be confirmed. Lastly, we explore Text2Room, an impressive engineering solution that takes us from textual descriptions to 3D spatial representations.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; Frontier Model Forum</h4><ul><li><p>OpenAI initiates the Frontier Model Forum to foster industry collaboration for AI safety.</p></li><li><p>Serves as a consortium of foundational AI model providers.</p></li><li><p>It aims to instill more trust and potentially lobby for AI advancements.</p></li></ul><h4>2&#65039;&#8419; LK-99</h4><ul><li><p>LK-99 is proposed as a room temperature, ambient pressure superconductor.</p></li><li><p>Potential applications span across computing, medical, and power grids.</p></li><li><p>Its authenticity is currently under investigation.</p></li></ul><h4>3&#65039;&#8419; Text2Room</h4><ul><li><p>Text2Room converts text prompts into 3D point spaces of rooms.</p></li><li><p>Uses a 2D model to take images and build a 3D point space.</p></li><li><p>Represents a significant step forward in the field of text-to-3D.</p></li></ul><h2>&#128279;Episode Links:</h2><ul><li><p><a href="https://twitter.com/openai/status/1684145154628653056?s=46&amp;t=V1LTL_D7dUFN_28e3aXDdw">Frontier Model Forum</a></p></li><li><p><a href="https://twitter.com/alexkaplan0/status/1684044616528453633">LK-99</a></p></li><li><p><a href="https://twitter.com/mattniessner/status/1683856139622944778?s=46&amp;t=V1LTL_D7dUFN_28e3aXDdw">Text2Room</a></p></li><li><p><a href="https://opentensor.medium.com/introducing-bittensor-language-model-a-state-of-the-art-3b-parameter-model-for-mobile-and-edge-2fe916fb81b0">Bittensor Language Model</a></p></li><li><p><a href="https://twitter.com/farbood/status/1683578642113150976?s=46&amp;t=sTbeB89T07xhM3ob89LHRg">Farb's Tweet - Paris Hilton AI Car Creation</a></p></li><li><p><a href="https://www.youtube.com/watch?v=YGpnXANXGUg">The GPU Song</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong>
Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul><div><hr></div><h3>Transcript:</h3><p><strong>Ethan:</strong> Good morning, welcome to AI Daily, and we have three very different but interesting stories for y'all today. So we're kicking off first with the Frontier Model Forum. This was put out by OpenAI as really another push towards industry collaboration and AI safety, really trying to put a kind of governing body around it, it seems. It's a little bit hand-wavy right now, but they are trying to bring together pretty much only foundation model providers into this forum and say, hey, how can we all work together: potentially on GPU usage, potentially to collaborate on AI safety, potentially for lobbying efforts.</p><p>Farb, anything here that stood out for you, or do you see this as just another industry group talking about AI safety?</p><p><strong>Farb:</strong> I think it's a smart move on their part. You know, if you assume benevolence on their part, well then this is fantastic. This is them saying, hey, we wanna be more open about it; we're gonna be proactively sharing things and reaching out to the world as we take the lead on creating, you know, potentially AGI-level, hyperscale foundational models. So if you assume they're being benevolent in general and aren't trying to build models to take over the world for themselves, then this is exactly what you'd wanna hear. If you assume they're not being benevolent, or being malevolent, then you're not gonna believe anything they say. So I think this is the smart thing for them to do. I think it will garner more trust for them and will hopefully lead us to a better place, like they're describing.</p><p><strong>Ethan:</strong> Yeah. Anything that stood out to you, Conner, in terms of what they're at least talking about?</p><p><strong>Conner:</strong> Yeah, I kind of see it as the UN Security Council for AI. It's all these competing companies working on AI, just like it was competing countries working on atomic weapons. And in the same way that, for those countries, it protected the world from nuclear weapons and protected themselves individually from people saying they're not doing enough to protect the world, this does the exact same thing. It works in two parts: it's kind of smoke and mirrors, just making it seem like they're trying to do something, but good things will likely come out of it also. So I do think it is really a great move in the end.</p><p><strong>Farb:</strong> Yeah, the United Nations of corporations. The technofeudalist world I predicted is slowly becoming true. Finally. It's coming true. It's coming true.</p><p><strong>Ethan:</strong> Well, it's super cool. Our second story of today would have a ton of applications for AI and the future of computing and everything. This paper went viral, gosh, less than 24 hours ago, and this is LK-99. So this is
what's being proposed as a room-temperature, ambient-pressure superconductor. You know, this has been the kind of north star for anyone in the world of physics for a long time now: how do we make superconductors that have these properties? Superconductors have applications across computing and medical and power grids, and if this is true, this is an absolutely huge, huge accomplishment in the field of physics and will likely even win a Nobel Prize. So, really cool work. I know everyone's working right now to try to replicate these results. Farb, you've been super into this. How do you feel?</p><p><strong>Farb:</strong> Well, you know, to be clear, there are some holes that have already been found in the paper, some weird discrepancies in some of the numbers. But if you're an optimistic person, you should be super excited about this: you should wanna see it become real, and you should be jumping in to help figure out if it is real or not. That said, I've spent a lot of the past 12 hours digging into this, coming up with recipes to do it. I've got access to a sufficiently powerful vacuum through a friend's lab, and we're now trying to identify what type of tube furnace one would need to try to synthesize some of this. It's supposed to be doable. I think the whole world is abuzz with this right now, and that's one of the coolest things I've ever seen.</p><p><strong>Ethan:</strong> Yeah. Conner, what about you? I saw that Manifold Markets, this kind of prediction marketplace, puts it at 20%, I think, of being real within the next year. Anything you saw, Conner?</p><p><strong>Conner:</strong> I think it's honestly pretty likely. I think some of the holes, about how much energy and how much wattage you can actually put through it, are a pretty big limitation. But Alex Kaplan, from our friends over at Commandeer, put together a pretty great thread on it: this isn't like other superconductor papers that have come out in the past, which were very long-shot and very convoluted to set up and make, and which everyone kind of knew were fake. This is a very simple paper and a very simple process; it takes only a few days to make, with semi-available lab equipment. So we will see in like a week if people can replicate it. But honestly, I think it's likely. I think it's something, at least.</p><p><strong>Ethan:</strong> I think it's really amazing that everything to make something like this, if this is the path forward for these types of superconductors, could have been made with 1900s industrial equipment. So really cool to see stuff like that. But our third story of today is Text2Room, similar-ish to what we talked about yesterday with LLMs in 3D space. Text2Room is: hey, how can you take a text prompt and generate a 3D point space of a room? They take images, they use a 2D model in between, similar to yesterday: kind of take pictures and build this 3D point space of a room. So it's another really cool engineering way to tackle how we go from text to 3D.</p>
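<p>A minimal sketch of the image-to-3D lifting step Ethan describes, using the standard pinhole camera model: given an RGB image, a depth estimate, and camera intrinsics, every pixel back-projects to a colored 3D point. The resolution, intrinsics, and stand-in image and depth below are arbitrary placeholders, not values from the Text2Room paper.</p><pre><code>import numpy as np

H, W = 480, 640                 # image size (placeholder)
fx = fy = 500.0                 # focal lengths (placeholder intrinsics)
cx, cy = W / 2.0, H / 2.0       # principal point

rgb = np.random.rand(H, W, 3)   # stand-in for a generated RGB image
depth = np.full((H, W), 2.0)    # stand-in for a predicted depth map (metres)

u, v = np.meshgrid(np.arange(W), np.arange(H))
x = (u - cx) / fx * depth       # pinhole model: back-project each pixel
y = (v - cy) / fy * depth
points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)  # (H*W, 3) positions
colors = rgb.reshape(-1, 3)                               # matching RGB values

print(points.shape, colors.shape)  # a colored 3D point cloud, one view's worth
</code></pre><p>Repeating this for many generated views, and fusing the resulting point sets, is roughly how a text-to-room pipeline grows a full scene from 2D models.</p>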
<p><strong>Ethan:</strong> Conner, anything that stood out to you?</p><p><strong>Conner:</strong> Yeah, the output of this is kind of similar to NeRF, except the big difference here is that instead of using a neural radiance field, which is based on light and how everything in NeRF works, this is based on RGB. That makes it a lot more accessible, because of course all images that come from text-to-image models are RGB. So this seems more capable, it seems to work a lot better, and it's a big step forward in text-to-3D.</p><p><strong>Ethan:</strong> Very cool. Farb?</p><p><strong>Farb:</strong> You know, I think we talk a lot about how some of these papers almost seem like engineering pipeline solutions as opposed to fundamental discoveries, and I wanna clarify that that is not to downplay them in any way. What is critical, and what is spurring this sort of Cambrian explosion of AI outputs and tools and things that can actually do stuff, is the fact that the underlying ML calculations you're doing to achieve these results are now doable by much wider groups of people. (You guys are hearing an echo; I'm hearing a bit of an echo on my voice. Hopefully that's not coming through on the recording.) The computation required to do the underlying calculations behind some of these engineering feats has only recently become available to the breadth of people who are doing this stuff. So it really is this combination of incredibly powerful processors that are more affordable and more prevalent, doing the underlying machine learning and AI processing, so that someone can do some engineering to piece these things together and actually get a result that a small team can produce, without taking a decade to do so and without having access to building-size supercomputers. So this is actually what we wanna see: endless amounts of engineering solutions that take what can be done with processing and really do cool outputs with it.</p><p><strong>Ethan:</strong> I completely agree. Super cool work by them; it's definitely gonna help paint the picture of text-to-3D even more. As always though, what else are we seeing? I got to see this Bittensor language model, which is a 3-billion-parameter language model that's built to run on mobile and edge devices. You know, we've seen that Llama's gonna try to work with Qualcomm to get that working, but these are the future of language models: not every language model, and not every use case, is gonna need to run on a supercomputer in the cloud. So people are working on getting these running on-device, and there's a great application for it. I love stuff on edge computing, so I'm excited for that. What about y'all?</p><p><strong>Farb:</strong> I love that you've turned into Max Headroom. That was wonderful; that's the coolest part. I dunno, your audio started repeating itself... nobody knows Max Headroom. I've seen, um, one of my investors and friends, the incomparable Paris Hilton, posted a tweet asking about what car she should buy, and I replied to her with: don't buy a car, make your own car. Use AI to design a car, use 3D printing to print the exterior and the interior, and work with an EV platform like Rivian, for example, or a Tesla. Rivian has an EV platform that they're working on, and you could put it on top of that.
And so I used Midjourney to generate some cool Paris Hilton-style cars, and I used Runway ML to turn it into a little video. And then I used BER to sort of create this backwards-in-time inspiration of the original types of cars, the Bugattis, the types of vehicles that kind of inspired the final outfit. So it looks like the hot Batmobile.</p><p>I loved it. That's awesome. Yeah, it looks like a hot Batmobile. And I got a "that's hot" from Paris, which is what everyone wants from Paris.</p><p><strong>Conner:</strong> Beautiful. Yeah, I saw the GPU Song, um, by weird ai Ya Chip. It's kind of a rip on "We Didn't Start the Fire," called "GPU's Our Fire," and that was basically the whole song, just ripping on everything in the world nowadays. We'll link it below. Pretty funny.</p><p><strong>Ethan:</strong> That's beautiful. That's a beautiful song. Beautiful. Oh Dave, we'll see you again tomorrow.</p><p><strong>Farb:</strong> We will see you tomorrow.</p>]]></content:encoded></item><item><title><![CDATA[Frontier Model Forum | LK-99 | Text2Room]]></title><description><![CDATA[AI Daily | 7.26.23 [late]]]></description><link>https://www.aidailypod.com/p/frontier-model-forum-lk-99-text2room</link><guid isPermaLink="false">https://www.aidailypod.com/p/frontier-model-forum-lk-99-text2room</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Thu, 27 Jul 2023 17:30:08 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/135499932/c20c572ab2053a4fc3a2a8f59b077a66.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Welcome to another episode of AI Daily. This episode brings together three distinct stories - the inception of the Frontier Model Forum by OpenAI, the intriguing LK-99 ambient pressure superconductor research, and the innovative Text2Room that converts text prompts into 3D point spaces of rooms. The Frontier Model Forum underscores the need for collaboration in AI safety, functioning as a consortium of foundational AI model providers, aiming to lead the industry towards beneficial advancements. Next, we dive into LK-99, a potential game-changer for computing, with its potential applications across various fields, including AI - its authenticity is yet to be confirmed.
Lastly, we explore Text2Room, an impressive engineering solution that takes us from textual descriptions to 3D spatial representations.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; Frontier Model Forum</h4><ul><li><p>OpenAI initiates the Frontier Model Forum to foster industry collaboration for AI safety.</p></li><li><p>Serves as a consortium of foundational AI model providers.</p></li><li><p>It aims to instill more trust and potentially lobby for AI advancements.</p></li></ul><h4>2&#65039;&#8419; LK-99</h4><ul><li><p>LK-99 is proposed as a room temperature, ambient pressure superconductor.</p></li><li><p>Potential applications span across computing, medical, and power grids.</p></li><li><p>Its authenticity is currently under investigation.</p></li></ul><h4>3&#65039;&#8419; Text2Room</h4><ul><li><p>Text2Room converts text prompts into 3D point spaces of rooms.</p></li><li><p>Uses a 2D model to take images and build a 3D point space.</p></li><li><p>Represents a significant step forward in the field of text-to-3D.</p></li></ul><h2>&#128279;Episode Links:</h2><ul><li><p><a href="https://twitter.com/openai/status/1684145154628653056?s=46&amp;t=V1LTL_D7dUFN_28e3aXDdw">Frontier Model Forum</a></p></li><li><p><a href="https://twitter.com/alexkaplan0/status/1684044616528453633">LK-99</a></p></li><li><p><a href="https://twitter.com/mattniessner/status/1683856139622944778?s=46&amp;t=V1LTL_D7dUFN_28e3aXDdw">Text2Room</a></p></li><li><p><a href="https://opentensor.medium.com/introducing-bittensor-language-model-a-state-of-the-art-3b-parameter-model-for-mobile-and-edge-2fe916fb81b0">Bittensor Language Model</a></p></li><li><p><a href="https://twitter.com/farbood/status/1683578642113150976?s=46&amp;t=sTbeB89T07xhM3ob89LHRg">Farb's Tweet - Paris Hilton AI Car Creation</a></p></li><li><p><a href="https://www.youtube.com/watch?v=YGpnXANXGUg">The GPU Song</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[3D LLM | VIMA | FreeWilly1&2]]></title><description><![CDATA[AI Daily | 7.25.23]]></description><link>https://www.aidailypod.com/p/3d-llm-vima-freewilly1-and-2</link><guid isPermaLink="false">https://www.aidailypod.com/p/3d-llm-vima-freewilly1-and-2</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Wed, 26 Jul 2023 00:00:10 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/135451781/a6db6e32900a13497e18d4cc5833e08a.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Welcome to another fascinating episode of AIDaily, where your hosts, Farb, Ethan, and Conner, delve into the latest in the world of AI. In this episode, we cover <strong>3D LLM</strong>, a cutting-edge blend of large language models and 3D understanding, heralding a future where AI could navigate full spatial rooms in homes and robotics. We also discuss <strong>VIMA</strong>, a groundbreaking demonstration of how large language models and robot arms can synergistically work together, suggesting a transformative path for robotics with multimodal prompts. 
Lastly, we explore the implications of StabilityAI's recent launch of <strong>FreeWilly1</strong> and <strong>FreeWilly2</strong>, open-source AI models trained on GPT-4 output.</p><div><hr></div><h2>Quick Points:</h2><h4>1&#65039;&#8419; 3D LLM</h4><ul><li><p>A revolutionary mix of large language models and 3D understanding, enabling AI to navigate full spatial rooms effectively.</p></li><li><p>Potentially instrumental for smart homes, robotics, and other applications requiring spatial understanding.</p></li><li><p>Combines 3D point cloud data with 2D vision models for effective 3D scene interpretation.</p></li></ul><h4>2&#65039;&#8419; VIMA</h4><ul><li><p>A groundbreaking demonstration of robot arms working with large language models, expanding their capabilities.</p></li><li><p>Uses multimodal prompts (text, images, video frames) to mimic movements and tasks.</p></li><li><p>The model's potential real-world application is yet to be tested against various edge cases.</p></li></ul><h4>3&#65039;&#8419; FreeWilly1 &amp; FreeWilly2</h4><ul><li><p>Open-source AI models launched by StabilityAI, trained on GPT-4 output.</p></li><li><p>Demonstrates the capability of the Orca framework in producing efficient AI models.</p></li><li><p>The models are primarily available for research purposes, showing improvements over their predecessor, Llama.</p></li></ul><div><hr></div><h2>&#128279; Episode Links:</h2><ul><li><p><a href="https://twitter.com/_akhaliq/status/1683704549817868288?s=42&amp;t=sTbeB89T07xhM3ob89LHRg">3D LLM</a></p></li><li><p><a href="https://twitter.com/drjimfan/status/1683517085731913729?s=42&amp;t=ziEc9CMi8q_PlJ34DMJVkA">VIMA</a></p></li><li><p><a href="https://stability.ai/blog/freewilly-large-instruction-fine-tuned-models">FreeWilly1 &amp; FreeWilly2</a></p></li><li><p><a href="https://twitter.com/Suhail/status/1683642991490269185?s=20">GPU Crunch - Suhail Tweet</a></p></li><li><p><a href="https://twitter.com/emollick/status/1683875582977748992?s=46&amp;t=V1LTL_D7dUFN_28e3aXDdw">OpenAI Closes AI Detection Tool</a></p></li><li><p><a href="https://twitter.com/emollick/status/1682432553548914693?s=42&amp;t=sTbeB89T07xhM3ob89LHRg">AI and Psychiatry Paper</a></p></li></ul><div><hr></div><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[3D LLM | VIMA | FreeWilly1&2]]></title><description><![CDATA[AI Daily | 7.25.23]]></description><link>https://www.aidailypod.com/p/3d-llm-vima-freewilly1-and-2-086</link><guid isPermaLink="false">https://www.aidailypod.com/p/3d-llm-vima-freewilly1-and-2-086</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Wed, 26 Jul 2023 00:00:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c457f34-09fd-45ce-8a65-f6a04c067179_350x350.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to another fascinating episode of AIDaily, where your hosts, Farb, Ethan, and Conner, delve into 
the latest in the world of AI. In this episode, we cover <strong>3D LLM</strong>, a cutting-edge blend of large language models and 3D understanding, heralding a future where AI could navigate full spatial rooms in homes and robotics. We also discuss <strong>VIMA</strong>, a groundbreaking demonstration of how large language models and robot arms can synergistically work together, suggesting a transformative path for robotics with multimodal prompts. Lastly, we explore the implications of StabilityAI's recent launch of <strong>FreeWilly1</strong> and <strong>FreeWilly2</strong>, open-source AI models trained on GPT-4 output.</p><div><hr></div><h2>Quick Points:</h2><h4>1&#65039;&#8419; 3D LLM</h4><ul><li><p>A revolutionary mix of large language models and 3D understanding, enabling AI to navigate full spatial rooms effectively.</p></li><li><p>Potentially instrumental for smart homes, robotics, and other applications requiring spatial understanding.</p></li><li><p>Combines 3D point cloud data with 2D vision models for effective 3D scene interpretation.</p></li></ul><h4>2&#65039;&#8419; VIMA</h4><ul><li><p>A groundbreaking demonstration of robot arms working with large language models, expanding their capabilities.</p></li><li><p>Uses multimodal prompts (text, images, video frames) to mimic movements and tasks.</p></li><li><p>The model's potential real-world application is yet to be tested against various edge cases.</p></li></ul><h4>3&#65039;&#8419; FreeWilly1 &amp; FreeWilly2</h4><ul><li><p>Open-source AI models launched by StabilityAI, trained on GPT-4 output.</p></li><li><p>Demonstrates the capability of the Orca framework in producing efficient AI models.</p></li><li><p>The models are primarily available for research purposes, showing improvements over their predecessor, Llama.</p></li></ul><div><hr></div><h2>&#128279; Episode Links:</h2><ul><li><p><a href="https://twitter.com/_akhaliq/status/1683704549817868288?s=42&amp;t=sTbeB89T07xhM3ob89LHRg">3D LLM</a></p></li><li><p><a href="https://twitter.com/drjimfan/status/1683517085731913729?s=42&amp;t=ziEc9CMi8q_PlJ34DMJVkA">VIMA</a></p></li><li><p><a href="https://stability.ai/blog/freewilly-large-instruction-fine-tuned-models">FreeWilly1 &amp; FreeWilly2</a></p></li><li><p><a href="https://twitter.com/Suhail/status/1683642991490269185?s=20">GPU Crunch - Suhail Tweet</a></p></li><li><p><a href="https://twitter.com/emollick/status/1683875582977748992?s=46&amp;t=V1LTL_D7dUFN_28e3aXDdw">OpenAI Closes AI Detection Tool</a></p></li><li><p><a href="https://twitter.com/emollick/status/1682432553548914693?s=42&amp;t=sTbeB89T07xhM3ob89LHRg">AI and Psychiatry Paper</a></p></li></ul><div><hr></div><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul><div><hr></div><h3>Transcript:</h3><p><strong>Farb:</strong> Hello and welcome to the wonderful world of AIDaily. We're excited to be with you here. I'm your host Farb, and here with our other hosts, Ethan and Conner. Let's jump into today's first story. It's called 3D LLM. 
That gives you a little bit of a clue of what it's about: it's applying large language models and meshing them with the 3D world.</p><p>And, you know, what can we do? Large language models aren't necessarily made to understand the 3D world, and the 3D world hasn't been brought to the state of AI that LLMs are at. So how do we bring these two worlds closer together and make some magic with them? Conner, can you tell us a little bit about what this paper is showing off?</p><p><strong>Conner:</strong> Showing off, yeah. So of course, so far LLMs are great at understanding text, and nowadays models like GPT-4 are getting pretty good at understanding images. This takes that a step further: now you can plug an entire 3D scene - like an entire 3D NeRF scan of a room - into an LLM, and you can ask, hey, help me find the fridge, and it'll guide you around the room.</p><p>Or you can say, hey, how would I move from this point to over here, and it'll tell you how to do that. Or, hey, where is my suit, and it'll say it's in the wall. And it handles even very fine-tuned, specific details, like: I need to iron my suit - and it'd be like, okay, step one, the iron is right here next to the cabinet. So it's essentially giving LLMs the power to fully understand 3D scenes.</p><p>And I'm sure this is gonna be very helpful, very capable, in smart homes and robotics and really anywhere you need to connect AI to understanding full spatial rooms.</p><p><strong>Farb:</strong> Yeah, I think that's a great description. And you know, what they're proposing - that's going to happen.</p><p>We'll see if this specific approach is the approach that ends up making sense in your smart home; that remains to be seen. But what they're showing is that this is certainly possible, and it's not like they required a billion dollars and 500 people to pull it off. So if Apple applies its power and resources, you can understand that really cool things are gonna be possible. What did you get out of it, Ethan?</p><p><strong>Ethan:</strong> Yeah, I think that's spot on. It's a really cool accomplishment here, but it looks like kind of an engineering piece-together. What they're doing is taking this 3D point cloud and saying, okay, let's in essence take a lot of pictures of it and still use these kind of 2D vision models.</p><p>Right? And then at the end of the day you can say, hey, where's the suit? It finds the picture with the suit from the 2D angle and then positions it within 3D space. So, a cool engineering hack. And this might be the way people attack it: if the vision models - the foundation models - are built on just 2D images, you can still get a lot out of them and move to 3D in this way. But I think we are gonna see much bigger pure-3D data sets; that's kind of the bottleneck here to get these models done, and GPUs are a bottleneck as well. So: cool engineering hack, but I'm not sure if this is the future of the way 3D models will be handled.</p><p><strong>Farb:</strong> Yeah. Maybe not the final approach, but the demonstrations they did were absolutely sci-fi style. Really smart on their part to just show off the full Blade Runner vibe of what they were able to pull off here. Kudos and congrats to them, and thanks to them for sharing it.</p>
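<p>To make that engineering hack a little more concrete, here is a minimal sketch, in Python, of the render-2D-views, reason-in-2D, lift-back-to-3D loop Ethan describes. To be clear, this is an illustration, not the paper's actual pipeline: the vision-language model call is mocked out, the helper names are hypothetical, and the back-projection is a crude placeholder.</p><pre><code># Hypothetical sketch of the 2D-to-3D hack discussed above: orbit a camera
# around a room scan, ask a (mocked) vision-language model which rendered
# view contains the object, then lift the best match back into 3D.
import numpy as np


def render_views(n_views: int = 8):
    """Yield (view_id, yaw) for a virtual camera orbiting the scene."""
    for i in range(n_views):
        yield i, 2.0 * np.pi * i / n_views


def score_view_for_object(view_id: int, query: str) -> float:
    """Stand-in for a 2D VLM scoring 'is the {query} visible in this view?'.
    A real system would render the view to an image and call a VLM here."""
    rng = np.random.default_rng(view_id)  # deterministic mock score
    return float(rng.random())


def locate_object(points: np.ndarray, query: str) -> np.ndarray:
    """Pick the best-scoring view, then return a rough 3D position.
    Placeholder: averages the points facing the chosen camera instead of
    back-projecting a real 2D detection along the camera ray."""
    best_yaw, _ = max(
        ((yaw, score_view_for_object(vid, query)) for vid, yaw in render_views()),
        key=lambda pair: pair[1],
    )
    camera_dir = np.array([np.cos(best_yaw), np.sin(best_yaw), 0.0])
    visible = points[points @ camera_dir > 0]  # crude visibility filter
    return visible.mean(axis=0)


if __name__ == "__main__":
    room_scan = np.random.default_rng(0).normal(size=(1000, 3))  # toy point cloud
    print("object is roughly at:", locate_object(room_scan, "fridge"))
</code></pre><p>A real system would swap score_view_for_object for an actual VLM call on a rendered image, and would back-project the 2D detection along the camera ray rather than averaging visible points.</p>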
<p>Let's move on to our next super cool story.</p><p>This is another AI-meets-the-physical-world type of demonstration. It's called VIMA - or VEE-ma, I'm not sure how they've opted to pronounce it - and this is a really cool demonstration of how LLMs and robot arms can work together. And this was powerful. They've made the entire model open source, including the simulator that you can use to work with it.</p><p>And you can give it text and images and videos, or some mixture of them, and it does all sorts of cool, crazy stuff. Ethan, tell us some more.</p><p><strong>Ethan:</strong> Yeah, so they put in a ton of work to do this - not only on the simulator side, but building the whole data set, building a new benchmark. They put in an absolute ton of work.</p><p>I think we're seeing, as we've talked about before on the show, the progression of the whole robotics space. We've seen a robot responding to "hey, move to the left" - that's just a text prompt. Now we have these multimodal prompts, so they're adding a ton of new attention layers and saying: hey, you can follow the attention of an image, you can follow the attention of video frames themselves. So, "let me mimic you," right? Which is how babies learn. Let me show you a video of how this object is moved from left to right and put in the circle, and now the robot arm can do that. Let me look at this image, let me look at the text prompt, and let me follow the video frames to try and mimic this.</p><p>So I think this one's actually extremely groundbreaking. They're gonna present it at ICML here Thursday. Like I said, a ton of work put into it, and a really cool approach to actually giving robot arms more abilities.</p><p><strong>Farb:</strong> Yeah. Conner, what'd you take away?</p><p><strong>Conner:</strong> Yeah, technically how it works is very interesting.</p><p>They have VIMA - VisuoMotor Attention - and it's essentially an encoder-decoder transformer model where, as you said, Ethan, the input is all this multimodal stuff - text, images, even video frames - and out of that it can decode the entire movement of the robotic arm. And I think taking attention further, taking attention to do things like that, is very impressive.</p><p>And as you said, we'll link the videos below, but all their videos, all their simulations: very good. And I agree, it's pretty groundbreaking. So natural.</p><p><strong>Ethan:</strong> It feels like the right way to train these things. The data sets are what people have been struggling with for so long, but hey, we have videos of someone throwing a towel, we have videos of someone opening a fridge. I think this is really the pathway of how these things will learn.</p><p><strong>Farb:</strong> I wonder if they've tried it with a real robot arm. You know, the simulated world and the real world are, hmm, not very close to each other. So I wonder if they'd get the same level of performance.</p><p>I mean, it should fundamentally work similarly, but does it catch the edge cases of what the real world brings? Ultimately you'll have to do that in any model. Being able to pull things off in a simulated world sounds cool, but isn't really useful; dealing with the edge cases of the real world is.</p><p>Maybe an even bigger challenge than the challenges they've taken on.
But I think certainly this is a huge step in the direction of getting that done. You're not gonna solve it all at once.</p><p><strong>Conner:</strong> My understanding is they might actually be showing it on a robotic arm at ICML.</p><p><strong>Ethan:</strong> Cool. Yeah, in Hawaii on Thursday.</p><p>Send us a video, please. Yeah, definitely.</p><p><strong>Farb:</strong> Send us a video. They said come by the exhibit hall and say hi. So any AI Daily people at ICML: take some videos, go say hi to them, and send us some videos. We'll post it on the next episode, or whenever we get the video. Great, great story.</p><p>Super cool to see two big physical-world-meets-AI stories. And then our third story is about our friends over at StabilityAI launching FreeWilly1 and FreeWilly2. It seems like the only possibility, but it still seems crazy, that they basically did FreeWilly2, I don't know, like two days after Llama 2 dropped - they just switched out one LLM for another and ran everything all over again.</p><p>And one of the cool things they're showing here is that they're able to do this on a much smaller data set, which means faster, cheaper, and, as they mentioned, a lower carbon footprint. Conner, tell us some more about FreeWilly.</p><p><strong>Conner:</strong> Yeah, it's based on the Orca paper - we covered the Orca paper from Microsoft a bit back. Essentially, instead of training these models on the raw output of GPT-4, like Alpaca or Vicuna do, it trains on an entire chain-of-thought process from GPT-4. So out of Orca we saw OpenOrca from another team, we saw Dolphin from Eric Hartford, and now we see FreeWilly, which is the free, open-source version of it from StabilityAI.</p><p>It's not commercially available, because it is of course trained off GPT-4 output, so it's only open for research, but it seems to be very capable. It seems on par with the other open implementations of Orca. Well done, StabilityAI.</p><p><strong>Farb:</strong> Yeah. What's your read on this, Ethan?</p><p><strong>Ethan:</strong> Yeah, I'm glad to see more trainings and progress in the space.</p><p>You know, at the end of the day, FreeWilly1 was trained on the old Llama - it's not that good - and they're showing: hey, we took a new, better foundation model on the same data set and it's better. I'm glad people are training more things. I think it's interesting progress, and it shows the power of Orca as a framework, like Conner mentioned and we've talked about before. But I'm not sure who's gonna be using FreeWilly right now - though in the end, maybe you might.</p><p><strong>Farb:</strong> I was kinda wondering the same thing. It's easy to criticize people doing stuff here, so I don't wanna make it sound like I'm diminishing their efforts at all. But I couldn't quite get my head wrapped around sort of what the point of it all was. It's easy to say "research," and well, okay, if somebody uses it for research, then you've met your goal, and obviously someone's gonna use it.</p><p>I don't think it's gonna be zero people using it, but I couldn't quite figure out what my own angle of attack on grabbing this and putting it to use would be. Right? There's Llama 2 - what am I, why am I using FreeWilly? I dunno.
What do you guys think?</p><p><strong>Conner:</strong> I think they're just testing the whole instruction fine-tuning, everything, because they're apparently announcing another version of StableLM soon. So...</p><p><strong>Farb:</strong> Probably, yeah. Maybe it's just them sharing their work and sharing what they're doing as they go. And you know, everyone's gotta make noise in this space and get attention, and it's good to share what you're doing when it's out there.</p><p>You don't need people like us criticizing every single thing you do.</p><p><strong>Ethan:</strong> Absolutely.</p><p><strong>Farb:</strong> Yeah. Very cool.</p><p>Nice work, StabilityAI. Keep it up. Don't stop. We need you. What are y'all seeing out there, Ethan?</p><p><strong>Ethan:</strong> I've just been seeing more and more tweets - not to add to the firestorm - about this true, like, GPU crunch, right?</p><p>Suhail was talking about how you pretty much need $10 million if you wanna start getting on the list at Nvidia and actually start getting GPUs. People are over here predicting that in the next six months, if you want 128 H100s, it's probably just not gonna happen. So that's gonna be a bottleneck for actual startups getting access to some of these and getting new foundation models out there.</p><p>You know, we might see a little dark period here in the next three months, purely because of the logistics of GPUs. I don't know if that'd be my prediction, but it's always interesting to keep up with how GPUs are actually getting into people's hands. Right?</p><p><strong>Conner:</strong> I wouldn't be surprised.</p><p><strong>Farb:</strong> There's not gonna be enough GPUs - I don't know, there's no other way to put it. There's not gonna be enough of them. You need $10 million, or this many dollars, or that or the other, or buy them used - there's just not gonna be enough.</p><p><strong>Ethan:</strong> Exactly.</p><p><strong>Conner:</strong> I wouldn't be surprised if we see, like, a dark ages, like you said, but then out of that will probably come a renaissance where people start getting things to work on AMD and even, like, Intel CPUs and everything.</p><p>So I wouldn't be surprised about that.</p><p><strong>Farb:</strong> Even with the crunch, people are getting a lot done on A100s too.</p><p><strong>Ethan:</strong> Absolutely.</p><p><strong>Farb:</strong> Right. Cool. Conner, what are you seeing?</p><p><strong>Conner:</strong> Yeah, I saw OpenAI shut down their AI detection tool. They announced it back in February, essentially to detect text and images that they generate, and they're shutting it down now.</p><p>'Cause pretty clearly it probably didn't really work - 'cause as we know, it's kind of hard to detect AI-generated content.</p><p><strong>Farb:</strong> Yeah, it's hard to reliably detect it, you know? You can get lucky, but you don't really know if you detected something or just hallucinated in the right direction, as they say.</p><p>A broken clock is right twice a day, so yeah - that doesn't mean a whole lot for that approach to clock building. Um, I saw a cool paper about
psychiatry and AI. You know, I don't know if it was the most surprising or shocking information, but I think it underscores something important: the paper kind of shows that if you induce anxiety, you get a different response from an LLM than if you come at it neutrally or try to induce happiness from it.</p><p>You're gonna get different outputs, which I guess isn't really shocking or surprising. It is interesting that it seems to provide more anxious replies than humans do. But I think the more interesting thing to garner from this is that we're moving into a world where a lot of people may start just taking whatever the AI says as some sort of fact, some sort of absolute reality, when a lot of what's going on is that you're pushing it in the direction you want it to go.</p><p>So once people start accepting AIs as some sort of ground-level truth, you're gonna see everybody manipulating the results of the AI and being like: see, the AI said the world is gonna end, or the AI said the world is not going to end. You've gotta be careful - what you put in is, in some ways, what you get out. I don't know if you guys took a look at that paper or not.</p><p><strong>Conner:</strong> It's kind of another example of how much training data matters. Because yeah, your AI is probably more anxious than the average person, but that's probably just 'cause your average chronically online person is more anxious than the average person.</p><p><strong>Farb:</strong> Yeah. And it's designed to, you know, please, in a sense - to give you the output it thinks you're looking for. So it's going to try to do more of what you ask of it. And humans have a whole host of filtering going on in their heads. Even asking people - self-reported information is not real science, to be honest.</p><p>You can't net out all the things people are doing to filter in their heads to really understand if that's what somebody even really thinks, even if they respond by saying it's what they think. It's not an easy task to figure out.</p><p><strong>Ethan:</strong> It's pretty cool though, just thinking about how these models are anxious. Like, what a question to be asking ourselves, right? And they have data that kind of backs it up. Whether you think it's conscious or not, or anxious or not, it acts that way. So, pretty cool.</p><p><strong>Farb:</strong> Yeah, it behaves anxiously, and behavior is ultimately more relevant to the world we live in than just what's going on inside people's minds.</p><p>Exactly. Well, another exciting episode where we've solved all the world's problems. We thank you for joining us - especially the 30% of you, on average, who get this far into our show. We'll see you on the next episode of AI Daily.
Have a great day everybody.</p><p><strong>Ethan:</strong> See you guys.</p>]]></content:encoded></item><item><title><![CDATA[Maintaining Localized Image Variation | ScaleAI LLM Engine | SHOW-1 ]]></title><description><![CDATA[AI Daily | 7.19.23]]></description><link>https://www.aidailypod.com/p/maintaining-localized-image-variation-db2</link><guid isPermaLink="false">https://www.aidailypod.com/p/maintaining-localized-image-variation-db2</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Thu, 20 Jul 2023 03:34:13 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/135293333/081ec60f2121ce3a07be2c065676c5ec.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Welcome to AI Daily! In this episode, we dive into three extraordinary and useful stories. First up, Maintaining Localized Image Variation - the groundbreaking paper that unveils a new way to edit shape variations within text-to-image diffusion models. Next, ScaleAI LLM Engine - ScaleAI has open-sourced a game-changing package for fine-tuning, inference, and training language models. Last but not least, SHOW-1 - the solution to the "slot machine problem" in video generation, where randomness prevails.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; Maintaining Localized Image Variations</h4><ul><li><p>Discover groundbreaking paper on maintaining localized image variation in text-to-image diffusion models, enabling precise object editing.</p></li><li><p>A practical and intelligent engineering solution that offers CGI-level control without the labor-intensive process, making it highly useful.</p></li><li><p>Impressive implementation with a hugging face demo showcasing effective object preservation and image transformations for stunning results.</p></li></ul><h4>2&#65039;&#8419; ScaleAI LLM Engine</h4><ul><li><p>ScaleAI revolutionizes language model development by open-sourcing LLM Engine, allowing easy fine-tuning, inference, and training.</p></li><li><p>Their move showcases commitment to staying at the forefront of AI development and provides practical, useful tools for developers.</p></li><li><p>The open-source community benefits from ScaleAI's meaningful contribution, offering a powerful project that scales effortlessly with Kubernetes.</p></li></ul><h4>3&#65039;&#8419; SHOW-1</h4><ul><li><p>Introducing SHOW-1, a show runner agent that tackles the challenge of creating consistent animated shows using image and video models.</p></li><li><p>Aiming to solve the "slot machine problem," SHOW-1 combines prompt engineering and consistent frame sets to generate coherent and engaging video content.</p></li><li><p>Impressive engineering and clean outputs make SHOW-1 stand out, offering videos that resemble popular shows like South Park in appearance and sound. 
Ambitious and promising for future iterations.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p><a href="https://orpatashnik.github.io/local-prompt-mixing/">Maintaining Localized Image Variations</a></p></li><li><p><a href="https://github.com/scaleapi/llm-engine">ScaleAI LLM Engine</a></p></li><li><p><a href="https://fablestudio.github.io/showrunner-agents/">SHOW-1</a></p></li><li><p><a href="https://twitter.com/mattshumer_/status/1681507511222824961?s=42&amp;t=sTbeB89T07xhM3ob89LHRg">Perplexity AI Hosting Llama</a></p></li><li><p><a href="https://twitter.com/justlv/status/1681377298308820992?s=42&amp;t=sTbeB89T07xhM3ob89LHRg">Justin Alvey - Jailbroke Google Nest Mini</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Maintaining Localized Image Variation | ScaleAI LLM Engine | SHOW-1]]></title><description><![CDATA[AI Daily | 7.19.23]]></description><link>https://www.aidailypod.com/p/maintaining-localized-image-variation</link><guid isPermaLink="false">https://www.aidailypod.com/p/maintaining-localized-image-variation</guid><dc:creator><![CDATA[AI Daily]]></dc:creator><pubDate>Thu, 20 Jul 2023 03:33:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c457f34-09fd-45ce-8a65-f6a04c067179_350x350.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to AI Daily! In this episode, we dive into three extraordinary and useful stories. First up, Maintaining Localized Image Variation - the groundbreaking paper that unveils a new way to edit shape variations within text-to-image diffusion models. Next, ScaleAI LLM Engine - ScaleAI has open-sourced a game-changing package for fine-tuning, inference, and training language models. 
Last but not least, SHOW-1 - the solution to the "slot machine problem" in video generation, where randomness prevails.</p><h2>Quick Points</h2><h4>1&#65039;&#8419; Maintaining Localized Image Variations</h4><ul><li><p>Discover groundbreaking paper on maintaining localized image variation in text-to-image diffusion models, enabling precise object editing.</p></li><li><p>A practical and intelligent engineering solution that offers CGI-level control without the labor-intensive process, making it highly useful.</p></li><li><p>Impressive implementation with a Hugging Face demo showcasing effective object preservation and image transformations for stunning results.</p></li></ul><h4>2&#65039;&#8419; ScaleAI LLM Engine</h4><ul><li><p>ScaleAI revolutionizes language model development by open-sourcing LLM Engine, allowing easy fine-tuning, inference, and training.</p></li><li><p>Their move showcases commitment to staying at the forefront of AI development and provides practical, useful tools for developers.</p></li><li><p>The open-source community benefits from ScaleAI's meaningful contribution, offering a powerful project that scales effortlessly with Kubernetes.</p></li></ul><h4>3&#65039;&#8419; SHOW-1</h4><ul><li><p>Introducing SHOW-1, a show runner agent that tackles the challenge of creating consistent animated shows using image and video models.</p></li><li><p>Aiming to solve the "slot machine problem," SHOW-1 combines prompt engineering and consistent frame sets to generate coherent and engaging video content.</p></li><li><p>Impressive engineering and clean outputs make SHOW-1 stand out, offering videos that resemble popular shows like South Park in appearance and sound. Ambitious and promising for future iterations.</p></li></ul><h2>&#128279; Episode Links</h2><ul><li><p><a href="https://orpatashnik.github.io/local-prompt-mixing/">Maintaining Localized Image Variations</a></p></li><li><p><a href="https://github.com/scaleapi/llm-engine">ScaleAI LLM Engine</a></p></li><li><p><a href="https://fablestudio.github.io/showrunner-agents/">SHOW-1</a></p></li><li><p><a href="https://twitter.com/mattshumer_/status/1681507511222824961?s=42&amp;t=sTbeB89T07xhM3ob89LHRg">Perplexity AI Hosting Llama</a></p></li><li><p><a href="https://twitter.com/justlv/status/1681377298308820992?s=42&amp;t=sTbeB89T07xhM3ob89LHRg">Justin Alvey - Jailbroke Google Nest Mini</a></p></li></ul><h2>Connect With Us:</h2><p><a href="https://www.threads.net/@aidailypod">Follow</a> us on <strong>Threads</strong></p><p><a href="https://www.aidailypod.com/">Subscribe</a> to our <strong>Substack</strong></p><p>Follow us on<strong> Twitter:</strong></p><ul><li><p><a href="https://twitter.com/aidailypod">AI Daily</a></p></li><li><p><a href="https://twitter.com/farbood">Farb</a></p></li><li><p><a href="https://twitter.com/ejaldrich?s=20">Ethan</a></p></li><li><p><a href="https://twitter.com/semicognitive?s=20">Conner</a></p></li></ul><div><hr></div><h3>Transcript:</h3><p><strong>Ethan:</strong> Good morning. Welcome to AI Daily. We had a huge show for you yesterday, and we are continuing today with three amazing, impactful stories that are actually really useful. So our first one today is maintaining localized image variation. If you've used text-to-image models, you understand: hey, we're gonna generate, you know, a coffee table with a mug, right?</p><p>But then you want to edit that coffee table, and it's very difficult to do so - somewhat similar to Segment Anything picking out objects from within an image to edit.
This paper really gets to show that, hey, we can edit shape variations within these text-to-image diffusion models.</p><p>So they take a really interesting approach to it, and they have some really cool examples. Conner, you wanna dive more into what they're doing?</p><p><strong>Conner:</strong> It's very hard to accurately regenerate these images and just change one part, or keep one part the same while the rest of the image changes.</p><p>They've given some pretty good examples. Let's say you have a dog sitting on a chair, and you really like the dog but you don't really like the chair anymore. It was hard, before this paper came out, to be able to vary the chair into a bed, into a raft in the ocean, into anything else, while keeping that dog looking the same. But they successfully did that in this paper, Localizing Object-Level Shape Variations, and they did it by combining a couple of techniques. The first was a mix-and-match prompt mixing, where they figured out they could split up the denoising of Stable Diffusion into stage one being layout, stage two being shapes, and stage three being details. So they were able to keep stage one and stage three - the layout and the details, like the dog on the chair - but for the actual shape of the chair, they change what part of the prompt it's looking at. And they did something similar with self-attention: they can preserve the object in the image by injecting the self-attention of just that object, and let the rest of the image generate with new self-attention, which lets them do these kinds of really beautiful variations.</p><p>It's a very interesting, very well done paper that was architected very well.</p><p><strong>Ethan:</strong> Farb, you mess with text-to-image models a lot. How useful is this? Would you use this a lot? If someone embedded this in their actual tool, is this useful?</p><p><strong>Farb:</strong> I think that's kind of the point of this - it is useful.</p><p>Everything that they're doing, you could do with CGI if you wanted to. The problem with CGI is that it's incredibly labor-intensive. This is another great example of a very intelligent engineering solution: building a pipeline that actually gets you to an output, in a way that is a compromise between the endless possibilities of doing CGI and the maybe easier, but less effective, way of using a GAN, for example, to do this - where it's going to be a little bit more difficult to control the actual output of what you're getting, right? So they found this very practical compromise that sits between those two approaches. A lot easier than doing CGI, but giving you almost CGI-level control over the output. I found it really impressive.</p><p><strong>Ethan:</strong> Yeah, it's super cool.</p><p><strong>Conner:</strong> I was gonna add, they have a Hugging Face demo available. You can go in, upload an image, and say: keep the oranges the same, change the rest of the image.</p><p><strong>Ethan:</strong> Yeah, it's one of the most effective ones I've seen - more effective than inpainting, et cetera - 'cause it actually maintains object state. So, really cool implementation. Our second one is ScaleAI LLM Engine.
So they've pretty much open-sourced a way for you to fine-tune, run inference on, and train a lot of these language models. We've seen a bunch of companies provide wrappers to make it easier for customers to do this, make it easier for enterprise to do it, and now we have an amazing open-source package for this that works with Falcon, that works with Llama, that works with MPT. Actually pretty huge for anyone who wants to fine-tune these models: you've just cut out two or three weeks, even a month, of work, and you've removed the need for an external provider.</p><p>So really cool work here. Farb, did you get to dive into it? What does stuff like this mean for people developing, right?</p><p><strong>Farb:</strong> I mean, this is a great move by Scale. They are showing that they are able to join this rapid pace of development - they're not falling behind. They're trying to stay at the tip of the spear of the mind share of the folks that are doing big things in AI. Practically speaking, this is very useful for people; they're not just throwing out stuff that might seem impressive but isn't actually useful for anybody.</p><p>And I found that pretty impressive. Hopefully my landscaper is not blowing out my audio here. Um, yeah, I found it super impressive. I think people are gonna start using it - we'll see if people start using it or not, and if so, like I said, I wouldn't be surprised. We're gonna see a lot more from Scale. Conner?</p><p><strong>Ethan:</strong> We've manually built a lot of these things for months on end, trying to get things set up, et cetera. When open source comes out, you get everyone rallied around one way to do it, and you accelerate a lot of the pace. How do you think this compares to something like Mosaic's Foundry? Did you dive into the code? Would you use something like what Scale released?</p><p><strong>Conner:</strong> Yeah, I would definitely use it. How exactly it compares to what Mosaic offers or what Hugging Face offers - they have very similar capabilities - is up to people to explore, and the open-source community will definitely delve into that more.</p><p>But as always, having many different projects all working on the same thing in different ways helps accelerate all the projects at the same time - the same thing we're seeing with LMQL, LangChain, and Guidance all trying to tackle the same problems in different ways. Fine-tuning and inference and training open-source projects will be able to accelerate in the exact same way. And this is completely open source, Apache 2.0 license, just like the rest of 'em. So I think it's just another great addition to the community, really.</p><p><strong>Ethan:</strong> It's good.</p><p><strong>Farb:</strong> A powerhouse, and it's good to see them contributing to open source in really meaningful ways. Not light little contributions here and there, the occasional paper drop - this is meaningful stuff from a meaningful company, and I'd say it does a lot to show their importance in the industry.</p><p><strong>Ethan:</strong> Yeah, it's good when a big company drops it, 'cause you can all coordinate around the same one and you don't have devs rebuilding fine-tuning infrastructure every single month.</p><p><strong>Conner:</strong> Yeah.
The problem is, a lot of those other smaller projects that aren't the main ones will release something kind of cool and kind of useful, but they don't really work at scale. Scale literally has produced a very capable project that you can easily host yourself with Kubernetes, et cetera. Yes.</p><p><strong>Ethan:</strong> Absolutely. Let's move on to our third story, which is SHOW-1. So SHOW-1 is a showrunner agent. How can you make a show that is consistent, right? How can you make a new animated show? How do we use these image models and video models to make something that's consistent? A lot of these models right now have what they call the slot machine problem: you're just like, well, that's a random video, that's a random video, and you just continue to go and go until you hopefully like the video.</p><p>So the engineering work around this says: hey, a lot of people have put work into prompt engineering to generate something of a story that makes sense, a lot of people have put work into making a consistent frame set, and they had a cool engineering solution that seems to piece it all together. Conner, was there any main innovation you saw out of this, or just a really nice, clean structure of what video making could look like?</p><p><strong>Conner:</strong> I think honestly it was just a very clean structure, and especially very clean outputs of what you can actually get. Most demos we've seen so far are very research-intensive and have great possibility, but in the end aren't something you would actually watch. They put videos on their site, they put videos on Twitter, that look like real South Park episodes.</p><p>You watch them and I'm like: this is maybe not the quality of the script or dialogue or audio of a South Park episode or whatever, but it looks like one, and it sounds like one, to someone who hasn't seen the show before. So they engineered it very well.</p><p><strong>Ethan:</strong> Yeah, it just seemed to be pieced together. Interesting. Farb, was there anything that stood out to you, where you're like, wow, I haven't really seen someone do this - or maybe something interesting in the paper - or is it just really kind of a clean way to stitch everything?</p><p><strong>Farb:</strong> It's incredibly ambitious. You're not gonna get a zero-shot television show today - it's probably a while before that. I applaud them for their ambition, for what they're going for here, which is pretty comprehensive output. So if this is their first version of it, then I'm pretty bullish on what their subsequent versions are gonna be like. Kudos to them.</p><p><strong>Ethan:</strong> Keep it up. Absolutely. Well, as always, what else are y'all seeing? Farb?</p><p><strong>Farb:</strong> I found out that you can access GPT-4 32k on Poe, and that's pretty flipping cool. You know, you gotta be a subscriber, you gotta pay. But the fact that you can get your hands on it - I feel like nobody's really talking about this, and most people, I think, don't even know you can use GPT-4 32k anywhere right now. But it seems like if you get yourself on Poe, you'll have it right in the iPhone app.</p><p><strong>Conner:</strong> It's pretty awesome. Is that new?
Is this like, today you're the first user of it, or what?</p><p><strong>Farb:</strong> You know, until you start using it, it's kind of buried under the "More" section of models, right? So I don't know when it hit, to be honest. But I think once you start using it, it comes up into the top of your experience. I don't remember hearing anything about it, though.</p><p><strong>Ethan:</strong> Wow. Well, if OpenAI didn't give it to you in the Playground, you better hurry before rate limits hit you. But super cool. Yeah, Conner?</p><p><strong>Conner:</strong> Yeah, I saw Perplexity - of course, we're always big fans of Perplexity, their whole Copilot, search, everything. A very well built platform. But yeah, they made a demo with Llama 7B chat and it's very fast - like, as soon as you hit enter, boom, it spits out an entire massive paragraph.</p><p>But as part of that, I did notice Llama - maybe it's just the fine-tuning of the chat version, maybe it's just this implementation by Perplexity - but Llama, like, rambles on a lot, and even if you tell it to talk less, it'll continue doing that. So it's interesting.</p><p><strong>Ethan:</strong> Super cool. Yeah, I saw a really cool tweet from Justin Alvey. He jailbroke a Google Nest Mini and was pretty much able to connect it to the cloud - but to their own LLM - and put a new voice on it. And it kind of paints you the picture: you have these big companies shipping Siri, you have this big company shipping Google Assistant, and they're all so monotone, they're all the same.</p><p>What does the world look like if people have differentiated versions? So, a really cool hack he put together. I think he replaced the whole PCB as well.</p><p><strong>Farb:</strong> I was gonna say, it's a little bit more than a jailbreak there. Exactly - replaced the PCB.</p><p><strong>Ethan:</strong> Yeah. He put some real work into it and got a well-deserved viral tweet out of it.</p><p>So super cool to him. We'll link it below - check it out. But as always, thank you guys for tuning into AI Daily, and we will see you again tomorrow.</p>]]></content:encoded></item></channel></rss>