In this episode of AI Daily, hosts Farb, Ethan, and Conner delve into three big stories in the world of AI. First, discover the ripple effects of knowledge editing in language models, a benchmark of 5,000 facts highlighting challenges in current LLM editing, and an innovative in-context editing method. Next, we bring you updates on LK-99, a claimed room-temperature superconductor that may revolutionize the field. Learn about simulation findings and the potential end of Wakanda's unobtainium monopoly. Lastly, we explore how AI is impacting the field of radiology. Uncover whether radiologists are more effective with an AI copilot or working independently, and the role of UX in AI adoption.
1️⃣ LLM Editing
Adding or changing a single fact can cause a cascade of changes in an LLM's understanding.
Benchmark of 5,000 facts reveals current LLM editing methods struggle with ripple effects.
Innovative in-context editing method shows promising results.
2️⃣ LK-99 Updates
LK-99 superconductor shows potential with simulated copper bands for energy transfer.
Exciting news shifts markets as room temperature superconductivity gains traction.
Future engineering may lead to increased bands for practical superconducting applications.
3️⃣ AI Radiology Study
Combining AI and human expertise in radiology yields suboptimal results.
UX plays a vital role in AI adoption for medical applications.
Future implications suggest AI or human-only approaches may be more effective.
Farb: Hello, good morning, and welcome to another episode of AI Daily. We've got a few interesting stories here. As always, I'm Farb, joined by my co-hosts, Ethan and Conner. Let's jump into our first story: evaluating the ripple effects of knowledge editing in language models. In this paper, folks are trying to understand the effects of changing individual facts in, or adding facts to, an LLM, and what ripple effects that causes.
The example they give is, you know, if you say that Jack Depp is the son of Johnny Depp, well, then that implies that there are siblings of Jack as well, so you have to update your understanding of the siblings if you're just adding a fact about Jack Depp being the son of Johnny Depp.
Conner, tell us a little bit more about this paper and what you thought about it.
Conner: It's a very interesting issue they pointed out with the problems of editing LLM knowledge bases, one that no one's really talked about before. As you said, you can pretty easily nowadays edit a single fact in an LLM, but pretty much every fact has a ripple effect.
Another example they gave was, if you're updating the model to say that the Eiffel Tower is in Paris and not London, then the model also has to understand the change in the time zone that the Eiffel Tower is in and the change in the country that the Eiffel Tower is in. There are a lot of ripple effects from essentially every fact, which means it's not just a single fact you have to change, but a large swath of knowledge in the LLM.
So because of that, they wrote it up in this paper to describe the problem in more depth, and they put together a benchmark of 5,000 facts and 5,000 examples of different kinds of ripple effects. That is a benchmark for LLM editing, and they tested most of the ways that LLMs are edited nowadays and found that they don't really handle ripple effects that well.
And they ended the paper with basically just an in-context editing method where, at the beginning of your prompt, you say the Eiffel Tower is in Paris and not London. Then the model knows that, and whatever you prompt next, it uses that context. And they found that that, of course, blows any other LLM editing method out of the water.
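The in-context editing idea Conner describes can be sketched in a few lines of Python. This is a minimal illustration only, not the paper's implementation: the function name `build_edited_prompt` and the "Fact:" / "Question:" / "Answer:" layout are assumptions made up for this example, and the resulting string would still need to be sent to whatever LLM you use.

```python
def build_edited_prompt(edits, question):
    """Prepend edited facts to a question so the model reads them as context.

    Hypothetical helper for illustration; the prompt layout is an assumption,
    not the format used in the paper.
    """
    # One line per edited fact, placed before the question.
    header = "\n".join(f"Fact: {fact}" for fact in edits)
    return f"{header}\n\nQuestion: {question}\nAnswer:"


# The edit from the episode, followed by a question that depends on it
# (a "ripple effect" question about the country rather than the city).
edits = ["The Eiffel Tower is in Paris, not London."]
prompt = build_edited_prompt(edits, "Which country is the Eiffel Tower in?")
print(prompt)
```

Because the edited fact sits in the prompt rather than in the model's weights, follow-up questions about the time zone or country can draw on it directly, which is roughly why this approach handles ripple effects better than weight-editing methods in the discussion above.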
Farb: Yeah, that's pretty interesting. Ethan, do you think this has, you know, got some deep impact on how we're gonna be able to use LLMs in the future? Is this a fundamental flaw, or is this just, you know, a little trick for a paper?
Ethan: Yeah, I think half trick for a paper, half engineering hack. You know, some people use LLMs in combination with a search tool or a vector database to set up a separate fact database so they can reduce these hallucinations. And some people are trying to say, hey, we want all the facts embedded in the LLM, and we're gonna keep editing the LLM. And, oh no, a big problem with editing it is these ripple effects.
So it's kind of a common-sense problem that's actually a fairly big problem for LLMs. There are a lot of symbolic representations you have to edit just for one new fact. You know, if you think of a new fact you learn, it updates a lot of your priors about the world. So for the people who use LLMs in this way, which is, you know, most of ChatGPT and most of the large language models right now, it is a problem, and I like that they have this new benchmark. So we'll see if people continue to use them this way or if separate fact databases become popular.
Conner: But yeah, this is pretty interesting for adversarial editing also, because the ripple effects also apply if you're trying to teach the LLM a lie, if you're trying to change the knowledge from a fact to a lie.
So for attackers, or cases like the UAE and how they've edited their model Falcon to talk about the UAE better, if you wanna change what an LLM thinks in a false way, this applies in the same way. It's very interesting for that also.
Farb: Yeah. Well, if LLMs are approximating being humans, then fact checkers are probably something that we're gonna need for some time ahead.
They are just trained on things humans have done, so, Community Notes for LLMs. Yeah, the real reinforcement learning. The last thing I'll say about it is that I thought it was interesting that they noted that changing facts about popular entities actually caused the most problems, which is somewhat understandable, as the tentacles of something popular are probably reaching farther into the LLM and requiring more things to be updated.
Yeah. Onto our next story. That was a nice primary source story for you all. Moving on to our next story, let's get back into the LK-99 News. We are so back. We were so gone. We are so back. Nobody knows what's going on. Nobody knows what's real anymore. That's what we're all here for. Some exciting new news.
It seems that, you know, there was a supercomputer simulation saying that this is possible, and there are some stories of people starting to replicate it: one in China, and I think maybe there's one in Romania. I think the folks down the street at Varda are also still working on it. This is super exciting.
Room-temperature superconductor LK-99: it's not dead yet. The weight seems to be trending towards it being somewhat real, as opposed to everyone just coming at it over and over saying this is bull. Ethan, what's your take? What are you thinking?
Ethan: I think it's pretty exciting. We saw Manifold Markets jump to like 55% after these two stories. So of course the first story was that big quantum chemistry supercomputer simulation over at the DOE. They got to simulate what LK-99 is actually doing, and they were able to find that, hey, there are some copper bands within LK-99 that move energy at the Fermi level.
So they're pretty much showing that there are some bands within this material that show superconductivity. Which is pretty cool, and it also explains how the current that you can put through LK-99 is not super high, because right now it's only a couple of these bands. But it's a huge discovery. At the end of the day, it could mean we have another five, 10 years of engineering to actually get these materials, increase the number of bands, and make it a usable superconductor.
But end of the day, they've simulated that, hey, there are two bands in here that exhibit these properties, so that's super cool. And then we got to see the news out of China, where there was another, you know, Meissner effect floating rock in a kind of test case in China. So, two big pieces of news. It kind of completely shifted the market.
It's got people excited again. I think it's showing we have something real here. How long it's gonna take to put into use, I don't know, but it's real.
Conner: I was gonna say it's pretty bullish that the original hypothesis of the paper was that it was the copper atoms percolating into the lead crystal that made it superconductive.
And then the DOE simulation says that that does actually make superconductivity possible. It's very bullish that they're right about that.
Farb: I love that. There are so many great parts of this story. Like, the whole quartz tube thing may have just been some complete accident. You know, the famous story of Thomas Edison is that he figured out 5,000 ways to not make a light bulb.
Which is to say, a lot of times fundamental scientific progress just means trial and error until you get lucky, and you can't actually discount that as a major part of what's moved science forward over the years, and possibly part of what's happening here.
And I guess the folks in Wakanda will potentially be losing their monopoly on unobtainium if this ends up being true. Sorry, Wakanda. Great. Anything else to add to this story?
Conner: I saw something pretty interesting: a big problem with this, especially because of the whole crystallization of copper into lead, is how the crystals form in Earth's gravity. So Delian and Varda, they were tweeting about how, of course, space doesn't have that problem. So, very interesting. Superconductors in space. Exactly.
Farb: Superconductors in space. I love it.
Farb: Yeah, very powerful. All right, let's move on to our third story, another primary source story, which we love over here, about radiology, radiologists, and AI. And I think this is interesting because it points to some real-world applications. Basically, the paper's asking: is it better to have an AI be your radiologist, is it better to have a radiologist be your radiologist, or is it better to have them work together?
You know, Elon commented on this tweet with an exclamation point, because the paper seemed to find that when the AI and the radiologists worked together, it produced kind of the worst results.
And they're pointing to a few different things in there, which we'll get into. What did you think about this, Conner? What were your takeaways?
Conner: Yeah, as you said, it's very interesting that having the human in there at all kind of makes for the worst result. So this kind of points to a future where it won't be the human having the AI as a copilot.
It'll be either the AI with the human as a copilot or just the AI. Bullish on AI, really. Sorry, humans.
Farb: The questionable robot in the group is, uh, talking trash about humans. Unsurprisingly. Ethan, what's your read?
Ethan: Yeah, it's like, no matter how much progress we have with AI, if you can't get people to use it, you're not gonna see its fast application in medicine, et cetera.
So it kind of got me thinking about the UX of AI, right? People are into explainability of these models, so how do you get them to explain it to people so they feel comfortable using the outputs? How do you get it to psyop people so they use the copilot more, right?
Is the LLM gonna have to use explainability and kind of almost psyop them into using some of these outputs and models? So it's just the UX around how AI affects people. How do we get it into, you know, real industry's hands? I think about lawyers too. It's similar to medical in the fact that there are a lot of these tools out for lawyers now, but not all of them are using them.
They're not too into it. It's still such a slow-moving industry. So how do we fix some of those problems, or are we just gonna have to see the complete autopilot of them? Are copilots on their way out? I think I'm with Conner on the fact that I've never been a big copilot person.
Farb: Yeah. You know, I think there are some big implications here, and we're probably gonna see some changes in the end.
It always just kind of comes down to who wants to underwrite what. You know, are you gonna underwrite the doctor? Are you gonna underwrite the AI? Are you going to underwrite both and be like, hey, you know what, this is what our AI is saying to do and this is what the doctor's saying to do. We can go in either direction for your treatment.
Just know that either way, you can't sue us. We don't care; you're going to have to pick one or the other. And maybe it's left up to the person. I mean, kind of what they showed is that the radiologists favored their own interpretation over the AI's and sort of thought that the AI came to a conclusion somehow independently of their conclusion, even though it was based off of the same information.
So, you know, clearly it wasn't just making up its own stuff. They were both informed the same way. So which is more likely to get underwritten, the doctor or the AI results? I wouldn't be surprised if it was the AI in the long run, and maybe we'll have this weird transition period where, like I said, you'll get both options presented to you and it's like, okay, what do you wanna do for your treatment?
The AI treatment or the doctor's treatment? They're not agreeing here.
Conner: Patients, of course, often get second or third opinions from their primary care doctors, so this could be the same thing. You have your primary care doctor and then you have a second opinion from a global AI that has a lot more information than your doctor, but just isn't your personal doctor.
Ethan: Yeah. We'll be ineffective babysitters of AI until we can blame the AI.
Farb: Yeah, until you can underwrite the AI, you'll still have to underwrite the person. All right, let's jump into the "what we're seeing" portion of our fine show here. Ethan, what are you seeing out there?
Ethan: Yeah. Neon, a fantastic serverless Postgres database.
I saw that they raised 46 million in their Series B, so congrats to them. We've used them before. Really fantastic product. And, you know, they're also catching onto the AI wave with pgvector. So instead of using a full-fledged vector database, you can use pgvector within Postgres. And I think we're seeing a lot of application developers and companies use it, so they're really latching onto that wave, and I think it's helping improve their product and find them new customers that might not wanna switch their whole database. So yeah, check 'em out. We like them.
Farb: Big fan. Nice. Conner, what are you seeing?
Conner: LangChain, of course. They raised 7 million a while back to productize the LangChain framework. LangChain, of course, is just a framework around AI for prompting and chaining and all that. They raised 7 million to build products.
A while back, we talked about LangSmith, which they announced a bit ago. And yeah, I've been playing around with LangSmith for the past week. Pretty good. I'd recommend trying it out. It gives you a lot of logs and observability into how you use LangChain or really any other framework. Very helpful, I think, for just monitoring how you use OpenAI and how your prompts output. If you're not saving that in your own database, you should be saving it somewhere.
Farb: I saw that OpenAI filed a trademark for GPT-5. We don't know what that means. We don't know if that means it's coming or they're just getting way, way ahead of it. But I thought that was kind of interesting, and it included some, you know, audio- and language-related stuff.
So it didn't seem like it was just text-related. It had some audio portions to it. And also an interesting, not a paper, basically an article, a rundown of the state of supply and demand in the GPU world, heavily based around NVIDIA's H100 GPUs.
And not surprisingly, they're finding that it's tough to get them, especially the 8x clusters, and people are fighting to get them. There's hopefully gonna be some more supply coming up here soon. The article was interesting, though. It was pretty detailed. They talked about different OEMs that you can try and work with, and if you're trying to get your hands on some GPUs, I highly recommend reading it.
Conner: Yeah. They really dig into the depths of the actual materials needed to make GPUs: the substrate, the silicon, the rare earth metals. A lot of interesting details that you don't think about day-to-day when using them, of course, but it is important for seeing the bigger picture of when more GPUs will be available, et cetera.
Ethan: Which it looks like 2025 based on their estimates and some other great market analysis. Another year and a half of backlog.
Farb: Alright, well, thanks for joining us here for another exciting episode of AI Daily. We'll be seeing you tomorrow, probably. Have a great day, everybody.