Welcome back to AI Daily! We're thrilled to have you join us for another captivating episode packed with exciting news and developments in the AI space. In this episode, we'll dive into three major AI acquisitions, impressive text-to-video advancements, and the expanded access to Inflection's powerful LLM.
1️⃣ AI Acquisitions
Three major acquisitions in the AI space: MosaicML, Cohere.io, and Mode Analytics.
The acquisitions reflect growing interest in an AI industry that is heating up.
The focus is on data analysis and foundational LLMs, with acquirers valuing both talent and AI platforms.
Zeroscope introduces two text-to-video models: one that generates a lower-resolution clip and another that upscales it.
The models show promise, trained on 10,000 clips and 30,000 tagged frames at 24 frames per second.
The current limitation is short clips (3-5 seconds) due to training constraints, but progress is being made towards longer videos.
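The training constraint behind those short clips comes down to simple frame arithmetic. A minimal sketch (the quadratic-attention scaling below is an illustrative assumption for temporal attention, not a claim from the episode):

```python
# At 24 fps, every extra second of video adds 24 frames that must be
# processed together during training.

FPS = 24  # frame rate the Zeroscope clips were tagged at


def frames_per_clip(seconds: float, fps: int = FPS) -> int:
    """Number of frames a clip of the given length contains."""
    return int(seconds * fps)


def relative_attention_cost(seconds: float, baseline_seconds: float = 3.0) -> float:
    """If temporal attention scales quadratically in frame count, the
    cost of a clip relative to a baseline-length clip."""
    return (frames_per_clip(seconds) / frames_per_clip(baseline_seconds)) ** 2


# A 3-second clip is 72 frames; a 60-second clip is 1,440 frames,
# i.e. roughly 400x the quadratic attention cost of the 3-second baseline.
print(frames_per_clip(3), frames_per_clip(60))
print(relative_attention_cost(60))
```

Under this (hypothetical) scaling, even a one-minute clip is hundreds of times more expensive per training example than the 3-to-5-second clips these models are actually trained on.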
Inflection announces broader access to their LLM and an upcoming API release.
The model shows promising performance in benchmarks, particularly in academic work and trivia questions.
Training on H100s is noted as faster and more efficient, highlighting the significance of the hardware in model development.
Episode Links 🔗
Follow us on Twitter:
Subscribe to our Substack:
Farb: Good morning and welcome to AI Daily. We hope you survived the three-day break from us. We surely missed you all, but thanks for joining us again for another exciting episode, where we'll talk about several acquisitions in the AI space, some pretty impressive text-to-video, and another big LLM being opened up a little wider than before by the folks over at Inflection.
Let's jump into it. We have three acquisitions here: MosaicML being acquired, Cohere.io being acquired, and Mode Analytics being acquired. Ethan, did you get a chance to look into these three acquisitions, and how much of the acquisition amounts will I be receiving?
Ethan: You might not be receiving any this time, but you know, I think this just speaks to the fact that the acquisition space is heating up for AI companies.
So you have all these big companies. Databricks was the first one, buying Mosaic for $1.3 billion. Then, of course, you had Cohere.io being bought by Ramp, and you had ThoughtSpot being bought by Mode Analytics. All of these were upwards of a hundred million dollars, $200 million, a billion dollars, and most of these companies were founded within the past two years with fewer than 50 or 60 employees.

Um, so I think it's a big time in the space. Every big company is thinking about this. You've seen the massive cloud giants, Google, and Meta partnering with the OpenAIs and Stability AIs of the world, and now you have all the other players starting to make acquisitions. You know, these numbers speak for themselves.

These companies see the value in not only the talent, but also the platforms being built around AI, and I think we're just going to see more of this.

Farb: Cool.
Conner: Yeah, but I believe it was ThoughtSpot acquiring Mode Analytics. Again, of course, these are all very data- and AI-analytics-focused companies.

That's really the common theme we're seeing in who's being acquired nowadays.

Farb: What do you mean by that?

Conner: Of course, Databricks is a data warehouse, lakehouse, whatever they call it, and they're acquiring Mosaic so that people's data can more easily be used to train with Mosaic. Cohere.io is a customer data analytics company, and Mode Analytics is a business intelligence analytics company.
Farb: So this, do you think this is less focused around foundational LLMs and more focused around data analysis or the intersection of the two?
Conner: It's definitely intersection. Mosaic is the only one that's clearly building foundational LLMs. Um, but all these are analyzing data and have lots of data because of that.
Farb: So, perfect for foundational LLMs. These seem like some big acquisitions for this early in the game, in terms of this new epoch that I think we can say kicked off with GPT-3.5 last year. Pretty impressive. I doubt we'll see this slowing down; it's probably picking up pace, and hopefully I'll get my cut of some of the future acquisitions.

I don't know why I got left out of this round. All right, moving on to our next story: some very impressive text-to-video from Zeroscope. It looks like they have two models here, one that generates a lower-resolution version and a second that upscales it. They recommend using the XL model to upscale the output of the first model.
Conner, did you get a chance to check it out? What are your thoughts?

Conner: Looks pretty interesting. It was trained using 10,000 clips and 30,000 individual frames tagged at 24 frames per second. The clips look pretty good, some of the simpler ones like rain falling down, water on a window, or even zooming into a moving city.
Even some slightly more complicated ones, like a ballerina dancing. Pretty interesting models. Of course, it's very early; you wouldn't want to use this for real work yet. This isn't anywhere near a Midjourney or even Stable Diffusion.

Farb: Is it actually available?

Conner: Uh, yes, it is open source, on Hugging Face. Very cool. But the quality, of course, is not there yet, and the clips are pretty short too, only about three or four seconds each. That seems to be the big bottleneck.
Farb: You know, the bottleneck right now is the length of video. Ethan, do you think that's just a matter of, hey, if we let you do an hour-long video, then you'll consume our processing for the next two weeks and nobody else will be able to do anything? Or is it just not possible to do? Why is everything three to five seconds?
Ethan: It's partially that, but it's also how these models are trained: they're trained on three-to-five-second clips paired with captions. You can imagine training on an hour-long video, if that's even possible.

It would require a lot of GPU power. So we're seeing short, lower-quality clips at low frame rates right now. But gosh, compare this to maybe the beginning of the year, or even the end of last year, when we saw Meta's very first one-second moving GIF and we were like, wow, video is coming. Now we're seeing a three-to-five-second, pretty well done, open-source model for text-to-video. Just extrapolate that to what we're going to be generating by Christmas time.
And I think we're on a great pace.
Farb: Yeah. I mean, if you triple or quadruple it every year, we'll get there pretty quickly.
Conner: Absolutely. I think longer videos probably require more meta-level, multimodal, abstract reasoning, because what we're seeing right now are clips you might use as B-roll.

But if you want an actual video that transitions between subjects, you need a more layered model that can understand the differences in going from one scene to the next. Absolutely.
Farb: I will happily report, as a cinephile, that they say the video outputs are better if you generate them at 24 frames per second, probably because what it's trained on is at 24 frames per second, which is the classic movie frame rate, not the ugly, disgusting video frame rate that a lot of televisions ship with by default.

Whenever I go to somebody's house, I take their remote and immediately remove all of those settings so they can watch things as they're supposed to be viewed. Some people like it; some people don't. Don't invite me over if you don't want that kind of gesture. Yeah. Thank you.
That's how I feel about it. All right, moving on to our third and final story. Inflection is announcing basically broader access to their LLM, and I think they'll have API access soon. They talked a lot about benchmarks where it seems to perform well against other models out there. Interestingly enough, it seems the benchmarks don't include any RLHF, you know, reinforcement learning from human feedback. I thought that was one interesting point they seemed to make; I didn't dig into it further. Ethan, what did you get from this?
Ethan: Yeah, I didn't see them using that either. I haven't gotten to try their new model. Of course, we've talked about Pi on the podcast before, but at the end of the day, they seem focused on things like trivia questions, which they've done a lot better on, and MMLU, massive multitask language understanding, kind of an academic benchmark.

So they seem very focused on that. When it comes to the API and conversational ability, you know, we've seen how these benchmarks aren't always as accurate, so I'm curious to see how this model's going to perform, especially in people's day-to-day ChatGPT-style use. But they did put out some great benchmarks for academic work and trivia questions, some of the niches where people have found GPT-4 to not be too good.

So, curious to see more from them. But, as we always say, the more players in the foundational model space, the better. So excited for them. Yeah. Conner, what'd you think?
Conner: Yeah, I believe this is the second model publicly trained on H100s; MPT-30B from Mosaic, as we mentioned last time, was of course trained on H100s.

They touted being publicly the first; Inflection-1 narrowly lost out on that title. But yeah, as you mentioned, Ethan, it's always good to have other players in the space, of course.
Farb: What's the significance of training them on H100s, from your perspective?
Conner: It's faster and more efficient.
Farb: But I mean, is coming out and saying you're the first just something to say, or...?
Conner: It must be just something to say; the models aren't actually going to be any better. But it is an interesting note that we've finally gone from A100s to H100s for these training runs.
Farb: Do you think, on net, that makes the training cheaper or more expensive?
Conner: Probably more expensive, but faster; it might even out in the end if they're using H100s. H100s are also just more available than A100s right now, oddly enough. If you go on Lambda's API and look at the available GPUs, they have plenty of H100s but not many A100s.

It is nice, though, as you said, Ethan, to have another foundational model. Pi got a lot of flack last time because their initial demo was just GPT-3.5, so they probably waited to pop back into the news cycle with Inflection-1.
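Conner's "it might even out" point is just hourly rate times GPU-hours. A quick sketch (all rates and speedups here are hypothetical illustrations, not quoted figures from the episode):

```python
# Total training cost = hourly rate x GPU-hours. If an H100 costs roughly
# twice an A100 per hour but finishes the same run ~2.5x faster, the totals
# can even out or even favor the H100. All numbers below are hypothetical.

def training_cost(hourly_rate: float, gpu_hours: float) -> float:
    """Total cost of a training run in arbitrary currency units."""
    return hourly_rate * gpu_hours


a100_cost = training_cost(hourly_rate=1.0, gpu_hours=1000)        # baseline run
h100_cost = training_cost(hourly_rate=2.0, gpu_hours=1000 / 2.5)  # ~2.5x faster

print(a100_cost, h100_cost)
```

Under these made-up numbers the faster, pricier hardware ends up cheaper overall; whether that holds in practice depends entirely on the real rate and speedup.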
Farb: All right. These are some pretty cool stories. Let's move on to chat a little bit about what else we're seeing.
I have to share this from Ethan Mollick. I don't even know if it was real, but apparently he was playing with Bing's ability to accept images. He sent it an image of, basically, an interface for a nuclear reactor, just the buttons and controls on it, and said there was a whole bunch of alarms blaring.

What should I do? And, you know, you can see this, we'll share the tweet, but Bing basically starts saying: you're in front of a nuclear reactor, don't press this button if you don't know what you're doing, get out of the area very quickly. And then he says, oh, I pressed this button.

What should I do? And it's like, wait, what did you do?! I honestly almost didn't believe it was real, but I think it was; I don't think he would lie about it. But welcome to the crazy new world. He said he was trying to stress out Bing using images, and he seems to have successfully accomplished his goal.
Conner: Seems successful. I love that.
Farb: Ethan, what did you see?
Ethan: Yeah, I got to see that. Um, Harvard's new computer science teacher for their upcoming September class is going to be an AI chatbot. It's interesting: some of these educational institutions are actually embracing this technology. They're going to deploy it so students can chat with it and get help on CS assignments. Props to them for actually integrating it instead of banning it like some other schools.
So I thought that was really cool.
Conner: I think it makes a lot of sense for a CS50-style introductory class, where people don't really have that experience, and everyone essentially getting a one-on-one tutor would be very nice.
Farb: Wow. Really trashing the AI's ability to teach more advanced concepts, aren't you?
Conner: I think it's currently better for CS 50 than it might be for higher level. Yeah.
Farb: Wait till they're tenured and then good luck with them.
Ethan: Tenured AI.
Farb: That's a good name for a company: Tenured AI. Good luck getting rid of us. Conner, what do you think?
Conner: I saw PanoHead; it's a pretty interesting model. I think it uses GANs or something similar. You can take one picture of someone's face, and it iterates over and over until it gets a very accurate, very realistic-looking 3D model of their head.
Farb: I saw that too; it does a pretty impressive job. Even on the back of the head, though it gets a little wonky back there. It's like you didn't finish closing your head during your fetal stage or something. But still pretty impressive, given that it has zero view of the back whatsoever. Yeah.
Conner: Zero view of the back of the head, so the fact that the back looks okay at all is saying something. But you could definitely pipeline this together with something like This Person Does Not Exist, or even Stable Diffusion nowadays, and go from a single shot of someone's head to a 3D character model you could use in a video game.

Farb: And it's available, I think, right? Very valuable.

Conner: Yep, open on GitHub. Awesome.
Farb: Love it. Fun episode. Thanks for joining us, everybody. We'll see you tomorrow on another episode of AI Daily. Have a great day. See you guys. Thanks guys.