TikTok's Effect House AI, Meta Voicebox, & LLaMA Goes Commercial

AI Daily | 6.16.23

Welcome to AI Daily, your go-to podcast for the latest updates in the world of artificial intelligence. In today's episode, we dive into three exciting stories. First up, we discuss Meta's release of Meta Voicebox, a text-to-speech model with state-of-the-art performance. In the second story, we shift gears to Meta's plan to make LLaMA, its popular text model, available for commercial use. Our final story focuses on TikTok's Effect House, a platform that lets creators design augmented reality (AR) effects for their videos and that now includes text-to-image AI.

Key Points

Meta’s Voicebox

  • Meta introduces Meta Voicebox, a generalizable text-to-speech model with state-of-the-art speech-generation performance.

  • 20x Faster: Meta Voicebox stands out for being up to 20 times faster than existing alternatives such as VALL-E. It handles tasks like removing noise from audio and generating speech in a person's voice, including across languages.

  • Open Source and Flow Matching Model: Meta Voicebox is open source and is built on a flow matching model, which lets it train on larger, more diverse, less-labeled datasets and produce better models without extensive manual labeling.

  • Real Speech Classification: Meta also describes a classifier that distinguishes audio generated by Voicebox from real speech. The release reflects Meta's commitment to open-source models, which it may soon integrate into its own products.

Meta’s LLaMA Goes Commercial

  • Meta aims to make the popular open-source text model LLaMA available for commercial use, challenging established models like GPT-4 and emphasizing their commitment to open-source AI.

  • Meta's Vision and Focus: By providing open access to LLaMA and focusing on its metaverse vision, Meta aims to democratize AI so that it is not exclusively owned by big companies like Google, Apple, or Meta itself.

  • Business Aikido Strategy: Meta makes its money from advertising, not from charging developers for AI APIs. Giving away AI tools that others charge for disrupts competitors' businesses and positions Meta as a leader in the technology, which helps its recruiting, its stock price, and potentially its advertising revenue.

  • Competitive Advantage over Google: Google publishes research papers and open-source work, but its models generally can't be used commercially. Meta's commitment to commercially usable models sets it apart and could lead startups and enterprises to choose Meta's free offerings over paying API costs to other providers.

TikTok Effect House AI

  • TikTok adds text-to-image AI to Effect House, its augmented reality (AR) effects platform, so users can create their own effects from a text prompt.

  • Democratizing AI Tools: Effect House allows TikTok users to easily generate and apply complex effects, eliminating the need for extensive engineering teams and democratizing the creation of captivating videos.

  • AI's Influence on Social Media: The integration of generative AI tools in social media platforms is becoming a prevalent trend, with TikTok leading the way and potentially inspiring other platforms like Instagram to follow suit.

  • Enhanced Creativity and Entertainment: The availability of AI-powered tools like Effect House empowers creators to produce more captivating and engaging content, promising a future with increased creativity and entertainment value on TikTok.

Ethan: Good morning and welcome to AI Daily. Today we have three amazing stories for you, two of them about Meta and one about TikTok. So let's kick off with Meta's first story, which is Meta Voicebox. Meta Voicebox is a generalizable text-to-speech model with state-of-the-art performance across, you know, generating speech.

They've got a lot of details in here. Farb, give us the rundown.

Farb: This is pretty cool stuff. The demos are also, you know, eating their own dog food. You can hear Mark Zuckerberg talking, but it's a generated Mark Zuckerberg voice. I saw another video. It seems like they had a few different people at Meta make the video.

There was another one with, I think, a female Indian voice doing the same video, but I guess they could have had every single employee at Meta make the video, because it's all generative. So basically they're trying to level up the AI voice game, and probably one of the more important pieces about this is that it's 20x faster than anything else out there right now.

There are other things doing similar stuff, but being able to do it 20x faster is a huge step, especially if you're trying to do things at the scale Meta operates at. So it's doing things like taking a piece of audio with somebody's voice.

Maybe there's some noise in it; it can automatically remove that noise. It can take a piece of text with another piece of audio and create a new voice, like making you say something you've never said before, but in your voice. It can do that across different languages while maintaining the person's own accent.

So it's pretty impressive stuff. And again, doing this sort of thing 20x faster is kind of mind-blowing if you think about it. You know, you 20x it a couple more times and you're at thousands of times faster than the original versions, which means this stuff is gonna be bordering on real time.

Think about the Star Trek universal translator, where you can basically hear somebody speak and have it translated in real time.

Ethan: Yeah. And they showed off style transfer. They showed off audio editing. They showed off lower word error rates than VALL-E, which was amazing to me. Conner, what did you see from this?

Is this one open source?

Conner: Yes, it's also open source, like Meta once again does. Interestingly, they've built it on a flow matching model, which means they can take in a larger, more diverse, less-labeled dataset. Of course, we're marching onward toward models that get better and better with data.

With data that's less and less labeled, it's easier to train these models, since they don't have to have people go through and label all the specific data. So we get better models: non-deterministic, faster, snappier, and, as Farb said, getting pretty close to real time.

Ethan: I think what's really cool, too, is that they also detailed a classifier that can pretty much distinguish between audio generated with this model and real speech.

Once again, just huge news from Meta, releasing all these open source models across speech, text, et cetera. And I think it's such an interesting position for them. These are kind of risky-ish models, and releasing them as open source and letting everyone else use them really primes Meta to bring this to its own products soon.

So really cool. The second story for today is that Meta now wants to make LLaMA available for commercial use. LLaMA, of course, is a really popular open source text model competing with the likes of GPT-4, Claude, or any of the other LLMs. And they want to make it available for commercial use, which is what people want.

So, Conner, what do you think of this?

Conner: Yeah, I feel like it's really in their name, Meta. Their focus is not AI here. So they're trying to pull the focus away from AI and make it so that, hey, it's open. Anyone can use it. This is not something owned by the big companies.

This should not be owned by Meta, by Google, by Apple. This should be open to everyone. And honestly, I think that'll probably help Meta focus on its vision of actually building a metaverse while continuing the march of open source AI.

Ethan: Yeah, it's important. You know, people have complained about LLaMA being for research purposes only.

What does this unlock for the whole ecosystem if it becomes available for commercial use?

Farb: This is an example of some business aikido by Zuck. Facebook/Meta's business is not going to be charging developers to use AI APIs. They make their money from advertisers. So if they can go out there and disrupt somebody else's business by doing for free what that business charges for, that's classic business aikido.

And it positions Meta to be where all of the AI stuff is happening. It positions them as leaders in the technology, which is great for their recruiting. It's great for their stock price, and it's probably gonna be great for their revenue. It's a bold move to say, hey, we're not gonna try and make a bunch of money off of AI APIs.

We're gonna try and make money off what we normally do and give all of these cool tools away for free.

Conner: Absolutely. Google, of course, does a lot of research papers and a lot of open source, but when it really gets down to it, you can't actually use their models; you can't use them commercially.

So it's a very big difference when Meta says, hey, in our future, all of our models are gonna be open sourced and commercially usable, so use them for what you want. And hopefully that helps them attack Google, like you said, with business aikido.

Ethan: LLaMA's definitely a little bit behind GPT-4, but the moment LLaMA goes commercial, I think you're gonna see a wave of startups and enterprises say, no, we're not gonna pay those API costs.

We're gonna go set up our own server and run LLaMA. So, amazing as always. Our last story of today is TikTok's Effect House. If you don't know, TikTok has launched Effect House. It's their AR, kind of augmented effects platform. You know, it's how people make all these really amazing videos on TikTok, and they've dropped text-to-image AI right within Effect House.

Farb, what does this do for TikTok?

Farb: So Effect House is something you can use if you want to make effects that other people use on TikTok. And what's cool about this is, you know, if you wanna make an effect that lays a tiger face over somebody's face, well, you can type in "tiger face" and it'll do text-to-image and create that effect.

And boom, you're out there with your own cool effect on TikTok. We've said this before: this wave of cool generative AI tools is making its way into social media. I would predict that by next year (which probably means in about four weeks, 'cause every time I predict something by next year, it happens three weeks later), half or more of social media is gonna have some form of AI happening on the content as it goes out.

Pretty bold and powerful move from TikTok. Your move, Insta. Let's see what happens. We know Insta's coming hard with it as well.

Ethan: Yeah, I remember when these effects used to take months to make by a whole team of Snapchat engineers, and now it's just text to AI: let's create a new effect. Conner, what were you about to say?

Conner: I was gonna say, we've seen Snapchat's My AI, of course. We've talked about TikTok Tako, which they were beta testing a little while ago in Indonesia. And then, of course, Zuckerberg said WhatsApp, Instagram, and Messenger are all gonna have generative AI coming to them.

TikTok has now brought it to Effect House, so I'm very certain that Instagram Reels is gonna have it next.

Ethan: More tools for creators, more interesting videos, more entertainment. Go TikTok. As always, what else are you guys seeing? Conner?

Conner: So, The Guardian and how they have decided to approach generative AI.

They've decided, of course, on the very median way of doing it. They said, hey, we're gonna use it for some use cases, but we're still gonna be very balanced about it: journalists first and human-first reporting. So they're gonna use it for some of their research and for things like editing stories, but they still want to have journalists doing the original reporting.

So nothing crazy, nothing surprising, but sensible.

Ethan: Yeah, Farb?

Farb: There was this pretty cool research, Language to Rewards for Robotic Skill Synthesis. This is all happening in a simulated environment, but it's basically getting robots to do things based on text input: turn on a faucet, open a drawer, pick up an apple, stand on your hind legs.

This stuff is happening in simulators, and it's kind of funny: if you watch the simulator, the robot's hands are kind of flying around, so you probably don't want that happening with a knife in a real robot's hands. But again, give it a few weeks, a year maximum, and the kitchen staff at the local restaurant may have a robot in there. Hopefully it doesn't have the shaky hand problem, though.

Ethan: Absolutely. Yeah, I saw Mercedes-Benz is bringing ChatGPT to all their customers. They're using Microsoft's Azure OpenAI Service, and they're gonna handle all the data privacy themselves. Starting today, they're bringing the beta program to almost a million vehicles. I think it's really exciting to see ChatGPT within a car.

And, as a generalization, if my car has ChatGPT and LLMs and your application does not yet, you should probably get on it.

Farb: I gotta say, that sounds to me (I hope I'm wrong) like it's bordering on totally useless, if not dangerous. But I do think it'll be worth it for the memes alone, you know? What's that?

Ethan: The jealous Tesla owner here?

Farb: It's okay. No, no, not at all. Not at all. The Tesla's the safest car in the world. But I will say, for Mercedes, the memes alone should be worth it. So kudos to Mercedes, a hundred percent.

Ethan: Sell a few more cars. But thank you as always for tuning in to AI Daily, and we'll see you again next week.
