Apple's WWDC Recap - The Dawn of a New Era

AI Daily | 6.6.23

Don't miss this special edition of AI Daily, where we dive into the exciting announcements from yesterday's WWDC event. Join us as we explore all things AI related and discuss the groundbreaking features that were unveiled. We'll cover everything from transformers to neural networks, bringing you the latest insights into the world of AI.

Key Points:

  1. Apple did not mention AI during the WWDC event, but they referenced technologies like transformers and neural networks.

  2. Transformers have been implemented to improve autocorrect, keyboard experience, and dictation on iPhones.

  3. Apple's new M2 Ultra chip raises the GPU from 64 to 76 cores, which Apple says is useful for training transformer models.

  4. Apple introduced the Curated Suggestions API for creating multimedia journals on Apple devices.

  5. The Curated Suggestions API uses on-device neural networks for privacy.

  6. Live voicemail transcribes voicemails in real-time on the device and allows users to answer calls midway.

  7. Siri now supports back-to-back commands and utilizes audio transformer models.

  8. iMessage stickers let users lift subjects out of photos, and FaceTime reactions show effects such as balloons and confetti behind the subject in a video call.

  9. ML or AI is used to differentiate subjects from the background in photos and FaceTime videos.

  10. Apple's AirPods and AirPlay have adaptive audio features that adjust volume and transparency based on user interactions, possibly using audio transformer models.

  11. Apple's Vision Pro includes digital avatars that reconstruct a user's face based on scans and movements, but chin detection technology is still in development.

Transcript:

Farb: Good morning and welcome to a special edition of AI Daily, where we are going to cover yesterday's big WWDC announcements and all things AI-related. But first, we counted up the number of times that they said AI in this event, and it is an unprecedented number of times. The total number of times was zero.

Zero times that they mentioned AI in this event. But repeatedly they mentioned things like transformers and neural. They loved the word neural. And other things that obviously include what we think of as AI, whether they're LLMs and transformers, or vision-based or audio-based. So we're gonna get into those right now.

Are you boys ready?

Conner: Absolutely.

Farb: Of course. Here we go. Our first piece is the transformer autocorrect and dictation. Ethan, you wanna take this one?

Ethan: Yeah, so they've pretty much been implementing transformers to improve autocorrect, improve the keyboard experience, and improve dictation. So every time you're interacting with your iPhone across any app, of course you're using the keyboard.

Maybe you're using dictation. We've talked about this: Whisper's been better on the dictation side, and people have talked about, hey, how can we improve the keyboard experience? Well, Apple has put it in, of course, without mentioning AI, but they've been implementing transformers on a device level.

So now you can have your own autocorrect words, you can use words like ducking and keep it consistent across your experience. So really cool that they're putting this on device, not sending it to a server, just improving the keyboard experience for the iPhone, using a mini LLM on device.

Farb: So do we think transformers on device is new?

They weren't actually doing any sort of transformers on device before, I don't believe.

Conner: Well, it was ML in the past, but it was more like Markov chains and simpler models. Jumping to transformers is pretty big, especially because it unlocks the thing Ethan mentioned, like learning your preferences, like ducking, et cetera.
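
For a sense of what a Markov-chain style of suggestion looks like, here is a minimal sketch of a bigram next-word predictor. It is an illustration only, not Apple's keyboard implementation; the function names `train_bigrams` and `suggest` and the toy corpus are assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for each word, which words follow it in the corpus."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def suggest(following, word, k=2):
    """Return up to k of the most frequent followers of `word`."""
    return [w for w, _ in following[word.lower()].most_common(k)]


# Toy corpus standing in for a user's typing history (hypothetical data).
corpus = [
    "see you at the park",
    "meet me at the office",
    "see you at the park tomorrow",
]
model = train_bigrams(corpus)
print(suggest(model, "the"))  # ['park', 'office']
```

A model like this only looks at the previous word, which is one reason a transformer that conditions on the whole sentence can make better suggestions.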

Farb: So, yeah. Fantastic. How about the Mac GPU stuff? Conner, what did we see there?

Conner: Yeah, they introduced the M2 Ultra, with a faster CPU and GPU. The big thing here for us, of course, is that it jumped from 64 to 76 cores in the GPU, and they specifically mentioned that it is actually extremely useful for training transformer models.

And I'm assuming other models too, of course, but a big jump for that.

Farb: Yeah, they're very specific about saying that, and I almost wondered whether that was the sort of thing where maybe a month or two ago, or at the beginning of the year, somebody was like, okay, go through the script for the WWDC keynote and pick the places where we want to mention these things.

Remember, this is WWDC, it's for developers. Where can we add the sentences that tell the developers what we want them to know about our devices without using the word AI? Big, big news: the Curated Suggestions API for Journal. Journal's a new product from Apple that allows you to sort of create a multimedia journal of your life, like you're journaling or writing in a diary.

And one thing that I thought would be interesting about this is if you think about this combined with Apple's legacy product, which is how you pass on your Apple iTunes accounts to somebody when you pass away. It would be crazy to think, you know, 80 years from now, some kid might get their parent's Apple Legacy ID and have this multi-decade-long journal that their parent has been creating on their Apple device, which is a really crazy idea to think about, and super cool.

Ethan, anything else about the Curated Suggestions API?

Ethan: Yeah, once again, just in the Apple ethos of privacy, everything is on device. So a bunch of mini neural networks on device that pull your photos and pull your messages, and any other apps can link into this Suggestions API as well and feed into, once again, an on-device neural network and move into your journal.

And like you said, from an application standpoint, I think it's gonna be really fun.

Conner: Yeah, we've seen these exact kinds of models before, but there's no better place for it than having access to your fitness data, having access to your meditation data, having access to your music data, so a very good application.

Farb: No SMSs in there. How about live voicemail? What are we seeing there? This one's really cool to me. This is how I, you know, would probably prefer all of my phone conversations to occur, via live voicemail. Conner, can you tell us how that works?

Conner: Yeah. Someone gives you a call, let's say, and they're leaving a voicemail. If you wanna read it, if you wanna see if something is a problem, then it live-transcribes entirely on device. It gets the voicemail data and shows you right on your phone exactly what they're saying in the voicemail. Another neural network that works very well on device thanks to the Neural Engine. And you can answer the call midway through the live voicemail.

Farb: Yep. Or just hang up on them. That would be nice too, either one. Okay. Siri: one word, Siri, and back-to-back commands. Ethan, what's up with Siri?

Ethan: Yeah, so not a huge update to Siri, but a really nice experience update. Once again, just like before on live voicemail, utilizing these new audio transformer models, making it instead of, Hey Siri, you just say Siri.

And also being able to say back-to-back commands. I wonder if they're using another small LLM on top to manage these back-to-back commands. You know, before, you'd just say, Hey Siri, set an alarm. Hey Siri, what's the weather?

But being able to say, Hey Siri, set an alarm and tell me the weather, probably involves some sort of, once again, small on-device LLM that can split up these two commands and put 'em back into Siri's normal flow. So really cool experience update. But I think we're gonna see more out of Siri in the next year or so.
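
To make the "split up these two commands" step concrete, here is a minimal rule-based sketch. It is a stand-in for whatever model Siri actually uses, which the episode only speculates about; the function name `split_commands` and the list of connector words are assumptions for illustration.

```python
import re


def split_commands(utterance, wake_word="siri"):
    """Strip the wake phrase, then split the rest on simple connectors."""
    text = utterance.strip().rstrip(".!?")
    # Drop a leading "Hey Siri" or plain "Siri", plus any comma or spaces.
    wake = rf"^(hey\s+)?{re.escape(wake_word)}[,\s]*"
    text = re.sub(wake, "", text, flags=re.IGNORECASE)
    # Split on "and then", "and", or "then" standing between two commands.
    parts = re.split(r",?\s+(?:and then|and|then)\s+", text, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]


print(split_commands("Hey Siri, set an alarm and tell me the weather"))
# ['set an alarm', 'tell me the weather']
```

A rule like this breaks on requests such as "play rock and roll", which hints at why a small language model that understands intent would be a better fit for the job.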

Farb: I always wonder, you know, this seems like it's part of iOS 17, but I wonder whether the hardware capabilities of the upcoming iPhone are somewhat relevant to features like this. Which is to say, assuming a feature like this is backwards compatible with previous iPhones that are on older hardware, will it work just as well to do Siri on an older iPhone versus Hey Siri on an older iPhone?

Don't know, just questions that come to mind, because it is pretty neat to see that you can drop the Hey from Hey Siri to just Siri. Absolutely. FaceTime reactions and iMessage stickers. So this is taking a feature that Apple had before, where you could long-press on an object in a photo and sort of separate that object from the rest of the photo.

And they're allowing you to do that in iMessage. How's this gonna work, Conner?

Conner: Yeah. It's always been a very neat feature, being able to get an image of, like, your dog, and then pull out the dog into just a separate image. Now this takes a step forward from that. So the image part is iMessage. You can pull out that picture of your dog, or pull out that picture of any subject, and then make it a sticker.

This works with still photos. This also works with Live Photos. And then these stickers, of course, as they showed, you can throw in your messages, throw around different apps. And then the second part of this is FaceTime reactions, where it pulls the subject out of the FaceTime video and shows balloons behind them, shows confetti behind them.

It's very useful and shows the ubiquity of pulling subjects from a photo.

Farb: So, yeah, this is, I think, something where you could do, like, the heart shape here and then you'd see all these hearts start flying in. How is it using ML or AI for that, Conner?

Conner: It's probably some sort of ControlNet or some similar proprietary model that Apple has, and it differentiates the subject of who's in frame versus the background and just pulls out this part of the photo and erases the rest.

Farb: Yeah, I don't know that we cover this here, but in one of their cool presentations, they're showing how you can essentially do a Keynote presentation in FaceTime, where, like a television anchor, it'll put the Keynote behind you, but in front of what's, you know, the background. So the Keynote would be kinda like floating over here, not in front of my face, but in front of the window and behind my head. And it's just constantly staring at everything that's happening and, you know, doing some algos to see what it's seeing.

That's really cool. And how about with the photos, pulling that out? How does it use ML or AI to separate the objects in the photo?

Conner: The same tech. I'm sure it's some sort of ControlNet-type model where it knows what the subject is and it knows what the background is, and it just erases the background.

It keeps the subject, which is what you want.
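
The "keep the subject, erase the background" step can be sketched as applying a binary mask to an image. This is a minimal sketch under stated assumptions: in a real pipeline the mask would come from a segmentation model, while here it is written by hand, and `lift_subject` is a hypothetical helper, not an Apple API.

```python
def lift_subject(image, mask, background=(0, 0, 0, 0)):
    """Keep pixels where the mask is 1; make every other pixel transparent.

    `image` is a grid of (R, G, B) tuples and `mask` is a grid of 0/1 values
    with the same shape. The result is a grid of (R, G, B, A) tuples.
    """
    return [
        [pixel + (255,) if keep else background
         for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


# A 2x2 toy image: the left column is the "dog", the right column is background.
image = [
    [(200, 150, 100), (10, 120, 10)],
    [(190, 140, 90), (12, 118, 14)],
]
mask = [
    [1, 0],
    [1, 0],
]
sticker = lift_subject(image, mask)
print(sticker[0])  # [(200, 150, 100, 255), (0, 0, 0, 0)]
```

The hard part is producing a good mask; once the model has decided which pixels belong to the subject, cutting out the sticker is this simple.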

Farb: So yeah, it seems like Apple's hitting its own inflection point, where all of the various tech and things that they've built over the years to do ML and these different types of algorithms are all starting to fly together now and create more and more features out of it.

And doing things like jacking up the number of GPU cores that are in a device will even accelerate, you know, the different types of things you can do on your devices, more than just these software changes. And so we also have on-device intelligence, AirPods and AirPlay. It was a big event yesterday.

Tim Cook said the biggest ever, and I don't think he was exaggerating. He delivered on the biggest ever WWDC. On-device intelligence, AirPods and AirPlay. Ethan, what are we seeing there?

Ethan: Yeah, Apple's always been fantastic at their audio models. You know, if you've used AirPods, of course, like most of us have, their noise cancellation and their transparency mode have been fantastic, and they're taking that to the next step.

Adaptive audio, personalized volume, conversational awareness. So if you walk up to someone and you start talking to them, you know, instead of fumbling up here and changing your noise cancellation, it knows from its audio models: hey, you've begun to talk to someone, we should probably lower the volume or switch over to transparency mode.

So they've always been really good at these audio models. And they might be using some form of audio transformer here as well, like dictation, but fantastic to see them evolve upon just what it means to connect this digital and physical world.

Farb: Do we think that's happening on the AirPod itself or on your iPhone?

Ethan: They have the chips in the AirPods. So I imagine, I'm not sure how much of the neural network is on the AirPod itself. I bet it's on device, but, you know, the chips they have in there, I believe. What do they call that chip?

Farb: Do you remember? The M1, the Air one?

Ethan: Maybe? Yeah, W1 or something like that.

But the efficient streaming of that, you know, that makes it so that anyone doing Bluetooth headphones can't just start doing this. So they do have some proprietary hardware, of course, that allows this.

Farb: It's like equally mind-bending that the ML might be happening in the AirPod or on your phone through the AirPod.

Yeah. Not quite sure which is more mind-bending, to be honest. We'll just say it's magic until someone tells us otherwise. Very, very cool. And then what about AirPlay? Did we cover the AirPlay one as well?

Ethan: AirPlay's very similar. So, improving upon this adaptive audio, similar to what we do on a Mac right now, but being able to put your iPhone in front of your TV and have it just intelligently know, based on the positioning of your iPhone: hey, it's time to start this FaceTime call, it's time to improve the audio here, et cetera.

Farb: And so lastly, we saw the amazing Vision Pro announced. Just one more thing, one more big thing. We won't dig into every aspect of the Vision Pro here, but we will touch upon this really cool feature called digital avatars.

Connor, can you tell us about the digital avatars and how AI or ML plays a role there?

Conner: Yeah, so normally when you take a FaceTime call, of course you're showing your actual face. But in the Vision Pro, if you're taking a FaceTime call, they now have these digital personas where, a bit like we saw out of Facebook and their labs, it reconstructs your entire face based off an initial scan, and then based off your eye movements and your mouth movements, and I think a little bit more, like your chin movements or whatever.

It reconstructs your face into an avatar that looks pretty realistic and pretty good. A lot of models going on here, so it remains to be seen if they'll be able to actually ship this, but I'm sure they will.

Farb: Yeah. Maybe like Facebook couldn't come up with your feet, Apple hasn't figured out your chin yet, so they've got everything but your chin. There will be no chin in your avatar until chin technology has caught up to the rest of the tech. What's chin tech?

Ethan: What's cool with these digital avatars too, is it's probably some form of like multimodal model.

You know, you're bringing in the eye movements, you're bringing in your mouth movements, you're bringing in the audio of your mouth in order to like accurately like transform your mouth into something that's on camera. So probably some form, some form of really cool multimodal model taking in all these different data inputs from your face.

And that's why this has been such a difficult problem for so long. You know, one of the things that interests me, like I said, is you can look at someone's mouth, but you also should know what they're saying, like the actual audio coming out of it, because that's gonna determine how their mouth moves. So Apple's probably got some really cool multimodal model going on, once again on device, that makes this possible.

Farb: I mean, the device has an M2 chip on it, which is crazy. And then it also has the new R1 chip, the Reality One chip, which is just the beginning. And I think it's mainly focused around keeping the latency of the video image down to the 12 milliseconds that they're saying they're able to get, which will have a huge impact on making this something that you can wear for a while without getting, you know, dizzy or nauseous, as it'll do a much better job of connecting what your ears are experiencing, for example, and what your eyes and your head movement are experiencing.

It's a pretty badass piece of technology to see coming out of Cupertino over there. Well, alright, let's move on to the what we're seeing section. Ethan, what are you seeing out there that isn't Reality Pro?

Ethan: Yeah, definitely a lot on Reality Pro. Wish we could talk about it more, but I did get to see, if y'all saw this, the AI drone kills the human operator story. There was a huge fuss around this. You know, the DOD was deploying some AI to try to improve, hey, how can we be more offensive with AI? And someone made a huge claim that it started taking out human operators, and then it was taking out the communication network for the human operator.

But they did a whole fact check on this. And, you know, as is normal with the PR of AI doomerism, this was proven to be definitely not that. So always funny to see these types of articles.

Farb: To be fair, I called this out. When we shared it internally, my first reaction was fake news. Absolutely.

Conner: To be fair, if I was the DOD and the story got leaked, I would retract quickly, saying, oh, it was just a thought experiment.

Farb: You know too much, Conner. You know a little too much. You may wanna lock that front door. What are you seeing, Conner, besides a bunch of DOD officials outside your door?

Conner: Yeah, they're worrying to see every time they come knocking.

But no, I saw a Stable Diffusion ControlNet working on QR codes. It was pretty neat, pretty interesting. We'll show the picture on the side, of course, but it takes a normal QR code and then ControlNets it, so that instead of just white on black, it's very picturesque, like mountains or trees or a forest.

And it really shows a full picture while still being a QR code that's actually functional.

Farb: Tell people real quick what a ControlNet is and what ControlNetting it means when you say it.

Conner: Yeah, we mentioned ControlNet earlier in the episode too. ControlNet is where you select parts of an image that have to remain the same, that cannot change, and the shape that they are, and then the rest of the image around it, the colors, et cetera, can all change however you prompt Stable Diffusion to change it.

Farb: Very cool. So it keeps the QR code part and keeps the QR code functional.

Conner: The rest of the image can be anything.

Farb: So, very cool. Alright. For what I saw today, I saw David Deutsch today. That was my what I saw today. I was lucky enough, and honored enough, to do a FaceTime with David Deutsch.

And if you don't know who David Deutsch is, he's probably one of the most important physicists and thinkers in human history, who we're lucky enough to still have, you know, alive and well with us these days. He's famous for inventing quantum computing, that little thing. And then also the Church-Turing-Deutsch principle, which is a principle about the universe and computability.

And the connection to AI is I was helping David get connected to GPT-4, because he was having some issues getting his GPT-4 up and running. There's lots of different buttons and different things, and, you know, there's differences between paying for the API access and paying for the ChatGPT Plus access.

So there was, I think, a little bit of a difference there, and we got him all squared away, and he's out there training the LLMs of the future on all of our behalf.

Ethan: I'm very excited to see David and GPT-4 combined. This will be fun.

Farb: That is a powerful combination. It's good news for the world.

So good news for the world is right, and good news for the rest of you, or I guess bad news for the rest of you: our episode has drawn to a conclusion here, but we'll be back tomorrow with more exciting news in the AI world. Thank you all for joining us. Thank you, Ethan and Conner and everyone.

Have a great day. Thank you all.

Authors
AI Daily