Forget Text-To-Image, Check Out Text-To-3D-World!

Intro

Now I'm sure you've heard of AI-generated images by now, but have you seen AI-generated 3D worlds? I'm gonna show you some really cool and really fun new tools in this video, and I think you're really gonna dig them. So let's check it out. In this video, I wanna show you three exciting and kind of just fun tools. Now, if you remember, in a previous video I showed you this website called Hugging Face, where if you come over to huggingface.co, you can come up to Spaces and see some really cool AI tools that people are building and making available for you to play around with. I wanna show a couple here and then one that's not on Hugging Face, but is still pretty fun and cool.

Pix2Pix Video

So the first one I wanna show you is called Pix2Pix Video. And in my last Hugging Face video, I showed you Instruct Pix2Pix, which was a tool where you can upload a picture and then give it some text prompts and then it would do the things that you told it to do to the picture. Well, this is the same thing, but with video. Now, the easiest way to get to Instruct Pix2Pix would be to come over to futuretools.io and type in Pix2Pix and you'll find it right in here. Just click to open it from Future Tools.

Now, they actually give a couple demonstration videos that you can play around with. One of them is this hand here, where it shows a three second clip of a hand moving around. And then you can see right here, it gives a prompt that says, "Make it a marble sculpture". And then you can see the output here, which is the same hand in sort of a marble sculpture sort of look. The other demo they give you is this one here, where it shows a four second clip of waves kind of crashing on a beach here. And then it has the instruction, "Make it molten lava". And then you can see the output here, turns it all red so it looks closer to molten lava.

Now you can give these your own prompts. So for example, if I click back on this one that turns it into a marble sculpture, I could say, "Make it a detailed green alien hand". And then click Generate Pix2Pix. And it says it's gonna take about 150 seconds. So I'll go ahead and pause the video and I'll show you the result in a second. All right, and here is the result. You can see it turned it into a green alien hand instead. It's pretty cool.

Now, if I'm totally being honest, it kind of looks like they just took the original image and put like a green filter over it. Or in the marble example, it kind of looks like they took a marble filter and just overlaid it over the image. And there's some issues, but this is super, super new technology. This is the very, very beginning of this development. We're gonna see this improve very quickly. I mean, just think about where AI art was six months ago to where it is now. We're gonna start to see this with the AI video generation. Although still a bit basic looking, this is going to turn into something really, really great. This time I'm gonna upload my own video.

I took a little less than two second clip of me waving at the camera. And let's see what I can do with this. Let's change it to make him a detailed robot. And let's see what we get out of that. All right, so let's see what it generated here. Here's me as a robot. Kind of funny, 'cause it just kind of looks like it overlaid sort of a film over me. But then when it pauses at the end, you can see it kind of gave me this like robot ears and my eyes and hair look a little more robot-y.

My arm kind of looks like it's part of my chair now. So something fun to play with still in the very, very early stages, but I'm looking forward to seeing how this kind of technology advances because we know that at the rate this stuff's moving, this is gonna be like probably insanely cool within a couple of weeks.
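For anyone curious why the output can look like a filter laid over the original clip, here is a minimal Python sketch of one way a video edit like this could work. It assumes the same edit is applied to each frame on its own, which the video does not confirm. The names `green_tint` and `edit_video` are made up for illustration, and the tint is only a stand-in for the AI model, not the real tool.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255
Frame = List[List[Pixel]]      # a frame is a list of rows of pixels


def green_tint(frame: Frame) -> Frame:
    """Hypothetical stand-in for the AI edit: halve red and blue, keep green."""
    return [[(r // 2, g, b // 2) for (r, g, b) in row] for row in frame]


def edit_video(frames: List[Frame], edit_frame: Callable[[Frame], Frame]) -> List[Frame]:
    """Apply the same edit to every frame independently, keeping the frame order."""
    return [edit_frame(frame) for frame in frames]


# A tiny two-frame "clip"; each frame is 1 row by 2 pixels.
clip = [
    [[(200, 100, 50), (10, 20, 30)]],
    [[(255, 255, 255), (0, 0, 0)]],
]
edited = edit_video(clip, green_tint)
```

Because each frame is edited without looking at its neighbors, the shapes and motion from the original clip carry straight through, which would fit the overlaid look described above.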

Image to Sound Effects

All right, so the next one I wanna show you is also on Hugging Face and it's called Image to Sound Effects. And once again, probably the easiest way to find it is go to Future Tools and type in, let's say, Sound Effects. I just typed in sound F and it's Image to Sound Effects. You find it right here. I'll also make sure it's linked below the YouTube video if you're watching this directly on YouTube. So this one does exactly what it sounds like it does. You upload an image to it and then it tries to create a sound effect based on the image. And once again, this is a very, very early tool. When something like this is in the early phases, it may not be super impressive, but it gives you a little glimpse into where things are going and what we're gonna be capable of very soon. I found a couple of random images on the internet and I wanna see how they sound. So the first one is a picture of a train. Let me just drag this in here. You can see we've got a steam engine going across a track. Almost looks like it could be from Harry Potter, but I went ahead and I dragged and dropped this train in here and I'm gonna come down and click on Generate Sound Effects from Image. You can see it's gonna take about 35 seconds here. All right, and it generated an audio file. So let's go ahead and press play and let's hear what it thinks this image would sound like.

So probably pretty accurate. I mean, I don't know if it would be one constant, like choo, choo, choo, choo the whole time, but it's got the idea, right? You can tell it's kind of creating a train sound and I didn't put any additional manual image description here. All I did was put in this picture and it noticed that it was a train and generated the sound of a steam engine. So let's see what happens if I add a manual image description. Let's go "locomotive on a bridge". All right, let's take a listen.

Probably a little more accurate, right? Because we gave it some additional prompting here. I'm going to go ahead and delete this one and clear everything here. And then, so I found this image of a cow online, just random Google image, and I'm going to leave the image description blank and see what it generates for us the first time around, and then we'll add an extra prompt and see how that generates. All right.

Not too bad. Now let's go ahead and add some extra prompt here. Let's just say a cow standing in a field and let it generate one more time and see if it gets a little bit better. All right. Let's take a listen.

Interesting. I think that actually got worse. I think when I added the prompt of a cow standing in a field, it actually made it sound less like a cow. So very interesting. All right. I just want to try one more image just out of curiosity here. We've got some orcas, some killer whales jumping out of the water and I've already processed it and this is what it sounds like.

Not too bad. You can hear the splashes. You can tell that there's birds flying around. It's at the ocean. I thought that one actually sounded pretty impressive, but pretty cool technology that it can actually look at an image and try to figure out what it thinks the audio would sound like from that image. Again, this is very, very, very early stuff. It's still kind of in that open source phase where a bunch of really smart people are playing around with it, trying to figure out how to improve it. So just imagine where we're going to be in a few weeks from now or a few months from now on this technology. You're going to be able to upload a picture and it's going to probably nail the audio on it 'cause we're really early. All right, so there's one last tool that I want to show you. This one's another really fun one that I've really enjoyed playing with.

Latent Labs

And it is called Latent Labs. And what this is, is a text to 3D world generator. So you enter a prompt and it uses Stable Diffusion to make a 360 world that you can kind of spin around in and look in. It kind of gives you an example prompt: "Vincent van Gogh painting of a medieval castle, starry night, beautiful flowers, night sky with stars, depth of view". Screw it, let's see what this generates for us. So it generated this 3D world where I can actually click and move around and look around in this world. Look at that. So let's go ahead and add another prompt here. Let's do "a futuristic cyberpunk cityscape". And let's go ahead and generate that and see what it comes up with. Look at this, we've got a futuristic cyberpunk cityscape in this 3D environment where I can look around.

Not a hundred percent sure what the practical use case is yet, but it's just kind of showing you how quickly this technology is advancing and the sort of stuff that people are doing with it right now. Let's do a third one just for fun. Right now it's on Stable Diffusion 1.5. Let's set it on Stable Diffusion 2.1 and let's go "a colorful underwater scene with coral and vibrant sea life". Let's generate that. Look at this. We got an undersea 3D world. It's not really giving me the 3D coral.

And yes, there are some seams in here still, but I have actually talked to one of the people who are developing this and they're actually working on making it so these seams disappear. And this is what it generated in version 2.1 of Stable Diffusion. Let's flip it back to version 1.5 and see what the difference is with the exact same prompt. And here's what it does in Stable Diffusion 1.5. Ooh, I actually think it came out better in 1.5. Still got a pretty obvious seam here, but I know that's something that's getting worked out.

But look at that, isn't that cool? I just think that this tech is really cool. Like this is just something that I nerd out over. Again, I don't know the use case. Maybe if you're a game developer or you're working on some sort of virtual reality or metaverse type project, you have some ideas for use case for this. In my opinion, I just think it's nerdy cool, right? Like I'm a nerd and I just go, man, I just want to try more prompts and see what else I can make it do and look around and see what cool stuff I can have it generate. I just love this stuff. Hopefully you love this stuff as well.

That's it. I just want to show off three really cool emerging AI technologies that are in the works that if you're checking this stuff out now, you're on the ground floor, you're super early. You're seeing this at the very, very beginning before this kind of tech starts to hit the mainstream and everybody's starting to talk about this. Like that's where you're at right now by watching these videos. And I just think it's so cool to be seeing this stuff at the very, very start of it, the very early stages of this technology. This is just cool. I nerd out about this stuff. This is what I do all day now is look at this kind of stuff.
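On the seams mentioned above: here is a rough Python sketch of how you might measure one. It assumes the 360 world is stored as a flat panorama image whose left and right edges meet when it wraps around the viewer, which the video does not confirm. The function name `seam_score` and the tiny grayscale test images are made up for illustration.

```python
from typing import List

Row = List[int]  # one row of grayscale pixel values, 0-255


def seam_score(image: List[Row]) -> float:
    """Mean absolute difference between the leftmost and rightmost columns.

    0.0 means the two edges match exactly, so wrapping the image around
    would show no visible seam; larger values mean a more obvious seam.
    """
    if not image or not image[0]:
        raise ValueError("image must have at least one pixel")
    diffs = [abs(row[0] - row[-1]) for row in image]
    return sum(diffs) / len(diffs)


# Two tiny 2x3 "panoramas": one whose edges match, one whose edges do not.
seamless = [[10, 50, 10], [20, 60, 20]]
seamed = [[10, 50, 200], [20, 60, 220]]

print(seam_score(seamless))
print(seam_score(seamed))
```

A score like this only tells you how big the mismatch is at the wrap point. Getting rid of the seam is a separate job, and the video does not say how the developers plan to do that.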

Future Tools

And if you want to nerd out with me on this kind of stuff, make sure you head over to futuretools.io. This is the site where I curate all of the cool tools that I come across. There's over 730 right now. If you want to narrow it down to just the ones that I think are the coolest that I come across, you can narrow it down to Matt's Picks. That'll narrow it down to 129. And if that's still a little bit overwhelming, join the free newsletter and every single Friday, I'll just send you the five coolest tools that I came across for the week. Head on over to futuretools.io. That's where you can access all of this stuff. And thanks so much for tuning in. If you haven't yet, please click the subscribe button and I'll make sure I make more cool AI videos for you. And make sure you give the thumbs up on this video as well. 'Cause if you like this video, you'll see more videos like it from me and from other really cool creators that are doing stuff like what I'm doing. So press that thumbs up button, press that subscribe button, and we'll make sure some cool new AI stuff keeps showing up into your feed. Thanks again for tuning in. I'll see you guys in the next one. Bye.