This Week in AI: Insanity Unleashed
Google DeepMind’s Gemini 1.5 and OpenAI’s Sora
This week has legitimately been one of the most insane weeks in AI. Some of the announcements we got will genuinely change the world and raise the bar for every AI company. Now, I know there will probably be plenty of comments accusing me of clickbait, but this title is justified. Just wait until you see the insanity that was this week in the world of AI.
Google DeepMind’s Gemini 1.5
I’m going to start with Thursday’s announcement from Google DeepMind: Gemini 1.5. It was only last week that we got Gemini Ultra, and this new Gemini 1.5 is pretty dang crazy. The new model uses a mixture-of-experts architecture. Mixture of experts is where, instead of one giant network, you have a collection of smaller expert networks plus a router. When you give the model a prompt, the router decides which of the experts to send it to before producing a response. This is a much more efficient way of running large language models, because only the chosen experts, a fraction of the model’s total parameters, have to process any given prompt.
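To make the routing idea concrete, here’s a minimal sketch of a mixture-of-experts layer. This is not Gemini’s actual implementation; the expert count, toy dimensions, and softmax gate are all illustrative assumptions.

```python
# A minimal mixture-of-experts sketch, NOT Gemini's actual implementation.
# Expert count, sizes, and the gating function here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 4   # hypothetical; real MoE models use many more experts
DIM = 8           # toy hidden dimension

# In this sketch, each "expert" is just a small feed-forward weight matrix.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]

# The router is a learned linear layer that scores each expert per input.
router_weights = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_layer(token: np.ndarray, top_k: int = 1) -> np.ndarray:
    """Route a token vector to its top-k experts and mix their outputs."""
    scores = token @ router_weights                  # one score per expert
    probs = np.exp(scores) / np.exp(scores).sum()    # softmax gate
    top = np.argsort(probs)[-top_k:]                 # pick the best expert(s)
    # Only the chosen experts run, which is why MoE inference is cheaper
    # than a dense model with the same total parameter count.
    return sum(probs[i] * (token @ experts[i]) for i in top)

out = moe_layer(rng.standard_normal(DIM))
print(out.shape)  # (8,)
```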
I even put out a tweet back in December that breaks down how the mixture-of-experts concept works, so I’ll link it below if you want to learn more. But that’s not the really crazy part of Gemini 1.5. Gemini 1.0 had a 32,000-token context window, giving you a combined 24,000 words of input and output from the large language model. Gemini 1.5, on the other hand, can now run up to 1 million tokens in production. One million tokens is around 750,000 words of combined input and output text. To put that in context: all seven Harry Potter books together contain roughly 1,084,170 words. With a 1-million-token context window, we are this close to being able to upload the entire Harry Potter series and ask questions about it.
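If you want to sanity-check those numbers, here’s the back-of-the-envelope math. The 0.75 words-per-token ratio is the usual rule of thumb for English text, not an exact conversion.

```python
# Back-of-the-envelope context-window math. The 0.75 words-per-token
# ratio is a common rule of thumb for English, not an exact figure.
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: int) -> int:
    return round(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(32_000))     # ~24,000 words  (Gemini 1.0)
print(tokens_to_words(1_000_000))  # ~750,000 words (Gemini 1.5)

# The full Harry Potter series is roughly 1,084,170 words, i.e. about
# 1.45 million tokens: close to, but still above, the 1M-token window.
print(round(1_084_170 / WORDS_PER_TOKEN))  # ~1,445,560 tokens
```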
Gemini 1.5 also has better understanding across modalities. It was given a 44-minute silent Buster Keaton film, and the model accurately analyzed various plot points and events, even reasoning about small details that could easily have been missed. No dialogue, no transcript, nothing on the text side: it worked out the plot from the video alone.
One concern with a context window that large is that many large language models tend to lose information buried inside long text. Claude, for example, has a 200,000-token context window, or roughly 150,000 words. Researchers run a test called the needle-in-a-haystack test: they take, say, a 150,000-word piece of text, bury a sentence or phrase somewhere in the middle of it, and then ask the model a question about that buried snippet. Tools like Claude and ChatGPT often struggle to find that one nuanced piece of information inside such a large block of text. Gemini 1.5, however, handles it. According to Google’s report, in a needle-in-a-haystack evaluation, where a small piece of text containing a particular fact or statement is purposely placed within a long block of text, 1.5 Pro found the embedded text 99% of the time in blocks of data as long as 1 million tokens.
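For the curious, here’s a minimal sketch of how a needle-in-a-haystack trial is constructed. The `ask_model` function is a hypothetical stand-in for whatever model API you’re evaluating; everything else is just the test setup.

```python
# A minimal needle-in-a-haystack harness. `ask_model` is a hypothetical
# stand-in for whatever LLM API you are evaluating; everything else is
# the test construction itself.
import random

def build_haystack(filler_sentences: list[str], needle: str,
                   depth: float = 0.5) -> str:
    """Bury the needle at a given relative depth inside filler text."""
    insert_at = int(len(filler_sentences) * depth)
    parts = filler_sentences[:insert_at] + [needle] + filler_sentences[insert_at:]
    return " ".join(parts)

def run_trial(ask_model, filler: list[str]) -> bool:
    """One trial: hide a known fact, then see if the model retrieves it."""
    needle = "The secret passphrase is 'blue-harvest-42'."
    haystack = build_haystack(filler, needle, depth=random.random())
    prompt = f"{haystack}\n\nWhat is the secret passphrase?"
    return "blue-harvest-42" in ask_model(prompt)

# Retrieval accuracy is then just the fraction of trials that pass,
# repeated across different depths and haystack lengths.
```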
OpenAI’s Sora
This week, we also got Sora from OpenAI, the most insane AI text-to-video model anybody has ever seen. It can generate videos up to 60 seconds long, and some of the videos it generates are just insanely realistic. This is mind-blowing. Sora is capable of generating images as well. OpenAI also released a technical report on it, which includes a lot of the demos we’ve already seen as well as a technical explanation of how the model actually works.
The Gemini 1.5 announcement alone would have been huge news for the week, but on the day it dropped, it was the biggest story in AI for all of about two hours. Then OpenAI stepped in, essentially saying we’re not going to let Google have a moment today, and unveiled something that blew everybody’s minds: Sora.
Stable Cascade, Chat with RTX, V-JEPA, and ElevenLabs
This week also brought Stable Cascade from Stability AI and Chat with RTX from NVIDIA. Meta released V-JEPA, which it calls a key step toward advancing machine intelligence with a more grounded understanding of the world. ElevenLabs introduced a new feature that lets you make money with your voice: you train a clone of your voice on their platform and let other people use it, earning cash rewards or site credits in return.
Mark Zuckerberg’s View on Meta Quest and Apple Vision Pro
This week, Mark Zuckerberg also shared his thoughts on the Apple Vision Pro and compared it to the Meta Quest. He claimed the Meta Quest is better in most respects but gave Apple credit for the Vision Pro’s eye tracking. It’s clear that the competition between Meta and Apple is pushing both companies to innovate further.