Gemini Pro Tops LLM Rankings | New Stealth Model Release in January – Grab Yours Now!

Breaking News from the LLM Arena

Some breaking news from the LLM arena: if you’re not aware, there’s a site called Chatbot Arena that benchmarks LLMs in the wild. It pits different LLMs against each other, and people from all over the world test them with their favorite prompts to see which one performs better.

You don’t know which model is which: you just enter a prompt and vote on whether Model A or Model B gave the better response. For the longest time, the top spots were held by GPT-4 models such as GPT-4 Turbo, GPT-4-0314 (March 2023), and GPT-4-0613 (June 2023).
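
To make the voting mechanism concrete, here is a minimal sketch of how blind pairwise votes could be turned into ratings with an Elo-style update. It is an illustration only: the K-factor of 32, the starting rating of 1000, the function names, and the model names and votes are all assumptions, not the arena's actual data or its exact rating method.

```python
from collections import defaultdict

K = 32            # assumed update step size (hypothetical choice)
START = 1000.0    # assumed starting rating for every model


def expected_score(r_a: float, r_b: float) -> float:
    """Elo expected score of A against B (a value between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_ratings(votes):
    """Apply Elo updates for a sequence of (model_a, model_b, winner) votes.

    winner is "a", "b", or "tie".
    """
    ratings = defaultdict(lambda: START)
    for model_a, model_b, winner in votes:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        s_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        delta = K * (s_a - e_a)
        # Whatever A gains, B loses, so the total rating stays constant.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)


# Made-up votes for illustration only
votes = [
    ("model-x", "model-y", "a"),
    ("model-y", "model-x", "tie"),
    ("model-x", "model-y", "a"),
]
ratings = update_ratings(votes)
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Because each vote only moves the two models involved, a newly added model needs many votes before its rating settles.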

Google Bard Surpasses GPT-4

Today, the people behind Chatbot Arena posted some breaking news: Google Bard just made a stunning leap, surpassing GPT-4 to take the second spot on the leaderboard. A big congrats to Google for the remarkable achievement. The race is heating up like never before, and I’m super excited to see what’s next for Bard and the Gemini Ultra release.

Update on Rankings

As of now, Google Bard has received just over 3,000 votes, whereas models that have been on the leaderboard longer have around 30,000. The current ranking shows GPT-4 Turbo with a rating of 1249 and Bard a close second at 1215. Take this with a grain of salt, though: the result is very new, and with roughly a tenth as many votes, Bard’s rating might shift around quite a bit.
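
As a rough sanity check on how close these two ratings are, the standard Elo formula can convert a rating gap into an expected head-to-head win rate. The sketch below assumes the leaderboard uses the usual Elo scale (base 10, divisor 400), which the text does not state; the function name is hypothetical.

```python
import math


def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected score of A vs B under the standard Elo formula (assumed scale: 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


gpt4_turbo = 1249  # rating quoted in the text
bard = 1215        # rating quoted in the text

p = elo_win_probability(gpt4_turbo, bard)
print(f"Expected GPT-4 Turbo win rate vs Bard: {p:.1%}")

# Sampling noise in an observed win rate shrinks like 1/sqrt(n), so a model
# with ~3,000 votes has roughly sqrt(30000 / 3000) times the noise of a model
# with ~30,000 votes.
noise_ratio = math.sqrt(30_000 / 3_000)
print(f"Relative noise with 10x fewer votes: about {noise_ratio:.2f}x")
```

Under that assumption, a 34-point gap corresponds to an expected win rate of roughly 55% for GPT-4 Turbo, which is close to a coin flip, and the smaller vote count makes Bard's number the noisier of the two.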

Initial Impressions of Google Bard

I ran a quick experiment on Chatbot Arena, testing Bard (January 24, Gemini Pro) against GPT-4-0613. The responses I received from Bard were impressive. The model seems to have a good understanding of complex questions and provides insightful answers.

Google’s AI Ambitions

There have been recent announcements regarding Google’s AI ambitions for 2024. Google CEO Sundar Pichai outlined key goals for the company, focusing on delivering advanced, safe, and responsible AI, improving knowledge, learning, creativity, and productivity, and fostering innovation on Google Cloud.

Challenges Faced by Google

However, Google faces challenges internally, with reports of layoffs and employee cynicism plaguing the company. The prevalent sense of burnout and disillusionment points to the urgent need for Google to prioritize employee well-being and foster a positive work culture.

AI-Powered Features and Collaborations

Google has been rolling out AI-powered features for education, classroom management, accessibility, and more. The recent collaboration between Hugging Face and Google aims to make open-source AI models more accessible to the research community.

Future of Google’s AI

With the upcoming release of Gemini Ultra and the potential of Google Bard, the competition in the AI arena is heating up. Google’s focus on AI innovation and collaboration with key players in the industry signals a promising future for the company.

What do you think about Google’s AI advancements and challenges? Share your thoughts in the comments below. Stay tuned for more updates from the LLM arena!


29 COMMENTS

  1. It's definitely better. I have a very localized, creative, and specific Brazilian way to test LLMs, and for the first time I voted for Bard instead of GPT-4 Turbo. Impressive. It even added a layer of localized cultural expression to the test.

  2. It looks like the 'bard-jan-24-gemini-pro' API might be using RAG with internet access, unlike 'gemini pro dev' or the GPT-4 API. This could help explain some of the huge jump in rankings, and it's a bit worrying for fairness: why compare models with internet access to models without? There's also likely a secret update to the Bard model that increases performance. I tested the 'bard-jan-24-gemini-pro' API, and it definitely seems better as an LLM than the 'gemini pro dev' API, which suggests it might be both a newer model and one with internet access. I'm not sure how much internet access would help in a regular conversation, though.

  3. Interesting. I routinely use free ChatGPT and free Bard for coding. ChatGPT was always better, but a few days ago, surprisingly, Bard solved a problem for me where GPT-3.5 failed. Then I played around with it some more, and it seemed smarter than usual. Hope Google keeps it up; it's great to have more options.

  4. I've been using Bard and Copilot (GPT-4) at the same time, and Bard has had better responses. I have found that both hallucinate. Bard many times just refuses to answer questions about health or controversies, while GPT-4 always has an answer. I also use both for coding. I had a very difficult time with ChatGPT, which just kept giving me the wrong code, but Bard was always on the right track.

  5. Google search versus LLMs is a good example of the innovator's dilemma. Google can either milk its search business model until it becomes irrelevant due to ChatGPT and others, or step into the chatbot game itself, which might speed up the decline of its "golden goose" even more.

  6. I've been in machine learning for over a decade. Some perspective: Google has been at the bleeding edge of AI for many years, since before OpenAI was founded. The same goes for Microsoft Research. There are always ups and downs, but at the end of the day it's worth remembering this: whatever breakthrough tech one of these tech giants has, within 6 months all the other tech giants have it too.

  7. Google still has 90% of the search traffic, but what is the overall total search traffic these days? My Google usage is way down, as I use GPT-4 for almost everything. (I do still use search when I need to check something critical, but Google’s search results quality has really declined of late.)

    The biggest difference is that I spend so much more time with GPT: There’s an entirely new usage dynamic that Google’s missing out on.

  8. If you want a reality check on current LLMs, ask them about something from a not-so-popular but searchable fictional universe, such as one that has a Wikia (maybe on Fandom) but no Wikipedia page. Ask about something that is easily searchable within that Wikia but is not available directly from its front page.
    As of this week, no LLM available for free was capable of retrieving such info for me, even when I tried to help it within prompts as much as I could.
    For me it was a huge disappointment and a cold shower as far as the rapid arrival of AGI goes. This is something any 10-year-old with knowledge of internet search can easily do, while the best LLMs quickly start googling some ridiculous stuff and either hallucinate answers or just admit they are unable to help.

  9. Google's Gemini Pro model sucks. Bard itself is mid but somewhat useful, and Google is still struggling. They could have had the best AI. I mean, they own the entire web and YouTube, so they could have trained the AI on the entire web and given Bard (which I know isn't Gemini Pro) real-time API access to watch videos with vision and read the transcripts, with the data of the entire web to help you with whatever. The potential is huge, but they just don't see it. So far OpenAI is the winner.