March 10, 2023 | V. "Juggy" Jagannathan
Big tech is at war, so to speak, a war ignited by the popularity of ChatGPT, which is based on a large language model (LLM). Google, Microsoft, Meta and Amazon have all jumped into the fray. So, what exactly are they all doing? What does it all mean? Where are we headed? That’s the storyline for this week’s blog.
Microsoft incorporated ChatGPT capabilities into its Bing search engine, and soon there was a flood of news surrounding its behavior. In a conversation with a New York Times reporter, it claimed that certain queries “make me feel sad and angry.” This projection of human-like responses is unfortunate, as these models have no notion of common sense or sentience. The topic is explored in more detail in this podcast from KUOW. And, of course, there were other problems, as outlined in this New York Times article.
Google released their Bard chatbot to a select group of testers. It, too, generated a lot of publicity, unfortunately of the negative kind, which sank Google’s stock price! But fundamentally, the capability is very similar to ChatGPT, and the problems are very similar as well, as the long-time skeptic Gary Marcus documents.
Meta released their GPT-3 clone, LLaMA, as a foundational model and application programming interface (API) for open use. They claim it is better than GPT-3 despite being 10x smaller. They optimized these models for inference, i.e., being able to run on smaller machines to do actual tasks. The tradeoff is basically to train for a longer time and with more data – a trick that DeepMind, a division of Google, has previously shown to be effective.
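To see why this tradeoff works, here is a back-of-the-envelope sketch using the common rule-of-thumb approximation that training compute is roughly C ≈ 6 × N × D floating-point operations for a model with N parameters trained on D tokens. The specific parameter and token counts below are illustrative, not LLaMA’s actual configuration:

```python
# Rule of thumb (an approximation, not an exact law): training compute
# C ≈ 6 * N * D FLOPs for N parameters and D training tokens.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs: C ≈ 6 * N * D."""
    return 6.0 * n_params * n_tokens

# Two ways to spend the same training budget (illustrative numbers):
# a large model on less data, or a 10x smaller model on 10x more data.
big = training_flops(175e9, 300e9)     # 175B parameters, 300B tokens
small = training_flops(17.5e9, 3000e9) # 17.5B parameters, 3,000B tokens

# The budgets match, but the smaller model is far cheaper to run at
# inference time, since inference cost scales with parameter count.
ratio = big / small
```

Under this approximation, the two training runs cost the same, yet the smaller model is roughly 10x cheaper to serve, which is exactly the inference-friendly tradeoff described above.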
OpenAI, not to be outdone by the Meta release, announced its own API for the ChatGPT tech, which they claim is 90 percent more cost-efficient.
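For a sense of what using that API looks like, here is a minimal sketch of the request body the chat-style endpoint expects, with a system message setting the assistant’s behavior and a user message carrying the prompt. The prompt text and helper function are illustrative; actually sending the request requires an API key and is omitted here:

```python
# A minimal sketch of a chat-completions request payload. The helper
# function and example prompt are hypothetical; sending the request
# (with an API key) is left out.

def build_chat_request(user_prompt: str,
                       system_prompt: str = "You are a helpful assistant.") -> dict:
    """Assemble a request payload for a chat-style LLM API."""
    return {
        "model": "gpt-3.5-turbo",  # the model behind ChatGPT at launch
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

request = build_chat_request("Summarize this week's AI news in one sentence.")
```

The role-tagged message list is what distinguishes this chat-style interface from the older single-string completion prompts.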
Amazon flexed its muscle here as well, announcing a partnership with Hugging Face to help democratize the use of large language models – on the AWS platform, of course.
Multimodal large language models (MLLMs)! There is a spate of new work showing up on arXiv – the open-access archive for technical papers in this space. Last year we saw diffusion models work wonders generating images from textual descriptions. This new crop of models, trained on images and text together, can incorporate images into their reasoning.
Google’s DeepMind developed a model last year that they call Flamingo, which integrates images and text. One can chat with this model, query its understanding of images it is shown, or have it describe short videos.
Microsoft uploaded a research paper this month about a new model, Kosmos-1 (one wonders what the schedule is for Kosmos-2?), that can reason over images. It can answer questions written on a page without using optical character recognition (OCR).
Amazon has developed its own reasoner over images and text, which they call multimodal chain-of-thought (CoT) reasoning. CoT prompting has been shown by Google and others to be an effective way to get LLMs to explain their reasoning and achieve higher accuracy.
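The core idea of CoT prompting is simple: instead of asking for an answer directly, the prompt includes a worked example so the model is nudged to spell out intermediate steps before answering. A toy sketch, with an entirely made-up exemplar:

```python
# A toy illustration of chain-of-thought (CoT) prompting. The worked
# exemplar below is hypothetical; real CoT prompts often include
# several such exemplars.

COT_EXEMPLAR = (
    "Q: A pen costs $2 and a notebook costs $3. How much do 2 pens "
    "and 1 notebook cost?\n"
    "A: 2 pens cost 2 * $2 = $4. 1 notebook costs $3. "
    "Total is $4 + $3 = $7. The answer is $7.\n"
)

def make_cot_prompt(question: str) -> str:
    """Prepend a worked exemplar so the model imitates step-by-step reasoning."""
    return COT_EXEMPLAR + f"Q: {question}\nA:"

prompt = make_cot_prompt("A book costs $5. How much do 3 books cost?")
```

The multimodal variant extends this by letting the intermediate reasoning steps refer to an accompanying image as well as the text.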
This article from The Verge and this one from Forbes are good snapshots of the frenzy of activity in this sphere right now.
One thing is clear with all the APIs and chatbots being released – we are at an inflection point. Stanford University’s Human-Centered Artificial Intelligence institute just released a report, “Generative AI: Perspectives from Stanford HAI” – a dozen viewpoints on the applications and threats enabled by generative AI. Interesting reading. The perspectives range from health care to legal to education to almost all walks of life – the “reinvention of work” itself.
With the current state of LLMs and generative AI, make sure you exercise a healthy dose of caution in using the tech. Veteran CBS reporter Lesley Stahl, in her 60 Minutes coverage of the technology, was horrified when ChatGPT confidently proclaimed that she had been working for NBC for 20 years!
Whether you want to plan a trip, eat something healthy, write a story, write a blog post, learn about coding, learn biology, or whatever else you are trying to do, the chatbot is ready to help – as long as you keep in mind that the advice or help you are getting may be completely off base.
And that leads to the fundamental problem yet to be addressed by these LLMs: the complete lack of any notion of common sense. I found this research exploring the commonsense dimension interesting. The work, done by researchers at the Allen Institute for AI, asks the question: “Do language models have coherent mental models of everyday things?” Sadly, they don’t. At least not yet.
I am always looking for feedback, and if you would like me to cover a story, please let me know! Leave me a comment below or ask a question on my blogger profile page.
“Juggy” Jagannathan, PhD, is an AI evangelist with four decades of experience in AI and computer science research.