March 10, 2023 | V. "Juggy" Jagannathan
Big tech is at war, so to speak: a war started by the popularity of ChatGPT, which is based on a large language model (LLM). Google, Microsoft, Meta and Amazon have all jumped into the fray. So, what exactly are they all doing? What does it all mean? Where are we headed? That’s the storyline for this week’s blog.
Microsoft incorporated ChatGPT capabilities into its Bing search engine, and soon there was a flood of news surrounding its capabilities. Responding to one query, it told a New York Times reporter that the exchange made it “feel sad and angry.” This projection of human-like responses is unfortunate, as these models have no notion of common sense or sentience. This topic is explored in more detail in this podcast from KUOW. And, of course, there were other problems, as outlined by this New York Times article.
Google released their Bard chatbot to a select group of testers. It, too, generated a lot of publicity, unfortunately of the negative kind, which sank Google’s stock price! But fundamentally, the capability is very similar to ChatGPT, and the problems are very similar as well, as long-time skeptic Gary Marcus documents.
Meta released their GPT-3 clone, LLaMA, as a foundation model, with weights available to the research community. They claim it is better than GPT-3 despite being 10x smaller. They optimized these models for inference, i.e., being able to run on smaller machines to do actual tasks. The tradeoff is basically to train for a longer time and with more data – a trick DeepMind, a division of Google, has previously shown to be effective.
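To get a feel for that tradeoff, here is a back-of-the-envelope sketch in Python. The roughly 20-tokens-per-parameter rule of thumb comes from DeepMind’s Chinchilla scaling work, and the 1-trillion-token figure is from Meta’s paper; the helper function itself is just my illustration:

```python
# Rough sketch of the "smaller model, more data" tradeoff.
# The ~20 tokens-per-parameter heuristic comes from DeepMind's
# Chinchilla scaling results; exact constants are illustrative.

def chinchilla_optimal_tokens(num_params: int) -> int:
    """Approximate compute-optimal training tokens for a model size."""
    return 20 * num_params

llama_7b = 7_000_000_000
print(f"{chinchilla_optimal_tokens(llama_7b):,}")  # ~140,000,000,000

# Meta reportedly trained LLaMA-7B on ~1 trillion tokens -- roughly 7x
# past the compute-optimal point -- spending extra training compute to
# get a small model that is cheap to run at inference time.
```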
OpenAI, not to be outdone by the Meta release, released their own API to the ChatGPT tech, which they claim is 90 percent cheaper than their earlier GPT-3.5 models.
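For the developers out there, a call to the new API looks roughly like this – a minimal sketch using the openai Python library as of this writing; the placeholder key and the prompt are my own:

```python
# Minimal sketch of calling the ChatGPT API (gpt-3.5-turbo) with the
# openai Python library as of March 2023. Key and prompt are placeholders.
import openai

openai.api_key = "YOUR_API_KEY"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this week's AI news in two sentences."},
    ],
)
print(response["choices"][0]["message"]["content"])
```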
Amazon flexed its muscles here as well, announcing a partnership with Hugging Face to help democratize the use of large language models – using, of course, the AWS platform.
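In practice, this centers on running Hugging Face models on Amazon SageMaker. A sketch of what deploying a Hub model there looks like – the model ID, library versions and instance type below are my illustrative choices, not from the announcement:

```python
# Sketch: deploying a Hugging Face Hub model as a SageMaker endpoint.
# Model ID, versions and instance type are illustrative choices.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # IAM role with SageMaker permissions

model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "google/flan-t5-base",   # model to pull from the Hub
        "HF_TASK": "text2text-generation",      # inference task
    },
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")
print(predictor.predict({"inputs": "Translate to French: Hello, world."}))
```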
Multimodal large language models (MLLM)! There is a spate of new work showing up on arXiv – the open access archive for technical papers in this space. Last year we saw diffusion models work wonders generating images from textual descriptions. The new crop of models, which take both images and text as input, can incorporate images in their reasoning.
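As a reminder of just how accessible the text-to-image side has become, here is a minimal sketch using the open-source diffusers library – the model name and prompt are my own illustrative choices:

```python
# Minimal sketch of text-to-image generation with a diffusion model,
# via Hugging Face's diffusers library. Model and prompt are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # diffusion sampling is GPU-hungry

image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```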
Google’s DeepMind developed a model last year that they call Flamingo, which integrates images and text. One can chat with this model, query its understanding of images it is shown, or have it describe short videos.
Microsoft uploaded a research paper this month about a new model, Kosmos-1 (one wonders what the schedule is for Kosmos-2?), that can reason over images. It can answer questions written on a page without using optical character recognition (OCR).
Amazon has developed its own reasoner over images and text, which they call multimodal chain-of-thought (CoT) reasoning. CoT has been shown by Google and others to be an effective way to get LLMs to explain their reasoning and improve their accuracy.
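For those who haven’t seen CoT prompting, the core trick is simply to ask the model to show its work before answering. A toy sketch – the prompt and the arithmetic are my own example, not from Amazon’s paper:

```python
# Toy sketch of chain-of-thought prompting: nudge the model to produce
# intermediate reasoning steps before the final answer. Example is mine.
prompt = (
    "Q: A clinic sees 4 patients an hour, 6 hours a day. "
    "How many patients does it see in a 5-day week?\n"
    "A: Let's think step by step."
)

# Sent to an LLM, this tends to elicit reasoning like:
#   4 patients/hour x 6 hours = 24 patients/day
#   24 patients/day x 5 days = 120 patients/week
# followed by the answer "120" -- and prompting for such step-by-step
# traces has been shown to improve accuracy on multi-step problems.
```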
This article from The Verge and this one from Forbes are good snapshots of the frenzy of activity in this sphere right now.
One thing is clear with all the APIs and chatbots being released – we are at an inflection point. The Stanford Institute for Human-Centered Artificial Intelligence just released a report, “Generative AI: Perspectives from Stanford HAI” – a dozen viewpoints on the applications and threats enabled by Generative AI. Interesting reading. The perspectives range from health care to legal to education to almost all walks of life, including the “reinvention of work” itself.
With the current state of LLMs and Generative AI, make sure you exercise a healthy dose of caution in using the tech. Veteran CBS reporter Lesley Stahl, in her coverage of this tech on 60 Minutes, was horrified when ChatGPT confidently proclaimed she had been working for NBC for 20 years!
Whether you want to plan a trip, eat something healthy, write a story, write a blog post, learn about coding, learn biology, or whatever else you are trying to do, the chatbot is ready to help – as long as you keep in mind that the advice or help you are getting may be completely off base.
And that leads to the fundamental problem yet to be addressed by these LLMs: the complete lack of any notion of common sense. I found this research exploring the commonsense dimension interesting. The work, done by researchers at the Allen Institute for AI, asks the question: “Do language models have coherent mental models of everyday things?” Sadly, they don’t. At least not yet.
I am always looking for feedback and if you would like me to cover a story, please let me know! Leave me a comment below or ask a question on my blogger profile page.
“Juggy” Jagannathan, PhD, is an AI evangelist with four decades of experience in AI and computer science research.