As revealed in public developer documents, Elon Musk’s artificial intelligence (AI) company, xAI, is working on integrating multimodal inputs into its Grok chatbot. This development implies that users will soon be able to upload images to Grok and receive text-based responses.
A recent xAI blog post teased that the upcoming Grok-1.5V version will introduce “multimodal models across various domains.” The latest updates to the developer documents suggest the company is moving towards releasing a new model.
The developer documents include a sample Python script showing how developers can use the xAI software development kit (SDK) to generate responses based on both text and images. The script reads an image file, sets up a text prompt, and passes both to the xAI SDK to produce a response.
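The workflow described above can be sketched roughly as follows. This is not the script from the developer documents and does not use the real xAI SDK: the function names `build_multimodal_request` and `send_to_model`, and the payload field names, are hypothetical stand-ins chosen only to illustrate the three steps of reading an image, setting a prompt, and requesting a response.

```python
import base64
from pathlib import Path


def build_multimodal_request(image_path: str, prompt: str) -> dict:
    """Read an image file and pair it with a text prompt (hypothetical payload shape)."""
    image_bytes = Path(image_path).read_bytes()
    return {
        "prompt": prompt,
        # Base64 keeps the binary image data safe inside a text-based payload.
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
    }


def send_to_model(request: dict) -> str:
    """Hypothetical stand-in for an SDK call; it does not contact any service."""
    size = len(base64.b64decode(request["image_base64"]))
    return f"[placeholder response to {request['prompt']!r} with a {size}-byte image]"
```

In a real integration, the stub `send_to_model` would be replaced by whatever call the published SDK exposes once the multimodal model is available.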
Enhancements for Grok users
Grok, initially launched by xAI in November 2023, is accessible to users subscribed to the X Premium Plus service. The most recent update, Grok-1.5, introduced enhanced reasoning capabilities to the platform in March 2024.
The model is trained on various textual data from publicly available sources up to Q3 2023 and datasets meticulously reviewed by human evaluators. While Grok-1 was not trained on xAI data, it possesses real-time knowledge of the world, including information from X posts.
Founded by Elon Musk in March 2023, xAI is a newcomer to the AI industry, lagging behind competitors like OpenAI, the maker of ChatGPT. However, xAI’s blog post highlights that its Grok-1.5 model is narrowing the gap with GPT-4 across different benchmarks, covering a broad spectrum of academic problems from grade school to high school.
Challenges in benchmarking large language models
Benchmarking large language models can be contentious. Models may excel in benchmarks if the data is part of their training set, akin to memorising answers rather than understanding the content. Despite these challenges, xAI is making significant strides with Grok’s development.
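The memorisation concern can be made concrete with a simple overlap check. The sketch below is one common heuristic, not a method attributed to xAI: it measures what fraction of a benchmark item's word n-grams also appear in a sample of training text. The function names and the default window of five words are assumptions made for illustration.

```python
def ngrams(text: str, n: int) -> set:
    """Return the set of word n-grams in a text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(benchmark_item: str, training_text: str, n: int = 5) -> float:
    """Fraction of the benchmark item's n-grams that also occur in the training text."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        # The item is shorter than n words, so there is nothing to compare.
        return 0.0
    return len(item_grams & ngrams(training_text, n)) / len(item_grams)
```

A ratio near 1 suggests the question may have appeared in the training data, in which case a high score on that item says little about understanding. A low ratio does not prove the item is clean, since paraphrased copies would slip past this check.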
The landscape of AI is evolving towards multimodal conversational chatbots, with notable advancements such as those announced at Google I/O and OpenAI’s release of GPT-4o. Grok’s integration of multimodal capabilities signifies a step forward in keeping pace with industry trends and enhancing the user experience.