Friday, 31 January 2025

Gemini AI makes office robots more useful

Google’s Gemini AI improves robotic navigation, combining natural language and visual instructions for seamless indoor assistance.

Do you need help in an office building, big box store, or warehouse? Just ask the nearest robot for directions.

Google researchers have unveiled a breakthrough in robotic navigation that combines natural language processing with computer vision. This new study, published on Wednesday, highlights their work on an everyday robot that navigates indoor spaces using simple language prompts and visual inputs.

Transforming robotic navigation

Previously, robots required detailed environmental maps and specific physical coordinates to move around. This cumbersome process limited their utility. However, with recent advancements in vision language navigation, you can now instruct robots using natural language commands, such as “go to the workbench.” Google’s team has pushed this further by enabling robots to understand and act on spoken and visual instructions simultaneously.

Imagine you’re in a warehouse and need to find where an item belongs. You can show the robot the item and ask, “What shelf does this go on?” Powered by Gemini 1.5 Pro, the AI interprets both your question and the visual data, then guides you to the correct spot. The robots were also tested with commands such as “Take me to the conference room with the double doors,” “Where can I borrow some hand sanitizer?” and “I want to store something out of sight from public eyes. Where should I go?”

In an Instagram Reel demonstration, a researcher activated the system with an “OK robot” and requested to be led to a place where he could draw. The robot responded, “Give me a minute. Thinking with Gemini,” before quickly navigating the 9,000-square-foot DeepMind office to a large wall-mounted whiteboard.

The magic behind the navigation

These robots weren’t entirely unfamiliar with the office layout. The team used “Multimodal Instruction Navigation with Demonstration Tours (MINT).” Initially, they manually guided the robot around the office, pointing out specific areas and features using natural language. This can also be achieved by recording a video tour of the space with a smartphone. The AI then creates a topological graph, matching what its cameras see with the “goal frame” from the video.
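To make the idea of a topological graph concrete, here is a minimal Python sketch, not the researchers’ implementation. It assumes each tour frame comes with a recorded (x, y) position, links frames that lie close together, and finds a route to the goal frame with Dijkstra’s algorithm. The function names and the 2-metre edge threshold are hypothetical choices made for illustration.

```python
import heapq
import math


def build_tour_graph(poses, max_edge_m=2.0):
    """Link tour frames whose recorded positions are within max_edge_m.

    poses is a list of (x, y) positions, one per tour frame (an assumption:
    the article does not say how frame positions are obtained).
    """
    graph = {i: [] for i in range(len(poses))}
    for i, (xi, yi) in enumerate(poses):
        for j in range(i + 1, len(poses)):
            xj, yj = poses[j]
            d = math.hypot(xi - xj, yi - yj)
            if d <= max_edge_m:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm over the frame graph; returns None if unreachable."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, w in graph[node]:
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

Given the graph, the robot only needs to know which frame it is currently nearest to and which frame is the goal; the path between them is a list of intermediate frames to drive through.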

Next, the team implemented a hierarchical Vision-Language-Action (VLA) navigation policy. This policy combines environmental understanding with common-sense reasoning, enabling the AI to translate user requests into navigational actions.
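The two levels of such a policy can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the high level uses simple word overlap against hypothetical per-frame notes as a stand-in for the vision-language model, and the low level treats the tour frames as a simple chain and steps one frame at a time toward the goal.

```python
def pick_goal_frame(request, frame_notes):
    """High level: choose the tour frame whose note best matches the request.

    Word overlap is a crude stand-in for the vision-language model.
    """
    words = set(request.lower().split())
    scores = [len(words & set(note.lower().split())) for note in frame_notes]
    return max(range(len(scores)), key=scores.__getitem__)


def next_waypoint(current_frame, goal_frame):
    """Low level: move one frame along the tour toward the goal frame."""
    if current_frame == goal_frame:
        return current_frame
    return current_frame + (1 if goal_frame > current_frame else -1)


def navigate(request, frame_notes, start_frame=0):
    """Run both levels: pick a goal once, then step until it is reached."""
    goal = pick_goal_frame(request, frame_notes)
    route = [start_frame]
    while route[-1] != goal:
        route.append(next_waypoint(route[-1], goal))
    return goal, route


# Hypothetical notes a person might narrate during the demonstration tour.
notes = [
    "front lobby entrance",
    "kitchen with coffee machine",
    "large whiteboard wall for drawing",
    "conference room with double doors",
]
goal, route = navigate("take me to the conference room with the double doors", notes)
```

The split matters because the slow reasoning step runs once per request, while the cheap waypoint step runs repeatedly as the robot moves.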

The results were impressive, with robots achieving “86 percent and 90 percent end-to-end success rates on previously infeasible navigation tasks involving complex reasoning and multimodal user instructions in a large real-world environment,” according to the researchers.

Room for improvement

Despite these successes, there is still work to be done. The robots cannot yet autonomously perform their demonstration tours, and the AI’s response time ranges from 10 to 30 seconds, making interactions slower than desired. The researchers are aware of these limitations and are working on enhancing the system’s efficiency and autonomy.

This innovation marks a significant leap in robotic navigation, bringing us closer to a future where robots can seamlessly assist in complex indoor environments using natural language and visual cues.
