Modern Human-Machine Interfaces (HMIs) are evolving beyond single modes of interaction. Multimodal HMI, which combines voice recognition, touch/tactile controls, gesture input, and intelligent agent support, is rapidly becoming the norm across industries, from automotive to industrial and consumer electronics. This trend is driven by a simple truth: no single interface method is perfect on its own, but together they can complement each other’s strengths. By harnessing multiple input modalities, designers can create user experiences that are more intuitive, safer, and more personalized than ever before.

In this article, we explore how voice, touch, gestures, and AI-based assistants are shaping the future of HMI, the challenges of integrating these technologies, and what it all means for user experience across different sectors.

A natural evolution towards multi-modal interfaces

For decades, interacting with machines meant pressing physical buttons and turning knobs. 

Today, those traditional controls are giving way to smart displays, voice assistants, and mid-air hand gestures. The automotive sector has been a bellwether for this change: luxury cars introduced touchscreens and voice commands years ago, and even added features like gesture control (as seen in BMW’s 7 Series) as optional enhancements. 

Now, this multi-modal approach is sweeping across most vehicle segments and inspiring similar shifts in other fields. Users have grown to expect the same kind of seamless, multimodal HMI in their cars, industrial equipment, boats, and even home appliances.

Not only do multiple input methods give users choice in how they interact, they also improve overall reliability and accessibility. Each interface modality has pros and cons: touchscreens offer rich visual feedback but can be hard to use with gloves; voice control allows hands-free operation but might falter in noisy environments; gestures can feel intuitive but often require learning proper motions. 

A multimodal interface allows one method to fill in the gaps of another. For example, a driver might use a quick hand gesture to skip a music track, then a voice command to set navigation, and finally touch a screen icon to confirm a selection – whatever is most natural at that moment. 

This flexibility not only caters to personal preference but also improves accessibility for users with different abilities or situational needs.
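To make this concrete, here is a minimal sketch, in Python, of the underlying idea: every modality produces the same kind of intent, so the action layer does not care which channel the user chose, and any modality can stand in for another. The names (Intent, HANDLERS, dispatch) are illustrative assumptions, not taken from any specific HMI framework.

```python
# Purely illustrative sketch: several input modalities producing the same intent,
# handled by one shared action layer.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Intent:
    name: str       # e.g. "media.skip_track"
    modality: str   # "voice", "touch" or "gesture", kept for logging/analytics

def skip_track(intent: Intent) -> None:
    print(f"Skipping track (requested via {intent.modality})")

def set_destination(intent: Intent) -> None:
    print(f"Starting navigation (requested via {intent.modality})")

# One handler per intent, regardless of which modality produced it.
HANDLERS: Dict[str, Callable[[Intent], None]] = {
    "media.skip_track": skip_track,
    "nav.set_destination": set_destination,
}

def dispatch(intent: Intent) -> None:
    handler = HANDLERS.get(intent.name)
    if handler is not None:
        handler(intent)

# The same action can arrive from whichever modality is most natural right now:
dispatch(Intent("media.skip_track", "gesture"))   # swipe in mid-air
dispatch(Intent("nav.set_destination", "voice"))  # "take me home"
dispatch(Intent("media.skip_track", "touch"))     # tap on the screen icon
```

Because the handlers never depend on the input channel, adding, disabling, or falling back between modalities does not change what the system can do.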


Voice recognition comes of age in HMI

Voice recognition technology has improved significantly in recent years, to the point that talking to our devices has become commonplace. 

In vehicles especially, voice control has surged in adoption as drivers demand safer, hands-free ways to interact with infotainment and controls. Virtual assistants like Amazon Alexa and Google Assistant – as well as automakers’ own AI voice agents – are now taking center stage in the car, allowing drivers to adjust climate settings, get directions, or even send messages without taking their hands off the wheel. 

Crucially, advances in natural language processing (NLP) mean these systems are becoming far more intuitive, understanding context, regional accents, and user intent with much greater accuracy.

From a user experience perspective, voice interfaces can be empowering, since they let people accomplish tasks while staying focused on primary activities (like driving or operating machinery). Studies have noted that voice control is a safer alternative to manual interfaces because it reduces the need for drivers to look away from the road or take their hands off the controls. A well-implemented voice HMI can feel like a natural conversation with your car or device, rather than a series of rigid commands.

However, designing a good voice interface also requires understanding its limitations. 

Background noise, multiple speakers, or strong accents can still pose challenges for accurate recognition. Automakers and device manufacturers are working to improve microphone arrays and noise cancellation so that voice commands can be understood even in a noisy factory or at motorway speeds. They also face the task of making voice responses genuinely helpful – an intelligent agent should confirm or clarify a command it is unsure about, rather than just failing silently.
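As a rough illustration of that confirm-or-clarify behaviour, here is a small Python sketch; the confidence thresholds and function name are assumptions for the example, not a real speech-recognition API.

```python
# Sketch of "confirm or clarify instead of failing silently".
# Thresholds are illustrative assumptions, not tuned values.
def respond_to_voice(transcript: str, confidence: float) -> str:
    """Choose an assistant response based on recognition confidence."""
    if confidence >= 0.85:
        return f"OK: {transcript}"                       # act immediately
    if confidence >= 0.50:
        return f'Did you mean "{transcript}"? Say yes to confirm.'
    # Too uncertain to guess: ask for a repeat and offer another modality.
    return "Sorry, I didn't catch that. Please repeat, or use the touchscreen."

print(respond_to_voice("set temperature to 21 degrees", 0.92))
print(respond_to_voice("call home", 0.63))
print(respond_to_voice("unclear audio", 0.20))
```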

Despite these challenges, the trajectory of voice HMI is clearly upward, with continual AI-driven improvements making voice a central pillar of multimodal systems.

The enduring importance of touch and tactile feedback

Touchscreens have become nearly ubiquitous in modern HMIs – from the tablet-like displays in car dashboards to the control panels of industrial machines. A well-designed touch interface is genuinely engaging: vivid graphics, swipe and pinch gestures, and context-sensitive menus can make interacting with a system feel almost as familiar as using a smartphone.

Modern HMIs are indeed highly touch-oriented, offering functions like swiping, tapping, and pinch-to-zoom to enable intuitive and efficient interaction. In consumer and automotive contexts, large high-resolution displays provide rich visual feedback and can adapt to show exactly the information needed at any given moment.

Yet, even as touch interfaces dominate, designers have learned that tactile feedback and physical controls still hold a vital place. 

Not all functions are best served by a flat screen. “Haptic elements are still extremely important, especially for the ‘blind’ operation of important functions like driving or when you need clear feedback,” notes one industrial HMI expert.

In a car, for instance, a physical knob for volume or temperature allows adjustment by feel, without requiring the driver’s eyes on a screen. In critical environments – say, adjusting a machine setting in a factory – a solid button press or a detent in a dial provides reassurance that the command has been registered. Moreover, environmental conditions can interfere with touch: cold weather can render touchscreens sluggish or unusable with gloved hands, and bright sunlight or water on a screen can make touch inputs unreliable. 

For these reasons, the most effective HMI designs often blend touch with tactile feedback.

Touch technology itself is advancing to bridge the gap with physical feedback. Haptic feedback on touchscreens (through vibrations or adaptive textures) can simulate the feeling of pressing a button. Some research prototypes even explore mid-air haptics – using ultrasound or air jets to provide touch sensations without contact – though these are still experimental. 

Regardless, the goal is to give users confidence that their touch inputs have been registered and will take effect, reducing the uncertainty that can come with purely digital controls.

Gesture control: adding a new dimension to interaction

Gestural interfaces, using hand movements or body posture to issue commands, add a futuristic flair to HMIs. 

In cars, gesture control gained attention when premium models allowed drivers to adjust the volume or answer calls with a wave of the hand. The idea is appealing: gestures are contactless (useful for hygiene and convenience) and can be quicker than hunting for a button. They also have cross-industry appeal, for example, in medical or industrial settings, gesture control can allow an operator to manipulate a system without physical touch, which is valuable if their hands are occupied or sterile.

However, gestures in practice have seen mixed success, precisely because they introduce a new modality that users must learn. A swipe in the air isn’t as universal as a tap on a screen: the system must be trained to recognize specific motions, and the user must perform them correctly.

“For gesture controls to work well, the user needs to be trained on proper hand placement and movement,” one automotive analyst explained, noting the challenges of using gestures in a moving vehicle where bumps and vibrations can confuse the sensors. 

Lighting conditions and camera quality also impact reliability. Early implementations in cars have sometimes left drivers unsure whether the system caught their gesture, leading to repeated tries or simply reverting to voice or touch. Nonetheless, ongoing advances in computer vision and sensor technology are making gesture recognition more robust. 

Depth cameras and infrared sensors (some borrowed from gaming and AR systems) help detect gestures in varied conditions. In the future, we may see more natural gestural vocabularies, for instance, simply pointing at something on a screen and having the system respond, or using gaze and gesture together to select an object. There’s strong interest in making gesture control truly seamless, because when it works, it can feel like magic.

Outside automotive, gestures are already proving valuable: in factories, an engineer can swipe in mid-air to flip through schematics without touching a dirty screen, and in operating theatres, surgeons can navigate imaging without touching a surface and breaking sterility. These use cases underscore that gesture input can find its niche as part of a multimodal HMI, as long as it’s introduced with user training, proper feedback (visual or auditory cues confirming the gesture), and fallback options if the gesture is missed.
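That feedback-and-fallback pattern can be sketched very simply; the gesture names, the confidence threshold, and the cue descriptions below are illustrative assumptions rather than any production gesture stack.

```python
# Sketch: confirm a recognised gesture with a cue, fall back when it's missed.
GESTURE_ACTIONS = {
    "swipe_right": "next_page",
    "swipe_left": "previous_page",
    "point_and_hold": "select_item",
}

def on_gesture(gesture: str, confidence: float) -> str:
    action = GESTURE_ACTIONS.get(gesture)
    if action is not None and confidence >= 0.8:
        # Give immediate visual/auditory confirmation that the gesture was caught.
        return f"chime + highlight, then execute '{action}'"
    # Missed or ambiguous: don't guess; point the user to another modality.
    return "show hint: gesture not recognised, tap the screen or use voice instead"

print(on_gesture("swipe_right", 0.93))   # confident: act and confirm
print(on_gesture("swipe_right", 0.55))   # uncertain: fall back gracefully
```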

Intelligent agents and AI-driven assistance

Perhaps the most transformative element in the future of HMI is the rise of intelligent agents – AI-powered assistants that can understand context, learn from user behavior, and even make proactive suggestions.

Unlike traditional interfaces that simply wait for user input, an intelligent HMI agent can carry on a dialogue and handle complex tasks. 

In cars, this could be a voice-activated assistant that not only follows commands, but also offers help: “You seem to be low on fuel, shall I find a nearby petrol station?” or “It’s 8 PM, would you like to call home?” Such agents draw on cloud connectivity, the vehicle’s sensors, and knowledge of the user’s preferences to deliver a more personalized experience.
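A trivial, rule-based sketch of that proactive behaviour might look like the snippet below; the signal names (fuel level, time of day) and thresholds are assumptions, and a production agent would fuse far more context than this.

```python
# Sketch: proactive suggestions derived from simple vehicle/context signals.
from datetime import datetime
from typing import List

def proactive_suggestions(fuel_level_pct: float, now: datetime) -> List[str]:
    suggestions = []
    if fuel_level_pct < 15:
        suggestions.append("You seem to be low on fuel, shall I find a nearby petrol station?")
    if now.hour == 20:
        suggestions.append("It's 8 PM, would you like to call home?")
    return suggestions

for text in proactive_suggestions(fuel_level_pct=12, now=datetime(2025, 1, 1, 20, 5)):
    print(text)
```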

Today’s in-vehicle voice assistants already show glimmers of this. Many new models integrate AI assistants (some via smartphone platforms, others proprietary) that use natural language understanding. These systems are becoming more conversational, so drivers can speak almost naturally and still be understood.

According to recent analysis, automakers are investing heavily in AI capabilities to create predictive and personalised interfaces, aiming to have vehicles adjust settings and content to individual drivers automatically. For example, the HMI might learn a driver’s routine and favorite climate settings, so each morning the car is pre-warmed and tuned to a preferred radio station, without a single button press.

The impact of intelligent agents isn’t limited to cars. 

In smart homes, voice assistants (Alexa, Google, etc.) have become the de facto HMI for many tasks – from controlling lights to answering questions – essentially acting as the interface for the entire home ecosystem. In industrial control rooms, AI-based advisory systems can watch over processes and alert operators to anomalies, effectively sharing the control interface with the human operators. The key benefit is a genuine partnership: the AI handles routine or data-heavy aspects while the human stays in charge of decisions.

Of course, the integration of AI raises its own design considerations. 

A user must trust the intelligent agent – trust that it understands their voice commands, respects their privacy, and provides useful help rather than annoyance. Transparency is important: if an AI agent adjusts something proactively, the HMI should clearly inform the user about what changed and why.

As these agents grow more capable (with advancements in machine learning and context awareness), we can expect HMIs to shift from being mere control panels to becoming collaborative partners. The future of multimodal HMI may well involve an intelligent co-pilot by your side – one that you can talk to, touch via a screen, or even signal with a gesture, and it will seamlessly interpret and respond across all these channels.


Integration challenges and user expectations

Implementing a fully multimodal HMI is not without challenges. 

Combining various input methods

One major hurdle is ensuring that the different input methods work together smoothly and without confusing the user. The system must decide, for example, what happens if a voice command and a touch input occur at nearly the same time – which one takes priority? There must also be a consistent design language so that whether a user speaks or taps, they receive coherent feedback. Integration needs careful planning and lots of testing.
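One simple way to resolve that kind of conflict is a priority rule applied within a short time window, sketched below in Python; the window length and the priority order are illustrative assumptions, not an established standard.

```python
# Sketch: arbitration between near-simultaneous inputs from different modalities.
from dataclasses import dataclass

ARBITRATION_WINDOW_S = 0.5                           # "near-simultaneous" window (assumed)
PRIORITY = {"touch": 3, "gesture": 2, "voice": 1}    # explicit touch wins (assumed)

@dataclass
class InputEvent:
    modality: str
    intent: str
    timestamp: float   # seconds since some reference point

def arbitrate(a: InputEvent, b: InputEvent) -> InputEvent:
    """Pick which of two conflicting events the HMI should act on."""
    if abs(a.timestamp - b.timestamp) > ARBITRATION_WINDOW_S:
        # Not actually simultaneous: just honour the later input.
        return a if a.timestamp > b.timestamp else b
    return a if PRIORITY[a.modality] >= PRIORITY[b.modality] else b

voice = InputEvent("voice", "volume_up", timestamp=10.00)
touch = InputEvent("touch", "volume_down", timestamp=10.20)
print(arbitrate(voice, touch).modality)   # -> "touch"
```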

Context-awareness 

A well-designed multimodal system will understand the context and perhaps even suggest the best interaction mode for the situation. For instance, if the car’s microphones detect a lot of ambient noise (windows down, highway speeds), the system could proactively highlight touch or gesture controls knowing voice recognition confidence might be low. Conversely, if the driver’s hands are occupied, the system could encourage voice input. Achieving this level of adaptive behavior requires complex sensor fusion and AI logic behind the scenes.
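As a sketch of that adaptive behaviour, the snippet below ranks modalities from two context signals; the signal names and thresholds are assumptions, and a real system would rely on proper sensor fusion rather than one number and a boolean.

```python
# Sketch: suggest the most promising modalities for the current context.
from typing import List

def preferred_modalities(ambient_noise_db: float, hands_busy: bool) -> List[str]:
    ranking = ["voice", "touch", "gesture"]
    if ambient_noise_db > 70:
        # Windows down / motorway speeds: voice recognition confidence drops.
        ranking = ["touch", "gesture", "voice"]
    if hands_busy:
        # Hands occupied: encourage voice first, even if it may need confirmation.
        ranking = ["voice"] + [m for m in ranking if m != "voice"]
    return ranking

print(preferred_modalities(ambient_noise_db=75, hands_busy=False))  # touch first
print(preferred_modalities(ambient_noise_db=40, hands_busy=True))   # voice first
```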

User experience

People now expect their car’s HMI or their machine’s control tablet to be as responsive and user-friendly as their personal smartphone. This means fast response times, intuitive interfaces, and minimal learning curve. It also means consistency – if a user can pinch-zoom on a map in one context, they’ll expect that gesture to work on other screens or devices. Lack of standardisation across brands and models can frustrate users. Experts have called for more common frameworks for voice commands and other interactions across vehicles, to reduce confusion. The industry is gradually moving in that direction, with collaborations between automakers and tech companies to align on certain voice assistant behaviors and gesture vocabularies.

Hardware and environment 

High-quality microphones, cameras, and touch sensors add cost and complexity. Making sure that voice recognition copes with heavy accents or multiple languages, that touchscreens stay readable in bright sun, and that gestures aren’t triggered by unintentional movements – these are the gritty details that HMI engineers grapple with. Users may not think about them explicitly, but any failure in these areas quickly erodes trust.

An honest approach is to acknowledge these limitations and provide fail-safes: if voice fails, allow an easy switch to manual control; if a gesture is missed, provide subtle feedback or ask for confirmation rather than executing the wrong action.

Cross-industry insights: from cars to cockpits to factory floors

While a lot of multimodal HMI innovation is highlighted in cars, similar trends and lessons are playing out in other industries. 

In industrial automation, for example, there is a strong recognition that no single interface technology fits all operational needs. 

A manufacturing plant’s control system might combine a touchscreen panel with physical emergency stop buttons, voice command capabilities for maintenance technicians, and even foot pedals or gesture sensors for situations where hands are busy. This “mixed technology” approach provides a holistic solution that enhances operator usability and safety. Integrating haptic elements like buttons and knobs alongside touch, voice, and gesture interaction results in more intuitive and flexible operation – allowing users to tailor the interface to their context and ergonomic needs.

Other sectors have their own take on multimodal HMI. In the maritime domain, modern ship bridges and leisure boats are adopting touch displays and voice controls for navigation systems and onboard services. But given the possibility of rough seas and noisy engine rooms, these systems still include rugged physical controls and large, clear indicators for reliability. 

By the way, if marine HMI is something you’re interested in, make sure to check out our marine HMI solution.

In medical devices, surgeons might use voice commands to adjust settings on a machine when their hands are occupied, or foot gestures to navigate an on-screen menu, all while a nurse can still fall back to pressing a physical switch if needed – different modalities working in concert to ensure patient safety and efficiency.

Cross-industry learning is accelerating. Automotive HMI innovations (like sleek UI design and AI assistants) are inspiring user-friendly improvements in industrial and medical contexts. 

Conversely, the discipline and safety focus of industrial HMI is feeding back into automotive design: for instance, the realisation that too much touch-screen dependency can be dangerous without tactile backups, or that interface design must accommodate users of varying skill levels. In summary, whether it’s a car, a combine harvester, a marine navigation system or a smart fridge, the core principles of multimodal HMI remain the same: combine modalities thoughtfully to serve the user’s needs in any situation, and leverage each technology where it fits best.

The future of multimodal HMI is exciting, with rapid technological advances on the horizon. 

Augmented reality interfaces

One prominent trend is the expansion of augmented reality (AR) in interfaces. AR head-up displays in cars are moving from luxury gadgets to mainstream features, projecting navigation cues and safety alerts onto the windshield in real time. This effectively adds a new visual modality to the HMI, merging digital information with the user’s direct view of the world. 

Similarly, in industrial use, wearable AR glasses can overlay data on equipment as a worker looks at it, enabling situation-specific guidance without needing to refer to a separate screen.

Biometrics

Future HMIs may recognise the user by face or fingerprint and automatically adjust to their preferences (seat position, interface layout, etc.). They might even monitor driver alertness via eye-tracking or measure stress levels and respond (for example, activating a relaxation mode or restricting certain interactions if the user is overloaded). While these technologies raise important privacy considerations, they promise to make interfaces more personalised and adaptive.

Artificial intelligence

We expect next-generation HMIs to be even more context-aware and predictive. Natural language interfaces will get better at carrying multi-turn conversations (so you can correct a voice command or ask follow-up questions naturally). Intelligent agents might coordinate multiple devices – imagine your car, home, and personal devices all working together via one cohesive HMI agent that travels with you. This kind of seamless ecosystem is on the horizon, as companies invest in connected IoT integration and standardize protocols for devices to communicate.

Simplification on the surface

Designers are focusing on minimalistic and intuitive UI design – displaying information only when it’s relevant, reducing clutter, and using clear, easily understood visuals and sounds to guide users. The more the technology does in the background, the more important it is that the foreground (what the user sees and hears) remains approachable and not overwhelming. The best multimodal HMI of the future might be one that doesn’t make the user think about the technology at all – it will simply feel like the most natural way to get things done.

Embracing a multimodal, user-centric HMI future

At Spyrosoft, we see the future of multimodal HMI as bright and full of opportunity.

In my opinion, the future of HMI is clearly multimodal.

From what I see in our projects and discussions with clients across automotive, home appliances, industrial and other markets, the strongest direction is the combination of voice and touch. These two modalities naturally complement each other: touch gives precision and reliability, while voice, supported by an intelligent agent in the backend, enables context, memory and understanding of user intent in a more natural way.

I believe that the intelligent agent layer is the real game-changer. It’s not just about recognizing commands, but about remembering context, discovering the environment, and helping the user in a proactive way. This is where HMI stops being a screen and becomes an assistant.

At the same time, I have to say that “in-air” gesture control, like we often see in movies, doesn’t work in real life. Our findings and client feedback confirm that people don’t find it intuitive and rarely want to use it. That’s why I don’t expect such solutions to be adopted at scale, and I don’t recommend investing heavily in them.

To me, the future is voice + touch, backed by intelligent, context-aware agents. That’s the foundation of HMIs that people will actually use and enjoy.

Looking for HMI solutions?

At Spyrosoft, we are passionate about crafting such forward-looking HMI solutions. 

We combine technical expertise with a supportive, human-centric design approach to deliver interfaces that meet modern user expectations across automotive, industrial, and consumer domains.

If you’re looking to explore the possibilities of multimodal HMI for your product or want guidance on integrating voice, touch, gestures, and AI into a seamless user experience – we’re here to help. 

Contact us via the form below or learn more about our HMI development services.

About the author

Przemysław Krzywania

HMI Director