The Stanford Virtual Human Interaction Lab (VHIL) has released an interesting study of the stressors that come with video calls, and outlined potential solutions that can help. The research is still in its early stages, and the results will need to be confirmed, but these can already be useful guidelines to experiment with.
What are the issues?
One of the interesting models the study introduces is that video calls create a context and a feeling similar to being in a crowded elevator: it’s a kind of social awkwardness we all know well, and on video we endure it for hours at a time.
The study highlights four general areas (eye gaze at a close distance, cognitive load, all-day mirror, reduced mobility), but I prefer breaking them down in a little more detail:
- Everyone is too close by — the presentation of all the other people’s faces right in front of you is similar to having a group of people staring at you from half a meter away. Previous research (Hall, 1966) shows that anything under 60 cm is considered intimate distance. In normal meetings nobody is that close, and nobody is really face-to-face with everyone else.
- Extended eye contact from everyone — the closeness is also paired with eye contact: people aren’t just close, they are all looking at the screen and generally toward the camera, so everyone is making constant eye contact, even when not speaking. This never happens in normal meetings, where people take notes, look elsewhere, or are immersed in thought when not speaking, and even the speaker can’t look everyone in the eyes at the same time.
- Lack of a (shared) object to look at — it turns out that presentations, printouts, and other materials provide a place to focus while people speak.
- Sending extra cues — due to technological constraints (latency, quality, etc.) we feel pushed to exaggerate our nonverbal cues, so we are not just hyper-monitoring what we do, but also making sure our motions are extra visible.
- Loudness — research has shown that people speak 15% louder in video (Croes et al., 2019), again likely due to microphone quality. Shouting for extended periods of time adds stress.
- Mismatched Visual Cues — we automatically process a lot of visual cues, but on camera these aren’t likely to match. Imagine someone giving a side glance: in a physical room we would notice who that glance was directed at. On video we can’t, even if we try, because every person’s screen layout is different, and even then the glance might be about something happening in their room. The brain can’t decode these signals, and that adds stress.
- Overlapping Audio — with larger groups, it gets harder and harder to identify who is speaking.
- Always On Mirror — our camera pointed back at us creates an implicit pressure to monitor and judge ourselves. It takes attention away from the discussion itself, leading to poorer conversations. Continuous self-evaluation is stressful, and again unrealistic: we don’t spend most of our lives in front of a mirror. This also seems to impact women more than men (Ingram et al., 1988).
- Reduced Mobility — cameras have a narrow field of view, and assumed video etiquette requires the person to stay at the center of the screen. It’s known that more movement leads to better ideas (Oppezzo & Schwartz, 2014).
What to try?
Most of this comes from the study as well, but I’ve integrated it with some personal advice that I have found beneficial in practice. Don’t treat these as hard rules; treat them as things you can try: they should be beneficial, but if they aren’t, it’s ok.
Also please note that a lot of calls could really be avoided. Shared editing of documents, textual communication, and shared work tools are all better instruments to use. Replacing video calls is possible. But if you really can’t avoid them:
- Audio Calls — while there are some drawbacks with audio-only calls, they are truly more effective: they eliminate eye contact, allow for mobility, and also provide a beneficial illusion of having everyone’s attention. Even if that isn’t technically true, since people might walk or do small tasks, it’s easier to focus only on audio, like paying attention to a podcast while doing laundry. Default to audio-only calls, and switch to video only when absolutely necessary.
- Normalize Walking — or in general any other kind of low-cognition activity while on a call. We know it’s beneficial to talk while walking; it’s even given as advice for having better conversations. Why shouldn’t this be the default?
- Shrink the Videos — change the settings to avoid automatic full screen, and reduce the call window to a smaller portion of the screen. Bonus: if you place this window close to your camera, you will also avoid appearing not to look at the camera when speaking, which adds additional stress.
- Use Whiteboards — instead of presenting, use shared whiteboard software like Miro: it avoids the full screen presentation mode of video software, and creates a new center of attention so people aren’t forced to look at others’ faces. Even just a shared live editable doc (Dropbox Paper, Google Doc, etc.) would help.
- Turn Off Your Mirror Video — a lot of apps allow you to turn off the reflected view of yourself. Check how you look beforehand, but then hide your self-view.
- Small Groups — lots of voices can get confusing and make the conversation harder to manage. Does everyone need to be in the call?
- Get a Separate Camera — get a camera and place it offset from the screen, so it doesn’t look like you’re looking in that direction when you’re looking at the screen. This might be perceived as a breach of etiquette, but if we are able to accept it, and look at the camera only when speaking, it will add distance to the camera and reduce eye contact strain.
Advice for developers
If anyone working on conference call software comes by, here are a few things that I’d love to see built in:
- Default to Video Off — this takes very little: let’s set a default with video off. And make sure there’s a preview video before joining just in case it needs to be turned on.
- Positional Audio — audio right now comes from a single direction, so it’s impossible to associate a specific spatial place with every person in the room. The software could virtually place each participant in a different position, helping our brain decode who’s speaking. The technology already exists. Make it.
- Shared Whiteboards, not Stage Presentations — the primary focus should be a shared space where people can sketch, interact, and collaborate, not every person’s video, and not a single presenter booming full screen. Presentation can be a mode available when actively picked, but it should be an exception.
- Stop Full Screen Takeovers — the number of apps that take over the full screen is still concerning. Stop doing that. Give me a window, and allow me to manage it as I prefer. And please remember the position of the window.
- Live Captioning — having subtitles automatically generated and visible, even if not 100% accurate, is hugely helpful.
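To make the Positional Audio idea a bit more concrete, here is a minimal sketch of one way a client might assign each participant a spot in the stereo field. Everything in it is an assumption for illustration: the function names `pan_positions` and `equal_power_gains`, the `max_pan` parameter, and the choice of a simple left-to-right spread are mine, not something from the study or from any existing product.

```python
import math


def pan_positions(participant_ids, max_pan=0.8):
    """Spread participants evenly from -max_pan (left) to +max_pan (right).

    Hypothetical helper: a pan of -1.0 means fully left, +1.0 fully right.
    max_pan < 1 keeps voices from sitting entirely in one ear.
    """
    count = len(participant_ids)
    if count == 0:
        return {}
    if count == 1:
        # A single voice stays in the center.
        return {participant_ids[0]: 0.0}
    step = 2 * max_pan / (count - 1)
    return {pid: -max_pan + i * step for i, pid in enumerate(participant_ids)}


def equal_power_gains(pan):
    """Turn a pan value in [-1, 1] into (left, right) channel gains.

    Uses the common equal-power law, so left**2 + right**2 stays at 1
    and perceived loudness does not change with position.
    """
    angle = (pan + 1) * math.pi / 4  # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


if __name__ == "__main__":
    for pid, pan in pan_positions(["ana", "ben", "chen"]).items():
        left, right = equal_power_gains(pan)
        print(f"{pid}: pan={pan:+.2f} left={left:.2f} right={right:.2f}")
```

In a browser-based client, the same pan value could be fed to something like the Web Audio API's `StereoPannerNode` for each incoming stream instead of computing gains by hand. A real product would probably also want to keep positions stable when people join or leave, which this sketch ignores.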
I’ll close by quoting the end of the study, Bailenson, J. N. (2021), Nonverbal Overload: A Theoretical Argument for the Causes of Zoom Fatigue:
With slight changes to the interface, the potential
to continue to drive productivity and reduce carbon emissions by
replacing the commute. Videoconferencing is here to stay, and as
media psychologists it is our job to study this medium to help
technologists build better interfaces and users to develop better use