Part I: Use of Computer Vision and Machine Learning in Ergonomics. Recent advances in computer vision and machine learning are revolutionizing ergonomics today. In this two-part article, we will cover:

  • How these technologies can be used in ergonomics

  • The positive impact these technologies can have on ergonomics

Let’s start with a few definitions for context. 

Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world.   Machine learning is the use of algorithms to analyze large amounts of data (e.g., thousands of digital images) to enable a machine to solve a problem. 

Computer vision is used in many ways in the world today, including:

  • Fingerprint Recognition
  • Optical Character Recognition (OCR)
  • Medical Imaging
  • Automotive Safety
  • Movie Animation
  • Motion Capture (MOCAP)

Illustration from 3D Motion Capture Market 2020, Global Info Research 

The last example, motion capture, is relevant to our discussion of ergonomics. Motion capture is the process of digitally recording patterns of movement.

One optical approach involves installing multiple cameras around a worker who wears LED markers on the body. The specialized cameras in these motion capture systems (e.g., OptiTrack, VICON, PhaseSpace) track the light emitted from the LED markers, and a combination of hardware and software then synthesizes the data from the multiple cameras into a single representation of the worker’s movement.

Another optical approach uses a depth (3D) camera (e.g., Kinect, ENSENSO, Intel RealSense). The standard digital camera in a typical smartphone outputs a 2D grid of pixels. Each pixel is assigned an RGB code that expresses its color as a combination of red, green, and blue. Each of the three components is an integer from 0 to 255. For example, pure bright green is (0, 255, 0). Depth cameras, on the other hand, assign each pixel a value based on its distance from the camera. There are also RGB-D cameras that represent images in both of these ways.
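To make the difference between the two pixel formats concrete, here is a minimal sketch in Python. The 2x2 images, the millimetre depth unit, and the helper name `to_rgbd` are all assumptions made for illustration; real depth cameras document their own units and data layouts.

```python
# A tiny 2x2 "image", stored row by row. All values are made up for illustration.

# RGB image: each pixel is a (red, green, blue) triple of integers in 0..255.
rgb_image = [
    [(0, 255, 0), (255, 0, 0)],      # pure green, pure red
    [(0, 0, 255), (255, 255, 255)],  # pure blue, white
]

# Depth image: each pixel holds a distance from the camera.
# Millimetres are assumed here; this is not taken from any specific camera.
depth_image_mm = [
    [1200, 1210],
    [2500, 2490],
]


def to_rgbd(rgb, depth):
    """Combine matching RGB and depth grids into one grid of (r, g, b, d) pixels."""
    same_rows = len(rgb) == len(depth)
    if not same_rows or any(len(r) != len(d) for r, d in zip(rgb, depth)):
        raise ValueError("RGB and depth images must have the same shape")
    return [
        [(*color, distance) for color, distance in zip(rgb_row, depth_row)]
        for rgb_row, depth_row in zip(rgb, depth)
    ]


rgbd_image = to_rgbd(rgb_image, depth_image_mm)
print(rgbd_image[0][0])  # (0, 255, 0, 1200)
```

The top-left pixel is pure green and, in this made-up scene, 1.2 meters from the camera.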

A non-optical approach involves inertial measurement units (IMUs), sensors attached to a worker’s body with straps. The sensors in these systems (e.g., Xsens, LPMOCAP, Qualisys) contain an accelerometer, gyroscope, and magnetometer that capture movement data, which is then synthesized into a 3D representation of the worker’s movements.

For ergonomics, we can use the output of any of these motion capture approaches to evaluate key risk factors (e.g., posture, frequency, and duration). But each of these approaches has drawbacks in an industrial setting:

  1. The need to disrupt or discomfort the worker (e.g., stopping to apply sensors or markers)
  2. The need for specialized camera equipment
  3. Disruption, expense and, sometimes, impossibility of installing a multi-camera system to capture a worker’s movement
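Whichever capture approach is used, the frequency and duration of a risky posture can be derived once a joint angle is known for each frame. The sketch below shows one simple way to do that in Python. The function name `exposure_summary`, the 60-degree threshold, and the sample angles are hypothetical and are not taken from any ergonomic standard.

```python
def exposure_summary(angles_deg, fps, threshold_deg):
    """Count how often, and for how long, an angle exceeds a threshold.

    angles_deg: one joint angle per video frame, in degrees.
    fps: frames per second of the recording.
    threshold_deg: hypothetical cutoff above which a posture counts as risky.
    """
    episodes = 0      # separate occasions on which the threshold was exceeded
    risky_frames = 0  # total number of frames spent above the threshold
    was_risky = False
    for angle in angles_deg:
        is_risky = angle > threshold_deg
        if is_risky:
            risky_frames += 1
            if not was_risky:
                episodes += 1  # a new risky episode starts on this frame
        was_risky = is_risky
    return {
        "episodes": episodes,
        "risky_seconds": risky_frames / fps,
        "total_seconds": len(angles_deg) / fps,
    }


# Made-up trunk flexion angles for 10 frames recorded at 5 frames per second.
trunk_flexion = [10, 15, 65, 70, 20, 12, 62, 61, 66, 18]
summary = exposure_summary(trunk_flexion, fps=5, threshold_deg=60)
print(summary)  # {'episodes': 2, 'risky_seconds': 1.0, 'total_seconds': 2.0}
```

Here "episodes" stands in for frequency and "risky_seconds" for duration.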

Now, instead, imagine if you could capture a worker’s motion with an ordinary smartphone without the need for markers or sensors.

Essentially, by leveraging computer vision and machine learning you can take a “raw” video of a worker and analyze it using machine learning algorithms to create a “processed” video showing the worker’s skeleton.  
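Once a skeleton has been estimated for a frame, posture can be measured from the joint positions. The following Python sketch assumes the pose-estimation step has already produced 2D joint coordinates in pixels. The dictionary layout, the joint names, and the function `joint_angle_deg` are hypothetical and do not reflect the output format of any particular product.

```python
import math


def joint_angle_deg(a, b, c):
    """Angle at point b, in degrees, between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints must not coincide")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


# One frame of hypothetical pose output: joint name -> (x, y) in pixels.
frame_keypoints = {
    "right_shoulder": (100, 100),
    "right_elbow": (100, 200),
    "right_wrist": (200, 200),
}

elbow_angle = joint_angle_deg(
    frame_keypoints["right_shoulder"],
    frame_keypoints["right_elbow"],
    frame_keypoints["right_wrist"],
)
print(round(elbow_angle, 1))  # 90.0
```

Repeating this calculation for every frame yields a posture measurement over time. Note that angles computed from 2D coordinates depend on the camera's viewpoint.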

And, to be applicable for industrial ergonomics, algorithms have to accommodate different types of situations you would encounter in an industrial setting, such as a body part of the worker being occluded by a piece of equipment or a part.   Advanced versions of these algorithms are capable of determining the location and movement of the joints even when parts of the body are occluded. A key output is the video with the moving skeleton of the worker superimposed on top of it.
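As a rough illustration of the occlusion problem, the Python sketch below fills in frames where a joint was not detected by interpolating linearly between the nearest frames where it was. This is a deliberately simple stand-in and is not how the advanced algorithms described above work. The function name `fill_occluded` and the sample track are hypothetical.

```python
def fill_occluded(track):
    """Fill None gaps in a list of (x, y) positions by linear interpolation.

    A gap at the very start or end has a known neighbour on only one side,
    so it is left as None rather than guessed.
    """
    filled = list(track)
    known = [i for i, point in enumerate(track) if point is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            t = (i - left) / gap  # fraction of the way from left to right
            x = track[left][0] + t * (track[right][0] - track[left][0])
            y = track[left][1] + t * (track[right][1] - track[left][1])
            filled[i] = (x, y)
    return filled


# Made-up wrist positions; None marks frames where the wrist was occluded.
wrist_track = [(0.0, 0.0), None, None, None, (40.0, 80.0), None]
filled_track = fill_occluded(wrist_track)
print(filled_track)
```

The three missing frames in the middle are placed evenly along the line between the two known positions, while the final frame stays unknown.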

The data used to create that skeleton can be visualized in other ways too. One example is a graph of the risks for a particular body part over the course of the video. Another is a bar chart that compares the risk profiles of two or more videos.
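A minimal Python sketch of the data behind such charts follows. It assigns each frame a risk level from a joint angle and then summarizes the share of frames at each level for two videos. The angle bands in `RISK_BANDS`, the function names, and the sample angles are assumptions for illustration only and are not drawn from any assessment method.

```python
from collections import Counter

# Hypothetical angle bands in degrees: below 20 is low, below 60 is medium,
# anything else is high. These cutoffs are illustrative only.
RISK_BANDS = [(20, "low"), (60, "medium")]
LEVELS = ("low", "medium", "high")


def risk_level(angle_deg):
    """Map one joint angle to a risk level using the assumed bands."""
    for upper_bound, label in RISK_BANDS:
        if angle_deg < upper_bound:
            return label
    return "high"


def risk_profile(angles_deg):
    """Percentage of frames spent at each risk level."""
    if not angles_deg:
        raise ValueError("at least one frame is required")
    counts = Counter(risk_level(a) for a in angles_deg)
    return {level: 100 * counts[level] / len(angles_deg) for level in LEVELS}


# Made-up per-frame angles for the same task in two different videos.
video_a = [10, 30, 65, 70, 62, 15, 45, 68, 12, 61]
video_b = [10, 18, 25, 30, 12, 15, 45, 22, 12, 61]

# Risk over the course of a video: one level per frame.
timeline_a = [risk_level(a) for a in video_a]

# Risk profiles of the two videos, shown as a plain-text bar chart.
profile_a = risk_profile(video_a)
profile_b = risk_profile(video_b)
for level in LEVELS:
    bar_a = "#" * round(profile_a[level] / 10)
    bar_b = "#" * round(profile_b[level] / 10)
    print(f"{level:<6} A {bar_a:<10} B {bar_b}")
```

The per-frame list corresponds to the risk-over-time graph, and the two profiles correspond to the comparison bar chart.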

For a corporate ergonomist, ergonomic consultant, or risk control consultant, these graphs, along with individual frames from the processed videos, can be used to build persuasive reports and presentations. These can help identify and justify ergonomic improvements that protect workers.

In Part II of this article, we will discuss the positive impact computer vision and machine learning can have on ergonomics.