Wearable physical activity monitors are growing in popularity and provide the opportunity for large numbers of the public to self-monitor physical activity behaviours. The latest generation of these devices features multiple sensors, ostensibly similar or even superior to advanced research instruments. However, little is known about the accuracy of their energy expenditure estimates. Here, we assessed their performance against criterion measurements both in controlled laboratory conditions (simulated activities of daily living and structured exercise) and over a 24-hour period in free-living conditions. Thirty adults (15 men and 15 women) wore three multi-sensor consumer monitors (Microsoft Band, Apple Watch and Fitbit Charge HR), an accelerometry-only device for comparison (Jawbone UP24) and validated research-grade multi-sensor devices (BodyMedia Core and individually calibrated Actiheart™). When compared against indirect calorimetry during discrete laboratory activities, the Apple Watch performed similarly to criterion measures. The Fitbit Charge HR was less consistent in measuring discrete activities, but produced similar free-living estimates to the Apple Watch. Both of these devices underestimated free-living energy expenditure (-394 kcal/d and -405 kcal/d, respectively; P<0.01). The multi-sensor Microsoft Band and accelerometry-only Jawbone UP24 devices underestimated most laboratory activities and substantially underestimated free-living expenditure (-1128 kcal/d and -998 kcal/d, respectively; P<0.01). None of the consumer devices were deemed equivalent to the reference method for daily energy expenditure. For all devices, there was a tendency towards negative bias at greater daily energy expenditure. No consumer monitor performed as well as the research-grade devices, although in some (but not all) cases estimates were close to criterion measurements.
Thus, whilst industry-led innovation has improved the accuracy of consumer monitors, these devices are not yet equivalent to the best research-grade devices or indeed equivalent to each other. We propose that independent quality standards and/or accuracy ratings are required for consumer devices.
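
The abstract reports mean bias in kcal/d and a tendency towards more negative bias at higher expenditure, but does not state how these were computed. As a minimal sketch, assuming a Bland-Altman style method comparison (a common choice for device validation, not confirmed as the authors' method), the quantities could be calculated as below. The function name `bland_altman` and all input values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman(device, criterion):
    """Return (mean bias, 95% limits of agreement, slope of difference on mean).

    Illustrative helper only; the name and interface are assumptions.
    """
    # Per-participant difference (device minus criterion) and pairwise mean
    diffs = [d - c for d, c in zip(device, criterion)]
    means = [(d + c) / 2 for d, c in zip(device, criterion)]

    bias = mean(diffs)                      # negative = device underestimates
    sd = stdev(diffs)                       # sample SD of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)

    # Least-squares slope of difference on mean: a negative slope indicates
    # that underestimation grows as energy expenditure increases.
    m_bar = mean(means)
    slope = (
        sum((m - m_bar) * (x - bias) for m, x in zip(means, diffs))
        / sum((m - m_bar) ** 2 for m in means)
    )
    return bias, limits, slope


# Hypothetical daily energy expenditure values (kcal/d), NOT study data
criterion = [2000.0, 2400.0, 2800.0, 3200.0]
device = [1900.0, 2200.0, 2500.0, 2800.0]

bias, limits, slope = bland_altman(device, criterion)
print(f"bias = {bias:.0f} kcal/d, slope = {slope:.3f}")
```

With these made-up inputs the device underestimates every participant, and the negative slope shows the shortfall widening as expenditure rises, which is the pattern the abstract describes qualitatively. A formal equivalence claim would additionally need a pre-specified equivalence margin, which the abstract does not give.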