The benchmark covers applications including autonomous driving and natural language processing, running on form factors ranging from smartphones and PCs to edge servers and cloud computing platforms in data centres.
Measuring inference shows how quickly a trained neural network can process new data.
It is made up of five benchmarks covering three common ML tasks: image classification (predicting a label for a given image from the ImageNet dataset), object detection (picking out an object within an image from the MS-COCO dataset using a bounding box), and machine translation (translating sentences between English and German, as in the auto-translate features of chat and email applications).
Each benchmark's reference code implementation defines the problem, model, and quality target, and provides instructions for running the benchmark. Reference implementations are available for the ONNX, PyTorch, and TensorFlow frameworks.
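To illustrate what an inference measurement of this kind involves, the sketch below times a single image-classification prediction in PyTorch. It is not the official MLPerf reference implementation; it assumes torchvision's pretrained ResNet-50 and uses a random tensor in place of a real ImageNet image.

```python
"""Minimal latency sketch for image-classification inference.

Illustrative only: assumes torchvision >= 0.13 for the `weights` argument
and substitutes random data for a preprocessed ImageNet image.
"""
import time

import torch
from torchvision.models import resnet50

# Load a pretrained classifier and switch to inference mode.
model = resnet50(weights="IMAGENET1K_V1")
model.eval()

# Dummy batch standing in for one preprocessed 224x224 RGB image.
batch = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    # Warm-up pass so one-time setup costs do not skew the timing.
    model(batch)

    start = time.perf_counter()
    logits = model(batch)
    latency_ms = (time.perf_counter() - start) * 1000

predicted_index = logits.argmax(dim=1).item()
print(f"Predicted class index: {predicted_index}, latency: {latency_ms:.1f} ms")
```

A full benchmark run would replace the random tensor with real validation data, report accuracy against the quality target, and aggregate latency and throughput over many queries rather than a single pass.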
MLPerf was created in February 2018 by engineers and researchers from Baidu, Google, Harvard University, Stanford University, and the University of California, Berkeley. It launched the training benchmark suite in May 2018. Members include Arm, Cadence, Centaur Technology, Dividiti, Facebook, Futurewei, General Motors, Google, Habana Labs, Intel, MediaTek, Microsoft, Myrtle, Nvidia, Real World Insights, University of Toronto, and Xilinx.
The benchmarks will accelerate the development of hardware and software for ML applications, said Vijay Janapa Reddi, an associate professor at Harvard University and co-chair of the MLPerf Inference working group. The benchmark is also intended to stimulate innovation in academia and research bodies.
“Our goal is to create common and relevant metrics to assess new machine learning software frameworks, hardware accelerators, and cloud and edge computing platforms in real-life situations,” said David Kanter, co-chair of the MLPerf inference working group.