Trying out the Intel Neural Compute Stick 2 – Movidius NCS2

Disclaimer: The opinions in this article (and on this website in general) are entirely mine and not those of my employer Dell EMC. Testing has been done in a short period of time and may not accurately reflect real-world performance.

PDF version of this post is available here

What is it?

This week I’ve tested the Intel Neural Compute Stick 2 (NCS2), a USB stick for visual computing which originally comes from Movidius, a company acquired by Intel in September 2016. Intel’s landing page for the NCS2 can be found here.

The NCS2 is built around the Intel Movidius Myriad X VPU (Vision Processing Unit), whose 16 programmable vector cores are designed specifically for image and video processing. With this it’s possible to take models trained in Machine Learning frameworks like Caffe and TensorFlow and to leverage CNNs (Convolutional Neural Networks) to do inferencing on data.

So what?

What makes this really interesting is that since it comes in USB format it can simply be plugged into devices at the edge that normally lack the processing power to run machine learning frameworks. With that it becomes possible to process IoT data where it’s generated. The NCS2 has the potential to make Edge Computing a reality instead of just a buzzword.

The compound impact could be significant if the network backhaul is taken into consideration. Imagine replacing a constant stream of video data across the network with just the metadata resulting from running inferencing on the very same video stream. Massive amounts of video vs. some text data. Network savings alone could pay for this pretty quickly.
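To get a feel for the scale of that saving, here is a rough back-of-the-envelope sketch. The bitrate, the number of inference results per second and the size of each result record are illustrative assumptions, not figures measured in this test, and the function names are made up for the example.

```python
# Rough comparison of daily network traffic: raw video vs. inference metadata.
# All input figures below are assumptions chosen for illustration only.

SECONDS_PER_DAY = 24 * 60 * 60


def video_bytes_per_day(bitrate_mbit_s: float) -> float:
    """Bytes sent per day by a constant video stream of the given bitrate."""
    return bitrate_mbit_s * 1_000_000 / 8 * SECONDS_PER_DAY


def metadata_bytes_per_day(events_per_second: float, bytes_per_event: int) -> float:
    """Bytes sent per day when only inference results are transmitted."""
    return events_per_second * bytes_per_event * SECONDS_PER_DAY


if __name__ == "__main__":
    video = video_bytes_per_day(bitrate_mbit_s=4.0)      # assumed 4 Mbit/s stream
    meta = metadata_bytes_per_day(events_per_second=5,   # assumed 5 results per second
                                  bytes_per_event=200)   # assumed 200-byte text record
    print(f"video:    {video / 1e9:.1f} GB/day")
    print(f"metadata: {meta / 1e9:.3f} GB/day")
    print(f"ratio:    {video / meta:.0f}x")
```

With these assumed inputs the video stream comes to roughly 43 GB per day against less than 0.1 GB of metadata. The real ratio depends entirely on the camera bitrate and on how chatty the inference output is.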

Note that the very same type of chip is sold embedded in video cameras and drones. They’re pretty expensive though. In comparison, a webcam, a Raspberry Pi and one of the NCS2 sticks could be a very low-cost way to get a security camera with real-time item recognition.

Why not use a GPU?

Many edge devices lack the capabilities to host a GPU, either due to space, cost or thermal limitations (GPUs generate a lot of heat and many edge devices are passively cooled).

Hang on – isn’t that just a devkit?

Yes and no. The USB stick can be used as a devkit to develop code for embedded versions of the Movidius chip of the kind that may be destined for cameras, drones, robots, etc. However, it can also be used as a very flexible drop-in solution for any edge devices or IoT gateways with low processing power but where the capability to do inferencing is desired.

Is it actually useful?

Yes, it looks like it might actually be able to do the job. The job being processing video directly on the edge devices. In particular it shines when plugged into platforms with low-end CPUs which would never be able to run inferencing on their own.

Functionality testing

To find out if it’s powerful enough to process video in real time I used a webcam and fed the video stream to a sample application from the OpenVINO toolkit (link). This particular demo app actually does several things at the same time: facial recognition, age detection, pose detection and mood detection. All of these are run stacked in one command (all details will be included in a hands-on post shortly). It actually performs very well, although note that the video is not in full HD. Accuracy on the age detection isn’t the best, but of course that reflects more on the algorithm and training data than on the NCS2 (it thinks I’m in my 20s, which is flattering).

Facial, Age and Mood detection with Intel Neural Compute Stick 2

 

Performance testing

Inferencing can be done on a CPU as well. So, for the NCS2 to be useful it would have to outperform whatever CPU is already on the platform it’s plugged into. Therefore I ran a benchmark test (this one) on both the NCS2 and a number of CPUs. The CPUs it was compared to were:

When the NCS2 was being used for the benchmark I was also curious to see if the platform / computer the NCS2 was plugged into affected the benchmark results. Maybe the NCS2 performance would be affected by the host CPU, memory and storage?

The platforms where the NCS2 was tested:

  • Dell Edge Gateway 5000
  • Dell Latitude 7440
  • Dell Precision 5530

Edge Gateway 5000 and Movidius

Note that the floating point precision differs when running the benchmark on CPU vs the NCS2, so it’s not completely apples to apples. This is because the NCS2 only supports half precision (FP16) whereas the CPU only supports full, or normal, precision (FP32). This probably doesn’t make a huge difference when doing inferencing, which is the only thing the NCS2 is likely to be doing in a real-world application. For training, however, the reduced floating point precision may cause the algorithm to learn either garbage or nothing at all. This article summarizes the topic nicely for those interested: FP16 and FP32 difference for deep learning
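The effect of half precision is easy to see without any special hardware. The following is a minimal sketch using only the Python standard library; the helper name to_fp16 is made up for illustration, and it only mimics FP16 rounding of individual values, not how the NCS2 or OpenVINO handle numbers internally.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The 'e' format code stores a float as a 16-bit half-precision number,
    # so packing and unpacking performs the rounding for us.
    return struct.unpack("<e", struct.pack("<e", value))[0]


if __name__ == "__main__":
    # Values that FP16 can represent exactly survive the round trip unchanged.
    print(to_fp16(0.5))

    # Most decimal fractions pick up a small rounding error.
    print(to_fp16(0.1))

    # A tiny update to a weight near 1.0 is lost entirely, because the
    # spacing between neighbouring FP16 values there is about 0.001.
    weight, update = 1.0, 0.0001
    print(to_fp16(weight + update) == weight)
```

The last case is the one that hurts training: many small weight updates can simply disappear. For inferencing the weights are fixed, so the small rounding error per value usually matters much less.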

Each platform was tested with CPU and with the NCS2. Three platforms x 2 tests = 6 results.

NOTE: I only ran these tests a few times, so please don’t consider them exhaustive. Each test would need to be run dozens of times and have the results averaged to get more accurate readings. However, this is all I had time for and it’s at least an indicator of performance.
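For anyone repeating the benchmark more thoroughly, averaging the runs and looking at their spread is straightforward. A minimal sketch follows; the function name summarize_runs is hypothetical and the sample numbers are placeholders, not results from the tests in this post.

```python
import statistics


def summarize_runs(fps_samples):
    """Return mean, sample standard deviation and relative deviation of repeated runs."""
    if len(fps_samples) < 2:
        raise ValueError("need at least two runs to estimate variation")
    mean = statistics.mean(fps_samples)
    stdev = statistics.stdev(fps_samples)
    return {
        "runs": len(fps_samples),
        "mean": mean,
        "stdev": stdev,
        "relative_stdev_pct": 100 * stdev / mean,
    }


if __name__ == "__main__":
    # Placeholder throughput values only, NOT measurements from this article.
    placeholder_fps = [10.0, 12.0, 11.0, 13.0, 9.0]
    summary = summarize_runs(placeholder_fps)
    print(f"{summary['runs']} runs, mean {summary['mean']:.1f} FPS, "
          f"stdev {summary['stdev']:.2f} ({summary['relative_stdev_pct']:.1f}%)")
```

A large relative deviation would suggest that differences between two platforms are within the noise and that more runs are needed before drawing conclusions.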

The NCS2 completely outperforms the Atom CPU on the Edge Gateway 5000. This is where it was tested initially. Further testing shows that it’s more or less equal to a Gen4 Intel i7 but falls behind when compared to a Gen8 Intel i7 CPU.

This is expected of course. The NCS2 isn’t a very expensive device at $87.99 USD (Amazon.com at the time of writing). This is a pretty cheap way to add Machine Learning power on devices which have lower-end CPUs, like IoT edge gateways. 

There are slight differences in results when the NCS2 is running on less powerful platforms vs. newer machines. This indicates that there are more factors that play into the results than the NCS2 itself, like the type of CPU, memory and storage on the platform the NCS2 is plugged into.

From the results it’s clear that the Movidius NCS2 can’t compete with a modern i7 CPU, but of course it’s a lot cheaper and supposedly draws a lot less power. That would make it ideal for connecting to edge devices where limiting power consumption may be desired.

Practical considerations

For those who may be interested in getting one of these I’d like to point out a few things.

1. USB speeds

The NCS2 changes speeds when an app uses it for execution of a neural network. More importantly, when this happens the OS believes that the original USB 2.0 device has been removed and is being replaced with a new USB 3.0 device. This is reversed when code execution finishes.

Movidius stick plugged in:

Feb 22 06:29:33 localhost kernel: [  396.100651] usb 1-1: new high-speed USB device number 11 using xhci_hcd
Feb 22 06:29:33 localhost kernel: [  396.230055] usb 1-1: New USB device found, idVendor=03e7, idProduct=2485
Feb 22 06:29:33 localhost kernel: [  396.230068] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Feb 22 06:29:33 localhost kernel: [  396.230077] usb 1-1: Product: Movidius MyriadX
Feb 22 06:29:33 localhost kernel: [  396.230084] usb 1-1: Manufacturer: Movidius Ltd.
Feb 22 06:29:33 localhost kernel: [  396.230091] usb 1-1: SerialNumber: 03e72485

Benchmark_app run starting

Feb 22 06:30:19 localhost kernel: [  442.564640] usb 1-1: USB disconnect, device number 11
Feb 22 06:30:20 localhost kernel: [  442.993334] usb 1-1: new high-speed USB device number 12 using xhci_hcd
Feb 22 06:30:20 localhost kernel: [  443.122975] usb 1-1: New USB device found, idVendor=03e7, idProduct=f63b
Feb 22 06:30:20 localhost kernel: [  443.122989] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Feb 22 06:30:20 localhost kernel: [  443.122997] usb 1-1: Product: VSC Loopback Device
Feb 22 06:30:20 localhost kernel: [  443.123005] usb 1-1: Manufacturer: Intel Corporation
Feb 22 06:30:20 localhost kernel: [  443.123012] usb 1-1: SerialNumber: 00000000000000000

Benchmark_app run finished

Feb 22 06:31:28 localhost kernel: [  511.851893] usb 1-1: USB disconnect, device number 12
Feb 22 06:31:29 localhost kernel: [  512.126213] usb 1-1: new high-speed USB device number 13 using xhci_hcd
Feb 22 06:31:29 localhost kernel: [  512.255008] usb 1-1: New USB device found, idVendor=03e7, idProduct=2485
Feb 22 06:31:29 localhost kernel: [  512.255020] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Feb 22 06:31:29 localhost kernel: [  512.255027] usb 1-1: Product: Movidius MyriadX
Feb 22 06:31:29 localhost kernel: [  512.255034] usb 1-1: Manufacturer: Movidius Ltd.
Feb 22 06:31:29 localhost kernel: [  512.255040] usb 1-1: SerialNumber: 03e72485
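
The re-enumeration can also be spotted programmatically by following the product ID in the kernel log. Below is a small sketch that uses three lines from the logs above as sample input. The function name product_id_sequence is made up for illustration, and the regular expression assumes the log format shown here, which may differ between kernel versions.

```python
import re

# Matches kernel lines such as:
#   "... usb 1-1: New USB device found, idVendor=03e7, idProduct=2485"
DEVICE_FOUND = re.compile(
    r"usb (?P<port>[\d.-]+): New USB device found, "
    r"idVendor=(?P<vendor>[0-9a-f]{4}), idProduct=(?P<product>[0-9a-f]{4})"
)


def product_id_sequence(log_lines, vendor="03e7"):
    """Return the product IDs announced for one vendor ID, in log order."""
    ids = []
    for line in log_lines:
        match = DEVICE_FOUND.search(line)
        if match and match.group("vendor") == vendor:
            ids.append(match.group("product"))
    return ids


if __name__ == "__main__":
    sample = [
        "Feb 22 06:29:33 localhost kernel: [  396.230055] usb 1-1: "
        "New USB device found, idVendor=03e7, idProduct=2485",
        "Feb 22 06:30:20 localhost kernel: [  443.122975] usb 1-1: "
        "New USB device found, idVendor=03e7, idProduct=f63b",
        "Feb 22 06:31:29 localhost kernel: [  512.255008] usb 1-1: "
        "New USB device found, idVendor=03e7, idProduct=2485",
    ]
    # Prints ['2485', 'f63b', '2485']: idle, running, idle again.
    print(product_id_sequence(sample))
```

The vendor ID stays at 03e7 throughout while the product ID flips from 2485 to f63b and back, which is why anything that matches the stick by a single product ID loses track of it during a run.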

2. The NCSDK and NCSDK2 can’t be used

There are two versions of the NCSDK available. Both of these are for the original NCS and won’t work with the NCS2. This is clearly stated on the Intel webpage but if you’re like me you may miss it and make the assumption that the NCSDK2 is for the NCS2 stick. I wasted a fair bit of time on this before realizing it wasn’t working by design.

Instead, use the Intel distribution of OpenVINO, which is available here.

3. Can it be run in a container?

Yes, but there are no pre-written Dockerfiles for the NCS2 as there were for the original NCS. The NCSDK2 contains a Dockerfile but it won’t work since it’s for a different version of the neural compute stick (see the NCSDK note above).

However, it’s not hard to build a container with OpenVINO and run that. I have verified that this works. In fact, there’s an excellent Dockerfile by Mateo Guzman available here and I’ve forked it here since I wanted to make some modifications to it. Feel free to use either of them.

Additionally, if you’re impatient or short on time, I’ve made a pre-built docker image on Docker Hub which can be accessed here.

NOTE: The container has to be run in privileged mode. This is due to the NCS2 device ID changing and the USB device being re-enumerated the moment code is loaded onto it (see the USB speed changes note above). The NCSDK2 has a way to change the mode of the original NCS so it can be run in non-privileged mode but this won’t work on the NCS2. I don’t know of a workaround so far.
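A minimal sketch of what a privileged invocation might look like, written as a small Python helper that only assembles the command line. The image name is a placeholder, and mounting /dev into the container is an assumption about how to keep the re-enumerated device visible rather than something verified in this test; only --privileged is the requirement described above.

```python
import shlex


def docker_run_command(image, command=None):
    """Build a `docker run` command line for a privileged container."""
    argv = [
        "docker", "run", "--rm", "-it",
        "--privileged",       # required: gives the container access to host devices
        "-v", "/dev:/dev",    # assumption: expose /dev so the re-enumerated stick shows up
        image,
    ]
    if command:
        argv.extend(command)
    # Quote each argument so the result can be pasted into a shell safely.
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # "example/openvino-ncs2:latest" is a hypothetical image name.
    print(docker_run_command("example/openvino-ncs2:latest"))
```

Privileged mode is a broad permission, so it is worth limiting it to hosts where that trade-off is acceptable.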

I’ll write a post on usage shortly which will contain more detail on how to download and optimize the models for the demo and sample apps, as well as how to run the apps themselves.

4. Can it be used on a Raspberry Pi?

Yes, it appears so although I haven’t tested it on the Pi yet. Intel has instructions for installing OpenVINO on the Pi here. I may post a Dockerfile or Docker image for the Pi when I’ve had a chance to test it.

5. What about heat generation and cooling?

The NCS2 is passively cooled and actually is its own heatsink. It’s made of metal and the “fins” on both sides of it allow enough airflow to keep it cool. It does get warm during testing but so far not extremely so.

Intel Movidius NCS2 heatsink

6. Size

It’s a bit broad, which can make it difficult to plug in at times. It also risks covering other USB ports or as in my case – the power inlet for my laptop. A USB extension cable or powered USB hub could easily mitigate this of course.

Conclusion

The Intel Movidius Neural Compute Stick 2 does seem like a valid option for running inferencing at the edge. While it’s not as powerful as a full-on GPU or a modern CPU, it has the potential to excel in the niche of low-power edge devices like IoT gateways where the onboard CPU isn’t powerful enough to do inferencing on its own.

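Reposted from: https://www.cnblogs.com/cloudrivers/p/11553179.html

Copyright notice: This article is the blogger’s original work and is licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/weixin_30321709/article/details/101747816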
