Face Recognition Using the MAX78000/MAX78002
Author: Tuncay Kayaoglu, ADI
Abstract
This article describes a face recognition application built on the MAX78000/MAX78002 artificial intelligence (AI) microcontrollers. The application consists of three separate models: a face detection CNN model, a face identification CNN model, and a dot-product model.
Introduction
The MAX78000 [1] and MAX78002 [2] are AI microcontrollers with an ultra-low-power convolutional neural network (CNN) inference engine for running AI edge applications on battery-powered Internet of Things (IoT) devices. These microcontrollers can execute many complex CNN networks while meeting demanding performance requirements.
This article describes an approach in which a single face recognition application invokes the following three CNN models, each performing a distinct task:
- The face detection CNN model detects faces in the captured image and extracts a rectangular sub-image containing exactly one face.
- The face identification CNN model identifies the person in the image by generating an embedding for the given face image.
- The dot-product model outputs dot products representing the similarity between the embedding of the given image and the embeddings stored in the database.
Using the dot-product similarity as a distance metric, the application can identify the image as a known subject or label it as "Unknown" based on the distances between embeddings.
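As a minimal illustration of this distance metric (a sketch with an assumed helper name, not code from the application), the dot product of two L2-normalized length-64 embeddings can be computed as:

```c
#include <math.h>

#define EMBEDDING_LEN 64

/* Dot product of two embeddings. For L2-normalized vectors this equals
 * the cosine similarity and lies in [-1, 1]; larger means more similar. */
static float dot_similarity(const float *a, const float *b)
{
    float sum = 0.0f;
    for (int i = 0; i < EMBEDDING_LEN; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Identical normalized embeddings yield a similarity of 1.0, while orthogonal embeddings yield 0.0.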
MAX78000 Face Recognition Application
The face recognition application [4] runs only on the MAX78000 Feather board [3], because that board supports an SD card.
The face detection, face identification, and dot-product models are executed sequentially.
The challenge in this application is using these models when, taken together, they exceed both the 432 KB 8-bit weight capacity of the MAX78000 CNN engine and the storage limit of the MAX78000 internal flash. In this example, the weights of the face detection and dot-product models are stored in the MAX78000 internal flash, while the weights of the face identification CNN model are stored on an external SD card and reloaded each time a face is detected.
The SDHC_weights sub-project stores the face identification CNN weights (weights_2.h) on the SD card in binary format.
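On the device the binary weight file is read back through the SD-card filesystem driver; the idea can be sketched portably with standard C file I/O (the function name and file path below are illustrative assumptions, not taken from the project):

```c
#include <stdio.h>

/* Load a binary weight blob (e.g. the file written by the SDHC_weights
 * sub-project) into a caller-provided buffer.
 * Returns the number of bytes read, or 0 on error. */
static size_t load_weight_blob(const char *path, unsigned char *buf, size_t cap)
{
    FILE *f = fopen(path, "rb");
    if (!f) {
        return 0;
    }
    size_t n = fread(buf, 1, cap, f);
    fclose(f);
    return n;
}
```

In the embedded application the same pattern is expressed with the SD-card filesystem API instead of stdio, and the bytes are streamed into the CNN kernel memory.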
Face Detection
The face detection CNN model has 16 layers and takes a 168x224 RGB image as input.
Face Detection CNN:
SUMMARY OF OPS
Hardware: 589,595,888 ops (588,006,720 macc; 1,589,168 comp; 0 add; 0 mul; 0 bitwise)
Layer 0: 4,327,680 ops (4,064,256 macc; 263,424 comp; 0 add; 0 mul; 0 bitwise)
Layer 1: 11,063,808 ops (10,838,016 macc; 225,792 comp; 0 add; 0 mul; 0 bitwise)
Layer 2: 43,502,592 ops (43,352,064 macc; 150,528 comp; 0 add; 0 mul; 0 bitwise)
Layer 3: 86,854,656 ops (86,704,128 macc; 150,528 comp; 0 add; 0 mul; 0 bitwise)
Layer 4: 86,854,656 ops (86,704,128 macc; 150,528 comp; 0 add; 0 mul; 0 bitwise)
Layer 5: 86,854,656 ops (86,704,128 macc; 150,528 comp; 0 add; 0 mul; 0 bitwise)
Layer 6: 173,709,312 ops (173,408,256 macc; 301,056 comp; 0 add; 0 mul; 0 bitwise)
Layer 7 (backbone_conv8): 86,779,392 ops (86,704,128 macc; 75,264 comp; 0 add; 0 mul; 0 bitwise)
Layer 8 (backbone_conv9): 5,513,088 ops (5,419,008 macc; 94,080 comp; 0 add; 0 mul; 0 bitwise)
Layer 9 (backbone_conv10): 1,312,640 ops (1,290,240 macc; 22,400 comp; 0 add; 0 mul; 0 bitwise)
Layer 10 (conv12_1): 647,360 ops (645,120 macc; 2,240 comp; 0 add; 0 mul; 0 bitwise)
Layer 11 (conv12_2): 83,440 ops (80,640 macc; 2,800 comp; 0 add; 0 mul; 0 bitwise)
Layer 12: 1,354,752 ops (1,354,752 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 13: 40,320 ops (40,320 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 14: 677,376 ops (677,376 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 15: 20,160 ops (20,160 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
RESOURCE USAGE
Weight memory: 275,184 bytes out of 442,368 bytes total (62%)
Bias memory: 536 bytes out of 2,048 bytes total (26%)
Before each run, the corresponding CNN weights, biases, and configuration are loaded into the CNN engine.
// Power off CNN after unloading result to clear all CNN registers
// It is needed to load and run other CNN models
cnn_disable();
// Enable CNN peripheral, enable CNN interrupt, turn on CNN clock
// CNN clock: 50MHz div 1
cnn_enable(MXC_S_GCR_PCLKDIV_CNNCLKSEL_PCLK, MXC_S_GCR_PCLKDIV_CNNCLKDIV_DIV1);
/* Configure CNN_1 to detect a face */
cnn_1_init(); // Bring CNN state machine into consistent state
cnn_1_load_weights(); // Load CNN kernels
cnn_1_load_bias(); // Load CNN bias
cnn_1_configure(); // Configure CNN state machine
The output of the face detection CNN model is the coordinates of bounding boxes along with their confidence scores. A non-maximum suppression (NMS) algorithm selects the bounding box with the highest confidence score, which is then shown on the TFT display.
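A minimal greedy NMS sketch is shown below (the box layout, helper names, and IoU threshold are illustrative assumptions; the application's actual implementation may differ):

```c
typedef struct {
    float x1, y1, x2, y2; /* box corners */
    float score;          /* confidence score */
} box_t;

/* Intersection-over-union of two boxes. */
static float iou(const box_t *a, const box_t *b)
{
    float ix1 = a->x1 > b->x1 ? a->x1 : b->x1;
    float iy1 = a->y1 > b->y1 ? a->y1 : b->y1;
    float ix2 = a->x2 < b->x2 ? a->x2 : b->x2;
    float iy2 = a->y2 < b->y2 ? a->y2 : b->y2;
    float iw = ix2 - ix1, ih = iy2 - iy1;
    if (iw <= 0.0f || ih <= 0.0f) {
        return 0.0f;
    }
    float inter  = iw * ih;
    float area_a = (a->x2 - a->x1) * (a->y2 - a->y1);
    float area_b = (b->x2 - b->x1) * (b->y2 - b->y1);
    return inter / (area_a + area_b - inter);
}

/* Greedy NMS: keep[i] = 1 for boxes that survive suppression.
 * Lower-scoring boxes overlapping a kept box above the threshold are dropped. */
static void nms(const box_t *boxes, int n, float iou_thresh, int *keep)
{
    for (int i = 0; i < n; i++) keep[i] = 1;
    for (int i = 0; i < n; i++) {
        if (!keep[i]) continue;
        for (int j = 0; j < n; j++) {
            if (j == i || !keep[j]) continue;
            if (boxes[j].score <= boxes[i].score &&
                iou(&boxes[i], &boxes[j]) > iou_thresh) {
                keep[j] = 0;
            }
        }
    }
}
```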
If the face detection CNN model detects a face, the rectangular sub-image containing exactly one face is resized to a 112x112 RGB image to match the input of the face identification CNN model.
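The resize step can be sketched with a simple nearest-neighbor resampler over interleaved RGB888 data (an illustrative sketch; the application's actual resampling method is not specified here):

```c
#include <stdint.h>

/* Nearest-neighbor resize of an interleaved RGB888 image
 * from sw x sh to dw x dh (e.g. a face crop to 112x112). */
static void resize_rgb(const uint8_t *src, int sw, int sh,
                       uint8_t *dst, int dw, int dh)
{
    for (int y = 0; y < dh; y++) {
        int sy = y * sh / dh;            /* nearest source row */
        for (int x = 0; x < dw; x++) {
            int sx = x * sw / dw;        /* nearest source column */
            const uint8_t *p = src + 3 * (sy * sw + sx);
            uint8_t *q = dst + 3 * (y * dw + x);
            q[0] = p[0];
            q[1] = p[1];
            q[2] = p[2];
        }
    }
}
```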
Face Identification
The face identification CNN model has 17 layers and takes a 112x112 RGB image as input.
Face Identification CNN:
SUMMARY OF OPS
Hardware: 199,784,640 ops (198,019,072 macc; 1,746,752 comp; 18,816 add; 0 mul; 0 bitwise)
Layer 0: 11,239,424 ops (10,838,016 macc; 401,408 comp; 0 add; 0 mul; 0 bitwise)
Layer 1: 29,403,136 ops (28,901,376 macc; 501,760 comp; 0 add; 0 mul; 0 bitwise)
Layer 2: 58,003,456 ops (57,802,752 macc; 200,704 comp; 0 add; 0 mul; 0 bitwise)
Layer 3: 21,876,736 ops (21,676,032 macc; 200,704 comp; 0 add; 0 mul; 0 bitwise)
Layer 4: 7,375,872 ops (7,225,344 macc; 150,528 comp; 0 add; 0 mul; 0 bitwise)
Layer 5: 21,826,560 ops (21,676,032 macc; 150,528 comp; 0 add; 0 mul; 0 bitwise)
Layer 6: 1,630,720 ops (1,605,632 macc; 25,088 comp; 0 add; 0 mul; 0 bitwise)
Layer 7: 14,450,688 ops (14,450,688 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 8: 0 ops (0 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 9: 12,544 ops (0 macc; 0 comp; 12,544 add; 0 mul; 0 bitwise)
Layer 10: 3,261,440 ops (3,211,264 macc; 50,176 comp; 0 add; 0 mul; 0 bitwise)
Layer 11: 10,888,192 ops (10,838,016 macc; 50,176 comp; 0 add; 0 mul; 0 bitwise)
Layer 12: 912,576 ops (903,168 macc; 9,408 comp; 0 add; 0 mul; 0 bitwise)
Layer 13: 10,838,016 ops (10,838,016 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 14: 809,088 ops (802,816 macc; 6,272 comp; 0 add; 0 mul; 0 bitwise)
Layer 15: 7,225,344 ops (7,225,344 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 16: 22,656 ops (16,384 macc; 0 comp; 6,272 add; 0 mul; 0 bitwise)
Layer 17: 8,192 ops (8,192 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
RESOURCE USAGE
Weight memory: 365,408 bytes out of 442,368 bytes total (82.6%)
Bias memory: 1,296 bytes out of 2,048 bytes total (63.3%)
Before loading the configuration, weights, and biases of the face identification CNN, the CNN engine state machine and memory must be cleared so that they are not affected by the previous CNN model run. One way to clear them is to power off the CNN engine by calling the cnn_disable() function.
// Power off CNN after unloading result to clear all CNN registers
// It is needed to load and run other CNN models
cnn_disable();
// Enable CNN peripheral, enable CNN interrupt, and turn on CNN clock
// CNN clock: 50MHz div 1
cnn_enable(MXC_S_GCR_PCLKDIV_CNNCLKSEL_PCLK, MXC_S_GCR_PCLKDIV_CNNCLKDIV_DIV1);
/* Configure CNN_2 to recognize a face */
cnn_2_init(); // Bring CNN state machine into consistent state
cnn_2_load_weights_from_SD(); // Load CNN kernels from SD card
cnn_2_load_bias(); // Reload CNN bias
cnn_2_configure(); // Configure CNN state machine
The output of the face identification CNN model is an embedding of length 64 corresponding to the face image. Before the embedding is fed to the dot-product model as input, it is L2-normalized.
Dot Product
The dot-product model has a single linear layer and takes an embedding of length 64 as input.
Dot Product CNN:
SUMMARY OF OPS
Hardware: 65,536 ops (65,536 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 0: 65,536 ops (65,536 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
RESOURCE USAGE
Weight memory: 65,536 bytes out of 442,368 bytes total (14.8%)
Bias memory: 0 bytes out of 2,048 bytes total (0.0%)
Likewise, before loading the configuration, weights, and biases of the dot-product CNN, the CNN engine state machine and memory must be cleared.
// Power off CNN after unloading result to clear all CNN registers
// It is needed to load and run other CNN models
cnn_disable();
// Enable CNN peripheral, enable CNN interrupt, turn on CNN clock
// CNN clock: 50MHz div 1
cnn_enable(MXC_S_GCR_PCLKDIV_CNNCLKSEL_PCLK, MXC_S_GCR_PCLKDIV_CNNCLKDIV_DIV1);
/* Configure CNN_3 for dot product */
cnn_3_init(); // Bring CNN state machine into consistent state
cnn_3_load_weights(); // Load CNN kernels
cnn_3_load_bias(); // Reload CNN bias
cnn_3_configure(); // Configure CNN state machine
The output of the dot-product model is 1024 dot-product similarities, each representing the similarity between the given face and a face recorded in the database. If the maximum of the dot-product similarities exceeds a threshold, the subject with the maximum similarity is reported as the identified face and shown on the TFT display. Otherwise, the output is "Unknown".
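This final decision can be sketched as an argmax followed by a threshold check (the helper name and the -1 "Unknown" convention are illustrative assumptions):

```c
/* Pick the subject with the highest dot-product similarity, or return -1
 * ("Unknown") if the best similarity does not exceed the threshold. */
static int classify(const float *sims, int n, float thresh)
{
    int best = 0;
    for (int i = 1; i < n; i++) {
        if (sims[i] > sims[best]) {
            best = i;
        }
    }
    return sims[best] > thresh ? best : -1;
}
```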
MAX78002 Face Recognition Application
The face recognition application [5] runs on the MAX78002 evaluation (EV) kit board [6].
As in the MAX78000 application, the face detection, face identification, and dot-product models are executed sequentially. However, the MAX78002 has a larger internal flash that can store the weights of all the models.
This application loads all weights and layer configurations only once at initialization; only the biases are reloaded for each inference. With this approach, the overhead of switching models is greatly reduced.
To keep all the models in CNN memory at the same time, the layer offsets must be arranged so that the models follow one another. In this example, the layer offsets are arranged as shown in Table 1.
| CNN Model | Start Layer | End Layer |
| --- | --- | --- |
| Face Identification | 0 | 72 |
| Dot Product | 73 | 73 |
| Face Detection | 74 | 89 |
When using this approach, note that the MAX78002 supports a maximum of 128 layers; this limit must not be exceeded.
During synthesis, if a model's start layer is not 0, passthrough layers must be added before the model's layers in the network.yaml file. Example network.yaml files can be found in the AI8x-Synthesis repository [7].
Another consideration for this application is the arrangement of the weight offsets. The MAX78002 has 64 parallel processors, each with 4096 CNN kernels that store nine 8-bit parameters each. When arranging the weight offsets, the kernel memory usage of the preceding model must be taken into account.
In this example, the weight offsets are arranged as shown in Table 2.
| CNN Model | Weight Offset | Kernel Count |
| --- | --- | --- |
| Face Identification | 0 | 1580 |
| Dot Product | 2000 | 114 |
| Face Detection | 2500 | |
During synthesis, the "--start-layer" and "--weight-start" arguments can be used to add the passthrough layers and weight offsets. Example synthesis scripts can be found in the AI8x-Synthesis repository [7].
At initialization, all the model weights and configurations are loaded.
cnn_1_enable(MXC_S_GCR_PCLKDIV_CNNCLKSEL_IPLL, MXC_S_GCR_PCLKDIV_CNNCLKDIV_DIV4);
cnn_1_init(); // Bring CNN state machine into consistent state
cnn_1_load_weights(); // Load kernels of CNN_1
cnn_1_configure(); // Configure CNN_1 layers
cnn_2_load_weights(); // Load kernels of CNN_2
cnn_2_configure(); // Configure CNN_2 layers
cnn_3_load_weights(); // Load kernels of CNN_3
cnn_3_configure(); // Configure CNN_3 layers
Face Detection
The face detection CNN model has 16 layers and takes a 168x224 RGB image as input.
Face Detection CNN:
SUMMARY OF OPS
Hardware: 589,595,888 ops (588,006,720 macc; 1,589,168 comp; 0 add; 0 mul; 0 bitwise)
Layer 74: 4,327,680 ops (4,064,256 macc; 263,424 comp; 0 add; 0 mul; 0 bitwise)
Layer 75: 11,063,808 ops (10,838,016 macc; 225,792 comp; 0 add; 0 mul; 0 bitwise)
Layer 76: 43,502,592 ops (43,352,064 macc; 150,528 comp; 0 add; 0 mul; 0 bitwise)
Layer 77: 86,854,656 ops (86,704,128 macc; 150,528 comp; 0 add; 0 mul; 0 bitwise)
Layer 78: 86,854,656 ops (86,704,128 macc; 150,528 comp; 0 add; 0 mul; 0 bitwise)
Layer 79: 86,854,656 ops (86,704,128 macc; 150,528 comp; 0 add; 0 mul; 0 bitwise)
Layer 80: 173,709,312 ops (173,408,256 macc; 301,056 comp; 0 add; 0 mul; 0 bitwise)
Layer 81 (backbone_conv8): 86,779,392 ops (86,704,128 macc; 75,264 comp; 0 add; 0 mul; 0 bitwise)
Layer 82 (backbone_conv9): 5,513,088 ops (5,419,008 macc; 94,080 comp; 0 add; 0 mul; 0 bitwise)
Layer 83 (backbone_conv10): 1,312,640 ops (1,290,240 macc; 22,400 comp; 0 add; 0 mul; 0 bitwise)
Layer 84 (conv12_1): 647,360 ops (645,120 macc; 2,240 comp; 0 add; 0 mul; 0 bitwise)
Layer 85 (conv12_2): 83,440 ops (80,640 macc; 2,800 comp; 0 add; 0 mul; 0 bitwise)
Layer 86: 1,354,752 ops (1,354,752 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 87: 40,320 ops (40,320 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 88: 677,376 ops (677,376 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 89: 20,160 ops (20,160 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
RESOURCE USAGE
Weight memory: 275,184 bytes out of 2,396,160 bytes total (11%)
Bias memory: 536 bytes out of 8,192 bytes total (7%)
Since the application starts with the face detection model, the corresponding bias values are loaded into the CNN engine before each run. Then the CNN_1 and FIFO configuration for face detection is applied.
cnn_1_load_bias(); // Load bias data of CNN_1
// Bring CNN_1 state machine of Face Detection model into consistent state
*((volatile uint32_t *) 0x51000000) = 0x00100008; // Stop SM
*((volatile uint32_t *) 0x51000008) = 0x00004a59; // Layer count
*((volatile uint32_t *) 0x52000000) = 0x00100008; // Stop SM
*((volatile uint32_t *) 0x52000008) = 0x00004a59; // Layer count
*((volatile uint32_t *) 0x53000000) = 0x00100008; // Stop SM
*((volatile uint32_t *) 0x53000008) = 0x00004a59; // Layer count
*((volatile uint32_t *) 0x54000000) = 0x00100008; // Stop SM
*((volatile uint32_t *) 0x54000008) = 0x00004a59; // Layer count
// Disable FIFO control
*((volatile uint32_t *) 0x50000000) = 0x00000000;
As in the MAX78000 application, the output of the face detection CNN model is a set of bounding-box coordinates and the corresponding confidence scores. A non-maximum suppression (NMS) algorithm selects the bounding box with the highest confidence score, which is then shown on the TFT display.
If the face detection CNN model detects a face, the rectangular sub-image containing exactly one face is selected and resized to a 112x112 RGB image to match the input of the face identification CNN model.
Face Identification
The face identification CNN model has 73 layers and takes a 112x112 RGB image as input.
Face Identification CNN:
SUMMARY OF OPS
Hardware: 445,470,720 ops (440,252,416 macc; 4,848,256 comp; 370,048 add; 0 mul; 0 bitwise)
Layer 0: 22,478,848 ops (21,676,032 macc; 802,816 comp; 0 add; 0 mul; 0 bitwise)
Layer 1: 2,809,856 ops (1,806,336 macc; 1,003,520 comp; 0 add; 0 mul; 0 bitwise)
Layer 2: 231,612,416 ops (231,211,008 macc; 401,408 comp; 0 add; 0 mul; 0 bitwise)
Layer 3: 1,404,928 ops (903,168 macc; 501,760 comp; 0 add; 0 mul; 0 bitwise)
Layer 4: 6,422,528 ops (6,422,528 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 5: 6,522,880 ops (6,422,528 macc; 100,352 comp; 0 add; 0 mul; 0 bitwise)
Layer 6: 1,003,520 ops (903,168 macc; 100,352 comp; 0 add; 0 mul; 0 bitwise)
Layer 7: 6,422,528 ops (6,422,528 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 8: 0 ops (0 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 9: 50,176 ops (0 macc; 0 comp; 50,176 add; 0 mul; 0 bitwise)
Layer 10: 6,522,880 ops (6,422,528 macc; 100,352 comp; 0 add; 0 mul; 0 bitwise)
Layer 11: 1,003,520 ops (903,168 macc; 100,352 comp; 0 add; 0 mul; 0 bitwise)
Layer 12: 6,422,528 ops (6,422,528 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 13: 0 ops (0 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 14: 50,176 ops (0 macc; 0 comp; 50,176 add; 0 mul; 0 bitwise)
Layer 15: 6,522,880 ops (6,422,528 macc; 100,352 comp; 0 add; 0 mul; 0 bitwise)
Layer 16: 1,003,520 ops (903,168 macc; 100,352 comp; 0 add; 0 mul; 0 bitwise)
Layer 17: 6,422,528 ops (6,422,528 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 18: 0 ops (0 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 19: 50,176 ops (0 macc; 0 comp; 50,176 add; 0 mul; 0 bitwise)
Layer 20: 6,522,880 ops (6,422,528 macc; 100,352 comp; 0 add; 0 mul; 0 bitwise)
Layer 21: 1,003,520 ops (903,168 macc; 100,352 comp; 0 add; 0 mul; 0 bitwise)
Layer 22: 6,422,528 ops (6,422,528 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 23: 0 ops (0 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 24: 50,176 ops (0 macc; 0 comp; 50,176 add; 0 mul; 0 bitwise)
Layer 25: 13,045,760 ops (12,845,056 macc; 200,704 comp; 0 add; 0 mul; 0 bitwise)
Layer 26: 702,464 ops (451,584 macc; 250,880 comp; 0 add; 0 mul; 0 bitwise)
Layer 27: 6,422,528 ops (6,422,528 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 28: 6,472,704 ops (6,422,528 macc; 50,176 comp; 0 add; 0 mul; 0 bitwise)
Layer 29: 501,760 ops (451,584 macc; 50,176 comp; 0 add; 0 mul; 0 bitwise)
Layer 30: 6,422,528 ops (6,422,528 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 31: 0 ops (0 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 32: 25,088 ops (0 macc; 0 comp; 25,088 add; 0 mul; 0 bitwise)
Layer 33: 6,472,704 ops (6,422,528 macc; 50,176 comp; 0 add; 0 mul; 0 bitwise)
Layer 34: 501,760 ops (451,584 macc; 50,176 comp; 0 add; 0 mul; 0 bitwise)
Layer 35: 6,422,528 ops (6,422,528 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 36: 0 ops (0 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 37: 25,088 ops (0 macc; 0 comp; 25,088 add; 0 mul; 0 bitwise)
Layer 38: 6,472,704 ops (6,422,528 macc; 50,176 comp; 0 add; 0 mul; 0 bitwise)
Layer 39: 501,760 ops (451,584 macc; 50,176 comp; 0 add; 0 mul; 0 bitwise)
Layer 40: 6,422,528 ops (6,422,528 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 41: 0 ops (0 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 42: 25,088 ops (0 macc; 0 comp; 25,088 add; 0 mul; 0 bitwise)
Layer 43: 6,472,704 ops (6,422,528 macc; 50,176 comp; 0 add; 0 mul; 0 bitwise)
Layer 44: 501,760 ops (451,584 macc; 50,176 comp; 0 add; 0 mul; 0 bitwise)
Layer 45: 6,422,528 ops (6,422,528 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 46: 0 ops (0 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 47: 25,088 ops (0 macc; 0 comp; 25,088 add; 0 mul; 0 bitwise)
Layer 48: 6,472,704 ops (6,422,528 macc; 50,176 comp; 0 add; 0 mul; 0 bitwise)
Layer 49: 501,760 ops (451,584 macc; 50,176 comp; 0 add; 0 mul; 0 bitwise)
Layer 50: 6,422,528 ops (6,422,528 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 51: 0 ops (0 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 52: 25,088 ops (0 macc; 0 comp; 25,088 add; 0 mul; 0 bitwise)
Layer 53: 6,472,704 ops (6,422,528 macc; 50,176 comp; 0 add; 0 mul; 0 bitwise)
Layer 54: 501,760 ops (451,584 macc; 50,176 comp; 0 add; 0 mul; 0 bitwise)
Layer 55: 6,422,528 ops (6,422,528 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 56: 0 ops (0 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 57: 25,088 ops (0 macc; 0 comp; 25,088 add; 0 mul; 0 bitwise)
Layer 58: 12,945,408 ops (12,845,056 macc; 100,352 comp; 0 add; 0 mul; 0 bitwise)
Layer 59: 351,232 ops (225,792 macc; 125,440 comp; 0 add; 0 mul; 0 bitwise)
Layer 60: 3,211,264 ops (3,211,264 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 61: 1,618,176 ops (1,605,632 macc; 12,544 comp; 0 add; 0 mul; 0 bitwise)
Layer 62: 125,440 ops (112,896 macc; 12,544 comp; 0 add; 0 mul; 0 bitwise)
Layer 63: 1,605,632 ops (1,605,632 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 64: 0 ops (0 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 65: 6,272 ops (0 macc; 0 comp; 6,272 add; 0 mul; 0 bitwise)
Layer 66: 1,618,176 ops (1,605,632 macc; 12,544 comp; 0 add; 0 mul; 0 bitwise)
Layer 67: 125,440 ops (112,896 macc; 12,544 comp; 0 add; 0 mul; 0 bitwise)
Layer 68: 1,605,632 ops (1,605,632 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 69: 0 ops (0 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 70: 6,272 ops (0 macc; 0 comp; 6,272 add; 0 mul; 0 bitwise)
Layer 71: 809,088 ops (802,816 macc; 6,272 comp; 0 add; 0 mul; 0 bitwise)
Layer 72: 14,464 ops (8,192 macc; 0 comp; 6,272 add; 0 mul; 0 bitwise)
RESOURCE USAGE
Weight memory: 909,952 bytes out of 2,396,160 bytes total (38.0%)
Bias memory: 7,296 bytes out of 8,192 bytes total (89.1%)
To run the face identification model, the corresponding bias values are loaded into the CNN before each run. Then the CNN_2 and FIFO configuration for face identification is applied.
cnn_2_load_bias(); // Load bias data of CNN_2
// Bring CNN_2 state machine of Face ID model into consistent state
*((volatile uint32_t *) 0x51000000) = 0x00108008; // Stop SM
*((volatile uint32_t *) 0x51000008) = 0x00000048; // Layer count
*((volatile uint32_t *) 0x52000000) = 0x00108008; // Stop SM
*((volatile uint32_t *) 0x52000008) = 0x00000048; // Layer count
*((volatile uint32_t *) 0x53000000) = 0x00108008; // Stop SM
*((volatile uint32_t *) 0x53000008) = 0x00000048; // Layer count
*((volatile uint32_t *) 0x54000000) = 0x00108008; // Stop SM
*((volatile uint32_t *) 0x54000008) = 0x00000048; // Layer count
// Enable FIFO control
*((volatile uint32_t *) 0x50000000) = 0x00001108; // FIFO control
The output of the face identification CNN model is an embedding of length 64 corresponding to the input face image. Before the embedding is fed to the dot-product model as input, it is L2-normalized.
Dot Product
The dot-product model has a single linear layer and takes an embedding of length 64 as input.
Dot Product CNN:
SUMMARY OF OPS
Hardware: 65,536 ops (65,536 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
Layer 73: 65,536 ops (65,536 macc; 0 comp; 0 add; 0 mul; 0 bitwise)
RESOURCE USAGE
Weight memory: 65,536 bytes out of 2,396,160 bytes total (2.7%)
Bias memory: 0 bytes out of 8,192 bytes total (0.0%)
To run the dot-product model, the corresponding bias values are loaded into the CNN engine before each run. Then the CNN_3 and FIFO configuration for the dot product is applied.
cnn_3_load_bias(); // Load bias data of CNN_3
//Dot product CNN state machine configuration
*((volatile uint32_t *) 0x51000000) = 0x00100008; // Stop SM
*((volatile uint32_t *) 0x51000008) = 0x00004949; // Layer count
*((volatile uint32_t *) 0x52000000) = 0x00100008; // Stop SM
*((volatile uint32_t *) 0x52000008) = 0x00004949; // Layer count
*((volatile uint32_t *) 0x53000000) = 0x00100008; // Stop SM
*((volatile uint32_t *) 0x53000008) = 0x00004949; // Layer count
*((volatile uint32_t *) 0x54000000) = 0x00100008; // Stop SM
*((volatile uint32_t *) 0x54000008) = 0x00004949; // Layer count
// Disable FIFO control
*((volatile uint32_t *) 0x50000000) = 0x00000000;
The output of the dot-product model is 1024 dot-product similarities, each representing the similarity between the given face and a face recorded in the database. If the maximum of the dot-product similarities exceeds a threshold, the subject with the maximum similarity is reported as the identified face and shown on the TFT display. Otherwise, the output is "Unknown".
Adding a New Subject's Image
The application allows new subjects to be added to the database. Pressing the "Record" button on the touchscreen allows the subject's name to be entered.
The next step is to capture the subject's face with the camera. Pressing the "OK" button adds the captured image to the database, and pressing the "Retry" button retakes the picture.
The application computes the embedding from the new subject's face and stores it in the database in internal flash. The dot-product CNN model uses the embedding database to identify subjects and make the final decision.
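One possible in-memory layout for such a database record is sketched below (the record fields, the name-length limit, and the db_add helper are illustrative assumptions, not the application's actual flash layout):

```c
#include <stdint.h>
#include <string.h>

#define EMB_LEN      64   /* embedding length produced by the FaceID model */
#define NAME_LEN     16   /* illustrative name-length limit */
#define MAX_SUBJECTS 1024 /* matches the 1024 dot-product outputs */

/* One database record: subject name plus the signed 8-bit embedding that
 * the dot-product model multiplies against. */
typedef struct {
    char   name[NAME_LEN];
    int8_t embedding[EMB_LEN];
} subject_t;

static subject_t db[MAX_SUBJECTS];
static int db_count;

/* Append a new subject; returns its index, or -1 when the DB is full. */
static int db_add(const char *name, const int8_t *emb)
{
    if (db_count >= MAX_SUBJECTS) {
        return -1;
    }
    strncpy(db[db_count].name, name, NAME_LEN - 1);
    db[db_count].name[NAME_LEN - 1] = '\0';
    memcpy(db[db_count].embedding, emb, EMB_LEN);
    return db_count++;
}
```

In the real application the records persist in internal flash rather than RAM, but the pairing of a name with a 64-byte embedding is the same idea.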
Conclusion
With their ultra-low-power CNN inference engines, the MAX78000/MAX78002 microcontrollers are well suited to battery-powered IoT applications. The MAX78000 [1] and MAX78002 [2] AI microcontrollers support running multiple models, enabling very complex applications in an energy-efficient way.
References
[1] MAX78000 data sheet
[2] MAX78002 data sheet



