
[논문 리뷰] Effective Adapter for Face Recognition in the Wild

by 꿀개 2024. 4. 30.


 

https://arxiv.org/abs/2312.01734

 


 

Main Idea of the Paper

process both the unrefined and enhanced images using two similar structures, one fixed and the other trainable. 

→ minimizes the domain gap while providing varied perspectives for the face recognition model, where the enhanced image can be regarded as a complex non-linear transformation of the original one by the restoration model.

By processing the original and enhanced images with two structures, the method
gives the face recognition model varied perspectives while minimizing the domain gap.

 

1. Introduction

network overview

Our method (c) addresses this challenge by integrating the features from the LQ images with those of enhanced HQ images in the fusion structure.

 

● one possible solution: face restoration

- may lose vital information from the original images

- does not consistently achieve perfect recovery

 

Therefore,

retaining the original low-quality images in conjunction with the enhanced ones is essential, ensuring that the model stays in the original image domain encountered in real-world settings while gaining more explicit features from the enhanced images.

 

an adapter design integrated with a pre-trained face recognition model.

1. the adapter processes the restored images, and the frozen, pre-trained face recognition model handles the original low-quality images.

2. After that, a novel fusion module ensembles the features of the two views via nested Cross- and Self-Attention mechanisms.

3. Last, the fused features are used for similarity calculation.
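The three steps above can be sketched end-to-end. This is a toy NumPy illustration with made-up shapes and a trivial averaging stand-in for the fusion module, not the paper's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_lq_encoder(x):
    # Stand-in for the frozen pre-trained recognition model
    # that embeds the original low-quality image.
    W_frozen = np.full((x.size, 128), 0.01)  # fixed (never updated)
    return x.reshape(-1) @ W_frozen

def hq_adapter(x, W):
    # Stand-in for the trainable adapter that embeds the restored image;
    # W is the only part that would be updated during training.
    return x.reshape(-1) @ W

def fuse(f_lq, f_hq):
    # Placeholder for the nested Cross-/Self-Attention fusion module:
    # simply averaged here so the sketch stays self-contained.
    return 0.5 * (f_lq + f_hq)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

lq = rng.standard_normal((8, 8))             # toy "low-quality image"
hq = lq + 0.1 * rng.standard_normal((8, 8))  # toy "restored image"
W_adapter = rng.standard_normal((64, 128))

fused = fuse(frozen_lq_encoder(lq), hq_adapter(hq, W_adapter))
print(cosine_similarity(fused, fused))  # ≈ 1.0 for identical embeddings
```

The fused embedding is then compared against gallery embeddings by cosine similarity, as in step 3.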

 

The key to our face recognition framework is an adapter design that can be initialized from the model pre-trained on high-quality images. It allows the model to adapt to low-quality images quickly without training from scratch. In this way, the adapter design keeps the original performance on low-quality images as the lower bound.

 

3. Approach

The motivation of the adapter design is to utilize images enhanced by the face restoration model.

However, the enhancement model is unstable.

The design therefore focuses on an adapter that receives the enhanced images as another input source to complement the information missing in the original ones.

 

 Baseline Model

: ArcFace with a ResNet50 backbone

Framework

 

4. Experiments

To simulate the degradation of the real-world environment, we incorporate a simulation tool, TurbulenceSim [12], to apply different levels of degradation to our training and testing datasets.
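TurbulenceSim itself is a dedicated atmospheric-turbulence simulator; as a rough illustration of what "applying a level of degradation" means, here is a generic blur-plus-noise stand-in (not the actual TurbulenceSim algorithm, and the parameters are made up):

```python
import numpy as np

def degrade(img, sigma_noise=0.05, blur=1):
    # Generic stand-in for image degradation: repeated 3-tap box
    # blur along each axis, followed by additive Gaussian noise.
    # `blur` and `sigma_noise` act as the "degradation level".
    out = img.copy()
    for _ in range(blur):
        out = (np.roll(out, 1, 0) + out + np.roll(out, -1, 0)) / 3
        out = (np.roll(out, 1, 1) + out + np.roll(out, -1, 1)) / 3
    rng = np.random.default_rng(0)
    return out + sigma_noise * rng.standard_normal(out.shape)

img = np.ones((8, 8))
print(degrade(img, sigma_noise=0.1, blur=2).shape)  # (8, 8)
```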

 

4.2 Implementation Details

We use the pre-trained ArcFace [15] backbone to generate embeddings for gallery images. For the probe images, we use our framework to generate the embeddings.

We use the CodeFormer [59] model with the fidelity weight w = 0.5 to generate HQ facial images for the HQ Branch.

 

Table 1. Verification results

 

The LQ Branch remains frozen throughout this process, while the HQ Branch and the Fusion Structure, including Self-Attention, Cross-Attention, and the Feed Forward Network, are actively trained.
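The trainable/frozen split can be expressed as a simple mask over the components; the names below are illustrative, not the paper's actual module names:

```python
# Which components receive gradient updates, mirroring the training
# setup described above (LQ Branch frozen, everything else trained).
trainable = {
    "lq_branch": False,             # frozen pre-trained recognition model
    "hq_branch": True,              # adapter on restored images
    "fusion.self_attention": True,
    "fusion.cross_attention": True,
    "fusion.feed_forward": True,
}

def parameters_to_update(model_parts):
    # Return only the components the optimizer should touch.
    return [name for name, is_trainable in model_parts.items() if is_trainable]

print(parameters_to_update(trainable))
```

In a framework like PyTorch this corresponds to setting `requires_grad = False` on the LQ Branch and passing only the remaining parameters to the optimizer.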

 

There is an improvement in the models' ability to recognize faces in LQ images.

This improvement is attributed to our model’s capability to effectively process and combine features from LQ and HQ images,

thereby enriching the existing face recognition models with an increased capacity to handle image quality degradation.

 

4.4 Ablation Study

Using RestoreFormer as the restoration model in our method gives the best face recognition accuracy.

In all settings, ArcFace [15] is used as the pre-trained backbone.

 

Table 4. Verification results using four different face recognition strategies on several datasets.

 

As shown in Table 4, the experimental results show that performance drops significantly when directly using the restored images as the input for the pretrained model, which indicates there is a significant domain gap between the original and restored images.

Fine-tuning the model on restored images does improve performance, as the model learns the properties of the restored images.

However, the domain gap problem still exists because the model lacks information about the original images after training.

Our fusion structure approach, which combines features extracted from the original and restored images, significantly outperforms the other methods.

 

The residual structure is designed to integrate the embedding from the LQ Branch with the fused feature obtained from our Fusion Structure.

 

Cascade structure.

We also explore the design of the cascade structure, which involves repeating our feature fusion method in successive stages. 

 

Table 6. Verification performance using different cascade layers of the fusion structure on several datasets.

 

Experimental results:

cascading the fusion process three or five times does not improve results,

because

increasing the complexity with additional fusion stages does not necessarily benefit the face recognition system; a single-stage fusion is sufficient for optimal performance.
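The cascade idea amounts to re-running the fusion with the previous output fed back in. A toy sketch (averaging in place of the real attention-based fusion; all values are made up) shows how repeated stages change the fused feature:

```python
import numpy as np

def fuse_once(f_lq, f_hq):
    # Toy stand-in for one fusion stage (the real one uses
    # nested Cross- and Self-Attention).
    return 0.5 * (f_lq + f_hq)

def cascade_fusion(f_lq, f_hq, stages=1):
    # Repeat the fusion, feeding the fused feature back in as the
    # "HQ side" of the next stage; stages=1 matches the paper's
    # finding that a single stage suffices.
    fused = f_hq
    for _ in range(stages):
        fused = fuse_once(f_lq, fused)
    return fused

f_lq = np.ones(4)
f_hq = np.zeros(4)
print(cascade_fusion(f_lq, f_hq, stages=1))  # [0.5 0.5 0.5 0.5]
print(cascade_fusion(f_lq, f_hq, stages=3))  # drifts toward f_lq
```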

 

Orders of Cross and Self-Attention mechanisms.


 

Fig. 4 shows two different orders of Self-Attention and Cross-Attention within our fusion structure, which processes parallel lines of LQ and HQ features.

 

- Self-Attention First

: We first pass the features through the Self-Attention layer independently. This is followed by the Cross-Attention layer to fuse the features from the two branches. Next, we apply an additional Cross-Attention layer to integrate the final fusion feature.

As shown in Fig. 9 (b), Self-Attention is used to process the input features. By modeling and associating these features, the mechanism captures the global contextual information of the image, enabling the model to consider the entire image's context. Consequently, richer features essential for effective face recognition are extracted, improving the accuracy of the system.

– Cross-Attention First (Ours)

: We first pass the features through the Cross-Attention layer to generate the fused features. We then pass the fused features through the Self-Attention layer to enhance their representation. Next, we apply an additional Cross-Attention layer to integrate the final fusion feature.

As shown in Fig. 9 (a), Cross-Attention is designed to integrate features from both the restored and original image streams. This integration is essential for the model to effectively understand and interpret faces in varying conditions. By aligning and correlating features from the dual inputs, Cross-Attention ensures that the combined feature set is comprehensive, enhancing the face recognition system's ability to adapt to diverse image qualities.
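The Cross-Attention-First ordering can be sketched with plain scaled dot-product attention in NumPy (single head, no learned projections; the token count and dimension are made up):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention, single head, no projections,
    # to keep the sketch minimal.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def cross_attention_first(f_lq, f_hq):
    # 1) Cross-Attention: LQ tokens query the HQ tokens to fuse the views.
    fused = attention(f_lq, f_hq, f_hq)
    # 2) Self-Attention: refine the fused representation.
    refined = attention(fused, fused, fused)
    # 3) An additional Cross-Attention integrates the final fusion
    #    feature with the original LQ tokens.
    return attention(refined, f_lq, f_lq)

rng = np.random.default_rng(0)
f_lq = rng.standard_normal((16, 32))  # 16 toy "tokens", dim 32
f_hq = rng.standard_normal((16, 32))
out = cross_attention_first(f_lq, f_hq)
print(out.shape)  # (16, 32)
```

Swapping the first two calls gives the Self-Attention-First variant compared above.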

 

Fig. 9. Detailed illustration of fusion structure components.

 

 

There is no discussion of why these results turn out this way....!!!!!!

 

5. Conclusion

In this paper, we introduce a novel adapter framework to enhance face recognition in real-world scenarios with low-quality images. Our approach leverages a frozen pre-trained model and a trainable adapter to bridge the gap between original and enhanced images. Specifically, the Fusion Structure integrates advanced nested Cross-Attention and Self-Attention mechanisms. Extensive experiments across multiple datasets show that our method significantly improves accuracy and reliability in face recognition compared to conventional techniques. This work sets a new standard in the field, offering a robust solution for varied applications and paving the way for future advancements in face recognition technologies. In the future, we plan to enhance our adapter framework to address more complex image quality issues, such as varying lighting and obstructions. We aim to adapt our approach for real-time applications, broadening its utility in fields like surveillance and mobile authentication.