[Dataset Survey] Audio-Visual Dataset Survey
LRS3
This dataset, introduced by Afouras et al., consists exclusively of real videos. It contains 5,594 videos spanning over 400 hours of TED and TEDx talks in English. The videos are processed so that every frame contains a face and the audio and visual streams are synchronized.
https://mmai.io/datasets/lip_reading/
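Because LRS3 keeps the audio and visual streams in sync, each video frame corresponds to a fixed slice of the audio track. The sketch below shows that mapping. It assumes 25 fps video and 16 kHz mono audio, which are common settings for this kind of data but are not stated in the excerpt above, so check the rates of your own copy first. `audio_window_for_frame` is a hypothetical helper.

```python
# ASSUMPTION: 25 fps video and 16 kHz mono audio (not stated in the excerpt).
VIDEO_FPS = 25
AUDIO_SR = 16_000


def audio_window_for_frame(frame_idx: int, fps: int = VIDEO_FPS, sr: int = AUDIO_SR) -> tuple[int, int]:
    """Return the [start, end) audio sample range aligned with video frame `frame_idx`."""
    if frame_idx < 0:
        raise ValueError("frame_idx must be non-negative")
    # Integer arithmetic keeps consecutive windows contiguous, with no rounding drift.
    start = frame_idx * sr // fps
    end = (frame_idx + 1) * sr // fps
    return start, end


if __name__ == "__main__":
    print(audio_window_for_frame(0))   # (0, 640)
    print(audio_window_for_frame(10))  # (6400, 7040)
```

At these assumed rates each frame spans 640 audio samples (40 ms). If the sample rate were not an exact multiple of the frame rate, the floor division would still tile the track without gaps or overlaps.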
FakeAVCeleb - Google Form required for access
The FakeAVCeleb dataset is a deepfake detection dataset consisting of 20,000 video clips in total: 500 real videos sampled from VoxCeleb2 and 19,500 deepfake samples generated by applying different manipulation methods to those real videos. It includes the following manipulation categories, with the deepfake algorithm(s) used in each category given in parentheses.
• RVFA: Real Visuals - Fake Audio (SV2TTS)
• FVRA-FS: Fake Visuals - Real Audio (FaceSwap)
• FVFA-FS: Fake Visuals - Fake Audio (SV2TTS + FaceSwap)
• FVFA-GAN: Fake Visuals - Fake Audio (SV2TTS + FaceSwapGAN)
• FVRA-GAN: Fake Visuals - Real Audio (FaceSwapGAN)
• FVRA-WL: Fake Visuals - Real Audio (Wav2Lip)
• FVFA-WL: Fake Visuals - Fake Audio (SV2TTS + Wav2Lip)
https://github.com/DASH-Lab/FakeAVCeleb
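The category codes above follow a fixed pattern (R/F for the visual stream, then R/F for the audio stream, then an optional method suffix), so per-modality labels can be derived from the code alone. The following is a minimal sketch under that reading; `AVLabel`, `parse_category`, and the `RVRA` code used here for real clips are hypothetical names, not part of the official FakeAVCeleb tooling. It also checks the stated clip counts, which imply a 39:1 fake-to-real imbalance.

```python
from typing import NamedTuple


class AVLabel(NamedTuple):
    """Per-modality ground truth for one clip (True = manipulated)."""
    visual_fake: bool
    audio_fake: bool

    @property
    def video_fake(self) -> bool:
        # A clip is treated as a deepfake if either modality is manipulated.
        return self.visual_fake or self.audio_fake


def parse_category(code: str) -> AVLabel:
    """Parse codes such as 'RVFA' or 'FVFA-GAN' into per-modality labels.

    The first four characters encode <R|F>V<R|F>A; any '-SUFFIX' names the method.
    """
    prefix = code.split("-")[0].upper()
    valid = (
        len(prefix) == 4
        and prefix[0] in ("R", "F")
        and prefix[1] == "V"
        and prefix[2] in ("R", "F")
        and prefix[3] == "A"
    )
    if not valid:
        raise ValueError(f"unrecognized category code: {code!r}")
    return AVLabel(visual_fake=prefix[0] == "F", audio_fake=prefix[2] == "F")


# Sanity-check the clip counts quoted in the text (500 real + 19,500 fake).
REAL_CLIPS, FAKE_CLIPS = 500, 19_500
assert REAL_CLIPS + FAKE_CLIPS == 20_000

if __name__ == "__main__":
    print(parse_category("FVRA-WL"))  # AVLabel(visual_fake=True, audio_fake=False)
    print(FAKE_CLIPS / REAL_CLIPS)    # 39.0
```

Keeping the two modality labels separate, rather than a single real/fake bit, makes it possible to report results per category (for example RVFA vs. FVRA) instead of only overall.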
KoDF - Google Form required for access
This is a large-scale dataset of real and synthetic videos of 400+ subjects speaking Korean. KoDF consists of 62K+ real videos and 175K+ fake videos synthesized using six algorithms: FaceSwap, DeepFaceLab, FaceSwapGAN, FOMM, ATFHP, and Wav2Lip. Following prior work, we use a subset of this dataset to evaluate the cross-dataset generalization performance of our model.
https://deepbrainai-research.github.io/kodf/
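KoDF is also heavily imbalanced (roughly 2.8 fake clips per real clip at the quoted counts). One generic way to compensate during training is inverse-frequency class weighting, w_c = N / (K · n_c). The sketch below applies it to the rounded lower bounds given above; this is a standard technique chosen for illustration, not something the excerpt says AVFF uses, and `balanced_class_weights` is a hypothetical helper.

```python
# Generic inverse-frequency weighting, w_c = N / (K * n_c); not taken from the AVFF paper.
# ASSUMPTION: counts are the rounded lower bounds quoted for KoDF (62K+ real, 175K+ fake).
def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each class so that every class contributes the same total weight."""
    if not counts or any(n <= 0 for n in counts.values()):
        raise ValueError("counts must be non-empty and positive")
    total = sum(counts.values())
    k = len(counts)
    # Rarer classes get proportionally larger weights.
    return {label: total / (k * n) for label, n in counts.items()}


weights = balanced_class_weights({"real": 62_000, "fake": 175_000})
print({label: round(w, 3) for label, w in weights.items()})  # {'real': 1.911, 'fake': 0.677}
```

With these weights each class contributes N/K = 118,500 in total, so the loss is no longer dominated by fake clips. scikit-learn's `compute_class_weight` with `class_weight="balanced"` uses the same formula.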
DF-TIMIT
The Deepfake TIMIT dataset comprises deepfake videos manipulated using FaceSwapGAN. The real videos used for manipulation were sourced by sampling similar-looking identities from the VidTIMIT dataset. We use its higher-quality (HQ) version, which consists of 320 videos, to evaluate cross-dataset generalization performance.
https://zenodo.org/records/4068245
DFDC
Besides FakeAVCeleb, the DeepFake Detection Challenge (DFDC) dataset is another deepfake dataset that includes samples with fake audio. It consists of over 100K video clips in total, generated using deepfake algorithms such as MM/NN Face Swap, NTH, FaceSwapGAN, StyleGAN, and TTS Skins. We use a subset of this dataset consisting of 3,215 videos, as used in [21, 22], to evaluate the model's cross-dataset generalization performance.
https://ai.meta.com/datasets/dfdc/
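KoDF, DF-TIMIT, and DFDC are all used above to evaluate cross-dataset generalization: a model trained on one dataset is scored on clips from datasets it never saw during training. The excerpt does not name the metric; ROC AUC is assumed here because it is threshold-free, so it is not affected when a detector's score scale shifts between datasets. The sketch computes AUC from detector scores via the Mann-Whitney formulation; the labels (1 = fake) and scores are toy values, and `roc_auc` is a hypothetical helper.

```python
# ASSUMPTION: ROC AUC as the cross-dataset metric; label 1 = fake, 0 = real.
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as the Mann-Whitney statistic: P(fake score > real score), ties count 1/2."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    fake = [s for y, s in zip(labels, scores) if y == 1]
    real = [s for y, s in zip(labels, scores) if y == 0]
    if not fake or not real:
        raise ValueError("need at least one real and one fake clip")
    wins = 0.0
    for f in fake:
        for r in real:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fake) * len(real))


if __name__ == "__main__":
    # Toy detector scores standing in for outputs on a held-out dataset.
    held_out = {"toy_held_out": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])}
    for name, (y, s) in held_out.items():
        print(name, roc_auc(y, s))  # toy_held_out 0.75
```

The pairwise loop is O(n·m), which is fine for a few thousand clips such as the 3,215-video DFDC subset; for larger sets, `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.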
* This content is excerpted from the paper "AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection", CVPR 2024.
https://arxiv.org/abs/2406.02951