Create an AI system that analyzes videos — recognizing scenes, faces, and speech — for automatic tagging or summarization.
Use CNNs for visual recognition and ...
Create an AI system that analyzes videos — recognizing scenes, faces, and speech — for automatic tagging or summarization.
Use CNNs for visual recognition and speech-to-text for audio.
Build a Python pipeline to detect key moments or highlights.
Evaluate model performance on real-world video datasets.
Tech stack: Python, OpenCV, TensorFlow/PyTorch, Whisper (speech recognition), FFmpeg.
