TDT4265 - Computer Vision and Deep Learning
About
Examination arrangement
Examination arrangement: Aggregate score
Grade: Letter grades
Evaluation | Weighting | Duration | Grade deviation | Examination aids
---|---|---|---|---
Assignment | 40/100 | | |
School exam | 60/100 | 4 hours | | D
Course content
Modern computer vision (CV), driven by deep learning (DL), increasingly known as visual intelligence (VI), allows machines to interpret and understand visual data. This technology, crucial today in fields like autonomous driving and medical image computing, is expected to revolutionize various industries by enabling more accurate and efficient visual analysis.
The course will cover the mathematical and computational foundations essential for deep learning-based CV, alongside key neural architectures and their training mechanisms, including supervised, self-supervised, unsupervised, and reinforcement-based learning. It will address crucial computer vision tasks, highlighting influential and state-of-the-art models for each. The course will also investigate the principal frameworks and tools in the field and explore the application domains that are driving advancements in computer vision.
Some more details about the course content:
- DL fundamentals: from neurons/units to neural networks (NNs); ground truth (GT) data, parameters (weights and biases), activation functions and loss functions; computational graphs, the update rule, gradients, and supervised learning; forward and backward pass in shallow NNs in matrix notation; normalization (data/batch) and initialization (parameters); hyper-parameter tuning and gradient descent optimization (from simple to SOTA optimizers); generalization and regularization (a minimal sketch of the forward/backward pass and update rule is given after this list).
- Architectures: Fully Connected (Dense) NNs (FCNNs); Convolutional NNs (CNNs) and different types of convolutions (incl. Residual NNs and Capsule Nets); Recurrent NNs (RNNs, LSTMs, GRUs) for CV (e.g., sequences of frames in a video); Transformers and the self-attention mechanism; Vision Transformers; Graph NNs (GNNs) for CV; Retentive Networks (RetNets).
- CV tasks, supervised: Image Classification, Object Detection, Segmentation (semantic, instance, panoptic), depth estimation, pose estimation, etc.; Object Tracking (e.g., keeping the same ID on an object across a video sequence).
- Self-Supervised Learning (SSL): Large Vision Models and multi-modal (incl. images, video) Foundation Models.
- Unsupervised Learning: Autoencoders (AEs) and Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Normalizing flows, and Diffusion models.
- Reinforcement learning in the context of CV: value-based methods, policy gradient methods, and actor-critic methods.
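To make the fundamentals concrete, here is a minimal illustrative sketch (not part of the official course material) of the forward and backward pass in a one-hidden-layer network with a plain gradient descent update, written in matrix notation with NumPy. The toy data, network size, and variable names are assumptions chosen purely for illustration.

```python
# Minimal sketch (illustrative only): forward/backward pass in a shallow NN
# with a plain gradient descent update, in matrix notation with NumPy.
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised data (assumed for illustration): 64 samples, 3 features, 1 target.
X = rng.normal(size=(64, 3))
y = X[:, :1] * 0.5 - X[:, 1:2] + 0.2 * X[:, 2:3]   # ground truth (GT) targets

# Parameters (weights and biases) of a one-hidden-layer network.
W1 = rng.normal(scale=0.1, size=(3, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 1)); b2 = np.zeros(1)

lr = 0.1  # learning rate (a hyper-parameter)

for step in range(200):
    # Forward pass: linear layer -> ReLU activation -> linear layer.
    z1 = X @ W1 + b1
    a1 = np.maximum(z1, 0.0)          # ReLU activation function
    y_hat = a1 @ W2 + b2              # network prediction

    # Loss function: mean squared error against the GT targets.
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: gradients via the chain rule (backpropagation).
    d_yhat = 2.0 * (y_hat - y) / len(X)
    dW2 = a1.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_a1 = d_yhat @ W2.T
    d_z1 = d_a1 * (z1 > 0.0)          # derivative of ReLU
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)

    # Update rule (plain gradient descent): theta <- theta - lr * gradient.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final training loss: {loss:.4f}")
```

In the course itself such computations are carried out with established DL frameworks rather than by hand; the sketch only mirrors the update rule and matrix notation named in the list above.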
Learning outcome
Knowledge:
- Understand the fundamental concepts and mathematical principles behind deep learning algorithms and their application to modern computer vision.
- Recognize the structure and functionality of various neural network architectures (FCNNs, CNNs, Vision Transformers etc.), as well as their roles in addressing specific computer vision tasks.
- Comprehend the theoretical aspects of learning mechanisms such as supervised, self-supervised, unsupervised, and reinforcement learning, and how they contribute to the field of visual intelligence.
Skills:
- Apply knowledge of deep learning to construct and train neural networks for a range of computer vision tasks, such as image classification, object detection, segmentation, depth estimation, pose estimation, and generative AI for vision.
- Employ state-of-the-art optimization techniques, normalization processes, and regularization methods to enhance the generalization of neural network models.
- Utilize principal frameworks and tools established in the field to implement and evaluate computer vision models.
General competences:
- Analyze and critically assess different neural network models and architectures, and select the most appropriate one for a given visual intelligence task.
- Integrate advanced computer vision solutions in various application domains, such as autonomous driving and medical image computing, to improve accuracy and efficiency.
- Exhibit problem-solving abilities by tuning hyperparameters and adjusting network architectures to optimize performance for computer vision tasks.
Learning methods and activities
Lectures, self study, assignments, and a real-world mini project.
Lectures will be given in English.
Developing practical skills (tools, key DL frameworks, etc.) is an important part of the course.
Compulsory assignments
- Exercises
Further on evaluation
The final grade is based on two parts: a real-world mini-project (40%) and a digital school exam (60%). Each part is assigned a letter grade, and the two are then weighted and combined to form the final letter grade in the course. Both parts must be passed individually in the same semester in order to pass the course.
The examination papers will be given in English only.
If there is a re-sit examination, the examination form may change from written to oral.
If a student retakes the course, either to improve the grade or after failing it, they have to redo both parts of the course.
Traditional assignments are considered compulsory activities, and a certain amount of this work must be approved before the student is allowed to sit the exam.
For group work, differentiated grades may be given if the work effort within the group has been unevenly distributed.
Recommended previous knowledge
Some experience in Python programming.
Basic knowledge related to linear algebra, calculus and statistics.
TDT4195 Visual Computing Fundamentals or equivalent.
Course materials
- Book: Understanding Deep Learning, Simon J.D. Prince (online)
- Book: Neural Networks and Deep Learning, Michael Nielsen (online)
- Book: Deep Learning, Ian Goodfellow et al. (online)
- Supplementary material will be handed out as needed.
Credit reductions
Course code | Reduction | From | To
---|---|---|---
SIF8066 | 7.5 | |
Version: 1
Credits: 7.5 SP
Study level: Second degree level
Term no.: 1
Teaching semester: SPRING 2025
Language of instruction: English
Location: Trondheim
Subject area(s):
- Informatics
- Technological subjects
Department with academic responsibility
Department of Computer Science
Examination
Examination arrangement: Aggregate score
Term | Status code | Evaluation | Weighting | Examination aids | Date | Time | Examination system | Room *
---|---|---|---|---|---|---|---|---
Spring | ORD | School exam | 60/100 | D | | | INSPERA |
Spring | ORD | Assignment | 40/100 | | | | |
Summer | UTS | School exam | 60/100 | D | | | INSPERA |

* The location (room) for a written examination is published 3 days before the examination date. If more than one room is listed, you will find your room at Studentweb.
For more information regarding registration for examination and examination procedures, see "Innsida - Exams"