TDT4265 - Computer Vision and Deep Learning
About
Examination arrangement
Examination arrangement: Aggregate score
Grade: Letter grades
Evaluation | Weighting | Duration | Grade deviation | Examination aids
---|---|---|---|---
Assignment | 40/100 | | |
School exam | 60/100 | 4 hours | | D
Course content
Modern computer vision (CV), driven by deep learning (DL), increasingly known as visual intelligence (VI), allows machines to interpret and understand visual data. This technology, crucial today in fields like autonomous driving and medical image computing, is expected to revolutionize various industries by enabling more accurate and efficient visual analysis.
The course will cover the mathematical and computational foundations essential for deep learning-based CV, alongside key neural architectures and their training mechanisms, including supervised, self-supervised, unsupervised, and reinforcement-based learning. It will address crucial computer vision tasks, highlighting influential and state-of-the-art models for each. The course will also investigate the principal frameworks and tools in the field and explore the application domains that are driving advancements in computer vision.
Some more details about the course content:
- DL fundamentals: from neurons/units to neural networks (NNs); ground truth (GT) data, parameters (weights and biases), activation functions and loss functions; computational graphs, the update rule, gradients, and supervised learning; forward and backward pass in shallow NNs in matrix notation; normalization (data/batch) and initialization (parameters); hyper-parameter tuning and gradient descent optimization (from simple to SOTA optimizers); generalization and regularization (a minimal sketch of the forward/backward pass and update rule is given after this list).
- Architectures: Fully Connected (Dense) NNs (FCNNs); Convolutional NNs (CNNs) and different types of convolutions (incl. Residual NNs and Capsule Nets); Recurrent NNs (RNNs, LSTMs, GRUs) for CV (e.g., sequences of frames in a video); Transformers and the self-attention mechanism; Vision Transformers; Graph NNs (GNNs) for CV; Retentive Networks (RetNets).
- CV tasks, supervised: Image Classification, Object Detection, Segmentation (semantic, instance, panoptic), depth estimation, pose estimation, etc.; Object Tracking (e.g., keeping the same ID on an object across a video sequence).
- Self-Supervised Learning (SSL): Large Vision Models and multi-modal (incl. images, video) Foundation Models.
- Unsupervised Learning: Autoencoders (AEs) and Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Normalizing flows, and Diffusion models.
- Reinforcement learning in the context of CV: value-based methods, policy gradient methods, and actor-critic methods.
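To make the fundamentals concrete, here is a minimal illustrative sketch (not part of the official course material) of the forward and backward pass in a one-hidden-layer network with a plain gradient descent update, written in matrix notation with NumPy. The toy data, network size, and variable names are assumptions chosen purely for illustration.

```python
# Minimal sketch (illustrative only): forward/backward pass in a shallow NN
# with a plain gradient descent update, in matrix notation with NumPy.
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised data (assumed for illustration): 64 samples, 3 features, 1 target.
X = rng.normal(size=(64, 3))
y = X[:, :1] * 0.5 - X[:, 1:2] + 0.2 * X[:, 2:3]   # ground truth (GT) targets

# Parameters (weights and biases) of a one-hidden-layer network.
W1 = rng.normal(scale=0.1, size=(3, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 1)); b2 = np.zeros(1)

lr = 0.1  # learning rate (a hyper-parameter)

for step in range(200):
    # Forward pass: linear layer -> ReLU activation -> linear layer.
    z1 = X @ W1 + b1
    a1 = np.maximum(z1, 0.0)          # ReLU activation function
    y_hat = a1 @ W2 + b2              # network prediction

    # Loss function: mean squared error against the GT targets.
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: gradients via the chain rule (backpropagation).
    d_yhat = 2.0 * (y_hat - y) / len(X)
    dW2 = a1.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_a1 = d_yhat @ W2.T
    d_z1 = d_a1 * (z1 > 0.0)          # derivative of ReLU
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)

    # Update rule (plain gradient descent): theta <- theta - lr * gradient.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final training loss: {loss:.4f}")
```

In the course itself such computations are carried out with established DL frameworks rather than by hand; the sketch only mirrors the update rule and matrix notation named in the list above.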
Learning outcome
Knowledge:
- Understand the fundamental concepts and mathematical principles behind deep learning algorithms and their application to modern computer vision.
- Recognize the structure and functionality of various neural network architectures (FCNNs, CNNs, Vision Transformers etc.), as well as their roles in addressing specific computer vision tasks.
- Comprehend the theoretical aspects of learning mechanisms such as supervised, self-supervised, unsupervised, and reinforcement learning, and how they contribute to the field of visual intelligence.
Skills:
- Apply knowledge of deep learning to construct and train neural networks for a range of computer vision tasks, such as image classification, object detection, segmentation, depth estimation, pose estimation, and generative AI for vision.
- Employ state-of-the-art optimization techniques, normalization processes, and regularization methods to enhance the generalization of neural network models.
- Utilize principal frameworks and tools established in the field to implement and evaluate computer vision models.
General competences:
- Analyze and critically assess different neural network models and architectures, and select the most appropriate one for a given visual intelligence task.
- Integrate advanced computer vision solutions in various application domains, such as autonomous driving and medical image computing, to improve accuracy and efficiency.
- Exhibit problem-solving abilities by tuning hyperparameters and adjusting network architectures to optimize performance for computer vision tasks.
Learning methods and activities
Lectures, self study, assignments, and a real-world mini project.
Lectures will be given in English.
Developing practical skills (tools, key DL frameworks, etc.) is an important part of the course.
Compulsory assignments
- Exercises
Further on evaluation
The final grade is based on two parts: a real-world mini-project (40%) and a digital school exam (60%). Each part is assigned a letter grade, and the two are then weighted and combined to form the final letter grade in the course. Both parts must be passed individually in the same semester in order to pass the course.
The examination papers will be given in English only.
If there is a re-sit examination, the examination form may change from written to oral.
If a student retakes the course, either to improve the grade or after failing it, they have to redo both parts of the course.
Traditional assignments are considered compulsory activities, and a certain amount of this work must be approved before the student is allowed to sit the exam.
For group work, differentiated grades may be given if the work effort within the group has been unevenly distributed.
Recommended previous knowledge
Some experience in Python programming.
Basic knowledge related to linear algebra, calculus and statistics.
TDT4195 Visual Computing Fundamentals or equivalent.
Course materials
- Book: Understanding Deep Learning, Simon J.D. Prince (online)
- Book: Neural Networks and Deep Learning, Michael Nielsen (online)
- Book: Deep Learning, Ian Goodfellow et al. (online)
- Supplementary material will be handed out as needed.
Credit reductions
Course code | Reduction | From | To
---|---|---|---
SIF8066 | 7.5 | |
Version: 1
Credits: 7.5 SP
Study level: Second degree level
Term no.: 1
Teaching semester: SPRING 2025
Language of instruction: English
Location: Trondheim
Subject area(s):
- Informatics
- Technological subjects
Department with academic responsibility
Department of Computer Science
Examination
Examination arrangement: Aggregate score
Term | Status code | Evaluation | Weighting | Examination aids | Date | Time | Examination system | Room *
---|---|---|---|---|---|---|---|---
Spring | ORD | School exam | 60/100 | D | | | INSPERA |
Spring | ORD | Assignment | 40/100 | | | | |
Summer | UTS | School exam | 60/100 | D | | | INSPERA |

* The location (room) for a written examination is published 3 days before the examination date. If more than one room is listed, you will find your room at Studentweb.
For more information regarding registration for examination and examination procedures, see "Innsida - Exams"