mryab
diff --git a/README.md b/README.md
Lines changed: 16 additions & 13 deletions
diff --git a/week01_intro/README.md b/week01_intro/README.md
Lines changed: 3 additions & 1 deletion
diff --git a/week01_intro/lecture.pdf b/week01_intro/lecture.pdf
420 KB
@@ -1,7 +1,7 @@
 # Efficient Deep Learning Systems
 This repository contains materials for the Efficient Deep Learning Systems course taught at the [Faculty of Computer Science](https://cs.hse.ru/en/) of [HSE University](https://www.hse.ru/en/) and [Yandex School of Data Analysis](https://academy.yandex.com/dataschool/).
 
-__This branch corresponds to the ongoing 2024 course. If you want to see full materials of past years, see the ["Past versions"](#past-versions) section.__
+__This branch corresponds to the ongoing 2025 course. If you want to see full materials of past years, see the ["Past versions"](#past-versions) section.__
 
 # Syllabus
 - [__Week 1:__](./week01_intro) __Introduction__
@@ -10,20 +10,18 @@ __This branch corresponds to the ongoing 2024 course. If you want to see full ma
 - [__Week 2:__](./week02_management_and_testing) __Experiment tracking, model and data versioning, testing DL code in Python__
   - Lecture: Experiment management basics and pipeline versioning. Configuring Python applications. Intro to regular and property-based testing.
   - Seminar: Example DVC+Weights & Biases project walkthrough. Intro to testing with pytest.
-- [__Week 3:__](./week03_fast_pipelines) __Training optimizations, profiling DL code__
-  - Lecture: Mixed-precision training. Data storage and loading optimizations. Tools for profiling deep learning workloads. 
-  - Seminar: Automatic Mixed Precision in PyTorch. Dynamic padding for sequence data and JPEG decoding benchmarks. Basics of profiling with py-spy, PyTorch Profiler, PyTorch TensorBoard Profiler, nvprof and Nsight Systems.
-- [__Week 4:__](./week04_distributed) __Basics of distributed ML__
-  - Lecture: Introduction to distributed training. Process-based communication. Parameter Server architecture.
-  - Seminar: Multiprocessing basics. Parallel GloVe training.
-- [__Week 5:__](./week05_data_parallel) __Data-parallel training and All-Reduce__
-  - Lecture: Data-parallel training of neural networks. All-Reduce and its efficient implementations.
+- [__Week 3:__](./week03_fast_pipelines) __Training optimizations, FP16/BF16/FP8 formats, profiling deep learning code__
+  - Lecture: Measuring performance of GPU-accelerated software. Mixed-precision training. Data storage and loading optimizations. Tools for profiling deep learning workloads. 
+  - Seminar: Automatic Mixed Precision in PyTorch. Dynamic padding for sequence data and JPEG decoding benchmarks. Basics of profiling with py-spy, PyTorch Profiler, Memory Snapshot and Nsight Systems.
+- [__Week 4:__](./week04_data_parallel) __Data-parallel training and All-Reduce__
+  - Lecture: Introduction to distributed training. Data-parallel training of neural networks. All-Reduce and its efficient implementations.
   - Seminar: Introduction to PyTorch Distributed. Data-parallel training primitives.
+- __Week 5:__ __Sharded data-parallel training, distributed training optimizations__
 - __Week 6:__ __Training large models__
 - __Week 7:__ __Python web application deployment__
-- __Week 8:__ __Software for serving neural networks__
+- __Week 8:__ __LLM inference optimizations and software__
 - __Week 9:__ __Efficient model inference__
-- __Week 10:__ __Guest lecture__
+- __Week 10:__ Guest lecture
 
 ## Grading
 There will be several home assignments (spread over multiple weeks) on the following topics:
@@ -37,11 +35,16 @@ Please refer to the course page of your institution for details.
 # Staff
 - [Max Ryabinin](https://github.com/mryab)
 - [Just Heuristic](https://github.com/justheuristic)
-- [Alexander Markovich](https://github.com/markovka17)
+- [Yaroslav Zolotarev](https://github.com/Q-c7)
+- [Maksim Abraham](https://github.com/fdrose)
+- [Gregory Leleytner](https://github.com/RunFMe)
+- [Antony Frolov](https://github.com/antony-frolov)
 - [Anton Chigin](https://github.com/achigin)
-- [Ruslan Khaidurov](https://github.com/newokaerinasai)
+- [Alexander Markovich](https://github.com/markovka17)
+- [Roman Gorb](https://github.com/rvg77)
 
 # Past versions
+- [2024](https://github.com/mryab/efficient-dl-systems/tree/2024)
 - [2023](https://github.com/mryab/efficient-dl-systems/tree/2023)
 - [2022](https://github.com/mryab/efficient-dl-systems/tree/2022)
 - [2021](https://github.com/yandexdataschool/dlatscale_draft)
@@ -1,12 +1,14 @@
 # Week 1: Introduction
 
 * Lecture: [link](./lecture.pdf)
-* Seminar + bonus home assignment: [link](./seminar.ipynb)
+* Seminar: [link](./seminar.ipynb)
 
 ## Further reading
 * [CUDA MODE reading group Resource Stream](https://github.com/cuda-mode/resource-stream)
 * [CUDA Programming Guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html) and [CUDA C++ Best Practices Guide](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html)
+* [Modal GPU Glossary](https://modal.com/gpu-glossary)
 * [How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog](https://siboehm.com/articles/22/CUDA-MMM)
+* [GPU Puzzles](https://github.com/srush/GPU-Puzzles)
 * [PyTorch Performance Tuning Guide](https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html)
 * [Earlier version of this guide from NVIDIA](https://tigress-web.princeton.edu/~jdh4/PyTorchPerformanceTuningGuide_GTC2021.pdf)
 * [Docs for caching memory allocation in PyTorch](https://pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management)