Fundamental limits of Byzantine-resilient distributed learning
Description
In a distributed learning setting, multiple worker machines are hired to help a main server with the expensive training of a machine learning model. Each worker is assigned a subtask, and the main server reconstructs the overall computation result from the workers' responses.
In the presence of faulty or malicious workers, called Byzantine workers, a common strategy is to distribute the subtasks redundantly [3]. Since this redundancy imposes a large computation overhead on the workers, strategies to reduce it are needed. One approach is to use interactive consistency checks at the main server, which can reduce the redundancy by up to 50% [1].
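To make the redundancy idea concrete, here is a minimal Python sketch (purely illustrative; the plain replication and majority vote are a toy simplification, not the gradient-coding scheme of [3]) showing why tolerating b Byzantine workers by simple replication requires assigning each subtask to 2b+1 workers:

from collections import Counter

def majority_vote(replies):
    # Return the most frequently reported value for one subtask.
    value, _ = Counter(replies).most_common(1)[0]
    return value

b = 1                     # Byzantine workers to tolerate (illustrative)
replicas = 2 * b + 1      # plain replication needs 2b+1 copies per subtask
honest_result = 42        # true partial result of the subtask
# replicas - b honest replies and b arbitrary (Byzantine) replies
replies = [honest_result] * (replicas - b) + [-999] * b
assert majority_vote(replies) == honest_result

With interaction, the redundancy can be reduced from 2b+1 toward roughly b+1 copies per subtask, consistent with the up-to-50% saving achieved in [1].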
Interactive consistency checks do not come for free: they cause additional computation and communication costs. For specific parameter regimes, these costs are well studied, but it is unknown how large they are in general. Therefore, we ask the following research questions (a possible formalization follows the list):
1) How much computation is needed to guarantee error-free reconstruction of the computation result?
2) How much communication is needed?
3) What is the best trade-off between communication and computation cost?
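In one possible formalization (the notation here is ours and is not fixed by [1,2]), let n denote the number of workers, b an upper bound on the number of Byzantine workers, c the computation load per worker, and C the total communication spent on consistency checks. The object of interest is the achievable region

\[
\mathcal{R} = \bigl\{ (c, C) : \text{some scheme with load } c \text{ and communication } C \text{ reconstructs the result error-free against any } b \text{ Byzantine workers} \bigr\}.
\]

Questions 1) and 2) ask for the minimum of c and C individually; question 3) asks for the lower (Pareto) boundary of \(\mathcal{R}\).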
The focus of this project is to study these research questions fundamentally, i.e., to understand the least amount of communication and computation with which error-free reconstruction is possible. The student will analyze these questions with mathematical tools such as graph theory and information theory. The findings shall be compared against existing schemes [1,2] to evaluate their (sub-)optimality.
[1] C. Hofmeister, L. Maßny, E. Yaakobi and R. Bitar, "Byzantine-Resilient Gradient Coding Through Local Gradient Computations," in IEEE Transactions on Information Theory, vol. 71, no. 4, pp. 3142-3156, April 2025, doi: 10.1109/TIT.2025.3542896.
[2] S. Jain, L. Maßny, C. Hofmeister, E. Yaakobi and R. Bitar, "Interactive Byzantine-Resilient Gradient Coding for General Data Assignments," 2024 IEEE International Symposium on Information Theory (ISIT), Athens, Greece, 2024, pp. 3273-3278, doi: 10.1109/ISIT57864.2024.10619596.
[3] R. Tandon, Q. Lei, A. G. Dimakis, and N. Karampatziakis, "Gradient Coding: Avoiding Stragglers in Distributed Learning," in Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3368-3376, 2017.
Prerequisites
Mandatory:
- strong mathematical background
- prior knowledge in information theory
- basic knowledge in graph theory
- interest in fundamental theoretical research
Recommended:
- proficiency in algebra and probability theory
- basic knowledge in coding theory
Contact
Supervisor:
Graph Entropy in Combinatorics
Description
Information theory and combinatorics are deeply intertwined. Beyond the use of combinatorics in coding theory and compression, there are many, sometimes surprising, connections.
One such connection is the use of graph entropy in combinatorial existence proofs.
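For orientation, we recall Körner's graph entropy (a standard definition, stated here for convenience): for a graph G = (V, E) and a probability distribution P on V,

\[
H(G, P) = \min_{X \in Y,\; Y \in S(G)} I(X; Y),
\]

where S(G) is the family of independent sets of G and the minimum is taken over all joint distributions of (X, Y) with X \sim P and X \in Y. Two properties drive the combinatorial applications: subadditivity, H(F \cup G, P) \le H(F, P) + H(G, P) for graphs on a common vertex set, and H(K_n, P) = H(P) for the complete graph. Thus, if K_n is covered by graphs G_1, \ldots, G_t, then \sum_{i=1}^{t} H(G_i, P) \ge H(P), and upper-bounding each H(G_i, P) yields a lower bound on t; this covering argument is the skeleton of the information-theoretic proofs in [2] and [3].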
This seminar topic is about explaining the proof technique introduced in [1] and [2] and applied in [3]. The goal is a tutorial-style paper with a focus on clear exposition through well-chosen worked examples and visualizations.
[1] M. Fredman and J. Komlós, On the Size of Separating Systems and Perfect Hash Functions, SIAM J. Alg. Disc. Meth., 5 (1984), pp. 61-68.
[2] J. Körner, Fredman-Komlós bounds and information theory, SIAM J. Alg. Disc. Meth., 7 (1986), pp. 560-570.
[3] N. Alon, E. Fachini, and J. Körner, Locally Thin Set Families, Combinatorics, Probability and Computing, vol. 9 (Nov. 2000), pp. 481–488.