Daniel Wong Lin-Kit

Daniel (Lin-Kit) Wong

Hi! I am a final-year (7th year) PhD student at Carnegie Mellon University, advised by Greg Ganger. I am a member of the Parallel Data Laboratory and the Computer Science Department.

I am a systems builder and designer with a focus on distributed systems and ML.

Résumé (Mar ‘24) | Publications (10 papers, 2 patents):

ML for Systems (Thesis, Baleen [FAST24])
Systems for ML
- TensorFlow model parallelism [IEEEMicro20, arXiv19a, NeurIPS20]
- Mainstream [ATC18], SelectiveBackprop [arXiv19b]
Distributed Systems [SoCC20]
SWE intern at Google, Dropbox, Meta

Other interests: Security, Bioinformatics [BMCGenomics13], Neuroscience, Cloud Computing.

I'm always keen to talk about research. Reach out if you have problems, insights, or data to share! Details at bottom.

Feb ‘24: I presented Baleen at FAST 2024 (PDF, Code, More).

PhD Research (Publications)

Ongoing: ML for Flash Caching:
- Automated drift mitigation in caching Spring ‘24 - Present
- ML for eviction, placement, optimizing for peak Spring ‘23 - Present
- Baleen: ML for flash admission and prefetching Spring ‘20 - Spring ‘23
  Daniel Lin-Kit Wong, Hao Wu, Carson Molder, Sathya Gunasekar, Jimmy Lu, Snehal Khandkar, Abhinav Sharma, Daniel S. Berger, Nathan Beckmann, Greg Ganger - FAST 2024
- Students: if you're interested in ML for caching, email me! Some project ideas:
  - Online version of Baleen (track episodes on-the-fly, adaptive score cut-off)
  - Using episodes to design eviction & other caching policies
  - Baleen for CDN and key-value caches
Past work - ML and Systems:
- Co-optimizing scheduling and device placement in TensorFlow with deep RL for automatic model parallelism. NeurIPS, IEEE Micro, 2 patents
  Google Summer ‘19 intern^, Fall ‘19 Student Researcher*
  Daniel Wong, Peter Ma^, Sudip Roy*, Yanqi Zhou* ^Google Platforms Performance, *Google Brain (ML for Systems)
- Selective-Backpropagation: Accelerating Deep Learning by Focusing on the Biggest Losers. Fall ‘18 - Fall ‘19
  Angela H. Jiang, Daniel L.-K. Wong, Giulio Zhou, David G. Andersen, Jeffrey Dean, Gregory R. Ganger, Gauri Joshi, Michael Kaminsky, Michael Kozuch, Zachary C. Lipton, Padmanabhan Pillai
- Mainstream: Dynamic Stem-Sharing for Multi-Tenant Video Processing. Fall ‘17 - Spring ‘18
  Angela Jiang, Daniel Lin-Kit Wong, Christopher Canel, Ishan Misra, Michael Kaminsky, Michael A. Kozuch, Padmanabhan Pillai, David G. Andersen, Gregory R. Ganger. USENIX ATC 2018.
  Part of the Intel Science and Technology Center for Visual Cloud Systems (ISTC-VCS).
- Speeding up evolutionary neural architecture search with adaptive weight sharing. Summer ‘18 - Fall ‘18
Past projects - Distributed Systems:
- CANDStore: High availability in cheap distributed key value storage. Spring ‘19 - Summer ‘20
  Thomas Kim, Daniel Lin-Kit Wong, Gregory R. Ganger, Michael Kaminsky, David G. Andersen. ACM SoCC 2020.
Past projects - Dimensionality reduction on neuroscience datasets:
- 10-708 (PGM) project: Stitching neural population recordings (electrophysiological) from different days. Spring ‘20
  Adam Smoulder, Sami Horn, Daniel Wong
Keen to explore:
- Applications of clustering & dimensionality reduction for time series and graphs.
  I'd like to work on interpretable machine learning methods that find correlations in time series and graphs, with a particular interest in visualization and causality.
- Areas I have a soft spot for: neuroscience, systems security, HCI, psychology.
Past explorations - Failures in Systems (keen to revisit with new data)
- Transient failures (grey failures). Fall ‘19
  How can we balance initiating recovery quickly against overreacting to transient failures?
- Affordable robustness to failures in distributed storage. Spring ‘19 - Fall ‘19
  3-way cross-region replication is expensive and slow. It helps mitigate rare risks like a hurricane taking out a data center, but why pay that price for common events like equipment failures? Can we detect and predict correlated failures?
  Poster on theoretical modelling of transient failures. Despite strong industry interest, we lacked real-world data. Email me if you have datasets to share!

Teaching & Coursework at CMU

Teaching Assistant in 2020 & 2024: 15-719 Advanced Cloud Computing (Greg Ganger, Majd Sakr, George Amvrosiadis)
10-708 Probabilistic Graphical Models (Eric Xing)
18-699 Neural Signal Processing (João Semedo)
15-857 Analytical Performance Modeling & Design of Computer Systems (Mor Harchol-Balter)
15-824 Logical Foundations of Cyber-Physical Systems (André Platzer)
15-712 Advanced Operating Systems & Distributed Systems (Dave Andersen)
15-740 Computer Architecture (Nathan Beckmann)
15-719 Advanced Cloud Computing (Greg Ganger, Majd Sakr, George Amvrosiadis)

Other highlights

Distributed storage systems internships
- Dropbox Vacuuming: rewrote their garbage collection system.
- Facebook Core Data Cache Client: fixed cache coherency for regions with multiple replicas.
Systems security:
- Undergraduate dissertation (with Robert Watson): compartmentalisation of cryptographic components (i.e., OpenSSL), or: avoiding future Heartbleeds with Capsicum.
- CTFs (Capture The Flag): Cybersecurity Challenge Singapore (joint) winner, 1st in Facebook CTF, 2nd in MIT-Cambridge C2C.
Algorithms
- Clustering/Bioinformatics: PLW (first-author) [BMCGenomics13] with Li Xiao-Li and Ng See-Kiong at I2R, A*STAR.
- International Olympiad in Informatics medallist (my 2016 note on where S'pore's IOI members are now).
Sensor systems: neighbor discovery for sensor networks [IEEE MASS18] with Ben Leong at NUS.
Deep learning: explored deep reinforcement learning, neural architecture search, hyperparameter optimization, and optimizer design at IHPC, A*STAR.
More details on my projects here (my résumé is more up to date).

I'm a tinkerer at heart. I am always on the lookout for novel challenges to work on; I optimise for learning and doing meaningful, impactful work. I bask in the energy of synergistic collaborations, and the opportunity they give me to wade into new domains and learn from cool people.

I'm a software engineer and have a relentless urge to automate and optimize all parts of my work process.

I enjoy musicals, singing and karaoke (car karaoke setup), cooking, Singaporean food, skiing & snowboarding, gliding, long scenic drives (and walks), waterfalls, baking, rock climbing, ice skating, computer games, scuba diving, and last but not least, good nigiri (including making it). I did my undergraduate studies at the University of Cambridge and am a member of Churchill College. I grew up in Singapore, am a 华中子弟, and am a proud alumnus of my high school computer club EC³ (where I learnt to code and hack stuff together).

My stuff: Quora | GitHub | Google Scholar

More about me: Publications | Biography