Office: GHC 9015
I am a systems builder and hacker interested in systems design and distributed systems.
I spent the last few years working on machine learning systems. My interests are diverse: before my PhD, my research ranged from clustering (in bioinformatics) to systems security.
Spring ‘20: I'm interested in topics from (but not limited to) ML for Systems, distributed systems, and dimensionality reduction on time series (both systems & neural datasets). Reach out if you have problems, insights, or data (especially on correlated failures) to share!
Résumé (Feb ‘20) | Publications
Daniel Wong, Daniel Berger, Nathan Beckmann, Greg Ganger (and multiple Facebook collaborators)
Thomas Kim, Daniel Wong, Anuj Kalia, Rajat Kateja, Michael Kaminsky, Greg Ganger, David G. Andersen
How can we initiate recovery quickly without overreacting to transient failures?
Deep learning is great at image recognition, but many systems problems don't look like that. I'm keen to learn about interpretable machine learning methods that find correlations in time series and graphs, with a particular interest in visualization and causality.
Sequential, graph structure. Data and tasks often have a temporal aspect (e.g., traces) and a complex, non-linear graph structure (e.g., from task dependencies or distributed nodes).
Unsupervised learning. Dimensionality reduction and clustering provide insight (e.g., understanding root causes of correlated failures), or can be used as preprocessing to make the problem more tractable by removing noise and reducing the decision space (e.g., optimizing dataflow graphs).
Interpretability. Systems design and optimization choices are about tradeoffs. Interpretability aids debuggability, and increases practitioners' faith in decisions and findings from ML methods.
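To make the unsupervised-learning point above concrete, here is a minimal, hypothetical sketch (not from any project of mine): dimensionality reduction strips per-machine noise from telemetry time series, after which even a trivial clustering step recovers groups of machines that behave (or fail) together. All names and the synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic telemetry: 60 machines x 200 time steps. Two latent "racks"
# each share a correlated signal, mimicking a correlated failure mode.
base_a = rng.normal(size=200)
base_b = rng.normal(size=200)
signals = np.vstack([base_a + 0.3 * rng.normal(size=(30, 200)),
                     base_b + 0.3 * rng.normal(size=(30, 200))])

# Dimensionality reduction (PCA via SVD): project each machine's trace
# onto the top principal component, collapsing 200 noisy dimensions to 1.
centered = signals - signals.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = centered @ vt[0]  # shape (60,): one coordinate per machine

# Trivial 1-D clustering: split on the sign of the projection. The two
# correlated groups land on opposite sides of the first component.
labels = (pc1 > 0).astype(int)
```

The point is the pipeline shape, not the specific methods: the reduction step removes noise and shrinks the decision space so that a very simple downstream decision (here, a sign test) suffices.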
Systems often depend on hand-crafted heuristics for good performance. How can we replace these with automatically generated heuristics that are customized for each workload?
Daniel Wong, Peter Ma^, Sudip Roy*, Yanqi Zhou*
^Google Platforms Performance, *Google Brain (ML for Systems)
Angela H. Jiang, Daniel L.-K. Wong, Giulio Zhou, David G. Andersen, Jeffrey Dean, Gregory R. Ganger, Gauri Joshi, Michael Kaminsky, Michael Kozuch, Zachary C. Lipton, Padmanabhan Pillai [preprint]
Angela Jiang, Daniel Lin-Kit Wong, Christopher Canel, Ishan Misra, Michael Kaminsky, Michael A. Kozuch, Padmanabhan Pillai, David G. Andersen, Gregory R. Ganger. USENIX ATC 2018.
Part of the Intel Science and Technology Center for Visual Cloud Systems (ISTC-VCS).
3-way cross-region replication is expensive and slow. It helps mitigate rare risks like a hurricane taking out a data center, but why pay that price for common events like equipment failures? Can we detect and predict correlated failures?

Outcome: I performed simulations based on theoretical modelling and presented a poster on transient failures at PDL Retreat 2019. Although industry often brought up this issue, the project did not continue because we lacked real-world data to model the failures. I would be keen to revisit this project. Hit me up if you are able to offer any datasets!
I'm a tinkerer at heart, and am always on the lookout for novel challenges to work on. In seeking opportunities, I aim to optimize for learning and to do meaningful, impactful work. I enjoy the synergy of collaboration and the opportunity it gives me to learn from other people.
I'm a software engineer and have a relentless urge to automate and optimize all parts of my work process.
I enjoy cooking, musicals, singing, Singaporean food, skiing & snowboarding, gliding, long scenic drives (and walks), waterfalls, baking, rock climbing, ice skating, scuba diving, and last but not least, good nigiri. I did my undergraduate studies at the University of Cambridge and am a member of Churchill College. I grew up in Singapore, am a son of Hwa Chong (华中子弟), and am a proud alumnus of my high school computer club EC3 (where I learnt to code and hack stuff together).