Open Source
I view building software in the open as a mode of creative exploration. It lets me quickly act on inspiration, delve into new topics, and make tools that improve people's lives.
You'll see that I particularly like programming languages, distributed systems, machine learning, computer graphics, music, and art.
If you find something interesting, let me know!
Table of Contents
- Algorithm and Data Structure Library
- Bore – Localhost Tunnels
- Canvas Hashlife
- Char-RNN Keras
- classes.wtf
- Competitive Programming Workspace
- Composing Studio
- Crepe – Logic Programming in Rust
- Dispict
- Estimation Markets
- Fast Semantic Segmentation
- Graphics Workshop
- Handwriting Generator
- Inline SQL
- Julia Fractal Renderers
- Langevin Dynamics for Composition
- Path Tracer
- Pencil Sketch Rendering
- Percival
- Polygon Triangulation with Holes
- Procedural Harmony
- Redis Rope
- Rustpad
- Set with Friends
- Universal Gravity Simulator
- µKanren-rs
Dispict November 2022
Creative aesthetics tool to discover art museums.
Design a growing artistic exhibit of your own making, with semantic search powered by OpenAI CLIP. Bring your own labels and context.
Creativity and curiosity require associative thinking. What if computers could connect us with masterpieces made by artists of the past? Dispict is a new way to experience the Harvard Art Museums’ online collection (containing over 200,000 digitized works) through language-image machine learning models, helping you curate your own personalized galleries.
Inline SQL October 2022
Weaving relational queries into the fabric of Python.
The inline-sql
library provides a simple embedded language for writing SQL in
Python programs. Operations run directly within the calling process, and they
execute on Pandas dataframes. Local variables are automatically interpolated as
parameters.
This is implemented as a small wrapper around DuckDB, so it is extremely fast. From my own experimental perspective, the intent of the project was to see whether we can usefully and ergonomically embed declarative, relational query languages within general-purpose media.
classes.wtf August 2022
A course catalog with extremely fast full-text search.
Harvard has many course search websites, but none of them are good. This project is an attempt to take the problem more seriously: write high-performance software and set great defaults so that people can get better, more useful suggestions, 100x faster.
Classes.wtf is a custom, distributed course search engine that focuses on speed and quality of results. The goal is for the entire {request, computation, response, and render} pipeline to take under 30 milliseconds. I built it in a weekend out of personal annoyance and launched a week later; it’s now pretty popular among students at my school.
Redis Rope July 2022
Fast native data type for manipulating large strings in Redis.
redis-rope
is a fast and versatile
rope data type for
large strings in Redis, distributed as a native
module. It’s primarily written in
Zig and tested with Rust.
The ropes in this module are backed by splay trees, which are a self-adjusting data structure that has logarithmic amortized worst-case performance, while recently-accessed indices are also quick to access in subsequent operations. Each splay tree node stores between 64 and 127 bytes of data.
More than just being theoretically interesting, this module also tries to be somewhat practical by caring about safety, correctness, and speed. It approaches the performance of ordinary strings for simple operations and is hundreds of times faster for complex operations.
Bore – Localhost Tunnels April 2022
A modern, simple TCP tunnel in 400 lines of Rust.
bore
is a minimal CLI tool that exposes local ports to a remote server,
bypassing standard NAT connection firewalls. That’s all it does: no more, and no
less.
With a single binary, you can expose any number of local TCP ports to the public
internet at a specified remote server address. There is a public instance of the
tunneling server running at bore.pub
, available for anyone to use, similar to
ngrok or
localtunnel. It’s also very easy
to host your own server with a single bore server
command in the same binary
executable, with optional authentication.
Links: GitHub, Crates.io, Documentation
Percival November 2021
Web-based, reactive Datalog notebooks.
Percival is a declarative data query and visualization language. It provides a reactive, web-based notebook environment for exploring complex datasets, producing interactive graphics, and sharing results.
Percival combines the flexibility of Datalog as a query language for relational data with the beauty of exploratory visualization grammars. These declarative components interact through a reactive dataflow system. Because Percival uses web technologies (including Web Workers for multithreaded, sandboxed execution), fully-interactive notebooks can be shared with anyone on the Internet, making data analyses more tangible to others.
Composing Studio September 2021
Collaborative music composition for everyone.
Composing Studio is a free and easy-to-use online music notation editor that lets anyone collaborate in real time on short musical pieces. To use it, just go to composing.studio in your browser, create a new session, and share the link with other musicians! There’s no setup or installation required. You’ll be able to typeset musical notation while seeing each other’s work in real time (just like Google Docs), with instant sheet music rendering and live audio playback.
This project originated at HackMIT 2021 (we won a grand prize), where I pitched the idea and formed a team of four programmer-musicians. Although we hadn’t previously known each other, we all shared the same goal of exploring collaborative music with the global community.
µKanren-rs September 2021
A featherweight relational programming language.
This is a Rust implementation of µKanren, a very minimal language in the miniKanren family. See the original Scheme implementation here for reference. Similar to how functional programming focuses on functions as the core unit of abstraction, relational programming aims to represent common mathematical concepts as logical relations.
The library was originally implemented as an exercise for a graduate programming languages design seminar at Harvard, but I made the code available to the open source community as a Rust crate.
Links: GitHub, Crates.io, Documentation
Rustpad June 2021
A self-hosted online collaborative code editor.
Rustpad is an efficient and minimal collaborative text editor based on the operational transformation algorithm. It lets users collaborate in real time while writing code in their browser. Rustpad is completely self-hosted and fits in a tiny Docker image, no database required.
Architecturally, client-side code communicates via WebSocket with a central server that stores in-memory data structures. This makes the editor very fast, allows us to avoid provisioning a database, and makes testing much easier. It demonstrates the power of distributed systems and concurrent network programming, designing with consideration for the entire stack.
Pencil Sketch Rendering May 2021
Geometry processing for real-time pencil sketching.
This research project explores the task of drawing 3D objects, either triangle meshes or implicitly represented as signed distance fields (SDFs), in the art style of a pencil sketch. We review, implement, and extend existing methods with geometry processing techniques.
Our primary contribution is a new scale-invariant algorithm for estimating surface curvatures of an SDF within the graphics pipeline, which was a previously unexplored topic. This algorithm has the advantage of enabling real-time rendering of dynamic geometries at arbitrary scales (modeled by implicit functions), without the noise sensitivity of mesh-based methods.
Graphics Workshop April 2021
Learn computer graphics by writing GPU shaders!
This workshop contains a selection of self-guided projects designed to teach the basics of computer graphics. Each project introduces an important graphics technique that is extensively used in real-world applications. The code is designed to run in real time on modern GPUs, without requiring any extra software. I wrote this when frustrated with the lack of proper documentation and simple code examples for GPU shaders.
The topics include fragment shaders (GLSL), procedural texture generation, rasterization, lighting calculations, and real-time ray tracing. I hosted a live version of this workshop at Harvard, and it has been used by thousands of self-learners after gaining popularity in the /r/gamedev community.
Links: GitHub, Deployment
Path Tracer December 2020
A physically-based path tracer in Rust.
This a physically based, CPU-only rendering engine written in Rust. It implements a Monte Carlo path tracing algorithm for global illumination. There’s a lot of features, including k-d tree mesh acceleration, physical material properties (microfacet BSDF with multiple importance sampling), HDRI environment maps, OBJ/MTL/STL files, depth of field, and particle physics simulation.
It’s also parallelized with rayon and available as a library on crates.io. The entire source code, including code for the example images and more, is very short (~3K SLOC). We’re still looking to extend it with bidirectional path tracing and other features.
This won top project out of 100 students in MIT’s computer graphics class (6.837, Fall 2020).
Links: GitHub, Crates.io, Documentation
Langevin Dynamics for Composition November 2020
Generative modeling of Bach chorales by gradient estimation.
This research project introduces a new generative model for music composition, based on annealed Langevin dynamics and a noise-conditional score matching algorithm using transformers. Unlike implicit models such as GANs, this learns an explicit distribution of the input data.
We study if Langevin dynamics and score matching can combine the controllability of Markov chain Monte Carlo (MCMC) methods with the global view and fast convergence of stochastic gradient descent, to produce high-quality, structured musical compositions.
Our contribution is to look in the direction of designing generative deep learning models for music that strongly avoid local minima, while retaining controllability.
Crepe – Logic Programming in Rust September 2020
Fast, compiled Datalog for static analysis, with Rust integration.
Crepe is a library that allows you to write declarative logic programs in Rust, with a Datalog-like syntax. It provides a procedural macro that generates efficient, safe code and interoperates seamlessly with Rust programs.
Inspired by the power of Souffle and Formulog, this project is unique in enabling safe integration of arbitrary functions from the host language (Rust) within compiled Horn clauses. It also includes many features of modern Datalog implementations: stratified negation, semi-naive evaluation, and automatic index generation.
Links: GitHub, Crates.io, Documentation
Fast Semantic Segmentation August 2020
State-of-the-art, real-time semantic segmentation with MobileNetV3.
While working at Nvidia, I open sourced PyTorch code and pretrained weights for real-time semantic segmentation of street images. The goal was to make it easy for anyone to use cutting-edge machine learning algorithms for semantic segmentation tasks, especially in embedded applications.
This was a time-consuming effort aided by researchers from ADLR. Although the MobileNetV3 paper reported 72.6% mIoU accuracy, previous public implementations only scored around 40-55% mIoU. After many adjustments to model architecture, loss functions (RMI), and hyperparameters, I was able to train models reaching 72.3% mIoU, while running inference at up to 37.3 FPS on a modern GPU.
I currently maintain this code as a package on PyPI, along with scripts for inference and exporting models to different formats for deployment.
Set with Friends January 2020
An online, real-time multiplayer card game.
Set with Friends is a web implementation of a real-time pattern matching card game called Set. Originally, when I was at SPARC 2019, I wondered how I could bridge the 3000 mile gap between some of the friends I had made after we parted ways. Set with Friends lets you play Set online with minimal overhead (no login necessary); it’s as simple as sharing a custom link and having fun!
During the design process, I learned about Firebase, so working on this app introduced me to serverless architectures and provided a good exercise in creating live interfaces. We completely overhauled the design in May and released version 2.0 in June. At its peak, the website had over 40,000 monthly active users for half a year, and as of September 2022 there have been 5,000,000 games played.
Competitive Programming Workspace September 2019
An online, cloud-synchronized workspace for competitive programmers.
Top competitive programmers have dedicated setups on their local machines that let them quickly creating new programs from templates, generate input files, and run code on suites of test data. But what if you’re working on a different computer, and you don’t have all of these tools installed?
Workspace is a side-by-side problem viewer and code editor that allows you to run code online. It supports advanced features like autocompletion and automatically parses of test cases for each problem, saving time in a programming competition. This was inspired by CS Academy‘s interface.
The web server lets you scrape problems from online judges at the click of a button. All code is automatically saved and synchronized with a MongoDB instance in the cloud, so it persists across sessions and browsers. See the website below to try it out.
Polygon Triangulation with Holes May 2019
Fast computational geometry algorithms in WebAssembly.
This was an experiment to implement the ear clipping algorithm from computational geometry for polygon triangulation. Although easy to describe, triangulation is a surprisingly difficult problem, as there are many corner cases involving concave polygons, holes, and nested polygons. Therefore, we need to use some clever techniques to make the problem tractable.
Note that there is a randomized O(N log* N) algorithm for this problem (Seidel 1991). However, for most practical applications, ear clipping is fast enough and produces more robust results, while having lower constant-factor overhead due to the use of simpler data structures.
I implemented the algorithm in C++ from scratch, then compiled it to WebAssembly using Emscripten to produce code that could be run by the browser.
Procedural Harmony February 2019
Dynamic programming applied to four-part harmony.
This is a Python program that takes as input a string containing keys and roman numerals representing a chord progression, and voices the resulting chords according to common practice voice leading conventions. This can be used to try out chord progressions, compose a short chorale, and solve music theory exercises.
Parsing and interpreting the roman numerals was done with Music21. This project required satisfying a technical set of voice leading constraints, which I framed as a combinatorial optimization problem. Although the set of possible voicings is exponential in size, you can use dynamic programming on intermediate states for a relatively fast algorithm. See the website below for an interactive version deployed with Flask.
The code and idea behind this project were used as the basis for an abstract at ISMIR 2020, written by computational music theory researchers at McGill University.
Algorithm and Data Structure Library January 2019
A competitive programmer's library of algorithms and data structures in C++.
I maintain an open-source library of about thirty-five C++ templates for algorithms and data structures. These are fairly advanced tricks targeted toward top competitive programmers, including fast implementations of FFT, Aho-Corasick string search, and Dinic’s blocking flow algorithm, as well as data structures like link-cut trees and centroid decomposition.
This is accessible on a static website, which I developed using Angular. The website also includes a searchable collection of more than 600 code samples from other open-source libraries.
Canvas Hashlife December 2018
An ultra-fast simulation of Conway's Game of Life in the browser.
Ever wanted to simulate a 2100 × 2100 grid of cells over 250 generations? Well, now you can do that in just milliseconds without any specialized software, right in your browser!
How is it possible to simulate such a massive pattern without running out of CPU or memory? The secret is Bill Gosper’s Hashlife algorithm, which combines quadtrees and memoization to compress space and time. In the image, you can see a frontend web application computing the 498,913,509,376th generation of a Turing machine pattern in just under a millisecond.
Char-RNN Keras October 2018
Character-level language models with recurrent neural networks in Keras.
Inspired by Andrej Karpathy’s blog post, this is an implementation of a multi-layer long short-term memory (LSTM) network in Keras, for learning and sampling from character-level patterns in text. This was trained to generate writing in the style of historical authors, like Shakespeare, Hugo, and Carroll.
A really interesting application was training this model on a text-based corpus
of folk music, from which it could compose fairly convincing new music samples.
See the music-gen
demo below.
Handwriting Generator August 2018
Variational autoencoder that learns latent features in handwriting.
This is a neural network that encodes 28×28 images of handwritten characters in a 6-dimensional latent space using deep learning. It consists of two parts: an encoder block that puts the input through several convolutional and downsampling layers to extract latent variables, and a decoder block that takes latent variables and a label to reconstruct the original image, minimizing mean squared error. During the training process, we add some Gaussian noise to the latent space, which is the so-called “reparameterization trick” of a variational autoencoder.
I implemented the model in Keras and trained it on a dataset of over 400,000 handwritten characters. You can see the results in the web demo below, which stitches the outputs together onto a canvas. Without any prior knowledge, the model is able to isolate six of the most important characteristics in human handwriting.
Estimation Markets July 2018
Run interactive, simulated markets on estimation problems.
Inspired by online prediction markets like PredictIt, the Estimation Markets website allows users to create real-time markets on Fermi estimation problems. Players use their knowledge on the virtual trading floor, competing to earn the most money. I used this app to run estimation markets in classes at SPARC, AlphaStar Academy, and MIT Splash, with great feedback from the participants.
Julia Fractal Renderers April 2018
Fast, interactive fractal renderers in C++, Java, and WebGL.
Originally, this project started out as a multithreaded Julia fractal renderer in C++, which I used to create high-quality static images and animations. However, I also wanted to interactively explore the fractals by zooming and panning, so I created a Java Swing app with similar multithreaded performance.
The speed of these implementations was heavily CPU-bound though, so I finally implemented an OpenGL shader that could render Julia fractals with blazing-fast speed, fast enough to be explored interactively with almost no lag. I rendered using WebGL and connected it to mobile-friendly controls with TypeScript. You can play with this fractal explorer in the website linked below.
Links: GitHub, GitHub (Java), GitHub (WebGL), Website
Universal Gravity Simulator May 2016
Control hundreds of tiny planets with gravity.
This was one of my first fun programming projects, a visual experiment
simulating gravity in a system with hundreds of interacting objects.
Calculations are done using simple physics formulas, with a bit of damping to
make them more robust. The user can click to add an invisible mass that attracts
all objects to the cursor. Everything is rendered in real time on a <canvas>
element, and it’s quite satisfying to move the colorful balls around!