Since 2007, I have worked on many personal and academic software projects. I’m in the process of updating this site to showcase some more recent projects.
Medical NLP: Effective Querying in Medical Case Databases
In this project, we introduce a new product to accelerate the adoption of AI in healthcare. In cooperation with the MGH & BWH Center for Clinical Data Science, we applied and optimized natural language processing methods specifically for medical patient reports collected in hospitals. These reports are written in free form and pose special challenges, such as highly specialized vocabulary, abbreviations, typing errors and negated sentences. Standard approaches to extracting key information or querying work well in general use cases but typically fail on this kind of data. We developed a highly efficient search engine for medical databases that helps (i) clinicians retrieve relevant cases, aiding both clinical diagnostics and treatment, and (ii) machine learning engineers in the healthcare domain automatically extract relevant training and test data.
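To illustrate why negated sentences defeat plain keyword search, here is a minimal sketch of negation-aware matching. It is not the project's search engine: the cue list, the sentence splitting and the function name `mentions_affirmed` are assumptions made up for this example, and a real clinical system would need a far larger lexicon and proper scope handling.

```python
import re

# Hypothetical, deliberately tiny cue list chosen for illustration only.
NEGATION_CUES = ("no", "not", "without", "denies", "negative for")


def mentions_affirmed(report: str, term: str) -> bool:
    """Return True if `term` occurs in at least one sentence of `report`
    without a negation cue appearing earlier in that same sentence."""
    term = term.lower()
    # Naive sentence splitting on periods, semicolons and newlines.
    for sentence in re.split(r"[.;\n]+", report.lower()):
        pos = sentence.find(term)
        if pos == -1:
            continue
        prefix = sentence[:pos]
        negated = any(
            re.search(r"\b" + re.escape(cue) + r"\b", prefix)
            for cue in NEGATION_CUES
        )
        if not negated:
            return True
    return False


report = "No evidence of pneumonia. Mild cardiomegaly."
print(mentions_affirmed(report, "pneumonia"))     # negated mention
print(mentions_affirmed(report, "cardiomegaly"))  # affirmed mention
```

A plain substring search would return the first report for a "pneumonia" query even though the report rules it out; the sketch only checks the first occurrence of the term in each sentence, which is enough to show the idea.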
Orbital Trajectory Fitting: Detecting Near-Earth Asteroids
This report summarizes a multi-step method for identifying full orbit trajectories of asteroids in our solar system. The input to this method is a list of partial “tracklets,” each a series of visible-light observations of an object in the night sky over the course of several hours. The output is a set of “orbital elements” that completely define the asteroid’s trajectory through our solar system. Some steps of this method were completed in a prior project and are outside the scope of this report; specifically, (i) the partitioning of the night sky into 768 spatial regions and monthly time windows, and (ii) the initial clustering of asteroids within those regions/windows into “asteroid slices”. The steps within the scope of this report are (a) optimizing the six “motion parameters” of each cluster using a variety of optimization solvers, (b) refining the earlier clusters using the optimized motion parameters, (c) converting the motion parameters into time-independent “orbital elements,” and (d) meta-clustering the asteroid slices from different time/space windows into final asteroid trajectories. Ultimately, our algorithm correctly identifies 69.9% of just over 21,000 asteroids in a labeled dataset, with an approximately 1% error rate of spurious asteroids. We hope to run this algorithm on a full, unsolved database of asteroids known as the Isolated Tracklet File (ITF).
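Step (c) can be illustrated with the textbook conversion from a Cartesian state vector to Keplerian elements. The summary above does not say how the six motion parameters are defined, so this sketch assumes they are a heliocentric position (AU) and velocity (AU/day); the function name `state_to_elements` and the returned dictionary keys are hypothetical, and the code assumes a non-circular, inclined elliptical orbit so that all angles are well defined.

```python
import math

# Sun's gravitational parameter in AU^3/day^2 (Gaussian gravitational constant squared).
MU_SUN = 0.01720209895 ** 2


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def _acos(x):
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, x)))


def state_to_elements(r, v, mu=MU_SUN):
    """Convert position r and velocity v to Keplerian elements.

    Assumes an elliptical orbit with e > 0 and inclination != 0.
    Angles are returned in degrees.
    """
    rn, vn = _norm(r), _norm(v)
    h = _cross(r, v)                      # specific angular momentum
    n = (-h[1], h[0], 0.0)                # node vector, z-axis cross h
    rv = _dot(r, v)
    # Eccentricity vector points from the focus toward perihelion.
    e_vec = tuple(((vn ** 2 - mu / rn) * r[k] - rv * v[k]) / mu for k in range(3))
    e = _norm(e_vec)
    energy = vn ** 2 / 2 - mu / rn        # specific orbital energy
    a = -mu / (2 * energy)                # semi-major axis

    inc = _acos(h[2] / _norm(h))
    raan = _acos(n[0] / _norm(n))
    if n[1] < 0:
        raan = 2 * math.pi - raan
    argp = _acos(_dot(n, e_vec) / (_norm(n) * e))
    if e_vec[2] < 0:
        argp = 2 * math.pi - argp
    nu = _acos(_dot(e_vec, r) / (e * rn))
    if rv < 0:
        nu = 2 * math.pi - nu

    return {
        "a": a, "e": e,
        "i": math.degrees(inc),
        "raan": math.degrees(raan),
        "argp": math.degrees(argp),
        "nu": math.degrees(nu),
    }
```

The first five elements (a, e, i, longitude of the ascending node, argument of perihelion) are constant along an unperturbed orbit, which is what makes them useful for matching slices observed at different times; the true anomaly `nu` still depends on the epoch.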
Video Generation: Using FCNs in an Adversarial Setup
In this project, we present a variety of deep-learning-based setups for text-to-image and text-to-image-sequence generation. Image sequence generation is an actively researched branch of computer vision that poses special challenges such as temporal coherence. To address this problem, we describe a variety of partial and complete solutions that we developed in three stages: (1) Text to Image Synthesis: using a traditional GAN, we generate a single image from a textual representation. (2) Text + Image to Video Synthesis: using a fully convolutional network, we generate an image sequence given a single frame and a description of the action taking place. (3) Inspired by the recent success of generative adversarial networks, we then also train this architecture in a truly adversarial setting. Throughout our work, we make use of different datasets. We primarily evaluated our approaches on our own synthetic datasets of increasing difficulty, then moved to natural images from a human action dataset. We also performed text-to-image synthesis experiments on the T-GIF dataset, but found that its high diversity and other issues make it rather unsuitable for video generation experiments.
CLFlow: Cluster Management for TensorFlow (2017)
- Developed software to create software-managed computing clusters to parallelize heavy computing tasks on inexpensive hardware setups.
- Software supports automatic job distribution, dataset synchronization, live training supervision and administration from a central master node.
- Deployed in a research lab, where it significantly aided multiple research projects.