posts tagged tech

Things that have taught me statistics

A collection of resources that I’ve learned statistics from, plus mini-reviews of each.

In praise of gradient boosting

When I’m trying to build a good predictive model for a dataset, I usually find it quite hard to do better than a fairly off-the-shelf gradient boosted decision tree machine. These work, in brief, as follows:

Why squared error?

Someone recently asked on the statistics Stack Exchange why the squared error is used in statistics. This is something I’d been wondering about myself recently, so I decided to take a crack at answering it. The post below is adapted from that answer.

Decision trees for survival analysis

Survival analysis is an interesting problem in machine learning, but it doesn’t get nearly as much attention as the usual classification and regression tasks, so there aren’t as many tools for it. Here I describe a nifty reduction that allows us to bring more traditional machine-learning tools to bear on the problem. Combined with the view of decision trees as greedy piecewise-constant loss-minimizing classifiers, it enables a number of powerful and flexible algorithms for large-scale discrete survival analysis.

A useful view of decision trees

Decision trees for machine learning are often presented in an ad-hoc way, with “node impurity metrics” whose choice is never explained. But it turns out there’s actually fairly good theoretical motivation for such metrics (which nobody talks about much, for some reason). Each commonly-used impurity metric corresponds to treating a decision tree as greedily learning a piecewise-constant function that minimizes the expectation of some well-known loss function.
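
As a rough illustration of that correspondence for a single leaf with binary labels, here is a minimal sketch. The function names and the toy labels are made up for this example and are not taken from the post. It checks two standard facts: the best constant prediction under squared error is the mean label, with Gini impurity equal to twice the resulting mean squared error, and the best constant under log loss is the same value, with the resulting mean log loss equal to the entropy (in nats).

```python
import math

def gini(p):
    """Gini impurity of a binary node with positive fraction p."""
    return 2 * p * (1 - p)

def entropy(p):
    """Entropy (in nats) of a binary node with positive fraction p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def mean_squared_error(labels, c):
    """Average squared error of predicting the constant c for every label."""
    return sum((y - c) ** 2 for y in labels) / len(labels)

def mean_log_loss(labels, c):
    """Average log loss of predicting probability c for every label."""
    return -sum(y * math.log(c) + (1 - y) * math.log(1 - c) for y in labels) / len(labels)

# Hypothetical labels falling into one leaf of a tree.
labels = [1, 1, 1, 0]
p = sum(labels) / len(labels)

# Brute-force search for the best constant prediction under each loss.
candidates = [i / 100 for i in range(1, 100)]
best_sq = min(candidates, key=lambda c: mean_squared_error(labels, c))
best_log = min(candidates, key=lambda c: mean_log_loss(labels, c))

print("best constant (squared error):", best_sq)
print("best constant (log loss):", best_log)
print("Gini:", gini(p), "2 * MSE at p:", 2 * mean_squared_error(labels, p))
print("entropy:", entropy(p), "log loss at p:", mean_log_loss(labels, p))
```

Under this view, choosing the split that most reduces impurity is the same as choosing the split that most reduces the corresponding loss when each child predicts its own best constant.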

Tracking function dependencies

I recently started looking into dependency tracking in Python: determining which pieces of code and data are required to compute a particular function. This led to a sequence of journeys into the weeds of CPython and various crazy interpreter hacks. Probably don’t try anything in this post at home, but hopefully it’s fun to learn about, at least.

My job hunt experience

Some people have been asking me how my job search went: how I found out about, and decided on, the company I’m currently working for. I thought I’d write a bit about it, as a case study for other folks interested in doing similar things and because I learned some interesting stuff along the way.

Avoiding bugs in machine learning code, part 2

In a previous post I explained how hard it is to prevent, or find and fix, bugs in machine learning code via conventional strategies. In this installment, I’ll go over some strategies that do work.

Avoiding bugs in machine learning code

At work I’ve been writing a lot of machine learning code. Some of it is responsible for moving around a whole lot of money, so it behooves us to be really careful when writing and testing it to make sure no bugs make it into our production systems. Unfortunately, machine learning bugs are often quite hard to catch, for a couple of reasons.

Plants, continued fractions, and the golden ratio

If you were a plant, how would you decide where and when to grow leaves? The exciting math behind a prosaic question.
