## PHASE TRANSITIONS – WHEN A MODEL IS SCREAMING OUT THAT IT NEEDS REVISION

So, I’ve been thinking a lot lately about one particular subfield of physics and how it might apply to data. Broadly, what has been on my mind is statistical mechanics, the study of how huge numbers of particles at the atomic level aggregate into the macroscopic world we see. Within stat mech, I’ve been thinking in particular about phase transitions: what happens when these large-scale models cease to be consistent with the small-scale behaviour of the system.

This interests me because, in a lot of ways, big data involves the same sort of problem that stat mech does. We try to amalgamate the behaviour of millions of website visitors, or IoT devices, or whatever, into a meaningful model that can be described by a few parameters. When we do this, we should want to know when our model will suddenly need to be revised or adjusted. So I’m going to take this space to talk a bit about how phase transitions come about in our physical descriptions of nature, and then speculate a bit on how they might be leveraged when thinking about data.

So, let’s think about a gas. For simplicity, let’s say that it’s kept in a well-insulated container. How might we describe it? Well, naïvely, we’d probably use a few variables that describe the observed state of the gas: the pressure *P* of the gas, its volume *V*, and its temperature *T*. If we already know about molecules, we might also use the number *N* of molecules in the gas. If the gas is diffuse enough, these quantities are related by our favorite formula from intro chemistry, the ideal gas law:

$$PV = NkT$$
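As a quick sanity check on the ideal gas law, here is a minimal Python sketch that solves it for the pressure. It assumes SI units, uses the exact 2019 SI values of Boltzmann’s constant and Avogadro’s number, and picks an illustrative state (one mole at 273.15 K in 22.414 litres) only because the answer should land close to one atmosphere. The function name `ideal_gas_pressure` is my own.

```python
BOLTZMANN_K = 1.380649e-23    # J/K, exact in the 2019 SI
AVOGADRO_NA = 6.02214076e23   # 1/mol, exact in the 2019 SI


def ideal_gas_pressure(n_molecules: float, volume_m3: float, temperature_k: float) -> float:
    """Solve PV = NkT for the pressure P, in pascals."""
    if volume_m3 <= 0:
        raise ValueError("volume must be positive")
    return n_molecules * BOLTZMANN_K * temperature_k / volume_m3


# Illustrative state: one mole of gas at 0 degrees C in 22.414 litres (0.022414 m^3).
pressure = ideal_gas_pressure(AVOGADRO_NA, 0.022414, 273.15)
print(f"{pressure:.0f} Pa")
```

This prints about 101325 Pa, very close to one standard atmosphere, which is what we expect for a dilute gas under these conditions.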

where *k* is a constant called Boltzmann’s constant. Now, this isn’t the end-all of gas laws. If we start to factor in the simplest sorts of interactions between particles, we need to account for two effects: the denser the gas, the more the attraction between particles reduces the pressure, and the particles themselves take up some of the available space. So we get something like the Van der Waals equation:

$$\left(P + \frac{aN^2}{V^2}\right)(V - Nb) = NkT$$

where *a* and *b* are empirical constants that depend on the gas under consideration. Now, for sufficiently high values of *T*, this equation is nice and well-behaved. But if *T* is low enough, you can get behavior that is somewhat crazy. In particular, holding *N* and *T* fixed at a sufficiently low *T*, you can find solutions where $\left(\partial P / \partial V\right)_{N,T} > 0$, which is crazy when you think about it: increasing the volume of the gas increases the pressure! No stable gas can satisfy these conditions. You might think this means that the Van der Waals equation is junk (and it’s true that this equation, too, is an approximation), but what it really means is something more interesting. The Van der Waals equation of state is telling you that you no longer have a gas at all: the gas is undergoing a phase transition, and is liquefying or solidifying. On the other side of that phase transition, we will need a new equation of state to describe a liquid or a solid.
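To make the “crazy” region concrete, here is a minimal Python sketch (my own illustration, not taken from any library) that scans Van der Waals isotherms for places where $\partial P / \partial V > 0$. To avoid choosing values of *a* and *b* for a particular gas, it assumes the standard reduced form of the equation. Writing the equation as $\left(P + aN^2/V^2\right)(V - Nb) = NkT$, the critical point sits at $V_c = 3Nb$, $P_c = a/(27b^2)$ and $kT_c = 8a/(27b)$, and in the reduced variables $P_r = P/P_c$, $V_r = V/V_c$, $T_r = T/T_c$ the equation becomes $P_r = 8T_r/(3V_r - 1) - 3/V_r^2$ for every gas. The grid bounds and step count below are arbitrary choices.

```python
def vdw_reduced_pressure(v_r: float, t_r: float) -> float:
    """Van der Waals pressure in reduced units, P/P_c, valid for V_r > 1/3."""
    return 8.0 * t_r / (3.0 * v_r - 1.0) - 3.0 / v_r**2


def vdw_reduced_slope(v_r: float, t_r: float) -> float:
    """Analytic derivative dP_r/dV_r along an isotherm (fixed N and T)."""
    return -24.0 * t_r / (3.0 * v_r - 1.0) ** 2 + 6.0 / v_r**3


def unstable_volume_range(t_r, v_min=0.5, v_max=5.0, steps=9000):
    """Scan one isotherm and return the (lowest, highest) sampled V_r where
    dP_r/dV_r > 0, i.e. where the equation describes an impossible 'gas'.
    Returns None if every sampled state on the isotherm is mechanically stable."""
    unstable = []
    for i in range(steps + 1):
        v_r = v_min + (v_max - v_min) * i / steps
        if vdw_reduced_slope(v_r, t_r) > 0:
            unstable.append(v_r)
    return (unstable[0], unstable[-1]) if unstable else None


for t_r in (1.1, 0.9):
    print(t_r, unstable_volume_range(t_r))
```

Above the critical temperature ($T_r = 1.1$) the scan returns `None`, so every sampled state is a perfectly respectable gas. At $T_r = 0.9$ it reports an unstable band running from roughly $V_r \approx 0.72$ to $V_r \approx 1.53$, which is exactly where the equation is telling us it no longer describes a single gas phase. Strictly speaking, the full liquid-gas coexistence region is wider and is usually found with Maxwell’s equal-area construction; the positive-slope band is only the part where the equation becomes mechanically unstable.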

Now, what does this mean for data? In a lot of ways, we are doing the same thing with big data. We are measuring the net effects of millions of individuals doing billions of things, and tracking their aggregate behaviour using a model that has far fewer degrees of freedom than the underlying data. So when such a model starts producing impossible results, it isn’t simply that the model is wrong. It’s the model screaming out that it no longer applies, and that it needs to be replaced with a model that fits the underlying situation more directly.

For instance, imagine that we’re running a regression on a series of sensors for a predictive analytics platform, such as SparkCognition’s SparkPredict, and that we are attempting to predict future sensor output on a wind turbine from previous sensor output. Generically, we’re going to be measuring things like airflow velocity, temperature, pressure, and so on. We build a model, train it, deploy it, and monitor it, and at first it works beautifully. But then winter comes, the behaviour of the turbine changes, and we start predicting negative pressure. We know that’s impossible, but that’s just the model telling us that we need to come up with a new model, applicable to winter. We’ve undergone a phase transition, and we’ve learned about it from the model itself. There are obviously much deeper dives to take here, but one of the first lessons is always the fundamental question that all mathematicians and scientists should ask themselves: is my model making sense?
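The same kind of sanity check can be automated on the data side. What follows is a minimal, hypothetical Python sketch, not a description of how SparkPredict or any other product works: it fits an ordinary least-squares line to made-up “summer” readings of pressure against temperature, then flags a possible regime change whenever the model’s predictions on new inputs violate a physical bound (here, pressure below zero). The functions `fit_line` and `flag_regime_change`, the toy numbers, and the zero-tolerance threshold are all assumptions for illustration.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def flag_regime_change(predictions: Sequence[float], lower_bound: float = 0.0,
                       max_violation_rate: float = 0.0) -> Tuple[float, bool]:
    """Return the fraction of predictions below a physical lower bound, and
    whether that fraction exceeds what we are willing to tolerate."""
    violations = sum(1 for p in predictions if p < lower_bound)
    rate = violations / len(predictions)
    return rate, rate > max_violation_rate


# Hypothetical "summer" training data: temperature (deg C) -> pressure (arbitrary units).
summer_temps = [10.0, 15.0, 20.0, 25.0, 30.0]
summer_pressures = [2.0, 2.5, 3.0, 3.5, 4.0]
intercept, slope = fit_line(summer_temps, summer_pressures)

# Apply the summer model, unchanged, to hypothetical winter temperatures.
winter_temps = [-5.0, -15.0]
winter_predictions = [intercept + slope * t for t in winter_temps]
print(flag_regime_change(winter_predictions))
```

With these toy numbers the summer line predicts a negative pressure at -15 °C, so half of the winter predictions are flagged and the check prints `(0.5, True)`. In practice you would probably want a small nonzero tolerance and a rolling window, since a single bad prediction may just be noise; the sketch leaves that judgment call open.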

Last modified: October 26, 2017