Written by Jerry Schirmer | IoT & Big Data, Scientists' Corner


So, I’ve been thinking a lot lately about one particular subfield of physics and how it might apply to data. The thing that has been on my mind is statistical mechanics, which is the study of how a huge number of particles at the atomic level aggregate into our macroscopic world. Beyond the large area of stat mech as a whole, I’ve been thinking in particular about phase transitions: what happens when these large-scale models cease to be consistent with the small-scale behaviour of the system.

Why this is interesting to me is that, in a lot of ways, big data involves the same sort of problem that stat mech does: we try to amalgamate the behaviour of millions of website visitors, or IoT devices, or whatever, into a meaningful model that can be described by a few parameters. When we do this, we should be interested in when our model will suddenly need to be revised or adjusted. So, I’m going to take this space to talk a bit about how phase transitions come about in our physical descriptions of nature, and then speculate a bit on how they might be leveraged in discussing data.

So, let’s think about a gas. For simplicity, let’s say that it’s kept in a well-insulated container. How might we describe it? Well, naïvely, we’d probably go and use a few variables that describe the observed state of the gas. We’d use stuff like the pressure P of the gas, its volume V, and its temperature T. If we already know about molecules, we might use the number N of molecules in the gas. If the gas is diffuse enough, we get a relationship between these things that is something like our favorite formula from intro chemistry:

PV = NkT

where k is a constant called Boltzmann’s constant. Now, this isn’t the end-all of gas laws, however. If we start to factor in the simplest sorts of interactions between particles, we need to figure that the denser the gas is, the more strongly the particles attract each other, and the more space is taken up by the particles themselves, so we get something like the Van der Waals equation:

(P + aN²/V²)(V − Nb) = NkT

where a and b are empirical constants that depend on the gas under consideration. Now, for sufficiently high values of T, this equation is nice and well-behaved. But it should be easy enough to see that if T is low enough, then you can get behaviour that is somewhat crazy. In particular, for constant N and a sufficiently low constant T, you can find solutions where ∂P/∂V > 0, which is crazy when you think of it: increasing the volume of the gas increases the pressure! No stable gas can satisfy these conditions. You might think this means that the Van der Waals equation is junk (and it’s true that this equation, too, is an approximation), but what it really means is something more interesting: the Van der Waals equation of state is telling you that you don’t have a gas at all anymore. The gas is undergoing a phase transition, and is liquefying or solidifying. And, on the other side of that phase transition, we will need a new equation of state to describe a liquid or a solid.
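To make the "crazy" region concrete, here is a minimal sketch that scans a Van der Waals isotherm for volumes where the pressure rises with volume. It assumes the usual per-particle form P = NkT/(V − Nb) − aN²/V², and it works in illustrative units where k, a and b are all set to 1 rather than using constants for any real gas. The function names are my own, not from any library.

```python
def vdw_pressure(V, N, T, a, b, k=1.0):
    """Van der Waals pressure: P = N k T / (V - N b) - a N^2 / V^2."""
    return N * k * T / (V - N * b) - a * N**2 / V**2


def vdw_dP_dV(V, N, T, a, b, k=1.0):
    """Slope of the isotherm, dP/dV at constant N and T."""
    return -N * k * T / (V - N * b) ** 2 + 2 * a * N**2 / V**3


def critical_temperature(a, b, k=1.0):
    """Temperature below which the isotherm develops a dP/dV > 0 region."""
    return 8 * a / (27 * k * b)


def unstable_volumes(N, T, a, b, k=1.0, n_points=2000, v_max_factor=20.0):
    """Sample volumes between just above N*b and v_max_factor*N*b,
    returning those where increasing V would increase P."""
    v_min = 1.01 * N * b  # stay above the excluded volume N*b
    v_max = v_max_factor * N * b
    step = (v_max - v_min) / (n_points - 1)
    volumes = [v_min + i * step for i in range(n_points)]
    return [V for V in volumes if vdw_dP_dV(V, N, T, a, b, k) > 0]


if __name__ == "__main__":
    tc = critical_temperature(a=1.0, b=1.0)
    for factor in (1.1, 0.9):
        bad = unstable_volumes(N=1.0, T=factor * tc, a=1.0, b=1.0)
        if bad:
            print(f"T = {factor} Tc: dP/dV > 0 between V = {bad[0]:.2f} and {bad[-1]:.2f}")
        else:
            print(f"T = {factor} Tc: isotherm is well-behaved on the sampled range")
```

Above the critical temperature the scan comes back empty; below it, a band of volumes shows up where the equation of state cannot be describing a stable gas, which is exactly the signal that a phase transition is in play.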

Now, what does this mean for data? In a lot of ways, we are doing the same thing with big data. We are measuring the net effects of millions of individuals doing billions of things, and tracking their aggregate behaviour using a model that has far fewer degrees of freedom than the underlying data. When such a model starts producing nonsense, it’s not the model being wrong: it’s the model screaming out that it no longer applies, and needs to be replaced with a model that applies more directly to the underlying situation.

For instance, imagine that we’re running a regression on a series of sensors for a predictive analytics platform, such as SparkCognition’s SparkPredict, and we are attempting to predict future sensor output on a wind turbine from previous sensor output. Generically, we’re going to be measuring a bunch of things like airflow velocity, temperature, pressure, etc. We build a model, train it, deploy it, and watch it evolve, and at first, it works beautifully. But then winter comes, the behaviour of the turbine changes, and we’re predicting negative pressure. We know that’s impossible, but that’s just the model telling us that we need to come up with a new model, applicable to winter. We’ve undergone a phase transition, and we’ve learned about it from the model itself. There are obviously much deeper dives to take here, but one of the first lessons is always that fundamental question that all mathematicians and scientists should ask themselves: is my model making sense?
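One simple way to ask "is my model making sense?" in code is to check predictions against limits that physics guarantees, and to treat a burst of violations as a sign that the model has left its regime. The sketch below is only an illustration of that idea: the sensor names, the bounds, and the 5% tolerance are all hypothetical, and none of it is taken from SparkPredict or any other product.

```python
from dataclasses import dataclass


@dataclass
class Bounds:
    lower: float = float("-inf")
    upper: float = float("inf")


# Hypothetical sensor names and physical limits, chosen for illustration only.
PHYSICAL_BOUNDS = {
    "pressure_pa": Bounds(lower=0.0),    # absolute pressure cannot be negative
    "temperature_k": Bounds(lower=0.0),  # nothing is colder than absolute zero
}


def is_unphysical(row, bounds=PHYSICAL_BOUNDS):
    """True if any bounded sensor in this prediction is outside its limits.
    NaN values fail the comparison and are therefore flagged as well."""
    return any(
        not (bounds[name].lower <= value <= bounds[name].upper)
        for name, value in row.items()
        if name in bounds
    )


def unphysical_fraction(predictions, bounds=PHYSICAL_BOUNDS):
    """Fraction of predictions (dicts of sensor name -> value) that are impossible."""
    if not predictions:
        return 0.0
    bad = sum(1 for row in predictions if is_unphysical(row, bounds))
    return bad / len(predictions)


def model_needs_review(predictions, tolerance=0.05):
    """Flag the model once more than `tolerance` of its output is impossible."""
    return unphysical_fraction(predictions) > tolerance
```

For example, a batch of winter predictions containing `{"pressure_pa": -1200.0, "temperature_k": 265.0}` would be flagged, while summer predictions with positive pressures would pass. A check like this only catches the most blatant regime changes, but it is cheap, and it lets the model announce its own breakdown instead of failing silently.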


Last modified: October 26, 2017