Monday, July 16, 2007

Word Fossils -- A Statistical Fantasy

We started getting cable television at our home just a few months ago. I never got much opportunity to watch any of the cable shows in the past, so I’ve watched it pretty intently for a while now. One series that fascinates me is Mythbusters on the Discovery Channel. These people are real scientists! They are amazingly resourceful, appropriately skeptical, knowledgeable and very funny. I particularly liked the show where they tried to read recorded sound off of a ceramic pot. The spectacle of the crew shouting through a phonograph horn into a lump of clay was so much better than anything on the Comedy Channel. The story behind the scene was even more intriguing.

The suggestion had been made that sound recordings would be naturally embedded by a potter applying tools to the surface of the wet clay as it rotates on the potter’s wheel, analogous in principle to an Edison phonograph, I suppose. The tool would vibrate and the vibration would leave a mark. The mark could theoretically be interpreted to reconstruct the original sound. Old pots could be made to give up the words of the dead. Even the audio track of the life of Jesus might be recovered if a potter had been nearby, in Jerusalem perhaps, inadvertently pressing His words into the stoneware.

Now, by my way of thinking, the physics of the situation is undeniable. Everything in the environment is going to affect that wet clay. We should be able to pick up that sound. Based on the historical analysis of potsherds, pebbles and ash we have reconstructed entire civilizations and chronicled the oft-repeated Fall of Troy. The wet clay should record the nature of the tools, the chemistry of glaze, the location, the temperature, the humidity, the firing fuel, the fact that the potter was a big left-handed galoot with a sore tooth and even note the presence of a nearby supernova in a northern constellation. Of course vibrations in the air will affect the clay! So why couldn’t the Mythbusters team retrieve the sound?

Well, maybe they weren’t trying hard enough. But the fact is that, assuming it’s possible, it’s not going to be easy. Edison himself was famous for his persistence, and he only succeeded because he was controlling all the variables. We forget how hard it is to do something for the first time.


If I were going to attempt it, how would I approach the problem of interpreting a pot? First I would forget about all the mechanical nonsense and treat it as a data source. Laser scan the whole thing, CAT scan it, whatever, to get a data-rich digital map of the surface, and maybe the interior as well. I’m imagining that these scans come in as bit maps where each digital point is a 3D address with sensor measurements associated with that point. I imagine also that the preprocessing provided by the scanning equipment retains only the significant points, employing massive computing power to whittle down the 3D space into a 3D form. Nevertheless, from the resolution that these images have, I’m guessing that there would be a tremendous amount of data. I don’t think I’d manage it on my PC, so I’d have to have access to some specialized computing power. (Maybe I’m underestimating today’s PCs.) Then, being of a statistical bent, I would take a random sample of the surface points, something large but reasonable.

The first objective IMO would be to identify simplifying physical parameters – in particular, the axis of rotation. This seems very straightforward, but in fact it is not. The pot will have been cut from the wedge unevenly. The bottom will have been trimmed, probably, but not necessarily square with the pot. Furthermore, the shape will have warped in the kiln to a degree depending on imperfections, unevenness and non-symmetrical heating. Sore Tooth may have thumbed it a little hard in places. I’m guessing, though, that we could easily fit some fairly simple 3D curve as the axis using a least-squares process. Then I would do it again with another, completely independent, sample of the same data. … The results of the two tests will not be the same, I promise you.
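Here is a rough sketch of that two-sample check in Python. Everything in it is an assumption made for illustration: the pot is simulated with made-up numbers, the axis is taken to be a straight tilted line rather than a more general 3D curve, and the points are assumed to cover the surface evenly around the pot, so that regressing x and y on height z recovers the axis. The names `make_pot_points`, `fit_line` and `fit_axis` are hypothetical.

```python
import math
import random


def make_pot_points(n, rng):
    """Simulate surface points of a slightly tilted, bumpy pot (made-up numbers)."""
    points = []
    for _ in range(n):
        z = rng.uniform(0.0, 20.0)               # height along the pot
        theta = rng.uniform(0.0, 2.0 * math.pi)  # angle around the axis
        radius = 8.0 + 2.0 * math.sin(z / 3.0) + rng.gauss(0.0, 0.05)
        # Assumed true axis, leaning a little: x = 0.3 + 0.05 z, y = -0.2 + 0.02 z
        x = 0.3 + 0.05 * z + radius * math.cos(theta)
        y = -0.2 + 0.02 * z + radius * math.sin(theta)
        points.append((x, y, z))
    return points


def fit_line(zs, vs):
    """Ordinary least squares for v = intercept + slope * z."""
    n = len(zs)
    z_mean = sum(zs) / n
    v_mean = sum(vs) / n
    szz = sum((z - z_mean) ** 2 for z in zs)
    szv = sum((z - z_mean) * (v - v_mean) for z, v in zip(zs, vs))
    slope = szv / szz
    return v_mean - slope * z_mean, slope


def fit_axis(sample):
    """Fit a straight axis x(z), y(z) through a sample of surface points."""
    zs = [p[2] for p in sample]
    x_fit = fit_line(zs, [p[0] for p in sample])
    y_fit = fit_line(zs, [p[1] for p in sample])
    return x_fit, y_fit


rng = random.Random(2007)
scan = make_pot_points(40000, rng)

# Draw 10,000 points without replacement and split them into two
# non-overlapping samples of 5,000 each.
drawn = rng.sample(scan, 10000)
axis_a = fit_axis(drawn[:5000])
axis_b = fit_axis(drawn[5000:])

print("sample A axis (intercept, slope) for x and y:", axis_a)
print("sample B axis (intercept, slope) for x and y:", axis_b)
```

Both fits should land near the simulated axis, but not on the same numbers. The gap between them is a crude gauge of how much the axis estimate can be trusted.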

At that point, we have to make a decision. Do we try to straighten out the surface, or the axis? I’m betting we go for the surface first to remove major anomalies. NASA has some algorithms that I’m aware of (I’m talking 25 years ago) that are used for stretching maps and images to correct for the angle of viewing and atmospheric distortions. They do it by marking known positions and then spreading the distortions evenly in order to maintain neighborhood consistency. It’s as if they took a rubber sheet and pinned some of the points to an earthlike, rounded surface. The points in between would adjust themselves over the theoretical frame. In this case we would search for large explainable anomalies and try to remove them statistically. In the case of thumbjabs or other dents, we might look for and catalog local distortions that have an inverse curve and try to reverse the curve, preserving the local terrain, but not the original altitude.

In order to make this surface correction, we might have to develop a whole science of thumbjabs and the propagation of their impact on the remainder of the vessel. Learning how to correct these simple anomalies (akin to Oklahoma City in the previous post) would require us to experiment with real clay, pushing and prodding otherwise perfect pots to see what happens. Maybe we could exclude the anomalous data, but then we would lose the detail. Remember, we’re ultimately interested in identifying the sound vibrations. These are just steps we would need to trace before getting there.

Eventually, having removed as many dents and bulges from the virtual shape as seems appropriate, we will refit the axis to see if we can get more consistency. Finally we may decide on an axis that represents the statistical combination of numerous attempts. We may look at all sorts of possible axes including expanding helices. The variation among our calculations will give us clues and the closeness of the fit will be our interim measure of success.

We don’t really expect a close fit at this point, because we don’t really have a circular pot. Sore Tooth pulled the pot off the wedge a little too vigorously and apparently stored the greenware overnight, laying the pot on its side and thus warping the circular pattern that one would hope for. That pattern, described with mental reference to integral calculus, perhaps, as a stack of many stubby cylinders of varying diameter, is a mere dream, one of Plato’s Forms. In fact, the slight torque exerted in the process of throwing the pot would have produced elliptical shapes at best and probably something more complicated. All this could be corrected. It takes work, but it can be corrected. The helical axis could be unwound; the out-of-round could be identified, measured and removed. (These parameters might also help us discover that Sore Tooth was left-handed.)

The overall concept here is that we are extracting significant variables that help us generate an idealized formulation of the pot by means of statistical corrections and, at the same time, tell us stories about its making. The main measure of success will be the R-squared of the model. This is a statistical measure of the goodness-of-fit that is very widely used in scientific research. It is regarded as providing an intuitively helpful measure of the percentage of variance explained by the variables that have been incorporated in the model. The closer to 100 percent you can get, the better you understand the data. In the physical sciences, as opposed to the social sciences, we can sometimes get very high values for R-squared, and that is good. The whole purpose of this exercise is to find out what the surface should be, and subtract that value leaving us with the so-called residual, or remaining unexplained variation, where we hope to find the magic vibrations.
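To make the R-squared and residual idea concrete, here is a minimal Python sketch. The data are invented: a radius that grows linearly with height stands in for the "form", and a tiny ripple stands in for the hoped-for vibrations. The straight-line model and the helper names `fit_line` and `r_squared` are assumptions for illustration only; a real pot would need a far richer model.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def r_squared(ys, fitted):
    """Share of the variance in ys explained by the fitted values."""
    y_mean = sum(ys) / len(ys)
    ss_total = sum((y - y_mean) ** 2 for y in ys)
    ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return 1.0 - ss_residual / ss_total


# Invented measurements: radius = form (10 + 0.5 z) + a tiny ripple.
heights = [i * 0.1 for i in range(200)]
radii = [10.0 + 0.5 * z + 0.01 * math.sin(40.0 * z) for z in heights]

intercept, slope = fit_line(heights, radii)
fitted = [intercept + slope * z for z in heights]

# The residual is what is left after the form has been subtracted.
residuals = [r - f for r, f in zip(radii, fitted)]
score = r_squared(radii, fitted)

print("R-squared:", score)
print("largest residual:", max(abs(e) for e in residuals))
```

In this toy case the model explains nearly all of the variance, and the ripple survives in `residuals`, which is where the search for sound would begin.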

At this point, the form has effectively been removed and we are left with a giant tube-sock of data in the form of standard deviations from the mean. And what is the mean? The mean is the form, the surface in question, which is now zeroed out, and we are looking instead at small deviations away from that form. The purpose of removing the form is to allow us to find the sound track. If it were an LP disk, we would be looking at a single long track of numbers with a mean of zero, and we would be trying to avoid track repeats and track skipping. This case is a lot more complicated because the track might wander. Even if the pot were perfect, the presumed recording "head" would have produced a moving target.

Now that we have the relatively pure signal, we are going to mathematically scan across, trying to find a track, slightly up, slightly down, anything we can think of – looking for what? Sine waves! If our data density is good enough, we should be able to find horizontally connected amplitude signals that correspond to the classic curves. We don’t expect good sine waves, of course. We expect cluttered overlapping sine waves that we can break down with Fourier Analysis. We are looking for sound as opposed to noise. In order to find it, we have to make a guess at how fast each track was moving and maybe we have to know what kind of noises to expect in an ancient potter’s workshop. And then we have to get lucky.
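The Fourier step can be sketched in a few lines of Python. This assumes that a single track has already been extracted as an evenly spaced list of residual heights, which is the hard part and is not shown. The track here is simulated: two overlapping sine waves (5 and 12 cycles per revolution, both made-up values) buried in random noise. The function name `dft_magnitudes` is hypothetical, and a plain discrete Fourier transform is used for clarity rather than speed.

```python
import cmath
import math
import random


def dft_magnitudes(track):
    """Magnitude of each frequency bin (0 .. n/2 - 1) of a real-valued track."""
    n = len(track)
    magnitudes = []
    for k in range(n // 2):
        total = sum(
            value * cmath.exp(-2j * math.pi * k * i / n)
            for i, value in enumerate(track)
        )
        magnitudes.append(abs(total))
    return magnitudes


rng = random.Random(16)
n = 256

# Simulated residual track: two sine waves plus noise.
track = [
    math.sin(2.0 * math.pi * 5 * i / n)
    + 0.5 * math.sin(2.0 * math.pi * 12 * i / n)
    + rng.gauss(0.0, 0.3)
    for i in range(n)
]

magnitudes = dft_magnitudes(track)

# Rank the non-constant bins by strength and keep the two strongest.
strongest = sorted(range(1, len(magnitudes)),
                   key=lambda k: magnitudes[k], reverse=True)[:2]
print("strongest frequencies (cycles per revolution):", sorted(strongest))
```

With a signal this strong the two planted frequencies stand well above the noise. On a real pot the signal, if any, would be far weaker relative to the clutter.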

In all likelihood we will not be able to find anything. So then we give up, right? Not even close. First we look for combinations of adjacent tracks that might have more statistical power. Then we’re going to repeat the whole process looking at variations on the inner wall, or in the thickness of the walls, or linearity of the mineral crystals, or variable chemical states caused by changing pressure. Who knows what would actually work. And we could study the various interactions of all these measures until the Second Coming. In the end, lines of inquiry are only dropped because people run out of money or get tired of it all. If the research is important, people never give up. They just keep looking for new approaches.

The statistical mindset is just one of these approaches. The mindset of engineering would handle it like the Mythbusters team did. The first thing you try to do is record sound using the same medium under perfect conditions and see if you can learn from that. But I wouldn’t do it that way because it doesn’t suit my personality. For one thing, statistical analysis will find a lot of other things as well, which I like.


The question returns to just how important this research might be. Think of it! To recover dead languages, to literally hear the words of Jesus, of Pericles, of Caesar, the roar of the crowd in the Colosseum. More likely, I suppose it is, to hear the yelps of children chasing around the workshop or the cackle of hens. Maybe there are better ways to spend our time – providing electricity for third world countries or preventing malaria might seem to be better goals. Part of it always comes back to faith. Do you really believe that the research can pay off? And do you believe it’s worth the effort?

In the end, I have my doubts about the song in the pot. The real problem is that a medium can only hold just so many dimensions of information before the residue becomes random noise. There are just too many tiny factors affecting the outcome in small ways, and it doesn’t help that the signal is analog, as is everything else in the Universe. As a matter of fact, I think I’ve changed my mind. The Mythbusters approach may actually be better. We just have to build the perfect reading head for the given medium.

The many little things get washed out by a background of big factors. Theoretically, you could use the same approach to look at the stars in the daytime. Just subtract the statistical effects of the sun and sky. In practice, though, you just can’t know the background well enough to subtract it. And this is what you need to understand about the previous post concerning the impact of the death penalty on the homicide rate. There are some pretty strong factors that drive the likelihood of murder and the decision to murder. Although you know the death penalty has to have some sort of effect in some cases, it’s really hard to identify and measure that effect. And when combined with other factors, the result might not be what you expect. For instance, if you correct for poverty, then the impact of the death penalty might be to increase the likelihood of murder rather than decrease it. The reason for this is that the deterrence effect will have differing impacts on different groups, an effect which is correlated with the direct effect, but stronger. This is one of many problems involved with extracting second and third order factors by means of statistical correction.
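The stars-in-daytime problem comes down to simple arithmetic, sketched below in Python with invented numbers. A small signal of amplitude 1 rides on a background of amplitude 1000, and the background model is assumed to be off by just 1 percent. All of these values are assumptions chosen only to show the scale of the difficulty.

```python
import math


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


n = 1000

# Small signal (the stars) and big background (the sun and sky).
signal = [1.0 * math.sin(2.0 * math.pi * 20 * i / n) for i in range(n)]
background = [1000.0 * math.sin(2.0 * math.pi * 1 * i / n) for i in range(n)]
observed = [s + b for s, b in zip(signal, background)]

# Assumed model of the background: right shape, amplitude 1 percent too low.
model = [0.99 * b for b in background]

# What is left after subtracting the model from the observation.
residual = [o - m for o, m in zip(observed, model)]

# Fraction of the leftover variation that is actually signal.
signal_share = variance(signal) / variance(residual)
print("share of residual variance due to the signal:", signal_share)
```

Even with the background known to within 1 percent, roughly 99 percent of what remains after subtraction is leftover background, not signal.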

IMO, you can’t squeeze sound from stoneware, and tiny incentives have indiscernible effects.



At Saturday, July 28, 2007 12:29:00 PM, Blogger Steve said...

Very interesting JJ, as usual.

I suppose the odds of success with even an intact pot decrease with age. Slow chemical changes, radioactive decay, thermal cycling, viscous flow and such would tend to increasingly obscure whatever signal might have once been present.

Newer pot, better results. Yes?

For entirely different reasons, the same seems to me to be likely with the death penalty. Odds of deterrence seem inversely proportional to the age of the crime and the duration of the process.

Detecting a signal and effecting an outcome are two different things, of course, and I'm not claiming support for my attitude on the death penalty in any statistical or economic study. After all, any such study is picked apart by people smarter and more qualified than I am, and what's left to me is noise.

One thing that bothers me a bit about my position on the death penalty is that, if rapid execution is likely upon conviction of certain crimes, the incentive to leave no witnesses is increased. In that sense I agree that the death penalty could actually increase murders. It seems to me, though, that this would be a short-term consequence of the sort of reform and streamlining I'd favor. Extremely unfortunate if true, of course, but in the end it seems the greater good would best be served by a transition to reform and rapidity in the administration of capital punishment.

At Sunday, July 29, 2007 4:22:00 PM, Blogger jj mollo said...

One thing I'm saying, by analogy anyway, is that the deterrence authors are fooling themselves. Yes, we suspect that the death penalty has an effect, just as vibrations must have an effect on the pot. An argument can be made that the deterrence effect might be in the opposite direction from what they postulate, but I'm arguing mostly that there are so many substantive forces impacting the tendencies and decisions of the potential perpetrators that it will be close to impossible to measure such an effect. This implies further that the enhanced deterrent influence obtained by swift enforcement of the death penalty is insignificant.

I should have dropped the subject, though, because my main desire was to explore the analytic limitations of statistics. I have a lot of respect for what statistics can do, but having done some of it, I am aware that it is very hard to do correctly. It is also very easy to fool yourself by inadequate understanding of statistical implications and simple wishful thinking. Every scientist has pet theories that get promoted repeatedly. Statistical analysis, unfortunately, is the equivalent of heavy equipment for pushing these ideas around. Defensive responses crumple under the force of statistical authority. What people don't understand is how easily that authority can be commandeered, and how oblivious people are to their own psychological resistance to skeptical analysis.

At Sunday, July 29, 2007 10:35:00 PM, Blogger Steve said...

Oh, I didn't miss your point about analytic limitations of statistics. Not at all. I did understand it as a preamble to your point about the death penalty though.

I should drop the death penalty thing, too. In all likelihood there will be no reform until events force it. Still, it's an interesting topic at times.

One thing, though... I didn't quite follow you when you wrote that the near impossibility of measuring an enhanced deterrent effect implies that any such effect is insignificant. Why does difficulty of effect measurement necessarily imply insignificance of effect? Is it not a question of attribution rather than measurement? Presumably a change in murder trends would be apparent.

Cheers from Phoenix!

At Monday, July 30, 2007 12:00:00 AM, Blogger jj mollo said...

Well, if it can't be measured, then certainly the effect must be smaller than the effect of things that can be measured. If I cite a list of things which make a great difference and then say that the statistical impact of another thing was too small to measure, then you're going to wonder why I mentioned it.

Now it might be that the deterrence effect of the death penalty exists, but it's complicated. That would be an interaction effect. So, for instance, the death penalty might inhibit people who have impulse control problems but excite those who have narcissistic personalities. I'm not suggesting this as a fact, but I'm using it to represent a myriad of interaction possibilities. I believe that in sum, the authors were not able to show an effect, though they did incorrectly report one.

Certainly in particular cases the effect might have existed, but we don't know which direction. If there is no measurable trend, then the trend is probably smaller than other things which can be measured.

At Monday, July 30, 2007 12:11:00 AM, Blogger jj mollo said...

If, on the other hand, we can set up an experimental approach, where a random subset of states might proceed expeditiously with executions while the others defer any decisions for a while, then we would have more assurance due to experimental control. If I were the statistical emperor, I would repeat the experiment several times, reversing the groups one time and then with different subsets of states starting at different times of the year.

The result, however, would only answer the very narrow question of whether the death penalty has some overall deterrence effect. It would be much more interesting to figure out where and when it made a difference and what were the mechanics of that influence. Statistical analysis could help with that if you had the right data.

At Monday, July 30, 2007 12:14:00 AM, Blogger jj mollo said...

Then again, it probably still wouldn't give you much guidance on public policy -- even if you knew everything about how it worked.

