I agree with Daniel Hillis that Chris Anderson's point, although provocative and timely, is not exactly breakthrough news. Science has always taken advantage of correlations in order to gain predictive power. Social science does so more than other sciences: we have few robust causal mechanisms that explain why people behave in this or that way, or why wars break out, but we have a lot of robust correlations, for which we lack a rationale, that it is better to take into account if we want to gain insight into a phenomenon. If the increase in child mortality rates turns out to be correlated with the fall of the Soviet Empire (as has been shown), this is indeed relevant information, even if we lack a causal explanation for it. Then we look for a possible causal mechanism that sustains this correlation. Good social science finds causal mechanisms that are not completely ad hoc and that sustain generalizations to other cases. Bad social science sticks to interpretations that often just confirm the ideological biases of the scientist.
Science depicts, predicts and explains the world: correlations may help prediction, and they may also depict the world in a new way, as an entangled array of petabytes, but they do not explain anything if they aren't sustained by a causal mechanism. The explanatory function of science, that is, answering the "Why" questions, may be just a small ingredient of the whole enterprise; indeed, I totally agree with Anderson that the techniques and methods of data gathering may be completely transformed by the density of information available and by the existence of statistical algorithms that filter this information with tremendous computing capacity. So, no nostalgia for the good old methods if the new techniques of data gathering are more efficient at predicting events. And no nostalgia for the "bad" models if the new techniques are good enough to give us insight (take AI vs. search engines, for example). So, let's think of the Petabyte era as an era in which the "context of discovery", to use the old refrain of philosophy of science, is hugely mechanized by the algorithmic treatment of enormous amounts of data, whereas the "context of justification" still pertains to the human ambition of making sense of the world around us.
This leaves room for the "Why" questions, that is, why are some of the statistical correlations extracted by the algorithms so damn good? We know that they are good because we have the intuition that they work, that they give us the correct answer, but this "reflective equilibrium" between Google-ranked answers to our queries and our intuition that the ranking is satisfying is still in need of explanation. In the case of PageRank, it seems to me that the algorithm incorporates a model of the Web as a structured social network in which each link from one node to another is interpreted as a "vote" from that node for the other. This sounds to me like "theory": a method of extracting information that, even if it is realized by machines, is realized on the basis of a conceptualization of reality that aims at getting it right.
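To make the "vote" reading concrete, here is a minimal sketch of a PageRank-style computation in Python. It is not Google's implementation: the function name `pagerank`, the four-page toy graph, the damping value of 0.85 (a commonly quoted choice) and the fixed number of iterations are all assumptions made for illustration. The point is only that the ranking follows from an explicit model, in which each page splits its current weight among the pages it links to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by repeatedly passing 'votes' along links.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal weight

    for _ in range(iterations):
        # Every page keeps a small baseline weight regardless of votes.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped weight equally among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its weight evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy Web: A, B and D all link to C; C links back to A.
toy_web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, C collects the most votes and A inherits much of C's weight through C's single link, while B and D, which nobody links to, keep only the baseline. The ordering is a consequence of the model's assumptions about what a link means, which is exactly what makes it open to a "Why" question.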
A new science may emerge in the Petabyte era, that is, a science that tries to answer the question of how sound the processes of collective intelligence are, processes made possible by the enormous new amounts of data that powerful algorithms can easily combine. It may be a totally new, "softer" science, uninhibited at last by the burden of the rigor of "quantitative methods", which make scientific papers so boring to read; a science that leaves this burden to algorithms and leaves minds free to move around the data in the most creative way. Science may become a cheaper game in terms of the investment needed to discover new facts; but, as a philosopher, I do not think that cheap intellectual games are less challenging or less worth playing.