I found out about Elizabeth Pisani’s Prospect article through its republication in the Australian Financial Review on 14 January 2011 under the title Hypothetical Crisis; it was originally published in Prospect on 17 November 2010. It gave me some food for thought regarding the many areas where data is used to make decisions and build models.
The article’s starting premise is that the internet age has changed science by making its “… tidy world of hypothesis, experimenting and knowledge generation…” obsolete and “… about to end”.
This ending of the current paradigm is driven home to the author by “Generation Y geeks” telling her that it’s all about scraping, data mining, terabytes and even petabytes of data. The article is worth reading because it asks many probing questions of anyone who works in forecasting or model building.
Let me first be upfront about my bias, which is that theory (hypothesis) is as relevant as ever, by retelling a story that I heard when I was a physics undergrad. The story was told by one of my professors, an experimental physicist of rare breadth and depth who fully appreciated how the whole of physics (theory and experiment) progresses together. At any rate, the story went that … in the sixties Australia had one of the world’s most powerful telescope arrays, but as a scientific community it had not invested enough in theoretical physics, so in effect it did not know where to “point” it.
The moral of the story is that no matter how much data you can amass, it has to be the right data, and that both the gathering of data and its analysis are strictly theory-led.
This is not to say that the algorithmic champions don’t have a point – they do – but I believe that they only have a point in a limited sense: for example, when a taxonomy is required or when a model or hypothesis builder is in pre-discovery mode.
It is easy to imagine that algorithmic analysis and high CPU power can lead to fast categorisation of data, as in cluster analysis.
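To make the cluster analysis remark concrete, here is a minimal sketch of k-means, one common clustering method (the article does not name a specific algorithm, so this choice is mine). The function name `kmeans`, the sample `points` and the choice of `k = 2` are all assumptions made for illustration. Note that even here the analyst has to decide in advance how many groups to look for and what "distance" means, which is a small dose of theory.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters with plain k-means (Lloyd's algorithm)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random (hypothetical default).
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, clusters


# Made-up sample: two visually obvious groups of three points each.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (7.9, 8.1), (8.3, 7.7)]
centroids, clusters = kmeans(points, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

The algorithm sorts the points quickly, but it says nothing about why the two groups exist or whether two was the right number to ask for.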
The discovery mode is also important because model building is a trial-and-error business, and the only way to develop an intuition strong enough to see hidden relationships is by “getting one’s hands dirty”.
My professional interest is in financial modelling and the way that organisations build and maintain models. In this field the purely quantitative approach can lead to disaster because relationships often break down, and at other times they are merely circumstantial. Moreover, the communication aspects of a purely algorithmic approach can be tortured and can lead to devastating outcomes when users such as bankers and traders apply a model inappropriately. This is because it is by creating theoretical explanations that we can tell a story that people relate to.
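A small sketch of what a relationship "breaking down" can look like in numbers. The two series below are synthetic and invented purely for illustration (they are not market data), and the helper name `pearson` and the labels "calm" and "stress" are my own assumptions. The correlation measured in one period says nothing reliable about the next unless there is a theoretical reason to expect it to hold.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Synthetic series: y tracks x in the first period and moves against it
# in the second. The values are made up for this sketch.
x_calm = [1.0, 2.0, 3.0, 4.0, 5.0]
y_calm = [2.1, 3.9, 6.2, 8.0, 9.9]
x_stress = [6.0, 7.0, 8.0, 9.0, 10.0]
y_stress = [9.0, 7.2, 4.9, 3.1, 1.0]

corr_calm = pearson(x_calm, y_calm)
corr_stress = pearson(x_stress, y_stress)
print(f"calm period correlation:   {corr_calm:.2f}")
print(f"stress period correlation: {corr_stress:.2f}")
```

A model calibrated only on the first period would carry a strongly positive relationship into a period where the sign has flipped.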
However, in all fields that really seek to explain and understand the structure of the world, the algorithmic approach is a “first approximation”, a heuristic and a handmaiden to intuition. It is theory that deepens understanding, and the data can then be used to test that theory.
On a final note, and paradoxically, the algorithmic/data mining approach is almost unthinkable as a tool for discovering quantum mechanics, which is a necessary condition for the microchip and hence the computer.