Data Mining
- by Dennis Faas on 20090513 @ 02:11PM EST
Data mining is the process of extracting hidden patterns from data.
With the amount of data available online doubling every three years, data mining is becoming an increasingly important tool for transforming that data into information that can be used to predict trends. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.
Data Mining Results May Not Be Indicative
While data mining can be used to uncover patterns in data samples, it is important to be aware that the use of non-representative samples of data may produce results that are not indicative of the domain.
The discovery of a particular pattern in a particular set of data does not necessarily mean that pattern is representative of the whole population from which that data was drawn. Hence, an important part of the process is the verification and validation of patterns on other samples of data.
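As a rough illustration of validating a pattern on another sample, the Python sketch below splits a data set into a discovery part and a validation part, then compares how often a pattern holds in each. The function names (split_sample, support, validates) and the 0.1 tolerance are assumptions made for this example, not part of any standard data mining tool.

```python
import random


def split_sample(records, fraction=0.5, seed=0):
    """Shuffle a copy of the records and split it into two parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def support(records, pattern):
    """Fraction of records for which the pattern (a predicate) is true."""
    if not records:
        return 0.0
    return sum(1 for r in records if pattern(r)) / len(records)


def validates(pattern, discovery, validation, tolerance=0.1):
    """True if the pattern is about as frequent in both samples.

    The tolerance is an arbitrary illustrative threshold, not a
    statistical significance test.
    """
    gap = abs(support(discovery, pattern) - support(validation, pattern))
    return gap <= tolerance


if __name__ == "__main__":
    data = list(range(1000))
    discovery, validation = split_sample(data, fraction=0.5, seed=42)
    is_even = lambda x: x % 2 == 0
    print("support (discovery): ", support(discovery, is_even))
    print("support (validation):", support(validation, is_even))
    print("pattern validated:   ", validates(is_even, discovery, validation))
```

A pattern that shows up strongly in the discovery sample but not in the validation sample is a warning sign that it may be an accident of that particular data set.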
Data Dredging and Data Snooping
The term data mining has also been used in a related but negative sense, to mean the deliberate searching for apparent but not necessarily representative patterns in large amounts of data.
To avoid confusion with the other sense, the terms data dredging and data snooping are often used. Note, however, that dredging and snooping can be (and sometimes are) used as exploratory tools when developing and clarifying hypotheses.
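To see why dredging can turn up patterns that are apparent but not representative, the Python sketch below generates a random target and a number of random, unrelated features, then reports the feature that correlates most strongly with the target. Everything here is pure noise, so any correlation it finds is spurious. The function names (pearson, dredge) and the sizes used are assumptions chosen for this illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom else 0.0


def dredge(n_features, n_rows, seed=0):
    """Search random noise features for the one that best 'predicts' a random target.

    Returns (index of best feature, its absolute correlation with the target).
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best_index, best_corr = -1, 0.0
    for i in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        corr = abs(pearson(feature, target))
        if corr > best_corr:
            best_index, best_corr = i, corr
    return best_index, best_corr


if __name__ == "__main__":
    for n in (10, 100, 1000):
        index, corr = dredge(n_features=n, n_rows=20, seed=1)
        print(f"{n:5d} noise features: best |r| = {corr:.2f} (feature {index})")
```

Because each larger search includes the same candidates as the smaller one plus more, the best correlation found can only stay the same or grow as more features are tried, even though none of them has any real relationship to the target.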
Privacy Concerns and Ethics
Some people believe that data mining itself is ethically neutral. However, the way that data mining is used can raise questions regarding privacy, legality, and ethics. In particular, mining government or commercial data sets for national security or law enforcement purposes, such as in the Total Information Awareness Program or in ADVISE, has raised privacy concerns.
This document is licensed under the GNU Free Documentation License (GFDL), which means that you can copy and modify it as long as the entire work (including additions) remains under this license.

