Databases
Database management systems emerged in the 1960s to address the problem of storing and retrieving large amounts of data accurately and quickly. One of the earliest such systems was IBM's Information Management System (IMS),[20] which is still widely deployed more than 40 years later.[21] IMS stores data hierarchically,[20] but in the 1970s Ted Codd proposed an alternative relational storage model based on set theory and predicate logic and the familiar concepts of tables, rows and columns. The first commercially available relational database management system (RDBMS) was released by Oracle in 1980.[22]
All database management systems consist of a number of components that together allow the data they store to be accessed simultaneously by many users while maintaining its integrity. A characteristic of all databases is that the structure of the data they contain is defined and stored separately from the data itself, in a database schema.[20]
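The separation of schema and data can be illustrated with a minimal sketch using SQLite through Python's standard `sqlite3` module. The `employee` table and its columns are hypothetical, and SQLite is chosen only as a convenient example of a relational database; other systems expose their schema through different catalogue tables.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO employee (name) VALUES (?)", ("Ada",))

# The schema (structure) is held in SQLite's own catalogue table, sqlite_master ...
schema_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'employee'"
).fetchone()[0]

# ... while the data is read from the table itself.
rows = conn.execute("SELECT id, name FROM employee").fetchall()

print(schema_sql)
print(rows)
conn.close()
```

The first query returns the table definition and the second returns the stored rows, showing that the two are kept and retrieved separately.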
The Extensible Markup Language (XML) has become a popular format for data representation in recent years. Although XML data can be stored in normal file systems, it is commonly held in relational databases to take advantage of their "robust implementation verified by years of both theoretical and practical effort".[23] As an evolution of the Standard Generalized Markup Language (SGML), XML's text-based structure offers the advantage of being both machine- and human-readable.[24]
Data retrieval
The relational database model introduced the Structured Query Language (SQL), a programming-language-independent query language based on relational algebra.[22]
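As a rough illustration of how an SQL query relates to relational algebra, the sketch below runs one query against an in-memory SQLite database from Python. The `book` table and its contents are invented for the example. The `WHERE` clause loosely corresponds to the algebra's selection operation (choosing rows) and the column list to projection (choosing columns).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, used only for illustration.
conn.execute("CREATE TABLE book (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [
        ("Book A", "Author X", 1969),
        ("Book B", "Author Y", 1975),
        ("Book C", "Author X", 1982),
    ],
)

# Selection: keep only rows with year > 1970.
# Projection: keep only the title column.
query = "SELECT title FROM book WHERE year > ? ORDER BY year"
titles = [row[0] for row in conn.execute(query, (1970,))]

print(titles)
conn.close()
```

The query text itself does not depend on Python; the same statement could be issued from any language with a database driver, which is the sense in which SQL is independent of the host programming language.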
The terms "data" and "information" are not synonymous. Anything stored is data, but it only becomes information when it is organised and presented meaningfully.[25] Most of the world's digital data is unstructured, and stored in a variety of different physical formats[26][a] even within a single organisation. Data warehouses began to be developed in the 1980s to integrate these disparate stores. They typically contain data extracted from various sources, including external sources such as the Internet, organised in such a way as to facilitate decision support systems (DSS).[27]
Data transmission
Data transmission has three aspects: transmission, propagation, and reception.[28]
XML has been increasingly employed as a means of data interchange since the early 2000s,[29] particularly for machine-oriented interactions such as those in web-oriented protocols like SOAP,[24] where it describes "data-in-transit rather than ... data-at-rest".[29]
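A minimal sketch of XML used for interchange, written with Python's standard `xml.etree.ElementTree` module, might look as follows. The `order` message and its fields are hypothetical, and this is a plain XML document rather than a SOAP envelope; it only shows the round trip from structured data to text and back.

```python
import xml.etree.ElementTree as ET

# Sender: build a small message and serialise it to text for transmission.
order = ET.Element("order", attrib={"id": "42"})
ET.SubElement(order, "item").text = "widget"
ET.SubElement(order, "quantity").text = "3"
payload = ET.tostring(order, encoding="unicode")

# Receiver: parse the text back into a tree and extract the fields.
received = ET.fromstring(payload)
item = received.findtext("item")
quantity = int(received.findtext("quantity"))

print(payload)
print(received.get("id"), item, quantity)
```

Because the payload is ordinary text, it can be read by a person as well as parsed by a program, which reflects the machine- and human-readable property attributed to XML.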
Data manipulation
Hilbert and López[18] identify the exponential pace of technological change (a kind of Moore's law): machines' application-specific capacity to compute information per capita roughly doubled every 14 months between 1986 and 2007; the per capita capacity of the world's general-purpose computers doubled every 18 months during the same two decades; the global telecommunication capacity per capita doubled every 34 months; the world's storage capacity per capita required roughly 40 months to double (a little over 3 years); and per capita broadcast information has doubled every 12.3 years.[18]
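To make the doubling times above easier to compare, they can be converted into implied annual growth rates. The sketch below assumes steady exponential growth over the whole period, which is a simplification and not a claim made by the cited study; the function name and dictionary labels are invented for the example. For a doubling time of T months, the annual rate is 2^(12/T) - 1.

```python
def annual_growth_rate(doubling_months: float) -> float:
    """Annual growth rate implied by a doubling time in months,
    assuming steady exponential growth."""
    return 2 ** (12 / doubling_months) - 1


# Doubling times quoted in the text, expressed in months.
doubling_times_months = {
    "application-specific computing": 14,
    "general-purpose computing": 18,
    "telecommunication": 34,
    "storage": 40,
    "broadcast": 12.3 * 12,
}

for name, months in doubling_times_months.items():
    print(f"{name}: {annual_growth_rate(months):.1%} per year")
```

Under this assumption a doubling time of 18 months corresponds to growth of roughly 59% per year, while 40 months corresponds to roughly 23% per year.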
Massive amounts of data are stored worldwide every day, but unless it can be analysed and presented effectively it essentially resides in what have been called data tombs: "data archives that are seldom visited".[30] To address that issue, the field of data mining – which can be defined as "the process of discovering interesting patterns and knowledge from large amounts of data"[31] – emerged in the late 1980s.[32]