A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data used to support management decision making.
Data in the data warehouse is usually loaded and accessed in batch mode, but it is not updated in the data warehouse environment. Data is loaded into the warehouse as a static snapshot. When a subsequent change occurs, a new snapshot record is written to the data warehouse. As a result, the history of the data is preserved in the data warehouse.
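The snapshot behaviour described above can be sketched as an append-only store. This is only a minimal illustration: the names `Snapshot`, `SnapshotStore`, `load`, `history`, and `as_of` are hypothetical and chosen for this example, and an in-memory list stands in for warehouse storage.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One static snapshot of a customer record (hypothetical layout)."""
    customer_id: int
    address: str
    as_of: date  # moment the snapshot was taken


class SnapshotStore:
    """Append-only store: existing records are never updated in place."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot):
        # A change in the source produces a new row, not an update.
        self._rows.append(snapshot)

    def history(self, customer_id):
        # All snapshots for one customer, oldest first.
        rows = [r for r in self._rows if r.customer_id == customer_id]
        return sorted(rows, key=lambda r: r.as_of)

    def as_of(self, customer_id, when):
        # Latest snapshot taken on or before `when`, or None if there is none.
        earlier = [r for r in self.history(customer_id) if r.as_of <= when]
        return earlier[-1] if earlier else None


store = SnapshotStore()
store.load(Snapshot(1, "12 Oak St", date(2023, 1, 5)))
store.load(Snapshot(1, "9 Elm Ave", date(2024, 3, 1)))  # address changed
```

Because the first snapshot is kept when the second arrives, a query such as `store.as_of(1, date(2023, 6, 1))` still sees the earlier address.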
The structure of the data warehouse
The data warehouse environment holds data at different levels of detail.
- Older detail layer
- Current detail layer
- Lightly summarized data layer (data mart)
- Highly summarized data layer
Data flows into the data warehouse from the operational environment. A considerable amount of data transformation usually occurs as data moves from the operational level to the data warehouse level.
Subject oriented
The data warehouse is oriented to the major subject areas of the enterprise that have been defined in the high-level corporate data model, such as customers, products, transactions or activities, policies, claims, and accounts.
DASD: direct access storage device.
The tables that make up a subject are linked in a star schema fashion by a common key; the customer subject, for example, is linked by customer ID.
The day 1 to day n phenomenon
A data warehouse can only be designed and loaded step by step; that is, it is evolutionary rather than revolutionary.
Granularity
Granularity is the level of detail or summarization of the units of data in the data warehouse.
The more detail there is, the lower the level of granularity; the less detail there is, the higher the level of granularity.
In a data warehouse environment, granularity is the most important design issue because it profoundly affects both the volume of data stored in the data warehouse and the types of query that the data warehouse can answer. The lower the level of granularity, the wider the range of queries that can be answered; conversely, the higher the level of granularity, the narrower the range of queries that can be answered.
When an enterprise or organization holds a large volume of data in its data warehouse, it makes sense to adopt dual or multiple levels of granularity for the detailed data.
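The trade-off between granularity and query scope can be illustrated with a small sketch. The data and the function name `summarize_by_month` are made up for this example; the point is only that raising the granularity level shrinks the data but discards information.

```python
from collections import defaultdict

# Low granularity level: one row per transaction (customer, month, amount).
detail = [
    ("alice", "2024-01", 30.0),
    ("alice", "2024-01", 12.5),
    ("alice", "2024-02", 7.5),
    ("bob", "2024-01", 20.0),
]


def summarize_by_month(rows):
    """Raise the granularity level: one total per (customer, month)."""
    totals = defaultdict(float)
    for customer, month, amount in rows:
        totals[(customer, month)] += amount
    return dict(totals)


# Higher granularity level: fewer rows, less detail.
summary = summarize_by_month(detail)
```

The summary can still answer "how much did alice spend in 2024-01?", but it can no longer answer "did alice make a single purchase of 12.5?", which only the detail rows can. Keeping both levels side by side is one way to read the dual-granularity approach.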
Live sample database
It is a subset of either true archival data or lightly summarized data taken from the data warehouse. "Sample" means that it is a subset of a larger database; "live" means that the database needs to be refreshed periodically.
The live sample data is used for statistical analysis and for observing trends. When the data must be looked at as a whole, the live sample database can give very good results, but it is not suitable for working with individual data records.
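A minimal sketch of drawing such a sample follows. It assumes simple random sampling, which the text above does not prescribe; `draw_sample`, the 1% fraction, and the synthetic `warehouse` list are all illustrative choices.

```python
import random


def draw_sample(records, fraction, seed=None):
    """Return a random subset of `records` (sampling without replacement)."""
    rng = random.Random(seed)
    # At least one record, but never more than are available.
    k = min(len(records), max(1, round(len(records) * fraction)))
    return rng.sample(records, k)


# Synthetic stand-in for warehouse data.
warehouse = [{"id": i, "amount": float(i % 50)} for i in range(10_000)]

# The periodic refresh amounts to calling draw_sample again on current data.
sample = draw_sample(warehouse, fraction=0.01, seed=42)

# Aggregate questions can be estimated from the sample alone.
estimated_mean = sum(r["amount"] for r in sample) / len(sample)
```

The estimate is useful for trends across the whole data set; looking up one specific record in `sample` will usually fail, because most records are not in it.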
Partitioning as a design approach
Data partitioning is the breakup of data into separate physical units that can be handled independently.
In the data warehouse environment, the question is how to partition the current detail data.
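As an illustration, the sketch below partitions detail rows by year, so that each partition can be handled on its own. Partitioning by date is an assumption made for this example (other keys are possible), and `partition_by_year` and the row layout are hypothetical.

```python
from collections import defaultdict
from datetime import date


def partition_by_year(rows):
    """Split rows into independent units keyed by the year of activity."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["activity_date"].year].append(row)
    return dict(partitions)


rows = [
    {"account": "A1", "activity_date": date(2022, 11, 3), "amount": 40.0},
    {"account": "A2", "activity_date": date(2023, 2, 14), "amount": 15.0},
    {"account": "A1", "activity_date": date(2023, 7, 30), "amount": 22.0},
]

partitions = partition_by_year(rows)

# One partition can be archived or removed without touching the others.
archived = partitions.pop(2022)
```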
Data organization in the data warehouse
1. Simple cumulative data
2. Rolling summary data
3. Simple direct file
4. Continuous file
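To make one of these structures concrete, here is a rough sketch of a rolling summary: recent data is kept day by day, and once a full week has accumulated it is collapsed into a single weekly total, at which point the daily detail is lost. The class name `RollingSummary` and the seven-day slot size are assumptions for illustration only.

```python
class RollingSummary:
    """Keeps daily totals for the current week and one total per past week."""

    def __init__(self, days_per_week=7):
        self.days_per_week = days_per_week
        self.daily = []   # daily totals for the week in progress
        self.weekly = []  # one total per completed week; daily detail is gone

    def add_day(self, total):
        self.daily.append(total)
        if len(self.daily) == self.days_per_week:
            # Roll the full week up into one figure and free the daily slots.
            self.weekly.append(sum(self.daily))
            self.daily.clear()


rs = RollingSummary()
for day_total in range(1, 11):  # ten days with totals 1, 2, ..., 10
    rs.add_day(day_total)
```

After ten days the structure holds one weekly total for days 1 to 7 and three daily totals for days 8 to 10. A simple cumulative structure, by contrast, would keep all ten daily entries.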
The lifecycle of data in a data warehouse includes data purging.
Data is purged, or its detail is transformed, mainly in the following ways.
- Data is added to a rolling summary file, where its original detail is lost.
- Data is transferred from a high-performance medium such as DASD to a bulk storage medium.
- Data is actually removed from the system.
- Data is transferred from one level of the architecture to another.