6234 Course Notes
(Install SSAS: start the installation, then take a break.)

Companies want information. Data is easy to come by; information is not.

Companies want to:
Track key performance indicators (number of defects, sales, number of returning students)
Identify trends (the number of incidents reported is going up, sales are down)
Make predictions (men between 18 and 25 are more likely to be in a car accident)

Tools available
Relational Reporting: provides summarized data to users from OLTP databases. Easy to use, but some reports are slow to run (matrix reports, history reports).
OLAP: stores aggregates (totals) and is faster than relational reporting for large volumes of data, especially matrix-type reports.
Data Mining: searches OLTP and OLAP data for trends and patterns.
o Which is a bigger factor in who buys a bike: age or income?
o What combinations of courses are often taken by a student?

OLAP Concepts (distribute handout on OLTP vs OLAP)
Data Warehouse: contains large amounts of historical data, usually combined from multiple sources and denormalized in a snowflake or star schema.
Data Mart: a subset of a data warehouse on a particular subject.
Facts: numerical measurements (measures) that are summarized, e.g. $ sales, quantity sold.
Dimension: the different ways to categorize a fact, e.g. by product, by customer, by region.
Cubes: a multidimensional structure that stores summarized fact and dimension data. When users want to query a data warehouse, they use a cube.
Slicing and Dicing: isolating individual results in a cube, e.g. comparing each region's sales of bicycles (slicing), or comparing each region's sales of bicycles by month (dicing).
Pivot Tables: the user interface for browsing cubes, slicing and dicing.

SQL Server Analysis Services Features
OLAP: design, build, deploy and query cubes.
Data Mining: identify patterns and trends to try to make predictions. SSAS supports a number of data mining algorithms for data analysis.
Multiple Data Sources: Data Source Views allow you to access data from multiple data stores, both OLAP and OLTP.
KPIs: allow you to monitor a metric, or a combination of metrics based on a formula, that you have identified as a key performance indicator.
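The slicing and dicing idea from the OLAP concepts can be pictured with a few lines of Python. This is only a sketch of the concept, not how SSAS stores or queries a cube; the fact rows, the dimension names and the aggregate helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, month, sales amount).
FACTS = [
    ("East", "Bike", "Jan", 100.0),
    ("East", "Bike", "Feb", 150.0),
    ("East", "Helmet", "Jan", 20.0),
    ("West", "Bike", "Jan", 200.0),
    ("West", "Bike", "Feb", 50.0),
    ("West", "Helmet", "Feb", 30.0),
]
DIMENSIONS = ("region", "product", "month")


def aggregate(facts, by, where=None):
    """Total the sales measure by the dimensions listed in `by`.

    `where` optionally pins some dimensions to a single member.
    """
    where = where or {}
    totals = defaultdict(float)
    for row in facts:
        record = dict(zip(DIMENSIONS, row[:3]))
        # Skip rows that do not match the pinned members.
        if any(record[dim] != member for dim, member in where.items()):
            continue
        totals[tuple(record[dim] for dim in by)] += row[3]
    return dict(totals)


# Slicing: compare each region's sales of bicycles.
bike_by_region = aggregate(FACTS, by=("region",), where={"product": "Bike"})

# Dicing: compare each region's sales of bicycles by month.
bike_by_region_month = aggregate(
    FACTS, by=("region", "month"), where={"product": "Bike"}
)

print(bike_by_region)
print(bike_by_region_month)
```

A pivot table is essentially a user interface over calls like these: the fields dropped on rows and columns play the role of `by`, and the filters play the role of `where`.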
6234 SQL Server Analysis Services

Server
Stores the Analysis Services database.
The Analysis Services service runs on the server and accesses the Analysis Services database.
Analysis Services handles aggregations, transactions, calculations, metadata management, security, and XML for Analysis.

Client
Clients connect to Analysis Services with ADO MD, ADOMD.NET, XML/A, or OLE DB for OLAP.
Users access cubes through tools like Microsoft Excel or PerformancePoint.

Tools
Business Intelligence Development Studio: Visual Studio for developing BI applications (SSAS, Reporting Services and SQL Server Integration Services).
SQL Server Management Studio: for manipulating databases and managing deployed Analysis Services solutions.
SQL Server Configuration Manager: to manage SQL Server client-server configurations (network protocols supported, service settings for SSAS).

Analysis Services Objects
Data Sources: connection information for your data source.
Data Source Views: a view that determines which objects in your data sources are available to a cube (could come from multiple data sources).
Measures: numeric facts that users analyze, e.g. sales, units sold.
Measure Groups: logical groupings of measures, e.g. sales is the measure; Internet Sales and Retail Sales are measure groups (often one measure group = one fact table).
Dimensions: represent what you are aggregating the data by. Dimensions have attributes and hierarchies, e.g. the Time dimension has year/quarter/month, Location has country/province/city, and the Product dimension has name, price, size, category.
Cubes: create sub-totals (aggregations) of measures by different combinations of dimensions, producing a multidimensional structure that users can query quickly and easily.

Installing SQL Server Analysis Services
Resources: the more dimensions, the bigger your cubes will be; the more cubes, the more memory and processing power you need.
Instances: you can have one or multiple instances. Each instance has its own security and service packs, and listens on a different TCP port number.
Client connectivity: use a pre-defined port number or the default, 2383.
Availability: clusters improve availability.
Installation: use the SQL Server 2008 setup program.
Upgrading SQL Server Analysis Services from SSAS 2000
Side-by-side: can run on the same machine as SQL Server 2000 Analysis Services. SQL Server 2000 Analysis Services must be the default instance, since it does not support multiple instances. Use this if you do not want to lose functionality from 2000.
Upgrade: use the wizard to upgrade from 2000 to 2005. Good if you want to move completely to 2005, but if the upgrade goes wrong it takes a long time to fix.
Migrate: set up 2005 and migrate data from 2000 to 2005. Good if you have multiple 2000 databases and want to upgrade some but not all of them. Requires more hardware.
Lab Notes (detailed instructions are in the Lab Answer Key on the CD; install takes 30-40 minutes)
Go to E:\Labfiles\Evaluation and run Setup.exe to start the SQL Server Installation Wizard.
You will be prompted to install a Windows Hotfix and the .NET Framework as well. Just choose Yes; it will install the components, and then you have to restart the computer and re-execute Setup.exe.
Choose Installation > New SQL Server Stand-alone installation
Setup Support Rules: OK
Product Key: Specify Free Edition
License Terms: Accept
Setup Support Files: Install
Setup Support Rules: Next (warnings are okay)
Feature Selection: Database Engine Services, Analysis Services, Business Intelligence Development Studio, Management Tools Complete
Instance Configuration: Default Instance
Disk Space Requirements: Next
Server Configuration: Use the same account for all services, NY-SQL-01\sqlserver, Pa$$w0rd
Database Engine Configuration: Specify SQL Server Administrators, Add NY-SQL02\Administrator
Analysis Services Configuration: Specify which users have administrator privileges for Analysis Services, NY-SQL-02/Administrator
Next, Next until you reach Ready to Install, then choose Install

Exercise 2 Verify Installation
View the log file by following the link or opening it at C:\program files\microsoft\sql server\100\setup bootstrap\log\20081014_142821\SummaryNYSQL02.txt
Start > Microsoft SQL Server > SQL Server Management Studio
Connect: Server Type = Analysis Services, Server Name = NY-SQL-02
The Databases folder is empty for now.
Module 2 Creating Multidimensional Analysis Solutions
Lecture: 90 minutes
Lab: 45 mins (p. 2-16 Ex 1 Create Data Source, Ex 2 Create Data Source View, Ex 3 Create, Deploy & Process Cube)

Online Mode: once your cubes have been deployed to your database, you can connect directly to the SQL Server Analysis Services database and make changes. The only thing in the project file is the database and server name. You cannot do version control with this: there are no files to save!
Project Mode: to initially create cubes, you create an SSAS project and deploy it to the Analysis Services database. Changes are developed in the project, saved in a file, then deployed to a database. The project stores the cube definitions, not the data; you cannot browse a cube without deploying it.
Reverse-engineering a project: if you want to use Project Mode but do not have an existing SSAS project for your cubes, you can reverse-engineer one using File > New > Project > Import Analysis Services 9.0 Database.
BI Studio has Solution Explorer, Designers, Wizards, and built-in help.
Source Control: using a tool like SourceSafe helps prevent overwriting each other's work, by using Check In and Check Out when in Project Mode.

Demo Business Intelligence Studio
1. Open Project E:\mod07\labfiles\solution\AdventureWorks OLAP\AdventureWorks OLAP.sln
2. D:\LabFiles\Solution\AdventureWorksOLAP
3. Double-click Data Source to show the properties of data sources
4. Double-click Data Source View to show the Data Source View Designer
5. Double-click a dimension to open the Dimension Designer pane
6. Double-click Cube to open the Cube Designer
7. (This cube has errors and cannot be deployed)
8. Go to Tools > Options to show where you can customize BI Studio settings
9. Show Tools > Options > Source Control > Plug-in Selection, which you can set to integrate BIDS with a source control tool
Data Sources
Contain connection strings to the underlying databases that contain the fact and dimension tables.
Impersonation options: when you are in BIDS, your current user's credentials are used to connect to the database to retrieve data; after deployment, the impersonation credentials are used.
Specific Windows user name and password: use when the service account does not have permission to access the database.
Service account: usually selected; requires the service account to have access to the database.
Credentials of current user: used for data mining (mining models and DMX OPENQUERY statements).

Demo Creating a Data Source
1. Create a SQL Server Analysis Services project
2. In Solution Explorer, right-click Data Sources > New Data Source
3. Create the data source based on a new or existing connection
4. New connection: select server NY-SQL-01, Windows Authentication, database name AdventureWorksDW2008
5. Test Connection and click OK
6. Click Next. Impersonation specifies how SSAS connects to the data source when processing the cube or executing data mining queries. Usually choose the default (uses the service account for the SSAS service)
7. Give the data source a meaningful name and click Finish
8. Double-click the data source in Solution Explorer to show the Data Source Designer
9. Point out the Query Timeout setting and Maximum number of connections
10. The "Maintain a reference to another object in the solution" option allows you to specify the connection string by reading it from another data source

Data Source Views
Creating a Data Source View
1. In Solution Explorer, right-click Data Source Views > New Data Source View
2. Select the data source
3. Select table dbo.FactInternetSales
4. Click Add Related Tables and save

Creating a Data Source View based on multiple Data Sources
1. Create a second data source: Server = NY-SQL-01, Database = AdventureWorks2008, Use Service Account (you cannot add multiple data sources in the wizard; you add them afterwards in the designer)
2. Go to the Data Source View designer, right-click in the table diagram, choose Add/Remove Tables, and choose a different data source. Point out that the first one is listed as the primary data source. Add the Production.ProductReview table from AdventureWorks2008
Browsing Data
1. Right-click the FactInternetSales table and choose Explore Data to show the data in the table
2. Click Pivot Table, then click the Fields button in the toolbar to display the list of fields
3. Drag the PromotionKey field to rows
4. Drag the SalesTerritoryKey field to columns
5. Drag SalesAmount to totals
6. Show the pivot table created
7. Click on the Chart tab, show the chart created
8. Click on Pivot Chart, drag SalesTerritoryKey to Series
9. Drag PromotionKey to the bottom of the chart
10. Drag SalesAmount to the chart
11. Click on PromotionKey, select promotions 1, 2
12. Click on SalesTerritoryKey, select territories 1, 2
13. Show the pivot chart created

Data Source View Table and Column properties
1. Double-click the data source view in Solution Explorer to bring up the Data Source View designer
2. Right-click a table or column and choose Properties. Set the FriendlyName property of a table or column to make names more user-friendly

Add a Named Query
From the toolbar in the Data Source View designer, choose New Named Query (if you do not have permission to create a view in the database, you can create a named query in your data source view)
1. Name: LargeSales
2. SELECT * FROM FactInternetSales WHERE SalesAmount > 1000
3. Click OK/Finish, show it in the data source view, right-click > Explore Data

Add a Named Calculation
1. Select the DimCustomer table in the Data Source View designer
2. Choose New Named Calculation from the toolbar
3. Name: Full Name
4. Concatenate the name columns: FirstName + ' ' + LastName
5. Save the changes, show the Full Name column, right-click DimCustomer > Explore Data and show the concatenated column in the data

Add a logical Primary Key
(If an underlying table, view or named query has no primary key, you can add a logical primary key. You cannot add a logical key if a primary key already exists.)
1. Go to the named query LargeSales, select SalesOrderNumber and SalesOrderLineNumber, right-click > Set Logical Primary Key
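The named query, named calculation and logical primary key from the steps above can be tried outside BIDS. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the sample rows are invented and the concatenation operator is SQLite's || rather than the T-SQL + used in the demo. It is meant only to show what each object evaluates to.

```python
import sqlite3

# In-memory SQLite database standing in for AdventureWorksDW2008.
# All rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE FactInternetSales (
        SalesOrderNumber TEXT,
        SalesOrderLineNumber INTEGER,
        ProductKey INTEGER,
        SalesAmount REAL
    );
    INSERT INTO FactInternetSales VALUES
        ('SO1', 1, 310, 3578.27),
        ('SO1', 2, 477, 4.99),
        ('SO2', 1, 346, 3399.99);
    CREATE TABLE DimCustomer (
        CustomerKey INTEGER, FirstName TEXT, LastName TEXT
    );
    INSERT INTO DimCustomer VALUES (1, 'Jon', 'Yang'), (2, 'Eugene', 'Huang');
    """
)

# Named query LargeSales: the same SELECT a view would contain.
large_sales = conn.execute(
    "SELECT SalesOrderNumber, SalesOrderLineNumber, SalesAmount "
    "FROM FactInternetSales WHERE SalesAmount > 1000 "
    "ORDER BY SalesOrderNumber, SalesOrderLineNumber"
).fetchall()

# Named calculation Full Name: an expression added as an extra column.
full_names = [
    row[0]
    for row in conn.execute(
        "SELECT FirstName || ' ' || LastName FROM DimCustomer "
        "ORDER BY CustomerKey"
    )
]

# Logical primary key: the chosen columns must identify each row uniquely.
keys = [(order, line) for order, line, _ in large_sales]
key_is_unique = len(keys) == len(set(keys))

print(large_sales, full_names, key_is_unique)
```

The uniqueness check at the end is the reason the demo picks both SalesOrderNumber and SalesOrderLineNumber: an order number alone can repeat across order lines.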
Creating Relationships
Foreign keys in the underlying database appear as relationships in the data source view. You can add or delete relationships in the data source view; this is especially useful for named queries and tables from multiple data sources.
1. Go to the named query LargeSales, select the ProductKey column, right-click > Add Relationship, and make a relationship to DimProduct.ProductKey

Creating Diagrams
When you have a lot of tables, a diagram allows you to focus on a particular subset of the data source view.
1. Go to the Diagram Organizer pane, right-click > Add New Diagram
2. Drag the FactInternetSales and DimTime tables to the diagram
Cubes
We now have our data source view, which contains facts and dimensions. Now we want to pre-calculate aggregates/subtotals in a cube, to speed up reporting by different dimensions. A cube is made up of one or more measures from a fact table (sales, quantity), which will be aggregated by one or more dimensions (product, time) from a dimension table.
Remember: measure and dimension attribute names here will be viewed by the users, so try to give them meaningful names.
Dimensions: you can have the Cube Wizard create your dimensions, or design the dimensions separately and re-use them in different cubes.
Attributes & Hierarchies: can be created through the Cube Wizard or added later in the Cube Designer.

Date & Time Dimensions
One of the most common dimensions users want is time. They want to see data by month, calendar quarter, fiscal quarter, year, day of week, hour, etc. SSAS has special support for this dimension.
You probably want to create your own DimTime table for a Time dimension, so you can include the attributes of interest to you and extra attributes like stat holiday or manufacturing shift.
The Dimension Wizard can be used to create a Time dimension table for you.
If you just want standard attributes for the time dimension, you can choose a Server Time dimension, which contains standard hierarchies and attributes that are stored on the server instead of within a dimension table. You specify the range of dates for the server time dimension.
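To see why pre-calculated aggregates speed up reporting, it can help to picture a cube as a lookup table of subtotals for every combination of dimensions. The following sketch assumes a tiny in-memory fact table with made-up values; build_cube is a hypothetical helper, and real SSAS storage and aggregation design are far more selective than this brute-force version.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows with two dimensions and one measure.
FACTS = [
    {"product": "Bike", "year": 2007, "sales": 100.0},
    {"product": "Bike", "year": 2008, "sales": 150.0},
    {"product": "Helmet", "year": 2008, "sales": 20.0},
]
DIMENSIONS = ("product", "year")


def build_cube(facts, dimensions, measure):
    """Pre-calculate totals of `measure` for every subset of dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in facts:
                totals[tuple(row[d] for d in dims)] += row[measure]
            cube[dims] = dict(totals)
    return cube


cube = build_cube(FACTS, DIMENSIONS, "sales")

# After the one-time build, each report is a dictionary lookup.
grand_total = cube[()][()]
by_product = cube[("product",)]
by_year = cube[("year",)]
by_product_and_year = cube[("product", "year")]

print(grand_total, by_product, by_year)
```

The number of dimension subsets doubles with every added dimension, which is the trade-off behind the earlier remark that more dimensions mean bigger cubes.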
Creating a Cube with the Cube Wizard
1. Right-click Cubes in Solution Explorer > New Cube
2. In the Cube Wizard, choose Build the cube using a data source (if you choose auto build, the wizard will suggest dimensions, measures, attributes and hierarchies; for more control, deselect this checkbox)
3. Select the AdventureWorks DW 2008 data source view to use for the cube
4. Specify FactInternetSales as your measure group table
5. Select the Order Quantity and Sales Amount measures (they can be renamed here by clicking in their column name, or later in the Cube Designer)
6. Select DimProduct, DimDate, DimPromotion as dimensions
7. Name the cube InternetSales
8. Right-click the cube in Solution Explorer and choose Process to build and deploy the cube
9. Go to the Dimension Designer, drag Color to the Product dimension
10. Drag CalendarYear to DimDate
11. You will get an error. Remember how the data source had two places to specify login information? BIDS can log in, but the service account we selected for impersonation cannot! So we need to set up a database user for the service account in SQL Server:
a. Go to SSMS, connect to AdventureWorksDW2008
b. Security > Users > New User
c. User name: sqlserver, Login name: NY-SQL-01\sqlserver, Default schema: dbo
d. Schemas owned by this user: db_owner, db_securityadmin
e. Database role membership: db_owner, db_securityadmin
12. Now process the cube again
13. Go to the Browser, drag the SalesAmount measure, Product Color and Order Date.Calendar Year to the cube
14. Go to the Cube Designer pane, right-click Sales Amount, show how you can change the name of the measure and set FormatString = Currency
15. Right-click anywhere in the Measures pane, show how you can switch between the Show Measures in Grid and Show Measures in Hierarchy views
16. Re-process and reconnect to the cube to show the currency formatting
17. Walk through the different panes in the Cube Designer
18. Open Microsoft Excel, Data tab > From Other Sources > Analysis Services > NY-SQL-01. Add the data source and cube to a pivot table, then select a measure group and add a dimension to the rows and columns of the pivot table

Cube Designer Tabs (each is covered later in the course)
Cube Structure Tab: to add or modify properties of measures, dimensions, attributes and hierarchies.
Dimension Usage Tab: to show which facts/measures are related to which dimensions.
Calculations Tab: to create calculated fields.
KPIs Tab: KPIs measure the progress a business is making toward meeting its goals. You create a KPI, then define the metric value to examine and a goal to achieve. Then you define MDX expressions to calculate the current status of the KPI and the trend. Then you choose a visual indicator to show the trend.
Actions Tab: actions are initiated by users, for example a URL action to navigate to a website, a reporting action to link a report in Reporting Services to a cube, or a drillthrough action to provide access to detailed data.
Partitions Tab: you can partition the data to make searching more efficient.
Aggregations Tab: for pre-calculating aggregates (slows processing, speeds queries).
Perspectives Tab: just as you can create views of relational data, you can create perspectives of cubes to simplify or focus on a particular part of a cube for users.
Translations Tab: to add captions in multiple languages.
Browser Tab: to browse cube data from SSAS.
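The four parts of a KPI named under the KPIs tab (value, goal, status, trend) can be illustrated in Python. In SSAS these are MDX expressions; the functions below are hypothetical stand-ins, and the 90% warning threshold and the -1/0/1 encoding are assumptions chosen for the example, not settings taken from the course.

```python
def kpi_status(value, goal, warn_ratio=0.9):
    """Return 1 (goal met), 0 (within the warning band) or -1 (off target).

    warn_ratio is an assumed threshold: values at or above
    goal * warn_ratio but below the goal count as a warning.
    """
    if value >= goal:
        return 1
    if value >= goal * warn_ratio:
        return 0
    return -1


def kpi_trend(current, previous):
    """Return 1 if the value is rising, -1 if falling, 0 if flat."""
    if current > previous:
        return 1
    if current < previous:
        return -1
    return 0


# Hypothetical quarterly sales against a goal of 100.
this_quarter, last_quarter, goal = 95, 80, 100
print(kpi_status(this_quarter, goal), kpi_trend(this_quarter, last_quarter))
```

The visual indicator chosen in the designer (traffic light, gauge, arrow) is then just a rendering of these two numbers.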
Creating a Cube without a Data Source (optional demo)
Use this when you are trying to figure out what fact and dimension tables you will need based on the cube you are designing, rather than designing a cube based on an existing set of fact and dimension tables.
1. Right-click Cubes in Solution Explorer > New Cube
2. In the Cube Wizard, choose Build without a data source and no template
3. Add a measure: Sales, Sales Group, Single, Sum
4. Add new dimensions: Time, Course Category, Course (SCD is slowly changing dimension; it requires extra attributes to handle)
5. Select time periods for the time dimension
6. Give the cube a name and click Finish. You can also generate the schema for the cube at the same time (required if you want the demo to work); this launches the Generate Schema wizard to create the database tables in the specified data source
7. Go to the Cube Designer Browser. You can drag columns and rows, but there is no data because you have not created any yet
8. Go to SQL Server Management Studio, show the tables created in the Adventure Works DW database: Course, Course Category, etc.

Lab Notes
The lab says to use AdventureWorksDW; it should be AdventureWorks DW 2008.
Module 3 Working with Dimensions
Lecture: 70 minutes
Lab: p. 3-20, 45 mins (Ex 1 Configuring Dimensions, Ex 2 Hierarchies and Relationships, Ex 3 Sorting and Grouping)

Dimensions give us a way of aggregating/totaling our fact data, e.g. by product, by time, by color, by size.
Dimensions are made up of one or more attributes from a dimension table.
Each column from the dimension table can be an attribute of a dimension (e.g. ProductId, Name, Color, Size, Category).
Each dimension needs a key attribute to link it to the fact table; this is usually a PK-FK relationship in the underlying database.
Dimension attributes often form hierarchies, e.g. day, week, month, year or subcategory, category.
After you define a dimension with the necessary attributes and hierarchies, you can re-use it in as many cubes as you like (the date dimension is frequently re-used).
Dimension Designer
Used to edit dimensions: specify which attributes to include from the dimension table, hierarchies, and translations for attribute headings.
1. Create a cube showing FactInternetSales, Customer, Time & Product
2. Deploy and process the cube
3. In Solution Explorer, double-click the Product dimension to go to the Dimension Designer
4. Show each of the Dimension Designer tabs:
Dimension Structure Tab: edit attributes, hierarchies, hierarchy levels
Attribute Relationships Tab: create, modify, delete attribute relationships
Translations Tab: to enter multilingual translations for the dimension
Browser Tab: to browse members of the hierarchy (after deploying)

Dimension Storage
MOLAP: Multidimensional OLAP (the default). Dimension data is stored in the cube, which is faster when you execute a query, but if dimension records are added or modified, the changes are not picked up until you process the dimension.
ROLAP: Relational OLAP. Leaves dimension data stored in the relational source database. Queries are slower, but the data is real-time, since dimension data is read straight from the database table and not from the cube.
Set in the dimension's properties: StorageMode.

Editing Attributes & display folders
1. Right-click an attribute, show the delete and rename options. Deleting only removes the attribute from the dimension; it does not affect the data source view
2. Delete French Description
3. Then drag French Description back from the data source view to add it back
4. Go to the attribute properties, point out the Name property
5. Set AttributeHierarchyDisplayFolder to Description for Arabic Description, Chinese Description and the other description attributes
6. Process & deploy the cube
7. Go to the Cube Browser, reconnect, and show how those attributes are now contained in a Descriptions folder

Attribute Column Bindings
KeyColumn: each dimension has a KeyColumn, usually the primary key of the dimension table; this is how the dimension attributes are linked to the fact table. If the attribute is not tied to the logical primary key of the table, change the KeyColumn value. This happens because data warehouses may not be normalized (e.g. StateName is tied to StateCode, not to GeographyKey).
NameColumn: the value displayed to the user for this attribute. E.g. Product is the key attribute but displays id numbers; you might choose to display ProductName to users when they request Product. If not specified, KeyColumn is displayed.
ValueColumn: the value to be used in MDX calculations. E.g. you might want to display a formatted date but use an actual date column for calculations. If not specified, KeyColumn is used.
1. Right-click the EnglishDescription attribute, show the KeyColumn, NameColumn and ValueColumn properties
2. Change NameColumn to FrenchDescription
3. Process the dimension
4. Go to the dimension browser, display English Description, point out how the French descriptions appear
5. Change the property back to its original NameColumn (you need to reconnect in the browser to see changes)

Attribute Hierarchies
All attributes are part of at least one hierarchy: All/One - primary key - attribute.
You can define additional hierarchies to allow users to drill down and drill up. Just drilling from the lowest level to All is not always useful (e.g. sales of the 2791 course to sales of all courses); you probably want more levels (e.g. sales of SQL Server, sales of Microsoft, sales of all courses).

Hierarchy types
SSAS creates an All hierarchy for each attribute.
Natural Hierarchies: based on 1-many relationships in the database tables (Category - SubCategory, Year - Month). Aggregations are pre-calculated for natural hierarchies.
Non-Natural Hierarchies: created in the Dimension Designer for reporting purposes, e.g. Size - Color, Gender - City (many-many).
Unbalanced Hierarchy: different number of levels under different parents (e.g. manager - staff, where the number of reporting levels from the CEO to the lowest level varies).
Parent-Child Hierarchies: defined by self-referencing relationships in the dimension table.
Ragged Hierarchy: the number of levels is different because sometimes levels are skipped (e.g. Country - State - City is sometimes Country - City). In the data this is represented with NULL values, or sometimes we store the parent name in the missing level.
Displaying Ragged Hierarchies
(You can set this property when you create a hierarchy: go to the attribute in the hierarchy and open its properties.)

HideMemberIf = Never for a regular hierarchy (there are no gaps)

HideMemberIf = NoName for a ragged hierarchy, so that the NULL values are not displayed in the hierarchy

Country   State     City
Canada    Ontario   Ottawa
Italy     NULL      Rome

Displayed as: Canada-Ontario-Ottawa, Italy-Rome

HideMemberIf = ParentName for a ragged hierarchy when storing the parent name twice instead of NULL values

Country   State     City
Canada    Ontario   Ottawa
Italy     Italy     Rome

Displayed as: Canada-Ontario-Ottawa, Italy-Rome

SkippedLevels column: store a column in your dimension that specifies how many levels are skipped for each member

EmpId   Title        Name   MgrID   Skip
14      Supervisor   Mike   12      0
13      Pion         Joe    12      1
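The effect of the HideMemberIf settings on the Country/State/City examples can be mimicked with a small function. This is only a sketch of the display rule for the three settings named in these notes (Never, NoName, ParentName); visible_path is a hypothetical helper, not part of any SSAS API.

```python
def visible_path(levels, hide_member_if="Never"):
    """Return the displayed path for one branch of a hierarchy.

    levels: member names from top level to bottom level.
    hide_member_if: "Never", "NoName" or "ParentName".
    """
    path = []
    parent = None  # nearest visible ancestor so far
    for name in levels:
        hidden = False
        if hide_member_if == "NoName":
            # Skip levels whose member is NULL or empty.
            hidden = name is None or name == ""
        elif hide_member_if == "ParentName":
            # Skip levels that just repeat the parent's name.
            hidden = parent is not None and name == parent
        if not hidden:
            path.append(name)
            parent = name
    return "-".join(str(name) for name in path)


print(visible_path(["Canada", "Ontario", "Ottawa"], "Never"))
print(visible_path(["Italy", None, "Rome"], "NoName"))
print(visible_path(["Italy", "Italy", "Rome"], "ParentName"))
```

Either convention gives the user the same Italy-Rome path; the choice depends on whether the dimension table stores NULL or repeats the parent name for the missing level.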
Dimension Attribute Properties
IsAggregatable: determines if there is an All level (you might set this to False on time, because it would be very slow to show values for all dates).
AttributeHierarchyOrdered: determines if the hierarchy is ordered; the attribute used for ordering is specified in the OrderBy property. Setting AttributeHierarchyOrdered = False can speed up processing, because the hierarchy does not need to be sorted unless queried.
AttributeHierarchyOptimizedState: if set to NotOptimized, no indexes are created for the hierarchy, which speeds up processing and slows querying. The default is FullyOptimized. If an attribute is only occasionally used in queries, you could set it to NotOptimized.
AttributeHierarchyEnabled: allows you to use this attribute in a cube and aggregate by it. Setting it to False is good for attributes we want to display but never plan to pivot by (e.g. Product Description/Photo).
AttributeHierarchyVisible: if set to False, you can only access this attribute through a hierarchy.

Demo Hierarchy Attributes
Disable All & grand totals with IsAggregatable
1. Go to the Dimension Designer for Dim Product, go to the properties of the Model Name attribute, set IsAggregatable = False
2. Deploy & browse the dimension, show how All is no longer displayed for Model
3. Browse the cube; when you drag Model you do not see a Grand Total because it is not aggregatable
Make an attribute available only within a hierarchy
4. Go to Dimension Designer > Product > Size, set AttributeHierarchyVisible = False
5. Deploy, browse & reconnect the dimension. Show how you cannot select the Size attribute anymore (it can only be viewed through a hierarchy now)
Create a user hierarchy
6. Go to the Dimension Designer
7. Drag Color to the Hierarchy tab, drag Size under Color, rename the hierarchy Color Size
8. Deploy, browse and reconnect the dimension, show the hierarchy created
9. Deploy and browse the cube, show how you can drag the hierarchy to the query and expand levels

Demo Parent-Child Hierarchy (Manager - Employee)
1. Create a new data source view, add DimEmployee and related tables
2. Create a cube using FactResellerSales and DimEmployee, DimSalesTerritory; select measures SalesAmount and Quantity
3. Go to DimEmployee in the Dimension Designer, process the dimension
4. Browse DimEmployee, show the ParentEmployeeKey hierarchy
5. Select the Parent Employee Key attribute, go to properties > NamingTemplate, specify CEO; VP; Director (do not expand; type straight into the property)
6. Deploy and browse
7. Select a node on level 2 or 3, point out that the title bar displays values from the naming template
8. Point out the MembersWithData property on the Employees hierarchy attribute. This controls, when the CEO has data associated with him/her, whether we show his/her own data or only the data of his/her reports; e.g. when showing a manager, do we show the manager's sales plus the staff's sales, or just the staff's sales

Demo Calendar Hierarchy
1. Go to the Date dimension
2. Right-click the Date dimension in Solution Explorer > Add Business Intelligence
3. Dimension intelligence, Time
4. Map Calendar Year = Year
5. Map Calendar Quarter = Quarter
6. Map Month = Month
7. Add hierarchy Calendar Year - Calendar Quarter - Month
8. Point out the blue squiggle indicating you should add a relationship to improve performance
9. Go to Attribute Relationships, add relationship: Source Month, related Calendar Quarter
10. Source Calendar Quarter, related Calendar Year
11. Build and process, show how the hierarchy appears in the dimension and cube

Sorting using OrderBy
Name: sorts by the name attribute in alphabetical order.
Key: sorts by one or more key columns (e.g. Quarter & Year).
Secondary attribute: use a different column for sorting, e.g. you might want Month to appear in order of occurrence, not sorted alphabetically by month name.

Sort by Attribute Name
1. Go to the Time dimension in the Dimension Designer
2. Go to English Month Name
3. Deploy and browse the dimension, show how months are sorted by month name: April, August, etc.
4. Show how MonthNumberOfYear is sorted as characters: 1, 10, 11, 12, 2
Sort by Attribute
5. Go back to the Dimension Designer
6. For MonthNumberOfYear, set OrderBy = Key (OrderBy = Name treats the value as text, e.g. month 1, month 10, month 11, month 12, month 2)
7. Deploy and browse the dimension, show how MonthNumberOfYear is now sorted numerically
8. Go to the properties of English Month Name, set OrderBy = AttributeKey
9. Set OrderByAttribute = MonthNumberOfYear. YOU CAN'T: it is not listed. You need a relationship between the two attributes
10. Go to the Attribute Relationships pane, add a new relationship from EnglishMonthName to MonthNumberOfYear
11. Go to the properties of English Month Name, set OrderByAttribute = MonthNumberOfYear
12. Deploy and browse, show how EnglishMonthName is sorted correctly
Sort by composite key (do not demo unless you practice it first)
13. Go to the properties of Month Key, add the CalendarYear column to the key, move it to be the first column of the key, and make MonthNumberOfYear the second column
14. Set OrderBy = Key
15. Set the NameColumn property of English Month Name to English Month Name (otherwise the key column is displayed by default)
16. Deploy and browse, show how months are now sorted by year: 2000-January is not the same as 2007-January

Grouping
For attributes that have no natural hierarchies, you can use grouping to break them into smaller groups (you do not decide the grouping, SSAS does).
You cannot do grouping on the top level of a hierarchy, on consecutive levels of a hierarchy, or on ROLAP.
DiscretizationMethod:
EqualAreas: divide into groups with equal numbers of members
Clusters: use the K-means algorithm to divide into clusters (more meaningful but slower to process; the attribute must be numeric)
DiscretizationBucketCount: how many groups to create.
NamingTemplate: the default is the first and last value of the group (e.g. January-March).

Demo Grouping
1. Go to the property page of Product > List Price in the Dimension Designer
2. Set DiscretizationMethod = EqualAreas
3. Set DiscretizationBucketCount = 4
4. Deploy & browse, show how values are broken into buckets

EXTRA INFO Attribute Relationships
Attribute relationships define dependencies between attributes. By default you have a relationship between each non-key attribute and the key attribute. Show the key attribute and expand it to show how all other attributes are listed.
Adding a hierarchy tells SSAS to build a cube containing that hierarchy and allows a user to drill down and drill up when querying the cube. Adding a relationship improves processing.
Whenever you have a 1-1 or 1-many relationship between columns in a dimension, you should add an attribute relationship between them, e.g. add CalendarYear as an attribute of CalendarQuarter.
Set the attribute relationship properties: One (Month Name - Month Number of Year) or Many (Year - Quarter), and Rigid (Year - Quarter) or Flexible (Category - Subcategory).
1. Go to the Time dimension, set up attribute relationships
2. Drag the Calendar Year attribute from TimeKey to Calendar Quarter
3. Drag the Calendar Quarter attribute from TimeKey to CalendarMonth
4. Drag MonthNumberOfYear to MonthName (RelationshipType = One)
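The sorting behaviour from the OrderBy demo (1, 10, 11, 12, 2 and April, August, ...) is ordinary text sorting, which a few lines of Python can reproduce. This is an analogy for the three OrderBy choices, not SSAS code; the month list stands in for the EnglishMonthName and MonthNumberOfYear attributes.

```python
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
# Each member carries a key (month number) and a name, like the
# MonthNumberOfYear and EnglishMonthName attributes in the demo.
months = [
    {"MonthNumberOfYear": number, "EnglishMonthName": name}
    for number, name in enumerate(MONTH_NAMES, start=1)
]

# OrderBy = Name on the month name: alphabetical order.
by_name = sorted(m["EnglishMonthName"] for m in months)

# OrderBy = Name on the month number: numbers compared as text.
numbers_as_text = sorted(str(m["MonthNumberOfYear"]) for m in months)

# OrderBy = Key on the month number: numeric order.
numbers_by_key = sorted(m["MonthNumberOfYear"] for m in months)

# OrderBy = AttributeKey with OrderByAttribute = MonthNumberOfYear:
# names are ordered by the key of a related attribute.
names_by_attribute_key = [
    m["EnglishMonthName"]
    for m in sorted(months, key=lambda m: m["MonthNumberOfYear"])
]

print(by_name[:3], numbers_as_text[:5], names_by_attribute_key[:3])
```

The last sort only works because each name is paired with exactly one number, which mirrors why the demo has to add an attribute relationship before MonthNumberOfYear shows up as an OrderByAttribute choice.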
Add Business Intelligence
Define Time Intelligence: allows you to specify which fields map to calendar quarter, year, and so on, to get default hierarchies.
Define Account Intelligence: allows you to define things like which columns specify if an account is income or expense.
Specify a unary operator: to change the default aggregation for parent-child hierarchies in a cube.
Create a custom member formula: to replace the default aggregation with a different operator.
Specify attribute ordering.

LAB NOTES
Ex 3 Task 1: THE STEPS HAVE MANY MISTAKES AND ARE MISSING STEPS!!! FEWER MISTAKES IN THE LAB ANSWER KEY, but still keep an eye on students.
1. Go to the Calendar Date hierarchy in the Date dimension, not Calendar Time in the Time dimension
2. To do a New Attribute from Column, right-click the Month column in the data source view in the Dimension Designer. This creates a duplicate attribute for the column, but if you already have MonthNumberOfYear in the Date dimension you can skip that step!
3. Instead of expanding the TimeKey column, go to the Attribute Relationships pane and show how all the attributes are related to the DateKey
4. Create a new relationship: Source = Month, related attribute MonthNumberOfYear
5. Change the OrderByAttribute property of Month to MonthNumberOfYear
6. Set the OrderBy property of Month to AttributeKey
Task 2: Set DiscretizationMethod = Automatic
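The grouping that Task 2 configures can be pictured with a simplified version of the EqualAreas idea: sort the values and cut them into buckets of roughly equal size, then name each bucket by its first and last value. The sketch below is an approximation written for illustration; the list prices are invented, the helper names are hypothetical, and the algorithm SSAS actually applies (including what Automatic selects) is not reproduced here.

```python
def equal_area_buckets(values, bucket_count):
    """Split sorted values into bucket_count groups of near-equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), bucket_count)
    buckets = []
    start = 0
    for index in range(bucket_count):
        # The first `extra` buckets take one additional member.
        end = start + size + (1 if index < extra else 0)
        if end > start:
            buckets.append(ordered[start:end])
        start = end
    return buckets


def bucket_names(buckets):
    """Name each bucket by its first and last value, like the default template."""
    return [f"{bucket[0]} - {bucket[-1]}" for bucket in buckets]


# Hypothetical list prices, grouped into 4 buckets.
list_prices = [540, 5, 3400, 20, 120, 10, 50, 35]
buckets = equal_area_buckets(list_prices, 4)
print(bucket_names(buckets))
```

With skewed data such as prices, equal-size buckets produce very uneven value ranges, which is one reason a clustering method can give more meaningful groups.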
Module 4 Working with Measures and Measure Groups
Lecture: 70 minutes Lab: 60 mins (p 4-17 Ex 1 Configure Measures, Ex 2 Define Dimension Usage and Relationships, Ex 3 Configure Measure Group Storage) ***TYPOS IN EX 2***

Measure Display Properties
Name - the name displayed to the user
Format String - how data will be displayed: Currency, Percent, True/False, or user-defined such as dd/mm/yyyy or $#,#0.00 (regional settings in Control Panel determine date and currency formatting)
DisplayFolder - to organize measures into folders for the users
Visible - to hide measures used for calculations that are not meant to be displayed directly to users
MeasureExpression - can be A*B or A/B (cannot be more complicated, i.e. not A*B*C)

Demo Measure display properties
1. Go to Cube Designer, build a cube for Internet Sales showing Customer, Product and Date
2. Change the Name property of the measure Order Quantity to Quantity Ordered
3. Change the Format String of SalesAmount to Currency
4. Go to the properties of the two Cost measures and set Display Folder Name = Cost
5. Deploy & Browse, show how measures are contained in a folder

Measure Values
A measure value can be:
a column in a fact table (sales amount, quantity ordered)
row based, e.g. a count of the number of rows in a table (nbr orders, nbr students)
based on an MDX expression (net profit)

Aggregating Measures
Additive - Across all dimensions (sales amount can be totaled for all products, all customers, all years, etc.)
Semi-additive - Across some dimensions but not others (e.g. inventory should be aggregated across warehouses but not across months)
Non-additive - Not aggregated across any dimensions (e.g. nbr distinct records)

Aggregate Functions
Sum (additive) - adds up values across dimensions
Count (additive) - counts number of values
Min (semi-additive) - returns lowest value
Max (semi-additive) - returns highest value
DistinctCount (non-additive) - counts distinct values
None - supplies values directly from the fact table without aggregation
1. Go to Cube Designer, browse the cube showing Sales Amount by color and marital status
2. Select the Sales Amount measure and change Aggregate Function = Min
3. Deploy and Browse (use Customer & Color to show different values)
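The difference between additive, semi-additive and non-additive measures (see Aggregating Measures above) is easier to see with numbers. The following Python sketch uses a handful of invented fact rows; the column layout and names are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical fact rows: (month, warehouse, customer, sales_amount, units_in_stock)
fact_rows = [
    ("Jan", "East", "Ann", 100, 40),
    ("Jan", "West", "Bob", 150, 60),
    ("Feb", "East", "Ann", 200, 30),
    ("Feb", "West", "Cid", 50, 70),
]

# Additive: sales can be summed across every dimension.
total_sales = sum(row[3] for row in fact_rows)

# Semi-additive: stock can be summed across warehouses within a month...
stock_by_month = defaultdict(int)
for month, _warehouse, _customer, _sales, stock in fact_rows:
    stock_by_month[month] += stock
# ...but summing the monthly figures across time gives a meaningless number.
wrong_stock_total = sum(stock_by_month.values())
closing_stock = stock_by_month["Feb"]  # the latest month is the sensible answer

# Non-additive: distinct customers per month cannot be added up.
customers_by_month = defaultdict(set)
for month, _warehouse, customer, _sales, _stock in fact_rows:
    customers_by_month[month].add(customer)
sum_of_monthly_counts = sum(len(c) for c in customers_by_month.values())
true_distinct_customers = len({row[2] for row in fact_rows})

print(total_sales, wrong_stock_total, closing_stock)
print(sum_of_monthly_counts, true_distinct_customers)
```

Ann buys in both months, so adding the monthly distinct counts overstates the number of customers; that is why DistinctCount has to be computed from the detail rows rather than rolled up.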
Measure Groups
Measure Group Properties
AggregationPrefix - common prefix used for any aggregation names and partitions created for a measure group
DataAggregation - whether SSAS can create aggregates for persisted and/or cached data for the measure group. Default is to create aggregates for persisted and cached data
ErrorConfiguration - Default: error messages come from the msmdsrv.ini file. Custom: you can define error messages for duplicate keys, null keys, etc., and define the action to occur when an error occurs in processing, e.g. convert to a specific value, stop processing
EstimatedRows - estimated number of rows in the fact table (good for the aggregation wizard)
EstimatedSize - estimated size in bytes of the measure group (good for the aggregation wizard)
IgnoreUnrelatedDimensions - determines whether unrelated dimensions are forced to their top level when members of dimensions that are unrelated to the measure group are included in a query. Default setting is True. So basically, if you try to show internet sales by geography region (there is no link between them), with True you see the total for all regions listed for each region (so Australia shows the grand total, and so does Canada); if you set it to False, the individual regions display NULL instead of the grand total (which I personally prefer)
ProcessingMode - Regular: data is not available until processing is complete. LazyAggregations: data is accessible as soon as available, but total processing time is increased
ProcessingPriority - processing priority of the cube during background operations such as lazy aggregations and indexing
StorageLocation - file system storage location for the measure group; if not specified, the location is inherited from the cube that contains the measure group
Type - type of the measure group

Demo Error Configuration
1. Go to the Cube Structure tab of Cube Designer
2. Select a Measure Group and display Properties
3. Change ErrorConfiguration = Custom
4. Expand and show the different options for different errors, error limit, error log file
5. Point out the ProcessingMode property

StorageMode
MOLAP Multidimensional - stores aggregations and a copy of the data in multidimensional format; best for query performance, but requires the cube to be processed to see the most recent data. Proactive caching helps with that.
ROLAP Relational - stores aggregated data as indexed views in the relational data source, and reads source data from relational tables; query time and processing time are slow, but it consumes less memory and allows real-time updates of data
HOLAP Hybrid - stores aggregations in multidimensional format but leaves source data in the relational data source; fast for queries of aggregated data, slow if underlying data is required because it must go to the relational data source
Proactive Caching
Since MOLAP & HOLAP aggregations can become out of date when source data changes, you can use proactive caching to update aggregations on a schedule or when source data changes.
Update Cache when data changes - updates MOLAP when notified of data changes (requires setting up notifications on partitions)
Silence Interval - how long the cube must be inactive before beginning to process a new MOLAP image
SilenceOverrideInterval - how long to wait before beginning to process a new MOLAP image even if the cube is active
Drop outdated cache - how long to wait before dropping an outdated cache when a new cache is created
Update Cache periodically - interval of time after which to refresh the cache
Notifications - set at the partition level; when to be notified to update the cache
BringOnlineImmediately - if checked, allows users to query (will use ROLAP for data) while the MOLAP image is being processed; if unchecked, MOLAP processing must be completed before the cube can be accessed
Enable ROLAP aggregations - create indexed views for aggregations
Apply settings to dimensions - applies storage mode and proactive caching settings to dimensions

Demo Measure Group Storage Properties
2. Go to the Cube Structure tab
3. Go to the properties of a measure group
4. Show the StorageMode property
5. Show the StorageLocation property; StorageLocation specifies the folder where the cube is stored (overridden by partitions)
6. Select the Proactive Caching property, select Custom, show options
7. Enable Proactive Caching

Relationships between Measure Groups & Dimensions
Somehow SSAS has to figure out which measures go with which dimensions (e.g. which sales are for which product color). This is done by creating relationships between measure groups and dimensions.
Regular - Relationships between measure groups and dimensions are created based on the PK-FK relationships of the underlying tables (e.g. FactInternetSales - DimProduct)
Reference - If you have a snowflake schema there may be no direct PK-FK relationship, so you can create a dimension based on multiple tables, one of which has a PK-FK relationship with the fact table, or create the relationships manually with columns from multiple tables (e.g. Category - DimSubcategory - DimProduct - FactInternetSales)
Fact - the dimension is stored in a fact table and has a PK-FK relationship with the fact table (e.g. parent-child hierarchies, employee-manager)
Many-Many - uses an intermediate dimension table to break up a many-many relationship into two 1-many relationships

Demo relationship type
1. Go to the Dimension Usage tab of Cube Designer
2. Click on a few existing relationships to show the relationship type

Partitions
Partitions allow you to store a measure group's data in separate pieces. For example, data for each quarter could be in its own partition, so if you only want current data you only need to search one partition, and some partitions do not need to be reprocessed as often, so you shorten processing time. If you do need to search multiple partitions and you have multiple processors, the searching can be done in parallel. Make sure you define the partitions so that data is not stored in two partitions, to save on memory!
You can partition horizontally: you have multiple fact tables (Orders1998, Orders1999) and each fact table is a partition of a single measure group
You can partition vertically: you have a single fact table and you define a query that filters which data goes in which partition, e.g.
SELECT * FROM dbo.FactResellerSales WHERE OrderDateKey >= 20040601 AND OrderDateKey <= 20041231
You can define the partition slice to tell SSAS what is in each partition so that queries know which partition to query, e.g. to get all products in category 1 (Bikes) from 2001 and 2002
{[Date].[Calendar Year].&[2001],[Date].[Calendar Year].&[2002]} * {[Product].[Product Categories].[Category].&[1]}
Usually you partition by time, or by a dimension member (country, product category)
You can set storage options for each partition, e.g. ROLAP for the current month so it is always up to date, HOLAP for previous quarters of the current year, MOLAP for past years so they are quick to query but don't need to be reprocessed since the data is fixed.

Define notifications to be used by proactive caching per partition
1. Go to the Partitions tab of Cube Designer
2. Change the Source property from table to query and add a WHERE clause: WHERE OrderDateKey >= 20040601
3. Add a second partition with a complementary WHERE clause: WHERE OrderDateKey < 20040601
4. Select a partition, go to Storage Settings
5. Choose Custom setting
6. Enable Proactive Caching
7. Go to the Notifications tab
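Before deploying query-bound partitions like the two in the demo above, it can help to sanity-check that the WHERE clauses are truly complementary, so no row lands in two partitions or in none. A minimal Python sketch of that check follows; the partition names "current" and "history" and the sample keys are invented for illustration, and only the 20040601 cutoff comes from the demo.

```python
CUTOFF = 20040601  # same boundary as the two WHERE clauses in the demo


def in_current_partition(order_date_key):
    """Mirrors: WHERE OrderDateKey >= 20040601"""
    return order_date_key >= CUTOFF


def in_history_partition(order_date_key):
    """Mirrors: WHERE OrderDateKey < 20040601"""
    return order_date_key < CUTOFF


def assign_partitions(order_date_keys):
    """Map each key to the list of partitions whose filter accepts it."""
    assignment = {}
    for key in order_date_keys:
        names = []
        if in_current_partition(key):
            names.append("current")
        if in_history_partition(key):
            names.append("history")
        assignment[key] = names
    return assignment


# Hypothetical keys, chosen to sit on both sides of the boundary
sample_keys = [20031215, 20040531, 20040601, 20041231]
assignment = assign_partitions(sample_keys)
overlaps = [k for k, names in assignment.items() if len(names) > 1]
gaps = [k for k, names in assignment.items() if not names]
print(assignment, overlaps, gaps)
```

If one filter were written with <= instead of <, the boundary key 20040601 would show up in the overlaps list, which is exactly the row that would be stored twice.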
SQL Server - specify tracking tables: the list of tables in the database (separated by ;) for which you want to be notified when they receive an update (e.g. FactInternetSales)
Client initiated - the client sends the XMLA NotifyTableChange command to notify of changes, instead of SQL Server initiating the notification
Scheduled polling - queries are run on a scheduled basis to detect changes
Designing Aggregations
You can precalculate aggregations to speed up query time. It is not practical to pre-store all aggregations, so you have to balance pre-calculated aggregations against memory usage. The Aggregation Design Wizard helps you determine which aggregations to pre-calculate.
1. Go to the Partitions tab of Cube Designer
2. Right-click on a partition and choose Design Aggregations; this brings up the wizard
3. Choose standard settings, click Next
4. When the tables are listed, click Count to count the number of records in each table (or enter an estimated number of rows if the tables are not fully loaded yet), click Next
5. Choose a "Design aggregations until" option and click Start
Estimated storage reaches - ask the wizard to create pre-calculated aggregations up to a specified memory limit; use if memory is limited
Performance gain reaches (start with 30%) - generate pre-calculated aggregations to improve query performance by a specified amount
I click Stop - watch the graph and click Stop when you have reached the desired improvement vs memory
Do not design aggregations - removes existing aggregations
6. Try a couple of different options, Reset between options
7. Click Next after aggregations are calculated
8. Either deploy & process, or save but do not process
9. Then you can go to the partition, choose Process, and choose Process Index (do not deploy & process the cube first) to see how long it takes to process the additional aggregations and whether the processing time is too long
To measure query improvements, check query times before and after aggregations are added. To measure processing cost, run Process Index after designing aggregations, or compare processing times before and after aggregations are added.
Once aggregations are designed for one partition, they can be copied to another partition using Object Explorer in SQL Server Management Studio.

Lab Notes
Unfortunately this is not the greatest lab; it works, but doesn't teach much.
Exercise 1, Task 3 - Open the .dwproj file, not the .sln file
Exercise 2, Task 2 - the regular relationship was created automatically by the dimension wizard
Exercise 3, Task 2 - Right-click Internet Sales and launch the Aggregation Design Wizard. You will not get the Select Partitions to Modify screen. They tell you to NOT design aggregations, which is kind of pointless; why not try a performance gain of 30%!
Module 5 Querying Multidimensional Solutions
Lecture: 70 minutes Lab: 45 mins p 5-12 (Ex 1 MDX Queries, Ex 2 Calculated Member, Ex 3 Named Set)

MDX (Multidimensional Expressions)
MDX was created to query OLAP databases; it was designed by Microsoft but has been generally adopted by OLAP providers. It is used as a query language for querying and as an expression language for calculations.
SQL queries an OLTP database and returns a 2-D set of results, like a table
MDX queries an OLAP database and returns a cube
Cells - intersection between the measure and the dimensions
Tuple - an expression that identifies a cell or section of a cube (e.g. Bikes in January, or Bikes in January bought by Smith, or just Bikes). When a tuple represents a section of the cube you are slicing the cube
Set - an ordered collection of tuples from the same hierarchy. Sets are in {} and each member of the set is separated by a comma, e.g. {[Sales].[Bikes].[January],[Sales].[Bikes].[February]}
You must have at least one axis, ON COLUMNS. Then you can add more axes: ON ROWS, ON PAGES, and so on

MDX for Queries
SELECT query_axis_clause FROM subcube
// is used for comments within MDX queries

MDX Queries
1. Open BIDS MOD05\labfiles\project and deploy the cube
2. Open SQL Server Management Studio, connect to Analysis Services on server NYSQL-01
3. Expand the MOD05 database
4. Click on Adventure Works and choose New Query
5. Walk through the MDX queries in the handout
ON COLUMNS - uses one or more tuples to define an axis, returns a row of values
ON ROWS - uses one or more tuples to define an axis, returns a pivot table
ON PAGES - adds a 3rd dimension, returns a cube, cannot be viewed in SQL Server Management Studio
Calculations
Analysis Services stores the syntax for calculations. Calculations do not add to the size of the cube; they are calculated at runtime.
1. Show the MDX Cube example in SSMS, then do this example in BIDS
2. Create a new cube for FactInternetSales; make sure to include the Sales Amount and Unit Price measures with DimProduct and DimCustomer
3. Go to the Product dimension, add the Color attribute
1. Deploy and process the cube
2. Go to the Calculations tab
3. Click New Calculated Member on the toolbar
4. Name: [Price without Tax]
5. Hierarchy: Members
6. Calculation: [Measures].[Sales Amount]-[Measures].[Tax Amt]
7. Set format string = Currency in additional properties
8. Process & deploy the cube
9. Browse the cube, add Sales Amount, Unit Price and Price Without Tax by Product
10. Show Script View on the Calculations tab

Creating a calculation using the MDX command instead of the form view (you can use the example on the MDX queries handout). Just go to script view and type this command after the last command. To calculate the percent of total sales for each product color:
CREATE MEMBER CURRENTCUBE.[MEASURES].[Percent of Color Sales] AS ([Measures].[Sales Amount]) / ([Measures].[Sales Amount],[Dim Product].[Color].[All])
If you set the format of the calculation to Percentage it will multiply by 100.

Named Sets
A set is like a database view; it contains a subset of the cube. You can create a permanent set to use for calculations, or temporary sets for use within queries or a single session.
1. Go to the Calculations tab
2. Add a Named Set
3. Name: [Dark Colors]
4. Expression: {[Dim Product].[Color].&[Black],[Dim Product].[Color].&[Blue]}
5. Browse the cube, Sales Amount by color
6. Drag the named set to the dimension filter above the cube area (subcube area); show how the cube now only shows colors in the named set, a subset of the original cube
7. Show Script View on the Calculations tab

To create a SET using an MDX command
Create a set of all measures for the combination of Black & Blue products
CREATE SET [Adventure Works].[Black and Blue] AS {[Product].[Color].[Black],[Product].[Color].[Blue]} ;
When you use CREATE SESSION SET the set only exists for the session
SCOPE
Allows you to define a subcube that can then be used as the target for MDX calculations. It is like an UPDATE statement: first you define which records to update, then you define what the new value should be when the cube is browsed. This allows you to change the values displayed in a cube after the cube is processed.
For example, if you didn't want anyone to see the values for Black products:
1. Go to the Calculations tab
2. Click Script View
3. After CALCULATE;
4. Add SCOPE([Measures].members);
5. ([Dim Product].[Color].&[Black])=NULL;
6. END SCOPE;
7. Process the cube, browse the cube, show sales for all color products; black does not appear

We want to set 2002 Sales for Q4 bikes to be 50% more than Q1 sales
SCOPE (measures.[sales amount], [product].[category].[bikes], [Date].[Calendar].[Q4 CY 2002]);
THIS = ([product].[category].[bikes], [Date].[Calendar].[Q1 CY 2002]) * 1.50;
END SCOPE;

We want to increase sales quotas for 2002 one hundred fold
SCOPE ([Date].[Fiscal Year].&[2002],[Date].[Fiscal Quarter].Members, [Measures].[Sales Amount Quota]) ;
This = [Measures].[Sales Amount Quota]* 100 ;
END SCOPE;
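As a way of picturing what a SCOPE assignment does, the Python sketch below treats a cube as a dictionary of cells and applies the "Q4 bikes = 1.5 x Q1 bikes" rule from the example above. The cell values and the helper name apply_scope are invented for illustration; this is an analogy for the behaviour, not how SSAS evaluates scripts internally.

```python
# Hypothetical processed cell values keyed by (measure, category, quarter)
cells = {
    ("Sales Amount", "Bikes", "Q1 CY 2002"): 1000.0,
    ("Sales Amount", "Bikes", "Q4 CY 2002"): 1200.0,
    ("Sales Amount", "Clothing", "Q4 CY 2002"): 300.0,
}


def apply_scope(cells, in_scope, new_value):
    """Return the values a user would browse: in-scope cells are overridden,
    everything else passes through. The stored cells are left untouched."""
    browsed = dict(cells)
    for key in cells:
        if in_scope(key):
            browsed[key] = new_value(cells, key)
    return browsed


q1_bikes = ("Sales Amount", "Bikes", "Q1 CY 2002")
q4_bikes = ("Sales Amount", "Bikes", "Q4 CY 2002")

browsed = apply_scope(
    cells,
    in_scope=lambda key: key == q4_bikes,                  # the SCOPE (...) subcube
    new_value=lambda source, key: source[q1_bikes] * 1.5,  # THIS = ... * 1.50
)
print(browsed[q4_bikes], cells[q4_bikes])
```

Only the scoped cell changes in what the user browses, while the processed value underneath stays as it was.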
Module 6 Customizing Cube Functionality
Lecture: 75 minutes Lab: 60 mins (Ex 1 KPIs, Ex 2 Actions, Ex 3 Perspective, Ex 4 Translation)
**There is a .txt file in D:\Labfiles with MDX expressions for the KPI exercise

Key Performance Indicators (KPIs)
KPIs are a measure of business metrics against targets, e.g. sales, registrations, completed census surveys
Trend indicators show the trend of the KPI over time
You can assign visual indicators in the cube for the KPI & trend

Creating a KPI
Name - name of the KPI
Associated Measure Group - which measure groups are associated with the KPI
Value Expression - MDX expression to calculate the KPI (e.g. Sales Amount)
Goal Expression - the value or MDX expression to calculate the KPI target (e.g. increase of 15% over last year, Last Year's Sales * 1.15)
Status Indicator - graphic to show the status of the KPI (happy face, traffic light)
Status Expression - MDX expression to calculate the value for the status indicator; must return a value between -1 and 1 (-1 bad, 0 acceptable, 1 good)
Trend Indicator - graphic to show the trend of the KPI (usually an arrow)
Trend Expression - MDX expression to calculate the value for the trend indicator; must return a value between -1 and 1 (-1 bad, 0 acceptable, 1 good)

Create a KPI
1. Build a cube based on FactInternetSales and related tables
2. Go to the Date dimension and add Calendar Year and Calendar Quarter
3. Create a hierarchy called Calendar with Calendar Year and Calendar Quarter
4. Build & process the cube
5. Go to the KPI tab, click New KPI
6. Name = Sales KPI
7. Measure Group = Fact Internet Sales
8. Drag the Sales Amount measure from the Metadata tab to the Value expression: [Measures].[Sales Amount]
9. Make the goal 1.2 * the sales from the previous year. Usually you would compare to CurrentMember, but since the data is old, we hard code 2004 as the year to go one before.
1.2 * ([Measures].[Sales Amount], ParallelPeriod([Order Date].[Calendar].[Calendar Year],1,[Order Date].[Calendar].[Calendar Year].[2004]))
10. Status indicator = faces
11. Status Expression = 1 (-1 bad, 1 good) (eventually this will be an MDX expression)
12. Trend Indicator = arrows
13. Trend Expression = 0.8 (-1 bad, 1 good)
14. Deploy the project; on the KPI tab, reconnect, switch to Browser View, and show the indicators.
15. Change the status (0) and trend (.3) values, redeploy & browse to show the changes to the symbols

Use MDX Expressions for Status & Trend
16. Change Status to use a CASE statement that divides Value by Goal and, based on the percentage, returns either -1, 0 or 1
CASE
WHEN KpiValue("Sales KPI") / KpiGoal("Sales KPI") >= 1 THEN 1
WHEN KpiValue("Sales KPI") / KpiGoal("Sales KPI") >= .5 THEN 0
ELSE -1
END
17. Now process and deploy the cube; you should see a happy face for status. If you change the goal to be 4* instead of 1.2* you will see a neutral face; if you change the goal to be 10* instead of 1.2* you will see a sad face
18. Change Trend to an MDX expression that compares the current value of the KPI to the value from the previous time period
CASE
WHEN ([Measures].[Sales Amount]) > ([Measures].[Sales Amount],[Order Date].[Calendar Year].[2003]) THEN 1
WHEN ([Measures].[Sales Amount]) < ([Measures].[Sales Amount],[Order Date].[Calendar Year].[2003]) THEN -1
END
19. Deploy and browse
20. If you want, open Microsoft Excel 2007, Data - Other Sources - Analysis Services, and connect to the cube. In the Pivot Table field list you will see KPIs listed after the measures; you could show Sales Amount by Calendar Year and add the Sales KPI Value, Goal, Status, and Trend (numbers look a little wacky though; may need to use previous member / current member to show KPIs per year)

In the lab you will use the ParallelPeriod function. You pass it a parent level in a hierarchy (e.g. year), the number of periods you want to go back (usually 1), then the member you want to move back from (e.g. current month). This will move from the current month up to the year level, move back one year, and go to the equivalent month in that previous year.
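The navigation that ParallelPeriod performs (up to the ancestor level, back a number of periods, then down to the same relative position) can be mimicked in Python for a simple Year - Quarter - Month calendar. The function below is a hypothetical stand-in written for these notes, with a member represented as a (year, month) pair; it is not part of any MDX library.

```python
# Number of months spanned by one member at each level of the hierarchy
MONTHS_PER_LEVEL = {"year": 12, "quarter": 3, "month": 1}


def parallel_period(level, lag, member):
    """Return the month in the same relative position, lag periods back at
    the given level. member is a (year, month) pair with month in 1..12."""
    year, month = member
    # Work on a running month index so year boundaries are handled for us
    index = year * 12 + (month - 1) - lag * MONTHS_PER_LEVEL[level]
    return (index // 12, index % 12 + 1)


# March 2004, one year back: March 2003
same_month_last_year = parallel_period("year", 1, (2004, 3))

# February 2004 is the 2nd month of Q1; one quarter back is the 2nd month
# of Q4 2003, which is November 2003
same_position_last_quarter = parallel_period("quarter", 1, (2004, 2))

print(same_month_last_year, same_position_last_quarter)
```

With level "year" and a lag of 1 this reproduces the "equivalent month in the previous year" behaviour described above.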
ParallelPeriod([Date].[Fiscal Time].[Fiscal Year],1, [Date].[Fiscal Time].CurrentMember)
ParallelPeriod accepts an expression for the level in the hierarchy, the number of periods to lag, and the member to use as the starting point.

Browsing KPIs
Excel 2007 and PerformancePoint allow you to browse KPIs, or you can use MDX queries
Go to SQL Server Management Studio and execute an MDX query to retrieve KPIs
SELECT {KPIValue("Sales KPI"), KPIGoal("Sales KPI"), KPIStatus("Sales KPI"), KPITrend("Sales KPI")} ON COLUMNS FROM [MyCube]

Actions
Actions are MDX expressions that allow users to browse data, launch an application, go to a URL, or perform another defined action
Actions are server based so they can be managed centrally and attached to the cube
Action types
Drillthrough - allow the user to drill through to more data
Report - submit a URL request to SQL Server Reporting Services to launch a report
Dataset - return a dataset based on an MDX query to the client application
Proprietary - custom actions you define
Rowset - return a rowset based on an OLE DB command, good for returning relational data
Statement - run OLE DB commands that return success or failure
URL - display dynamic web pages
CommandLine - execute a command at the command prompt; must be created with an MDX statement, can't be done in Business Intelligence Development Studio
HTML - execute HTML scripts in browsers; cannot be created in Business Intelligence Development Studio, you must use MDX expressions

Action Properties
Name - name for the action
Action Target - the target type & target object
Condition - optional MDX expression to limit the scope of the action
Action Content - the action to take; syntax depends on the type of action selected
Invocation - how the action should run in the application
Application - the application associated with this action; allows client applications to control which actions to show
Description - optional description of the action
Caption - name the user will see for the action
Caption is MDX - allows you to use an MDX expression for the caption
Report Server, Parameters - for Report actions, defines the report server and report parameters
Drillthrough columns, maximum rows - columns and rows to return for a drillthrough action
Demonstrate a Drillthrough Action
1. Go to the Product dimension, add color, size, English product name
2. Go to the Actions tab of Cube Designer
3. Add New Drillthrough Action
4. Leave Condition blank
5. Add Measure Group Members: Fact Internet Sales (this defines what you right-click on to get the drillthrough)
6. Add Drillthrough Columns: Dimension Product, Attributes color, size, product name (what details you will see when you drill through)
7. Go to additional properties
8. Set Default = True
9. Maximum Rows = 1000
10. Caption = Drillthrough
11. Deploy & browse the cube
12. Create a cube with SalesAmount and Product - Color
13. Select a SalesAmount, right-click, Drillthrough
14. Show the data displayed by the drillthrough action

Demonstrate a URL Action
15. Go to the Actions tab of Cube Designer
16. Add new action
17. Name = Go to website
18. Target Type = Attribute Members
19. Target Object = Product.color
20. Action Type = URL
21. Action Expression = http://www.bernardcallebaut.com
22. Deploy & browse the cube
23. Drag Product.color to the cube, right-click a product, choose Go to Website
24. Show how it launches the specified website
You can reference the selected attribute in the URL, e.g. "http://www.bernardcallebaut.com?Product=" + [Dim Product].[Color].CurrentMember.Name

Perspectives
A perspective is a logical subset of a cube to help focus data for users. Perspectives do not store data and cannot be used for security (since you can't use permissions to say you can only see this perspective); they are meant to simplify browsing for the user.
1. Go to the Perspectives tab of Cube Designer
2. Create a new perspective
3. Give the perspective a name: Internet
4. Deselect all measures, KPIs, actions & dimensions that do not apply
5. Deploy & browse the cube
6. Reconnect to the cube & select Internet from the Perspectives listbox
7. Show how only the selected measures and dimensions are available
Translations
Allows you to display the caption in a different language, or display a different column for different languages
1. Go to the Translations tab of Cube Designer
2. Create a new translation
3. French-Canada
4. Enter French captions for dimensions and measures
5. Deploy & browse the cube
6. Reconnect and select French Canada from the Languages listbox
7. Point out that the names of measures and dimensions have switched to the French translation
8. Go to Dimension Designer for Time
9. Go to the Translation tab
10. Add French translations for captions
11. Click on the ellipsis button in English Month Name where you enter the French translation
12. Select French Month Name to use when the language is French
13. Deploy and browse the cube to show French data displayed

LAB NOTES
There are .txt files with the Status and Trend expressions in E:\Mod06\Democode\UsingKPI_*.txt
The Trend expects a hierarchy level of Calendar Year, which does not exist in the Date Calendar hierarchy; add Calendar Year above Calendar Quarter and then reprocess the dimension and cube, then the Trend will work
Exercise 3 - only clear the checkboxes for all measures, leave the dimensions!
Module 7 Deploying and Securing an Analysis Database
Lecture: 60 minutes Lab: 60 mins (p7-18 Ex 1 Deploy Solution, Ex 2 Secure Solution)

Deploying an Analysis Services Database
In Analysis Services 2000 you had to back up a database and restore it to production. This still works, but now we have new options.
3 steps are completed when you deploy a project
1. Build the project in BIDS; any definition errors are caught here
2. A database with the name of the project is created, and the objects defined in the project are created within this database
3. The database is processed

How to deploy a project?
Deployment Wizard - can't do incremental updates, scripts will recreate the entire database; you can save the XMLA script to re-run later
XMLA Script - created by the wizard or SQL Server Management Studio; run XMLA scripts to recreate the database; cannot do incremental updates
Synchronize Database Wizard - in SQL Server Management Studio, to synchronize SSAS databases on separate instances. If the target database exists the data is synchronized; if the target does not exist a new copy is created
Backup and Restore - back up the database and restore it on another server, like you did in SSAS 2000

Setting Deployment Options
Go to project properties, show the configuration settings under Build and Deployment
Deployment Server Edition - edition of the server on which the solution will be deployed (Enterprise, Standard, Developer)
Output Path - where output files are placed after a build
Remove Passwords - whether to remove known passwords from connection strings; if removed, passwords will need to be supplied when the deployed project is processed
Deployment Mode - whether only changed objects or all objects are deployed (e.g. if you have two cubes in the project, does it send out both?)
Processing Option - whether to do processing, and if so whether to do full processing of the cube on deployment
Transactional Deployment - deploy as a transaction?
Deployment Server and Database - where to deploy
Use the Deployment Wizard to generate an XMLA script to create the database. It can be run graphically or from the command prompt. The .asdatabase file contains the definitions for all the SSAS objects and is created when you do a build.
1. Start - All Programs - Microsoft SQL Server 2008 - Analysis Services - Deployment Wizard
2. Select E:\Labfiles\Mod06\Labfiles\bin\Mod06Lab.asdatabase (this file is created when you do a build of a project in BIDS)
3. Specify the target server and database
4. Specify whether or not to deploy roles & partitions
5. Expand Data Source Connection Strings to show how you can change them for deployment
6. Select whether to deploy & process or just deploy
7. Specify the location E:\Mod06\Labfiles\bin\Mod06Lab Script.xmla
8. Open E:\Mod06\Labfiles\bin\Mod06Lab Script.xmla
9. If you run that script in SQL Server Management Studio it will create and process the cube

Create XMLA scripts using SQL Server Management Studio, then run the scripts on the production server.
1. Open SQL Server Management Studio, connect to Analysis Services NY-SQL01
2. Right-click the database - Script Database As - CREATE TO - Query Editor Window
3. Show the generated XMLA

Use the Synchronize Database Wizard in SSMS to synchronize two separate Analysis Services databases.
If the 2nd database does not exist it is created; if it already exists, the data is synchronized with the target
The target database remains online during synchronization so users can still query
The first synchronization must synchronize all files
Subsequent synchronizations can be changes only
Done with the SYNCHRONIZE XMLA command
1. Go to SQL Server Management Studio
2. Right-click the Databases folder, choose Synchronize to launch the wizard
3. Show how you select source & destination servers
Security
  SSAS relies on Windows Authentication to authenticate users. After a user is authenticated, SSAS controls permissions based on the user's role membership
  Fixed server role for administrators
  You can create database roles for users with specific rights
  Grant permissions for the database and for cube dimensions, dimension members, cells within a cube, mining structures, mining models, data sources and stored procedures
  Role permissions are additive: if you belong to two roles and one role is denied access while the other has access, the access is granted. This is different from most security models

Granting access to the Fixed Server Role (Administrator role)
  Members of the local Administrators group are members of the fixed server role automatically
1. Open SQL Server Management Studio
2. Highlight the server name (NY-SQL-01), right-click > Properties
3. Go to the Security tab, show Add for adding users to this role

Creating User Roles
1. Open SQL Server Management Studio
2. Go to an SSAS database, expand the Roles folder
3. Right-click the Roles folder, choose Add Role
4. Show the tabs where you can define permission levels for different objects
5. Open BIDS, create a project, go to Menu > Project > New Role, or in Solution Explorer right-click Roles > New Role; show the tabs that define a role in BIDS

Permission considerations
  You do not need access to a data source to access a cube. You need access to a data source if you use a mining model that connects to a data source to access user-defined data, so create a role for this purpose if required.
  Granting permission on a cube by default grants permissions on the dimensions in the cube

Cell Permissions
  Cell-level security needs an MDX expression: if the expression returns True (1) the value is displayed, if it returns False (0) the value is not displayed
  Read: can read cells, including calculated cells based on these cells
  Read Contingent: can read cells, but not calculated cells based on these cells
  e.g. NOT Measures.CurrentMember IS [Measures].[Sales Amount Quota] will hide Sales Amount Quota

Test Permissions
  On the Cube Browser tab in BIDS click the Change User button
  Test Administrator permissions using Run As

LAB NOTES: If you get a connection error running the .sql script, copy the script to the clipboard, create a new query window, paste the script into the query window and run it.
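The additive rule for role permissions in the Security notes above can be illustrated with a small Python sketch. This is a toy model only, not the SSAS permission engine: the role names and member lists are hypothetical, and the function simply takes the union of what each role allows.

```python
def allowed_members(roles):
    """Return every dimension member the user may see.

    roles maps a (hypothetical) role name to the set of members that role
    allows. Additive rule from the notes: a member is visible if ANY role
    allows it, even when another role leaves it out.
    """
    allowed = set()
    for members in roles.values():
        allowed |= members  # union, never subtraction
    return allowed


# Hypothetical user who belongs to two roles.
user_roles = {
    "EuropeSales": {"France", "Germany"},      # does not allow "Canada"
    "NorthAmericaSales": {"Canada", "USA"},    # allows "Canada"
}

print(sorted(allowed_members(user_roles)))
```

The point to make in class is that being left out of one role never takes away something that another role grants.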
Module 8 Maintaining a Multidimensional Solution
Lecture: 75 mins
Lab: 40 mins, p8-25 (Ex 1 Processing, Ex 2 Logging and Monitoring, Ex 3 Backup and Restore)

Processing
  One of the important jobs of an Analysis Services DBA is to process the objects (cubes, dimensions, mining models)
  OLAP databases and cubes must be deployed and processed. Deploying creates the schema. Processing populates the multidimensional objects with data and calculates the aggregations.
  When processing occurs, the data source is queried to fetch the source data.
  Processing is done within a transaction, so if all the dimensions process correctly but the cube processing fails, everything is rolled back.

Processing Cubes
  Process Default: detects the state of the cube and executes the appropriate processing option (useful after a Process Structure)
  Process Full: does 2 steps, Process Data (reading data from the data source) and Process Index (processing aggregations and indexes). This processes the object and all objects it contains. Any old data is cleared out. It is done as a transaction. Temporary files are created to hold the processing results, so users can continue to access the cube during processing. When processing completes, the temporary files are moved to production during the commit, and users cannot access the cube while that happens. This takes a lot of memory! You basically have your entire database stored twice, plus the temporary files created to do calculations. So although we love Process Full, it is not always an option. You should do this (or Data then Index) if the structure of the cube changes
  Process Incremental: adds new fact data; it creates new files and then merges them with the existing files. Warning: only use this to add new values. If you use Process Incremental on a fact table that includes values already processed, they will be double counted! So add the new fact data to a partition and process that partition, or use Process Incremental and specify a query that returns only the new fact data to be processed
  Process Data: populates data but does not build indexes or aggregations. It is usually used for cubes. It is similar to Process Full, except that Process Data on a cube never reprocesses the dimensions; it always uses the existing dimension files. If Process Full is failing for a large cube, break it into a Process Data followed by a Process Index
  Process Index: creates or rebuilds indexes and aggregations for all processed partitions
  Unprocess: clears the data
  Process Structure: creates only the cube definitions for previously processed cubes. You can browse the structure and see the names of measures and dimensions, but you cannot query the cube data. When you are satisfied and want the data, run Process Default so users can query the data.
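The Process Incremental warning above (already-processed rows get double counted) can be shown with plain arithmetic. The sketch below is an illustration in Python with made-up sales figures; `process_incremental` is a hypothetical stand-in for what the merge does to an aggregate, not an SSAS API.

```python
def process_incremental(cube_total, fact_rows):
    """Merge the supplied fact rows into an existing aggregate.

    Like an incremental process, it adds whatever rows it is given;
    it has no way to know whether a row was already counted.
    """
    return cube_total + sum(fact_rows)


fact_table = [100, 250, 75]      # made-up rows, already processed into the cube
cube_total = sum(fact_table)     # aggregate after the initial full process

new_rows = [40, 60]              # rows added to the fact table since then
fact_table = fact_table + new_rows

# Correct: supply only the new rows (a partition or query holding just them).
correct_total = process_incremental(cube_total, new_rows)

# Wrong: supply the whole fact table again, so the old rows are counted twice.
double_counted = process_incremental(cube_total, fact_table)

print(correct_total, double_counted, sum(fact_table))
```

Only the first call leaves the cube total equal to the true sum of the fact table.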
Processing Dimensions
  When a new row is added to a dimension (e.g. a new product), or an attribute of a dimension changes (e.g. an employee changes location), you need to reprocess the dimension
  Process Default: dimension data or indexes are processed if they have not been processed or are out of date
  Process Full: the entire dimension is reprocessed; dimension data and indexes are dropped and re-created (unavailable to users at that time)
  Process Update/Incremental: like an incremental update of a dimension; picks up new records and updates to attributes
  Process Data: processes the dimension data (not indexes)
  Process Index: creates indexes for attributes in the dimensions

Demo
Go to SQL Server Management Studio
1. Open Adventure Works UDM > Cubes > Adventure Works UDM
2. Right-click > Process; show how you can change the process options in the drop-down box
3. Go to Change Settings
   a. Parallel: tasks run in parallel (how many processors and how much memory have you got?) and all objects are processed as a single transaction (if the dimensions succeed and the cube fails, it all rolls back)
   b. Sequential: dimensions and cubes can be done as separate transactions
   c. Writeback table option: whether or not writeback is enabled, so you can create or modify your cube without going to the data source
   d. Process Affected Objects: e.g. the dimensions for the cube?
   e. Dimension Key Errors: by default an error rolls back the transaction; you might want to handle duplicate key values or Key Not Found (a fact record has a product id that does not exist in the Product dimension)
4. Show the Script button at the top, which generates an XMLA script for you

Batch processing
  You can process several objects at once, in parallel or in sequence, and you can control the order
  Use Ctrl-click to select multiple objects in SQL Server Management Studio, then right-click and choose Process
  Use Ctrl-click to select multiple objects in Solution Explorer in Visual Studio, then right-click and choose Process
  Use XMLA scripts or SQL Server Agent to automate batch processing
  To create an XMLA script, use SQL Server Management Studio: right-click > Script
  Use SSIS tasks to do batch processing

Demo SSIS Project
1. Open BIDS, create a new SSIS project
2. Point out the two SSAS objects
   a. Analysis Services Processing Task: specify the SSAS connection and one or more objects to process; same processing options as in SSMS
   b. Analysis Services Execute DDL Task: can execute XMLA scripts
Logging
  Logging can be enabled on each instance of SSAS
  Five logs
    Error: records errors configured and raised during processing and other operations
    Flight Recorder: short-term log that tracks activity on the Analysis Services instance; used for troubleshooting. Has high overhead, so only enable it in production when troubleshooting is required
    Query: records statistical information about queries running on the instance; good for usage-based aggregation design
    Exception: should only be used with guidance from Microsoft Support
    Trace: should only be used with guidance from Microsoft Support

Demo Logging Properties & Query Log
1. Go to SQL Server Management Studio, connect to Analysis Services
2. Select NY-SQL-01, right-click > Properties
3. Show the different logs listed
4. Go to Log\QueryLog\QueryLogConnectionString, set the value: create a connection to the relational database AdventureWorksDW
5. Set Log\QueryLog\QueryLogTableName = OLAPQueryLog
6. Set Log\QueryLog\CreateQueryLogTable = True
7. The server logs 1 in 10 queries by default; change Log\QueryLog\QueryLogSampling = 1 to log every query
8. Click OK to save the settings
9. Restart Analysis Services to pick up the log setting changes
10. Connect to the Database Engine on NY-SQL-01, AdventureWorksDW; show that the OLAPQueryLog table was created
11. Go to the cube and execute a New Query:
      SELECT [Measures].[Total Product Cost] ON COLUMNS
      FROM [Adventure Works UDM]
12. Go to AdventureWorksDW and execute the query:
      SELECT * FROM olapquerylog
Now you can go to BIDS, open the Aggregations tab and choose Usage Based Optimization from the toolbar to design aggregations based on the queries that have run!
Monitoring with SQL Server Profiler
  SQL Server Profiler has events, event classes and event categories to test the functionality and performance of MDX queries.
  You can capture traces in production and replay them in a test environment to test and optimize.

Demo SQL Server Profiler
1. Start > SQL Server > Performance Tools > SQL Server Profiler
2. File > New Trace; connect to Analysis Services on NY-SQL-01
3. Go to Events; show the different events you can trace

Monitoring with System Monitor
  System Monitor includes counters for SQL Server 2008
  Object names start with MSAS 2008
  Before optimizing you may want to restart or clear the cache, so you only have the statistics in which you are interested. The XMLA script on the slide will clear the cache

Demo System Monitor
1. Start > All Programs > Administrative Tools > Performance
2. Right-click > Add Counter; show the MSAS counters

Optimization Suggestions
  The Usage Based Optimization Wizard reads the information in the query log
  Make sure the query log is representative of usage patterns
  You can filter the query log based on a date range, users or frequency of queries
  Make sure you have correct record counts and the proper attribute relationships, so SSAS can design the aggregations

Demo Usage Based Optimization Wizard
1. In SQL Server Management Studio
2. Expand an OLAP database
3. Expand Measures
4. Expand a Measure
5. Select a partition, right-click, choose Usage Based Optimization Wizard
6. OR you can launch it from the Aggregations tab in BIDS (Usage Based Aggregation)

Sample Perfmon counters

Object | Counter | Description
MSAS 2005: Memory | Memory Limit High KB | Shows (as a percentage) the high memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini
MSAS 2005: Memory | Memory Limit Low KB | Shows (as a percentage) the low memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini
MSAS 2005: Memory | Memory Usage KB | Displays the memory usage of the server process
MSAS 2005: Memory | File Store KB | Displays the amount of memory that is reserved for the cache. Note: if the total memory limit in msmdsrv.ini is set to 0, no memory is reserved for the cache
MSAS 2005: Storage Engine Query | Queries from Cache Direct / sec | Displays the rate of queries answered from the cache directly
MSAS 2005: Storage Engine Query | Queries from Cache Filtered / sec |
MSAS 2005: Storage Engine Query | Queries from File / sec |
MSAS 2005: Storage Engine Query | Average time / query |
MSAS 2005: Connection | Current connections | Displays the number of connections against the SSAS instance
MSAS 2005: Connection | Requests / sec | Displays the rate of query requests per second
MSAS 2005: Locks | Current Lock Waits | Displays the number of connections waiting on a lock
MSAS 2005: Threads | Query Pool job queue length | The number of queries in the job queue
MSAS 2005: Proc Aggregations | Temp file bytes written / sec |
MSAS 2005: Proc Aggregations | Temp file rows written / sec | Shows the number of bytes of data processed in a temporary file
Backup and Restore
  You need to back up Analysis Services and your relational database
  An Analysis Services backup will back up
    MOLAP: multidimensional structure, aggregates and data
    HOLAP: multidimensional structure and aggregates (crucial to back up the relational database where the data is stored)
    ROLAP: multidimensional structure (crucial to back up the relational database where the data is stored)

Security considerations
  The user performing the backup or restore needs appropriate file system permissions to write to the backup location, and must be a member of the Analysis Services server role or have Full Control permissions on the database being backed up
  You can encrypt a backup with a password

How to Back Up an Analysis Services Database
  In SQL Server Management Studio, select the database in Object Explorer, right-click > Backup
  Use an XMLA script (example in course book)
  Create a SQL Server Agent job: job step type SQL Server Analysis Services Command, and specify the XMLA backup script

Backup options
  Backup file name: name of the backup file
  Database name: name of the database to back up
  Allow overwrite: overwrite existing backup files?
  Apply compression: compress backup files; less space but slower to execute
  Encrypt backup file: encrypts the backup so it cannot be restored without the password
  Password: password for the encrypted backup file
  Backup remote partition: whether to back up data from remote partitions
  Remote partition backup location
  Security: how to back up roles and permissions (only available in XMLA backups)

Restoring an SSAS Database
  In SQL Server Management Studio, select the database in Object Explorer, right-click > Restore
  Use an XMLA script (example in course book)

Restore options
  Restore database: name of the database to restore
  From backup file: location of the backup file to use when restoring
  Allow database overwrite: replace the existing database?
  Include security information: whether to copy security information
  Password: to decrypt encrypted backups
Chapter 9 Data Mining
Lecture: 70 mins
Lab: 45 mins (p9-18 Ex 1 Create Data Mining Structure, Ex 2 Add Data Mining Model, Ex 3 Explore Data Mining Model, Ex 4 Validate Data Mining Model)

What is Data Mining?
  Data mining is the process of searching through data to extract patterns and trends using various algorithms
  Use data mining to predict unknown values based on statistics and patterns in previous data
  Use data mining to:
    Explore data: find out the profile of users who bought a product
    Find patterns: find out what types of products a particular customer purchases
    Predict: predict sales for the next quarter

Cubes vs Data Mining
  Cubes show us aggregations (total sales)
  Data mining shows us the patterns in the data (products frequently purchased together)
  Cubes show us what has happened. Data mining can forecast the future

Data Mining Structure: made up of case tables and one or more data mining models. When you create a mining model, SSAS retrieves the data from the data source and stores it in a proprietary format. To avoid storing the same data multiple times, a mining structure is created so the models can share the same data.
Case Table: stores the source data for training data mining models. This data is used to train the models, so it should be accurate, relevant and plentiful
Data Mining Model: defines which data mining algorithm to use, which columns the algorithm uses, and whether each column is an input column, a key column or a predictable column
  Key Columns: identify the row and are usually the primary key of the table (e.g. Customer Key)
  Input Columns: factors that might affect the output (e.g. Age, Marital Status, Gender)
  Predictable Columns: what we want the model to try to predict (e.g. Sales Total, Bike Buyer y/n). A predictable column is often an input column as well
  Ignore Columns: some columns should be ignored by the model (e.g. Name)

Sources
  Cubes (must be in the same database as the data mining model)
  Relational databases (must define a Data Source View)

Training the model
  To train the model you process it in SSAS. Training the model involves loading it with data and executing the mathematical algorithm associated with it to derive useful patterns and rules from the input data. These patterns and rules are then stored on the server.
  When a model is trained it must read its input data from a single table called a case table. If the data you want to analyze is in two tables, you can nest the tables, e.g. the Product table could be a nested table of the Product Purchased column in the CustomerOrders table
  SSAS only supports one level of nesting! So you must design your data tables accordingly

Validating Models
  You have different models you can use to analyze data, so you can try each one and see which is most accurate. You define one data mining structure which you use for multiple data mining models, much like we define one data source view to use for one or more cubes. Then you use lift charts to compare results and determine which model is the most accurate for the given data.

Steps in developing and deploying a Data Mining Solution
1. Define mining domain: gather requirements. What problem are you trying to solve: customer profiling? Sales forecasting? Determine what data is needed
2. Prepare data: put the data into a source data format appropriate for data mining
3. Construct data schema: data source view and data mining structure
4. Build model: identify and build models using the Data Mining Wizard
5. Explore model: Mining Viewers
6. Validate model: Mining Accuracy Chart
7. Deploy model: BI Studio or SQL Server Management Studio

Data mining algorithms (SQL Server 2000 supported only Decision Trees and Clustering)
  Decision Trees: predicts the probability of each state of the input based on each state of the predictable column, e.g. try to see if a marketing approach will be successful with customers
  Time Series: predicts future continuous values based on historical continuous values, e.g. future sales of bikes based on sales in the past 3 years
  Clustering: groups cases into clusters with similar characteristics. Use it to assess the probability of a point entering a cluster or to assign a data point to a cluster, e.g. will this person earn $20-30K a year or $30-60K a year
  Association Rules: looks for items that occur together in a transaction (helpful for finding cross-selling opportunities, e.g. most people who took SQL Server also took Windows Server)
  Sequence Clustering: clusters together cases with similar sequences. Useful for predicting patterns like the paths users take on a web site
  Neural Network: similar to decision trees; calculates probabilities for each possible state of the input attribute with each state of the predictable attribute (e.g. predict stock movements)
  Naive Bayes: used for predictive modeling. Assumes all columns are independent; runs faster, but might not detect all correlations. Use for tasks like finding which customer is most likely to buy a product
  Linear Regression: relates two continuous columns. Can predict values outside the existing range
  Logistic Regression: similar to linear regression, but constrained to the values the output column can contain. Linear regression might suggest something reaching 110%; logistic regression would never exceed 100%
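The 110% versus 100% remark about linear and logistic regression above can be made concrete with a short Python sketch. The slope and intercept are made-up numbers chosen only for illustration, and these are the textbook formulas, not the internals of the Microsoft algorithms.

```python
import math


def linear(x, slope, intercept):
    """Straight-line prediction; nothing stops it leaving the 0..1 range."""
    return slope * x + intercept


def logistic(x, slope, intercept):
    """Same linear score squashed through the logistic (sigmoid) function,
    so the result always stays between 0 and 1 (0% to 100%)."""
    return 1.0 / (1.0 + math.exp(-(slope * x + intercept)))


# Made-up coefficients: x could be, say, years as a customer.
SLOPE, INTERCEPT = 0.1, 0.2

for x in (0, 5, 9):
    print(x, round(linear(x, SLOPE, INTERCEPT), 3), round(logistic(x, SLOPE, INTERCEPT), 3))
```

At x = 9 the straight line predicts about 1.1 (110%), while the logistic curve stays below 1.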
Data Mining Tools
  Data Mining Wizard: to create data mining structures
  Data Mining Designer: to configure the structure, add new data mining models, train models and create predictions

Create a Data Mining Structure
1. Open Visual Studio. Open E:\MOD09\Democode\Adventure Works DM Folder\Adventure Works DM.sln
2. You will need to upgrade it to 2008, and you will need to edit the data source because it currently points to MIAMI AdventureWorksDW instead of NY-SQL01 AdventureWorksDW2008
3. Labfiles\Starter\AdventureWorksDataMining.sln
4. Show the Data Source View containing V-TargetMail
5. In Solution Explorer, click Data Mining Structure, right-click > New Data Mining Structure
6. This launches the Data Mining Wizard
7. Select create structure from a relational database
8. Select your data mining algorithm (Decision Tree)
9. Select the Data Source View to use for the model: AdventureWorksDM_DSV
10. Select the case table to use (v_targetMail)
11. Select the key column (the key value that identifies the different rows): Customer Key
12. Select the predictable column (the column whose value you want to try to predict): BikeBuyer
13. Try Suggest for input columns to see which columns have patterns for the predictable column (Bike Buyer, Age, CommuteDistance, MaritalStatus)
14. Accept the default column Content and Data Type (discrete columns have a fixed set of values, e.g. gender, province; continuous columns can have any value, e.g. salary)
15. Enter a name for the mining structure and model; allow drill-through
16. Process the model
Demo Data Mining Designer
1. Mining Structure tab: to change properties, columns or the data mining model (e.g. change a column property to discrete or continuous)
2. Mining Models tab: to add or modify the algorithms used. Add a Clustering algorithm, then reprocess/train the new model
3. Mining Model Viewer: to explore the model data; each algorithm has its own viewer format (do not bring up more tabs for the cluster view, it will probably hang!)
   a. Show the decision tree model; darker boxes have more buyers
   b. Change the level to show how only the more important dependencies are listed
   c. Right-click on a box and choose drill-through to see the records for this box
   d. Click on the Dependency Network tab of the Mining Model Viewer and move the slider down; it drops off the less significant dependencies, so you can see which factors had the largest influence on the predictable column
   e. Show the Cluster Diagram
4. Mining Accuracy Chart: to test the accuracy of a mining model, or compare the accuracy of several mining models with lift charts
   a. Select v_targetMail as the case table
   b. Click on Lift Chart; it can't chart the models, but the scores for the models are listed at the bottom of the chart
   c. Lift charts show a line for each prediction: the Ideal Model line is the actual data, the Random Guess line is the result without using a model, and the other lines are for each mining model in the data mining structure, so you can see which is closest to the Ideal Model line
   d. The Classification Matrix tells you numerically how far off the model was. Cells where the row and column match are the correct predictions; the other cells are the model's errors
5. Mining Model Prediction: to specify input tables, map columns in these tables to the inputs of mining models, and display the results of the prediction on your data in different formats. Also use the Prediction Query Builder to write DMX (Data Mining Extensions) queries
   a. Select Singleton Query; enter Age 35, Commute Distance 1-2 Miles, Marital Status S
   b. In the prediction query choose Prediction Function, Field PredictProbability, Criteria [Bike Buyer],1
   c. Choose the Switch to Query Result View button to see the calculated probability
   d. Click Switch to SQL View to show the DMX query that was created:
        SELECT PredictProbability([Bike Buyer],1)
        FROM [mm_DecisionTree_bikeBuyers]
        NATURAL PREDICTION JOIN
        (SELECT 85 AS [Age], '10+ Miles' AS [Commute Distance], 'M' AS [Marital Status]) AS t
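The Classification Matrix from the Mining Accuracy Chart step above can be sketched in a few lines of Python. The actual and predicted Bike Buyer values below are invented for illustration, and this is a generic count of (predicted, actual) pairs rather than the exact layout BIDS displays.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Count how often each (predicted, actual) pair occurs.

    Pairs where predicted == actual are correct predictions;
    every other pair is an error of the model.
    """
    return Counter(zip(predicted, actual))


# Invented test cases: 1 = bike buyer, 0 = not a bike buyer.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = classification_matrix(actual, predicted)
correct = sum(n for (p, a), n in matrix.items() if p == a)
errors = sum(n for (p, a), n in matrix.items() if p != a)

for (p, a), n in sorted(matrix.items()):
    print(f"predicted={p} actual={a} count={n}")
print("correct:", correct, "errors:", errors)
```

Comparing the correct and error counts for each mining model is the numeric counterpart of looking at which line sits closest to the Ideal Model line in the lift chart.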
For reference only; do not show to students.

DMX
  SELECT <expression_list>
  FROM <mining_model>
  [NATURAL] PREDICTION JOIN <source_data> AS <alias>
  ON <column_mappings>

  <expression_list>: the predictable columns from the model that will be retrieved (e.g. Customer Age, Education Level, Occupation)
  PREDICTION JOIN ... ON: joins the mining model to the source data
  If the names in the mining model match the names in the source data you can use NATURAL PREDICTION JOIN
  <source_data>: the input dataset, which could be
    OPENQUERY or OPENROWSET that fetches data from a relational source
    A SELECT statement
    Another DMX query
    An application rowset such as an ADO.NET DataReader

e.g. find out the probability that a particular customer falls into a certain group and is therefore likely to buy the new bike
  SELECT ClusterProbability('Cluster B')
  FROM [CustomerProfilingMC]
  NATURAL PREDICTION JOIN
  (SELECT 35 AS [Age], 'Professional' AS [Occupation], 80000 AS [Yearly Income]) AS t
Getting Started Data Mining
1. What is the business trying to do? What is their product? Their goal? Understand the business first
2. Understand the data. Where is the data? Bring it together. Clean it up. If half the records say ON and the other half say Ontario, the data mining algorithm won't know that is the same thing. Create calculated columns, like BikeBuyer, to help predictive analysis. Break the data up into two sets: about 2/3 to train the model and 1/3 to test the model.
3. Identify attributes of the data, e.g. for a customer: age, income, number of children. Which attributes are discrete (fixed number of values, like Gender) vs continuous (can be anywhere in a range, like age or salary)?
4. Select the right algorithm (if you aren't sure, you can validate after the fact)
5. Choose a data set to data mine (usually 2/3 of the data, with the remaining 1/3 saved to validate the model). The algorithm will analyze the data and create a data mining model.
6. Use your model to make predictions, or show patterns

Examples of real world data mining
  Fraud detection for credit card transactions
  Amazon & Chapters: people who bought this book also bought...
  Professional sports: to identify which players were successful in what situations (which players playing together improved the team)

Data Mining Algorithms in SQL Server 2008
  Classification: predict one or more discrete variables based on attributes of the input data. For example, predict if the child will be a boy or girl, how many cars a family will purchase, or whether someone will default on a loan
    Microsoft Decision Trees
    Naive Bayes
    Neural Network
  Regression: similar to classification, but predicts continuous instead of discrete variables. For example, predict salary, age, or the sale price of a house. At least one attribute of the input data should be continuous as well (e.g. purchase price is a continuous input attribute for sale price)
    Microsoft Regression Trees
    Time Series
    Linear Regression
    Logistic Regression
  Segmentation: very popular; segmenting by input attributes. Who is buying bikes: people under 35? Women? People who live < 5 km from work? Helps target customers
    Clustering
  Sequence Analysis: grouping input data by a sequence of operations. What web pages are people going to, and in what order, based on input attributes (target internet advertising: people under 30 go this way, over 30 go this way)
    Sequence Clustering
  Association: which data values are associated. Which products are purchased together. Helps supermarkets figure out what products to put on the same shelf.
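The 2/3 training, 1/3 testing split recommended under Getting Started can be sketched as follows. This is a generic Python illustration with placeholder case ids; the function name, the fixed seed and the shuffling approach are assumptions made for the example, not how SSAS partitions data.

```python
import random


def split_cases(cases, train_fraction=2 / 3, seed=0):
    """Shuffle the cases and split them into (training, testing) lists.

    A fixed seed keeps the split repeatable, so every model is trained
    and validated on the same rows.
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder case ids standing in for rows of the case table.
cases = list(range(1, 31))
train, test = split_cases(cases)

print(len(train), "cases to train the model")
print(len(test), "cases held back to validate it")
```

Shuffling before the cut matters when the source table is ordered (for example by date or region), because otherwise the test set would not look like the training set.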