Eme
EME is an object-oriented data storage system that version-controls and manages the various kinds of information associated with Ab Initio applications, ranging from design information to operational data. In simple terms, it is a repository that contains data about data, that is, metadata.
Project
A Project is a collection of related graphs and their associated elements, such as dml and xfr files, in the EME datastore.
Project structure
Typically, a Project should contain at most 5 to 10 graphs. This helps organize the code efficiently within the EME. As the number of graphs in a Project increases, so does the time taken to perform dependency analysis on the graphs and related data.
Before adding a Project to an existing application, which already has a number of Projects in place, the impact it might have on other Projects and on the Application as a whole must be considered.
A public Project is one whose data and metadata are expected to be shared with other Projects. A private Project is one whose data and metadata are not expected to be shared with other Projects.
Select the Project, directory, or file you want to check out by browsing to it. In the sandbox host drop-down list, select the host on which the sandbox resides. In the directory field, enter the path to an existing sandbox (which must be associated with the Project being checked out) or the path of a new sandbox, which is created during check out. Click the advanced button to open the advanced options dialog.
The first two options specify whether to check out the required files from the parent Project and whether to check out the required files from the common Projects. By default, the required files from the parent Project are checked out. A file is required if it is directly referenced in a graph or if it is referenced in an include in a dml or xfr. When a whole Project is checked out, these two options are disabled. The Run host setup script option runs the host profile setup script before check out, and the Mark files read only on check out option does exactly what it says. Both of these options are on by default. To check out a particular tagged version of the object, select it from the tag drop-down list. By default, the latest version is checked out.
On clicking next, if the sandbox does not exist, you are asked to confirm whether to create it. Clicking yes creates the sandbox and checks out the specified object to it. You are prompted to enter the sandbox locations of stdenv and any common Projects associated with the Project, unless the sandbox already has these values specified or is a pre-existing one.
Clicking on Do Check out performs the checkout operation and on its completion a window shows the operations performed.
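The rule for required files described above (a file is required if a graph references it directly, or if a dml or xfr pulls it in through an include) can be sketched in Python. This is only a toy model and not the EME's own algorithm: the function name required_files, the input dictionaries, and the file names are all hypothetical, and following includes transitively is an assumption that the text does not confirm.

```python
from collections import deque


def required_files(graph_refs, includes):
    """Return the set of files a graph needs under the rule described above.

    graph_refs: files directly referenced by the graph (hypothetical input).
    includes:   maps a dml/xfr file to the files it includes (hypothetical input).
    Assumption: includes are followed transitively.
    """
    required = set(graph_refs)
    queue = deque(graph_refs)
    while queue:
        current = queue.popleft()
        for included in includes.get(current, []):
            if included not in required:  # also protects against include cycles
                required.add(included)
                queue.append(included)
    return required


# Hypothetical include relationships between dml and xfr files.
includes = {
    "dml/customer.dml": ["dml/common_types.dml"],
    "xfr/cleanse.xfr": ["xfr/string_utils.xfr"],
}
needed = required_files(["dml/customer.dml", "xfr/cleanse.xfr"], includes)
print(sorted(needed))
```

Files that no graph references, directly or through an include, never enter the result, which mirrors why they would not be checked out as required files.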
Locking
After a successful checkout, a lock must be acquired on any object that is to be modified in the sandbox. To modify a checked-out graph, first open it in the GDE and then click the lock symbol on the menu. This checks whether the version in the sandbox is the latest version of the object in the datastore; if it is, the lock symbol turns green, showing that the graph is now locked and editable. If the graph has already been locked in another sandbox, the lock is shown in red when the graph is opened in the GDE, denoting that there is already a lock on it. A lock can be acquired on an object only if the sandbox version and the current version of the object in the EME are the same. Once a lock is acquired and the changes are complete, the object must be checked in to create a new version in the datastore. Non-Ab Initio objects, which cannot be locked from the GDE, can be locked from the Unix command line using the air commands.
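The locking rule can be modelled in a few lines of Python. This is a sketch under stated assumptions, not how the EME or the GDE is implemented: the names StoredObject and try_lock are hypothetical, the "stale" result is an invented label for the case where the sandbox copy is not the latest version, and the order in which the two conditions are checked is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredObject:
    """An object in the datastore (toy model)."""
    latest_version: int
    locked_by: Optional[str] = None  # sandbox currently holding the lock, if any


def try_lock(obj: StoredObject, sandbox: str, sandbox_version: int) -> str:
    """Return 'green' if the lock is granted, 'red' if another sandbox
    already holds it, and 'stale' if the sandbox copy is not the latest."""
    if obj.locked_by is not None and obj.locked_by != sandbox:
        return "red"
    if sandbox_version != obj.latest_version:
        return "stale"
    obj.locked_by = sandbox
    return "green"


graph = StoredObject(latest_version=3)
print(try_lock(graph, "sandbox_a", 3))  # green: latest version, no other lock
print(try_lock(graph, "sandbox_b", 3))  # red: sandbox_a already holds the lock
print(try_lock(StoredObject(latest_version=5), "sandbox_a", 4))  # stale
```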
Choose the sandbox host from the drop-down list. In the Directory or file field, browse to the file in the sandbox that you want to check in. You may select a single file under the sandbox or the whole sandbox, in which case the whole Project is checked into the EME datastore. In the Project Directory field, browse to the parent Project, that is, the Project directory in the EME datastore into which the object is checked in. To open the advanced options for check in, click the advanced button. The checkin tab indicates how the checkin is to be performed. By default, Force overwrite is unchecked. When it is checked, the object is checked in even if there are conflicts and becomes the latest version in the datastore. Run Host Setup script runs the host profile setup script before each checkin. It is advised not to change any of these settings.
The analysis tab specifies how much dependency analysis is done and on which objects during check in.
A tag, which is a descriptive piece of text, and a comment can be attached to the version that will be checked in. Both can be entered in the tag tab of the advanced options dialog box. The tagging standards are described in another document. After the tag information is filled in, clicking next in the check in wizard displays a check in ready dialog.
Clicking on Do Checkin performs the actual check in and displays a window similar to the check out finished window with the results of check in and dependency analysis (if specified in the advanced option).
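The relationship between check in, versions, tags, and Force overwrite can be illustrated with a small Python model. It is a hedged sketch, not the EME's behaviour in detail: the class ToyDatastore and its methods are hypothetical, and treating "the sandbox copy was based on an older version" as the only kind of conflict is an assumption.

```python
class ToyDatastore:
    """Toy version store for a single object."""

    def __init__(self):
        self.versions = []  # versions[i] holds the content of version i + 1
        self.tags = {}      # tag name -> version number

    def check_in(self, content, base_version, tag=None, force_overwrite=False):
        """Create a new version; refuse on conflict unless force_overwrite is set."""
        latest = len(self.versions)
        if base_version != latest and not force_overwrite:
            raise ValueError(
                f"conflict: sandbox is based on version {base_version}, latest is {latest}"
            )
        self.versions.append(content)
        new_version = len(self.versions)
        if tag is not None:
            self.tags[tag] = new_version
        return new_version

    def check_out(self, tag=None):
        """Return (version, content) for a tag, or for the latest version by default."""
        if not self.versions:
            raise LookupError("nothing has been checked in yet")
        version = self.tags[tag] if tag is not None else len(self.versions)
        return version, self.versions[version - 1]


store = ToyDatastore()
store.check_in("graph, first draft", base_version=0, tag="release_1")
store.check_in("graph, second draft", base_version=1)
print(store.check_out())                 # latest version by default
print(store.check_out(tag="release_1"))  # a particular tagged version
```

Checking in from a sandbox still based on version 1 would now raise a conflict, while passing force_overwrite=True would store the content as version 3 regardless.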
Parameters
A parameter is a name-value pair with some additional attributes that determine when and how to interpret or resolve its value. Parameters are used to give logical names to physical locations and should always be used instead of hardcoded paths in graphs. There are two types of parameters: graph parameters and Project parameters.
Graph parameters
Graph parameters, as the name suggests, are specific to individual graphs and are private to them. They affect the execution of the graph for which they are defined. Graph parameters can be defined by navigating to Edit>Parameters in the GDE, which opens the graph parameter editor.
Project parameters
Project parameters are inherited by all the graphs in the Project and are accessed from the GDE through the sandbox parameter editor, under Project>Edit Sandbox>Parameters. This shows a dialog box prompting for the sandbox path. Choose the correct host and sandbox path and press OK to open the sandbox parameter editor, which looks exactly like the graph parameter editor described above.
Private Value: If a parameter is specified as a private value, any subsequent changes to it remain private to the local sandbox and are not checked into the EME. This is useful when different users want different values for the same parameter.
Value: This column specifies the value of the parameter.
Interpretation: This determines how the parameter is evaluated. It can be one of the following:
Constant: The value is taken literally.
$ Substitution: Variables with a $ prefix are replaced with their values.
${} Substitution: Variables enclosed in {} and prefixed with $ are replaced with their values, but other occurrences of $ are ignored.
Shell: Korn shell syntax is used to evaluate the value of the parameter.
Required: This attribute can take two values, required (the default) or optional. If the parameter is required, the value column cannot be left blank; if it is optional, it can be left blank.
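The difference between the Constant, $ Substitution, and ${} Substitution interpretations can be sketched in Python. This is an illustrative model only, not the Ab Initio parameter resolver: the function resolve, the interpretation labels, and the variable names DATA_DIR and USD are hypothetical. It also assumes that $ Substitution accepts both the $NAME and ${NAME} forms, and it leaves out the Shell interpretation entirely.

```python
import re


def resolve(value, interpretation, variables):
    """Resolve a parameter value under one of three interpretations (toy model)."""
    if interpretation == "constant":
        return value  # taken literally, nothing is replaced
    if interpretation == "dollar":
        # Assumption: both $NAME and ${NAME} are replaced.
        pattern = r"\$(?:\{(\w+)\}|(\w+))"
    elif interpretation == "dollar_brace":
        # Only ${NAME} is replaced; a bare $NAME is left alone.
        pattern = r"\$\{(\w+)\}"
    else:
        raise ValueError(f"unknown interpretation: {interpretation}")

    def replace(match):
        name = next(group for group in match.groups() if group is not None)
        return variables[name]  # raises KeyError for an undefined variable

    return re.sub(pattern, replace, value)


variables = {"DATA_DIR": "/data/serial"}
print(resolve("$DATA_DIR/in.dat", "constant", variables))
print(resolve("$DATA_DIR/in.dat", "dollar", variables))
print(resolve("${DATA_DIR}/cost_$USD.dat", "dollar_brace", variables))
```

In the last call the braces mark the only variable to be replaced, so the literal text $USD survives in the result, which is the case the ${} form exists for.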
SESSION I (Day 1)
Introduction to Ab-Initio
What is Ab Initio?; Applications of Ab Initio; Architecture; Co>Operating system; Types of Development; GDE; Co>Op system Configuration; Sandbox Environment; Graph; Component Properties; Attribute Editor; Graph Properties; View Data Panel; Expression Editor
Type Reference; Key Specifier Reference; Expression Reference; Transform Reference; Package Reference; Function Reference; DML Utilities; DML Examples; Run SQL; Intermediate File; Lookup File; Concatenate; Gather; Interleave; Merge; Gather Logs; Redefine Format; Replicate; Filter by Expression; Join; Reformat; Rollup
Components
Metadata Management
Concepts Commands
Ab Initio is Latin for "From the Beginning". From the beginning, the software was designed to support a complete range of business applications, from the simple to the most complex. The graphical development environment and a powerful set of components allow customers to get valuable results from the beginning.
Moving Data: Move small and large volumes of data in an efficient manner. Deal with the complexity associated with business data. High performance. Scalable solutions. Better productivity.
Ab Initio software is a general-purpose data processing platform for mission-critical applications such as data warehousing, batch processing, click-stream analysis, data movement, and data transformation.
Computers come in many shapes and sizes: single-CPU and multi-CPU machines, networks of single-CPU computers, and networks of multi-CPU computers. Multi-CPU machines are often called SMPs (for Symmetric Multi Processors). Specifically built networks of machines are often called MPPs (for Massively Parallel Processors).
Distribution: A platform for applications to execute across a collection of processors within the confines of a single machine or across multiple machines.
Reduced Run Time Complexity: The ability for applications to run in parallel, from a single point of control, on any combination of computers where the Ab Initio Co>Operating System is installed.
Ab Initio software consists of two main programs:
Co>Operating System, which your system administrator installs on a host UNIX or Windows NT server, as well as on processing nodes. (The host is also referred to as the control node.)
Graphical Development Environment (GDE), which you install on your PC (client node) and configure to communicate with the host (control node).
Ab Initio Architecture
Co>Operating System
The Co>Operating System is a powerful engine for every kind of data processing. It delivers crucial facilities including distributed and parallel execution, platform-independent data transport, and process monitoring. The Co>Operating System delivers:
Unlimited scalability: In the ideal case, doubling the number of CPUs halves the execution time.
Flexibility: An open component model for extending and customizing Ab Initio's functionality.
Portability: The Co>Operating System runs heterogeneously across a huge variety of operating systems and hardware platforms, from OS/390 on mainframes, to 10 different implementations of Unix, to Windows NT and Windows 2000.
Its facilities are: parallel and distributed application execution; control; data transport; transactional semantics at the application level; checkpointing; monitoring and debugging; parallel file management; metadata-driven components.
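The idea of parallel execution behind the scalability claim can be sketched in Python as partition, process, gather. This is a generic illustration and not Ab Initio code: the function names are hypothetical, the doubling transform is a stand-in for real work, and a thread pool is used only to show the structure (it does not by itself demonstrate a CPU speed-up).

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(records, ways):
    """Deal records into `ways` partitions, one record at a time."""
    partitions = [[] for _ in range(ways)]
    for index, record in enumerate(records):
        partitions[index % ways].append(record)
    return partitions


def transform(partition):
    """Stand-in for the work one worker does on its own partition."""
    return [value * 2 for value in partition]


def run_parallel(records, ways):
    """Partition the records, transform each partition concurrently, then gather."""
    partitions = partition_round_robin(records, ways)
    with ThreadPoolExecutor(max_workers=ways) as pool:
        results = list(pool.map(transform, partitions))
    # Gather: concatenate the partition outputs; record order is not preserved.
    return [value for part in results for value in part]


print(sorted(run_parallel(list(range(8)), ways=4)))
```

Each worker handles roughly 1/ways of the records, which is where the ideal "double the CPUs, halve the time" figure comes from; partitioning and gathering overhead keeps real runs from reaching it exactly.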
GDE Layout
A Sandbox Environment
A sandbox is a collection of graphs and related files that are stored in a single directory tree and treated as a group for purposes of version control, navigation, and migration. Setting up a standard working environment helps a development team work together. The sandbox capability allows an application to be designed so that it is trivially portable.
Sandbox Parameters
Start the Ab Initio GDE.
Go to Repository>Edit Sandbox.
Sample Graph
Components
Components may run on any computer running the Co>Operating System. The Ab Initio component library contains a diverse set of built-in components. The particular work a component accomplishes depends on its parameter settings. Some components require a data transformation parameter, that is, a set of business rules to be applied to one or more inputs to produce the required output.
Datasets
A dataset is a source or destination of data. It can be a simple file, a database table, a SAS dataset, and so on. Datasets may reside on any machine running the Co>Operating System, or on other machines if connected by FTP or database middleware. Data within a dataset must always be exactly described using Ab Initio's Data Manipulation Language (DML) to form record format metadata.
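What it means to describe the data in a dataset exactly can be shown with a Python analogy. This is not DML and does not use DML syntax: the record layout, the field names, and the helper functions are all hypothetical, chosen only to show that a byte-level record format is what lets a reader recover the fields.

```python
import struct
from collections import namedtuple

# Hypothetical layout: 4-byte little-endian integer, 10-byte ASCII name, 8-byte float.
RECORD_FORMAT = struct.Struct("<i10sd")
Record = namedtuple("Record", ["customer_id", "name", "amount"])


def pack_record(record):
    """Serialize one record according to RECORD_FORMAT (name is space-padded)."""
    name_bytes = record.name.encode("ascii").ljust(10)
    return RECORD_FORMAT.pack(record.customer_id, name_bytes, record.amount)


def unpack_record(raw):
    """Read one record back using the same format description."""
    customer_id, name_bytes, amount = RECORD_FORMAT.unpack(raw)
    return Record(customer_id, name_bytes.decode("ascii").rstrip(), amount)


raw = pack_record(Record(42, "Ada", 19.5))
print(len(raw))            # size in bytes of one record
print(unpack_record(raw))  # the fields recovered from the raw bytes
```

Without the format description the 22 bytes are opaque; with it, any reader on any machine can split them into the same three fields.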