Digital Ad Platform
The ad platform will serve banner ads, poster ads, or text messages, depending on what the device supports.
Aim
It enables publishers, large and small, to benefit from the growth and economic power of the mobile web. Regardless of size or budget, a wide variety of businesses can achieve successful results with digital advertising. Digital ad platform technology helps advertisers understand and reach their audience most effectively so they can generate the most revenue. For advertising campaigns, ad serving technology makes it possible to serve the right ad at the right
The benefits to advertisers are: small companies gain brand recognition, advertisers reach targeted audiences, advertising costs are low, and advertisement content is delivered immediately. In addition, customers receive filtered ads. Digital marketing is inexpensive when examining the ratio of cost to the reach of the target audience. Companies can reach a wide audience for a small fraction of traditional advertising budgets. The nature of the medium allows consumers to research and to purchase products and services
Internet marketers also have the advantage of measuring statistics easily and inexpensively; almost all aspects of an Internet marketing campaign can be traced, measured, and tested. Advertisers can use a variety of pricing methods, such as pay per impression, pay per click, pay per play, and pay per action. Because results are measurable, marketers can determine which messages or offerings are more appealing to the audience. The results of campaigns can be measured and tracked immediately because online marketing initiatives usually require users
overview
General features
The platform should incorporate the latest technology, maintain consistency, be user-friendly and easy to use, be easy to implement, implement a high level of security, offer a common look and feel across browsers, and be quick.
Features in detail
- Setting up interactive ad campaigns according to the media plans finalized by the publishers for a particular client
- Generating tags to set up various types of ads, dynamic targeting, and optimization
- Optimizing underdelivering ad campaigns to maximize profits for the publisher and exposure for the advertiser
- Reporting and forecasting the availability of clicks and impressions to help publishers plan their marketing strategies
OBJECTIVE
REACH
This is essential in understanding how big your audience is, where they come from and whether you need to target specific markets, or can broadcast your links openly.
RELEVANCE
You could say that this is something of a buzz term, with lots of focus on gauging your audience correctly and trying to match their keywords or search terms with the right kind of advertising. You also need to make sure that your advertising runs long enough, but not too long, as well
RECOGNITION
We're talking about results: what is the return on investment (ROI) for the advertiser, what is the revenue potential for the publisher, and is everyone getting value?
It is recognized that this is just one of many potential scenarios for the digital advertising industry, but it is something to reflect upon. If you score highly in these three categories, then there's a good chance your campaigns will be a success.
Ad platform in detail
Ad operations have become the backbone in driving operational efficiency in the media industry. Managing Ad Operations is one of the most critical functions of the advertising world. Setting up media ad campaigns and placing them on the target publisher networks, ad tagging, optimizing ad campaigns, forecasting clicks, dynamic tracking and impression tracking are some of the key components of ad operations. Ad trafficking plays a pivotal role in effectively managing rich media ad campaigns. They aid coordination and communication between various
Ad operations at a glance
Generating high volumes of revenue through advertising is the sole aim of an advertiser. In digital marketing, advertising becomes critical because there is no dearth of competitors striving to win the same client you are bidding for. Along with transparency in how you advertise your organization, you need a strategy to increase the number of redirects to a site and thereby the number of people who actually see what you want them to see and know. The present system lacks such transparency.
Affiliate marketing works on the basic concept of revenue sharing, or commissions paid for referred business. The Internet and e-commerce have become an integral part of the global business plan of virtually every industry and, in some cases, have grown into a bigger business than the existing offline business. Blogging and interactive online communities have had a further impact on the world of affiliate marketing. This has led to a thrust in the affiliate marketing
Affiliates use different methods to advertise the products and services of their partners, such as organic SEO, paid SEM, e-mail marketing, and display advertising, to drive traffic to their partner's site. In turn, they are paid for the traffic or sales they generate, using models such as CPC (cost per click), CPM (cost per thousand impressions), or CPS (cost per sale). At times, less orthodox techniques are also used, such as publishing reviews of products or services offered by a partner.
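To make the pricing models concrete, the sketch below computes what one campaign would cost under CPC, CPM, and CPS. `PricingModels`, its methods, and the rates in `main` are hypothetical; it assumes CPM is billed per thousand impressions and CPS as a commission rate on the value of sales.

```java
// Hypothetical sketch: what one campaign costs under each pricing model.
final class PricingModels {
    /** CPC: a fixed rate is paid for every click. */
    static double costPerClick(long clicks, double ratePerClick) {
        return clicks * ratePerClick;
    }

    /** CPM: the rate is quoted per 1,000 impressions ("mille"). */
    static double costPerMille(long impressions, double ratePerThousand) {
        return impressions / 1000.0 * ratePerThousand;
    }

    /** CPS: a commission (a fraction of the sale value) is paid on sales. */
    static double costPerSale(double salesValue, double commissionRate) {
        return salesValue * commissionRate;
    }

    public static void main(String[] args) {
        // Illustrative campaign: 200,000 impressions, 1,000 clicks, 5,000 in sales.
        System.out.println("CPC: " + costPerClick(1_000, 0.25));
        System.out.println("CPM: " + costPerMille(200_000, 2.00));
        System.out.println("CPS: " + costPerSale(5_000.0, 0.08));
    }
}
```

Because each model bills a different event, the same campaign can cost quite different amounts depending on which model the advertiser and affiliate agree on.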
The Search Advertising segment, which is highly driven by metrics, has been one of the forces to reckon with in the whole advertising gamut. The strategy of utilizing search engines brings up an opportunity to present consumers with advertisements tailored to their immediate purchasing interests. This encourages consumers to click on search ads instead of unpaid search results.
Search Marketing uses search engine results as an advertising vehicle. This may include improving the Web site rank in organic listings, purchasing paid listings, or a combination of methods, all designed to increase visibility and clicks. Ads can be placed on Web pages that show the results of search engine queries as well as on Web pages that display published content. The ads placed using this form of interactive advertising are targeted to match the keywords entered on search engines.
Web Analytics is the measurement, collection, analysis, and reporting of Web traffic to understand and optimize Web usage. It comprises off-site and on-site Web analytics. Traditionally, Web analytics refers to on-site visitor measurement. However, in recent years the classification has become fuzzy, mainly because vendors are producing tools that span both categories.
With reporting and analytics tools, one is able to effectively track Web site hits, detailed page views, visitor session details (first/repeat/unique visitors), page impressions, singletons, bounce rates, exits, visibility time, session durations, and so on, to determine which aspects of the Web site work towards the business objectives. Web analytics is used to help a business attract more visitors, retain existing customers or attract new ones for its goods or services, or increase the revenue each customer spends.
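As an illustration of how some of these metrics are derived, the sketch below computes bounce rate, average session duration, and unique visitors from a list of sessions. `SessionMetrics` and its `Session` class are hypothetical; the sketch assumes sessions have already been reconstructed from the logs and that a "bounce" is a single-page session, which is a common definition.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a few of the session metrics listed above.
final class SessionMetrics {
    /** One visit: who made it, how many pages were viewed, and how long it lasted. */
    static final class Session {
        final String visitorId;
        final int pageViews;
        final long durationSeconds;

        Session(String visitorId, int pageViews, long durationSeconds) {
            this.visitorId = visitorId;
            this.pageViews = pageViews;
            this.durationSeconds = durationSeconds;
        }
    }

    /** Bounce rate: share of sessions that viewed exactly one page. */
    static double bounceRate(List<Session> sessions) {
        if (sessions.isEmpty()) return 0.0;
        long bounces = sessions.stream().filter(s -> s.pageViews == 1).count();
        return (double) bounces / sessions.size();
    }

    /** Average session duration in seconds (0 if there are no sessions). */
    static double averageDuration(List<Session> sessions) {
        return sessions.stream().mapToLong(s -> s.durationSeconds).average().orElse(0.0);
    }

    /** Number of distinct visitors across all sessions. */
    static int uniqueVisitors(List<Session> sessions) {
        Set<String> ids = new HashSet<>();
        for (Session s : sessions) ids.add(s.visitorId);
        return ids.size();
    }
}
```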
Reporting involves consolidating the data collected from log files, which often runs into terabytes, and performing ETL (Extract-Transform-Load) operations on it. The first part of an ETL process involves extracting the data from the source systems. The transform phase applies a series of rules or functions to the data extracted from the source to derive the data for loading into the end target. The load phase loads the data into the end target, usually the data warehouse (DW), and converts it to a format that can be further
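A minimal sketch of the three ETL phases applied to ad-server logs follows. The log format `timestamp,adId,event`, the class `AdLogEtl`, and the in-memory map standing in for the data warehouse are all assumptions for illustration; a real pipeline would read log files and write to the DW.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of Extract, Transform, and Load over ad-server log lines.
// Assumed log format: "timestamp,adId,event", where event is "impression" or "click".
final class AdLogEtl {
    static final class Event {
        final String adId;
        final String type;

        Event(String adId, String type) {
            this.adId = adId;
            this.type = type;
        }
    }

    /** Extract: parse raw lines, skipping any that do not have exactly three fields. */
    static List<Event> extract(List<String> lines) {
        List<Event> events = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(",");
            if (f.length == 3) events.add(new Event(f[1].trim(), f[2].trim().toLowerCase(Locale.ROOT)));
        }
        return events;
    }

    /** Transform: keep known event types and count them per ad, e.g. "ad1:click" -> 2. */
    static Map<String, Integer> transform(List<Event> events) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Event e : events) {
            if (e.type.equals("impression") || e.type.equals("click")) {
                counts.merge(e.adId + ":" + e.type, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Load: add the aggregates to the target table (an in-memory map standing in for the DW). */
    static void load(Map<String, Integer> aggregates, Map<String, Integer> warehouse) {
        aggregates.forEach((key, n) -> warehouse.merge(key, n, Integer::sum));
    }
}
```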
Class diagram
System activity
Use case number:
Application:
Use case name:
Use case description: This use case diagram shows what a publisher can do.
Primary actor: Publisher
Precondition: For actions other than login, the publisher must be successfully logged in to the system.
Trigger:
Basic flow: The publisher first creates an account, after which they may add the websites on which they would like to show ads. They may also select filters for the type of ads that may be displayed, and select a scheme from the different options available.
Alternate flows
Use case number: 2
Application: Ad platform front end
Use case name: Advertiser interaction with ad platform
Use case description: This use case diagram shows what an advertiser can do.
Primary actor: Advertiser
Precondition: For actions other than login, the advertiser must be successfully logged in to the system.
Trigger:
Basic flow: The advertiser first creates an account, after which they may add their ads (text or images). They may also select preferences for how their ads are displayed, and select a scheme from the different options available, plus the output digital device.
Alternate flows
Sequence diagrams
Collaboration diagram
Functional requirements
The ads served to people should satisfy the constraints given by the publishers and advertisers. The ad to be displayed should depend on many features, such as user characteristics and mobile device characteristics. All queries and responses should be stored for future reference. Only ads that the digital device supports should be shown.
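These functional requirements can be sketched as a filter that checks device support, publisher constraints, and advertiser targeting, and records each query. `AdSelector`, its fields, and the particular constraints chosen (ad format, blocked category, target device) are hypothetical; the platform's actual constraint model is not specified here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the ad-selection requirements; all names are illustrative.
final class AdSelector {
    static final class Ad {
        final String id;
        final String format;              // e.g. "banner", "poster", "text"
        final String category;            // checked against publisher filters
        final Set<String> targetDevices;  // advertiser preference, e.g. "mobile"

        Ad(String id, String format, String category, Set<String> targetDevices) {
            this.id = id;
            this.format = format;
            this.category = category;
            this.targetDevices = targetDevices;
        }
    }

    static final class Request {
        final String deviceType;              // e.g. "mobile", "desktop"
        final Set<String> supportedFormats;   // ad formats the device can render
        final Set<String> blockedCategories;  // publisher's filters

        Request(String deviceType, Set<String> supportedFormats, Set<String> blockedCategories) {
            this.deviceType = deviceType;
            this.supportedFormats = supportedFormats;
            this.blockedCategories = blockedCategories;
        }
    }

    /** Every query and the ads returned for it, kept for future reference (in memory here). */
    final List<String> log = new ArrayList<>();

    List<Ad> select(Request req, List<Ad> inventory) {
        List<Ad> chosen = new ArrayList<>();
        for (Ad ad : inventory) {
            boolean deviceOk = req.supportedFormats.contains(ad.format);         // device support
            boolean publisherOk = !req.blockedCategories.contains(ad.category);  // publisher constraint
            boolean advertiserOk = ad.targetDevices.contains(req.deviceType);    // advertiser constraint
            if (deviceOk && publisherOk && advertiserOk) chosen.add(ad);
        }
        List<String> ids = new ArrayList<>();
        for (Ad ad : chosen) ids.add(ad.id);
        log.add(req.deviceType + " -> " + ids);
        return chosen;
    }
}
```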
Non-functional requirements
The response time should be low. The ads should be relevant. More input and output formats should be supported. A wider range of digital products and technologies should be considered.
Flow diagram
[Figure: flow diagram with the components Query, Search engine, Database, Ad ranking engine, Ad, GUI - publisher, and GUI - advertiser]
schedule
Technologies
Languages: Java, Scala
Databases: MySQL, CouchDB, MongoDB, Terrastore
Server: Apache Jetty
Build: Maven
Search: Sphinx, Xapian, Solr
1. Sphinx
- C++, GPLv2, SQL
- Predefined fixed schema
- No updates
- Batch and real-time full-text indexes (efficient offline index construction and incremental on-the-fly index updates)
- Non-text attributes support: indexed, stored (hit db)?
- SQL database indexing (ODBC)
- Advanced full-text searching syntax: boolean operators, phrase, proximity, strict order, field and position limits, exact keyword form matching, substring searches, etc.
- Rich database-like querying features: compute arbitrary arithmetic expressions, add WHERE conditions, do ORDER BY and GROUP BY, use MIN/MAX/AVG/SUM aggregates, etc.; full-blown SQL SELECT is supported
- Better relevance ranking
- No statistical ranking (keyword frequencies). By default, keyword proximity, choose from a number of built-in
2. Xapian
- C++, GPLv2, Python, Ruby, PHP, Perl, Java
- List of terms
- Dynamic/batch updates
- Supports Unicode and stores indexed data in UTF-8
- Highly portable
- Ranked probabilistic search: important words get more weight than unimportant words
- Relevance feedback: given one or more documents, suggests the most relevant index
- Full range of structured boolean search operators; results are then ranked by the probabilistic weights
- Stemming of search terms
- Wildcard search (e.g. "xap*")
- Synonyms are supported, both explicitly (e.g. "~cash") and as an automatic form of query expansion
- Spelling correction for user-supplied queries, based on words which occur in the data being indexed, so it works even for words which wouldn't
3. Solr
- Java
- Predefined fixed schema
- Dynamic/batch updates
- Uses the Lucene library for full-text search
- Faceted navigation
- Hit highlighting
- Query language supports structured as well as textual search
- JSON, XML, PHP, Ruby, Python, XSLT, Velocity, and custom Java binary output formats over HTTP
- Speed
- Ease of setup
- Support for MySQL
- High performance on indexing
- No additional configuration; development and setup are faster
- Good documentation
- Much better (and faster) aggregation
- Geo-searching
- International text support
- Different units
- Find similar images
- Find nearby, similar things
- Different sort orders
- Easy to use
- Simultaneous search and update, with new documents being immediately visible
- Scalable
- Accurate probabilistic ranking: more relevant results
- Phrase and proximity searching
- Structured Boolean queries: AND, NOT, …
Solr is built on top of Lucene. It adds much common functionality: a web server API, faceting, caching, hit highlighting, JSON, XML, PHP, Ruby, Python, XSLT, Velocity, and custom Java binary output formats over HTTP, extensibility through plugins, and pluggable relevance: boost
Solr drawbacks: Solr runs in a servlet container such as Tomcat or Jetty; the Solr/Lucene documentation is poor; Solr aggregation is lacking; and time is spent serializing to and from XML.
Some example usages: Sphinx: craigslist.org; Lucene/Solr: LinkedIn, CNET, Netflix, digg.com.
Search terms
Indexing
Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. Its purpose is to optimize speed and performance in finding relevant documents for a search query; without an index, the search engine would have to scan every document. The additional computer storage required to store the index, as well as the considerable increase in the time required for an update to take place, are traded off for the time saved during information retrieval.
- Merge factors: how data enters the index. The indexer must first check whether it is updating old content or adding new content. Search engine index merging is similar in concept to the SQL MERGE command and other merge algorithms.
- Storage techniques: how to store the index data, that is, whether information should be data compressed or filtered.
- Index size: how much computer storage is required to support the index.
- Lookup speed: how quickly a word can be found
Search engine architectures vary in the way indexing is performed and in methods of index storage to meet the various design factors. Types of indices include:
- Inverted index: stores a list of occurrences of each atomic search criterion, typically in the form of a hash table or binary tree. It is the most commonly used search engine data structure, enabling efficient lookup of terms across a large number of documents, and it usually stores positional information to enable phrase and proximity searches.
- Citation index: stores citations or hyperlinks between documents to support citation analysis, a subject of bibliometrics (e.g. Google Scholar, CiteSeer).
- Document-term matrix: used in latent semantic analysis; stores the occurrences of words in documents in a two-dimensional sparse matrix.
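A positional inverted index of the kind described above can be sketched with nested maps: term to document ID to the word positions of that term. The class `InvertedIndex` and its simple tokenization (lowercase, split on anything that is not a letter or digit) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a positional inverted index: term -> (docId -> positions in that doc).
final class InvertedIndex {
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Index one document; tokens are lowercased and split on non-letters/non-digits. */
    void add(int docId, String text) {
        int pos = 0;
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (t.isEmpty()) continue;  // a leading separator yields one empty token
            postings.computeIfAbsent(t, k -> new TreeMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos++);
        }
    }

    /** IDs of documents containing the term, found without scanning any document text. */
    Set<Integer> docs(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Map.of()).keySet();
    }

    /** Word positions of the term in one document (empty if absent); used for phrase/proximity search. */
    List<Integer> positions(String term, int docId) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Map.of())
                       .getOrDefault(docId, List.of());
    }
}
```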
Document parsing
Document parsing breaks apart the components (words) of a document or other form of media for insertion into the forward and inverted indices. The words found are called tokens, and so, in the context of search engine indexing and natural language processing, parsing is more commonly referred to as tokenization. It is also sometimes called word boundary disambiguation, tagging, text segmentation, content analysis, text analysis, text mining, concordance generation, speech segmentation, lexing, or lexical analysis.
Tokenization
Unlike literate humans, computers do not understand the structure of a natural language document and cannot automatically recognize words and sentences. To a computer, a document is only a sequence of bytes. Computers do not 'know' that a space character separates words in a document. Instead, humans must program the computer to identify what constitutes an individual or distinct word, referred to as a token.
During tokenization, the parser identifies sequences of characters which represent words and other elements, such as punctuation, which are represented by numeric codes, some of which are non-printing control characters. The parser can also identify entities such as email addresses, phone numbers, and URLs. When identifying each token, several characteristics may be stored, such as the token's case (upper, lower, mixed, proper), language or encoding, lexical category (part of speech, like 'noun' or 'verb'), position, sentence number,
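A small regex-based tokenizer can show how tokens, entity types (e-mail addresses, URLs), positions, and letter case might be recorded. The `Tokenizer` class, its patterns, and the case categories are illustrative assumptions; the patterns are deliberately simple (for example, trailing punctuation stays attached to a URL).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a regex tokenizer recording some per-token characteristics.
final class Tokenizer {
    static final class Token {
        final String text, kind, letterCase;
        final int position;

        Token(String text, String kind, String letterCase, int position) {
            this.text = text;
            this.kind = kind;
            this.letterCase = letterCase;
            this.position = position;
        }

        @Override
        public String toString() {
            return position + ":" + text + "(" + kind + "," + letterCase + ")";
        }
    }

    // Alternatives are tried left to right, so URLs and e-mail addresses win over plain words.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<url>https?://\\S+)|(?<email>[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+)|(?<word>\\p{L}+)|(?<num>\\p{N}+)");

    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int position = 0;
        while (m.find()) {
            String kind = m.group("url") != null ? "url"
                        : m.group("email") != null ? "email"
                        : m.group("word") != null ? "word" : "number";
            tokens.add(new Token(m.group(), kind, caseOf(m.group()), position++));
        }
        return tokens;
    }

    /** Classify case as upper, lower, proper (initial capital only), mixed, or none. */
    static String caseOf(String s) {
        boolean hasUpper = s.chars().anyMatch(Character::isUpperCase);
        boolean hasLower = s.chars().anyMatch(Character::isLowerCase);
        if (!hasUpper && !hasLower) return "none";
        if (!hasLower) return "upper";
        if (!hasUpper) return "lower";
        boolean restHasUpper = s.substring(1).chars().anyMatch(Character::isUpperCase);
        return Character.isUpperCase(s.charAt(0)) && !restHasUpper ? "proper" : "mixed";
    }
}
```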
Format analysis
If the search engine supports multiple document formats, documents must be prepared for tokenization. The challenge is that many document formats contain formatting information in addition to textual content. For example, HTML documents contain HTML tags, which specify formatting information such as new line starts, bold emphasis, and font size or style. If the search engine were to ignore the difference between content and 'markup', extraneous
Format analysis is the identification and handling of the formatting content embedded within documents which controls the way the document is rendered on a computer screen or interpreted by a software program. Format analysis is also referred to as structure analysis, format parsing, tag stripping, format stripping, text normalization, text cleaning, and text preparation. The challenge of format analysis is further complicated by the intricacies of various file formats. Certain file formats are proprietary with
Common, well-documented file formats that many search engines support include:
- HTML
- ASCII text files (a text document without specific computer-readable formatting)
- Adobe's Portable Document Format (PDF)
- PostScript (PS)
- LaTeX
- UseNet netnews server formats
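For HTML, format analysis can be as simple as stripping markup before tokenization. The `HtmlTextExtractor` below is a hypothetical, regex-based sketch; regular expressions cannot handle all real-world HTML, so a production indexer would normally use a proper HTML parser.

```java
import java.util.regex.Pattern;

// Hypothetical tag-stripping sketch for HTML.
final class HtmlTextExtractor {
    // Script and style bodies are not visible text, so drop them together with their tags.
    private static final Pattern SCRIPT_STYLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    static String extractText(String html) {
        String s = SCRIPT_STYLE.matcher(html).replaceAll(" ");
        s = COMMENT.matcher(s).replaceAll(" ");
        s = TAG.matcher(s).replaceAll(" ");  // every tag becomes a word separator (a simplification)
        // Decode a few common entities; &amp; goes last so "&amp;lt;" is not decoded twice.
        s = s.replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", "\"")
             .replace("&nbsp;", " ").replace("&amp;", "&");
        return SPACES.matcher(s).replaceAll(" ").trim();
    }
}
```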
Scoring
Scoring is based on two main criteria:
- Term frequency (TF): the number of times a term appears in the document.
- Inverse document frequency (IDF): one over the number of documents in the index that contain the term (1/DF).
Then, depending on the type of full text search, other criteria impact scoring. For example, in proximity searches, the proximity of search terms impacts scoring.
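The two criteria can be combined into a TF-IDF score. This sketch assumes pre-tokenized, lowercase documents and uses the common logarithmic form idf = ln(N / DF), where N is the number of documents, rather than the bare 1/DF; both give rarer terms more weight. `TfIdf` and its methods are illustrative names.

```java
import java.util.List;

// Hypothetical TF-IDF sketch; documents are lists of already tokenized, lowercased terms.
final class TfIdf {
    /** Term frequency: how many times the term occurs in one document. */
    static int tf(String term, List<String> doc) {
        int n = 0;
        for (String t : doc) if (t.equals(term)) n++;
        return n;
    }

    /** Document frequency: how many documents contain the term at least once. */
    static int df(String term, List<List<String>> corpus) {
        int n = 0;
        for (List<String> doc : corpus) if (doc.contains(term)) n++;
        return n;
    }

    /** idf = ln(N / df); a term in every document gets 0, and an unseen term is given 0. */
    static double idf(String term, List<List<String>> corpus) {
        int df = df(term, corpus);
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    /** Score of a document for a query: the sum over query terms of tf * idf. */
    static double score(List<String> query, List<String> doc, List<List<String>> corpus) {
        double s = 0.0;
        for (String term : query) s += tf(term, doc) * idf(term, corpus);
        return s;
    }
}
```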
Faceting
Faceted search, also called faceted navigation or faceted browsing, is a technique for accessing a collection of information represented using a faceted classification, allowing users to explore by filtering available information. A faceted classification system allows the assignment of multiple classifications to an object, enabling the classifications to be ordered in multiple ways, rather than in a single, predetermined, taxonomic order. Each facet typically corresponds to the possible values of a property common to a set of digital objects.
Faceting is a technique for subdividing a set of search results to make them more useful.
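Computing facets amounts to counting, for one field, how many results carry each value, and drilling down means filtering on a chosen value. In this sketch each search result is assumed to be a map of field names to values; `Facets` and the example fields are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch of facet counting and drill-down over search results.
final class Facets {
    /** For one field, count how many results have each value (sorted by value). */
    static Map<String, Integer> count(List<Map<String, String>> results, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : results) {
            String value = doc.get(field);
            if (value != null) counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    /** Drill-down: keep only the results whose field has the chosen value. */
    static List<Map<String, String>> filter(List<Map<String, String>> results, String field, String value) {
        return results.stream().filter(d -> value.equals(d.get(field))).collect(Collectors.toList());
    }
}
```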
Stemming
- Doesn't produce a real root word, but converts derivatives into a single word
- Done at both index and query time
- Stemming and stop filters must be removed to support inexact word matches (phonemes)
- Works on a single word
- Easy to implement and takes less time
- Used as a kind of query broadening, a process called conflation, e.g. changed, changing, changes, change
Algorithms:
- Brute force: lookup table
- Suffix-stripping algorithm
- Production technique: the reverse of stemming, e.g. run becomes runs, running (works phonetically)
- N-gram (syllable/word): frequency of bigrams, e.g. for "post office", how often "post" occurred with and without "office" following it
- Affix stemmers: suffix + prefix
- Matching algorithms: a stem database; the algorithm tries to match the word with stems from the database, applying various constraints, such as on the relative length of the
[Figures: N-gram (word), N-gram (syllable), matching algorithms]
Overstemming: two separate inflected words are stemmed to the same root, but should not have been - a false positive.
Understemming: two separate inflected words should be stemmed to the same root, but are not - a false negative.
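To illustrate suffix stripping and its errors, here is a toy stemmer (not the Porter algorithm) that removes the first matching suffix from a fixed list, as long as a stem of minimum length remains. The suffix list, `MIN_STEM`, and the class name are assumptions chosen for the example.

```java
import java.util.Locale;

// Hypothetical toy suffix-stripping stemmer: remove the first matching suffix,
// provided at least MIN_STEM characters remain.
final class SuffixStemmer {
    private static final String[] SUFFIXES = {"ing", "ed", "es", "s", "e"};
    private static final int MIN_STEM = 3;

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= MIN_STEM) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;  // no rule applies
    }
}
```

With these rules, changed, changing, changes, and change all become "chang" (not a real word); "news" and "new" collide (overstemming); and "ran" and "run" stay apart (understemming).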
Lemmatization
- Gives the root word
- Uses the context of the sentence
- Doesn't give much improvement
- E.g. changes, changed, changing, and change are converted into change
The word "better" has "good" as its lemma. This link is missed by stemming, as it requires a dictionary look-up. The word "walk" is the base form of the word "walking", and hence this is matched by both stemming and lemmatization.
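A dictionary look-up is what lets lemmatization link "better" to "good". The sketch below uses a tiny hand-written table purely for illustration; `DictionaryLemmatizer` is a hypothetical name, and a real lemmatizer needs a full lexicon plus part-of-speech information from the sentence context.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical dictionary-based lemmatizer with an illustrative lemma table.
final class DictionaryLemmatizer {
    private static final Map<String, String> LEMMAS = Map.of(
        "better", "good", "best", "good",
        "walking", "walk", "walked", "walk",
        "changes", "change", "changed", "change", "changing", "change",
        "ran", "run", "running", "run");

    static String lemma(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        return LEMMAS.getOrDefault(w, w);  // unknown words are returned unchanged
    }
}
```

Unlike a suffix stripper, the table can also map irregular forms such as "ran" to "run".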
case
Proximity search
Proximity search looks for documents where two or more separately matching term occurrences are within a specified distance, where distance is the number of intermediate words or characters. In addition to proximity, some implementations may also impose a constraint on the word order, in that the order in the searched text must be identical to the order of the search query. Proximity searching goes beyond the simple matching of words by adding the constraint of proximity and is generally regarded as a form of
For example, a search could be used to find "red brick house", and match phrases such as "red house of brick" or "house made of red brick". By limiting the proximity, these phrases can be matched while avoiding documents where the words are scattered or spread across a page or in unrelated articles in an anthology. Proximity search allows searching for words that are within a specific distance from each other
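Given the sorted word positions of two terms in a document (for example, from a positional index), a proximity check can walk both lists at once. This hypothetical `Proximity.within` measures distance as the number of intermediate words, as defined above, and can optionally require the first term to precede the second.

```java
import java.util.List;

// Hypothetical sketch: are two terms within maxWordsBetween intermediate words of each other?
// a and b are the sorted word positions of each term in one document.
final class Proximity {
    static boolean within(List<Integer> a, List<Integer> b, int maxWordsBetween, boolean ordered) {
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int pa = a.get(i), pb = b.get(j);
            int gap = pb - pa;  // positive when the b term comes after the a term
            boolean ok = ordered
                ? gap > 0 && gap - 1 <= maxWordsBetween
                : Math.abs(gap) - 1 <= maxWordsBetween;
            if (ok) return true;
            // Advance the earlier occurrence: it cannot pair with anything further ahead.
            if (pa < pb) i++; else j++;
        }
        return false;
    }
}
```

In "red house of brick", "red" is at position 0 and "brick" at 3, so a limit of two intermediate words matches while a limit of one does not.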
Wildcard characters allow the definition of partial search terms. Searches can use a wildcard for a single character or for multiple characters:
- Multiple-character wildcard search: '*' (without quotes). It searches for 0 or more characters matching the '*'. Example: "portr*", "p*t"
- Single-character wildcard search: '?' (without quotes). It searches for exactly 1 character matching the
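One straightforward way to evaluate wildcard terms is to translate them into regular expressions and match them against the indexed vocabulary. `Wildcard` is a hypothetical helper; real engines typically avoid scanning the whole vocabulary (for example, by walking a sorted term dictionary), which this sketch does not attempt.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: translate a wildcard term into a regex and match it against indexed terms.
final class Wildcard {
    static Pattern toRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') re.append(".*");       // zero or more characters
            else if (c == '?') re.append('.');   // exactly one character
            else re.append(Pattern.quote(String.valueOf(c)));  // everything else is literal
        }
        return Pattern.compile(re.toString());
    }

    /** All vocabulary terms that match the whole wildcard pattern. */
    static List<String> expand(String wildcard, List<String> vocabulary) {
        Pattern p = toRegex(wildcard);
        return vocabulary.stream().filter(t -> p.matcher(t).matches()).collect(Collectors.toList());
    }
}
```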
Search operators
More advanced searches can be defined by using search operators in combination with single terms and phrases. They allow terms and phrases to be combined through logic operators. The following operators are provided:
- '+' (required): requires that the term after the '+' symbol is matched in the search result. Example: +madonna +child. Note: This is the default behavior of the application.
- '-' (prohibit): excludes results that contain the term after the '-' symbol. Example: madonna -child. Note: The prohibit operator cannot be used with just one term. For example, the query -child will return
- AND, && (must be all upper case): links two terms and matches results where both terms exist. It is similar to the … and most useful in combination with the … of search terms. The symbol && can be used in place of the word AND. Example: madonna AND child. Note: This is the default behavior of the application.
- OR, || (must be all upper case): links two terms and matches results where either of the terms exists. The symbol || can be used in place of the word OR. Example: madonna OR maria
- NOT, ! (must be all upper case): excludes results that contain the term after NOT. The symbol ! can be used in place of the word NOT. Example: madonna NOT …
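Over an inverted index, these operators map onto set operations on the IDs of documents that contain each term: AND (and combining required '+' terms) is intersection, OR is union, and NOT or '-' is difference. The `BooleanOps` helper is a hypothetical sketch. Difference needs a left-hand set, which is one way to see why a query made only of a prohibited term has nothing to subtract from.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: Boolean operators as set operations on document-ID sets.
final class BooleanOps {
    /** AND, &&, or two required '+' terms: documents containing both. */
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    /** OR, ||: documents containing either. */
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }

    /** a NOT b, a -b: documents containing a but not b. */
    static Set<Integer> not(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.removeAll(b);
        return r;
    }
}
```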
grouping
Grouping of search terms can be used to define the behavior of operators (and hence form subqueries). Grouping is achieved via parentheses ("(" and ")"). By default, the following precedence rules are applied, from highest to lowest: NOT, AND, OR.
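The precedence rules can be implemented with a small recursive-descent parser, one method per precedence level, so that NOT binds tighter than AND, and AND tighter than OR. This `QueryEvaluator` is a hypothetical sketch: it assumes operators are separate, upper-case tokens (e.g. "! child", not "!child"), treats adjacent terms as an implicit AND, and evaluates NOT as the complement against the set of all documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical recursive-descent evaluator: precedence NOT > AND > OR, parentheses override it.
final class QueryEvaluator {
    private static final Set<String> BINARY_OPS = Set.of("AND", "&&", "OR", "||");
    private final Map<String, Set<Integer>> index;  // term -> IDs of documents containing it
    private final Set<Integer> allDocs;             // universe, needed to evaluate NOT
    private List<String> tokens;
    private int pos;

    QueryEvaluator(Map<String, Set<Integer>> index, Set<Integer> allDocs) {
        this.index = index;
        this.allDocs = allDocs;
    }

    Set<Integer> evaluate(String query) {
        tokens = new ArrayList<>();
        for (String t : query.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        pos = 0;
        Set<Integer> result = parseOr();
        if (pos != tokens.size()) throw new IllegalArgumentException("Unexpected token: " + tokens.get(pos));
        return result;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    /** Consume the next token if it equals one of the options. */
    private boolean accept(String... options) {
        String t = peek();
        for (String o : options) {
            if (o.equals(t)) { pos++; return true; }
        }
        return false;
    }

    // or := and (("OR" | "||") and)*        lowest precedence
    private Set<Integer> parseOr() {
        Set<Integer> result = parseAnd();
        while (accept("OR", "||")) result.addAll(parseAnd());
        return result;
    }

    // and := not ([("AND" | "&&")] not)*    AND may be omitted (default operator)
    private Set<Integer> parseAnd() {
        Set<Integer> result = parseNot();
        while (true) {
            boolean explicit = accept("AND", "&&");
            String t = peek();
            if (!explicit && (t == null || t.equals(")") || BINARY_OPS.contains(t))) break;
            result.retainAll(parseNot());
        }
        return result;
    }

    // not := ("NOT" | "!") not | primary    highest precedence
    private Set<Integer> parseNot() {
        if (accept("NOT", "!")) {
            Set<Integer> result = new TreeSet<>(allDocs);
            result.removeAll(parseNot());
            return result;
        }
        return parsePrimary();
    }

    // primary := "(" or ")" | term          every path returns a fresh, mutable set
    private Set<Integer> parsePrimary() {
        String t = peek();
        if (t == null || t.equals(")") || BINARY_OPS.contains(t)) {
            throw new IllegalArgumentException("Expected a term at token " + pos);
        }
        pos++;
        if (t.equals("(")) {
            Set<Integer> inner = parseOr();
            if (!accept(")")) throw new IllegalArgumentException("Missing closing parenthesis");
            return inner;
        }
        return new TreeSet<>(index.getOrDefault(t.toLowerCase(Locale.ROOT), Set.of()));
    }
}
```

With this parser, "child OR maria AND madonna" is evaluated as child OR (maria AND madonna), while "(child OR maria) AND madonna" applies the OR first.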
Fuzzy search
The '~' character at the end of a search term produces search results that are similar in spelling to the given search term. Example: "boat~" may produce search results like "boat", "coat", "goat", etc. Fuzzy search helps eliminate the "No results found" error, provides bi-directional matching, and requires no guessing.
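Fuzzy matching is commonly implemented with an edit distance. This sketch uses Levenshtein distance (insertions, deletions, substitutions) with a caller-chosen maximum; whether this platform's search backend uses exactly this measure is not stated, and `FuzzyMatch` is a hypothetical name.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of "term~" matching with Levenshtein edit distance.
final class FuzzyMatch {
    /** Minimum number of single-character insertions, deletions, or substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;  // distance from "" to b's first j chars
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;  // the finished row becomes prev
        }
        return prev[b.length()];
    }

    /** Vocabulary terms within maxEdits edits of the query term. */
    static List<String> similar(String term, List<String> vocabulary, int maxEdits) {
        return vocabulary.stream().filter(t -> levenshtein(term, t) <= maxEdits).collect(Collectors.toList());
    }
}
```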
Stop words
Stop words are words which are filtered out prior to, or after, processing of natural language data (text). Any group of words can be chosen as the stop words for a given purpose. For some search machines, these are some of the most common, short function words, such as the, is, at, which, and on. In this case, stop words can cause problems when searching for phrases that include them, particularly in names such as 'The Who', 'The The', or 'Take That'.
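Filtering stop words is a simple set look-up on normalized tokens. This hypothetical `StopWords` sketch uses the example words above as its list (an illustrative choice, not a standard list) and shows the problem with names: "The Who" loses "The", and "The The" disappears entirely.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: remove stop words from a token list before indexing or querying.
final class StopWords {
    /** Illustrative list taken from the examples in the text. */
    static final Set<String> DEFAULT = Set.of("the", "is", "at", "which", "on");

    static List<String> filter(List<String> tokens, Set<String> stopWords) {
        return tokens.stream()
                .filter(t -> !stopWords.contains(t.toLowerCase(Locale.ROOT)))  // case-insensitive check
                .collect(Collectors.toList());
    }
}
```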
synonyms
Query expansion
Query expansion (QE) is the process of reformulating a seed query to improve retrieval performance in information retrieval operations. In the context of web search engines, query expansion involves evaluating a user's input (what words were typed into the search query area, and sometimes other types of data) and expanding the search query to match additional documents. Query expansion involves techniques such as:
Search engines invoke query expansion to increase the quality of user search results. It is assumed that users do not always formulate search queries using the best terms; for example, the terms may not be the best because the database does not contain the user-entered terms. Expansion comes at the expense of reducing precision. The goal of query expansion in this regard is that, by increasing recall, precision can potentially increase (rather than decrease as mathematically
At the same time, many of the current commercial search engines use word frequency (tf-idf) to assist in ranking. By ranking the occurrences of both the user-entered words and synonyms and alternate morphological forms, documents with a higher density (high frequency and close proximity) tend to migrate higher up in the search results, leading to a higher quality of the search results near the top of the results, despite the larger recall. This tradeoff is one of the defining problems in query expansion, regarding whether it is worthwhile to perform given the questionable
In some cases, the query expansion removes the original intention of a query. If you replace "blogger" with "blog" in [blogger profile images], the query no longer includes the most important keyword.
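A basic form of query expansion turns each query term into a group of alternatives (the term plus its synonyms) and matches a document if it contains at least one word from every group. The synonym table and `QueryExpander` are hypothetical; note that the sketch keeps the user's original word in each group instead of replacing it, which avoids the "blogger" to "blog" problem described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of synonym-based query expansion.
final class QueryExpander {
    private final Map<String, List<String>> synonyms;  // illustrative thesaurus

    QueryExpander(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    /** One group per query term: the term itself plus any synonyms. */
    List<Set<String>> expand(List<String> query) {
        List<Set<String>> groups = new ArrayList<>();
        for (String term : query) {
            Set<String> group = new LinkedHashSet<>();
            group.add(term);  // always keep the user's own word
            group.addAll(synonyms.getOrDefault(term, List.of()));
            groups.add(group);
        }
        return groups;
    }

    /** A document matches if it contains at least one term from every group (OR within, AND across). */
    static boolean matches(List<Set<String>> groups, Set<String> docTerms) {
        for (Set<String> group : groups) {
            if (group.stream().noneMatch(docTerms::contains)) return false;
        }
        return true;
    }
}
```

A document containing "budget" and "automobile" now matches the query "cheap car", which raises recall; the cost is that some of these extra matches may be less relevant.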
multithreading
- To maintain the responsiveness of an application during a long-running task
- To enable cancellation of separable tasks
- Some problems are intrinsically parallel
- To monitor the status of some resource (e.g. a database)
- Some APIs and systems demand it (e.g. Swing)
- To take advantage of multiple processors
The system has concurrent, interacting components, and each component can be handled using a separate thread. Simplifies programming for problem
When you want to be doing two different things simultaneously, when you have a large problem that can be broken up and solved in smaller sections, or when you have large I/O-bound processes.
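Serving independent ad requests in parallel is one instance of the cases above. The sketch below uses the standard `ExecutorService` API and assumes each request can be handled by a self-contained function; `ConcurrentServing`, the handler, and the pool size are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: handle independent requests concurrently on a fixed-size thread pool.
final class ConcurrentServing {
    static List<String> serveAll(List<String> queries, Function<String, String> handler, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String q : queries) {
                futures.add(pool.submit(() -> handler.apply(q)));  // runs on a pool thread
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());  // waits for that request; results keep the input order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status for the caller
            throw new IllegalStateException("Interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A request handler failed", e.getCause());
        } finally {
            pool.shutdownNow();  // all work is finished or abandoned, so stop the worker threads
        }
    }
}
```

A fixed pool bounds the number of threads regardless of how many requests arrive, which matters when the goal is to support more concurrent users without exhausting the server.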
testing
- Whether the system works end-to-end
- Whether there has been a substantial change in the overall logic
- Whether we get the expected output
- Whether certain features of the system actually work
Why test?
Seven out of ten new software systems fail in some way upon deployment, according to the Standish Group International, a consulting firm in Dennis, Mass. Forbes Magazine, Shake Those Bugs Out, May 18, 2010
Even with the best people working with the best development tools and languages (like Java), it is impossible to produce defect-free code the first time.
Testing process
It is basically a two-step procedure. The first step fills the test table with the responses returned when the requests are fired. The second step validates those results by firing the same requests again and comparing the new responses with the stored ones.
[Figure: testing process flowchart with the steps Get request, Fire request, Database, Save response, Compare responses, and End]
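The record-then-replay process can be sketched as follows. `ReplayTester` is hypothetical: a map stands in for the test table in the database, and the system under test is any function from request to response. Exact comparison assumes responses are deterministic; if ads rotate between requests, the check would need to compare properties of the response rather than test for equality.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch of the two-step test: record responses, then replay and compare.
final class ReplayTester {
    private final Map<String, String> testTable = new HashMap<>();  // request -> saved response

    /** Step 1: fire each request and save its response. */
    void record(List<String> requests, Function<String, String> system) {
        for (String r : requests) testTable.put(r, system.apply(r));
    }

    /** Step 2: fire the same requests again; return those whose response changed. */
    List<String> replay(Function<String, String> system) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, String> e : testTable.entrySet()) {
            String now = system.apply(e.getKey());
            if (!Objects.equals(now, e.getValue())) mismatches.add(e.getKey());
        }
        return mismatches;
    }
}
```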
Testing Idioms
- Code a little, test a little, code a little, test a little...
- Run your tests as often as possible, at least as often as you run the compiler.
- Run all the tests in the system at least once per day (or night).
- Begin by writing tests for the areas of code that you're most worried about breaking.
- Write tests that have the highest possible return on your testing investment.
- When you need to add new functionality to the system, write the tests first.
- If you find yourself debugging using System.out.println(), write a test case instead.
snapshots
The limitation of the system is that it supports a maximum of 200 concurrent users. We would like to raise this number.
Future enhancement:
- Remove bugs
- Make the system more concurrent
- Reduce execution time
- Support a larger number of concurrent users