Yieldbot Tech Talk – MongoDB to k/v




                        © 2012 Yieldbot
            © 2012 Yieldbot / CONFIDENTIAL
Yieldbot Tech Talk – MongoDB to key/value, Sept 20, 2012


                   What We Do
• Yieldbot technology creates marketplaces where
  advertisers target realtime consumer intent flowing
  through premium publishers.
• At a high level: Analytics + Ad Serving
   – Geo-distributed
      • Data collection
      • Realtime ad matching
   – Cascalog batch analytics
   – Rich Analytics Results visualizations





          Why MongoDB (Dec 2009)
•   Needed to be manageable by the dev team (1 person!)
•   Flexible
•   Easy to get started: run on a laptop or deploy
•   Scale wasn’t initially the biggest concern
•   Could focus on other stuff
     – Lucene
     – Analytics
     – Ad serving dynamics






       How MongoDB Was Used Initially
• Configuration
   – Publisher profiles, ad matching rules, etc.
• Data collection
   – Pageviews, impressions, clicks
• Analytics results
• Task state tracking
• Lookup tables for ad serving
• Real-time ad stats






          A Couple of Aspects of Note
• Master/Slave
   – convenient for simple durability
   – convenient for geo distribution
   – not unique to Mongo; we now run a similar redis topology
• Indexing
   – easy to set up, but eventually a RAM scaling issue
   – initially great for efficient views of data in the UI
   – later moved analytics results to key/value access in Mongo
• A durable sharded configuration (replica sets) is expensive





                 Data Collection
• Mongo: collections for pageviews, impressions, clicks
   – Data wasn’t archived anywhere else
   – Not a store you want to scale indefinitely
• Data now flows through redis, to files, to S3






     Data Collection with redis Assist
•   redis lists are populated as events come in
•   Daemons pull events off the lists and write them to files
•   Files are periodically compressed and archived to S3
•   S3 files are used later as input for
     – Hadoop (Cascalog) batch analytics
     – Advertising Stats Calculations
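
The flow above can be sketched roughly as follows. This is a minimal illustration, not Yieldbot's code: an in-memory deque stands in for a redis list (a real producer would use RPUSH and the daemon LPOP or BLPOP), the function names are hypothetical, and the gzip bytes stand in for the file that would be uploaded to S3.

```python
import gzip
import json
from collections import deque

# Stand-in for a redis list; a real deployment would use RPUSH / BLPOP instead.
event_queue = deque()

def record_event(kind, payload):
    """Producer side: append one event (pageview, impression, click) to the list."""
    event_queue.append(json.dumps({"kind": kind, **payload}, sort_keys=True))

def drain_to_lines(queue, max_events=1000):
    """Daemon side: pop up to max_events off the head of the list, oldest first."""
    lines = []
    while queue and len(lines) < max_events:
        lines.append(queue.popleft())
    return lines

def archive(lines):
    """Compress newline-delimited JSON; the result is what would be sent to S3."""
    return gzip.compress(("\n".join(lines) + "\n").encode("utf-8"))

record_event("pageview", {"url": "/a"})
record_event("click", {"ad": "ad-1"})
batch = drain_to_lines(event_queue)
blob = archive(batch)
# Round trip: a later batch job would read the archive back line by line.
restored = gzip.decompress(blob).decode("utf-8").splitlines()
```

Keeping the list as a short-lived buffer means the in-memory store only has to hold events that have not yet been written to a file.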






          Matching Lookup Tables
• Mongo: collections for different lookup types
   – E.g., geo, url
   – Built periodically, updated on config change
   – Look up in each, then correlate results
• redis
   – Can pipeline operations in a single server call
   – Set intersection across lookup dimensions, with one
     response back
   – Same master/slave as Mongo for distribution
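
As a rough illustration of intersecting across lookup dimensions, the sketch below uses plain Python sets in place of redis sets. The table contents, the key naming mentioned in the comment, and the function name match_ads are assumptions for the example; in redis the same idea would be a single SINTER over one set per dimension.

```python
# Hypothetical lookup tables: one mapping per dimension, value -> set of ad ids.
# In redis each entry would be a set key such as "geo:US" (naming is an assumption).
geo_index = {"US": {"ad1", "ad2", "ad3"}, "CA": {"ad2", "ad4"}}
url_index = {"/sports": {"ad2", "ad3"}, "/news": {"ad1", "ad4"}}

def match_ads(geo, url):
    """Return ads eligible on every dimension, analogous to one SINTER call."""
    candidates = [geo_index.get(geo, set()), url_index.get(url, set())]
    return set.intersection(*candidates)

matched = match_ads("US", "/sports")
```

Compared with looking up each collection separately and correlating in the application, the intersection happens in one place and only the final result crosses the network.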





                  Configuration
• Mongo
   – Database per publisher
   – Collections for objects
   – Denormalized where possible
   – Manual Foreign Keys
   – Obviously the best candidate for a relational model
• History and versioning were paramount to us
   – Rolled our own: HeroDB






                        HeroDB
• History and granular versioning are the top goals
• Database built on top of git
   – Golden database is a bare repo
   – Can clone to anywhere, make changes, push
   – Changes in a single commit are atomic
• Tracks how, when, and by whom each change was made
• Ability to reset the DB to a specific previous state
• Much more to do; in production 6+ months
   – Recent change: caching
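
The properties listed above (atomic commits, change history, resetting to a previous state) can be illustrated with a toy in-memory commit log. This is not HeroDB's API and it does not use git; the class VersionedStore and its methods are hypothetical names chosen for the sketch.

```python
import copy
import time

class VersionedStore:
    """Toy commit log: each commit is an atomic snapshot plus who/when/why."""

    def __init__(self):
        self.commits = []  # oldest first; list index is the version number

    def commit(self, author, message, changes):
        """Apply all changes in one step; readers never see a partial commit."""
        state = copy.deepcopy(self.commits[-1]["state"]) if self.commits else {}
        state.update(changes)
        self.commits.append({"author": author, "message": message,
                             "time": time.time(), "state": state})
        return len(self.commits) - 1

    def checkout(self, version=-1):
        """Return a copy of the database as of the given version (default: latest)."""
        return copy.deepcopy(self.commits[version]["state"])

    def log(self):
        """Version number, author, and message for each commit, oldest first."""
        return [(i, c["author"], c["message"]) for i, c in enumerate(self.commits)]

store = VersionedStore()
v0 = store.commit("alice", "add publisher", {"pub:1": {"name": "Example"}})
v1 = store.commit("bob", "rename and add rule",
                  {"pub:1": {"name": "Example Inc"}, "rule:1": {"geo": "US"}})
```

Storing full snapshots is wasteful compared with git's object model, but it keeps the sketch short while still showing why a commit-based store makes rollback trivial.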





                Analytics Results
• ARCv1, Mongo: indexed collections
   – Very easy to code to
   – Initially on the same server as everything else
   – Later moved out to a dedicated server
   – Memory became an issue
       • Indexes were bigger than the data itself
   – Overhead of importing Cascalog results
       • Pull JSON files from S3 to local disk
       • mongoimport files into DB





         Analytics Results Cont’d
• ARCv2, Mongo: paged data, key/value
   – Migrated app to key/value access pattern
   – Much better memory usage
   – Sharded at the application level, publishers spread across shards
   – One DB per day per publisher; the most recent 7 are kept
   – Still had the overhead of importing Hadoop results
      • Pull JSON files from S3 to local disk
      • mongoimport files into DB
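
A database-per-day-per-publisher layout with a 7-day window could be managed along these lines. The naming scheme arc_<publisher>_<YYYYMMDD> and both function names are assumptions for illustration; the slides do not say how the databases were actually named or pruned.

```python
from datetime import date, timedelta

def db_name(publisher, day):
    """Hypothetical naming scheme: one database per publisher per day."""
    return f"arc_{publisher}_{day:%Y%m%d}"

def dbs_to_drop(existing, publisher, today, keep_days=7):
    """List this publisher's databases that fall outside the retention window."""
    keep = {db_name(publisher, today - timedelta(days=i)) for i in range(keep_days)}
    prefix = f"arc_{publisher}_"
    return sorted(n for n in existing if n.startswith(prefix) and n not in keep)

existing = ["arc_pubA_20120910", "arc_pubA_20120914",
            "arc_pubA_20120920", "arc_pubB_20120901"]
stale = dbs_to_drop(existing, "pubA", date(2012, 9, 20))
```

Dropping a whole database is cheap compared with deleting documents, which is one reason to partition by day.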






    Analytics Results - ElephantDB
• Cascalog can write the EDB format directly
   – Berkeley DB or LevelDB
• Ring Topology
   – Shards distributed around the ring, consistent hashing
   – Configurable replication factor
   – Send a request to any node; it forwards as necessary
   – Ring size can be increased incrementally
• Import from S3 is efficient
   – Copy shard from S3 to local disk
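
The ring idea can be sketched with a generic consistent-hash ring. This is a textbook version under assumed parameters (SHA-256 hashing, 64 virtual points per node, made-up node names), not ElephantDB's actual placement logic.

```python
import bisect
import hashlib

class Ring:
    """Toy consistent-hash ring; node names and vnode count are illustrative."""

    def __init__(self, nodes, replication=2, vnodes=64):
        self.replication = replication
        self.node_count = len(set(nodes))
        # Each node contributes several points so load spreads more evenly.
        self.points = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.hashes = [h for h, _ in self.points]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def owners(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect(self.hashes, self._hash(key))
        wanted = min(self.replication, self.node_count)
        found = []
        for offset in range(len(self.points)):
            node = self.points[(start + offset) % len(self.points)][1]
            if node not in found:
                found.append(node)
                if len(found) == wanted:
                    break
        return found

ring = Ring(["node-a", "node-b", "node-c"], replication=2)
owners = ring.owners("publisher-42")
# Growing the ring: keys either keep their first owner or move to the new node.
bigger = Ring(["node-a", "node-b", "node-c", "node-d"], replication=2)
```

Any node can compute owners() locally, which is what lets a request land on an arbitrary node and be forwarded to one that holds the data.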





              Real-time Ad Stats
• Mongo: DB per day, collection by entity type
   – Document per entity instance
   – stat_type.hour.minute nested values, atomic
     increment
   – Never had a good story for aggregating over larger
     timeframes
• Enter redis again
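
The nested counter layout can be mimicked with plain dictionaries. In Mongo the increment would be a $inc on a dotted path such as clicks.13.05; here nested defaultdicts stand in for the document, and the function names are hypothetical. The roll-up functions show one plausible reading of the aggregation problem: totals over larger windows require walking every minute bucket.

```python
from collections import defaultdict

def new_entity_doc():
    """One document per entity: stat_type -> hour -> minute -> count."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

def increment(doc, stat_type, hour, minute, amount=1):
    """Analogous to a Mongo $inc on the dotted path stat_type.hour.minute."""
    doc[stat_type][hour][minute] += amount

def hourly_total(doc, stat_type, hour):
    """Sum every minute bucket in one hour."""
    return sum(doc[stat_type][hour].values())

def daily_total(doc, stat_type):
    """Sum every minute bucket in every hour of the day's document."""
    return sum(sum(minutes.values()) for minutes in doc[stat_type].values())

ad_doc = new_entity_doc()
increment(ad_doc, "clicks", 13, 5)
increment(ad_doc, "clicks", 13, 5)
increment(ad_doc, "clicks", 13, 40)
increment(ad_doc, "clicks", 14, 1, amount=3)
```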






          Real-time Ad Stats Cont’d
•   redis has robust access patterns
     – More pipelining
•   Initially both realtime and aggregated stats were kept in redis
•   Issue with scaling redis: the DB has to fit in memory
•   Time-period aggregations are now kept in HBase
•   Only the most recent hours are kept in redis
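
One way to picture the hot/cold split is a periodic roll-up that moves older hourly counters out of the in-memory store into long-term totals. The slides do not describe the actual mechanism, so everything here is an assumption: plain dicts stand in for redis and HBase, and roll_up and keep_hours are hypothetical names.

```python
def roll_up(hot, cold, current_hour, keep_hours=3):
    """Move hourly counters older than the window from the hot store to cold totals.

    hot:  {(entity, hour): count}   stand-in for redis
    cold: {entity: count}           stand-in for an HBase aggregate table
    """
    cutoff = current_hour - keep_hours
    # Copy the keys first so the dict can be modified while iterating.
    for key in [k for k in hot if k[1] <= cutoff]:
        entity, _hour = key
        cold[entity] = cold.get(entity, 0) + hot.pop(key)
    return hot, cold

hot = {("ad1", 10): 5, ("ad1", 11): 7, ("ad1", 14): 2, ("ad2", 9): 1}
cold = {}
roll_up(hot, cold, current_hour=14)
```

After the roll-up, the memory footprint of the hot store depends only on the number of entities and the window size, not on how long the system has been running.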






              Task State Tracking
• The last holdout
• Collection of tasks
   – Each task is a document
   – Indexed as needed
   – Mongo query and update syntax is convenient
       • Both in static code and in the Python or Mongo
         REPL






              Honorable Mention
• redis is the celery backend, used for task messaging
  infrastructure
• But this was never in Mongo anyway...






      MongoDB Migration Summary
•   Configuration                  →  HeroDB
•   Data Collection                →  to S3 via redis
•   Analytics Results              →  ElephantDB
•   Task State Tracking            →  still Mongo
•   Matcher Lookup Tables          →  redis
•   Real-time Ad Stats             →  redis/HBase






                       Thanks!



Site: yieldbot.com
Blog: blog.yieldbot.com
Twitter: @yieldbot
Email: info@yieldbot.com




