Ruby and Rails
About Me
•           a.k.a. ihower
    • http://ihower.tw
    • http://twitter.com/ihower
    • http://github.com/ihower
• Ruby on Rails Developer since 2006
• Ruby Taiwan Community
 • http://ruby.tw
• Distributed Ruby
• Distributed Message Queues
• Background-processing in Rails
• Message Queues for Rails
• SOA for Rails
• Distributed Filesystem
• Distributed database
1.Distributed Ruby

• DRb
• Rinda
• Starfish
• MapReduce
• MagLev VM

• Ruby's RMI (remote method invocation) system

• an object in one Ruby process can invoke
  methods on an object in another Ruby
  process on the same or a different machine
DRb (cont.)

• no defined interface, so faster development time
• tightly coupled applications, because there is no
  defined API, just methods on objects
• unreliable under large-scale, heavy-load
  production environments
server example 1
require 'drb'

class HelloWorldServer
  def say_hello
    'Hello, world!'
  end
end

DRb.start_service("druby://", HelloWorldServer.new)
DRb.thread.join
client example 1
require 'drb'

server = DRbObject.new_with_uri("druby://")

puts server.say_hello
puts server.inspect

# Hello, world!
# <DRb::DRbObject:0x1003c04c8 @ref=nil, @uri="druby://">
example 2
# user.rb
class User

  attr_accessor :username

end
server example 2
require 'drb'
require 'user'

class UserServer

  attr_accessor :users

  # ids start at 1, so id 2 is the second user
  def find(id)
    users[id - 1]
  end

end

user_server = UserServer.new
user_server.users = []
5.times do |i|
  user = User.new
  user.username = i + 1
  user_server.users << user
end

DRb.start_service("druby://", user_server)
DRb.thread.join
client example 2
require 'drb'

user_server = DRbObject.new_with_uri("druby://")

user = user_server.find(2)

puts user.inspect
puts "Username: #{user.username}"
user.username = "ihower"
puts "Username: #{user.username}"

# <DRb::DRbUnknown:0x1003b8318 @name="User", @buf="004bo:
# client2.rb:8: undefined method `username' for
#<DRb::DRbUnknown:0x1003b8318> (NoMethodError)
Why? DRbUndumped
•   Default DRb operation

    • Pass by value
    • Must share code
• With DRbUndumped
 • Pass by reference
 • No need to share code
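The difference can be seen without any network. The following is a minimal sketch, assuming only the standard drb library; PlainUser and RemoteUser are hypothetical names used for illustration. DRb marshals arguments and return values, and an object that includes DRbUndumped refuses to be marshaled, which is the cue for DRb to send a reference (a DRbObject) instead of a copy.

```ruby
require 'drb'

# Hypothetical classes, for illustration only.
class PlainUser
  attr_accessor :username
end

class RemoteUser
  include DRb::DRbUndumped
  attr_accessor :username
end

# Pass by value: the object is serialized, so the receiver works on a copy.
original = PlainUser.new
original.username = "ihower"
copy = Marshal.load(Marshal.dump(original))
copy.username = "changed"
puts original.username   # the original is unaffected by changes to the copy

# Pass by reference: DRbUndumped makes the object refuse serialization.
begin
  Marshal.dump(RemoteUser.new)
  undumped = false
rescue TypeError
  undumped = true        # DRb would send a DRbObject proxy instead
end
puts undumped
```

With a copy, the receiving side also needs the class definition to rebuild the object, which is why the default mode requires shared code.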
Example 2 Fixed
# user.rb
class User

  include DRbUndumped

  attr_accessor :username

end

# <DRb::DRbObject:0x1003b84f8 @ref=2149433940,
# Username: 2
# Username: ihower
Why use DRbUndumped?

 • Big objects
 • Singleton objects
 • Lightweight clients
 • Rapidly changing software
ID conversion
• Converts reference into DRb object on server
 • DRbIdConv (Default)
 • TimerIdConv
 • NamedIdConv
 • GWIdConv
Beware of garbage
•   referenced objects may be collected on
    server (usually doesn't matter)
•   Build your own ID converter if you want
    to control persistent state.
DRb security
require 'drb'

ro = DRbObject.new_with_uri("druby://")
class << ro
  undef :instance_eval
end

# !!!!!!!! WARNING !!!!!!!!! DO NOT RUN
ro.instance_eval("`rm -rf *`")

Insecure operation - instance_eval (SecurityError)
DRb security (cont.)

• Access Control Lists (ACLs)
 • via IP address array
 • still can run denial-of-service attack
• DRb over SSL
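An ACL can be tried out on its own before it is installed. This is a minimal sketch using the standard drb/acl library; the addresses are placeholders chosen for illustration, not values from the talk.

```ruby
require 'drb'
require 'drb/acl'

# Deny everyone, then allow only the local machine.
acl = ACL.new(%w[deny all allow 127.0.0.1])

# ACL#allow_addr? takes an address array shaped like Socket#peeraddr:
# [family, port, hostname, ip]
local  = ["AF_INET", 0, "localhost", "127.0.0.1"]
remote = ["AF_INET", 0, "example.test", "192.0.2.10"]

local_allowed  = acl.allow_addr?(local)
remote_allowed = acl.allow_addr?(remote)
puts "local allowed: #{local_allowed}"
puts "remote allowed: #{remote_allowed}"

# Install before DRb.start_service so that new connections are checked.
DRb.install_acl(acl)
```

The ACL only filters by address. A host that is allowed can still flood the server, which is the denial-of-service caveat above.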

• Rinda is a Ruby port of Linda distributed
    computing paradigm.
•   Linda is a model of coordination and communication among several parallel processes
    operating upon objects stored in and retrieved from shared, virtual, associative memory. This
    model is implemented as a "coordination language" in which several primitives operating on
    ordered sequence of typed data objects, "tuples," are added to a sequential language, such
    as C, and a logically global associative memory, called a tuplespace, in which processes
    store and retrieve tuples. (WikiPedia)
Rinda (cont.)

• Rinda consists of:
 • a TupleSpace implementation
 • a RingServer that allows DRb services to
    automatically discover each other.

• Hardcoding IP addresses in a DRb
  program couples applications tightly
  and makes fault tolerance difficult.
• RingServer lets services detect and interact with
  other services on the network without
  knowing their IP addresses.
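RingServer lookup (diagram):
1. Client asks via broadcast UDP: Where is Service X?
2. RingServer replies with the location of Service X
3. Client contacts Service X at that location: Hi, Service X
4. Service X answers: Hi There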
ring server example
require 'rinda/ring'
require 'rinda/tuplespace'

DRb.start_service
Rinda::RingServer.new(Rinda::TupleSpace.new)
DRb.thread.join
service example
require 'rinda/ring'

class HelloWorldServer
  include DRbUndumped # Needed for RingServer

  def say_hello
    'Hello, world!'
  end
end

DRb.start_service
ring_server = Rinda::RingFinger.primary
ring_server.write([:hello_world_service, :HelloWorldServer, HelloWorldServer.new,
                   'I like to say hi!'], Rinda::SimpleRenewer.new)

DRb.thread.join
client example
require 'rinda/ring'

DRb.start_service
ring_server = Rinda::RingFinger.primary

service = ring_server.read([:hello_world_service, nil,nil,nil])
server = service[2]

puts server.say_hello
puts service.inspect

# Hello, world!
# [:hello_world_service, :HelloWorldServer, #<DRb::DRbObject:0x10039b650
# @uri="druby://fe80::21b:63ff:fec9:335f%en1:57416", @ref=2149388540>, "I like
# to say hi!"]

• Shared object space
• Atomic access
• Just like bulletin board
• Tuple template is
  [:name, :Class, object, 'description']
5 Basic Operations

• write
• read
• take (Atomic Read+Delete)
• read_all
• notify (Callback for write/take/delete)
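The operations above can be tried on a local TupleSpace, without a RingServer. This is a minimal sketch, assuming only the standard rinda library; the :job tuples are made up for illustration, and notify is left out to keep it short.

```ruby
require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new

# write: post tuples on the "bulletin board"
ts.write([:job, 1, "resize photo"])
ts.write([:job, 2, "send mail"])

# read: copy a matching tuple without removing it (nil is a wildcard)
first = ts.read([:job, 1, nil])

# read_all: copy every matching tuple
pending_before = ts.read_all([:job, nil, nil]).size

# take: atomically read and delete, so only one worker gets the tuple
taken = ts.take([:job, 2, nil])
pending_after = ts.read_all([:job, nil, nil]).size

puts first.inspect
puts taken.inspect
puts "#{pending_before} -> #{pending_after}"
```

Because take is atomic, several workers can take from the same space without two of them receiving the same tuple.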

• Starfish is a utility to make distributed
  programming ridiculously easy
• It runs both the server and the client in
  infinite loops
• MapReduce with ActiveRecord or Files
starfish foo.rb
# foo.rb

class Foo
  attr_reader :i

  def initialize
    @i = 0
  end

  def inc
    logger.info "YAY it incremented by 1 up to #{@i}"
    @i += 1
  end
end

server :log => "foo.log" do |object|
  object = Foo.new
end

client do |object|
  object.inc
end
starfish server example

   require 'rubygems'
   require 'starfish'

   class HelloWorld
     def say_hi
       'Hi There'
     end
   end

   Starfish.server = lambda do |object|
     object = HelloWorld.new
   end
starfish client example

   require 'rubygems'
   require 'starfish'

   Starfish.client = lambda do |object|
     puts object.say_hi
     exit(0) # exit program immediately
   end

starfish client example                 (another way)


       require 'rubygems'
       require 'starfish'

       catch(:halt) do
         Starfish.client = lambda do |object|
           puts object.say_hi
           throw :halt
         end
       end

       puts "bye bye"

• introduced by Google to support
  distributed computing on large data sets on
  clusters of computers.
• inspired by map and reduce functions
  commonly used in functional programming.
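The functional-programming roots are visible in plain Ruby. This is a minimal single-process sketch (the log lines are made up for illustration): map turns each record into a key/value pair, and reduce folds the pairs into totals. A MapReduce framework distributes these same two steps across machines.

```ruby
lines = [
  "GET /index.html",
  "GET /about.html",
  "POST /login",
  "GET /index.html"
]

# map: turn each input record into a (key, value) pair
pairs = lines.map { |line| [line.split.first, 1] }

# reduce: fold all pairs with the same key into one result
counts = pairs.reduce(Hash.new(0)) do |totals, (verb, n)|
  totals[verb] += n
  totals
end

puts counts.inspect
```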
starfish server example

require 'rubygems'
require 'starfish'

Starfish.server = lambda { |map_reduce|
  map_reduce.type = File
  map_reduce.input = "/var/log/apache2/access.log"
  map_reduce.queue_size = 10
  map_reduce.lines_per_client = 5
  map_reduce.rescan_when_complete = false
}

starfish client example

   require 'rubygems'
   require 'starfish'

   Starfish.client = lambda { |logs|
     logs.each do |log|
       puts "Processing #{log}"
     end
   }

Other implementations
• Skynet
 • Use TupleSpace or MySQL as message queue
 • Include an extension for ActiveRecord
 • http://skynet.rubyforge.org/
• MRToolkit based on Hadoop
 • http://code.google.com/p/mrtoolkit/
MagLev VM

• a fast, stable, Ruby implementation with
  integrated object persistence and
  distributed shared cache.
• http://maglev.gemstone.com/
• public Alpha currently
2. Distributed Message Queues

• Starling
• AMQP/RabbitMQ
• Stomp/ActiveMQ
• beanstalkd
What's a message queue?
(diagram: a client puts Message X on the queue; a processor checks the queue and processes the message)

Why not DRb?

• DRb has security risks and poorly designed APIs
• a distributed message queue is a great way to do
  distributed programming: reliable and scalable.
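The pattern itself fits in a few lines. This is a minimal in-process sketch, with Ruby's built-in Queue standing in for a real queue server: the client only enqueues and moves on, while a separate worker checks the queue and does the processing. The :done marker is an assumption of this sketch, used to stop the worker.

```ruby
# In-process stand-in for a message queue server (illustration only).
queue   = Queue.new
results = []

# The worker checks the queue and processes whatever it finds.
worker = Thread.new do
  while (message = queue.pop) != :done
    results << "processed #{message}"
  end
end

# The client only enqueues messages and moves on.
3.times { |i| queue << "message #{i}" }
queue << :done

worker.join
puts results
```

A distributed queue server plays the same role across processes and machines, so producers and workers can be scaled and restarted independently.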
• a light-weight persistent queue server that
  speaks the Memcache protocol (mimics its API)
• Fast, effective, quick setup and ease of use
• Powered by EventMachine

• Twitter’s open source project; they used it
  before 2009 (they have since switched to Kestrel, a port of Starling from Ruby
  to Scala)
Starling command

• sudo gem install starling-starling
 • http://github.com/starling/starling
• sudo starling -h
• sudo starling_top -h
Starling set example
require 'rubygems'
require 'starling'

starling = Starling.new('')

100.times do |i|
  # set appends to the queue; it does not overwrite as it would in Memcached
  starling.set('my_queue', i)
end
Starling get example
require 'rubygems'
require 'starling'

starling = Starling.new('')

loop do
  puts starling.get("my_queue")
end
get method
• After get, the object is no longer in the
  queue. You will lose the message if an error
  happens during processing.
• The get method blocks until something is
  returned; internally it is an infinite loop.
Handle processing
 error exception
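 require 'rubygems'
 require 'starling'

 starling = Starling.new('')
 results = starling.get("my_queue")

 begin
   puts results.flatten
 rescue NoMethodError => e
   puts e.message
   # put the message back, wrapped so that flatten works next time
   starling.set("my_queue", [results])
 rescue Exception => e
   # put the message back before re-raising, so it is not lost
   starling.set("my_queue", results)
   raise e
 end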
Starling cons

• Clients must poll the queue constantly
• With RabbitMQ you can subscribe to a queue, which
  notifies you when a message is available for
  processing
• a complete and highly reliable enterprise
  messaging system based on the emerging
  AMQP standard.
  • Erlang
• http://github.com/tmm1/amqp
 • Powered by EventMachine

• Apache ActiveMQ is the most popular and
  powerful open source messaging and
  Integration Patterns provider.
• sudo gem install stomp
• ActiveMessaging plugin for Rails
• Beanstalk is a simple, fast workqueue
  service. Its interface is generic, but was
  originally designed for reducing the latency
  of page views in high-volume web
  applications by running time-consuming tasks asynchronously.
• http://kr.github.com/beanstalkd/
• http://beanstalk.rubyforge.org/
• Facebook’s open source project
Why we need asynchronous/
 background-processing in Rails?

• cron-like processing (compute daily statistics data, create reports, full-text
  search index update, etc.)

• long-running tasks (sending mail, resizing photos, encoding videos,
  generating PDFs, image upload to S3, posting something to Twitter, etc.)

 • Server traffic jam: an expensive request will block
     server resources (i.e. your Rails app)
  • Bad user experience: users may try to reload
     and reload again! (responsiveness matters)
   processing for Rails
• script/runner
• rake
• cron
• daemon
• run_later plugin
• spawn plugin

• In Your Rails App root:
• script/runner "Worker.process"

• In RAILS_ROOT/lib/tasks/dev.rake
• rake dev:process
  namespace :dev do
    task :process do
      # ...
    end
  end

• Cron is a time-based job scheduler in Unix-
  like computer operating systems.
• crontab -e

•   A Ruby DSL for Defining Cron Jobs

• http://asciicasts.com/episodes/164-cron-in-ruby
• or http://cronedit.rubyforge.org/
          every 3.hours do
            runner "MyModel.some_process"
            rake "my:rake:task"
            command "/usr/bin/my_great_command"
          end

• http://daemons.rubyforge.org/
• http://github.com/dougal/daemon_generator/

• scheduling pieces of code (jobs)
• Not replacement for cron/at since it runs
  inside of Ruby.
           require 'rubygems'
           require 'rufus/scheduler'

           scheduler = Rufus::Scheduler.start_new

           scheduler.every '5s' do
             puts 'check blood pressure'
           end

Daemon Kit

• Creating Ruby daemons by providing a
  sound application skeleton (through a
  generator), task specific generators (jabber
  bot, etc) and robust environment
  management code.
Monitor your daemon

• http://mmonit.com/monit/
• http://github.com/arya/bluepill
• http://god.rubyforge.org/

• A library for robust daemon management
• Make daemon-dependent applications Just
  Work without having to start the daemons
off-load task via system
# mailings_controller.rb
def deliver
  call_rake :send_mailing, :mailing_id => params[:id].to_i
  flash[:notice] = "Delivering mailing"
  redirect_to mailings_url
end

# controllers/application.rb
# Runs a rake task in the background via the shell
def call_rake(task, options = {})
  options[:rails_env] ||= Rails.env
  args = options.map { |n, v| "#{n.to_s.upcase}='#{v}'" }
  system "/usr/bin/rake #{task} #{args.join(' ')} --trace 2>&1 >> #{Rails.root}/log/rake.log &"
end

# lib/tasks/mailer.rake
desc "Send mailing"
task :send_mailing => :environment do
  mailing = Mailing.find(ENV["MAILING_ID"])
  mailing.deliver
end

# models/mailing.rb
def deliver
  sleep 10 # placeholder for sending email
  update_attribute(:delivered_at, Time.now)
end
Simple Thread

after_filter do
  Thread.new do
    # ...
  end
end
run_later plugin

• Borrowed from Merb
• Uses worker thread and a queue
• Simple solution for simple tasks
  run_later do
    # ...
  end
spawn plugin

  spawn do
    logger.info("I feel sleepy...")
    sleep 11
    logger.info("Time to wake up!")
  end
spawn (cont.)
• By default, spawn uses fork to spawn
  child processes. You can configure it to do
  threading instead.
• Works by creating new database
  connections in ActiveRecord::Base for the
  spawned block.
• Forking needs to copy the Rails process every time
threading vs. forking
•   Forking advantages:
    •   more reliable? - the ActiveRecord code is not thread-safe.
    •   keep running - subprocess can live longer than its parent.
    •   easier - just works with Rails default settings. Threading
        requires you to set allow_concurrency=true. Also,
        beware of automatic reloading of classes in development
        mode (config.cache_classes = false).
•   Threading advantages:
    •   less filling - threads take less resources... how much less?
        it depends.
    •   debugging - you can set breakpoints in your threads
Okay, we need
    reliable messaging system:
•   Persistent
•   Scheduling: not necessarily all at the same time
•   Scalability: just throw in more instances of your
    program to speed up processing
•   Loosely coupled components that merely ‘talk’
    to each other
•   Ability to easily replace Ruby with something
    else for specific tasks
•   Easy to debug and monitor
4.Message Queues
     (for Rails only)
• ar_mailer
• BackgroundDRb
• workling
• delayed_job
• resque
Rails only?

• Easy to use/write code
• Jobs are Ruby classes or objects
• But need to load Rails environment

• a two-phase delivery agent for ActionMailer.
 • Store messages into the database
 • Delivery by a separate process, ar_sendmail

• BackgrounDRb is a Ruby job server and scheduler
• Has scalability problems (~20 servers for Mark Bates)

• Hard to know if processing error
• Use database to persist tasks
• Use memcached to know processing result

• Gives your Rails App a simple API that you
  can use to make code run in the
  background, outside of your request.
• Supports Starling(default), BackgroundJob,
  Spawn and AMQP/RabbitMQ Runners.
• script/plugin install git://github.com/purzelrae/
• sudo starling -p 15151
• RAILS_ENV=production script/
  workling_client start
Workling example
 class EmailWorker < Workling::Base
   def deliver(options)
     user = User.find(options[:id])
     # ...
   end
 end

 # in your controller
 def create
   EmailWorker.asynch_deliver(:id => 1)
 end
• Database backed asynchronous priority queue
• Extracted from Shopify
• you can place any Ruby object on its queue
  as arguments
• Loads the Rails environment only once
delayed_job setup
                (use fork version)

• script/plugin install git://github.com/
• script/generate delayed_job
• rake db:migrate
delayed_job example
def deliver
  mailing = Mailing.find(params[:id])
  mailing.send_later(:deliver)
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
delayed_job example
  custom workers
class MailingJob < Struct.new(:mailing_id)

  def perform
    mailing = Mailing.find(mailing_id)
    mailing.deliver
  end

end

# in your controller
def deliver
  Delayed::Job.enqueue(MailingJob.new(params[:id]))
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
delayed_job example
       always asynchronously

   class Device
     def deliver
       # long running method
     end
     handle_asynchronously :deliver
   end

   device = Device.new
   device.deliver
Running jobs

• rake jobs:work
  (Don’t use in production; it will exit if the database has any network connectivity
  problems)

• RAILS_ENV=production script/delayed_job start
• RAILS_ENV=production script/delayed_job stop
Priority: just an Integer, default is 0

• you can run multiple workers to handle different
  priority jobs
• RAILS_ENV=production script/delayed_job --min-priority 3 start

  Delayed::Job.enqueue(MailingJob.new(params[:id]), 3)

  Delayed::Job.enqueue(MailingJob.new(params[:id]), -3)
        no guarantee of a precise time; the job just runs after run_at

Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 3.days.from_now)

Delayed::Job.enqueue(MailingJob.new(params[:id]),
                     3, 1.month.from_now.beginning_of_month)
Configuring Delayed Job
# config/initializers/delayed_job_config.rb
Delayed::Worker.destroy_failed_jobs = false
Delayed::Worker.sleep_delay = 5 # sleep if empty queue
Delayed::Worker.max_attempts = 25
Delayed::Worker.max_run_time = 4.hours # set to the time the longest task will take
Automatic retry on failure
 • If a method throws an exception it will be
   caught and the method rerun later.
 • The method will be retried up to 25
   (default) times at increasingly longer
   intervals until it passes.
   • 108 hours at most
     Job.db_time_now + (job.attempts ** 4) + 5
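The "108 hours" figure follows from the formula: it is the delay before the 25th attempt. A short sketch of the arithmetic, assuming the delay in seconds is attempts ** 4 + 5 as shown above (retry_delay is a hypothetical helper name, not part of delayed_job):

```ruby
# Delay in seconds before the retry numbered `attempts`, per the formula above.
def retry_delay(attempts)
  attempts**4 + 5
end

max_attempts = 25
delays = (1..max_attempts).map { |n| retry_delay(n) }

puts delays.first(3).inspect          # [6, 21, 86]
longest_hours = delays.last / 3600.0
puts longest_hours.round(1)           # 108.5
```

The first retries happen within seconds, while the last ones are days apart, so a job that fails because of a short outage recovers quickly without hammering the system.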
Capistrano Recipes
• Remember to restart delayed_job after
  deployment
• Check out lib/delayed_job/recipes.rb
   after "deploy:stop",    "delayed_job:stop"
   after "deploy:start",   "delayed_job:start"
   after "deploy:restart", "delayed_job:restart"

•   a Redis-backed library for creating background jobs,
    placing those jobs on multiple queues, and processing
    them later.
•   Github’s open source project
•   you can only place JSONable Ruby objects
•   includes a Sinatra app for monitoring what's going on
•   support multiple queues
•   you expect a lot of failure/chaos
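The "JSONable" restriction matters when choosing job arguments. This is a minimal sketch using only the standard json library; the payload shape (a class name plus an args array) and the MailingJob name are assumptions for illustration. It shows why ids and plain values are safer arguments than rich Ruby objects.

```ruby
require 'json'

# Hypothetical job payload: a class name plus plain arguments.
payload = { "class" => "MailingJob", "args" => [42, { "urgent" => true }] }

encoded = JSON.generate(payload)   # what would be stored in the queue
decoded = JSON.parse(encoded)      # what a worker would read back
same = (decoded == payload)
puts same

# Symbols do not survive the round trip; they come back as strings.
round_trip = JSON.parse(JSON.generate({ :mailing_id => 42, :mode => :fast }))
puts round_trip.inspect
```

Passing a record id and reloading the record inside the job avoids this kind of surprise.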
My recommendations:

• General purpose: delayed_job
  (Github highly recommend DelayedJob to anyone whose site is not 50% background work.)

• Time-scheduled: cron + rake
5. SOA for Rails

• What’s SOA
• Why SOA
• Considerations
• The tool set
What’s SOA
           Service oriented architectures

• “monolithic” approach is not enough
• SOA is a way to design complex applications
  by splitting out major components into
  individual services and communicating via
  APIs
• a service is a vertical slice of functionality:
  database, application code and caching layer
a monolithic web app example



a SOA example


WebApp for Administration    WebApps for User

       Services A                Services B

        Database                  Database
Why SOA? Isolation
• Shared Resources
• Encapsulation
• Scalability
• Interoperability
• Reuse
• Testability
• Reduce Local Complexity
Shared Resources
• Different front-end websites use the same
  resources.
• SOA helps you avoid duplicating databases
  and code.
• Why not just share the database?
 • code is not DRY
 • caching will be problematic

• you can change the underlying implementation in
  services without affecting other parts of the system
 • upgrade library
 • upgrade to Ruby 1.9
• you can provide API versioning
Scalability1: Partitioned
     Data Provides
•   The database is the first bottleneck; a single DB
    server cannot scale. SOA helps you reduce
    database load.
•   Anti-pattern: only splitting the database
    •   model relationships are broken
    •   referential integrity is lost
•   Myth: database replication cannot help you with
    speed and consistency
Scalability 2: Caching

• SOA makes it easier to design a caching system
 • Cache data at the right times and expire
    it at the right times
 • Cache the logical model, not the physical one
 • You do not need to cache views everywhere
Scalability 3: Efficient
• Different components carry different loads;
  with SOA you can scale each service separately.

(diagram: load balancers in front of two instances of Services A and four instances of Services B)

• Different services can be inside different
  firewalls
  • You can open only the public web and
    services; others stay inside the firewall.
• HTTP is the common interface; SOA helps
  you integrate them:
 • Multiple languages
 • Internal system e.g. Full-text searching engine
 • Legacy database, system
 • External vendors

• Reuse across multiple applications
• Reuse for public APIs
• Example: Amazon Web Services (AWS)

• Isolate problem
• Mocking API calls
 • Reduce the time to run test suite
Reduce Local Complexity
• Team modularity along the same module
  splits as your software
• Understandability: The amount of code is
  minimized to a quantity understandable by
  a small team
• Source code control

• Partition into Separate Services
• API Design
• Which Protocol
How to partition into
 Separate Services
• Partitioning on Logical Function
• Partitioning on Read/Write Frequencies
• Partitioning by Minimizing Joins
• Partitioning by Iteration Speed
API Design

• Send Everything you need
• Parallel HTTP requests
• Send as Little as Possible
• Use Logical Models
Physical Models &
     Logical Models
• Physical models are mapped to database
  tables through ORM. (It’s 3NF)
• Logical models are mapped to your
  business problem. (External API use it)
• Logical models are mapped to physical
  models by you.
Logical Models
• Not relational or normalized
• Maintainability
  • can change with no change to data store
  • can stay the same while the data store
    changes
• Better fit for REST interfaces
• Better caching
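The mapping from physical to logical models can be sketched in plain Ruby. All names below (the row hashes and customer_profile) are hypothetical, chosen for illustration; the rows stand in for what an ORM would load from normalized tables.

```ruby
require 'json'

# Hypothetical physical models: normalized rows, one hash per table row.
user      = { id: 1, username: "ihower" }
addresses = [{ user_id: 1, city: "Taipei" }]
orders    = [{ user_id: 1, total: 30 }, { user_id: 1, total: 12 }]

# Logical model: one denormalized document shaped for the API consumer.
def customer_profile(user, addresses, orders)
  {
    "username"    => user[:username],
    "cities"      => addresses.map { |a| a[:city] },
    "order_count" => orders.size,
    "total_spent" => orders.sum { |o| o[:total] }
  }
end

profile = customer_profile(user, addresses, orders)
puts JSON.generate(profile)
```

The service can later split or merge the underlying tables without changing this document, and the document is a natural unit to cache and to expose over REST.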
Which Protocol?

RESTful Web services

• Rails way
• REST is about resources
 • URL
The tool set

• Web framework
• XML Parser
• JSON Parser
• HTTP Client
Web framework

• We do not need much of the controller and view layers
• Rails is a bit heavy for this; how about Sinatra?
• Rails metal

• Mapping RESTful resources as models in a
  Rails application.
• But not useful in practice, why?
XML parser

• http://nokogiri.org/
• Nokogiri (鋸) is an HTML, XML, SAX, and
  Reader parser. Among Nokogiri’s many
  features is the ability to search documents
  via XPath or CSS3 selectors.
JSON Parser

• http://github.com/brianmario/yajl-ruby/
• An extremely efficient streaming JSON
  parsing and encoding library. Ruby C
  bindings to Yajl
HTTP Client

• http://github.com/pauldix/typhoeus/
• Typhoeus runs HTTP requests in parallel
  while cleanly encapsulating handling logic

• Define your logical model (i.e. your service
  request result) first.

• model.to_json and model.to_xml are easy to
  use, but not useful in practice.
6.Distributed File System
 •   NFS does not scale
     •   we can use rsync to duplicate
 •   MogileFS
     •   http://www.danga.com/mogilefs/
     •   http://seattlerb.rubyforge.org/mogilefs-client/
 •   Amazon S3
 •   HDFS (Hadoop Distributed File System)
 •   GlusterFS
7.Distributed Database

• CAP theorem
 • Eventually consistent
• HBase/Cassandra/Voldemort
The End
•   Books&Articles:
    •    Distributed Programming with Ruby, Mark Bates (Addison Wesley)
    •    Enterprise Rails, Dan Chak (O’Reilly)
    •    Service-Oriented Design with Ruby and Rails, Paul Dix (Addison Wesley)
    •    RESTful Web Services, Richardson&Ruby (O’Reilly)
    •    RESTful Web Services Cookbook, Allamaraju&Amundsen (O’Reilly)
    •    Enterprise Recipes with Ruby on Rails, Maik Schmidt (The Pragmatic Programmers)
    •    Ruby in Practice, McAnally&Arkin (Manning)

    •    Building Scalable Web Sites, Cal Henderson (O’Reilly)
    •    Background Processing in Rails, Erik Andrejko (Rails Magazine)
    •    Background Processing with Delayed_Job, James Harrison (Rails Magazine)
•   Slides:
    •    Background Processing (Rob Mack) Austin on Rails - April 2009
    •    The Current State of Asynchronous Processing in Ruby (Mathias Meyer, Peritor GmbH)
    •    Asynchronous Processing (Jonathan Dahl)
    •    Long-Running Tasks In Rails Without Much Effort (Andy Stewart) - April 2008
    •    Starling + Workling: simple distributed background jobs with Twitter’s queuing system, Rany Keddo 2008
    •    Physical Models & Logical Models in Rails, Dan Chak
•   Links:
    •   http://segment7.net/projects/ruby/drb/
    •   http://www.slideshare.net/luccastera/concurrent-programming-with-ruby-and-tuple-spaces
    •   http://github.com/blog/542-introducing-resque
    •   http://www.engineyard.com/blog/2009/5-tips-for-deploying-background-jobs/
    •   http://www.opensourcery.co.za/2008/07/07/messaging-and-ruby-part-1-the-big-picture/
    •   http://leemoonsoo.blogspot.com/2009/04/simple-comparison-open-source.html
    •   http://blog.gslin.org/archives/2009/07/25/2065/
    •   http://www.javaeye.com/topic/524977
    •   http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
Todo (maybe next time)
•   AMQP/RabbitMQ example code
    •   How about Nanite?
•   XMPP
•   MagLev VM
•   More MapReduce example code
    •   How about Amazon Elastic MapReduce?
•   Resque example code
•   More SOA example and code
•   MogileFS example code

