Distributed Ruby and Rails
This document discusses distributed Ruby programming and using message queues with Ruby on Rails applications. It introduces several distributed Ruby technologies including DRb for remote method invocation, Rinda for distributed tuple spaces, Starfish for map-reduce programming, and the MagLev VM. It also covers various message queue systems like Starling, RabbitMQ, ActiveMQ, and beanstalkd that can be used to build scalable and reliable distributed Ruby applications.
2. About Me
• a.k.a. ihower
• http://ihower.tw
• http://twitter.com/ihower
• http://github.com/ihower
• Ruby on Rails Developer since 2006
• Ruby Taiwan Community
• http://ruby.tw
3. Agenda
• Distributed Ruby
• Distributed Message Queues
• Background-processing in Rails
• Message Queues for Rails
• SOA for Rails
• Distributed Filesystem
• Distributed database
5. DRb
• Ruby's RMI system
(remote method invocation)
• an object in one Ruby process can invoke
methods on an object in another Ruby
process on the same or a different machine
6. DRb (cont.)
• no defined interface, faster development time
• tightly couples applications: there is no
defined API, just methods on objects
• unreliable in large-scale, heavy-load
production environments
7. server example 1
require 'drb'
class HelloWorldServer
  def say_hello
    'Hello, world!'
  end
end
# Expose one HelloWorldServer instance as the front object
DRb.start_service("druby://127.0.0.1:61676",
                  HelloWorldServer.new)
DRb.thread.join
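A client for a server like this one might look like the following minimal sketch. To keep it self-contained it starts the same kind of service in-process on an OS-assigned port; the `druby://127.0.0.1:0` URI and the `fetch_greeting` helper are assumptions for illustration and are not part of the slides. Against the real server you would pass its URI (`druby://127.0.0.1:61676`) instead.

```ruby
require 'drb'

class HelloWorldServer
  def say_hello
    'Hello, world!'
  end
end

# Hypothetical helper: build a proxy for the remote front object and call it.
def fetch_greeting(uri)
  remote = DRbObject.new_with_uri(uri)
  remote.say_hello
end

# Port 0 asks the OS for a free port; DRb.uri then reports the actual URI.
DRb.start_service('druby://127.0.0.1:0', HelloWorldServer.new)
puts fetch_greeting(DRb.uri)
DRb.stop_service
```

The client only needs the URI; it never references the HelloWorldServer class itself.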
10. server example 2
require 'drb'
require_relative 'user' # user.rb lives next to this file
class UserServer
  attr_accessor :users
  def find(id)
    users[id - 1]
  end
end
user_server = UserServer.new
user_server.users = []
5.times do |i|
  user = User.new
  user.username = i + 1
  user_server.users << user
end
DRb.start_service("druby://127.0.0.1:61676", user_server)
DRb.thread.join
13. Why? DRbUndumped
• Default DRb operation
• Pass by value
• Must share code
• With DRbUndumped
• Pass by reference
• No need to share code
14. Example 2 Fixed
# user.rb
require 'drb'
class User
  include DRbUndumped
  attr_accessor :username
end
# Client output:
# <DRb::DRbObject:0x1003b84f8 @ref=2149433940, @uri="druby://127.0.0.1:61676">
# Username: 2
# Username: ihower
15. Why use DRbUndumped?
• Big objects
• Singleton objects
• Lightweight clients
• Rapidly changing software
16. ID conversion
• Converts reference into DRb object on server
• DRbIdConv (Default)
• TimerIdConv
• NamedIdConv
• GWIdConv
17. Beware of garbage collection
• referenced objects may be collected on
server (usually doesn't matter)
• Build your own ID converter if you want
to control persistent state.
18. DRb security
require 'drb'
ro = DRbObject.new_with_uri("druby://127.0.0.1:61676")
# Undefine the local instance_eval so the call is forwarded to the remote object
class << ro
  undef :instance_eval
end
# !!!!!!!! WARNING !!!!!!!!! DO NOT RUN
ro.instance_eval("`rm -rf *`")
20. DRb security (cont.)
• Access Control Lists (ACLs)
• via IP address array
• a denial-of-service attack is still possible
• DRb over SSL
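The ACL bullet can be sketched with the `drb/acl` library that ships with DRb. The rule list below is an assumption for illustration (the addresses are placeholders, not values from the slides); it denies everyone and then allows localhost and one subnet.

```ruby
require 'drb'
require 'drb/acl'

# Placeholder rules: deny all, then allow localhost and one example subnet.
ACL_RULES = %w[
  deny all
  allow 127.0.0.1
  allow 192.168.1.0/24
].freeze

acl = ACL.new(ACL_RULES)

# Install the ACL before DRb.start_service so it applies to new connections.
DRb.install_acl(acl)

# ACL#allow_addr? takes an address array in the Socket#peeraddr format:
# [family, port, hostname, ip]
p acl.allow_addr?(['AF_INET', 0, 'localhost', '127.0.0.1'])
p acl.allow_addr?(['AF_INET', 0, 'example.test', '10.0.0.5'])
```

An ACL only filters by address, so a host on the allowed list can still send the server any message it likes.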
21. Rinda
• Rinda is a Ruby port of the Linda distributed
computing paradigm.
• Linda is a model of coordination and communication among several parallel processes
operating upon objects stored in and retrieved from shared, virtual, associative memory. This
model is implemented as a "coordination language" in which several primitives operating on
ordered sequence of typed data objects, "tuples," are added to a sequential language, such
as C, and a logically global associative memory, called a tuplespace, in which processes
store and retrieve tuples. (Wikipedia)
22. Rinda (cont.)
• Rinda consists of:
• a TupleSpace implementation
• a RingServer that allows DRb services to
automatically discover each other.
23. RingServer
• Hardcoding IP addresses in a DRb
program couples applications tightly
and makes fault tolerance difficult.
• RingServer can detect and interact with
other services on the network without
knowing IP addresses.
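24. RingServer discovery
• Client @ 192.168.1.100, Service X @ 192.168.1.12
1. Client → RingServer (via broadcast UDP address): "Where Service X?"
2. RingServer → Client: "Service X: 192.168.1.12"
3. Client → Service X: "Hi, Service X @ 192.168.1.12"
4. Service X → Client: "Hi There 192.168.1.100"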
25. ring server example
require 'rinda/ring'
require 'rinda/tuplespace'
DRb.start_service
Rinda::RingServer.new(Rinda::TupleSpace.new)
DRb.thread.join
26. service example
require 'rinda/ring'
class HelloWorldServer
  include DRbUndumped # Needed for RingServer
  def say_hello
    'Hello, world!'
  end
end
DRb.start_service
ring_server = Rinda::RingFinger.primary
ring_server.write([:hello_world_service, :HelloWorldServer, HelloWorldServer.new,
                   'I like to say hi!'], Rinda::SimpleRenewer.new)
DRb.thread.join
27. client example
require 'rinda/ring'
DRb.start_service
ring_server = Rinda::RingFinger.primary
service = ring_server.read([:hello_world_service, nil, nil, nil])
server = service[2]
puts server.say_hello
puts service.inspect
# Hello, world!
# [:hello_world_service, :HelloWorldServer,
#  #<DRb::DRbObject:0x10039b650 @uri="druby://fe80::21b:63ff:fec9:335f%en1:57416", @ref=2149388540>,
#  "I like to say hi!"]
28. TupleSpaces
• Shared object space
• Atomic access
• Just like a bulletin board
• Tuple template is
[:name, :Class, object, 'description']
29. 5 Basic Operations
• write
• read
• take (Atomic Read+Delete)
• read_all
• notify (Callback for write/take/delete)
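The operations above can be tried against a local `Rinda::TupleSpace` without any network setup. This is a minimal sketch; the `:job` tuples are made-up sample data, and `notify` is left out to keep it short.

```ruby
require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new

# write: post tuples to the shared space
ts.write([:job, 1, 'resize photo'])
ts.write([:job, 2, 'send mail'])

# read: copy the first tuple matching the template (nil is a wildcard)
p ts.read([:job, nil, nil])

# read_all: copy every matching tuple without removing any
p ts.read_all([:job, nil, nil]).size

# take: atomically read and delete a matching tuple
p ts.take([:job, 2, nil])

# Only the tuples that were not taken are still in the space
p ts.read_all([:job, nil, nil]).size
```

Because `take` removes the tuple atomically, two workers calling it with the same template will never receive the same tuple.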
30. Starfish
• Starfish is a utility to make distributed
programming ridiculously easy
• It runs both the server and the client in
infinite loops
• MapReduce with ActiveRecord or Files
31. starfish foo.rb
# foo.rb
class Foo
  attr_reader :i
  def initialize
    @i = 0
  end
  def inc
    logger.info "YAY it incremented by 1 up to #{@i}"
    @i += 1
  end
end
server :log => "foo.log" do |object|
  object = Foo.new
end
client do |object|
  object.inc
end
32. starfish server example
ARGV.unshift('server.rb')
require 'rubygems'
require 'starfish'
class HelloWorld
  def say_hi
    'Hi There'
  end
end
Starfish.server = lambda do |object|
  object = HelloWorld.new
end
Starfish.new('hello_world').server
33. starfish client example
ARGV.unshift('client.rb')
require 'rubygems'
require 'starfish'
Starfish.client = lambda do |object|
  puts object.say_hi
  exit(0) # exit program immediately
end
Starfish.new('hello_world').client
34. starfish client example (another way)
ARGV.unshift('server.rb')
require 'rubygems'
require 'starfish'
catch(:halt) do
  Starfish.client = lambda do |object|
    puts object.say_hi
    throw :halt
  end
  Starfish.new('hello_world').client
end
puts "bye bye"
35. MapReduce
• introduced by Google to support
distributed computing on large data sets on
clusters of computers.
• inspired by map and reduce functions
commonly used in functional programming.
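The map and reduce steps can be illustrated with a plain-Ruby word count. This is only a single-process sketch of the idea; the `CHUNKS` data and the helper names are made up for illustration, and in a real MapReduce system the map step would run on many workers.

```ruby
# Hypothetical input: each string stands for one chunk of a larger data set.
CHUNKS = [
  'ruby rails ruby',
  'rails drb',
  'ruby rinda drb'
].freeze

# Map step: turn one chunk into a hash of partial word counts.
def map_chunk(chunk)
  chunk.split.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

# Reduce step: merge partial counts by summing the values for each word.
def reduce_counts(partials)
  partials.reduce({}) do |total, partial|
    total.merge(partial) { |_word, a, b| a + b }
  end
end

partials = CHUNKS.map { |chunk| map_chunk(chunk) } # could be distributed
p reduce_counts(partials)
```

Each chunk is mapped independently of the others, which is what makes the map step easy to spread across machines.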
37. starfish client example
ARGV.unshift('client.rb')
require 'rubygems'
require 'starfish'
Starfish.client = lambda { |logs|
  logs.each do |log|
    puts "Processing #{log}"
    sleep(1)
  end
}
Starfish.new("log_server").client
38. Other implementations
• Skynet
• Use TupleSpace or MySQL as message queue
• Include an extension for ActiveRecord
• http://skynet.rubyforge.org/
• MRToolkit based on Hadoop
• http://code.google.com/p/mrtoolkit/
39. MagLev VM
• a fast, stable, Ruby implementation with
integrated object persistence and
distributed shared cache.
• http://maglev.gemstone.com/
• public Alpha currently
42. Why not DRb?
• DRb has security risks and poorly designed APIs
• a distributed message queue is a great way to do
distributed programming: reliable and scalable.
43. Starling
• a light-weight persistent queue server that
speaks the Memcache protocol (mimics its
API)
• Fast, effective, quick setup and ease of use
• Powered by EventMachine
http://eventmachine.rubyforge.org/EventMachine.html
• Twitter’s open source project; they used it
before 2009 (they have since switched to Kestrel, a port of Starling from Ruby
to Scala)
45. Starling set example
require 'rubygems'
require 'starling'
starling = Starling.new('192.168.1.4:22122')
100.times do |i|
  # set appends to the queue; it does not overwrite as in Memcached
  starling.set('my_queue', i)
end
46. Starling get example
require 'rubygems'
require 'starling'
starling = Starling.new('192.168.2.4:22122')
loop do
  puts starling.get("my_queue")
end
47. get method
• FIFO
• After get, the object is no longer in the
queue. You will lose the message if a processing
error happens.
• The get method blocks until something is
returned; internally it is an infinite loop.
48. Handle processing error exception
require 'rubygems'
require 'starling'
starling = Starling.new('192.168.2.4:22122')
results = starling.get("my_queue")
begin
  puts results.flatten
rescue NoMethodError => e
  puts e.message
  # put the message back (wrapped in an array) so it is not lost
  starling.set("my_queue", [results])
rescue Exception => e
  # put the message back on the queue before re-raising
  starling.set("my_queue", results)
  raise e
end
49. Starling cons
• Clients must poll the queue constantly
• With RabbitMQ you can instead subscribe to a queue
and be notified when a message is available for
processing.
50. AMQP/RabbitMQ
• a complete and highly reliable enterprise
messaging system based on the emerging
AMQP standard.
• RabbitMQ is written in Erlang
• http://github.com/tmm1/amqp
• Powered by EventMachine
51. Stomp/ActiveMQ
• Apache ActiveMQ is the most popular and
powerful open source messaging and
Integration Patterns provider.
• sudo gem install stomp
• ActiveMessaging plugin for Rails
52. beanstalkd
• Beanstalk is a simple, fast workqueue
service. Its interface is generic, but was
originally designed for reducing the latency
of page views in high-volume web
applications by running time-consuming tasks
asynchronously.
• http://kr.github.com/beanstalkd/
• http://beanstalk.rubyforge.org/
• Open source; originally built for the Causes application on Facebook
53. Why we need asynchronous/background-processing in Rails?
• cron-like processing (compute daily statistics data, create reports, full-text search index updates, etc.)
• long-running tasks (sending mail, resizing photos, encoding videos,
generating PDFs, uploading images to S3, posting something to Twitter, etc.)
• Server traffic jam: an expensive request will block
server resources (i.e. your Rails app)
• Bad user experience: users may try to reload
again and again (responsiveness matters)
57. cron
• Cron is a time-based job scheduler in Unix-
like computer operating systems.
• crontab -e
58. Whenever
http://github.com/javan/whenever
• A Ruby DSL for Defining Cron Jobs
• http://asciicasts.com/episodes/164-cron-in-ruby
• or http://cronedit.rubyforge.org/
every 3.hours do
  runner "MyModel.some_process"
  rake "my:rake:task"
  command "/usr/bin/my_great_command"
end
60. rufus-scheduler
http://github.com/jmettraux/rufus-scheduler
• scheduling pieces of code (jobs)
• Not a replacement for cron/at, since it runs
inside of Ruby.
require 'rubygems'
require 'rufus/scheduler'
scheduler = Rufus::Scheduler.start_new
scheduler.every '5s' do
  puts 'check blood pressure'
end
scheduler.join
61. Daemon Kit
http://github.com/kennethkalmer/daemon-kit
• Simplifies creating Ruby daemons by providing a
sound application skeleton (through a
generator), task-specific generators (jabber
bot, etc.) and robust environment
management code.
62. Monitor your daemon
• http://mmonit.com/monit/
• http://github.com/arya/bluepill
• http://god.rubyforge.org/
66. run_later plugin
http://github.com/mattmatt/run_later
• Borrowed from Merb
• Uses worker thread and a queue
• Simple solution for simple tasks
run_later do
  AccountMailer.deliver_signup(@user)
end
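The "worker thread and a queue" idea can be sketched in plain Ruby as follows. This is not the plugin's actual implementation; `BackgroundRunner` and its methods are hypothetical names chosen for illustration.

```ruby
# Hypothetical class: one worker thread drains a queue of blocks.
class BackgroundRunner
  def initialize
    @queue = Queue.new
    @worker = Thread.new do
      # Queue#pop blocks until a job arrives; a nil job means "shut down".
      while (job = @queue.pop)
        begin
          job.call
        rescue StandardError => e
          warn "background job failed: #{e.message}"
        end
      end
    end
  end

  # Enqueue a block and return immediately.
  def run_later(&block)
    @queue << block
  end

  # Let queued jobs finish, then stop the worker.
  def shutdown
    @queue << nil
    @worker.join
  end
end

runner = BackgroundRunner.new
runner.run_later { puts 'sending signup mail...' }
runner.shutdown
```

Jobs queued this way live only in memory, so they are lost if the process dies; that is why this approach suits simple tasks only.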
68. spawn (cont.)
• By default, spawn uses fork to create
child processes. You can configure it to use
threads instead.
• Works by creating new database
connections in ActiveRecord::Base for the
spawned block.
• Forking has to copy the Rails process every time
69. threading vs. forking
• Forking advantages:
• more reliable? - the ActiveRecord code is not thread-safe.
• keep running - subprocess can live longer than its parent.
• easier - just works with Rails default settings. Threading
requires you to set allow_concurrency=true. Also,
beware of automatic reloading of classes in development
mode (config.cache_classes = false).
• Threading advantages:
• less filling - threads take less resources... how much less?
it depends.
• debugging - you can set breakpoints in your threads
70. Okay, we need a reliable messaging system:
• Persistent
• Scheduling: not necessarily all at the same time
• Scalability: just throw in more instances of your
program to speed up processing
• Loosely coupled components that merely ‘talk’
to each other
• Ability to easily replace Ruby with something
else for specific tasks
• Easy to debug and monitor
72. Rails only?
• Easy to use/write code
• Jobs are Ruby classes or objects
• But need to load Rails environment
73. ar_mailer
http://seattlerb.rubyforge.org/ar_mailer/
• a two-phase delivery agent for ActionMailer.
• Store messages into the database
• Delivered later by a separate process, ar_sendmail.
74. BackgrounDRb
http://backgroundrb.rubyforge.org/
• BackgrounDRb is a Ruby job server and
scheduler.
• Has scalability problems (~20 servers for Mark Bates)
• Hard to know whether a processing error occurred
• Uses the database to persist tasks
• Uses memcached to report processing results
75. workling
http://github.com/purzelrakete/workling
• Gives your Rails app a simple API that you
can use to make code run in the
background, outside of your request.
• Supports Starling(default), BackgroundJob,
Spawn and AMQP/RabbitMQ Runners.
77. Workling example
class EmailWorker < Workling::Base
  def deliver(options)
    user = User.find(options[:id])
    user.deliver_activation_email
  end
end
# in your controller
def create
  EmailWorker.asynch_deliver(:id => 1)
end
78. delayed_job
• Database backed asynchronous priority
queue
• Extracted from Shopify
• you can place any Ruby object on its queue
as arguments
• Loads the Rails environment only once
80. delayed_job example: send_later
def deliver
  mailing = Mailing.find(params[:id])
  mailing.send_later(:deliver)
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
81. delayed_job example: custom workers
class MailingJob < Struct.new(:mailing_id)
  def perform
    mailing = Mailing.find(mailing_id)
    mailing.deliver
  end
end
# in your controller
def deliver
  Delayed::Job.enqueue(MailingJob.new(params[:id]))
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
82. delayed_job example: always asynchronously
class Device
  def deliver
    # long running method
  end
  handle_asynchronously :deliver
end
device = Device.new
device.deliver
83. Running jobs
• rake jobs:work
(Don’t use in production; it will exit if the database has any network connectivity
problems.)
• RAILS_ENV=production script/delayed_job start
• RAILS_ENV=production script/delayed_job stop
84. Priority
• priority is just an Integer; the default is 0
• you can run multiple workers to handle different
priority jobs
• RAILS_ENV=production script/delayed_job --min-priority 3 start
Delayed::Job.enqueue(MailingJob.new(params[:id]), 3)
Delayed::Job.enqueue(MailingJob.new(params[:id]), -3)
85. Scheduled
• no guarantee of a precise time; the job just runs after the given time
Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 3.days.from_now)
Delayed::Job.enqueue(MailingJob.new(params[:id]),
                     3, 1.month.from_now.beginning_of_month)
86. Configuring Delayed Job
# config/initializers/delayed_job_config.rb
Delayed::Worker.destroy_failed_jobs = false
Delayed::Worker.sleep_delay = 5 # seconds to sleep when the queue is empty
Delayed::Worker.max_attempts = 25
Delayed::Worker.max_run_time = 4.hours # set to the time your longest task will take
87. Automatic retry on failure
• If a method throws an exception it will be
caught and the method rerun later.
• The method will be retried up to 25
(default) times at increasingly longer
intervals until it passes.
• The longest interval is about 108 hours (25 ** 4 seconds)
• Next run time: Job.db_time_now + (job.attempts ** 4) + 5
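The retry formula on this slide can be worked through numerically. The sketch below simply evaluates `attempts ** 4 + 5` seconds for each attempt; `retry_delay` and `MAX_ATTEMPTS` are names introduced here for illustration. The "108 hours" figure appears to correspond to 25 ** 4 seconds.

```ruby
MAX_ATTEMPTS = 25

# Delay in seconds before the next try, per the formula on the slide.
def retry_delay(attempts)
  attempts**4 + 5
end

(1..MAX_ATTEMPTS).each do |attempts|
  seconds = retry_delay(attempts)
  puts format('after failure %2d: wait %6d s (%.1f h)', attempts, seconds, seconds / 3600.0)
end
```

The first few retries happen within seconds or minutes, while the last ones are spaced days apart, so a short outage is retried quickly and a long one does not flood the queue.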
88. Capistrano Recipes
• Remember to restart delayed_job after
deployment
• Check out lib/delayed_job/recipes.rb
after "deploy:stop", "delayed_job:stop"
after "deploy:start", "delayed_job:start"
after "deploy:restart", "delayed_job:restart"
89. Resque
http://github.com/defunkt/resque
• a Redis-backed library for creating background jobs,
placing those jobs on multiple queues, and processing
them later.
• GitHub’s open source project
• you can only place JSONable Ruby objects on its queues
• includes a Sinatra app for monitoring what's going on
• supports multiple queues
• a good fit when you expect a lot of failure/chaos
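The "JSONable" restriction matters because job arguments are serialized to JSON between enqueue and perform. The sketch below simulates that round trip with the standard `json` library only; Resque itself is not loaded, and `MailingDelivery` and `round_trip` are hypothetical names for illustration.

```ruby
require 'json'

# Hypothetical job in the style Resque expects: a class with a queue name
# and a class-level perform method.
class MailingDelivery
  @queue = :mailings

  def self.perform(mailing_id, options)
    "delivering mailing #{mailing_id} as #{options['format']}"
  end
end

# Simulate what happens to arguments between enqueue and perform:
# they are serialized to JSON and parsed back on the worker side.
def round_trip(*args)
  JSON.parse(JSON.generate(args))
end

args = round_trip(42, { format: :html })
p args # symbol keys and values come back as strings
puts MailingDelivery.perform(*args)

# With Resque and Redis available, the equivalent call would be:
#   Resque.enqueue(MailingDelivery, 42, 'format' => 'html')
```

This is why jobs usually receive record IDs and plain hashes rather than model objects.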
90. My recommendations:
• General purpose: delayed_job
(GitHub highly recommends DelayedJob to anyone whose site is not 50% background work.)
• Time-scheduled: cron + rake
91. 5. SOA for Rails
• What’s SOA
• Why SOA
• Considerations
• The tool set
92. What’s SOA
Service-oriented architecture
• the “monolithic” approach is not enough
• SOA is a way to design complex applications
by splitting out major components into
individual services and communicating via
APIs.
• a service is a vertical slice of functionality:
database, application code and caching layer
93. a monolithic web app example
[Diagram: request → Load Balancer → WebApps → Database]
94. a SOA example
[Diagram components: requests, Load Balancer, WebApp for Administration, WebApps for User, Services A, Services B, and one Database per service]
96. Shared Resources
• Different front-end websites use the same
resources.
• SOA helps you avoid duplicating databases
and code.
• Why not just share the database?
• code is not DRY
• caching will be problematic
[Diagram: WebApp for Administration and WebApps for User sharing one Database]
97. Encapsulation
• you can change the underlying implementation of
services without affecting other parts of the system
• upgrade library
• upgrade to Ruby 1.9
• you can provide API versioning
98. Scalability 1: Partitioned Data Provides
• The database is the first bottleneck; a single DB
server cannot scale. SOA helps you reduce
database load
• Anti-pattern: only splitting the database
• model relationships are broken
• referential integrity
[Diagram: WebApps connected directly to Database A and Database B]
• Myth: database replication can not help you with
speed and consistency
99. Scalability 2: Caching
• SOA makes it easier to design a caching system
• Cache data at the right times and expire it
at the right times
• Cache the logical model, not the physical one
• You do not need to cache views everywhere
100. Scalability 3: Efficient
• Different components carry different
loads; SOA lets you scale each service separately.
[Diagram: WebApps, two Load Balancers, two instances of Services A and four instances of Services B]
101. Security
• Different services can sit behind different
firewalls
• You can expose only the public web apps and
services; the others stay inside the firewall.
102. Interoperability
• HTTP is the common interface; SOA helps
you integrate:
• Multiple languages
• Internal system e.g. Full-text searching engine
• Legacy database, system
• External vendors
103. Reuse
• Reuse across multiple applications
• Reuse for public APIs
• Example: Amazon Web Services (AWS)
105. Reduce Local Complexity
• Team modularity along the same module
splits as your software
• Understandability: The amount of code is
minimized to a quantity understandable by
a small team
• Source code control
107. How to partition into Separate Services
• Partitioning on Logical Function
• Partitioning on Read/Write Frequencies
• Partitioning by Minimizing Joins
• Partitioning by Iteration Speed
108. API Design
• Send Everything you need
• Parallel HTTP requests
• Send as Little as Possible
• Use Logical Models
109. Physical Models & Logical Models
• Physical models are mapped to database
tables through ORM. (It’s 3NF)
• Logical models are mapped to your
business problem. (External API use it)
• Logical models are mapped to physical
models by you.
110. Logical Models
• Not relational or normalized
• Maintainability
• can change with no change to data store
• can stay the same while the data store
changes
• Better fit for REST interfaces
• Better caching
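The difference between physical and logical models can be sketched in plain Ruby. The Structs below stand in for ORM-backed tables and the hash stands in for what a service would return; all names and fields are assumptions made up for illustration.

```ruby
require 'json'

# Hypothetical physical models: one Struct per normalized table.
UserRow    = Struct.new(:id, :username)
AddressRow = Struct.new(:id, :user_id, :city)

# Hypothetical logical model: the denormalized shape the service API returns.
def logical_user(user, addresses)
  {
    'username' => user.username,
    'cities'   => addresses.select { |a| a.user_id == user.id }.map(&:city)
  }
end

user = UserRow.new(1, 'ihower')
addresses = [AddressRow.new(10, 1, 'Taipei'), AddressRow.new(11, 2, 'Hsinchu')]

puts JSON.generate(logical_user(user, addresses))
```

Clients see only the hash, so the tables behind `logical_user` can be split or renamed without changing the API.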
116. XML parser
• http://nokogiri.org/
• Nokogiri (鋸) is an HTML, XML, SAX, and
Reader parser. Among Nokogiri’s many
features is the ability to search documents
via XPath or CSS3 selectors.
119. Tips
• Define your logical model (i.e. your service
request result) first.
• model.to_json and model.to_xml are easy to
use, but not useful in practice.
120. 6. Distributed File System
• NFS does not scale
• we can use rsync to duplicate files
• MogileFS
• http://www.danga.com/mogilefs/
• http://seattlerb.rubyforge.org/mogilefs-client/
• Amazon S3
• HDFS (Hadoop Distributed File System)
• GlusterFS
123. References
• Books&Articles:
• Distributed Programming with Ruby, Mark Bates (Addison Wesley)
• Enterprise Rails, Dan Chak (O’Reilly)
• Service-Oriented Design with Ruby and Rails, Paul Dix (Addison Wesley)
• RESTful Web Services, Richardson&Ruby (O’Reilly)
• RESTful Web Services Cookbook, Allamaraju&Amundsen (O’Reilly)
• Enterprise Recipes with Ruby on Rails, Maik Schmidt (The Pragmatic Programmers)
• Ruby in Practice, McAnally&Arkin (Manning)
• Building Scalable Web Sites, Cal Henderson (O’Reilly)
• Background Processing in Rails, Erik Andrejko (Rails Magazine)
• Background Processing with Delayed_Job, James Harrison (Rails Magazine)
• Web 点 ( )
• Slides:
• Background Processing (Rob Mack) Austin on Rails - April 2009
• The Current State of Asynchronous Processing in Ruby (Mathias Meyer, Peritor GmbH)
• Asynchronous Processing (Jonathan Dahl)
• Long-Running Tasks In Rails Without Much Effort (Andy Stewart) - April 2008
• Starling + Workling: simple distributed background jobs with Twitter’s queuing system, Rany Keddo 2008
• Physical Models & Logical Models in Rails, dan chak
125. Todo (maybe next time)
• AMQP/RabbitMQ example code
• How about Nanite?
• XMPP
• MagLev VM
• More MapReduce example code
• How about Amazon Elastic MapReduce?
• Resque example code
• More SOA example and code
• MogileFS example code