Distributed
Ruby and Rails
      @ihower
   http://ihower.tw
        2010/1
About Me
•           a.k.a. ihower
    • http://ihower.tw
    • http://twitter.com/ihower
    • http://github.com/ihower
• Ruby on Rails Developer since 2006
• Ruby Taiwan Community
 • http://ruby.tw
Agenda
• Distributed Ruby
• Distributed Message Queues
• Background-processing in Rails
• Message Queues for Rails
• SOA for Rails
• Distributed Filesystem
• Distributed database
1.Distributed Ruby

• DRb
• Rinda
• Starfish
• MapReduce
• MagLev VM
DRb

• Ruby's RMI (remote method invocation) system


• an object in one Ruby process can invoke
  methods on an object in another Ruby
  process on the same or a different machine
DRb (cont.)

• no defined interface, so development is fast
• tightly couples applications, because there is no
  defined API, only methods on objects
• unreliable in large-scale, heavy-load
  production environments
server example 1
require 'drb'

class HelloWorldServer

      def say_hello
          'Hello, world!'
      end

end

DRb.start_service("druby://127.0.0.1:61676", HelloWorldServer.new)
DRb.thread.join
client example 1
require 'drb'

server = DRbObject.new_with_uri("druby://127.0.0.1:61676")

puts server.say_hello
puts server.inspect

# Hello, world!
# <DRb::DRbObject:0x1003c04c8 @ref=nil, @uri="druby://127.0.0.1:61676">
example 2
# user.rb
class User

  attr_accessor :username

end
server example 2
require 'drb'
require_relative 'user'  # plain require 'user' fails on Ruby 1.9.2+ ('.' is no longer on $LOAD_PATH)

class UserServer

  attr_accessor :users

  def find(id)
    self.users[id-1]
  end

end

user_server = UserServer.new
user_server.users = []
5.times do |i|
  user = User.new
  user.username = i + 1
  user_server.users << user
end

DRb.start_service("druby://127.0.0.1:61676", user_server)
DRb.thread.join
client example 2
require 'drb'

user_server = DRbObject.new_with_uri("druby://127.0.0.1:61676")

user = user_server.find(2)

puts user.inspect
puts "Username: #{user.username}"
user.username = "ihower"
puts "Username: #{user.username}"
Err...

# <DRb::DRbUnknown:0x1003b8318 @name="User", @buf="\004\bo:\tUser\006:\016@usernamei\a">
# client2.rb:8: undefined method `username' for #<DRb::DRbUnknown:0x1003b8318> (NoMethodError)
Why? DRbUndumped
•   Default DRb operation

    • Pass by value
    • Must share code
• With DRbUndumped
 • Pass by reference
 • No need to share code
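The difference can be observed without a network: DRb marshals arguments and return values, and including DRbUndumped makes an object refuse to be marshaled, so DRb sends a reference (a DRbObject) instead of a copy. A minimal sketch; the class names `PlainUser` and `UndumpedUser` and the helper `marshalable?` are hypothetical:

```ruby
require 'drb/drb'

# Pass by value: Marshal.dump works, so DRb sends a copy and the
# receiving side needs the PlainUser class definition to load it.
class PlainUser
  attr_accessor :username
end

# Pass by reference: DRbUndumped defines _dump to raise TypeError,
# so DRb falls back to sending a DRbObject proxy.
class UndumpedUser
  include DRbUndumped
  attr_accessor :username
end

# Hypothetical helper: can this object be copied over the wire?
def marshalable?(obj)
  Marshal.dump(obj)
  true
rescue TypeError
  false
end

puts marshalable?(PlainUser.new)     # sent as a copy
puts marshalable?(UndumpedUser.new)  # sent as a reference
```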
Example 2 Fixed
# user.rb
class User

  include DRbUndumped

  attr_accessor :username

end

# <DRb::DRbObject:0x1003b84f8 @ref=2149433940, @uri="druby://127.0.0.1:61676">
# Username: 2
# Username: ihower
Why use DRbUndumped?

 • Big objects
 • Singleton objects
 • Lightweight clients
 • Rapidly changing software
ID conversion
• Converts reference into DRb object on server
 • DRbIdConv (Default)
 • TimerIdConv
 • NamedIdConv
 • GWIdConv
Beware of garbage collection
•   objects referenced only by remote clients may be
    garbage collected on the server (usually this doesn't matter)
•   build your own ID converter if you want
    to control persistent state.
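Before writing a custom converter, one of the converters listed above may be enough. As we understand it, DRb::TimerIdConv (from `drb/timeridconv`) keeps each object handed out by reference alive for a given number of seconds after its last access. A sketch, assuming a 60-second window suits your clients; the variable names are illustrative:

```ruby
require 'drb/drb'
require 'drb/timeridconv'

# Keep objects passed by reference alive for 60 seconds after their last
# remote access. The default DRbIdConv does not hold on to them, so an
# object referenced only by a remote client can be garbage collected.
conv = DRb::TimerIdConv.new(60)
DRb.install_id_conv(conv)   # used by DRb servers started after this call

# What DRb does internally when it passes an object by reference:
session = Object.new
ref  = conv.to_id(session)  # id sent to the client
same = conv.to_obj(ref)     # id resolved back on the next remote call
puts same.equal?(session)
```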
DRb security
require 'drb'

ro = DRbObject.new_with_uri("druby://127.0.0.1:61676")
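class << ro
  # remove the local Object#instance_eval so the call falls through to
  # method_missing and is executed on the server
  undef :instance_eval
end

# !!!!!!!! WARNING !!!!!!!!! DO NOT RUN
ro.instance_eval("`rm -rf *`")

# Server-side fix (Ruby 1.8/1.9; $SAFE has no effect as of Ruby 3.0)
$SAFE = 1

# instance_eval': Insecure operation - instance_eval (SecurityError)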
DRb security (cont.)

• Access Control Lists (ACLs)
 • via IP address array
 • still vulnerable to denial-of-service attacks
• DRb over SSL
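An ACL can be built from a flat list of "allow"/"deny" pairs and installed before starting the server. A minimal sketch, assuming the service should only be reachable from localhost and a hypothetical 192.168.1.0/24 subnet; as noted above, this does not protect against denial-of-service attacks:

```ruby
require 'drb/drb'
require 'drb/acl'

# Deny everyone, then allow localhost and one (hypothetical) app subnet.
# In the default "deny, allow" order, a matching allow entry wins.
acl = ACL.new(%w[
  deny  all
  allow 127.0.0.1
  allow 192.168.1.0/24
])

# Applies to DRb servers started after this call (DRb.start_service).
DRb.install_acl(acl)

# ACL#allow_addr? takes an address array shaped like Socket#peeraddr.
puts acl.allow_addr?(["AF_INET", 50000, "localhost", "127.0.0.1"])
puts acl.allow_addr?(["AF_INET", 50000, "intruder", "10.0.0.66"])
```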
Rinda

• Rinda is a Ruby port of the Linda distributed
    computing paradigm.
•   Linda is a model of coordination and communication among several parallel processes
    operating upon objects stored in and retrieved from shared, virtual, associative memory. This
    model is implemented as a "coordination language" in which several primitives operating on
    ordered sequence of typed data objects, "tuples," are added to a sequential language, such
    as C, and a logically global associative memory, called a tuplespace, in which processes
    store and retrieve tuples. (Wikipedia)
Rinda (cont.)

• Rinda consists of:
 • a TupleSpace implementation
 • a RingServer that allows DRb services to
    automatically discover each other.
RingServer

• Hardcoding IP addresses in a DRb program
  tightly couples applications and makes fault
  tolerance difficult.
• RingServer can detect and interact with
  other services on the network without
  knowing IP addresses.
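[Figure: service discovery with a RingServer]
1. Client (@ 192.168.1.100) to RingServer (via broadcast UDP address): "Where is Service X?"
2. RingServer to Client: "Service X: 192.168.1.12"
3. Client to Service X (@ 192.168.1.12): "Hi, Service X @ 192.168.1.12"
4. Service X to Client: "Hi There 192.168.1.100"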
ring server example
require 'rinda/ring'
require 'rinda/tuplespace'

DRb.start_service
Rinda::RingServer.new(Rinda::TupleSpace.new)
DRb.thread.join
service example
require 'rinda/ring'

class HelloWorldServer
  include DRbUndumped # needed so the RingServer stores a reference, not a copy

  def say_hello
    'Hello, world!'
  end

end

DRb.start_service
ring_server = Rinda::RingFinger.primary
ring_server.write([:hello_world_service, :HelloWorldServer, HelloWorldServer.new,
                   'I like to say hi!'], Rinda::SimpleRenewer.new)

DRb.thread.join
client example
require 'rinda/ring'

DRb.start_service
ring_server = Rinda::RingFinger.primary

service = ring_server.read([:hello_world_service, nil, nil, nil])
server = service[2]

puts server.say_hello
puts service.inspect

# Hello, world!
# [:hello_world_service, :HelloWorldServer, #<DRb::DRbObject:0x10039b650 @uri="druby://fe80::21b:63ff:fec9:335f%en1:57416", @ref=2149388540>, "I like to say hi!"]
TupleSpaces

• Shared object space
• Atomic access
• Works like a bulletin board
• Tuple template for services registered with the RingServer:
  [:name, :Class, object, 'description']
5 Basic Operations

• write
• read
• take (Atomic Read+Delete)
• read_all
• notify (Callback for write/take/delete)
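The five operations can be tried in a single process, without a RingServer; over DRb the same calls are made on a remote TupleSpace. A minimal sketch, with made-up tuples like `[:job, 1]`:

```ruby
require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new

# notify: subscribe to future 'write' events whose tuple matches [:job, nil]
observer = ts.notify('write', [:job, nil])

ts.write([:job, 1])
ts.write([:job, 2])
ts.write([:log, 'started'])

# read: non-destructive; nil in a template matches any value
found = ts.read([:job, nil])

# read_all: every matching tuple, still non-destructive
all_jobs = ts.read_all([:job, nil])

# take: atomically read and remove one matching tuple,
# so two workers can never take the same job
taken = ts.take([:job, nil])
remaining = ts.read_all([:job, nil])

# each notification is an [event, tuple] pair
first_event = observer.pop

p found, all_jobs, taken, remaining, first_event
```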
Starfish

• Starfish is a utility to make distributed
  programming ridiculously easy
• It runs both the server and the client in
  infinite loops
• MapReduce with ActiveRecord or Files
starfish foo.rb
# foo.rb

class Foo
  attr_reader :i

  def initialize
    @i = 0
  end

  def inc
    logger.info "YAY it incremented by 1 up to #{@i}"
    @i += 1
  end
end

server :log => "foo.log" do |object|
  object = Foo.new
end

client do |object|
  object.inc
end
starfish server example
   ARGV.unshift('server.rb')

   require 'rubygems'
   require 'starfish'

   class HelloWorld
     def say_hi
       'Hi There'
     end
   end

   Starfish.server = lambda do |object|
       object = HelloWorld.new
   end

   Starfish.new('hello_world').server
starfish client example
   ARGV.unshift('client.rb')

   require 'rubygems'
   require 'starfish'

   Starfish.client = lambda do |object|
     puts object.say_hi
     exit(0) # exit program immediately
   end

   Starfish.new('hello_world').client
starfish client example                 (another way)


   ARGV.unshift('client.rb')

   require 'rubygems'
   require 'starfish'

   catch(:halt) do
     Starfish.client = lambda do |object|
       puts object.say_hi
       throw :halt
     end

     Starfish.new('hello_world').client
   end

   puts "bye bye"
MapReduce

• introduced by Google to support
  distributed computing on large data sets on
  clusters of computers.
• inspired by map and reduce functions
  commonly used in functional programming.
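The map, shuffle, and reduce phases can be illustrated in a single process. This is a sketch of the idea only, not Starfish's API: it counts tokens in a few made-up log lines, where a real cluster would run `map_phase` on many workers and `reduce_phase` per key:

```ruby
# Map: emit a [token, 1] pair for every token in one input line
def map_phase(line)
  line.downcase.scan(/\w+/).map { |token| [token, 1] }
end

# Shuffle: group the intermediate pairs by key
def shuffle(pairs)
  pairs.group_by(&:first).transform_values { |kvs| kvs.map(&:last) }
end

# Reduce: combine all values collected for one key
def reduce_phase(_token, counts)
  counts.sum
end

lines = [
  "GET /index.html",
  "GET /about.html",
  "POST /index.html"
]

intermediate = lines.flat_map { |line| map_phase(line) }
token_counts = shuffle(intermediate).to_h { |token, counts| [token, reduce_phase(token, counts)] }

p token_counts
```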
starfish server example
ARGV.unshift('server.rb')

require 'rubygems'
require 'starfish'

Starfish.server = lambda{ |map_reduce|
  map_reduce.type = File
  map_reduce.input = "/var/log/apache2/access.log"
  map_reduce.queue_size = 10
  map_reduce.lines_per_client = 5
  map_reduce.rescan_when_complete = false
}

Starfish.new('log_server').server
starfish client example
   ARGV.unshift('client.rb')

   require 'rubygems'
   require 'starfish'

   Starfish.client = lambda { |logs|
     logs.each do |log|
       puts "Processing #{log}"
       sleep(1)
     end
   }

   Starfish.new("log_server").client
Other implementations
• Skynet
 • Uses TupleSpace or MySQL as a message queue
 • Includes an extension for ActiveRecord
 • http://skynet.rubyforge.org/
• MRToolkit based on Hadoop
 • http://code.google.com/p/mrtoolkit/
MagLev VM

• a fast, stable, Ruby implementation with
  integrated object persistence and
  distributed shared cache.
• http://maglev.gemstone.com/
• currently in public alpha
2.Distributed Message
       Queues

• Starling
• AMQP/RabbitMQ
• Stomp/ActiveMQ
• beanstalkd
what’s a message queue?
[Figure: a Client sends Message X to a Queue; a Processor checks the Queue and processes the message]
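The roles in the figure can be mimicked in one process with Ruby's built-in Queue. Unlike Starling or RabbitMQ, this queue is neither persistent nor shared between machines; it only shows the client/queue/processor split. The `:done` sentinel is an illustrative convention:

```ruby
queue = Queue.new
processed = []

# Client: puts messages on the queue and moves on
client = Thread.new do
  3.times { |i| queue << "message #{i}" }
  queue << :done            # sentinel telling the processor to stop
end

# Processor: Queue#pop blocks until a message is available; FIFO order
processor = Thread.new do
  while (msg = queue.pop) != :done
    processed << "processed #{msg}"
  end
end

[client, processor].each(&:join)
puts processed
```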
Why not DRb?

• DRb has security risks and poorly designed APIs
• a distributed message queue is a great way to do
  distributed programming: reliable and scalable.
Starling
• a light-weight persistent queue server that
  speaks the Memcache protocol (mimics its
  API)
• Fast, effective, quick setup and ease of use
• Powered by EventMachine
  http://eventmachine.rubyforge.org/EventMachine.html
• Twitter’s open source project; Twitter used it
  before 2009 (it has since switched to Kestrel, a port of Starling from Ruby
  to Scala)
Starling command

• sudo gem install starling-starling
 • http://github.com/starling/starling
• sudo starling -h 192.168.1.100
• sudo starling_top -h 192.168.1.100
Starling set example
require 'rubygems'
require 'starling'

starling = Starling.new('192.168.1.4:22122')

100.times do |i|
  starling.set('my_queue', i)  # appends to the queue; does not overwrite as in memcached
end
Starling get example
require 'rubygems'
require 'starling'

starling = Starling.new('192.168.2.4:22122')

loop do
  puts starling.get("my_queue")
end
get method
• FIFO
• After get, the object is no longer in the
  queue, so you will lose the message if a
  processing error occurs.
• The get method blocks until something is
  returned, so the loop above never ends.
Handle processing errors
 require 'rubygems'
 require 'starling'

 starling = Starling.new('192.168.2.4:22122')
 results = starling.get("my_queue")

 begin
     puts results.flatten
 rescue NoMethodError => e
     # results was not an array: log it and re-queue it wrapped in one
     puts e.message
     starling.set("my_queue", [results])
 rescue Exception => e
     # any other failure: put the message back, then re-raise
     starling.set("my_queue", results)
     raise e
 end
Starling cons

• You have to poll the queue constantly
• With RabbitMQ, you can subscribe to a queue and
  get notified when a message is available for
  processing.
AMQP/RabbitMQ
• a complete and highly reliable enterprise
  messaging system based on the emerging
  AMQP standard.
  • written in Erlang
• http://github.com/tmm1/amqp
 • Powered by EventMachine
Stomp/ActiveMQ

• Apache ActiveMQ is the most popular and
  powerful open source messaging and
  Integration Patterns provider.
• sudo gem install stomp
• ActiveMessaging plugin for Rails
beanstalkd
• Beanstalk is a simple, fast workqueue
  service. Its interface is generic, but was
  originally designed for reducing the latency
  of page views in high-volume web
  applications by running time-consuming tasks
  asynchronously.
• http://kr.github.com/beanstalkd/
• http://beanstalk.rubyforge.org/
• Facebook’s open source project
Why we need asynchronous/
 background-processing in Rails?

• cron-like processing (compute daily statistics, create reports,
  update full-text search indexes, etc.)
• long-running tasks (sending mail, resizing photos, encoding videos,
  generating PDFs, uploading images to S3, posting to Twitter, etc.)
 • Server traffic jam: expensive requests block
     server resources (i.e. your Rails app)
 • Bad user experience: users may reload
     again and again! (responsiveness matters)
3.Background-
   processing for Rails
• script/runner
• rake
• cron
• daemon
• run_later plugin
• spawn plugin
script/runner


• In your Rails app root:
• script/runner "Worker.process"
rake

• In RAILS_ROOT/lib/tasks/dev.rake
• rake dev:process
  namespace :dev do
    task :process do
          #...
    end
  end
cron

• Cron is a time-based job scheduler in Unix-
  like computer operating systems.
• crontab -e
Whenever
          http://github.com/javan/whenever

•   A Ruby DSL for Defining Cron Jobs

• http://asciicasts.com/episodes/164-cron-in-ruby
• or http://cronedit.rubyforge.org/
          every 3.hours do
            runner "MyModel.some_process"
            rake "my:rake:task"
            command "/usr/bin/my_great_command"
          end
Daemon

• http://daemons.rubyforge.org/
• http://github.com/dougal/daemon_generator/
rufus-scheduler
   http://github.com/jmettraux/rufus-scheduler


• scheduling pieces of code (jobs)
• Not a replacement for cron/at, since it runs
  inside a Ruby process.

require 'rubygems'
require 'rufus/scheduler'

scheduler = Rufus::Scheduler.start_new  # rufus-scheduler 2.x; 3.x uses Rufus::Scheduler.new

scheduler.every '5s' do
  puts 'check blood pressure'
end

scheduler.join
Daemon Kit
   http://github.com/kennethkalmer/daemon-kit



• Creating Ruby daemons by providing a
  sound application skeleton (through a
  generator), task specific generators (jabber
  bot, etc) and robust environment
  management code.
Monitor your daemon

• http://mmonit.com/monit/
• http://github.com/arya/bluepill
• http://god.rubyforge.org/
daemon_controller
http://github.com/FooBarWidget/daemon_controller




• A library for robust daemon management
• Make daemon-dependent applications Just
  Work without having to start the daemons
  manually.
off-load task via system
       command
# mailings_controller.rb
def deliver
  call_rake :send_mailing, :mailing_id => params[:id].to_i
  flash[:notice] = "Delivering mailing"
  redirect_to mailings_url
end

# controllers/application.rb
def call_rake(task, options = {})
  options[:rails_env] ||= Rails.env
  args = options.map { |n, v| "#{n.to_s.upcase}='#{v}'" }
  # send stdout to the log first, then stderr to stdout, so both are logged
  system "/usr/bin/rake #{task} #{args.join(' ')} --trace >> #{Rails.root}/log/rake.log 2>&1 &"
end

# lib/tasks/mailer.rake
desc "Send mailing"
task :send_mailing => :environment do
  mailing = Mailing.find(ENV["MAILING_ID"])
  mailing.deliver
end

# models/mailing.rb
def deliver
  sleep 10 # placeholder for sending email
  update_attribute(:delivered_at, Time.now)
end
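`call_rake` builds a single shell string. A possible alternative (Ruby 1.9+) is `Process.spawn`, which takes environment variables as a hash and the command as separate arguments, so no shell quoting is involved. A sketch; `spawn_in_background` is a hypothetical helper, and the demo runs the current Ruby interpreter instead of rake so it works outside a Rails app:

```ruby
require 'rbconfig'

# Hypothetical helper: start a command in the background without a shell,
# appending both stdout and stderr to the log file.
def spawn_in_background(env, *command, log:)
  pid = Process.spawn(env, *command,
                      out: [log, 'a'],        # append stdout to the log
                      err: [:child, :out])    # stderr goes where stdout goes
  Process.detach(pid)                         # reap the child when it exits
end

# In a Rails app this might look like (not run here):
#   spawn_in_background({ 'RAILS_ENV' => Rails.env, 'MAILING_ID' => id.to_s },
#                       'rake', 'send_mailing', '--trace',
#                       log: Rails.root.join('log/rake.log').to_s)

waiter = spawn_in_background({ 'MAILING_ID' => '42' },
                             RbConfig.ruby, '-e', 'puts "mailing #{ENV["MAILING_ID"]}"',
                             log: 'rake_demo.log')
waiter.join   # only for the demo; a controller would return immediately
```

`Process.detach` returns a thread whose value is the child's exit status, so a controller can ignore it and respond right away.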
Simple Thread

after_filter do
    Thread.new do
        AccountMailer.deliver_signup(@user)
    end
end
run_later plugin
      http://github.com/mattmatt/run_later


• Borrowed from Merb
• Uses worker thread and a queue
• Simple solution for simple tasks
  run_later do
      AccountMailer.deliver_signup(@user)
  end
spawn plugin
http://github.com/tra/spawn


  spawn do
    logger.info("I feel sleepy...")
    sleep 11
    logger.info("Time to wake up!")
  end
spawn (cont.)
• By default, spawn uses fork to create
  child processes. You can configure it to use
  threads instead.
• Works by creating new database
  connections in ActiveRecord::Base for the
  spawned block.
• Forking copies the whole Rails process every time
threading vs. forking
•   Forking advantages:
    •   more reliable? - the ActiveRecord code is not thread-safe.
    •   keep running - a subprocess can outlive its parent.
    •   easier - just works with Rails default settings. Threading
        requires you to set allow_concurrency = true. Also,
        beware of automatic reloading of classes in development
        mode (config.cache_classes = false).
•   Threading advantages:
    •   less filling - threads use fewer resources... how many fewer?
        It depends.
    •   debugging - you can set breakpoints in your threads
Okay, we need a
    reliable messaging system:
•   Persistent
•   Scheduling: not necessarily all at the same time
•   Scalability: just throw in more instances of your
    program to speed up processing
•   Loosely coupled components that merely ‘talk’
    to each other
•   Ability to easily replace Ruby with something
    else for specific tasks
•   Easy to debug and monitor
4.Message Queues
     (for Rails only)
• ar_mailer
• BackgrounDRb
• workling
• delayed_job
• resque
Rails only?

• Easy to use/write code
• Jobs are Ruby classes or objects
• But they need to load the Rails environment
ar_mailer
       http://seattlerb.rubyforge.org/ar_mailer/



• a two-phase delivery agent for ActionMailer.
 • Stores messages in the database
 • A separate process, ar_sendmail, delivers
    them later.
BackgrounDRb
            http://backgroundrb.rubyforge.org/

• BackgrounDRb is a Ruby job server and
  scheduler.
• Has scalability problems (~20 servers for
  Mark Bates)
• Hard to know whether processing failed
• Uses a database to persist tasks
• Uses memcached to report processing results
workling
     http://github.com/purzelrakete/workling




• Gives your Rails App a simple API that you
  can use to make code run in the
  background, outside of your request.
• Supports Starling(default), BackgroundJob,
  Spawn and AMQP/RabbitMQ Runners.
Workling/Starling
         setup
• script/plugin install git://github.com/purzelrakete/workling.git
• sudo starling -p 15151
• RAILS_ENV=production script/workling_client start
Workling example
 class EmailWorker < Workling::Base
   def deliver(options)
     user = User.find(options[:id])
     user.deliver_activation_email
   end
 end


 # in your controller
 def create
     EmailWorker.asynch_deliver( :id => 1)
 end
delayed_job
• Database backed asynchronous priority
  queue
• Extracted from Shopify
• you can place any Ruby object on its queue
  as arguments
• Loads the Rails environment only once
delayed_job setup
                (use the collectiveidea fork)

• script/plugin install git://github.com/collectiveidea/delayed_job.git
• script/generate delayed_job
• rake db:migrate
delayed_job example
     send_later
def deliver
  mailing = Mailing.find(params[:id])
  mailing.send_later(:deliver)
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
delayed_job example
  custom workers
class MailingJob < Struct.new(:mailing_id)

  def perform
    mailing = Mailing.find(mailing_id)
    mailing.deliver
  end

end

# in your controller
def deliver
  Delayed::Job.enqueue(MailingJob.new(params[:id]))
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
delayed_job example
       always asynchronously


   class Device
     def deliver
       # long running method
     end
     handle_asynchronously :deliver
   end

   device = Device.new
   device.deliver
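To see the idea behind `handle_asynchronously`, here is a toy, in-memory imitation: the original method is renamed and replaced by one that only records a job, and a separate "worker" runs it later. This is not delayed_job's implementation; `ToyAsync`, `JOBS`, and `work_off` are hypothetical stand-ins for the plugin, the delayed_jobs table, and the worker process:

```ruby
JOBS = []   # stands in for the delayed_jobs table

module ToyAsync
  def handle_asynchronously(method)
    original = :"#{method}_without_toy_async"
    alias_method original, method             # keep the real method
    define_method(method) do |*args|
      JOBS << [self, original, args]          # enqueue instead of running
      nil
    end
  end
end

# Roughly the role of the delayed_job worker: run queued jobs in order
def work_off
  until JOBS.empty?
    receiver, name, args = JOBS.shift
    receiver.send(name, *args)
  end
end

class Device
  extend ToyAsync
  attr_reader :delivered

  def deliver
    @delivered = true   # long running method
  end
  handle_asynchronously :deliver
end

device = Device.new
device.deliver                       # returns immediately
pending = device.delivered.nil?      # nothing delivered yet
queued = JOBS.size
work_off
p pending, queued, device.delivered
```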
Running jobs

• rake jobs:work
  (Don’t use it in production: it will exit if the database has any network connectivity
  problems.)


• RAILS_ENV=production script/delayed_job start
• RAILS_ENV=production script/delayed_job stop
Priority
                  just an Integer, default is 0

• you can run multiple workers to handle jobs of different
  priorities
• RAILS_ENV=production script/delayed_job --min-priority 3 start

  Delayed::Job.enqueue(MailingJob.new(params[:id]), 3)

  Delayed::Job.enqueue(MailingJob.new(params[:id]), -3)
Scheduled
        no guarantee of running at a precise time; the job runs some time after run_at



Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 3.days.from_now)

Delayed::Job.enqueue(MailingJob.new(params[:id]),
                                    3, 1.month.from_now.beginning_of_month)
Configuring Delayed Job
# config/initializers/delayed_job_config.rb
Delayed::Worker.destroy_failed_jobs = false
Delayed::Worker.sleep_delay = 5 # seconds to sleep when the queue is empty
Delayed::Worker.max_attempts = 25
Delayed::Worker.max_run_time = 4.hours # set to the time your longest task will take
Automatic retry on failure
 • If a method throws an exception it will be
   caught and the method rerun later.
 • The method will be retried up to 25
   (default) times at increasingly longer
   intervals until it passes.
   • a single interval can reach about 108 hours (25**4 seconds);
     the next run is scheduled at
     Job.db_time_now + (job.attempts ** 4) + 5
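The back-off formula can be evaluated directly to see how long retries may stretch. A sketch, assuming every one of the 25 attempts is followed by a wait of attempts**4 + 5 seconds; whether the final attempt is rescheduled depends on the delayed_job version, so treat the totals as an upper bound:

```ruby
MAX_ATTEMPTS = 25

# Wait (in seconds) after the n-th failed attempt: n**4 + 5
delays = (1..MAX_ATTEMPTS).map { |attempts| attempts**4 + 5 }

longest_hours = delays.max / 3600.0
total_days    = delays.sum / 86_400.0

puts format("longest single wait: %.1f hours", longest_hours)
puts format("all waits combined:  %.1f days", total_days)
```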
Capistrano Recipes
• Remember to restart delayed_job after
  deployment
• Check out lib/delayed_job/recipes.rb
   after "deploy:stop",    "delayed_job:stop"
   after "deploy:start",   "delayed_job:start"
   after "deploy:restart", "delayed_job:restart"
Resque
             http://github.com/defunkt/resque

•   a Redis-backed library for creating background jobs,
    placing those jobs on multiple queues, and processing
    them later.
•   Github’s open source project
•   you can only place JSON-serializable Ruby objects on its queues
•   includes a Sinatra app for monitoring what's going on
•   supports multiple queues
•   suited to cases where you expect a lot of failure/chaos
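Resque stores each job as JSON (a class name plus an array of arguments), so job arguments must survive a JSON round trip. A small sketch of what that implies; `json_round_trip` is a hypothetical helper:

```ruby
require 'json'

# What happens to job arguments on their way through the queue
def json_round_trip(args)
  JSON.parse(JSON.generate(args))
end

ok    = [42, 'welcome', { 'retry' => true }]
lossy = [:welcome, Time.utc(2010, 1, 1)]

p json_round_trip(ok) == ok   # numbers, strings, hashes survive
p json_round_trip(lossy)      # the Symbol and the Time come back as Strings
```

This is one reason a common pattern is to enqueue a record ID and load the record inside the job, rather than passing the object itself.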
My recommendations:

• General purpose: delayed_job
  (GitHub highly recommends DelayedJob to anyone whose site is not 50% background work.)



• Time-scheduled: cron + rake
5. SOA for Rails

• What’s SOA
• Why SOA
• Considerations
• The tool set
What’s SOA
           Service oriented architectures



• a “monolithic” approach is not enough
• SOA is a way to design complex applications
  by splitting out major components into
  individual services and communicating via
  APIs.
• a service is a vertical slice of functionality:
  database, application code and caching layer
a monolithic web app example
[Figure: request → Load Balancer → WebApps → Database]
a SOA example
[Figure: user requests go through a Load Balancer to the WebApps for User; admin requests go to a WebApp for Administration; both call Services A and Services B, each backed by its own Database]
Why SOA? Isolation
• Shared Resources
• Encapsulation
• Scalability
• Interoperability
• Reuse
• Testability
• Reduce Local Complexity
Shared Resources
• Different front-end websites use the same
  resources.
• SOA helps you avoid duplicating databases
  and code.
• Why not just share the database?
 • code is not DRY
 • caching will be problematic
[Figure: WebApp for Administration and WebApps for User both using one shared Database]
Encapsulation

• you can change the underlying implementation of a
  service without affecting other parts of the system
 • upgrade library
 • upgrade to Ruby 1.9
• you can provide API versioning
Scalability 1: Partitioned
     Data Providers
•   The database is the first bottleneck; a single DB
    server cannot scale. SOA helps you reduce
    database load
•   Anti-pattern: only split the database
    •   model relationships are broken
    •   referential integrity is lost
[Figure: WebApps talking directly to Database A and Database B]
•   Myth: database replication cannot help you with
    speed and consistency
Scalability 2: Caching

• SOA makes it easier to design a caching system
 • Cache data at the right times and expire it
    at the right times
 • Cache the logical model, not the physical one
 • You do not need to cache views everywhere
Scalability 3: Efficient
• Different components have different loads;
  with SOA you can scale each service independently.
[Figure: WebApps call two Load Balancers; one fronts two Services A instances, the other four Services B instances]
Security

• Different services can sit behind different
  firewalls
  • You can expose only the public web apps and
    services; everything else stays behind the firewall.
Interoperability
• HTTP is the common interface; SOA helps
  you integrate:
 • Multiple languages
 • Internal system e.g. Full-text searching engine
 • Legacy database, system
 • External vendors
Reuse

• Reuse across multiple applications
• Reuse for public APIs
• Example: Amazon Web Services (AWS)
Testability

• Isolate problems
• Mocking API calls
 • Reduce the time to run test suite
Reduce Local
         Complexity
• Team modularity along the same module
  splits as your software
• Understandability: The amount of code is
  minimized to a quantity understandable by
  a small team
• Source code control
Considerations

• Partition into Separate Services
• API Design
• Which Protocol
How to partition into
 Separate Services
• Partitioning on Logical Function
• Partitioning on Read/Write Frequencies
• Partitioning by Minimizing Joins
• Partitioning by Iteration Speed
API Design

• Send Everything you need
• Parallel HTTP requests
• Send as Little as Possible
• Use Logical Models
Physical Models &
     Logical Models
• Physical models are mapped to database
  tables through ORM. (It’s 3NF)
• Logical models are mapped to your
  business problem. (External API use it)
• Logical models are mapped to physical
  models by you.
Logical Models
• Not relational or normalized
• Maintainability
  • can change with no change to data store
  • can stay the same while the data store
    changes
• Better fit for REST interfaces
• Better caching
Which Protocol?

• SOAP
• XML-RPC
• REST
RESTful Web services

• Rails way
• REST is about resources
 • URL
 • Verbs: GET/PUT/POST/DELETE
The tool set

• Web framework
• XML Parser
• JSON Parser
• HTTP Client
Web framework

• We don't need much of the controller and view layers
• Rails is a bit heavy for this; how about Sinatra?
• Rails metal
ActiveResource

• Mapping RESTful resources as models in a
  Rails application.
• But not useful in practice, why?
XML parser

• http://nokogiri.org/
• Nokogiri is an HTML, XML, SAX, and
  Reader parser. Among Nokogiri’s many
  features is the ability to search documents
  via XPath or CSS3 selectors.
JSON Parser

• http://github.com/brianmario/yajl-ruby/
• An extremely efficient streaming JSON
  parsing and encoding library. Ruby C
  bindings to Yajl
HTTP Client


• http://github.com/pauldix/typhoeus/
• Typhoeus runs HTTP requests in parallel
  while cleanly encapsulating handling logic
Tips

• Define your logical model (i.e. your service
  request result) first.

• model.to_json and model.to_xml are easy to
  use, but not useful in practice.
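A sketch of defining the logical model first: physical rows (as an ORM would load them) are mapped by hand into the resource the API returns, instead of calling `to_json` on the model. `UserRow`, `ArticleRow`, `user_resource`, and the sample data are all hypothetical:

```ruby
require 'json'

# Physical models: normalized rows, including storage-only columns
UserRow    = Struct.new(:id, :login, :email, :password_digest, :created_at)
ArticleRow = Struct.new(:id, :user_id, :title, :body)

# Logical model: exactly what API clients need, denormalized,
# with no storage details (no password digest, no foreign keys)
def user_resource(user, articles)
  owned = articles.select { |a| a.user_id == user.id }
  {
    'id'       => user.id,
    'login'    => user.login,
    'articles' => owned.map { |a| { 'id' => a.id, 'title' => a.title } }
  }
end

user     = UserRow.new(1, 'ihower', 'user@example.com', 'x1y2z3', '2010-01-01')
articles = [ArticleRow.new(10, 1, 'Distributed Ruby', '...'),
            ArticleRow.new(11, 2, 'Someone else', '...')]

resource = user_resource(user, articles)
# Compare with JSON.generate(user.to_h), which would expose every column.
puts JSON.generate(resource)
```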
6.Distributed File System
 •   NFS does not scale
     •   we can use rsync to duplicate files
 •   MogileFS
     •   http://www.danga.com/mogilefs/
     •   http://seattlerb.rubyforge.org/mogilefs-client/
 •   Amazon S3
 •   HDFS (Hadoop Distributed File System)
 •   GlusterFS
7.Distributed Database

• NoSQL
• CAP theorem
 • Eventually consistent
• HBase/Cassandra/Voldemort
The End
References
•   Books&Articles:
    •    Distributed Programming with Ruby, Mark Bates (Addison Wesley)
    •    Enterprise Rails, Dan Chak (O’Reilly)
    •    Service-Oriented Design with Ruby and Rails, Paul Dix (Addison Wesley)
    •    RESTful Web Services, Richardson & Ruby (O’Reilly)
    •    RESTful Web Services Cookbook, Allamaraju & Amundsen (O’Reilly)
    •    Enterprise Recipes with Ruby on Rails, Maik Schmidt (The Pragmatic Programmers)
    •    Ruby in Practice, McAnally & Arkin (Manning)

    •    Building Scalable Web Sites, Cal Henderson (O’Reilly)
    •    Background Processing in Rails, Erik Andrejko (Rails Magazine)
    •    Background Processing with Delayed_Job, James Harrison (Rails Magazine)
    •                 Web   点          (                 )
•   Slides:
    •    Background Processing (Rob Mack) Austin on Rails - April 2009
    •    The Current State of Asynchronous Processing in Ruby (Mathias Meyer, Peritor GmbH)
    •    Asynchronous Processing (Jonathan Dahl)
    •    Long-Running Tasks In Rails Without Much Effort (Andy Stewart) - April 2008
    •    Starling + Workling: simple distributed background jobs with Twitter’s queuing system, Rany Keddo 2008
    •    Physical Models & Logical Models in Rails, Dan Chak
References
•   Links:
    •   http://segment7.net/projects/ruby/drb/
    •   http://www.slideshare.net/luccastera/concurrent-programming-with-ruby-and-tuple-spaces
    •   http://github.com/blog/542-introducing-resque
    •   http://www.engineyard.com/blog/2009/5-tips-for-deploying-background-jobs/
    •   http://www.opensourcery.co.za/2008/07/07/messaging-and-ruby-part-1-the-big-picture/
    •   http://leemoonsoo.blogspot.com/2009/04/simple-comparison-open-source.html
    •   http://blog.gslin.org/archives/2009/07/25/2065/
    •   http://www.javaeye.com/topic/524977
    •   http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
Todo (maybe next time)
•   AMQP/RabbitMQ example code
    •   How about Nanite?
•   XMPP
•   MagLev VM
•   More MapReduce example code
    •   How about Amazon Elastic MapReduce?
•   Resque example code
•   More SOA example and code
•   MogileFS example code

More Related Content

Distributed Ruby and Rails

  • 1. Distributed Ruby and Rails @ihower http://ihower.tw 2010/1
  • 2. About Me • a.k.a. ihower • http://ihower.tw • http://twitter.com/ihower • http://github.com/ihower • Ruby on Rails Developer since 2006 • Ruby Taiwan Community • http://ruby.tw
  • 3. Agenda • Distributed Ruby • Distributed Message Queues • Background-processing in Rails • Message Queues for Rails • SOA for Rails • Distributed Filesystem • Distributed database
  • 4. 1.Distributed Ruby • DRb • Rinda • Starfish • MapReduce • MagLev VM
  • 5. DRb • Ruby's RMI system (remote method invocation) • an object in one Ruby process can invoke methods on an object in another Ruby process on the same or a different machine
  • 6. DRb (cont.) • no defined interface, faster development time • tightly couple applications, because no defined API, but rather method on objects • unreliable under large-scale, heavy loads production environments
  • 7. server example 1 require 'drb' class HelloWorldServer def say_hello 'Hello, world!' end end DRb.start_service("druby://127.0.0.1:61676", HelloWorldServer.new) DRb.thread.join
  • 8. client example 1 require 'drb' server = DRbObject.new_with_uri("druby://127.0.0.1:61676") puts server.say_hello puts server.inspect # Hello, world! # <DRb::DRbObject:0x1003c04c8 @ref=nil, @uri="druby:// 127.0.0.1:61676">
  • 9. example 2 # user.rb class User attr_accessor :username end
  • 10. server example 2 require 'drb' require 'user' class UserServer attr_accessor :users def find(id) self.users[id-1] end end user_server = UserServer.new user_server.users = [] 5.times do |i| user = User.new user.username = i + 1 user_server.users << user end DRb.start_service("druby://127.0.0.1:61676", user_server) DRb.thread.join
  • 11. client example 2 require 'drb' user_server = DRbObject.new_with_uri("druby://127.0.0.1:61676") user = user_server.find(2) puts user.inspect puts "Username: #{user.username}" user.name = "ihower" puts "Username: #{user.username}"
  • 12. Err... # <DRb::DRbUnknown:0x1003b8318 @name="User", @buf="004bo: tUser006:016@usernameia"> # client2.rb:8: undefined method `username' for #<DRb::DRbUnknown:0x1003b8318> (NoMethodError)
  • 13. Why? DRbUndumped • Default DRb operation • Pass by value • Must share code • With DRbUndumped • Pass by reference • No need to share code
  • 14. Example 2 Fixed # user.rb class User include DRbUndumped attr_accessor :username end # <DRb::DRbObject:0x1003b84f8 @ref=2149433940, @uri="druby://127.0.0.1:61676"> # Username: 2 # Username: ihower
  • 15. Why use DRbUndumped? • Big objects • Singleton objects • Lightweight clients • Rapidly changing software
  • 16. ID conversion • Converts reference into DRb object on server • DRbIdConv (Default) • TimerIdConv • NamedIdConv • GWIdConv
  • 17. Beware of garbage collection • referenced objects may be collected on server (usually doesn't matter) • Building Your own ID Converter if you want to control persistent state.
  • 18. DRb security
      require 'drb'
      ro = DRbObject.new_with_uri("druby://127.0.0.1:61676")
      class << ro
        undef :instance_eval   # forward instance_eval to the server
      end
      # !!!!!!!! WARNING !!!!!!!!! DO NOT RUN
      ro.instance_eval("`rm -rf *`")
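  • 19. $SAFE=1 on the server: instance_eval': Insecure operation - instance_eval (SecurityError) • Note: $SAFE was removed in Ruby 3.0, so this safeguard only exists in older Rubies.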
  • 20. DRb security (cont.) • Access Control Lists (ACLs) • via an array of IP addresses • attackers can still mount a denial-of-service attack • DRb over SSL
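A minimal ACL sketch using the standard `drb/acl` library. DRb's TCP server checks each accepted connection's peer address against the installed ACL; here `allow_addr?` is called directly so the rules can be checked without a network. The addresses are placeholders, and `allowed?` is a hypothetical helper.

```ruby
require 'drb/drb'
require 'drb/acl'

# Deny everyone, then allow localhost and one internal subnet.
acl = ACL.new(%w[
  deny  all
  allow 127.0.0.1
  allow 192.168.1.0/24
])

# The address has the shape of Socket#peeraddr: [family, port, host, ip].
def allowed?(acl, ip)
  acl.allow_addr?(["AF_INET", 0, ip, ip])
end

puts allowed?(acl, "127.0.0.1")    # => true
puts allowed?(acl, "192.168.1.42") # => true
puts allowed?(acl, "10.0.0.5")     # => false

# Install before starting the service so new servers pick it up:
DRb.install_acl(acl)
# DRb.start_service("druby://0.0.0.0:61676", front)
```

As the slide notes, this only filters who may connect; a permitted client can still call any public method.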
  • 21. Rinda • Rinda is a Ruby implementation of the Linda distributed computing paradigm. • Linda is a model of coordination and communication among several parallel processes operating upon objects stored in and retrieved from shared, virtual, associative memory. This model is implemented as a "coordination language" in which several primitives operating on ordered sequence of typed data objects, "tuples," are added to a sequential language, such as C, and a logically global associative memory, called a tuplespace, in which processes store and retrieve tuples. (Wikipedia)
  • 22. Rinda (cont.) • Rinda consists of: • a TupleSpace implementation • a RingServer that allows DRb services to automatically discover each other.
  • 23. RingServer • Hardcoding IP addresses in a DRb program tightly couples applications and makes fault tolerance difficult. • RingServer can detect and interact with other services on the network without knowing their IP addresses.
  • 24. (diagram) 1. Client @ 192.168.1.100 → RingServer, via the broadcast UDP address: "Where is Service X?" 2. RingServer → Client: "Service X: 192.168.1.12" 3. Client → Service X @ 192.168.1.12: "Hi, Service X" 4. Service X → Client @ 192.168.1.100: "Hi there"
  • 25. ring server example
      require 'rinda/ring'
      require 'rinda/tuplespace'
      DRb.start_service
      Rinda::RingServer.new(Rinda::TupleSpace.new)
      DRb.thread.join
  • 26. service example
      require 'rinda/ring'
      class HelloWorldServer
        include DRbUndumped # needed for RingServer
        def say_hello
          'Hello, world!'
        end
      end
      DRb.start_service
      ring_server = Rinda::RingFinger.primary
      ring_server.write([:hello_world_service, :HelloWorldServer, HelloWorldServer.new,
                         'I like to say hi!'], Rinda::SimpleRenewer.new)
      DRb.thread.join
  • 27. client example
      require 'rinda/ring'
      DRb.start_service
      ring_server = Rinda::RingFinger.primary
      service = ring_server.read([:hello_world_service, nil, nil, nil])
      server = service[2]
      puts server.say_hello
      puts service.inspect
      # Hello, world!
      # [:hello_world_service, :HelloWorldServer, #<DRb::DRbObject:0x10039b650 @uri="druby://fe80::21b:63ff:fec9:335f%en1:57416", @ref=2149388540>, "I like to say hi!"]
  • 28. TupleSpaces • Shared object space • Atomic access • Like a bulletin board • The tuple template used here is [:name, :Class, object, 'description']
  • 29. 5 Basic Operations • write • read • take (Atomic Read+Delete) • read_all • notify (Callback for write/take/delete)
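A sketch of `write`, `read`, `read_all` and `take` on a local `Rinda::TupleSpace` (no DRb service started; the same object could be exported so other processes share it). The `:job` tuples are made-up sample data. In a template, `nil` is a wildcard and other elements match with `===`, so a class such as `String` matches any string.

```ruby
require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new

ts.write([:job, 1, "resize photo"])
ts.write([:job, 2, "send mail"])

first_job = ts.read([:job, 1, String])    # copy; the tuple stays in the space
all_jobs  = ts.read_all([:job, nil, nil]) # every matching tuple
taken     = ts.take([:job, 2, nil])       # atomically read and remove
left      = ts.read_all([:job, nil, nil])

p first_job     # => [:job, 1, "resize photo"]
p all_jobs.size # => 2
p taken         # => [:job, 2, "send mail"]
p left          # => [[:job, 1, "resize photo"]]
```

`read` and `take` block until a matching tuple appears, which is what lets several workers compete for jobs without extra locking.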
  • 30. Starfish • Starfish is a utility to make distributed programming ridiculously easy • It runs both the server and the client in infinite loops • MapReduce with ActiveRecord or Files
  • 31. starfish foo.rb
      # foo.rb
      class Foo
        attr_reader :i
        def initialize
          @i = 0
        end
        def inc
          logger.info "YAY it incremented by 1 up to #{@i}"
          @i += 1
        end
      end
      server :log => "foo.log" do |object|
        object = Foo.new
      end
      client do |object|
        object.inc
      end
  • 32. starfish server example
      ARGV.unshift('server.rb')
      require 'rubygems'
      require 'starfish'
      class HelloWorld
        def say_hi
          'Hi There'
        end
      end
      Starfish.server = lambda do |object|
        object = HelloWorld.new
      end
      Starfish.new('hello_world').server
  • 33. starfish client example
      ARGV.unshift('client.rb')
      require 'rubygems'
      require 'starfish'
      Starfish.client = lambda do |object|
        puts object.say_hi
        exit(0) # exit program immediately
      end
      Starfish.new('hello_world').client
  • 34. starfish client example (another way)
      ARGV.unshift('client.rb')
      require 'rubygems'
      require 'starfish'
      catch(:halt) do
        Starfish.client = lambda do |object|
          puts object.say_hi
          throw :halt   # leave the client loop without exiting the program
        end
        Starfish.new('hello_world').client
      end
      puts "bye bye"
  • 35. MapReduce • introduced by Google to support distributed computing on large data sets on clusters of computers. • inspired by map and reduce functions commonly used in functional programming.
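The map/shuffle/reduce idea can be shown in plain Ruby. This is a single-process sketch, assuming Apache "combined"-style log lines (the sample lines are made up); in Starfish or Hadoop the map step would run on many clients in parallel and the framework would do the grouping.

```ruby
# Map: turn each input record into [key, value] pairs.
def map_phase(lines)
  lines.flat_map do |line|
    path = line.split[6] # request path in this log format
    path ? [[path, 1]] : []
  end
end

# Shuffle: group the pairs by key.
def shuffle_phase(pairs)
  pairs.group_by(&:first).transform_values { |kvs| kvs.map(&:last) }
end

# Reduce: fold all values for one key into a single result.
def reduce_phase(grouped)
  grouped.transform_values(&:sum)
end

log = [
  '127.0.0.1 - - [10/Jan/2010:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 512',
  '127.0.0.1 - - [10/Jan/2010:10:00:01 +0800] "GET /about.html HTTP/1.1" 200 256',
  '10.0.0.7 - - [10/Jan/2010:10:00:02 +0800] "GET /index.html HTTP/1.1" 200 512',
]

counts = reduce_phase(shuffle_phase(map_phase(log)))
puts counts["/index.html"] # => 2
puts counts["/about.html"] # => 1
```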
  • 36. starfish server example
      ARGV.unshift('server.rb')
      require 'rubygems'
      require 'starfish'
      Starfish.server = lambda { |map_reduce|
        map_reduce.type = File
        map_reduce.input = "/var/log/apache2/access.log"
        map_reduce.queue_size = 10
        map_reduce.lines_per_client = 5
        map_reduce.rescan_when_complete = false
      }
      Starfish.new('log_server').server
  • 37. starfish client example
      ARGV.unshift('client.rb')
      require 'rubygems'
      require 'starfish'
      Starfish.client = lambda { |logs|
        logs.each do |log|
          puts "Processing #{log}"
          sleep(1)
        end
      }
      Starfish.new("log_server").client
  • 38. Other implementations • Skynet • Uses TupleSpace or MySQL as its message queue • Includes an extension for ActiveRecord • http://skynet.rubyforge.org/ • MRToolkit based on Hadoop • http://code.google.com/p/mrtoolkit/
  • 39. MagLev VM • a fast, stable Ruby implementation with integrated object persistence and distributed shared cache. • http://maglev.gemstone.com/ • currently in public alpha
  • 40. 2. Distributed Message Queues • Starling • AMQP/RabbitMQ • Stomp/ActiveMQ • beanstalkd
  • 41. What’s a message queue? (diagram) Client → message X → Queue → Processor (checks the queue and processes messages)
  • 42. Why not DRb? • DRb has security risks and poorly designed APIs • A distributed message queue is a great way to do distributed programming: reliable and scalable.
  • 43. Starling • a lightweight persistent queue server that speaks the Memcache protocol (mimics its API) • Fast, effective, quick to set up and easy to use • Powered by EventMachine http://eventmachine.rubyforge.org/EventMachine.html • Twitter’s open source project; Twitter used it before 2009 (it has since switched to Kestrel, a port of Starling from Ruby to Scala)
  • 44. Starling command • sudo gem install starling-starling • http://github.com/starling/starling • sudo starling -h 192.168.1.100 • sudo starling_top -h 192.168.1.100
  • 45. Starling set example
      require 'rubygems'
      require 'starling'
      starling = Starling.new('192.168.1.4:22122')
      100.times do |i|
        starling.set('my_queue', i)   # appends to the queue; does not overwrite as in Memcached
      end
  • 46. Starling get example
      require 'rubygems'
      require 'starling'
      starling = Starling.new('192.168.2.4:22122')
      loop do
        puts starling.get("my_queue")
      end
  • 47. get method • FIFO • After get, the object is no longer in the queue, so you will lose the message if an error happens during processing. • The get method blocks until something is returned; it waits in an infinite loop.
  • 48. Handling processing errors
      require 'rubygems'
      require 'starling'
      starling = Starling.new('192.168.2.4:22122')
      results = starling.get("my_queue")
      begin
        puts results.flatten
      rescue NoMethodError => e
        # results was not an Array: log it and requeue it wrapped in one
        puts e.message
        starling.set("my_queue", [results])
      rescue Exception => e
        # any other failure: requeue the message unchanged, then re-raise
        starling.set("my_queue", results)
        raise e
      end
  • 49. Starling cons • Clients must poll the queue constantly • With RabbitMQ you can subscribe to a queue and be notified when a message is available for processing.
  • 50. AMQP/RabbitMQ • a complete and highly reliable enterprise messaging system based on the emerging AMQP standard. • Written in Erlang • http://github.com/tmm1/amqp • Powered by EventMachine
  • 51. Stomp/ActiveMQ • Apache ActiveMQ is the most popular and powerful open source messaging and Integration Patterns provider. • sudo gem install stomp • ActiveMessaging plugin for Rails
  • 52. beanstalkd • Beanstalk is a simple, fast workqueue service. Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously. • http://kr.github.com/beanstalkd/ • http://beanstalk.rubyforge.org/ • Facebook’s open source project
  • 53. Why do we need asynchronous/background processing in Rails? • cron-like processing (compute daily statistics, create reports, update the full-text search index, etc.) • long-running tasks (sending mail, resizing photos, encoding videos, generating PDFs, uploading images to S3, posting something to Twitter, etc.) • Server traffic jam: expensive requests block server resources (i.e. your Rails app) • Bad user experience: users may reload and reload again! (responsiveness matters)
  • 54. 3. Background processing for Rails • script/runner • rake • cron • daemon • run_later plugin • spawn plugin
  • 55. script/runner • In your Rails app root: • script/runner "Worker.process"
  • 56. rake • In RAILS_ROOT/lib/tasks/dev.rake • rake dev:process
      namespace :dev do
        task :process => :environment do   # :environment loads the Rails app
          # ...
        end
      end
  • 57. cron • Cron is a time-based job scheduler in Unix-like computer operating systems. • crontab -e
  • 58. Whenever http://github.com/javan/whenever • A Ruby DSL for Defining Cron Jobs • http://asciicasts.com/episodes/164-cron-in-ruby • or http://cronedit.rubyforge.org/
      every 3.hours do
        runner "MyModel.some_process"
        rake "my:rake:task"
        command "/usr/bin/my_great_command"
      end
  • 60. rufus-scheduler http://github.com/jmettraux/rufus-scheduler • Schedules pieces of code (jobs) • Not a replacement for cron/at, since it runs inside a Ruby process.
      require 'rubygems'
      require 'rufus/scheduler'
      scheduler = Rufus::Scheduler.start_new   # rufus-scheduler 2.x; 3.x uses Rufus::Scheduler.new
      scheduler.every '5s' do
        puts 'check blood pressure'
      end
      scheduler.join
  • 61. Daemon Kit http://github.com/kennethkalmer/daemon-kit • Creating Ruby daemons by providing a sound application skeleton (through a generator), task specific generators (jabber bot, etc) and robust environment management code.
  • 62. Monitor your daemon • http://mmonit.com/monit/ • http://github.com/arya/bluepill • http://god.rubyforge.org/
  • 63. daemon_controller http://github.com/FooBarWidget/daemon_controller • A library for robust daemon management • Make daemon-dependent applications Just Work without having to start the daemons manually.
  • 64. off-load task via system command
      # mailings_controller.rb
      def deliver
        call_rake :send_mailing, :mailing_id => params[:id].to_i
        flash[:notice] = "Delivering mailing"
        redirect_to mailings_url
      end
      # controllers/application.rb
      def call_rake(task, options = {})
        options[:rails_env] ||= Rails.env
        args = options.map { |n, v| "#{n.to_s.upcase}='#{v}'" }
        # append stdout to the log, send stderr there too, and run in the background
        system "/usr/bin/rake #{task} #{args.join(' ')} --trace >> #{Rails.root}/log/rake.log 2>&1 &"
      end
      # lib/tasks/mailer.rake
      desc "Send mailing"
      task :send_mailing => :environment do
        mailing = Mailing.find(ENV["MAILING_ID"])
        mailing.deliver
      end
      # models/mailing.rb
      def deliver
        sleep 10 # placeholder for sending email
        update_attribute(:delivered_at, Time.now)
      end
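An alternative to building a shell string is `Process.spawn` with an environment hash and an argument list: no shell is involved, so quoting and redirect order cannot go wrong. A sketch, assuming a hypothetical `spawn_task` helper; a Ruby one-liner stands in for `rake send_mailing` and reads `MAILING_ID` the way the rake task above does.

```ruby
require 'rbconfig'
require 'tmpdir'

# Hypothetical helper: start a background command without a shell,
# passing options as upper-cased environment variables.
def spawn_task(env, *command, log:)
  Process.spawn(
    env.transform_keys { |k| k.to_s.upcase }.transform_values(&:to_s),
    *command,
    out: [log, "a"],    # append stdout to the log file
    err: [:child, :out] # send stderr to the same place
  )
end

log = File.join(Dir.tmpdir, "rake_sketch.log")
File.delete(log) if File.exist?(log)

pid = spawn_task({ mailing_id: 42 },
                 RbConfig.ruby, "-e", 'puts "mailing #{ENV["MAILING_ID"]}"',
                 log: log)

# A controller would call Process.detach(pid) and return immediately;
# here we wait so the result can be checked.
Process.wait(pid)
puts File.read(log) # => mailing 42
```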
  • 65. Simple Thread
      after_filter do
        Thread.new do
          AccountMailer.deliver_signup(@user)
        end
      end
  • 66. run_later plugin http://github.com/mattmatt/run_later • Borrowed from Merb • Uses worker thread and a queue • Simple solution for simple tasks
      run_later do
        AccountMailer.deliver_signup(@user)
      end
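The "worker thread and a queue" design can be sketched in a few lines. `BackgroundRunner` is a hypothetical class, not run_later's actual code: one long-lived thread pops blocks off a `Queue` and runs them, so the caller returns immediately and a failing job does not kill the worker.

```ruby
class BackgroundRunner
  def initialize
    @queue = Queue.new
    @worker = Thread.new do
      while (job = @queue.pop) != :shutdown
        begin
          job.call
        rescue StandardError => e
          warn "background job failed: #{e.message}" # keep the worker alive
        end
      end
    end
  end

  def run_later(&block)
    @queue << block
  end

  # Finish the queued jobs, then stop the worker thread.
  def shutdown
    @queue << :shutdown
    @worker.join
  end
end

sent = []
runner = BackgroundRunner.new
runner.run_later { sent << "signup mail for user 1" }
runner.run_later { raise "SMTP down" } # logged, not fatal
runner.run_later { sent << "signup mail for user 2" }
runner.shutdown
p sent # => ["signup mail for user 1", "signup mail for user 2"]
```

Jobs live only in memory, so they are lost if the process dies; that is the gap the persistent queues later in this deck address.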
  • 67. spawn plugin http://github.com/tra/spawn
      spawn do
        logger.info("I feel sleepy...")
        sleep 11
        logger.info("Time to wake up!")
      end
  • 68. spawn (cont.) • By default, spawn uses fork to spawn child processes. You can configure it to use threads instead. • It works by creating new database connections in ActiveRecord::Base for the spawned block. • Forking copies the whole Rails process every time.
  • 69. threading vs. forking • Forking advantages: • more reliable? - the ActiveRecord code is not thread-safe. • keeps running - a subprocess can live longer than its parent. • easier - just works with Rails default settings. Threading requires you to set allow_concurrency = true. Also, beware of automatic reloading of classes in development mode (config.cache_classes = false). • Threading advantages: • less filling - threads take fewer resources... how much fewer? It depends. • debugging - you can set breakpoints in your threads
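The core difference fits in a few lines: a thread shares the parent's memory, while a forked child works on its own copy. A sketch assuming a Unix-like system (`fork` is not available on Windows or JRuby).

```ruby
counter = 0

thread = Thread.new { counter += 1 }
thread.join
puts counter # => 1 (the thread changed the shared variable)

# exit! skips at_exit handlers inherited from the parent.
pid = fork { counter += 100; exit!(0) } # the child changes its own copy only
Process.wait(pid)
puts counter # => 1 (the parent's value is untouched)
```

The same isolation is why forking is "more reliable" with non-thread-safe code and why it costs more: each child carries a copy of the whole process.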
  • 70. Okay, we need reliable messaging system: • Persistent • Scheduling: not necessarily all at the same time • Scalability: just throw in more instances of your program to speed up processing • Loosely coupled components that merely ‘talk’ to each other • Ability to easily replace Ruby with something else for specific tasks • Easy to debug and monitor
  • 71. 4. Message Queues (for Rails only) • ar_mailer • BackgroundDRb • workling • delayed_job • resque
  • 72. Rails only? • Easy to use/write code • Jobs are Ruby classes or objects • But workers need to load the Rails environment
  • 73. ar_mailer http://seattlerb.rubyforge.org/ar_mailer/ • a two-phase delivery agent for ActionMailer. • Stores messages in the database • Messages are delivered later by a separate process, ar_sendmail.
  • 74. BackgrounDRb http://backgroundrb.rubyforge.org/ • BackgrounDRb is a Ruby job server and scheduler. • Has scalability problems (~20 servers for Mark Bates) • Hard to tell whether processing failed • Uses the database to persist tasks • Uses memcached to report processing results
  • 75. workling http://github.com/purzelrakete/workling • Gives your Rails app a simple API that you can use to make code run in the background, outside of your request. • Supports Starling (default), BackgroundJob, Spawn and AMQP/RabbitMQ runners.
  • 76. Workling/Starling setup • script/plugin install git://github.com/purzelrakete/workling.git • sudo starling -p 15151 • RAILS_ENV=production script/workling_client start
  • 77. Workling example
      class EmailWorker < Workling::Base
        def deliver(options)
          user = User.find(options[:id])
          user.deliver_activation_email
        end
      end
      # in your controller
      def create
        EmailWorker.asynch_deliver(:id => 1)
      end
  • 78. delayed_job • Database-backed asynchronous priority queue • Extracted from Shopify • You can place any Ruby object on its queue as arguments • Loads the Rails environment only once
  • 79. delayed_job setup (use the collectiveidea fork) • script/plugin install git://github.com/collectiveidea/delayed_job.git • script/generate delayed_job • rake db:migrate
  • 80. delayed_job example: send_later
      def deliver
        mailing = Mailing.find(params[:id])
        mailing.send_later(:deliver)
        flash[:notice] = "Mailing is being delivered."
        redirect_to mailings_url
      end
  • 81. delayed_job example: custom workers
      class MailingJob < Struct.new(:mailing_id)
        def perform
          mailing = Mailing.find(mailing_id)
          mailing.deliver
        end
      end
      # in your controller
      def deliver
        Delayed::Job.enqueue(MailingJob.new(params[:id]))
        flash[:notice] = "Mailing is being delivered."
        redirect_to mailings_url
      end
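The contract behind `enqueue` is small: the job is serialized into a row, and a worker later loads it and calls `perform`. A sketch with a hypothetical in-memory `TinyJobTable`; delayed_job itself stores the job as YAML in a database row, and `Marshal` is used here only to keep the sketch dependency-free.

```ruby
# Any object that responds to #perform can be a job.
MailingJob = Struct.new(:mailing_id) do
  def perform
    "delivered mailing #{mailing_id}" # stand-in for Mailing.find(id).deliver
  end
end

# Hypothetical stand-in for the jobs table.
class TinyJobTable
  def initialize
    @rows = []
  end

  def enqueue(job)
    raise ArgumentError, "jobs must respond to #perform" unless job.respond_to?(:perform)
    @rows << Marshal.dump(job) # the row holds a snapshot, not a live object
  end

  # What a worker process does: load each handler and call #perform.
  def work_off
    results = @rows.map { |handler| Marshal.load(handler).perform }
    @rows.clear
    results
  end
end

table = TinyJobTable.new
table.enqueue(MailingJob.new(7))
table.enqueue(MailingJob.new(8))
p table.work_off # => ["delivered mailing 7", "delivered mailing 8"]
```

Because the row is a snapshot, passing an ID and looking the record up inside `perform` (as `MailingJob` does) avoids working on stale data.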
  • 82. delayed_job example: always asynchronously
      class Device
        def deliver
          # long running method
        end
        handle_asynchronously :deliver
      end
      device = Device.new
      device.deliver
  • 83. Running jobs • rake jobs:work (Don’t use it in production: it exits if the database has any network connectivity problems.) • RAILS_ENV=production script/delayed_job start • RAILS_ENV=production script/delayed_job stop
  • 84. Priority • Just an Integer; the default is 0 • You can run multiple workers to handle jobs of different priorities
      RAILS_ENV=production script/delayed_job --min-priority 3 start
      Delayed::Job.enqueue(MailingJob.new(params[:id]), 3)
      Delayed::Job.enqueue(MailingJob.new(params[:id]), -3)
  • 85. Scheduled • No guarantee the job runs at the precise time, only that it runs after run_at
      Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 3.days.from_now)
      Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 1.month.from_now.beginning_of_month)
  • 86. Configuring Delayed Job
      # config/initializers/delayed_job_config.rb
      Delayed::Worker.destroy_failed_jobs = false
      Delayed::Worker.sleep_delay = 5 # seconds to sleep when the queue is empty
      Delayed::Worker.max_attempts = 25
      Delayed::Worker.max_run_time = 4.hours # set to the time your longest task will take
  • 87. Automatic retry on failure • If a method throws an exception, it is caught and the method is rerun later. • The method is retried up to 25 (default) times at increasingly longer intervals until it passes. • The next run is scheduled at Job.db_time_now + (job.attempts ** 4) + 5, so the longest single wait is about 108 hours (25 ** 4 seconds).
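A quick calculation of the backoff formula on the slide shows how fast the waits grow. `retry_delay` is a hypothetical helper that only evaluates `attempts ** 4 + 5`; whether the 25th value is actually used before the job is given up depends on the delayed_job version.

```ruby
# Seconds until the next retry, per the slide: attempts ** 4 + 5
def retry_delay(attempts)
  attempts**4 + 5
end

[1, 5, 10, 20, 25].each do |n|
  hours = (retry_delay(n) / 3600.0).round(1)
  puts "after attempt #{n}: wait #{retry_delay(n)}s (~#{hours} h)"
end
# after attempt 25: wait 390630s (~108.5 h)
```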
  • 88. Capistrano Recipes • Remember to restart delayed_job after deployment • Check out lib/delayed_job/recipes.rb
      after "deploy:stop", "delayed_job:stop"
      after "deploy:start", "delayed_job:start"
      after "deploy:restart", "delayed_job:restart"
  • 89. Resque http://github.com/defunkt/resque • a Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later. • GitHub’s open source project • You can only place JSON-able Ruby objects on the queue • Includes a Sinatra app for monitoring what's going on • Supports multiple queues • Suited to setups where you expect a lot of failure/chaos
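Why only JSON-able objects? Resque stores each job as a JSON payload naming the job class and its arguments. A sketch of that round trip, assuming hypothetical `enqueue`/`work` helpers and an Array standing in for the Redis list; `DeliverMailing` follows Resque's convention of a class with `@queue` and a class-level `perform`.

```ruby
require 'json'

# Resque-style job class.
class DeliverMailing
  @queue = :mailings # queue name a real Resque worker would read
  def self.perform(mailing_id)
    "delivered mailing #{mailing_id}"
  end
end

# Hypothetical stand-ins for enqueueing and for a worker.
def enqueue(queue, klass, *args)
  queue << JSON.generate("class" => klass.name, "args" => args)
end

def work(queue)
  payload = JSON.parse(queue.shift)
  Object.const_get(payload["class"]).perform(*payload["args"])
end

queue = [] # stand-in for a Redis list
enqueue(queue, DeliverMailing, 42)
puts work(queue) # => delivered mailing 42

# Arguments come back as plain JSON types: symbols turn into strings.
round_trip = JSON.parse(JSON.generate({ id: 42, priority: :high }))
p round_trip["priority"] # => "high"
```

Passing IDs and plain strings rather than model objects keeps jobs independent of Ruby object state.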
  • 90. My recommendations: • General purpose: delayed_job (GitHub highly recommends DelayedJob to anyone whose site is not 50% background work.) • Time-scheduled: cron + rake
  • 91. 5. SOA for Rails • What’s SOA • Why SOA • Considerations • The tool set
  • 92. What’s SOA Service oriented architectures • “monolithic” approach is not enough • SOA is a way to design complex applications by splitting out major components into individual services and communicating via APIs. • a service is a vertical slice of functionality: database, application code and caching layer
  • 93. a monolithic web app example (diagram) requests → Load Balancer → WebApps → Database
  • 94. a SOA example (diagram) requests → Load Balancer → WebApps for User and WebApp for Administration → Services A and Services B, each with its own Database
  • 95. Why SOA? Isolation • Shared Resources • Encapsulation • Scalability • Interoperability • Reuse • Testability • Reduce Local Complexity
  • 96. Shared Resources • Different front-end websites use the same resources. • SOA helps you avoid duplicating databases and code. • Why not just share the database? • the code is not DRY • caching will be problematic (diagram: WebApp for Administration and WebApps for User sharing one Database)
  • 97. Encapsulation • You can change the underlying implementation of a service without affecting other parts of the system • upgrade a library • upgrade to Ruby 1.9 • You can provide API versioning
  • 98. Scalability 1: Partitioned Data Providers • The database is the first bottleneck; a single DB server cannot scale. SOA helps you reduce database load • Anti-pattern: only splitting the database (diagram: WebApps over Database A and Database B) • model relationships are broken • referential integrity • Myth: database replication can not help you with speed and consistency
  • 99. Scalability 2: Caching • SOA makes caching systems easier to design • Cache data at the right times and expire it at the right times • Cache the logical model, not the physical one • You do not need to cache views everywhere
  • 100. Scalability 3: Efficient • Different components carry different loads, so SOA lets you scale each service independently. (diagram: WebApps → load balancers → 2 × Services A, 4 × Services B)
  • 101. Security • Different services can sit behind different firewalls • You can expose only the public web app and services; the others stay inside the firewall.
  • 102. Interoperability • HTTP is the common interface, so SOA helps you integrate: • Multiple languages • Internal systems, e.g. a full-text search engine • Legacy databases and systems • External vendors
  • 103. Reuse • Reuse across multiple applications • Reuse for public APIs • Example: Amazon Web Services (AWS)
  • 104. Testability • Isolate problems • Mock API calls • Reduce the time it takes to run the test suite
  • 105. Reduce Local Complexity • Team modularity along the same module splits as your software • Understandability: The amount of code is minimized to a quantity understandable by a small team • Source code control
  • 106. Considerations • Partition into Separate Services • API Design • Which Protocol
  • 107. How to partition into Separate Services • Partitioning on Logical Function • Partitioning on Read/Write Frequencies • Partitioning by Minimizing Joins • Partitioning by Iteration Speed
  • 108. API Design • Send Everything you need • Parallel HTTP requests • Send as Little as Possible • Use Logical Models
  • 109. Physical Models & Logical Models • Physical models are mapped to database tables through ORM (usually in 3NF). • Logical models are mapped to your business problem (external APIs use them). • Logical models are mapped to physical models by you.
  • 110. Logical Models • Not relational or normalized • Maintainability • can change with no change to data store • can stay the same while the data store changes • Better fit for REST interfaces • Better caching
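A sketch of mapping a physical model to a logical one. The normalized "tables" below are made-up sample data, and `customer_summary` is a hypothetical service method: the logical model it returns is denormalized and shaped for API clients, so the tables can change without changing the response.

```ruby
require 'json'

# Physical model: normalized rows, as an ORM would load them.
USERS     = { 1 => { id: 1, name: "ihower" } }
ADDRESSES = [{ id: 10, user_id: 1, city: "Taipei" }]
ORDERS    = [
  { id: 100, user_id: 1, total_cents: 1200 },
  { id: 101, user_id: 1, total_cents: 800 },
]

# Logical model: one document per customer, no joins needed by the client.
def customer_summary(user_id)
  user    = USERS.fetch(user_id)
  orders  = ORDERS.select { |o| o[:user_id] == user_id }
  address = ADDRESSES.find { |a| a[:user_id] == user_id }
  {
    "name"        => user[:name],
    "city"        => address && address[:city],
    "order_count" => orders.size,
    "total_spent" => orders.sum { |o| o[:total_cents] } / 100.0,
  }
end

puts JSON.generate(customer_summary(1))
# => {"name":"ihower","city":"Taipei","order_count":2,"total_spent":20.0}
```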
  • 111. Which Protocol? • SOAP • XML-RPC • REST
  • 112. RESTful Web services • Rails way • REST is about resources • URL • Verbs: GET/PUT/POST/DELETE
  • 113. The tool set • Web framework • XML Parser • JSON Parser • HTTP Client
  • 114. Web framework • We don't need much in the way of controllers and views • Rails is a bit more than we need; how about Sinatra? • Rails metal
  • 115. ActiveResource • Mapping RESTful resources as models in a Rails application. • But not useful in practice, why?
  • 116. XML parser • http://nokogiri.org/ • Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri’s many features is the ability to search documents via XPath or CSS3 selectors.
  • 117. JSON Parser • http://github.com/brianmario/yajl-ruby/ • An extremely efficient streaming JSON parsing and encoding library. Ruby C bindings to Yajl
  • 118. HTTP Client • http://github.com/pauldix/typhoeus/ • Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic
  • 119. Tips • Define your logical model (i.e. your service request result) first. • model.to_json and model.to_xml are easy to use, but not useful in practice.
  • 120. 6. Distributed File System • NFS does not scale • we can use rsync to duplicate files • MogileFS • http://www.danga.com/mogilefs/ • http://seattlerb.rubyforge.org/mogilefs-client/ • Amazon S3 • HDFS (Hadoop Distributed File System) • GlusterFS
  • 121. 7. Distributed Database • NoSQL • CAP theorem • Eventually consistent • HBase/Cassandra/Voldemort
  • 123. References • Books & Articles: • Distributed Programming with Ruby, Mark Bates (Addison Wesley) • Enterprise Rails, Dan Chak (O’Reilly) • Service-Oriented Design with Ruby and Rails, Paul Dix (Addison Wesley) • RESTful Web Services, Richardson & Ruby (O’Reilly) • RESTful Web Services Cookbook, Allamaraju & Amundsen (O’Reilly) • Enterprise Recipes with Ruby on Rails, Maik Schmidt (The Pragmatic Programmers) • Ruby in Practice, McAnally & Arkin (Manning) • Building Scalable Web Sites, Cal Henderson (O’Reilly) • Background Processing in Rails, Erik Andrejko (Rails Magazine) • Background Processing with Delayed_Job, James Harrison (Rails Magazine) • Web 点 ( ) • Slides: • Background Processing (Rob Mack) Austin on Rails - April 2009 • The Current State of Asynchronous Processing in Ruby (Mathias Meyer, Peritor GmbH) • Asynchronous Processing (Jonathan Dahl) • Long-Running Tasks In Rails Without Much Effort (Andy Stewart) - April 2008 • Starling + Workling: simple distributed background jobs with Twitter’s queuing system, Rany Keddo 2008 • Physical Models & Logical Models in Rails, Dan Chak
  • 124. References • Links: • http://segment7.net/projects/ruby/drb/ • http://www.slideshare.net/luccastera/concurrent-programming-with-ruby-and-tuple-spaces • http://github.com/blog/542-introducing-resque • http://www.engineyard.com/blog/2009/5-tips-for-deploying-background-jobs/ • http://www.opensourcery.co.za/2008/07/07/messaging-and-ruby-part-1-the-big-picture/ • http://leemoonsoo.blogspot.com/2009/04/simple-comparison-open-source.html • http://blog.gslin.org/archives/2009/07/25/2065/ • http://www.javaeye.com/topic/524977 • http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
  • 125. Todo (maybe next time) • AMQP/RabbitMQ example code • How about Nanite? • XMPP • MagLev VM • More MapReduce example code • How about Amazon Elastic MapReduce? • Resque example code • More SOA example and code • MogileFS example code