Intro to memcached

Uploaded by

Mark Antony
Copyright
© All Rights Reserved

INTRODUCTION TO MEMCACHED
Tags: memcached, performance, scalability, php, mySQL, caching techniques, #ikdoeict
jurriaanpersyn.com
lead web dev at Netlog for 4 years
php + mysql + frontend
working on Gatcha
For whom?
talk for students, professional bachelor ICT (www.ikdoeict.be)
Why this talk?
One of the first things I learnt at Netlog.
Using it every single day.
Program
- About caching
- About memcached
- Examples
- Tips & tricks
- Toolsets and other solutions
What is caching?
A copy of real data with faster (and/or cheaper) access
What is caching?

• From Wikipedia: "A cache is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch (owing to longer access time) or to compute, compared to the cost of reading the cache."
• Term introduced by IBM in the 1960s

The anatomy

• simple key/value storage


• simple operations
• save
• get
• delete
Terminology

• storage cost
• retrieval cost (network load / algorithm load)
• invalidation (keeping data up to date / removing irrelevant data)
• replacement policy (FIFO/LFU/LRU/MRU/RANDOM vs. Belady’s algorithm)
• cold cache / warm cache

Terminology

• cache hit and cache miss
• typical stats:
  • hit ratio (hits / (hits + misses))
  • miss ratio (1 - hit ratio)
• 45 cache hits and 10 cache misses
  • 45 / (45 + 10) = 82% hit ratio
  • 18% miss ratio
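The arithmetic above can be checked with a few lines of PHP. This is only a sketch: `hitRatio` is a hypothetical helper name, not something from the talk.

```php
<?php
// Hit ratio = hits / (hits + misses); miss ratio = 1 - hit ratio.
function hitRatio(int $hits, int $misses): float
{
    $total = $hits + $misses;
    // With no lookups at all there is no meaningful ratio; report 0.0.
    return $total === 0 ? 0.0 : $hits / $total;
}

$ratio = hitRatio(45, 10);
// Prints "hit ratio: 82%, miss ratio: 18%"
printf("hit ratio: %.0f%%, miss ratio: %.0f%%\n", $ratio * 100, (1 - $ratio) * 100);
```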


When to cache?

• caches are only efficient when the benefits of faster access outweigh the overhead of checking and keeping your cache up to date
  • more cache hits than cache misses


Where are caches used?

• at hardware level (cpu, hdd)


• operating systems (ram)
• web stack
• applications
• your own short term vs long term memory
Caches in the web stack

• Browser cache
• DNS cache
• Content Delivery Networks (CDN)
• Proxy servers
• Application level
  • full output caching (e.g. WordPress WP-Cache plugin)

• ...
Caches in the web stack (cont’d)

• Application level
  • opcode cache (APC)
  • query cache (MySQL)
  • storing denormalized results in the database
  • object cache
    • storing values in php objects/classes
Efficiency of caching?

• the earlier in the process, the closer to the original request(er), the faster
  • a browser cache will be faster than a cache on a proxy
• but probably also the harder to get right
  • the closer to the requester, the more parameters the cache depends on
What to cache on the server-side?

• As a PHP backend developer, what to cache?
• expensive operations: operations that work with slower resources
  • database access
  • reading files (in fact, any filesystem access)
  • API calls
  • heavy computations
  • XML
Where to cache on the server-side?

• As a PHP backend developer, where to store cache results?
  • in the database (computed values, generated html)
    • you’ll still need to access your database
  • in static files (generated html or serialized php values)
    • you’ll still need to access your file system
  • in memory!
memcached
About memcached

• Free & open source, high-performance, distributed memory object caching system
• Generic in nature, intended for use in speeding up dynamic web applications by alleviating database load.
• key/value dictionary
About memcached (cont’d)

• Developed by Brad Fitzpatrick for LiveJournal in 2003
• Now used by Netlog, Facebook, Flickr, Wikipedia, Twitter, YouTube ...
Technically

• It’s a server
• Client access over TCP or UDP
• Servers can run in pools
  • e.g. 3 servers with 64GB mem each give you a single pool of 192GB storage for caching
• Servers are independent, clients manage the pool
What to store in memcache?

• high demand (used often)
• expensive (hard to compute)
• common (shared across users)
• Best? All three
What to store in memcache? (cont’d)

• Typical:
  • user sessions (often)
  • user data (often, shared)
  • homepage data (often, shared, expensive)
What to store in memcache? (cont’d)

• Workflow:
  • monitor application (query logs / profiling)
  • add a caching level
  • compare speed gain
Memcached principles

• Fast network access (memcached servers close to other application servers)
• No persistence (if your server goes down, data in memcached is gone)
• No redundancy / fail-over
• No replication (a single item in cache lives on one server only)
• No authentication (not for use in shared environments)


Memcached principles (cont’d)

• 1 item (the value stored under a key) is at most 1MB
• keys are strings of at most 250 characters (in applications typically the MD5 of a user-readable string)
• No enumeration of keys (thus no list of valid keys in cache at a certain moment, no list of keys beginning with “user_”, ...)
• No active clean-up (only cleans up when more space is needed, LRU)
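The key-length limit is why applications often hash a readable name before using it as a key. A minimal sketch, assuming a hypothetical helper called `cacheKey` (the talk does not show how Netlog built its keys):

```php
<?php
// Hypothetical helper: turn a user-readable name into a memcached-safe key.
function cacheKey(string $readable): string
{
    // md5() always yields 32 hexadecimal characters, so the result stays
    // far below the 250-character limit and contains no spaces.
    return md5($readable);
}

$key = cacheKey('last blog posts of user 42, page 1');
echo strlen($key), "\n"; // prints 32
```

The trade-off is that hashed keys are no longer readable when debugging, so it helps to log the readable name next to the hash.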
$ telnet localhost 11211
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get foo
VALUE foo 0 2
hi
END
stats
STAT pid 8861
(etc)
Client Access

• both an ASCII and a binary protocol
• in real life:
  • clients available for all major languages
  • C, C++, PHP, Python, Ruby, Java, Perl, Windows, ...
PHP Clients

• Support the basics such as multiple servers, setting values, getting values, incrementing, decrementing and getting stats.
• pecl/memcache
• pecl/memcached
  • newer, in beta, a couple more features
PHP Client Comparison

Feature                        pecl/memcache         pecl/memcached
First Release Date             2004-06-08            2009-01-29 (beta)
Actively Developed?            Yes                   Yes
External Dependency            None                  libmemcached
Features:
Automatic Key Fixup            Yes                   No
Append/Prepend                 No                    Yes
Automatic Serialization        Yes                   Yes
Binary Protocol                No                    Optional
CAS                            No                    Yes
Compression                    Yes                   Yes
Communication Timeout          Connect Only          Various Options
Consistent Hashing             Yes                   Yes
Delayed Get                    No                    Yes
Multi-Get                      Yes                   Yes
Session Support                Yes                   Yes
Set/Get to a specific server   No                    Yes
Stores Numerics                Converted to Strings  Yes
PHP Client functions

• Memcached::add — Add an item under a new key
• Memcached::addServer — Add a server to the server pool
• Memcached::decrement — Decrement numeric item's value
• Memcached::delete — Delete an item
• Memcached::flush — Invalidate all items in the cache
• Memcached::get — Retrieve an item
• Memcached::getMulti — Retrieve multiple items
• Memcached::getStats — Get server pool statistics
• Memcached::increment — Increment numeric item's value
• Memcached::set — Store an item
• ...
Output caching

• Pages with high load / expensive to generate
• Very easy
• Very fast
• But: all the dependencies ...
  • language, css, template, logged in user’s details, ...
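<?php

// $cache: the application's memcached client (assumed to exist already)
$html = $cache->get('mypage');
if ($html === false) // cache miss
{
    ob_start();
    echo "<html>";
    // all the fancy stuff goes here
    echo "</html>";
    $html = ob_get_clean(); // grab the buffered output and stop buffering
    $cache->set('mypage', $html);
}
echo $html;

?>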
Data caching

• on a lower level
• easier to find all dependencies
• ideal solution for offloading database queries
  • the database is almost always the biggest bottleneck in backend performance problems
<?php

function getUserData($UID)
{
    global $cache; // assumes the memcached client lives in a global variable
    $key = 'user_' . $UID;
    $userData = $cache->get($key);
    if (!$userData)
    {
        $queryResult = Database::query("SELECT * FROM USERS
            WHERE uid = " . (int) $UID);
        $userData = $queryResult->getRow();
        $cache->set($key, $userData); // store under the key we read from
    }
    return $userData;
}

?>
“There are only two hard things in Computer Science: cache invalidation and naming things.”
Phil Karlton
Invalidation

• Caching for a certain amount of time
  • e.g. 10 minutes
  • don’t delete caches
  • thus: you can’t trust that data coming from the cache is correct
Invalidation (cont’d)

• Use: great for summaries
  • overviews
  • pages where it’s not that big a problem if data is a little bit out of date (e.g. search results)
• Good for quick and dirty optimizations
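Time-based invalidation can be sketched without a memcached server. Everything below is an assumption made for illustration: `TtlArrayCache` is a hypothetical in-memory stand-in, `cachedFor` is a hypothetical helper, and the current time is passed in explicitly so the behaviour is easy to follow. The sketch assumes PHP 7.4 or newer.

```php
<?php
// Hypothetical in-memory stand-in for a memcached client.
// Expiry is checked lazily, at read time.
class TtlArrayCache
{
    private array $items = [];

    public function set(string $key, $value, int $ttl, int $now): void
    {
        $this->items[$key] = ['value' => $value, 'expires' => $now + $ttl];
    }

    // Returns false on a miss, the convention used in the slides.
    public function get(string $key, int $now)
    {
        if (!isset($this->items[$key]) || $this->items[$key]['expires'] <= $now) {
            unset($this->items[$key]);
            return false;
        }
        return $this->items[$key]['value'];
    }
}

// Read-through helper: serve from cache, otherwise compute and store.
// Caveat: a computed value of false cannot be told apart from a miss.
function cachedFor(TtlArrayCache $cache, string $key, int $ttl, int $now, callable $compute)
{
    $value = $cache->get($key, $now);
    if ($value === false) {
        $value = $compute();
        $cache->set($key, $value, $ttl, $now);
    }
    return $value;
}

$cache = new TtlArrayCache();
$calls = 0;
$search = function () use (&$calls) { $calls++; return ['result A', 'result B']; };
cachedFor($cache, 'search_foo', 600, 1000, $search); // miss: computes
cachedFor($cache, 'search_foo', 600, 1300, $search); // hit: possibly stale data
cachedFor($cache, 'search_foo', 600, 1600, $search); // expired: computes again
echo $calls, "\n"; // prints 2
```

Between two computations the cached value is never refreshed, which is exactly why this approach only suits data that may be a little out of date.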


Invalidation (cont’d)

• Store forever, and expire on certain events
  • the userdata example
    • store userdata forever
    • when the user changes any of his preferences, throw the cache away
Invalidation

• Use:
  • data that is fetched more often than it is updated
  • where it’s critical that the data is correct
• Improvement: instead of deleting on an event, update the cache on that event. (Mind: race conditions. Keep cache invalidation as close to the original change as possible!)
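The two event-based variants (delete on event versus update on event) can be compared in a small sketch. Plain arrays stand in for both the database and memcached here, and all function names are hypothetical.

```php
<?php
// Read path: serve from the cache, load from the "database" on a miss.
function readUser(array $db, array &$cache, int $uid): array
{
    $key = 'user_' . $uid;
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $db[$uid]; // stored without expiry ("forever")
    }
    return $cache[$key];
}

// Variant 1: delete on event. The next read repopulates the cache.
function renameAndDelete(array &$db, array &$cache, int $uid, string $nickname): void
{
    $db[$uid]['nickname'] = $nickname;
    unset($cache['user_' . $uid]); // invalidate right after the write
}

// Variant 2: update on event. The cache is refreshed with the new row,
// so the next read is a hit instead of a miss.
function renameAndUpdate(array &$db, array &$cache, int $uid, string $nickname): void
{
    $db[$uid]['nickname'] = $nickname;
    $cache['user_' . $uid] = $db[$uid];
}

$db = [42 => ['nickname' => 'old name']];
$cache = [];
readUser($db, $cache, 42);
renameAndUpdate($db, $cache, 42, 'new name');
echo readUser($db, $cache, 42)['nickname'], "\n"; // prints "new name"
```

With a real, shared cache two concurrent writers can interleave between the database write and the cache write, which is the race condition the slide warns about; this single-process sketch does not model that.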
Uses at Netlog

• sessions (cross server)
• database results (via database class, or object caching)
• flooding checks
• output caching (e.g. for RSS feeds)
• locks
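The talk does not show how Netlog implemented locks. One common approach relies on the "add only succeeds for a new key" behaviour listed for Memcached::add. The sketch below assumes that approach and uses a hypothetical in-memory class, `AddOnlyCache`, instead of a real client (PHP 7.4 or newer).

```php
<?php
// Hypothetical stand-in exposing add/delete: add() fails if the key exists.
class AddOnlyCache
{
    private array $items = [];

    public function add(string $key, $value): bool
    {
        if (array_key_exists($key, $this->items)) {
            return false;
        }
        $this->items[$key] = $value;
        return true;
    }

    public function delete(string $key): bool
    {
        $existed = array_key_exists($key, $this->items);
        unset($this->items[$key]);
        return $existed;
    }
}

// Whoever manages to add the key first holds the lock.
function acquireLock(AddOnlyCache $cache, string $name): bool
{
    return $cache->add('lock_' . $name, 1);
}

function releaseLock(AddOnlyCache $cache, string $name): void
{
    $cache->delete('lock_' . $name);
}

$cache = new AddOnlyCache();
var_dump(acquireLock($cache, 'rebuild_feed')); // bool(true)
var_dump(acquireLock($cache, 'rebuild_feed')); // bool(false): already held
releaseLock($cache, 'rebuild_feed');
```

Against a real server the lock key would normally get an expiry time, so a crashed lock holder cannot block everyone forever. Since memcached may also evict items, such a lock is best treated as advisory.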
<?php
function getUserData($UID)
{
    $db = DB::getInstance();
    $db->prepare("SELECT *
        FROM USERS
        WHERE uid = {UID}");
    $db->assignInt('UID', $UID);
    $db->execute();
    return $db->getRow();
}
?>
<?php
function getUserData($UID)
{
    $db = DB::getInstance();
    $db->prepare("SELECT *
        FROM USERS
        WHERE uid = {UID}");
    $db->assignInt('UID', $UID);
    $db->setCacheTTL(0); // cache forever
    $db->execute();
    return $db->getRow();
}
?>
<?php
function getUserData($UID, $invalidateCache = false)
{
    $db = DB::getInstance();
    $db->prepare("SELECT *
        FROM USERS
        WHERE uid = {UID}");
    $db->assignInt('UID', $UID);
    $db->setCacheTTL(0); // cache forever
    if ($invalidateCache)
    {
        return $db->invalidateCache();
    }
    $db->execute();
    return $db->getRow();
}
?>
<?php
function updateUserData($UID, $data)
{
    $db = DB::getInstance();
    $db->prepare("UPDATE USERS
        SET ...
        WHERE uid = {UID}");

    ...

    getUserData($UID, true); // invalidate cache

    return $result;
}
?>
<?php
function getLastBlogPosts($UID, $start = 0,
    $limit = 10, $invalidateCache = false)
{
    $db = DB::getInstance();
    $db->prepare("SELECT blogid
        FROM BLOGS WHERE uid = {UID}
        ORDER BY dateadd DESC LIMIT {start}, {limit}");
    // bind the three placeholders, as in the earlier examples
    $db->assignInt('UID', $UID);
    $db->assignInt('start', $start);
    $db->assignInt('limit', $limit);
    $db->setCacheTTL(0); // cache forever
    if ($invalidateCache)
    {
        return $db->invalidateCache();
    }
    $db->execute();
    return $db->getResults();
}
?>
<?php
function addNewBlogPost($UID, $data)
{
    $db = DB::getInstance();
    $db->prepare("INSERT INTO BLOGS
        ...");
    ...

    // invalidate caches: one call per cached page of results
    getLastBlogPosts($UID, 0, 10, true);
    getLastBlogPosts($UID, 10, 10, true);
    ... // ??? how many pages are cached?

    return $result;
}
?>
<?php
function getLastBlogPosts($UID, $start = 0,
    $limit = 10)
{
    $cacheVersionNumber = CacheVersionNumbers::get('lastblogsposts_' . $UID);
    $db = DB::getInstance();
    $db->prepare("SELECT blogid FROM ...");
    ...
    $db->setCacheVersionNumber($cacheVersionNumber);
    $db->setCacheTTL(0); // cache forever
    $db->execute();
    return $db->getResults();
}
?>
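<?php
class CacheVersionNumbers
{
    // Returns the current version number for $name, creating one if needed.
    public static function get($name)
    {
        global $cache; // assumes the memcached client lives in a global variable
        $result = $cache->get('cvn_' . $name);
        if (!$result)
        {
            $result = microtime() . rand(0, 1000);
            $cache->set('cvn_' . $name, $result);
        }
        return $result;
    }

    // Dropping the version number forces a new one on the next get(),
    // which makes every cache entry built with the old number unreachable.
    public static function bump($name)
    {
        global $cache;
        return $cache->delete('cvn_' . $name);
    }
}
?>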
<?php
function addNewBlogPost($UID, $data)
{
    $db = DB::getInstance();
    $db->prepare("INSERT INTO BLOGS
        ...");

    ...

    CacheVersionNumbers::bump('lastblogsposts_' . $UID);

    return $result;
}
?>
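The version-number trick is easier to see when the cache key itself is visible. The sketch below is an assumption about how such a key could be composed; the talk only shows `setCacheVersionNumber`, not its internals. `VersionedKeys` is a hypothetical class backed by an array instead of memcached (PHP 7.4 or newer).

```php
<?php
class VersionedKeys
{
    private array $cache = []; // stand-in for memcached

    // Current version for a group of cache entries, created on demand.
    public function version(string $name): string
    {
        $key = 'cvn_' . $name;
        if (!isset($this->cache[$key])) {
            // Any value that is practically never reused works as a version.
            $this->cache[$key] = bin2hex(random_bytes(8));
        }
        return $this->cache[$key];
    }

    // Forget the version; the next call to version() creates a fresh one.
    public function bump(string $name): void
    {
        unset($this->cache['cvn_' . $name]);
    }

    // The version is part of every key in the group, so one bump makes
    // all previously stored pages unreachable at once.
    public function key(string $name, string $suffix): string
    {
        return md5($name . '_' . $suffix . '_' . $this->version($name));
    }
}

$keys = new VersionedKeys();
$page1 = $keys->key('lastblogsposts_7', '0_10');
$page2 = $keys->key('lastblogsposts_7', '10_10');
$keys->bump('lastblogsposts_7'); // e.g. after a new blog post
var_dump($keys->key('lastblogsposts_7', '0_10') === $page1); // bool(false)
```

The orphaned entries are never deleted explicitly. They stay in memory until the LRU clean-up reclaims the space.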
Query Caching (cont’d)

• queries with JOIN and WHERE statements are harder to cache
• often not easy to find the cache key on update/change events
• solution: JOIN in PHP

• In the following example: what if the nickname of a user changes?
<?php
$db = DB::getInstance();
$db->prepare("SELECT c.comment_message,
        c.comment_date, u.nickname
    FROM COMMENTS c
    JOIN USERS u
        ON u.uid = c.commenter_uid
    WHERE c.postid = {postID}");
...
?>
<?php
$db = DB::getInstance();
$db->prepare("SELECT c.comment_message,
        c.comment_date,
        c.commenter_uid AS uid
    FROM COMMENTS c
    WHERE c.postid = {postID}");
...
$comments = Users::addUserDetails($comments);
...
?>
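<?php
...
public static function addUserDetails($array)
{
    foreach ($array as &$item)
    {
        $item = array_merge($item,
            self::getUserData($item['uid']));
        // assume high hit ratio
    }
    unset($item); // break the reference left over from the loop
    return $array; // return the whole list, not just the last item
}
...
?>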
So?

• Pros:
  • speed, duh.
  • queries get simpler (better for your db)
  • easier porting to key/value storage solutions
• Cons:
  • you’re relying on memcached to be up and to have good hit ratios
Multi-Get Optimisations

• We reduced database access
• Memcached is faster, but access to memcached still has its price
• Solution: multiget
  • fetch multiple keys from memcached in one single call
  • result is an array of items


Multi-Get Optimisations (cont’d)

• back to the addUserDetails example
  • find the UIDs in the array
  • multiget to memcached for the details of those UIDs
  • for UIDs without a result, do a query
    • SELECT ... FROM USERS WHERE uid IN (...)
    • for each fetched user, store in cache
  • worst case (no hits): 1 query
  • return merged cache/db results
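The steps above can be sketched as follows. The names are hypothetical: a plain array plays the role of memcached (with a real client, step 1 would be a single multiget call such as Memcached::getMulti), and the `$loadUsers` callback plays the role of the `WHERE uid IN (...)` query.

```php
<?php
function getUsers(array $uids, array &$cache, callable $loadUsers): array
{
    $uids = array_values(array_unique($uids));
    $users = [];
    $missing = [];

    // Step 1: look up all keys at once (here: simple array lookups).
    foreach ($uids as $uid) {
        $key = 'user_' . $uid;
        if (array_key_exists($key, $cache)) {
            $users[$uid] = $cache[$key];
        } else {
            $missing[] = $uid;
        }
    }

    // Step 2: one single query for every UID that was not in the cache.
    if ($missing !== []) {
        foreach ($loadUsers($missing) as $uid => $row) {
            $cache['user_' . $uid] = $row; // Step 3: store for next time
            $users[$uid] = $row;
        }
    }

    return $users; // Step 4: merged cache and database results, keyed by UID
}

$cache = ['user_1' => ['nickname' => 'cached user']];
$fakeQuery = function (array $uids): array {
    $rows = [];
    foreach ($uids as $uid) {
        $rows[$uid] = ['nickname' => 'user ' . $uid];
    }
    return $rows;
};
$users = getUsers([1, 2, 3], $cache, $fakeQuery);
echo count($users), "\n"; // prints 3
```

Whatever the number of misses, the database is queried at most once per call, which matches the "worst case: 1 query" bullet.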


Consistent Hashing

• client is responsible for managing the pool
  • hashes a certain key to a certain server
• clients can be naïve: distribute keys based on the size of the pool
  • if one server goes down, all keys will now be queried on other servers > cold cache
• use a client with consistent hashing algorithms, so if a server goes down, only the data on that server gets lost
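To make the difference concrete, here is a minimal sketch of both strategies. It is not the algorithm of any particular client library: the function names, the use of crc32 and the 100 points per server are all assumptions, and the code assumes 64-bit PHP, where crc32() never returns a negative number.

```php
<?php
// Naive distribution: the server index depends on the pool size, so
// removing one server changes the outcome for most keys.
function naiveServer(string $key, array $servers): string
{
    return $servers[crc32($key) % count($servers)];
}

// Consistent hashing: every server owns several points on a ring of
// crc32 values. A key belongs to the first point at or after its hash.
function buildRing(array $servers, int $pointsPerServer = 100): array
{
    $ring = [];
    foreach ($servers as $server) {
        for ($i = 0; $i < $pointsPerServer; $i++) {
            $ring[crc32($server . '#' . $i)] = $server;
        }
    }
    ksort($ring); // order the points along the ring
    return $ring;
}

function ringServer(string $key, array $ring): string
{
    $hash = crc32($key);
    foreach ($ring as $point => $server) {
        if ($point >= $hash) {
            return $server;
        }
    }
    return reset($ring); // past the last point: wrap around to the first
}

$servers = ['cache1:11211', 'cache2:11211', 'cache3:11211'];
$ring = buildRing($servers);
echo ringServer('user_42', $ring), "\n";
```

When a server is removed from the ring, only its own points disappear. Keys that mapped to the remaining servers still find the same point, so only the data on the lost server has to be rebuilt. The linear scan in `ringServer` keeps the sketch short; a real client would use a binary search.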
Memcached Statistics

• available stats from servers include:
  • uptime, #calls (get/set/...), #hits (since uptime), #misses (since uptime)
• no enumeration, no distinguishing between types of caches
• add your own logging / statistics to monitor the effectiveness of your caching strategy
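One way to add such statistics is to wrap the cache client and count hits and misses per type of cached data. The sketch below assumes keys follow a `type_id` naming scheme (as in `user_42`) and uses a hypothetical array-backed class, `CountingCache`, rather than a real client (PHP 7.4 or newer).

```php
<?php
class CountingCache
{
    private array $items = [];
    private array $stats = [];

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }

    // Returns false on a miss and records the outcome per key prefix.
    public function get(string $key)
    {
        $type = explode('_', $key, 2)[0]; // "user_42" is counted as "user"
        $this->stats[$type] ??= ['hits' => 0, 'misses' => 0];
        if (array_key_exists($key, $this->items)) {
            $this->stats[$type]['hits']++;
            return $this->items[$key];
        }
        $this->stats[$type]['misses']++;
        return false;
    }

    public function stats(): array
    {
        return $this->stats;
    }
}

$cache = new CountingCache();
$cache->set('user_1', ['nickname' => 'a']);
$cache->get('user_1'); // hit
$cache->get('user_2'); // miss
$cache->get('blog_9'); // miss
print_r($cache->stats());
```

Per-type counters show which caches actually earn their keep, something the server-wide hit and miss totals cannot tell you.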
More tips ...

• Be careful when security matters. (Remember ‘no authentication’?)
  • Authentication for memcached via the SASL Auth Protocol is being worked on
• Caching is not an excuse not to do database tuning. (Remember cold cache?)
• Make sure to write unit tests for your caching classes and the places where you use them. (Debugging problems related to out-of-date cache data is hard and boring. Very boring.)
Libraries for memcached

• Zend Framework has Zend_Cache with support for a memcached backend
• WordPress has 3 plugins for working with memcached
• all of the other major frameworks have some sort of support (built in or via plugins): Symfony, Django, CakePHP, Drupal, ...
• Gear6: memcached servers in the cloud


memcached isn’t the only caching solution

• memcachedb (persistent memcached)
• opcode caching
  • APC (php compiled code cache, usable for other purposes too)
  • xCache
  • eAccelerator
  • Zend optimizer
Last thought

• main bottleneck in php backends is the database
  • adding php servers is easier than scaling databases
• a complete caching layer before your database layer solves a lot of performance and scalability issues
  • but being able to scale takes more than memcached
  • performance tuning, beginning with identifying the slowest and most used parts, stays important, be it tuning of your php code, memcached calls or database queries
For developers: your game, a top social game
High-score Handling
Tournaments
Challenge builder
Achievements

Got an idea for a game? Great!


Gatcha For Game Developers

Game tracking
Start game and end game calls result in accurate gameplay
tracking and allow us to show who is playing the game at any
given moment, compute popularity, target games.

High-scores
You push your high-score to our API, we do the hard work of
creating different types of leader boards and rankings.

Achievements
Pushing achievements reached in your game takes just one API
call, no configuration needed.
Gatcha For Game Developers

Multiplayer Games
We run SmartFox servers that enable you to build real-time
multiplayer games, with e.g. in-game chat

coming:

Challenges & Tournaments


Allow your game players to challenge each other, or build
challenges & contests yourself.
Gatcha For Game Developers

How to integrate?
Flash Games
We offer wrappers for AS3 and AS2 games with a full
implementation of our API

Unity3D Games

OpenSocial Games
Talk to the supported containers via the Gatcha OpenSocial
Extension

Other Games
Simple iframe implementation. PHP Client API available for the
Gatcha API

Start developing in our sandbox.


Job openings

Weʼre searching for great developers!

PHP Talents
Working on integrations and the gaming platform

Flash Developers
Working on Flash Games and the gaming platform

Design Artists
Designing games and integrations
jurriaan@netlog.com
Resources, among others:
• memcached & apc: http://www.slideshare.net/benramsey/caching-with-memcached-and-apc
• speed comparison: http://dealnews.com/developers/memcachedv2.html
• php client comparison: http://code.google.com/p/memcached/wiki/PHPClientComparison
• cakephp-memcached: http://teknoid.wordpress.com/2009/06/17/send-your-database-on-vacation-by-using-cakephp-memcached/
• caching basics: http://www.slideshare.net/soplakanets/caching-basics
• caching w php: http://www.slideshare.net/JustinCarmony/effectice-caching-w-php-caching
