
Web Caching Through Modified Cache Replacement Algorithm

V. Sathiyamoorthi
Assistant Professor/CSE
Sona College of Technology
Salem-5, Tamil Nadu, India
sathyait2003@gmail.com

Dr. V. Murali Bhaskaran
Principal
Paavai College of Engineering
Pachal, 637018, Tamil Nadu, India

Abstract— Web caching is a mechanism used to improve network performance by reducing network traffic, the load on the Web server and the delay in accessing Web pages. This is achieved by storing frequently accessed Web pages in a proxy cache placed within the network. Caching can take place either at the client side or at the proxy server. A Web proxy cache can potentially improve network performance by reducing the number of requests that reach the server, the volume of data transferred through the network and the delay in accessing a Web page. When a requested page is not present in the cache and the cache is full, one or more cached documents must be removed. The performance of the proxy therefore depends on its page replacement algorithm: which document is evicted from the cache depends on the replacement policy used. A number of existing cache replacement algorithms attempt to optimize various cost metrics such as hit rate, byte hit rate, average latency and total cost. This paper presents one such technique, which improves Web cache performance by modifying the Least Frequently Used (LFU) algorithm. The proposed algorithm is based on the recency, frequency and popularity of Web pages, obtained with the help of Web usage mining.

Keywords- Web Caching, Page Replacement, Proxy, Latency

I. INTRODUCTION

The World Wide Web (WWW) is an evolving system of interlinked files containing audio, images, videos and other multimedia, and Web caching has become an important technology within it. Nowadays the World Wide Web is so widely used that it has led to a substantial increase in the amount of traffic over the Internet; as a result, the Web has become one of the primary bottlenecks for network performance. The transfer of objects over the network raises the level of traffic, which reduces the bandwidth available for competing requests and increases latencies for Web users. In order to reduce such access latencies, it is desirable to replicate copies of frequently requested Web objects closer to the user. Consequently, Web caching and Web pre-fetching have become increasingly important techniques for reducing the latency perceived by users. In Web caching, when a client requests a page from a proxy server, the proxy fetches it from the origin server and returns the response to the client. In a Web pre-fetching scheme, the proxy server itself answers the client directly whenever the requested Web page is already present at the proxy.

Web access logs serve as a substantial source of information about users' Web access patterns, so such logs can be well exploited to analyze and discover useful information about users' interests.

Web Mining [1] is an application of Data Mining techniques that deals with retrieving knowledge from Web data. Web Content Mining [2] focuses on the information available in Web pages in the form of text, images and multimedia content. Web Structure Mining [3] focuses on the structure of Web sites, that is, the inter- and intra-page hyperlinks present in Web pages. Web Usage Mining, or Web Log Mining, deals with extracting knowledge from server log files; the data sources for Web usage mining mainly consist of the logs generated when users access Web servers, represented in standard formats ([4] and [5]).

Caching Web objects at locations close to the user has been accepted as one of the solutions to Web server bottlenecks: it reduces traffic over the Internet and improves the scalability of the system. Caches act as intermediate systems that intercept user requests before they arrive at the remote server. A Web cache checks whether the requested object is available in its local cache; if it is, the cached page is sent back to the user, otherwise the cache redirects the request to the origin server. When the cache receives the requested object, it stores it in the local cache and forwards the result to the user. The copies kept in the cache are then used for subsequent users' requests.

A Web proxy cache can potentially improve network performance by reducing:
• The number of requests that reach the server,
• The volume of data transferred through the network, and
• The delay that an end-user incurs while accessing a Web page.

The cache replacement policy determines which documents to evict in order to make room for new data brought into the cache. The paper is organized as follows. Section 2 summarizes the related research in this field, Section 3 discusses the architecture of the proposed work and, finally, Section 4 presents a comparative study of the proposed work against existing work.
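The caching behaviour described in this introduction (serve a request from the local cache on a hit; on a miss, fetch the object from the origin server, keep a copy, and forward it to the user) can be sketched in a few lines. The function names and the in-memory dictionary below are our own illustrative choices, not part of the paper.

```python
# Minimal sketch of the proxy-cache lookup flow described above.
# `origin_fetch` stands in for a request to the remote origin server.

cache = {}          # url -> page body held by the proxy
origin_hits = 0     # how many requests actually reached the origin server

def origin_fetch(url):
    """Pretend to download the page from the origin server."""
    global origin_hits
    origin_hits += 1
    return f"<body of {url}>"

def handle_request(url):
    """Serve from the local cache when possible, else fetch and store."""
    if url in cache:                 # cache hit: answer locally
        return cache[url]
    body = origin_fetch(url)         # cache miss: go to the origin
    cache[url] = body                # keep a copy for later requests
    return body

# Two requests for the same page: only the first reaches the origin.
handle_request("/index.html")
handle_request("/index.html")
```

Repeated requests for the same object are absorbed by the proxy, which is exactly the traffic reduction the section argues for.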

ISBN: 978-1-4673-1601-9/12/$31.00 ©2012 IEEE 483 ICRTIT-2012


II. RELATED WORK

Balamash et al. and Poplipnig et al. [17, 16] give an overview of various replacement algorithms. They conclude that GDSF outperforms the others when the cache size is small. Williams et al. [7] showed that SIZE outperforms LFU, LRU and several LRU variations in terms of different performance measures, namely cache hit ratio and byte hit ratio; in their experiments, however, they failed to consider object frequency in the decision-making process. Rachid et al. [12] proposed a strategy called class-based LRU (C-LRU). C-LRU works as both a recency-based and a size-based policy, aiming to obtain a well-balanced mixture of large and small documents in the cache and hence good performance for both small and large object requests. The class-based LRU caching strategy is a modification of standard LRU.

Pei Cao et al. [8] proposed that caching algorithms should take the network cost into account. They introduced a new algorithm called Greedy Dual-Size that combines locality, size and cost; the cost can be defined as the download latency, the network cost or other variables, depending on the goal of the algorithm. The authors presented two versions of the Greedy Dual-Size algorithm, GD-Size(1) and GD-Size(packets). GD-Size(1) sets the cost of each document to 1, while GD-Size(packets) sets the cost of each document to 2+(size/536). GD-Size(1) tries to minimize the miss ratio, and GD-Size(packets) tries to minimize the network traffic resulting from the misses. Their results show that GD-Size(1) clearly achieves the best hit ratio among all algorithms across traces and cache sizes.

Martin et al. [10] used trace-driven simulations to assess the performance of different cache replacement policies for proxy caches, using a trace of client requests to a Web proxy in an ISP environment to evaluate several existing replacement policies. The results in that paper are based on the most extensive Web proxy workload characterization reported in the literature at the time.

The two new policies introduced in [10] are GDSF-Hits and LFU-DA. The GDSF-Hits policy achieves higher hit rates than the GD-Size(1) policy, which had the highest hit rate of all the earlier policies. For small cache sizes GDSF-Hits requires only half the cache space to achieve the same hit rates as the GD-Size(1) policy, and almost 16 times less space than the LRU policy. The GDSF-Hits policy also achieves a higher byte hit rate than GD-Size(1).

Yan Zhao [11] compared some of the existing caching algorithms in terms of cache hit ratio, byte hit ratio and reduced latency, focusing especially on the reduction of the average download latency. In that paper, LRU, four frequency-based algorithms (LFU, LFU*, LFU-Aging and LFU*-Aging) and four variations of GD-Size-based algorithms were compared, using a very simple proxy cache model. Before deciding on the levels of the different cache sizes, an experiment was carried out to demonstrate the performance a proxy cache could ever achieve assuming an infinite cache, i.e. one from which no file is ever removed. Because limited cache storage is assumed in all the simulations of the paper, the performance that any replacement algorithm can achieve is lower than the results assuming an infinite cache size. The workload used in the paper is a one-day trace from the University of Saskatchewan proxy. In terms of cache hit ratio, GD-Size-Frequency(1) performs best and GD-Size(1) ranks second.

Jyoti et al. [18] presented an approach that predicts page accesses before the user makes them, using a higher-order Markov model to predict the user's next request. Its main drawback is that it predicts only one object at a time. González et al. [14] evaluated the performance of various cost-based algorithms using different content types such as audio, video and images.

III. ARCHITECTURE

The complete block diagram of the proposed scheme is shown in the figure below. The key components of the scheme are the proxy server log file preprocessor, the customized log file processor, and access pattern generation.

[Figure 1. Proposed System Architecture: a block diagram in which the Web log passes through data cleaning and then user and session identification (the Log File Customizer) to produce the user access pattern.]

The processes involved in generating the Web access pattern from the Web log are:
• Data cleaning
• Identifying frequent users and pages
• Access pattern generation

A. Data Cleaning

The proxy server log file processor processes the raw proxy server log file to obtain a processed log file. All proxy server log requests that refer to image, CSS and JavaScript files are removed from the Web log file. Entries such as the IP address, the Web pages requested by end users and their sizes are retained in the processed log file. The log file processor outputs a processed proxy log file; 70% of the requests in the cleaned log file are used for sampling and 30% of the requests are used for testing.

B. Log File Customizer

It includes the following sub-modules:
• Frequent user identification
• Frequent page identification
• Access pattern generation

//Frequent user identification
//Input: cleaned Web log containing <IP, Page>
//Output: Frequent user set FU
For each record present in the cleaned log
    Retrieve the field containing the IP
    If (count(IP) > 2) then
        Add IP to FU

    Else
        Ignore the record
    End
    Increment the pointer to the next record
End

//Frequent page identification
//Input: cleaned Web log containing <IP, Page>
//Output: Frequent page set FP
For each record present in the cleaned log
    Retrieve the field containing the URL
    If (count(URL) > 2) then
        Add URL to FP
    Else
        Ignore the record
    End
    Increment the pointer to the next record
End

//Access pattern generation
//Input: Frequent user set FU, frequent page set FP
//Output: Access pattern access[][]
For each useri present in FU
    For each pagej present in FP
        Count the number of times pagej is accessed by useri
        Store it in access[useri][pagej] = count
    End
End

//Algorithm for counting total page counts
//Input: access pattern
//Output: pagefreq[]
For i = 0 to FP.length
    For j = 0 to FU.length
        Count the number of times each userj accesses pagei
        Store it in pagefreq[i]
    End
End

The Log File Customizer uses the data obtained from the cleaned proxy log file for sampling. Its sub-modules include user identification and session identification. In user identification, unique users are identified based on IP address and stored in the one-dimensional array FU. In session identification, the set of URLs requested by each particular user during a predefined time period is identified. Once users and pages are identified, frequent users and frequent pages are determined by counting the total number of visits; a threshold value of 2 is used here.

IV. RESULTS AND DISCUSSIONS

The datasets for testing have been obtained from IRCache. IRCache is an NLANR (National Laboratory for Applied Network Research) project that encourages Web caching and provides data for researchers. The datasets for testing the proposed work have been obtained from ftp.ircache.net/Traces/bo2[1].sanitized-access.2007-01-09/. These are the traces from the proxy server installation at Research Triangle Park, North Carolina, for the date 01/09/2007. For evaluating the proposed work we have used 85% of the data for access pattern generation, and the remaining 15% of the data is used for testing our scheme. Table 1 shows the details of the datasets used for testing the proposed method and their preprocessing details. We plotted graphs of the number of misses against a fixed set of cache sizes. The performance comparison of the LFU and GDS algorithms is shown in Figure 2; it shows that GDS performs better only when the cache size is larger. Figure 3 shows that MLFU performs much better than LFU as the cache size is gradually increased. In Figure 4, MGDSF outperforms GDSF when the cache size is increased. Figures 5 and 6 show the performance of MLFU on the different datasets; the graphs plotted there show the performance in terms of hit ratio for different cache sizes.

Algorithm 1: LFU cache system
{
    Initialize fi = 0;
    For each request
        Let the currently requested document be i
        If i is already in the cache then
            fi++
            Update its priority: Pri = fi;
        Else
            If the cache is full then
                Evict the document j with Prj = min(Pr)
            End
            Load i into the cache
            fi = 1;
            Pri = fi;
        End
    End
}

Algorithm 2: MLFU cache system
{
    //Input: fpi – the predicted future access frequency of page i, taken from the mined access pattern
    Initialize L = 0; fpi = 0;
    For each request
        Let the currently requested document be i
        Fetch the future access count of document i into fpi, if it is found in the access pattern
        If i is already in the cache then
            fi++
            Update Pri = L + (fi + fpi);
        Else
            While there is not enough room in the cache for i
                Let L = min(Prj) over all j in the cache
                Evict the document j with Prj = L
            End
            Load i into the cache
            fi = 1;
            Pri = L + (fi + fpi);
        End
    End
}
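Algorithm 2 can be made concrete with a short sketch. Here `future_freq` plays the role of fpi, the page's frequency in the mined access pattern; the class and variable names are our own, and plain LFU (Algorithm 1) is recovered by passing an empty `future_freq`.

```python
# A runnable sketch of the MLFU policy of Algorithm 2. `future_freq`
# supplies fpi (the page's predicted future accesses from the mined
# access pattern); names are illustrative, not from the paper.

class MLFUCache:
    def __init__(self, capacity, future_freq):
        self.capacity = capacity
        self.future_freq = future_freq   # page -> fpi from the access pattern
        self.L = 0                       # aging value, as in the algorithm
        self.freq = {}                   # page -> fi (observed frequency)
        self.pri = {}                    # page -> Pri = L + (fi + fpi)

    def request(self, page):
        fpi = self.future_freq.get(page, 0)
        if page in self.pri:             # hit: bump fi and recompute priority
            self.freq[page] += 1
            self.pri[page] = self.L + self.freq[page] + fpi
            return True
        if len(self.pri) >= self.capacity:   # miss on a full cache: evict
            victim = min(self.pri, key=self.pri.get)
            self.L = self.pri[victim]        # L rises to the evicted priority
            del self.pri[victim], self.freq[victim]
        self.freq[page] = 1
        self.pri[page] = self.L + 1 + fpi
        return False

# Pages predicted to be popular survive even with low observed frequency.
cache = MLFUCache(capacity=2, future_freq={"/hot": 5})
cache.request("/hot")    # miss, Pri = 0 + 1 + 5 = 6
cache.request("/a")      # miss, Pri = 0 + 1 + 0 = 1
cache.request("/b")      # miss on a full cache: "/a" is evicted, L = 1
```

The mined access pattern thus biases eviction away from pages the logs suggest will be requested again, which is the intended difference from plain LFU.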

TABLE 1: DATASETS AND THEIR PREPROCESSING DETAILS

Data source name                    Duration               Size before    Size after     Unique   Unique   Frequent   Frequent
                                                           preprocessing  preprocessing  users    pages    users      pages
bo2[1].sanitized-access.20070110    24 hours (1/9/2007)    30.2 MB        1.41 MB        63       2545     48         122
bo2[1].sanitized-access.20070109    24 hours (1/10/2007)   35.4 MB        1.43 MB        63       2545     54         280
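The preprocessing steps summarized in Table 1 (cleaning the log, then keeping only the users and pages seen more than twice, the threshold of 2 used in the paper) can be sketched as follows. The tuple-based record format and the list of static-file suffixes are illustrative assumptions.

```python
# Sketch of the preprocessing pipeline: clean the log, then keep users
# and pages whose counts exceed the threshold of 2 used in the paper.
# The <IP, URL> record format and suffix list are assumed for illustration.

from collections import Counter

STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed list

def clean(records):
    """Drop image, CSS and JavaScript requests (the Data Cleaning step)."""
    return [(ip, url) for ip, url in records
            if not url.lower().endswith(STATIC_SUFFIXES)]

def frequent(items, threshold=2):
    """Return the items whose count exceeds the threshold."""
    counts = Counter(items)
    return {x for x, n in counts.items() if n > threshold}

log = [("10.0.0.1", "/a.html"), ("10.0.0.1", "/logo.gif"),
       ("10.0.0.1", "/a.html"), ("10.0.0.1", "/a.html"),
       ("10.0.0.2", "/b.html")]
cleaned = clean(log)
frequent_users = frequent(ip for ip, _ in cleaned)
frequent_pages = frequent(url for _, url in cleaned)
```

From these sets, the access matrix access[user][page] of the pseudocode is a straightforward pair count over the cleaned records.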

In GDS, the priority of a page in the cache is determined by

    Pri = L + Costi / Sizei

where Costi is set to 1 and Sizei is the size of page i in bytes.

[Figure 2: LFU vs GDS. Number of page faults against cache size in KB (20 to 40 KB).]

[Figure 3: LFU vs MLFU. Number of page faults against cache size in KB (70 to 120 KB).]

[Figure 4: GDS vs GDSF vs MGDSF. Number of hits against cache size (10 to 60).]

[Figure 5: LFU vs MLFU on Dataset 1. Number of hits against cache size in bytes.]

[Figure 6: LFU vs MLFU on Dataset 2. Number of hits against cache size in bytes.]
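The GDS priority rule above can be exercised with a short, illustrative implementation of GD-Size(1): the cost is fixed at 1, so small objects receive high priority, and the aging value L rises to the evicted page's priority, as in the standard Greedy Dual-Size policy. The class and method names below are our own.

```python
# Illustrative sketch of GD-Size(1): priority H = L + cost/size with
# cost fixed at 1; on eviction L is raised to the victim's priority.
# Names and the byte-budget model are ours, not Cao and Irani's code.

class GreedyDualSize:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.L = 0.0                  # inflation value, rises on eviction
        self.pri = {}                 # url -> priority H
        self.size = {}                # url -> object size in bytes

    def access(self, url, size, cost=1.0):
        if url in self.pri:           # hit: restore priority at current L
            self.pri[url] = self.L + cost / self.size[url]
            return True
        while self.used + size > self.capacity and self.pri:
            victim = min(self.pri, key=self.pri.get)
            self.L = self.pri[victim]        # age the remaining entries
            self.used -= self.size.pop(victim)
            del self.pri[victim]
        self.size[url] = size
        self.used += size
        self.pri[url] = self.L + cost / size  # small objects rank higher
        return False

cache = GreedyDualSize(capacity_bytes=100)
cache.access("/big", 80)           # miss: priority 1/80
cache.access("/small", 30)         # miss: "/big" must be evicted
hit = cache.access("/small", 30)   # hit
```

With cost fixed at 1 the policy favours keeping many small objects, which is why GD-Size(1) tends to maximize hit ratio rather than byte hit ratio, as the related-work section notes.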

V. CONCLUSION

Web caching is used to reduce server workload by storing data in proxy caches placed across the network. A considerable amount of research has been done to evaluate the performance of different cache replacement algorithms in terms of the number of requests that reach the server, the volume of data transferred and the latency a user experiences in retrieving a document. The proposed work provides a survey of different cache replacement policies and a comparison based on their performance; its purpose is to identify common designs and compare one solution with another. Based on the survey and the experiments, we conclude that the proposed work provides better results than GDS and LFU, which indicates its suitability for reducing the volume of data transferred over the network.

With small modifications, this system can also be used for the following applications:
• Predicting user purchase patterns for commodities
• Predicting sales patterns of commodities
• Predicting the various other links that a user might be interested in
• Modifying the design of a site according to user taste

REFERENCES
[1] O. Etzioni, "The World-Wide Web: Quagmire or Gold Mine?", Communications of the ACM, 39(11), 1996, pp. 65–68.
[2] R. Kosala and H. Blockeel, "Web Mining Research: A Survey", SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining, ACM, 2(1), 2000, pp. 1–15.
[3] S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine", Computer Networks and ISDN Systems, 30(1–7), 1998, pp. 107–117.
[4] Configuration file of W3C httpd, http://www.w3.org/Daemon/User/Config/ (1995).
[5] W3C Extended Log File Format, http://www.w3.org/TR/WD-logfile.html (1996).
[6] S. Williams, M. Abrams, C. R. Standbridge, G. Abdulla and E. A. Fox, "Removal Policies in Network Caches for World-Wide Web Documents", Proceedings of ACM SIGCOMM '96, Stanford University, August 1996.
[7] M. F. Arlitt and C. L. Williamson, "Trace-Driven Simulation of Document Caching Strategies for Internet Web Servers", Simulation, vol. 68, Jan. 1997, pp. 23–33.
[8] P. Cao and S. Irani, "Cost-Aware WWW Proxy Caching Algorithms", Research Report, Department of Computer Science, University of Wisconsin-Madison.
[9] M. F. Arlitt, L. Cherkasova, J. Dilley, R. Friedrich and T. Jin, "Evaluating Content Management Techniques for Web Proxy Caches", SIGMETRICS Performance Evaluation Review, 27(4), 2000, pp. 3–11.
[10] Y. Zhao, "Trace-Driven Simulation of Caching Strategies for Internet Web Proxy", Dec. 1998.
[11] B. R. Haverkort, R. El Abdouni Khayari and R. Sadre, "A Class-Based Least Recently Used Caching Algorithm for World-Wide Web Proxies", Computer Performance Evaluation / TOOLS, 2003, pp. 273–290.
[12] J. B. Patil and B. V. Pawar, "GDSF#, A Better Algorithm that Optimizes Both Hit Rate and Byte Hit Rate in Internet Web Servers", International Journal of Computer Science and Applications, ISSN: 0972-9038, 5(4), 2008, pp. 1–10.
[13] J. B. Patil and B. V. Pawar, "Trace Driven Simulation of GDSF# and Existing Caching Algorithms for Internet Web Servers", Journal of Computer Science, 2(3), 2008, p. 573.
[14] F. J. González-Cañete, E. Casilari and A. Triviño-Cabrera, "A Content-Type Based Evaluation of Web Cache Replacement Policies", IADIS International Conference Applied Computing, 2007.
[15] R. A. Khayari, "Impact of Document Types on the Performance of Caching Algorithms in WWW Proxies: A Trace-Driven Simulation Study", 19th IEEE International Conference on Advanced Information Networking and Applications, 2005.
[16] S. Poplipnig and L. Böszörmenyi, "A Survey of Web Cache Replacement Strategies", ACM Computing Surveys, 35(4), 2003, pp. 374–398.
[17] A. Balamash and M. Krunz, "An Overview of Web Caching Replacement Algorithms", IEEE Communications Surveys and Tutorials, 6(2), 2004, pp. 44–56.
[18] J. Pandey, A. Goel and A. K. Sharma, "A Framework for Predictive Web Prefetching at the Proxy Level Using Data Mining", IJCSNS, 8(6), June 2008, pp. 303–308.
