A short walk in the Blogistan

Published: 06 April 2006

Abstract

The increasingly prominent new subset of Web pages, called 'blogs', differs from traditional Web pages both in characteristics and in potential for applications. We explore three aspects of the blogistan: its overall scope and size; the identification of emerging hot topics of discussion and link patterns; and implications for both blogs and applications such as search. Beyond blogs, we develop a general methodology for mining evolving networks and connections. The first part of our study is longitudinal, based on a five-week continuous fetch of a seed collection of nearly 10,000 blog URLs. The second part is based on a successive crawl of pages suspected to be blogs, leading to a larger collection of several million URLs. The collection is examined for a variety of properties. We characterize blogs and study different facets of their link structure and its evolution over time, attributes of the servers and domains that host many of the blogs (including their IP addresses), and how blogs behave with respect to various HTTP/1.1 protocol issues. Inferences from our in-depth exploration are relevant to applications ranging from mining to hosting of blogs, and to other issues of interest to the measurement community.
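As a rough illustration of what a longitudinal fetch makes measurable, the sketch below estimates how often a single page changed across repeated fetches by comparing content digests. It is not the authors' method: the function names `content_digest` and `change_rate`, the choice of SHA-256, and the sample snapshots are all assumptions made for this example, and a real study would also need to discount trivial changes such as timestamps or rotating advertisements.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a hex digest that identifies one fetched page body."""
    return hashlib.sha256(body).hexdigest()


def change_rate(snapshots: list[bytes]) -> float:
    """Fraction of consecutive fetches whose body differs from the previous one.

    `snapshots` holds the bodies of one URL in fetch order (hypothetical input).
    """
    if len(snapshots) < 2:
        return 0.0  # a single fetch gives no basis for comparison
    digests = [content_digest(body) for body in snapshots]
    changes = sum(1 for prev, cur in zip(digests, digests[1:]) if prev != cur)
    return changes / (len(digests) - 1)


# Hypothetical snapshots of one blog page over four fetches.
fetches = [b"post A", b"post A", b"post A + post B", b"post A + post B"]
print(change_rate(fetches))
```

Comparing digests rather than full bodies keeps the per-URL state small, which matters when the same comparison is repeated over thousands of URLs for several weeks.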

Published In

Computer Networks: The International Journal of Computer and Telecommunications Networking  Volume 50, Issue 5
6 April 2006
143 pages

Publisher

Elsevier North-Holland, Inc.

United States

Author Tags

  1. Blog
  2. Evolving networks
  3. Hyperlinks
  4. Measurement
  5. Weblog

Qualifiers

  • Article

Cited By

  • (2015) FIR. Journal of Medical Systems 39:11 (1-14). DOI: 10.1007/s10916-015-0333-0. Online publication date: 1-Nov-2015
  • (2010) Analysis of MySpace user profiles. Information Systems Frontiers 12:4 (361-367). DOI: 10.1007/s10796-009-9206-8. Online publication date: 1-Sep-2010
  • (2009) Top-level decisions through public deliberation on the internet. Proceedings of the 10th Annual International Conference on Digital Government Research: Social Networks: Making Connections between Citizens, Data and Government (42-55). DOI: 10.5555/1556176.1556189. Online publication date: 17-May-2009
  • (2006) BlogRank. Proceedings of the 2nd international workshop on Advanced architectures and algorithms for internet delivery and applications (8-es). DOI: 10.1145/1190183.1190193. Online publication date: 10-Oct-2006
  • (2006) Cat and mouse. Proceedings of the 15th international conference on World Wide Web (337-346). DOI: 10.1145/1135777.1135829. Online publication date: 23-May-2006
