3. Overview – What is ZooKeeper?
• An open source, high-performance
coordination service for distributed
applications.
• Exposes common services through a simple
interface:
• Naming
• Configuration management
• Locks & synchronization
• Group services
• Build your own on it for specific needs
4. Overview – Who uses ZooKeeper?
• Companies:
• Yahoo!
• Zynga
• Rackspace
• LinkedIn
• Netflix, and many more…
• Projects:
• Apache MapReduce (YARN)
• Apache HBase
• Apache Kafka
• Apache Storm
• Neo4j, and many more…
5. Overview – ZooKeeper Use Cases
• Configuration Management
• Cluster member nodes bootstrapping configuration from a
centralized source in an unattended way
• Distributed Cluster Management
• Node join / leave
• Node statuses in real time
• Naming service – e.g. DNS
• Distributed synchronization – locks, barriers, queues
• Leader election in a distributed system
6. The ZooKeeper Service (ZKS)
• ZooKeeper Service is replicated over a set of machines
• All machines store a copy of the data (in-memory)
• A leader is elected on service startup
• Clients only connect to a single ZooKeeper server and maintain a
TCP connection
7. The ZKS - Sessions
• Before executing any request, a client must establish a
session with the service
• All operations a client submits to the service are associated
with a session
• A client initially connects to any server in the ensemble, and
only to a single server
• Sessions offer ordering guarantees: requests in a session are
executed in FIFO order
8. The ZKS – Session States and Lifetime
• Main possible states: CONNECTING, CONNECTED,
CLOSED, NOT_CONNECTED
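The four states above can be sketched as a small state machine. This is only an illustration in Python: the transition table follows the numbered arrows described in the speaker notes further down, the reading of arrow 5 as CONNECTED to CLOSED is an assumption, and the names `SessionState`, `TRANSITIONS` and `advance` are hypothetical rather than part of any ZooKeeper client.

```python
from enum import Enum


class SessionState(Enum):
    NOT_CONNECTED = "NOT_CONNECTED"
    CONNECTING = "CONNECTING"
    CONNECTED = "CONNECTED"
    CLOSED = "CLOSED"


# Allowed moves, keyed by the state the session is currently in.
TRANSITIONS = {
    # arrow 1: the client is initialised
    SessionState.NOT_CONNECTED: {SessionState.CONNECTING},
    # arrow 2: a server is reached; arrow 4: session expired or closed
    SessionState.CONNECTING: {SessionState.CONNECTED, SessionState.CLOSED},
    # arrow 3: connection lost; arrow 5 (assumed): explicit close
    SessionState.CONNECTED: {SessionState.CONNECTING, SessionState.CLOSED},
    # CLOSED is terminal
    SessionState.CLOSED: set(),
}


def advance(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```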
9. The ZooKeeper Data Model (ZDM)
• Hierarchical namespace
• Each node is called a ZNode
• Every ZNode has data (given as byte[])
and can optionally have children
• ZNode paths:
• Canonical, absolute, slash-separated
• No relative references
• Names can have Unicode characters
• Each ZNode maintains a stat structure
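The path rules on this slide can be sketched as a small validator. This is a simplified illustration: `is_valid_znode_path` is a hypothetical helper, the trailing-slash check is an assumption added here, and a real server may enforce further rules (for example on control characters) that are not modelled.

```python
def is_valid_znode_path(path: str) -> bool:
    """Check the slide's rules: absolute, slash-separated, no relative parts."""
    if path == "/":
        return True                      # the root znode
    if not path.startswith("/") or path.endswith("/"):
        return False                     # must be absolute, no trailing slash
    for part in path[1:].split("/"):
        if part in ("", ".", ".."):
            return False                 # no empty or relative components
    return True
```

Unicode names pass the check, so `is_valid_znode_path("/应用/配置")` is accepted, while `"app/config"` and `"/app/../config"` are rejected.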
10. ZDM - Versions
• Each ZNode has a version number, which is incremented every
time its data changes
• setData and delete take a version as input; the operation
succeeds only if the client’s version equals the server’s
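The version check behaves like an optimistic compare-and-set. The sketch below models it in memory; `VersionedNode` and `BadVersionError` are hypothetical names, not a client API, and treating -1 as "match any version" follows the ZooKeeper convention.

```python
class BadVersionError(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedNode:
    def __init__(self, data: bytes = b""):
        self.data = data
        self.version = 0                 # bumped on every data change

    def set_data(self, data: bytes, expected_version: int) -> int:
        # -1 means "do not check the version" in this sketch.
        if expected_version != -1 and expected_version != self.version:
            raise BadVersionError(
                f"expected {expected_version}, node is at {self.version}"
            )
        self.data = data
        self.version += 1
        return self.version
```

Two clients that both read version 0 and then both try to write will see exactly one success; the loser gets `BadVersionError` and has to re-read before retrying.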
11. ZDM – ZNodes – Stat Structure
• The Stat structure for each znode in ZooKeeper is made
up of the following fields:
• czxid
• mzxid
• pzxid
• ctime
• mtime
• dataVersion
• cversion
• aclVersion
• ephemeralOwner
• dataLength
• numChildren
12. ZDM – Types of ZNode
• Persistent ZNode
• Have lifetime in ZooKeeper’s namespace until they’re explicitly
deleted (can be deleted by delete API call)
• Ephemeral ZNode
• Is deleted by ZooKeeper service when the creating client’s session
ends
• Can also be explicitly deleted
• Are not allowed to have children
• Sequential ZNode
• Is assigned a sequence number by ZooKeeper as part of its name
during creation
• The sequence number is a 4-byte integer, formatted as 10 digits with
zero padding, e.g. /path/to/znode-0000000001
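The naming format can be reproduced in a couple of lines. The helpers below are hypothetical and only mimic the 10-digit, zero-padded suffix described on the slide; in a real ensemble the server picks the counter, not the client.

```python
def sequential_name(prefix: str, counter: int) -> str:
    """Append a 10-digit, zero-padded sequence number to prefix."""
    return f"{prefix}{counter:010d}"


def sequence_of(name: str) -> int:
    """Recover the sequence number from the last 10 characters of a name."""
    return int(name[-10:])
```

For example, `sequential_name("/path/to/znode-", 1)` gives `/path/to/znode-0000000001`, and `sequence_of` turns that name back into `1`.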
14. ZDM – Znode – Reads & Writes
• Read requests are processed locally at the ZooKeeper
server to which client is currently connected
• Write requests are forwarded to leader and go through
majority consensus before a response is generated
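"Majority" here means more than half of the servers in the ensemble. A minimal sketch of the arithmetic, using hypothetical helper names:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)
```

An ensemble of 5 needs 3 servers to agree and survives 2 failures, while an ensemble of 4 also needs 3 and survives only 1, which is one reason odd ensemble sizes are the usual choice.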
15. ZDM – Consistency Guarantees
• Sequential Consistency
• Atomicity
• Single System Image
• Reliability
• Timeliness (Eventual Consistency)
16. ZDM - Watches
• A watch event is a one-time trigger, sent to the client that set
the watch, which fires when the data for which the watch was set
changes.
• Watches allow clients to get notifications when a znode
changes in any way (NodeChildrenChanged,
NodeCreated, NodeDataChanged, NodeDeleted)
• All read operations – getData(), getChildren(), exists()
– have the option of setting a watch
• ZooKeeper Guarantees about Watches:
• Watches are ordered: the order of watch events corresponds to the
order of the updates
• A client will see a watch event for a znode it is watching before
seeing the new data that corresponds to that znode
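The one-time nature of a watch is the part that most often surprises people, so here is a toy model of it. `WatchRegistry` is a hypothetical in-memory class, not a ZooKeeper client; it only shows that a callback is removed as soon as it has fired and has to be registered again to see later changes.

```python
from collections import defaultdict


class WatchRegistry:
    def __init__(self):
        self._watches = defaultdict(list)    # path -> pending callbacks

    def watch(self, path, callback):
        self._watches[path].append(callback)

    def fire(self, path, event_type):
        # pop() removes the callbacks, so each one runs at most once.
        for callback in self._watches.pop(path, []):
            callback(event_type, path)
```

A client that wants continuous notifications typically re-reads the znode and sets a fresh watch inside its callback.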
18. ZDM – Access Control List
• ZooKeeper uses ACLs to control access to its znodes
• ACLs are made up of pairs of (scheme:id, permission)
• Built-in ACL schemes
• world: has a single id, anyone
• auth: doesn’t use any id, represents any authenticated user
• digest: uses a username:password pair
• host: uses the client host name as the ACL id
• ip: uses the client host IP as the ACL id
• ACL Permissions:
• CREATE
• READ
• WRITE
• DELETE
• ADMIN
• E.g. (ip:192.168.0.0/16, READ)
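The example entry (ip:192.168.0.0/16, READ) can be evaluated with Python's standard `ipaddress` module. This is a sketch of the matching idea only: the tuple layout and the function `ip_acl_allows` are assumptions made for illustration, and only the ip scheme is handled.

```python
import ipaddress


def ip_acl_allows(acl, client_ip: str, permission: str) -> bool:
    """acl is a list of (scheme, id, permissions) tuples."""
    addr = ipaddress.ip_address(client_ip)
    for scheme, ident, perms in acl:
        if scheme != "ip" or permission not in perms:
            continue
        # Compare the most significant bits of the client address
        # against the addr/bits expression.
        if addr in ipaddress.ip_network(ident, strict=False):
            return True
    return False


ACL = [("ip", "192.168.0.0/16", {"READ"})]
```

With this ACL, a client at 192.168.12.7 may read, a client at 10.0.0.1 may not, and nobody is granted WRITE.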
19. Recipe #1: Queue
• A distributed queue is a very common data structure used in
distributed systems.
• Producer: generates / creates new items and puts them into the
queue
• Consumer: removes items from the queue and processes them
• Addition and removal of items follow FIFO ordering
20. Recipe #1: Queue (cont)
• A ZNode will be designated to hold a queue instance,
queue-znode
• All queue items are stored as znodes under queue-znode
• Producers add an item to queue by creating znode under
queue-znode
• Consumers retrieve items by getting and then deleting a
child from queue-znode
QUEUE-ZNODE : “queue instance”
|-- QUEUE-0000000001 : “item1”
|-- QUEUE-0000000002 : “item2”
|-- QUEUE-0000000003 : “item3”
21. Recipe #1: Queue (cont)
• Let /_QUEUE_ represent the top-level znode, called the
queue-znode
• A producer puts an item into the queue by creating a
SEQUENCE_EPHEMERAL znode named “queue-N”, where
N is a monotonically increasing number
create(“queue-”, SEQUENCE_EPHEMERAL)
• A consumer calls getChildren() on the queue-znode
with the watch flag set to true
M = getChildren(“/_QUEUE_”, true)
• The client picks up items from the list and continues processing
until it reaches the end of the list, then checks again
• The algorithm continues until getChildren() returns an
empty list
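The producer and consumer steps above can be walked through with an in-memory stand-in for the queue-znode. Everything here is hypothetical (`InMemoryQueueZnode`, `drain`); it does not talk to ZooKeeper, ignores watches and ephemerality, and only shows why consumers sort children by sequence suffix before processing them.

```python
class InMemoryQueueZnode:
    """Toy stand-in for the queue-znode; not a ZooKeeper client."""

    def __init__(self):
        self._children = {}      # child name -> payload
        self._next_seq = 1       # counter the "server" appends to names

    def create_sequential(self, prefix: str, payload: bytes) -> str:
        name = f"{prefix}{self._next_seq:010d}"
        self._next_seq += 1
        self._children[name] = payload
        return name

    def get_children(self):
        # A real getChildren() gives no ordering guarantee.
        return list(self._children)

    def take(self, name: str) -> bytes:
        # Get the item's data, then delete the child.
        return self._children.pop(name)


def drain(queue):
    """Consume items in FIFO order until get_children() is empty."""
    consumed = []
    while True:
        children = queue.get_children()
        if not children:
            return consumed
        for name in sorted(children, key=lambda n: int(n.rsplit("-", 1)[1])):
            consumed.append(queue.take(name))
```

With several competing consumers, the delete step can fail because another consumer got there first; a real implementation would catch that and move on to the next child.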
22. Recipe #2: Group Membership
• A persistent znode /membership represents the root of the
group in the ZooKeeper tree
• Any client that joins the cluster creates an ephemeral znode
under /membership to mark its membership in the tree and sets
a watch on /membership
• When another node joins or leaves the cluster, this node
gets a notification and becomes aware of the change in
group membership
23. Recipe #2: Group Membership (cont)
• Let /_MEMBERSHIP_ represent the root of group membership
• Clients joining the group create ephemeral nodes under the root
• All members of the group register for watch events on
/_MEMBERSHIP_, thereby being aware of other members in the
group
L = getChildren(“/_MEMBERSHIP_”, true)
• When a new client joins the group, all other members are notified
• Similarly, when a client leaves due to failure or otherwise,
ZooKeeper automatically deletes its node and triggers an event
• Live members know which node joined or left by looking at
the list of children L
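Working out who joined or left comes down to comparing the previous child list with the one returned after the watch fires. A minimal sketch, with `membership_change` as a hypothetical helper:

```python
def membership_change(previous, current):
    """Return (joined, left) given two child listings of the group root."""
    before, after = set(previous), set(current)
    return sorted(after - before), sorted(before - after)
```

If a member held `["node-a", "node-b"]` and the fresh listing is `["node-b", "node-c"]`, the result says node-c joined and node-a left. Because watches are one-time triggers, the member would set a new watch when it fetches the fresh listing.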
Centralized and highly reliable (simple) data registry
Unattended = without the owner present
- Each server maintains an in-core database, which represents the entire state of the ZooKeeper namespace. To ensure that updates are durable, and thus recoverable in the event of a server crash, updates are logged to a local disk. Also, the writes are serialized to the disk before they are applied to the in-memory database
- (3) The client initially connects to any server in the ensemble, and only to a single server. It uses a TCP connection to communicate with the server, but the session may be moved to a different server if the client has not heard from its current server for some time. Moving a session to a different server is handled transparently by the ZooKeeper client library
- (4)
- A session starts at the NOT_CONNECTED state and transitions to CONNECTING (arrow 1) with the initialization of the ZooKeeper client.
- Normally, the connection to a ZooKeeper server succeeds and the session transitions to CONNECTED (arrow 2).
- When the client loses its connection to the ZooKeeper server or doesn’t hear from the server, it transitions back to CONNECTING (arrow 3) and tries to find another ZooKeeper server. If it is able to find another server or to reconnect to the original server, it transitions back to CONNECTED once the server confirms that the session is still valid.
- Otherwise, it declares the session expired and transitions to CLOSED (arrow 4).
- The application can also explicitly close the session (arrows 4 and 5)
- Each znode has a version number associated with it that is incremented every time its data changes
zxid: Each change has a unique zxid, and if zxid1 is smaller than zxid2 then zxid1 happened before zxid2. A zxid is a 64-bit integer made up of a 32-bit epoch and a 32-bit counter
czxid: The zxid of the change that caused this znode to be created
mzxid: The zxid of the change that last modified this znode
pzxid: The zxid of the change that last added or removed children of this znode
ctime: The time in milliseconds from epoch when this znode was created
mtime: The time in milliseconds from epoch when this znode was last modified
dataVersion: The number of changes to the data of this znode
cversion: The number of changes to the children of this znode
aclVersion: The number of changes to the ACL of this znode
ephemeralOwner: The session id of the owner of this znode if the znode is an ephemeral node. If it is not an ephemeral node, it will be zero
dataLength: The length of the data field of this znode
numChildren: The number of children of this znode
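The zxid layout mentioned at the top of this list (a 32-bit epoch and a 32-bit counter in one 64-bit integer) can be unpacked with two bit operations. The sketch assumes the epoch sits in the high 32 bits, which is what makes plain integer comparison agree with "happened before"; the function names are hypothetical.

```python
def split_zxid(zxid: int):
    """Return (epoch, counter) for a 64-bit zxid."""
    epoch = zxid >> 32              # high 32 bits
    counter = zxid & 0xFFFFFFFF     # low 32 bits
    return epoch, counter


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch and a counter back into a single zxid."""
    return (epoch << 32) | counter
```

Any zxid from a later epoch compares greater than every zxid from an earlier one, regardless of the counter values.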
ZNode's type is set at its creation time
(1) Persistent znodes are useful for storing data that needs to be highly available and accessible by all the components of a distributed application. For example, an application can store the configuration data in a persistent znode. The data as well as the znode will exist even if the creator client dies
(2) An end to a client's session can happen because of disconnection due to a client crash or explicit termination of the connection
The concept of ephemeral znodes can be used to build distributed applications where the components need to know the state of the other constituent components or resources. For example, a distributed group membership service can be implemented by using ephemeral znodes. The property of ephemeral nodes getting deleted when the creator client's session ends can be used as an analogue of a node that is joining or leaving a distributed cluster. Using the membership service, any node is able to discover the members of the group at any particular time.
READ requests such as exists(), getData(), and getChildren() are processed locally by the ZooKeeper server where the client is connected. This makes the read operations very fast in ZooKeeper
WRITE or update requests such as create(), delete(), and setData() are forwarded to the leader in the ensemble. The leader carries out the client request as a transaction. This transaction is similar to the concept of a transaction in a database management system
A ZooKeeper transaction also comprises all the steps required to successfully execute the request as a single work unit, and the updates are applied atomically
- Sequential Consistency: Updates from a client will be applied in the order that they were sent
- Atomicity: Updates either succeed or fail -- there are no partial results
- Single System Image: A client sees the same view of the service regardless of the ZK server it connects to.
- Reliability: Updates persist once applied, until a client overwrites them. If a client gets a successful return code, the update has been applied
- Timeliness: The clients’ view of the system is guaranteed to be up-to-date within a certain time bound. (Eventual Consistency)
CREATE: you can create a child node
READ: you can get data from a node and list its children.
WRITE: you can set data for a node
DELETE: you can delete a child node
ADMIN: you can set permissions
world has a single id, anyone, that represents anyone.
auth doesn't use any id, represents any authenticated user.
digest uses a username:password string to generate a hash, which is then used as an ACL ID identity. Authentication is done by sending the username:password in clear text. When used in the ACL, the expression is the username followed by a colon and the base64-encoded SHA1 digest of the username:password string.
host uses the client host name as an ACL ID identity. The ACL expression is a hostname suffix. For example, the ACL expression host:corp.com matches the ids host:host1.corp.com and host:host2.corp.com, but not host:host1.store.com.
ip uses the client host IP as an ACL ID identity. The ACL expression is of the form addr/bits where the most significant bits of addr are matched against the most significant bits of the client host IP
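Two of the schemes above are easy to illustrate in Python. The sketch assumes the digest id is the username plus the base64-encoded SHA1 of the `username:password` string, and that a host expression matches as a suffix on a domain-label boundary; both function names are hypothetical.

```python
import base64
import hashlib


def digest_acl_id(username: str, password: str) -> str:
    """Build a digest-scheme id of the form username:base64(sha1(...))."""
    raw = f"{username}:{password}".encode("utf-8")
    digest = base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")
    return f"{username}:{digest}"


def host_acl_matches(expression: str, client_host: str) -> bool:
    """True if client_host equals the expression or ends with '.' + expression."""
    return client_host == expression or client_host.endswith("." + expression)
```

`host_acl_matches("corp.com", "host1.corp.com")` is true and `host_acl_matches("corp.com", "host1.store.com")` is false, matching the example in the host note. Since the password travels in clear text during authentication, the digest scheme is usually paired with a trusted or encrypted network.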
The FIFO order of the items is maintained using the sequential property of znodes provided by ZooKeeper. When a producer process creates a znode for a queue item, it sets the sequential flag. This lets ZooKeeper append a monotonically increasing sequence number to the znode name as a suffix. ZooKeeper guarantees that the sequence numbers are applied in order and are not reused. The consumer process handles the items in the correct order by looking at the sequence number of each znode.