Storage virtualization is a major component of storage servers, in the form of functional RAID levels and controllers. Operating systems and applications on the device can access the discs directly themselves for writing. The controllers configure the local storage into RAID groups, and the operating system sees the storage according to that configuration. However, the storage is abstracted, and it is the controller that determines how to write the data or retrieve the requested data for the operating system. Storage virtualization is also important in various other forms:
File servers: The operating system doesn't need to know how to write to physical
media; it can write data to a remote location.
WAN Accelerators: WAN accelerators allow you to provide re-requested blocks at
LAN speed without affecting WAN performance. This eliminates the need to transfer
duplicate copies of the same material over WAN environments.
SAN and NAS: Storage is presented to the operating system over the network (often Ethernet). NAS (Network Attached Storage) presents the storage as file-level operations (like NFS), while SAN (Storage Area Network) technologies present the storage as block-level storage (like Fibre Channel), so the operating system issues its I/O instructions as if the storage were a locally attached device.
Storage Tiering: Using the storage pool concept as a starting point, storage tiering analyses the most frequently used data and allocates it to the best-performing storage pool. The least used data is placed on the storage pool with the lowest performance, as sketched below.
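The tiering decision described above is simple to sketch. The following Python fragment is a minimal illustration, not any vendor's implementation: the pool size, block names and access log are assumptions, and real tiering engines work on much richer statistics.

    from collections import Counter

    FAST_POOL_CAPACITY = 2   # blocks the fast tier can hold (assumed)

    # Hypothetical access log gathered over some observation window.
    access_log = ["blk7", "blk3", "blk7", "blk9", "blk7", "blk3", "blk1"]

    heat = Counter(access_log)                      # how often each block was used

    # Most frequently used blocks go to the best-performing pool...
    fast_pool = {blk for blk, _ in heat.most_common(FAST_POOL_CAPACITY)}

    # ...and the least used data stays on the lowest-performing pool.
    slow_pool = set(heat) - fast_pool

    print("fast tier:", sorted(fast_pool))          # ['blk3', 'blk7']
    print("slow tier:", sorted(slow_pool))          # ['blk1', 'blk9']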
Memory Virtualization:
Memory virtualization gathers volatile random access memory (RAM) resources from
many data centre systems, making them accessible to any cluster member machine.
Software performance issues commonly occur from physical memory limits. Memory
virtualization solves this issue by enabling networked, and hence distributed, servers
to share a pool of memory. Applications can utilise a vast quantity of memory to
boost system utilisation, enhance memory usage efficiency, and open up new use
cases when this feature is integrated into the network.
Shared memory systems and memory virtualization solutions are different. Because
shared memory systems do not allow memory resources to be abstracted, they can
only be implemented with a single instance of an operating system (that is, not in a
clustered application environment).
Memory virtualization differs from flash memory-based storage, like solid-state
drives (SSDs), in that the former replaces or enhances regular RAM, while the latter
replaces hard drives (networked or not).
Products based on Memory Virtualization are: ScaleMP, RNA Networks Memory
Virtualization Platform, Oracle Coherence and GigaSpaces.
Implementations
Application level integration
In this case, applications running on connected computers connect to the memory pool
directly through an API or the file system.
Features
1. Virtual Address Space: The first stage in memory virtualization is creating a virtual address space for each programme and mapping it to physical memory addresses. Because a virtual address space can be larger than the available physical memory, numerous applications can run simultaneously.
2. Page Tables: The operating system keeps track of the memory pages used by each app
and their matching physical memory addresses in order to manage the mapping between
virtual and physical memory addresses. This data structure is known as a page table.
3. Memory Paging: A page fault occurs when an application tries to access a memory page
that is not already in physical memory. The OS reacts to this by loading the requested
page from disc into physical memory and swapping out a page of memory from physical
memory to disc.
4. Memory Overcommitment: Virtualization makes memory overcommitment possible, in which applications are given access to more virtual memory than is physically available. Because not all memory pages are actively used at once, the system can employ memory paging and compression to release physical memory as needed.
5. Memory Isolation: Each process has its own virtual memory space, which provides memory isolation and protects processes from interfering with each other’s memory.
6. Efficient Memory Utilization: By using techniques like demand paging and page replacement, memory virtualization optimizes the usage of physical memory by keeping frequently accessed pages in memory and swapping out less used pages to disk (see the sketch after this list).
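To make the page-table and page-fault ideas above concrete, here is a minimal Python sketch of demand paging. The page size, the number of physical frames and the FIFO eviction policy are assumptions chosen for brevity, not a description of any particular operating system.

    from collections import OrderedDict

    PAGE_SIZE = 4096      # bytes per page (assumed)
    NUM_FRAMES = 2        # physical frames available (assumed, deliberately tiny)

    page_table = OrderedDict()   # virtual page number -> physical frame number

    def access(virtual_address):
        """Translate a virtual address, loading the page on a page fault."""
        vpn = virtual_address // PAGE_SIZE
        if vpn not in page_table:                      # page fault
            if len(page_table) >= NUM_FRAMES:          # no free frame: swap out oldest page
                evicted_vpn, frame = page_table.popitem(last=False)
                print(f"swap out page {evicted_vpn} from frame {frame}")
            else:
                frame = len(page_table)                # use a free frame
            print(f"page fault: load page {vpn} from disk into frame {frame}")
            page_table[vpn] = frame
        frame = page_table[vpn]
        return frame * PAGE_SIZE + virtual_address % PAGE_SIZE

    for address in (0, 5000, 100, 9000):    # the third access hits; the fourth evicts page 0
        print("physical address:", access(address))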
2. Object-Level: With object storage, data is not stored directly on a disc. Instead, it is abstracted into data buckets, and your programme retrieves the data using API (Application Programming Interface) calls (see the short sketch after this list). This can be a more scalable option than block storage when dealing with big data volumes; once your buckets are arranged, you won't need to be concerned about running out of room.
3. File-Level: When someone wants another server to host their data, they use file server software such as Samba or NFS. The files are kept in directories known as shares. This eliminates the need for local disc space management and permits numerous users to share a storage device. File servers are useful for desktop PCs, servers, and virtual servers.
4. Host-based: Access to the host or any connected devices is made possible via host-
based storage virtualization. The server's installed driver intercepts and reroutes the
input and output (IO) requests. These input/output (IO) requests are typically sent
towards a hard disc, but they can also be directed towards other devices, including a
USB flash drive. This kind of storage is mostly used for accessing actual installation CDs or DVDs, which makes it simple to install an operating system on the virtual machine.
5. Network-based: The host and the storage are separated by a fibre channel switch.
The virtualization takes place and the IO requests are redirected at the switch. No
specific drivers are needed for this approach to function on any operating system.
6. Array-based: All of the arrays' IO requests are handled by a master array. This
makes data migrations easier and permits management from a single location.
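As noted under the object-level approach above, applications reach object storage through API calls on buckets rather than through a file system. The sketch below uses the boto3 client for an S3-compatible object store; the endpoint, bucket name and key are placeholders, and credentials are assumed to be configured in the environment.

    import boto3

    # Placeholder endpoint; any S3-compatible object store could stand in here.
    s3 = boto3.client("s3", endpoint_url="https://objects.example.com")

    # Write an object: the bucket abstracts away where the bytes physically live.
    s3.put_object(Bucket="example-bucket", Key="reports/2024/q1.csv",
                  Body=b"region,revenue\nnorth,100\n")

    # Read it back by key; no discs, partitions or mount points are visible to the programme.
    response = s3.get_object(Bucket="example-bucket", Key="reports/2024/q1.csv")
    print(response["Body"].read().decode())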
2. Storage Virtualization Layer: A storage virtualization layer sits between the applications
and the physical storage devices. It manages the allocation and retrieval of data blocks,
providing a transparent interface to the applications.
3. Uniform Addressing: Each block of data is assigned a unique address within the
virtualized storage space. This allows for consistent addressing regardless of the physical
location of the data.
5. Data Migration and Load Balancing: The virtualization layer can facilitate data migration across different storage devices without affecting the applications using the data. This helps in load balancing and optimizing storage performance (a small mapping sketch follows this list).
7. Vendor Independence: Users can often mix and match storage devices from different
vendors within the virtualized storage pool. This promotes vendor independence and
flexibility in choosing hardware components.
8. Snapshot and Backup: Many block-level storage virtualization solutions offer features like
snapshots and backups. Snapshots allow for point-in-time copies of data, and backup
processes can be streamlined through centralized management.
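The mapping work done by such a virtualization layer can be pictured in a few lines. In this simplified model the device names and block numbers are invented for illustration: a logical block address is looked up in a table and resolved to a (physical device, physical block) pair, which is also what makes transparent data migration possible.

    # Logical block address -> (physical device, physical block); layout is assumed.
    mapping_table = {
        0: ("arrayA", 100),
        1: ("arrayA", 101),
        2: ("arrayB", 7),
    }

    def read_block(logical_block):
        """Resolve a logical block to its current physical location."""
        device, physical_block = mapping_table[logical_block]
        print(f"read logical block {logical_block} from {device}, block {physical_block}")

    def migrate_block(logical_block, new_device, new_physical_block):
        """Move a block to another device; applications keep using the same logical address."""
        mapping_table[logical_block] = (new_device, new_physical_block)

    read_block(2)                    # served from arrayB
    migrate_block(2, "arrayC", 42)   # e.g. load balancing onto a newly added array
    read_block(2)                    # same logical address, new physical location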
3. Vendor Independence: Users can integrate storage devices from different vendors into a
unified storage pool, promoting flexibility and preventing vendor lock-in.
5. Data Migration and Load Balancing: The virtualization layer facilitates seamless data
migration across storage devices, aiding in load balancing and optimizing storage
performance.
4. Compatibility Issues: Integrating storage devices from different vendors may lead to
compatibility issues or require additional effort to ensure seamless operation.
2. Unified Namespace: It provides a unified namespace for files and directories, allowing
users and applications to interact with a centralized and standardized file system.
4. Access Control and Security: Administrators can implement access control and
security policies at the file level, managing permissions for individual files or directories.
5. Dynamic Expansion and Contraction: The virtualization layer allows for dynamic
expansion or contraction of storage resources, making it easier to manage changing storage
requirements.
5. Enhanced Access Control: Access control can be applied at the file level, allowing for
fine-grained permissions management.
3. Compatibility Challenges: Some legacy applications or systems may not fully support
file-level virtualization, leading to compatibility challenges.
4. Initial Setup Costs: There can be significant initial setup costs associated with
implementing file-level virtualization, including hardware and software investments.
5. Data Integrity: Ensuring data integrity during address space remapping is crucial.
The virtualization layer must guarantee that data is correctly mapped to the intended
physical locations.
Managing the different software and hardware can get difficult when there are several
hardware and software elements.
Storage systems need frequent upgrades to keep pace with demanding applications and ever-growing data volumes.
Despite the ease of accessing data with storage virtualisation, there is always a risk of
cyber-attacks and various cyber threats in virtual environments. That is, for the data
stored in virtual machines, data security and its governance are the major challenges.
Amongst the various vendors delivering storage virtualisation solutions, it is important to find a reliable one. Vendors often provide storage solutions but ignore the complexities of backing up virtual storage pools, and they can struggle when immediate recovery of data is needed after a hardware failure or another issue.
Storage virtualisation can at times lead to access issues, for example if the LAN connection is disrupted or internet access is lost.
There comes a time when there is a need to switch from a smaller network to a larger
one, as the capacity of the current one is insufficient. The migration process is time-
consuming and can even result in downtime.
Additionally, demands such as larger-scale data analysis, agility, scalability, and more rapid access to data are common challenges companies face while selecting storage solutions.
A SAN presents storage devices to a host such that the storage appears to be locally
attached. This simplified presentation of storage to a host is accomplished through the
use of different types of virtualization.
SANs perform an important role in an organization's Business Continuity Management
(BCM) activities (e.g., by spanning multiple sites).
SANs are commonly based on a switched fabric technology. Examples include Fibre
Channel (FC), Ethernet, and InfiniBand. Gateways may be used to move data between
different SAN technologies.
Fibre Channel is commonly used in enterprise environments. Fibre Channel may be used
to transport SCSI, NVMe, FICON, and other protocols.
Ethernet is commonly used in small and medium sized organizations. Ethernet
infrastructure can be used for SANs to converge storage and IP protocols onto the same
network. Ethernet may be used to transport SCSI, FCoE, NVMe, RDMA, and other
protocols.
InfiniBand is commonly used in high performance computing environments. InfiniBand
may be used to transport SRP, NVMe, RDMA, and other protocols.
The core of a SAN is its fabric: the scalable, high-performance network that interconnects
hosts -- servers -- and storage devices or subsystems. The design of the fabric is directly
responsible for the SAN's reliability and complexity. At its simplest, an FC SAN can simply
attach HBA ports on servers directly to corresponding ports on SAN storage arrays, often
using optical cables for top speed and support for networking over greater physical
distances.
But such simple connectivity schemes belie the true power of a SAN. In actual practice, the
SAN fabric is designed to enhance storage reliability and availability by eliminating single
points of failure. A central strategy in creating a SAN is to employ a minimum of two
connections between any SAN elements. The goal is to ensure that at least one working
network path is always available between SAN hosts and SAN storage.
SAN architecture includes host components, fabric components and storage components.
Consider a simple example in which two SAN hosts must communicate with two SAN storage subsystems. Each host employs separate HBAs -- rather than a single multiport HBA, because the HBA device itself would be a single point of failure. The port from each HBA is connected to a port on a different SAN switch, such as a Fibre Channel switch. Similarly, multiple ports on each SAN switch connect to different storage target devices or systems. This is a simple redundant fabric: remove any one connection, and both servers can still communicate with both storage systems, preserving storage access for the workloads on both servers.
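The effect of that dual-connection rule can be expressed as a small path-selection sketch. The host, switch and health values below are hypothetical; the point is simply that when one path fails, traffic continues over the surviving one.

    # Two independent paths from the same host to the same storage array (names assumed).
    paths = [
        {"via": "switch1", "healthy": True},
        {"via": "switch2", "healthy": True},
    ]

    def pick_path():
        """Return any healthy path, or fail only when every path is down."""
        for path in paths:
            if path["healthy"]:
                return path["via"]
        raise RuntimeError("no path to storage")

    print(pick_path())            # switch1
    paths[0]["healthy"] = False   # simulate a failed HBA port, cable or switch
    print(pick_path())            # traffic fails over to switch2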
Consider the basic behaviour of a SAN and its fabric. A host server requires access to SAN
storage; the host will internally create a request to access the storage device. The traditional
SCSI commands used for storage access are encapsulated into packets for the network -- in
this case FC packets -- and the packets are structured according to the rules of the FC
protocol. The packets are delivered to the host's HBA where the packets are placed onto the
network's optical or copper cables. The HBA transmits the request packet(s) to the SAN
where the request will arrive at the SAN switches. One of the switches will receive the
request and send it along to the corresponding storage device. In a storage array, the storage
processor will receive the request and interact with storage devices within the array to
accommodate the host's request.
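The encapsulation step described above can be pictured with a tiny data-structure sketch. It is purely illustrative: the field names and values are invented, and real Fibre Channel frames carry many more fields. A SCSI read command is wrapped in an FC frame addressed to the storage port, and the storage side unwraps it.

    from dataclasses import dataclass

    @dataclass
    class ScsiCommand:           # simplified SCSI read request
        opcode: str
        lba: int
        length: int

    @dataclass
    class FcFrame:               # simplified Fibre Channel frame
        source_id: str           # host HBA port
        destination_id: str      # storage array port
        payload: ScsiCommand

    # Host side: encapsulate the SCSI command for transport over the fabric.
    frame = FcFrame(source_id="hba0", destination_id="array0-port1",
                    payload=ScsiCommand(opcode="READ", lba=2048, length=8))

    # Storage side: the storage processor strips the frame and services the command.
    cmd = frame.payload
    print(f"{cmd.opcode} {cmd.length} blocks at LBA {cmd.lba} for host port {frame.source_id}")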
SAN Switches:
The SAN switch is the focal point of any SAN. As with most network switches, the SAN
switch receives a data packet, determines the source and destination of the packet and
then forwards that packet to the intended destination device. Ultimately, the SAN fabric
topology is defined by number of switches, the type of switches -- such as backbone
switches, or modular or edge switches -- and the way in which the switches are
interconnected. Smaller SANs might use modular switches with 16, 24 or even 32 ports,
while larger SANs might use backbone switches with 64 or 128 ports. SAN switches can
be combined to create large and complex SAN fabrics that connect thousands of servers
and storage devices.
Virtual SAN. Virtualization technology was a natural fit for the SAN, encompassing
both storage and storage network resources to add flexibility and scalability to the
underlying physical SAN. A virtual SAN -- denoted with a capital V in VSAN -- is a
form of isolation, reminiscent of traditional SAN zoning, which essentially uses
virtualization to create one or more logical partitions or segments within the physical
SAN. Traditional VSANs can employ such isolation to manage SAN network traffic,
enhance performance and improve security. Thus, VSAN isolation can prevent potential
problems on one segment of the SAN from affecting other SAN segments, and the
segments can be changed logically as needed without the need to touch any physical
SAN components. VMware offers Virtual SAN Technology.
Unified SAN. A SAN is noted for its support of block storage, which is typical for
enterprise applications. But file, object and other types of storage would traditionally
demand a separate storage system, such as network-attached storage (NAS). A SAN that
supports unified storage is capable of supporting multiple approaches -- such as file,
block and object-based storage -- within the same storage subsystem. Unified storage
provides such capabilities by handling multiple protocols, including file-based SMB and
NFS, as well as block-based, such as FC and iSCSI. By using a single storage platform
for block and file storage, users can take advantage of powerful features that are usually
reserved for traditional block-based SANs, such as storage snapshots, data replication,
storage tiering, data encryption, data compression and data deduplication.
Converged SAN. One common disadvantage to a traditional FC SAN is the cost and
complexity of a separate network dedicated to storage. iSCSI is one means of
overcoming the cost of a SAN by using common Ethernet networking components rather
than FC components. FCoE supports a converged SAN that can run FC communication
directly over Ethernet network components -- converging both common IP and FC
storage protocols onto a single low-cost network. FCoE works by encapsulating FC
frames within Ethernet frames to route and transport FC data across an Ethernet network.
However, FCoE relies on end-to-end support in network devices, which has been
difficult to achieve on a broad basis, making the choice of vendor limited.
Hyper-converged infrastructure. The data center use of HCI has grown dramatically in
recent years. HCI combines compute and storage resources into pre-packaged modules,
allowing modules -- also called nodes -- to be added as needed and managed through a
single common utility. HCI employs virtualization, which abstracts and pools all the
compute and storage resources. IT administrators then provision virtual machines and
storage from the available resource pools. The fundamental goal of HCI is to simplify
hardware deployment and management while allowing fast scalability.
SAN Benefits:
High performance. The typical SAN uses a separate network fabric that is dedicated to
storage tasks. The fabric is traditionally FC for top performance, though iSCSI and
converged networks are also available.
High scalability. The SAN can support extremely large deployments encompassing
thousands of SAN host servers and storage devices or even storage systems. New hosts
and storage can be added as required to build out the SAN to meet the organization's
specific requirements.
High availability. A traditional SAN is based on the idea of a network fabric, which --
ideally -- interconnects everything to everything else. This means a full-featured SAN
deployment has no single point of failure between a host and a storage device, and
communication across the fabric can always find an alternative path to maintain storage
availability to the workload.
Advanced management features. A SAN will support an array of useful enterprise-class
storage features, including data encryption, data deduplication, storage replication and
self-healing technologies intended to maximize storage capacity, security and data
resilience. Features are almost universally centralized and can easily be applied to all the
storage resources on the SAN.
SAN Disadvantages:
Complexity. Although more convergence options, such as FCoE and unified options,
exist for SANs today, traditional SANs present the added complexity of a second
network -- complete with costly, dedicated HBAs on the host servers, switches and
cabling within a complex and redundant fabric and storage processor ports at the
storage arrays. Such networks must be designed and monitored with care, and the
complexity is increasingly troublesome for IT organizations with fewer staff and
smaller budgets.
Scale. Considering the cost, a SAN is generally effective only in larger and more
complex environments where there are many servers and significant storage. It's
certainly possible to implement a SAN on a small scale, but the cost and complexity
are difficult to justify. Smaller deployments can often achieve satisfactory results
using an iSCSI SAN, a converged SAN over a single common network -- such as
FCoE -- or an HCI deployment, which is adept at pooling and provisioning resources.
Management. With the idea of complexity focused on hardware, there is also
significant challenge in SAN management. Configuring features, such as LUN
mapping or zoning, can be problematic for busy organizations. Setting up RAID and
other self-healing technologies as well as corresponding logging and reporting -- not
to mention security -- can be time-consuming but unavoidable to maintain the
organization's compliance, DR and BC postures.
NAS Components:
CPU. The heart of every NAS is a computer that includes the central processing
unit (CPU) and memory. The CPU is responsible for running the NAS OS,
reading and writing data against storage, handling user access and even
integrating with cloud storage if so designed. Where typical computers or servers
use a general-purpose CPU, a dedicated device such as NAS might use a
specialized CPU designed for high performance and low power consumption in
NAS use cases.
Network interface. Small NAS devices designed for desktop or single-user use
might allow for direct computer connections, such as USB or limited wireless
(Wi-Fi) connectivity. But any business NAS intended for data sharing and file
serving will demand a physical network connection, such as a cabled Ethernet
interface, giving the NAS a unique IP address. This is often considered part of the
NAS hardware suite, along with the CPU.
Storage. Every NAS must provide physical storage, which is typically in the
form of disk drives. The drives might include traditional magnetic HDDs, SSDs
or other non-volatile memory devices, often supporting a mix of different storage
devices. The NAS might support logical storage organization for redundancy and
performance, such as mirroring and other RAID implementations -- but it's the
CPU, not the disks, that handles such logical organization.
OS. Just as with a conventional computer, the OS organizes and manages the
NAS hardware and makes storage available to clients, including users and other
applications. Simple NAS devices might not highlight a specific OS, but more
sophisticated NAS systems might employ a discrete OS such as Netgear
ReadyNAS, QNAP QTS, Zyxel FW, among others.
Object storage
Some industry experts speculate that object storage will overtake scale-out NAS.
However, it's possible the two technologies will continue to function side by side.
Both scale-out and object storage methodologies deal with scale, but in different
ways.
NAS files are centrally managed via the Portable Operating System Interface
(POSIX). It provides data security and ensures multiple applications can share a
scale-out device without fear that one application will overwrite a file being
accessed by other users.
Object storage does not use POSIX or any file system. Instead, all the objects are
presented in a flat address space. Bits of metadata are added to describe each object,
enabling quick identification within a flat address namespace.
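A minimal way to picture that flat address space is a single key-to-object map in which every entry carries its own metadata, instead of files arranged in a directory tree. The keys and metadata fields below are invented for illustration.

    # Flat namespace: every object is addressed by a single key, with no directories.
    object_store = {
        "3f9c-invoice-2024": {"data": b"...", "metadata": {"type": "pdf", "owner": "alice"}},
        "a71b-photo-0042":   {"data": b"...", "metadata": {"type": "jpeg", "tags": ["beach"]}},
    }

    # Lookup is a direct key match plus a metadata filter, not a path traversal.
    pdfs = [key for key, obj in object_store.items()
            if obj["metadata"].get("type") == "pdf"]
    print(pdfs)   # ['3f9c-invoice-2024']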
Advantages of NAS:
Disadvantages of NAS:
• Technology used in computer systems to organize and manage multiple physical hard
drives as a single logical unit.
• RAID is designed to improve the reliability, performance, and/or capacity of data
storage systems.
• It achieves this by storing data across multiple disks in a way that provides
redundancy and/or data striping.
• There are different levels of RAID, each with its own set of characteristics and
advantages.
RAID 0
RAID 1
RAID 2
RAID 3
RAID 4
RAID 5
RAID 6
RAID 0:
• RAID 0 implements data striping.
• The data blocks are placed across multiple disks without redundancy: no data block is repeated on any disk and none of the disks is used for redundancy, so if one disk fails, all the data in the array is lost.
For example, blocks 10, 11, 12 and 13 placed side by side on different disks form one stripe.
Instead of placing only one block of data on a disk before moving on, we can also place more than one block of data on a disk and then move to the next disk.
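A short Python sketch of the placement rule (the disk count and block numbering are assumed for illustration):

    NUM_DISKS = 4   # assumed array size

    def locate(block_number):
        """Map a logical block to (disk index, stripe row) under simple RAID 0 striping."""
        return block_number % NUM_DISKS, block_number // NUM_DISKS

    for block in range(8):
        disk, row = locate(block)
        print(f"block {block} -> disk {disk}, row {row}")
    # Each row across the four disks (e.g. blocks 4, 5, 6 and 7 in row 1) is one stripe.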
Pros of RAID 0:
• All the disk space is utilized and hence performance is increased.
• Data requests can be on multiple disks and not on a single disk hence improving the
throughput.
Cons of RAID 0:
• Failure of one disk can lead to complete data loss in the respective array.
• No data Redundancy is implemented so one disk failure can lead to system failure.
RAID 1:
• RAID 1 implements mirroring which means the data of one disk is replicated in
another disk.
• This helps in preventing system failure as if one disk fails then the redundant disk
takes over.
For example, Disk 0 and Disk 1 hold the same data, since Disk 0 is copied to Disk 1; the same applies to Disk 2 and Disk 3.
Pros of RAID 1:
• Failure of one disk does not lead to system failure, as there is redundant data on another disk.
Cons of RAID 1:
• Extra space is required, since each disk's data is also copied to another disk.
RAID 2:
• RAID 2 is used when errors in data must be checked at the bit level, using a Hamming code detection method.
• One set of disks stores the bits of each data word, while another stores the error code correction (parity) bits for those data words.
Pros of RAID 2:
• One full disk is used to store parity bits, which helps in detecting errors.
Cons of RAID 2:
RAID 3:
• Data is striped across disks, with the parity bits stored on a separate disk. The parity bits help to reconstruct the data when there is data loss.
For example, Disk 3 contains the parity bits for Disk 0, Disk 1 and Disk 2. If any one disk's data is lost, it can be reconstructed using the parity bits on Disk 3.
Pros of RAID 3:
Cons of RAID 3:
RAID 4:
• If data is lost on only one disk, it can be reconstructed with the help of the parity drive.
• Parity is calculated by applying the XOR operation across the corresponding blocks of each data disk.
For example, P0 is calculated as XOR(0, 1, 0) = 1 and P1 as XOR(1, 1, 0) = 0: the XOR of an even number of 1s is 0, and of an odd number of 1s is 1. Suppose the Disk 0 data is lost. Since the parity P0 is 1 and the surviving bits are 1 and 0, Disk 0 must have held 0; had it held 1, the parity would have been 0, which contradicts the stored parity value.
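The XOR arithmetic in that example is easy to verify in code. This sketch assumes three data disks holding one bit each, exactly as above, plus one parity disk; it computes the parity and then rebuilds a lost disk from the survivors.

    from functools import reduce
    from operator import xor

    data = {"disk0": 0, "disk1": 1, "disk2": 0}     # one bit per data disk, as in the example

    # Parity disk: XOR of all data bits -> P0 = 0 ^ 1 ^ 0 = 1
    parity = reduce(xor, data.values())
    print("P0 =", parity)

    # Suppose disk0 is lost: XOR the surviving bits with the parity to rebuild it.
    survivors = [bit for name, bit in data.items() if name != "disk0"]
    rebuilt = reduce(xor, survivors, parity)        # 1 ^ 1 ^ 0 = 0
    print("reconstructed disk0 =", rebuilt)         # matches the original value 0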
Pros of RAID 4:
• Parity bits help to reconstruct the data if at most one disk's data is lost.
Cons of RAID 4:
• If data is lost from more than one disk, the parity cannot help us reconstruct it.
RAID 5:
• Parity is distributed across the disks rather than kept on a single dedicated disk, which improves performance (a placement sketch follows the cons below).
Cons of RAID 5:
• Parity is useful only when there is data loss on at most one disk.
• If blocks are lost on more than one disk, the parity is of no use.
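As noted above, the structural difference from RAID 4 is where the parity block lives: in RAID 5 it rotates across the disks from stripe to stripe instead of occupying a dedicated disk. The rotation rule and disk count below are one common convention, assumed for illustration.

    NUM_DISKS = 4   # assumed array size

    def parity_disk(stripe_number):
        """Rotate the parity block across the disks so no single disk becomes a bottleneck."""
        return (NUM_DISKS - 1 - stripe_number) % NUM_DISKS

    for stripe in range(4):
        print(f"stripe {stripe}: parity on disk {parity_disk(stripe)}")
    # stripe 0 -> disk 3, stripe 1 -> disk 2, stripe 2 -> disk 1, stripe 3 -> disk 0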
RAID 6:
• RAID 6 helps when there is more than one disk failure.
• In RAID 6 there are two parity blocks in each array row; it is similar to RAID 5 with extra parity.
Here P0, P1, P2, P3 and Q0, Q1, Q2, Q3 are the two sets of parity blocks used to reconstruct the data if at most two disks fail.
Pros of RAID 6:
Cons of RAID 6:
In summary, RAID is used to protect and recover data when a disk fails, and it comes in several levels:
• RAID 2 uses the Hamming code error detection method to correct errors in data.
• RAID 3 does byte-level data striping and has parity bits for each data word.
• RAID 6 has two parities, which can handle at most two disk failures.