(source: http://www.serverwatch.com/trends/article.php/3925351/10-Advantages-of-SAN-vs-DAS.htm)

A SAN has many advantages over DAS in your data center. Here are 10 reasons to consider making the leap from local storage to a SAN.

1. Scalability

If you know, or have heard, one thing about a SAN, it's that it is scalable. What does scalable mean? SAN scalability means you aren't limited to the handful of disks you can attach to a single system. A SAN can grow to hundreds of disks, whereas a server has a physical limit of about a dozen.

2. Performance

SAN performance isn't affected by Ethernet traffic or local disk throughput bottlenecks. Data travels to and from a SAN on its own private network, partitioned off from user traffic, backup traffic and other SAN traffic.

3. Data Isolation

When correctly configured, SAN data is zoned, and those zones protect your data from everyone else's on the same SAN, so your data can't be read or copied by others sharing the SAN with you. An example of SAN zone separation is how UNIX servers and Windows servers can connect to the same SAN while each group accesses different data. In effect, Windows systems can't "see" UNIX data and vice versa.

4. Uptime

There's nothing quite like a SAN for keeping storage available. SAN systems require no reboots to add new disks, replace disks or configure RAID groups. The ability to stream data between SANs for backup and recovery also improves performance by bypassing the server systems completely.

5. Workload Isolation

Zoning also separates workloads from one another on a SAN. Not only does zoning protect your data, it also provides a barrier that keeps unrelated workloads from affecting your application's performance. With zones in place, sharing a SAN isn't a performance problem for applications.

6. Long Distance Connectivity

SANs have the distance advantage over all other storage connectivity, reaching 10km (about 6 miles). You won't necessarily use that distance capability, but it's there if you need it. Distance lets you consolidate your storage into an isolated location that is dedicated to storage and separate from the systems it serves.

7. Increased Utilization

Rather than hundreds or thousands of partially utilized local disks wasting power and generating heat in your data center, you could have dozens of SAN disks with no wasted space on them. How so? Thin provisioning on the storage side (i.e., on the SAN) uses space more effectively than local storage does. As a system requires more storage, the SAN allocates it dynamically. Yes, this means physical systems can enjoy thin provisioning just like your virtual ones do.

8. Bootable

Despite the benefit of more fully utilized disks, highlighted in advantage No. 7, you do not need local disks for the server operating system at all. It's possible to run diskless physical servers and boot directly from the SAN for your operating system, swap space (pagefile) and all applications. That's right, just like virtual machines.

9. Centralized Management

If your data center has grown over the years and you have SAN arrays from several different vendors, stress not: SAN vendors have created software tools that manage a heterogeneous environment with ease. Better still, all of your SAN environments can be managed centrally from a single interface, which makes storage management efficient.

10. Disaster Recovery

The cost of a SAN is high; nothing in this list claims SAN is a frugal technology. In the case of disaster recovery, however, a SAN can and does earn back its price by providing a speedy recovery when the clock is ticking. A SAN is a reliable and fast data recovery solution: server systems might go offline, but the SAN remains available.
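The thin provisioning mentioned in advantage No. 7 can be sketched as a toy model. This is illustrative Python, not any vendor's API: the pool promises capacity up front but consumes physical blocks only when a block is first written, which is how oversubscription saves space.

```python
class ThinPool:
    """Toy model of thin provisioning: capacity is promised up front,
    but physical blocks are consumed only on first write."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.allocated = {}     # (lun, block) -> physical slot
        self.provisioned = {}   # lun name -> promised size in blocks

    def create_lun(self, name, size_blocks):
        # Promise the full size without consuming any physical space yet.
        self.provisioned[name] = size_blocks

    def write(self, lun, block):
        if block >= self.provisioned[lun]:
            raise ValueError("write past end of LUN")
        key = (lun, block)
        if key not in self.allocated:
            if len(self.allocated) >= self.physical_blocks:
                raise RuntimeError("pool physically exhausted")
            self.allocated[key] = len(self.allocated)


pool = ThinPool(physical_blocks=100)
pool.create_lun("server_a", 80)   # promise 80 blocks
pool.create_lun("server_b", 80)   # 160 promised vs. 100 physical: oversubscribed
pool.write("server_a", 0)
pool.write("server_b", 0)
print(sum(pool.provisioned.values()), len(pool.allocated))  # 160 2
```

The pool has promised 160 blocks against 100 physical ones, yet only 2 are actually consumed; a real array makes the same bet that hosts rarely fill what they ask for.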
(source: http://www.sqlteam.com/article/which-is-faster-san-or-directly-attached-storage)

In general, you can use three key metrics to quantify the performance of a disk I/O path:

• I/Os per second (IOps)
• Megabytes per second (MBps)
• Latency

IOps measures how many I/O requests the disk I/O path can satisfy in a second. For a given disk I/O path, this metric is generally in inverse proportion to the size of the I/O requests; that is, the larger the I/O requests, the lower the IOps. This is intuitive: after all, it takes more time to process a 256KB I/O request than an 8KB request.

MBps measures how much data can be pumped through the disk I/O path. If you view the I/O path as a pipeline, MBps measures how big the pipeline is and how much pressure it can sustain. The bigger the pipeline and the more pressure it can handle, the more megabytes of data it can push through. For a given I/O path, MBps is in direct proportion to the size of the I/O requests; that is, the larger the I/O requests, the higher the MBps. Larger I/Os give you better throughput because they incur less disk seek time penalty than smaller I/Os. Note that for I/O requests of a fixed block size, MBps is simply IOps times the block size in megabytes.

I/O latency -- also known as I/O response time -- measures how fast the disk I/O subsystem can process an I/O request. For a given I/O path, latency grows with the size of the I/O request; as mentioned previously, a larger I/O request takes longer to complete.

Generally speaking, you should pay more attention to IOps and I/O latency for small I/O requests, and be more concerned with MBps for large I/O requests. But whether you report IOps or MBps, you should always keep I/O latency in the picture. Note that as you push a disk I/O subsystem to sustain higher IOps or MBps, the average I/O latency will continue to go up.
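The arithmetic tying these three metrics together can be sketched quickly. The numbers below are hypothetical, purely to illustrate the relationships: MBps is IOps times block size, and under a steady queue, Little's law relates IOps to latency (in-flight I/Os = IOps x latency).

```python
def mbps(iops, block_size_kb):
    # For fixed-size requests, throughput is simply IOps times block size.
    return iops * block_size_kb / 1024.0

def iops_at(queue_depth, latency_ms):
    # Little's law: in-flight I/Os = IOps x latency, so IOps = depth / latency.
    return queue_depth / latency_ms * 1000.0

# Small requests: high IOps but modest MBps.
print(mbps(20000, 8))      # 8KB requests at 20,000 IOps -> 156.25 MBps
# Large requests: far fewer IOps, yet higher MBps.
print(mbps(1000, 256))     # 256KB requests at 1,000 IOps -> 250.0 MBps
# A queue depth of one at 100 microseconds per I/O sustains 10,000 IOps.
print(iops_at(1, 0.1))     # -> 10000.0
```

This is why a benchmark quoting IOps without the block size, or either number without the latency, tells you very little.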
After a certain point, the latency will increase exponentially as you put even more I/O requests on the system. If you obtain a good IOps or MBps number at the expense of very high I/O latency, that IOps or MBps number may not be very useful in practice.

What does all this have to do with the question of which is faster, SAN or directly-attached storage? Well, these three metrics define precisely what is meant by "faster". When you say one is faster than the other, you have to be explicit about which measure you are referring to. Let us examine the question along these three performance measures, starting with I/O latency.

Disk I/O latency

A SAN almost always comes out the loser when it competes with DAS on I/O latency under light load. If you don't believe me, you can measure it yourself, and you don't need any fancy tool to obtain convincing data points.

The latency of an I/O request depends on the current load on the I/O path. If the path is heavily loaded, the I/O requests contend for certain components on the path, and consequently each request takes longer to process. To measure the latency of an I/O path under the best possible conditions, you should issue single-threaded, i.e. synchronous, I/O requests. If you are using an I/O benchmark tool such as IOMeter or sqlio.exe, for instance, you can configure a single worker thread to issue small sequential writes with a queue depth of one. This is meaningful because it simulates SQL Server's database transaction logging I/O activity.

Alternatively, you can use a simple SQL Server script to measure the latency of an I/O path. The following T-SQL script is an example that can help you ascertain the best possible disk I/O latency (or response time) when committing a SQL Server database transaction.
The script is designed to generate I/Os only on the database transaction log file; there is negligible I/O on the data file because the script repeatedly writes to the same single data page, and that page is not flushed to disk until a checkpoint is issued. The script assumes that D is a directly-attached internal drive and E is a drive presented to the same server from a SAN.

-- create a small database on the local internal drive D
CREATE DATABASE io_test_D
    ON PRIMARY (NAME=N'io_test_d_data', FILENAME=N'd:\io_test_d.mdf', SIZE=255)
    LOG ON (NAME=N'io_test_d_log', FILENAME=N'd:\io_test_d.ldf', SIZE=2000);
go
ALTER DATABASE io_test_D SET RECOVERY SIMPLE;
go
-- create a small database on the SAN drive E
CREATE DATABASE io_test_E
    ON PRIMARY (NAME=N'io_test_e_data', FILENAME=N'e:\io_test_e.mdf', SIZE=255)
    LOG ON (NAME=N'io_test_e_log', FILENAME=N'e:\io_test_e.ldf', SIZE=2000);
go
ALTER DATABASE io_test_E SET RECOVERY SIMPLE;
go
-- Create two identical tables, one in each database
use io_test_D
go
CREATE TABLE test (i CHAR(200) NOT NULL);
INSERT test VALUES('ABC');
go
use io_test_E
go
CREATE TABLE test (i CHAR(200) NOT NULL);
INSERT test VALUES('ABC');
go
-- Run the following script on database io_test_D and io_test_E.
-- Run the script multiple times in each database, and alternate
-- the test runs to get consistent measures
SET NOCOUNT ON;
DECLARE @i int, @start datetime;
SET @i = 1;
SET @start = getdate();
WHILE @i <= 10000
BEGIN
    UPDATE test SET i = cast(@i as CHAR(200));
    SET @i = @i + 1;
END
SELECT 'Duration' = datediff(ms, @start, getdate()),
       'Latency'  = datediff(ms, @start, getdate())/10000.0;

What you will find is that the best I/O response time on your D drive is in the 100-microsecond range, whereas the best I/O response time on the E drive may be in the 300-500-microsecond range and could be higher. This is intuitive: a typical I/O path between the host and the SAN media (or the SAN cache) is significantly longer than the path between the host and a directly attached storage medium (or its cache). In other words, an I/O request must travel through more components before it can be considered hardened on a SAN drive than on directly attached storage. No matter how small a latency each of these components incurs, they add up.

IOps and MBps

IOps is really a measure of the multi-tasking capability of the disk subsystem, whereas MBps is a measure of how big the disk I/O pipeline is. SANs -- especially higher-end SANs -- shine on these measures, although from a specific host server you may only have access to a fraction of what the SAN has to offer. Higher-end SANs are often deployed through a Fibre Channel switched fabric, which offers point-to-point links, and a host is often configured with multiple links through the fabric. Other components on the I/O path can be similarly configured to offer I/O path parallelism. This enables higher-end SANs to scale both their ability to process I/O requests and the size of the I/O pipeline. Note that higher-end disk arrays are in fact powerful multi-processor computers with a large amount of memory for cache, ideal for handling many concurrent small I/O requests. The faster and more numerous the processors in these disk arrays, the higher the IOps we can expect; likewise, the larger the cache, the higher the IOps.
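Returning to the latency measurement for a moment: the same queue-depth-one technique can be approximated outside SQL Server. The sketch below is illustrative Python for a POSIX system, not part of the original article; the target path and iteration count are arbitrary choices. It times small synchronous writes, each flushed before the next is issued, against whatever drive the file lives on.

```python
import os
import tempfile
import time

def sync_write_latency(path, iterations=1000, block=b"x" * 512):
    """Average time of queue-depth-one synchronous writes: each write is
    flushed to stable storage before the next one is issued."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.pwrite(fd, block, 0)   # overwrite the same block every time
            os.fsync(fd)              # wait until the write is hardened
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed / iterations

with tempfile.NamedTemporaryFile(delete=False) as f:
    target = f.name                   # lives on whatever drive tempdir is on
latency = sync_write_latency(target)
print(f"avg sync write latency: {latency * 1e6:.0f} us")
os.unlink(target)
```

Run it once against a file on the local drive and once against a file on the SAN drive, and the component-count difference described above shows up directly in the numbers.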
MBps, on the other hand, may leave much to be desired on a drive presented to a specific host, especially in a large shared environment where the I/O resources must be distributed among as many hosts as possible while still delivering adequate performance to each host. If you view a SAN drive as one or more paths of many I/O components, it becomes a question of whether enough I/O path parallelism is configured for the drive. For instance, if a server is equipped with two older 1-Gbps host bus adapters (HBAs), its throughput is capped at about 200MB per second no matter how powerful the rest of the SAN is. Replacing the 1-Gbps HBAs with two newer 4-Gbps HBAs, or adding more HBAs, may improve the throughput -- if the HBAs are indeed the bottleneck. But the SAN drive's throughput could also be limited by the maximum throughput of the inter-switch links in the switched fabric. Further down the I/O paths, the front-side adapter ports on the disk array, the cache in the disk array, the disk controllers and the disk spindles can all become the bottleneck that limits the MBps throughput of the SAN drive. In short, there can be a host of reasons why the MBps throughput of a particular SAN drive is limited.

However, this by no means suggests that you can't configure a SAN drive to deliver the MBps throughput you want (up to a reasonable limit). For instance, to get 1000MBps on a SAN drive, you may need to plug in three 4-Gbps HBAs, configure enough inter-switch links (depending on the throughput of each link), enough front-side adapter ports (again depending on the port throughput), enough disk controllers and a sufficient number of spindles. The challenge is that most of these SAN components are not under your control, and in some SAN environments you may have to jump through hoops to get all the desired configuration done.
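Since the slowest stage on the path caps the drive's throughput, a back-of-the-envelope sizing check is just a minimum over the path's components. This is an illustrative Python sketch; the component budgets are hypothetical, with roughly 100 MBps of usable bandwidth assumed per 1-Gbps Fibre Channel link and 400 MBps per 4-Gbps link.

```python
def path_throughput_mbps(components):
    # End-to-end MBps is capped by the slowest stage on the I/O path.
    return min(components.values())

# Hypothetical per-stage budgets for one SAN drive, in MBps.
san_path = {
    "hbas": 2 * 100,            # two 1-Gbps FC HBAs, ~100 MBps usable each
    "inter_switch_links": 800,
    "array_front_ports": 1600,
    "disk_controllers": 1200,
    "spindles": 2000,
}
print(path_throughput_mbps(san_path))   # 200: the old HBAs are the bottleneck

san_path["hbas"] = 3 * 400              # swap in three 4-Gbps HBAs
print(path_throughput_mbps(san_path))   # 800: now the inter-switch links cap it
```

Upgrading the HBAs quadrupled the cap but did not reach 1200 MBps, because the bottleneck simply moved to the next-slowest stage -- which is exactly why every component on the path has to be sized together.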
Configuring DAS, on the other hand, is a lot simpler because you most likely have full control of all the key decision factors: the choice of disk controllers (i.e. the maximum throughput of each controller and the number of controllers), the choice of drives (i.e. the maximum throughput of each drive and its size), and how many drives to hang off each controller. For instance, if you need 1000MBps of throughput, you can plug four 300MBps SATA RAID controllers into your host server and hook up enough SATA drives to each controller.

Two simple examples

So how does a SAN drive stack up against a DAS drive? Well, I hope the discussion so far -- though scratching only the surface of the topic -- has helped convince you that there is no absolute winner; it all depends on how the drives are configured. In general, given sufficient resources, you can configure a SAN drive to outperform a DAS drive in terms of throughput. Similarly, with sufficient resources, you can configure a DAS drive to outperform a SAN drive. If that's not convincing enough, let me give you two simple examples.

• Example 1 - SAN beating DAS

I had a server with a SAN drive and an internal local drive that was a mirrored set of two drives. To assess their performance, I ran an identical series of disk I/O benchmark tests on the SAN drive and the DAS drive, respectively. The following table summarizes some of the results:
[Results table not reproduced in this excerpt.]

The SAN drive far outperformed the DAS drive on all four I/O workloads. In fact, on I/O throughput, I'd expect any decently configured SAN drive to beat the internal mirrored set you typically find on a server by a large margin.

• Example 2 - DAS beating SAN

It is easy to show that you can configure a DAS drive to beat a SAN drive. Take the SAN drive in the previous example: the best throughput it could achieve was around 207MB per second. Now take a look at the DAS drives that came with the Sun Fire X4500 server, as described in one of Jim Gray's articles. With six SATA controllers and 48 7,200-rpm drives, you could get about 2.6GB per second -- more than 10 times the throughput the SAN drive in the previous example could offer.

Remarks

I hope we have established that it does not help, or even make sense, to make unqualified blanket statements such as "SAN performs better than DAS" or "DAS performs better than SAN". Keep in mind that performance is only one of many factors in deciding whether to place database files on SAN or DAS. Since you can likely configure either to meet your performance requirements (subject to the constraints of your particular environment), the other factors -- such as cost, backup/restore requirements, data replication requirements, company storage policies and directions, and data center infrastructure constraints -- are often more important than performance.

Having made the case that you can configure either SAN to outperform DAS or DAS to outperform SAN, I should point out that for very high I/O performance requirements, in most enterprise environments you'll probably find it easier to configure a DAS drive to outperform a SAN drive, because of the highly shared and strictly managed nature of a large-scale SAN. That may sound like a bad deal for SAN.
However, it is not at all an indictment of SAN as an enterprise storage solution; it is a necessary consequence of a SAN's main purposes. Large-scale enterprise SANs are not deployed to meet the extreme I/O requirements of a very few applications. While they should and do meet the performance requirements of most applications, their main benefits lie in improved sharing of storage resources, scalability, manageability and availability. For instance, your very large databases can benefit tremendously from SAN-based backup solutions, and the servers in a large data center can take advantage of SAN-level replication technologies to meet application disaster recovery requirements en masse. DAS can't provide this kind of scalability.