<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Data deduplication</title>
	<atom:link href="http://blogs.rupturedmonkey.com/?feed=rss2&#038;p=50" rel="self" type="application/rss+xml" />
	<link>http://blogs.rupturedmonkey.com/?p=50</link>
	<description>The greatest storage blog in the world....</description>
	<lastBuildDate>Mon, 30 Aug 2010 22:25:10 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: c2olen</title>
		<link>http://blogs.rupturedmonkey.com/?p=50&#038;cpage=1#comment-136</link>
		<dc:creator>c2olen</dc:creator>
		<pubDate>Wed, 20 Dec 2006 19:43:14 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.rupturedmonkey.com/?p=51#comment-136</guid>
		<description>Chris, you are right, 4:1 isn&#039;t near what was promised.

If you have read all the comments carefully, you should also have noticed that we are not running backups and tape storage according to &quot;best practices&quot;.
We still use client compression on the majority of our systems.
This is due to chargebacks, which are based on the network traffic to our TSM servers. When we disabled client compression, the amount of network traffic suddenly increased big time, and so did the chargeback.
We are making the changes gradually now, and the factoring rate is increasing.

Based on our policies and retentions, we were promised 10:1 factoring. I believe we will exceed this a bit in a couple of months.</description>
		<content:encoded><![CDATA[<p>Chris, you are right, 4:1 isn&#8217;t near what was promised.</p>
<p>If you have read all the comments carefully, you should also have noticed that we are not running backups and tape storage according to &#8220;best practices&#8221;.<br />
We still use client compression on the majority of our systems.<br />
This is due to chargebacks, which are based on the network traffic to our TSM servers. When we disabled client compression, the amount of network traffic suddenly increased big time, and so did the chargeback.<br />
We are making the changes gradually now, and the factoring rate is increasing.</p>
<p>Based on our policies and retentions, we were promised 10:1 factoring. I believe we will exceed this a bit in a couple of months.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris M Evans</title>
		<link>http://blogs.rupturedmonkey.com/?p=50&#038;cpage=1#comment-135</link>
		<dc:creator>Chris M Evans</dc:creator>
		<pubDate>Tue, 19 Dec 2006 21:05:51 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.rupturedmonkey.com/?p=51#comment-135</guid>
		<description>I did some of the first customer releases of RVA (or Iceberg) back in 1996 when StorageTek first released it.  It was a great product, although its downfall was undoubtedly performance.  When IBM OEM&#039;d it, I had a fantastic meeting with some of the people who then went on to write the Redbook on RVA.  I remember having to explain to one guy four times what the capacity of the array &quot;could&quot; be....

So, 4:1 - not as good as the promised 25:1 that would sell this to management....</description>
		<content:encoded><![CDATA[<p>I did some of the first customer releases of RVA (or Iceberg) back in 1996 when StorageTek first released it.  It was a great product, although its downfall was undoubtedly performance.  When IBM OEM&#8217;d it, I had a fantastic meeting with some of the people who then went on to write the Redbook on RVA.  I remember having to explain to one guy four times what the capacity of the array &#8220;could&#8221; be&#8230;.</p>
<p>So, 4:1 &#8211; not as good as the promised 25:1 that would sell this to management&#8230;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: c2olen</title>
		<link>http://blogs.rupturedmonkey.com/?p=50&#038;cpage=1#comment-134</link>
		<dc:creator>c2olen</dc:creator>
		<pubDate>Sat, 25 Nov 2006 20:04:14 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.rupturedmonkey.com/?p=51#comment-134</guid>
		<description>&lt;p&gt;The 25:1 factoring ratio, as Diligent calls it, was one of the reasons we took a peek at the product.&lt;br /&gt; But the factoring greatly depends on a variety of variables.&lt;/p&gt; &lt;ul&gt; 	&lt;li&gt;What is the retention period?&lt;br /&gt; &lt;/li&gt; 	&lt;li&gt;How many versions of backup data do you keep?&lt;br /&gt; &lt;/li&gt; 	&lt;li&gt;How is your backup cycle configured: daily fulls, incrementals, differentials, and so on?&lt;br /&gt; &lt;/li&gt; 	&lt;li&gt;Is the data already compressed by the client?&lt;br /&gt; &lt;/li&gt; 	&lt;li&gt;And some more.....&#160;&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;We have it connected to our TSM servers. On the clients, we are not in a position to do uncompressed backups, because the network interfaces cannot handle high-bandwidth traffic. Please don&#039;t start on this, because I&#039;ve been trying to get those server admins to upgrade or reconfigure to at least 1 GigE.&lt;br /&gt; Go LAN-free, you think? Yeah sure, we would, if those damn servers were running anything other than AIX 4.3. Don&#039;t start on this either. I&#039;ll give you the legacy software excuse for this ;-)&lt;br /&gt; &lt;br /&gt; During preliminary tests, we backed up some Oracle instances multiple times, and the factoring did kick in and went up to 20:1.&lt;br /&gt; But this was not a real-life situation at all. With a variety of files, file types, and file sizes, factoring drops until it goes up again after several regular backup cycles.&lt;br /&gt; We started with the assumption (based on Diligent&#039;s homework) of a factoring of 10:1. With the current data already being client compressed, we manage to get a factoring of 4:1, which is still good from my point of view.&lt;/p&gt; &lt;p&gt;The image below shows the current ratio. Only a little data is stored, in client-compressed form, so this isn&#039;t really a good reference. 
But I thought you&#039;d like to see some information.&lt;br /&gt; The curves in the graph indicate the fluctuation when brand-new data is stored and when multiple versions are stored later.&lt;/p&gt; &lt;p&gt;&lt;img src=&quot;http://www.websteam.nl/Stuff/external/img1.jpg&quot; alt=&quot;Diligent ProtecTier Gui (repository)&quot; title=&quot;Diligent ProtecTier Gui (repository)&quot; width=&quot;600&quot; height=&quot;459&quot; /&gt;&lt;/p&gt; &lt;p&gt;If the image resizing made it too blurry for your eyes, check this &lt;a href=&quot;http://www.websteam.nl/Stuff/external/img1_large.jpg&quot; title=&quot;Life Size ProtecTier GUI&quot; rel=&quot;nofollow&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;life-size&lt;/a&gt; version right here.&#160;&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>The 25:1 factoring ratio, as Diligent calls it, was one of the reasons we took a peek at the product.<br /> But the factoring greatly depends on a variety of variables.</p>
<ul>
<li>What is the retention period? </li>
<li>How many versions of backup data do you keep? </li>
<li>How is your backup cycle configured: daily fulls, incrementals, differentials, and so on? </li>
<li>Is the data already compressed by the client? </li>
<li>And some more&#8230;..&nbsp;</li>
</ul>
<p>We have it connected to our TSM servers. On the clients, we are not in a position to do uncompressed backups, because the network interfaces cannot handle high-bandwidth traffic. Please don&#39;t start on this, because I&#39;ve been trying to get those server admins to upgrade or reconfigure to at least 1 GigE.<br /> Go LAN-free, you think? Yeah sure, we would, if those damn servers were running anything other than AIX 4.3. Don&#39;t start on this either. I&#39;ll give you the legacy software excuse for this <img src='http://blogs.rupturedmonkey.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p> During preliminary tests, we backed up some Oracle instances multiple times, and the factoring did kick in and went up to 20:1.<br /> But this was not a real-life situation at all. With a variety of files, file types, and file sizes, factoring drops until it goes up again after several regular backup cycles.<br /> We started with the assumption (based on Diligent&#39;s homework) of a factoring of 10:1. With the current data already being client compressed, we manage to get a factoring of 4:1, which is still good from my point of view.</p>
<p>The image below shows the current ratio. Only a little data is stored, in client-compressed form, so this isn&#39;t really a good reference. But I thought you&#39;d like to see some information.<br /> The curves in the graph indicate the fluctuation when brand-new data is stored and when multiple versions are stored later.</p>
<p><img src="http://www.websteam.nl/Stuff/external/img1.jpg" alt="Diligent ProtecTier Gui (repository)" title="Diligent ProtecTier Gui (repository)" width="600" height="459" /></p>
<p>If the image resizing made it too blurry for your eyes, check this <a href="http://www.websteam.nl/Stuff/external/img1_large.jpg" title="Life Size ProtecTier GUI" rel="nofollow" target="_blank" rel="nofollow">life-size</a> version right here.&nbsp;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nigel (mackem)</title>
		<link>http://blogs.rupturedmonkey.com/?p=50&#038;cpage=1#comment-133</link>
		<dc:creator>Nigel (mackem)</dc:creator>
		<pubDate>Sat, 25 Nov 2006 16:13:52 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.rupturedmonkey.com/?p=51#comment-133</guid>
		<description>Data de-duping in a storage box - worth some thought!  Especially since the guys at Diligent claim that they can map 1PB of storage using just 4GB RAM.

I&#039;d really be keen to know if you get anywhere near the 25:1 claims being made about this solution.  Obviously it will be a while before you will know, but please keep us posted.

I&#039;m also really interested to know how they guarantee 100% data integrity.  A couple of years ago a company I worked for had a little think about using an HP product called RISS, which did a form of single-instance storage that applied hashes against data being saved.  I remember at the time being worried about the slight chance of two different sets of data generating the same hash and being mistaken for the same data.  Does Diligent provide other safety features?

Also, when you start getting close to the 25:1 ratio, I&#039;d be interested to know how much of an impact this has on restore time.  Are you backing off to tape at all?

There seem to be lots of claims out there about de-duping products but I really don&#039;t know what to believe - &quot;there are lies, damned lies and marketing materials&quot;</description>
		<content:encoded><![CDATA[<p>Data de-duping in a storage box &#8211; worth some thought!  Especially since the guys at Diligent claim that they can map 1PB of storage using just 4GB RAM.</p>
<p>I&#8217;d really be keen to know if you get anywhere near the 25:1 claims being made about this solution.  Obviously it will be a while before you will know, but please keep us posted.</p>
<p>I&#8217;m also really interested to know how they guarantee 100% data integrity.  A couple of years ago a company I worked for had a little think about using an HP product called RISS, which did a form of single-instance storage that applied hashes against data being saved.  I remember at the time being worried about the slight chance of two different sets of data generating the same hash and being mistaken for the same data.  Does Diligent provide other safety features?</p>
<p>Also, when you start getting close to the 25:1 ratio, I&#8217;d be interested to know how much of an impact this has on restore time.  Are you backing off to tape at all?</p>
<p>There seem to be lots of claims out there about de-duping products but I really don&#8217;t know what to believe &#8211; &#8220;there are lies, damned lies and marketing materials&#8221;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: c2olen</title>
		<link>http://blogs.rupturedmonkey.com/?p=50&#038;cpage=1#comment-132</link>
		<dc:creator>c2olen</dc:creator>
		<pubDate>Sat, 25 Nov 2006 11:27:54 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.rupturedmonkey.com/?p=51#comment-132</guid>
		<description>Storagezilla, I wasn&#039;t actually referring to you when I mentioned HDS people, but I know that a couple of the HDS crew follow this weblog.

I am glad you&#039;re interested. We&#039;re taking the Diligent machine into production right now. I&#039;ll post some stats on it after a couple of weeks.</description>
		<content:encoded><![CDATA[<p>Storagezilla, I wasn&#8217;t actually referring to you when I mentioned HDS people, but I know that a couple of the HDS crew follow this weblog.</p>
<p>I am glad you&#8217;re interested. We&#8217;re taking the Diligent machine into production right now. I&#8217;ll post some stats on it after a couple of weeks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Storagezilla</title>
		<link>http://blogs.rupturedmonkey.com/?p=50&#038;cpage=1#comment-131</link>
		<dc:creator>Storagezilla</dc:creator>
		<pubDate>Fri, 24 Nov 2006 23:09:15 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.rupturedmonkey.com/?p=51#comment-131</guid>
		<description>I&#039;m not an HDS person, but it&#039;s interesting to read your impressions. ;)</description>
		<content:encoded><![CDATA[<p>I&#8217;m not an HDS person, but it&#8217;s interesting to read your impressions. <img src='http://blogs.rupturedmonkey.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: c2olen</title>
		<link>http://blogs.rupturedmonkey.com/?p=50&#038;cpage=1#comment-130</link>
		<dc:creator>c2olen</dc:creator>
		<pubDate>Thu, 23 Nov 2006 15:37:01 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.rupturedmonkey.com/?p=51#comment-130</guid>
		<description>&lt;p&gt;We&#039;re running on an HP DL585G1, 8GB memory, 4 CPUs, 4 front-end Emulex ports, and 4 back-end QLogic ports. The Emulex ports get reconfigured to be used as target devices for tape emulation.  The back-end storage is currently running on a DS4300 Turbo SATA box, fully equipped.  Running smoothly (now).&lt;/p&gt; &lt;p&gt;It wasn&#039;t a smooth path from the beginning though. In the beginning there were several issues with failover on the back-end disks, because the multipath software wasn&#039;t really incorporated into the &quot;black box&quot; concept as we thought it would be. We experimented with the device-mapper-multipath packages, RDAC, and default kernel failover. The ProtecTier runs Red Hat EL 4 Update 2 under the hood. New configuration concepts in the DS4300 in combination with RDAC eventually did the trick.&#160;&#160;&lt;/p&gt; &lt;p&gt;Another issue was raised when we did a path-fail test without multipath failover enabled. The ext3 filesystems were corrupted. The journalling in ext3 and fsck commands fixed the filesystems, but within the ProtecTier software, the metadata corruption couldn&#039;t be fixed. A patch to the software fixed this incorrect reporting of corruption.&lt;br /&gt; &lt;/p&gt; &lt;p&gt;In the meantime, all problems seem to be fixed, as all tests went fine after applying patch 1.2.1.9.&lt;/p&gt; &lt;p&gt;As a side note, I have to mention that Diligent support was phenomenal and open. I received notification of all ticket updates made, including comments and remarks from the development folks. This isn&#039;t common, to my knowledge.&lt;/p&gt; &lt;p&gt;I think you HDS people are interested in all information about the Diligent VTF stuff, since HDS has a partnership with Diligent, right?&lt;br /&gt; &lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>We&#39;re running on an HP DL585G1, 8GB memory, 4 CPUs, 4 front-end Emulex ports, and 4 back-end QLogic ports. The Emulex ports get reconfigured to be used as target devices for tape emulation.  The back-end storage is currently running on a DS4300 Turbo SATA box, fully equipped.  Running smoothly (now).</p>
<p>It wasn&#39;t a smooth path from the beginning though. In the beginning there were several issues with failover on the back-end disks, because the multipath software wasn&#39;t really incorporated into the &quot;black box&quot; concept as we thought it would be. We experimented with the device-mapper-multipath packages, RDAC, and default kernel failover. The ProtecTier runs Red Hat EL 4 Update 2 under the hood. New configuration concepts in the DS4300 in combination with RDAC eventually did the trick.&nbsp;&nbsp;</p>
<p>Another issue was raised when we did a path-fail test without multipath failover enabled. The ext3 filesystems were corrupted. The journalling in ext3 and fsck commands fixed the filesystems, but within the ProtecTier software, the metadata corruption couldn&#39;t be fixed. A patch to the software fixed this incorrect reporting of corruption. </p>
<p>In the meantime, all problems seem to be fixed, as all tests went fine after applying patch 1.2.1.9.</p>
<p>As a side note, I have to mention that Diligent support was phenomenal and open. I received notification of all ticket updates made, including comments and remarks from the development folks. This isn&#39;t common, to my knowledge.</p>
<p>I think you HDS people are interested in all information about the Diligent VTF stuff, since HDS has a partnership with Diligent, right? </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Storagezilla</title>
		<link>http://blogs.rupturedmonkey.com/?p=50&#038;cpage=1#comment-129</link>
		<dc:creator>Storagezilla</dc:creator>
		<pubDate>Thu, 23 Nov 2006 10:56:38 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.rupturedmonkey.com/?p=51#comment-129</guid>
		<description>What is the spec of the ProtecTier server(s) you went for?</description>
		<content:encoded><![CDATA[<p>What is the spec of the ProtecTier server(s) you went for?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

