
Tuesday, 7 January 2014

Target Based Deduplication


Target based deduplication was one of the earliest ways of performing deduplication. It appeared mainly on VTLs when they were launched as a technology, and over time it has largely fallen out of use.
All it did was carry the data to the target backup device – mainly a VTL – and store it there. The target would then run a deduplication process to match blocks and delete the duplicate data. On a later schedule, the system would run a garbage collection process to finally remove the deleted content and free up space on the VTL.
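A minimal sketch of how such a post-process pass might work, assuming fixed-size blocks stored as individual files and SHA-256 fingerprints (the layout and function names here are illustrative, not any particular VTL's implementation):

```python
import hashlib
import os

def post_process_dedup(store_dir):
    """Scan the stored blocks, keep the first copy of each fingerprint
    and collect the duplicates for later removal."""
    seen = {}        # SHA-256 digest -> path of the block copy we keep
    duplicates = []  # paths holding data we already have elsewhere
    for name in sorted(os.listdir(store_dir)):
        path = os.path.join(store_dir, name)
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest in seen:
            duplicates.append(path)
        else:
            seen[digest] = path
    return duplicates

def garbage_collect(duplicates):
    """The scheduled cleanup that finally frees space on the target."""
    for path in duplicates:
        os.remove(path)
```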
When it was launched it was the only option available, and a great one at that. With newer and better technologies around, target based deduplication has lost its charm, since most applications now prefer source based deduplication and reduce the data before it travels over the network.
So target based deduplication has the advantage of consuming minimal processing and memory cycles on the source, since all the work happens at the target; the downside is that the target has to be slightly oversized to hold the full data before the deletion process starts.

Monday, 30 December 2013

Source Based Deduplication


Source based deduplication picks the unique content at the source itself when you start a backup. It does use some processing power and memory on the source system, so size it well.
Source based deduplication is also very effective at keeping network bandwidth usage to a minimum during backups. The backup application breaks the data into blocks on the source, stores their hashes there, and sends only the unique data over the network. This works well only if it is sized properly: the catalog created by some applications grows large enough to hurt the performance of the source system, which could be a production system.
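As a rough sketch of that flow, assuming fixed-size blocks, SHA-256 hashes and a simple in-memory catalog (all simplifications of what real backup agents do):

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # illustrative fixed-size blocks

def backup_file(path, catalog, send):
    """Hash each block on the source and hand only unseen blocks to `send`
    (the network transfer); blocks already in the catalog cost just a
    lookup, which is where the bandwidth saving comes from."""
    manifest = []  # ordered digests needed to rebuild the file on restore
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            digest = hashlib.sha256(block).hexdigest()
            manifest.append(digest)
            if digest not in catalog:   # catalog lives on the source system
                catalog.add(digest)
                send(digest, block)     # only unique data leaves the machine
    return manifest
```

The catalog here is just a set of digests, and it is exactly this structure that grows with the data and needs to be sized carefully on a production source.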
Source based deduplication also gives good results for file system backups. A traditional approach takes a long time for a file system with millions of small files, sometimes days to complete, especially during a full backup cycle. Source based deduplication in this case picks up only the changed content of the changed files, reducing the amount of data travelling over the network irrespective of the backup level set.
Global deduplication on the target further reduces the amount of data stored.

Saturday, 21 December 2013

Store less, backup lesser – The Art of Deduplication


Deduplication is not a new concept, though the term is relatively new. It originated mostly from the backup perspective, though it has now gained a good foothold in many other IT components.
For over a decade I have known and used backup technologies that would back up only unique emails and files, and subsequently back up only the delta changes. Being the only ones doing that, those vendors used their own terminology and never coined the word deduplication. Now everyone does it and calls it file level deduplication. This approach served desktop and laptop backups well. The technology has since progressed, the current form being block level deduplication, which recognises unique blocks of data and ensures that each block gets backed up only once, thereby reducing the amount of data travelling and being stored.
Like any other process, deduplication needs resources to do what it is meant for. Different applications offer different forms, such as source based, target based and inline deduplication, each with its own working mechanism and its own pros and cons. In the following series of blogs we will discuss these methods of deduplication in turn.

Monday, 2 December 2013

How to backup Big Data?


The industry has long been struggling with backing up the data it keeps accumulating. Traditional tape based backup solutions now seem suited only to small environments. Though tape libraries have kept growing in individual capacity and number of slots, better disk based options are pushing them towards being secondary rather than primary backup media.
Big data needs better care anyway, both for being big and for often being more meaningful than databases of invoice or product records. Newer technologies such as deduplication, and better compression algorithms such as LZO and ZLIB, are making it more cost effective to back it up by bringing down its size.
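To give a feel for the compression part, here is a small sketch using Python's zlib module; the sample data and compression level are made up, and real backup software applies compression to already deduplicated blocks inside its own container format:

```python
import zlib

def compressed_size(blocks):
    """Compress each block with zlib and report the before/after sizes."""
    raw = sum(len(b) for b in blocks)
    packed = sum(len(zlib.compress(b, 6)) for b in blocks)
    return raw, packed

raw, packed = compressed_size([b"invoice record 2013;" * 4096, b"\x00" * 65536])
print(f"{raw} bytes raw -> {packed} bytes compressed ({packed / raw:.1%} of original)")
```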
What is also important is the cost of retaining this large volume of data and the varied sources of this unstructured data.
Ace Data’s Abhraya Cloud based backup offering resolves this challenge for its customers. Its flexible backup policies allow organizations to keep the latest data close to them locally and send the rest to the cloud. Being cloud based, customers pay for what they back up rather than investing on the basis of large growth assumptions. Furthermore, as backups grow old they can be automatically archived to low cost disks, reducing the cost of long term retention while keeping the data available for a long time.
The solution is capable of backing up smartphones, mobile laptops and large file server volumes, apart from large servers and databases, ensuring that all sources of data can be backed up through a single solution.

Wednesday, 13 November 2013

How to Store and Manage BIG Data?


While I mentioned in my previous blog that no volume of data is a problem, I am often asked how to store and manage such huge volumes. This is a typical concern of an enterprise faced with ever-increasing data.
Storage vendors have watched this problem grow and have scaled up, or rather scaled out, to help handle the massive growth. Both NAS and SAN vendors have gone beyond the traditional method of upgrading storage infrastructure by adding extra shelves and disks. The challenge with the traditional method is that you end up adding capacity in the form of shelves and disks with little or no increase in processing power, which leads to a drop in performance.
The scale-out method upgrades the storage by adding new nodes that bring processing power, memory and capacity together, thereby keeping overall performance consistent with practically no dip in user experience. This is true for both SAN and NAS based storage. Such systems can be expanded to petabytes in a single array, or even a single file system, simply by plugging in a new node. It is commercially viable too, as the cost per GB goes down as you keep adding nodes.
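A back-of-the-envelope way to see the difference, with entirely made-up numbers (10 Gbps of controller bandwidth, 100 TB per scale-out node):

```python
def gbps_per_tb(controller_gbps, capacity_tb):
    """Crude measure of performance available per TB of stored data."""
    return controller_gbps / capacity_tb

# Scale-up: shelves add capacity, but the same controller pair does all the work.
scale_up = [gbps_per_tb(10, tb) for tb in (100, 400, 800)]

# Scale-out: each new node brings its own processing, memory and capacity,
# so the ratio stays flat as the cluster grows.
scale_out = [gbps_per_tb(10 * n, 100 * n) for n in (1, 4, 8)]

print("scale-up: ", scale_up)    # falls as shelves are added
print("scale-out:", scale_out)   # stays flat as nodes are added
```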
So don’t worry about handling your Big Data; storage devices are now available to store it far more efficiently.

Friday, 18 October 2013

Developing an ILM strategy


Information Lifecycle Management, or ILM as it is popularly known, is perhaps the most important data discipline in any organization today. It means that every organization needs to think through how it wants to handle its data, right from the time it is created to the time it loses its value.
With the kind of data growth we are witnessing, it is becoming even more important for organizations to understand how frequently they need to access their data, and how long they need to retain it.
Compliance regulations are one of the driving factors in defining the overall retention period, while business practices help define the criticality of the data.
It is for this reason that I believe ILM is more of a business function than a pure IT function. In the Indian context, you can correlate this with the VAT authorities. If they have a query about current year data, they call you the same day or the next. If the case is a couple of years old, you get 15-20 days to respond to each query, and if it concerns data that is 5-6 years old, the case sometimes runs on for another year. Even they don’t ask for data more than 10 years old.
The only difference is that the business owner earlier stored his sales files at different locations based on their age, and now stores them on different disks and storage systems based on the same factor. The driving factors have always been the criticality of, and the compliance requirements around, that data.
By categorizing your data into active and non-active sets, and based upon how urgently it needs to be available, you can place it on tiered storage. This lets you keep the most recent and critical data on your fastest and most accessible devices (or cloud), and retire the rest to archives, saving both cost and resources.
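As a toy illustration of such an age-based policy (the thresholds and tier names are invented; in practice they come from compliance rules and business input, not from IT alone):

```python
from datetime import date, timedelta

# Invented thresholds: the last year stays on primary storage,
# the next few years move to low-cost archive disk or cloud.
TIERS = [
    (timedelta(days=365), "primary"),
    (timedelta(days=6 * 365), "archive"),
]

def tier_for(last_accessed, today=None):
    """Pick a storage tier from the age of the data."""
    age = (today or date.today()) - last_accessed
    for limit, tier in TIERS:
        if age <= limit:
            return tier
    return "long-term retention"  # kept only to satisfy the retention period

print(tier_for(date(2013, 9, 1), today=date(2013, 10, 18)))   # primary
print(tier_for(date(2008, 3, 31), today=date(2013, 10, 18)))  # archive
```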