Monday 30 December 2013

Source Based Deduplication


Source based deduplication selects the unique content at the source itself when a backup starts. It does use some processing power and memory on the source system, so size the source well.
Source based deduplication is also very powerful in ensuring that you use minimum network bandwidth during backups. The backup application creates blocks of data at the source, stores their hashes there, and sends only the unique data over the network. This works well for backups, but only if it is sized properly: the catalog created by some applications can grow large enough to hurt the performance of the source system, which could be a production system.
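A minimal sketch of that flow in Python, assuming fixed-size blocks and an in-memory hash catalog; send_block is a hypothetical placeholder for the backup application's network transfer, not any real product's API.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks; real products choose their own sizes

# Hypothetical in-memory catalog of block hashes already seen at this source
catalog = set()

def send_block(digest, data):
    """Placeholder for the backup application's network transfer."""
    print(f"sending {len(data)} bytes for block {digest[:12]}...")

def backup_file(path):
    """Read a file in fixed-size blocks and send only blocks not seen before."""
    sent = skipped = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            digest = hashlib.sha256(block).hexdigest()
            if digest in catalog:
                skipped += 1            # duplicate block: nothing travels on the network
            else:
                catalog.add(digest)     # remember the hash at the source
                send_block(digest, block)
                sent += 1
    print(f"{path}: {sent} unique blocks sent, {skipped} duplicates skipped")
```

A duplicate block costs only a hash lookup at the source, which is exactly why the catalog's size and memory footprint need to be planned for.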
Source based deduplication also gives good results for file system backups. A traditional approach takes long for a file system with millions of small files; a full backup cycle can take days to write. Source based deduplication in this case picks up only the changed content of the changed files, reducing the amount of data travelling on the network irrespective of the backup level set.
Global deduplication on the target further reduces the amount of data stored.

Tuesday 24 December 2013

Deduplication on Storage


When deduplication was launched for storage, it seemed a difficult technology to handle. Like any other technology, deduplication needs processing power and memory, so deduplicating everything while writing to primary storage would not be very effective. The base premise it started off with was reducing the disk storage investment by reducing the content to be stored. In a real sense, it did not help much: while it would reduce the number of disk spindles required to store data, fewer spindles mean fewer IOPS, and therefore a compromise on performance.
SSD based storage requires a huge investment, and deduplication can help reduce the number of disks required. Since SSDs are capable of a large number of IOPS per disk, there is no compromise on IOPS while deduplicating. While working on one such customer requirement recently, I realized that it does not end there. Scale-out storage further provides more processing power and memory every time you upgrade, helping you maintain consistent performance. Deduplication here also happens inline, i.e. you write only what is unique, unlike technologies that deduplicate at rest, i.e. you write everything, then run a deduplication process to mark the duplicate content, followed by a cleanup process to remove it.
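A minimal sketch of the difference between the two write paths, using in-memory Python structures purely for illustration; a real array does this on its controller, not in application code.

```python
import hashlib

def write_inline(index, data):
    """Inline: hash before writing; duplicate content never reaches the disk."""
    digest = hashlib.sha256(data).hexdigest()
    if digest not in index:
        index[digest] = data          # only unique content is written
    return digest                     # duplicates just add a reference

def write_at_rest(store, data):
    """Post-process ('at rest'): write everything as it arrives..."""
    store.append(data)

def dedupe_at_rest(store):
    """...then scan later, mark duplicates, and clean them up."""
    seen, unique = set(), []
    for data in store:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    store[:] = unique                 # the cleanup pass reclaims the space
```

The post-process path needs enough capacity to hold the duplicates until the cleanup runs, which is part of why inline deduplication is attractive when the underlying media is expensive.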
Choose deduplication on storage with caution; it may not be as attractive as it looks.

Saturday 21 December 2013

Store less, back up less – The Art of Deduplication


Deduplication is not a new concept, though the term is relatively new. It originated more from the backup perspective, though it has now gained a good foothold in many other IT components.
I have known and used, for over a decade, backup technologies that would back up only unique emails and files, and subsequently back up only the delta changes. Being the only ones doing it, they used their own terminology and did not coin the word deduplication. Now everyone does it and calls it file level deduplication. This approach served well for desktop and laptop backups. The technology has since progressed, the current state being block level deduplication, which recognises unique blocks of data and ensures that each block gets backed up only once, thereby reducing the amount of data travelling and getting backed up.
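A minimal sketch of the older, file level idea in Python; the in-memory repository dictionary is a hypothetical stand-in for a real backup store.

```python
import hashlib
from pathlib import Path

# Hypothetical backup store: one copy of each unique file, keyed by content hash
repository = {}

def backup_whole_file(path):
    """File level deduplication: an identical file is stored only once."""
    content = Path(path).read_bytes()
    digest = hashlib.sha256(content).hexdigest()
    if digest in repository:
        return f"{path}: duplicate of an existing file, nothing stored"
    repository[digest] = content
    return f"{path}: new content stored under {digest[:12]}"
```

Block level deduplication applies the same idea to chunks within files rather than whole files, which is what lets it catch a small change inside a large file.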
Like any other process, deduplication needs resources to do what it is meant for. Various applications offer various forms, such as source based, target based and inline deduplication, each with its own working mechanism, pros and cons. In the following series of blogs we will discuss the various methods of deduplication along with their pros and cons.

Saturday 14 December 2013

Protecting Small Databases


A lot of SMEs get concerned about protecting their databases – typically SQL databases. The interesting challenge is that these are really small databases holding extremely critical data.

While traditionally a standalone tape based backup solution would be considered ideal, it is not so simple. A 50-60 GB database normally compresses down to 10-15 GB, so it does not justify that kind of investment in tapes, each of which can hold terabytes. The tape ends up holding too little data and therefore costs more per GB.
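A back-of-the-envelope comparison in Python; the prices and capacities below are assumptions chosen for illustration, not quotes.

```python
# Illustrative only: capacity and price figures are assumptions, not quotes.
tape_capacity_gb = 2500       # a modern tape cartridge holds a few TB native
tape_cost = 40                # assumed cost of one cartridge, in currency units
compressed_backup_gb = 12     # a 50-60 GB database compressed to 10-15 GB

cost_per_gb_if_full = tape_cost / tape_capacity_gb
cost_per_gb_actual = tape_cost / compressed_backup_gb

print(f"tape fully used : {cost_per_gb_if_full:.3f} per GB")
print(f"small DB only   : {cost_per_gb_actual:.3f} per GB")
# The cartridge costs the same either way; holding only a small backup drives
# the effective cost per GB up by more than two orders of magnitude.
```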

With the changing times, better options are available depending on what you want to achieve:

A simple backup can maintain multiple versions and copies and give you both old and new recovery points when required. While this gives the flexibility of versioning, the recovery process takes some time, depending on the resources available.

Alternatively, if you are looking for quick access to your data even after a disaster, replication – especially mirroring – is the best option. To keep it economical, you can use the database's native mirroring capabilities rather than investing in third party tools.

Ace Data Abhraya offers both: cloud based backup for option 1 and cloud based infrastructure for option 2, with a committed recovery SLA. In fact, for SQL databases, you can opt for recovery on cloud infrastructure, with the option of recovering only the database or the complete server on the cloud, while enjoying the flexibility of versioning and compliance and investing very little, paying only for the backup size used.

Monday 9 December 2013

Processing Big Data


While many think it is difficult to manage data beyond a certain volume, technologists don’t agree. Newer technologies keep coming in to handle and process large volumes of data. Consider Google search: while you are typing what you want to search for, it starts autocompleting and showing results as well.
All this is done by using clusters of servers at the back end. Incoming data is processed by these servers, so you get a large pool of processors and memory to absorb it. Beyond this, the storage used behind them to read and write this data offers a wide range of choices.
If it is file based data, you can go for a scale-out NAS. If you have to handle block level data, scale-out SAN options are available. For the really heavy databases, pure flash based storage is now available.
Flash based storage can achieve up to a few million IOPS, especially when it performs inline deduplication in memory. Scale-out there ensures that adding more capacity automatically gives you more memory to handle the new I/Os, and deduplication helps control the storage requirement, which could otherwise weigh heavily on the budget.

Monday 2 December 2013

How to backup Big Data?


The industry has been struggling with backing up the data it has been accumulating for a long time now. Traditional tape based backup solutions now seem good only for small environments. Though tapes have been growing in individual capacity and libraries in number of slots, better disk based options are pushing them towards being secondary rather than primary backup media.
Big data needs better care anyway, for being big and perhaps a bit more meaningful than databases of invoice or product records. New technologies like deduplication and better compression algorithms such as LZO and ZLIB are making it more cost effective to back it up by bringing down its size.
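A minimal illustration of the compression side using Python's built-in zlib module; the sample data and ratios below are illustrative only, since compression is highly data dependent.

```python
import zlib

def compressed_ratio(data: bytes, level: int = 6) -> float:
    """Return the compressed size as a fraction of the original size."""
    return len(zlib.compress(data, level)) / len(data)

# Repetitive data (logs, database dumps) compresses very well; the exact
# ratio depends entirely on the data, so treat these numbers as examples.
sample = b"2013-12-02 backup job completed successfully\n" * 10_000
print(f"zlib level 6: {compressed_ratio(sample):.3%} of original size")
print(f"zlib level 9: {compressed_ratio(sample, 9):.3%} of original size")
```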
What is also important is the cost of retaining this large volume of data and the varied sources of this unstructured data.
Ace Data’s Abhraya cloud based backup offering resolves this challenge for its customers. Its flexible backup policies allow organizations to keep the latest data close to them locally and send the rest to the cloud. Being cloud based, they pay for what they back up rather than investing on large growth assumptions. Furthermore, as backups grow old, they can be automatically archived to low cost disks, reducing the cost of long term retention while ensuring data availability for a long time.
The solution is capable of backing up smartphones, mobile laptops and large volumes on file servers, apart from the large servers and databases, thereby ensuring that all sources of data can be backed up through a single solution.