Nick Mehta, CEO, LiveOffice LLC

On-premise email archive indexing nightmares

One of the top reasons organizations deploy email archiving solutions is to find important email when they need it, whether for email discovery, email compliance or simply knowledge management.

Unfortunately, as the billions of dollars in CapEx that Google spends on its search infrastructure show, searching (and indexing, the process that makes searching possible) is easier said than done.

Many customers who try to deploy their own archives find that the indexes become corrupt (unusable), slow or, worse, inconsistent.

For example, witness this thread on a Google Groups forum about Autonomy's Zantaz EAS product:

We have a similar problem. We have been struggling for 8 months to build an idol index. We start building the index from scratch and everything runs at a reasonable pace initially and the idx files are processed. As the index grows the speed at which it processes the idx files slows down considerably, eventually it almost grinds to a halt.

Our vendor has tried various configurations for us over the last 9 months and we have still not succeeded in building a complete index. We have about 21 million docs to index and the best we get too is about 5 million docs indexed.

Quite honestly this product is not doing more for us other than reduce the size of our mailfiles. Even on the archiving side we continually experience cases where users are unable to retrieve archived mails. I could spend time on webbex's with our vendor trying to sort each of these issues out, but there are so many and my perception is that the support from autonomy is so poor that I do not waste my time anymore, I just restore from tape.

This isn't an issue with Autonomy per se. You'll find similar issues with nearly all on-premise products. The fact is that indexing technology is notoriously complex:

  • You need to make sure that indexes have consistent access to high-speed storage.
  • You need to make sure that index servers have appropriate RAM and RAM configuration.
  • You need to keep adding indexing nodes to keep pace with unpredictable search volume. Troubleshooting performance is really challenging.
  • Even if you have it down to a science, you need to figure out how to handle the once-a-year HUGE search without always over-provisioning the system and wasting capacity the rest of the year.
  • You need to diagnose missing or inconsistent results if you find them (and you will).
  • You need to make sure you have full-time staff who can handle all of the issues above.

In the end, many customers are left like the one above: using the on-premise email archive for mailbox management but not getting the E-Discovery benefits for which they originally bought the product.
