Wednesday, 4 August 2010

NotApp - Random thoughts

So as usual @chrismevans' latest blog post here raises some good points that need discussing. In a previous post a while ago I highlighted some of these, but I think Chris brings a number of them clearly to the fore.

FlexVols are inherently a good thing (although some of the sales use cases are rather flawed - eg copy prod to dev/test, which of course ignores all the information security issues), similarly so is the concept of Aggregates - but the issue for me is that these features haven't moved with the times. Some points below...

Aggregates limited to 16TB - this is an increasing issue for several reasons. Firstly, the simple point that as disks increase in capacity, fewer drives can participate within an aggregate, impacting performance and risk. Secondly, this also impacts costs - NetApp themselves are having to recommend smaller disk sizes within aggregates, which of course prevents any cost/GB benefit from larger disk sizes. Thirdly, of course, there are times when the capacity of an aggregate simply needs to be bigger than this limit.
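To put rough numbers on the spindle-count point, here is a back-of-envelope sketch. It assumes the limit is counted in decimal GB of raw disk and ignores parity, spares and right-sizing, so treat the results as illustrative only; the function name and the disk sizes are made up for the example.

```python
# Back-of-envelope only: raw decimal capacities, no parity, spares or right-sizing.
AGGREGATE_LIMIT_GB = 16_000  # the 16TB aggregate limit discussed above


def max_disks_in_aggregate(disk_size_gb, limit_gb=AGGREGATE_LIMIT_GB):
    """Largest whole number of disks whose combined raw capacity fits under the limit."""
    if disk_size_gb <= 0:
        raise ValueError("disk size must be positive")
    return limit_gb // disk_size_gb


if __name__ == "__main__":
    # Hypothetical disk sizes, chosen purely to show the trend.
    for size_gb in (300, 500, 1000, 2000):
        count = max_disks_in_aggregate(size_gb)
        print(f"{size_gb:>5} GB disks -> at most {count} spindles")
```

Under these assumptions, moving from 500GB to 2TB drives cuts the spindles available to a single aggregate from 32 to 8, which is the performance and risk concern in a nutshell.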

Full support - so as Dimitris points out, in 8.0 a FlexVol can now be the same size as an aggregate (a good thing), but there is a new restriction: this cannot be a dedupe/ASIS volume (they are still limited in size). So a case of 'feature marketing' - most customers will now be using dedupe on their storage, meaning this new limit increase is purely theoretical at this point.
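As a rough sketch of what that restriction means in practice, the rule can be written as a tiny helper. The 16TB dedupe figure and the 100TB aggregate used in the example are the numbers quoted in the comments on this post; the function name is hypothetical and this is not how OnTap itself expresses the limit.

```python
# Illustrative only: figures are the ones quoted in the comments on this post.
DEDUPE_VOLUME_LIMIT_TB = 16  # quoted limit for dedupe (ASIS) FlexVols


def max_flexvol_size_tb(aggregate_size_tb, dedupe_enabled):
    """Largest FlexVol allowed in an aggregate under the rule described above."""
    if aggregate_size_tb <= 0:
        raise ValueError("aggregate size must be positive")
    if dedupe_enabled:
        # Dedupe volumes stay capped regardless of how large the aggregate is.
        return min(aggregate_size_tb, DEDUPE_VOLUME_LIMIT_TB)
    return aggregate_size_tb


if __name__ == "__main__":
    print(max_flexvol_size_tb(100, dedupe_enabled=False))
    print(max_flexvol_size_tb(100, dedupe_enabled=True))
```

If most volumes run with dedupe switched on, the larger aggregate changes nothing about the biggest volume a customer can actually create.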

Mixed disks - The fact that today aggregates can only include a single disk type (ignoring PAM) is frankly painful; heck, even EMC have understood this message and delivered upon it (still more work needed, Boston-based dudes!). But not to see this in OnTap after all this time is depressing to say the least.

Non-disruptive - for me this comes in two main areas that still appear to be missing :-

  • Non-disruptive move of volumes between disk types and/or aggregates within the same array
  • Non-disruptive in-place upgrade from 32bit to 64bit aggregates

NameSpace abstraction in the NAS area is still a major issue - primarily for technology refresh & migrations. Yes, vFilers help with some of it, but really they just snowplough the problem around rather than actually remove it. I think NetApp certainly do need to spend more time looking at the migration & tech refresh areas, and spend time looking at environments where customers run a variety of ages of platform - to see what can be done to improve these parts. Otherwise they'll find that customers invest in the heterogeneous namespace virtualisation areas (eg F5 ARX, EMC Rainfinity etc) rather than the persistence layer.

Reducing the size of an aggregate - hasn't been an issue in the past but will most certainly be an issue with larger aggregate sizes for companies with decent sized estates that need to move their storage hardware and/or capacities around within their estate.

Legacy Management - in talking of versions of OnTap we naturally get into the topic of estate management - by this I mean two key areas. Firstly the interop with & between various versions of OnTap - for many customers this will be a key factor, how to move over time to newer releases and how functionality works/is constrained by backwards & forwards compatibility. Secondly but related, is the equipment that the latest versions of OnTap are actually supported on, as much as vendors would like a new OS release to drive hardware refresh this just isn't possible or realistic in any well managed storage estate. So a key factor is that a current release is provided and supported on previous generations of hardware and that it is clear & flexible how the deployments of software interact with each other from a functionality perspective.

Multi-Protocol - yes this is interesting and can be of use, but the reality is that in reasonably sized data-centres, array deployments still tend to be protocol aligned (eg a 3160 for NAS, a different 3160 for FC etc). For me the bigger benefit is that the same interface and firmware/software is used over the different platforms, as opposed to the myriad of platform software from other vendors.

But for me the greatest issue is that we're still discussing the same points re OnTap that we were 4 years ago, and little real progress has been made. I'm certainly not going to hold my breath for the v8.1 release that people allude to including lots of stuff that was needed 3 years ago. The talk of future releases is always bothersome, and is a standard sales tactic: 'ignore the construction debris, look at the possible shiny future' - look at the disclaimers on any presentation on futures and then ask "why should I believe any of this is actually going to happen when & how stated?". I'm increasingly of the view that the OnTap code-chest is getting too old, too complex and too over-stuffed from an architectural clarity perspective for NetApp to be able to make sensible progress, and I'm wondering a) when it needs a tear down & rebuild with full architectural clarity (as I don't believe the 8 release is that) and b) where the next set of innovations is coming from.

Are they still better than the competitors? In a number of areas clearly & most definitely yes (eg MetroCluster, Failover between arrays, MultiStore etc), have they innovated ahead of others? yes (eg ASIS), but increasingly far too rarely & slowly and the market is rapidly catching up and will pass them very soon.

It's positive to see NetApp people commenting positively on Chris's blog, but NetApp, if you want to improve things for the future: run some customer council sessions, listen to your customers discussing between themselves in a forum, take the information and act rapidly upon it - heck, EMC did and they made big strides forward...

Overall - my view is that NetApp product management, particularly the WAFL & OnTap strategy teams have been asleep at the wheel for a few years and they, but more importantly their customers, are now paying a heavy price for this failure.

10 comments:

  1. Ian

    Thanks for the review.

    As with Chris' blog, there are a couple of points I'd like to clarify.

    Flexvols can be as big as the aggregate in OnTap8; that's 100TB. Flexvols that are to be deduplicated, and only those, are limited -- currently -- to 16TB. That's deliberate, a decision we've made so that we can deliver what we know works. When we introduced dedupe a few years ago, the limits were much lower; they were raised as we and our customers gained experience of using it. The same will happen here.

    By mixed disks do you mean mixed types, such as FC, SAS and SATA, or mixed sizes such as 500GB, 1TB etc?

    I've not seen a convincing use case for aggregate size reduction, nor do I expect to see many -- if any -- moving forward. (As an aside, Chris's point to moving out a RAID set from an aggregate isn't the way the NetApp technology works; a RAID set is a unit of protection, not a unit of space management nor a volume container.)

    Legacy management will be addressed, and in a way that requires little to no disruption. It's an area of intense focus for us, as we want our customers to move to 8; not having a pretty seamless transition would hurt them, and ultimately us.

    IMHO, multi-protocol is of more use to end users than reducing aggregate sizes! That's the one area where flexibility -- SAN or NAS tomorrow when customers are NAS or SAN today -- is vital if acquisitions are to have any meaningful life beyond immediate requirements customers may have.

    Lastly, I appreciate that this is of concern for a lot of our customers; change is always difficult, regardless of how well prepared or informed. But I would like to reassure you that this new code base is the result of many years of work, and that we're committed to making it work, and work well. This is a big step change, and no-one's been asleep at the wheel; we're ensuring that the product works, works well, and transitions smoothly.

    There's one favour I'd like to ask. I wonder what your reaction will be to EMC and HP announcements when they replace the CX and EVA. Are your expectations of a seamless move as high for them as for us? I'd like to see Chris and you apply the same analytic skills as (and when) they cast out the old and bring in the new. Now, that would be interesting!

  2. Seamless technology refresh is one of the most under-performing areas across pretty much all of the storage vendors. Technology refresh is generally pretty painful and often for little gain.

    Refresh disruption is actually one of the biggest blockers to investment. If the cost and disruption of upgrade is significant, then it impacts the TCO and at that point it leaves the customers thinking

    1) Why bother?
    2) Why not evaluate other vendors and change; the disruption is often similar?
    3) And back to why bother again.

    As estates grow ever larger, the time to simply move the data even at full-wire speed is considerable; what is needed is a seamless, non-disruptive and incremental method of moving data.

    Hey, I'd like to see a seamless way of moving data from SAN to NAS (vice versa would be nice...but not so important). Probably not easy but, I suspect, not impossible either.

  3. Hi Ian,

    Dimitris from NetApp here.

    FYI, we do have customer councils and take all suggestions extremely seriously.

    Here's the issue with making changes more rapidly:

    We have code in testing that will boggle most people's minds. In my opinion, a lot of it is more than ready, and would create quite a stir if we released it. However, putting it in production means it has to pass through all kinds of paranoid testing, since we want to retain our 5-6 nines availability rating. Such is the price of success.

    This testing takes time.

    EMC is similar: things you see in the CX or symm code today weren't developed and put in production that quickly. I know of many features that were in development for 4-5 years before they made it out.

    Also, 8.0 is internally a LOT more different vs 7.x than you would think.

    My personal pet peeve is the current lack of transparent migration for objects. Some are easier than others (CIFS for instance as a protocol is problematic to move live, NFS is easier, FC is arguably the easiest).

    Is it a feature frequently needed? Probably not.

    But is it useful when needed? Oh yes.

    Out of curiosity, is there another well-established vendor that can seamlessly move around all protocols in their box and between boxes? (we tackle some of the "between boxes" problem with Data Motion).

    Thx

    D

  4. Alex,

    Many thanks for your comments & time to respond – much appreciated, my thoughts :-

    1) FlexVol size – so just to be clear, can I have a 100TB aggregate that contains a mix of dedupe (<=16TB) & non-dedupe flexvols (<=100TB)?

    2) Mixed disks – by this I do mean a mix of FC, SATA, SAS within the same aggregate, but yes also making effective use of different disk sizes within an aggregate as well (eg mix of 1TB & 2TB SATA etc). The key bit is that a FlexVol is allowed to span the different disk types so that cost per GB and per IOP can be managed differently for different flexvols without having to impact the namespace.

    3) Aggregate reductions – we have plenty of circumstances where our requirement for storage within our overall estate changes, and the ability to reduce the size of an aggregate so that we can relocate shelves would be of benefit.

    4) Legacy mngt – good to hear, but we need more details of this and quickly! The move from 7 onwards for more ‘mature hardware’ is certainly problematic, and of course these assets interact with newer assets, which causes ‘interesting’ situations… Equally the in-place upgrade situation needs to be handled rapidly – the days of buying another wave of disk capacity based on a firmware upgrade are long gone…

    5) Multi-protocol – is of use but not a key driver. Most large estates will have clear preferences for block products, and clear preferences for file products; similarly it’s likely there will be different support teams involved. Whilst I agree from a vision perspective re converged protocols, the here & now benefit of this is less compelling and there are lots of people/process/org hurdles to solve first. For smaller estates I can see the value though.

    6) Thanks for the reassuring comments – something we need to continue over a beer I think (heck I might even buy the beer!). But afraid I disagree over ‘asleep at the wheel’ – I see what could/should have been possible over the last 5 years and the missed opportunity that NetApp have squandered and this pains & frustrates me with a passion

    7) Oh I can assure you I look at all strategic partners with the same mindset and detail – and will publish my views in the same way. I may be accused of being grumpy but at least I’m consistently grumpy to all! :)

    Cheers

    Ian

  5. Dimitris,

    Thanks for the comments – appreciated in content & style! I’ll try and address your responses :-

    1) Customer councils – Can I ask when you last had a customer council session (with customers not resellers) outside of the USA? Can I ask what the invite criteria are for this? (as the circles I deal with are all PB+ NetApp customers and nobody seems aware of anything)

    2) Suggestions – It would be good to see evidence of this, a disclosed RFE list from customers perhaps? (under NDA if you must) It would also be good to understand & have clarity on your RFE process and tracking etc? (as like most storage vendors this area is ‘in need of improvement’ – being polite here, but have a look at http://www.grumpystorage.com/2009/10/rfe-for-rfes.html for more thoughts)

    3) Fully agree with testing, and also understand the need for uptime (I have legal obligations to regulators about uptime – so trust me I know this well) – however my point being that you’re judged by the end result not the internal malarkey to get there. In this area the rate of functionality & scale improvements from NetApp has dropped over the last few years, with most customer exposed changes being through hardware updates. During this same time period other vendors have gone through similar major architecture & code-base changes and have matched their scale restrictions etc.

    4) OnTap 8.0 – agree it needs to be pretty different, but then I’m expecting to see a rapid timeline of features/function innovation and scale limits removed in minor release versions over the coming months – is this the case?

    5) File migration – I believe transparent namespace migration, both within & between heterogeneous boxes, to be a major pain area today. For people with material estates this is a constant issue (I have the issue today on over 15 NAS arrays currently being migrated) – something NetApp doesn’t appear to have any offerings in.

    6) Protocol migration – your last question is an interesting one, frankly I think the key use case is to be able to move any protocols seamlessly (along with the relevant namespace mngt etc). The area of particular pain here is NFS & CIFS.

    Don’t get me wrong, NetApp is still the NAS leader (and in many key ways) – but that lead reduces every day (and is certainly not as big as it might be perceived in California), and there appears to be little sense of urgency (a ‘common burning platform’ feeling if you like) about moving some of the basic stuff forward…

    Cheers

    Ian

  6. Martin,

    As usual totally agree! It staggers me that more time isn’t spent by vendors making refresh & migration easier – that is the best way to reduce and transition pain (emotional, physical, financial or otherwise). Odd…

    Not so sure about the need for SAN<->NAS migration – but sure you’re several steps ahead of me on that requirement…

    Cheers

    Ian

  7. Ian

    Good questions, and (some) answers.

    1) Yes, you can have a 100TB aggregate that contains a mix of dedupe (restricted to <=16TB) & non-dedupe flexvols (<=100TB). Functionality like clones, snaps, thin provisioning apply to all.

    2) Not yet. Currently you can mix disk types as long as they're of the same spin speed and size (FC 15K with SAS 15K for example).

    3) Evacuating a shelf is problematic without evacuating the aggregate. That (afaik) would be the only solution.

    4) Details are coming; they're under wraps at the moment, but I'll speak to Uday (the product mgr for WAFL) to see what we can say publicly.

    6) Beer is good. The missed opportunities I too can see from an outsider's perspective. But development at this scale, while maintaining backward compatibility, innovating and developing new features that customers want to buy, is not easy.

    7) Excellent, I can't wait :-)

  8. Actually, looking at where I am storing most of my files, i.e. on GPFS filesystems, my migration of SAN to NAS may not be especially hard. In fact, as long as we stick with GPFS, the whole tech-refresh is in many ways a lot simpler and non-disruptive.

  9. Hi Ian,

    Before I answer some of your questions - why "NotApp"? Also, are you already a customer?

    1. VARs and NetApp engineers don't go to the council, customers get invited. Not sure about outside the US, ask your local NetApp rep (but you need to be a customer and have been using the gear, at least that's the rule of thumb I've seen here).

    2. RFE contents and procedures - again contact your local NetApp team, I'm sure they'll be happy to oblige.

    3. A lot of the innovations will come not as a trickle but as an explosion, refer to #1 and 2.

    4. Aye (I earned the right having lived in Aberdeen for many years :) ) Again #1,2

    5. Bycast could be a fit here though that was not the intended purpose, again no vendor has this built-in AFAIK, everyone needs some external box.

    6. Live, clean NFS migration you'll see, CIFS as a protocol is almost impossible to move live.

    Oh there's a sense of urgency for many things, but we also want to cater to the vast majority of our customers, who seem very happy with the stuff on offer.

    I'll repeat my question though:

    What established vendor provides the stuff you're asking for (or is even close)? I'm merely asking, since I'm not aware of any.

    Thx

    D

  10. Also posted a response on the aggregate requirement here: http://bit.ly/duBafj
