    Reflections on IMPACT 2012 and beyond


    For me, IMPACT 2012 was definitely the best IMPACT ever!  We had more than 8,500 people in attendance, Pure Systems made a dramatic debut on stage and we added more WMQ content and more advanced technical content in general.  This included the return of my all-time favorite session "WebSphere MQ Internals."  If you liked the changes we made regarding more MQ content and more intermediate or advanced content, please be sure to state that when you fill out your surveys to make sure we continue that trend.

    The other cool thing about this year's conference was that I had a chance once again to co-present with my pal AJ from Prolifics.  (Who was named an IBM Champion at IMPACT, by the way!)  Some years ago AJ and I worked on an engagement together and this led to our co-presenting a session and collaborating on several other projects over the years.  This year we tag-teamed on the "Five Security Use Cases" session, the highlights of which were AJ's horror stories from the field.  The simultaneously best and worst thing about consulting is that you get called in when things are in dire straits.  While this can be a bit stressful, you do come away with some blood-curdling tales of horror that can be told over a campfire or in front of a podium.  AJ doesn't disappoint.

    What's more, the fun this year doesn't end at IMPACT!  AJ and I will be speaking together again at five GWC events this year, along with new friend and co-speaker Rick Christian of IBM.  The tour will take us to New York in June, then Chicago, London, Orlando and finishing up in Amsterdam in December.

    Although I enjoy writing and presenting my own sessions, collaborating and co-presenting with good people is always better because we learn so much from each other.  Rick, AJ and I bring different perspectives to the mix as well as broad experience from working with many different clients and in particular from resolving Production-down events and the subsequent root-cause analyses.  I can't wait to spend some time with Rick and AJ swapping war stories with the attendees at the seminars.  What was the outage you love to hate?  We want to know!

    The seminars will focus on MQ administration and cover basic platform skills, root cause analysis, performance tuning, security best practices, High Availability (HA) clustering and more.  Since WMQ v7.5 was just announced and v6.0 will be end-of-service as of September there will be a section on migration considerations.  I will also present an updated version of the Trends and Directions presentation from IMPACT but with a bit of a "T.Rob take" on what all that means to you.

    Whether you missed IMPACT or just want to get more in-depth WMQ training, sign up!  Rick, AJ and I are looking forward to meeting you at one of the WebSphere MQ Administration seminars this summer and fall.

    WMQ and load balancing

    Wednesday, February 8, 2012, 5:04 PM

    I recently received the following inquiry concerning WebSphere Application Server load balancing across a cell on message input:

    Problem:
    Three AppServers in the same cluster each have an instance of the same Activation Specification (created at the cluster scope for the same queue), each with a max connection of 10, for a total of 30 potential connections.  We thought that this would create three pools of 10 but instead it acts as if it were a shared pool of 30 connections.  Our scale has been less than 10, so only one AppServer ever has its Activation Specification in use.  The processing capability of the other two Activation Specifications was never used until the scale was increased during our testing.  The second one is activated after 10 connections are exceeded and the third is activated when 20 connections are exceeded.  But if we stay under 10 connections, the processing power of the other two is never used.  However, the preferred behavior would be to have the messages processed equally (emulating a round-robin) by each Activation Spec, to make better use of our resources through workload balancing and also to get the best processing and throughput times.

    One solution so far:
    Set the max connection to the level of connections where the server is most efficient.  Then if this is hit the next server will take over the work that is higher than the max connections.  For instance if 7 connections is the optimum number of connections, set this to 7 for the Activation Spec at the cluster level.  However this solution still does not allow us to make use of the processing power of all of the three AppServers.  One server has to reach 100% of the 7 connections before the other server starts doing any processing.

    Proposed solution:  Create separate Activation Specifications with a maximum of ten connections for each AppServer declared at the AppServer scope.  We will try this out tomorrow.  In theory this should cause each AppServer to have an independent pool of ten connections.  Each of the three AppServers should independently be pulling messages off of the queue and all three of the AppServers should be kept in use.  Given that this scope setting is allowed on the Activation Specifications on the Admin Console, I see no reason why this wouldn't be possible or why it wouldn't work as I am hoping.  We will find out tomorrow morning.

    The next morning a follow-up email arrived stating that the test had not produced the desired results.  Surprisingly though, this behavior is the expected result.  Here's why.

    This is based on the WMQ Internals session that Mark Taylor used to give at IMPACT.  (And which looks like it will make the cut to be resurrected for 2012!!! Yet another reason to go to IMPACT this year.)

    So here's the scoop.  When an application performs a GET on a queue, the connection handle is pushed onto a stack.  This is MUCH faster than maintaining a FIFO queue of connection handles and is one of the things that gives WMQ its performance under heavy load.  But there is a side effect.  Consider the following...

    • 10 outstanding GETs on a queue, represented by 10 connection handles on a stack.
    • A message arrives and is passed to the handle for APP01.
    • APP01 processes said message and performs a new GET.
    • The GET handle for APP01 is pushed onto the top of the stack.
    • A new message arrives and is passed to APP01 because its handle is once again at the top of the stack.

    Because of this it is common to not see load balancing under light loads.  Typically each app server's GET handles are all bunched together on the stack, so the last one to start gets the top 10 (or however many) handles.  As long as messages arrive slowly enough that those top 10 handles can process the load, no other instance sees messages.  If this is the issue, then no amount of WAS tuning will make it go away.
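
    To make the stack behavior concrete, here is a minimal Python sketch.  It is a toy model, not WMQ code: the `deliver` function and the APPnn handle names are made up for illustration, and it assumes the light-load case where each consumer finishes and re-issues its GET before the next message arrives.

```python
from collections import Counter, deque

def deliver(handles, n_messages, lifo=True):
    """Hand out messages one at a time under light load.

    The leftmost element of `waiting` is the handle that will be served next.
    """
    waiting = deque(handles)
    served = Counter()
    for _ in range(n_messages):
        handle = waiting.popleft()      # the next message goes to this handle
        served[handle] += 1
        if lifo:
            waiting.appendleft(handle)  # stack: the new GET lands back on top
        else:
            waiting.append(handle)      # FIFO: the new GET joins the back of the line
    return served

handles = [f"APP{i:02d}" for i in range(1, 11)]
lifo_counts = deliver(handles, 100, lifo=True)
fifo_counts = deliver(handles, 100, lifo=False)
print("stack:", dict(lifo_counts))
print("fifo: ", dict(fifo_counts))
```

    With the stack, every one of the 100 messages lands on APP01.  With the hypothetical FIFO ordering the same messages would spread evenly across all ten handles, at the cost of the extra bookkeeping described above.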

    Although the use case in the example was WAS, this behavior can be seen in any case where there are many instances of something all with outstanding GET calls.

    The way to test is to drop a large load of messages on the queue at once.  Preferably 1000 or more small messages in a single syncpoint so that all handles will be served multiple times.  SupportPac MA01 will do that for you.  If you try to put too many messages in one unit of work you get transaction rollbacks and need to tune the log file extents and the MAXUMSGS parameter.  If you keep the messages pretty small, 1,000 should be easy to do without errors, even on a default QMgr configuration.

    The follow-up email stated that this did indeed behave as expected when message rate was varied.  The slower the rate, the fewer instances were activated.  Above a threshold, all instances were busy.
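
    The threshold effect can be reproduced with a small time-based simulation.  This is a sketch under stated assumptions rather than a model of the real queue manager: messages arrive at a fixed interval, every message takes the same time to process, three app servers hold ten GET handles each, and the last server to start has its handles on top of the stack.  The function name `servers_used` and all parameters are invented for the example.

```python
import heapq

def servers_used(interval, service_time, n_servers=3, handles_per_server=10,
                 n_messages=1000):
    """Return the set of app servers that receive at least one message."""
    n_handles = n_servers * handles_per_server
    # Idle GET handles; the end of the list is the top of the stack.
    # Server 0 started first, so the last server's handles sit on top.
    stack = list(range(n_handles))
    busy = []  # min-heap of (finish_time, handle)
    used = set()
    for i in range(n_messages):
        now = i * interval
        # Consumers that have finished re-issue their GET in completion order.
        while busy and busy[0][0] <= now:
            _, handle = heapq.heappop(busy)
            stack.append(handle)
        if not stack:
            # Every handle is busy.  A real message would wait on the queue;
            # this sketch simply skips it because all servers are already in use.
            continue
        handle = stack.pop()
        used.add(handle // handles_per_server)
        heapq.heappush(busy, (now + service_time, handle))
    return used

for service_time in (5, 15, 25):
    print(service_time, sorted(servers_used(interval=1, service_time=service_time)))
```

    In this model the number of handles in use settles at roughly the service time divided by the arrival interval, so a second server only sees work once more than ten handles are needed and the third once more than twenty are needed, which matches the behavior described in the original inquiry.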

    The "best" answer here is not to expect load balancing with slow message rates.  Changing WMQ to manage GET handles with a FIFO queue would impact performance under heavy load.  Given a choice between tuning for performance under load versus tuning to distribute load during non-peak periods, I think favoring the peak loads is the right choice.  The next question, once this is understood, is how big a problem it is that GET calls do not round-robin among instances.  It appears to be a superficial problem, but if this causes you business impacts, please tell me what those are in the comments below or via email so we can consider how they might be addressed in the future.





    Big day for IBM security announcements

    Tuesday, October 4, 2011, 11:57 AM

    One great thing about being in WebSphere Product Management is seeing all the exciting enhancements in the pipeline.  The worst thing is not being able to talk about them before the announcement.  That's why I'm so excited about the announcements today - now I can finally talk about some of the things to be delivered in WMQ v7.1 next month!

    Obviously, as a security guy, the things I'm most excited about are the new features in that department.  From the announcement:

    Reduced complexity for enabling and checking system security

    The importance of the effectiveness of the security applied to the connections between applications is matched by the need to ensure this security can be efficiently applied and maintained. Previous versions of WebSphere MQ enabled customers to write their own security exits to manage the authentication of channels. This allowed for effective, if complex security control of channel access, but would obviously require customers to write and maintain these exits.

    In this release customers have the option of continuing to use these security exits or instead they have the choice to use IBM provided code, enabling wide-ranging control of multiple security aspects of access to channels.

    These security controls are available across all platforms of WebSphere MQ V7.1 and are enabled for use with WebSphere MQ Explorer Security Wizard, a new security wizard, which runs on Linux and Windows.

    Other security enhancements include:

    • Fine grained control for cluster transmit queue authorization
    • Support for SSL provider use of Suite B (FIPS 186-3)
    • Enhanced algorithm support for SSL encryption (Elliptic Curve algorithms) and hashing (SHA-2 algorithms)

    The first part talks about adding the kinds of controls to channels that previously required channel exits.  This includes such things as filtering connections by IP address and mapping certificate DNs to MCAUSER.  Dating back to my days as a WebSphere MQ customer, I've always said that channels should natively support this functionality and I know that a great many of you have asked for this as well, so I'm ecstatic that channels now have this capability. 

    Hidden in the bullet about "fine grained control of transmit queue authorization" is something else that has been in great demand - authorization on non-local queues!  The implication is that in v7.1 you will be able to authorize users to put to non-local queues without granting them rights to the cluster transmit queue or creating a local alias.

    There is a lot more to come and I'm waiting on the final WSTC presentations before talking about some of the other features in great detail.  But I will say that there are quite a number of items which have been in great demand and which you are going to be very happy about.  Keep an eye on the announcements here at GWC, The Deep Queue blog, MQ on TV, the WebSphere MQ home page or just follow me on Twitter for lots of information and links to come!

     


    High Availability Messaging

    Thursday, September 1, 2011, 12:54 AM

    As a consultant one of the most common requirements I saw was to implement high availability for a messaging network.  WebSphere MQ has quite a few options for H/A including WMQ clusters, hardware clusters, Multi-Instance Queue Managers (MIQM) and Queue Sharing Groups (QSG) so one might think this would be an easy assignment.  Of course the devil, as always, is in the details.

    There are two aspects of high availability when it comes to messaging:

    1. Keeping the service available for new messages
    2. Making the messages themselves highly available

    The first of these should be very familiar because making an asynchronous messaging service highly available uses techniques very similar to the ones employed to make any other service highly available - dynamic run-time name resolution and redundant services.  If a node fails, the requesting client retries, resolves a new node, connects and resumes communication.

    WebSphere MQ clients achieve this by distributing connections across multiple queue managers configured to be functionally equivalent.  This can be accomplished through Client Channel Definition Table files, multi-instance connection names, shared channels in a QSG, or even IP load balancers (although they really balance connection requests in this case).  Service availability from one queue manager to another is achieved with WMQ clusters or shared channels in a QSG.

    In all of these cases availability is achieved through the use of redundancy.  Need more capacity?  Add another node or add a new queue instance to the cluster.

    So far, so good.  Keeping the service available is pretty easy.  What about making the messages themselves highly available?

    The interesting thing about asynchronous messaging and the aspect that sets it apart from synchronous is that the state of the transaction is handed off to the transport.  With a synchronous call, transaction state is either at the source or the destination.  If the transaction is interrupted before completion the sender can retry with confidence, even to a different receiver node, because the original receiver will eventually roll back.

    Asynchronous messaging offers the same transaction semantics but now there are two different units of work.  The first is when the sender hands the message to the transport and the second is when the transport hands the message to the receiver.  In between these transactions, the sender has the state associated with the posted message and the receiver has the state prior to the posted message.  This introduces the possibility that if a node fails, the applications could end up out of sync due to stranded in-flight messages on the downed node.

    Thus, the second high availability bullet - keeping the messages themselves highly available.  WebSphere MQ has several very effective solutions for this: Multi-Instance Queue Managers, traditional hardware clusters such as PowerHA (formerly HACMP) and Queue Sharing Groups.  Each of these works by making the data available to multiple MQ instances but they do so in different ways.  The MIQM and hardware cluster allow the same queue manager to run on any one of multiple possible servers.  Only one queue manager instance can run at a time and all possible instances "see" the exact same data residing on a shared drive.  When the queue manager instance is down (for example to apply maintenance) another takes over using the same data store as the first, including any in-flight messages.

    The queue sharing group takes this a step further and allows multiple different queue managers to access the same data store.  In this model, several queue managers compete for the same messages in a queue so if one queue manager is unavailable, the messages can be accessed by a different queue manager.

    So when we talk about High Availability Messaging, we need to account both for availability of the system for new messages and for availability of existing in-flight messages.  Perhaps only one of these is a requirement, perhaps both.  MQ Clustering provides service availability for new messages.  Hardware clustering and Multi-Instance Queue Managers provide availability for existing in-flight messages.  Combine these two methods or use Queue Sharing Groups if you need both types of availability. Be sure to use the right combination of options to meet your specific requirements and you too can enjoy highly available messaging!
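
    The summary above boils down to a small decision table.  The sketch below only restates it in Python; the function name `ha_options` and the option strings are made up for illustration and are not product terminology beyond the names used in this post.

```python
def ha_options(new_messages, in_flight_messages):
    """Map the two availability requirements to the options named in this post."""
    if new_messages and in_flight_messages:
        return [
            "WMQ cluster combined with a hardware cluster or MIQM",
            "Queue Sharing Group",
        ]
    if new_messages:
        return ["WMQ cluster"]
    if in_flight_messages:
        return ["Hardware cluster", "Multi-Instance Queue Manager"]
    return []

for need_new in (True, False):
    for need_in_flight in (True, False):
        print(need_new, need_in_flight, ha_options(need_new, need_in_flight))
```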

     


    The invisible threat

    Thursday, August 11, 2011, 11:21 AM

    I have recently changed roles in IBM, but for the past five years I have been working as a consultant in Software Services for WebSphere and specializing in messaging security.  Typically discussions around security focus on some of the more dramatic threats: from the disgruntled employee with an axe to grind, to profit-driven organized crime, to state-sponsored cyber-attackers with political agendas.  Although these incidents are scary, many of my clients did not find them compelling reasons to invest in security.  If the enterprise had no high-value financial transactions, transmitted minimal personally identifiable information and there were no compliance drivers, the dramatic security breaches reported in the media did not tend to resonate deeply.  Many of my clients simply did not feel the threats applied to them.  But that does not mean that there is no risk.

    I thought it would be useful to describe some of the more routine security threats that are not motivated by malice or profit.  These will apply to everyone and hopefully inspire some readers to reconsider their security – or lack thereof.

    One of my customers with a large messaging network has a central team who manage all of the queue managers.  They have a change control process that requires developers to submit requests for new queues or channels and the administrators make sure that these adhere to standards and then build the requested objects.  Turnaround time is one business day.  Security on the network is of a variety I like to call "security for honest people."  Specifically, the developers and users were assigned to specific client channels which were restricted from administrative actions.

    But the change control process incurs a delay in handling requests and this creates an incentive to bypass it, especially for projects under chronic time pressure.  Incentives will always affect behavior so naturally it was just a matter of time before developers found they could simply point their WMQ Explorer at the administrator's client channel and have free run of the queue manager.  As long as they stuck to the published standards the administrators turned a blind eye and everyone was happy.  Or they were, right up to the moment that someone advertised a new queue instance in Production.

    The intent was that a new program would make requests to an existing service.  However the developer used the name of the service queue as his reply-to queue.  Because of the lack of communication, the administrators thought this new application was a new instance of the service provider rather than a client.  Only after the new queue was defined in production and started receiving one third of all service requests was the mistake found.  Because only about a third of requests failed, it took a while to track this down.
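
    The one-third figure is what you would expect if the service already had two advertised queue instances and requests were spread evenly across all advertised instances.  Both of those are assumptions on my part for this sketch, and the queue manager names are hypothetical.

```python
from collections import Counter
from itertools import cycle

def distribute(instances, n_requests):
    """Spread requests evenly (round-robin) across advertised queue instances."""
    targets = cycle(instances)
    return Counter(next(targets) for _ in range(n_requests))

# Two real service providers plus the reply-to queue defined by mistake.
instances = ["QM_SERVICE_1", "QM_SERVICE_2", "QM_NEW_CLIENT"]
counts = distribute(instances, 900)
misrouted = counts["QM_NEW_CLIENT"] / sum(counts.values())
print(dict(counts), f"misrouted fraction: {misrouted:.2f}")
```

    Because two out of every three requests still succeed, the failure looks intermittent rather than systematic, which is part of why it took a while to track down.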

    If you have gone to the trouble of creating standards and change management processes, proper security can enforce that these are followed.

    When I mention security, most people immediately think of things related to intrusion prevention.  Although that is a very important component of security, my working definition includes intrusion detection and recovery.  For example, I like to monitor against a configuration baseline and then report any exceptions.  This is especially important when the messaging network is managed as shared infrastructure where many applications depend on common components.  The next example resulted in a major outage that lasted most of a week.

    In this particular case my client implemented a new service which required persistent messages.  Although testing went well, once promoted to production every once in a while a transaction would disappear.  Eventually it was discovered that some of the requesting programs had failed to specify persistence on the request messages and were inheriting the value from the queue.  Now and then one of these messages was lost on a channel with NPMSPEED(FAST).  Rather than fixing the programs, it was decided to change the queue definition to specify default persistence.  In addition to changing the intended service queue, the administrator also changed DEFPSIST in the system default model queue, reasoning that if the requests were persistent, the replies must be as well.

    Of course, SYSTEM.DEFAULT.MODEL.QUEUE is a shared resource and the change affected many programs using that queue.  Some programs explicitly specified non-persistence and were not impacted.  Others specified persistence as queue default and inherited the change.  The result was some pretty strange behavior.  Since a temporary dynamic queue cannot accept persistent messages, some programs replying to local queues failed.  In other cases replies were sent to remote dynamic queues.  In these cases the responding program succeeded but the channel had no place to put the message so it ended up in the dead-letter queue.  Once the application timed out waiting for the response the dynamic queue disappeared so it was not immediately obvious what the problem was.  After nearly a week the root cause was identified and the changes, including the new service, were backed out.
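
    The chain of inheritance is easier to follow in a toy model.  The Python below is a simplified sketch, not the MQI: the `Queue` class, `open_dynamic` and `put` are invented for illustration, and the only rules modeled are the ones described above (a dynamic queue inherits its default persistence from the model queue, a message put with "as queue default" takes the queue's default, and a temporary dynamic queue rejects persistent messages).

```python
from dataclasses import dataclass

PERSISTENT = "persistent"
NOT_PERSISTENT = "not persistent"
AS_QUEUE_DEFAULT = "as queue default"

@dataclass
class Queue:
    name: str
    default_persistence: str
    temporary_dynamic: bool = False

def open_dynamic(model, name):
    """A dynamic queue inherits its attributes from the model queue."""
    return Queue(name, model.default_persistence, temporary_dynamic=True)

def put(queue, requested):
    """Resolve the message persistence, then apply the temporary-queue rule."""
    persistence = (queue.default_persistence
                   if requested == AS_QUEUE_DEFAULT else requested)
    if queue.temporary_dynamic and persistence == PERSISTENT:
        return "rejected"
    return f"accepted, {persistence}"

model = Queue("SYSTEM.DEFAULT.MODEL.QUEUE", NOT_PERSISTENT)
before = put(open_dynamic(model, "REPLY.1"), AS_QUEUE_DEFAULT)

model.default_persistence = PERSISTENT  # the well-intentioned change
after = put(open_dynamic(model, "REPLY.2"), AS_QUEUE_DEFAULT)
explicit = put(open_dynamic(model, "REPLY.3"), NOT_PERSISTENT)

print(before, "|", after, "|", explicit)
```

    Programs that asked for non-persistence explicitly keep working, while programs that deferred to the queue default start failing without any change on their side.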

    The same security which monitors for malicious intrusion will detect and report accidental changes to the configuration baseline.
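
    A baseline check does not need to be elaborate.  The following is a minimal Python sketch that assumes the object definitions have already been exported into dictionaries keyed by object name; how you extract them is up to your tooling, and the `baseline_exceptions` function and the ORDERS.REQUEST queue name are hypothetical.

```python
def baseline_exceptions(baseline, current):
    """Compare current object attributes to the approved baseline."""
    report = []
    for name in sorted(baseline.keys() | current.keys()):
        expected = baseline.get(name)
        actual = current.get(name)
        if expected is None:
            report.append(f"{name}: object is not in the baseline")
        elif actual is None:
            report.append(f"{name}: object is missing")
        else:
            for attr in sorted(expected.keys() | actual.keys()):
                if expected.get(attr) != actual.get(attr):
                    report.append(
                        f"{name}: {attr} expected {expected.get(attr)!r}, "
                        f"found {actual.get(attr)!r}"
                    )
    return report

# The approved change to the service queue is already in the baseline;
# the change to the shared model queue is not.
baseline = {
    "ORDERS.REQUEST": {"DEFPSIST": "YES"},
    "SYSTEM.DEFAULT.MODEL.QUEUE": {"DEFPSIST": "NO"},
}
current = {
    "ORDERS.REQUEST": {"DEFPSIST": "YES"},
    "SYSTEM.DEFAULT.MODEL.QUEUE": {"DEFPSIST": "YES"},
}
report = baseline_exceptions(baseline, current)
print("\n".join(report))
```

    Run on a schedule, a report like this would have flagged the model queue change on the first day instead of after nearly a week of diagnosis.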

    My last example is of a client who had been experiencing problems in their production network.  In an effort to diagnose the issues, they took the extreme measure of stopping the production queue managers, backing them up and restoring the backups in the QA environment.  They then proceeded to run a day's worth of production traffic through the system in an attempt to recreate and diagnose the problem.

    Unfortunately, the channel definitions in the QA environment still pointed to production nodes upstream and downstream and the SSL certificates which were copied were accepted by the production nodes.  In the course of recreating a day's traffic, all the transactions were routed to the actual production system and all the orders propagated to external business partners in duplicate.  Some of the business partners executed the duplicate orders, causing a cascade that extended far outside the enterprise.  The monetary and reputational cost was significant.  The cost to prevent a recurrence was to reconfigure the existing security exits to look at IP address ranges as well as certificate names.
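
    The fix amounts to requiring two things to match before a connection is accepted: the certificate name and the network the connection comes from.  Here is a minimal Python sketch of that check.  It is not the exit my client used; the function name, the distinguished name and the address ranges (taken from the reserved documentation blocks) are all placeholders.

```python
import ipaddress

def allow_connection(cert_dn, source_ip, rules):
    """Accept only if the certificate DN and the source network both match a rule."""
    address = ipaddress.ip_address(source_ip)
    for allowed_dn, network in rules:
        if cert_dn == allowed_dn and address in ipaddress.ip_network(network):
            return True
    return False

# Hypothetical rule: the production partner queue manager may only
# connect from the production subnet.
rules = [("CN=QM.ORDERS.PROD,O=Example", "192.0.2.0/24")]

from_production = allow_connection("CN=QM.ORDERS.PROD,O=Example", "192.0.2.15", rules)
from_qa_restore = allow_connection("CN=QM.ORDERS.PROD,O=Example", "198.51.100.15", rules)
print(from_production, from_qa_restore)
```

    A restored copy in QA presents a perfectly valid production certificate, so the certificate check alone passes.  It is the address range that tells the two apart.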

    The security which protects you from malicious intruders also protects you from accidental breaches.

    After reviewing the impact of these breaches and cost to prevent them, not one of my customers has ever decided that the loss was acceptable.  In every case the financial and reputational impact was at least an order of magnitude greater than the remediation implemented to prevent future occurrences.

    It seems that every day there's a new breach reported in the news involving millions of lost passwords, credit card numbers, government ID numbers or other sensitive information.  But routine and accidental breaches are almost never reported and in my experience they represent the vast majority of incidents.  In all of the examples I've cited my client assumed that their security was "good enough" or that they were simply not an attractive target so "it won't happen here."  For those who believe their security is good enough, I say trust is good but verify and enforce is better.  For those who believe it won't happen here I would point out that you do not have to be an attractive target to be the victim of a well-intentioned mistake.  The most feared word in security is "oops!"

    Don't wait for a media-reported breach in your industry sector.  Or an audit finding.  Or an accidental breach.  Secure your middleware network today.


    Have laptop, will travel

    Thursday, July 15, 2010, 11:02 PM

    Well, if you found this page, chances are you are interested in WebSphere MQ security.  Perfect!  I can help with WebSphere MQ, MQ File Transfer Edition or pretty much anything that talks to MQ one way or another.  I am a consultant in IBM's Software Services for WebSphere division and ready to travel to your location to help you secure your messaging network and to teach your staff all sorts of WMQ security ninja skills.  I am also occasionally available to speak at your conference or event.  For more on WebSphere MQ security, please have a look at t-rob.net.  The WMQ page contains pointers to lots of WMQ security resources you might need and the Links tab has pointers to all the articles and presentations I've written that are publicly available.

