WebSphere Monitoring

Over the last weeks I have seen increasing requests for WebSphere Application Server (WAS) monitoring. This article summarizes the solutions available, created over the last years on top of IBM’s monitoring solution, SmartCloud Application Performance Management (SCAPM).

The SCAPM portfolio comprises almost all of IBM’s monitoring capabilities under the umbrella of ITM. ITCAM for Applications contains the WAS monitoring agents.

The documentation of the WAS monitoring solution may be found on the IBM Knowledge Center.

Additionally, I’ve created two add-ons for WebSphere monitoring. The situation package provides a set of sample monitoring rules covering the requirements most often seen in the field.

To get a comprehensive overview of all WebSphere Application Server instances monitored in your environment, this navigator view might help.

For deep dive analysis the data collector might be connected directly with the ITCAM Managing Server to enable transaction debugging and detailed WebSphere environment analysis.

WebSphere monitoring is only one discipline within the SCAPM portfolio. Other areas of application performance management are covered as well, including transaction tracking, HTTP response time measurement, and robotic monitoring.

SCAPI: Preparing your system — Software Packages

Before you can install SmartCloud Analytics – Predictive Insights (SCAPI) you have to meet the software prerequisites on the Red Hat Enterprise Linux server system you are using to host the SCAPI Data Server components. Currently only RHEL 6 64-bit is supported.

The documentation names the requirements in several places in the installation guide.

I’m using the following command stack to make sure that all software packages are installed:

  • yum -y install libstdc++.i686
  • yum -y install *libstdc++-33*.i*86
  • yum -y install openmotif22*.*86
  • yum -y install pam.i686
  • yum -y install libXpm.i686
  • yum -y install libXtst.i686
  • yum -y install freetype.i686
  • yum -y install libmcpp.i686
  • yum -y install libXdmcp.i686
  • yum -y install libxkbfile.i686
  • yum -y install libpciaccess.i686
  • yum -y install libXxf86misc
  • yum -y install libXm.so.4
  • yum -y install ksh*
  • yum -y install libstdc++.*
  • yum -y install *libstdc++-33*
  • yum -y install openmotif22*
  • yum -y install compat-glibc
  • yum -y install pam
  • yum -y install libXpm
  • yum -y install libXtst
  • yum -y install freetype
  • yum -y install xorg-x11-xinit
  • yum -y install Xorg
  • yum -y install firefox
  • yum -y install openmotif
  • yum -y install atlas
  • yum -y install compat-libgfortran-41
  • yum -y install blas
  • yum -y install lapack
  • yum -y install dapl
  • yum -y install sg3_utils
  • yum -y install libstdc++.so.6
  • yum -y install libstdc++.so.5
  • yum -y install java-1.7*-openjdk.x86_64
    Java is required to run the prerequisite checker delivered with the IBM InfoSphere Streams software.
    The packages below are installed because the InfoSphere Streams checker requires them.
  • yum -y install libcurl-devel.i686
  • yum -y install libcurl-devel.x86_64
  • yum -y install fuse-curlftpfs.x86_64
  • yum -y install libcurl.i686
  • yum -y install libcurl.x86_64
  • yum -y install perl-Time-HiRes
  • yum -y install perl-XML-Simple*
  • yum -y install gcc-c++

This command stack includes only those packages that are provided by the Red Hat Satellite server.
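
Not strictly required, but after running the command stack I like to verify that nothing was silently skipped. The following sketch (package names are a subset of the list above; run it on the RHEL 6 host) queries the RPM database and reports anything still missing:

```shell
# Verify that prerequisite packages are present in the RPM database.
# The list is a subset of the command stack above; extend it as needed.
PKGS="libstdc++.i686 pam.i686 libXpm.i686 libXtst.i686 freetype.i686 \
compat-glibc gcc-c++ perl-Time-HiRes"
MISSING=""
for p in $PKGS; do
    rpm -q "$p" >/dev/null 2>&1 || MISSING="$MISSING $p"
done
if [ -n "$MISSING" ]; then
    echo "Still missing:$MISSING"
else
    echo "All checked packages are installed."
fi
```

Running this once before starting the SCAPI installer saves a failed installation attempt later.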

After you have installed all the packages above, you have to add the RPM package provided with InfoSphere Streams, as documented in the installation manual.

I’ve used the following command:

# Install the provided InfoSphere RPM prerequisite
rpm -Uvh <streams_unpack_folder>/rpm/*.rpm

With all of these packages installed, you can install all SCAPI software components.

Implementing the OMNIbus WebGUI in silent mode

After installing the OMNIbus WebGUI you have to configure the properties that direct the WebGUI server to the correct OMNIbus server and the correct authorization source.

To run the configuration wizard in silent mode, which is essential for repeated installations, use the ws_ant.sh shell script as documented in the manual.

The documentation suggests that you can locate the OMNIbusWebGUI.properties file in any location. This is not really true.

This file references other files that are provided in the same directory as the properties file.

The following approach worked for me:

I’ve installed the OMNIbus WebGUI server with default values, so it ended up in /opt/IBM/netcool/omnibus_webgui/ directory. In the subdirectory bin I’ve found the OMNIbusWebGUI.properties file.

Change the file according to your installation and leave it in this directory. Then, as documented in the manual, execute the ws_ant.sh script from this directory, adding the required activity.

ws_ant.sh requires build.xml to be present in the working directory. The build.xml file describes the different callable activities. These are:

  • configureOS
  • resetVMM
  • configureVMM
  • restartServer

This build.xml file controls the execution of the WebSphere scripts.

In this XML file, the OMNIbusWebGUI.properties file is referenced with hard-coded path names. Additionally, other XML files are referenced that are expected in the same directory.

So, edit the properties file in the location where you find it, and then execute the ws_ant.sh shell script from that directory.
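
Putting it together, a run looks roughly like the following sketch (the path is the default installation location mentioned above, and the activity names are the ones from build.xml; adjust both to your environment):

```shell
# Silent (re)configuration sketch for the OMNIbus WebGUI.
# WEBGUI_BIN is the default location; adjust if you installed elsewhere.
WEBGUI_BIN=/opt/IBM/netcool/omnibus_webgui/bin
if [ -d "$WEBGUI_BIN" ]; then
    cd "$WEBGUI_BIN"
    # OMNIbusWebGUI.properties must already be edited in this directory;
    # build.xml and the referenced XML files are expected here as well.
    ./ws_ant.sh configureOS
    ./ws_ant.sh configureVMM
    ./ws_ant.sh restartServer
else
    echo "OMNIbus WebGUI not found under $WEBGUI_BIN"
fi
```

Because everything is driven from the installation's own bin directory, the hard-coded references in build.xml resolve correctly.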

IBM Monitoring goes SaaS

Big changes in the IT market are taking place. We see cloud services all around changing the delivery model of software from product sale to a software as a service model.

IBM also delivers more and more parts of its portfolio in a software-as-a-service model. One of the very first offerings is IBM Monitoring. Based on the IBM Service Engage platform, the monitoring infrastructure is delivered to customers.

But how does it work?

IBM delivers the server components in a SoftLayer® data center. The infrastructure is hidden behind a firewall in combination with a reverse proxy. All customer agents and client devices connect over HTTPS (port 443) to the announced service address.
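
Before rolling out agents, it can be worth verifying that a host can actually reach the service address on port 443. A minimal sketch (the host name below is a placeholder, not the real service address):

```shell
# Connectivity check against the announced service address.
# SERVICE_ADDR is a placeholder; use the address from your subscription.
SERVICE_ADDR=monitoring.example.com
curl -sk --connect-timeout 5 -o /dev/null \
    "https://$SERVICE_ADDR:443/" \
  && echo "HTTPS connection to $SERVICE_ADDR succeeded" \
  || echo "cannot reach $SERVICE_ADDR on port 443"
```

If this fails, check outbound firewall rules and proxy settings before blaming the agent configuration.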

How are the different customers separated from each other?

The user clients are connected to the correct customer specific monitoring environment based on the user credentials given on the login page.

The agents carry customer-specific credentials in their setup and are generated exclusively for each customer. These agents are provided upon registration for the service and can be downloaded on customer request.

How many agents should a customer have?

Well, there is no minimum number of agents a customer has to request to become eligible for IBM’s monitoring offering. However, there is a maximum number of agents a single instance of this monitoring infrastructure can serve. Depending on the complexity of the monitoring rules you apply we expect a maximum of about 1000 agents per infrastructure instance.

What kind of agents are available?

The following agents are currently available for the SaaS offering:

  • Operating Systems

    • Windows OS

    • Linux OS (RHEL, SLES)

    • AIX

  • Databases

    • DB2 UDB

    • Oracle DB

    • Microsoft SQL Server

    • Mongo DB

    • MySQL

    • PostgreSQL

  • Response Time Monitoring

  • Microsoft Active Directory

  • Virtualization Engines

    • KVM

    • System P AIX

  • JEE Container

    • WebSphere Application Server

    • WebSphere Liberty

    • Apache Tomcat

  • Languages and Frameworks

    • Ruby on Rails

    • Node.js

    • PHP

    • Python

There are several other agents planned to be released within the next few weeks or months, but I’m not authorized to write about them in detail in this blog. If you want more details, or if you have specific requirements, drop me a message, and I’ll come back to you with more specific information.

How are these agents installed?

The installation procedure is now pretty simple. The following videos show the installation on Linux and Windows. After downloading the appropriate packages for the target OS platform, you can start the installation. The redesigned installation process on Linux now follows the standard installation rules of the OS platform (RPM in this case).
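
On Linux this boils down to a plain RPM installation. A sketch, with a hypothetical package file name (the real name depends on the agent package you downloaded):

```shell
# Install a downloaded monitoring agent package.
# AGENT_RPM is a hypothetical file name, not the actual package name.
AGENT_RPM=os-agent-x86_64.rpm
if [ -f "$AGENT_RPM" ]; then
    rpm -ivh "$AGENT_RPM"    # standard RPM installation
else
    echo "Download $AGENT_RPM for your platform first."
fi
```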

The new IBM Monitoring is different from the previous one. The new lightweight infrastructure is available within a few minutes. The agents are easy to install and simple to configure. The monitoring solution comes with a new user interface based on HTML, without the need for a Java Runtime Environment. Because of that, the user interface is now also available on tablets and smartphones.

Follow me on Twitter @DetlefWolf, or drop me a discussion point below if you have further questions regarding the new IBM Monitoring.

 

Raising IT Monitoring Acceptance

After publishing my blog “IT Monitoring is out of style?”, several followers started a discussion about how IT monitoring acceptance could be achieved within system administration groups.

To make that clear: system admins do not oppose monitoring in general; they complain about too frequent, too unspecific alerts that stop them from doing their daily business.

This leads to the refusal of such monitoring services. So what can be done to get a commitment from the system admin team?

What do system admins really hate?

  • Alerts that indicate minor issues which could also be fixed later within normal business hours, but disturb them during their leisure time

  • Alerts that flip on and off at intervals (bouncing alerts)

  • Alerts that are outside their responsibility

Well, I can imagine a whole bunch of additional bullet points describing what system admins do not like, but remembering my own time as a system programmer, I believe these are the real eye-catchers in this area.

But there are also reasons why they support a monitoring solution. They want to avoid the following situations:

  • Being hit by an outage of a service without an early warning

  • Upset users flooding the support team with calls due to poor response times

You can fill this list with tons of other statements, so feel free to drop me your top reasons in the comment section.

What really changed in the IT department over the last years is the service orientation. Formerly, we watched system health rather than service health. Today we focus on service health. And this offers a new approach to increasing the acceptance of IT monitoring solutions.

 

End-To-End-Measurement

A business partner, currently implementing a monitoring-as-a-service model for small businesses, stated the requirement to be alerted only if key business IT functions of his customers are at risk or already out of service. We used the Internet Service Monitor to check the named services (like email, internet accessibility, phone server, and so on). The End-To-End-Measurement approach assures the detection of critical service status. For more sophisticated services like web applications or SAP transactions, the Web Response Time monitor delivers deep insight into transactions. To track the availability and performance of transactions in business off hours, the Robotic Response Time agent delivers valuable insight and informs about unexpected outages.

All events coming from this discipline are good candidates to be escalated also in business off hours.

Resource Monitoring

Resources like CPU usage, memory or disk consumption, database buffer pools, or JEE heap size are very important metrics for analyzing the health of an operating or application system. A single metric is only an indicator and too often not a good signal for raising a highly critical alert. This is exactly the question discussed in “Still configuring thresholds to detect IT problems? Don’t just detect, predict!” But yes, there might be single metrics indicating a hard stop of a system or application, which requires immediate intervention. And this knowledge often comes from resource monitors. Additionally, the resource monitors gather important data for historical projections and capacity planning. Based on this data, predictive insight becomes actionable and gives us another source of meaningful events. Events detected by Predictive Insights are also good candidates to be escalated even in business off hours, if you are interested in avoiding interruptions in IT services.

Suppressing Events

When I was a system programmer, my team’s main goal was to have as few calls as possible in business off hours. We tried to catch up with the events – including the less important ones – within our standard office hours. To achieve this goal, we created rules defining what kind of events – or what combination of events – was critical enough to initiate a call in business off hours. In normal business hours we monitored the system with an extended set of rules to get early indications of unhealthy system conditions. This helped us to maintain a pretty tidy IT environment with relatively little unexpected system behavior. All these extended events were suppressed by the event engine (here OMNIbus) in business off hours. When we came on-site again, we reviewed the list of open and already closed events and recapped the number of occurrences in the monitoring system to understand the situations we had missed while being off-site.
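
The decision logic we encoded back then can be illustrated with a small sketch (this is just the idea, not OMNIbus trigger syntax; the 08:00–18:00 business-hours window is an assumption):

```shell
# Decide whether an event should page someone, based on severity and hour.
escalate() {   # usage: escalate <severity> <hour-of-day>
    sev=$1; hour=$2
    if [ "$hour" -ge 8 ] && [ "$hour" -lt 18 ]; then
        echo "forward"          # business hours: every event is forwarded
    elif [ "$sev" = "critical" ]; then
        echo "escalate"         # off hours: only critical events page someone
    else
        echo "suppress"         # off hours: keep for the morning review
    fi
}
escalate critical 3    # -> escalate
escalate minor 3       # -> suppress
escalate minor 10      # -> forward
```

Suppressed events are not discarded; they stay in the event list for the on-site review described above.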

In summary, there are ways to get the commitment from the system administrator team for a monitoring solution. The system administrators’ goal is a highly available, high-performing system environment with fully functioning services running on it. IBM monitoring tools help them achieve this goal and offer them the flexibility to get filtered information about the system status as they need it.

For those customers who want to avoid maintaining a monitoring infrastructure themselves, the new Monitoring as a Service offering fits perfectly.

So what is your impression? Are you also discussing powerful monitoring with your system administrators?

Follow me on Twitter @DetlefWolf, or drop me a discussion point below to continue the conversation.

IT monitoring is out of style?

This blog has been also published on Service Management 360 on 09-Jul-2014.

A few weeks ago I read a blog entry written by Vinay Rajagopal on Service Management 360 with the headline “Still configuring thresholds to detect IT problems? Don’t just detect, predict!” I was wondering what that new big data approach will imply and what it means to my profession focusing on IT monitoring. Is IT monitoring old style now?

The IT service management discipline today is really a big data business. We have to take a lot of data into consideration if we want to understand the health of IT services. In today’s modern application architectures, with their multitier processing layers and the requirement that everything be available all the time at an acceptable performance level, IT management becomes a challenge that often ends in critical situations.

The “old” approach, of monitoring a single resource or a dedicated response time of a single transaction doesn’t seem to be the way to succeed anymore. However, it is still essential to perform IT monitoring for multiple reasons:

  1. IT monitoring helps to gather performance and availability data as well as log data from all involved systems.

    This data may be used to understand and learn the “normal” behavior. Understanding this “normal behavior” is essential to predict upcoming situations and to send out alerts earlier.

    The more data we gather from different sources, the better our prediction accuracy gets.

    With this early detection mechanism in place, fed by so many different data sources through IT monitoring, operations teams gain enough lead time before the real outage takes place, so that they can avoid it.

     

  2. IT monitoring can help to identify very slow-growing misbehavior.

    Gathering large amounts of data does not guarantee that all misbehavior can be identified. If the response time of a transaction server system increases over a long period of time and all other monitored metrics evolve accordingly, an anomaly detection system will fail. There are no anomalies. Because growing workload is nothing unexpected and the growth takes place over a long period of time, only distinct thresholds will help. This is classical IT monitoring.

  3. IT monitoring helps subject matter experts to understand their silos.

    Yes, we should no longer think in silos, but for good system performance it is essential to have a good understanding of key performance metrics in the different disciplines, like operating systems, databases and middleware layers. IT monitoring gives the experts the required detailed insight and enables the teams to adjust performance tasks as required.
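
The slow-growth case from point 2 is easy to see in a toy calculation: assume a response time that starts at 100 ms and drifts up by 1 ms per day. Day-to-day, nothing ever looks anomalous, but a static threshold still fires eventually (all numbers below are made up for illustration):

```shell
# Simulate a metric drifting up by 1 ms per day; anomaly detection sees
# only the tiny daily delta, but the static threshold catches the trend.
THRESHOLD=150
rt=100    # starting response time in ms
day=0
while [ "$rt" -le "$THRESHOLD" ]; do
    day=$((day + 1))
    rt=$((rt + 1))    # slow, "normal-looking" growth
done
echo "threshold of ${THRESHOLD} ms breached on day $day (rt=${rt} ms)"
```

This is the complementary role of classical thresholds: they encode absolute limits that no amount of gradual, statistically “normal” change may cross.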

So the conclusion is simple: monitoring is a kind of prerequisite for doing successful predictive analysis. Without monitoring you won’t have the required data to make the required decisions, whether manually or automatically, as described with IBM SmartCloud Analytics – Predictive Insights.

Prediction based on big data approaches is a great enhancement for IT monitoring and enables IT operation teams to identify system anomalies much earlier and thus to start reactive responses in time.

IBM SmartCloud Application Performance Management offers a suite of products to cover most monitoring requirements and gather the required data for predictive analysis.

So what is your impression? Is monitoring yesterday’s discipline?

Follow me on Twitter @DetlefWolf, or drop me a discussion point below to continue the conversation.