My previous post titled “Why is MOS so easily dismissed?” summarized anecdotal findings from real-world experiences of organizations. This week, I will continue on the same theme of real-world experiences gathered from operational groups throughout the world.
Most monitoring and management tools rely on SNMP and Syslog as the primary means of managing Unified Communications (UC) environments. However, UC platforms are complex enough that visibility must go beyond SNMP:
- UC is generally composed of multi-vendor solutions that encompass IP PBXs, gateways, SBCs, routers and switches, so KPIs may not all be available via SNMP
- Introducing UC usually adds complexity and exposes experience gaps among operational staff, so maintaining the systems requires knowledge of administrative commands
The combination of more complex solutions and experience gaps calls for visibility into the communications environment that automates the collection of more advanced metrics and reduces the operational staff time spent managing the client environment.
When monitoring routers or switches, SNMP usually covers most of the key metrics required to measure network performance, such as availability, uptime, interface performance and processor load. However, visibility into communications platforms such as UC platforms, IP PBXs and gateways requires collecting vital metrics that are only available through a command line interface and are not exposed by the manufacturer via SNMP. A classic example is visibility into trunk channels, which is generally only available through a console command interface. SNMP will provide an accurate indicator of the administrative and operational states of the trunk. However, if individual trunk channels are unavailable, little visibility exists other than the end user's negative experience of call failures.
Let’s face it: by the time the end user complains, we’re already working in reactive mode.
So how does automation help us become proactive?
Automation can systematically run the console commands that engineers frequently perform by hand, in order to identify potentially faulty conditions that are not visible through SNMP. In the channel failure example, automation can either periodically poll the console sessions of network devices or be triggered when a link changes status to scan the channel states.
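To make the channel failure example concrete, here is a minimal sketch in Python of the scanning step. Everything device-specific in it is an assumption: the command string, the output layout (one channel per line followed by its state) and the state names are hypothetical, and real platforms each have their own console syntax. The `run_command` parameter stands in for whatever console session your tooling already provides.

```python
import re
from typing import Callable, Dict, List

# Hypothetical output layout: one channel per line, for example
# "1/0:3    out-of-service". Header lines do not start with a digit,
# so they are skipped by this pattern.
CHANNEL_LINE = re.compile(r"^\s*(?P<channel>\d\S*)\s+(?P<state>[A-Za-z-]+)\s*$")

# Assumed names for states that mean the channel can carry calls.
HEALTHY_STATES = {"idle", "in-use"}


def parse_channel_states(output: str) -> Dict[str, str]:
    """Map each channel identifier to its lower-cased state."""
    states = {}
    for line in output.splitlines():
        match = CHANNEL_LINE.match(line)
        if match:
            states[match.group("channel")] = match.group("state").lower()
    return states


def find_unavailable_channels(run_command: Callable[[str], str], command: str) -> List[str]:
    """Run a console command and return the channels not in a healthy state."""
    states = parse_channel_states(run_command(command))
    return sorted(ch for ch, state in states.items() if state not in HEALTHY_STATES)
```

The same function can be called from a periodic poller or from a handler that fires when a link changes status, and any channel it returns can be raised as an alert before an end user reports a failed call.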
Automation can also assist your organization beyond simple fault management and can directly reduce operational overhead by scripting routine tasks. In some organizations, the manual labor invested in administrative tasks on the monitored environments can well exceed a dedicated man-year. As an example, the simple task of changing passwords on voicemail and messaging platforms that are not integrated can be completely automated in software to reduce the labor overhead.
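The password example could be scripted along these lines. This is only a sketch under stated assumptions: it treats voicemail passwords as numeric PINs, and `set_pin` is a hypothetical callable that you would supply for each platform (it is not part of any vendor API) and that returns whether the change succeeded.

```python
import secrets
from typing import Callable, Dict, Iterable


def generate_pin(length: int = 6) -> str:
    """Return a random numeric PIN of the given length."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def rotate_pins(
    mailboxes: Iterable[str],
    set_pin: Callable[[str, str], bool],  # hypothetical platform-specific hook
    length: int = 6,
) -> Dict[str, bool]:
    """Apply a fresh PIN to every mailbox and record success or failure."""
    results = {}
    for mailbox in mailboxes:
        try:
            results[mailbox] = bool(set_pin(mailbox, generate_pin(length)))
        except Exception:
            # One failing mailbox should not stop the rest of the batch.
            results[mailbox] = False
    return results
```

Returning a per-mailbox result lets the operations team follow up only on the failures instead of re-checking every account by hand.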
Automation is a key factor in creating not just a proactive Operations group but an efficient one as well. What successful operational groups have in common is that automation is built into their overall operational processes.