Creating a new software package is rarely a seamless process. It involves debugging and fixing the problems discovered once the package is deployed to test environments and the dev-test community (development and testing teams working together) begins testing. When many developers each write code for their own part of a project, the pieces require tweaking to work together efficiently. If the project demands shorter development times to speed a product to market, problems also demand faster responses. Unless a defined process is in place, the resulting chaos can delay time to market. One solution is to prevent environment (not necessarily infrastructure) instability and chaos by forming an incident management team to tackle software bugs.
What Is IT Civility?
Many players are involved in the IT world: code developers (who solve specific issues or make the end user's life easier), the in-house testing team, and those who manage the servers. Together, they are responsible for releasing the finished software to the world. When continuous integration/continuous delivery or deployment (CI/CD) methodology is used to make code changes faster, numerous developers may be revising their particular parts of the program and pushing retested updates to the pipeline, though not necessarily all at the same time. That's where IT chaos can occur. Think of it as a busy city intersection with no traffic signals. That chaos cannot be eliminated without instituting a regimen or discipline around software development, testing, and debugging. IT civility is the calm that such discipline brings.
Who will play traffic cop without imposing restrictions that impede the timely release of a software product, or of a fix for an existing issue? Failing to define clearly which team(s) will address a particular coding issue, which team is responsible for testing, and so on, is a recipe for trouble and delay. The advent of cloud data storage has not solved the IT chaos problem either. The in-house server farm may no longer be bogged down by multiple logins, since developers access the cloud to make changes. However, that seemingly "limitless" cloud capacity may invite more chaos and less coordination among application development teams. Not every group will be ready with its changes at the same time, and the pipeline becomes muddled.
Incident Management to the Rescue
When a problem arises, perhaps an end user reporting a login issue, it is imperative to identify which development team is responsible. Doing so is part of the incident management process (IMP), which is typically found in production environments. IMP involves logging, recording, and resolving incidents to restore business processes or services quickly. Instituting IMP earlier, in non-production environments, can help transform IT chaos into IT civility. Because code changes occur frequently and asynchronously in those environments, it is imperative to have a plan and a chain of command in place. An incident management team (IMT) returns bugs that the testing group discovers during regular "health checks" to the appropriate development team(s) for correction.
An IMT also works with the testing group to decide which functions should be checked daily, starting with basics such as logging on. When configuration defects arise, a ticket is generated automatically and sent to the responsible development team without waiting for the testing group to verify the defect, shortening the lead time for the needed revisions. As long as humans do the work, coding errors will happen, but an IMT can bring calmness, or IT civility, to the process. Monitoring tools such as Splunk and AppDynamics can provide real-time data about the health and performance of each layer in a tech stack, making it easier to detect errors and fix them faster. To further expedite corrections, establish an incident repository, using a platform such as ServiceNow, to house all relevant information about issues that have occurred in the system.
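As an illustration of the daily health-check flow described above, here is a minimal sketch, assuming a hypothetical ownership map (`OWNERS`) from checked functions to development teams and an in-memory list standing in for the incident repository. It does not use any real Splunk, AppDynamics, or ServiceNow API; the `Ticket` fields, team names, and `run_health_checks` function are all hypothetical. Each failing check opens a ticket routed straight to the owning team, with unmapped checks falling back to the IMT for triage.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Ticket:
    """Hypothetical incident record; a real repository (e.g., ServiceNow) has its own schema."""
    check: str
    team: str
    detail: str
    opened_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Hypothetical ownership map: which development team owns each checked function.
OWNERS = {
    "login": "identity-team",
    "account_summary": "accounts-team",
}


def run_health_checks(checks: dict[str, Callable[[], tuple[bool, str]]],
                      repository: list[Ticket]) -> list[Ticket]:
    """Run each daily check and open a ticket for every failure, routed to its owner."""
    opened = []
    for name, check in checks.items():
        ok, detail = check()
        if not ok:
            # Route directly to the owning team without waiting for manual verification;
            # checks with no known owner go to the IMT for triage.
            team = OWNERS.get(name, "incident-management-team")
            ticket = Ticket(check=name, team=team, detail=detail)
            repository.append(ticket)  # record the incident for later analysis
            opened.append(ticket)
    return opened


if __name__ == "__main__":
    repository: list[Ticket] = []
    checks = {
        "login": lambda: (False, "config: auth endpoint URL missing"),
        "account_summary": lambda: (True, "ok"),
        "report_export": lambda: (False, "timeout after 30 s"),
    }
    for t in run_health_checks(checks, repository):
        print(t.check, "->", t.team)
    # login -> identity-team
    # report_export -> incident-management-team
```

In practice, the checks would call real endpoints and the repository would be the organization's incident platform; the sketch only shows the routing decision that shortens the lead time.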
Slow the Roll When Bringing Products to Market
Software developers that do not adopt an IMP may need to lengthen the lead time for releasing a product to end users, because uncovering and correcting issues takes longer without close coordination between development and testing teams. Software, or a login, that works only intermittently can prove disastrous for a website or program if the competition gets it right. Incident management also works with multi-stack architectures (where the technology stack may be split between legacy systems and the cloud, for example), again helping pinpoint where login failures or other problems originate.
Breaking It Down
One approach to software development that can reduce chaos and make coding problems easier to pinpoint is microservices architecture, in which a single application is composed of smaller, loosely coupled components or services. Development teams can each focus on their own service, technology stack, and code, which can be updated (with new features or improved functionality) without affecting the entire program. IBM points out that microservices architecture also allows components to be scaled independently of one another, reducing the cost and waste of scaling entire applications. Microservices aren't new, but unless the incident management approach is employed when fixes are necessary, teams may focus on their own services without seeing how a change might affect the other services to which they are coupled.
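To make the coupling concern concrete, the following sketch shows one way an IMT might find every service that could be affected when another service changes. It assumes a hypothetical call graph (`DEPENDS_ON`) in which each service lists the services it calls; the service names and the `affected_by` function are illustrative, not part of any particular tool. The function inverts the graph and walks it breadth-first to collect direct and transitive consumers.

```python
from __future__ import annotations

from collections import deque

# Hypothetical call graph: each service maps to the services it depends on.
DEPENDS_ON = {
    "web-frontend": ["auth", "accounts"],
    "mobile-api": ["auth", "accounts"],
    "accounts": ["ledger"],
    "reports": ["ledger"],
    "auth": [],
    "ledger": [],
}


def affected_by(changed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `changed`."""
    # Invert the graph so we can walk from a service to its consumers.
    consumers: dict[str, list[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            consumers.setdefault(dep, []).append(service)

    # Breadth-first search over consumers, visiting each service once.
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    seen.discard(changed)  # a dependency cycle could lead back to the changed service
    return seen


if __name__ == "__main__":
    print(sorted(affected_by("ledger", DEPENDS_ON)))
```

Running it for `ledger` lists `accounts`, `mobile-api`, `reports`, and `web-frontend`, which are the teams an IMT would want to notify before a ledger fix is integrated.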
CI/CD, in which various developers asynchronously integrate new or updated code into a pipeline and deploy it, works better when the incident management approach is used to establish a testing protocol before new code is deployed. Establishing limited, defined time periods, called green zones, for integrating new code also brings rigor to the development process. It likewise establishes time frames, or "non-disturbing windows," in which testing can proceed without new code being integrated concurrently. Several green zones can be set up at different times of day, for example, to accommodate development teams in the United States and elsewhere around the world.
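A pipeline could enforce green zones with a simple gate that runs before any merge or deployment to the shared test environment. This is a minimal sketch, assuming hypothetical window times in UTC (`GREEN_ZONES_UTC`) and hypothetical function names; any time outside a green zone is treated as a non-disturbing window reserved for testing. It expects timezone-aware datetimes and handles a window that wraps past midnight.

```python
from __future__ import annotations

from datetime import datetime, time, timezone

# Hypothetical green zones (UTC) when new code may be integrated. Any time outside
# these windows is treated as a non-disturbing window reserved for testing.
GREEN_ZONES_UTC = [
    (time(3, 0), time(5, 0)),    # hypothetical window for one region's teams
    (time(14, 0), time(16, 0)),  # hypothetical window for U.S. teams
    (time(22, 0), time(1, 0)),   # hypothetical window that wraps past midnight
]


def in_window(t: time, start: time, end: time) -> bool:
    """True if t falls in [start, end), handling windows that cross midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def integration_allowed(now: datetime, zones=GREEN_ZONES_UTC) -> bool:
    """Allow merges or deploys into the shared test environment only inside a green zone.

    `now` should be timezone-aware; it is converted to UTC before comparison.
    """
    t = now.astimezone(timezone.utc).time()
    return any(in_window(t, start, end) for start, end in zones)


if __name__ == "__main__":
    for hour in (4, 10, 23, 0):
        stamp = datetime(2024, 1, 15, hour, 30, tzinfo=timezone.utc)
        print(stamp.time(), "integrate" if integration_allowed(stamp) else "test only")
```

Treating each window as half-open (start inclusive, end exclusive) keeps adjacent windows from overlapping and makes the hand-off to testing unambiguous.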
Incident Management Is a Bottom-Line Issue
A software product will very likely need fixes before it is truly ready for release to end users. How an organization responds to the coding problems that arise, distinguishing software defects from other incidents and coordinating the necessary fixes (practicing IT civility over chaos), will determine whether, when, and how quickly a product reaches the market. There's a bonus, too: the organization may stand out from software developers where chaos reigns and products are released before all the problems are solved.
A more formal benchmark for IT best practices (tasks, procedures, and checklists) is available to practitioners: the Information Technology Infrastructure Library (ITIL), which was established several decades ago and has undergone several revisions. Employing ITIL's incident management concept in non-production environments can turn chaos into calmness and IT civility. Some software development companies have yet to adopt this approach, but they should consider the time and money it can save.
About the Author
Bhawini Navapura is a senior software professional specializing in DevOps implementations. For the last decade, her niche has been stabilizing nonproduction environments. She specializes in implementing ITIL in test environments and in providing high availability to application development teams globally for a leading financial firm. Contact her at bhawini.navapura@gmail.com.
Disclaimer: The author is completely responsible for the content of this article. The opinions expressed are their own and do not represent IEEE’s position nor that of the Computer Society nor its Leadership.