Date: 2021-03-22

Following the trading glitch at the National Stock Exchange (NSE), market regulator Securities and Exchange Board of India (Sebi) on Monday issued revised guidelines to ensure faster restoration of operations after a disaster or technical glitch. Sebi has also advised the governing board of NSE to determine why the NSE management failed to shift operations from the primary site to the disaster recovery site within the time frame specified by the regulator, fix individual responsibility, and complete the exercise within 21 days.

As per Sebi’s new mandate, if there is any disruption in the critical systems of a market infrastructure institution (MII), the MII should declare the incident a disaster within 30 minutes and take measures to restore operations within 45 minutes of that declaration. MIIs are stock exchanges and clearing corporations. “In the event of disruption in the ‘critical systems’ of the MII, the MII shall, declare that incident as ‘disaster’ within 30 minutes of incident (earlier two hours) and take measures to restore operations including from disaster recovery site (DRS) within 45 minutes (earlier two hours) from the declaration of ‘disaster’,” Sebi said in a statement.
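The two time limits can be read as a simple check on three timestamps. The sketch below is only an illustration of the arithmetic: the function name `check_timeline` and the sample timestamps are hypothetical and do not describe any real incident or any Sebi or NSE tooling.

```python
from datetime import datetime, timedelta

# Limits taken from the Sebi statement quoted above.
DECLARE_WITHIN = timedelta(minutes=30)  # incident -> declaration of 'disaster'
RESTORE_WITHIN = timedelta(minutes=45)  # declaration -> operations restored


def check_timeline(incident, declared, restored):
    """Return (declared_on_time, restored_on_time) for one incident.

    Hypothetical helper: the restoration window is measured from the
    declaration, not from the incident, as the statement specifies.
    """
    declared_on_time = (declared - incident) <= DECLARE_WITHIN
    restored_on_time = (restored - declared) <= RESTORE_WITHIN
    return declared_on_time, restored_on_time


# Made-up timestamps, for illustration only.
incident = datetime(2021, 4, 1, 10, 0)
declared = datetime(2021, 4, 1, 10, 25)   # 25 minutes after the incident
restored = datetime(2021, 4, 1, 11, 20)   # 55 minutes after the declaration

print(check_timeline(incident, declared, restored))  # (True, False)
```

In this made-up case the declaration meets the 30-minute limit, but restoration takes 55 minutes from the declaration and so misses the 45-minute limit.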

It also said that MIIs will need to study the feasibility of intra-day shifting from the primary site to the DR site with a notice of 45 minutes from Sebi. “MIIs shall prepare comprehensive testing plan and build sufficient redundancy in its systems in order to mitigate impact of any unforeseen technical glitch and to ensure failure of any sub system of MIIs would not impact other critical systems of MIIs and continuous functioning of securities market,” Sebi said.

The Sebi statement said that, starting April, unannounced live trading sessions will be conducted from the DR sites of the MIIs, with a notice of 4 hours from Sebi before the start of the trading session.

Meanwhile, in a separate statement on Monday, NSE said that a Storage Area Network (SAN) system failure led to the glitch, because of which trading on the platform had to be halted for over three hours on 24 February. It said that the SAN system at the primary data centre stopped functioning, which was completely unexpected. It also said that the exchange is exploring alternate solutions to reduce the dependency of critical applications on a single storage device.

“On 24 February, post link failure, we saw unexpected behaviour of the SAN system, with the primary SAN becoming inaccessible to the host servers. This resulted in the risk management system of NSE Clearing and other systems such as clearing and settlement, index and surveillance systems becoming unavailable,” NSE said in an official statement.

The SAN is a fault-tolerant system that was designed to function seamlessly even in the event of telecom link failures between the primary and Near Disaster Recovery (NDR) copies. One of the SAN features deployed in October 2020 was designed to provide not just zero data loss but also zero downtime. Before deployment, the system was tested against various scenarios, including link failures, and functioned properly, NSE said.

Subsequent incident analysis showed that the problem was caused by failover logic implemented by the vendor that did not conform to NSE’s stated design requirements, coupled with issues in the configuration done by the SAN vendor that triggered the failover logic. “We note that the specific failure logic used by the vendor is not documented, was not communicated to NSE, and was not appropriate for NSE’s setup. The resultant SAN failure led to the incident on 24 February,” NSE said.

It added that while there was no impact on the trading system, given that the risk management system was unavailable, allowing trading to continue on NSE posed an unacceptable risk, and hence trading had to be halted.

NSE’s primary data centre is in BKC, a Near Disaster Recovery (NDR) site is maintained in Kurla, and the disaster recovery (DR) site is in Chennai. The statement said that there is synchronous data replication between the primary site in BKC and the NDR site to ensure no data loss in case of primary site failure, and asynchronous replication to the DR site in Chennai, which is designed to take over with zero data loss in case of disaster at the primary site.
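The general difference between the two replication modes can be shown with a toy model. This is a generic sketch and not NSE's design: the class `ReplicatedStore`, its methods and the record names are hypothetical, and the statement does not describe how the DR site catches up after a failure, so that step is not modelled.

```python
from collections import deque


class ReplicatedStore:
    """Toy model: one synchronous copy (NDR) and one asynchronous copy (DR)."""

    def __init__(self):
        self.primary = []
        self.ndr = []              # synchronous copy
        self.dr = []               # asynchronous copy
        self._dr_queue = deque()   # records not yet shipped to DR

    def write(self, record):
        self.primary.append(record)
        self.ndr.append(record)        # synchronous: copied before the ack
        self._dr_queue.append(record)  # asynchronous: shipped later
        return "ack"

    def ship_to_dr(self, max_records=None):
        """Ship queued records to DR; return how many were shipped."""
        pending = len(self._dr_queue)
        count = pending if max_records is None else min(max_records, pending)
        for _ in range(count):
            self.dr.append(self._dr_queue.popleft())
        return count


store = ReplicatedStore()
for i in range(3):
    store.write(f"record-{i}")
store.ship_to_dr(max_records=2)    # DR link has only moved two records so far

print(store.ndr == store.primary)            # True
print(len(store.primary) - len(store.dr))    # 1
```

In this model every acknowledged write is already on the synchronous copy, while the asynchronous copy can trail the primary by whatever is still queued.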

“Between our primary and NDR sites, NSE has multiple telecom links with two service providers to ensure redundancy. On 24 February, we had instability in links from both service providers primarily due to digging and construction activity along the path between the two sites. The replication to NDR is designed such that in the event of the links between primary and NDR getting cut, the primary continues operations without any direct effect. Post earlier link failures in February 2021, operations continued without any interruption,” NSE said.

NSE said that various steps have already been taken, and others are under implementation, to address the SAN and telecom link issues. “We had already placed orders in January for two additional telecom provider links and have removed the SAN software that caused the incident. We are also exploring alternate solutions to de-risk dependency of critical applications to a single storage device,” NSE said.

Vendors such as Cisco, HP, Dell, Hitachi, Checkpoint, Palo Alto and Oracle support NSE’s fault-tolerant technology infrastructure, aided by technology service providers such as TCS, Cognizant and Wipro.

