HUBBLE: an optical link management system for dense wavelength division multiplexing networks

Final Version: 28.03.2020 Abstract: Timely detection of Dense Wavelength Division Multiplexing (DWDM) link quality and service performance problems in fiber deployments is critical for telecommunication operators. In this paper, we propose a new methodology for network fault detection inside optical transmission systems deployed in a real operator environment and present the working principles of the system. Our new calculation methodology is used for joint fiber and DWDM link quality evaluation inside the proposed High-level Unified BackBone Link Examiner (HUBBLE) platform. At the end of the paper, we also detail some of the benefits, challenges, and opportunities of automation in DWDM networks using the


Introduction
The target of optical transmission systems is to carry all the payload wrapped transparently in containers with minimum loss and latency while providing maximum capacity. In the optical transmission domain, Dense Wavelength Division Multiplexing (DWDM) technology constitutes a big portion of the transmission process.
It is used to carry huge amounts of data over long distances while guaranteeing the service level requirements of end-users. For reliable and seamless connection, measuring the quality level of fiber cables is quite important for the operation of DWDM systems. The reason is that attenuation in fiber cables might have detrimental effects on the service delivery process of network operators that have already deployed DWDM networks inside their infrastructures.
For reliable networks, it is mandatory not only to solve occurring problems (fiber cuts, board problems) on fine-grained timescales (e.g., at millisecond levels), but also to take preventive actions before any trouble appears in the network. Some protection mechanisms, such as Automatically Switched Optical Network (ASON) [1], exist in optical networks and come into play when a fiber outage occurs. Although ASON can be tuned to react to attenuation with a very sensitive process, this can create a potential risk of continuous switching inside the network. Too much switching is not preferred and determining the root cause of the problem with
Providing automation in the network is one of the important enablers for the management of next generation wireless network services. The mobile backhaul, which is either owned by the mobile service providers themselves or leased from fixed service providers, consists of high capacity optical networks. The slightest problem in this optical network will have a significant impact on the quality of next generation wireless services. Therefore, it is important to improve the transmission quality of the optical signal and to reduce the error rate in the received signal. Various types of optical receivers have been developed that use different types of modulation to decode optical signals transmitted from the far end. Choosing the modulation method best suited to the physical conditions of the fiber links and repeater points in the field is therefore important. Error correction mechanisms are also important for DWDM technology. With advances in receiver sensitivity and signal processing techniques, software and hardware decision Forward Error Correction (FEC) can provide extra performance when deciding whether a received signal is "1" or "0".
To ensure continuous improvement, the network should always check critical parameters such as fiber quality, the flatness of the DWDM optical channels, and the up-to-date values of the Beginning of Life (BoL) and End of Life (EoL) attenuation precalculated by the design tool. BoL attenuation is measured at the beginning of network deployment and EoL attenuation is the maximum limit of span attenuation. In addition, network performance measurements such as Bit Error Rate (BER) and FEC measurements should also be checked properly. With automated and scheduled checks of these parameters, network service quality can be kept within the desired thresholds and the DWDM network can operate reliably.

Related work
In the literature, some studies that focus on automation of fault management in optical networks exist [2][3][4][5]. A Network Configuration Protocol (NETCONF)- or Representational State Transfer (REST)-based automation system is demonstrated in [2]. A Software-Defined Networking (SDN) controller-based network abstraction layer for the implementation of cognitive controls and autonomic operation policies is presented in [3]. In [4], the information elements and processes needed by control or management systems of optical systems are identified. The studies in [6] and [7] propose architectures for the management of SDN-based optical networks that also include fault analysis approaches. A white paper in [5] discusses a new control plane structure to support flexibility in the control plane. Fiber problems can be detected with an in-service Optical Time Domain Reflectometer (OTDR) [8,9]. In-service OTDR implies a costly and complex structure compared to the system that is proposed in this paper. At the same time, currently deployed equipment does not support in-service OTDR, and the inclusion of this feature adds a high cost when considering thousands of nodes. A decision system for determining alarms correctly is described in [10]. The work in [11] proposed a machine learning solution for quality analysis of the links. The article in [12] studied the performance of Raman amplifiers in DWDM transmission systems, which can be extended for fault management purposes. Similar to our analysis in this paper, the authors in [13] propose a Geographical Information System (GIS)-based fiber optic monitoring system that can be used as a fault or degradation detection tool.
However, the approach in [13] is based on positioning remote testing units, whereas in the present study we use our own proposed quality measurement system, which evaluates Key Performance Indicator (KPI) values collected from deployed equipment and from the Network Management System (NMS) without using any additional remote probes.
There are also many studies that focus on the management of link quality in DWDM systems. In [14], the authors evaluated the performance of several protection and restoration algorithms in optical networks against potential losses of data in optical links. The importance of fault management and repair time for the optical links to provide enhancement for reliable transmission and increments in power savings of 5G transport networks that are based on DWDM rings is presented in [15]. Our previous work in [16] demonstrated the benefits of automation for fault management of the links in DWDM networks. The study in [17] presents a solution for detecting the link failures caused by fiber breaks or cable damage related to the external intervention to the links.
The article in [18] studied 100G optical links and proposed a monitoring system for link failures that is based on Field Programmable Gate Array (FPGA) insertion/desertion processing and the use of in-band tunneling. A method and system for monitoring/supervising optical fault management for fibers and their connection points in certain Network Equipment (NE) to resolve and detect some of the problems based on OTDR measurements is presented in [19]. The importance of fast recovery time in cases of optical network failures for better flexible optical networks planning is described in [20]. The study in [21] expresses the benefits of using automation for fault management for service continuity in optical networks. Signaling-free fault management with monitoring resource allocation based on near shortest m-trails that can be used to find the failure of neighboring nodes is studied in [22]. An approach for access links with OTDR testing is described in [23].

Main contributions
The existing infrastructure of telecommunication providers includes multiple vendor devices and managing all these heterogeneous devices using the above methods can be challenging for telecommunication operators. The biggest problem with fiber in the service provider domain is the issue of attenuation. Although distance information and attenuation can be measured by OTDR testing, the EoL and BoL values for fiber quality are not taken into account. Different from the above-presented related works, we propose a new architecture that ensures all NE belonging to different vendors can be managed inside the same environment of a telecommunication operator. The proposed High-level Unified BackBone Link Examiner (HUBBLE) platform is running in collaboration with the inventory server that includes inventory and related GIS information of the operator's network as well as fault management systems for opening up fault tickets when faults are detected in the system. Inside the HUBBLE platform, fiber attenuation difference, fiber km loss, and fiber deviation from expected attenuation values from the systems are used jointly to determine the quality of both fiber and DWDM links. Hence, the HUBBLE platform is able to detect problems caused by the fiber quality that cannot be detected by OTDR testing by taking into account the defined EoL and BoL values in this paper, and can also correlate the measured values with the distance information received from the inventory server. Moreover, for visualization purposes, the proposed system ensures that the fault and inventory information of the calculations using the proposed methodology can be visualized individually for different vendors.
Our main contributions in this paper can be summarized as follows: (i) We propose a new architecture specific to telecommunication operators that can collect fiber and DWDM related parameters from multiple vendors, perform analysis, and visualize the fiber and DWDM alarm severity levels. (ii) We develop new metrics and methodologies that score jointly both the fiber link quality and DWDM link using the measurements collected from various NE instead of simply adhering to the attenuation values given by the fiber optic cable standards of manufacturers. (iii) We provide benefits, challenges, and opportunities of automation of DWDM networks experienced using the HUBBLE platform.
The rest of the paper is organized as follows. Section 2 presents the fiber quality parameters and network design. Section 3 describes our proposal for fault management. Section 4 provides the experimental results and Section 5 discusses the benefits, limitations, and opportunities of the proposed architecture. Finally, Section 6 gives the conclusions and mentions future work.

System architecture
Figure 1 shows a high-level architecture of the proposed HUBBLE automation system for DWDM networks that is integrated with the Mobile Network Operator (MNO) infrastructure. It consists of three main components. First, the networking systems that belong to different vendors are at the bottom of the figure. Second, the HUBBLE platform is in the middle and runs the analysis and methods proposed in this paper. Third, additional helper servers are connected to HUBBLE for ticketing to NMS (fault management server), user access (Lightweight Directory Access Protocol (LDAP) server), and inventory (inventory server) purposes. Vendor-specific elements in Figure 1 include DWDM NE and network management servers that are specific to each vendor, marked with different colors for vendors A, B, and C in Figure 1. These NE are heterogeneous components of a telecommunication operator's infrastructure that provides nationwide connectivity. Therefore, the data stored in NMS can be in different proprietary formats (e.g., Extensible Markup Language (XML) or Comma-separated Values (CSV)) specific to different vendors.
In the middle, the HUBBLE analysis platform has application and data collection servers. The data collection server runs different scripts (e.g., Perl in our case) that collect vendor-specific network information in various forms (using Structured Query Language (SQL) queries and telnet, and parsing XML and CSV data formats for extraction purposes) and transform them into a simple and single data format. The HUBBLE analysis platform is also connected to various helper servers: an inventory server for collecting up-to-date information about the deployed DWDM elements (coordinates, fiber distance, the location of the nodes on the map, etc.), a fault management server (for opening a ticket to NMS when a failure is indicated by the outcome of the HUBBLE analysis), and an LDAP server (for authorized user access to the HUBBLE platform). Finally, users interact with the HUBBLE platform to obtain up-to-date information about the underlying optical network infrastructure.
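The transformation of vendor-specific exports into a single data format can be illustrated with a minimal sketch. Python is used here purely for illustration (the platform itself is described as using scripts such as Perl), and the record keys, CSV header, and XML element and attribute names are all hypothetical, since the real vendor formats are proprietary and not given in this paper.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical unified record: every vendor export is reduced to these keys.
FIELDS = ("node", "tx_power_dbm", "rx_power_dbm")


def from_csv(text):
    """Parse a vendor CSV export (assumed header: node,tx,rx)."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {
            "node": row["node"],
            "tx_power_dbm": float(row["tx"]),
            "rx_power_dbm": float(row["rx"]),
        }
        for row in rows
    ]


def from_xml(text):
    """Parse a vendor XML export (assumed <ne name=".." tx=".." rx=".."/> elements)."""
    root = ET.fromstring(text)
    return [
        {
            "node": ne.get("name"),
            "tx_power_dbm": float(ne.get("tx")),
            "rx_power_dbm": float(ne.get("rx")),
        }
        for ne in root.iter("ne")
    ]


# Two made-up exports from two vendors, merged into one list of uniform records.
csv_export = "node,tx,rx\nA,1.5,-18.0\n"
xml_export = '<export><ne name="B" tx="2.0" rx="-17.5"/></export>'
records = from_csv(csv_export) + from_xml(xml_export)
```

Once every source is reduced to the same keys, the later analysis steps do not need to know which vendor a measurement came from.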

Fiber quality parameters and network design
At the beginning of each DWDM network deployment, there is a design process in which service providers mostly use network design tools specifically designed and developed for optical networks. Generally, all vendors have their own proprietary design tools that support customized and advanced network features.
When vendors deploy their own services on top of these devices/products, these tools are used to obtain the best service performance and the most effective equipment usage statistics. Network design tools measure the values of the parameters mentioned above and also perform calculations related to the lifecycle management of the fiber optic cables, such as the calculation of BoL and EoL [? ]. Design tools estimate the total amount of attenuation that may occur by taking into account the fiber cable lifespan and the corresponding network measurement parameter values. The DWDM network design process is finalized by selecting the available and the most suitable amplifiers on Wavelength Selective Switch (WSS) boards (if they exist), transponder/muxponder boards, multiplexers, and de-multiplexers.
Figure 1. High-level architecture of integration of the HUBBLE automation system for DWDM networks with MNO infrastructure.
DWDM systems can work at very high attenuation. For terrestrial systems, Raman amplifiers can be used for long distances and high attenuation cases, but the expected fiber attenuation is fixed. After several years of exposure to external and internal factors, fiber quality can be worse than the estimates of the vendors' design tools. If the difference between the estimates and the real values grows beyond the expected attenuation values, service interruption and quality problems can occur. For these reasons, online fiber attenuation control is required for pre-maintenance activities. Based on these observations, in the next section we detail our proposed calculation methodology to determine the severity level of the fiber quality and DWDM links in a joint framework.

A proposal for fault management of DWDM systems
In this section, we give some of the formulae used to determine the severity of the fiber cable inside the HUBBLE platform. Periodically checking the span fiber attenuation of all DWDM networks gives up-to-date information about the service quality of the underlying transport network. Three kinds of criteria can be used to prioritize faults detected in DWDM links: (i) Kilometric span loss, where the reference is taken as 0.25 dB/km. The standard kilometric span loss for the G.652 fiber optic cable, which is the more commonly used cable type in the network, is actually 0.20 dB/km as also mentioned above. Generally, service providers keep this value a little higher based on their operational experience, and we have used 0.25 dB/km to keep this margin. (ii) Deviation from expected attenuation, where the expected attenuation is fiber length × 0.25 dB/km. (iii) Fiber span loss difference between the transmit and receive side fiber optic cables.
Additionally, it is effective to give priority to the fiber span that has higher attenuation than the others. Critical, Major, Minor, and No Alarm severity levels can be used to describe the span fiber quality. A pictorial explanation of fiber optic cable span loss is given in Figure 2. Fiber optic cable span loss is measured between the transmitter and receiver NE. The Optical Supervisory Channel (OSC) can be used for all transmit and receive side optical measurements, because it has a fixed wavelength and is not affected by how many channels are working on the measured span. With graphical user interfaces, it is easy to extract the location and severity level of a fiber connection problem. This helps telecommunication operators focus their failure detection efforts on the affected fiber spans. Fiber span loss and DWDM alarm severity metrics are summarized in Table 1 and Table 2, respectively.
Planned attenuation values (BoL and EoL) are used as criteria for comparison purposes in Table 2. The comparisons are used to calculate the difference between the actually measured and expected DWDM attenuation values. Some example values over DWDM and fiber links from node A to node B are given in Table 3, where the values are obtained from the nodes and inventory server by the HUBBLE platform. The collected parameters include center node names (A or B), transmit and receive power measurements, and transmit/receive physical attenuator values. If there are Errored Seconds (ES), Severely Errored Seconds (SES), or Unavailable Seconds (UAS) errors at any of the DWDM services, checking the entire service route using the proposed criteria gives very important clues for finding the root cause of the problem.
As a matter of fact, both fiber quality and DWDM span loss comparisons between planned BoL / EoL values and real time attenuation measurements have importance in practical applications.
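The comparison between planned BoL / EoL values and a real-time measurement can be sketched as follows. This is only an illustration under stated assumptions: the function name `span_margin` and the numeric values are hypothetical, and the actual severity thresholds of Table 2 are not reproduced here. The sketch relies only on the earlier definitions that BoL is the attenuation at deployment and EoL is the maximum limit of span attenuation.

```python
def span_margin(measured_db, bol_db, eol_db):
    """Return (aging_db, remaining_db) for one DWDM span.

    aging_db     : how far the measurement has drifted above the BoL value
    remaining_db : margin left before the EoL limit (negative once exceeded)
    """
    if eol_db < bol_db:
        raise ValueError("EoL attenuation must not be below BoL attenuation")
    aging_db = measured_db - bol_db
    remaining_db = eol_db - measured_db
    return aging_db, remaining_db


# Hypothetical span: planned 21 dB at BoL, 25 dB at EoL, currently measured 23.5 dB.
aging, remaining = span_margin(measured_db=23.5, bol_db=21.0, eol_db=25.0)
exceeds_eol = remaining < 0
```

A shrinking `remaining_db` over successive scheduled checks is the kind of trend that would justify preventive maintenance before the EoL limit is reached.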
Let us denote the transmit power as $P_t(A)$, the receive power as $R_t(A)$, the transmit physical attenuator as $P_{t\alpha}(A)$, and the receiver physical attenuator as $P_{r\alpha}(A)$ for node A. Additionally, denote the fiber distance between nodes A and B as $d_{AB}$. In the HUBBLE platform, the DWDM attenuation from node A to node B (AB), in dB, is calculated from these measurements. Note that a similar definition can also be given for the DWDM attenuation from node B to node A (BA).
The DWDM attenuation difference between the two directions, from node A to node B (AB) and from node B to node A (BA), is then calculated in dB. The DWDM deviation from planned attenuation ($\gamma^{DWDM}_{AB-BA}$) is calculated using the DWDM span attenuation BoL and EoL values, which are taken from the inventory servers (an example is given in Table 4). LNA in Table 4 is defined in Equation (3).
In the HUBBLE platform, the fiber attenuation from node A to node B (AB), in dB, is calculated in a similar way, and a similar definition can also be given for the fiber attenuation from node B to node A (BA). From these values, the platform calculates the fiber attenuation difference (AB - BA) between the two directions in dB, the fiber optic kilometric loss $\beta^{fiber}_{AB-BA}$ in dB per km, the fiber expected attenuation in dB, and finally the fiber deviation from expected fiber attenuation in dB.
In the next section, we demonstrate some of the example DWDM link analysis results that are generated using the above proposed methodology inside the HUBBLE platform.
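The fiber metrics named above can be sketched in code. The exact formulae of the paper are not reproduced here, so the following is an assumption-laden illustration: only the 0.25 dB/km reference and the rule "expected attenuation = fiber length × 0.25 dB/km" come from the text. Taking the absolute difference between directions, judging the span by its worse direction, and defining deviation as measured minus expected are assumptions of this sketch, as are all names (`FiberMetrics`, `fiber_metrics`).

```python
from dataclasses import dataclass

REFERENCE_LOSS_DB_PER_KM = 0.25  # reference kilometric span loss stated in the text


@dataclass
class FiberMetrics:
    difference_db: float      # attenuation difference between the two directions
    km_loss_db_per_km: float  # kilometric loss of the worse direction
    expected_db: float        # fiber length x 0.25 dB/km
    deviation_db: float       # measured minus expected attenuation


def fiber_metrics(att_ab_db, att_ba_db, distance_km):
    """Compute the three fiber criteria from per-direction attenuation values."""
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    worst_db = max(att_ab_db, att_ba_db)  # assumption: use the worse direction
    expected_db = distance_km * REFERENCE_LOSS_DB_PER_KM
    return FiberMetrics(
        difference_db=abs(att_ab_db - att_ba_db),
        km_loss_db_per_km=worst_db / distance_km,
        expected_db=expected_db,
        deviation_db=worst_db - expected_db,
    )


# Hypothetical 80 km span with 22 dB measured A->B and 24 dB measured B->A.
m = fiber_metrics(att_ab_db=22.0, att_ba_db=24.0, distance_km=80.0)
```

For this made-up span the expected attenuation is 20 dB, so the worse direction deviates by 4 dB and loses about 0.30 dB per km, above the 0.25 dB/km reference.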

Evaluation results
General nationwide statistics of the DWDM links for failure detection by the HUBBLE platform are given in Table 5. Figure 3 shows the optical fiber cable severity levels previously defined between nodes. In Figure 3, green lines represent the nonexistence of problems on the fiber optic cable, yellow lines represent that the severity of the link is a minor alarm, orange lines correspond to a major alarm, and red lines indicate a critical alarm. The alarm severity level of the optical fiber cable between A and B is shown in green in Figure 3.
Criteria for calculating the fiber optic alarm severity levels using fiber attenuation difference, fiber km loss, and fiber deviation from expected attenuation values from the system are listed in Table 1. Fiber optic alarm severity levels hold true if all of the subconditions in Table 1 hold true. For example, even if the fiber attenuation difference and fiber km loss are in the green region (no alarm case), if the fiber deviation from expected attenuation value is in the red region, we mark this fiber as critical. The same applies to the DWDM alarm severity level calculations defined in HUBBLE, as given in Table 2. With the joint use of Table 1 and Table 2, the quality of a fiber optic link can fall into the major (orange) category while the DWDM link quality is in the minor (yellow) category. Therefore, it is up to the telecommunication operator to decide whether action should be taken. Note that, based on the proposed methodology, in some cases the DWDM link quality can be in decent condition even if the fiber quality is poor, which may require no specific action by the operator and hence saves both human and equipment resources.
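The example above, where one criterion in the red region marks the whole fiber as critical, can be read as a worst-of rule across the three criteria. The sketch below implements that reading. It is an interpretation rather than the platform's code, and the threshold numbers are hypothetical placeholders for the values in Table 1.

```python
from enum import IntEnum


class Severity(IntEnum):
    NO_ALARM = 0  # green
    MINOR = 1     # yellow
    MAJOR = 2     # orange
    CRITICAL = 3  # red


def grade(value, minor, major, critical):
    """Map one metric to a severity using ascending thresholds."""
    if value >= critical:
        return Severity.CRITICAL
    if value >= major:
        return Severity.MAJOR
    if value >= minor:
        return Severity.MINOR
    return Severity.NO_ALARM


# Hypothetical (minor, major, critical) thresholds; the real ones are in Table 1.
THRESHOLDS = {
    "difference_db": (2.0, 4.0, 6.0),
    "km_loss_db_per_km": (0.25, 0.30, 0.35),
    "deviation_db": (2.0, 4.0, 6.0),
}


def fiber_severity(metrics):
    """Overall severity is the worst severity among the individual criteria."""
    return max(grade(metrics[name], *limits) for name, limits in THRESHOLDS.items())


# Two criteria are green, but the deviation is in the red region.
example = {"difference_db": 0.5, "km_loss_db_per_km": 0.20, "deviation_db": 7.0}
overall = fiber_severity(example)
```

Using an ordered enumeration keeps the combination step to a single `max`, and the same function could grade DWDM links with a second threshold table.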
A dashboard view of the HUBBLE platform demonstrating a real-time DWDM link status with measured and expected attenuation values is given in Figure 4. These examples show the attenuation values of DWDM NE over the time range from November 5, 2018 to February 5, 2019. When the attenuation from A to B (marked by the green line) becomes larger than the expected attenuation (marked by the blue line), as calculated with Table 1 and Table 2, on December 21, 2018, a problem with the DWDM link is detected; hence a trouble ticket is requested from the fault management server to fix the issue in the link (the problem in this case is related to the fiber connection). Note that when a DWDM link failure is detected, the fault can be solved either remotely (by adjusting power levels, etc.) or on site when it cannot be solved remotely (e.g., problems related to a board, port, attenuator, etc.). Values after the resolution of the problem are also shown in Figure 4, where the green line is below the expected attenuation value after the problem is fixed. Another view, listing the links with the alarm levels measured by the HUBBLE platform, is shown in Figure 5.
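The detection step described for Figure 4 amounts to finding the first sample at which the measured attenuation exceeds the expected value. A minimal sketch follows, assuming daily samples and a constant expected attenuation. The function name `first_exceedance` and all numbers are illustrative and are not measurements taken from Figure 4.

```python
from datetime import date


def first_exceedance(samples, expected_db):
    """Return the first date whose measured attenuation exceeds expected_db, or None."""
    for day, measured_db in sorted(samples):
        if measured_db > expected_db:
            return day
    return None


# Illustrative values only; they are not measurements from Figure 4.
samples = [
    (date(2018, 12, 19), 19.2),
    (date(2018, 12, 20), 19.6),
    (date(2018, 12, 21), 21.3),
    (date(2018, 12, 22), 21.4),
]
ticket_date = first_exceedance(samples, expected_db=20.0)
```

In a deployment, the returned date would be the point at which a trouble ticket is requested from the fault management server; a real implementation might also require several consecutive exceedances to avoid reacting to a single noisy sample.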

Challenges, benefits, and opportunities of HUBBLE building process for DWDM networks
Huge networks cause huge problems for network operators. These include difficulties with the management of a huge network, difficulties in viewing the whole topology on a single plane, performance degradations due to fiber attenuation, difficulties in checking problems at all NE, standardization difficulties, and different know-how required for different vendors. In existing works in the literature, the information contained in GIS inventory systems, such as coordinates, fiber distance, and the location of the nodes on the map, was not integrated with the breakdown tool of the DWDM systems. This capability has been added to our proposed HUBBLE system. Consolidating the data of different vendors under a single system provides a great benefit for service providers. This is especially true at border points, where systems belonging to different vendors work together. In this case, fault management takes time because analysis must be performed by checking all NMS belonging to the different vendors. With the HUBBLE system, this problem is eliminated and a single fault management screen is displayed in the same data format independent of the device model. One of the benefits of HUBBLE is that automation can be integrated into DWDM systems without the additional Capital Expenditure (CAPEX) required to update older equipment. Some of the challenges encountered, the potential solutions, and the benefits obtained during the building of the HUBBLE platform are described in the rest of this section.

Data collection process:
The biggest challenge encountered when automating the existing DWDM systems was deciding how to collect the data from all heterogeneous NE. While newly produced equipment can support NETCONF, Simple Network Management Protocol (SNMP), and even OpenFlow protocols, these protocols are not supported by the currently utilized infrastructure, whose devices are 4 to 5 years old. In addition, collecting data directly from the devices, and doing so frequently, can create a high load on the processor of the device. For this reason, we decided to obtain data via the NMS to which the devices are connected for efficiency purposes. The NMS periodically collects data from the network infrastructure.

Different vendor characteristics:
Another challenge encountered when building the HUBBLE system was that different manufacturers store data in different formats in their systems. While the data from one manufacturer's NMS can be in XML format, another vendor's system can be in CSV format, and both need to be transformed into the same format as used in NMS systems. For this reason, the NMS had to be developed further to be able to export data from different manufacturers in a similar data format. However, this was costly and required time-consuming effort for telecommunication operators. Instead of relying on NMS, we collected data directly using SNMP and Transaction Language 1 (TL1) protocols from the devices. To make the data collected in different formats more meaningful, a data preprocessing stage is applied, which requires further effort, to submit the data to the HUBBLE database system in a single format.

Nonmatching inventory information:
Another problem that was encountered during the HUBBLE system building process was that the circuit/service numbers held on the NMS and GIS were not consistent.
Because all systems had their own coding methods, inventory information could not be matched with the correct circuit numbers automatically. For this reason, the inventory records that could not be matched were updated manually. The integration of the HUBBLE system into the fault management system has created an important opportunity for telecommunication operators. Thanks to this integration, fault management and failure tickets are opened jointly for the devices of different manufacturers. In a normal operational process, a notification is sent to the fault management system via the NMS of each manufacturer, so a fault that involves devices of two manufacturers needs two separate tickets. With HUBBLE, these two tickets are merged into a single ticket carrying the malfunctioning system information. At the same time, it has also become possible to update Internet Protocol (IP) and Synchronous Digital Hierarchy (SDH) service information carried over DWDM systems while processing the circuit information on NMS using the inventory system.
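The merging of per-vendor notifications into a single ticket can be sketched as a grouping step keyed on the affected link. This is an illustrative assumption about how such merging might work, not the platform's actual ticketing interface. The dictionary keys (`link`, `vendor`, `detail`) and the function name `merge_tickets` are hypothetical.

```python
from collections import defaultdict


def merge_tickets(notifications):
    """Group per-vendor notifications into one ticket per affected link."""
    by_link = defaultdict(list)
    for note in notifications:
        by_link[note["link"]].append(note)
    return [
        {
            "link": link,
            "vendors": sorted({n["vendor"] for n in notes}),
            "details": [n["detail"] for n in notes],
        }
        for link, notes in by_link.items()
    ]


# Two made-up notifications about the same link, raised by two vendors' NMS.
notifications = [
    {"link": "A-B", "vendor": "Vendor A", "detail": "high span attenuation"},
    {"link": "A-B", "vendor": "Vendor B", "detail": "low receive power"},
]
tickets = merge_tickets(notifications)
```

Grouping by link only works once circuit identifiers are consistent across NMS and GIS, which is why the inventory mismatch described above had to be resolved first.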
In summary, after building the HUBBLE platform, we can collect data from the equipment/NMS of different vendors, view the status of all DWDM networks on a single web page, calculate the kilometric attenuation of the fiber optic cables to obtain priority values, check NE problems automatically via customized scripts, and integrate ticket creation and fiber map systems. Moreover, the HUBBLE platform provides a common user interface for equipment from different vendors that collects data from the DWDM networks. An observed operational benefit of HUBBLE for the service provider is that the telecommunication operator's fiber and transmission operation teams can now manage the optical network with high accuracy whenever the fiber attenuation increases in some links of the network. Another observed benefit of HUBBLE is preventive fault management in DWDM networks, which provides an easy way to share know-how about DWDM problems with the existing on-site technical teams. Our final observation was an indirect improvement in speech quality and data throughput, which has greatly benefited the service experience of the telecommunication operator's end-users.

Conclusions and future work
In this paper, we introduced a new calculation methodology for joint fiber and DWDM link quality evaluation using the proposed HUBBLE platform and presented the working principles of the system. Our analysis results rely on the fact that both fiber quality and DWDM alarm severity levels need to be used to measure the severity of problems in DWDM deployments and to trigger the fault management server to open trouble tickets when undesired conditions occur. Through the dashboard of the HUBBLE platform, the telecommunication operator's network optimization experts can easily detect problems related to either fiber or DWDM links. Appropriate actions can then be taken in cooperation with NMS units thanks to the integration with the operator's network fault management systems. At the end of the paper, we described the benefits, challenges, and opportunities of automation in DWDM networks, giving low-level details of an implementation of automation for the management of optical networks that will carry high capacity data for new generation wireless networks. Our future work involves extending the automation tool for the currently deployed DWDM networks into a root cause analysis platform for fiber cable problems. Additionally, we are planning to perform link quality analysis by integrating machine learning techniques into our proposed system.