A RULE-BASED APPROACH FOR GENERATING URBAN FOOTPRINT MAPS: FROM ROAD NETWORK TO URBAN FOOTPRINT

Decision makers and policymakers need urban footprint data to monitor human impact on the urban ecosystem for policies and services. Deriving the urban footprint is challenging because its borders change rapidly. Existing methods that derive urban footprint maps from raster images involve several steps, such as determining indicators and image classification parameters. These steps make the process operator-dependent because they require human decisions. This paper proposes a new rule-based approach for obtaining the urban footprint, based on Delaunay triangulation among selected road centroids and dead-ends of streets. The selection criterion is a maximum road length, determined by the standard deviation of road lengths. To produce urban footprints, the method needs no data or information other than road network geometry; that is, it uses only intrinsic indicators and measures. The experimental study was conducted with OpenStreetMap road data of Washington DC, Madrid, Stockholm, and Wellington. Comparisons with authority data indicate that the proposed method is sufficient in many parts of urban and suburban lands.


INTRODUCTION
Natural and artificial entities on the Earth's surface differ from each other with regard to the living area of humans. While natural entities comprise habitats such as forests, mountains, and seas, an artificial construction may be a tribal temple in a village or a skyscraper, highway, or train station in a megacity. However, the differences between natural and artificial entities are not always distinct, since urbanised and suburbanised lands contain several semi-natural and semi-artificial areas such as parks, open spaces, zoos, dams, or farmland. For this reason, land use/cover processes classify the Earth's surface into urban, suburban, green open space, forest, dam, and/or sea patterns. A land-use map thematically designed to represent the urbanised area is called an "urban footprint map". The urban footprint represents the border of human impact on the land surface. Determining urban footprints in a specified land surface is challenging since human activity varies from villages to metropolitan cities. Some measurements used for producing spatially significant urban footprint maps are shared with urban sprawl measurements. Many methods in the literature use remote sensing imagery as the major data source for analysing and predicting urban growth, with several classifications and indicators (Musa et al. 2017; Karakuş et al. 2017; Canaz Sevgen, 2019). Tsai (2005) developed a set of quantitative variables (i.e. metropolitan size, density, degree of equal distribution and clustering) to characterise urban forms at the municipal level. Angel et al. (2007) Bhatta et al. (2010) described spatial metrics used to quantify urban sprawl, with examples such as area-density-edge metrics, shape metrics, isolation-proximity metrics, contrast metrics, contagion-interspersion metrics, connectivity metrics, and diversity metrics. They claimed that the measurement of sprawl from remote sensing data is still in its research domain. Jiang et al. (2007) measured urban sprawl in terms of spatial configuration, urban growth efficiency, and external impacts. They developed a geo-spatial index system for measuring sprawl, with a total of 13 indicators. For this study, they used many different data sources, including land use maps, former land use planning, land price and floor-area-ratio samples, digitised maps of the highways and city centres, and population and statistical data. Triantakonstantis and Stathakis (2015) used spatial indicators to calculate urban morphological properties such as shape, aggregation, compactness, and dispersion. They applied the indicators to urban areas in order to measure urban sprawl in 24 European countries. Esch et al. (2013) presented a fully automated processing system for the worldwide delineation of human settlements based on synthetic aperture radar (TanDEM-X). They assessed the high potential of the TanDEM-X data and the proposed urban footprint processor to provide highly accurate geo-data for improved global mapping of human settlements.
The entropy method is one of the most widely used techniques for measuring the extent of urban sprawl by integrating remote sensing and geographic information systems (GIS) (Kumar et al. 2007; Bhatta et al. 2010). Kumar et al. (2007) used buffer zones with Shannon entropy to determine the spatial concentration or dispersion of built-up land. They integrated the observations with the road network to check the influence of infrastructure on haphazard urban growth.
Road networks were also used to generate urban maps. Owen and Wong (2013) used road networks, shape, terrain geomorphology, texture and dominant settlement materials (vegetation, soil, asphalt) to distinguish informal neighbourhoods from formal ones in developing countries. They used dangling nodes to determine the ratio of connected nodes. A high value in the ratio implies that roads are better connected, and multiple routes between endpoints exist. Liu and Jiang (2011) stated that the dangling lines inside a block define the field as low residential density. They claim that the longer the dangling roads, the more sprawling the block.
The Triangulated Irregular Network (TIN) has been used in many scientific works in cartography, including contour generation, generalisation, surface models, and spatial analysis (Gökgöz, 2005; Yang et al. 2005; Kang et al. 2015). Semboloni (2000) proposed a growth model based on Delaunay triangles and the road network. The model, based on cellular automata, operates within a lattice whose essential elements are cells and roads that differ in size and form; the dynamic system functions by changing the state of the cells and generating new cells and roads.
This paper proposes a method for obtaining the urban footprint based on the characteristics of road networks. The following section first presents the study area and data. Then, the rule-based method is explained step by step. In section 3, the proposed method is implemented in four capital cities around the world (Washington DC, Stockholm, Madrid, and Wellington), and the results are given. Section 4 evaluates the results of the study by comparing them with authority data. Finally, the last section discusses the advantages and disadvantages of the method.

Study Area and Data
This study was conducted using OpenStreetMap (OSM) road data of Washington DC, Madrid, Stockholm, and Wellington (Fig. 1). Capital cities in different parts of the world were chosen as study areas to reveal the efficiency of the proposed method in different metropolitan cities. They contain urban and suburban areas, rural lands, coasts, open spaces, and forests.
Volunteers from all around the world contribute OSM road data. Some studies were conducted to determine the behaviour of the contributors and the quality of OSM data (Neis and Zipf, 2012;Koukoletsos et al. 2012;Corcoran et al. 2013;Zhao et al. 2015;Hacar et al. 2018). They show that the accuracy of the road data is sufficient for most of the GIS applications.
Authority land-use data are also used to evaluate the accuracy of the study. The OSM road data and the authority data were projected into a specific coordinate system for each study area so that the same measurement units apply (Table 1).
Some statistics of each study area need to be compared and explained before the experiment. While Fig. 1 visually suggests that Washington DC and Stockholm have the most urbanised patterns per city area, Table 2 gives the same information quantitatively. In Table 2, the road density (length per area) is highest in Washington DC and Stockholm, so these cities are expected to have fewer open places. The density is moderate to high in Madrid, so it may have more open places or forests. Wellington, however, is expected to have many open places or forests since its density is very low. Besides, the length per road may give some information about the road characteristics. For instance, a lower value means the city may have more residential area, since urban lands have shorter roads than rural lands. Under this assumption, none of the cities differs markedly from the others. Moreover, Table 2

The Proposed Method: from Road Network to Urban Footprint
Transportation is essential in any dynamic human land (Polat et al. 2017). Urban and suburban areas tend to have relatively short road lines (streets) since residential roads are relatively short. The proposed method relies on several geometric indicators of road lines, such as road length, triangle edge length, and polygon area. It starts by selecting road lines shorter than a threshold. The threshold aims to separate the residential roads from all roads for use in the rest of the process. The standard deviation (σ) of road lengths was pragmatically used as the threshold since it is expressed in the same unit as the lengths and reflects the square root of the variance of the lengths. The method uses the TIN model generated from the centroids and dead-ends of the selected roads to depict the urban land. The fusion of the adjacent triangles generates the urban polygons (Fig. 2). The proposed method is referred to hereafter as "RNUF" (the initial letters of Road Network to Urban Footprint). A series of geometric rules with threshold values makes RNUF an easy-to-use method. Steps 1-8 are explained below.
Step 1: RNUF starts by eliminating the topological inconsistencies of the input road network (pre-processing). After σ is calculated from the lengths of the raw OSM roads, RNUF eliminates duplicated roads and generates a topologically structured road network by merging continuous roads and splitting them at intersections.

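Step 2: The road lines shorter than σ are selected for the rest of the process:
$$l_r < \sigma \tag{1}$$
where $l_r$ represents the lengths of the road lines. In Fig. 3(a), red road lines represent the lines that are longer than σ.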
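The selection in Step 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `select_short_roads` and the sample lengths are hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the text does not specify which is used.

```python
from statistics import pstdev


def select_short_roads(road_lengths):
    """Return (sigma, indices of roads whose length is below sigma)."""
    # Assumption: sigma is the population standard deviation of all road lengths.
    sigma = pstdev(road_lengths)
    selected = [i for i, length in enumerate(road_lengths) if length < sigma]
    return sigma, selected


# Hypothetical road lengths in metres: five short streets and one long highway.
lengths_m = [80.0, 120.0, 95.0, 150.0, 2400.0, 60.0]
sigma, selected = select_short_roads(lengths_m)
print(round(sigma, 1), selected)
```

In this toy input the single long road inflates σ, so every short street passes the test and only the long road is rejected.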
Step 3: Dangling points are used as parameters for determining urban or suburban areas (Owen and Wong, 2013; Liu and Jiang, 2011). In this study, dangling points are generated on the road lines that have dead-ends. Centroids are generated from the rest of the road lines using Eq. (2):
$$x_c = \frac{1}{S}\sum_{i=1}^{w} l_i x_i, \qquad y_c = \frac{1}{S}\sum_{i=1}^{w} l_i y_i \tag{2}$$
where $x_c$ and $y_c$ are the centroid coordinates of a line, $l_i$ is the length of segment i of the line, $x_i$ and $y_i$ are the midpoint coordinates of segment i, w is the total number of segments, and S is the total length of the line.
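The centroid described in Step 3 is a length-weighted mean of the segment midpoints. The sketch below assumes a road line is given as a list of (x, y) vertices in a projected coordinate system; the function name `line_centroid` is hypothetical.

```python
from math import hypot


def line_centroid(vertices):
    """Length-weighted centroid of a polyline given as [(x, y), ...]."""
    total_length = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:]):
        seg_len = hypot(x2 - x1, y2 - y1)      # length of segment i
        sum_x += seg_len * (x1 + x2) / 2.0     # weighted midpoint x
        sum_y += seg_len * (y1 + y2) / 2.0     # weighted midpoint y
        total_length += seg_len                # accumulates S
    if total_length == 0.0:
        raise ValueError("polyline has zero length")
    return sum_x / total_length, sum_y / total_length


# An L-shaped line: a 4-unit segment followed by a 2-unit segment.
centroid = line_centroid([(0.0, 0.0), (4.0, 0.0), (4.0, 2.0)])
print(centroid)
```

Note that the centroid of a bent line does not necessarily lie on the line itself, which is acceptable here because the point only serves as a TIN vertex.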
Step 4: Delaunay triangulation is generated using dangling points and centroids. In this stage, optionally, a city boundary file can be used to get a better result by constraining the TIN.
Step 5: The TIN model is updated by deleting the triangles that have at least one edge longer than σ (Fig. 3(b)). This criterion helps to eliminate the big triangles that represent non-urban areas between distant neighbourhoods, as well as the triangles connecting neighbourhoods to distant surrounding districts. A triangle is therefore retained only if each of its edges satisfies Eq. (3):
$$l_t < \sigma \tag{3}$$
where $l_t$ represents the lengths of the triangle edges.
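The edge-length rule of Step 5 can be illustrated as follows. The sketch assumes the triangulation is already available as triples of vertex indices (for example from a Delaunay routine, which is not shown); `filter_triangles` and the sample data are hypothetical.

```python
from math import dist


def filter_triangles(points, triangles, sigma):
    """Keep triangles whose three edges are all shorter than sigma."""
    kept = []
    for a, b, c in triangles:
        edges = (
            dist(points[a], points[b]),
            dist(points[b], points[c]),
            dist(points[c], points[a]),
        )
        if max(edges) < sigma:  # one long edge is enough to reject
            kept.append((a, b, c))
    return kept


# Three nearby points and one distant point; the second triangle spans the gap.
points = [(0, 0), (1, 0), (0, 1), (10, 10)]
triangles = [(0, 1, 2), (1, 3, 2)]
kept = filter_triangles(points, triangles, sigma=2.0)
print(kept)
```

Only the compact triangle survives; the one that reaches out to the distant point is removed because two of its edges exceed the threshold.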

Figure 3. Dangling points (red) and centroids (blue), (a) valid road lines (black) and not valid road lines (red), (b) valid triangles (yellow) and not valid triangles (light red).
Step 6: All adjacent triangles are merged, and their common edges are fused. Isolated triangles and merged triangles represent the preliminary footprints (Fig. 4(a)). Many small polygons and holes are generated during this step.
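One way to picture the merging in Step 6 is to group the retained triangles into connected sets. The sketch below uses a union-find structure and assumes that "adjacent" means sharing a common edge (triangles touching only at a vertex stay separate); it returns groups of triangle indices rather than polygon geometry, and `merge_adjacent_triangles` is a hypothetical name. A GIS dissolve operation would typically be used in practice.

```python
from collections import defaultdict


def merge_adjacent_triangles(triangles):
    """Group triangles that share an edge; return lists of triangle indices."""
    parent = list(range(len(triangles)))

    def find(i):
        # Find the group representative, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edge_owner = {}
    for t, (a, b, c) in enumerate(triangles):
        for u, v in ((a, b), (b, c), (c, a)):
            edge = (min(u, v), max(u, v))  # undirected edge key
            if edge in edge_owner:
                parent[find(t)] = find(edge_owner[edge])  # fuse the two groups
            else:
                edge_owner[edge] = t

    groups = defaultdict(list)
    for t in range(len(triangles)):
        groups[find(t)].append(t)
    return sorted(groups.values())


# Triangles 0 and 1 share edge (1, 2); triangle 2 is isolated.
triangles = [(0, 1, 2), (1, 3, 2), (4, 5, 6)]
groups = merge_adjacent_triangles(triangles)
print(groups)
```

Each resulting group corresponds to one preliminary footprint polygon; an isolated triangle forms a group of its own.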
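Step 7: The polygons and holes whose area is smaller than the threshold σ² are deleted. A polygon or hole is therefore retained only if it satisfies Eq. (4):
$$F_p \ge \sigma^2, \qquad F_h \ge \sigma^2 \tag{4}$$
where $F_p$ is the polygon area size and $F_h$ is the hole size inside any polygon. In Fig. 4, the south-east polygon is eliminated due to its area size.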
Step 8: Observations on a variety of road networks and suburban lands showed that even small neighbourhoods have at least three streets with dead-ends. This criterion is the last rule of RNUF:
$$n_d \ge 3 \tag{5}$$
where $n_d$ is the number of dangling points in a polygon. The polygons that have at least three dangling points are retained to generate the urban footprint (Fig. 4(b)); the others are eliminated. The rule-based approach thus retains the polygons representing the urban and suburban footprint.
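The last two rules can be combined into a single check per merged polygon. The sketch below is a simplification under stated assumptions: a polygon is represented by its triangles (vertex-index triples), its area is the sum of the triangle areas, dangling points are counted only when they are vertices of those triangles, and holes are not handled. The name `keep_footprint` and the sample data are hypothetical.

```python
def triangle_area(p, q, r):
    """Area of a triangle from the cross product of two edge vectors."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def keep_footprint(points, group, dangling_ids, sigma, min_dangling=3):
    """Apply the area rule (Step 7) and the dangling-point rule (Step 8)."""
    area = sum(triangle_area(points[a], points[b], points[c]) for a, b, c in group)
    if area < sigma ** 2:
        return False  # polygon smaller than sigma squared
    vertices = {v for tri in group for v in tri}
    # Assumption: only dangling points that are TIN vertices of this polygon count.
    return len(vertices & dangling_ids) >= min_dangling


# A 4 x 4 square built from two triangles (area 16).
points = [(0, 0), (4, 0), (0, 4), (4, 4)]
group = [(0, 1, 2), (1, 3, 2)]
print(keep_footprint(points, group, {0, 1, 3}, sigma=3.0))  # large enough, 3 dead-ends
print(keep_footprint(points, group, {0, 1}, sigma=3.0))     # only 2 dead-ends
print(keep_footprint(points, group, {0, 1, 3}, sigma=5.0))  # area below sigma squared
```

Keeping the minimum number of dangling points as a parameter makes it easy to experiment with the value of three proposed in Step 8.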

EXPERIMENT AND RESULTS
The proposed method was tested with OSM data from four capital cities: Washington DC, Madrid, Stockholm, and Wellington. Since the road network of each city has its own geometric and topological characteristics, σ took different values (Table 3). Besides, the road data had topological inconsistencies, such as continuous lines at junctions, since the raw structure consists of node and way geometries with their relations. RNUF automatically dealt with such situations by splitting lines at connections. As mentioned in section 2.2, OSM road data are topologically structured by RNUF. Therefore, the statistics for the study areas were computed using the topologically structured roads. The results for each study area are shown in Table 4. Using the σ values, RNUF selected more than 95% of the roads. These values were also used for the selection of significant triangles. If this criterion were not applied, the small settlements (Fig. 5) would have many triangles connecting them to distant surrounding settlements. As shown in Fig. 5, red zones represent the significant triangles, whose edges are all shorter than the related σ. After all the steps were conducted, RNUF generated the urban footprints of the study areas, as shown in Fig. 6.

EVALUATION OF THE RESULTS
The urban footprint maps (UF) generated by RNUF were evaluated using the authority data (AD) as the reference to determine the accuracy in each study area (Fig. 7). Determining the accuracy from the overlapped areas between AD and UF is essential (Eq. (6)), but it is not sufficient if UF covers a very large area that contains and exceeds the whole reference data. For instance, the accuracy is highest (94%) in Wellington; however, UF covered an area approximately two times larger than the area of AD (Tables 5 and 6). This means that half of the UF did not overlap with AD. This case shows that the completeness of UF is as essential a measure as its accuracy. To determine the completeness, Eq. (7) might be used. The F-measure can be used to quantify the balance between accuracy and completeness (Samal et al. 2004; Song et al. 2011; Akbulut et al. 2018; Hacar and Gökgöz, 2019).
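As a small worked example of how the F-measure balances the two quantities, the sketch below assumes the standard harmonic-mean form of the F-measure; the exact formulation used in this study is not reproduced here, so the result is illustrative only. The inputs are the accuracy (94%) and completeness (69%) reported for Wellington in this section.

```python
def f_measure(accuracy, completeness):
    """Harmonic mean of accuracy and completeness (fractions in [0, 1])."""
    if accuracy + completeness == 0:
        return 0.0
    return 2 * accuracy * completeness / (accuracy + completeness)


# Wellington: accuracy 94%, completeness 69% (values reported in the text).
wellington = f_measure(0.94, 0.69)
print(round(wellington, 3))
```

Because the harmonic mean is pulled towards the lower of the two inputs, a high accuracy cannot compensate for a low completeness, which is the behaviour the evaluation relies on.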
International Journal of Engineering and Geosciences (IJEG), Vol. 5, Issue 2, pp. 100-108, 2020

FUF values are close to FAD in all study areas except Wellington (Table 5), since Wellington had the highest tolerances for both the linear and polygonal thresholds (σ and σ², respectively). Besides, as seen in Fig. 8, in Madrid and Wellington, FAD and FUF are much smaller than the areas of their city boundaries (FCA). As a result, these cities have larger proportions of rural land than the other cities. In contrast, it can be inferred that Washington and Stockholm have larger proportions of urbanised land. These two cities also had similar accuracy, completeness, and F-measure values (Table 6) (Fig. 8). Furthermore, the accuracy and completeness had close values in Madrid, but in Wellington they were very different. The completeness (69%) in Wellington is very low since half of the UF did not overlap with AD (Table 5). Dangling points are generally located in the cities of developing countries, in suburban lands, and, in metropolitan cities of developed countries, where height differences are high. Since the study data come from metropolitan cities, the density of the dangling points might show how the urban settlements are formed. Wellington had the highest density of dangling points (dangling points per road) (Table 4). In other words, it has the largest number of dead-ends in its urban lands. Therefore, it has more suburban lands and/or very high height differences. The dead-end statistics and the accuracy measures indicate a relationship between the landforms/proportions and the confidence of RNUF.

CONCLUSION
Producing urban footprint maps with RNUF is an easy and efficient approach for small- and medium-scale maps that represent an area such as a district, city, or state. It is not recommended for large-scale maps since its accuracy depends on the accuracy, date, and scale of the input network data. If the input data are not up to date, the footprint may not represent new small settlements. However, this situation may only affect the extent of the related local neighbourhood in large-scale studies, not the global result in medium- and small-scale studies. In this study, the experiment depended on the completeness of the OSM road data. One can conduct a completeness study of the source data to obtain current urban footprints, since the more complete the data, the more up-to-date the footprints that can be generated. However, choosing the most consistent data source is not among the aims of this paper. This study highlights the possible use of road networks for extracting urban lands.
Conventional ways of generating urban footprint maps require many data (especially raster data) and indicators; in contrast, RNUF works independently of aerial views of landscapes and surface models. It works with only the road network as the input data, and the parameters used are calculated from the network by easy-to-use GIS methods. RNUF has defined parameters, such as the minimum number of dangling nodes, and data-dependent thresholds, such as the standard deviation value. Users take advantage of these intrinsic values without having to search for complex indicators of urban sprawl like population, density, vegetation ratio, or geomorphology. They may also integrate such indicators into RNUF as additional rules with their own thresholds.
RNUF handled the four cases satisfactorily, with an F-measure of about 80% or higher in each case. This suggests that RNUF can be used as an alternative to methods that use remote sensing data. The experiments also show that landforms/proportions may decrease the confidence of RNUF.
Future work is required to determine the thresholds in accordance with the exact scale of the resulting map and with the confidence of RNUF. An additional study may also be conducted for the last step of RNUF to measure how many dead-end streets a neighbourhood covers with regard to the urban patterns.