โ˜ฐ โš blog ยป โš‘ this post

๐Ÿข Building taxonomy

• version: public • created: Nov 2024


Fig 1. Simplified building taxonomy of selected cities.


Introduction

Plot simplified building taxonomies across selected cities using PyData, QuackOSM, and Metaflow.

#30DayMapChallenge | Day 3: Polygons | 2024 (see: "30DayMapChallenge" official GitHub repo ⎋)

Recently I saw a post by Daniel Gorokhov illustrating a taxonomy of "interesting buildings" in Milano, Italy; Stockholm, Sweden; and Amsterdam, Netherlands. Here is my extension of that post, with more cities, mostly in the USA but also in Europe.

The chart type is a "small multiple" (a.k.a. "grid chart"). How was it made? I ran a script that loaded all mapped buildings within each city's area and counted the exterior vertices (nodes) of every building. Inner rings, such as courtyards or patios, were not counted. Then I took the one hundred buildings with the most nodes and plotted them.

  1. Under this methodology, the "interesting" buildings are often repetitive because of how they are built (for example, neighborhoods of apartment buildings or high-rise housing projects, and industrial structures such as water treatment ponds, fuel storage tanks, and dockside cranes).
  2. Also, as expected, some of the actually most complex building structures aren't mapped well. They would require more detailed, case-by-case processing to handle relationships, connections, and "multi-buildings" instead of such a quick, broad-brush approach (for example, see the Kaseya Center arena ⎋ and its eastern covered walkways and sub-buildings ⎋ in Miami, FL).
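The counting and ranking method described above can be illustrated without any GIS dependencies. Below is a minimal sketch, assuming building footprints are given as GeoJSON-like dicts keyed by a building id. The input format and the helper names `count_exterior_vertices` and `top_n_buildings` are hypothetical, not the post's code, which works on shapely geometries in a GeoDataFrame. As with shapely's `exterior.coords`, each GeoJSON ring repeats its first coordinate, so a square counts as 5.

```python
import heapq

# Hypothetical input format: building id -> GeoJSON-like geometry dict.
# Each ring repeats its first coordinate at the end (a square has 5 coordinates).


def count_exterior_vertices(geometry):
    """Count exterior-ring coordinates of a Polygon/MultiPolygon; inner rings are ignored."""
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        return 0  # points, lines, etc. are not counted
    # In each polygon, ring 0 is the exterior; rings 1.. (courtyards, patios) are skipped
    return sum(len(polygon[0]) for polygon in polygons)


def top_n_buildings(buildings, n=100):
    """Return ids of the n buildings with the most exterior vertices, most complex first."""
    return heapq.nlargest(n, buildings, key=lambda b_id: count_exterior_vertices(buildings[b_id]))


# Toy footprints: a square (5 coords), an L-shape (7 coords), and two squares (10 coords)
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2), (0, 0)]
buildings = {
    "square": {"type": "Polygon", "coordinates": [square]},
    "l_shape": {"type": "Polygon", "coordinates": [l_shape]},
    "two_squares": {"type": "MultiPolygon", "coordinates": [[square], [square]]},
}

print(top_n_buildings(buildings, n=2))  # ['two_squares', 'l_shape']
```

`heapq.nlargest` returns the winners most complex first, while `argsort()[-100:]` in the flow below returns the same set (up to ties) in ascending order; only the panel order differs.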

  • --
  • Made with #Python
  • Data: #OSM, #OpenStreetMap retrieved with QuackOSM ⎋.
  • #dataviz #datavisualization
  • Inspiration: "building taxonomy of Milano, Stockholm and Amsterdam" ⎋ by Daniel Gorokhov ⎋

    Further work: "Visualizing a building shapes taxonomy - The City Summit 🏙️🗻 project" ⎋ by Kamil Raczycki ⎋


    ๐Ÿ—๏ธ How-to: Make this product


    Fig. 1 🛠️ Processing graph (DAG), made with Metaflow by Netflix.

    imports (a.k.a. dependencies)

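```python
# Metaflow: pipeline orchestration (flow, steps, parameters) and the notebook runner
import metaflow
from metaflow import FlowSpec, Parameter, JSONType, step, NBRunner, pypi

# QuackOSM: geocoding and fast extraction of OpenStreetMap features
import quackosm as qosm

import time
import datetime

# Numerics, geodata and plotting
import numpy as np
import geopandas as gpd
import matplotlib
import matplotlib.pyplot as plt
```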

    define auxiliary functions (a.k.a. utils)

```python
def _func_count_vertices(row_index, row_values_series):
    """
    Count exterior-ring vertices of a building footprint.

    Only exterior rings are counted; inner rings (courtyards, patios) are ignored.
    Shapely repeats the first coordinate to close a ring, so a square yields 5.

    See:
        * https://gis.stackexchange.com/questions/328884/counting-number-of-vertices-in-geopandas
        * https://gis.stackexchange.com/questions/388606/counting-vertices-and-adding-it-as-number-to-column-using-shapely
    """
    geometry = row_values_series.geometry
    geom_type = geometry.geom_type

    n_vertices = 0
    if geom_type == "Polygon":
        n_vertices = len(geometry.exterior.coords)
    elif geom_type == "MultiPolygon":
        # Sum the exterior rings of all parts
        for inner_geometry in geometry.geoms:
            n_vertices += len(inner_geometry.exterior.coords)
    # Other geometry types (e.g. Point, LineString) keep 0 vertices

    return n_vertices
```

    define pipeline (a.k.a. flow)

```python
class QuackOSM_Flow(FlowSpec):
    """
    A flow where Metaflow parses OSM data, processes the raw data, and plots a chart.
    """
    osm_place_name = Parameter(name='osm_place_name', help='Name of the place in the OSM database', default="Detroit, Michigan, USA")
    osm_tags_filter = Parameter(name='osm_tags_filter', help='OSM tags filter used by QuackOSM to extract geometries', default={'building': True}, type=JSONType)

    @step
    def start(self):
        """
        This is the 'start' step. All flows must have a step named 'start' that
        is the first step in the flow.
        """
        self.next(self.get_place_polygon)

    @step
    def get_place_polygon(self):
        """
        Geocode the place name into its boundary geometry.
        """
        print(f" - EXTRACT - polygon - {self.osm_place_name}")
        self.area_geometry = qosm.geocode_to_geometry(self.osm_place_name)

        self.next(self.get_features_from_place)

    @step
    def get_features_from_place(self):
        """
        Load all OSM features inside the area that match the tags filter.
        """
        print(f" - EXTRACT - features - {self.osm_tags_filter}")
        self.features_df = qosm.convert_geometry_to_geodataframe(self.area_geometry,
                                                                 tags_filter=self.osm_tags_filter,
                                                                 verbosity_mode="verbose")
        print(f"* num of features: {len(self.features_df)}")

        self.next(self.preprocess_features)

    @step
    def preprocess_features(self):
        """
        Count exterior vertices per building and keep the 100 buildings with the most vertices.
        """
        print(" - PROCESSING - computation - `_func_count_vertices`")
        vertices_counter_aux = []
        for row_index, row_values_series in self.features_df.iterrows():
            vertices_counter_aux.append(_func_count_vertices(row_index, row_values_series))
        # argsort is ascending, so the last 100 positions hold the most complex buildings
        filter_indexes = np.array(vertices_counter_aux).argsort()[-100:]
        self.top_100_interesting_buildings = self.features_df.iloc[filter_indexes]

        self.next(self.plot_chart)

    @step
    def plot_chart(self):
        """
        Plot the small-multiple chart and save it as SVG and PNG.
        """
        import os
        import highlight_text

        print(" - PLOT - plot chart")
        FONTSIZE = 16
        N_OBJECTS = len(self.top_100_interesting_buildings)
        NCOLS = 10
        NROWS = -(-N_OBJECTS // NCOLS)  # ceiling division: rows needed to fit all panels
        fig, axes = plt.subplots(nrows=NROWS, ncols=NCOLS, figsize=(22, NROWS * 2), dpi=150, facecolor='#FFFFF0')
        axes = axes.flatten()

        for indx, (row_index, row_values_series) in enumerate(self.top_100_interesting_buildings.iterrows()):
            geometry = row_values_series.geometry
            geom_type = geometry.geom_type
            if geom_type == "Polygon":
                axes[indx].plot(*geometry.exterior.xy, alpha=0.98, color="black")
            elif geom_type == "MultiPolygon":
                # https://gis.stackexchange.com/questions/354184/showing-only-the-external-boundary-of-a-geopandas-dataframe
                gds_to_plot = gpd.GeoSeries(geometry, crs=self.top_100_interesting_buildings.crs)
                gds_to_plot.boundary.plot(alpha=0.98, color="black", ax=axes[indx])
            else:
                print(row_index, geom_type)

            axes[indx].margins(x=0.00, y=0.00)

        # Hide axes on every panel, including unused panels in the last row
        for axis in axes:
            axis.axis("off")

        # Title: the text inside <...> is styled by highlight_textprops
        highlight_text.fig_text(fig=fig, s="<Building taxonomy>",
                                x=0.5, y=1.00, fontsize=round(FONTSIZE * 5),
                                highlight_textprops=[{"fontweight": "bold"},],
                                ha='center', va="top")
        # Subtitle: the first two parts of the place name, e.g. "Detroit, Michigan"
        highlight_text.fig_text(fig=fig, s=f"<{', '.join(self.osm_place_name.split(', ')[0:2])}>",
                                x=0.5, y=0.935, fontsize=round(FONTSIZE * 3),
                                highlight_textprops=[{"fontweight": "bold", "color": "black", "alpha": 0.95,}],
                                ha='center', va="top")
        highlight_text.fig_text(fig=fig, s="Made with pure python by Witold1.github.io • Sources: OSM • Retrieved: Nov 2024 • #30DayMapChallenge, Day 3: Polygons",
                                x=0.5, y=0.07, fontsize=round(FONTSIZE),
                                color="black",
                                ha='center', va="top")

        os.makedirs("img", exist_ok=True)  # savefig fails if the output folder is missing
        fig.savefig(f"img/Building taxonomy of {self.osm_place_name}.svg", format='svg', bbox_inches='tight', dpi=150, pad_inches=1.5)
        fig.savefig(f"img/Building taxonomy of {self.osm_place_name}.png", format='png', bbox_inches='tight', dpi=150, pad_inches=1.5)
        plt.close(fig)  # free memory when the flow runs for many cities

        self.next(self.end)

    @step
    def end(self):
        """
        This is the 'end' step. All flows must have an 'end' step, which is the
        last step in the flow.
        """
        pass
```
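In `plot_chart`, `NROWS` is a ceiling division of the number of buildings by `NCOLS`, and `axes.flatten()` lays the panels out row by row. A minimal pure-Python sketch of that grid arithmetic; the helper names `grid_shape`, `panel_position` and `empty_panels` are hypothetical and not part of the flow:

```python
def grid_shape(n_objects, ncols=10):
    """Return (nrows, ncols) for a small-multiple grid holding n_objects panels."""
    # Ceiling division: same as math.ceil(n_objects / ncols) for non-negative integers
    nrows = -(-n_objects // ncols)
    return nrows, ncols


def panel_position(flat_index, ncols=10):
    """Map an index into axes.flatten() (row-major order) to its (row, col) cell."""
    return divmod(flat_index, ncols)


def empty_panels(n_objects, ncols=10):
    """Number of unused panels in the last row (left blank once axis('off') is applied)."""
    nrows, ncols = grid_shape(n_objects, ncols)
    return nrows * ncols - n_objects


for n in (100, 95, 7):
    print(n, grid_shape(n), empty_panels(n))
# 100 (10, 10) 0
# 95 (10, 10) 5
# 7 (1, 10) 3
print(panel_position(37))  # (3, 7)
```

With 100 buildings the grid is exactly 10 × 10; a city with fewer mapped buildings simply gets a shorter grid with a few blank panels at the end.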

    run pipeline (in notebook cell ⎋)

```python
cities_list_test = ['Detroit, Michigan, USA', 'Miami, Florida', 'Seattle, Washington',
                    'Cleveland, Ohio', 'Pittsburgh, Pennsylvania', 'Indianapolis, Indiana, USA',
                    'Philadelphia, Pennsylvania, United States', 'Baltimore, Maryland',
                    'Paris, France', 'Moscow, Russia'
                    ]
# One flow run per city: geocode, extract buildings, rank them, and save the chart
for place in cities_list_test:
    run = NBRunner(QuackOSM_Flow, pylint=False).nbrun(osm_place_name=place,
                                                      osm_tags_filter={'building': True})
```


    Media presence:
    LinkedIn post ⎋ • Kaggle notebook ⎋ • imgur storage ⎋ • "Data Visualization Picks / Nov 17, 2024" by BI Bites ⎋, weekly news on dataviz and BI



    ★ How to cite this work

    Visualization "Building taxonomy" (2024), Vitaliy Y from Witold's Data Consulting
        https://witold1.github.io/blog/posts/small-project-building-taxonomy/post
