Naming, Not Shaming

Before this year’s 30-Day Map Challenge kicked off, I saw a post from someone cautioning creators to be a bit more sensitive to others when it comes to portraying “bad” maps. Folks just starting out on their cartographic journeys are going to make some truly awful maps. I know I did (and still do!).

While I do think one absolutely needs to develop a thick skin to survive/thrive in this fairly horrible timeline, I also think there’s nothing wrong with caring enough about your fellow humans to think twice before tossing something out into the world.

Some super funny “bad” maps have been created that come nowhere near accidentally shaming anyone, such as a perfectly fine world map titled “All Disneyland Locations In 1230 B.C.” with, of course, no points on it. I can totally get behind those cartographic dad jokes.

Plus, how you interpret “bad” for these challenges is totally up to you.

This year, I went with a tried-and-true, well-worn theme: “places in the U.S. that have names starting with ‘bad’”.

Tech Used

  • DuckDB
  • ObservableJS in Quarto
    • Observable Plot
    • Observable Inputs
    • Arquero

What’s In a [Geo]Name?

That which we call a map by any other name would look as sweet.

With the theme picked, I needed names. So off to Geonames we go to retrieve the US.zip (direct d/l) file.

These main geo-files have a basic format:

  • geonameid: integer id of record in geonames database
  • name: name of geographical point (utf8) varchar(200)
  • asciiname: name of geographical point in plain ascii characters, varchar(200)
  • alternatenames: alternatenames, comma separated, ascii names automatically transliterated, convenience attribute from alternatename table, varchar(10000)
  • latitude: latitude in decimal degrees (wgs84)
  • longitude: longitude in decimal degrees (wgs84)
  • feature class: see http://www.geonames.org/export/codes.html, char(1)
  • feature code: see http://www.geonames.org/export/codes.html, varchar(10)
  • country code: ISO-3166 2-letter country code, 2 characters
  • cc2: alternate country codes, comma separated, ISO-3166 2-letter country code, 200 characters
  • admin1 code: fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20)
  • admin2 code: code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80)
  • admin3 code: code for third level administrative division, varchar(20)
  • admin4 code: code for fourth level administrative division, varchar(20)
  • population: bigint (8 byte int)
  • elevation: in meters, integer
  • dem: digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90m x 90m) or 30''x30'' (ca 900m x 900m) area in meters, integer. srtm processed by cgiar/ciat.
  • timezone: the iana timezone id (see file timeZone.txt) varchar(40)
  • modification date: date of last modification in yyyy-MM-dd format
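
For reference, that field list maps one-to-one onto an explicit DuckDB column spec. Here’s a minimal sketch of reading the headerless US.txt that way (this is not the loader I actually used, which is below; the snake_case column names and file path are my own choices):

-- sketch: read the headerless Geonames TSV with an explicit column spec
CREATE TABLE names AS
  SELECT * FROM read_csv(
    'US.txt',
    delim = '\t',
    header = false,
    quote = '',  -- Geonames data is tab-delimited and (in my experience) unquoted
    columns = {
      'geonameid': 'BIGINT',
      'name': 'VARCHAR',
      'asciiname': 'VARCHAR',
      'alternatenames': 'VARCHAR',
      'latitude': 'DOUBLE',
      'longitude': 'DOUBLE',
      'feature_class': 'VARCHAR',
      'feature_code': 'VARCHAR',
      'country_code': 'VARCHAR',
      'cc2': 'VARCHAR',
      'admin1_code': 'VARCHAR',
      'admin2_code': 'VARCHAR',
      'admin3_code': 'VARCHAR',
      'admin4_code': 'VARCHAR',
      'population': 'BIGINT',
      'elevation': 'INTEGER',
      'dem': 'INTEGER',
      'timezone': 'VARCHAR',
      'modification_date': 'DATE'
    }
  );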

There are lots of ways to turn that into some form of shapefile we can use directly rather than relying on the janky Plot.dot hack I did on Day 1. I decided to play with DuckDB’s spatial extension since I don’t get to do that too often.

I first used a helper script to load TSV data into DuckDB. I’m used to working with giant files, and this script (below) pre-builds a schema from a sample of the first thousand rows of a TSV file, then loads the whole thing into DuckDB.

#!/bin/bash
# TSV to DuckDB
# TODO make `sep` an input parameter
# usage: ./tsv2duckdb.sh bigdata.tsv duckdbfilename tablename

set -e

TSV_FILE="${1}"
DUCKDB_FILENAME="${2}"
TABLE_NAME="${3}"

TMPDIR=$(mktemp -d)
TMPTSV="${TMPDIR}/TSV.tsv"

# sample the header plus the first 1,000 rows for schema inference
head -n 1001 "${TSV_FILE}" > "${TMPTSV}"

# create the table from the sample so type sniffing stays fast
duckdb "${DUCKDB_FILENAME}" <<EOF
.mode csv
.sep '\t'
CREATE TABLE ${TABLE_NAME} AS
  SELECT * FROM read_csv_auto('${TMPTSV}', HEADER=TRUE);
EOF
rm "${TMPTSV}"

# drop the sampled rows, then bulk-load the full file into the inferred schema
duckdb "${DUCKDB_FILENAME}" <<EOF
.mode csv
.sep '\t'
DELETE FROM ${TABLE_NAME};
INSERT INTO ${TABLE_NAME}
  SELECT * FROM read_csv_auto('${TSV_FILE}', HEADER=TRUE);
EOF

# print the final row count as a sanity check
duckdb "${DUCKDB_FILENAME}" <<EOF
SELECT
  COUNT(*) AS row_count
FROM ${TABLE_NAME};
.quit
EOF

The Geonames files are small enough that I could have just queried the TSV directly, but now I have U.S. geonames in a super handy format.
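Had I gone the direct route, it’s basically a one-liner (a sketch, assuming the unzipped US.txt is in the working directory):

-- query the raw TSV without loading it into a database first
SELECT COUNT(*) AS n
FROM read_csv_auto('US.txt', delim = '\t', header = false);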

We can export a GeoJSON of all the “bad” locations from DuckDB with a tiny bit of SQL:

INSTALL spatial;
LOAD spatial;

COPY (
  SELECT
    ST_Point(
      longitude,
      latitude
    ) AS geometry,
    asciiname AS place,
    admin1_code AS state,
    geonameid
  FROM
    geonames.names
  WHERE
    country_code = 'US' AND
    starts_with(lower(asciiname), 'bad')
)
TO 'bad-us-places.json'
WITH (FORMAT GDAL, driver 'GeoJSON', layer_creation_options 'WRITE_BBOX=YES');
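
With spatial loaded, the export is easy to sanity-check by reading it straight back (a quick sketch; ST_Read goes through the same GDAL layer the COPY did):

-- round-trip check on the exported GeoJSON
SELECT place, state, geonameid
FROM ST_Read('bad-us-places.json')
LIMIT 5;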

The rest of the code is in the hidden OJS block.

Plot = require('@observablehq/plot')
Inputs = require('@observablehq/inputs')

Plot.plot({
  className: "bad",
  projection: "albers-usa",
  title: "'Bad'dest Places In The 🇺🇲",
  subtitle: "Use the search bar/table to find and select 'Bad' places to highlight them.",
  caption: md`Data source: [Geonames](https://www.geonames.org/advanced-search.html?q=bad&country=US&continentCode=NA&startRow=150)`,
  width: width,
  style: {
        backgroundColor: "transparent",
    },
  color: {
    scheme: "reds"
  },
  marks: [
    Plot.geo(states, {
      fill: (d) => badCounts[d.id],
      stroke: "#142933",
      strokeWidth: 0.33
    }),
    Plot.geo(selectedPlaces, {
      fill: "yellow",
      strokeWidth: 1,
      stroke: "black"
    }),
    Plot.geo(nation, {
      stroke: "#999",
      strokeWidth: 0.1
    })
  ]
})
viewof badsearch = Inputs.search(badTblData, { label: "Place lookup:" })

viewof badSel = Inputs.table(badsearch, {
  columns: ["place", "state", "geonameid"],
  header: {
    place: "Place",
    state: "State",
    geonameid: "Geoname"
  },
  width: {
    place: 300,
    state: 40,
    geonameid: 60
  },
  maxWidth: 600,
  format: {
    geonameid: (d) => html`<a href="https://www.geonames.org/${d}">${d}</a>`
  }
})

stateTrans = JSON.parse(
  '{"AL":"01","AK":"02","AZ":"04","AR":"05","CA":"06","CO":"08","CT":"09","DE":"10","DC":"11","FL":"12","GA":"13","HI":"15","ID":"16","IL":"17","IN":"18","IA":"19","KS":"20","KY":"21","LA":"22","ME":"23","MD":"24","MA":"25","MI":"26","MN":"27","MS":"28","MO":"29","MT":"30","NE":"31","NV":"32","NH":"33","NJ":"34","NM":"35","NY":"36","NC":"37","ND":"38","OH":"39","OK":"40","OR":"41","PA":"42","RI":"44","SC":"45","SD":"46","TN":"47","TX":"48","UT":"49","VT":"50","VA":"51","WA":"53","WV":"54","WI":"55","WY":"56"}'
)

fipsTrans = JSON.parse(
  '{"01":"Alabama","02":"Alaska","04":"Arizona","05":"Arkansas","06":"California","08":"Colorado","09":"Connecticut","10":"Delaware","11":"District of Columbia","12":"Florida","13":"Georgia","15":"Hawaii","16":"Idaho","17":"Illinois","18":"Indiana","19":"Iowa","20":"Kansas","21":"Kentucky","22":"Louisiana","23":"Maine","24":"Maryland","25":"Massachusetts","26":"Michigan","27":"Minnesota","28":"Mississippi","29":"Missouri","30":"Montana","31":"Nebraska","32":"Nevada","33":"New Hampshire","34":"New Jersey","35":"New Mexico","36":"New York","37":"North Carolina","38":"North Dakota","39":"Ohio","40":"Oklahoma","41":"Oregon","42":"Pennsylvania","44":"Rhode Island","45":"South Carolina","46":"South Dakota","47":"Tennessee","48":"Texas","49":"Utah","50":"Vermont","51":"Virginia","53":"Washington","54":"West Virginia","55":"Wisconsin","56":"Wyoming"}'
)

badCounts = aq
  .from(
    badUSPlaces.features.map((d) => ({
      id: stateTrans[d.properties.state]
    }))
  )
  .groupby("id")
  .rollup({
    ct: (d) => aq.op.count()
  })
  .ungroup()
  .objects()
  .reduce((bc, d) => {
    bc[d.id] = d.ct;
    return bc;
  }, {})

selGeo = badSel.map((d) => d.geonameid)

badTblData = badUSPlaces.features.map((d) => ({
  place: d.properties.place,
  state: d.properties.state,
  geonameid: d.properties.geonameid
}))

selectedPlaces = ({
  type: "FeatureCollection",
  name: badUSPlaces.name,
  bbox: badUSPlaces.bbox,
  features: badUSPlaces.features.filter((d) =>
    selGeo.includes(d.properties.geonameid)
  )
})

badUSPlaces = FileAttachment("static/data/bad-us-places.json").json()
nation = topojson.feature(us, us.objects.nation)
states = topojson.feature(us, us.objects.states).features
statemesh = topojson.mesh(us, us.objects.states)
us = FileAttachment("static/data/us-counties-10m.json").json()
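
For what it’s worth, the per-state tally that badCounts computes client-side could just as easily have been pre-computed in DuckDB with the same predicate as the export (a sketch against the table from earlier):

-- count 'bad' places per state, right in DuckDB
SELECT
  admin1_code AS state,
  COUNT(*) AS ct
FROM geonames.names
WHERE
  country_code = 'US' AND
  starts_with(lower(asciiname), 'bad')
GROUP BY admin1_code
ORDER BY ct DESC;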

FIN

As an aside, I learned that I totally can import the latest Plot/Inputs/etc. from a CDN in-Qmd. That makes it possible to do everything in this Qmd rather than embedding an Observable notebook like I did for Day 3.