Naming, Not Shaming

Before this year’s 30-Day Map Challenge kicked off, I saw a post from someone cautioning creators to be a bit more sensitive to others when it comes to portraying “bad” maps. Folks just starting out on their cartographic journeys are going to make some truly awful maps. I know I did (and still do!).

While I do think one absolutely needs to develop a thick skin to survive/thrive in this fairly horrible timeline, I also think there’s nothing wrong with caring enough about your fellow humans to think twice before tossing something out into the world.

Some super funny “bad” maps have been created that come nowhere near accidentally shaming anyone, such as a perfectly fine world map titled “All Disneyland Locations In 1230 B.C.” with, of course, no points on it. I can totally get behind those cartographic dad jokes.

Plus, how you interpret “bad” for these challenges is totally up to you.

This year, I went with a tried-and-true, well-worn theme: “places in the U.S. that have names starting with ‘bad’”.

Tech Used

  • DuckDB
  • ObservableJS in Quarto
    • Observable Plot
    • Observable Inputs
    • Arquero

What’s In a [Geo]Name?

That which we call a map by any other name would look as sweet.

With the theme picked, I needed names. So off to Geonames we go to retrieve the US.zip (direct d/l) file.

These main geo-files have a basic format:

  • geonameid: integer id of record in geonames database
  • name: name of geographical point (utf8) varchar(200)
  • asciiname: name of geographical point in plain ascii characters, varchar(200)
  • alternatenames: alternatenames, comma separated, ascii names automatically transliterated, convenience attribute from alternatename table, varchar(10000)
  • latitude: latitude in decimal degrees (wgs84)
  • longitude: longitude in decimal degrees (wgs84)
  • feature class: see http://www.geonames.org/export/codes.html, char(1)
  • feature code: see http://www.geonames.org/export/codes.html, varchar(10)
  • country code: ISO-3166 2-letter country code, 2 characters
  • cc2: alternate country codes, comma separated, ISO-3166 2-letter country code, 200 characters
  • admin1 code: fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20)
  • admin2 code: code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80)
  • admin3 code: code for third level administrative division, varchar(20)
  • admin4 code: code for fourth level administrative division, varchar(20)
  • population: bigint (8 byte int)
  • elevation: in meters, integer
  • dem: digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90m x 90m) or 30''x30'' (ca 900m x 900m) area in meters, integer. srtm processed by cgiar/ciat.
  • timezone: the iana timezone id (see file timeZone.txt) varchar(40)
  • modification date: date of last modification in yyyy-MM-dd format
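
For reference, that field list maps one-to-one onto an explicit DuckDB column spec. Here’s a minimal sketch of reading the headerless US.txt that way (this is not the loader I actually used, which is below; the snake_case column names and file path are my own choices):

-- sketch: read the headerless Geonames TSV with an explicit column spec
CREATE TABLE names AS
  SELECT * FROM read_csv(
    'US.txt',
    delim = '\t',
    header = false,
    quote = '',  -- Geonames data is tab-delimited and (in my experience) unquoted
    columns = {
      'geonameid': 'BIGINT',
      'name': 'VARCHAR',
      'asciiname': 'VARCHAR',
      'alternatenames': 'VARCHAR',
      'latitude': 'DOUBLE',
      'longitude': 'DOUBLE',
      'feature_class': 'VARCHAR',
      'feature_code': 'VARCHAR',
      'country_code': 'VARCHAR',
      'cc2': 'VARCHAR',
      'admin1_code': 'VARCHAR',
      'admin2_code': 'VARCHAR',
      'admin3_code': 'VARCHAR',
      'admin4_code': 'VARCHAR',
      'population': 'BIGINT',
      'elevation': 'INTEGER',
      'dem': 'INTEGER',
      'timezone': 'VARCHAR',
      'modification_date': 'DATE'
    }
  );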

There are lots of ways to turn that into some form of shapefile we can use directly rather than relying on the janky Plot.dot hack I did on Day 1. I decided to play with DuckDB’s spatial extension since I don’t get to do that too often.

I first used a helper script to load TSV data into DuckDB. I’m used to working with giant files, and this script (below) pre-builds a schema from a sample of the first thousand rows of a TSV file, then loads the whole thing into DuckDB.

#!/bin/bash
# TSV to DuckDB
# TODO make `sep` an input parameter
# usage: ./tsv2duckdb.sh bigdata.tsv duckdbfilename tablename

set -e

TSV_FILE="${1}"
DUCKDB_FILENAME="${2}"
TABLE_NAME="${3}"

TMPDIR=$(mktemp -d)
TMPTSV="${TMPDIR}/TSV.tsv"

# sample the header plus the first 1,000 rows for schema inference
head -n 1001 "${TSV_FILE}" > "${TMPTSV}"

# create the table from the sample so type sniffing stays fast
duckdb "${DUCKDB_FILENAME}" <<EOF
.mode csv
.sep '\t'
CREATE TABLE ${TABLE_NAME} AS
  SELECT * FROM read_csv_auto('${TMPTSV}', HEADER=TRUE);
EOF
rm "${TMPTSV}"

# drop the sampled rows, then bulk-load the full file into the inferred schema
duckdb "${DUCKDB_FILENAME}" <<EOF
.mode csv
.sep '\t'
DELETE FROM ${TABLE_NAME};
INSERT INTO ${TABLE_NAME}
  SELECT * FROM read_csv_auto('${TSV_FILE}', HEADER=TRUE);
EOF

# print the final row count as a sanity check
duckdb "${DUCKDB_FILENAME}" <<EOF
SELECT
  COUNT(*) AS row_count
FROM ${TABLE_NAME};
.quit
EOF

The Geonames files are small enough that I could have just queried the TSV directly, but now I have U.S. geonames in a super handy format.
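Had I gone the direct route, it’s basically a one-liner (a sketch, assuming the unzipped US.txt is in the working directory):

-- query the raw TSV without loading it into a database first
SELECT COUNT(*) AS n
FROM read_csv_auto('US.txt', delim = '\t', header = false);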

We can export a GeoJSON of all the “bad” locations from DuckDB with a tiny bit of SQL:

INSTALL spatial;
LOAD spatial;

COPY (
  SELECT
    ST_Point(
      longitude,
      latitude
    ) AS geometry,
    asciiname AS place,
    admin1_code AS state,
    geonameid
  FROM
    geonames.names
  WHERE
    country_code = 'US' AND
    starts_with(lower(asciiname), 'bad')
)
TO 'bad-us-places.json'
WITH (FORMAT GDAL, driver 'GeoJSON', layer_creation_options 'WRITE_BBOX=YES');
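
With spatial loaded, the export is easy to sanity-check by reading it straight back (a quick sketch; ST_Read goes through the same GDAL layer the COPY did):

-- round-trip check on the exported GeoJSON
SELECT place, state, geonameid
FROM ST_Read('bad-us-places.json')
LIMIT 5;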

The rest of the code is in the hidden OJS block.

Plot = require('@observablehq/plot')
Inputs = require('@observablehq/inputs')

Plot.plot({
  className: "bad",
  projection: "albers-usa",
  title: "'Bad'dest Places In The 🇺🇲",
  subtitle: "Use the search bar/table to find and select 'Bad' places to highlight them.",
  caption: md`Data source: [Geonames](https://www.geonames.org/advanced-search.html?q=bad&country=US&continentCode=NA&startRow=150)`,
  width: width,
  style: {
        backgroundColor: "transparent",
    },
  color: {
    scheme: "reds"
  },
  marks: [
    Plot.geo(states, {
      fill: (d) => badCounts[d.id],
      stroke: "#142933",
      strokeWidth: 0.33
    }),
    Plot.geo(selectedPlaces, {
      fill: "yellow",
      strokeWidth: 1,
      stroke: "black"
    }),
    Plot.geo(nation, {
      stroke: "#999",
      strokeWidth: 0.1
    })
  ]
})
viewof badsearch = Inputs.search(badTblData, { label: "Place lookup:" })

viewof badSel = Inputs.table(badsearch, {
  columns: ["place", "state", "geonameid"],
  header: {
    place: "Place",
    state: "State",
    geonameid: "Geoname"
  },
  width: {
    place: 300,
    state: 40,
    geonameid: 60
  },
  maxWidth: 600,
  format: {
    geonameid: (d) => html`<a href="https://www.geonames.org/${d}">${d}</a>`
  }
})

stateTrans = JSON.parse(
  '{"AL":"01","AK":"02","AZ":"04","AR":"05","CA":"06","CO":"08","CT":"09","DE":"10","DC":"11","FL":"12","GA":"13","HI":"15","ID":"16","IL":"17","IN":"18","IA":"19","KS":"20","KY":"21","LA":"22","ME":"23","MD":"24","MA":"25","MI":"26","MN":"27","MS":"28","MO":"29","MT":"30","NE":"31","NV":"32","NH":"33","NJ":"34","NM":"35","NY":"36","NC":"37","ND":"38","OH":"39","OK":"40","OR":"41","PA":"42","RI":"44","SC":"45","SD":"46","TN":"47","TX":"48","UT":"49","VT":"50","VA":"51","WA":"53","WV":"54","WI":"55","WY":"56"}'
)

fipsTrans = JSON.parse(
  '{"01":"Alabama","02":"Alaska","04":"Arizona","05":"Arkansas","06":"California","08":"Colorado","09":"Connecticut","10":"Delaware","11":"District of Columbia","12":"Florida","13":"Georgia","15":"Hawaii","16":"Idaho","17":"Illinois","18":"Indiana","19":"Iowa","20":"Kansas","21":"Kentucky","22":"Louisiana","23":"Maine","24":"Maryland","25":"Massachusetts","26":"Michigan","27":"Minnesota","28":"Mississippi","29":"Missouri","30":"Montana","31":"Nebraska","32":"Nevada","33":"New Hampshire","34":"New Jersey","35":"New Mexico","36":"New York","37":"North Carolina","38":"North Dakota","39":"Ohio","40":"Oklahoma","41":"Oregon","42":"Pennsylvania","44":"Rhode Island","45":"South Carolina","46":"South Dakota","47":"Tennessee","48":"Texas","49":"Utah","50":"Vermont","51":"Virginia","53":"Washington","54":"West Virginia","55":"Wisconsin","56":"Wyoming"}'
)

badCounts = aq
  .from(
    badUSPlaces.features.map((d) => ({
      id: stateTrans[d.properties.state]
    }))
  )
  .groupby("id")
  .rollup({
    ct: (d) => aq.op.count()
  })
  .ungroup()
  .objects()
  .reduce((bc, d) => {
    bc[d.id] = d.ct;
    return bc;
  }, {})

selGeo = badSel.map((d) => d.geonameid)

badTblData = badUSPlaces.features.map((d) => ({
  place: d.properties.place,
  state: d.properties.state,
  geonameid: d.properties.geonameid
}))

selectedPlaces = ({
  type: "FeatureCollection",
  name: badUSPlaces.name,
  bbox: badUSPlaces.bbox,
  features: badUSPlaces.features.filter((d) =>
    selGeo.includes(d.properties.geonameid)
  )
})

badUSPlaces = FileAttachment("static/data/bad-us-places.json").json()
nation = topojson.feature(us, us.objects.nation)
states = topojson.feature(us, us.objects.states).features
statemesh = topojson.mesh(us, us.objects.states)
us = FileAttachment("static/data/us-counties-10m.json").json()
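
For what it’s worth, the per-state tally that badCounts computes client-side could just as easily have been pre-computed in DuckDB with the same predicate as the export (a sketch against the table from earlier):

-- count 'bad' places per state, right in DuckDB
SELECT
  admin1_code AS state,
  COUNT(*) AS ct
FROM geonames.names
WHERE
  country_code = 'US' AND
  starts_with(lower(asciiname), 'bad')
GROUP BY admin1_code
ORDER BY ct DESC;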

FIN

As an aside, I learned that I totally can import the latest Plot/Inputs/etc. from a CDN in-Qmd. That makes it possible to do everything in this Qmd rather than embedding an Observable notebook like I did for Day 3.