Visualizing the Census

Cathy Moran Hajo, Ramapo College

This assignment was designed as an in-class lab in which students worked with Tableau, word clouds, and Canva infographics to understand the process of data visualization. We used a computer lab with Tableau Public installed.

Goal

Use data from the Mahwah Census, 1860-1940, to practice using data visualization tools to make sense of data trends.

Tools

Data

We discuss how to obtain and manipulate data. I chose to use the census because its fields are fairly simple to understand, though there were still data-cleaning issues to resolve.

The Census fields we have to work with are listed below. Fields marked with an asterisk (*) do not appear in every census year; in general, there is more information in the years after 1900:

  • Year — We have 1860, 1870, 1880, 1900, 1910, 1920, 1930, and 1940. The 1890 Census was lost in a fire.
  • Surname  — Surnames were recorded by the census taker and then transcribed by volunteers. There may be errors.
  • Given name — Given names were recorded by the census taker and then transcribed by volunteers. There may be errors.
  • Gender — Most entries are either Male or Female.
  • Race — Note that race was recorded by the census taker and was not necessarily self-reported. Most entries are White, Black, Mulatto, or Indian.
  • Age — Census takers asked how old each person was on the date the census was taken. People often rounded their ages to numbers ending in 0 or 5, so you may find the same person in two censuses 10 years apart whose ages do not differ by 10.
  • Year of Birth — This is a calculated field (Year of Census – Age).
  • Birthplace — For American-born individuals, the state is listed; for foreign-born individuals, the country is listed. Note that in order to map these places, we need to convert them to simplified country and state codes.
  • Relationship — The census records households. Each household starts with a "Head", and everyone listed after is described in relation to that head (Wife, Daughter, Son-in-law, Boarder). When the next "Head" appears on the list, a new household begins.
  • Marital Status — Here you will find Married, Single, and Widowed.
  • Father’s Birthplace* — For American-born individuals, the state is listed; for foreign-born individuals the country is listed. Note that in order to map these places we need to convert them to simplified country and state codes.
  • Mother’s Birthplace* — For American-born individuals, the state is listed; for foreign-born individuals the country is listed. Note that in order to map these places we need to convert them to simplified country and state codes.
  • Occupation* — Occupations were not recorded uniformly, so if you want to use this field, some editing may be needed.
  • Industry* — The industry associated with each occupation was not always recorded, and when it was, it was not recorded consistently. Some editing of this field may be needed.
  • Street Name* — Street names were entered sporadically; in many cases the roads were unnamed.
  • Street Number* — Street numbers were not entered consistently and may not have been assigned.
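Two of the fields above are easy to derive or check outside Tableau. The following Python sketch assumes each census record is a dictionary with hypothetical keys `year`, `age`, and `relationship` (the actual column names in the data sets may differ). It computes Year of Birth as Year of Census minus Age, numbers households by starting a new one at each "Head", and measures how many ages end in 0 or 5 as a rough check for rounding.

```python
from __future__ import annotations

from typing import Optional


def year_of_birth(census_year: int, age: Optional[int]) -> Optional[int]:
    """Approximate birth year as Year of Census - Age.

    The result can be off by one: someone who had not yet had a birthday
    in the census year was born in the previous calendar year.
    """
    if age is None:
        return None
    return census_year - age


def assign_households(records: list[dict]) -> list[dict]:
    """Return copies of the records with a household number added.

    Records must stay in the original enumeration order. A new household
    starts each time the relationship field is "Head"; any records before
    the first "Head" get household 0.
    """
    household = 0
    numbered = []
    for record in records:
        relationship = (record.get("relationship") or "").strip().lower()
        if relationship == "head":
            household += 1
        numbered.append({**record, "household": household})
    return numbered


def share_ending_in_0_or_5(ages: list[int]) -> float:
    """Fraction of ages ending in 0 or 5.

    If final digits were spread evenly this would be about 0.2, so a much
    higher value suggests that ages were rounded.
    """
    if not ages:
        return 0.0
    heaped = sum(1 for a in ages if a % 5 == 0)
    return heaped / len(ages)
```

Because household numbering depends only on row order, the spreadsheet should not be re-sorted before running something like `assign_households`.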

Data sets

HoHoKusCensus-1860-1940: This data set contains more fields, but country and state names have not been regularized. You can use these fields, but you cannot map them.

1860_1940simplifiedbirthplace: This data set contains fewer fields, but the birthplaces have been converted to codes that Tableau can read. In some cases, places that no longer exist had to be folded into larger places (e.g., Austria-Poland and Russia-Poland).
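Converting birthplaces to mappable codes amounts to a lookup table from each transcribed spelling to a simplified place name. The Python sketch below is hypothetical: the spelling variants, the state abbreviations, and the choice to fold Austria-Poland and Russia-Poland into Poland are assumptions for illustration, not the mapping actually used to build 1860_1940simplifiedbirthplace. Unmatched values come back as None so they can be reviewed by hand instead of being guessed.

```python
# Hypothetical lookup table: keys are lower-cased transcriptions,
# values are the simplified place names used for mapping.
BIRTHPLACE_CODES = {
    "new jersey": "NJ",
    "n.j.": "NJ",
    "new york": "NY",
    "n.y.": "NY",
    "ireland": "Ireland",
    "germany": "Germany",
    # Places that no longer exist are folded into a larger place.
    # This particular folding is an assumption for illustration only.
    "austria-poland": "Poland",
    "russia-poland": "Poland",
}


def simplify_birthplace(raw):
    """Return the simplified code for a transcribed birthplace, or None."""
    if raw is None:
        return None
    # Normalize case and collapse stray whitespace before the lookup.
    key = " ".join(raw.strip().lower().split())
    return BIRTHPLACE_CODES.get(key)


def unmatched(places):
    """List the distinct birthplaces that still need a mapping entry."""
    return sorted({p for p in places if p and simplify_birthplace(p) is None})
```

Running `unmatched` over the whole Birthplace column gives a to-do list of spellings to add to the table, which is one way to work through the cleaning step iteratively.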

Tableau Visualizations

Mahwah Population by Race, 1860-1940.

Start by dragging the Year field to the Columns placeholder, or to the top of the workspace. You should see the census years appear on your worksheet.

Then drag the Race field to the Rows placeholder. You will see a breakdown of the races in the data. To get a count of people rather than a list of values, click on Race and change its Measure to Count.

Also drag Race to the Detail mark, which tells the visualization to show each race separately. You can also drag Race to the Color mark so that each race appears in its own color.

You can filter the Race field to remove bad data: there is a value consisting of just an "I", as well as a Null value. Uncheck those to remove them from the chart.
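The chart built in these steps is a count of records for each combination of census year and race, with the bad values filtered out. A minimal Python sketch of the same tally, assuming records are dictionaries with hypothetical `year` and `race` keys and that a Null shows up as None or an empty string:

```python
from collections import Counter

# The stray "I" value noted above; Null values are skipped separately.
EXCLUDED_RACES = {"I"}


def count_by_year_and_race(records):
    """Count records per (year, race), skipping Null and excluded values."""
    counts = Counter()
    for r in records:
        race = (r.get("race") or "").strip()
        if not race or race in EXCLUDED_RACES:
            continue  # mirrors unchecking "I" and Null in the filter
        counts[(r["year"], race)] += 1
    return counts


def year_totals(counts):
    """Total population per census year, computed from the (year, race) counts."""
    totals = Counter()
    for (year, _race), n in counts.items():
        totals[year] += n
    return dict(sorted(totals.items()))
```

The per-year totals are the overall population figures that the first discussion question asks about.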

Discussion here focuses on two things:

  • Look at the population figures generally. Can you explain the decline and rise based on what we know about the history of Mahwah? What happened between 1880 and 1910?
  • Look at the terminology used to describe people. Census takers stopped using "Mulatto" in 1910 and started using "Indian" in 1920. Discuss how the census taker, rather than the subject, defined race.

Mahwah Population by State of Birth, 1860-1940.
Mahwah Immigrant Population Origin, 1860-1940.

The vast majority of Mahwah residents were born in New Jersey and New York. To trace immigration, we exclude those born in the United States and look at the change in national origin.

Dragging the measure Number of Records to the Color mark uses color to show which countries have the most people. Tableau automatically changes the map display to color the countries.

Create a new worksheet and use the field Birth State. Drag it to the center to create a map, and drag the measure Number of Records to the Color mark. Again, the map is skewed because the vast majority of residents were born in New Jersey, so remove New Jersey with a filter. Rename this sheet Birth States US.
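Both maps follow the same pattern: count Number of Records for each place, then filter out the dominant place so the rest of the map shows variation. A Python sketch of that pattern, assuming hypothetical `year`, `birth_country`, and `birth_state` keys (the simplified data set's actual column names and codes may differ):

```python
from collections import Counter


def counts_by_place(records, field, exclude=(), year=None):
    """Count records per value of `field`, leaving out excluded places.

    These counts are what each map shades with the Color mark. If `year`
    is given, only that census year is counted, which makes it possible
    to compare national origins across years.
    """
    excluded = set(exclude)
    counts = Counter()
    for r in records:
        if year is not None and r.get("year") != year:
            continue
        place = r.get(field)
        if place and place not in excluded:  # also skips missing values
            counts[place] += 1
    return counts


# Hypothetical usage; field names and codes are assumptions:
# immigrants_1910 = counts_by_place(rows, "birth_country",
#                                   exclude={"United States"}, year=1910)
# us_states = counts_by_place(rows, "birth_state", exclude={"NJ"})
```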

Word Cloud Visualizations

We also look at other ways to visualize data. Word clouds are fairly basic, but a word cloud of place of origin, built from the same data, gives a good impression of how the town changed. Word clouds use the size of each word to show how often it appears, a simple visualization technique that most people can understand.
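Under the hood, a word cloud needs only a frequency for each term and a rule that turns frequency into font size. A minimal Python sketch follows; linear scaling between a hypothetical minimum and maximum point size is one common choice, not necessarily what any particular word-cloud tool does.

```python
from collections import Counter


def word_sizes(words, min_size=10, max_size=60):
    """Map each word to a font size that grows with its frequency.

    The most frequent word gets max_size and the least frequent gets
    min_size; others are scaled linearly in between. If every word is
    equally common, all of them get max_size.
    """
    counts = Counter(w for w in words if w)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for word, n in counts.items():
        if span == 0:
            sizes[word] = max_size
        else:
            sizes[word] = min_size + (n - low) * (max_size - min_size) / span
    return sizes
```

Feeding in the birthplaces of one census year at a time, for example 1910 and then 1940, produces the two sets of sizes that the word clouds compare.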

A word cloud of the place of birth for Mahwah residents in 1910.
A word cloud of place of birth for Mahwah residents in 1940 shows far more variety.

Canva Infographics

Canva is a graphic design site that makes it easy to design appealing infographics. Students create a simplified version of what they learned from exploring the census data, conveying the information quickly and with style.