Installing the U.S. Census API in R

The U.S. Census Bureau American Community Survey (ACS) application programming interface (API) is a valuable tool for quickly adding census data to your R code for analysis. It’s an efficient way to analyze data in R without having to pull, format and organize census tables into a CSV file or spreadsheet.

To use the census API, you will need to apply for a census API key, which takes only a few minutes. You will then need to install the key using the function census_api_key from the tidycensus package.

library(tidycensus)  # provides census_api_key() and get_acs()
census_api_key('<key>', install=TRUE)

Once the key is installed, it does not need to be installed again unless it expires or you apply for a new key. To apply, go to the U.S. Census Bureau website and search for the API.
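To confirm that an installed key is visible to the current R session, you can check the CENSUS_API_KEY environment variable, which is where tidycensus stores the key when install=TRUE is used. This is a minimal sketch; the helper name has_census_key is my own and not part of any package.

```r
# Hypothetical helper: TRUE when a Census API key is visible to this session.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

if (!has_census_key()) {
  message("No key found. Install one with census_api_key(), then restart R ",
          "or run readRenviron('~/.Renviron').")
}
```

A freshly installed key is only picked up after .Renviron is re-read, which is why the message suggests restarting R.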

The get_acs R function lets you pull multiple years and categories of survey data for demographic characteristics including race, household income, marital status, employment, etc. It’s a powerful feature for analyzing survey and census data in tidy format.

library(dplyr)  # provides %>%, select(), filter(), group_by() and summarize()

ACS_2010 <- get_acs("state", variables="S1702_C02_001", year=2010, output="tidy", geometry=TRUE) %>%
  select(-moe)

ACS_2011 <- get_acs("state", variables="S1702_C02_001", year=2011, output="tidy", geometry=TRUE) %>%
  select(-moe)

ACS_2012 <- get_acs("state", variables="S1702_C02_001", year=2012, output="tidy", geometry=TRUE) %>%
  select(-moe)
  
ACS_2013 <- get_acs("state", variables="S1702_C02_001", year=2013, output="tidy", geometry=TRUE) %>%
  select(-moe)

ACS_2014 <- get_acs("state", variables="S1702_C02_001", year=2014, output="tidy", geometry=TRUE) %>%
  select(-moe)

ACS_2015 <- get_acs("state", variables="S1702_C02_001", year=2015, output="tidy", geometry=TRUE) %>%
  select(-moe)

ACS_2016 <- get_acs("state", variables="S1702_C02_001", year=2016, output="tidy", geometry=TRUE) %>%
  select(-moe)

ACS_2017 <- get_acs("state", variables="S1702_C02_001", year=2017, output="tidy", geometry=TRUE) %>%
  select(-moe)

ACS_S1702_C01_001_Family <- get_acs("county", variables="S1702_C01_001", output="tidy", geometry=TRUE) %>%
  select(-moe)
  
ACS_S1702_C01_013_Household_Work <-  get_acs("county", variables="S1702_C01_013", output="tidy", geometry=TRUE) %>%
  select(-moe)

ACS_household_work <-  get_acs("county", variables="S1702_C01_013", output="tidy", geometry=TRUE) %>%
  select(-moe)
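The year-by-year calls above differ only in the year argument, so they could also be generated in a loop. The following is a minimal sketch rather than a definitive implementation: the helper fetch_acs_years and its fetch parameter are hypothetical names of my own. The fetch parameter defaults to tidycensus::get_acs and can be swapped for a stub to try the loop without a network connection.

```r
# Hypothetical helper: download one ACS variable for several years.
# `fetch` defaults to tidycensus::get_acs; pass a stub to test offline.
fetch_acs_years <- function(years, variable = "S1702_C02_001",
                            geography = "state",
                            fetch = tidycensus::get_acs) {
  results <- lapply(years, function(yr) {
    acs <- fetch(geography = geography, variables = variable, year = yr,
                 output = "tidy", geometry = TRUE)
    # Drop the margin-of-error column, as the individual calls do.
    acs[, setdiff(names(acs), "moe"), drop = FALSE]
  })
  names(results) <- paste0("ACS_", years)
  results
}

# Usage (needs an installed API key and a network connection):
# acs_by_year <- fetch_acs_years(2010:2017)
# acs_by_year$ACS_2010
```

Keeping the results in a named list means later steps can be applied to every year with lapply instead of being copied eight times.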

ACS variables can be passed as a named vector so that each one gets a descriptive name, which makes classification easier. Each variable is identified by an ID (for example, S1702_C01_018, S1702_C01_019, etc.).


 ACS_Education <- get_acs("county", variables= c(nohighschool = "S1702_C01_018", 
                                                  highschool = "S1702_C01_019",
                                                 somecollege = "S1702_C01_020", 
                                                 collegeplus = "S1702_C01_021"),
                         output="tidy", geometry=TRUE) %>%
  select(-moe)
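When variables is a named vector like the one above, the tidy output shows the names (nohighschool, highschool, etc.) in the variable column instead of the raw IDs. If you need to go back from an ID to its descriptive name, the same vector can be inverted in base R. A small sketch, where education_vars and label_for are illustrative names of my own:

```r
# The same named vector passed to get_acs() above.
education_vars <- c(nohighschool = "S1702_C01_018",
                    highschool   = "S1702_C01_019",
                    somecollege  = "S1702_C01_020",
                    collegeplus  = "S1702_C01_021")

# Reverse lookup: variable ID -> descriptive name.
label_for <- setNames(names(education_vars), education_vars)

label_for[["S1702_C01_019"]]           # "highschool"
unname(education_vars["collegeplus"])  # "S1702_C01_021"
```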

You can also choose whether or not to keep the margin of error (moe). Visualizations can be made with the usual R ggplot2 library or drawn as maps. All ACS records include variable IDs that categorize the survey data.

ACS_geo_2010 <- ACS_2010 %>%
  select('GEOID','NAME','variable','estimate','geometry') %>%
  filter(variable=='S1702_C02_001') %>%
  group_by(GEOID, NAME) %>%
  summarize(estimate = sum(estimate))


ACS_geo_2011 <- ACS_2011 %>%
  select('GEOID','NAME','variable','estimate','geometry') %>%
  filter(variable=='S1702_C02_001') %>%
  group_by(GEOID, NAME) %>%
  summarize(estimate = sum(estimate)) 

ACS_geo_2012 <- ACS_2012 %>%
  select('GEOID','NAME','variable','estimate','geometry') %>%
  filter(variable=='S1702_C02_001') %>%
  group_by(GEOID, NAME) %>%
  summarize(estimate = sum(estimate)) 

ACS_geo_2013 <- ACS_2013 %>%
  select('GEOID','NAME','variable','estimate','geometry') %>%
  filter(variable=='S1702_C02_001') %>%
  group_by(GEOID, NAME) %>%
  summarize(estimate = sum(estimate)) 

ACS_geo_2014 <- ACS_2014 %>%
  select('GEOID','NAME','variable','estimate','geometry') %>%
  filter(variable=='S1702_C02_001') %>%
  group_by(GEOID, NAME) %>%
  summarize(estimate = sum(estimate)) 

ACS_geo_2015 <- ACS_2015 %>%
  select('GEOID','NAME','variable','estimate','geometry') %>%
  filter(variable=='S1702_C02_001') %>%
  group_by(GEOID, NAME) %>%
  summarize(estimate = sum(estimate)) 

ACS_geo_2016 <- ACS_2016 %>%
  select('GEOID','NAME','variable','estimate','geometry') %>%
  filter(variable=='S1702_C02_001') %>%
  group_by(GEOID, NAME) %>%
  summarize(estimate = sum(estimate))

ACS_geo_2017 <- ACS_2017 %>%
  select('GEOID','NAME','variable','estimate','geometry') %>%
  filter(variable=='S1702_C02_001') %>%
  group_by(GEOID, NAME) %>%
  summarize(estimate = sum(estimate))

Geographic Mapping of ACS Data

library(tmap)  # provides tm_shape(), tm_polygons() and tm_layout()

png(file="images/ACS_geo_2010.png")
tm_shape(ACS_geo_2010) + tm_polygons("estimate") + tm_layout(title.position=c("left","top"), title="% Poverty in U.S. Year: 2010 Post-Recession", asp=1)
dev.off()

png(file="images/ACS_geo_2011.png")
tm_shape(ACS_geo_2011) + tm_polygons("estimate") + tm_layout(title.position=c("left","top"), title="% Poverty in U.S. Year: 2011 Post-Recession", asp=1)
dev.off()

png(file="images/ACS_geo_2012.png")
tm_shape(ACS_geo_2012) + tm_polygons("estimate") + tm_layout(title.position=c("left","top"), title="% Poverty in U.S. Year: 2012 Post-Recession", asp=1)
dev.off()

png(file="images/ACS_geo_2013.png")
tm_shape(ACS_geo_2013) + tm_polygons("estimate") + tm_layout(title.position=c("left","top"), title="% Poverty in U.S. Year: 2013 Post-Recession", asp=1)
dev.off()

png(file="images/ACS_geo_2014.png")
tm_shape(ACS_geo_2014) + tm_polygons("estimate") + tm_layout(title.position=c("left","top"), title="% Poverty in U.S. Year: 2014 Post-Recession", asp=1)
dev.off()

png(file="images/ACS_geo_2015.png")
tm_shape(ACS_geo_2015) + tm_polygons("estimate") + tm_layout(title.position=c("left","top"), title="% Poverty in U.S. Year: 2015 Post-Recession", asp=1)
dev.off()

png(file="images/ACS_geo_2016.png")
tm_shape(ACS_geo_2016) + tm_polygons("estimate") + tm_layout(title.position=c("left","top"), title="% Poverty in U.S. Year: 2016 Post-Recession", asp=1)
dev.off()

png(file="images/ACS_geo_2017.png")
tm_shape(ACS_geo_2017) + tm_polygons("estimate") + tm_layout(title.position=c("left","top"), title="% Poverty in U.S. Year: 2017 Post-Recession", asp=1)
dev.off()
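The eight mapping blocks above differ only in the year, so the file name and title could be built from the year and the drawing wrapped in one function. This is a sketch under my own naming (map_spec and save_poverty_map are hypothetical helpers); it reuses the same tmap calls as above. Note that inside a function a tmap object is not printed automatically, so print() is called explicitly.

```r
# Hypothetical helper: file name and title for one year's map.
map_spec <- function(year) {
  list(file  = sprintf("images/ACS_geo_%d.png", year),
       title = sprintf("%% Poverty in U.S. Year: %d Post-Recession", year))
}

# Hypothetical helper: draw one year's map to a PNG file.
# tmap is only needed when this function is actually called.
save_poverty_map <- function(geo, year) {
  spec <- map_spec(year)
  dir.create(dirname(spec$file), showWarnings = FALSE, recursive = TRUE)
  png(file = spec$file)
  on.exit(dev.off())  # close the device even if plotting fails
  # Inside a function the map must be printed explicitly.
  print(tmap::tm_shape(geo) + tmap::tm_polygons("estimate") +
          tmap::tm_layout(title.position = c("left", "top"),
                          title = spec$title, asp = 1))
}

# Usage, assuming the ACS_geo_* objects created earlier:
# save_poverty_map(ACS_geo_2010, 2010)
# save_poverty_map(ACS_geo_2011, 2011)
```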

Twitter is a Social Media Engagement Multiplier

With the resignation of Jack Dorsey as CEO of Twitter, I began to reflect upon the platform and what exactly the brand stands for. Twitter has been widely criticized for being a megaphone for extremism, hatred and anti-democratic ideology. My personal experience with Twitter has been one of desperate persuasiveness as I try to engage multiple people at once on issues that I care about. It’s very easy to get emotionally addicted to Twitter and invest a ton of emotional capital into it.

Microblogging, when it is healthy, can be a platform where you quickly engage many people with varying points of view or advocacy at once, and watch as your post gets retweeted, liked or shared across social media. But people also use it to spread misinformation and harmful caricatures in real time, and watch as they go viral. My personal experience with Twitter has been like walking into a room with dozens of people arguing, trying to ask a question or bring a different point of view, and then quickly being dismissed or insulted, and at times being pushed out of the room with the door shut behind me.

Ironically, this is exactly what happened to me once in real life. At a university, I tried to insert myself into a conversation or topic that the vast majority of the participants didn’t think I should be involved in. And quite literally, the door was shut in front of me. It was humiliatingly painful, but I was very young and didn’t understand that I didn’t belong. Growing up, I was always taught that one of the greatest things about our country was diversity. Diversity of ideas, diversity of people, etc.

As I grew older, I realized quickly that the reality is far less ideal or utopian. Although we say we want diversity of ideas, really we want only our ideas to be accepted. And people who are different in race, culture, language, gender or identity are not always welcomed in the same spaces. That is a lot like Twitter today. This became even more painfully evident when Twitter Spaces was launched. It quickly became a land mine as people battled it out in such racist hosting rooms as “Are there too many Black women in public?”, “Should White People Exist” and “Should Black People Exist?”.

As a data scientist, I studied extensively the nature of associations on Twitter and how people influence others based on whom they follow and their own following. For more information, read my article on association analysis in Twitter, along with my article on Apriori association analysis as a supplement. What it taught me is that Twitter at its most beneficial is a “multiplier”. By multiplier, I am referring to Twitter’s ability to take information presented by someone on the platform, be it a blog, image, tweet, etc., and multiply that content to tens, hundreds, and even thousands of people nearly instantaneously, better than any other platform.

So say, for instance, you write a blog post on your website. You may have hundreds or even thousands of people who have subscribed to your website. But that post’s engagement will likely not grow at the rate at which a tweet referencing it would grow, all other variables held constant. For instance, if you have one hundred subscribers on your blog and one hundred followers on Twitter, the Twitter reference will multiply your blog’s engagement. The same can be said for other platforms such as LinkedIn and Facebook (Meta).

My rule of thumb for Twitter now is to use it as a catalyst to bring more people to my site. Twitter is a multiplier and should not be used to have conversations. Express your ideas, then leave them there for people to engage. Remember that as a content creator, any engagement, even if it’s negative, is a win! I also recommend creating a developer account and downloading Twitter feed data via the Twitter API. Twitter is really a great platform for understanding this “multiplier” effect of social media.

I’d love to hear people’s comments on this. I’m open to having a conversation anytime on the topic. BTW, this article will be tweeted as well.

Information Technology and Workplace Diversity

My career in information technology spans twenty years, but my exposure to information technology started before I was a teenager. At that time, business productivity software was popular, but still relegated to personal desktop computers and IBM mainframes. Web services, e-commerce, cellular phones, software services, etc. didn’t exist. Productivity software was truly a one-to-one experience. You installed software on a computer, you used that software to do your job, printed out the results, and finally handed the information to those who requested it. There was network communication and email, but it was text-based, rudimentary and not secured.

The population of personal computer users was low as well, except in the workplace, where business technology was increasingly evolving into a competitive edge for businesses.  Proprietary software was being replaced by retail options such as Lotus 1-2-3, PC Write, dBase, etc.  As the software industry expanded, so did the talent pool of developers, engineers, administrators, and managers.  New software product life cycles emerged and entire business processes were created.

Today, information technology is a much larger reflection of our global society, a good portion of which extends to collaborative communities of developers and engineers who can be anywhere at any time. The irony in not promoting diversity is that a technology company is more likely than ever to have associations and relationships with clients and customers that are very diverse in ideas, backgrounds, cultures, religions, etc. Diversity is about mitigating the risk of making bad business decisions that would impact a corporate brand, as well as building a talented workforce. It makes economic sense to have a diverse workforce: as the influence of IT continues to expand into globalized infrastructures (i.e., cloud-based services and outsourcing), marketing, communication and supply chain processes and projects will need workers with skills and knowledge in the markets where a company needs to be competitive and relevant.

Derek Moore, contributor

What Companies Need to Know About Big Data and Social Computing in Information Technology Management

Internet statistics estimate that 500 million tweets are produced per day. That translates to millions of conversations about a vast array of topics. “Big data” is a term that has become more prominent as social media sites such as Twitter, Facebook, Instagram, etc. continue to generate large data streams. Consumers produce clickstream data and complete transactions when they visit corporate websites to make purchases or schedule appointments for services, or when they type reviews on Yelp, Amazon and Uber about an experience they’ve had. With a well-planned IS strategy, companies can analyze this data to gain insight into their customers and make the critical strategic decisions necessary to compete. Here are a few things companies should know about “Big Data” and social media computing as a business strategy.

Understand that social media and social networking are more a concept than a platform.

One of the biggest problems with companies adopting social media as part of their IT business strategy is that, for many IT managers, the concept of social media does not extend beyond Twitter and Facebook. There are many platforms on which social media is beneficial to business. Slack and GitHub build on crowdsourcing by supporting project management, software development and agile methodologies, even though those platforms are not primarily used for social media.

As more engineering firms adopt open source solutions, agile and DevOps, development companies are deciding to use code repositories such as GitHub. Microsoft has already adopted GitHub as part of its Visual Studio Team Foundation options for source control. The power of GitHub is very evident as global communities of developers use it to make some of the most innovative software products in languages such as Python, Java, C#, Ruby, etc. It has also become a viable social media platform for software engineers who frequently collaborate on sprints. Companies are also turning to solutions such as Slack to build entire global teams of developers who collaborate on projects and sprints.

Social media as an IT business strategy is about understanding its contextual design and how the user interacts with it. Part of understanding the contextual design of social media is identifying the actors (primary and secondary) on which the platform is based and how those users interact with it to build relationships and communities.

Context also extends to how a user interfaces with social media. Take, for example, the device many currently have in their pockets. Apply classifications of contextual scope to this device and determine all the ways users interact through a platform (tablet, smartphone, computer, etc.).

A method known as the 4-I’s framework¹ is a good model to understand the user interaction in the context of social media.  The method is typically utilized in classifying interactions with information systems as described above.  The 4-I’s include:

  • Inscriptive (inputs)
  • Informative (outputs)
  • Interactive (processing)
  • Isolated (stored data)

This framework is useful for examining both the interactions a user can perform and the information exchanged within the platform. Another popular method is the Model-View-Controller (MVC) model, which is used in software analysis and engineering as an architectural pattern for implementing user interfaces by separating a system into layers.

Do not dismiss “Big Data” as a gimmick.

The term “Big Data” itself may seem oversold through marketing, but the production of large data sets is very real, very fast and very large, with new data sets being produced every day through public and private portals.

Big data is described as data that has variety (video, text, images, unstructured and structured), volume (over a terabyte, scale of brand), velocity (constant production of data streams), and veracity (the data needs to be cleaned and managed).

Information has become more fluid and available to more people, faster and more easily. Although no company should drive business decisions by what happens on Twitter or Facebook (or on the Dow), “Big Data” as a tool can help with trend analysis, customer segmentation and insight into short- to long-term business decisions.

With “Big Data” companies will be able to:

  • Respond more quickly to market by making faster decisions.
  • Make patterns more evident to make changes to processes and products.
  • Better realize innovations in products and services and bring those to market faster.
  • Build and manage new and current data streams.
  • Create a data analytics ecosystem. Make analyzing and aggregating data a business process for all employees to utilize.

For a “Big Data” strategy to be successful, companies must:

  • Create data lakes and systems where raw data can live prior to being transformed for business intelligence and reporting.
  • Remove data silos where data exists but is only accessible to a few internal stakeholders.
  • Create a data analytics ecosystem.
  • Create hybrid cloud solutions and begin moving applications to the cloud.

Know what association and segmentation analysis are and how to use them to learn about your customers.

With more data streams coming online every day, new analytical methods can be used to gain insight into what consumers need in products and services. Two popular analytical methods are association analysis and segmentation analysis. In my next blog, I will discuss how these methods give insights into customers to better predict how they shop and which campaign ads are more likely to be successful with consumers.
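As a small illustration of what association analysis measures, the sketch below computes the support of an item set (the share of transactions that contain it) and the confidence of a rule (how often the right-hand side appears when the left-hand side does). The shopping baskets are made-up toy data and the function names are my own; a real analysis would run on actual transaction or clickstream data.

```r
# Toy shopping baskets (hypothetical data, for illustration only).
baskets <- list(c("bread", "milk"),
                c("bread", "butter"),
                c("bread", "milk", "butter"),
                c("milk"),
                c("bread", "milk"))

# Support: share of baskets that contain every item in `items`.
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs).
confidence <- function(lhs, rhs, baskets) {
  support(c(lhs, rhs), baskets) / support(lhs, baskets)
}

support(c("bread", "milk"), baskets)   # 3 of 5 baskets: 0.6
confidence("bread", "milk", baskets)   # 0.6 / 0.8: 0.75
```

A rule such as bread -> milk with high support and confidence suggests customers who buy one item are likely to buy the other, which is the kind of insight a campaign can build on.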

With the popularity of MapReduce and Hadoop, the business world is seeing an increase in “Big Data” analytics based on clickstream and social media data. Analyses of large data sets that would have taken days can now be done in minutes.

Conclusion

As data becomes more prominent within an organization, and the means of collecting it become easier and more ubiquitous, new skills will be necessary in certain roles to take full advantage of this data to drive value. The corporate culture will need to become more of a data culture, where value is placed on collecting, cleansing, aggregating and analyzing data sources and data repositories. Business leaders must establish new models that take advantage of social media and big data assets.

Works Cited

  1. Pitt, Leyland; Berthon, Pierre; Robson, Karen. “Deciding When to Use Tablets for Business Applications.” MIS Quarterly Executive, Volume 10, Number 3, September 2011.