Campaign management is the practice of using marketing campaigns to generate sales and leads. The Internet has been a treasure trove of consumer behavior for decades, but only recently has web analytics become a tool for creating powerful business insight. Two popular strategies deal specifically with pattern recognition: customer segmentation (clustering) and market basket analysis (association).
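The project does not state which clustering algorithm produced its cluster models. The sketch below uses plain k-means (Lloyd's algorithm) as one illustrative choice for customer segmentation. It assumes each visitor is summarized by two hypothetical numeric features, such as visits per week and pages per visit.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means for a small list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the centroid with the smallest squared distance.
        labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical visitors: (visits per week, pages per visit).
visitors = [(1, 2), (2, 3), (1, 3), (9, 12), (10, 11), (8, 13)]
labels, centroids = kmeans(visitors, k=2)
print(labels)
```

k-means requires the number of segments to be chosen in advance. It is also sensitive to feature scale, so features are usually standardized before clustering.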

This project examines how analysis of consumer Internet behavior can inform marketing strategies. It draws on event time, web page views, real-time social media feeds, and other information constantly tracked through agent software, social websites, and web traffic logs. To determine how useful such analysis is for business decisions about media campaign ads, my project team (Chi-Squared) at the University of North Carolina at Greensboro took two steps. We collected Twitter feed data using a Twitter developer account, and we simulated click stream information based on real-world content management metadata. From these data we built an association (market basket) model, cluster models, and eventually a regression model. The goal was to demonstrate how popular analytical methods applied to Internet data (web analytics) can help predict revenue streams.

The project set out to assess the qualitative value of users' online behavior and patterns in helping business leaders make decisions about campaign ads and campaign management.

Click stream data from websites and feeds, together with more powerful and accessible analytical tools, has driven analytical methods such as forecasting and search engine optimization in retail markets. Popular MapReduce-capable platforms such as Hadoop (a distributed processing framework) and MongoDB (a NoSQL database) are opening up a world of possibilities in web analytics. Web analytics is a subset of business analytics. It analyzes data acquired through web server logs, programs, service agents, and interfaces, mainly collected in real time across potentially millions or billions of events. The massive volume of Internet traffic data, such as the number of unique visitors to a website and Twitter social media feeds, has encouraged further support for web semantics by the World Wide Web Consortium. It has also given rise to new services such as Google Analytics and Amazon Web Services. "Big Data" web analytics will continue to create a wealth of opportunities for corporate decision-makers.
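The sketch below illustrates the MapReduce pattern that frameworks such as Hadoop distribute across many machines. It runs in a single Python process and counts unique visitors per page. The (visitor ID, page) record layout and the function names are hypothetical. A real Hadoop job would use the framework's own APIs, with the shuffle step performed by the framework itself.

```python
from collections import defaultdict
from itertools import chain

def map_phase(records):
    """Map: emit a (page, visitor_id) pair for every page-view record."""
    for visitor_id, page in records:
        yield page, visitor_id

def shuffle(pairs):
    """Shuffle: group mapped values by key (done by the framework between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: count distinct visitors for each page."""
    return {page: len(set(visitors)) for page, visitors in groups.items()}

# Two hypothetical log chunks, standing in for input splits handled by separate mappers.
chunks = [
    [("v1", "/home"), ("v2", "/home"), ("v1", "/album/42")],
    [("v1", "/home"), ("v3", "/album/42")],
]
mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'/home': 2, '/album/42': 2}
```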

Content campaigning is a powerful tool in the hands of marketing professionals. Web and mobile content media and catalog metadata are crucial to a provider's revenue stream. Simply put, legal digital movie and music downloads represent the main revenue stream for retail portals. Predictive analytics gives Internet companies insight into what customers are likely to purchase. It also shows which content is likely to become more popular on social media on a particular day or over a period of time. Market basket and association analysis help create campaigns for new content that unique visitors and customers are likely to be interested in and, ideally, purchase. The topic is highly relevant to the growing business practice of understanding media content sales on the Internet, including the role that human and machine event tracking can play in generating revenue. Information from the web will drive future campaigns and revenue streams. This project demonstrates basic analytical methods for generating successful ad campaigns.
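The sketch below shows the core of market basket analysis: support, confidence, and lift for one-item rules A → B. It assumes each visitor session is treated as a basket of content items, and the titles and thresholds are hypothetical. It is not the tool the project used. A full Apriori implementation extends the same counting to larger itemsets.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.2, min_confidence=0.5):
    """Mine one-item association rules (lhs -> rhs) from a list of baskets."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                      # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]        # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)   # confidence relative to P(rhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)

# Hypothetical sessions, each a basket of content viewed or downloaded.
baskets = [
    {"album_a", "album_b"},
    {"album_a", "album_b", "single_c"},
    {"album_a", "single_c"},
    {"album_b"},
    {"album_a", "album_b"},
]
rules = pair_rules(baskets)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the right-hand item appears with the left-hand item more often than its overall popularity alone would predict. Such pairs are natural candidates for cross-promotion in a content campaign.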

In the project, we observed that association analysis was the most useful approach for deciding what to advertise on the websites. Click stream behavior is a very good basis for deriving implied rules about what type of content visitors would like to see on a particular website. We also observed that analysis of click stream (navigational) data drives the recommender systems of many online businesses. These recommender systems support business decisions that can generate revenue. Analysis of the Twitter data showed that targeting specific segments of Twitter followers can increase the success of campaigns for music albums, songs, or artists. This provides valuable insight for generic ad campaigns. The number of followers an account had was a better predictor of whether, and how often, an artist was mentioned than the number of friends and listings, which varied much more sporadically. Segmenting accounts by followership also improved the regression model. It reduced the potentially massive number of outliers and focused the model on the majority of accounts rather than on the few accounts with very large follower counts.
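The exact segment boundaries and regression specification used in the project are not given here. The sketch below regresses artist mentions on follower count within a follower segment, using hypothetical data and a hypothetical 10,000-follower cut-off. Fitting within a segment keeps a handful of very large accounts from dominating the fit.

```python
def segment(accounts, cutoffs):
    """Bucket (followers, mentions) pairs by follower count.

    cutoffs must be sorted ascending; segment i holds accounts that meet exactly i cutoffs.
    """
    segments = {i: [] for i in range(len(cutoffs) + 1)}
    for followers, mentions in accounts:
        idx = sum(followers >= c for c in cutoffs)
        segments[idx].append((followers, mentions))
    return segments

def ols(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept (needs two distinct x values)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical (followers, artist mentions) per account, including one very large account.
accounts = [(100, 1), (200, 2), (300, 3), (400, 4), (1_000_000, 5)]
segments = segment(accounts, cutoffs=[10_000])

followers, mentions = zip(*segments[0])
slope, intercept = ols(followers, mentions)             # fit on the majority segment only
pooled_slope, pooled_intercept = ols(*zip(*accounts))   # fit on all accounts together
print(slope, pooled_slope)
```

In the pooled fit, the single million-follower account pulls the slope toward zero. The segment-level fit recovers the relationship that holds for most accounts.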

As big data becomes a main data source, one data type that characterizes it is the clickstream generated by ad banners and other media files on web pages. Philip Russom states in Big Data Analysis: “One of the things that makes big data really big is that it’s coming from a greater variety of sources than ever before. Many of the newer ones are Web sources, including logs, clickstreams, and social media. User organizations have been collecting Web data for years. However, for most organizations, it’s been a kind of hoarding. We’ve seen similar untapped big data collected and hoarded, such as RFID data from supply chain applications, text data from call center applications, semi structured data from various business-to-business processes, and geospatial data in logistics. What’s changed is that far more users are now analyzing big data instead of merely hoarding it. The few organizations that have been analyzing this data now do so at a more complex and sophisticated level. Big data isn’t new, but the effective analytical leveraging of big data is.” [5]

In her research on click stream analysis, Sule Gündüz explains how a web page prediction model can be built on a click-stream tree representation of user behavior. She explains that predicting a user's next request while the user visits Web pages has gained importance as Web-based activity has increased. Markov models and their variations, or models based on sequence mining, have been found well suited to this problem. However, higher-order Markov models are extremely complicated because of their large number of states, whereas lower-order Markov models do not capture the entire behavior of a user in a session. Models based on sequential pattern mining consider only the frequent sequences in the data set. This makes it difficult to predict the next request following a page that is not in a sequential pattern. Furthermore, it is hard to find models that mine two different kinds of information about a user session. She proposes a new model that considers both the order of pages in a session and the time spent on them. She also clusters user sessions by their pairwise similarity and represents each resulting cluster with a click-stream tree. A new user session is assigned to a cluster based on a similarity measure, and the click-stream tree of that cluster is used to generate the recommendation set. The model can serve as part of a cache prefetching system as well as a recommendation model.
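Gündüz's click-stream tree model is not reproduced here. For contrast, the sketch below implements the kind of lower-order (first-order) Markov predictor that the paragraph says cannot capture a user's whole session. It counts page-to-page transitions and recommends the most frequent next pages. The page names are hypothetical.

```python
from collections import Counter, defaultdict

def train_markov(sessions):
    """Count first-order transitions page -> next page across all sessions."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):
            transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, page, top_n=2):
    """Return up to top_n most frequent next pages after `page` (empty if never seen)."""
    if page not in transitions:
        return []
    return [p for p, _ in transitions[page].most_common(top_n)]

# Hypothetical navigation sessions.
sessions = [
    ["home", "album", "track"],
    ["home", "album", "checkout"],
    ["home", "search", "album", "track"],
]
transitions = train_markov(sessions)
print(predict_next(transitions, "album"))  # ['track', 'checkout']
```

This predictor conditions only on the current page and ignores time spent. That is exactly the limitation the paragraph attributes to lower-order Markov models, and it is what models combining page order and viewing time try to overcome.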

Satya Menon and Dilip Soman also draw on clickstream data in their article “Managing the Power of Curiosity for Effective Web Advertising Strategies,” which investigates the effect of curiosity on the effectiveness of Internet advertising. In particular, they identify processes that underlie curiosity resolution and study their impact on consumer motivation and learning. The dataset from their simulated Internet experiment includes process-tracking variables (i.e., click stream data from ad-embedded links), traditional attitude and behavioral intention measures, and open-ended protocols. They find that a curiosity-generating advertising strategy increases interest and learning relative to a strategy that provides detailed product information. Although curiosity did not dramatically increase the observed quantity of search in their study, it appeared to improve the quality of search substantially (i.e., time spent and attention devoted to specific information). The result was better and more focused memory and comprehension of new product information. To enhance the effectiveness of Internet advertising for new products, they recommend a curiosity advertising strategy based on four elements:

(1) generating curiosity by highlighting a gap in existing knowledge,
(2) providing a hint to guide elaboration toward resolving the curiosity,
(3) allowing sufficient time to try to resolve the curiosity, along with the assurance that curiosity-resolving information is available, and
(4) using measures of consumer elaboration and learning to gauge advertising effectiveness.

Their curiosity measures come from the click stream data on the ad-embedded links, so their analysis, too, ultimately rests on click stream data.
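Menon and Soman's analysis is not reproduced here. The sketch below only illustrates how click stream records from ad-embedded links could be summarized for each ad strategy. Clicks per visitor serve as a quantity-of-search measure, and mean seconds spent per click serve as a rough quality-of-search proxy. The record layout, strategy labels, and numbers are made up to exercise the function and are not results from the study.

```python
from collections import defaultdict

def search_metrics(clicks):
    """Summarize ad-embedded link clicks per ad strategy.

    clicks: iterable of (visitor_id, strategy, link, seconds_on_page) tuples.
    Returns {strategy: (clicks_per_visitor, mean_seconds_per_click)}.
    """
    visitors = defaultdict(set)
    seconds = defaultdict(list)
    for visitor_id, strategy, _link, secs in clicks:
        visitors[strategy].add(visitor_id)
        seconds[strategy].append(secs)
    return {
        s: (len(seconds[s]) / len(visitors[s]), sum(seconds[s]) / len(seconds[s]))
        for s in seconds
    }

# Made-up click records for two hypothetical ad strategies.
clicks = [
    ("v1", "curiosity", "/hint", 40), ("v1", "curiosity", "/specs", 55),
    ("v2", "curiosity", "/hint", 35),
    ("v3", "detailed", "/specs", 10), ("v3", "detailed", "/price", 12),
    ("v4", "detailed", "/specs", 8),
]
metrics = search_metrics(clicks)
print(metrics)
```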

To win an ad campaign, it is essential to track user behavior and build a prediction model from it. Randolph Bucklin and Catarina Sismeiro use the clickstream data recorded in Web server log files to develop and estimate a model of the browsing behavior of visitors to a Web site. They examine two basic aspects of browsing behavior:

- the visitor's decision either to continue browsing (by submitting an additional page request) or to exit the site, and
- the length of time spent viewing each page.

They propose a type II tobit model that captures both aspects of browsing behavior and handles the limitations of server log-file data. Because the log-file data they analyze is the click stream, their prediction model is likewise built on click stream data.
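Bucklin and Sismeiro's model itself is not reproduced here. The sketch below shows the data preparation such a model needs: it turns log requests into sessions, and each page view into a viewing time plus a continue-or-exit outcome. The 30-minute inactivity timeout is a common heuristic and an assumption here, and the requests are hypothetical. The code also exposes one well-known log-file limitation: the server never sees a request after the last page, so that page's viewing time is unobserved.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity gap that starts a new session

def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Split (visitor_id, timestamp, page) requests into per-visitor sessions."""
    sessions = []
    last_seen = {}
    open_session = {}
    for visitor, ts, page in sorted(requests, key=lambda r: (r[0], r[1])):
        if visitor not in last_seen or ts - last_seen[visitor] > timeout:
            open_session[visitor] = []          # start a new session for this visitor
            sessions.append(open_session[visitor])
        open_session[visitor].append((ts, page))
        last_seen[visitor] = ts
    return sessions

def page_views(session):
    """Yield (page, seconds_viewed, exited) for each request in one session.

    Viewing time is the gap to the next request. The last page has no next
    request in the log, so its duration is unobserved (None) and it ends in an exit.
    """
    for (ts, page), nxt in zip(session, session[1:] + [None]):
        if nxt is None:
            yield page, None, True
        else:
            yield page, (nxt[0] - ts).total_seconds(), False

t0 = datetime(2024, 1, 1, 12, 0, 0)
requests = [
    ("v1", t0, "/home"),
    ("v1", t0 + timedelta(seconds=45), "/album/42"),
    ("v1", t0 + timedelta(hours=2), "/home"),   # long gap: a new session
    ("v2", t0, "/search"),
]
sessions = sessionize(requests)
print(list(page_views(sessions[0])))  # [('/home', 45.0, False), ('/album/42', None, True)]
```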

Overall, many researchers build their prediction models on click stream data. Like them, we use click stream data, together with Twitter feed data, to predict the performance of ad campaigns in the real world.

Data-driven analysis has become important for decision makers because it helps improve productivity and operations. Most data-driven companies, especially those that exist on the Internet, are positioned to optimize product revenue through their main Internet channel or portal. With standardized Internet protocols and web services, it is now much more practical to use search engines, page views, and click streams to gauge possible new revenue streams. In the competitive world of data analytics, the Internet is presently ground zero. Approaches such as customer-driven innovation (CDI) have given rise to practices such as search engine optimization and social media analytics. Wherever information generated by humans or machines is prevalent, innovations spring up to capitalize on it. Even more encouraging, data is becoming much more accessible and open. With Google Analytics, anyone can access click stream data from their own website for free. Twitter and other social media data are available for any developer to capture and analyze. Cloud services such as Microsoft Azure, Google Cloud Services, and Amazon Web Services let developers build data analytics applications. Some of this use is free of charge, depending on the level of service, the type of applications accessed, and the amount of data to be analyzed. Websites such as http://data.gov provide tens of thousands of datasets free of charge. Public policy considerations have created new opportunities for science, government, and citizens. It has become a data-driven world in which anyone can access vast amounts of data. The question is no longer whether web and data analytics are necessary to be competitive. As William McComb, CEO of the upscale brand company Fifth & Pacific, said about branding and ads: “…the girl…that we want to target is even more digitally obsessed and lives her life on mobile devices. We believe that we’re at a point, the economy is at a point, and the consumer has evolved to a point where she doesn’t need to have a physical store.” It is this economy that is driven by data analysis.

The data source for these models consists of live Twitter feed data collected as a JSON file through a Twitter developer account. The data were parsed for popular content (specifically, artist names) and recorded by number of occurrences, time, and the text of each tweet containing the target value.
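The sketch below shows the parsing step under several assumptions. Tweets are assumed to be stored one JSON object per line, as the Twitter streaming endpoint delivered them. The field names are assumed to follow the classic Twitter v1.1 tweet object (`created_at`, `text`, and `user` with `followers_count`, `friends_count`, `listed_count`). The artist list is hypothetical, and matching is a naive case-insensitive substring test, which can also match names embedded in longer words.

```python
import json
from collections import defaultdict

# Hypothetical list of target artists; the project's actual list is not given.
ARTISTS = ["Adele", "Daft Punk", "Bruno Mars"]

def mentions_by_artist(lines, artists=ARTISTS):
    """Parse tweets stored one JSON object per line and record each artist mention."""
    records = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines from the streaming endpoint
        tweet = json.loads(line)
        text = tweet.get("text", "")
        if not text:
            continue  # skip non-tweet messages such as delete notices
        user = tweet.get("user", {})
        lowered = text.lower()
        for artist in artists:
            if artist.lower() in lowered:  # naive case-insensitive substring match
                records[artist].append({
                    "created_at": tweet.get("created_at"),
                    "text": text,
                    "followers": user.get("followers_count"),
                    "friends": user.get("friends_count"),
                    "listed": user.get("listed_count"),
                })
    return records

sample = [
    '{"created_at": "Mon Apr 01 12:00:00 +0000 2013", "text": "Listening to Adele again", '
    '"user": {"followers_count": 120, "friends_count": 80, "listed_count": 2}}',
    "",
    '{"delete": {"status": {"id": 1}}}',
]
found = mentions_by_artist(sample)
print({artist: len(hits) for artist, hits in found.items()})  # {'Adele': 1}
```

Keeping the follower, friend, and listed counts with each mention makes it straightforward to compare them later as predictors of how often an artist is mentioned.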

To marketers, Twitter is significant for determining the popularity of a brand from tweets (also known as brand tweets) and the amount of follower engagement, measured by the number of followers per account. For many companies, this translates into clicks on their Twitter bit.ly links or additional followers, and it is part of the engagement breakdown. The engagement breakdown measures the number of replies, tweets, retweets, mentions, and favorites. The click stream data in this project were engineered from two sources: seeded catalog metadata specific to music media, and simulated web logs of the kind companies use to collect user page view behavior on websites. There are four main ways of capturing click stream data: Apache web logs, web beacons, JavaScript tags, and packet sniffing.
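Of the four capture methods, Apache web logs are the easiest to show in a few lines. The sketch below parses one line in Apache's Combined Log Format into a click record. The regular expression and function name are illustrative. The sample line is the example that appears in Apache's log documentation, and the project's simulated logs may use a different layout.

```python
import re
from datetime import datetime

# Combined Log Format: host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Parse one Combined Log Format line into a dict, or return None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # %b expects English month abbreviations (default C locale).
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])  # "-" means no body sent
    return rec

sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" '
          '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
record = parse_line(sample)
print(record["host"], record["path"], record["status"], record["referer"])
```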

Many companies now recognize the potential of online ad campaigns to increase their customer base and bring in more profit. Most companies set aside limited budgets for their ad campaigns, and those budgets can pay for only a limited number of ads in a given time period. The major challenge, therefore, is to make this limited number of ads as effective as possible by targeting them to the right segment of people. Click stream data is a valuable resource here. It can give companies insight into which people to target with their ads, and so maximize the return on their ad campaigns. In the same way, content providers can use click stream analysis to target their customers' ads appropriately. Twitter feed analysis works in a similar vein. Click stream analysis examines data collected from the behavioral habits of web users, whereas Twitter feed analysis concentrates specifically on data from users' Twitter feeds. The click stream and Twitter feed analyses in this project gave us valuable insights that companies can use to target their ads more effectively.
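A greedy allocation is one simple way to picture the limited-budget targeting problem. The sketch below ranks audience segments by their historical click-through rate from click stream data and spends a fixed budget on the best segments first. It is a simplification that assumes click-through rate stays constant as more impressions are bought. The segment names, counts, and prices are hypothetical, and money is kept in integer cents to avoid floating-point rounding.

```python
def allocate_impressions(segments, budget_cents, cost_cents_per_impression):
    """Greedily spend an ad budget on the segments with the highest click-through rate.

    segments: {name: (clicks, impressions, reachable_audience)} from past click stream data.
    Returns {name: impressions_bought}.
    """
    ranked = sorted(segments.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    remaining = budget_cents
    plan = {}
    for name, (_clicks, _impressions, audience) in ranked:
        affordable = remaining // cost_cents_per_impression
        buy = min(affordable, audience)  # cannot buy more impressions than people reachable
        if buy <= 0:
            break  # budget exhausted
        plan[name] = buy
        remaining -= buy * cost_cents_per_impression
    return plan

# Hypothetical segments: (past clicks, past impressions, reachable audience).
segments = {
    "pop_fans": (30, 1000, 500),
    "jazz_fans": (5, 1000, 800),
    "casual": (12, 1000, 2000),
}
plan = allocate_impressions(segments, budget_cents=10_000, cost_cents_per_impression=5)
print(plan)  # {'pop_fans': 500, 'casual': 1500}
```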

Team members and authors:  Derek Moore,  Ashwini Wani, Junyi Hu, Ramya Nimmagadda.
Class:  ISM 678 Business Analytics
Instructor:  Dr. Hamid Nemati, Department of Information Systems and Operation Management, Bryan School of Business and Economics
University of North Carolina at Greensboro