Analytics, Dictionary

Dictionary of Marketing Questions

Creating Content for Your Marketing Funnel


For each stage of the funnel, you’ll need to answer the following questions:

  • How will customers at this stage find me?
  • What kind of information do I need to provide to help them move from one stage to the next?
  • How will I know if they have moved from one stage to another?


In the Awareness stage, tracking lead* analysis metrics (including program investment, percent of new names, total successes, total targets, investment per target, and average demographic score) helps answer the following three questions:

  1. Which programs bring in targets or leads most cost-effectively?
  2. Where are we exhausting our lists?
  3. Which programs are bringing in the highly qualified leads?

*“Lead” in this context means “leading,” as in “leading or lagging indicators,” and not “lead” as in “lead nurturing.”
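The lead analysis metrics listed above can be derived per program from a few raw counts. The following is a minimal sketch, not a standard definition: the `Program` class, its field names, the sample figures, and the formulas (investment divided by targets, new names divided by targets) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Program:
    """Hypothetical record of one marketing program's raw counts."""
    name: str
    investment: float     # total program spend
    total_targets: int    # every name the program reached
    new_names: int        # targets not already in the database
    total_successes: int  # targets who responded

    @property
    def investment_per_target(self) -> float:
        # Assumed definition: spend divided by names reached.
        return self.investment / self.total_targets

    @property
    def percent_new_names(self) -> float:
        # Assumed definition: share of reached names that are new.
        return 100.0 * self.new_names / self.total_targets


# Made-up sample data.
programs = [
    Program("Webinar", 5000.0, 400, 100, 60),
    Program("Trade show", 20000.0, 500, 350, 40),
]

# Question 1: which program brings in targets most cost-effectively?
most_cost_effective = min(programs, key=lambda p: p.investment_per_target)

# Question 2: where are we exhausting our lists (fewest new names)?
most_exhausted = min(programs, key=lambda p: p.percent_new_names)

print(most_cost_effective.name, most_exhausted.name)
```

In this made-up data the webinar is the cheapest per target but also the program that surfaces the fewest new names, which is the kind of trade-off the three questions are meant to expose.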

Customer Response Models

  • Are you assessing customer response models for statistical as well as business validity?
  • Are you applying the “haircut method”?
    • A naïve application of an incrementality percentage derived from market-level models indiscriminately to all customer histories will bias attribution substantially. In these methodologies, highly effective digital marketing treatments will be penalized while ineffective ones will be favored. As a result, differentiation will be dampened and reallocation opportunities might be squandered.

Marketing Allocation

  • What econometric methods have you applied (such as log-log multiple regression models, Bayesian approaches, or diffusion models) to identify causal relationships between outcomes (e.g., consumer purchase funnel and sales) and marketing and other business drivers based on observed behaviors?
    • Traditional mix models, test/control experiments and judgmental attribution methods are not comprehensive enough to provide timely and credible answers to questions regarding marketing allocations, impact and trade-offs.
  • What were your hypotheses on the expectation of the direction of impact; the magnitude of the impact; and the lag between the cause and effect?
  • How did you identify and test the impact that intermediate outcomes (such as organic search queries, own-site web traffic, online video viewing, social media exposure, and brand awareness) have on marketing tactics?
  • What control variables did you take into account?
    • Control variables account for external factors that affect customers, such as the economy, the competitive landscape, and seasonality.
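As an illustration of the log-log form mentioned above, the sketch below fits log(sales) against log(spend) by ordinary least squares, so the slope can be read as an elasticity. It is a deliberately simplified assumption: a single driver, no control variables, and synthetic data. A real mix model would add the controls discussed above, and a fitted slope on its own does not establish causality. The function name `fit_log_log` is hypothetical.

```python
import math


def fit_log_log(spend, sales):
    """Least-squares fit of log(sales) = intercept + elasticity * log(spend)."""
    xs = [math.log(s) for s in spend]
    ys = [math.log(s) for s in sales]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    elasticity = sxy / sxx
    intercept = mean_y - elasticity * mean_x
    return elasticity, intercept


# Synthetic data generated from sales = 50 * spend ** 0.5 (made up).
spend = [100.0, 200.0, 400.0, 800.0]
sales = [50.0 * s ** 0.5 for s in spend]

elasticity, intercept = fit_log_log(spend, sales)
print(round(elasticity, 3))  # share of a 1% spend change passed on to sales
```

An elasticity near 0.5 here means a 1% increase in spend is associated with roughly a 0.5% increase in sales in this synthetic data set.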

Statistical Analysis

General Statistics Questions

Today people have to deal with up to terabytes of data and have to make sense of it and glean the important patterns from it.  Statistics can help greatly in this process by helping to answer several important questions about your data:

  • What patterns are there in my database?
  • What is the chance that an event will occur?
  • Which patterns are significant?
  • What is a high level summary of the data that gives me some idea of what is contained in my database?



Besides the p-value, what statistical tests have you conducted to support your hypothesis?

  • One of the most important messages is that the p-value cannot tell you if your hypothesis is correct. Instead, it’s the probability of observing data at least as extreme as yours, assuming the null hypothesis is true.
  • A common misconception among non-statisticians is that p-values can tell you the probability that a result occurred by chance. This interpretation is dead wrong.
  • Nor can a p-value tell you the size of an effect, the strength of the evidence or the importance of a result.
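To make the definition concrete, here is a minimal sketch of a two-sided permutation test on the difference in group means. It assumes the null hypothesis that group labels are exchangeable; the function name `permutation_p_value` and the sample numbers are hypothetical. The returned value is the share of label shuffles that produce a difference at least as large as the observed one. It is not the probability that the hypothesis is correct, and it says nothing about effect size.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign labels at random under the null
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Made-up conversion counts for two campaign variants.
control = [1, 2, 3, 4, 5]
treatment = [101, 102, 103, 104, 105]
p = permutation_p_value(control, treatment, n_permutations=2000)
print(p)
```

A small value only says that such a gap would be rare if the labels did not matter; deciding whether the gap is large enough to act on is a separate business question.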




Web Presentations

Anything you do for your customer whose impact is not recorded in your internal systems for future use is a waste of your customer’s time, a monetary loss for your company, and a degradation of your brand. Information on the customer is the soft currency that powers the internet. The cliché of delivering what your customer wants at the right time, in the right channel, via the right content can only be achieved if the customer profile is constantly updated by both marketing and sales.

Today’s customer is highly engaged and educated regardless of background. Content marketing is the byproduct of a hyper-interested labor force. The great success of the internet is not so much connecting everyone (mail and other channels did that centuries prior to TimBL’s invention), but allowing everyone to access most of human knowledge within seconds with their own tools (thus fulfilling Carnegie’s library vision). In this context, user experience expectations are not directly influenced by the company or brand but by the customer’s insatiable appetite to access information the way they desire.

Web presentation is an important component of your marketing tool kit. An important element of web presentation is the audience’s trust in your information. Your company’s name and brand are affected every time there is a web presentation. As Peter F. Drucker pointed out, everyone from the janitor to the CEO speaks for your company and its brand. The same is true of how you conduct your webcast presentations. In every web presentation there are actually three meetings: before, during, and after. Each of these meetings needs to be managed in its own way.


The Utopia: Standardized Data Across The Globe (for Global Company)


Increased standardization decreases customization, which leads to a lower initial cost curve and a higher adoption rate for the technology. Though costs do increase over time, the increase is tempered by cross-learning and by spreading the risks among various internal entities. Theoretically, all companies would benefit from concentrating their key data processes into a standard format and flow.


My Experience:

With few notable exceptions, in global companies (at least in the Fortune 500) all major standardization initiatives for data, systems, processes, metrics, and technologies will fail to reach most, if any, of their objectives. The experience curve, the rate of technology change, the rate of adoption, competition, short-termism, and internal company groups will detract from, diminish, and eventually downsize major data initiatives.

Internal special-interest teams will not yield their influence over data and its related processes. They tend to be better organized, closer to the source of power, and have grassroots support that major (top-down) standardization initiatives do not. The locus of control is locked into the mid-level managers, who have incentives to protect and grow their vision of the data structure. The bigger the bureaucratic structure, the more powerful the managers at the local level. They already set much of the strategic direction of the company. LinkedIn and the like have, unsurprisingly, a scarce number of articles about projects that successfully unified an entire company across the analytics continuum from end to end.


Why is Search Important to Your Business?

The search engines employ advanced algorithms and technologies that are in a constant state of evolution and innovation. Though the systems are brilliant, they require that the sites they index cooperate. Failure to maintain a mutual relationship between the site and the search engine will lead to a decrease in traffic.

Results in positions 1, 2, and 3 receive much more traffic than results down the page, and considerably more than results on deeper pages. The fact that so much attention goes to so few listings means that there will always be a financial incentive for search engine rankings. No matter how search may change in the future, websites and businesses will compete with one another for this attention, and for the user traffic and brand visibility it provides[1]. One study shows that 56% of clicks and a third of the time spent searching are spent on the first link. [2]

Google represents 65% of the searches[3]. That share is, more or less, stable. Bing maintains the other third of the market.

Many things that are in Google’s ranking algorithm correlate very well with brands. Google’s algorithmic inputs have started favoring things that brands are better at. Google is rewarding better links rather than just more links. They’re things around user and usage data[4].

The Latest Research:

A recent study shows that the three key drivers of Google search rank are:

  • Domain-Level Link Features – based on link/citation metrics such as quality of links, trust
  • Page-Level Link Features – PageRank, trust metrics, quantity of linking root domains, links
  • Page-Level Keyword & Content-Based Features – content relevance scoring, on-page optimization of keyword usage, topic-modeling algorithm scores on content, content quality/relevance

The following are the least influential:

  • Domain-Level Keyword Usage – Exact-match keyword domains, partial-keyword match
  • Domain-Level Keyword-Agnostic Features – Domain name length, TLD extension
  • Page-Level Social Metrics – Quantity/quality of tweeted links, Facebook shares, Google +1s


[2] F Takes – Leiden University, 2011




Analytics Dictionary

Actionable Metric
An actionable metric is one that ties specific and repeatable actions to observed results.

The opposite of actionable metrics are vanity metrics (like web hits or number of downloads) which only serve to document the current state of the product but offer no insight into how we got here or what to do next.


Advertising
Generally refers to controlled, paid messages you send to the public via newspaper and magazine displays, billboards, TV and radio commercials, and website banners.


Agile Fashion

What to do:

  • Find out where you are
  • Take a small step towards your goal
  • Adjust your understanding based on what you learned
  • Repeat

How to do it:

When faced with two or more alternatives that deliver roughly the same value, take the path that makes future change easier.

And that’s it. Those four lines and one practice encompass everything there is to know about effective software development.


Agile, Project Management
Agile Project Management – Vocabulary & Artifacts
Iterations (sprints)

As a Project Management method, Agile focuses on delivering Features – or Deliverables – as often as possible. As part of the definition, the Deliverable must be Completed, Tested, Debugged and Usable.

Core to any Agile method are Iterations. Iterations are fixed-length periods of time, of 1 to 4 weeks (usually 2) during which we try to accomplish a list of things or deliver certain features.

The idea behind iterations is to give the team a short-term objective that creates both a sense of urgency and a feeling of accomplishment once it is completed – an addictive cocktail. Short-term, achievable objectives help to keep morale high.

Sprint Backlog:

The sprint backlog is composed of a set of top priority items chosen from the Product Backlog by the team. Once the team has selected and estimated the items, there is a commitment from the team to complete them within the duration of the Sprint, in order for it to be successful.

The objective of a sprint is to release or implement one (or many) working feature(s). The team will therefore break user-stories into smaller, more manageable tasks required for the release. Typically these tasks will take a maximum of 16 hours to complete.

User-stories (or Items)

Agile is highly customer-driven. Therefore, features are usually translated into User-Stories. A Story explains how a feature is to be used and gives it context. The proper way to write Stories is to start by:

As a [ role ], I want [ goal / need / desire ] (optionally: so that [ benefit ])

e.g. As a user, I want to search for my customers by their first and last names.
e.g. As a non-administrative user, I want to modify my own schedules but not the schedules of other users.

The Product Owner is responsible for writing clear and concise user-stories, usually following the “INVEST” method: Independent, Negotiable, Valuable, Estimable, Small, Testable.

Product Backlog

The Product Backlog contains the list of the customer’s requirements, prioritized, typically by business value. This list is broken down into smaller items or user-stories.

The Product Backlog regroups all the remaining Stories for a given project. It is meant to be properly maintained and prioritized. Once an Iteration is completed, stories are selected from the Backlog, based on Priorities and are sent to the new sprint.

Ideally, the Product Backlog is created before the project is launched and new stories are added to it as the need arises or if epics need to be broken down. Stories in the backlog should have been roughly estimated by the Product Owner.

The team contributes to the backlog by properly estimating Items and User-Stories, either in Story-points or in estimated hours. Some teams relying on Story points play “Planning Poker” to ease the estimation process.

Story Points

A story point is an arbitrary measure of effort used by Scrum teams. It is used to accelerate the estimation of the effort required to implement a user-story. Typically, story sizes follow a doubling scale (1, 2, 4, 8, 16) or the Fibonacci series (1, 2, 3, 5, 8, 13, 21, 34, 55).

Points are a relative value that does not directly correlate to actual hours, which helps Scrum teams think abstractly about the effort required to complete a story. Teams will typically pick the smallest story in their sprint backlog that everyone can relate to, declare it a 1-point story, and use it as a baseline to estimate other stories.

Estimated Points and hours are so incompatible that it is recommended to use only one of them as a measure of estimation. Agencies usually are required to log time in order to bill their clients. In this case, logging time and estimating in points is a correct method.

Comparing point velocity between teams is almost impossible and should not be attempted since its measure is arbitrary – a bit like comparing apples to oranges.

Iteration Review

The iteration or Sprint review gathers together the Scrum Team, the Scrum Master and Product owner.

The objective of an Iteration Review is to provide a realistic report of the team’s progress to the Product Owner. It is also an excellent opportunity for the team to get feedback on the User Stories delivered to the customer.

Finally, it is a great tool to understand what happened during the sprint, whether stories were underestimated, and whether the team was able to keep its commitment. This allows the team, Scrum Master and Product owner to learn about the team’s capacity and improve their estimation accuracy.

The typical format of the Iteration review:

  • Review the Iteration theme and goal.
  • Re-examine the Sprint backlog and its user-stories: has the scope changed? Was anything new introduced? How can we avoid these in the future? Was something not completed? Why?
  • Demonstrate User Stories and determine whether the objective of each User-Story was achieved or not (Accept or Reject). Were there any changes to the user story? Was it split or merged?

The Iteration review is usually followed by the Iteration kickoff.

Iteration kickoff

The Iteration Kickoff meeting usually follows the Iteration review meeting and is held on the last day of a Sprint, in preparation for the next Sprint.

The objective of the meeting is for the Product owner to present the Scrum team and Scrum Master with the top priority features to be released. It gives an opportunity for the team to ask questions and clarify the user-stories.

Together, the Scrum Team and Product owner will decide on a Theme and Goal for the upcoming Sprint. The Scrum Team will then proceed to estimate these top priority items in order to select how many items they can commit to. Although there can always be negotiation between the team and the Product Owner, only the Scrum Team can make the commitment.

The success of the Sprint will be determined during the next Iteration Review meeting, at the end of the upcoming sprint.


Commitment

The Iteration Kickoff must end with the commitment. Once the work is planned and estimated, each individual within the Scrum Team must be willing to commit to completing the User Stories in the Sprint Backlog.

In order to do that, one must understand the binding nature of a commitment. A commitment is a binding pledge the team takes together.

Committed teams tend to go the extra mile to get things done as agreed. They are highly motivated and engaged, and their members are strongly bonded towards a mutual objective: a “the whole is more than the sum of its parts” kind of alliance.

Before committing, the team and Scrum Master should make sure that:

  • The stories are NOT underestimated;
  • Individuals are accepting responsibility for the stories;
  • They can keep the scope fixed for the duration of the sprint;
  • They can be protected from external influences;
  • They can keep focused on the objectives at hand.


Burndown

The burndown is one of the hallmarks of Agile reporting. It is a great yet simple visual progress indicator of the work that remains to be done in the sprint, day by day.

During the sprint planning meeting, the team will identify specific stories from the backlog and estimate the tasks that must be completed in order for the sprint to be successful.

A sprint is successful if all the user-stories in the sprint backlog are completed by the end of the sprint.

As stories are completed the remaining work to be done is burned down and gives the team visibility (motivation and commitment) on the progress of the sprint.

Having a trendline on your burndown can help visualize whether or not the team is on track towards completing all the stories in time.
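The burndown and its trendline described above can be sketched in a few lines. This is an illustrative assumption rather than any tool's actual method: the function names, the straight-line "ideal" trend, and the sample sprint (40 points over 10 days) are all made up.

```python
def burndown(total_points, completed_per_day):
    """Remaining points at the start of the sprint and after each day."""
    remaining = [total_points]
    for done in completed_per_day:
        remaining.append(remaining[-1] - done)
    return remaining


def ideal_trendline(total_points, sprint_days):
    """Straight line from total_points on day 0 to zero on the last day."""
    return [total_points - total_points * day / sprint_days
            for day in range(sprint_days + 1)]


# Made-up sprint: 40 points committed, 10 working days, 5 days elapsed.
actual = burndown(40, [3, 5, 0, 8, 4])
ideal = ideal_trendline(40, 10)

# The team is on track if remaining work is at or below the trendline today.
today = len(actual) - 1
on_track = actual[today] <= ideal[today]
print(actual, on_track)
```

Plotting `actual` against `ideal` gives the familiar chart; the comparison on the last line is the same check a reader makes by eye.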

Agile Roles
There are 4 typical roles in Agile: Stakeholder, Product Owner, Scrum Master (the project manager role), and Worker.

  • The Stakeholder is usually the instigator of the project or the investor.
  • The Product Owner is the Stakeholder’s representative. He is in charge of prioritizing the Backlog and Writing the User-Stories.
  • The Scrum Master is a facilitator to the team. He organizes the team, removes impediments, oversees the process, manages the Sprint backlog and the overall progress of the project.
  • The Workers or Scrum Team are the ones getting things done. They estimate the stories in points and the tasks in hours. If the stories are estimated to be too big for a sprint, they can request the stories to be broken down by the Product Owner.


Teams are asked to organize themselves by creating the tasks related to the stories and estimating them, whether in Estimated Hours or Story Points. They subsequently assign tasks among themselves and launch the iteration.

Daily (scrum) Meeting

The objective of a daily meeting is to provide a status and progress update to the rest of the team. These meetings are held daily and should last no more than 15 minutes. Standing up usually enforces that rule.

During the stand-up meeting, each team member is encouraged to answer three questions:

  • What was done yesterday?
  • What will be done today?
  • Am I blocked, or do I foresee any impediments?

Daily meetings are an efficient way to promote commitment and track whether the team is able to fulfill its commitment.


Releases

Releases are part of the grander scheme of things. A release usually consists of a few iterations, after which many Features or Deliverables are sent to the client or put into production.


The Disciplined Agile Delivery (DAD) process framework has several important characteristics:

  • People first. DAD team members should be self-disciplined and DAD teams should be self organizing and self aware. The DAD process framework provides guidance which DAD teams leverage to improve their effectiveness, but it does not prescribe mandatory procedures. In DAD we foster the strategy of cross-functional teams made up of cross-functional people (generalizing specialists). There should be no hierarchy within the team, and team members are encouraged to be cross-functional in their skill set and indeed perform work related to disciplines other than their original specialty.
  • Learning oriented. In the years since the Agile Manifesto, we’ve discovered that the most effective organizations are the ones that promote a learning environment for their staff. There are three key aspects which a learning environment must address.
    • The first is domain learning – how are you exploring and identifying what your stakeholders need, and perhaps more importantly how are you helping them to do so?
    • The second is learning to improve your process at the individual, team, and enterprise levels.
    • The third is technical learning, which focuses on understanding how to effectively work with the tools and technologies being used to craft the solution for your stakeholders.
  • Agile. The DAD process framework adheres to and enhances the values and principles of the Agile Manifesto. Teams following either iterative or agile processes have been shown to produce higher quality, provide greater return on investment (ROI), provide greater stakeholder satisfaction, and deliver quicker as compared to either a traditional/waterfall approach or an ad-hoc (no defined process) approach.
  • Hybrid. DAD is the formulation of many strategies and practices from both mainstream agile methods as well as other sources. The DAD process framework extends the Scrum construction lifecycle to address the full delivery lifecycle while adopting strategies from several agile and lean methods. These sources include Scrum, Extreme Programming (XP), Agile Modeling (AM), Unified Process (UP), Kanban, and several others.
  • IT solution focused. The DAD approach will advance your focus from producing software to providing solutions –which is where real business value lies for your stakeholders. A fundamental observation is that as IT professionals we do far more than just develop software. Yes, software is clearly important, but in addressing the needs of our stakeholders we will often provide new or upgraded hardware, change the business/operational processes that stakeholders follow, and even help change the organizational structure in which our stakeholders work.
  • Full delivery lifecycle. DAD addresses the project lifecycle from the point of initiating the project through construction to the point of releasing the solution into production. We explicitly observe that each iteration is NOT the same. Projects do evolve and the work emphasis changes as we move through the lifecycle. To make this clear, we carve the project into phases with light-weight milestones to ensure that we are focused on the right things at the right time, such as initial visioning, architectural modeling, risk management, and deployment planning. This differs from mainstream agile methods, which typically focus on the construction aspects of the lifecycle; details about how to perform initiation and release activities, or even how they fit into the overall lifecycle, are typically vague and left up to you.
  • Goals driven. One of the challenges in describing a process framework is that you need to provide sufficient guidance to help people understand it, but if you provide too much guidance you become overly prescriptive. As we’ve helped various organizations improve their software processes over the years, we’ve come to believe that the various process proponents are coming from one extreme or the other. Either there are very detailed process descriptions (the IBM Rational Unified Process (RUP) is one such example) or there are very light-weight process descriptions, Scrum being a perfect example. The challenge with RUP is that many teams didn’t have the skill to tailor it down appropriately, often resulting in extra work being performed. On the other hand, many Scrum teams had the opposite problem of not knowing how to tailor it up appropriately, resulting in significant effort spent reinventing or relearning techniques to address the myriad issues that Scrum doesn’t cover. Either way, a lot of waste could have been avoided if only there was an option between these two extremes.
  • Risk and value driven. The DAD process framework adopts what is called a risk/value lifecycle; effectively, this is a light-weight version of the strategy promoted by the Unified Process (UP). DAD teams strive to address common project risks, such as coming to stakeholder consensus around the vision and proving the architecture, early in the lifecycle. DAD also includes explicit checks for continued project viability, whether sufficient functionality has been produced, and whether the solution is production ready. It is also value-driven, a strategy which reduces delivery risk, in that DAD teams produce potentially consumable solutions on a regular basis.
  • Enterprise aware. With the exception of start-up companies, agile delivery teams don’t work in a vacuum. There are often existing systems currently in production, and minimally your solution shouldn’t impact them although your solution should leverage existing functionality and data available in production. There are often other teams working in parallel to your team, and you may wish to take advantage of a portion of what they’re doing and vice versa. There may be a common vision which your organization is working towards, a vision which your team should contribute to. There will be a governance strategy in place, although it may not be obvious to you, which hopefully enhances what your team is doing. Enterprise awareness is an important aspect of self discipline because as a professional you should strive to do what’s right for your organization and not just what’s interesting for you.


Analytics
An attempt to answer a variety of questions about user/customer behavior. Think about each Analytics report as a response to a particular kind of user analysis question.
Click to Learn More from Google

“Asynchronous” is loosely used to mean “parallel”: work proceeds without waiting for other work to finish.


Attribution

Attribution is based on capturing touch point data over a historical period to determine which touch points are the most effective at which stages in the buying process to support investment allocations and produce higher aggregate results.

Attribution is simply the ability to evaluate the performance of each touch point in the buying process. A key premise of attribution is that all touches play a role in impacting the buying process. To create any type of attribution model you need data related to both converting and non-converting opportunities. There are various approaches to attribution. Three of the most common are last-touch, equal attribution, and fractional attribution:

  • Last-Touch Attribution is based on the idea that the last touch has the greatest impact on the buying process and therefore receives the majority or all of the credit for the entire sale. Some companies take the opposite approach and use first-touch attribution, which is based on the idea that the first touch is what “primes the pump.” Neither of these models accounts for all the prior or following touches that may have impacted the buying behavior. As a result, you may end up eliminating important earlier or later touches because you aren’t sure of their value. Despite the problems with these techniques, people use these approaches because they are relatively easy to create.
  • Equal Attribution is one way to overcome last-touch attribution issues. Just like it sounds, this approach assumes that all touches are equal, which means that an equal value is assigned to every single touch. The downside is that you may end up unnecessarily duplicating some efforts because you aren’t sure which touches have the greatest impact. So you may end up investing more than you need to – because this approach doesn’t provide insight into which touches perform best. That takes us to the concept of Fractional Attribution.
  • Fractional Attribution assigns a calculated “weight” to each marketing touch throughout the buyer’s purchase journey. Typically this weight is determined by the corresponding relative impact that particular touch will have on producing the desired business outcome, such as purchase. This approach enables marketers to take multiple prior exposures into consideration. Determining the weights requires understanding which touches perform best. Using fractional attribution requires understanding of the statistical significance of the various touches in order to quantify their contributing effect. When building this type of model and assigning weights it is important to keep in mind that there are touches other than marketing touches that drive the desired outcome.
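The three approaches above differ only in how they split the credit for one conversion across the touches in a journey. The sketch below is a minimal illustration under stated assumptions: the function names, the sample journey, and especially the fractional weights are hypothetical. In practice the weights would come from a statistical model of each touch's contribution, not from hand-picked numbers.

```python
def last_touch(touches):
    """All credit goes to the final touch before conversion."""
    return {touches[-1]: 1.0}


def equal_attribution(touches):
    """Every touch receives the same share of the credit."""
    share = 1.0 / len(touches)
    credit = {}
    for touch in touches:
        credit[touch] = credit.get(touch, 0.0) + share
    return credit


def fractional_attribution(touches, weights):
    """Credit is split in proportion to a per-channel weight."""
    total = sum(weights[touch] for touch in touches)
    credit = {}
    for touch in touches:
        credit[touch] = credit.get(touch, 0.0) + weights[touch] / total
    return credit


# Made-up journey for one converting customer, in time order.
journey = ["display", "email", "search", "email"]

# Hypothetical weights standing in for model-derived relative impact.
weights = {"display": 1.0, "email": 2.0, "search": 4.0}

print(last_touch(journey))
print(equal_attribution(journey))
print(fractional_attribution(journey, weights))
```

Running all three on the same journey shows how much the choice of model alone moves the credit between channels, which is why the weights need a defensible statistical basis.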

Link to Learn More from Nimble Blog, July 29, 2014

Batch Processing (vs Real Time Processing)
Batch programs run without manual intervention. This is in contrast to “online” or interactive programs, which prompt the user for input.
Click to Learn More from Wikipedia

Batch data processing is an efficient way of processing high volumes of data, in which a group of transactions is collected over a period of time. Data is collected, entered, and processed, and then the batch results are produced (Hadoop is focused on batch data processing). Batch processing requires separate programs for input, process, and output. Examples are payroll and billing systems.
Batch and real time data processing both have advantages and disadvantages. The decision to select the best data processing system for the specific job at hand depends on the types and sources of data and processing time needed to get the job done and create the ability to take immediate action if needed.

Business Intelligence (BI)
Descriptive or historical analysis of operational data

Change-Data-Capture (CDC)
Captures changes made in source systems and then replicates that change into a target system, keeping the databases synchronized. In some cases, the CDC tool can be sold separately from the rest of the data replication package. Other parts of a data replication solution can include schema and DDL replication, an easy to manage user interface, and software and hardware architecture designed for moving large amounts of data very quickly without creating down-time for your sources or targets or interfering with the ability of your enterprise applications to keep running. This same capability can also ensure that in the event of a crash, your company has the most up to date data with which to pick up the pieces. In addition, good data replication solutions should be fully automated in order to optimize IT productivity and save costs on professional service needs.

Source: Solutions Review Buyers Guides and Best Practices
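The core idea of CDC, replaying captured source changes against a target so the two stay synchronized, can be sketched as follows. This is a toy model under stated assumptions, not how any particular replication product works: the change-record shape (`op`, `key`, `row`), the function name `apply_changes`, and the in-memory dictionary standing in for a target table are all hypothetical.

```python
def apply_changes(target, change_log):
    """Replay captured changes, in order, against a target table.

    target: dict mapping primary key -> row (a stand-in for a table).
    change_log: list of dicts with 'op', 'key' and, for writes, 'row'.
    """
    for change in change_log:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            target[key] = change["row"]
        elif op == "delete":
            target.pop(key, None)  # tolerate replays of the same delete
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target


# Made-up target state and captured changes from the source system.
target_table = {1: {"name": "Ada", "tier": "gold"}}
captured = [
    {"op": "insert", "key": 2, "row": {"name": "Lin", "tier": "silver"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "tier": "platinum"}},
    {"op": "delete", "key": 2},
]

apply_changes(target_table, captured)
print(target_table)
```

Only the changes travel to the target, not the full table, which is what lets replication keep up with large sources without interrupting them.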

Cohort Analysis
Cohort Analysis comes at the customer question from a different direction. The idea is to choose specific customers first, and then dimensionally analyze their actions and the reactions of related parties.
Ad-hoc cohort analysis requires raw horsepower for dynamic joins at the customer key level.

For performance reasons, cohort analysis processing should take place mostly on the EDW server, not the reporting tool platform.  In addition, certain aggregate tables and views may be required in the back-end EDW.
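A small example of the "choose customers first, then analyze their actions" idea: group customers into cohorts by the month of their first order, then count how many from each cohort are active in each later month. This is a minimal in-memory sketch with made-up data and a hypothetical function name; at warehouse scale the same join at the customer key would run on the EDW server, as noted above.

```python
from collections import defaultdict

# Made-up order log: (customer key, order month as "YYYY-MM").
orders = [
    ("alice", "2024-01"), ("alice", "2024-02"),
    ("bob", "2024-01"),
    ("carol", "2024-02"), ("carol", "2024-03"),
]


def cohort_activity(orders):
    """Count active customers per (cohort month, activity month)."""
    # Step 1: pick the cohort, here the month of each customer's first order.
    # "YYYY-MM" strings sort chronologically, so a plain sort is enough.
    first_month = {}
    for customer, month in sorted(orders, key=lambda order: order[1]):
        first_month.setdefault(customer, month)

    # Step 2: join every order back to its customer's cohort.
    active = defaultdict(set)
    for customer, month in orders:
        active[(first_month[customer], month)].add(customer)

    return {key: len(customers) for key, customers in active.items()}


activity = cohort_activity(orders)
print(activity)
```

Reading the result row by row shows, for example, that two customers joined in January and one of them ordered again in February.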

Complex Event Processing (CEP)
It combines data from multiple sources to detect patterns and attempt to identify either opportunities or threats. The goal is to identify significant events and respond fast. Sales leads, orders or customer service calls are examples.
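As a rough illustration of detecting a pattern in an event stream, the sketch below flags any customer who makes three service calls within an hour. Everything specific here is an assumption made for the example: the rule itself, the event tuple shape, timestamps expressed as minutes, the function name, and the requirement that events arrive already merged from their sources and ordered by time.

```python
from collections import defaultdict, deque


def detect_repeat_callers(events, threshold=3, window_minutes=60):
    """Return (timestamp, customer) alerts for bursts of service calls.

    events: time-ordered tuples of (timestamp_minutes, customer, kind).
    """
    recent_calls = defaultdict(deque)  # customer -> call times in window
    alerts = []
    for timestamp, customer, kind in events:
        if kind != "service_call":
            continue  # other event kinds are ignored by this rule
        calls = recent_calls[customer]
        calls.append(timestamp)
        # Drop calls that have fallen out of the sliding window.
        while timestamp - calls[0] > window_minutes:
            calls.popleft()
        if len(calls) >= threshold:
            alerts.append((timestamp, customer))
    return alerts


# Made-up merged stream of orders and service calls.
events = [
    (0, "c1", "service_call"),
    (10, "c2", "order"),
    (20, "c1", "service_call"),
    (50, "c1", "service_call"),
    (200, "c1", "service_call"),
]

alerts = detect_repeat_callers(events)
print(alerts)
```

Because each event is evaluated as it arrives, the alert fires at the moment the third call lands, which is the "respond fast" part of the definition.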

Content Marketing

The art of communicating with your customers and prospects without selling.

Content marketing’s purpose is to attract and retain customers by consistently creating and curating relevant and valuable content with the intention of changing or enhancing consumer behavior. It is an ongoing process that is best integrated into your overall marketing strategy, and it focuses on owning media, not renting it.

It has three key components (messaging, promotions, and navigation path) that together influence your customers.


The essence of this content strategy is the belief that if we, as businesses, deliver consistent, ongoing valuable information to buyers, they ultimately reward us with their business and loyalty. Optimal communication with your customer means communicating with them individually.

Click to Learn More

Customer Response Models

Response models use data mining to find similarities between responders from previous marketing campaigns to predict who is likely or not likely to respond to a future campaign. The model is then scored against the prospects of the new campaign and a marketer can choose to mail only those people that are most likely to purchase. This increases conversions and decreases costs by only mailing to those most likely to respond.

The models include

  • Recency, Frequency, Monetary (RFM) models (Click to Learn More from Canopy Labs)
  • Traditional Response or Regression Models
  • Multi-Channel Customer Level Response Models (Click to Learn More from Avinash Kaushik, 2012)
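To make the first model in the list concrete, here is a minimal sketch that derives recency, frequency, and monetary values from an order history. The order tuples, the function name, and the choice to rank by recency alone are assumptions for illustration, not a method taken from the sources cited above.

```python
from datetime import date

def rfm(orders, today):
    """Return {customer: (recency_days, frequency, monetary)} from (customer, date, amount) rows."""
    stats = {}
    for customer, order_date, amount in orders:
        last, freq, total = stats.get(customer, (order_date, 0, 0.0))
        stats[customer] = (max(last, order_date), freq + 1, total + amount)
    return {c: ((today - last).days, freq, total) for c, (last, freq, total) in stats.items()}

# Hypothetical order history.
orders = [
    ("ann", date(2016, 5, 1), 50.0),
    ("ann", date(2016, 6, 1), 30.0),
    ("bob", date(2016, 1, 15), 20.0),
]
scores = rfm(orders, today=date(2016, 6, 11))

# Mail the most recent buyers first (smaller recency is better).
ranked = sorted(scores, key=lambda c: scores[c][0])
```

A real response model would usually bin each of the three values into scores and validate them against past campaign responses before choosing whom to mail.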

Click to Learn More

Data Gravity

Consider Data as if it were a Planet or other object with sufficient mass. As Data accumulates (builds mass) there is a greater likelihood that additional Services and Applications will be attracted to this data. This is the same effect Gravity has on objects around a planet.  As the mass or density increases, so does the strength of gravitational pull.  As things get closer to the mass, they accelerate toward the mass at an increasingly faster velocity.

Services and Applications can have their own Gravity, but Data is the most massive and dense, therefore it has the most gravity.  Data if large enough can be virtually impossible to move.

Click to Learn More

Data Integration

Data integration is the process of combining data from many different sources into an application. You need to deliver the right data in the right format at the right timeframe to fuel great analytics and business processes.

A data integration project usually involves the following steps:

    • Accessing data from all its sources and locations, whether those are on premises or in the cloud or some combination of both.
    • Integrating data, so that records from one data source map to records in another (e.g., even if one dataset uses “lastname, firstname” and another uses “fname, lname,” the integrated set will make sure both end up in the right place).  This type of data preparation is essential for analytics or other applications to be able to use the data with any success.
    • Delivering integrated data to the business exactly when the business needs it, whether it is in batch, near real time, or real time.
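The mapping step above can be sketched as follows. The field names, and the reading of “lastname, firstname” as a single combined field, are assumptions made for this example.

```python
def from_lastname_first(record):
    """Source A (assumed): a single 'name' field holding 'lastname, firstname'."""
    last, first = [part.strip() for part in record["name"].split(",", 1)]
    return {"first_name": first, "last_name": last}

def from_fname_lname(record):
    """Source B (assumed): separate 'fname' and 'lname' fields."""
    return {"first_name": record["fname"].strip(), "last_name": record["lname"].strip()}

source_a = [{"name": "Lovelace, Ada"}]
source_b = [{"fname": "Grace", "lname": "Hopper"}]

# Both sources end up in one common schema.
integrated = [from_lastname_first(r) for r in source_a] + [from_fname_lname(r) for r in source_b]
```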

Click to Learn More

Data Preparation

Data preparation covers discovery, profiling, sampling, cleansing, filtering, structuring and transforming, carried out in an iterative, exploratory fashion to put the data into a form suitable for meaningful analysis. As a result of this process, repeatable data transformations are established that can be integrated into data science and analytical workflows.

Figure 2. The Iterative Nature of Data Preparation

Figure 3. Improve Your Understanding of Data Through Data Preparation

Gartner: Data Preparation is Not an Afterthought, 7 November 2014

Data Quality
You have a data quality problem if the data doesn’t mean what you think it does, or should:

  • Data not up to spec: garbage in, glitches, etc.
  • You don’t understand the spec: complexity, lack of metadata.

Data Warehouse

Businesses need historic data for a number of reasons including:

    • Trend analysis of past behaviour
    • Customer history for individual enquiries
    • Regulatory reporting and retention requirements

Click to Learn More

Demand Generation (vs Lead Generation)

Drives awareness of and interest in a company’s (and industry’s) products and services, and changes or shapes perspectives on them. It includes a mix of inbound and outbound marketing. The goal is:

  • To drive closed business with minimal interaction with the consumer or business you’re attracting.
  • Brand awareness
  • Positioning
  • Create interest and change perspectives.

If the conversion can be applied online with no interaction with the company, demand generation is critical to driving awareness, trust, and authority with your products and services. Companies often have an inbound sales team to respond to demand-generated sales requests.


To affect the largest possible share of your audience, barriers to discovering, consuming, and sharing your content must be removed. Maximizing demand generation requires removing registration capture (and, therefore, lead generation) from the primary flow.


Your audience is more likely to purchase your products or services

Commentary from Maxim Kind: 

It is an ingredient to a successful marketing culture.


Click to Learn More from Marketing Tech Blog

Click to Learn More from Lead For Mix Blog

Click to Learn More from Content Marketing Institute, 2014

Demographic Gating

Behavioral Scoring + Demographic Scoring = Demographic Gating

Note: Specifically for B2B marketers, the most important piece of demographic gating is that it reveals what your actual cost per qualified lead is (and not just your cost per lead).

With marketing automation, lead scoring should be treated as a science, so you can get an accurate indicator of both sales readiness and sales fitness. This allows you to prioritize leads effectively and measure marketing effectiveness. Most marketers start with behavioral scoring, and then they throw in some demographic scoring if they have time. However, when it comes to lead qualification, your demographic measures are key.

The demographic gating model is a step up from normal demographic scoring. Rather than just telling you who is qualified, demographic gating informs you exactly how targeted your marketing efforts are, based on the percentage of people who are demographically qualified. It looks at all of your leads (what initiative they came in through, where they came from, such as social, paid social, or SEM) and then determines exactly how targeted your efforts are, based on your channels and marketing initiatives.
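A small worked example may help show why cost per qualified lead differs from cost per lead. The leads, channel spend figures, and function name below are made up for illustration.

```python
# Hypothetical leads: (channel, demographically_qualified).
leads = [
    ("SEM", True), ("SEM", False), ("SEM", False), ("SEM", True),
    ("paid social", False), ("paid social", False),
    ("paid social", True), ("paid social", False),
]
# Hypothetical spend per channel.
spend = {"SEM": 400.0, "paid social": 200.0}

def gating_report(leads, spend):
    """Per channel: share of qualified leads, cost per lead, cost per qualified lead."""
    report = {}
    for channel, cost in spend.items():
        flags = [qualified for ch, qualified in leads if ch == channel]
        if not flags:
            continue  # no leads recorded for this channel
        qualified = sum(flags)
        report[channel] = {
            "pct_qualified": qualified / len(flags),
            "cost_per_lead": cost / len(flags),
            "cost_per_qualified_lead": cost / qualified if qualified else None,
        }
    return report

report = gating_report(leads, spend)
```

In this made-up data, paid social looks twice as cheap per lead (50 vs. 100), yet both channels cost the same per qualified lead (200), because only a quarter of the paid social leads pass the demographic gate.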

Click to Learn More from Etumos

Digital Analytics
Digital analytics data is organized into a general hierarchy of users, sessions and hits. It doesn’t matter where the data comes from: it could be a website, a mobile app or a kiosk. This model works for web, apps or anything else. A pageview is one of the fundamental metrics in digital analytics.

Digital analytics data is organized into a hierarchy of hits, sessions and users.
Click to Learn More


Google Analytics – Describes characteristics of your users, their sessions and actions. The dimension City describes a characteristic of sessions and indicates the city, for example, “Paris” or “New York”, from which each session originated.

Not every metric can be combined with every dimension. Each dimension and metric has a scope: users, sessions, or actions. In most cases, it only makes sense to combine dimensions and metrics that share the same scope. For example, Sessions is a session-based metric so it can only be used with session-level dimensions like Source or City. It would not be logical to combine Sessions with an action-level (or, hit-level) dimension like Page.
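The scope rule can be pictured as a simple lookup. The sketch below covers only the names mentioned in the paragraph above; the table and function are illustrative and are not part of any Google Analytics API.

```python
# Scopes for the names used in the text; the mapping itself is illustrative.
SCOPE = {
    "Sessions": "session",  # metric
    "Source": "session",    # dimension
    "City": "session",      # dimension
    "Page": "hit",          # dimension (action-level)
}

def compatible(metric, dimension):
    """A metric and a dimension combine sensibly when they share a scope."""
    return SCOPE[metric] == SCOPE[dimension]

sessions_by_city = compatible("Sessions", "City")
sessions_by_page = compatible("Sessions", "Page")
```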
Click to Learn More from Google

Data from mobile apps is not sent to Analytics right away. When a user navigates through an app, the Google Analytics SDK stores the hits locally on the device and then sends them to your Google Analytics account later in a batch process called dispatching. Dispatching is necessary for two reasons:

  • Mobile devices can lose network connectivity, and when a device isn’t connected to the web, the SDK can’t send any data hits to Google Analytics.
  • Sending data to Google Analytics in real time can reduce a device’s battery life.
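The store-then-send behavior described above can be sketched with a small queue. The class and method names are hypothetical and are not the Google Analytics SDK's API. The sketch only shows the idea of holding hits locally and sending them later as one batch.

```python
class HitQueue:
    """Hypothetical local hit store; not the Google Analytics SDK API."""

    def __init__(self, send):
        self._send = send    # callable that uploads a list of hits
        self._pending = []

    def track(self, hit):
        self._pending.append(hit)  # stored locally, nothing sent yet

    def dispatch(self, online):
        """Send all pending hits in one batch if the device is online."""
        if not online or not self._pending:
            return 0
        batch, self._pending = self._pending, []
        self._send(batch)
        return len(batch)

sent_batches = []
queue = HitQueue(sent_batches.append)
queue.track({"type": "screenview", "screen": "home"})
queue.track({"type": "event", "action": "tap"})
offline_result = queue.dispatch(online=False)  # nothing leaves the device
online_result = queue.dispatch(online=True)    # both hits go out together
```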
Click to Learn More

Without the power afforded by an EDW, development staff may be needed to set up temporary database structures and batch processing–not good.
Click to Learn More

Enterprise Resource Planning (ERP)
Enterprise resource planning (ERP) is business process management software that allows an organization to use a system of integrated applications to manage the business and automate many back office functions related to technology, services and human resources.

The central feature of all ERP systems is a shared database that supports multiple functions used by different business units. In practice, this means that employees in different divisions—for example, accounting and sales—can rely on the same information for their specific needs.
Click to Learn More

Click to Learn More from NetSuite ERP (their study guide on ERP)

Frames Per Second (FPS)
Many factors contribute to an app’s frame rate, and there are various ways to code JavaScript and CSS to reduce or eliminate jank and achieve the desired rate.
Click to Learn More

Frames and frame rate: You should know how the browser constructs frames and why the 60fps rate is important for a smooth display. Learn more here: Click to Learn More from Google

Click to Learn More from Udacity course on Browser Rendering Optimization: Building 60 FPS Web Apps.

At 60fps, each frame has a budget of just over 16ms (1 second / 60 = 16.67ms). In reality, however, the browser has housekeeping work to do, so all of your work needs to be completed inside 10ms. When you fail to meet this budget the frame rate drops, and the content judders on screen. This is often referred to as jank, and it negatively impacts the user’s experience.
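The budget arithmetic above can be checked with a few lines. This is a simplified sketch: it uses the full per-frame budget and ignores the browser's own housekeeping time, and the function names are made up for the example.

```python
import math

def frame_budget_ms(fps=60):
    """Total time available per frame, in milliseconds."""
    return 1000.0 / fps

def frames_consumed(work_ms, fps=60):
    """Number of frame slots a block of work occupies (1 means no dropped frame)."""
    return max(1, math.ceil(work_ms / frame_budget_ms(fps)))

budget = round(frame_budget_ms(60), 2)
quick = frames_consumed(10)  # fits inside a single frame
slow = frames_consumed(40)   # spills across several frames, seen as jank
```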
Click to Learn More from Google

Google Analytics – The number of days elapsed since users last visited your property. Used to calculate user loyalty.

A request for a small image file. This image request is how the data is transmitted from a website or app to the data collection server. There are many different kinds of hits depending on your analytics tool. Here are some of the most common hits in Google Analytics:

Pageviews/Screenviews: A pageview (for web, or screenview for mobile) is usually automatically generated and measures a user viewing a piece of content. A pageview is one of the fundamental metrics in digital analytics. It is used to calculate many other metrics, like Pageviews per Visit and Avg. Time on Page.

Events: An event is like a counter. It’s used to measure how often a user takes action on a piece of content. Unlike a pageview which is automatically generated, an event must be manually implemented. You usually trigger an event when the user takes some kind of action. The action may be clicking on a button, clicking on a link, swiping a screen, etc. The key is that the user is interacting with content that is on a page or a screen.

Transactions: A transaction is sent when a user completes an ecommerce transaction. You must manually implement ecommerce tracking to collect transactions. You can send all sorts of data related to the transaction including product information (ID, color, sku, etc.) and transactional information (shipping, tax, payment type, etc.)

Social interaction hit: A social interaction is whenever a user clicks on a ReTweet button, +1 button, or Like button. If you want to know if people are clicking on social buttons then use this feature! Social interaction tracking must be manually implemented.

Customized user timings: User timings provide a simple way to measure the actual time between two activities. For example, you can measure the time between when a page loads and when the user clicks a button. Custom timings must be implemented with additional code.
Click to Learn More

Hit, Engagement (Engagement Hit): any hit that is not marked as “non interaction” and is not filled only with custom variable information. This means that the hit has at least page information, ecommerce transaction information, ecommerce item information, event information or social tracking information.
Click to Learn More

Internet of Things

The Business case for the Internet of Things, and indeed for everything becoming network connected, lies in the positive outcomes enabled by leveraging the data collected from the instrumentation of the business as a whole. It’s not just about collecting more data from existing applications, it’s about being able to obtain data in new ways. The combination of this new data and traditional analytics is expected to add Business value via ‘Insights’ that come from marrying these two previously isolated data sets in real time. Robust Data Flow management is required to handle the veracity, velocity, variety, and volume of IoT Data Streams when combining them with Insight Analytics.

Click to Learn More

Jank or Judder
Apps whose displays tend to jump raggedly during animations, scrolling, or other user interaction suffer from a performance issue.

The jank phenomenon can be avoided by ensuring that an app runs at a consistent sixty frames per second (60fps).
Click to Learn More

Monitoring KPIs and analytics are obviously helpful for assessing performance and efficacy, but they’re much more than that. “They are early indicators of where resources should be allocated, either to improve problems, or increase growth or revenue,” says Hugo Smoter, director of marketing for ecommerce site Spreadshirt. Keeping an eye on metrics can make sure you’re spending marketing dollars wisely and targeting the people who are most inclined to make a purchase.
Click to Learn More

Lead Generation (vs. Demand Generation)

Drives interest or inquiry into products or services. The goal is the collection of qualified connections to build relationships with and nurture until closed as a customer. It is focused on capturing their information and is sales-centric (while demand generation is marketing-centric).

If your conversion requires sales interaction, negotiation, or longer sales cycles, lead generation is critical to target and acquire qualified sales leads nurtured through to a close.

If you are attempting to generate leads, though, you may put more emphasis on the company’s expertise and on how establishing a long-term relationship between the two parties would be strategically valuable.

How to Measure:
Lead scoring is critical – understanding whether the lead is ideal, has the available budget, is close to a purchase decision. Longer sales cycles, multi-step engagements, and enterprise sales require a lead generation strategy and process.


New contacts available for sales or marketing


Click to Learn More from Marketing Tech Blog

Click to Learn More from Lead For Mix Blog

Click to Learn More from Content Marketing Institute, 2014


A market consists of all the potential customers sharing a particular need or want who might be willing and able (i.e., have the propensity) to engage in exchange to satisfy that need or want.

Click to Learn More from Kotler, P. (2004), Marketing Management, Prentice-Hall, Englewood Cliffs, NJ.


Multidisciplinary (economics, psychology, sociology, history, statistics…) management process of identifying and satisfying consumer and organizational needs profitably. Marketing is the process of planning and executing the conception, pricing, promotion, and distribution of ideas, goods, and services to create exchanges that satisfy individual and organizational goals.

Transactional Marketing (Website) vs/and Relationship Marketing (Website)

Drucker Line:

  • “Business has only two functions — marketing and innovation. All the rest are costs.”
  • “The aim of marketing is to know and understand the customer so well the product or service fits him and sells itself.”
  • “The aim of marketing is to make selling unnecessary.”

Click to Learn More

The ultimate goal of marketing is to change consumer behavior. Thus, the true measure of marketing effectiveness is not how likely a customer is to buy, but whether and by how much marketing increases a customer’s likelihood to buy. For that reason, any method used for digital attribution must be based on estimates of incremental effects. In addition, the incremental effects must express causality, not just correlation.
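One common way to estimate such an incremental effect is to compare a treated group with a holdout group. The sketch below assumes customers were assigned to the two groups at random, which is what allows the difference to be read as causal. The numbers and function names are illustrative.

```python
def conversion_rate(conversions, audience):
    return conversions / audience

def incremental_lift(treated, control):
    """Each argument is (conversions, audience size). Returns (absolute, relative) lift."""
    p_treated = conversion_rate(*treated)
    p_control = conversion_rate(*control)
    absolute = p_treated - p_control
    relative = absolute / p_control if p_control else None
    return absolute, relative

# Hypothetical campaign: 6% convert with marketing, 5% convert without it.
absolute, relative = incremental_lift(treated=(60, 1000), control=(50, 1000))
```

Here the treated group's 6% likelihood to buy is not the measure of effectiveness. The one-point increase over the control group is.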

Operational Intelligence (OI)

Operational Intelligence (OI) uses real time data processing and CEP to gain insight into operations by running query analysis against live feeds and event data. OI is near real time analytics over operational data and provides visibility over many data sources. The goal is to obtain near real time insight using continuous analytics to allow the organization to take immediate action.

Contrast this with operational business intelligence (BI), which is descriptive or historical analysis of operational data. OI’s real-time analysis of operational data has much greater value.

Real time OI can also monitor social media allowing an organization the ability to react to negative activities (e.g., tweets or posts) to mitigate effects in a timely fashion before they snowball into something ugly and potentially damaging.
Click to Learn More

Optimization (part of KPI)
Characterized by maximizing efficiency
Click to Learn More

Optimization relies on predictive models that track non-linear relationships between specific goals and spend levels in order to “predict” the incremental changes in conversions based on the relationship between the variables. Many organizations attempt to “optimize” campaigns via A/B testing, a form of scenario analysis. Unfortunately A/B testing doesn’t address the complex non-linear interactions. An algorithmic approach that simultaneously analyzes all possible scenarios is needed to see which combinations produce the best incremental results.
Click to Learn More from Nimble Blog, July 29, 2014

Utilizing Scrolling website capabilities

Types of Parallax

  • Layers – Multiple background and foreground layers are defined which can move in horizontal or vertical directions, and scroll at various speeds, some are automatically controlled and others are dependent on user interaction, and can also be set in a composite.
  • Sprite – Combining many images or bitmaps into pseudo-layers to create a single image, whereby a flat image can also appear to be three-dimensional, and where only one part of the image is displayed depending on the position.
  • Repeated pattern manipulation – Multiple tiles or screens appear to float over repeated backgrounds.
  • Raster – Lines of pixels within an image are typically composited and refreshed in a top-to-bottom order with a slight delay between drawing one line and the next line.

Click to Learn More

Pixel Pipeline

The full pixel pipeline

  • JavaScript. Typically JavaScript is used to handle work that will result in visual changes, whether it’s jQuery’s animate function, sorting a data set, or adding DOM elements to the page. It doesn’t have to be JavaScript that triggers a visual change, though: CSS Animations, Transitions, and the Web Animations API are also commonly used.
  • Style calculations. This is the process of figuring out which CSS rules apply to which elements based on matching selectors, e.g. .headline or .nav > .nav__item. From there, once rules are known, they are applied and the final styles for each element are calculated.
  • Layout. Once the browser knows which rules apply to an element it can begin to calculate how much space it takes up and where it is on screen. The web’s layout model means that one element can affect others, e.g. the width of the <body> element typically affects its children’s widths and so on all the way up and down the tree, so the process can be quite involved for the browser.
  • Paint. Painting is the process of filling in pixels. It involves drawing out text, colors, images, borders, and shadows, essentially every visual part of the elements. The drawing is typically done onto multiple surfaces, often called layers.
  • Compositing. Since the parts of the page were drawn into potentially multiple layers they need to be drawn to the screen in the correct order so that the page renders correctly. This is especially important for elements that overlap another, since a mistake could result in one element appearing over the top of another incorrectly.

Click to Learn More from Google

Per Request Attribution
Google Analytics – This attribution gives aggregate values for a single metric or for a metric/dimension pairing
Click to Learn More from Google

Pivots (part of KPI)
Pivots are characterized by maximizing learning
Click to Learn More

Developing internal stakeholders’ capabilities to integrate the new technology and process; predominately driven by change management and knowledge exchange between capabilities developers (consultants, vendors, internal consultants, cross-department teams, etc) and end-users.

For technology capabilities (such as Hadoop, Tableau, etc), a non-comprehensive list of needs goes like this:

– Provides error handling and failure recovery.
– Has logging and monitoring.
– Data security (encryption, access authorization…).
– Deployment into Production vs. UAT environment or data folder structure and deployment automation.
– Provide easy access to data for production support for investigation purposes.
– Disaster recovery (DR).
– Master Data management: policies around data frequencies, source availability
– Concepts of Data Quality: enforcement through metadata driven rules, hierarchies/attributes.
– Have a testing and integration procedure cycle: from unit testing to user acceptance testing.
– Multi-tenancy, with the assumption that the product of choice is shared across projects. Attention must be given to storage and processing capacity planning.
– Business process integration: policies around data frequencies, source availability.
– Lifecycle management: data retention, purge schedule, storage, archival.
– Metadata: data definition, catalog, lineage.
Click to Learn More

Programmatic Marketing
Sometimes referred to as Flow Advertising.
Gives the marketer access to more data for better decisions.
Click to Learn More


A method of announcing your product or service using more dynamic means you can more easily modify or change. Examples include coupons; sales; celebrity endorsements; event, team or league sponsorships; contests; rebates; free samples; catalogs; social media; donations; and direct mail.

Click to Learn More


The full pixel pipeline
(Pixel Pipeline)

Used in conjunction with paint. This is because painting is actually two tasks: 1) creating a list of draw calls, and 2) filling in the pixels. The latter is called “rasterization” and so whenever you see paint records in DevTools, you should think of it as including rasterization. (In some architectures creating the list of draw calls and rasterizing are done in different threads, but that isn’t something under developer control.)
Click to Learn More

Real Time Data Processing
Real time data processing involves a continual input, process and output of data. Data must be processed in a small time period (or near real time). Radar systems, customer services and bank ATMs are examples.
Real time data processing and analytics allows an organization the ability to take immediate action for those times when acting within seconds or minutes is significant. The goal is to obtain the insight required to act prudently at the right time – which increasingly means immediately.
Click to Learn More


Service-Oriented Architecture (SOA)

Service Oriented Architecture (SOA) is a business-centric IT architectural approach that supports integrating your business as linked, repeatable business tasks, or services.
Click to Learn More

A collection of hits, from the same user, grouped together. By default, most analytics tools, including Google Analytics, will group hits together based on activity. When the analytics tool detects that the user is no longer active it will terminate the session and start a new one when the user becomes active.
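Grouping hits by activity can be sketched as splitting one user's hit timestamps wherever the gap exceeds an inactivity timeout. The 30-minute value used here is an illustrative, configurable assumption, and the function name is made up.

```python
def sessionize(timestamps, timeout_s=1800):
    """Split one user's hit times (in seconds) into sessions on inactivity gaps."""
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= timeout_s:
            sessions[-1].append(t)  # user still active: same session
        else:
            sessions.append([t])    # gap too long: start a new session
    return sessions

# Hypothetical hit times for one user, in seconds.
hits = [0, 60, 600, 5000, 5100]
sessions = sessionize(hits)
```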
Click to Learn More

Shapley value

It describes a way to assign credit among a group of players who cooperate for a certain end.

Truth is, most of your target customers are probably exposed to different messages on different channels. Yesterday a banner ad and a TV spot, today a search ad, two more banners, an e-mail and a magazine spread. More money, more problems. You’d like to know what’s working and what isn’t — that is, what’s contributing how much to that final sale — and thus: attribution modeling.
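To show how the Shapley value splits credit, the sketch below averages each channel's marginal contribution over every order in which the channels could have joined. The two channels and their conversion counts are made up, and this brute-force approach is only practical for a handful of players. It is not a description of how any particular analytics product implements attribution.

```python
from itertools import permutations

def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    credit = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            credit[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orders) for p, total in credit.items()}

# Hypothetical conversions produced by each combination of channels.
conversions = {
    frozenset(): 0,
    frozenset({"search"}): 10,
    frozenset({"email"}): 4,
    frozenset({"search", "email"}): 20,
}
credit = shapley_values(["search", "email"], conversions.__getitem__)
```

With these numbers, search receives 13 conversions of credit and e-mail receives 7, which together account for all 20 conversions of the combined campaign.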

There are still important things no attribution model can do (yet, as of June 2016):

  1. Include all relevant data — offline media and online media viewed on different devices are generally not in the model (although Google’s Universal Analytics is starting to bridge this gap)
  2. Measure brand or social impact — attribution works best with e-commerce variables such as online sales; it doesn’t incorporate success metrics important to brand advertisers, such as consideration, or positive impact on social conversation
  3. Go very deep — the models work best at the channel level (e.g., paid search vs. e-mail) and get less useful, even irrelevant, as you drill down into detail (e.g., sites, placements, sizes, creative versions)
  4. Tell you why — you may find out e-mail contributed a lot more than you thought, but you’re on your own figuring out why
  5. Be easy to act on — without knowing why something worked (or didn’t), you may find yourself with more questions to deal with (e.g., “Would a cat have done better in that video banner?”), before you’re comfortable shifting budget around

Click to Learn More how Google uses this theory in Google Analytics

Click to Learn More of applying Shapley Value from Michael Conklin and Stan Lipovetsky (2000)

Shared Information Model

In order to scale, system software architecture must be organized around a common “shared information model” that spans multiple systems. This leads us to the principle of “data-oriented” design: expose the data and hide the code.

As more devices and systems get woven into the fabric of our networked world, the scale and the complexity of integration is growing at a rapid pace. Existing methodologies and training for system software design, rooted in principles of object-oriented design, that worked superbly for small scale systems begin to break down as organizations discover operational limits that require frequent and unintended redesigns in programs year over year. Fundamentally, object-oriented thinking leads firms to think in terms of tightly-coupled interactions that include strong state assumptions. Large scale distributed systems are often a mix of subsystems created by independent parties, often using different middleware technologies, with misaligned interfaces. Integrating such sub-systems using object-oriented thinking poses some fundamental challenges:
(1) it is brittle to incremental and independent development, where interfaces can change without notice;
(2) there is often an “impedance mis-match” between sub-systems in the quantity and the quality of information that must be exchanged between the two sides;
(3) there is a real need to dynamically adapt in real-time to network topology reconfigurations and failures;
(4) scalability, performance, and up-time cannot always be compromised in this dynamic environment.

A different paradigm is needed in order to address these new challenges in a systematic manner. As the scale of the integration and complexity grows, the only unifying common denominators between disparate sub-systems (generally numbering more than two) are:
(1) the data they produce and consume;
(2) the services they use and offer.

Click To Learn More from 2007

Software Development Kit (SDK)
Instead of using JavaScript to collect data as you do on a website, you’ll use an SDK, or Software Development Kit, to collect data from your mobile app. There are different SDKs for different operating systems, including Android and iOS.
Click to Learn More

Spurious Correlations
The traditional correlation is referred to as the weak correlation, as it captures only a small part of the association between two variables. Relying on weak correlation leads to spurious correlations and predictive modeling deficiencies, even with as few as 100 variables. In short, the strong correlation (with a value between 0 and 1) is high (say above 0.80) only if the weak correlation is also high (in absolute value) and the internal structures (auto-dependencies) of the two variables being compared, X and Y, exhibit a similar pattern or correlogram.
Click to Learn More

A tag is a piece of code — often JavaScript — that can read cookies and drop cookies onto a user’s device or perform other tasks. A tag is typically used to register a page visit or collect basic information such as transaction amount

Tag code can be meant to gather analytics data, or to cookie a site visitor you plan to retarget later, or perhaps it serves advertising. Some of these are colloquially called “pixels,” because they serve the same purpose as an old-school 1×1 pixel “clear gif” or “transparent gif” — they call to a third-party server and allow for the tracking of something.

Tag placement requires access to the site code, and as such, requires the time of an IT person. Typically, those people sit on a different team from marketing and a conflict of time and resources can occur.

Alternative to Tag
The alternative then is to pass this data through a backend process, often called server-to-server integration or server-direct. Now, when the tag fires, live basic data can go through the cookie, and the supplemental data through the backend connection (note that cookie mapping is required). This keeps the load for the browser light, so the user experience isn’t hindered despite all the additional data being used.

Some even go a step further, collecting all the information themselves for all the tags, and passing everything through the servers.
Click To Learn More

The tool’s best guess of an anonymous person.
To measure a user on a website almost all analytics tools use a cookie.
Click To Learn More

User Inertia

For business users, it’s simply easier to select applications from a given cloud provider via a drop-down menu than to source a much better alternative. So applications that might be good enough, but not great, may solve a need today, but there’s no thinking as to whether those same applications are truly the best fit, or can meet tomorrow’s needs.

Because of vendor lock-in, data gravity, and user inertia for application use, there’s a real danger that companies that once cobbled together various best of breed technologies for competitive advantage now must rely on a monolithic cloud entity to introduce innovative technologies at a fast enough pace to keep up with changing market conditions. This is a risky proposition.

Click to Learn More

Click to Learn More

Web Conference
Tends to be smaller groups of people and tend to be much more interactive in nature. There tends to be application/desktop/screen sharing with a much higher likelihood for transfer of control to participants. Other features may involve user polling, meeting recording, and online collaboration via the use of a whiteboard and drawing tools.
Click to Learn More

Very similar to other forms of “broadcasting” information, because they use the internet to broadcast information that is either live or has been recorded. This can be audio and/or video transmission. For example, a company may broadcast a recording of a press conference at a later date, in addition to doing the live call.
Click to Learn More


Involve larger groups of participants, usually involve PowerPoint presentations or slide shows, and are not as interactive. They may involve multiple presenters presenting to the group. They often require user registration in advance via email. They normally use self-branding, where the general idea is to use your own company’s logo, etc. to brand the conference room.

Although similar to web conferences, webinars are much less likely to have control passing, remote control, or video conferencing. Vendors often price their product differently for webinars, figuring that they will involve larger groups of people and that you will want more sophisticated usage, signup, and statistical reporting after the webinar.
Click to Learn More