Things I learned from browsing websites

Good anecdote from HBR: Machine learning excels at predicting things. It can inform decisions that hinge on a prediction and where the thing to be predicted is clear and measurable.

Power of Analytics - Predictive Analytics is about tomorrow


1. Study One Factor — Using basic spreadsheet software, study historic trends in your business to forecast expected revenue tomorrow, next week, or next year, which is useful for setting budgets and goals. Data scientists call this kind of analysis “univariate time series” because you look at only one variable over time, ignoring how other factors might come into play. For example, you might look at the timing of offers you have made and how well they have done.

2. Study Two Factors — Begin using what is called “correlation analysis” to predict customer behavior, and start gaining control over future revenue. Correlation analysis looks at two trends or factors to see how they relate and whether one might be able to predict the other. You can use ordinary spreadsheet software. For example, you might add holidays and the school-year calendar to your analysis in step one. Then, you may notice a correlation between the start of spring break and how successful your offer was. You see the opportunity to make timing decisions regarding your offers that take into account a greater awareness of the customer’s needs.
3. Study Three or More Factors — This is known as “multivariate regression.” Some of it can be done with spreadsheets, but at this stage most companies turn to specialized data-driven marketing software. Most spreadsheet software has limitations: if your software allows a million rows, that will not be enough if you have 10 million customers. But here you can start to see the power this analysis brings. Using our example above, what if you added household income, number of children, and children’s ages to the analysis? You can see how you could more accurately target your ideal customer and properly allocate precious marketing resources.
4. Leverage Real-Time Data — Imagine using multivariate analysis based on data collected in real time, predicting customers’ behaviors instantly, and delivering the appropriate content at the moment they need to see it. This is the most advanced level of analysis, and it only scratches the surface of what is possible.
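
Steps one and two above can be sketched in a few lines of Python. This is only an illustration under assumed data: the weekly revenue figures, the spring-break flags, and the helper names moving_average_forecast and pearson_correlation are all made up here, and a moving average is just one simple way to do a univariate forecast.

```python
from math import sqrt


def moving_average_forecast(history, window=3):
    """Step 1: forecast the next value as the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def pearson_correlation(xs, ys):
    """Step 2: Pearson correlation between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Hypothetical weekly revenue from an offer (one variable over time).
weekly_revenue = [100, 120, 110, 180, 115, 125]
# Hypothetical second factor: 1 if the week overlapped spring break, else 0.
spring_break = [0, 0, 0, 1, 0, 0]

forecast = moving_average_forecast(weekly_revenue, window=3)
r = pearson_correlation(spring_break, weekly_revenue)

print(f"Next-week forecast: {forecast:.1f}")
print(f"Correlation between spring break and revenue: {r:.2f}")
```

A correlation close to 1 or -1 suggests the second factor is worth adding to the forecast. It does not show that spring break causes the higher revenue.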


Hype around machine learning 

Machine learning experts wanted to spend their time building models, not processing massive datasets or translating business problems into prediction problems. Likewise, the current technological landscape, both commercial and academic, focuses on enabling more sophisticated models (via latent variable models), scaling model learning algorithms (via distributed computing), or fine-tuning (via Bayesian hyperparameter optimization), essentially all later stages of the data science pipeline.

If companies want to get value from their data, they need to focus on accelerating human understanding of data, scaling the number of modeling questions they can ask of that data in a short amount of time, and assessing their implications. In our work with companies, we ultimately decided that creating true impact via machine learning will come from a focus on four principles:

Stick with simple models: We decided that simple models, like logistic regression or those based on random forests or decision trees, are sufficient for the problems at hand. The focus should instead be on reducing the time between the data acquisition and the development of the first simple predictive model.
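
To give a sense of how small a "first simple predictive model" can be, here is a rough sketch of logistic regression fitted by plain gradient descent. The data (site visits versus whether the customer purchased), the function names, and the learning rate and epoch count are assumptions chosen for illustration. In practice a library implementation would normally be used instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict_probability(x, w, b):
    return sigmoid(w * x + b)


# Hypothetical data: visits last month, and whether the customer purchased.
visits = [1, 2, 3, 4, 5, 6, 7, 8]
purchased = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(visits, purchased)
print(f"P(purchase | 1 visit)  = {predict_probability(1, w, b):.2f}")
print(f"P(purchase | 8 visits) = {predict_probability(8, w, b):.2f}")
```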

Explore more problems: Data scientists need the ability to define and explore multiple prediction problems quickly and easily. Instead of exploring one business problem with an incredibly sophisticated machine learning model, companies should be exploring dozens, building a simple predictive model for each one and assessing its value proposition.

Learn from a sample of data, not all the data: Instead of focusing on how to apply distributed computing to allow any individual processing module to handle big data, invest in techniques that enable the derivation of similar conclusions from a data subsample. By circumventing the use of massive computing resources, these techniques will enable the exploration of more hypotheses.
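
A small sketch of the subsampling idea, using synthetic data: the 100,000 order values, the 1% sample size, and the fixed seed are all assumptions made for illustration. It compares an estimate from a simple random sample with the value computed on the full dataset.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical "full" dataset: 100,000 synthetic order values.
orders = [rng.gauss(50.0, 20.0) for _ in range(100_000)]


def mean(values):
    return sum(values) / len(values)


# A 1% simple random sample, drawn without replacement.
sample = rng.sample(orders, 1_000)

full_mean = mean(orders)
sample_mean = mean(sample)

print(f"Mean over all {len(orders)} orders: {full_mean:.2f}")
print(f"Mean over {len(sample)} sampled orders: {sample_mean:.2f}")
```

How well a sample stands in for the full data depends on the question. Averages and simple models usually tolerate sampling well, while rare events may need a larger or stratified sample.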

Focus on automation: To achieve both reduced time to first model and increased rate of exploration, companies must automate processes that are normally done manually. Over and over across different data problems, we found ourselves applying similar data processing techniques, whether to transform the data into useful aggregates or to prepare data for predictive modeling. It’s time to streamline these, and to develop algorithms and build software systems that do them automatically.

For example, marketers often compare customer lifetime value with the cost of acquiring a customer. The problem is that customer lifetime value relies on a prediction of the net profit from a customer (so it’s largely unobserved and uncertain), while the business has much more control and certainty around the cost of acquiring a customer (though it’s not completely known). Treating the two values as if they’re observed and known is risky, as it can lead to major financial losses.
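
The risk can be made concrete with a toy calculation. The formula below is a deliberately simplified, undiscounted customer lifetime value that assumes a constant yearly retention rate. The margin, acquisition cost, and retention scenarios are invented numbers, not figures from the article.

```python
def simple_clv(annual_margin, retention_rate):
    """Undiscounted CLV: annual margin times expected lifetime in years.

    Assumes a constant retention rate, so the expected lifetime is
    1 / (1 - retention_rate).
    """
    if not 0 <= retention_rate < 1:
        raise ValueError("retention_rate must be in [0, 1)")
    return annual_margin / (1 - retention_rate)


# Hypothetical inputs. The acquisition cost is known fairly precisely,
# while the retention rate behind CLV is only a prediction.
acquisition_cost = 300.0
annual_margin = 100.0
retention_scenarios = {"pessimistic": 0.5, "expected": 0.7, "optimistic": 0.8}

for name, retention in retention_scenarios.items():
    clv = simple_clv(annual_margin, retention)
    verdict = "profitable" if clv > acquisition_cost else "loss-making"
    print(f"{name}: CLV ~ {clv:.0f} vs CAC {acquisition_cost:.0f} -> {verdict}")
```

With these made-up numbers, the same acquisition cost looks fine under the expected retention rate and loses money under the pessimistic one, which is why treating CLV as a known quantity is risky.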


Once you’ve recognised your skill gaps, you may decide to hire a data scientist to help you get more value out of your data. However, despite the hype, data scientists are not magicians. In fact, because of the hype, the definition of data science is so diluted that some people say that the term itself has become useless. The truth is that dealing with data is hard, every organisation is somewhat different, and it takes time and commitment to get value out of data. The worst thing you can do is to hire an expensive expert to help you, and then ignore their advice when their findings are hard to digest. If you’re not ready to work with a data scientist, you might as well save yourself some money and remain in a state of blissful ignorance.


These 10 text mining examples can give you an idea of how this technology is helping organizations today.
1 – Risk management

No matter the industry, insufficient risk analysis is often a leading cause of failure. This is especially true in the financial industry, where adopting risk management software based on text mining technology can dramatically increase the ability to mitigate risk. It enables complete management of thousands of sources and petabytes of text documents, and provides the ability to link information together and access the right information at the right time.

2 – Knowledge management

Not being able to find important information quickly is always a challenge when managing large volumes of text documents; just ask anyone in the healthcare industry. Here, organizations are challenged with a tremendous amount of information (decades of research in genomics and molecular techniques, for example, as well as volumes of clinical patient data) that could potentially be useful for their largest profit center: new product development. Here, knowledge management software based on text mining offers a clear and reliable solution to the “info-glut” problem.

3 – Cybercrime prevention

The anonymous nature of the internet and the many communication features operated through it contribute to the increased risk of internet-based crimes. Today, text mining intelligence and anti-crime applications are making internet crime prevention easier for enterprises as well as law enforcement and intelligence agencies.

4 – Customer care service

Text mining and natural language processing are frequently applied in customer care. Today, text analytics software is frequently adopted to improve customer experience, using different sources of valuable information such as surveys, trouble tickets, and customer call notes to improve the quality, effectiveness and speed of problem resolution. Text analysis is used to provide a rapid, automated response to the customer, dramatically reducing reliance on call center operators to solve problems.

5 – Fraud detection through claims investigation

Text analytics is a tremendously effective technology in any domain where the majority of information is collected as text. Insurance companies are taking advantage of text mining technologies by combining the results of text analysis with structured data to prevent fraud and swiftly process claims.

6 – Contextual Advertising

Digital advertising is a relatively new and growing field of application for text analytics. Here, companies such as Admantx have made text mining the core engine for contextual retargeting with great success. Compared to the traditional cookie-based approach, contextual advertising provides better accuracy and completely preserves the user’s privacy.

7 – Business intelligence

Business intelligence is used by large companies to support decision making. Here, text mining really makes the difference, enabling the analyst to quickly get to the answer even when analyzing petabytes of internal and open source data. Applications such as the Cogito Intelligence Platform are able to monitor thousands of sources and analyze large data volumes to extract only the relevant content.

8 – Content enrichment

While it’s true that working with text content still requires a bit of human effort, text analytics techniques make a significant difference when it comes to managing large volumes of information more effectively. Text mining techniques enrich content, providing a scalable layer to tag, organize and summarize the available content, which makes it suitable for a variety of purposes.

9 – Spam filtering

E-mail is an effective, fast and reasonably cheap way to communicate, but it comes with a dark side: spam. Today, spam is a major issue for internet service providers, increasing their costs for service management and hardware/software updates; for users, spam is an entry point for viruses and impacts productivity. Text mining techniques can be implemented to improve the effectiveness of statistical-based filtering methods.
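
One common statistical filtering method is a naive Bayes classifier over word counts. The sketch below is a minimal version with add-one smoothing, trained on four invented messages. The function names and the training data are assumptions for illustration, and a real filter would need far more data and better tokenization.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(messages):
    """messages: list of (text, label) pairs, label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary


def classify(text, model):
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Start from the log prior of the class.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training messages.
training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(training_data)

print(classify("free prize now", model))
print(classify("monday team meeting", model))
```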

10 – Social media data analysis

Today, social media is one of the most prolific sources of unstructured data, and organizations have taken notice. Social media is increasingly being recognized as a valuable source of market and customer intelligence, and companies are using it to analyze or predict customer needs and understand the perception of their brand. Text analytics can address both needs by analyzing large volumes of unstructured data, extracting opinions, emotions and sentiment and their relations with brands and products.
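
As a rough illustration of extracting sentiment and relating it to brands, here is a tiny lexicon-based scorer. The word lists, the brand names (Acme, Globex), and the posts are invented, and counting positive and negative words is a much cruder technique than what commercial text analytics products use.

```python
# Hypothetical, very small sentiment lexicons.
POSITIVE = {"love", "great", "fast", "helpful"}
NEGATIVE = {"hate", "slow", "broken", "rude"}


def sentiment_score(text):
    """Positive word count minus negative word count for one post."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def brand_sentiment(posts, brand):
    """Average score of the posts mentioning `brand`, or None if none do."""
    scores = [sentiment_score(p) for p in posts if brand.lower() in p.lower()]
    return sum(scores) / len(scores) if scores else None


# Hypothetical social media posts.
posts = [
    "I love my Acme blender, great value!",
    "Acme support was slow and rude.",
    "Globex delivery was fast and helpful.",
]

for brand in ("Acme", "Globex"):
    print(brand, brand_sentiment(posts, brand))
```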