
IBM Watson Retrieve and Rank service

Staged annually at the IBM Client Center in Rome, Italy, LabArt is the leading event for cloud and cognitive solutions, presenting a diverse range of IBM cloud services for IoT, Bluemix, and Watson to customers and business partners.

I will take part as a technical expert, and this post reflects what I plan to show during the event. I will have a desk where customers and business partners can learn about and experiment with Watson services, and in particular with the Retrieve and Rank service.

IBM Watson Retrieve and Rank service helps users find the most relevant information for their query by using a combination of search and machine learning algorithms to detect "signals" in the data. You can load data into the service, train a machine learning model on known relevant results, and then use this model to return improved results to your end users for their questions or queries.

Using the Retrieve and Rank Service

The purpose of the IBM Watson Retrieve and Rank service is to help you find documents that are more relevant than those that you might get with standard information retrieval techniques. The service is based on two different phases:

  1. Retrieve: based on Apache Solr, this phase indexes your documents and answers the runtime queries you send to it.
  2. Rank: the rank component (ranker) creates a machine-learning model trained on your data. You call the ranker in your runtime queries so that this model boosts the relevance of your results, including for queries that the model has not seen before.
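To make the two phases concrete, here is a toy Python sketch of the retrieve-then-rank flow. It is not how the service is implemented: a simple term-overlap filter stands in for Solr's retrieval, and a linear score over two made-up features stands in for the trained ranker. The `Doc` class, the features, the sample documents, and the weights `[1.0, 2.0]` are all hypothetical.

```python
from dataclasses import dataclass

PUNCTUATION = ".,?!:;"


@dataclass
class Doc:
    doc_id: str
    title: str
    body: str


def tokenize(text: str) -> set[str]:
    """Lowercase word set with surrounding punctuation stripped."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query: str, docs: list[Doc], k: int = 3) -> list[Doc]:
    """Phase 1, a stand-in for Solr: keep the k docs sharing the most terms with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(d.title + " " + d.body)), d) for d in docs]
    hits = [(s, d) for s, d in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [d for _, d in hits[:k]]


def features(query: str, doc: Doc) -> list[float]:
    """Hypothetical features: share of query terms found in the body and in the title."""
    q = tokenize(query)
    n = max(len(q), 1)
    return [len(q & tokenize(doc.body)) / n, len(q & tokenize(doc.title)) / n]


def rank(query: str, candidates: list[Doc], weights: list[float]) -> list[Doc]:
    """Phase 2, a stand-in for the ranker: reorder candidates by a linear feature score."""
    def score(doc: Doc) -> float:
        return sum(w * f for w, f in zip(weights, features(query, doc)))
    return sorted(candidates, key=score, reverse=True)


SAMPLE_DOCS = [
    Doc("d1", "Outlook cannot connect to server",
        "Check the SMTP port and SSL settings in the account configuration."),
    Doc("d2", "Configure SMTP in Outlook",
        "Open account settings and set the outgoing SMTP server and port."),
    Doc("d3", "Printer offline", "Restart the print spooler service."),
]

if __name__ == "__main__":
    query = "outlook smtp port settings"
    candidates = retrieve(query, SAMPLE_DOCS)
    print([d.doc_id for d in candidates])  # ['d1', 'd2']
    print([d.doc_id for d in rank(query, candidates, [1.0, 2.0])])  # ['d2', 'd1']
```

The split lets retrieval find plausible candidates cheaply, while the ranker spends more effort reordering that short list: here both candidates tie on retrieval, and the ranker's title feature moves `d2` ahead.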

The service combines several proprietary machine learning techniques, known as learning-to-rank algorithms. During training, the ranker chooses the best combination of algorithms for your training data. Learning to rank, or machine-learned ranking (MLR), is the application of machine learning, typically supervised, semi-supervised, or reinforcement learning, to the construction of ranking models for information retrieval systems.
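The ranker's internal selection procedure is proprietary. One generic way to "choose the best combination" is model selection: score each candidate ranker on judged queries with a ranking metric such as NDCG and keep the winner. The Python sketch below does that with three hypothetical linear rankers and made-up validation data; treat it as an illustration of the idea, not as Watson's actual procedure.

```python
import math

# Each judged query: list of (feature_vector, graded_relevance) for its candidate answers.
JudgedQuery = list[tuple[list[float], int]]


def dcg(relevances: list[int]) -> float:
    """Discounted cumulative gain with gain (2^rel - 1) / log2(rank + 1), rank starting at 1."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(ranked_relevances: list[int], k: int) -> float:
    """DCG@k normalized by the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal if ideal > 0 else 0.0


def mean_ndcg(weights: list[float], judged: list[JudgedQuery], k: int = 3) -> float:
    """Score a linear ranker by its average NDCG@k over held-out judged queries."""
    total = 0.0
    for candidates in judged:
        ranked = sorted(
            candidates,
            key=lambda fr: sum(w * f for w, f in zip(weights, fr[0])),
            reverse=True,
        )
        total += ndcg([rel for _, rel in ranked], k)
    return total / len(judged)


def select_ranker(candidates: dict[str, list[float]], judged: list[JudgedQuery], k: int = 3) -> str:
    """Return the name of the candidate ranker with the best mean NDCG@k."""
    return max(candidates, key=lambda name: mean_ndcg(candidates[name], judged, k))


# Hypothetical validation data: features are [body_score, title_score].
VALIDATION = [
    [([0.9, 0.1], 0), ([0.4, 0.8], 2), ([0.5, 0.5], 1)],
    [([0.7, 0.2], 0), ([0.3, 0.9], 2)],
]
CANDIDATES = {"body_only": [1.0, 0.0], "title_only": [0.0, 1.0], "blend": [0.6, 0.4]}

if __name__ == "__main__":
    for name, w in CANDIDATES.items():
        print(name, round(mean_ndcg(w, VALIDATION), 3))
    print("selected:", select_ranker(CANDIDATES, VALIDATION))  # selected: title_only
```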

The Solr cluster is populated with a collection, which is a logical index of the data in your documents. In my POC, the collection consists of solutions, troubleshooting steps, and suggestions on how to resolve e-mail configuration problems. This is a portion of my Solr collection:

Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g. “relevant” or “not relevant”) for each item.
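Pairwise learning-to-rank methods commonly train on a partial order like this once it has been turned into preference pairs. As a minimal Python sketch (the document IDs and grades are made up), every pair of items with different grades yields one (better, worse) pair, while items with equal grades stay unordered:

```python
from itertools import combinations


def preference_pairs(judgments: dict[str, int]) -> list[tuple[str, str]]:
    """Return the (better, worse) pairs implied by graded labels; equal grades give no pair."""
    pairs = []
    for (a, grade_a), (b, grade_b) in combinations(judgments.items(), 2):
        if grade_a > grade_b:
            pairs.append((a, b))
        elif grade_b > grade_a:
            pairs.append((b, a))
    return pairs


if __name__ == "__main__":
    grades = {"doc1": 2, "doc2": 0, "doc3": 2, "doc4": 1}
    for better, worse in preference_pairs(grades):
        print(f"{better} > {worse}")
```

Binary judgments work the same way, using grade 1 for "relevant" and 0 for "not relevant".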

To train the IBM Watson Retrieve and Rank service, you can use a CSV file in which each row represents a possible answer to a question. Each row contains the question identifier, the feature scores, and a label indicating whether it is the right answer. The vast majority of rows are for wrong answers, with only a small percentage being correct answers. The two most important columns are the first, which contains a unique question ID, and the last, which contains the label.
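Here is a small Python sketch of reading such a file, assuming no header row and two feature columns (both assumptions; a real file has as many feature columns as the features produced for each answer). It groups rows by the question ID in the first column, reads the label from the last column, and flags questions that have no correct answer, which are worth checking before training. The sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

Row = tuple[list[float], int]  # (feature_scores, label)


def load_training_rows(text: str) -> dict[str, list[Row]]:
    """Group rows as question_id -> [(feature_scores, label)]."""
    by_question: dict[str, list[Row]] = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        qid, *feature_scores, label = row  # first column = id, last column = label
        by_question[qid].append(([float(x) for x in feature_scores], int(label)))
    return dict(by_question)


def questions_without_correct_answer(rows: dict[str, list[Row]]) -> list[str]:
    """IDs of questions whose rows are all labeled as wrong answers."""
    return [qid for qid, answers in rows.items() if not any(label for _, label in answers)]


SAMPLE = """q1,0.82,0.10,1
q1,0.35,0.44,0
q1,0.12,0.05,0
q2,0.67,0.71,1
q2,0.20,0.30,0
q2,0.15,0.22,0
q2,0.09,0.11,0
q3,0.40,0.10,0
"""

if __name__ == "__main__":
    rows = load_training_rows(SAMPLE)
    labels = [label for answers in rows.values() for _, label in answers]
    print("share of correct answers:", sum(labels) / len(labels))  # 0.25
    print("no correct answer:", questions_without_correct_answer(rows))  # ['q3']
```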

In my previous article I presented a proof of concept (POC) that uses the Retrieve and Rank service to support IBM Control Desk in handling service requests about e-mail client configuration problems.

I created a custom collection on how to troubleshoot e-mail problems. This is a portion of my Solr collection, to which I added two tech notes, one from Microsoft support and one from TechRepublic. Using Control Desk, a customer support operator who handles a service request about an e-mail problem can retrieve the best-ranked information in the collection:

During the rank phase, you can create a relevance file in CSV format from your ground truth. Make sure that the relevance file meets the training data quality standards. The relevance file that you use with the script takes the following CSV format:
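As I understand the ground-truth layout used in IBM's published Retrieve and Rank examples, each row holds the question text followed by alternating answer (document) IDs and relevance grades; treat that layout as an assumption to verify against the service documentation. Under that assumption, a minimal Python validator might look like this (the questions, document IDs, and grades are hypothetical):

```python
import csv
import io


def parse_relevance_file(text: str) -> list[tuple[str, list[tuple[str, int]]]]:
    """Parse rows of: question, answer_id, grade, answer_id, grade, ...

    Raises ValueError on rows without complete (answer_id, grade) pairs
    or with negative grades.
    """
    parsed = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue
        question, rest = row[0], row[1:]
        if not rest or len(rest) % 2 != 0:
            raise ValueError(f"line {line_no}: expected answer_id,grade pairs after the question")
        pairs = []
        for answer_id, grade_text in zip(rest[0::2], rest[1::2]):
            grade = int(grade_text)
            if grade < 0:
                raise ValueError(f"line {line_no}: negative relevance grade {grade}")
            pairs.append((answer_id, grade))
        parsed.append((question, pairs))
    return parsed


SAMPLE = '''"Outlook cannot send mail, what SMTP port should I use?",doc_smtp_ports,4,doc_outlook_ssl,2
Why does Thunderbird keep asking for my password?,doc_tb_password,3
'''

if __name__ == "__main__":
    for question, pairs in parse_relevance_file(SAMPLE):
        print(question, "->", pairs)
```

Quoting the question field, as in the first sample row, lets questions contain commas without breaking the pairs that follow.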


The IBM Watson Retrieve and Rank service is ideal for applications that must index and search content at scale in a cloud environment. It supports nearly all of the Solr indexing and search APIs and can largely replace your existing Solr solutions. You benefit both from new features developed by the open source community and from advanced information retrieval techniques built by the Watson algorithm teams, and each Solr cluster and ranker is highly available in the Bluemix environment.
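Because the service exposes the Solr query API, a runtime query is essentially a Solr request. The Python sketch below only builds request URLs and does not send them; authentication is omitted. The base URL and the `/v1/solr_clusters/{id}/solr/{collection}/select` and `/fcselect` paths (the latter taking a `ranker_id` parameter to apply the ranker) reflect my understanding of the service's REST API at the time and should be treated as assumptions; the cluster ID, collection name, ranker ID, and `fl` fields are placeholders.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Assumed base URL of the Retrieve and Rank REST API on Bluemix.
BASE_URL = "https://gateway.watsonplatform.net/retrieve-and-rank/api"


def build_query_url(cluster_id: str, collection: str, question: str,
                    ranker_id: Optional[str] = None, rows: int = 10) -> str:
    """Build a plain Solr query URL, or a ranked one when a ranker ID is given."""
    params = {"q": question, "wt": "json", "fl": "id,title", "rows": rows}
    handler = "select"  # standard Solr search handler: retrieve only
    if ranker_id is not None:
        handler = "fcselect"  # assumed handler that re-ranks results with the ranker
        params["ranker_id"] = ranker_id
    path = f"/v1/solr_clusters/{quote(cluster_id)}/solr/{quote(collection)}/{handler}"
    return f"{BASE_URL}{path}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url("your-cluster-id", "email_troubleshooting", "outlook smtp port"))
    print(build_query_url("your-cluster-id", "email_troubleshooting", "outlook smtp port",
                          ranker_id="your-ranker-id"))
```

Keeping the two handlers side by side makes it easy to compare plain Solr results with ranked results for the same question.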

See you on April 14 at LabArt in Rome.