IBM Watson Retrieve and Rank service
Staged annually at the IBM Client Center in Rome, Italy, LabArt is the leading event for cloud and cognitive solutions, presenting a diverse range of IBM cloud services for IoT, Bluemix and Watson to customers and business partners.
I will take part as a technical expert, and this post reflects what I will show during the event. I will have a desk where customers and business partners can try out the Watson services with me, with a detailed look at the Retrieve and Rank service.
The IBM Watson Retrieve and Rank service helps users find the most relevant information for their query by using a combination of search and machine learning algorithms to detect “signals” in the data. You load your data into the service, train a machine learning model based on known relevant results, then use this model to provide improved results to your end users based on their question or query.
The purpose of the IBM Watson Retrieve and Rank service is to help you find documents that are more relevant than those that you might get with standard information retrieval techniques. The service is based on two different phases:
- Retrieve: this phase is based on Apache Solr and is where you send your runtime queries.
- Rank: The rank component (ranker) creates a machine-learning model trained on your data. You call the ranker in your runtime queries to use this model to boost the relevancy of your results with queries that the model has not previously seen.
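The two phases map onto two query handlers. The sketch below only builds the request URLs and sends nothing. The base URL, the `select` and `fcselect` handler names and the `ranker_id` parameter are written from memory of the service documentation, so treat them as assumptions. The cluster, collection and ranker identifiers are placeholders.

```python
from urllib.parse import urlencode

# Assumed base URL of the service API; check it against your own credentials.
BASE = "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1"


def build_query_url(cluster_id, collection, question, ranker_id=None,
                    fields=("id", "title")):
    """Build a retrieve-only URL, or a retrieve-and-rank URL when ranker_id is given."""
    # Assumption: plain Solr search uses "select", re-ranked search uses "fcselect".
    handler = "fcselect" if ranker_id else "select"
    params = {"q": question, "wt": "json", "fl": ",".join(fields)}
    if ranker_id:
        params["ranker_id"] = ranker_id
    return (f"{BASE}/solr_clusters/{cluster_id}/solr/{collection}/"
            f"{handler}?{urlencode(params)}")


# Placeholder identifiers, for illustration only.
retrieve_url = build_query_url("my_cluster", "email_collection", "outlook profile")
ranked_url = build_query_url("my_cluster", "email_collection", "outlook profile",
                             ranker_id="my_ranker")
```

The same question goes to both handlers. Only the second one asks the trained ranker to reorder the Solr results.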
The service combines several proprietary machine learning techniques, which are known as learning-to-rank algorithms. During its training, the ranker chooses the best combination of algorithms from your training data. Learning to rank or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems.
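To give an idea of what a learning-to-rank model does, here is a toy pairwise perceptron. It is not the Watson algorithm, which is proprietary, only a minimal sketch of the general technique. The feature vectors and candidate names are invented for illustration.

```python
def train_pairwise(pairs, n_features, epochs=20, lr=0.1):
    """Learn weights so that each 'better' vector scores above its 'worse' one."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin <= 0:  # pair is ordered wrongly (or tied): nudge the weights
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank(candidates, w):
    """Return candidate names sorted by descending model score."""
    def score(features):
        return sum(wi * fi for wi, fi in zip(w, features))
    return sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)


# Invented training pairs: (features of the preferred answer, features of the other).
pairs = [([0.9, 0.1], [0.2, 0.8]), ([0.7, 0.3], [0.1, 0.4])]
weights = train_pairwise(pairs, n_features=2)

# Invented unseen candidates; the model prefers a high first feature.
ordering = rank({"a": [0.1, 0.9], "b": [0.8, 0.2]}, weights)
```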
The Solr cluster is populated with a collection, which is a logical index of the data in your documents. In my POC the collection is composed of solutions, troubleshooting steps and suggestions on how to resolve e-mail configuration problems.
Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g. “relevant” or “not relevant”) for each item.
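The partial order can be made explicit by turning the labels of one list into preference pairs. A minimal sketch, assuming numeric labels where a higher value means more relevant; the document id 20004 is hypothetical:

```python
from itertools import combinations


def preference_pairs(judged):
    """Turn [(item, label), ...] for one query into (preferred, other) pairs."""
    pairs = []
    for (a, label_a), (b, label_b) in combinations(judged, 2):
        if label_a > label_b:
            pairs.append((a, b))
        elif label_b > label_a:
            pairs.append((b, a))
        # Equal labels say nothing about the order, so no pair is emitted.
    return pairs


# Binary judgments for one question: 1 = relevant, 0 = not relevant.
judged = [("20002", 1), ("20003", 0), ("20004", 0)]
pairs = preference_pairs(judged)
```

The two documents labeled 0 are never compared with each other, which is why the order is only partial.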
You can train the IBM Watson Retrieve and Rank service with a CSV file in which each row represents a possible answer to a question. The row contains the question identifier, the feature scores and a label indicating whether it is the right answer. The vast majority of rows in the file are for wrong answers, with a smaller percentage being correct answers. The two important “columns” in the file are the first, which contains a unique question id, and the last, which contains the label.
In my previous article I presented a proof of concept (POC) that uses the Retrieve and Rank service to support IBM Control Desk in handling service requests about e-mail client configuration problems.
I created a custom collection on how to troubleshoot e-mail problems. This is a portion of my Solr collection, to which I added two tech notes, one from Microsoft support and one from TechRepublic. A customer support operator who handles a service request for an e-mail problem in Control Desk can retrieve the best-ranked information in the collection:
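A quick sanity check on such a file is to group the rows by the first column and count the positive labels in the last one. The sketch below assumes a file with no header row, numeric feature scores and an integer label where a value above 0 means a right answer. The sample rows are invented.

```python
import csv
import io


def summarize_training_csv(text):
    """Count rows and right answers per question id (first column)."""
    summary = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        question_id, label = row[0], int(row[-1])
        [float(value) for value in row[1:-1]]  # feature scores must be numeric
        stats = summary.setdefault(question_id, {"rows": 0, "positive": 0})
        stats["rows"] += 1
        if label > 0:
            stats["positive"] += 1
    return summary


# Invented sample: question id, two feature scores, label.
sample = """q1,0.82,0.10,1
q1,0.31,0.55,0
q1,0.12,0.40,0
q2,0.20,0.70,0
q2,0.91,0.05,1
"""
summary = summarize_training_csv(sample)
```

A question with no positive row at all would be worth a second look before training.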
{
"add" : { "doc" : { "id" : 20002, "author" : "Microsoft support", "bibliography" : "tech note MS2", "title": "create and configure an email profile in Outlook", "body" : "create and configure an email profile in Outlook: Step 1: Open the Mail Setup dialog box Click Start, click Run, type Control in the Open box, and then click OK. Depending on the version of Windows running on your computer, do one of the following: Windows XP: If you are in the Category View, click User Accounts, and then click Mail. If you are not in the Category View, double-click Mail. Windows Vista: Click User accounts, and then click Mail. The Mail Setup dialog box opens. Step 2: Start the New Profile wizard Click Show Profiles. Click Add to start the New Profile wizard. Step 3: Create a profile In the Profile Name box, type Test, and then click OK to name the new e-mail profile. Follow the steps appropriate for your version of Outlook: Microsoft Office Outlook 2010 Click to select the Manually configure server settings check box. Click Next. On the Choose Service page, click Internet E-mail. Click Next. Fill in the boxes in the Internet E-mail Settings dialog box. Make sure that the Account Type setting is set to POP3. Note Enter the information from your ISP or from your e-mail administrator in the Incoming mail server box and in the Outgoing mail server (SMTP) box. Click Next, follow the prompts to finish setting up your account, and then click Finish. Your new profile is created. Go to step 4. Microsoft Office Outlook 2007 Click to select the Manually configure server settings check box. Click Next. On the Choose E-mail service page, click Internet E-mail. Click Next. Fill in the boxes in the Internet E-mail Settings dialog box. Make sure that the Account Type setting is set to POP3. Note Enter the information from your ISP or from your e-mail administrator in the Incoming mail server box and in the Outgoing mail server (SMTP) box. 
Click Next, follow the prompts to finish setting up your account, and then click Finish. Your new profile is created. Go to step 4. Microsoft Office Outlook 2003 and earlier versions of Outlook Click Add a new e-mail account. Click Next. Click POP3. Click Next. Fill in the boxes in the Internet E-mail Settings dialog box. Make sure that the Account Type setting is set to POP3. Note Enter the information from your ISP or from your e-mail administrator in the Incoming mail server and Outgoing mail server (SMTP) boxes. Click Next. Click Finish. Your new profile is created. Go to step 4. Step 4: Set the default profile On the Mail dialog box, under the When Starting Microsoft Outlook, use this profile box, click to select the new profile that you created in step 3. Click OK. Use Outlook to send yourself an e-mail. If you successfully receive the e-mail, you have completed troubleshooting the problem. If you do not receive the e-mail, creating a new profile did not resolve your problem. Try method 2. Note If you use Dial-Up Networking to connect to the Internet, unfortunately, this article will not be able to help you further. Refer to the Next steps section for additional options." } }, "add" : { "doc" : { "id" : 20003, "author" : "TechRepublic", "title": "tips for Outlook mai", "bibliography" : "tips for Outlook mail", "body" : "Outlook is currently the de facto standard email/calendaring client in the business world. Generally speaking, it works like a champ. But there are times when Outlook goes down in a ball of flames. When that happens, if you don't have a bevy of tricks to pull out of your pocket, you might find yourself in a world of pain. But troubleshooting Outlook doesn't have to be a nightmare. In fact, you can almost script out the troubleshooting process with these 10 handy tips. 1: Scan PST Those PST files will inevitably develop errors. When they do, they can prevent Outlook from working properly. 
When Outlook is starting to fuss, one of the first things I do is run scanpst.exe against each PST file used within Outlook. But be warned: Scan PST can take some time to run. It has to back up your data file, scan for errors, and repair any errors found. If the data file is large, this process can take quite some time. To run Scan PST, you'll need to locate the scanpst.exe executable. (Its location will depend upon the version of Windows being used. ..." } }, "commit" : { } }
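When the documents are generated by a program, one option is to send them as a JSON array, a form the Solr update handler also accepts, instead of repeating the "add" key. This is only a sketch and not how the POC file was produced. The choice of required fields is my assumption, and the bodies are shortened placeholders.

```python
import json

# Assumption: every document in the collection needs at least these fields.
REQUIRED_FIELDS = ("id", "title", "body")


def build_update_payload(docs):
    """Validate the documents and serialize them as a JSON array for Solr."""
    for doc in docs:
        missing = [field for field in REQUIRED_FIELDS if field not in doc]
        if missing:
            raise ValueError(f"document {doc.get('id')} is missing {missing}")
    return json.dumps(docs)


docs = [
    {"id": 20002, "author": "Microsoft support", "bibliography": "tech note MS2",
     "title": "create and configure an email profile in Outlook",
     "body": "placeholder body"},
    {"id": 20003, "author": "TechRepublic", "bibliography": "tips for Outlook mail",
     "title": "tips for Outlook mai", "body": "placeholder body"},
]
payload = build_update_payload(docs)
```

The array form avoids a pitfall of the repeated-key form: many JSON libraries, Python's `json` included, keep only the last value when a key appears twice in one object.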
During the rank phase you can create a relevance file in CSV format from your ground truth. Make sure that the relevance file meets the training data quality standards. The relevance file that you use with the script takes the following CSV format:
"{question}","{answer_id1}","{relevance_label1}","{answer_id2}","{relevance_label2}","{…}"
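A row in this format can be written with the standard `csv` module by quoting every field. A minimal sketch; the question text and the labels are invented, while the answer ids come from my collection:

```python
import csv
import io


def relevance_row(question, judgments):
    """Format one ground-truth line: question, then (answer id, label) pairs."""
    fields = [question]
    for answer_id, label in judgments:
        fields.extend([str(answer_id), str(label)])
    buffer = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes, as in the format above.
    csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(fields)
    return buffer.getvalue()


line = relevance_row("how do I create an Outlook profile",
                     [(20002, 1), (20003, 0)])
# line == '"how do I create an Outlook profile","20002","1","20003","0"\n'
```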
The IBM Watson Retrieve and Rank service is ideal for applications that must index and search content scalably in a cloud environment. It supports nearly all of the Solr indexing and search APIs, and can largely be used as a replacement for your existing Solr solutions. You benefit both from new features developed by the open source community and from advanced information retrieval techniques built by the Watson algorithm teams. Each data cluster and ranker is highly available in the Bluemix environment.
See you on April 14 at LabArt in Rome.