I have read the documentation, but it is not clear. Please give a simple example.
Training data quality standards:
- The file must contain at least 49 unique questions.
- The number of records must be at least 50 times the number of fields that are identified in your Solr configuration. For example, if your collection defines five fields, you must have at least 250 records in your training data.
- At least two different relevance labels must exist in the data, and those labels must be well represented. A label is well represented if it occurs at least once for every 100 unique questions. For example, if you have 300 unique questions and use a relevance scale of 1, 2, 3, 4, then at least two distinct labels (for example, 1 and 4) must each appear at least three times (0.01 × 300) in the training data.
- Do not use zero (0) in your relevance scale.
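To make those rules concrete, here is a minimal Python sketch that checks a training file against them. It assumes a headerless question_text,answer_id,relevance_label CSV layout and a field count that you pass in; both are assumptions, so adjust them to match your actual file and Solr configuration.

```python
import csv
import math
from collections import Counter

def check_training_data(path, num_solr_fields):
    """Check a training CSV against the documented quality standards.
    Assumes each row is: question_text, answer_id, relevance_label
    (no header row) -- adjust to your actual file layout."""
    questions = set()
    labels = Counter()
    num_records = 0

    with open(path, newline="") as f:
        for row in csv.reader(f):
            question, _answer_id, label = row[0], row[1], row[2]
            questions.add(question)
            labels[label] += 1
            num_records += 1

    problems = []
    if len(questions) < 49:
        problems.append("fewer than 49 unique questions")
    if num_records < 50 * num_solr_fields:
        problems.append("fewer than 50 records per Solr field")
    # A label is "well represented" if it occurs at least once for
    # every 100 unique questions, e.g. 0.01 x 300 = 3 occurrences.
    threshold = max(1, math.ceil(0.01 * len(questions)))
    well_represented = [l for l, n in labels.items() if n >= threshold]
    if len(well_represented) < 2:
        problems.append("fewer than 2 well-represented relevance labels")
    if "0" in labels:
        problems.append("relevance scale contains zero")
    return problems

print(check_training_data("training_data.csv", num_solr_fields=5))
```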
Have you tried the Cranfield example described here? https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/retrieve-rank/get_start.shtml#create-collection It provides sample data to index and corresponding training data to train a ranker with. You can model your training data after this example.
Is the Web Tooling provided here any help: https://www.ibm.com/watson/developercloud/doc/retrieve-rank/ranker_tooling.html ? It contains a tutorial as part of the workflow that walks through setting up and training a ranker as well as evaluating its final performance. In addition, the tooling defaults many of the parameters (like the number of answers per query) to reasonable values in the background when helping you generate the training data.
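If you would rather script this step than use the tooling, the same training call can be made against the Retrieve and Rank REST API directly. A minimal Python sketch follows; the username, password, ranker name, and file name are placeholders for your own service credentials and training CSV.

```python
import requests

# Placeholders: substitute your own service credentials and file.
AUTH = ("YOUR_USERNAME", "YOUR_PASSWORD")
BASE = "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1"

# Create (train) a ranker from a training-data CSV.
with open("training_data.csv", "rb") as f:
    resp = requests.post(
        BASE + "/rankers",
        auth=AUTH,
        files={
            "training_data": f,
            "training_metadata": (None, '{"name": "my-ranker"}'),
        },
    )
resp.raise_for_status()
print(resp.json())  # includes the new ranker_id and its training status
```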
Other specific notes based on some things you mentioned:
> My collection has 150 documents

This seems like a really small set of documents for your corpus. Retrieve and Rank is designed for the long-tail setting (i.e., lots of potential documents in the corpus and lots of unique queries, where each query is answerable by a handful of different documents from the corpus). As a result, it is difficult to know about all the types of queries that might come at runtime, so we train on a small set of labelled queries and hope the model generalizes to other queries. This is not to say that your approach will not work on a 150-document collection, but it might be helpful to entertain other options. See here for a discussion around using NLC: https://developer.ibm.com/answers/questions/310671/retrieve-and-rank-at-what-point-of-training-can-we/#answer-310893
> How many lines should my CSV have?

See here for a discussion on the amount of training data required. It assumes you are approaching this through the tooling: https://developer.ibm.com/answers/questions/314977/retrieve-rank-best-practices-for-training.html#answer-315084
> Because Retrieve and Rank results completely depend on how well you train the ranker.

This is not strictly true. You should think of Retrieve and Rank as a two-phased approach:

1. The first phase, Retrieve, uses Solr to gather candidate answers from the corpus based on term overlap with the keywords in the query. Improving the performance of this first phase depends on setting up the indexing, analyzers, etc. correctly (similar to any other Solr implementation).

2. The second phase, Rank, is where the training data comes in. It generates a learning-to-rank model (https://en.wikipedia.org/wiki/Learning_to_rank) that will hopefully generalize to the long tail of queries that come the system's way during production usage. A sketch of both phases follows this list.
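To make the two phases concrete, here is a minimal Python sketch of querying a Retrieve and Rank cluster: first the plain Solr /select handler (retrieval ordered by term overlap only), then the /fcselect handler with a ranker_id so the trained ranker re-orders the candidates. The cluster ID, collection name, ranker ID, credentials, and query are all placeholders.

```python
import requests

# Placeholders for your own service credentials and resource IDs.
AUTH = ("YOUR_USERNAME", "YOUR_PASSWORD")
BASE = "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1"
CLUSTER_ID = "YOUR_SOLR_CLUSTER_ID"
COLLECTION = "example_collection"
RANKER_ID = "YOUR_RANKER_ID"

query = "what is the best way to train a ranker?"

# Phase 1 (Retrieve): plain Solr search, ranked by term overlap only.
retrieve = requests.get(
    f"{BASE}/solr_clusters/{CLUSTER_ID}/solr/{COLLECTION}/select",
    auth=AUTH,
    params={"q": query, "wt": "json", "fl": "id,title"},
)
print([doc["id"] for doc in retrieve.json()["response"]["docs"]])

# Phases 1+2 (Retrieve and Rank): fcselect gathers candidates and
# then applies the trained ranker to re-order them.
ranked = requests.get(
    f"{BASE}/solr_clusters/{CLUSTER_ID}/solr/{COLLECTION}/fcselect",
    auth=AUTH,
    params={"q": query, "ranker_id": RANKER_ID, "wt": "json", "fl": "id,title"},
)
print([doc["id"] for doc in ranked.json()["response"]["docs"]])
```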