@Kenneth Jensen (IBM) Such a simple yet elegant solution. I'm just taking my first steps in learning Modeler, and it is challenging to get out of Excel mode.
I tried with 150,000 houses and 500 stations. Works perfectly.
THANKS!!!
Thu, 01 Aug 2019 15:13:01 GMT
Student@pace

Answer by Kenneth Jensen
In Modeler, I think that the best way to do this is by creating a *long* data set rather than a *wide* data set as you have in the Excel Workbook.
I have solved it as follows and you can find the solution in the [attached file][1]:
1. `Import` each of the two Excel sheets with the houses and with the gas stations
2. `Merge` the two sheets using the "Keys" merge method but without selecting any of the fields as keys. This creates the Cartesian product of the two sheets: each house is combined with each gas station. With 100 houses and 84 gas stations, you end up with a data set that has 100 x 84 = 8,400 records.
3. Use a `Derive` node to compute the distance for each record, that is, for each combination of house and gas station. I have used the same formula as in the Excel Workbook, but please make sure that it is accurate.
4. Use a `Distinct` node to select the record with the shortest distance for each house, which should leave you with one record per house (100 records) and the ID of the closest gas station as well as the distance.
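
For readers who find it easier to follow the logic in code, the four steps above can be sketched in plain Python. This is only an illustration of the idea, not the contents of the attached stream: the sample data, the `(id, x, y)` record layout, and the straight-line distance are all assumptions, and the workbook's own formula should be substituted in step 3.

```python
from itertools import product
from math import hypot

# Step 1 (stand-in for the Excel import): hypothetical (id, x, y) records.
houses = [("H1", 0.0, 0.0), ("H2", 5.0, 5.0), ("H3", 9.0, 1.0)]
stations = [("S1", 1.0, 1.0), ("S2", 6.0, 4.0), ("S3", 10.0, 0.0)]

# Step 2: Cartesian product, one record per (house, station) combination.
pairs = list(product(houses, stations))

# Step 3: derive a distance for every record.
# Assumption: straight-line distance; replace with the workbook's formula.
records = [
    (h_id, s_id, hypot(hx - sx, hy - sy))
    for (h_id, hx, hy), (s_id, sx, sy) in pairs
]

# Step 4: sort by distance, then keep the first record seen for each house.
records.sort(key=lambda r: r[2])
nearest = {}
for h_id, s_id, dist in records:
    nearest.setdefault(h_id, (s_id, dist))

for h_id in sorted(nearest):
    print(h_id, nearest[h_id])
```

The `pairs` list plays the role of the long data set: it has one row per house and station combination, and `nearest` ends up with one entry per house.
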
One issue with this approach is that the data set generated in step 2 grows multiplicatively: its size is the number of houses times the number of gas stations. If you approach the limits of your environment, you may need to deploy a more sophisticated search strategy than what I have described here.
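
To make the size concern concrete: 150,000 houses and 500 stations would give 150,000 x 500 = 75,000,000 records in step 2. One possible way to ease the memory pressure, sketched below in plain Python, is to process one house at a time and keep only its best match, so the full product is never stored. This is an assumption about what a different strategy could look like, not something taken from the attached stream; it performs the same number of distance computations, and the function name, record layout, and straight-line distance are all hypothetical.

```python
from math import hypot

def nearest_station(house, stations):
    """Return (station_id, distance) for the station closest to one house."""
    _, hx, hy = house
    # Assumption: straight-line distance; replace with the workbook's formula.
    return min(
        ((s_id, hypot(hx - sx, hy - sy)) for s_id, sx, sy in stations),
        key=lambda pair: pair[1],
    )

# Hypothetical (id, x, y) records.
houses = [("H1", 0.0, 0.0), ("H2", 5.0, 5.0), ("H3", 9.0, 1.0)]
stations = [("S1", 1.0, 1.0), ("S2", 6.0, 4.0), ("S3", 10.0, 0.0)]

# Only one house's candidates exist at any moment; the result has one
# entry per house, just like the output of step 4.
result = {house[0]: nearest_station(house, stations) for house in houses}
print(result)
```
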
[1]: /answers/storage/temp/28980-minimum-distance.zip