Hi All,
I am trying to monitor a specific directory and process the files placed there. As part of the initial processing I want to ignore the header (one line) in each file. DirectoryScan feeding a FileSource with the hasHeaderLine parameter seemed the perfect solution. In unit testing I used a single file and it worked fine, but now in live testing with multiple files I can see that hasHeaderLine is applied only to the first file. For subsequent files the first line is passed through as data.
Sample code below (I've tried 1u, true and 2u as values for hasHeaderLine):
stream<rstring filename> InputFiles = DirectoryScan()
{
    param
        directory : "/opt/var/source" ;
}
stream<InputRecordStructure> InputRecords = FileSource(InputFiles)
{
    param
        format : csv ;
        separator : "," ;
        parsing : fast ;
        ignoreExtraCSVValues : true ;
        hasHeaderLine : 1u ;
        moveFileToDirectory : "/opt/var/processed" ;
}
Answer by Bruce Glassford (912) | Jan 13, 2017 at 09:13 AM
Which version are you running? I just did some tests on 4.2.0.2 on my RH6 system here (since this could be a problem for some of our analytics), and it worked fine.
Answer by Ruleman (127) | Jan 13, 2017 at 04:47 PM
This is indeed a known defect in release 4.1.1. It looks like it was fixed in 4.1.1 fix pack 2. That was September 2016, right around the same time as the release of 4.2.0. I don't know for sure if the fix is in 4.2.0.0, but by Bruce's observation it's definitely in 4.2.0.2.
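If upgrading is not an option right away, one possible workaround is to read each file as plain lines and drop the header yourself. The sketch below is untested against 4.1.1 and rests on an assumption: that FileSource still emits a window punctuation at the end of each file in the affected release, as it is documented to do. The stream names RawLines and DataLines are made up for illustration, and parsing each remaining line into InputRecordStructure is not shown.

```spl
// Read every file line by line instead of relying on hasHeaderLine.
stream<rstring line> RawLines = FileSource(InputFiles)
{
    param
        format : line ;
        moveFileToDirectory : "/opt/var/processed" ;
}

// Drop the first line that arrives after each end-of-file window marker.
stream<rstring line> DataLines = Custom(RawLines)
{
    logic
        state : mutable boolean atFileStart = true ;
        onTuple RawLines :
        {
            if (atFileStart)
            {
                // First line of a file: treat it as the header and discard it.
                atFileStart = false ;
            }
            else
            {
                submit(RawLines, DataLines) ;
            }
        }
        onPunct RawLines :
        {
            // Assumption: a window marker signals the end of the current file.
            if (currentPunct() == Sys.WindowMarker)
            {
                atFileStart = true ;
            }
            submit(currentPunct(), DataLines) ;
        }
}
```

This keeps the header handling independent of the FileSource parameter, at the cost of doing the CSV parsing in a later step.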