Overview

Skill Level: Any Skill Level

This recipe shows how to split input data into multiple output files, based on a value in each input record, using the External Target stage.

Ingredients

DataStage with Parallel Engine installed.

Step-by-step

  1. Prepare an input file and a shell script on the Engine tier

    1.1. Input file.

    aaa,bbb,1
    1,A,2
    2,B,3
    "aaa",c,4
    aaa,,5
    1,AA,
    2,,33
    aaa,C,

    Note:

    – The comma is the field separator.

    – The first field is used as the output file name. Double quotes around it are removed, so aaa and "aaa" go to the same file.
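To preview which output files a given input would produce, the first field can be extracted and counted before running any job. The sketch below is illustrative only: the variable names `sample` and `summary` are assumptions, and it uses just the first four sample records.

```shell
#!/bin/sh
# Sample records from step 1.1 (first four lines only).
sample='aaa,bbb,1
1,A,2
2,B,3
"aaa",c,4'

# Count records per target file name: take field 1 and strip double quotes.
summary=$(printf '%s\n' "$sample" |
    awk -F, '{ key = $1; gsub(/"/, "", key); count[key]++ }
             END { for (k in count) print k, count[k] }' |
    sort)

echo "$summary"
```

Each output line is a file name followed by the number of records that would go to it. Note that aaa and "aaa" are counted under the same name.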

     

    1.2. Simple awk script

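    #!/bin/sh
    # Usage: <script> OUTPUT_DIR  (records are read from standard input)

    awk -F, -v outdir="$1" 'NF >= 2 {
        # Strip double quotes from the first field, which names the output file.
        gsub(/\"/, "", $1)

        if (prevfile != $1) {
            # Close the previous output file before switching to a new one.
            if (printed == 1) { close(outfile); printed = 0 }
            outfile = sprintf("%s/%s", outdir, $1)
        }
        prevfile = $1

        # Write fields 2..NF, comma-separated, appending to the output file.
        for (i = 2; i < NF; ++i) {
            printf("%s,", $i) >> outfile
        }
        print $NF >> outfile
        printed = 1
    }'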
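Before wiring the script into a job, it can help to try the splitting logic from a command line. The following is a minimal sketch, not part of the original recipe: the temporary directory layout and the names `workdir` and `outdir` are assumptions, `mktemp -d` is assumed to be available, and the awk program is a compact equivalent of the script above, inlined so the example is self-contained.

```shell
#!/bin/sh
# Hypothetical local test harness; paths and file names are illustrative.
workdir=$(mktemp -d)
outdir="$workdir/out"
mkdir -p "$outdir"

# Sample input from step 1.1.
cat > "$workdir/input.txt" <<'EOF'
aaa,bbb,1
1,A,2
2,B,3
"aaa",c,4
aaa,,5
1,AA,
2,,33
aaa,C,
EOF

# Compact equivalent of the splitting logic: field 1 (quotes removed)
# selects the file, fields 2..NF are appended to it.
awk -F, -v outdir="$outdir" 'NF >= 2 {
    gsub(/"/, "", $1)
    outfile = outdir "/" $1
    if (outfile != prev) {
        if (prev != "") close(prev)
        prev = outfile
    }
    line = $2
    for (i = 3; i <= NF; ++i) line = line "," $i
    print line >> outfile
}' < "$workdir/input.txt"

# Show each output file with its contents.
for f in "$outdir"/*; do
    echo "== $(basename "$f")"
    cat "$f"
done
```

With the sample input this yields three files: `1` (A,2 and AA,), `2` (B,3 and ,33), and `aaa` (bbb,1 then c,4 then ,5 then C,). Empty trailing fields are preserved, so a record such as 1,AA, is written as AA, with its trailing comma.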

     

    Note: The sample below is taken from DataStage on Windows. You can use the same technique with DataStage on Unix/Linux.

    [Screenshot: splitfile1]

  2. Create a parallel job that calls the script from the External Target stage

    [Screenshots: splitfile2 to splitfile10]


  3. Run the job

    [Screenshots: splitfile11 to splitfile13]
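After the job finishes, a quick record count can confirm that nothing was lost. The helper below is a hypothetical sketch (`check_split` is not part of DataStage). It assumes the output directory was empty before the run: because the script appends with `>>`, rerunning the job into a directory that still holds earlier output adds to the existing files.

```shell
#!/bin/sh
# check_split is a hypothetical helper, not part of DataStage.
# Usage: check_split INPUT_FILE OUTPUT_DIR
check_split() {
    input=$1
    outdir=$2
    # Records the script accepts: at least two comma-separated fields.
    expected=$(awk -F, 'NF >= 2 { n++ } END { print n + 0 }' "$input")
    # Total lines across all files in the output directory.
    actual=$(cat "$outdir"/* 2>/dev/null | wc -l | tr -d ' ')
    if [ "$expected" -eq "$actual" ]; then
        echo "OK: $actual records written"
    else
        echo "MISMATCH: expected $expected, found $actual"
        return 1
    fi
}
```

Call it with the input file and the directory passed to the script, for example `check_split input.txt /path/to/outdir` (both paths are placeholders).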
