Overview

Skill Level: Beginner

This recipe shows how to split input data into multiple output files, based on a value in each input record, by using the External Target stage.

Ingredients

DataStage with Parallel Engine installed.

Step-by-step

  1. Prepare an input file and a shell script on the engine tier

    1.1. Input file

    aaa,bbb,1
    1,A,2
    2,B,3
    "aaa",c,4
    aaa,,5
    1,AA,
    2,,33
    aaa,C,

    Notes:

    – The comma is the field separator.

    – The first field is used as the output file name, with any double quotes around it removed. The remaining fields are written to that file.

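    A quick way to create the sample file on the engine tier and preview how the records will be split. The directory /tmp/splitfile is a hypothetical location; use any path the DataStage job can read. The final awk line applies the same key rule as the script below (double quotes stripped from the first field) and counts the records per output file.

```shell
#!/bin/sh
# Create the sample input file.
# /tmp/splitfile is a hypothetical location; any directory the job can read works.
mkdir -p /tmp/splitfile
cat > /tmp/splitfile/input.txt <<'EOF'
aaa,bbb,1
1,A,2
2,B,3
"aaa",c,4
aaa,,5
1,AA,
2,,33
aaa,C,
EOF

# Preview the split: strip quotes from the first field (the output file name),
# then count how many records go to each file.
awk -F, 'NF >= 2 { gsub(/"/, "", $1); print $1 }' /tmp/splitfile/input.txt | sort | uniq -c
```

    For the sample data this reports 2 records for 1, 2 records for 2 and 4 records for aaa, so the job should create three output files.
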
    1.2. Simple awk script

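    #!/bin/sh
    # Usage: sh <this script> <output_directory>
    # Reads comma-separated records on standard input, which is where the External
    # Target stage sends the job's rows. The first field names the output file; the
    # remaining fields are appended to <output_directory>/<first field>.

    awk -F, -v outdir="$1" 'NF >= 2 {
        gsub(/"/, "", $1)            # strip double quotes from the file-name field
        if ($1 == "") next           # no file name: skip the record

        if ($1 != prevfile) {
            # Close the previous file (not the new one) before switching, so the
            # number of open files stays bounded however many keys there are.
            if (outfile != "") close(outfile)
            outfile = sprintf("%s/%s", outdir, $1)
            prevfile = $1
        }

        # Write fields 2..NF back out, comma-separated. ">>" appends, because a
        # file closed above may be reopened for a later record with the same key.
        for (i = 2; i < NF; ++i)
            printf("%s,", $i) >> outfile
        print $NF >> outfile
    }'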

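    The script writes with ">>" rather than ">" because it closes each file when the key changes. In awk, ">" truncates a file every time it is opened, including a reopen after close(), so a key that comes back later in the input (like aaa in the sample) would keep only its last run of records. The sketch below demonstrates the difference in a temporary directory; the file and directory names are illustrative only.

```shell
#!/bin/sh
# Shows why closing and reopening an output file needs ">>" rather than ">".
# Everything is created under a throwaway temporary directory.
dir=$(mktemp -d)
mkdir "$dir/trunc" "$dir/append"

# Key "a" appears, then "b", then "a" again.
printf 'a,1\nb,2\na,3\n' > "$dir/in.txt"

# ">" truncates on every open, so reopening "a" after close() drops "1".
awk -F, -v d="$dir/trunc" '{ f = d "/" $1; print $2 > f; close(f) }' "$dir/in.txt"

# ">>" appends on every open, so both records for "a" are kept.
awk -F, -v d="$dir/append" '{ f = d "/" $1; print $2 >> f; close(f) }' "$dir/in.txt"

echo "with >:";  cat "$dir/trunc/a"    # prints 3
echo "with >>:"; cat "$dir/append/a"   # prints 1, then 3
```

    A side effect of appending is that files from an earlier run are extended rather than replaced, so clear the output directory before rerunning the job.
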
    Note: the sample below was taken from DataStage on Windows. The same technique works for DataStage on UNIX/Linux.

    [Screenshot: splitfile1]

  2. Create a parallel job that calls the script from an External Target stage

    [Screenshots: splitfile2 to splitfile10]

  3. Run the job

    [Screenshots: splitfile11 to splitfile13]
