Collections
A collection is an IBM® Streams Processing Language (SPL) component that can be used to represent complex arrangements of streaming data. Collections are supported by built-in SPL functions to access and manipulate the collection contents.
Quick links
Overview
- Set
- Map
- List
As with primitive types such as strings, SPL implements collection types of bounded length. Bounded-length variants follow the same rules as
bounded strings. For example, list<int32>[2]
can store a maximum of two
integers. You can declare bounded-size types with unbounded elements, such as
list<rstring>[3]
, though doing so does not offer the same marshalling
optimization opportunities. SPL limits all bounded or unbounded collections to a maximum of
231-1 elements. For types with bounded size, the bounds must be compile-time constants.
You can also declare bounded collections that contain unbounded types, such as a bounded list of
rstring
. From a developer perspective, there is little difference between bounded
and unbounded types. However, there are some compile-time optimization advantages when bounded types
are used.
listLiteral ::= ‘[
' expr*, ‘]
' # for example, [0, x, x+1, x+2]
mapLiteral ::= ‘{
' (
expr ‘:
' expr )
*, ‘}
' # for example, { "Mon": −1, "Fri": 1 }
setLiteral ::= ‘{
' expr*, ‘ }
' # for example, { "IBM", "GOOG"}
Composite types have their own operators. For more information, see Expression language. You can subscript a list or map with
l[i]
, check membership in a collection with "x in s
", iterate over
a collection with "for(T x in s) ...
", and so on. There are also various functions
to work on collections; for example, setUnion(set1, set2)
or
setIntersection(set1,
set2)
.For
maps, the left operand of the in
operator is the key. In other words, "Oz"
in m
checks whether "Oz"
is a key of the map m
. Value (as
opposed to key) membership tests use functions, not the in
operator.
The list collection
Lists are sequence containers with their contents ordered linearly. Lists allow random, zero-based access to their contents. In simple terms, they can be thought of as dynamic arrays, allowing access to individual list items; but at the same time, they allow for dynamic allocation of new entries where the list is expanded or contracted as needed.
One more advantage over traditional arrays is that lists in SPL provide for tighter boundary checking, ensuring that requests to access list elements beyond the list boundary are prohibited.
T
is the type of the elements, and n
is the size for bounded lists.
Constructor | Description |
---|---|
list< T> |
unbounded |
list< T>[ n] |
bounded |
The following example of a list of fruits shows a one-dimensional list that contains six
rstring
elements.
...
mutable rstring myFruit;
// Immutable - must be initialized upon declaration
list<rstring> fruit = ["apple", "orange", "banana", "pear", "pineapple", "apple"];
myFruit = fruit[1]; // orange
...
The set collection
Sets are unordered containers that enable the retrieval of their contents based on their value (the value of a set element is also its key). Sets store unique items (no duplicates). After an element is added to a set, it cannot be modified. However, list elements can be added or removed and the set expands and contracts as needed.
T
is the
type of the elements, and n
is the size for bounded sets.
Constructor | Description |
---|---|
set< T> |
unbounded |
set< T>[ n] |
bounded |
The following example of a set of vehicles shows a simple set of five rstring
elements.
set <rstring> vehicles = {"car", "truck", "airplane", "boat", "bicycle"};
The map collection
Maps are implemented as associative containers that store elements in a key-to-value pairing. Like sets, a map collection requires that all keys are unique. Key and value items can be declared as different types with full support for nested types (even keys can be of any type, including other composite types).
T
is the
type of the elements, and n
is the size for bounded maps.
Constructor | Description |
---|---|
map< K,V> |
unbounded |
map< K,V>[ n] |
bounded |
int:rstring
and an rstring:int
element.int32
key.map <int32, rstring> monthsYear = {1 : "January", 2 : "February", 3 : "March", ...};
map <rstring, int32> daysMonth = {"January" : 31, "February" : 28, "March" : 31, ...};
Working with collections
- Access operators
- After you declare the members that are contained in a collection, Streams provides a number of simple access operators. The three basic mechanisms to work with collection
items are as follows.
l[i]
to access items that are stored in a list.-
in
to check membership of a collection. -
for(T x in s) ...
to iterate over the members of a collection.Note: As afor-loop
iterates over a collection, that collection becomes immutable for the body of thefor-loop
.
Note: When lists or maps are accessed by using thein
operator, the operator on the left is the access key. If you want to work with the value rather than the key, you must use the appropriate SPL collection function rather than thein
operator. - SPL collection functions
- In addition to the access operators, the Streams Processing Language standard toolkit provides several functions that can be used to manipulate collections as part of
SPL expressions. SPL functions are either declared as mutable or immutable (non-mutating). A mutating function
attempts to modify the contents of a passed input parameter (a collection, in this case). A
non-mutating function returns a new object (a collection) that contains the function results. The
following listing shows an
example.
<list T> public T concat(T values1, T values2) // Returns a new list of values concatenating the contents of values1 and values2 <list T> public void concatM(mutable T values1, T values2) // Appends values2 to the end of values1
Using primitive operators and collections
SPL type | C++ type | C++ base implementation | C++ reflective type |
---|---|---|---|
list<T> |
SPL::list<T> |
std::vector<T> |
SPL::List |
set<T> |
SPL::set<T> |
std::tr1::unordered_set<T> |
SPL::Set |
map<T,K> |
SPL::map<T,K> |
std::tr1::unordered_map<T,K> |
SPL::Map |
list<T>[N] |
SPL::blist<T,N> |
N/A | SPL::BList |
set<T>[N] |
SPL::bset<T,N> |
N/A | SPL::BSet |
map<T,K>[N] |
SPL::bmap<T,K,N> |
N/A | SPL::BMap |
Example
composite Main {
// Tuple schema definitions for re-use later
type
// Unbounded list of list of int32
listSchema2d = tuple <list<list<int32> > inputMatrix>;
// Bounded list
listSchema1d = tuple <list<float64>[100] outputList>;
...
graph
// Read a CSV file containing 100x100 matrix of random integers
stream <listSchema2d> matrixData = FileSource() {
param
format : csv;
file : "matrix.txt";
}
// Pass the matrix data to the Primitive Operator for processing
stream <listSchema1d> listData = myOp(matrixData) {}
// Sink the results of the Primitive Operator to a new file
() as sinkListData = FileSink(listData) {
param
format : csv;
file : "list.txt";
}
...
}
...
// Tuple processing for non-mutating ports
void MY_OPERATOR::process(Tuple const & tuple, uint32_t port)
{
// Define the input and output port tuples
IPort0Type const & tp = static_cast<IPort0Type const&>(tuple);
OPort0Type outputTuple;
// Get a reference to the 2d list of list of int32
SPL::list<SPL::list<SPL::int32> > const & inputMatrix = tp.get_inputMatrix();
// Could also use IPort0Type::inputMatrix_type const& ....
// Loop over the input matrix and calculate the average of each row
for (uint32_t dim1=0, dim1Max = inputMatrix.size(); dim1 < dim1Max; dim1++) {
SPL::list<SPL::int32> const& row = inputMatrix[dim1];
// Could also use IPort0Type::inputMatrix_type::value_type const& ...
// Sum the data values in the row
SPL::float64 sum = 0.0;
for (uint32_t dim2=0, dim2Max = row.size(); dim2 < dim2Max; dim2++)
sum += row[dim2];
// Append the average of the row to the output tuple
outputTuple.get_outputList().push_back (sum / row.size());
}
// Submit output tuple
submit(outputTuple, 0);
}
...