The second OpenTechAI workshop will be held in Helsinki, Finland on May 6 and 7. The event is sponsored by IBM and VTT and is by invitation only. The workshop focuses on open source AI topics particularly relevant to Finland and to Europe as a whole. On May 6 a number of posters will be on display, and there will be an opportunity to talk with the poster authors. This is an ideal venue to network and build new collaborations. The following is a list of the posters.

Power generation capacity forecasting for transmission system operation

Jussi Kiljander (VTT), Sergio Motta (VTT), Erkka Rinne (VTT), and Janne Takalo-Mattila (VTT)

As a member of the Nordic Electricity Market, Finland needs a source of reliable information on the total generating capacity of its power plants. That is, it needs a clear idea of how much energy can be produced within the country. However, finding this information is not so straightforward: the energy that is actually produced is not always the same as the available capacity, because a large portion of the plants are flexible. Moreover, smaller power plants have no historical production data, which makes it difficult to analyze their true capacity at different times. This challenge was the main driver behind the development of CapFor, a software platform that forecasts the available capacity of all CHP and nuclear power plants in Finland over a 7-day horizon. CapFor provides accurate capacity forecasts (< 5% NRMSE for inflexible plants). It has been in production use by Fingrid since the beginning of 2019 and produces a 7-day forecast every morning, which is reported into the Nordic Regional Security Coordinator’s system.
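The NRMSE figure quoted above is a root-mean-square error normalized to make it comparable across plants of different sizes. A minimal sketch of the metric follows; note that the normalization choice (by the range of the observations) and all numbers are illustrative assumptions, not taken from CapFor, since NRMSE is sometimes normalized by the mean instead.

```python
import math

def nrmse(actual, forecast):
    """Root-mean-square error normalized by the range of the actuals."""
    rmse = math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )
    return rmse / (max(actual) - min(actual))

# Hypothetical hourly available-capacity values (MW) vs. a forecast.
actual = [1200.0, 900.0, 1500.0, 1100.0, 700.0]
forecast = [1180.0, 920.0, 1480.0, 1110.0, 710.0]
print(f"NRMSE: {nrmse(actual, forecast):.1%}")  # well under the 5% target
```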

Improving Industry 4.0 through Service Science

Agostinho Silva (CEFAGE – Évora University)

This research assesses the impact of Industry 4.0 (I4.0) on Portuguese Ornamental Stone (OS) firms’ response to the threats arising from the procurement model resulting from Building Information Modelling (BIM). The transition from the Third to the Fourth Industrial Age has led to the emergence of paradigms such as BIM, which seeks efficiency in Architecture, Engineering and Construction (AEC) through a global approach and procurement oriented towards standardized products, and I4.0, in which production is supported by Cyber-Physical Systems (CPS). Integrated in the AEC supply chain, the OS sector makes Portugal the eighth country in OS trade worldwide, and the second per capita, with its competitiveness coming from customization. BIM represents a threat to its sustainability, particularly for firms of the Cluster Portugal Mineral Resources (CPMR). The literature review showed that Service Science (S-S) is an interdisciplinary scientific area combining organisational and technological knowledge, with a view to categorizing, innovating and creating value for service systems. Guided by the pragmatic paradigm and using a convergent parallel mixed methodology, this research focuses on the conceptualization of an S-S framework, which was applied to a representative sample of CPMR companies. This allowed measurement of the evolution of Key Concern Indicators (KCI), indexed to stakeholders’ concerns, as operations change from the current situation to BIM procurement and I4.0 production. The results showed significant relief of stakeholder concerns regarding delivery time, costs, sustainability and product quality when BIM operations evolve to I4.0, supporting the conclusion that, in technical terms, the impact of I4.0 on the threats arising from BIM procurement in the sample is clearly positive.

Svetlana Levitan (IBM CODAIT and Data Mining Group), Nick Pentreath (IBM CODAIT), and Ludovic Claude (CHUV and Human Brain Project)

Machine learning pipelines span organizational teams and tools. Challenges include:
• Need to bridge various languages, frameworks, runtimes, versions
• Friction between teams – data science vs production vs business
• The proliferation of formats – lack of standardization leads to custom solutions
Sharing the models produced by machine learning and putting those models into production where they are needed is possible today with three competing standards: PMML, PFA and ONNX. This poster shows the options available to store, exchange and deploy ML models, and highlights a practical use case in which PFA is used to support functionality such as cross-validation and benchmarking in a federated machine learning application deployed over a network of hospitals for the Human Brain Project.
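Part of what makes PFA attractive for exchange across teams is that, unlike binary formats, a PFA document is plain JSON describing a scoring engine. The sketch below builds and serializes a minimal PFA-style document; the tiny linear model (y = 2x + 1) is purely illustrative and is not part of the Human Brain Project pipeline, and the document structure is a simplified reading of the PFA specification.

```python
import json

# A minimal PFA-style scoring document computing y = 2.0 * x + 1.0.
# A real pipeline would generate this from a trained model rather
# than write it by hand.
pfa_doc = {
    "input": "double",
    "output": "double",
    "action": [{"+": [{"*": ["input", 2.0]}, 1.0]}],
}

serialized = json.dumps(pfa_doc, indent=2)
print(serialized)

# Because PFA is just JSON, any consumer (e.g. a hospital node in a
# federated deployment) can load and inspect it with standard tooling.
restored = json.loads(serialized)
assert restored["action"] == pfa_doc["action"]
```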

BigMedilytics – Breast Cancer Pilot

Juha Pajula (VTT), Kari Antila (VTT), Harri Pölönen (VTT), Simona Rabinovici-Cohen (IBM Haifa) and Oliver Hijano-Cubelos (Institute Curie)

In the BigMedilytics EU project (agreement number 780495), VTT is working with IBM Haifa, Israel, to develop AI applications for the Institut Curie breast cancer hospital in France. The purpose of the system is to adapt deep learning algorithms for big data analytics of multi-modal imaging and clinical data, to improve outcomes and reduce costs in neoadjuvant chemotherapy for breast cancer. The aim is to create multi-modal pipelines for (1) predicting response to neoadjuvant chemotherapy and (2) predicting cohorts for clinical trials of next-generation therapies. Within the pilot study, various models for the different clinical tasks will be tested and scored.

In the first phase, we utilize traditional machine learning approaches for the breast cancer studies. Some of the applied methods were originally developed for other research questions, such as detecting Alzheimer’s disease or predicting the outcome of traumatic brain injuries. These adapted methods will set the benchmark for the state-of-the-art deep learning neural networks under development.

In a later phase, we will research options for combining traditional ML with new deep learning approaches to develop transparent decision support based on deep learning technologies. The breast cancer pilot will utilize both clinical data from the Institut Curie and open data from the multicenter I-SPY breast cancer trial. The data include magnetic resonance (MR), diffusion-weighted magnetic resonance (DW-MR), mammography (MG) and ultrasound (US) images as well as laboratory measures and health register data. Based on previous work, the best results are achieved by using multiple modalities and a broad patient demographic. The data from the Institut Curie are actual treatment data and therefore have limited labeling available. Open data such as the I-SPY datasets have more annotations available, and for that reason they are being investigated as training data for the machine learning system. The initial approach is to utilize the magnetic resonance and diffusion imaging data to create image-based automatic features for machine learning classifiers. These features will then be extended with laboratory and health register data to make the classifiers more accurate.

In the later phase, we will also study options for using mammography images and multimodality approaches to estimate the features. For example, DW-MR fused with contrast-enhanced MR images is considered a promising modern approach in breast cancer treatment estimation.
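The feature-extension step described above, where image-derived features are concatenated with laboratory and health register variables into a single classifier input, can be sketched as follows. All feature names and values here are hypothetical placeholders, not actual pilot data.

```python
def build_feature_vector(image_features, lab_values, register_values):
    """Concatenate per-patient features from all modalities into one vector
    that a downstream classifier consumes as a single input."""
    return list(image_features) + list(lab_values) + list(register_values)

# Hypothetical patient record: automatic MR/DW-MR image descriptors,
# laboratory measurements, and health-register variables (e.g. age).
image_features = [0.42, 0.88, 0.13]
lab_values = [5.6, 210.0]
register_values = [54]

x = build_feature_vector(image_features, lab_values, register_values)
print(len(x))  # one 6-dimensional input vector for the classifier
```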

Finnish Platform for Occupational Health

Anssi Smedlund (Finnish Institute of Occupational Health)

Digitalization is rapidly changing the occupational health industry. Stakeholders, e.g. pension companies, occupational health service providers and welfare technology companies, are digitizing their existing services and creating entirely new services and modes of operation. The Finnish Institute of Occupational Health (FIOH) promotes the creation of a common data platform to store and link distributed data sources in order to produce new data-driven knowledge for decision making, human resource management and individuals. In addition, the project will facilitate and orchestrate co-creation activities aiming to find new analyses and data-driven services in the field of occupational health. The project also addresses relevant questions on the effectiveness and transparency of the new digital services.

This poster presents a vision of an open or semi-open platform for all accumulating occupational-health-related data, which is then used to produce real-time indicators, predictions and diagnoses for the benefit of Finnish working life and occupational health.

Multi-step-ahead simulation of dynamic chemical processes using machine learning models

Mikko Tahkola (VTT)

The computational cost of large, detailed physics-based dynamic chemical process models can be too high for uses such as operator training, where the simulation must run at least in real time. Replacing computationally heavy components of the physics-based model with a data-driven surrogate model is one possible way to increase simulation speed. In this study, the data are created in the dynamic process modelling and simulation software Apros. The TensorFlow/Keras Python libraries are then used to develop NARX and LSTM neural network architectures, whose one- and multi-step-ahead prediction accuracies are tested. The accuracy of an MLP neural network in one-step-ahead prediction is also tested. Finally, a workflow for implementing the data-driven models back into the physics-based environment is defined.
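The multi-step-ahead setting can be illustrated by applying a one-step model recursively, feeding each prediction back in as the newest input, which is how a NARX-style surrogate replaces a physics-based component over a horizon. The toy linear recurrence below stands in for the trained neural network and is purely an assumption for illustration.

```python
def one_step_model(history):
    """Toy one-step-ahead model predicting the next value from the last two.
    In the study this would be a trained NARX/LSTM network."""
    return 1.5 * history[-1] - 0.5 * history[-2]

def multi_step_forecast(history, steps):
    """Roll the one-step model forward over a horizon, feeding each
    prediction back in as the newest observation (recursive forecasting).
    Prediction errors therefore compound, which is why multi-step
    accuracy is evaluated separately from one-step accuracy."""
    window = list(history)
    out = []
    for _ in range(steps):
        y = one_step_model(window)
        out.append(y)
        window.append(y)
    return out

print(multi_step_forecast([1.0, 2.0], steps=3))
```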

Privacy-Preserving AI: AI on Encrypted Data

Oguzhan Gencoglu (Top Data Science), Chun Fang (Top Data Science) and Hung Ta (Top Data Science)

Current implementations of machine learning and AI algorithms require access to data, which opens up potential security and privacy risks. Recent advances in AI and computer science research introduce promising opportunities for privacy-preserving machine learning algorithms. With the encryption scheme called homomorphic encryption, certain mathematical functions can be performed in the ciphertext domain while guaranteeing exact results, as if they had been performed in the plaintext domain. Using this concept, we propose approximations to deep neural network architectures and their components, such as activation functions and pooling layers. We evaluate our method by converting a convolutional neural network for classifying handwritten digit images into the encrypted domain. We achieve a 92.5% accuracy rate (98% when there is no privacy protection). We show that deep neural networks and several other machine learning algorithms can still work with encrypted data with little decrease in predictive performance.
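One common form of such an approximation replaces a non-polynomial activation like ReLU with a low-degree polynomial, since homomorphic schemes evaluate only additions and multiplications on ciphertexts. A minimal sketch follows; the square activation is in the style of prior work on neural networks over homomorphic encryption and is an assumption about the poster's exact choice, shown here on plaintext values for clarity.

```python
def relu(x):
    """Standard activation; max() is not expressible with only + and *,
    so it cannot be evaluated directly on homomorphic ciphertexts."""
    return max(0.0, x)

def square_activation(x):
    """Polynomial stand-in for ReLU: uses only multiplication, so it can
    be evaluated directly under a homomorphic encryption scheme."""
    return x * x

# Swapping each ReLU for a polynomial changes the network's behaviour
# slightly, which is the source of the accuracy drop the poster reports
# (98% plaintext vs. 92.5% encrypted).
inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([relu(v) for v in inputs])
print([square_activation(v) for v in inputs])
```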

MIDAS – BOTter than the real thing

Peter Poliwoda (IBM) and Juha Pajula (VTT)

This poster shows how healthcare policy makers can capture the voice of the public using a Twitter chatbot campaign builder. IBM and VTT are working on a solution to reach out to the general public and ask their opinions on a variety of healthcare-related topics using machine learning, natural language processing and the latest cutting-edge authentication technologies.

Automatic test generation for student exams in Bulgaria

Simeon Monov (IBM), Asen Rahnev (University of Plovdiv Paisii Hilendarski), Nikolay Pavlov (University of Plovdiv Paisii Hilendarski) and Thomas Truong (IBM)

In this project we are investigating and creating models for automatic test generation for student exams in the Bulgarian language using deep learning. This is a collaboration between the University of Plovdiv, Bulgaria, and the IBM Cognitive OpenTech group. The project has four goals:

1. Test existing state-of-the-art question generation systems in English.
2. Create a new curated question answering dataset in the Bulgarian language using crowdsourcing, test extraction from existing books, etc.
3. Test the existing state-of-the-art models on the Bulgarian language dataset and see how well they perform.
4. Investigate new models for generating problems to solve and generating questions that require reasoning based on the corpus.

DataBio – Data-Driven Bioeconomy

Caj Södergård, Paula Järvinen, Jarmo Kalaoja, Pekka Siltanen and Renne Tergujeff

Bioeconomy covers the responsible and sustainable utilization of raw materials from agriculture, forestry and fishery for food, energy and biomaterials. Through 27 pilots, DataBio shows the benefits of Big Data in raw material production for the bioeconomy industry. DataBio is an EU H2020 Lighthouse project running 2017–2019, with a volume of 16 M€ and 48 partners, including IBM Haifa. VTT is the biggest partner and serves as Technical Manager and WP4 leader.

A data-driven restaurant understands and reacts to human behaviour

Sari Järvinen (VTT Technical Research Centre of Finland) and Johannes Peltola (VTT Technical Research Centre of Finland)

crEATe is an innovation ecosystem orchestrated by VTT, Fazer and IBM and open to actors interested in new business opportunities around food and eating. Our mission is to develop vitality-enhancing, personalised and sustainable food and eating solutions that are meaningful to consumers, clients and communities. The innovation ecosystem applies new technologies and business models, which facilitate development of the new solutions.

As part of the crEATe innovation ecosystem, VTT has implemented in the TestEat restaurant a human behaviour tracking system capable of measuring and analysing customer and personnel behaviour. The data are collected using a network of depth sensors with a 3D view of the restaurant space. The depth sensors are accurate and objective in measuring human behaviour and actions, but preserve the privacy of restaurant customers, as it is not possible to identify persons from depth sensor data. The multi-sensor tracking includes an auto-calibration functionality, which lowers the complexity and cost of setting up accurate tracking systems for larger areas. Simulation-based training data creation allows the behaviour analytics to be extended with new behaviour models using only a small amount of real training data.

The data gathered with the tracking system can be analysed for insight into customer behaviour and combined with other data sources (sales data, menu choices, waste measurement, other restaurant IoT devices, weather data, etc.) to build a holistic view of restaurant performance, customer experience and the factors affecting it. VTT has machine-learning-based tools for automatic analysis and classification of large amounts of behaviour data. The AI methods learn the characteristics behind different behavioural patterns, which allows cost-efficient analytics. Real-time data analytics can be used to predict human behaviour, and this insight can serve as an input to digital restaurant services to support data-driven decision-making.

The Feasibility of Corporately Funded Universal Basic Income (UBI) Programs

Steven Spohrer (ISSIP)

Historical and contemporary discussions of UBIs rely on governments, not corporations, to provide basic income. This poster explores the profit motive for UBI and proposes future research directions to understand and achieve strategic corporate UBI.


Data Standard for Adaptive Self-Organization

Susu Nousala (Tongji University, Shanghai, China; University of Melbourne, Melbourne, Australia), Marco Cataffo (Politecnico di Torino) and David Ing (Aalto University)

The Creative Systemic Design Platform focuses on facilitating learning in a diverse set of organizations, researching the quality of interactions among different agents, both human and non-human, and their context. A major communication channel available today employs sensors to observe otherwise invisible conditions of the environment, enabling a detailed understanding of how patterns of interaction among biological elements influence global conditions. To address some of the most urgent needs resulting from urbanization, industrialization and globalization, in 2018 we started working on automated micro-farming systems to bring self-organization capacity to human settlements. To start building knowledge models that fit a diverse range of human agents, we set the preconditions for a class of design students to interact with an indoor greenhouse, constraining the space and time of natural cycles of vegetation and water and the number of observable interactions. This process led to work on data standards for collection and integration protocols that embed qualitative observation. On a slightly longer time scale, this is a key element in bringing self-monitoring practice into farm management, distributing learning and adaptation capacity as the basis for autonomous, ecologically fitting human settlements.

Mozilla Common Voice

George Roter (Mozilla) and Alex Klepel (Mozilla)

Today’s speech recognition technologies are largely tied up in a few companies and products that have an advantage through proprietary access to voice data. Moreover, only majority languages (cash markets) are being served and optimised for. Machines don’t understand everyone; they understand a fraction of people, so only a fraction of people benefit from this massive technological shift. Consider how speech recognition could be used by minority language speakers to give more people access to technology and the services the internet can provide, even if they never learned to read. The same is true for vision-impaired or physically handicapped people (i.e. people who can’t use a touch screen or keyboard). Regular market forces will not help them.

Launched in June 2017, Mozilla’s Open Innovation project “Common Voice” takes a multifaceted approach to open innovation to democratize speech technologies. The project aims to build open and publicly available datasets of labelled audio that anyone can use to train voice-enabled applications.

Since Mozilla enabled multi-language support in June 2018, Common Voice has grown more global and more inclusive. Over the past months, communities have enthusiastically rallied around the project, launching data collection efforts in 27 languages, with 70+ more in progress on the Common Voice site. Mozilla has just released the multi-language data collected so far, which makes Common Voice the largest public domain transcribed voice dataset, with more than 1,400 hours of voice data and 18 languages represented, including English, French, German and Mandarin Chinese (Traditional), but also, for example, Welsh and Kabyle.
