Nutch

Overview

This dataset contains execution logs generated by two versions of Nutch, an open source application: (i) the version before the commit for [NUTCH-1934], hereinafter version 1, and (ii) the version after that commit, hereinafter version 2. The dataset highlights the difference in execution behavior between the two versions. The main raw data files are nutch.logstashed.v1 and nutch.logstashed.v2, which hold the Nutch execution logs for version 1 and version 2 after logstashing. The dataset also includes all other artifacts (intermediate and final output) generated from the raw data files as part of the approach used in the paper “Runtime Monitoring in Continuous Deployment by Differencing Execution Behavior Model.”

Dataset Metadata

Format: CSV, JSON
License: CDLA-Sharing
Domain: Time Series
Number of Records:
  57 delta_mappings.csv
  94137 logrecords.modelmining.v1.txt
  125695 logrecords.modelmining.v2.txt
  42254 notmapped.combined.txt
  9632 notsubset.combined.txt
  94137 nutch.logstashed.v1
  125695 nutch.logstashed.v2
Size: 5.8MB
Originally Published: March 09, 2017

Example Records

T1098,T903
1,T694,1487598878515
1,T695,1487598891505
{"message":"2017-02-20 19:25:55,088 INFO  http.Http [FetcherThread] - http.proxy.port = 8080","@version":"1","@timestamp":"2017-02-20T13:55:55.088Z","path":"/root/monika_intern/nutch-rerun/coderefactor2/v2.log","host":"localhost","type":"nutch","timestamp":"2017-02-20 19:25:55,088","text":"INFO  http.Http [FetcherThread] - http.proxy.port = 8080","_grokked":"true","datasource":"irl_nutch","_dated":"true"}
{"message":"2017-02-20 21:25:17,297 INFO  fetcher.FetchItemQueues [pool-1-thread-1] -   inProgress    = 6","@version":"1","@timestamp":"2017-02-20T15:55:17.297Z","path":"/root/monika_intern/nutch-rerun/coderefactor2/v2.log","host":"localhost","type":"nutch","timestamp":"2017-02-20 21:25:17,297","text":"INFO  fetcher.FetchItemQueues [pool-1-thread-1] -   inProgress    = 6","_grokked":"true","datasource":"irl_nutch","_dated":"true"}
{"message":"2017-02-20 19:24:40,962 INFO  crawl.Injector [Thread-1] - Injector: overwrite: false","@version":"1","@timestamp":"2017-02-20T13:54:40.962Z","path":"/root/monika_intern/nutch-rerun/coderefactor2/v1.log","host":"localhost","type":"nutch","timestamp":"2017-02-20 19:24:40,962","text":"INFO  crawl.Injector [Thread-1] - Injector: overwrite: false","_grokked":"true","datasource":"irl_nutch","_dated":"true"}
{"message":"2017-02-20 19:24:52,679 DEBUG util.ObjectCache [pool-1-thread-1] - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, file:/tmp/hadoop-root/mapred/local/localRunner/job_local824858682_0001.xml, instantiating a new object cache","@version":"1","@timestamp":"2017-02-20T13:54:52.679Z","path":"/root/monika_intern/nutch-rerun/coderefactor2/v2.log","host":"localhost","type":"nutch","timestamp":"2017-02-20 19:24:52,679","text":"DEBUG util.ObjectCache [pool-1-thread-1] - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, file:/tmp/hadoop-root/mapred/local/localRunner/job_local824858682_0001.xml, instantiating a new object cache","_grokked":"true","datasource":"irl_nutch","_dated":"true"}
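
The JSON records above can be read with a few lines of Python. The following is a minimal sketch, assuming that each line of nutch.logstashed.v1 and nutch.logstashed.v2 holds exactly one JSON object with the fields shown in the example records. The helper names (parse_record, count_levels, count_levels_in_file) are hypothetical and not part of the dataset or the paper's tooling.

```python
import json
from collections import Counter

# One record copied from the example records above.
SAMPLE = (
    '{"message":"2017-02-20 19:24:40,962 INFO  crawl.Injector [Thread-1] - Injector: overwrite: false",'
    '"@version":"1","@timestamp":"2017-02-20T13:54:40.962Z",'
    '"path":"/root/monika_intern/nutch-rerun/coderefactor2/v1.log",'
    '"host":"localhost","type":"nutch","timestamp":"2017-02-20 19:24:40,962",'
    '"text":"INFO  crawl.Injector [Thread-1] - Injector: overwrite: false",'
    '"_grokked":"true","datasource":"irl_nutch","_dated":"true"}'
)


def parse_record(line):
    """Parse one JSON line and split its "text" field into log level and remainder."""
    record = json.loads(line)
    # Assumption: "text" starts with the log level, followed by whitespace.
    level, rest = record["text"].split(None, 1)
    return {
        "timestamp": record["@timestamp"],  # UTC timestamp from the record
        "level": level,                     # e.g. INFO, DEBUG
        "text": rest,                       # logger, thread, and message
        "path": record["path"],             # source log file of the record
    }


def count_levels(lines):
    """Count records per log level, skipping blank lines."""
    return Counter(parse_record(line)["level"] for line in lines if line.strip())


def count_levels_in_file(path):
    """Count log levels in a file with one JSON record per line (assumed layout)."""
    with open(path, encoding="utf-8") as handle:
        return count_levels(handle)


parsed = parse_record(SAMPLE)
print(parsed["level"], parsed["timestamp"])
```

Calling count_levels_in_file("nutch.logstashed.v1") and count_levels_in_file("nutch.logstashed.v2") would give a first rough comparison of the two versions, provided the files follow the assumed one-record-per-line layout.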

Citation

@inproceedings{gupta2018,
author="Gupta, Monika
and Mandal, Atri
and Dasgupta, Gargi
and Serebrenik, Alexander",
editor="Pahl, Claus
and Vukovic, Maja
and Yin, Jianwei
and Yu, Qi",
title="Runtime Monitoring in Continuous Deployment by Differencing Execution Behavior Model",
booktitle="Proceedings of the International Conference on Service-Oriented Computing",
year=2018,
publisher="Springer",
pages="812--827",
doi={10.1007/978-3-030-03596-9\_58}
}