Commit 3eda4a6 (0 parents): 45 changed files with 796,359 additions and 0 deletions.
MIT License

Copyright (c) [year] [fullname]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
## Benchmark Experiments
To provide a benchmark for our COIN dataset, we evaluate various approaches under two different settings: step localization and action segmentation. We also conduct experiments with our task-consistency method under the first setting. Links to the source code are given below. We thank the authors for sharing their code!

### Step Localization
In this task, we aim to localize a series of steps and recognize their corresponding labels given an instructional video. The following methods are evaluated:
* [SSN](https://github.com/yjxiong/action-detection) [1]
* [R-C3D](https://github.com/VisionLearningGroup/R-C3D) [2]
* Our Task-Consistency Approach. Please see [tc-rc3d](tc-rc3d) and [tc-ssn](tc-ssn) for details.

### Action Segmentation
The goal of this task is to assign a step label to each video frame. The following methods are evaluated:
* [Action Sets](https://github.com/alexanderrichard/action-sets) [3]
* [NeuralNetwork-Viterbi](https://github.com/alexanderrichard/NeuralNetwork-Viterbi) [4]
* [TCFPN-ISBA](https://github.com/Zephyr-D/TCFPN-ISBA) [5]
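
Action segmentation results are typically reported as frame-wise accuracy, i.e. the fraction of frames whose predicted step label matches the ground truth. The sketch below only illustrates that metric; the function name and the toy labels are ours, not taken from any of the repositories above.

```python
import numpy as np

def frame_accuracy(pred_labels, gt_labels):
    """Fraction of frames whose predicted step label matches the ground truth.

    Both arguments are 1-D integer arrays with one step label per frame.
    """
    pred_labels = np.asarray(pred_labels)
    gt_labels = np.asarray(gt_labels)
    assert pred_labels.shape == gt_labels.shape
    return float((pred_labels == gt_labels).mean())

# Toy example: 6 frames, 4 labelled correctly -> accuracy 0.666...
print(frame_accuracy([0, 0, 3, 3, 3, 7], [0, 0, 3, 3, 5, 5]))
```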

Note that these methods use frame-wise Fisher vectors as the video representation, which incurs a huge computation and storage cost on the COIN dataset (the Fisher vectors are computed from the improved Dense Trajectories (iDT) representation, which itself requires substantial computation and storage). To address this, we instead employ a bidirectional LSTM on top of a VGG16 network to extract dynamic features of a video sequence [6].
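
For illustration, a minimal PyTorch sketch of this kind of extractor (frame-level VGG16 features fed to a bidirectional LSTM) is shown below. The layer sizes, module names, and input resolution are assumptions for the sketch, not the exact configuration used to produce the benchmark features.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class VGGBiLSTMFeature(nn.Module):
    """Frame-wise VGG16 features aggregated by a bidirectional LSTM (illustrative sketch)."""

    def __init__(self, hidden_size=512):
        super().__init__()
        vgg = models.vgg16()  # randomly initialised here; a pretrained backbone would normally be used
        self.backbone = nn.Sequential(vgg.features, vgg.avgpool, nn.Flatten())
        self.fc = nn.Linear(512 * 7 * 7, 4096)  # compact per-frame descriptor
        self.bilstm = nn.LSTM(input_size=4096, hidden_size=hidden_size,
                              batch_first=True, bidirectional=True)

    def forward(self, frames):                      # frames: (batch, time, 3, 224, 224)
        b, t = frames.shape[:2]
        x = self.backbone(frames.flatten(0, 1))     # (batch*time, 512*7*7)
        x = torch.relu(self.fc(x)).view(b, t, -1)   # (batch, time, 4096)
        feats, _ = self.bilstm(x)                   # (batch, time, 2*hidden_size)
        return feats

# Example: dynamic features for a batch of 2 clips of 16 frames each.
feats = VGGBiLSTMFeature()(torch.randn(2, 16, 3, 224, 224))
print(feats.shape)  # torch.Size([2, 16, 1024])
```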

### References
[1] Y. Zhao, Y. Xiong, L. Wang, Z. Wu, X. Tang, and D. Lin. Temporal action detection with structured segment networks. In ICCV, pages 2933–2942, 2017.
[2] H. Xu, A. Das, and K. Saenko. R-C3D: Region convolutional 3D network for temporal activity detection. In ICCV, pages 5794–5803, 2017.
[3] A. Richard, H. Kuehne, and J. Gall. Action sets: Weakly supervised action segmentation without ordering constraints. In CVPR, pages 5987–5996, 2018.
[4] A. Richard, H. Kuehne, A. Iqbal, and J. Gall. NeuralNetwork-Viterbi: A framework for weakly supervised video learning. In CVPR, pages 7386–7395, 2018.
[5] L. Ding and C. Xu. Weakly-supervised action segmentation with iterative soft boundary assignment. In CVPR, pages 6508–6516, 2018.
[6] J. Donahue, L. A. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. TPAMI, 39(4):677–691, 2017.
# Task-Consistency for R-C3D

### Test Environment

* Operating system - Ubuntu 16.04
* Language - Python 3.5.2
* Several dependencies -
  - numpy 1.15.3
  - terminaltables 3.1.0

### Result Refinement

[1] Refine the scores:

```sh
python3 result_refine.py <results> <info1> <info_c3d>
```

`<results>` is the score file in JSON format output by R-C3D. `<info1>` is the canonical database file of the COIN dataset in JSON format, which can be downloaded from the [website of COIN](...). `<info_c3d>` is the database file of the dataset required by R-C3D.

The JSON file `<info1>` is required to have a structure like:

```
{
    "database": {
        <video_id, str>: {
            "video_url": <video_url, str>,
            "duration": <video_duration, float>,
            "recipe_type": <target_id, int>,
            "class": <target_class, str>,
            "subset": ("training"|"validation"),
            "start": <random point between the start of the whole video and the start of the first action, float>,
            "end": <random point between the end of the whole video and the end of the last action, float>,
            "annotation": [
                {
                    "id": <action_id, int>,
                    "segment": [start, end],
                    "label": <action_label, str>
                },
                ...
            ]
        },
        ...
    }
}
```

The JSON file `<info_c3d>` is required to have a structure like:

```
{
    "version": <version, str>,
    "taxonomy": [
        {
            "parentID": <id of the parent node, int>,
            "parentName": <name of the parent node, str>,   // there is supposed to be a global root node named "Root"
            "nodeID": <id of this node, int>,
            "nodeName": <name of this node, str>
        },
        ...
    ],
    "database": {
        <video_id, str>: {
            "video_url": <video_url, str>,
            "duration": <video_duration, float>,
            "resolution": "<width>x<height>",
            "subset": ("training"|"validation"),
            "annotation": [
                {
                    "label": <action_id, int>,
                    "segment": [start, end]
                },
                ...
            ]
        },
        ...
    }
}
```
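
For reference, a small Python sketch of loading these two files and cross-checking that they describe the same videos could look like the following; the file names are placeholders and the checks are illustrative, not part of `result_refine.py`.

```python
import json

# Placeholder file names; substitute the actual <info1> and <info_c3d> paths.
with open("coin_info.json") as f:
    info1 = json.load(f)["database"]
with open("coin_c3d_info.json") as f:
    info_c3d = json.load(f)["database"]

# Every video listed in the R-C3D database should also appear in the canonical COIN database.
missing = [vid for vid in info_c3d if vid not in info1]
print("videos missing from <info1>:", missing)

# Inspect one entry of the canonical file.
vid, entry = next(iter(info1.items()))
print(vid, entry["class"], entry["subset"], len(entry["annotation"]), "annotated steps")
```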

The refined scores will be dumped into a new JSON file whose name is `<results>` suffixed with `.new`.

[2] Calculate the metrics of the refined results:

Use `json_eval.py` to calculate the metrics:

```sh
python3 json_eval.py <info_c3d> <results>
```

`<info_c3d>` has the same meaning as in the first command. `<results>` is the refined result file, named `result.json.new` if it has not been renamed. The `evaluate.py` module is required to run this program.

The module `evaluate.py` is forked from <https://github.com/ECHO960/PKU-MMD>, with several functions that these programs need added.
# Copyright (c) 2017 Chunhui_Liu@STRUCT_ICST_PKU, All rights reserved.
#
# evaluation protocols used for the PKU-MMD dataset
# http://www.icst.pku.edu.cn/struct/Projects/PKUMMD.html
#
# In the proposal folder:
# each file contains the results for one video,
# several lines per file,
# each line contains: label, start_frame, end_frame, confidence

import os
import numpy as np
#import matplotlib.pyplot as plt

source_folder = '/home/lch/PKU-3D/result/detc/'
ground_folder = '/mnt/hdd/PKUMMD/test_label_0330/'
fig_folder = '/home/lch/PKU-3D/src/fig/'
theta = 0.5  # overlap ratio
number_label = 52


# calc_pr: calculate precision and recall
# @positive: number of positive proposals
# @proposal: number of all proposals
# @ground: number of ground-truth instances
def calc_pr(positive, proposal, ground):
    if (proposal == 0): return 0,0
    if (ground == 0): return 0,0
    return (1.0*positive)/proposal, (1.0*positive)/ground


def overlap(prop, ground):
    l_p, s_p, e_p, c_p, v_p = prop
    l_g, s_g, e_g, c_g, v_g = ground
    if (int(l_p) != int(l_g)): return 0
    if (v_p != v_g): return 0
    # temporal IoU of the two segments (0 if they do not intersect)
    return max((min(e_p, e_g)-max(s_p, s_g))/(max(e_p, e_g)-min(s_p, s_g)),0)


# match: match proposals and ground truth
# @lst: list of proposals (label, start, end, confidence, video_name)
# @ratio: overlap ratio
# @ground: list of ground truth (label, start, end, confidence, video_name)
#
# cos_map (correspond_map): records the matching ground truth for each proposal
# count_map: records how many proposals each ground truth is matched by
# index_map: ground-truth indices grouped by label, used to speed up matching
def match(lst, ratio, ground):
    cos_map = [-1 for x in range(len(lst))]
    count_map = [0 for x in range(len(ground))]
    # generate index_map to speed up
    index_map = [[] for x in range(number_label)]
    try:
        for x in range(len(ground)):
            index_map[int(ground[x][0])].append(x)
        #print("node 2")
    except:
        print("node 3")

    for x in range(len(lst)):
        for y in index_map[int(lst[x][0])]:
            if (overlap(lst[x], ground[y]) < ratio): continue
            if cos_map[x]!=-1 and overlap(lst[x], ground[y]) < overlap(lst[x], ground[cos_map[x]]): continue
            #if count_map[y]>0:
            #    continue
            cos_map[x] = y
        if (cos_map[x] != -1): count_map[cos_map[x]] += 1
    positive = sum([(x>0) for x in count_map])
    return cos_map, count_map, positive

# plot_fig: plot precision-recall figure of given proposal
# @lst: list of proposals (label, start, end, confidence, video_name)
# @ratio: overlap ratio
# @ground: list of ground truth (label, start, end, confidence, video_name)
# @method: method name
"""def plot_fig(lst, ratio, ground, method):
    lst.sort(key = lambda x:x[3]) # sorted by confidence
    cos_map, count_map, positive = match(lst, ratio, ground)
    number_proposal = len(lst)
    number_ground = len(ground)
    old_precision, old_recall = calc_pr(positive, number_proposal, number_ground)
    recalls = [old_recall]
    precisions = [old_precision]
    for x in xrange(len(lst)):
        number_proposal -= 1;
        if (cos_map[x] == -1): continue
        count_map[cos_map[x]] -= 1;
        if (count_map[cos_map[x]] == 0) : positive -= 1;
        precision, recall = calc_pr(positive, number_proposal, number_ground)
        if precision>old_precision:
            old_precision = precision
        recalls.append(recall)
        precisions.append(old_precision)
        old_recall = recall
    fig = plt.figure()
    plt.axis([0,1,0,1])
    plt.plot(recalls,precisions,'r')
    plt.savefig('%s%s.png'%(fig_folder,method))"""

# f1-score:
# @lst: list of proposals (label, start, end, confidence, video_name)
# @ratio: overlap ratio
# @ground: list of ground truth (label, start, end, confidence, video_name)
def f1(lst, ratio, ground):
    cos_map, count_map, positive = match(lst, ratio, ground)
    precision, recall = calc_pr(positive, len(lst), len(ground))
    print("{:f} {:f}".format(precision,recall))
    try:
        score = 2*precision*recall/(precision+recall)
    except:
        score = 0.
    return score


# Interpolated Average Precision:
# @lst: list of proposals (label, start, end, confidence, video_name)
# @ratio: overlap ratio
# @ground: list of ground truth (label, start, end, confidence, video_name)
#
# score = sigma(precision(recall) * delta(recall))
# Note that when overlap ratio < 0.5,
# one ground truth will correspond to many proposals
# In that case, only one positive proposal is counted
def ap(lst, ratio, ground):
    lst.sort(key = lambda x:x[3])  # sorted by confidence
    cos_map, count_map, positive = match(lst, ratio, ground)
    score = 0
    number_proposal = len(lst)
    number_ground = len(ground)
    old_precision, old_recall = calc_pr(positive, number_proposal, number_ground)
    total_recall = old_recall
    #print("{:f} {:f}".format(old_precision,old_recall))
    #print(str(len(lst)))

    for x in range(len(lst)):
        #print("{:f} {:f}".format(old_precision,old_recall))
        number_proposal -= 1
        #if (cos_map[x] == -1): continue
        if cos_map[x]!=-1:
            count_map[cos_map[x]] -= 1
            if (count_map[cos_map[x]] == 0): positive -= 1

        precision, recall = calc_pr(positive, number_proposal, number_ground)
        #print("{:f} {:f}".format(precision,recall))
        #print(str(precision) + " " + str(recall))
        #print(str(type(precision)) + " " + str(type(recall)))
        score += old_precision*(old_recall-recall)
        if precision>old_precision:
            old_precision = precision
        old_recall = recall
    #print(score)
    return score,total_recall


def miou(lst,ground):
    cos_map,count_map,positive = match(lst,0,ground)
    miou = 0
    count = len(lst)
    #print("{:d}: {:d}".format(count,len(ground)))
    real_count = 0
    for x in range(count):
        if cos_map[x]!=-1:
            miou += overlap(lst[x],ground[cos_map[x]])
            real_count += 1
    return miou/float(real_count) if real_count!=0 else 0.


def miou_per_v(lst,ground):
    cos_map,count_map,positive = match(lst,0,ground)
    count = len(lst)
    v_miou = {}
    #print("{:d}: {:d}".format(count,len(ground)))
    for x in range(count):
        if cos_map[x]!=-1:
            v_id = lst[x][4]
            miou = overlap(lst[x],ground[cos_map[x]])
            if v_id not in v_miou:
                v_miou[v_id] = [0.,0]
            v_miou[v_id][0] += miou
            v_miou[v_id][1] += 1
    miou = 0
    for v in v_miou:
        miou += v_miou[v][0]/float(v_miou[v][1])
    miou /= len(v_miou)
    return miou

# process: calculate scores for each method
# (kept commented out; written for Python 2 and depends on the commented-out plot_fig above)
"""def process(method):
    folderpath = source_folder+method+'/'
    v_props = [] # proposal list separated by video
    v_grounds = [] # ground-truth list separated by video
    #========== find all proposals separated by video========
    for video in os.listdir(folderpath):
        prop = open(folderpath+video,'r').readlines()
        prop = [prop[x].replace(",", " ") for x in xrange(len(prop))]
        prop = [[float(y) for y in prop[x].split()] for x in xrange(len(prop))]
        ground = open(ground_folder+video,'r').readlines()
        ground = [ground[x].replace(",", " ") for x in xrange(len(ground))]
        ground = [[float(y) for y in ground[x].split()] for x in xrange(len(ground))]
        #append video name
        for x in prop: x.append(video)
        for x in ground: x.append(video)
        v_props.append(prop)
        v_grounds.append(ground)
    #========== find all proposals separated by action categories========
    # proposal list separated by class
    a_props = [[] for x in xrange(number_label)]
    # ground-truth list separated by class
    a_grounds = [[] for x in xrange(number_label)]
    for x in xrange(len(v_props)):
        for y in xrange(len(v_props[x])):
            a_props[int(v_props[x][y][0])].append(v_props[x][y])
    for x in xrange(len(v_grounds)):
        for y in xrange(len(v_grounds[x])):
            a_grounds[int(v_grounds[x][y][0])].append(v_grounds[x][y])
    #========== find all proposals========
    all_props = sum(a_props,[])
    all_grounds = sum(a_grounds, [])
    #========== calculate protocols========
    print "================================================"
    print "evaluation for method: %s"%method
    print "---- for theta = %lf"%theta
    print "-------- F1 = ", f1(all_props, theta,all_grounds)
    print "-------- AP = ", ap(all_props, theta,all_grounds)
    print "-------- mAP_action = ", sum([ap(a_props[x+1], theta, a_grounds[x+1]) \
        for x in xrange(number_label-1)])/(number_label-1)
    print "-------- mAP_video = ", sum([ap(v_props[x], theta, v_grounds[x]) \
        for x in xrange(len(v_props))])/len(v_props)
    print "-------- 2DAP = ", sum([ap(all_props, (ratio+1)*0.05, all_grounds) \
        for ratio in xrange(20)])/20
    plot_fig(all_props, theta, all_grounds, method)
    print "===============================================" """

if __name__ == '__main__':
    # Note: process() above is kept commented out, so running this file directly
    # will fail; the functions here are meant to be imported (e.g. by json_eval.py).
    methods = os.listdir(source_folder)
    methods.sort()
    for method in methods:
        process(method)
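
As a quick, self-contained illustration of the input format these protocols expect, the toy snippet below (assuming the file above is saved as `evaluate.py`) builds a few proposals and ground-truth entries by hand in the (label, start_frame, end_frame, confidence, video_name) layout and scores them; the numbers are made up purely for illustration.

```python
# Illustrative only: hand-made proposals and ground truth in the
# (label, start_frame, end_frame, confidence, video_name) layout used above.
import evaluate  # assumes the module above is saved as evaluate.py

ground = [
    [3, 10, 50, 1.0, 'video_001'],
    [7, 60, 120, 1.0, 'video_001'],
]
proposals = [
    [3, 12, 48, 0.9, 'video_001'],   # good overlap with the first ground-truth step
    [7, 90, 140, 0.6, 'video_001'],  # partial overlap with the second step
    [5, 10, 30, 0.4, 'video_001'],   # wrong label, never matched
]

print('F1   =', evaluate.f1(proposals, 0.5, ground))   # f1 also prints precision and recall
ap_score, total_recall = evaluate.ap(proposals, 0.5, ground)
print('AP   =', ap_score, 'recall =', total_recall)
print('mIoU =', evaluate.miou(proposals, ground))
```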