some issue when trying to use the new DAG functionality in the 0.21.29 release #25

Open
asaelm opened this issue Jan 11, 2011 · 3 comments

@asaelm

asaelm commented Jan 11, 2011

See the traceback from the logs below.

 Traceback (most recent call last):
   File "/usr/lib/python2.6/runpy.py", line 122, in _run_module_as_main
     "__main__", fname, loader, pkg_name)
   File "/usr/lib/python2.6/runpy.py", line 34, in _run_code
     exec code in run_globals
   File "/data/0/hdfs/local/taskTracker/jobcache/job_201011021825_0760/attempt_201011021825_0760_r_000000_1/work/rec2.py", line 156, in <module>
     main(runner)
   File "dumbo/core.py", line 211, in main
     job.run()
   File "dumbo/core.py", line 61, in run
     run(*args, **kwargs)
   File "dumbo/core.py", line 366, in run
     for output in dumpcode(inputs):
 UnboundLocalError: local variable 'inputs' referenced before assignment
@klbostee
Owner

Unfortunately, this error alone doesn't give me much to go on. Could you post the code for the program you were trying to run?

@asaelm
Author

asaelm commented Jan 19, 2011

Here's another program that produces a similar error:

from dumbo import *
import numpy
import json
import re
import itertools
import math

ALL_DIGITS = re.compile('^[0-9]+$')

def mapper1(key, value):
    yield value.split('\t')

def maxreducer(key, values):
    yield key, max(values)

def reducer1(key, values):
    yield key, list(values)

def videos_mapper1(key, value_list):
    w = 1./len(value_list)
    for v in value_list:
        yield v, w

def sum_sqrt_reducer(key, values):
    yield key, math.sqrt(sum(values))

def pairs_mapper1(key, value_list):
    w = 1./len(value_list)
    value_list = sorted(value_list)
    for v1, v2 in itertools.combinations(value_list, 2):
        yield (v1, v2), w

def pair_sumreducer(key, values):
    yield key, sum(values)

def pair_to_secondary_first_mapper(pair, value):
    yield pair[0], (2, pair[1], value)

def video_to_primary_mapper(video, value):
    yield video, (1, value)

def normalize_reducer(video, values):
    dummy, w = values.next()
    for dummy, video2, e in values:
        yield video2, (2, video, e/w)

def runner(job):
    o1 = job.additer(mapper1, reducer1)
    o2 = job.additer(videos_mapper1, sum_sqrt_reducer, input=o1)
    o3 = job.additer(pairs_mapper1, pair_sumreducer, input=o1)
    o4 = job.additer(pair_to_secondary_first_mapper, input=o3)
    o5 = job.additer(video_to_primary_mapper, input=o2)
    o6 = job.additer(identitymapper, normalize_reducer, input=[o4, o5])
    o7 = job.additer(identitymapper, normalize_reducer, input=[o6, o5])

if __name__ == "__main__":
    main(runner)

@klbostee
Owner

Presumably the error happens for the iteration that executes "video_to_primary_mapper"?

It looks like the "newopts['numreducetasks'] = '0'" assignment on line 400 of "dumbo/core.py" is somehow not taking effect, which would indeed send you down an old code path that leads to the reported error.
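
To illustrate the failure mode described here, the following is a minimal sketch in plain Python. It is not the actual code in "dumbo/core.py": the function run_iteration, the helper describe, and the 'legacy_input' option are hypothetical names made up for this example. It only shows how a local variable that is bound in some branches raises UnboundLocalError when the option that should select a branch never arrives.

```python
def run_iteration(opts):
    """Hypothetical stand-in for a runner that picks a code path from options."""
    if opts.get('numreducetasks') == '0':
        # Map-only path: the input stream is bound here.
        inputs = ['record-1', 'record-2']
    elif opts.get('legacy_input') is not None:
        # Older path: binds `inputs` only if a (hypothetical) legacy option is set.
        inputs = opts['legacy_input']
    # If neither branch ran, `inputs` was never assigned in this scope,
    # so the loop below raises UnboundLocalError.
    outputs = []
    for record in inputs:
        outputs.append(record.upper())
    return outputs


def describe(opts):
    """Return the result, or the name of the exception that was raised."""
    try:
        return run_iteration(opts)
    except UnboundLocalError as exc:
        return type(exc).__name__


print(describe({'numreducetasks': '0'}))  # option arrived: map-only path runs
print(describe({}))                       # option lost: `inputs` is never bound
```

If the real cause is similar, printing the options seen by each iteration just before the failing loop should show whether 'numreducetasks' is missing for the map-only step.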

Getting to the bottom of this will probably require reproducing the problem and doing some further debugging, but unfortunately I don't have the time for that right now. I don't think it would be that complicated to figure out though, so it doesn't have to be done by someone who's (already) very familiar with the Dumbo internals...
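
As a possible starting point for narrowing this down, the sketch below restates the iterations from the runner posted above as plain data and lists the ones that were added without a reducer, since those are the iterations for which the map-only setting would matter. The table ITERATIONS and the helper map_only are illustrative names and not part of Dumbo.

```python
# Each entry: name -> (mapper, reducer or None, inputs), copied from runner().
ITERATIONS = {
    'o1': ('mapper1', 'reducer1', []),
    'o2': ('videos_mapper1', 'sum_sqrt_reducer', ['o1']),
    'o3': ('pairs_mapper1', 'pair_sumreducer', ['o1']),
    'o4': ('pair_to_secondary_first_mapper', None, ['o3']),
    'o5': ('video_to_primary_mapper', None, ['o2']),
    'o6': ('identitymapper', 'normalize_reducer', ['o4', 'o5']),
    'o7': ('identitymapper', 'normalize_reducer', ['o6', 'o5']),
}


def map_only(iterations):
    """Return the sorted names of iterations that were added without a reducer."""
    return sorted(name for name, spec in iterations.items() if spec[1] is None)


print(map_only(ITERATIONS))  # ['o4', 'o5']
```

Since 'o5' depends only on 'o2', which depends only on 'o1', a reduced script containing just those three iterations may be enough to check whether the map-only step alone triggers the error.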
