
Map Reduce algorithm execution

DavidSL448 edited this page Aug 15, 2020 · 1 revision

Xrepo can execute Map Reduce algorithms written in Python. The user writes the algorithm in the create or update view. The following conditions must hold for the algorithm to run properly:

  • The script written in the user interface must start with a declaration of the interpreter the user wants to use, like so:
#!/usr/bin/env python3.6
  • The script must use standard input and standard output (STDIO) to read and write data, in both the mapper and the reducer.
  • The system does not provide any validation; it is up to the users to test their algorithms.
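
As an illustration of the STDIO requirement, here is a minimal word-count sketch. It is not taken from Xrepo: the function names `mapper` and `reducer`, the tab-separated `word<TAB>count` record format, and the sample input are assumptions made for this example. In a real pair of scripts, each function would be fed `sys.stdin` and its records printed to standard output; here the two steps are chained in memory, with a `sorted()` call standing in for the sort-by-key that Hadoop performs between the map and reduce phases.

```python
#!/usr/bin/env python3
"""Word-count sketch in the Hadoop Streaming style (illustrative only)."""
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts per word; expects records already sorted by word."""
    records = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(records, key=lambda rec: rec[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Local simulation of the pipeline. In a standalone mapper.py the loop
    # would instead be: for record in mapper(sys.stdin): print(record)
    sample = ["foo foo quux labs foo bar quux"]
    for record in reducer(sorted(mapper(sample))):
        print(record)
```

Keeping the logic in generator functions that accept any iterable of lines makes the same code usable with `sys.stdin` on the server and with plain lists in a quick local check.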

A useful way to test the mapper and reducer is to pipe the output of one script into the next, e.g.:

echo "foo foo quux labs foo bar quux" | ./mapper.py | ./reducer.py
  • Python files on the server must have execute permissions. The script that creates the Python files is already configured to set these permissions; nonetheless, this is something to check if execution is failing.
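
If execution fails and permissions are suspected, the execute bit can be checked by hand. The following is a minimal sketch using only the Python standard library; the helper name `ensure_executable` and the path passed to it are hypothetical and are not part of Xrepo.

```python
import os
import stat


def ensure_executable(path):
    """Add the execute bits to `path` if its owner cannot execute it.

    Returns True if the mode was changed, False if the file was
    already executable by its owner.
    """
    mode = os.stat(path).st_mode
    if mode & stat.S_IXUSR:
        return False
    # Equivalent to `chmod a+x` on the file.
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return True
```

For example, `ensure_executable("mapper.py")` would report whether a local copy of the mapper needed fixing before it is piped as shown above.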

Xrepo is configured to run the algorithm on every file associated with the sampling associated with the laboratory, and to generate a result file for each input. For better performance, consider using a few large files instead of many small ones.

Task execution.

The execution of the algorithm task is similar to the search functionality, but it adds a preliminary step: when the algorithm is created or updated, the .py executable files are created and placed on the HDFS server, as shown in the following figure:

After the files are placed on the Hadoop server, the user can execute the Map Reduce task. Inside Xrepo, this is implemented in /src/main/java/co/edu/uniandes/xrepo/service/reports/HdfsRunMRAlgorithmTaskProcessorService.java as a processor that is called by the batch task monitor. The local sh script is located at /src/main/resources/mapreduce-files/RunMRAlgoritm.sh.

HDFS server configuration.

We use /home/hadoop/xrepo on the server to store all Map Reduce algorithms related to Xrepo. In the /home/hadoop/xrepo/algorithms folder, we create one folder per algorithm, named after the algorithm id from MongoDB, and place the related executable files inside it. In this folder you will also find RunMRAlgorithm.sh, a script that maps the input from Xrepo to a streaming command that ultimately executes Map Reduce.
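
To give an idea of what such a streaming command looks like, the sketch below builds a generic Hadoop Streaming invocation as an argument list. It is not a transcription of RunMRAlgorithm.sh: the function name, the `mapper.py` and `reducer.py` file names, the jar location, and the example paths are all assumptions. Only the option names (`-files`, `-input`, `-output`, `-mapper`, `-reducer`) come from Hadoop Streaming itself.

```python
import shlex


def build_streaming_command(algorithm_dir, input_path, output_path,
                            streaming_jar="hadoop-streaming.jar"):
    """Build a Hadoop Streaming invocation as an argument list.

    Everything here is hypothetical: the script names, the jar path and
    the argument order are not taken from RunMRAlgorithm.sh.
    """
    mapper = f"{algorithm_dir}/mapper.py"
    reducer = f"{algorithm_dir}/reducer.py"
    return [
        "hadoop", "jar", streaming_jar,
        # -files ships the scripts to the worker nodes. It is a generic
        # option, so it goes before the streaming-specific options.
        "-files", f"{mapper},{reducer}",
        "-input", input_path,
        "-output", output_path,
        "-mapper", "mapper.py",
        "-reducer", "reducer.py",
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        "/home/hadoop/xrepo/algorithms/ALGORITHM_ID",  # placeholder id
        "/hypothetical/input/file",
        "/hypothetical/output/dir",
    )
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems if a path contains spaces; `shlex.join` is used only to print a copy-pasteable form.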
