
added initial reward collision model #2

Open

oravner wants to merge 20 commits into master from reward-collision-model

Conversation

oravner (Collaborator) commented May 2, 2019

No description provided.

    self.workspace_image_inputs = tf.placeholder(tf.float32, (None, 55, 111), name='workspace_image_inputs')
    self.images_3d = tf.expand_dims(self.workspace_image_inputs, axis=-1)
-   self.goal_pose_inputs = tf.placeholder(tf.float32, (None, 2), name='goal_pose_inputs')
+   self.goal_pose_inputs = None  # tf.placeholder(tf.float32, (None, 2), name='goal_pose_inputs')
tomjur (Owner) commented May 3, 2019

please remove unused inputs - you will also need to change make feed, remove _generate_goal_features and anything else that is no longer relevant

oravner (Collaborator, Author) replied

Sorry, I forgot to remove this file. Please notice that it breaks the whole project, since we haven't yet written the deterministic reward model which wraps the collision model.

tomjur (Owner) replied

No problem, the next step should fix this.
I just don't want to carry over dead code that we'll forget to delete.

    else:
        workspace_id, start_joints, goal_joints, action, next_joints, reward, terminated, status = batch[i]
-       goal_pose = openrave_manager.get_target_pose(goal_joints)
+       goal_pose = None  # openrave_manager.get_target_pose(goal_joints)
tomjur (Owner) commented May 3, 2019

remove goal_pose entirely + remove the newly unused openrave_manager -> see if you can also remove the dependency on OpenRaveManager completely for the training process

oravner (Collaborator, Author) commented May 3, 2019

Sorry, I forgot to remove this file. Please notice that it breaks the whole project, since we haven't yet written the deterministic reward model which wraps the collision model.

@@ -0,0 +1,171 @@
from reward_collision_network import CollisionNetwork
tomjur (Owner) commented

General comments for the file:

  1. Why do we need a separate TensorBoard class, with the logic not inside CollisionModel? It would be less messy to have it inside CollisionModel - or do we need it elsewhere as well?
  2. This file is a trainer class, right? If so, please rename CollisionModel to CollisionTrainer.

oravner (Collaborator, Author) replied

  1. I had many dilemmas about how to use the TensorBoard reporting, since its API is not very convenient to design around. Good idea, it will be an internal class inside CollisionModel. If we end up using it in other models as well (such as the reward model itself) I will move it to some utility file.
  2. I thought to instead separate the two models: 1. a model which only holds the network logic and architecture, and 2. a model which wraps the network logic for training and prediction. What do you think is better, and why?

tomjur (Owner) replied

I think you have it right - two classes work best. What I meant in my comment was just to rename to "Trainer", since the role of the non-network class is to train the network class.
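A minimal sketch of the split being agreed on here, assuming TF1; the layer shapes and helper names are illustrative, not the repo's actual code:

    import tensorflow as tf

    class CollisionNetwork:
        # graph only: placeholders, layers, and the softmax logits
        def __init__(self):
            self.workspace_image_inputs = tf.placeholder(
                tf.float32, (None, 55, 111), name='workspace_image_inputs')
            images_3d = tf.expand_dims(self.workspace_image_inputs, axis=-1)
            conv = tf.layers.conv2d(images_3d, 8, 3, activation=tf.nn.relu)
            self.status_softmax_logits = tf.layers.dense(tf.layers.flatten(conv), 2)

    class CollisionTrainer:
        # wraps the network with loss, optimizer, and TensorBoard reporting
        class _TensorBoard:
            # internal reporting helper, as discussed above
            def __init__(self, log_dir, summaries):
                self.writer = tf.summary.FileWriter(log_dir)
                self.summaries = summaries

        def __init__(self, log_dir):
            self.network = CollisionNetwork()
            self.status_input = tf.placeholder(tf.int32, [None])
            loss = tf.losses.sparse_softmax_cross_entropy(
                labels=self.status_input, logits=self.network.status_softmax_logits)
            self.optimize_op = tf.train.AdamOptimizer().minimize(loss)
            self.board = self._TensorBoard(
                log_dir, tf.summary.merge([tf.summary.scalar('loss', loss)]))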

self.status_input = tf.placeholder(tf.int32, [None, ])
self.prediction = tf.argmax(tf.nn.softmax(self.net_output), axis=1)

self.global_step = 0
tomjur (Owner) commented

I think that if you don't use a global_step var, then tf keeps a global step var that gets updated on every optimize call.
Summaries are automatically updated accordingly.

oravner (Collaborator, Author) replied

I don't understand. You recommend removing this var and leaving only the tf Variable? This is the same logic as in the reward model code.

tomjur (Owner) replied

Maybe the reward model was also incorrect (:

I think that if you don't specify a global step (ever) then tf holds an internal counter that increases every time you call an optimize function.

Maybe we can remove it and see if everything just works as expected?

oravner (Collaborator, Author) replied

According to the tf docs, I need to create a tf.Variable for global_step and pass it to minimize / apply_gradients for it to be updated. Then I need to create my own Python variable for the global step to pass to TensorBoard and the tf.train.Saver.
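That is indeed the TF1 pattern, except that the step value can be read back from the same tf.Variable, so no separate Python counter is strictly needed. A sketch, with a made-up loss and paths:

    import numpy as np
    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 4])
    loss = tf.reduce_mean(tf.square(tf.layers.dense(x, 1)))

    # one tf.Variable shared by the optimizer, the summary writer, and the saver
    global_step = tf.train.get_or_create_global_step()
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss, global_step=global_step)
    summary_op = tf.summary.scalar('loss', loss)
    writer = tf.summary.FileWriter('/tmp/collision_logs')
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        feed = {x: np.zeros((2, 4))}
        _, summary = sess.run([train_op, summary_op], feed_dict=feed)
        step = sess.run(global_step)                    # incremented by the optimize call
        writer.add_summary(summary, global_step=step)   # no separate Python counter
        saver.save(sess, '/tmp/collision_model', global_step=global_step)  # accepts the tf.Variable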

[self.test_board.summaries] + self.test_measures,
test_feed)[0]
self.test_board.writer.add_summary(test_summary, self.global_step)
self.test_board.writer.flush()
tomjur (Owner) commented

For both methods above, see the comments about:

  - global step
  - data_batch -> explicit parameters
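For the second point, the suggestion is presumably to unpack the batch once and make the method signature self-documenting; a hypothetical before/after (method and field names assumed, not the repo's actual API):

    # before: an opaque packed tuple
    def test_batch(self, data_batch, session):
        joints_batch, image_batch, status_batch = data_batch
        ...

    # after: explicit parameters, so callers and readers see what the method needs
    def test_batch(self, joints_batch, image_batch, status_batch, session):
        ...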


def train(self, train_data, test_data, image_cache, session):
    session.run(tf.global_variables_initializer())
    session.run(tf.local_variables_initializer())
tomjur (Owner) commented

tf.initialize_all_variables()

oravner (Collaborator, Author) commented May 3, 2019

I got a warning that tf.initialize_all_variables() is deprecated and that tf.global_variables_initializer() should be used instead ):

tomjur (Owner) replied

So why the second line? Also a warning?
This comment shows I was working with a deprecated tf version (:

oravner (Collaborator, Author) replied

For some reason, some variables are classified as global and some as local.
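That is a real TF1 distinction: tf.Variable / tf.get_variable put variables in the GLOBAL_VARIABLES collection by default, while some ops, notably the counters behind tf.metrics.*, create LOCAL_VARIABLES, which is why both initializers are needed. A small illustration (names invented):

    import tensorflow as tf

    labels = tf.placeholder(tf.int32, [None])
    predictions = tf.placeholder(tf.int32, [None])

    weights = tf.get_variable('weights', shape=[2])                  # GLOBAL_VARIABLES
    accuracy, update_op = tf.metrics.accuracy(labels, predictions)   # counters are LOCAL_VARIABLES

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())  # initializes weights
        sess.run(tf.local_variables_initializer())   # initializes the metric counters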


for train_batch in train_data:
    train_batch, train_status_batch = get_batch_and_labels(train_batch, image_cache)
    # TODO: assert if train_status contains goal status
tomjur (Owner) commented

what does this comment mean?

oravner (Collaborator, Author) replied

I assumed that the data doesn't contain any goal-status samples, so it would be best practice to assert here that none exist.

tomjur (Owner) replied

OK, so use an assert, so it is not left as a comment.

oravner (Collaborator, Author) replied

V
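For illustration, the assert could look like this in the loop quoted above (GOAL_STATUS is a hypothetical constant for the goal label, not necessarily the repo's):

    GOAL_STATUS = 3  # hypothetical value of the 'reached goal' label

    for train_batch in train_data:
        train_batch, train_status_batch = get_batch_and_labels(train_batch, image_cache)
        # fail fast instead of leaving the check as a comment
        assert all(status != GOAL_STATUS for status in train_status_batch), \
            'training batches should not contain goal-status samples'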

tomjur (Owner) left a comment

Great job - definitely more readable than my previous main function (:
See the comments and questions inline!

@@ -1,14 +1,14 @@
general:
tomjur (Owner) commented

Are all the changes below justified? Especially the oversampling ratios...

self.net_output = self.network.status_softmax_logits
self.status_input = tf.placeholder(tf.int32, [None, ])
# TODO: fix collision_prob
self.collision_prob = tf.expand_dims(tf.nn.softmax(self.net_output)[:, 1], -1)
tomjur (Owner) commented

What's this TODO about?
