Q1.  Which two instructions in the "programming language" of the 2011 HW  would be the most difficult for robots to follow? 
  
  We found the following two steps to be the most difficult:
Holding the pen is difficult because there is no sensory feedback, so the robot needs a lot of training to 'learn' the optimum amount of pressure to apply to keep the pen from slipping.
The second step involves loosening the grip just enough to allow the pencil to rotate without letting it slip. This delicate pressure balance is very difficult to achieve without any sensory feedback.
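The trial-and-error learning described above can be sketched as a search over a single grip-force parameter, where the robot only receives a score at the end of each trial. This is an illustration under stated assumptions: the `trial_cost` function, its 2.0 to 3.0 N target band, and all constants are hypothetical stand-ins for a real robot trial, and the update is a generic reward-weighted average rather than the specific method from the homework or from Kalakrishnan.

```python
import math
import random


def trial_cost(force):
    """Hypothetical end-of-trial cost; the robot gets no feedback during the trial.

    Assumed (not from the source): the pencil slips below 2.0 N and cannot
    rotate above 3.0 N. Cost grows with the distance from that band.
    """
    if force < 2.0:
        return 2.0 - force  # pencil slipped
    if force > 3.0:
        return force - 3.0  # grip too tight to rotate
    return 0.0              # pencil rotates without slipping


def learn_grip_force(start=6.0, noise=0.5, rollouts=10, iterations=50, seed=0):
    """Reward-weighted averaging over noisy trials of the grip force."""
    rng = random.Random(seed)
    force = start
    for _ in range(iterations):
        # Try several noisy variations of the current grip force.
        samples = [force + rng.gauss(0.0, noise) for _ in range(rollouts)]
        costs = [trial_cost(s) for s in samples]
        # Lower cost gives an exponentially larger weight.
        weights = [math.exp(-5.0 * c) for c in costs]
        force = sum(w * s for w, s in zip(weights, samples)) / sum(weights)
    return force


learned = learn_grip_force()
print(f"learned grip force: {learned:.2f} N")
```

Even this one-parameter toy needs hundreds of trials (10 rollouts over 50 iterations), which hints at why the real task, with many more parameters and no sensing, takes so much training.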
Q2. The robot following the learning paradigm as in Kalakrishnan is  clearly gaining some expertise. Which aspects of the execution may be called  implicit or automatic, and which aspects may be more explicit? What could be  the "chunks" in this structure? 
  
  Implicit learning involves interacting with the environment, so we think the learning process itself is an implicit aspect of the execution in this case.
  One explicit aspect could be the initial positioning of the hand, which has to be specified at the beginning of each trial.
  As discussed earlier, the learning here involves the robot exploring various instances in the problem space and identifying patterns that are favorable in terms of the performance metrics. The favorable regions can be seen as low-dimensional embeddings in the problem space, and they represent implicit constraints among the variables. Chunks, in this case, can be the dimensions of the low-dimensional embedding obtained. Each dimension represents an inter-relation between the variables (end-effector position, orientation, force and torque) that must hold for favorable execution.
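The idea of a chunk as one dimension of a low-dimensional embedding can be sketched with a small principal-component computation. Everything here is assumed for illustration: the three variables, their linear coupling through a hidden factor `t`, and the noise level are hypothetical, not data from the robot experiments. The sketch only shows how a single direction can capture the inter-relation among several variables across favorable trials.

```python
import random


def covariance(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def dominant_direction(cov, steps=200):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(steps):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v


# Hypothetical favorable trials: all three variables follow one hidden
# factor t (an assumed coupling), plus a small amount of noise.
rng = random.Random(0)
trials = []
for _ in range(200):
    t = rng.uniform(-1.0, 1.0)
    trials.append([t + rng.gauss(0, 0.05),          # end-effector position
                   2.0 * t + rng.gauss(0, 0.05),    # grip force
                   -1.0 * t + rng.gauss(0, 0.05)])  # wrist torque

cov = covariance(trials)
eigenvalue, direction = dominant_direction(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(3))
print("dominant direction:", [round(x, 2) for x in direction])
print("fraction of variance explained:", round(explained, 3))
```

In this toy setting, one direction accounts for nearly all of the variation, so the three variables behave as a single unit. That single coupled dimension is what the answer above proposes to call a chunk.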
  Q3. Comment on whether human learning may also be following  similar "reward" based processes? Consider the learning process for  the fire-fighting expert who knows how to fight complex fires.
  
  We could not find any counterexample to the claim that human learning is a reward-based process. We discussed that every task we do has some reward associated with it, which can be positive or negative, intrinsic or external. Satisfaction and discontent can be seen as rewards for many day-to-day tasks.
  In the firefighter example, we saw that the process cannot be carried out in an arbitrary manner, and that experts have learned through practice the best ways of carrying it out. The reward function in this case could combine minimizing the damage with extinguishing the fire as quickly and effectively as possible.
  References