
SE367 / hw4 / Group A

Ganesh Pitchiah

Motor Expertise

To look at the possible interplay between control and automaticity in motor behaviour we have looked at two scenarios:
(a) an instruction set to instruct a "robot" to lift a pen and
(b) the use of reinforcement learning to optimize a grasping strategy.

Ans 1:
We collectively agreed that instructions involving 'rotation' and 'relative motion w.r.t. the affectors' of the robot are difficult to execute:
Step 3: This is the first rotation task, and it forms a bottleneck in the serial execution of the instructions.
Step 7: 'Hooking' the I-affector is a delicate task; the demonstrator himself fails at it the first time in the video.
Step 13: The base of the wrist is still in contact with the paper, so if, after writing a few characters, the robot tries to slide its hand along as humans do, it will run into complications (e.g., the paper slides out from under it).

Ans 2:
The content in quotes is borrowed from Kalakrishnan et al., 2011.
Explicit Instructions: "Positions and orientations are initialized from a kinesthetic demonstration of the task. The required forces and torques cannot be observed during this process, and are initialized to zero."
Implicit Instructions: "Using a control cost in conjunction with the PI2 algorithm allows for generation of smooth trajectories (initialized by trial and error) for exploration that do not deviate from start or goal points and ensures smoothness of the trajectory."
Chunking: "The algorithm samples trajectories around the current policy, measures their cost by executing them on the robot, and subsequently updates the policy as a weighted average of the samples."
Our Conclusion: The samples encountered during learning (forces and torques of the three fingers), each paired with a measure of its usefulness (cost), make up the final policy; these weighted samples can be regarded as the chunks in this structure.
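The sample-evaluate-reweight loop quoted under Chunking can be sketched as follows. This is a simplified, episode-level sketch under stated assumptions, not the PI2 implementation of Kalakrishnan et al.: `toy_cost` is a hypothetical quadratic cost standing in for executing a grasp on the robot, the three-element parameter vector loosely stands for per-finger forces, and the exponential weighting with temperature `lam` follows the general PI2 style while omitting the per-time-step costs and the control cost of the real algorithm.

```python
import math
import random

def toy_cost(theta):
    # HYPOTHETICAL stand-in for executing a trajectory on the robot and measuring its cost.
    target = [1.0, -0.5, 0.25]  # assumed "ideal" force profile for the three fingers
    return sum((t - g) ** 2 for t, g in zip(theta, target))

def pi2_style_update(theta, rng, n_samples=20, noise=0.3, lam=0.1):
    # 1. Sample parameter vectors around the current policy.
    samples = [[t + rng.gauss(0.0, noise) for t in theta] for _ in range(n_samples)]
    # 2. Measure the cost of each sample (here via the toy cost, not a robot).
    costs = [toy_cost(s) for s in samples]
    # 3. Map normalized costs to weights: lower cost -> larger weight.
    c_min, c_max = min(costs), max(costs)
    span = (c_max - c_min) or 1.0  # guard against all-equal costs
    weights = [math.exp(-(c - c_min) / (lam * span)) for c in costs]
    total = sum(weights)
    probs = [w / total for w in weights]
    # 4. New policy = probability-weighted average of the samples.
    return [sum(p * s[i] for p, s in zip(probs, samples)) for i in range(len(theta))]

def learn(n_iters=50, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]  # assumed initial policy (forces initialized to zero)
    for _ in range(n_iters):
        theta = pi2_style_update(theta, rng)
    return theta
```

Calling `learn()` returns a parameter vector close to the assumed target. In the terms of our conclusion, the weighted samples inside each update are the pieces the new policy is assembled from.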

Ans 3:
Examples of human learning can be linked to both classical and instrumental conditioning. Taste aversion, for instance, is a classically conditioned avoidance of a food after eating it has been followed by sickness. "Reward"-based learning, however, is closer to operant (instrumental) conditioning.
To better illustrate the point, we discussed the example of a basketball player. The motivation/reward to learn basketball might be money (extrinsic), satisfaction (intrinsic), or both. While the hope of reward ensures activity, the reward itself reinforces certain kinds of behaviour over others.
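As a rough illustration of how reward reinforces some behaviours over others in the operant view, here is a minimal epsilon-greedy action-value learner. The behaviour names, reward values and learning parameters are hypothetical; this is a standard reinforcement-learning toy, not a model from the course or from Kalakrishnan et al.

```python
import random

def train(rewards, n_trials=500, alpha=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy action-value learning: reward strengthens the chosen behaviour."""
    rng = random.Random(seed)
    behaviours = list(rewards)
    values = {b: 0.0 for b in behaviours}  # learned estimate of each behaviour's reward
    counts = {b: 0 for b in behaviours}    # how often each behaviour was performed
    for _ in range(n_trials):
        if rng.random() < epsilon:
            choice = rng.choice(behaviours)            # explore: try something at random
        else:
            choice = max(behaviours, key=values.get)   # exploit: repeat the best-valued behaviour
        r = rewards[choice]                            # deterministic reward (assumption)
        values[choice] += alpha * (r - values[choice]) # move estimate toward received reward
        counts[choice] += 1
    return values, counts

# HYPOTHETICAL behaviours for the basketball example.
values, counts = train({"skip_practice": 0.0, "practice_drills": 1.0})
```

The unrewarded behaviour is listed first so that it wins the initial tie; only after exploration stumbles on the rewarded behaviour does reinforcement shift the choices toward it.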

However, we realised that when faced with a situation, the chess player / fire-fighter has to choose a strategy, not just think about the reward.
Strategy / Behaviour / Technique: While the goal itself is explicit, the strategies learnt to achieve it may be explicit (aware) or implicit (not aware). Some explicit strategies are:
(a) Basketball: Increase one's individual number of goals while ensuring the team's victory.
(b) Fire-fighter: Identify the weak spots while keeping the fire under control.

References
1. Kalakrishnan, Mrinal, et al. "Learning force control policies for compliant manipulation." Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on. IEEE, 2011.