Tutorial: Multiple Robot Control using the Mirroring and Forwarding Schemes

Video: https://github.com/fabawi/wrapyfi/assets/4982924/a7ca712a-ffe8-40cb-9e78-b37d57dd27a4

This tutorial demonstrates how to use the Wrapyfi framework to run a facial expression recognition (FER) model on multiple robots. The model recognizes 8 facial expressions, which are propagated to the Pepper and iCub robots. The expression categories are displayed either as colors on the Pepper robot’s eye and shoulder LEDs, or as robotic facial expressions on the iCub robot, rendered by changing its eyebrow and mouth LED patterns. The image input received by the model is acquired from the Pepper and iCub robots’ cameras by forwarding the images to the FER model (check out the forwarding scheme for more details on forwarding). We also provide a simple application manager that handles the communication between the model and the robots. The application manager is responsible for forwarding images to the FER model and transmitting the recognized facial expressions to the robots. The application manager itself is composed of mirrored instances (check out the mirroring scheme) running on one or several machines, depending on the configuration.

Methodology


Fig 1: Facial expression recognition for updating the affective cues on the Pepper and iCub robots.

Siqueira et al. (2020) presented a neural model for facial expression recognition, relying on an ensemble of convolutional branches with shared parameters. The model provides inference in real-time settings, owing to its relatively small number of parameters across ensembles, unimodal visual input, and non-sequential structure. For the last timestep \(n\), a majority vote is cast on the output categories resulting from each ensemble branch \(\text{e}_i\):

\[\begin{split}\begin{align} \textbf{c}(f)_n &= \sum_{i=1}^{E} [\text{e}_i = f] \\ \text{c}_n &= {arg\,max}_f \textbf{c}(f)_n \end{align}\end{split}\]

where \(E=9\) denotes the number of ensemble branches and \(f\in[1,8]\) indexes the emotion categories. The resulting \(\textbf{c}(f)_n\) holds the counts of ensemble votes for each emotion category \(f\) at timestep \(n\).

Given the model’s sole reliance on static visual input, falsely recognized facial expressions lead to abrupt changes in the inferences. To mitigate sudden changes in facial expressions, we apply a mode smoothing filter to the last \(N\) discrete predictions (over the eight emotion categories), where \(N=6\) corresponds to the number of visual frames acquired by the model per second:

\[\begin{split}\begin{align} \textbf{k}(f)_n &= \sum_{i=n-N+1}^{n} [\text{c}_i = f] \\ \text{k}_n &= {arg\,max}_f \textbf{k}(f)_n \end{align}\end{split}\]

resulting in the emotion category \(\text{k}_n\) being transmitted from the inference script running the facial expression recognition model to the application manager executed on PC:A. The application manager handles exchanges to and from the model and robot interfaces.
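
For illustration, the two voting steps above can be sketched in a few lines of Python (this is not the ESR9 implementation; the function and variable names are hypothetical):

from collections import Counter, deque

N = 6  # number of frames acquired by the model per second

def majority_vote(branch_outputs):
    # c_n: most frequent category among the E ensemble branch outputs
    return Counter(branch_outputs).most_common(1)[0][0]

recent_votes = deque(maxlen=N)  # last N discrete predictions

def smoothed_category(branch_outputs):
    # k_n: mode over the last N majority votes
    recent_votes.append(majority_vote(branch_outputs))
    return Counter(recent_votes).most_common(1)[0][0]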

We execute the application on three to six machines, depending on the configuration:

  • PC:A (mware: YARP): Running the application manager and forwarding messages to and from the FER model.

  • S:1 (mware: YARP): Running the FER model and forwarding messages to and from the application manager.

  • PC:104 (mware: YARP): Running on the physical iCub robot (only needed when running the physical iCub robot).

  • PC:ICUB (mware: YARP): Running the iCub robot control workflow (only needed when running the physical or simulated iCub robot).

  • PC:PEPPER (mware: YARP, ROS): Running the Pepper robot control workflow (only needed when running the physical Pepper robot).

  • PC:WEBCAM (mware: YARP): Running the webcam interface for acquiring images from the webcam (only needed when running the simulated robot).

Note: For this tutorial, PC:ICUB, PC:WEBCAM, and PC:PEPPER scripts are running on PC:A to simplify the process. However, they could also be executed on dedicated machines as long as the network configurations (roscore and yarpserver IP addresses) are set correctly.

At least one of the two robot PCs (PC:ICUB or PC:PEPPER) must be running for the application to work. The webcam interface (PC:WEBCAM) is optional and only needed if we want to acquire images from a webcam rather than a robot’s camera. Note that all scripts can be executed on a single machine, but we distribute them across multiple machines to demonstrate the flexibility of the Wrapyfi framework.

Images arrive directly from each robot’s camera:

  • The iCub robot image arrives from the left eye camera at a resolution of \(320\times240\) pixels and is transmitted over YARP at 30 FPS.

  • The Pepper robot image arrives from the top camera at a resolution of \(640\times480\) pixels and is transmitted over ROS at 24 FPS. In either case, the image is forwarded directly to the facial expression model, and the predicted expression is returned to the corresponding robot’s LED interface (a simplified forwarding sketch follows below).
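
The forwarding itself can be approximated with a short Wrapyfi sketch: one registered method listens on the robot camera topic, while a second republishes the frames on the port the FER model subscribes to. This is only an illustrative sketch under assumed port names and image sizes, not the exact code used by the application manager:

import cv2
from wrapyfi.connect.wrapper import MiddlewareCommunicator

class CameraForwarder(MiddlewareCommunicator):
    @MiddlewareCommunicator.register("Image", "ros", "CameraForwarder",
                                     "/pepper/camera/front/camera/image_raw",
                                     width=640, height=480, rgb=True)
    def acquire_image(self):
        return None,  # body is ignored in "listen" mode; the received frame is returned instead

    @MiddlewareCommunicator.register("Image", "yarp", "CameraForwarder",
                                     "/control_interface/image_esr9",
                                     width=320, height=240, rgb=True)
    def publish_image(self, frame):
        return frame,

forwarder = CameraForwarder()
forwarder.activate_communication(forwarder.acquire_image, mode="listen")
forwarder.activate_communication(forwarder.publish_image, mode="publish")

while True:
    frame, = forwarder.acquire_image()
    if frame is not None:
        # downscale the 640x480 Pepper image to the 320x240 resolution expected by the FER model
        forwarder.publish_image(cv2.resize(frame, (320, 240)))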

Modifying the FER Model

To integrate Wrapyfi into the ESR9 facial expression recognition model, we first need to modify the model to accept and return data from and to the robot interfaces.

This is achieved using the Wrapyfi interfaces, which provide minimal examples of how to design the structure of templates and common interfaces used in large-scale and complex applications. Templates and interfaces limit the types of data that can be transmitted. We can, of course, decide to transmit custom objects, something that Wrapyfi was designed to enable in the first place. However, when multiple applications need to communicate and understand the transmitted information, a common structure must be introduced to avoid creating specific interfaces for each new application.

Receiving Images from the Robot Interfaces

The model acquires images using OpenCV’s VideoCapture class. We modify the model to receive images from the robot interfaces by replacing VideoCapture with the Wrapyfi video interface, a class that is identical in usage to VideoCapture, except that it receives and returns images over any middleware supported by Wrapyfi.

The interface is used to receive images from the robot interfaces by passing the middleware name and topic name to the interface constructor:

from wrapyfi_interfaces.io.video.interface import VideoInterface

cap = VideoInterface("/icub/cam/left", mware="yarp")

In the example above, the interface receives images from the iCub robot’s left eye camera over YARP. Note that here we replace the VideoCapture source with the topic name to which the robot’s framework publishes the images. Similarly, we can receive images from the Pepper robot’s top camera over ROS by passing the topic name to the interface constructor:

cap = VideoInterface("/pepper/camera/front/camera/image_raw", mware="ros")

Getting the return value from the interface is identical to the VideoCapture class:

ret, frame = cap.read()

with every call to cap.read() returning a boolean value ret indicating whether the frame was successfully read, and the image frame itself.
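
Putting these pieces together, a minimal capture loop might look as follows (a sketch that assumes the interface mirrors the cv2.VideoCapture API as described above; the display logic is purely illustrative):

import cv2
from wrapyfi_interfaces.io.video.interface import VideoInterface

# subscribe to the iCub left eye camera over YARP (drop-in replacement for cv2.VideoCapture)
cap = VideoInterface("/icub/cam/left", mware="yarp")

while True:
    ret, frame = cap.read()  # ret is False when no frame could be acquired
    if not ret:
        continue
    cv2.imshow("iCub left eye", frame)  # e.g., preview the frame before passing it to the FER model
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()  # assuming release() mirrors cv2.VideoCapture, like the rest of the API
cv2.destroyAllWindows()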

Sending the Recognized Expression to the Robot Interfaces

The Facial Expression Message Template, provided as part of the Wrapyfi interfaces collection, allows for the standardized transmission of affect-related information. This template is similar in operation to other interfaces: instead of wrapping methods with the Wrapyfi registry decorator, we simply call a method with arguments specifying what should be transmitted and where/how (the port/topic address, communication pattern, and middleware).

We first import the template and instantiate it:

from wrapyfi_interfaces.templates.facial_expressions import FacialExpressionsInterface

_FACIAL_EXPRESSION_BROADCASTER = FacialExpressionsInterface(facial_expressions_port_out=facial_expressions_port,
    mware_out=facial_expressions_mware, facial_expressions_port_in="")

Setting the facial_expressions_port_out and mware_out arguments tells the template to activate its communication in publish mode, meaning that it transmits emotion categories rather than receiving them. We leave the receiving port (facial_expressions_port_in) empty, since receiving affective signals is not needed here.

Next, we must send the prediction signal (the emotion category, scores, continuous emotion values, i.e., arousal and valence, the emotion index, etc.). This is done by calling transmit_emotion() every time a prediction is made:

prediction, = _FACIAL_EXPRESSION_BROADCASTER.transmit_emotion(*(_predict(input_face, device)),
    facial_expressions_port=facial_expressions_port, _mware=facial_expressions_mware)

The prediction dictionary is transmitted over the middleware and returned as prediction from the method call. Now, any template instantiated in another instance of the same application, or in any other application subscribed to the specified port/topic on the same middleware within the network, should be able to receive the prediction dictionary. This allows the robot or any application manager to receive the values predicted by the model at any point in time, as long as the ESR9 model is running.
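
For a rough idea of the receiving side, the following sketch subscribes to the same port with a generic Wrapyfi listener rather than the template’s own receiving mode; the class and method names are hypothetical:

from wrapyfi.connect.wrapper import MiddlewareCommunicator

class ExpressionReceiver(MiddlewareCommunicator):
    @MiddlewareCommunicator.register("NativeObject", "yarp", "ExpressionReceiver",
                                     "/control_interface/facial_expressions_esr9",
                                     should_wait=False)
    def receive_expression(self):
        return None,  # body is ignored in "listen" mode; the received object is returned instead

receiver = ExpressionReceiver()
receiver.activate_communication(receiver.receive_expression, mode="listen")

while True:
    prediction, = receiver.receive_expression()
    if prediction is not None:
        print(prediction)  # the prediction dictionary published by transmit_emotion()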

Prerequisites

Note: The following installation instructions are compatible with Ubuntu 18-22 and are not guaranteed to work on other distributions or operating systems. All installations must take place within a dedicated virtualenv, mamba/micromamba, or conda environment.

  • Install Wrapyfi with all requirements (including NumPy, OpenCV, PyYAML) on all machines (excluding PC:104). Throughout this tutorial, we assume that all repositories are cloned into the $HOME/Code directory. Wrapyfi itself should also be cloned into the $HOME/Code directory in order to access the examples:

    cd $HOME/Code
    git clone https://github.com/fabawi/wrapyfi.git
    cd wrapyfi
    pip install .
    pip install "numpy>1.17.4,<1.26.0" "opencv-python>=4.2.0.34,<4.6.5.0" "pyyaml>=5.1.1"
    
  • Install SciPy for performing median smoothing (on PC:ICUB and PC:PEPPER):

    # could be installed in several ways, but we choose pip for simplicity
    pip install "scipy==1.9.0"
    
  • Install PyTorch for running the facial expression recognition model (on S:1):

    # could be installed in several ways, but we choose pip for simplicity
    pip install "torch==1.12.1" "torchvision==0.13.1"
    
  • Install the ESR9 FER model with Wrapyfi requirements (on S:1):

    cd $HOME/Code
    git clone https://github.com/modular-ml/wrapyfi-examples_ESR9.git
    cd wrapyfi-examples_ESR9
    pip install -r requirements.txt
    

The Wrapyfi interfaces repository must be cloned on all machines (excluding PC:104), since it provides dedicated interfaces for communicating with the robots, acquiring and publishing webcam images, and message structures for standardizing exchanges between applications:

cd $HOME/Code
git clone https://github.com/modular-ml/wrapyfi-interfaces.git

and add it to the PYTHONPATH environment variable:

export PYTHONPATH=$PYTHONPATH:$HOME/Code/wrapyfi-interfaces

When Using the Pepper Robot with NAOqi 2.5:

Note: Installation instructions apply to PC:PEPPER

  • ROS and Interfaces:

    • Install ROS Noetic or Robostack bundling of ROS Noetic in a mamba or micromamba environment

    • Install the camera info manager for the Pepper camera on local system: sudo apt install ros-noetic-camera-info-manager or within a Robostack env: micromamba install -c robostack ros-noetic-camera-info-manager

    • Activate and source ROS on local system: source /opt/ros/noetic/setup.bash or activate the Robostack env: micromamba activate ros_env

    • Clone the Pepper Camera package:

    cd $HOME/Code
    git clone https://github.com/modular-ml/pepper_camera.git
    
    • Install the Pepper Camera dependencies on local system: sudo apt install libgstreamer1.0-dev gstreamer1.0-tools or within a Robostack env: micromamba install gst-plugins-base gst-plugins-good gstreamer -c conda-forge

    • Create a ROS workspace and link the Pepper Camera resources into it:

    mkdir -p $HOME/pepper_ros_ws/src
    cd $HOME/pepper_ros_ws
    ln -s $HOME/Code/pepper_camera src/pepper_camera
    
    • Compile the ROS node using catkin:

    catkin_make
    
  • Docker with NAOqi & ROS Kinetic - Python 2.7:

    • Clone the Pepper ROS Docker repository:

    cd $HOME/Code
    git clone https://github.com/modular-ml/pepper-ros-docker.git
    
    • Build the Pepper ROS Docker image:

    cd pepper-ros-docker
    docker build . -t minimal-pepper-ros-driver
    

When Using the iCub Robot:

Note: Installation instructions apply to PC:ICUB. They can also be followed for PC:A, S:1, PC:WEBCAM, and PC:PEPPER; however, only YARP with Python bindings is needed on these machines. If these machines have their required packages and Wrapyfi installed inside a mamba or micromamba environment, then installing the following within the environment should suffice: micromamba install -c robotology yarp

Running the Application

Easy: iCub simulation only; running all scripts on a single machine

Here we mirror the facial expressions of an actor facing a webcam on a simulated iCub robot. The images from the webcam are streamed to the ESR9 (Siqueira et al., 2020) FER model, which then classifies their facial expressions and returns the predicted class to the application manager (robot workflow manager). The manager transmits the readings to the iCub interface and displays an approximated facial expression on the robot’s face.

Preparing the iCub robot (in simulation)

Start the yarpserver to enable communication with the iCub robot (on any machine):

yarpserver

Start the iCub simulator (on PC:ICUB):

iCub_SIM

The facial expressions shown on the iCub’s face are not enabled by default when running the iCub simulator, so we need to start the simFaceExpressions module to enable them (on PC:ICUB):

simFaceExpressions

Start the iCub emotion interface to receive the facial expressions on a specific port/topic (on PC:ICUB):

emotionInterface --name /icubSim/face/emotions --context faceExpressions --from emotions.ini

Connect the iCub simulator ports to the iCub emotion interface (on PC:ICUB):

yarp connect /face/eyelids /icubSim/face/eyelids
yarp connect /face/image/out /icubSim/texture/face
yarp connect /icubSim/face/emotions/out /icubSim/face/raw/in

Running the iCub interface

Start the iCub interface to receive the facial expressions from the application manager and activate the facial expressions on the iCub robot (on PC:ICUB):

cd $HOME/Code/wrapyfi-interfaces
python wrapyfi_interfaces/robots/icub_head/interface.py \
--simulation --get_cam_feed \
--control_expressions \
--facial_expressions_port "/control_interface/facial_expressions_icub"

Start the camera interface to receive images from the webcam and forward them to the application manager (on PC:WEBCAM):

cd $HOME/Code/wrapyfi-interfaces
python wrapyfi_interfaces/io/video/interface.py --mware yarp --cap_source "0" --fps 30 --cap_feed_port "/control_interface/image_webcam" --img_width 320 --img_height 240 --jpg

Start two mirrored instances of the application manager (on PC:A and PC:ICUB, respectively):

  1. The first instance is responsible for running the application workflow (on PC:A):

cd $HOME/Code/wrapyfi/examples/applications
WRAPYFI_DEFAULT_COMMUNICATOR="yarp" python affective_signaling_multirobot.py --wrapyfi_cfg wrapyfi_configs/affective_signaling_multirobot/COMP_mainpc.yml --cam_source webcam

  2. The second instance is responsible for running the robot (iCub) control workflow (on PC:ICUB):

cd $HOME/Code/wrapyfi/examples/applications
WRAPYFI_DEFAULT_COMMUNICATOR="yarp" python affective_signaling_multirobot.py --wrapyfi_cfg wrapyfi_configs/affective_signaling_multirobot/OPT_icubpc.yml --cam_source webcam

Note: Running two instances is not necessary if we configure a single script to handle all exchanges; however, we do so to separate the application workflow from the robot control workflows. In this example, where we run a single robot, the utility of such a separation is not apparent. However, if we were to merge the workflows into the main configuration file COMP_mainpc.yml, we would also have to run the workflow for robot A even when we only want to run robot B.

Run the ESR9 FER model, acquiring images from the webcam and forwarding the recognized expression to the application manager (on S:1):

cd $HOME/Code/wrapyfi-examples_ESR9/
export PYTHONPATH=$HOME/Code/wrapyfi-interfaces:$PYTHONPATH
python main_esr9.py webcam -w "/control_interface/image_esr9" -d -s 2 -b --frames 10 --max_frames 10 --video_mware yarp --facial_expressions_mware yarp --facial_expressions_port "/control_interface/facial_expressions_esr9" --face_detection 3 --img_width 320 --img_height 240 --jpg

Outcome: Make sure you are facing the webcam, and you should now see the simulated iCub robot changing its facial expressions to match your own.

Intermediate: iCub & Pepper; running scripts on multiple machines

Here we mirror the facial expressions of an actor facing the Pepper or iCub robot camera on both (physical) robots. The images from the chosen camera are streamed to the ESR9 (Siqueira et al., 2020) FER model, which then classifies their facial expressions and returns the predicted class to the application manager (robot workflow manager). The manager transmits the readings to the iCub and Pepper robot interfaces, displays an approximated facial expression on the iCub robot’s face, and triggers a color change on the Pepper robot’s eye and shoulder LEDs.

Preparing the iCub robot

Hardware preparation:

  • Connect the iCub robot to the power supply and switch it on (please follow the instructions specific to your iCub robot)

  • Connect your iCub robot’s (PC:104) ethernet cable to a network switch attached to all other machines (excluding PC:WEBCAM which is not needed in this setup)

Start the yarpserver to enable communication with the iCub robot (on any machine):

yarpserver

Note: Ensure every PC is configured to detect yarpserver. Assuming the yarpserver is running on a machine with an IP <IP yarpserver>:

yarp detect <IP yarpserver> 10000

Initialize and configure the iCub camera device on a specific port/topic (on PC:104):

yarpdev --from camera/ServerGrabberDualDragon.ini --split true --framerate 30

Initialize and configure the iCub emotion device on a specific port/topic (on PC:104):

yarpdev --name /icub/face/raw --device serial --subdevice serialport --context faceExpressions --from serialport.ini

Start the iCub emotion interface to receive the facial expressions on a specific port/topic (on PC:104):

emotionInterface --name /icub/face/emotions --context faceExpressions --from emotions.ini

Connect the input/output ports for expression reading and writing (on PC:104):

yarp connect /icub/face/emotions/out /icub/face/raw/in

Preparing the Pepper robot

Hardware preparation:

  • Connect an ethernet cable to the back of the Pepper robot’s head

  • Connect the other end of the ethernet cable to a network switch attached to all other machines (excluding PC:WEBCAM which is not needed in this setup)

  • Switch on the Pepper Robot

  • Once initialization completes, press the chest button on the Pepper robot for it to speak out its current IP address. This IP will be referred to as <IP Pepper>

Build the Pepper ROS workspace and start the roscore to enable communication with the Pepper robot (on PC:PEPPER):

cd $HOME/pepper_ros_ws
catkin build
roscore

Note: Ensure the Pepper ROS Docker container (and any other machine using ROS if manual changes to the configuration files are made) is configured to detect the roscore URI. Assuming the roscore is running on PC:PEPPER with an IP <IP roscore>:

export ROS_MASTER_URI=http://<IP roscore>:11311

If the Pepper ROS Docker image was built under the name minimal-pepper-ros-driver:latest, start the container (on PC:PEPPER):

# Check whether a container already exists:
docker ps -a

# If no container exists:
docker run -it --network host --name pepperdock minimal-pepper-ros-driver:latest

# If a container exists but is 'exited':
docker start pepperdock

Launch the Pepper robot’s interfaces within the container (on PC:PEPPER):

docker exec -it pepperdock bash -i
    export ROS_MASTER_URI=http://<IP roscore>:11311
    roslaunch pepper_extra pepper_wrapyfi.launch ip:=<IP Pepper>

Call the ROS services on the Pepper robot from within the Docker container. The robot should transition to an idle mode without movement and speak a short greeting (on PC:PEPPER):

docker exec -it pepperdock bash -i
    export ROS_MASTER_URI=http://<IP roscore>:11311
    rosservice call /pepper/pose/idle_mode "{idle_enabled: true, breath_enabled: false}"
    rosservice call /pepper/pose/home
    rosservice call /pepper/speech/say "{text: 'hello and welcome, my name is pepper', wait: false}"

Running the robot interfaces

Start the iCub interface to receive the facial expressions from the application manager and activate the facial expressions on the iCub robot (on PC:ICUB):

cd $HOME/Code/wrapyfi-interfaces
python wrapyfi_interfaces/robots/icub_head/interface.py \
--get_cam_feed \
--control_expressions \
--facial_expressions_port "/control_interface/facial_expressions_icub"

Start the Pepper interface to receive the facial expressions from the application manager and enable the LED color changes on the Pepper robot (on PC:PEPPER):

source $HOME/pepper_ros_ws/devel/setup.bash
cd $HOME/Code/wrapyfi-interfaces
python wrapyfi_interfaces/robots/pepper/interface.py \
--get_cam_feed \
--control_expressions \
--facial_expressions_port "/control_interface/facial_expressions_pepper"

Start three mirrored instances of the application manager (on PC:A, PC:ICUB, and PC:PEPPER, respectively):

  1. The first instance is responsible for running the application workflow (on PC:A):

cd $HOME/Code/wrapyfi/examples/applications
WRAPYFI_DEFAULT_COMMUNICATOR="yarp" python affective_signaling_multirobot.py --wrapyfi_cfg wrapyfi_configs/affective_signaling_multirobot/COMP_mainpc.yml --cam_source pepper

  2. The second instance is responsible for running the robot (iCub) control workflow (on PC:ICUB):

cd $HOME/Code/wrapyfi/examples/applications
WRAPYFI_DEFAULT_COMMUNICATOR="yarp" python affective_signaling_multirobot.py --wrapyfi_cfg wrapyfi_configs/affective_signaling_multirobot/OPT_icubpc.yml --cam_source pepper

  3. The third instance is responsible for running the robot (Pepper) control workflow (on PC:PEPPER):

cd $HOME/Code/wrapyfi/examples/applications
WRAPYFI_DEFAULT_COMMUNICATOR="yarp" python affective_signaling_multirobot.py --wrapyfi_cfg wrapyfi_configs/affective_signaling_multirobot/OPT_pepperpc.yml --cam_source pepper

Note: The --cam_source argument can be set to either icub or pepper, defining which robot camera the image arrives from. Switching the camera source requires minimal changes to the control workflow instances and does not affect the FER model, since the camera image is forwarded from the source to a dedicated topic/port to which the FER model subscribes.

Run the ESR9 FER model, acquiring the forwarded robot camera images and transmitting the recognized expression to the application manager (on S:1):

cd $HOME/Code/wrapyfi-examples_ESR9/
export PYTHONPATH=$HOME/Code/wrapyfi-interfaces:$PYTHONPATH
python main_esr9.py webcam -w "/control_interface/image_esr9" -d -s 2 -b --frames 10 --max_frames 10 --video_mware yarp --facial_expressions_mware yarp --facial_expressions_port "/control_interface/facial_expressions_esr9" --face_detection 3 --img_width 320 --img_height 240 --jpg

Outcome: Make sure you are facing the selected camera (Pepper or iCub), and you should now see the robots changing their facial expressions (iCub) or LED colors (Pepper) to match your own.