We present the collection and annotation of a multimodal database of negative human-human interactions. This work supports behavior recognition in the context of a virtual reality aggression prevention training system. The data consist of dyadic interactions between professional aggression training actors (actors) and naive participants (students). In addition to audio and video, we recorded motion capture data with a Kinect, head tracking, and physiological data: heart rate (ECG), galvanic skin response (GSR), and electromyography (EMG) of the biceps, triceps, and trapezius muscles. Aggression level, fear, valence, arousal, and dominance were rated separately for actors and students. We observe consistently higher inter-rater agreement when rating the actors than when rating the students, across all annotated dimensions, as well as higher inter-rater agreement for speaking behavior than for listening behavior. The data can be used, among other purposes, for research on affect recognition, multimodal fusion, and the relation between different bodily manifestations of affect.