Models of visual backward masking have focused on temporal effects. More specifically, they have sought to explain why the strongest masking can occur when the mask is delayed with respect to the target. Although interesting effects of the spatial layout of the mask have been found, only a few attempts have been made to model these phenomena. Here, we elaborate a structurally simple model that employs lateral excitation and inhibition together with different neural time scales to explain many spatial and temporal aspects of backward masking. We argue that, for a better understanding of visual masking, it is vitally important to consider the interplay of spatial and temporal factors in a single model.