Language allows us to efficiently communicate about the things in the world around us. Seemingly simple words like 'this' and 'that' are a cornerstone of our capability to refer, as they help guide our addressee’s attention to the specific entity we are talking about. Such demonstratives are acquired early in life, ubiquitous in everyday talk, often closely tied to our gestural communicative abilities, and present in all spoken languages of the world. Based on a review of recent experimental work, we introduce a new conceptual framework of demonstrative reference. In the context of this framework, we argue that several physical, psychological, and referent-intrinsic factors dynamically interact to influence whether a speaker will use one demonstrative form (e.g., 'this') or another (e.g., 'that') in a given setting. However, we argue that the relative influence of these factors is itself a function of the cultural language setting at hand, the theory-of-mind capacities of the speaker, and the affordances of the specific context in which the speech event takes place. We demonstrate that the framework has the potential to reconcile findings in the literature that previously seemed irreconcilable. We show that the framework may generalize to a large extent to instances of endophoric reference (e.g., anaphora) and speculate that it may also describe the specific form and kinematics of a speaker’s pointing gesture. We present and discuss testable predictions and novel research questions derived from the framework.