The semester is organized within the framework of the Berlin Mathematics Research Center MATH+ and supported by the Einstein Foundation Berlin. We are committed to fostering an atmosphere of respect, collegiality, and sensitivity. Please read our MATH+ Collegiality Statement.

The second workshop will present different mathematical approaches to the analysis of small data sets, in order to make the role of mathematics in the overall research process more transparent.

At the first workshop we had already discussed the question: When using mathematical methods, how does expert knowledge factor into the results of non-mathematical research? We examined this with regard to small data sets.

With small data sets, it is not the unmanageability of the data that forces us to use “complex” methods of analysis; the decision to use mathematics has other reasons. And if given data is merely transformed by a given algorithm, the actual gain in knowledge does not lie in the mathematical step. The selection of research objects and their coordination, the choice of method, and the interpretation of the mathematical results are the steps of the overall process in which knowledge is actually gained. Because these steps are carried out by experts, questions arise about the connection between the mathematical method and our knowledge.

The question is not so much “how” but rather “what do you use mathematical methods for?” How do they lead to hypotheses? Are they only used to check the consistency of given hypotheses? How are possible uncertainties negotiated? Is there an open or hidden “physical” structure in the research object that can be modeled mathematically? Is the (perhaps misunderstood) rigor of mathematics used to lend weight to one’s own arguments?

- Nataša Conrad
- Wolfgang König
- Vijay Natarajan
- Christoph von Tycowicz
- Dennis Mischke
- Marcus Weber
- Sarah Wolf
- Fabian Telschow
- Vikram Sunkara

Use our mailing list TES_small_data@zib.de for communication. Workshop participants received a link to a cloud folder from which they can download the slides.

**Wed. 17.01.2024**

09:00 – 10:00 Registration – Coffee & Hanging Posters

10:00 – 10:30 Short Introduction

10:30 – 12:00 Session “Dynamical Models”

12:00 – 13:30 LUNCH Break

13:30 – 15:00 Session “Large Deviation Theory”

15:00 – 15:30 COFFEE Break

15:30 – 17:00 Session “Topological Data Analysis”

17:00 – 18:30 Workshop Reception

**Thu. 18.01.2024**

COFFEE available

09:00 – 10:30 Session “Geometric Data Analysis”

10:30 – 10:45 Short Break

10:45 – 12:15 Session “Data Temporality”

12:15 – 13:30 LUNCH Break

13:30 – 15:00 Session “Complexity”

Speakers:

- Evelyn Gius, Digital Philology, TU Darmstadt
- Nan Z. Da, Department of English, Johns Hopkins University
- Mihai Prunescu, Institute of Mathematics, Academy of Romania

Abstract:

In this session, we will start with “small data analysis” as it is done in Computational Literary Studies. In particular, the possibilities and impossibilities of computational methods applied to literature will be discussed with two speakers from this field. Is it the complexity of thought or missing contextual knowledge in the data that limits mathematical approaches? Is there a scheme for comparing literary complexity with the complexity of computational methods?

We will see that some tasks performed by experts in comparative studies can be expressed as operations on infinite(?) boolean rings. In this way, some research questions can possibly be modeled mathematically as satisfiability problems.
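To make the idea concrete, here is a minimal illustrative sketch (my own toy example, not taken from the talk) of how a comparative question could be phrased as a satisfiability problem over boolean variables: given hypothetical constraints on which textual features may co-occur, we ask whether any feature assignment satisfies all of them.

```python
from itertools import product

def satisfiable(constraints, n_vars):
    """Brute-force SAT check: each constraint is a predicate on a tuple of bools."""
    return any(all(c(assign) for c in constraints)
               for assign in product([False, True], repeat=n_vars))

# Hypothetical variables: does a text exhibit feature A, B, C (v[0], v[1], v[2])?
constraints = [
    lambda v: not (v[0] and v[1]),  # A and B never co-occur
    lambda v: v[0] or not v[2],     # C implies A
]
print(satisfiable(constraints, 3))  # True: e.g. A=True, B=False, C=True satisfies both
```

The brute-force search makes the complexity point of the session tangible: the loop runs over all 2^n assignments, and no known algorithm avoids this exponential blow-up in general, which is exactly the P vs. NP question.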

This leads to the question of the complexity of these problems. With the last speaker of this session, we will discuss whether, in this mathematical field, P≠NP holds. That is, is there a concrete mathematical problem in literary studies that is too complex to be solved deterministically in polynomial time?

15:00 – 15:30 COFFEE Break

15:30 – 17:00 Session “Statistical Inference and Modeling”

Abstract: Functional Magnetic Resonance Imaging (fMRI) Neuroscience is a relatively new discipline, with the first fMRI experiment having been performed as recently as 1991. Due to the expensive and time-consuming nature of the method, small-sample studies have historically been ubiquitous in the fMRI literature. For instance, in 2018 the median sample size of high-impact neuroimaging papers was only n=24, with an average sample size across all literature of n=60 ([1],[2]). Such sample sizes are problematic, as they result in lower statistical power, increased risk of false positives, and reduced generalizability of findings; problems which are greatly exacerbated in the large data imaging setting.

This talk shall provide a broad overview of how fMRI statisticians handle various problems associated with small-sample studies. Following a brief overview of fMRI as a subject, we shall touch on (1) how fMRI scientists use permutation and bootstrapping techniques to address distributional assumptions violated in the small sample setting, (2) current meta-analysis and data sharing techniques employed to synthesize small study analysis results and (3) the recent move towards big data cohorts and the pros and cons such a shift brings.
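As a hedged sketch of point (1) above (my own minimal example, not the speaker's code): a two-sample permutation test replaces parametric distributional assumptions, which are doubtful at small n, by repeatedly shuffling the group labels to build a null distribution of the observed statistic.

```python
import random

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any true group structure
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = sum(perm_a) / len(a) - sum(perm_b) / len(b)
        if abs(diff) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)  # permutation p-value

# Hypothetical small-sample measurements (n=5 per group)
a = [2.1, 2.5, 2.3, 2.8, 2.6]
b = [1.4, 1.7, 1.5, 1.9, 1.6]
print(permutation_test(a, b))  # small p-value: the groups are clearly separated
```

The test is exact under the null hypothesis of exchangeability and needs no normality assumption, which is why permutation approaches are attractive at the sample sizes quoted above.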

[1] – Yeung AWK (2018) An updated survey on statistical thresholding and sample size of fmri studies. Frontiers in Human Neuroscience 12, DOI 10.3389/fnhum.2018.00016

[2] – Szucs D, Ioannidis JP (2020) Sample size evolution in neuroimaging research: An evaluation of highly-cited studies (1990–2012) and of latest practices (2017–2018) in high-impact journals. NeuroImage 221:117164, DOI https://doi.org/10.1016/j.neuroimage.2020.117164

17:00 – 18:30 Refreshments / Poster Session (Post-Its / Discussions)

**Fri. 19.01.2024**

COFFEE available

09:00 – 10:30 Session “Uncertainty”

Speakers:

- Hans-Liudger Dienel (TU Berlin)
- Andrea Heilrath (TU Berlin) on “Beyond Numbers: The Art of Uncertainty Visualization”
- Claude Garcia (Bern University of Applied Sciences) on “Choices we make in times of crisis: better representing agency in our models”

Abstract:

Mathematics has developed various concepts and methods to represent, analyse, and quantify uncertainty. Mathematical models of the systems under study and their (often simulated, stochastic) outcomes can support decision making under uncertainty. This session looks into interesting open challenges, in particular in the (overlapping) contexts of small data and of complex social systems, such as: How can different kinds of uncertainty be accounted for? How can uncertainty be visualised and communicated, in particular beyond a mathematical audience? How can models deal with uncertainties arising from human interaction?
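One standard way to quantify the uncertainty mentioned above is Monte Carlo propagation: an uncertain input is modeled as a distribution, the model is simulated many times, and the outcome is reported as an interval rather than a point estimate. The following sketch uses a toy exponential-growth model and a Normal input distribution; both the model and its parameters are my own illustrative assumptions, not content from the session.

```python
import random

def simulate_outcome(growth_rate, years=10, start=100.0):
    # Toy model: exponential growth of some quantity over a number of years.
    return start * (1 + growth_rate) ** years

def monte_carlo(n=10000, seed=1):
    rng = random.Random(seed)
    # Uncertain input (a modelling assumption): growth rate ~ Normal(0.02, 0.01).
    outcomes = sorted(simulate_outcome(rng.gauss(0.02, 0.01)) for _ in range(n))
    return {
        "5%": outcomes[int(0.05 * n)],
        "median": outcomes[n // 2],
        "95%": outcomes[int(0.95 * n)],
    }

print(monte_carlo())  # report a 5%-95% interval instead of a single number
```

Reporting the spread of outcomes, rather than one simulated trajectory, is one concrete answer to the session's question of how uncertainty can be communicated beyond a mathematical audience.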

10:30 – 10:45 Short Break

10:45 – 12:00 “Join-In” Discussion

12:00 – 13:00 LUNCH Break / Poster Session (Collecting Post-Its)

13:00 – 14:30 Session “Machine Learning”

14:30 – 15:00 Closing Remarks

Thank you very much to all participants (71 in person at ZIB over the three days and more than 10 online). It was fun to discuss mathematical and interdisciplinary aspects of small data analysis. We have a weekend to digest the highlights of the workshop. See you at our Hackathon!