Overview
The project’s main aim is to create a curated database of secondary data sources and data files from research publications that relate to ‘social trust’. The work underpinning the dataset will include: (1) a targeted literature review identifying all the relevant publications and whether they include publicly available datasets; (2) securing datasets directly from the authors of important studies whose data are not publicly available; (3) cleaning, recoding, transforming, and documenting the datasets. The primary purpose of the dataset is to serve as a teaching dataset underpinning a textbook on ‘Sociological Methodology Applied to Research on Trust’. The textbook will have an innovative focus, introducing specific quantitative research methods as part of a journey through inter-disciplinary academic research on the concept of ‘social trust’. An initial textbook proposal application - including two draft example chapters and a detailed appendix chapter documenting the TRMD dataset - to be submitted to Oxford University Press is also among the project’s intended outputs, as is an article for the journal Teaching Statistics. The dataset itself can be used by other educators around the world.
Statement of purpose:
The main aim of the proposed project is to create a curated database of secondary quantitative replication data on the topic of (social, institutional and interpersonal) ‘trust’. This database will serve several purposes, of which the following will be actively pursued within the timeframe of the BA/Leverhulme Small Research Grant:
- It will be the data foundation for a planned sociological research methodology textbook;
- It will allow educators teaching quantitative methods in the social sciences to easily develop demonstrative examples of statistical analyses beyond those to be included in the textbook;
- It will allow students of quantitative research methods to reproduce results from published research beyond the examples included in the textbook, but related to the additional exercises and methods presented there;
- It will provide a publicly available online resource for researchers starting out in the area of empirical social trust research;
- It will contribute and advance current initiatives to promote open research practices through data and code sharing.
The motivation behind the project has two scholarly foundations. One is educational and relates to the need to develop a more applied and engaging quantitative pedagogy for social science disciplines that are methodologically and epistemologically eclectic (such as sociology). The other is meta-scientific and relates to ongoing struggles across the social sciences to establish more open and reproducible research practices. I will briefly describe how the two interrelate, and how the proposed programme of activities addresses both.
Context and background:
Quantitative sociology lags behind other social science disciplines – such as political science or economics – in its adoption of open research practices, which aim to ensure that published results are reproducible and the analytical procedures transparent (Ferguson et al. 2023; Freese 2007; Freese and Peterson 2020; Moody, Keister, and Ramos 2022; Weeden 2023). The reasons for this are multifaceted. Weeden (2023) highlights the internal fragmentation of sociology along a plurality of epistemological and methodological orientations. Another reason might be that quantitative sociologists rarely use experimental methods (Gërxhani and Miller 2022), which have been at the forefront of the recent shift towards open science practices in psychology (Ferguson et al. 2023). Moreover, the pursuit of causal explanations, which is at the heart of econometrics, is somewhat marginal in sociological science (Gërxhani and Miller 2022; Goldthorpe 2016); sociologists tend instead to rely on large-scale survey data to target broad estimands that generally result in low p-values (and low effect sizes). As many have argued, this is not ideal for the purposes of scientific advancement (Lundberg, Johnson, and Stewart 2021; Smaldino and McElreath 2016; McElreath and Smaldino 2015).
Regardless of the reasons for the limited adoption of these practices, Ferguson et al. (2023, p. 3) found, based on large surveys of academic practitioners, that “there are fairly high levels of stated support for open science, even among scholars in a discipline like sociology, where there is less institutionalization of these practices”. Institutionalisation and research culture are key elements in achieving advancements with respect to open science practices. One major factor affecting both, which sociology shares with its disciplinary peers, is the lack of methodological socialisation in reproducibility practices. Potential solutions have long been promoted in some trailblazing centres of excellence – like Harvard University’s Institute for Quantitative Social Science (King 2006) – but have only recently started gaining wider adoption through the teaching of reproducibility practices as part of applied research methods training at both undergraduate (Ball 2023; Vilhuber et al. 2022) and postgraduate levels (Stojmenovska, Bol, and Leopold 2019). It takes small steps to achieve giant leaps, and these developments will undoubtedly change the teaching of applied quantitative research methods in the social sciences for the better, which will in turn shape the scientific practices of future generations of researchers.
My own teaching practice has sought to contribute to these positive trends. I have had the opportunity to design quantitative research modules at both undergraduate and postgraduate levels at two institutions, and a few years ago I implemented a simple but unorthodox approach: instead of introducing commonly used methods in turn and separately from one another, I take the students on a journey across one coherent research theme, deconstructing the data and methods underpinning selected key articles and incrementally reproducing (parts of) the original analyses. This approach not only introduced students to replication packages and data repositories, but also allowed them to learn how basic methods relate to the more complex analyses reported in published research. As for the research theme, I settled on “social trust”, which is not only the focus of a large body of academic scholarship but is also well served by the useful measurement variables included in all the major cross-national large-scale social survey programmes (such as the European Social Survey or the World/European Values Survey/Study) that provide publicly available data.
Designed around this basic idea, my recent methodological teaching has incorporated a number of open research practices. These include publicly sharing, on course websites that I wrote and maintain, the computer code used to turn raw data from popular secondary survey datasets into datasets ready for analysis. This teaching portfolio has now matured to the point where my course notes and programming code need to be developed into a more structured textbook that would make a broader contribution to the advancement of quantitative pedagogy in sociology and other social sciences. It will fit well among other recent approaches that place the idea of story-telling and replication at the centre of quantitative teaching and research dissemination (Alexander 2023; Gelman, Hill, and Vehtari 2020; Gelman and Basbøll 2014). This project aims to create the essential data infrastructure that will underpin this textbook.
Methodology and activities:
The first methodological step in the creation of a curated database of secondary quantitative replication data on social trust will be a systematic data review. This will map the field and identify all the relevant potential datasets. As part of Stage 1, the following activities will be carried out:
- Review the methods and data used in recent (last 5 years) research outputs that contain at least one central variable relating to “social trust”, and classify the sources along several criteria, including which statistical methods they can help exemplify and whether a replication package is publicly available;
- Find, download and save relevant publicly available replication datasets;
- Identify highly relevant articles whose replication data may exist but have not been made public; contact the authors to secure datasets and code for the purpose of replication, and to secure their approval for the potential use of the dataset and their publication in a textbook and associated resources.
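The classification described in the Stage 1 activities can be pictured as a simple review table. The following is a minimal sketch only, written in Python using the standard library; the field names, the helper functions and the placeholder entries are all hypothetical and stand in for whatever criteria the actual review adopts.

```python
from dataclasses import dataclass


@dataclass
class ReviewEntry:
    """One row of the Stage 1 review table (all field names are hypothetical)."""
    source_id: str              # placeholder identifier, not a real citation
    year: int
    methods: list[str]          # statistical methods the source can exemplify
    replication_public: bool    # is a replication package publicly available?
    authors_contacted: bool = False


def to_download(entries):
    """Sources whose replication package can be downloaded directly."""
    return [e.source_id for e in entries if e.replication_public]


def to_contact(entries):
    """Sources without public data whose authors have not yet been contacted."""
    return [
        e.source_id
        for e in entries
        if not e.replication_public and not e.authors_contacted
    ]


def by_method(entries):
    """Index the sources by the methods they can help exemplify."""
    index = {}
    for e in entries:
        for method in e.methods:
            index.setdefault(method, []).append(e.source_id)
    return index


# Placeholder entries for illustration only; they do not describe real studies.
entries = [
    ReviewEntry("placeholder-A", 2021, ["multilevel regression"], True),
    ReviewEntry("placeholder-B", 2022,
                ["logistic regression", "multilevel regression"], False),
    ReviewEntry("placeholder-C", 2023, ["factor analysis"], False,
                authors_contacted=True),
]

print(to_download(entries))
print(to_contact(entries))
print(by_method(entries))
```

Keeping the review table in a structured form like this would allow the same records to drive both the download and author-contact activities and, later, the public review table.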
The second step (Stage 2) will be to prepare the datasets for use as teaching resources. This requires cleaning the datasets and reducing them to the minimum size that is necessary and sufficient to replicate the central analysis involving “social trust” in each of the selected publications. The following activities will make up this stage:
- Writing and clearly documenting computer code either from scratch or revising/translating any supplied replication code files;
- Where the originally published analysis relies on raw data from a publicly available large-scale survey, the code will be developed in such a way that it reproduces the analysis dataset from the raw data made available by the survey organisations (e.g. ESS, WVS, ISSP, BSA).
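As an illustration of the kind of documented preparation code envisaged in Stage 2, the sketch below rebuilds a small analysis dataset from a raw survey extract. It is a minimal example under stated assumptions: the column names, the missing-value codes, the cut-off for the binary recode and the inline data are all invented for illustration and are not taken from any survey codebook or published analysis.

```python
import csv
import io

# Assumed codes for refusal / don't know / no answer (illustrative only).
MISSING_CODES = {"77", "88", "99"}

# Invented stand-in for a raw survey file; real code would read the file
# distributed by the survey organisation instead.
RAW_CSV = """respondent_id,country,trust_0_10,unused_item
1,GB,7,3
2,GB,88,1
3,HU,4,2
4,HU,10,5
"""


def build_analysis_rows(raw_file, keep=("respondent_id", "country")):
    """Clean, recode and reduce raw rows to the variables an analysis needs."""
    rows = []
    for record in csv.DictReader(raw_file):
        value = record["trust_0_10"]
        if value in MISSING_CODES:
            continue  # drop respondents with a missing trust score
        row = {name: record[name] for name in keep}  # discard unused items
        row["trust"] = int(value)
        # Hypothetical binary recode; the cut-off of 6 is an assumption.
        row["high_trust"] = int(row["trust"] >= 6)
        rows.append(row)
    return rows


analysis_rows = build_analysis_rows(io.StringIO(RAW_CSV))
for row in analysis_rows:
    print(row)
```

Because every step from the raw file to the analysis dataset is expressed in code, students can trace and rerun each recoding decision rather than receiving a pre-cleaned file.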
The third and fourth steps will take place in parallel. In Stage 3, the final pool of datasets prepared at Stage 2 will be thoroughly documented and, alongside the larger review table created in Stage 1, will be made publicly available via a free GitHub website. This will result in a public resource for the research and student community worldwide: the Trust Research Methodology Database (TReMeDa). Stage 4 is more directly targeted at the textbook project, and it involves creating additional smaller datasets that are appropriate for demonstrating specific methods and that relate to the computing exercises to be included either in the main body of the print version of the planned textbook or in the larger and more flexible body of online resources associated with it.