<p dir="ltr">As highlighted in recent surveys, one of the biggest barriers to progress in code-switching (CS) research is the limited availability, in both quantity and quality, of annotated code-switched text (Doğruöz et al., 2023; Mondal et al., 2022). Winata et al. (2022) further show that this issue is particularly acute in the South African context. Their survey reports that only a small body of CS research exists for South African languages and that data resources remain limited or absent. For Afrikaans–English in particular, there are no publicly available code-switched datasets.</p><p dir="ltr">Traditionally, researchers have relied on naturally occurring CS data from social media, speech recordings, and manual or automated transcriptions. While these sources offer authenticity and sociolinguistic richness, they each introduce challenges such as ethical and privacy concerns, noise and inconsistency, costly annotation processes, and domain imbalance. Parallel corpora and substitutive generation methods also offer avenues for artificial CS creation, but for Afrikaans–English these corpora are either highly domain-specific or overly general, and they do not necessarily reflect naturally occurring switching patterns.</p><p dir="ltr">Motivated by these limitations, our study explores an alternative and increasingly relevant solution: generating synthetic CS data using multilingual large language models (LLMs). We develop a controlled prompting framework using <i>GPT-4o</i> and <i>Gemini 2.0 Flash</i> to produce aligned Afrikaans, English, correct CS, and incorrect CS sentences (together constituting one sentence set) across diverse topics.</p><p dir="ltr">A total of 2000 sentence sets were generated (LLM_GPT_Gemini_2000_raw).
The dataset comprises 500 sentence sets from each of four conditions: GPT generation without CS-specific guidelines, GPT generation with explicit CS rules, Gemini generation without CS-specific guidelines, and Gemini generation with explicit CS rules.</p><p dir="ltr">Both GPT and Gemini were also used as LLM-as-a-judge evaluators of the acceptability of the generated sentences; their ratings are included in the dataset LLM_GPT_Gemini_2000_raw.</p><p dir="ltr">A subset of 200 sentence sets was additionally evaluated by human annotators for acceptability. The majority ratings were taken and are released as the final validated dataset: Final_annotated_200.</p>
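<p dir="ltr">The structure described above can be sketched in code. This is a minimal illustration, not the released format: the field names, label scheme (binary "acceptable"/"unacceptable"), and the assumption of an odd number of annotators are all hypothetical, chosen only to show how a majority rating over human acceptability judgments could be computed for one sentence set.</p>

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record for one aligned "sentence set"; field names are
# illustrative and not taken from LLM_GPT_Gemini_2000_raw.
@dataclass
class SentenceSet:
    afrikaans: str
    english: str
    cs_correct: str
    cs_incorrect: str
    model: str       # e.g. "gpt-4o" or "gemini-2.0-flash"
    cs_rules: bool   # True if the prompt included explicit CS rules


def majority_rating(ratings):
    """Return the acceptability label chosen by most annotators.

    Assumes categorical labels and an odd number of annotators,
    so a strict majority always exists.
    """
    label, _count = Counter(ratings).most_common(1)[0]
    return label


# Example: three hypothetical human ratings for one generated CS sentence.
print(majority_rating(["acceptable", "acceptable", "unacceptable"]))  # acceptable
```

<p dir="ltr">Under these assumptions, applying <code>majority_rating</code> to each of the 200 human-evaluated sets would yield the per-set labels collected in Final_annotated_200.</p>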