Parentage analyses, selection of single nucleotide polymorphisms (SNPs) and design of panels for pedigree verification of the Bonsmara and Drakensberger cattle
The dataset was generated with the GeneSeek® Genomic Profiler (GGP) Bovine 150K BeadChip single nucleotide polymorphism (SNP) panel from 1 563 Bonsmara and 1 022 Drakensberger animals. All data edits and calculations were performed in R. After removing duplicate SNPs, SNPs on the X, Y, and MT chromosomes, and SNPs with unknown locations, a total of 119 375 autosomal SNPs mapped to the UMD3.1 bovine genome assembly remained. Only animals with a genotype call rate above 90% were considered for downstream analysis. The dataset included only 185 of the 200 parentage SNPs of the International Society for Animal Genetics (ISAG). These 185 SNPs were then evaluated for effectiveness in the Bonsmara and Drakensberger cattle populations. The results showed that the 185 ISAG SNPs were not sufficient for parentage testing in the two South African breeds.
The primary dataset of 119 375 SNPs was used to validate the parent-offspring relationships recorded in the pedigree and to detect regions exhibiting hemizygous deletions.
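As an illustration of how a recorded parent-offspring pair can be checked against genotypes, the following minimal Python sketch counts opposing homozygotes, that is, loci where the parent and the offspring are homozygous for different alleles. The source does not state which test was used, so the 0/1/2 genotype coding, the function names, and the `max_conflict_rate` tolerance are all assumptions made for this example.

```python
# Genotypes are coded as the count of one allele (0, 1 or 2); None marks a missing call.
# This coding and the tolerance below are assumptions for illustration only.

def opposing_homozygotes(parent, offspring):
    """Return (conflicts, comparable) for two equal-length genotype lists."""
    if len(parent) != len(offspring):
        raise ValueError("genotype vectors must have the same length")
    conflicts = 0
    comparable = 0
    for p, o in zip(parent, offspring):
        if p is None or o is None:
            continue  # skip loci where either animal has no call
        comparable += 1
        if {p, o} == {0, 2}:
            # A true parent must pass one allele on, so 0 vs 2 is a Mendelian conflict.
            conflicts += 1
    return conflicts, comparable


def is_plausible_parent(parent, offspring, max_conflict_rate=0.01):
    """Accept the pair if the conflict rate stays within a hypothetical error tolerance."""
    conflicts, comparable = opposing_homozygotes(parent, offspring)
    if comparable == 0:
        return False  # nothing to compare, so the relationship cannot be supported
    return conflicts / comparable <= max_conflict_rate
```

A small non-zero tolerance is used because genotyping errors can produce occasional conflicts even for true parent-offspring pairs; a suitable value depends on the data and is not given in the source.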
Additional quality control edits applied to the 119 375 SNPs included the removal of genotypes with median GenCall (GC) and GenTrain (GT) scores lower than 0.60 and 0.55, respectively. SNPs were discarded if they departed from Hardy–Weinberg equilibrium (p < 0.001), had a missing genotype rate above 5%, or had a minor allele frequency (MAF) below 0.05 across the populations. After these edits, 2 585 animals (Bonsmara, BON = 1 563; Drakensberger, DRB = 1 022) with 92 835 autosomal SNPs remained. This dataset was then used for parentage assignment and for reconstruction of pedigree records, using gawk scripts and the Hsphase R package, respectively.
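The per-SNP filters described above (missing genotype rate, MAF, and Hardy–Weinberg equilibrium) can be sketched in Python as follows. This is only an illustration, not the scripts used for the dataset: the 0/1/2 genotype coding and the function names are assumptions, and the HWE p-value is computed with a one-degree-of-freedom chi-square goodness-of-fit test, which may differ from the test actually applied.

```python
import math


def snp_qc_stats(genotypes):
    """Return (missing_rate, maf, hwe_p) for one SNP.

    genotypes: list of allele counts (0, 1, 2), with None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 1.0, 0.0, 1.0  # no calls at all: fully missing
    missing_rate = 1 - n / len(genotypes)
    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    p = (2 * n_bb + n_ab) / (2 * n)  # frequency of the counted allele
    q = 1 - p
    maf = min(p, q)
    if maf == 0:
        return missing_rate, maf, 1.0  # monomorphic: HWE test is not defined
    expected = (n * q * q, 2 * n * p * q, n * p * p)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2))
    return missing_rate, maf, hwe_p


def passes_qc(genotypes, max_missing=0.05, min_maf=0.05, min_hwe_p=0.001):
    """Apply the thresholds quoted in the text to a single SNP."""
    missing_rate, maf, hwe_p = snp_qc_stats(genotypes)
    return missing_rate <= max_missing and maf >= min_maf and hwe_p >= min_hwe_p
```

The GenCall and GenTrain score filters are not covered here because they come from the genotyping report, not from the genotype calls themselves.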
Finally, low-cost parentage genotyping panels of 200 SNPs per breed were designed by selecting from 78 286 SNP markers that had a GT score ≥ 0.60 and a GC score ≥ 0.55, a call rate ≥ 0.99, a MAF ≥ 0.05, and an HWE p-value > 0.001. To minimize linkage disequilibrium (LD) among the selected SNPs, they had to be at least 1 Mb apart and have a LogRRatio > 0.001.
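One way to enforce the 1 Mb spacing rule while building a fixed-size panel is a greedy pass over ranked candidates, sketched below in Python. The source does not say how candidates were ranked or selected, so ranking by MAF (highest first), the tuple layout, and the function name are assumptions made for this example.

```python
from collections import defaultdict


def select_spaced_snps(snps, n_select=200, min_gap_bp=1_000_000):
    """Greedily pick SNPs that are at least min_gap_bp apart on each chromosome.

    snps: iterable of (name, chromosome, position_bp, maf) tuples that have
    already passed the quality filters. Ranking by MAF is an assumption.
    """
    ranked = sorted(snps, key=lambda s: s[3], reverse=True)
    chosen = []
    positions = defaultdict(list)  # chromosome -> positions already selected
    for name, chrom, pos, _maf in ranked:
        # Keep the SNP only if it is far enough from every SNP already chosen
        # on the same chromosome.
        if all(abs(pos - other) >= min_gap_bp for other in positions[chrom]):
            chosen.append(name)
            positions[chrom].append(pos)
            if len(chosen) == n_select:
                break
    return chosen
```

A greedy pass does not guarantee the largest possible panel, but it is simple and favors the most informative markers under whatever ranking is supplied.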