A genome-wide analysis of genetic associations with lipid metabolism indicators was carried out using Bayesian networks (BN). The aim was to diagnose polygenic hypercholesterolemia from genetic data of patients from the Russian population. Data from 1,200 patients were analyzed. For each patient, 196,725 SNPs were obtained, together with clinical data and lipid profile indicators (different types of cholesterol). Genome-wide association analysis (GWAS) and Pearson's chi-squared test were used for the initial selection of the most significant parameters. Two patient states related to lipid metabolism were studied: the level of LDL-C (low-density lipoprotein cholesterol) and the level of HDL-C (high-density lipoprotein cholesterol). Bayesian networks with the simplest (naive) topology were used to predict the lipoprotein level. The quality (reliability) of the prediction was assessed by constructing ROC curves and calculating the area under these curves (AUC). The AUC increased from 0.5 for the initial BN to 0.9 after significant parameters were selected using GWAS or Pearson's test. Optimizing the Bayesian network with respect to the number of parameter nodes, with the AUC as the objective function, further increased the AUC to 0.99 and reduced the number of prognostic parameters to 150. The resulting set of prognostic parameters is shown to be ambiguous: it depends on whether GWAS or Pearson's test is used for the initial reduction in the number of network nodes. Despite the very good prediction quality on the training set, low AUC values were obtained for an independent control group of patients. Further application of the proposed methodology will be possible after a substantial reduction in the number of SNPs based on analysis of the relevant molecular mechanisms.
Keywords: GWAS; LDL-C; HDL-C; SNP; Bayesian networks.
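
The steps summarized in the abstract (chi-squared selection of SNPs, a naive Bayesian classifier, and AUC as the quality measure) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the 0/1/2 genotype coding, the Laplace smoothing, the binary labels (1 for an elevated lipoprotein level, 0 otherwise), and the eight-patient toy data are all assumptions made for illustration only.

```python
from collections import Counter
from math import log


def chi2_statistic(genotypes, labels):
    """Pearson chi-squared statistic of the genotype-by-label contingency table."""
    n = len(labels)
    cell = Counter(zip(genotypes, labels))
    row, col = Counter(genotypes), Counter(labels)
    stat = 0.0
    for g in row:
        for c in col:
            expected = row[g] * col[c] / n
            stat += (cell[(g, c)] - expected) ** 2 / expected
    return stat


def select_top_snps(X, y, k):
    """Indices of the k SNP columns with the largest chi-squared statistic."""
    n_snps = len(X[0])
    scores = [chi2_statistic([r[j] for r in X], y) for j in range(n_snps)]
    return sorted(range(n_snps), key=lambda j: -scores[j])[:k]


def fit_naive_bayes(X, y, cols, values=(0, 1, 2), alpha=1.0):
    """Laplace-smoothed naive Bayes over the chosen SNP columns (labels 0/1)."""
    prior = {c: y.count(c) / len(y) for c in (0, 1)}
    cond = {}
    for c in (0, 1):
        rows = [r for r, lab in zip(X, y) if lab == c]
        denom = len(rows) + alpha * len(values)
        for j in cols:
            counts = Counter(r[j] for r in rows)
            for v in values:
                cond[(c, j, v)] = (counts[v] + alpha) / denom
    return prior, cond, cols


def log_odds(model, row):
    """Log-odds of class 1 versus class 0 for one patient."""
    prior, cond, cols = model
    s = log(prior[1]) - log(prior[0])
    for j in cols:
        s += log(cond[(1, j, row[j])]) - log(cond[(0, j, row[j])])
    return s


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 patients, 3 SNPs coded 0/1/2.
X_toy = [
    [0, 1, 2], [0, 0, 1], [0, 2, 0], [1, 1, 1],
    [2, 0, 2], [2, 1, 0], [2, 2, 1], [1, 0, 0],
]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

top = select_top_snps(X_toy, y_toy, k=1)
model = fit_naive_bayes(X_toy, y_toy, top)
train_auc = auc([log_odds(model, r) for r in X_toy], y_toy)
print(top, train_auc)
```

In this sketch, SNP selection and model fitting both use the same patients that are then scored, so `train_auc` is a training-set figure. As the abstract reports, a high AUC of this kind did not carry over to the independent control group.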