Author/Authors :
Paola Bertolazzi, Giovanni Felici, Paola Festa
Abstract :
SNPs (single nucleotide polymorphisms) are positions of the DNA sequence where the differences among individuals are embedded. Knowledge of these SNPs is crucial for disease association studies, but even though such positions are few (about 1% of the entire sequence), the cost of extracting the complete information is very high. Recent studies have shown that DNA sequences are structured into blocks of positions, conserved during evolution, within which there is strong correlation among the values (alleles) of different loci. To reduce the cost of extracting SNP information, this block structure of the DNA has suggested limiting the process to a subset of SNPs, the so-called Tag SNPs, that retain most of the information contained in the whole sequence. In this paper, we apply a feature selection technique based on integer programming to the problem of Tag SNP selection. Moreover, to test the quality of our approach, we also consider the problem of SNP reconstruction, i.e., deriving unknown SNPs from the values of the Tag SNPs, and propose two reconstruction methods, one based on a majority vote and the other on a machine learning approach. We test our algorithm on two public data sets of different nature, obtaining results that, where comparable, are in line with the related literature. One interesting aspect of the proposed method is its capability to deal simultaneously with very large SNP sets and, in addition, to provide highly informative reconstruction rules in the form of logic formulas.
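As a rough illustration of the two ideas in the abstract (choosing Tag SNPs as a covering problem and reconstructing untyped SNPs by majority vote), the Python sketch below assumes a simplified model: every pair of distinct reference haplotypes must differ on at least one selected SNP, and the covering problem is solved with a plain greedy heuristic rather than an integer program. The function names `greedy_tag_snps` and `reconstruct`, the encoding of haplotypes as strings of alleles, and the matching rule used for the vote are all hypothetical; this is not the authors' formulation or code.

```python
from collections import Counter
from itertools import combinations


def greedy_tag_snps(haplotypes):
    """Greedy set-cover heuristic (hypothetical simplification).

    Each haplotype is a string of alleles, one character per SNP. A SNP
    "covers" a pair of haplotypes when their alleles differ at that SNP;
    SNPs are added until every pair of distinct haplotypes is covered.
    """
    n_snps = len(haplotypes[0])
    # Pairs of reference haplotypes that the tag set must tell apart.
    uncovered = {
        (i, j)
        for i, j in combinations(range(len(haplotypes)), 2)
        if haplotypes[i] != haplotypes[j]
    }
    # For each SNP column, the set of pairs it distinguishes.
    covers = {
        s: {(i, j) for i, j in uncovered if haplotypes[i][s] != haplotypes[j][s]}
        for s in range(n_snps)
    }
    selected = []
    while uncovered:
        # Pick the SNP that separates the most still-uncovered pairs
        # (ties go to the lowest index). Progress is guaranteed because any
        # pair of distinct haplotypes differs at some SNP.
        best = max(range(n_snps), key=lambda s: len(covers[s] & uncovered))
        selected.append(best)
        uncovered -= covers[best]
    return sorted(selected)


def reconstruct(reference, tag_idx, target_tags, snp):
    """Majority-vote guess of the allele at `snp` (hypothetical rule).

    Votes come from the reference haplotypes whose alleles agree with the
    target on every tag SNP; if none agree, the whole panel votes.
    """
    votes = [
        h[snp]
        for h in reference
        if all(h[t] == v for t, v in zip(tag_idx, target_tags))
    ]
    if not votes:
        votes = [h[snp] for h in reference]
    return Counter(votes).most_common(1)[0][0]


panel = ["00110", "01100", "00111", "11001"]
tags = greedy_tag_snps(panel)
print(tags)                               # prints [1, 4]
print(reconstruct(panel, tags, "10", 3))  # prints 0
```

On this toy panel the heuristic keeps SNPs 1 and 4, whose allele pairs already separate all four haplotypes. An exact integer-programming model would minimise the number (or cost) of selected SNPs under the same covering constraints, at the price of calling a solver instead of a simple greedy loop.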
Keywords :
Feature selection, Tag SNPs selection, Set covering heuristics, Logic programming