Managing data mining at digital crime investigation

2005, Forensic Science International

Forensic Science International 146S (2004) S37–S38
www.elsevier.com/locate/forsciint

Kadir Ozkan*
Police Crime Laboratory in Istanbul, Vatan Cd. A. Blok Kat:7, Fatih, Istanbul, Turkey
Available online 14 October 2004

* Tel.: +90 212 636 17 77; fax: +90 212 636 23 97. E-mail address: kadir.ozkan@kpl.gov.tr, ykadir_ozkan@hotmail.com.
0379-0738/$ – see front matter
doi:10.1016/j.forsciint.2004.09.012

“Data mining” has been loosely defined by Daniel Barbará and Sushil Jajodia as the process of extracting information from large amounts of data. This general definition is not tied to computers or digital systems, whereas the technical definitions are. For instance, according to Swift's definition, data mining is the process of discovering previously unknown information in a database or data warehouse that holds a great deal of diverse data, and this process is used for planning and decision-making. Both definitions are important for digital investigation and applicable to it. However, the second requires a database or data warehouse and a computer program, while the first requires only a training program built on one or a few standards.

Data mining dates back to the 1990s and draws mainly on the techniques of three scientific areas: artificial intelligence, statistics and databases. Its early applications were in finance, marketing and similar fields; applications to security are relatively new. In the security field, data mining can be implemented in two different ways:

1. Pre-crime implementations.
2. Post-crime implementations.

The first generally aims at prevention and estimation, while the second aims at the discovery of evidence. Although it is becoming a popular concept, applying data mining before a crime is committed can be risky, and not only because, for example, a neural network learns and changes its own patterns and thereby produces output that is not controlled by humans. An example of a rule sheds light on the problem:

    IF                  social security number issued 89–121 days ago
    AND                 two overseas trips during last 3 months
    AND                 license type = Truck
    AND                 wire transfers 3–5
    THEN                target 71% probability
    RECOMMENDED ACTION  detain for further investigations;
                        report on findings to this system

See [1].

This rule essentially recommends that a police officer detain a suspect because “he is a lorry driver, got his social security ID in the last 3 months and made two overseas trips during that period”. It is questionable to teach police officers to hold no prejudice and yet have them use such a program, when one meaningful sentence is well known to everybody involved in justice: “Everybody is innocent, unless it is proven that they are not!” Using such automatic systems before a crime is committed may amount to reading the sentence backwards: “Everybody is a suspect unless it is proven that they are not!”

In order not to make strategic mistakes, it is wiser to debate the judicial problems of pre-crime implementations further and to pursue implementations that can be used after a crime has been committed, because these aim to reach the evidence and the possibility of making a strategic mistake is very low.

Because it is a high-tech tool, data mining may have very diverse uses in post-crime investigations, and one of them is digital crime investigation. Digital crime investigation can be defined, in short, as “the effort of discovering the crime and the criminals by means of digital evidence” and can be divided into three stages:

1. Seizure and preservation of data which may contain evidence.
2. Finding the valuable data among many.
3. Making evidence-criminal connections and reporting discoveries before the court of law.

Of these three, the first and the last are well described and more technical than the second.
Yet although it is one of the most important parts of the investigation, “finding the valuable information among a bulk of data” has not been discussed enough among forensic scholars, and the way this step is applied varies from country to country. This stage is in fact very hard because:

• There is a huge amount of data and only some of it is information.
• Only a little of the information is evidence.
• There is limited time for the investigation.
• The process is very tiresome.
• Investigators have a lot of hardware and software to learn about.
• Some of the information in storage devices may be hidden in plain text, or concealed by using steganography or cryptography.
• Sometimes the equipment must be examined on the premises.
• Some of the files or the BIOS may be password protected.
• . . . etc. [2].

Because of the above factors, the digital investigation process has a strong need for automatic systems, and data mining can help digital crime investigators reach digital evidence. Data mining relies mainly on five techniques. Neural networks are one of them and are modeled after the human process of learning and remembering. They mimic the cognitive neurological functions of the human brain and can predict new observations from historical samples. To use neural networks for that purpose, detailed case information must be obtained and loaded into databases, so that these databases can be used to find similarities and correlations between the detailed case information and the detailed attributes of digital evidence. After a learning process, data mining may suggest how to find evidence by using its location, file names, extensions, sizes, etc. In the absence of such a solution, however, digital investigators should have a well-described standard to use when dealing with digital devices.
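As a rough illustration of how stored case information might guide a search by file location and extension, the following Python sketch counts where evidence was found in past cases and ranks the files of a new case by those counts. It is only a sketch under stated assumptions: plain frequency counting stands in for the neural network learning described in the text, and the function names (`attributes`, `learn_profile`, `rank_files`), the crime-type label and all file paths are hypothetical and are not taken from any real case database or tool.

```python
from collections import Counter
from pathlib import PurePosixPath


def attributes(path):
    """Return the attribute keys used for matching: extension and parent folder."""
    p = PurePosixPath(path)
    return ["ext:" + p.suffix.lower(), "dir:" + p.parent.name.lower()]


def learn_profile(history):
    """Count, per crime type, how often each attribute described past evidence.

    `history` is an iterable of (crime_type, evidence_path) pairs.
    """
    profile = {}
    for crime_type, evidence_path in history:
        profile.setdefault(crime_type, Counter()).update(attributes(evidence_path))
    return profile


def rank_files(paths, crime_type, profile):
    """Order candidate files so the ones most similar to past evidence come first."""
    counts = profile.get(crime_type, Counter())
    return sorted(
        paths,
        key=lambda path: sum(counts[a] for a in attributes(path)),
        reverse=True,
    )


# Hypothetical history: where evidence was found in earlier cases.
history = [
    ("fraud", "/home/a/Documents/ledger.xls"),
    ("fraud", "/home/b/Documents/invoices.xls"),
    ("fraud", "/home/b/Desktop/notes.txt"),
]
profile = learn_profile(history)

# Hypothetical file listing from a newly seized disk image.
candidates = [
    "/mnt/img/Windows/system.dll",
    "/mnt/img/Documents/budget.xls",
    "/mnt/img/Desktop/readme.txt",
]
for path in rank_files(candidates, "fraud", profile):
    print(path)
```

The ranking only suggests an order in which to examine files; it does not exclude any file. File names and sizes could be added as further attribute keys in the same way.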
In addition to such a standard, a list describing the detail levels of digital crime investigation should be documented and distributed to all police departments. When a police department begins an investigation involving digital machines, it should mark the case file with one of these detail levels so that the digital crime laboratory can work more effectively and efficiently. Moreover, being a computer technician, a digital crime investigator may not keep up to date with new codes of law, new kinds of crime (which is a general rule of computer crimes) or modus operandi; to overcome this problem, the collaboration and assistance of the police department involved is needed. If the case is marked with a high level of detail, a police officer from that department should be present when the search is conducted. In these circumstances, locating the evidence will be easier and more professional, and the failure to reach digital evidence that exists but could not be found will be prevented. In this way, material that needs no further search will not be examined further, and material that is examined will be examined effectively.

References

[1] J. Mena, Investigative Data Mining for Security and Criminal Detection, Butterworth Heinemann, USA, 2003, p. 13.
[2] E. Casey, Digital Evidence and Computer Crime, Academic Press, London, 2000.