Volumetric Data Classification: A Study Directed at 3-D Imagery
This thesis describes research undertaken in the field of image mining, particularly medical image mining. More specifically, the research is directed at 3-D image classification according to the nature of a particular Volume Of Interest (VOI) that appears across a given image set. In this thesis the term VOI Based Image Classification (VOIBIC) has been coined to describe this process. VOIBIC entails a number of challenges. The first is the identification and isolation of the VOIs. Two segmentation algorithms are thus proposed to extract a given VOI from an image set: (i) Volume Growing and (ii) Bounding Box. The second challenge is how best to represent the VOIs, once they have been identified, so that classification can be conducted effectively and efficiently. Three approaches are considered. The first, the Statistical Metrics based representation, is founded on the idea of using statistical metrics. This representation has the advantage that it is straightforward and, although not especially novel, provides a benchmark. The second, the Point Series representation, is founded on the concept of point series (curves) describing the perimeter of a VOI. Two variations of this representation are considered: (i) Spoke based and (ii) Disc based. The third is founded on a Frequent Subgraph Mining (FSM) technique whereby the VOI is represented using an Oct-tree structure to which FSM can be applied. The identified frequent subtrees can then be used to define a feature vector representation compatible with many classifier model generation methods. The thesis also considers augmenting the VOI data with metadata, namely age and gender, and determining the effect this has on performance. The presented evaluation used two 3-D MRI brain scan data sets: (i) Epilepsy and (ii) Musicians. The VOIs in this case were the lateral ventricles, a distinctive VOI in such MRI brain scan data.
For evaluation purposes two scenarios are considered, distinguishing between: (i) epilepsy patients and healthy people and (ii) musicians and non-musicians. The results indicate that the Spoke based point series representation produced the best results, with a recorded classification accuracy of up to 78.52% for the Epilepsy data set and 84.91% for the Musicians data set.
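
To give a concrete picture of the kind of segmentation that the Volume Growing step refers to, the following is a minimal sketch of generic 3-D region growing. It is not the algorithm developed in the thesis, whose details are not given here. The function name `grow_volume`, the fixed intensity `tolerance`, the single seed voxel and the 6-connected neighbourhood are all assumptions made for illustration.

```python
from collections import deque


def grow_volume(volume, seed, tolerance):
    """Generic 3-D region growing (illustrative assumption, not the thesis method).

    volume    -- nested list indexed as volume[z][y][x] holding voxel intensities
    seed      -- (z, y, x) voxel assumed to lie inside the VOI
    tolerance -- maximum allowed absolute difference from the seed intensity
    Returns the set of (z, y, x) voxels reachable from the seed.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    sz, sy, sx = seed
    seed_value = volume[sz][sy][sx]

    # 6-connected neighbourhood: voxels sharing a face with the current voxel.
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    region = {seed}
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in offsets:
            nz, ny, nx = z + dz, y + dy, x + dx
            # Skip neighbours that fall outside the volume.
            if not (0 <= nz < depth and 0 <= ny < height and 0 <= nx < width):
                continue
            # Skip voxels that have already been added to the region.
            if (nz, ny, nx) in region:
                continue
            # Accept the neighbour if its intensity is close to the seed's.
            if abs(volume[nz][ny][nx] - seed_value) <= tolerance:
                region.add((nz, ny, nx))
                queue.append((nz, ny, nx))
    return region


# Toy 3x3x3 volume: bright background with a small dark connected blob.
volume = [[[200] * 3 for _ in range(3)] for _ in range(3)]
volume[1][1][1] = 10
volume[1][1][2] = 12
volume[2][1][1] = 11
volume[0][0][0] = 10  # dark but not face-connected to the blob

voi = grow_volume(volume, seed=(1, 1, 1), tolerance=5)
```

In this toy example the grown region contains only the three face-connected dark voxels; the isolated dark voxel at the corner is excluded because growth proceeds only through neighbouring voxels. A real MRI pipeline would additionally need a seed selection strategy and a more robust acceptance criterion.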