Projects

'01, '02, '03, '04, '05, '06, '07, '08, '09). These projects have covered topics such as information retrieval, clustering and classification, as well as the use of multimedia data and different metadata formats.

Projects 2009

Service Oriented Architectures for Navigation and Search in heterogeneous Repositories

The goal of this project is the development of visual tools and the maintenance of the KD-Service web service. Based on the existing KD-Service framework, a client application has been developed that allows a user to interact with a series of visualisations of textual documents. To achieve this, the existing Knowledge Discovery Application has been integrated into a new application that communicates with the KD-Service backend. The KD-Service backend itself has been modified in many areas, most prominently the conversion of file formats into a textual representation and the handling of metadata. Additionally, KnowMiner and other modules, for example the information extraction framework, were modified to add new features and fix bugs.


Semantic-based Cross-Lingual Relationship Discovery in Media Repositories

In 2009, design studies for visualizations to be integrated with the Wissenswelten module of the APA online presence were developed and evaluated. An advanced visualization of the relationships between political actors was developed and integrated into the APA-Labs platform. In addition, several modules for APA's future blog analysis platform were developed and evaluated.


Visual Navigation and Manipulation of Process Knowledge Repositories

Our collaboration with AutomationX involves the development of various components for the visualization and manipulation of process-knowledge-spaces that are applied in AX Indigo 5. The problem space allows for the exploration of different aspects of visualization, e.g. the simultaneous processing and rendering of thousands of objects within the process-knowledge-space or the integration of Unicode text rendering in 3D engines. A multitude of techniques, such as scene graphs and spatial partitioning, is explored and applied; in combination, they allow the visualization of highly complex knowledge-spaces.
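
To illustrate why spatial partitioning helps when thousands of objects have to be processed at once, the following is a minimal sketch of a uniform grid in Python. It is not taken from AX Indigo or from any of the project components; the class `UniformGrid`, its methods and the cell size are hypothetical names and values chosen for this illustration only. A region query touches only the cells that overlap the region instead of testing every object.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]


class UniformGrid:
    """Hypothetical 2D uniform grid (illustration only, not project code)."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        # Maps a grid cell to the objects (id, x, y) located in it.
        self.cells: Dict[Cell, List[Tuple[int, float, float]]] = defaultdict(list)

    def _cell(self, x: float, y: float) -> Cell:
        # Floor division also handles negative coordinates correctly.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, obj_id: int, x: float, y: float) -> None:
        self.cells[self._cell(x, y)].append((obj_id, x, y))

    def query(self, min_x: float, min_y: float,
              max_x: float, max_y: float) -> Set[int]:
        """Return the ids of all objects inside the axis-aligned rectangle."""
        cx0, cy0 = self._cell(min_x, min_y)
        cx1, cy1 = self._cell(max_x, max_y)
        found: Set[int] = set()
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get avoids creating empty cells during a query.
                for obj_id, x, y in self.cells.get((cx, cy), ()):
                    if min_x <= x <= max_x and min_y <= y <= max_y:
                        found.add(obj_id)
        return found


# Usage: 10,000 objects on a regular lattice, queried by visible region.
grid = UniformGrid(cell_size=10.0)
for i in range(100):
    for j in range(100):
        grid.insert(i * 100 + j, float(i), float(j))

visible = grid.query(0.0, 0.0, 4.0, 4.0)
```

In a renderer, a query of this kind would typically select the objects inside the current view before they are handed to the scene graph; a hierarchical structure such as a quadtree or octree would be the usual alternative when object density varies strongly.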

Extraction and Analysis of Semantic Patterns in Knowledge Repositories

The goal of the project is the development of an integrated, flexible system that supports the search and analysis of heterogeneous, semantically integrated patent databases. The system should simplify and accelerate information-seeking processes and offer both automatic and visually supported means of analysis. In this way, a considerable improvement in the efficiency of business processes should be achieved. Technically, the project is based on the integration of the m2n Intelligence Management Framework, which is responsible for graph-based control of application logic, communication and heterogeneous data sources, and the KnowMiner framework, which provides knowledge discovery (clustering, classification, information extraction, projection etc.), retrieval (full-text and associative search) and support functionality (such as persistence optimized for knowledge discovery).


Cross-Modal Visual Analysis

In the course of the project, machine learning methods for classifying textual data were to be implemented and analysed. The ICG had developed classification algorithms that were to be applied to the textual domain in order to test the performance of cross-domain classification. The goal was to analyse the applicability of those algorithms to problem settings in the text domain and to conduct comparative studies with standard text classification algorithms. A gradient descent boosting algorithm developed by the ICG, “GradBoost”, was integrated into the Know Center machine learning framework (MLA). Additionally, the Know Center developed standard supervised classification algorithms (AdaBoost, Maximum Entropy, naïve Bayes, Decision Trees). Those algorithms were applied as weak learners for GradBoost. All algorithms were evaluated on the Cluto datasets. A standard implementation of a text classifier, AdaBoost, served as the baseline against which GradBoost (optimized for image data) was compared. The results revealed that the performance of GradBoost strongly depends on the weak learner. Decision stumps (the weak learner used for the image domain) resulted in weak performance on textual data. In contrast, with naïve Bayes and Maximum Entropy as weak learners, GradBoost outperformed AdaBoost. Extensive parameter studies were also conducted in order to improve the performance of GradBoost.
In the course of a PhD thesis at the ICG, an image classification approach for Web data was developed that also considers the HTML text surrounding the images. A knowledge transfer from the Know Center to the ICG supported this work with respect to text classification (bag-of-words concept, vector space model, features, algorithms). As a result of this cooperation, we agreed to perform such a classification together and to classify blogs that also contain images, that is, to exploit visual features in addition to textual features.
Further investigations should also address fraud, information quality and credibility.
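
The role of the weak learner in a boosted text classifier can be sketched as follows. This is not GradBoost, whose details are not given here, and it is not the MLA implementation; it is a minimal discrete AdaBoost (the baseline named above) over bag-of-words documents, with a decision stump on term presence as the weak learner. All names (`train_stump`, `adaboost`, `predict`) and the toy data are assumptions made for this illustration. The weak learner is passed in as a function, so a different one could be substituted.

```python
import math
from typing import Callable, List, Sequence, Set, Tuple

Doc = Set[str]                      # bag of words reduced to term presence
WeakLearner = Callable[[Doc], int]  # returns +1 or -1


def train_stump(docs: Sequence[Doc], labels: Sequence[int],
                weights: Sequence[float]) -> Tuple[WeakLearner, float]:
    """Pick the (term, polarity) stump with the lowest weighted error."""
    best = None
    for term in sorted(set().union(*docs)):
        for polarity in (1, -1):
            err = sum(w for d, y, w in zip(docs, labels, weights)
                      if (polarity if term in d else -polarity) != y)
            if best is None or err < best[0]:
                best = (err, term, polarity)
    err, term, polarity = best
    # Default arguments bind the chosen term and polarity to this stump.
    return (lambda d, t=term, p=polarity: p if t in d else -p), err


def adaboost(docs: Sequence[Doc], labels: Sequence[int], rounds: int,
             train_weak=train_stump) -> List[Tuple[float, WeakLearner]]:
    """Discrete AdaBoost for labels in {+1, -1}."""
    n = len(docs)
    weights = [1.0 / n] * n
    ensemble: List[Tuple[float, WeakLearner]] = []
    for _ in range(rounds):
        h, err = train_weak(docs, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        if err >= 0.5:                         # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Increase the weight of misclassified documents, then renormalise.
        weights = [w * math.exp(-alpha * y * h(d))
                   for d, y, w in zip(docs, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble: List[Tuple[float, WeakLearner]], doc: Doc) -> int:
    score = sum(alpha * h(doc) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


# Toy data (invented): +1 = sports, -1 = politics.
train_docs = [{"match", "goal", "team"}, {"goal", "vote"},
              {"election", "vote"}, {"parliament", "team"}]
train_labels = [1, 1, -1, -1]
ensemble = adaboost(train_docs, train_labels, rounds=3)
```

Because a stump looks at a single term only, it is a very coarse learner on sparse, high-dimensional text, which is consistent with the weak results reported above for decision stumps on textual data.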


Dynamic Ontology Matching

The goal of the MIMOS – Know-Center cooperation project is the development of semi-automatic methods for ontology mediation. In ontology mediation, corresponding concepts from different ontologies are matched with the goal of enabling interoperability between these ontologies. A client-server solution, the Semantic Mediation Tool (SMT), should serve as a proof of concept. The server component is essentially a collection of algorithms for the automated discovery of mappings between concepts. A visual client offers navigation and inspection capabilities and allows suggested mappings to be accepted or declined. In the following project phase, the focus shall lie on the development of matching algorithms, which will be evaluated and compared to the state of the art. Both the quality of mappings and the scalability to large data sets shall be addressed. In addition, the implemented client-server solution shall be tested and enhanced to reach production-level stability and quality, which also includes evaluating and improving the user interface.
An additional goal of the project was to provide a KnowMiner server package with knowledge discovery functionality available through the public Knowledge Discovery API (KD-API). For demonstration purposes, a client, the “Knowledge Discovery App” (KD-App), is supplied that exposes most of the KnowMiner functionality through a visualization-based GUI.
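
The idea of automatically suggesting mappings that a user then accepts or declines can be sketched with a simple label-based matcher. This is an assumed, minimal example and not one of the SMT algorithms: it compares normalised concept labels with the string similarity from Python's standard `difflib` module and proposes the best target concept for each source concept if the score reaches a threshold. The function names, the threshold of 0.8 and the example concepts are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple, Sequence


class Mapping(NamedTuple):
    source: str
    target: str
    score: float


def normalise(label: str) -> str:
    """Lower-case a concept label and unify common separators."""
    return label.replace("_", " ").replace("-", " ").lower().strip()


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def suggest_mappings(source_concepts: Sequence[str],
                     target_concepts: Sequence[str],
                     threshold: float = 0.8) -> List[Mapping]:
    """Suggest, for each source concept, its most similar target concept."""
    suggestions: List[Mapping] = []
    for s in source_concepts:
        best = max(target_concepts, key=lambda t: similarity(s, t))
        score = similarity(s, best)
        if score >= threshold:  # weak candidates are not suggested at all
            suggestions.append(Mapping(s, best, score))
    # Strongest suggestions first, so a reviewer sees them at the top.
    return sorted(suggestions, key=lambda m: m.score, reverse=True)


# Invented example labels from two small ontologies.
source = ["Author", "Publication_Date", "Vehicle"]
target = ["author", "publication date", "Organisation"]
suggested = suggest_mappings(source, target)
```

Pure label similarity is only a starting point; matchers that also use the structure of the ontologies or instance data would be needed for concepts whose labels differ, and comparing every pair of labels scales quadratically with ontology size.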


Topic Seismograph

The User Generated Content (UGC) Seismograph project aimed to provide Kabel TV Wien, a local information provider, with a pilot system for the analysis of UGC with a regional focus. Within the project, a number of Web sources, such as Standard Online and ORF Online, were crawled and analyzed, with a focus on comments and blog entries. Within 4 months, about 750,000 news entries, blog entries and comments were extracted in very high quality. The crawling system incrementally observes 15 UGC sources and loads about 80,000 unique entries a month. The analysis system, based on KnowMiner (4.3) and KD Visualisation (4.3), has been installed on several Windows clients, enabling Kabel TV Wien users to visually analyze the UGC data in depth. Users can search the corpus, focus on specific aspects, and approach the data hierarchically by topic and time.
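
How an incremental crawler might keep only unique entries across repeated visits to the same source can be sketched as follows. The actual crawling system is not described in detail here, so this is an assumption: each entry is fingerprinted by hashing its source name together with its whitespace-normalised, lower-cased text, and an entry is stored only if its fingerprint has not been seen before. `fingerprint`, `IncrementalStore` and the sample texts are hypothetical.

```python
import hashlib
from typing import Iterable, List, Set, Tuple


def fingerprint(source: str, text: str) -> str:
    """Hash of source plus normalised text, used to detect duplicates."""
    normalised = " ".join(text.split()).lower()
    return hashlib.sha256(f"{source}\n{normalised}".encode("utf-8")).hexdigest()


class IncrementalStore:
    """Keeps each distinct entry once, across any number of crawl runs."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.entries: List[Tuple[str, str]] = []

    def add_batch(self, source: str, texts: Iterable[str]) -> int:
        """Store unseen entries and return how many were new."""
        new = 0
        for text in texts:
            key = fingerprint(source, text)
            if key not in self.seen:
                self.seen.add(key)
                self.entries.append((source, text))
                new += 1
        return new


# Two crawl runs over the same (invented) source; the second run
# re-delivers one comment with different spacing and one new comment.
store = IncrementalStore()
first_run = store.add_batch("source-a", ["First comment", "Second comment"])
second_run = store.add_batch("source-a", ["Second  comment", "Third comment"])
```

A production crawler would more likely key on a stable identifier supplied by the source, such as a comment id or permalink, and fall back to content hashing only where no such identifier exists.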


Semantic Integration of Knowledge Bases

The increasing availability of open sources on the World Wide Web confronts the producers of encyclopedic content with new challenges. The quality of commercially generated and updated content remains a unique selling point. However, with the advent of the Semantic Web and Web 2.0, users expect content to be embedded in and interlinked with other known sources, and to be updated with a frequency that traditional encyclopedia production processes are ill-suited to match. In this project, encyclopedic knowledge bases produced by the partners have been semantically enriched and interlinked with existing external content. To this end, semi-automatic harmonization techniques and mashups were applied. The expected benefits include more up-to-date content and the embedding of content in the worldwide knowledge context.