C3PO Content Analysis


2.1 C3PO

Figure N-1. C3PO logo

Clever, Crafty, Content Profiling of Objects (C3PO) is a software tool that enables detailed content analysis of large-scale collections [Petrov, Becker 2012]. The requirements for such a tool are set out in [OCLC paper by CD,LF,KD], which states that “a content profiler has the function of aggregating and analyzing content characteristics and producing a well-specified content profile that provides a meaningful and useful summary of the relevant aspects of the content. This component has to cope with large amounts of data in the content and support the watch and planning components by summarizing the important aspects to a content profile, exposed via the Content profile interface.”
Figure N provides a high-level overview of the processes in C3PO. The tool takes the characterisation results of a digital collection from a repository as input, aggregates them, and automatically generates a profile of the content set. The output is a detailed content profile describing the key characteristics of the collection. The watch component Scout can use this content profile through the C3PO API. C3PO also provides facilities for data export and further analysis of the content, such as visualizations of the characterisation results and partitioning of the collection into homogeneous sets based on any known characteristic.
Figure N: The steps in content profiling workflow
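The aggregation and partitioning steps described above can be illustrated with a minimal Python sketch. It is not C3PO's actual implementation or API: the record fields (`id`, `format`, `size`) and the function names `build_profile` and `partition_by` are assumptions chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical characterisation results, one record per object.
# The field names are illustrative and not taken from C3PO.
records = [
    {"id": "a.pdf", "format": "PDF", "size": 1200},
    {"id": "b.pdf", "format": "PDF", "size": 800},
    {"id": "c.tif", "format": "TIFF", "size": 5000},
]


def build_profile(records):
    """Aggregate per-object characteristics into a collection-level summary."""
    formats = Counter(r["format"] for r in records)
    return {
        "object_count": len(records),
        "total_size": sum(r["size"] for r in records),
        "format_distribution": dict(formats),
    }


def partition_by(records, key):
    """Group object ids into homogeneous sets that share one characteristic."""
    parts = defaultdict(list)
    for r in records:
        parts[r[key]].append(r["id"])
    return dict(parts)


profile = build_profile(records)
partitions = partition_by(records, "format")
```

Here `profile` plays the role of the content profile and `partitions` the role of the homogeneous subsets; a real profile would cover many more characteristics than format and size.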

In the following subsections we describe in detail the components presented in Figure N.
2.1.1 Characterisation
Characterisation is the process of acquiring characteristics or properties of the content in focus. More specifically, characterisation could be split into 3 parts: identification. Necessary information about data s...


...nt Plato and the watch component Scout. Together, these tools constitute the SCAPE Planning and Watch suite. The integration with Scout (REST API) makes it possible to monitor the feature distributions of collections over time. By creating a historic profile of a collection, one can watch its growth and the changes in the distributions of key aspects, such as formats, over time. The integration with Plato relies on exporting a content profile for the whole collection or a subset of it. This profile identifies and describes the set of objects it contains and provides a statistical summary of the file format identification and the important features extracted. Plato understands this profile and uses it to obtain statistics about the content set for which a plan is being created.
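To make the idea of watching format distributions over time concrete, the following Python sketch compares two profile snapshots. It is only an illustration under stated assumptions: the snapshot dictionaries, their counts, and the function names are hypothetical and do not reflect the Scout REST API or the C3PO profile format.

```python
def relative_distribution(counts):
    """Convert absolute format counts into shares of the whole collection."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {fmt: n / total for fmt, n in counts.items()}


def distribution_shift(old_counts, new_counts):
    """Return the change in each format's share between two snapshots."""
    old = relative_distribution(old_counts)
    new = relative_distribution(new_counts)
    formats = sorted(set(old) | set(new))
    return {fmt: new.get(fmt, 0.0) - old.get(fmt, 0.0) for fmt in formats}


# Hypothetical format counts from two profiles of the same collection.
snapshot_earlier = {"PDF": 50, "TIFF": 50}
snapshot_later = {"PDF": 150, "TIFF": 50}

shift = distribution_shift(snapshot_earlier, snapshot_later)
growth = sum(snapshot_later.values()) - sum(snapshot_earlier.values())
```

A positive value in `shift` means that the format's share of the collection has grown between the two snapshots, while `growth` captures the change in the overall number of objects.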
3.2.1 Update with development in C3PO to be able to parse SB data
