Exploiting the Semantic Web to Represent Information from On-line Collaborative Learning

In this paper we propose a framework for modeling, representing, populating and enriching information from online collaborative sessions within Web forums. The main piece of the framework is an ontology called Collaborative Session Conceptual Schema (CS 2 ) that allows for specifying collaborative sessions. The paper describes the information this ontology needs to represent, the alignment of the ontology with the ontologies of relevant specifications, how the ontology can be automatically populated from the data existing in forums, and how to model such data about what is happening during the collaboration by using a dialogue-based model. This model is based on primitive exchange moves found in any forum post, which are then categorized at different description levels with the aim of effectively collecting and classifying the type and intention of the forum posts. An experiment has been conducted to assess the validity and usefulness of the presented approach. The research reported in this paper is currently undertaken within an FP7 European project called ALICE.


Introduction
There has been a great effort in the Semantic Web community to provide specifications, standards and ontologies that facilitate semantic processes on the web. 1 In the particular field of e-Learning, quite a few ontologies 1,2 and related standards 3 concerning the representation of on-line collaborative learning processes have been defined so far. Representative approaches 4 propose to use the actions performed in the collaborative learning system to build a high-level representation of the process of collection and analysis of the interaction data. As Ref. 2 states, with a well-defined ontology structure, collaborative learning can accumulate the knowledge representation of learning objects and their use, including participant background, instruction designs, learning activities and outcomes, etc.
In addition, the use of new virtual learning environments, such as Personal Learning Environments, 5 provides a new paradigm of learning, where all the information generated during the learning process is digitally stored. In that context, specifications that allow for representing such information unambiguously are really important, as are automatic algorithms that use such information to produce useful services that improve the learning process and experience.
In order to support the specification of the collaborative activities that occur during the learning experience we can use some of the current specifications and standards that exist in the field of e-Learning, such as Learning Object Metadata (LOM) i , Dublin Core ii , Friend Of A Friend (FOAF) iii and Semantically-Interlinked Online Communities (SIOC) iv .
The aim of this work is to leverage these standards as much as possible with the purpose of modeling and representing information of collaborative learning activities in the context of online forums. We then collect and classify the interaction occurring and registered in the context of online forums according to the classes and relationships of our ontology. This can significantly improve the way a collaborative system used for learning and instruction collects, in an efficient manner, all the necessary information produced from the user-user and user-system interaction. 6 The ultimate aim of this approach is to provide an efficient and robust computational methodology that enables the effective collection and classification of data as part of the research undertaken in an FP7 European project called ALICE v . 7
The paper is structured as follows: The second section overviews domain ontologies, standards and specifications relevant in the context of on-line collaborative learning. The third section presents a framework for representing the information related to collaborative learning activities that have been conducted within online forums. The framework includes an ontology that has been aligned with the most relevant specifications of the domain and a layer that allows for populating such ontology from Web forums in an automatic way. The fourth section shows how the information generated in collaborative learning forums can be captured and classified at several description levels. Later, the experiment conducted to validate the approach is presented and the validation is addressed in sections 5 and 6. Finally, the last section concludes the paper by summarizing the main ideas and outlining ongoing and future work.

Domain Ontologies, Specifications and Standards in the context of Collaborative Learning
Nowadays we can find thousands of different domain ontologies. Finding the right ontology has become a challenge and several search engines, such as Swoogle, 8 have appeared in order to facilitate the ontology search.
Representative approaches in the field of e-Learning include Ref. 2, which uses a combination of a general domain ontology describing the common semantics needed for the implementation of a collaborative environment with several domain ontologies that are used to provide a framework for end-user tools. Ref. 4 proposes to use the actions performed in the collaborative learning system so as to build a high-level representation of the process of collection and analysis of the interaction data. In Ref. 9 a theory-oriented interaction analysis approach based on theories of collaborative learning is provided. However, the social processes happening behind real collaborative learning practices are very complex and subjective, and thus they fall far from the holistic view proposed by standards and ontologies. 10 As Ref. 2 states, with a well-defined ontology structure, collaborative learning can accumulate the knowledge representation of learning objects, including participant background, group information, instruction designs, learning activities, learning outcomes, etc.
v ALICE project web site: http://www.aliceproject.eu
A further innovation presented in this paper is the incorporation of a machine-learning approach 11 to automatically analyze the interactions carried out by users during collaboration. The idea is to learn the relationship between a set of contribution types and the perceived intention of their authors, and to enrich the ontology with this information. As far as we know, the automatic evaluation of online discussion contributions has been little investigated. Quite a few research studies 12,13,14 show a first step in this direction by combining several quantitative analyses of threaded discussions.
In order to specify the collaborative activities that occur during the learning experience, we use some of the current specifications and standards. Some of the most relevant standards are described in the next subsections.

Semantically-Interlinked Online Communities
The SIOC 15 initiative (Semantically-Interlinked Online Communities) aims to enable the integration of online community information. Online community sites (weblogs, message boards, wikis, etc.) contain a valuable source of information and are candidates to be searched when we need some information. However, online community sites are like islands without bridges connecting them. SIOC is an attempt to link these online community sites, by using Semantic Web technologies to describe the information that communities have about their structure and contents, and to find related information and new connections between content items and other community objects.
SIOC provides a Semantic Web ontology for representing rich data from the Social Web in RDF: the SIOC Core Ontology. It is the foundation for Semantically-Interlinked Online Communities and can be used to express information contained within community sites in a simple and extensible way. The SIOC ontology is published as a W3C Member Submission vi . SIOC is commonly used in conjunction with the FOAF vocabulary for expressing personal profile and social networking information. SIOC enables semantic applications to be built on top of the existing Social Web.

Published by Atlantis Press
Copyright: the authors

Friend Of A Friend
FOAF 16 stands for Friend Of A Friend and is a specification that describes a language devoted to linking people and information using the Web, regardless of where the information is or the format in which the data is structured. FOAF is a recommendation of W3C in constant evolution since its creation (mid-2000). It has a stable core of classes and properties that will not be changed, while new terms may be added at any time.
FOAF descriptions are published as linked documents on the web, using RDF/XML or RDFa syntax. These descriptions contain only various kinds of things, which are called classes, and links, which are called properties. FOAF allows for describing people, groups and documents. The main FOAF terms are grouped in the following categories:
Core: describes characteristics of people and social groups that are independent of time and technology.
Social Web: describes internet accounts, address books and information related to social web activities.
Linked Data Utilities: FOAF is part of the Linked Data community vii and, therefore, needs to establish simple factual data via a network of linked RDF documents.

Simple Knowledge Organization System
The purpose of SKOS is to support the use of knowledge organization systems within the context of the Semantic Web. It is currently a W3C recommendation and its specification 17 is stable and written in RDF. Due to the tight integration of SKOS with other W3C specifications, such as FOAF and SIOC, the use of SKOS is advisable when reusing W3C specifications.

Other Relevant Specifications and proposals
There are other specifications and ontological frameworks that could be used for representing information about collaboration in e-Learning, but they are too specific to be taken into account or are still too preliminary.
Perhaps the most relevant discarded ontological frameworks are MOAT, 18 which stands for Meaning Of A Tag and is a framework that allows for giving semantics to tags, and the mIO! Ontology network, 19 which defines a network of ontologies for representing knowledge related to the user context. MOAT would be useful to define and disambiguate the semantics of the tags used within forums by users. In addition, mIO! defines a set of ontologies for specifying the following information about users: where they are, what they like, what kind of services they can produce or consume, social information about them and what devices they use to connect to the Internet.
Other ontologies that could be taken into account in the specification of collaborative sessions are the ones describing communities of practice. 20,21,22 However, these ontologies are more focused on the user than on the interactions the user makes. Therefore, they have proven to be less useful than the actual W3C specifications that deal with online communities and user representation.

A Framework for Representing Information about Collaborative Learning Sessions
This section presents an ontology created with the purpose of representing information from collaborative sessions, how this ontology has been integrated with relevant specifications such as SIOC and how the ontology can be automatically populated from web forums. The created ontology is named Collaborative Session Conceptual Schema (CS 2 ) and aims at representing the collaborative sessions in which several actors have taken part during their learning experiences within forums. SIOC and other specifications define some concepts that are relevant in the ontology domain. In order to improve the generality of our approach and provide interoperability with the most prominent standards and specifications, we align the RDF version of our ontology to the RDF versions of such standards/specifications. With that objective in mind, CS 2 can be stored in or imported from files in CSML format (Collaborative Session Markup Language), which is in turn based on the RDF representation of the CS 2 ontology but aligned with the SIOC ontology. The objective of this research is to create an ontology that can be populated from either forums or the specification of forums written in some of the aligned formats, such as SIOC. In order to do so, we implemented some facilities that help the process of converting the information of particular kinds of forums to CSML format. Figure 1 shows the CS 2 ontology in the context of the framework used to facilitate its population from forums and other specifications.
The next section describes the created ontology. Thereafter, the RDF representation of CS 2 is presented, giving information about its alignments with SIOC, FOAF and SKOS concepts and relationships. Finally, the layer used to automatically populate the ontology is described.

CS 2 Ontology: the conceptual model
Although an obvious alternative was to use SIOC directly, extending it with the classes, relationships and constraints that are necessary to fully represent collaborative sessions, the SIOC specification was found to be considerably more complex than what is needed for representing collaborative sessions from web forums. Hence, the decision was to create a more compact ontology, the above-mentioned CS 2 , reusing the definitions of the SIOC ontology relevant to our domain when possible.
The decision to create a new ontology, rather than using SIOC directly, came after thorough deliberation in which the following aspects were taken into account:
Simplification: a more concise ontology is easier to understand and use, and allows its knowledge to be dealt with more efficiently.
Extensibility: having an ontology closer to the learning context facilitates its extension with information related to different learning aspects, such as the learning objects used by users, or the learning paths and school curriculum the users follow.
Reusability: in order to maintain compatibility with SIOC, the CS 2 data model can be imported from and exported into SIOC. Collaborative sessions from our framework can then be accessed using SIOC, and SIOC data can be accessed by our system using the CS 2 ontology. Therefore, the fact that we use a different internal model is transparent to SIOC users.
Compatibility: having our own ontology is a way of ensuring that the ontology will remain valid in the future, even when related specifications change. If SIOC evolves and, as a consequence, some of its classes become deprecated (as already happened with the User class), only the rules that define the alignment of the ontology with SIOC must be updated; the CS 2 ontology and the processes that use its information will not need any update.
As stated above, CS 2 represents information about collaborative sessions. A collaborative session in e-Learning settings can be seen as a set of activities performed by several users playing several roles to achieve a common result. We are especially interested in the collaborative sessions that occur in a virtual environment, such as chats or forums. As we can see in Figure 2, the main entity of the CS 2 ontology is CollaborativeSession, which occurs within a site. A list of users, represented by the class UserAccount, can collaborate in the session with different roles. Each piece of communication within a session is represented by a post, which is created by one of the collaborators and may be categorized. The posts are related to each other in a threaded structure through the replies relation.
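As an illustration only, the entities described above can be sketched as plain Python data classes. The class names follow the text (CollaborativeSession, Site, UserAccount, Post), but every attribute name and type below is an assumption made for this sketch and not the definition given in Figure 2.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Site:
    name: str  # hypothetical attribute


@dataclass
class UserAccount:
    account_id: str  # hypothetical attribute


@dataclass
class Post:
    post_id: str
    creator: UserAccount
    content: str
    # A post may be categorized; plain strings are used here for simplicity.
    categories: List[str] = field(default_factory=list)
    # The "replies" relation: the post this one answers, if any.
    reply_of: Optional["Post"] = None


@dataclass
class CollaborativeSession:
    session_id: str
    site: Site
    # Roles played by each collaborator, keyed by account identifier (assumed layout).
    roles: Dict[str, List[str]] = field(default_factory=dict)
    posts: List[Post] = field(default_factory=list)

    def thread_depth(self, post: Post) -> int:
        """Number of reply links between a post and the root of its thread."""
        depth = 0
        while post.reply_of is not None:
            post = post.reply_of
            depth += 1
        return depth
```

Keeping reply_of as a direct reference to the parent post is one simple way to preserve the threaded structure; a real implementation could equally store identifiers.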

Aligning CS 2 to other relevant specifications
As stated above, CS 2 should be aligned with the SIOC specification so as to contain some of the elements defined in this and other related specifications, such as FOAF or Dublin Core. As an RDF version of the CS 2 ontology, we decided to define a specific, purpose-delimited language, called Collaborative Session Markup Language (CSML), for representing collaborative sessions and aligning these sessions with the classes and properties from SIOC that are useful for the representation of collaborative sessions. In order to maintain compatibility with SIOC, the CS 2 conceptual model can be imported from (and exported into) CSML. Table 1 presents the SIOC classes that have been found relevant to represent collaborative sessions and therefore have been aligned to CSML. As can be seen in table 1, the SIOC Core ontology includes classes relevant to our purpose of modeling data coming from collaborative learning sessions (e.g. discussion forums), such as Forum, Item, Post, Thread, and so on. In addition, the SIOC types ontology defines classes that represent different kinds of Containers, Forums, Items and Posts. Some examples of these classes are: Address Book, Image Gallery, Wiki, Chat Channel and Message Board. For the sake of simplicity, we will work at the level of forums and posts at this stage, but will consider using their subclasses in further versions. The SIOC types ontology also includes two classes used to define post topics: Category and Tag. Category is defined as a subclass of a SKOS Concept.
For simplicity, literal topics will be used in CSML, with the possibility of extending the language in the future with these two classes. On the other hand, the User class from the SIOC specification is not used as it is deprecated: UserAccount is used instead. The SIOC specification contemplates using some elements from other ontologies, such as FOAF or SKOS. Moreover, some of the SIOC elements are defined as subclasses or subproperties of ontologies like FOAF or Dublin Core. Again, for the sake of simplicity, CSML properties are restricted to the subset of SIOC properties defined in table 2. Therefore, we need to include in CSML concepts from other ontologies that take these mechanisms into account, such as the class Person from FOAF, which can help describe elements of UserAccount type with properties such as firstName and lastName, and also some elements from Dublin Core Terms (e.g. date, title, description or subject). Figure 3 shows the main classes and properties of the SIOC Core Ontology aligned with CSML. Note that, in order to improve readability, we have omitted inverse relationships in figure 3. Therefore, it should be assumed that there is an inverse relationship for each of the relationships drawn in the figure.
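To make the alignment more concrete, the following sketch writes one post and its author as N-Triples using terms from the public SIOC, FOAF and Dublin Core Terms vocabularies. It is not the CSML serialization itself, which is not reproduced here; the function names and the example.org resource identifiers are hypothetical, and the choice of properties is an assumption guided by the classes and properties named in the text.

```python
from typing import List, Optional

# Public vocabulary namespaces.
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslash, quote and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def describe_post(post: str, account: str, person: str, title: str,
                  content: str, first_name: str, last_name: str,
                  reply_of: Optional[str] = None) -> List[str]:
    """Return N-Triples lines for a post, its user account and the person behind it."""
    triples = [
        (iri(post), iri(RDF_TYPE), iri(SIOC + "Post")),
        (iri(post), iri(DCTERMS + "title"), literal(title)),
        (iri(post), iri(SIOC + "content"), literal(content)),
        (iri(post), iri(SIOC + "has_creator"), iri(account)),
        (iri(account), iri(RDF_TYPE), iri(SIOC + "UserAccount")),
        (iri(account), iri(SIOC + "account_of"), iri(person)),
        (iri(person), iri(RDF_TYPE), iri(FOAF + "Person")),
        (iri(person), iri(FOAF + "firstName"), literal(first_name)),
        (iri(person), iri(FOAF + "lastName"), literal(last_name)),
    ]
    if reply_of is not None:
        # Threaded structure: link the reply to the post it answers.
        triples.append((iri(post), iri(SIOC + "reply_of"), iri(reply_of)))
    return [" ".join(t) + " ." for t in triples]
```

The inverse relationships mentioned above are left out of the sketch, in the same way they are omitted from figure 3.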

Populating the CS 2 Ontology from forums
We now proceed with filling the ontology instances with the appropriate data collected and classified during the collaboration. As is fully explained in Section 4, this data will afterwards be transformed into useful knowledge about what is happening during the collaboration by means of analysis techniques.
To this end, we base the collection and classification of data into our ontology on the interaction that occurs and is registered in the context of online forums. The focus is on student interaction among peers driven by posts in online forums, which is the cornerstone of this approach. Participants indeed need to interact with each other to plan an activity, distribute tasks, explain, clarify, give information and opinions, elicit information, evaluate and contribute to the resolution of problematic issues, and so on.
The proposed architecture defines a layer of converter components (see the conversion layer of figure 1), each of which converts collaborative session data from a different web forum into instances of CS 2 representing the same knowledge. Each converter maps the data from the corresponding data source into CS 2 entities, which in the end are stored in a CSML file. As we can see in figure 4, converters are defined as black boxes with a common interface that provides basically two interaction points:
Available Collaborative Sessions, which returns a list of the collaborative sessions available on the data source to convert. It does not return all data from the collaborative sessions, only descriptive information.
Read Collaborative Session, which, given a collaborative session identifier, returns all the information of the corresponding collaborative session in the CS 2 ontology.
The conversion process done by each specific converter component can be viewed as a mapping between two data models (the original data source schema and CS 2 ) following a set of predefined mapping rules. These rules will vary depending on the converter being developed.
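The common converter interface with its two interaction points could look roughly like the sketch below. The class and method names, the dictionary-based return values and the in-memory data source are all assumptions made for illustration; the actual interface is the one depicted in figure 4.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ForumConverter(ABC):
    """Black-box converter with the two interaction points described above."""

    @abstractmethod
    def available_collaborative_sessions(self) -> List[Dict[str, str]]:
        """Return descriptive information only, not the session contents."""

    @abstractmethod
    def read_collaborative_session(self, session_id: str) -> Dict[str, object]:
        """Return all the information of one collaborative session."""


class InMemoryForumConverter(ForumConverter):
    """Toy converter over a dictionary standing in for a real forum database."""

    def __init__(self, forums: Dict[str, Dict]) -> None:
        self._forums = forums

    def available_collaborative_sessions(self) -> List[Dict[str, str]]:
        return [{"id": forum_id, "title": forum["title"]}
                for forum_id, forum in self._forums.items()]

    def read_collaborative_session(self, session_id: str) -> Dict[str, object]:
        forum = self._forums[session_id]
        # Mapping rules of this toy converter: board -> site, message -> post.
        posts = [{"creator": m["author"], "content": m["text"]}
                 for m in forum["messages"]]
        return {"id": session_id, "site": forum["board"],
                "title": forum["title"], "posts": posts}
```

A caller would first list the available sessions, pick an identifier, and then read the full session; only the second call touches the message data.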

In order to exemplify the conversion process, table 3 presents an excerpt of the rules that allow populating CS 2 from a specific forum of an e-learning system called Intelligent Web Teacher (IWT) 38 . These rules have been defined as a mapping between CS 2 and the data model of the IWT forum. For a given forum with identifier idF, chosen after calling the service Available Collaborative Sessions, the conversion rules of table 3 are executed to select the data from the IWT database to be converted. First, the Board containing the forum with the identifier idF is converted to a Site concept of CS 2 . Then, the forum with code idF is converted to a CollaborativeSession, and a site relationship is created with the selected Site. Thereafter, the topics of the forum are converted to Categories and related to the CollaborativeSession created. In order to maintain the threaded structure of IWT forums, the messages of each topic are selected and converted to Post concepts using a preorder selection algorithm. Next, the users who are assigned to the selected Forum are converted to UserAccount and related to the CollaborativeSession. Also, the authors of the selected messages are selected and related to the authored Posts. Finally, the Groups of the users that are assigned to the selected Forum are selected as the Roles of those users.
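The preorder selection of messages mentioned above can be sketched as follows. The flat message records with id, parent_id, author and text fields are a hypothetical stand-in for the IWT data model, which is not reproduced here; only the traversal order is the point of the example.

```python
from typing import Dict, List, Optional


def messages_to_posts(messages: List[Dict]) -> List[Dict]:
    """Convert flat forum messages into posts, visiting each thread in preorder.

    Each message is a dict with hypothetical keys: id, parent_id (None for a
    thread root), author and text.
    """
    # Group messages by parent so that the children of a message are easy to find.
    children: Dict[Optional[str], List[Dict]] = {}
    for message in messages:
        children.setdefault(message["parent_id"], []).append(message)

    posts: List[Dict] = []

    def visit(message: Dict) -> None:
        # Preorder: emit the message first, then its replies, depth first.
        posts.append({
            "id": message["id"],
            "creator": message["author"],
            "content": message["text"],
            "reply_of": message["parent_id"],
        })
        for reply in children.get(message["id"], []):
            visit(reply)

    for root in children.get(None, []):
        visit(root)
    return posts
```

Because every post is emitted before its replies, the post referenced by reply_of always exists by the time a reply is created, which keeps the threaded structure intact.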
As shown in table 3, the conversion process is not deterministic. Some elements can be converted in different ways (in the example, the creation date of a post can be extracted from either the attribute posted or the attribute edited), and the semantics of the forum and the ontology may be slightly different (the roles of users may mean different things in different kinds of forums). Also, data from forums may not be accurate enough (in the case of the example, IWT forums do not allow identifying which role a user has used to create a post; that is problematic if we take into account that users may have multiple roles). The problems that can appear when creating a new converter depend on the internal representation and the functionalities of each forum and can only be foreseen after a deep study of the data model of the forum to translate. Up to now, the designer of the converter has been responsible for the mapping process and for resolving the conflicts that appear. However, as more converters are created and more information is gathered about the potential problems and solutions, some systematization is expected.
Although all converters have a common structure and share the same tasks (read data from the data source, and create instances of CS 2 entities), the implementation of each one depends on the type of data source being used. So far, converters for the Web forums depicted in figure 1 have been implemented for reference and validation purposes. It is planned to build converters for other forums.

Modeling Collaborative Interaction Data at different Description Levels
This section presents a methodological approach for modeling interaction data from collaborative learning activity that can be used in Web forums (see Ref. 23 for a complete description of the approach).The aim is to improve the available information represented by the ontological approach presented in the previous section.
The model proposed here is based on the integration of several models and methods: the Negotiation Linguistic Exchange Model; 24 a model of Discourse Contributions; 25 and the types of learning actions underlying a participant turn. 26 The structure of a long interaction is constructed cooperatively by using the exchange as the basic unit for communicating knowledge. Following Ref. 24, three general exchange structure categories are considered (see also table 4): give-information exchange, elicit-information exchange and set-up-an-issue exchange, which consist of different types of moves 27 and describe a generic discourse goal. More specifically, the goal of the actor who initiates the give-information exchange is to inform his/her partners about a certain situation with the aim of changing the partners' mental states. Informing includes moves that explain, give an opinion, describe or remind of a situation in different ways. The actor's goal in the elicit-information exchange is to elicit the partners' state of mind (knowledge, beliefs, attitude, desire or abilities) about a situation of which the actor is not aware or certain. The actor's goal in the set-up-an-issue exchange is to raise an issue (a problem or question) to be resolved by the participants through the corresponding provide-solution and consent-solution exchanges, which cause them to explore their state of mind (knowledge, beliefs, etc.).
Based on the work of Refs. 26, 28 and 29, partners are involved in a process of realizing a number of learning actions which lead to the completion of the exchange goal. Each move type captures and controls the evolution of the learning action performed by a participant by setting the expectations about the type of learning actions which have to be realized next by the other participants so that the goal set by the initial move can be accomplished.
Completion of an exchange expresses the mutual beliefs of all participants about the accomplishment of its discourse goal.Moreover, it implies the achievement of a certain degree of knowledge building and distribution among the different participants.This degree can be deduced and measured by exploring the principal interaction indicators proposed by this model.For each participant the model measures: the total number of moves created, his/her participation behavior (proactive, reactive, supportive, or passive), the effectiveness and impact that each move has in the discourse and in the achievement of the current discourse goal, as well as the evaluation of the move content and significance by his/her peers and the tutor.
Consequently, interaction analysis takes into account both the way the interaction is structured and the types of contributions or posts, which are represented by the ontological approach presented in the previous section and particularized in table 4. The analysis results yield very useful conclusions on aspects such as individual and group working, dynamics, performance and success, which allow the tutor to obtain a global account of the progress of the individual and group work and thus to identify possible conflicts and monitor the whole learning process much better.
Table 4. List of moves (turns) and cards to classify a discussion contribution.
To manage and provide adequate information and knowledge from collaborative learning tasks in a computational manner, we propose a process with three separate, necessary steps: collection of information, analysis and presentation. The entire process fails if any one of these steps is omitted. 6 Based on the linguistic model described so far, the first step of this process is to structure and classify the source information available in Web forums (e.g., users, posts, exchange types, etc.). All this information is then modeled and represented in our CS 2 ontology and, in the second step, it can be analyzed in order to extract knowledge about the collaborative learning and the participants. Finally, in the last step, this knowledge is presented to participants either in real time (to directly guide students during the learning activity) or after the task is over (in order to understand the collaborative process).
Next, we describe the first step of this process and how the source information is structured and classified before being represented by our ontological representation.The last two steps of the process are assumed by the specific methodology and software supporting the collaborative learning and hence they are out of scope of this paper (see Ref. 23 for a full description).

Collection of information
To satisfy course evaluation requirements, discourse contributions need to be evaluated as effectively as possible in terms of quality and usefulness.Evaluation of hundreds of contributions and the relations among them in a multi-member discussion can be a tedious task for tutors and should be adequately supported.Moreover, self and peer evaluation should be also encouraged and facilitated by intuitive means.
To this end, in this section, we first provide an intuitive procedure to manually qualify the exchange type of interactions.This manual procedure is then replaced with a machine-learning approach to automatically classify posts (see Ref. 30).Similarly, peer manual evaluation could be also replaced with an automatic rating system.

Manual procedures
A set of thematic annotation cards based on the general exchange types identified previously can be used to qualify each exchange move in the discussion process, namely give-information, elicit-information and raise-an-issue. Consequently, participants are urged to qualify their contributions by using these annotation cards before sending a new post or a reply (see figure 5 for a software representation).
In order to avoid unnecessary choice, each context of the discussion process determines a short and precise list of just those categories that are possible at a certain point of the discussion process (e.g., when replying to any kind of request, just the cards involving the provision of information are offered to classify the reply). This makes the choice of the appropriate tag shorter and easier. Finally, the participant is required to perform certain actions to indicate that s/he has read a certain post, such as sending a reply or assenting to the contribution. The aim is both to provide reliable indicators of the number of posts read and to promote the discussion's dynamics by increasing the users' interaction with the system (see Ref. 23 for a complete description and validation of the approach).
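The context-dependent card list can be illustrated with a small lookup. The three card names are the exchange types named in this section; the rule table holds only the single rule stated in the text (a reply to a request is limited to information-providing cards), and treating every other context as unrestricted is an assumption of this sketch, not the behaviour of the actual system.

```python
from typing import Dict, List, Optional

# Exchange types named in the text, used here as card identifiers.
ALL_CARDS: List[str] = ["give-information", "elicit-information", "raise-an-issue"]

# Cards offered when replying to a post tagged with the given card.
# Only the rule for requests comes from the text; other contexts are left open.
REPLY_RULES: Dict[str, List[str]] = {
    "elicit-information": ["give-information"],
}


def cards_for(parent_card: Optional[str]) -> List[str]:
    """Return the cards offered for a new post (None) or for a reply to a tagged post."""
    if parent_card is None:
        return list(ALL_CARDS)
    return list(REPLY_RULES.get(parent_card, ALL_CARDS))
```

Returning a copy of the list keeps callers from accidentally modifying the shared rule table.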

Automatic post classification
A further innovation for the reliable collection of data is to automate the manual post tagging (see figure 5) so as to both minimize the errors of manual tagging and relieve students of unnecessary choice. To this end, different kinds of classification algorithms, such as those presented in Ref. 31, can be used to learn the relation between a set of types of interaction and the perceived intention of the authors of these interactions.
We explore the possibility of automatically categorizing the posts into the 6 different exchange moves described above (see table 4). Although the design of optimal classifiers is out of the scope of this paper, the proposed methodology would benefit from a first categorization approach (see Ref. 32 for a complete description of the approach).
Fig. 5. A list of tags to manually qualify a contribution.
Following the similar work of Ref. 14, a feature vector is constructed for each post using the following methodology: (i) first, a list with all the words present in all the posts is generated; (ii) from this list, we remove the words that appear only once, in order to mitigate the effects of orthographic errors; (iii) using the resulting words, the frequency count of each word in each text is computed, obtaining for each post a feature vector with one dimension per retained word.
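Steps (i) to (iii) can be sketched in a few lines of standard-library Python. The tokenizer (lowercasing and keeping runs of letters) and the alphabetical ordering of the vocabulary are assumptions of this sketch; the text does not specify how words were extracted or ordered.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Very simple tokenizer (an assumption): lowercase runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_feature_vectors(posts: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return the retained vocabulary and one word-count vector per post."""
    tokenized = [tokenize(post) for post in posts]

    # (i) list of all the words present in all the posts, with total counts
    totals = Counter(word for tokens in tokenized for word in tokens)

    # (ii) drop the words that appear only once across the whole collection
    vocabulary = sorted(word for word, count in totals.items() if count > 1)

    # (iii) frequency count of each retained word in each post
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors
```

For example, with the posts "the forum is open", "the forum is closed now" and "open the door", the retained vocabulary is forum, is, open, the, and the three vectors are [1, 1, 1, 1], [1, 1, 0, 1] and [0, 0, 1, 1].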
The resulting data lie in a high-dimensional space, which hinders the subsequent estimation of the classifier parameters. In order to mitigate this drawback, a prior dimensionality reduction step is applied, using the Principal Component Analysis algorithm 33 to extract the first n components, which account for most of the data variance.
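For reference, the reduction step can be written in the standard PCA formulation. The notation is assumed here rather than taken from the paper: $v_1, \dots, v_M$ are the $d$-dimensional count vectors of the $M$ posts, $\mu$ is their mean, $S$ their sample covariance, and $u_k$, $\lambda_k$ the eigenvectors and eigenvalues of $S$.

$$
\mu = \frac{1}{M}\sum_{j=1}^{M} v_j, \qquad
S = \frac{1}{M-1}\sum_{j=1}^{M} (v_j - \mu)(v_j - \mu)^{\top}, \qquad
S\,u_k = \lambda_k u_k, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0
$$

$$
x_j = U_n^{\top}(v_j - \mu), \qquad U_n = [\,u_1, \dots, u_n\,], \qquad
\text{fraction of variance retained} = \frac{\sum_{k=1}^{n}\lambda_k}{\sum_{k=1}^{d}\lambda_k}
$$

Each post is thus represented by the $n$-dimensional vector $x_j$, and $n$ is chosen so that the retained fraction of variance is large.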
Using the final n-dimensional feature vectors, a state-of-the-art SVM classification algorithm is applied to the obtained posts. Briefly, the SVM algorithm 34 learns a binary classifier (two possible classes, a positive one and a negative one) from the training data. This classifier consists of a separating hyperplane that maximizes the classification margin. Thus, a new post x is classified into the positive or the negative class according to a decision rule of the type

$$
f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} w_i \, K(x, x_i) \right)
$$

where $(x_1, \dots, x_N)$ are the training samples, $(w_1, \dots, w_N)$ are the parameters of the classifier, and K denotes a kernel function (or the dot product in the linear case). In our problem of post classification, the amount of data available is usually large and sparse, most of the frequency counts being 0. In this scenario, we opted for a non-linear version of the SVM classifier, based on the application of Radial Basis Function kernels (RBF-SVM), 35

$$
K(x, x_i) = \exp\left( -\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}} \right)
$$

with $\sigma$ being a parameter that determines the area of influence of the SVM over the data space. The extension of the SVM algorithm to multi-class problems (more than 2 classes) can be carried out by the one-versus-all strategy. 36
In order to validate the automatic classification procedure, the following protocol was followed: the data were randomly split into a training set (90% of the data) and a testing set (the remaining 10%). The amount of data from the different classes was balanced in the partitions. We used the training set to learn the RBF-SVM classifier, using a portion of this set to find the optimal sigma and C parameters.
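The kernel-based decision rule and its one-versus-all extension can be sketched as follows. This is only an illustration of how a trained classifier is evaluated, not of the training itself: the weights are hand-picked, the class names `request` and `inform` are hypothetical, and the bias term used by many SVM formulations is omitted to match the form described in the text.

```python
import math


def rbf_kernel(x, z, sigma):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def decision_value(x, samples, weights, sigma):
    """Weighted sum of kernel evaluations against the training samples."""
    return sum(w * rbf_kernel(x, s, sigma) for s, w in zip(samples, weights))


def classify_binary(x, samples, weights, sigma):
    """Positive class (+1) if the decision value is non-negative, else -1."""
    return 1 if decision_value(x, samples, weights, sigma) >= 0 else -1


def classify_one_versus_all(x, samples, weights_per_class, sigma):
    """Pick the class whose binary classifier yields the largest value."""
    return max(
        weights_per_class,
        key=lambda label: decision_value(x, samples, weights_per_class[label], sigma),
    )


# Hypothetical, hand-picked values; a real model would learn the weights.
samples = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
weights_per_class = {
    "request": [1.0, 0.5, -1.0],
    "inform": [-1.0, -0.5, 1.0],
}
predicted = classify_one_versus_all([3.8, 4.1], samples, weights_per_class, sigma=1.0)
```

A small sigma makes each training sample influence only its close neighbourhood, whereas a large sigma spreads that influence over the whole data space, which is why sigma has to be tuned together with C.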
The testing protocol was repeated 20 times, and the average accuracy obtained was 61.29% for the 6-class problem (±2.08%, 95% confidence interval). This preliminary result constitutes a promising initial attempt at the automatic classification of posts from their content. Nevertheless, as future work we plan to improve this part of the methodology by exploring other classification strategies and data normalization techniques.
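The evaluation protocol can be sketched under two assumptions that the text does not confirm: that "balanced" means each class keeps its proportion in both partitions, and that the confidence interval is a normal approximation over the repeated runs. The labels and accuracy values below are invented.

```python
import math
import random
import statistics
from collections import defaultdict


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx) keeping class proportions in both parts."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


def mean_with_interval(accuracies, z=1.96):
    """Mean accuracy and half-width of a normal-approximation 95% interval."""
    mean = statistics.mean(accuracies)
    half_width = z * statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean, half_width


rng = random.Random(0)
labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 20  # invented class labels
train_idx, test_idx = stratified_split(labels, test_fraction=0.10, rng=rng)

# Invented accuracies standing in for the results of repeated runs.
mean_accuracy, half_width = mean_with_interval([0.60, 0.62, 0.58, 0.60])
```

In the protocol described above, the split, training and testing would be repeated 20 times and the 20 accuracies passed to `mean_with_interval`.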

Experimentation
This section presents an experimental approach to evaluate the ontological framework presented previously in terms of completeness and usefulness by addressing the requirements of a newly created Virtualized Collaborative Session (VCS) system that enables the virtualization of collaborative sessions. 37
The realization of this system is first reported, starting from the requirements that guided the development of a VCS prototype in which our CS 2 ontology is embedded and populated with data that model and represent information coming from live collaborative sessions of different Web-based forums. The specification of the CS 2 data in CSML format is the input of the VCS system, which then proceeds with the virtualization process. An experiment in a real learning context is then reported for validation purposes, showing a real collaborative activity supported by the VCS system.
The purpose of this experiment is to demonstrate that our ontological framework allows for representing the relevant information about collaboration sessions underlying the content of live discussions in any Web-based forum, and that it provides all the necessary information for creating virtual collaborative sessions. The usefulness of the presented framework can be assessed by this experiment, since it provides an integrated way of accessing collaboration data regardless of where they come from (different forums) or what format they are in (forums of different kinds). Therefore, without CS 2, or an equivalent framework, the VCS could not be created.

Realization of the VCS system
We provide next the main guidelines for the realization of a VCS system (see figure 6 and also Ref. 37 for a complete description of the system). The main feature of a VCS system is to be compatible with different kinds of chats, forums and collaborative sessions in general.
For the sake of our experiments, we used two very different Web forums: the Discussion Forum (DF) 23 and the Intelligent Web Teacher (IWT). 38 As an input of the VCS system, we used an XML file containing the collaborative session data in the CSML common format (see Section 3). The CSML specifies the information found in collaborative sessions from both Web forums.
Fig. 6. Architecture of the VCS system, which is compatible with multiple forums by using specific converters.
The process of conversion between the two sources of collaborative session data and CSML was done by developing specific converters (see figure 6), which were different for each kind of source (i.e., the data models of both IWT and DF forums). Then, the VCS system processed data in CSML format and created a complex learning object named Storyboard Learning Object (SLO), 37 containing information about scenes, characters, and other artifacts used during the later visualization of this learning object. This information could be edited and played in a multimedia fashion in order to enable moderators and learners to observe the virtualized collaborative session in an interactive way.
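The converter architecture can be illustrated with a minimal sketch in which two source-specific converters feed one common serializer. Everything concrete here is hypothetical: the record layouts do not reflect the real IWT or DF data models, and the XML element and attribute names are placeholders rather than actual CSML vocabulary.

```python
from dataclasses import dataclass
from typing import Optional
from xml.etree import ElementTree as ET


@dataclass
class Post:
    """Forum-independent view of one contribution (hypothetical fields)."""
    post_id: str
    author: str
    content: str
    reply_to: Optional[str] = None


def convert_iwt(record: dict) -> Post:
    """Converter for an invented IWT-like record layout."""
    return Post(str(record["id"]), record["user"], record["body"], record.get("parent"))


def convert_df(row: tuple) -> Post:
    """Converter for an invented DF-like row layout."""
    post_id, author, text, parent = row
    return Post(str(post_id), author, text, parent)


def to_common_xml(posts: list) -> str:
    """Serialize posts into one XML document (element names are made up)."""
    root = ET.Element("session")
    for post in posts:
        node = ET.SubElement(root, "post", id=post.post_id, author=post.author)
        if post.reply_to is not None:
            node.set("replyTo", post.reply_to)
        node.text = post.content
    return ET.tostring(root, encoding="unicode")


posts = [
    convert_iwt({"id": 1, "user": "ana", "body": "Why do projects fail?"}),
    convert_df((2, "joan", "Poor requirements.", "1")),
]
xml_text = to_common_xml(posts)
```

The design point is that only the converters know about a given forum. Supporting an additional source would mean writing one more converter, while the serializer and every consumer of the common format remain unchanged.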
Overall, the VCS transformed live discussion sessions into animated storyboards to be consumed by learners, in which the sessions evolved (were "animated") over time and end-user interactions were handled. As a result, the VCS provided an attractive learning resource so that learners became more motivated and engaged in the collaborative activities (see figure 7 and Ref. 37 for a full description).
Fig. 7. Sample sequences of storyboard scenes from the VCS prototype, with a discussion evolving over time after the virtualization of two different live collaborative sessions performed in the IWT and DF forums.

Experience in a real learning context
The real context of this experience is the virtual learning environment of the Open University of Catalonia (UOC) viii. Given the added value of asynchronous discussion groups, the UOC has incorporated on-line discussions as one of the pillars of its pedagogical model. To this end, great efforts are being made to develop adequate on-line tools to support the essential aspects of the discussion process, which include students' monitoring and evaluation as well as engagement in the collaboration.
In order to evaluate the prototype of the VCS and analyze its effects on the discussion process, the sample of the experiment consisted of 81 graduate students enrolled in the course Organization Management and Computer Science Projects of the Computer Science degree at the UOC. Students were equally distributed into two classrooms and participated in the experience at the same time. Students from each classroom were required to use standard text-based discussion forums to support the same discussion with the same rules during the same time. In addition, in one of the classrooms (experimental group) the standard forum was equipped with the multimedia-based VCS tool. In the other classroom (control group) the VCS was not available.
The in-class collaborative assignment in both groups lasted three weeks in the Fall term and consisted of discussing the same issue: "Factors that lead a Computer Science project to failure". In this assignment, each student was required to post at least one contribution on the issue at hand. During the discussion, any student could contribute as many times as needed in the discussion forum by posting new contributions, replying to others, and starting extra discussion threads to provide new arguments with regard to the issue addressed. In addition, in one classroom, participants could also follow the discussion through the VCS. The aim was to evaluate the effects of the VCS system on participation by comparing the activity levels of the discussion between the two groups.

Data elaboration and interpretation of the results
The data from this experience were collected by means of the web-based forums supporting the discussions in each classroom. Moreover, specific data from the interaction with the VCS system were also collected considering the following validation criteria (see Ref.
Analyzing the results of table 5, it seems that using the VCS fosters the quantity of participation, since the number of posts is higher. On the other hand, the number of views (i.e., readings) of text posts is lower in the forum equipped with the VCS, which suggests that some of the students used the storyboard as an alternative to the text posts; this is confirmed by the data collected from the activity logs of the VCS.
Finally, participation quality is shown in terms of the number of words per post. The lower mean number of words per post in the experimental group may mean that the users of the VCS were more effective and dynamic when communicating their ideas through either new posts or reply posts. As a result, the contributions became more structured and specific, whereas the control group produced longer, monolithic, one-sided points of view.
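The three indicators discussed above (number of posts, number of views and words per post) can be computed from forum data along the following lines. This is a minimal sketch: the record layout is hypothetical and the toy data are invented, so the numbers bear no relation to the results of the study.

```python
from statistics import mean


def activity_summary(posts):
    """Number of posts, total views, and mean words per post for one group."""
    word_counts = [len(post["text"].split()) for post in posts]
    return {
        "posts": len(posts),
        "views": sum(post["views"] for post in posts),
        "mean_words_per_post": mean(word_counts),
    }


# Invented toy data, not the study's results.
control = [{"text": "one two three four five six", "views": 10}]
experimental = [
    {"text": "one two three", "views": 4},
    {"text": "one two three four five", "views": 5},
]
summary_control = activity_summary(control)
summary_experimental = activity_summary(experimental)
```

Comparing the two summaries side by side gives the kind of group-level contrast reported in table 5.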

Validation
The validity of the CS 2 ontology has been tested on three levels, namely correctness, completeness and usefulness. First, the correctness of the ontology has been verified using the reasoners available within Protégé ix over CSML, which allows us to determine that the ontology is well written on a formal level, meaning that it contains no contradictions and can therefore be instantiated. Second, the completeness of the ontology has been validated naively by the experiment presented in the previous section, which shows how the ontology has been used to represent information from several real forums of virtual environments related to computer science subjects. As demonstrated by this experiment, the current CS 2 specification allows for representing the relevant information about collaboration sessions underlying the content of the IWT and DF forums. Hence, since the ontology has been able to deal with the relevant information of the forums, we can state that it allows for representing the information of the domain we are interested in. Nevertheless, the completeness of the ontology cannot be validated formally, as it is an ontology for open environments. 39
Finally, the usefulness of the ontology has been shown by a naive validation of participation in a collaborative learning activity supported by the VCS. The ontology information has been necessary for the VCS system, since it has provided information about collaborative sessions regardless of where the information comes from and what format it originally followed. The results show a higher level of activity in the forum tool equipped with the VCS in comparison to the standard forum tool without the VCS.

Conclusions and ongoing work
This paper shows the current research work undertaken within the FP7 European project ALICE, devoted to providing on-line collaborative learning with authentic interactivity, challenging tools and user empowerment, with the ultimate aim of influencing learner motivation and engagement during the collaborative activities. To this end, a new ontology called CS 2, based on the SIOC and FOAF ontologies, was created for modeling information from online collaborative sessions within Web forums. CSML was created as the representation of CS 2 written in RDF and was aligned to the elements from SIOC, FOAF, SKOS and Dublin Core that are relevant for representing collaborative learning sessions.
ix "The Protégé Ontology Editor and Knowledge Acquisition System", retrieved from http://protege.stanford.edu
A methodology based on a dialogue model was proposed for modeling and representing the source of the collaborative interaction data in our ontology. The data collected and represented in our ontology are to be transformed later into useful knowledge about what is happening during the collaboration. This latter process is out of the scope of the research presented in this paper.
The presented ontology was validated at several levels, mostly through an experiment addressing the completeness and usefulness of the CS 2 ontology. In this experiment, the ontology was used as an input for the VCS system, which converts the collaborative communications in web forums into storyboard learning objects.
The results achieved confirm that the proposed framework was useful in the creation of the VCS system, but it presents some limitations that can restrict its usefulness in other contexts, such as the consistency and completeness of the data it contains and the usability of the framework. Data consistency problems come from the conversion process, since whoever is responsible for the conversion may populate the ontology inconsistently, thereby compromising its data. In order to alleviate this potential problem, several mechanisms may be proposed to guarantee that the converters populate the ontology consistently, such as appointing a person responsible for supervising all conversions and creating extra integrity constraints to enforce the quality of the imported data. The current framework only stores basic information about collaboration from web forums. The framework can be extended to take into account new types of information, such as opinion and sentiment information, location information or background information, and other sources, such as chats, tweets, etc.
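As an illustration of what such extra integrity constraints might look like, the sketch below checks a batch of imported posts before they are added to the ontology. The three rules (unique identifiers, a creator for every post, replies pointing to known posts) and the record fields are assumptions made for this example, not constraints defined by CS 2.

```python
def find_integrity_violations(posts):
    """Return human-readable problems found in a list of imported posts."""
    problems = []
    seen_ids = set()
    for post in posts:
        if post["id"] in seen_ids:
            problems.append(f"duplicate post id {post['id']}")
        seen_ids.add(post["id"])
        if not post.get("creator"):
            problems.append(f"post {post['id']} has no creator")
    # Second pass, so that replies may precede their parent in the input.
    for post in posts:
        parent = post.get("reply_of")
        if parent is not None and parent not in seen_ids:
            problems.append(f"post {post['id']} replies to unknown post {parent}")
    return problems


# Invented import batch with two deliberate defects.
imported = [
    {"id": "p1", "creator": "ana", "reply_of": None},
    {"id": "p2", "creator": "", "reply_of": "p1"},
    {"id": "p3", "creator": "joan", "reply_of": "p9"},
]
violations = find_integrity_violations(imported)
```

A converter could refuse to populate the ontology whenever the returned list is not empty, leaving the decision on how to repair the data to the person supervising the conversion.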
Currently, we continue to exploit our CS 2 ontology and the CSML language within the ALICE project so as to model and represent, in a common format, collaborative learning data coming from different data sources in different academic subjects and programs. The aim is to leverage the benefits of collaborative learning in on-line settings of different nature where collaboration is difficult to achieve. The ultimate aim is to provide advanced collaborative learning resources with authentic interactivity, user empowerment and challenge, thus positively influencing learner motivation and engagement in the learning process. 37

Published by Atlantis Press
Copyright: the authors

Furthermore, future work will expand the ontology to represent collaboration information from other sources, such as chats, blogs and wikis, and will consider some relevant information found in the schema of certain forums that should be included in future versions of CS 2, such as post ratings and scaffolds.
Finally, we plan to extend the CS 2 ontology in order to be able to add emotional information about the mood of the user for each post. The aim is to use opinion mining and sentiment analysis 40 in order to find out how the student feels when participating in a forum. This information can be taken into account during the learning process to improve learning and motivate students, for example, by proposing easy exercises to students who feel discouraged in order to boost their self-confidence. In that direction, we plan to study whether previous works on discovery and classification of learning disabilities 41 can be reused to find out whether a tendency of negative feelings is the result of a learning disability and to propose the best actions to perform according to the mood or limitations of each student. Thereafter, we plan to fully integrate the ontology within a virtual campus and apply the presented methodology in order to automatically infer the collaboration information generated during the virtual learning processes. Then, a set of services that use the collaboration information will be created in order to improve the learning experience of learners.

Fig. 1. Framework for representing information related to Collaborative Learning Sessions from forums.

Table 1. Classes of SIOC Core Ontology relevant to Collaborative Sessions.

| Class | Description |
| --- | --- |
| Community | Community is a high-level concept that defines an online community and what it consists of. |
| Container | An area in which content Items are contained. |
| Forum | A discussion area on which Posts or entries are made. |
| Item | An Item is something which can exist within a Container. |
| Post | An article or message that can be posted to a Forum. |
| Role | A Role is a function of a UserAccount within the scope of a particular Forum, Site, etc. |
| Space | A Space is a place where data resides, e.g. on a website, desktop, file share, etc. |
| Site | A Site can be the location of either an online community or a set of communities, with UserAccounts and Usergroups creating Items in a set of Containers. It can be thought of as a web-accessible data Space. |
| Thread | A container for a series of threaded discussion Posts or Items. |
| UserAccount | A user account in an online community site. |
| Usergroup | A set of UserAccounts whose owners have a common purpose or interest. Can be used for access control purposes. |

Table 2. Properties of SIOC aligned to CSML. Inverse properties have been omitted due to space constraints.

| Property | Description |
| --- | --- |
| | The Role a UserAccount has. |
| has container | The Container to which an Item belongs. |
| has creator | The UserAccount that made a Resource. |
| has function | A Role that a UserAccount has. |
| has host | The Site that hosts a Forum. |
| has parent | A Container or Forum that a Container or Forum is a child of. |
| has reply | Points to an Item or Post that is a reply or response to an Item or Post. |
| has scope | A resource that a Role applies to. |
| has space | A data Space which a resource is a part of. |
| topic | A topic of interest, linking to the appropriate URI, or of a SKOS category. |
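To make the role of these properties concrete, the following sketch emits N-Triples-style statements for a post and its reply using SIOC terms only. The `http://example.org/` identifiers and the helper names are hypothetical, CSML's own vocabulary is not shown, and the property names are written with underscores as in the SIOC namespace.

```python
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(subject, predicate, obj):
    """Format one N-Triples statement whose object is an IRI."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def describe_reply(base, forum_id, post_id, reply_id, user_id):
    """Triples linking a forum, a post, its reply, and the reply's author."""
    forum = f"{base}forum/{forum_id}"
    post = f"{base}post/{post_id}"
    reply = f"{base}post/{reply_id}"
    user = f"{base}user/{user_id}"
    return [
        triple(forum, RDF_TYPE, SIOC + "Forum"),
        triple(post, RDF_TYPE, SIOC + "Post"),
        triple(reply, RDF_TYPE, SIOC + "Post"),
        triple(user, RDF_TYPE, SIOC + "UserAccount"),
        triple(post, SIOC + "has_container", forum),
        triple(reply, SIOC + "has_container", forum),
        triple(post, SIOC + "has_reply", reply),
        triple(reply, SIOC + "has_creator", user),
    ]


# Hypothetical identifiers under an example.org base.
statements = describe_reply("http://example.org/", "f1", "p1", "p2", "u7")
```

Printing `statements` line by line yields a small graph that any RDF tool able to read N-Triples should accept, which is a convenient way to inspect how a converted session would look.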

Table 3. Proposed mapping rules from IWT Forums into main CS 2 elements for a given Collaborative session identified by idF.

Table 5. Results on activity levels of the discussion in both control and experimental groups.