Semantic technology represents a fairly diverse family of technologies that help machines understand the meaning of natural language.
You might think of the goal of semantic technologies as separating signal from noise. Some examples of existing semantic technologies being used today include:
- Natural-language processing (NLP). NLP technologies attempt to process unstructured text content and extract the names, dates, organizations, events, etc. that are talked about within the text.
- Data mining. Data mining technologies employ pattern-matching algorithms to tease out trends and correlations within large sets of data. Data mining can be used, for example, to identify suspicious and potentially fraudulent trading behavior in large databases of financial transactions.
- Artificial intelligence or expert systems. AI or expert systems technologies use elaborate reasoning models to answer complex questions automatically. These systems often include machine-learning algorithms that can improve the system’s decision-making capabilities over time.
- Classification. Classification technologies use heuristics and rules to tag data with categories to help with searching and with analyzing information.
- Semantic search. Semantic search technologies allow people to locate information by concept instead of by keyword or key phrase. With semantic search, people can easily distinguish between searching for John F. Kennedy, the airport, and John F. Kennedy, the president.
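The keyword-vs-concept distinction can be sketched in a few lines of Python. This is a toy illustration only: the entity records and their `type` field are invented for the example, not drawn from any real search system.

```python
# Toy illustration of keyword search vs. concept-aware (semantic) lookup.
# The entity records and their "type" field are invented for this example.
entities = [
    {"name": "John F. Kennedy", "type": "president"},
    {"name": "John F. Kennedy International Airport", "type": "airport"},
]

def keyword_search(query):
    """Plain keyword match: cannot tell the two John F. Kennedys apart."""
    return [e for e in entities if query.lower() in e["name"].lower()]

def semantic_search(query, concept):
    """Concept-aware match: the caller states which *kind* of entity is meant."""
    return [e for e in entities
            if query.lower() in e["name"].lower() and e["type"] == concept]

print(len(keyword_search("John F. Kennedy")))         # both entities match
print(semantic_search("John F. Kennedy", "airport"))  # only the airport
```

A keyword query matches both entities, while adding the concept narrows the result to exactly the one intended.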
They are implemented using many different programming languages, produce data (signal) in many different formats, rely on very different underlying formalisms, and rarely work well together without investing a significant amount of effort in integration engineering.
Semantic Web technologies—no matter what exact name is being used to refer to them—are a family of very specific technology standards from the World Wide Web Consortium (W3C) that are designed to describe and relate data on the Web and inside enterprises. These standards include:
- a flexible data model (RDF),
- schema and ontology languages for describing concepts and relationships (RDFS and OWL),
- a query language (SPARQL),
- a rules language (RIF),
- a language for marking up data inside Web pages (RDFa),
- and more.
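To get a feel for the flexible data model at the heart of this stack: RDF describes everything as subject–predicate–object triples. The sketch below imitates that shape with plain Python tuples; a real implementation would use RDF tooling such as rdflib, and the `ex:` URIs here are invented for illustration.

```python
# A tiny triple store imitating RDF's subject-predicate-object data model.
# The "ex:" URIs below are invented for illustration only.
triples = {
    ("ex:JohnFKennedy", "rdf:type", "ex:President"),
    ("ex:JFKAirport", "rdf:type", "ex:Airport"),
    ("ex:JFKAirport", "ex:namedAfter", "ex:JohnFKennedy"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    much like a variable in a SPARQL query."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# "What is ex:JFKAirport?" -> its rdf:type triple
print(match(s="ex:JFKAirport", p="rdf:type"))
```

The wildcard pattern-matching here is the same idea a SPARQL query expresses with variables over an RDF graph.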
So what precisely is the relationship between semantic technologies and Semantic Web technologies?
- Semantic Web technologies are a set of technologies that happen to be especially well-suited for implementing semantic technology algorithms and solutions.
- Collectively, Semantic Web technologies are a toolbox; as such, they can be used to implement a wide variety of algorithms, solutions, and applications. However, they are particularly appropriate for implementing semantic technologies. Consider the following examples:
- Classifying data can be accomplished very effectively by describing information using the schema and ontology languages that are part of the Semantic Web technology set.
- Semantic search requires a way to describe data conceptually and a way to search via these concepts. The Semantic Web technology stack satisfies both these conditions.
- NLP tools can identify unanticipated relationships between entities in source documents. The flexible graph-based data model that is one of the core Semantic Web standards is an ideal way of capturing all information obtained by NLP technology without the need to discard any data.
Semantic technologies are algorithms and solutions that bring structure and meaning to information. Semantic Web technologies specifically are those that adhere to a specific set of W3C open technology standards that are designed to simplify the implementation of not only semantic technology solutions, but other kinds of solutions as well.
Consider the knowledge graph as an example of semantic technology. A knowledge graph is a programmatic way to model a knowledge domain with the help of subject-matter experts, data interlinking, and machine learning algorithms. The easiest example is probably the information box you see in Google's search results.
Every company, group, or individual creates its own version of the knowledge graph to limit complexity and organize information into data and knowledge. Examples include Google's Knowledge Graph and Knowledge Vault, Microsoft's Satori, and Facebook's Entities Graph.
So there is no formal definition of a knowledge graph. In a broader perspective, a knowledge graph is a variant of a semantic network with added constraints, whose scope, structure, characteristics, and even uses are not yet fully realized and are still in development.
For example, suppose we want to collect information about Virat Kohli from the web. We would naturally go to his Wikipedia page and gather information there, but that is a lot of text to read through: the page contains many hyperlinks that could take hours to follow, and even then, how do we make this text data readable for machines? With a knowledge graph, we transform the text into something that machines can use and that we can still interpret easily.
The simplest knowledge graph consists of two nodes connected by an edge: the nodes are entities, and the edge represents the relationship between them.
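A two-node graph like this can be sketched as a single (subject, relation, object) triple; the example fact below is taken from the Sumit Nagal paragraph discussed later in this article.

```python
# Minimal knowledge graph: two entity nodes joined by one labelled edge.
knowledge_graph = [
    ("Nagal", "won", "first set"),
]

for subject, relation, obj in knowledge_graph:
    print(f"({subject}) --[{relation}]--> ({obj})")
```

A full knowledge graph is just this structure at scale: many such triples sharing nodes, so that entities become connected through their relationships.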
Going through every page and identifying entities and relationships manually is not scalable, so we need machines to do it. But how can a machine identify entities and the relationships between them? This is where natural language processing (NLP) helps.
If our machine can understand natural language, it can build a knowledge graph. For that we use various NLP techniques, such as sentence segmentation, dependency parsing, part-of-speech tagging, and entity recognition.
The first thing we need to do is split the document or article into sentences, and then shortlist only the sentences that contain exactly one subject and one object.
The subject is the person or thing doing something, and the object is the thing having something done to it. Just remember the sentence "I love you": "I" is the subject of the sentence, and "you" is the object of the sentence (and also the object of my affection).
Consider this sample paragraph:
“Indian tennis player Sumit Nagal moved up six places from 135 to a career-best 129 in the latest men’s singles ranking. The 22-year-old recently won the ATP Challenger tournament. He made his Grand Slam debut against Federer in the 2019 US Open. Nagal won the first set.”
Let’s split the paragraph above into sentences:
- Indian tennis player Sumit Nagal moved up six places from 135 to a career-best 129 in the latest men’s singles ranking
- The 22-year-old recently won the ATP Challenger tournament
- He made his Grand Slam debut against Federer in the 2019 US Open
- Nagal won the first set
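The splitting step above can be sketched with a naive stdlib approach. A real pipeline would use spaCy's sentence segmenter; this regex only handles simple period-terminated sentences and would break on abbreviations like "Dr.".

```python
import re

paragraph = ("Indian tennis player Sumit Nagal moved up six places from 135 "
             "to a career-best 129 in the latest men's singles ranking. "
             "The 22-year-old recently won the ATP Challenger tournament. "
             "He made his Grand Slam debut against Federer in the 2019 US Open. "
             "Nagal won the first set.")

# Naive segmentation: split on a period followed by whitespace,
# then drop any trailing period left on the final sentence.
sentences = [s.strip().rstrip('.') for s in re.split(r'\.\s+', paragraph)
             if s.strip()]

for s in sentences:
    print(s)
```

Running this yields the four sentences listed above, ready for the subject/object shortlisting step.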
Out of these four sentences, we will shortlist the second and the fourth, because each of them contains exactly one subject and one object. In the second sentence, "22-year-old" is the subject and "ATP Challenger tournament" is the object. In the fourth sentence, the subject is "Nagal" and "first set" is the object.
Extracting the objects in both of the above scenarios is tricky, because the objects span multiple words. How do we solve this problem?
Entity extraction: identifying the subject and the object
With POS tagging alone, we could say that nouns and proper nouns are the entities, but this method fails for multi-word entities like the ones above. Instead, we need dependency parsing: we parse the dependency tree of the sentence.
Dependency parsing is the process of analyzing the grammatical structure of a sentence based on the dependencies between the words in a sentence.
In dependency parsing, various tags represent the relationships between pairs of words in a sentence; these are the dependency tags. For example, in the phrase 'rainy weather', the word 'rainy' modifies the meaning of the noun 'weather'. Therefore, a dependency exists from weather -> rainy, in which 'weather' acts as the head and 'rainy' acts as the dependent (or child). This dependency is represented by the amod tag, which stands for adjectival modifier.
Similarly, many dependencies exist among the words in a sentence, but note that a dependency always involves exactly two words, in which one acts as the head and the other as the child. As of now, there are 37 universal dependency relations in Universal Dependencies (version 2); the full list is available in the Universal Dependencies documentation. Apart from these, many language-specific tags also exist.
```python
import spacy

nlp = spacy.load('en_core_web_sm')
text = 'It took me more than two hours to translate a few pages of English.'

for token in nlp(text):
    print(token.text, '=>', token.dep_, '=>', token.head.text)
```
In the code example above, dep_ returns the dependency tag for a word, and head.text returns the corresponding head word. If you run the code, you will notice that the word 'took' has the dependency tag ROOT. This tag is assigned to the word that acts as the head of many words in a sentence but is not a child of any other word; generally, it is the main verb of the sentence, as 'took' is here.
Let’s get the dependency tags for one of the shortlisted sentences. I will use the popular spaCy library for this task:
```python
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp("The 22-year-old recently won ATP Challenger tournament.")

for tok in doc:
    print(tok.text, "...", tok.dep_)
```

Output:

```
The ... det
22-year ... amod
- ... punct
old ... nsubj
recently ... advmod
won ... ROOT
ATP ... compound
Challenger ... compound
tournament ... dobj
. ... punct
```

The subject (nsubj) in this sentence, according to the dependency parser, is "old". That is not the desired entity; we wanted to extract "22-year-old" instead. The dependency tag of "22-year" is amod, which means it is a modifier of "old". Compound words are words that collectively form a new term with a different meaning. Therefore, we define a rule: extract the subject/object along with its modifiers and compound words, and also extract the punctuation marks between them. In short, we will use dependency parsing to extract entities.
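The merging rule can be sketched independently of spaCy. Given (token, dependency-tag) pairs like the parser output above, we accumulate compound/modifier/punctuation tokens and attach them to the next subject or object token. This is a simplified, hand-rolled version of the rule; the tag names follow the spaCy output shown, and the parse below is copied from it.

```python
# Simplified entity-extraction rule over (token, dep_tag) pairs,
# mimicking the spaCy parser output shown above.
parsed = [("The", "det"), ("22-year", "amod"), ("-", "punct"), ("old", "nsubj"),
          ("recently", "advmod"), ("won", "ROOT"), ("ATP", "compound"),
          ("Challenger", "compound"), ("tournament", "dobj"), (".", "punct")]

def extract_entities(tokens):
    subject, obj, prefix = "", "", ""
    for text, dep in tokens:
        if dep in ("compound", "amod"):
            prefix += text + " "             # accumulate modifiers/compounds
        elif dep == "punct" and prefix:
            prefix = prefix.rstrip() + text  # keep hyphens inside the entity
        elif dep.endswith("subj"):
            subject, prefix = prefix + text, ""
        elif dep.endswith("obj"):
            obj, prefix = prefix + text, ""
        else:
            prefix = ""                      # anything else breaks the span
    return subject, obj

print(extract_entities(parsed))  # -> ('22-year-old', 'ATP Challenger tournament')
```

Applied to the parse above, the rule recovers "22-year-old" as the subject and "ATP Challenger tournament" as the object, which is exactly what the plain nsubj/dobj tokens alone could not give us.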
Now, after entity extraction, we need to find the relationship between those entities. To extract the relation, we find the ROOT of the sentence (which is also its main verb). Hence, the relation extracted from the sentence above is "won".
The rule is to find the ROOT word, or the main verb, of the sentence. Once the ROOT is identified, the pattern checks whether it is followed by a preposition ('prep') or an agent word. If so, that word is appended to the ROOT word.
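That rule can also be sketched over (token, dep_tag) pairs. This is a simplified version: the 'prep' and 'agent' tag names follow spaCy's conventions, and the example parses below are hand-written for illustration rather than produced by a real parser.

```python
# Simplified relation-extraction rule: take the ROOT verb and, if the
# next token is a preposition ('prep') or an 'agent' word, append it.
def extract_relation(tokens):
    for i, (text, dep) in enumerate(tokens):
        if dep == "ROOT":
            if i + 1 < len(tokens) and tokens[i + 1][1] in ("prep", "agent"):
                return text + " " + tokens[i + 1][0]
            return text
    return ""

# Hand-written example parses for illustration:
print(extract_relation([("Nagal", "nsubj"), ("won", "ROOT"),
                        ("the", "det"), ("first", "amod"), ("set", "dobj")]))
# -> "won"
print(extract_relation([("He", "nsubj"), ("debuted", "ROOT"),
                        ("against", "prep"), ("Federer", "pobj")]))
# -> "debuted against"
```

Pairing these relations with the entities extracted earlier gives the (subject, relation, object) triples that make up the knowledge graph.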