Knowledge base

IT is a great and exciting world to be in.
All who have a passion for it will understand why we have created these pages.
Here we share our technical knowledge with you.

Knowledge Discovery and Machine Learning

12/12/2016

Miroslav Smatana (TUKE)

Machine learning gives computers the ability to learn without being explicitly programmed.

Learning is improving one's performance in a certain environment by gaining knowledge from experience in that environment.

A computer program is able to learn from experience (training examples) if its performance on a class of tasks improves thanks to that experience.
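This definition can be illustrated with a minimal sketch in plain Python (all data below is invented): a 1-D least-squares model whose prediction error on a task decreases as it gains experience in the form of training examples.

```python
# Minimal "learning from experience" sketch: ordinary least squares in 1-D.
# The training data is invented for illustration (true trend: y = 2x + 1).

def fit_line(points):
    """Fit y = a*x + b to (x, y) pairs by ordinary least squares."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def mean_abs_error(model, points):
    """Performance measure: mean absolute prediction error."""
    a, b = model
    return sum(abs(a * x + b - y) for x, y in points) / len(points)

data = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.1), (5, 10.9)]
holdout = [(6, 13.0), (7, 15.0)]   # unseen tasks of the same class

little = fit_line(data[:2])   # model with little experience
more = fit_line(data)         # model with more experience
# With more experience, the error on the unseen tasks is lower.
```

The program was never told the trend; its performance on the holdout tasks improves purely because it was given more experience.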

History of terms

 

Data mining

  1. 1805: Regression analysis (estimation of the relationship between variables)*
  2. 1943: Model of a neural network
  3. 1965: The company Decision Science applies "evolutionary computing" to solve various problems
  4. 1990: The term "data mining" is defined

* Used to estimate the orbits of comets and planets around the sun

Big data

  1. 1941: The first attempt to name the rapid growth of data: "information explosion"
  2. 1999: The first definition of the term "big data"

Cloud computing

Historical use of the relevant terms (how often we use the words):

Explanation of terms

Cloud

  • Service on demand - modular solutions based on open platforms
  • Internet access - a thin client is sufficient to access the cloud; emphasis on security and compliance with standards
  • Payment for utilised resources - optimisation of operational costs
  • Scalability and elasticity - resources, capacities and services can be flexibly added and removed
  • Pooling and sharing of resources - resilience of the solution against outages

Big data

Definition: Big data are data that cannot be processed by common means in the requested time due to their volume, velocity of updates and/or variety.

3V/5V model characteristics:

  1. Volume - size of the accumulated and processed data, in GB/TB/PB
  2. Velocity - speed at which data is generated and how fast it must be processed (data is updated quickly; the updates themselves can be small in volume)
  3. Variety - data of various types must be processed (structured data from databases, texts, multimedia, sensor data, etc.); the type of data can change
  4. [Veracity] - data may be inconsistent or faulty, or the source untrustworthy
  5. [Value] - data is accumulated and processed to gain new knowledge that can be applied effectively; accumulating the data must be potentially useful

Knowledge discovery

Knowledge discovery in databases is a process of semi-automatic extraction of knowledge from databases. Knowledge must be:

  • Valid (in the statistical sense)
  • Previously unknown
  • Potentially useful (for the given application)

Knowledge discovery is an iterative and interactive process. It draws mainly on the following fields:

  • Statistics
  • Machine learning
  • Database systems

Process

  1. Understanding the application and current knowledge (existing relevant knowledge and the aim of the knowledge discovery process)
  2. Data cleaning (removing inconsistent data)
  3. Data integration from several, often heterogeneous, sources
  4. Selection of data relevant to the given aim (attribute analysis)
  5. Data transformation into a representation suitable for the given aim (e.g. discretisation)
  6. Data mining - application of intelligent methods to find valid patterns (the most important data mining tasks are description, association rules, classification/prediction and clustering)
  7. Evaluation of the found patterns - application of chosen measures
  8. Presentation of the patterns - methods of knowledge representation and visualisation (explicit knowledge)
  9. Use of the discovered knowledge in the given application
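A hedged sketch of how the middle steps of this process might look on toy records (the field names and the mined pattern are invented for illustration, not taken from a real project):

```python
from collections import defaultdict

# Toy input records; one is inconsistent (missing age).
raw = [
    {"age": 25, "income": 1800, "bought": "yes"},
    {"age": None, "income": 2100, "bought": "no"},
    {"age": 47, "income": 3900, "bought": "yes"},
    {"age": 52, "income": 4100, "bought": "yes"},
    {"age": 23, "income": 1700, "bought": "no"},
]

# Step 2 - cleaning: remove inconsistent records.
clean = [r for r in raw if all(v is not None for v in r.values())]

# Step 4 - selection: keep only the attributes relevant to the aim.
selected = [{"age": r["age"], "bought": r["bought"]} for r in clean]

# Step 5 - transformation: discretise age into bands.
def band(age):
    return "young" if age < 35 else "older"

transformed = [{"age": band(r["age"]), "bought": r["bought"]} for r in selected]

# Step 6 - data mining: a trivial descriptive pattern, the buy rate per band.
counts = defaultdict(lambda: [0, 0])
for r in transformed:
    counts[r["age"]][0] += r["bought"] == "yes"
    counts[r["age"]][1] += 1

# Steps 7-8 - evaluation and explicit presentation of the pattern.
pattern = {k: yes / total for k, (yes, total) in counts.items()}
# pattern: older customers bought in 100% of cases, young ones in 50%.
```

Each stage here is deliberately trivial; in practice every step is its own discipline, and the process is iterated as evaluation reveals problems upstream.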

Machine learning

We use machine learning every day:

  • Search engines (we tell Google which links are relevant)
  • Spam filter (mark spam and leave it to the computer to understand why)
  • Facebook
  • Apple tagging pictures
  • ...
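The spam-filter idea above can be sketched as a tiny Naive Bayes classifier: the user only marks messages as spam, and the program works out for itself which words signal spam. The training messages below are invented.

```python
import math
from collections import Counter

# Messages the user has already marked (invented examples).
spam = ["win money now", "free money offer", "win a free prize"]
ham = ["meeting at noon", "project status update", "lunch at noon"]

def word_counts(msgs):
    return Counter(w for m in msgs for w in m.split())

spam_counts, ham_counts = word_counts(spam), word_counts(ham)
spam_total, ham_total = sum(spam_counts.values()), sum(ham_counts.values())
vocab = set(spam_counts) | set(ham_counts)

def spam_score(message):
    """Log-odds that a message is spam, with add-one smoothing."""
    score = 0.0
    for w in message.split():
        p_spam = (spam_counts[w] + 1) / (spam_total + len(vocab))
        p_ham = (ham_counts[w] + 1) / (ham_total + len(vocab))
        score += math.log(p_spam / p_ham)
    return score

# spam_score("free money") is positive, spam_score("meeting at noon") is
# negative: the words' spam evidence was learned, not hand-coded.
```

Nobody told the program that "money" is suspicious; it inferred that from the examples, which is exactly why such tasks resist hand-written rules.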

We cannot hand-code everything (explicit algorithms do not work):

  • Autonomous helicopter
  • Handwritten text recognition
  • Natural language processing
  • ...

Categories of machine learning:

  1. Supervised learning (prediction / classification)
  2. Unsupervised learning
  3. Reinforcement learning

Example of supervised predictive learning (in practice there are usually more dimensions, and trend lines are not this easy to fit):

Example of a linear decision boundary for binary classification:

Example of unsupervised learning: identification of clusters/segments:
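The clustering example can be sketched as a tiny 1-D k-means with k = 2 in plain Python. The points are invented and form two obvious segments, which the algorithm finds without being given any labels.

```python
def kmeans_1d(points, iters=10):
    """Tiny k-means for k = 2 clusters of 1-D points (assumes min != max)."""
    c1, c2 = min(points), max(points)      # start the centres at the extremes
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        # Update step: move each centre to the mean of its group.
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

# Two obvious segments, around 2 and around 10.
low, high = kmeans_1d([1.0, 2.0, 3.0, 9.0, 10.0, 11.0])
# low = [1.0, 2.0, 3.0], high = [9.0, 10.0, 11.0]
```

This is the same idea behind customer segmentation in the case studies below, just in one dimension instead of many.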

Case studies

Application areas:

  • Marketing (segmentation of customers, optimisation of marketing campaigns)
  • Uncovering frauds (credit card transactions, mobile telecommunication networks)
  • Targeted advertising (based on observation of personal behaviour on the web, recommendation systems)
  • Scientific application (medicine)
  • ...

Affinio

Segments people into communities based on their preferences:

  • Understanding customers based on their behaviour
  • Targeted marketing
  • Identification of the most suitable distribution channels for reaching customers

Prelert

  • Behaviour analysis for payment security
  • Detection of possible insider trading on the stock market

Barilliance

  • Personalised recommendation of products
  • Sending emails with relevant content
  • Recommendation directly on Web

Read more

http://ai.business/ 

https://rayli.net/blog/data/history-of-data-mining/

http://www.forbes.com/sites/gilpress/2013/05/09/a-very-short-history-of-big-data/#548f0fed55da

http://www.computerweekly.com/feature/A-history-of-cloud-computing
