Preparing Your Unstructured Data In Qlik

As my previous blog post about unstructured data explained, data preparation is one of the four main steps in unstructured data analysis. Clean and optimized data can shorten processing time enormously and also lead to more accurate results.

By Maria Oreska | 4 May 2021

Parse the data, normalize words, skip the punctuation, ignore short words if they aren’t relevant. Naturally, you can do all this outside of Qlik Sense and load the results into your app. However, you can also do it directly in Qlik Sense. And no, I’m not talking about an SSE (server-side extension) implementation; I mean plain Qlik script.

PARSING TEXTS

If you don’t use the unstructured data connector or another approach that parses the text for you, you need to parse the text during the reload of the app. Do you know the subfield() function? I’m pretty sure you do. But did you know its third parameter is optional? If you use subfield('A;B;C',';') in a load statement in the script, it returns 3 rows with the values A, B, and C.

subfield('A;B;C',';') used in Qlik script returns 3 rows
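
As a minimal sketch of this behavior, assuming a hypothetical inline table [texts] with the fields doc_id and text (neither is from the original app), the two-parameter form can split each text into one row per word:

```qlik
// Hypothetical sample data; in a real app this comes from your source.
[texts]:
Load * inline
[doc_id,text
1,London is big
2,Munich is smaller than London
];

// subfield() without a third parameter creates one row per token,
// here split on a single space.
[words]:
Load
    doc_id,
    subfield(text, ' ') as word
Resident [texts];

// The raw texts are no longer needed once the words are extracted.
Drop table [texts];
```

With this sample data, [words] ends up with 8 rows: 3 for the first text and 5 for the second. Real texts usually need more than a single-space delimiter, so treat the split character as a placeholder.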

NORMALIZING WORDS

The difference between London, LONDON, and london is rarely important. Analyzing all these words separately will multiply the processing time… not to mention memory utilization during the reload of the app. If you’re now thinking “Really? Is it such a big difference?”, my answer is clear: “Yes. Because you don’t develop a solution for analyzing tens of words. You don’t need a solution for that.” Once you decide to develop a tool for analyzing unstructured data, you need to think about big data right from the beginning.

In Qlik you can use either function, lower() or upper(), for normalizing words. It’s up to you whether you prefer having LONDON or london in the result. For the end user, you can apply the capitalize() function when presenting the data.

lower(word) = lower_letters
upper(word) = UPPER_LETTERS
capitalize(word) = 'Capitalized Words'
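
In a load statement, this could look like the sketch below. The table [raw_words] and the field names raw_word, word, and word_display are made up for illustration:

```qlik
// Hypothetical input: one row per raw word.
[raw_words]:
Load * inline
[raw_word
London
LONDON
london
];

// lower() collapses all three spellings into one value for analysis;
// capitalize() produces a display version for the end user.
[words]:
Load
    lower(raw_word)      as word,
    capitalize(raw_word) as word_display
Resident [raw_words];

Drop table [raw_words];
```

All three rows end up with word = london and word_display = London, so the analysis works with a single distinct value instead of three.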

IGNORING PUNCTUATION

In the Slovak language, a simple lowercase a comes in several variants: a, á, ä. In a world where so many people leave out this punctuation (the diacritical marks) when they write, does it make sense to analyze München and Munchen separately? For many use cases, it doesn’t. Sometimes, it does.

If you want to ignore the punctuation, here is a Qlik trick: mapsubstring(). Yes, there is a function that can do the applymap() magic not on a word level but on a letter level instead! It can replace a letter with a longer string, but for our use case, we will just replace a letter with a letter.

Obviously, we need to define a mapping table for this letter conversion. If you want to save time, combine this trick with normalizing words: then you don’t need to define the mapping for both the uppercase and lowercase versions.

[M_letters]:
mapping
load * inline
[letter,letter_norm
à,a
á,a
â,a
ã,a
ä,a
å,a
ă,a
č,c
ď,d
è,e
é,e
ê,e
ë,e
ě,e
í,i
ĺ,l
ľ,l
ň,n
ó,o
ô,o
ö,o
ō,o
ŏ,o
ő,o
ŕ,r
ř,r
ś,s
ŝ,s
š,s
ť,t
ú,u
ü,u
ũ,u
ū,u
ŭ,u
ů,u
ű,u
ý,y
ź,z
ž,z
];
mapsubstring('M_letters','München') = 'Munchen'
mapsubstring('M_letters','guľôčka') = 'gulocka'
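
Combined with lower(), the mapping could be applied during the load roughly as follows. This is only a sketch: [M_demo] is a shortened stand-in for the [M_letters] table, and the [cities] table with its field names is hypothetical.

```qlik
// Shortened stand-in for the [M_letters] mapping table (assumed name).
[M_demo]:
Mapping
Load * inline
[letter,letter_norm
ü,u
ľ,l
ô,o
č,c
];

[cities]:
Load
    city,
    // lower() runs first, so only lowercase letters need a mapping row.
    mapsubstring('M_demo', lower(city)) as city_norm
Inline
[city
München
MÜNCHEN
Munchen
];
```

All three spellings get city_norm = munchen, while the original city field stays available for display.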

IGNORING SHORT OR SPECIFIC WORDS

Assuming we’re developing the solution with a big data approach, think about the kinds of words we don’t need to analyze at all. Words like ‘a’, ‘and’, or ‘however’ can be irrelevant. Use a where clause as early as possible in your script to optimize memory utilization.

Where len(word) > 3 is an example of a length restriction. If you want to ignore specific words, you can use match(), wildmatch(), or exists().

…where len(word)>3
…where not(match(word,'and','however','therefore'))
…where not(wildmatch(word,'ther*','?owever'))

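[words_to_ignore]:
Noconcatenate
Load * inline
[ignore_word
and
or
however
];

// The ignore list uses its own field name: with a single argument,
// not exists(word) would also drop every repeated occurrence of a word.
…where not exists(ignore_word, word);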
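
Put together, the length and ignore-list filters might be applied as in the sketch below. It assumes a hypothetical ignore-list field named ignore_word and a one-line sample text. The two-argument form of exists() is used on purpose: it compares each word only against the ignore list, so repeated words survive the filter and can still be counted later.

```qlik
// Hypothetical ignore list; the field name ignore_word is an assumption,
// chosen to differ from the word field being filtered.
[words_to_ignore]:
Load * inline
[ignore_word
and
or
however
];

// Preceding load: the lower load splits and normalizes the text,
// the upper load filters the resulting words.
[words]:
Load *
Where len(word) > 3
  and not exists(ignore_word, word);
Load
    lower(subfield(text, ' ')) as word
Inline
[text
London and Paris however London
];

// The ignore list is only needed during the reload.
Drop table [words_to_ignore];
```

For the sample text, [words] keeps three rows: london, paris, and london. The filter sits in a preceding load because a where clause cannot refer to a field alias created in the same load statement.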

Combining all these approaches brings you a big step closer to analyzing unstructured data in your Qlik app efficiently.

Interested?

Stay tuned, the analysis part will be the topic of my next blog! 😉

