From data to information
Big Data refers to all the electronic data produced by new technologies, for both personal and professional use. It includes company data (emails, documents, databases, transaction logs, …), data produced by sensors, content published on the internet (images, videos, sounds, text), e-commerce transactions, social network exchanges, data produced by connected objects (RFID tags, smart meters, smartphones, …), geospatial data, etc.
Big Data challenges are sometimes referred to as the “3Vs” (Volume, Velocity and Variety), or the “5Vs” when Veracity and Value are added.
- Volume: the amount of data is constantly increasing.
- Velocity: data is increasingly collected, analyzed and used in real time.
- Variety: data is available in very diverse formats.
The mission of Infosquare’s Big Data and Analytics department: helping customers make smart choices and select effective, long-term technologies that enable them to generate business value faster by integrating internally produced and/or publicly available data.
OUR AREAS OF EXPERTISE
Data visualization is both a major challenge and a solid enabler for every company. The key challenge: drastically reducing the time between the moment data is ingested and the moment it is made available for consumption, with the goal of making data available in real time.
Used optimally, text mining (also called “text analytics”) enables companies to extract not only information but, more importantly, the actual meaning of text written in natural language. Whether it is used to classify content, identify its subject or analyze the reputation of a product or service, mature text analytics technology can turn a company’s “document capital” into a real competitive advantage.
SEMANTICS & NOSQL DATABASES
Capturing, storing and analyzing both structured and unstructured data is a real challenge that requires state-of-the-art technologies. Our expertise: making these new technologies available to our customers while ensuring they integrate with, and remain compatible with, their existing systems.
Centralizing and integrating data in real time makes it possible to detect situations as they occur. There are many practical use cases: preventive maintenance, theft or fraud detection, handling of adverse events, customer queue monitoring, etc. Monitoring situations in real time allows a prompt reaction, limiting their impact and turning exceptions into new business opportunities.
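To illustrate the idea of situation detection, here is a minimal sketch of one common pattern: a sliding-window rule that raises an alert when the same entity produces a given number of events of one kind within a time window (for example, repeated failed payments on one card). The class name, event fields, event type and thresholds are hypothetical, and the sketch assumes events reach the detector in timestamp order; it is not a description of any specific product.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    entity_id: str    # hypothetical: a card number, a machine ID, a queue ID...
    kind: str         # hypothetical event type, e.g. "failed_payment"
    timestamp: float  # seconds; events are assumed to arrive in time order


class SituationDetector:
    """Flags an entity once `threshold` events of `kind` fall within `window` seconds."""

    def __init__(self, kind: str, threshold: int, window: float):
        self.kind = kind
        self.threshold = threshold
        self.window = window
        # Recent timestamps of matching events, kept separately for each entity.
        self._recent: dict[str, deque] = defaultdict(deque)

    def process(self, event: Event) -> bool:
        """Return True if this event completes the situation for its entity."""
        if event.kind != self.kind:
            return False
        times = self._recent[event.entity_id]
        times.append(event.timestamp)
        # Discard timestamps that are now older than the window.
        while times and event.timestamp - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold


# Usage: three failed payments on the same card within 60 seconds.
detector = SituationDetector("failed_payment", threshold=3, window=60.0)
stream = [
    Event("card-42", "failed_payment", 0.0),
    Event("card-42", "failed_payment", 20.0),
    Event("card-7", "failed_payment", 25.0),
    Event("card-42", "failed_payment", 45.0),
]
alerts = [e for e in stream if detector.process(e)]
```

Here only the fourth event triggers an alert. Note that, as written, every further matching event inside the window also returns True; a production rule would typically add de-duplication or a cool-down period.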
360 views provide a complete, real-time and historical view of an entity. Integrating both data and documents, they present in an easily usable form the data related to an asset, a piece of equipment, a customer, a supplier, an employee or any other entity (or set of entities), in order to support the execution of a task or a decision. Because they integrate data from different sources, they also make it possible to identify new facts about these entities.
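As a rough illustration of how a 360 view can be assembled, the sketch below merges records from several source systems into one view per entity, keyed by a shared identifier. Each attribute keeps its value per source, so disagreements between systems stay visible and can be reported as new facts. The source names, field names and the assumption that all systems share the same customer key are hypothetical; real projects usually also need entity resolution when no common key exists.

```python
from collections import defaultdict
from typing import Any


def build_360_view(sources: dict[str, list[dict[str, Any]]],
                   key: str) -> dict[Any, dict[str, dict[str, Any]]]:
    """Merge records from several systems into one view per entity.

    Each attribute maps to {source_name: value}, so values that differ
    between systems are preserved rather than silently overwritten.
    """
    views: dict = defaultdict(lambda: defaultdict(dict))
    for source_name, records in sources.items():
        for record in records:
            entity_id = record[key]
            for attr, value in record.items():
                if attr != key:
                    views[entity_id][attr][source_name] = value
    # Convert the nested defaultdicts back into plain dicts.
    return {eid: dict(attrs) for eid, attrs in views.items()}


def conflicts(view: dict[str, dict[str, Any]]) -> list[str]:
    """Attributes whose (hashable) values differ between sources."""
    return sorted(a for a, by_src in view.items() if len(set(by_src.values())) > 1)


# Usage with hypothetical CRM, billing and support extracts.
sources = {
    "crm": [{"customer_id": "C1", "email": "ann@example.com", "city": "Geneva"}],
    "billing": [{"customer_id": "C1", "city": "Lausanne", "balance": 120.0}],
    "support": [{"customer_id": "C1", "open_tickets": 2}],
}
views = build_360_view(sources, key="customer_id")
c1 = views["C1"]
```

In this example, `conflicts(c1)` returns `["city"]`: the CRM and billing systems disagree on the customer's city, which is exactly the kind of fact a consolidated view brings to light.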
By 2050, between 70% and 75% of humanity is expected to live in cities. The challenge of “Smart Cities” is to analyze and improve the use of resources (water, electricity, etc.) and of the services (transport, security, etc.) provided to inhabitants, while respecting their individual rights. Big Data and IoT technologies are central to this challenge.
EXAMPLES OF PROJECTS
PERSONAL DATA: GDPR Platform
- Definition (and discovery) of a map of the contributing systems (databases, applications, ERP, HR, ECM, …)
- Implementation of indexing mechanisms for structured and unstructured data, based on a search/indexing engine and an annotation engine
- Implementation of a GDPR Hub with integrated consent management
- Implementation of the on-demand export of a given customer’s data in XML format
- Technologies: MarkLogic, GATE (General Architecture for Text Engineering), Google Tesseract
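The on-demand export of a customer's data can be pictured with the short sketch below, which serializes every record held about one customer, grouped by source system, into a single XML document. It uses Python's standard `xml.etree.ElementTree` rather than the MarkLogic platform listed above, and the element and attribute names (`customerData`, `system`, `record`, `field`) are hypothetical. Field names are stored as attributes so that arbitrary column names cannot produce invalid XML tags.

```python
import xml.etree.ElementTree as ET
from typing import Any


def export_customer_xml(customer_id: str,
                        records_by_system: dict[str, list[dict[str, Any]]]) -> str:
    """Serialize all records about one customer, grouped by source system."""
    root = ET.Element("customerData", {"customerId": customer_id})
    for system, records in records_by_system.items():
        system_el = ET.SubElement(root, "system", {"name": system})
        for record in records:
            record_el = ET.SubElement(system_el, "record")
            for field, value in record.items():
                # Field names go in an attribute: they may not be valid XML tag names.
                field_el = ET.SubElement(record_el, "field", {"name": field})
                field_el.text = "" if value is None else str(value)
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


# Usage with hypothetical data gathered from the contributing systems.
xml_doc = export_customer_xml("C1", {
    "crm": [{"email": "ann@example.com", "consent_marketing": True}],
    "hr": [],
})
```

Systems that hold no data about the customer still appear as empty `system` elements, which makes it explicit in the export that they were checked.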
PHARMACEUTICAL INDUSTRY: Data visualization
- Guidance on the use of the technology, improvement of the model library and extension of the data visualization platform to more departments
- Clinical studies:
– Design and documentation of the data visualizations according to the study protocol specifications and CRFs (case report forms)
– Mapping of iDarts with clinical data from the SAS database (SDTM format)
- Extension of the platform to integrate data from SAP HANA, SAP BW and Hadoop
- Technologies: TIBCO Spotfire
POST-MARKETING PHARMACOVIGILANCE: SOCIAL MEDIA MONITORING
- Identification of the sites to be monitored
- Integration of customer, product and domain ontologies and vocabularies
- Integration of the sentiment analysis / polarity classification engine
- Technologies: MarkLogic, GATE (General Architecture for Text Engineering), Nutch
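To give an idea of what polarity classification does, here is a deliberately simple lexicon-based sketch: each known sentiment word scores +1 or -1, a preceding negator flips the next sentiment word, and the sign of the total gives the label. This is a substitute technique for illustration only, not the GATE-based engine used in the project; the word lists are hypothetical stand-ins for the customer, product and domain vocabularies, and the naive negation handling (which ignores sentence boundaries) is an assumption.

```python
import re

# Hypothetical mini-lexicons; a real engine would rely on curated
# domain vocabularies and, typically, a trained model.
POSITIVE = {"effective", "relieved", "better", "good", "helped"}
NEGATIVE = {"nausea", "rash", "headache", "worse", "bad", "dizzy"}
NEGATORS = {"no", "not", "never", "without"}


def polarity(text: str) -> str:
    """Classify text as 'positive', 'negative' or 'neutral' by lexicon score.

    A negator flips the polarity of the next sentiment-bearing word.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        value = 1 if token in POSITIVE else -1 if token in NEGATIVE else 0
        if value:
            score += -value if negate else value
            negate = False  # negation only applies to one sentiment word
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `polarity("Terrible headache and nausea after the second dose")` returns `"negative"`, while `polarity("No rash this time")` returns `"positive"` because the negator flips "rash". In a pharmacovigilance setting, the negative posts are the ones that would be routed for review as potential adverse events.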