These days, most reporting seems to take the form of PowerPoint presentations or exported PDFs.
These documents, along with other legacy material, tend to be hidden away in project folders whose names were very useful while the project was running but mean much less once it has been completed.
So how do you find information on a company data drive when half of the useful material is named something along the lines of “Final_final_v3”?
This is a relevant issue for people working in businesses of all sizes, especially when their documents are distributed across many drives, SharePoint sites and document management systems.
Access the pretrained machine learning models and some of the training data here.
ConocoPhillips Skandinavia AS (ConocoPhillips) and Kadme have collaborated on a research and development project that uses machine learning first to find the figures in these documents and then to classify them into more than 40 relevant geoscience image categories, such as cross-sections, thin sections, graphs, maps, tables and seismic sections.
As a result, even a document whose file name doesn’t mention the topic of interest is still indexed and mined for relevant images, content and geographical location.
“A machine learning approach is what we need to solve this challenge,” said Peter Bormann from ConocoPhillips during a presentation at the DigEx Conference on 7th April. “It sounds easy to extract images from documents, but to actually find and isolate images in a PDF is not straightforward. It requires a machine learning approach,” he added.
The R&D project between ConocoPhillips and Kadme has been running since 2020, and the partners are now finalising a functioning tool that finds the images in reports and classifies them into categories that may be useful for oil and gas professionals.
There is no need to transfer the file system of an entire company server to the cloud, since the process can be run on premises, which may significantly reduce costs and offer more flexibility than cloud-only solutions.
Kadme said that it has integrated the image search into its existing enterprise search engine, enabling users to search for images and content by keyword or by location, such as a well or field. For instance, it is now possible to search an entire company document repository for “thin section images from the Ekofisk field”, something that was impossible before.
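To make the idea of searching by image category and location concrete, here is a minimal sketch in Python. It is not Kadme's implementation: the `ImageRecord` structure, the `search_images` function, the file names and the "ExampleField" entry are all hypothetical, and the sketch simply assumes that the extraction and classification steps have already produced one record per image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """One classified image (hypothetical structure, for illustration only)."""
    document: str  # source report the image was found in
    page: int      # page or slide number
    category: str  # label assigned by the image classifier
    field: str     # location associated with the document


def search_images(index, category=None, field=None):
    """Return the records that match every filter that is set (case-insensitive)."""
    def matches(value, wanted):
        return wanted is None or value.lower() == wanted.lower()

    return [
        record for record in index
        if matches(record.category, category) and matches(record.field, field)
    ]


# Illustrative records only; a real index would be filled by the
# figure-finding and classification steps described in the article.
index = [
    ImageRecord("reports/Final_final_v3.pdf", 12, "thin section", "Ekofisk"),
    ImageRecord("reports/Final_final_v3.pdf", 30, "seismic section", "Ekofisk"),
    ImageRecord("reports/well_summary.pptx", 4, "thin section", "ExampleField"),
]

hits = search_images(index, category="thin section", field="Ekofisk")
for hit in hits:
    print(hit.document, hit.page)
```

The point of the sketch is that the query never looks at the file name: a thin section image inside “Final_final_v3” is found because the image itself carries a category and a location.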
“Our ultimate aim is that machine learning eventually will improve the ability for project teams to find legacy information effectively, to gain necessary insight and to prevent duplication of work,” emphasised Peter.
As finding and classifying image data is a problem many companies and organisations face, ConocoPhillips and Kadme collaborated to make the machine learning models publicly available to the industry. These models are trained on public data and were released at the DIGEX 2022 Conference last week.
“We think this collaboration should be something other companies should also be able to benefit from and maybe others can develop this further,” said Peter.
HENK KOMBRINK