Tuesday, July 7, 2009

DATAEXTRACTION Means A Lot More Than Simple Attribute Extraction

How often do you use the DATAEXTRACTION wizard to extract information other than block attribute data? I have often seen people set it aside for tasks like extracting specific text information, saying that the tool is not powerful enough to give accurate output. That's not true. Here I am going to show you how to extract a line list from a set of Piping & Instrumentation Diagrams (P&IDs). P&IDs contain a lot of other text objects, but with the help of some filtering options you can extract only the required text information from the drawings.

1. Start the DATAEXTRACTION wizard.

2. Select the required P&ID drawings (here I am using sample ones).

3. On the third page, select only 'Text' as the object type. (Tip: You can uncheck all entries at once from the right-click menu.)

4. On the fourth page, select 'Drawing' and 'Text' in the category filter. Also select 'File Name' and 'Value' in the properties area.

5. On the Refine Data page, you will get a screen similar to the one shown below. This is where the real work happens. You will notice that the Value column contains text object values of every kind from the different drawings. We need to narrow the list down to line numbers only.

6. Now right-click the Value column and select 'Filter Options' from the right-click menu.

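7. You will be presented with a filter dialog box as shown in the image below. For our purpose, select the 'Contains' list item from the first list box and type the wildcard criterion *"-* in the second list box as shown in the picture. This criterion matches any text that contains an inch mark immediately followed by a hyphen, which is how the line numbers in these drawings begin.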

8. Click 'OK' to apply the filter. The Value column is now filtered to show only the line numbers that match the given criterion.

9. Click the 'Next' button and save the output as an Excel file or a table.


You can easily create various lists from your drawings regardless of where the data is stored. Whether it is in attributes or in text, DATAEXTRACTION is powerful enough to pull the required data based on different conditions. Because the wildcard criterion plays a critical role in extracting the specific data, you need to choose it very carefully in order to get proper results.
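To see why the choice of criterion matters, here is a minimal Python sketch that mimics the filter outside AutoCAD. It is only an approximation: it uses Python's fnmatch module, whose wildcard rules are not identical to the wizard's, and the sample values and the filter_values helper are hypothetical stand-ins for the Value column, not data from the actual drawings.

```python
from fnmatch import fnmatchcase

# Hypothetical text values standing in for the wizard's 'Value' column.
SAMPLE_VALUES = [
    '4"-P-1001-A1A',
    '2"-CW-2003-B2B',
    'PUMP P-101A',
    'NOTES:',
    '6"-thick insulation',
]


def filter_values(values, pattern='*"-*'):
    """Return the values that match a shell-style wildcard pattern."""
    return [v for v in values if fnmatchcase(v, pattern)]


# The criterion from step 7: an inch mark followed by a hyphen, anywhere.
loose = filter_values(SAMPLE_VALUES)

# A tighter criterion that also requires a second hyphen later in the text.
strict = filter_values(SAMPLE_VALUES, '*"-*-*')

print(loose)
print(strict)
```

In this sketch the loose pattern also lets through the note about insulation thickness, while the tighter pattern keeps only the two values shaped like line numbers. The same kind of check on a handful of real values from your drawings is a quick way to test a criterion before relying on the extracted list.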

On a side note, I feel the command name is a bit too long. It would be a lot better if it were something like DATAXTRACT or even shorter. Anyway, that's not a matter of concern, as we don't use this command very frequently.

2 comments:

Anonymous said...

Your wish for a shorter command name has already been answered. EATTEXT. Not only is it shorter, but it also happens to be the most memorable command.

har!s said...

Hi Tyrone,
Thanks a lot for reminding me of that. It never crossed my mind while composing the post.