Campaign: Data Availability, Information Quality, Accountability...

New Technology to help with Transparency

Application of New Information Technology to the Transparency Objective of the Open Government Directive

 

While the Open Government Directive presents an unprecedented challenge to Federal Agencies, information technology is now available to help Agencies achieve the Directive's Transparency objective rapidly and cost-effectively. Throughout, the Directive emphasizes the need to proactively provide easy-to-use, effective, and comprehensive, yet controlled, access to Agency information. Passages from the Directive that emphasize this theme include:

 

• 1.b. To the extent practicable and subject to valid restrictions, agencies should publish information on-line in an open format that can be retrieved, downloaded, indexed, and searched by commonly used web search applications. An open format is one that is platform independent, machine readable, and made available to the public without restrictions that would impede the re-use of that information.

• 1.c. To the extent practical and subject to valid restrictions, agencies should proactively use modern technology to disseminate useful information, rather than waiting for specific requests under FOIA.

• 1.e.ii. In cases where the agency provides public information maintained in electronic format, a plan for timely publication of the underlying data. This underlying data should be in an open format and as granular as possible, consistent with statutory responsibilities and subject to valid privacy, confidentiality, security, or other restrictions. Your agency should also identify key audiences for its information and their needs, and endeavor to publish high-value information for each of those audiences in the most accessible forms and formats.

• 3.a.i. A strategic action plan for transparency that ... (3) identifies high value information not yet available and establishes a reasonable time-line for publication online in open formats ...

 

The extraordinary impact of the new technology on the Transparency challenge is best conveyed by presenting the problem, the opportunity, and the solution.

 

The Problem

For decades, the information technology industry has refined our ability to store and retrieve information that is held on computers in a clearly formatted manner, generally referred to as structured data. Structured data carries enough context in its format to allow the computer to present, manipulate, and retrieve it in virtually any manner we have imagined. We even have tools that let us retrieve our structured data in ways that were neither anticipated nor predefined, but in an ad hoc manner.

 

Although the amount of structured information housed within our computer systems continues to increase, it is shrinking as a proportion of the total universe of computer-housed information; the remainder is unstructured information. Certainly there are advantages to housing this ever-increasing volume of unstructured information. However, information, like memory, is only of value if it can be retrieved when needed. Historically, the only way we have been able to retrieve unstructured data is by directly associating it with structured metadata created solely for the purpose of locating unstructured data objects more easily. The cost of producing this metadata is high, and it is not a comprehensive solution to the problem of finding our unstructured data. Thus we are left able to properly manage and expose only a relatively small share of the information we have collected on our computer systems, because we cannot manage unstructured information as handily as we would like.

 

The Opportunity

Recent advances in separate areas of information technology are enabling an extraordinary convergence of technological innovation that is highly relevant to the Transparency challenge of the Open Government Directive. Automated recognition of text within images and within speech has significantly improved in its ability to discern language in otherwise unstructured media, from standard sheets of paper to voice recordings to multimedia in general. Meanwhile, the explosion of the internet has brought with it the ability to crawl, index, and search enormous volumes of searchable text.

 

Unfortunately, not all text is inherently searchable. For text to be searchable on a computer, it must be encoded in a form that the computer can interpret as characters, such as ASCII or EBCDIC. However, much of our burgeoning information is stored in bit-mapped formats that can be rendered as imagetext and read by humans but, alas, cannot be understood as text by a computer, and thus is not searchable. As with an iceberg, we can clearly see the tip of the enterprise's universe of information in the form of structured data. But just as the largest part of the iceberg, which lies below the water line, is not clearly visible, so too is our transparency into unstructured data limited. Thanks to advances in search engine technology that can be applied to any searchable data, i.e. encoded text, we are able to readily peer below the water line and gain transparency into a large portion of the information iceberg.
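The distinction between encoded text and bit-mapped content can be illustrated with a minimal sketch of a word index, the basic structure behind text search. The document ids, sample text, and function names below are invented for illustration, and the recognized text is simply typed in by hand; this is not a description of any particular product or search engine.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical holdings: one document stored as encoded text...
encoded_docs = {"memo-001": "Quarterly budget report for the agency"}
# ...and one stored only as a bit-mapped scan. These bytes are just the PNG
# file signature, standing in for image data; there are no words to index.
scanned_docs = {"scan-002": b"\x89PNG\r\n\x1a\n"}

index = build_index(encoded_docs)
found_before = search(index, "budget")

# Assumption: a separate recognition step has produced this text from the scan.
encoded_docs["scan-002"] = "Transcript of the budget hearing"
index = build_index(encoded_docs)
found_after = search(index, "budget")
```

In this sketch the scanned document is invisible to the first search and appears in the second only because recognized text was supplied for it, which is the step the automated recognition technologies are meant to provide.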

 

However, there is still a very large portion of the iceberg that is not transparent. Enormous volumes of information are buried in computer files such as images, sound, and video. While the information in these files can be rendered for a person who is able to see or hear it, we cannot readily apply our new tools, such as search and analytic engines, to this vast information resource. With the convergence of advances in automated recognition and text-based tools, the potential exists to tap into vast reservoirs of knowledge and make use of the entire information iceberg. What remains is to supply the missing pieces needed to bring these technologies to bear on our unstructured data repositories.

 

The Solution

Applying these convergent technologies to our Transparency challenge still requires certain incremental advances:

 

• Automatic recognition of key words must improve slightly further to close the remaining gap in recognition accuracy.

• Automated recognition is extremely processor-intensive and thus cannot be done in real time. In fact, applying it to large volumes is a scalability challenge.

• The quantum leap in available information that is theoretically possible is so powerful that it must be pursued within the context of important privacy and security considerations.

• For universal applicability, open standards and formats are required.

Using our decades of experience managing unstructured data and some recent patent-pending innovations, we at SYSCOM, Inc. have developed a complete solution that addresses all of the above issues. SYSCOM's Imagetext Business Intelligence Gateway (IBIG) can provide comprehensive Transparency into all of the information within the “information iceberg”, in a manner that is secure, open, and scalable.

 

SYSCOM's IBIG solution directly addresses the President's Open Government Directive by using modern technology to expose and disseminate useful information to the general public and other federal agencies. As the demand for more government transparency grows, SYSCOM's IBIG solution can lead the way. Please consider the use of IBIG to support your Open Government Initiative.

Idea No. 41