These tutorials are permanently archived at http://help.discovertext.com/ and are also available on Textifer’s YouTube channel (linked from the URL above).

Narrated by Stu Schulman, Founder and CEO of DiscoverText, a web-based (cloud) system for performing text analysis and archiving electronic data.

The Tools and Account Management

v1: Scraping Content Off Facebook (6:32): Content is collected from Fb through the Facebook Graph API. Fb’s multi-step setup process is discussed and recommendations are made. The Dashboard is where you set privacy settings for public visibility. Step-by-step instructions walk through the Fb registration and settings pages used to set up your scrape project. The tutorial also covers the results of mining and fetching, and the ingestion results that are collected on the Archive Page.

v2: Studying Facebook Gets Easier (2:23): The DiscoverText platform allows you to add a new search to an existing archive. Using the Fb newsfeed opens other newsfeeds, whose content you can categorize into buckets. The data is placed into coded buckets that you build up over time. Peer visibility and Facebook registration are touched on, but v1 is the better guide for working through the Fb registration process correctly.

v2: Harvesting Twitter Tweets (2:44): Using the Twitter API, this tutorial shows how to import a Twitter feed, name the archive, and follow the processing and notification steps. The Tweets are archived in your DiscoverText cloud account. The import is fast and immediate, and the Tweets are cleanly archived, searchable, and sortable into your buckets.

Removing Duplicates and Clustering Near Duplicate Tools with Bulk Downloads from the Federal Docket System (6:36): The Archive Details page has a “review exact duplicates” analytic. The system compares all the documents in the selected archive, including common elements such as greetings. The Near Duplicates tool sets a similarity threshold that the sifter backend engine uses for clustering. The overall number of documents does not change; only the groupings change. The new duplicate clusters are readable and color-coded to assist with duplicate removal.
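
To make the threshold idea concrete, here is a minimal Python sketch of near-duplicate clustering. It is not the DiscoverText or sifter implementation, which the tutorial does not describe; the function names, the use of word shingles with Jaccard similarity, and the default threshold of 0.5 are all assumptions chosen for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(docs, threshold=0.5):
    """Greedily group documents whose similarity to a cluster's first
    member meets the threshold (an illustrative, hypothetical approach).

    Returns a list of clusters, each a list of document indexes.
    Every document lands in exactly one cluster, so the total number
    of documents never changes; only the grouping does.
    """
    signatures = [shingles(d) for d in docs]
    clusters = []
    for i, sig in enumerate(signatures):
        for cluster in clusters:
            representative = signatures[cluster[0]]
            if jaccard(sig, representative) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no close match: start a new cluster
    return clusters


comments = [
    "I oppose the proposed rule because it harms small farms.",
    "I oppose the proposed rule because it harms small farms.",
    "Dear Administrator, I oppose the proposed rule because it harms small farms.",
    "Please extend the comment period by thirty days.",
]

loose = cluster_near_duplicates(comments, threshold=0.5)
strict = cluster_near_duplicates(comments, threshold=0.9)
```

In this sketch, a looser threshold pulls the comment with the added greeting into the same cluster as the two exact copies, while a stricter threshold keeps it separate. That is the trade-off a clustering threshold controls.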

Manage Credentials and Visibility (1:58): How to manage your profile, personal information, and public face in DiscoverText. Green means anyone can see it. Red means that only named people can see it. Please include your URL to help DiscoverText maintain its validity as a research site.

Peers in DiscoverText (2:95): Peers and peer networks in the architecture of DiscoverText form an analytic network in the way that Fb is a social network. Peer networks aim to increase the quality of research. In your DiscoverText account you can accept or reject a peer request to share projects and the coding of data. Agreeing to be a peer is about sharing the task of text analysis, not agreeing to be a person’s friend as on a social media site.

The Methodological Approach of PCAT/QDAP and DiscoverText

The Future of eRulemaking (9:53): The development of a tool to analyze large amounts of text started four years ago. The inspiration came from the books “What Would Google Do?” (Jeff Jarvis) and “Everything is Miscellaneous” (David Weinberger). Currently over 1 million decisions are recorded on the web-based PCAT/QDAP and DiscoverText systems. PCAT/QDAP was aimed at the Federal Docket System, and DiscoverText expanded to social and other types of media (Fb, Twitter, etc.). Increasingly advanced social search tools use metadata, networks, and filtering. The future of documents is to bring them into a single database to make them easier to search and analyze.

Future tools need to deliver faster and more accurate results through duplicate detection and similar features. Future peer relations will build peer groups that securely segment peers into project groups through credentials. Coding, tagging, or labeling crosses the human/machine analytic divide through workflow. Crowdsourcing distributes rulemaking decisions across many groups and boundaries. The next big thing on the internet is machine learning. Textifer aims to enhance future machine-made decisions. Text analytics is the new way to make decisions, making e-rulemaking the next Big Thing in the future of the internet.

The Future of Text Analysis (6:18): Repeats the visuals and concepts from The Future of eRulemaking. Advanced social search techniques are used to build better qualitative and quantitative analysis. The future of search will allow unlimited collection of text, which can then be filtered and analyzed as fragments of digital data. (The new approach to data does not throw away any data, unlike classic regression theory.)

A rudimentary Tag Cloud tool is included on this page; at the time the tutorial was written it was very simple. It will become more advanced with time.
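
For readers unfamiliar with the idea, a simple tag cloud boils down to counting words and scaling each word's display size by its count. The Python sketch below illustrates that general technique only; it is not the DiscoverText tool, and the function name, the stopword list, and the font-size range are all hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real tool would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "on"}


def tag_cloud_weights(text, top_n=10, min_size=10, max_size=40):
    """Return (word, count, font_size) for the most frequent words.

    Font sizes are scaled linearly between min_size and max_size
    according to each word's count among the selected words.
    """
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    top = Counter(words).most_common(top_n)
    if not top:
        return []
    highest = top[0][1]
    lowest = top[-1][1]
    span = (highest - lowest) or 1  # avoid dividing by zero when counts tie
    return [
        (word, count, min_size + (count - lowest) * (max_size - min_size) // span)
        for word, count in top
    ]


sample = "The rule, the rule and the rule: a comment on a comment in the docket"
for word, count, size in tag_cloud_weights(sample):
    print(f"{word}: count={count}, font size={size}")
```

The most frequent word receives the largest size and the least frequent of the selected words receives the smallest, which is all a basic tag cloud needs before rendering.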
