MANUFACTURING community of practice

From NCOR Wiki
Revision as of 13:52, 30 September 2017 by Phismith

Open Knowledge Networks (OKN)


What is the state-of-the-art for open knowledge networks in MANUFACTURING?
What are some driving research questions that will benefit from MANUFACTURING OKN?
What are some driving commercial or consumer questions that will benefit from MANUFACTURING OKN?
What are the gaps, why do they exist, and how do we address them?
How is MANUFACTURING different from other practices (biomedical, health, GEO, finance, self-driving vehicles, etc.)?
What do we share with other domains? How can we benefit from this synergy?


MANUFACTURING breakout participants

[http://ontology.buffalo.edu/smith Barry Smith], National Center for Ontological Research

Ram Sriram, NIST

Ruchari Sudarsan, DOE

William Regli, DARPA

jm@pinterest.com, Pinterest

Existing ontologies or taxonomies from manufacturing and related domains


(Mostly public) Datasets


Use cases

Use cases span the following broad areas:

Manufacturing Capabilities (of companies, of manufacturing equipment, of sensors, ...)
Manufacturing Processes
Manufactured Products
Materials

Examples of questions the OKN methods might be able to answer:

Risk management, including forecasting and prediction (suggested by Benjamin, seconded by several others), e.g., counterparty risk or bankruptcy. This is often combined with other aspects, e.g., assessing suppliers in procurement or claims in health insurance and payment.

Who owns a given legal entity (organization, legal person, maybe an account)? This is a specialized task within Know Your Customer (KYC), with applications in regulatory compliance, accounting, procurement, conflict-of-interest checks, anti-money-laundering, legal settlements, etc. A related question is the ownership of managed funds.

Relevant sources: LEI Level 2 data; SEC Exhibit 21; Federal Reserve Regulation W, which was the subject of a FIBO Rules pilot.
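The ownership question above reduces, in its simplest form, to walking parent links until no parent remains. A minimal Python sketch, assuming the parent relationships have already been extracted into a child-to-parent map (the entity names are hypothetical; real links might come from LEI Level 2 records or Exhibit 21 listings, and the temporal dimension of ownership is not modeled):

```python
from typing import Dict

# Hypothetical child -> direct-parent links (names are made up).
DIRECT_PARENT: Dict[str, str] = {
    "Sub-A1": "Sub-A",
    "Sub-A": "HoldCo",
    "Sub-B": "HoldCo",
}


def ultimate_parent(entity: str, parents: Dict[str, str]) -> str:
    """Follow direct-parent links upward until an entity with no recorded parent."""
    seen = {entity}
    current = entity
    while current in parents:
        current = parents[current]
        if current in seen:
            # Inconsistent source data can create ownership loops.
            raise ValueError(f"ownership cycle detected at {current!r}")
        seen.add(current)
    return current


print(ultimate_parent("Sub-A1", DIRECT_PARENT))  # HoldCo
```

Raising on a cycle is a design choice: stale or conflicting filings can produce loops, and failing loudly seems safer than silently returning an arbitrary entity.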

Federation will be important overall, almost immediately, and mapping between ontologies will be central to federation.

Louiqa (summary): risk analytics, often combining financial data with other kinds of data; ownership, which has a temporal dimension; mapping between ontologies or terminologies.

Mark: we’ve worked on an internal ownership hierarchy. Shawn mentioned that the time-series aspect is important.

Louiqa: Beyond financial issues, could financial networks be used to understand other aspects?

Mike (Cafarella): yes, including economic events. Look at Twitter, as I did in some of my research on predicting unemployment, gun sales, and other things. We could predict (“recover”) such figures and test them against public data to evaluate their accuracy, or look at commercial vehicle activity. We could construct and recover various kinds of data; ownership might be a good area.

Mark: OpenCorporates. I think they make their dataset available free to researchers but sell it commercially.

Louiqa: Maybe smart and connected cities. That could be a topic at the workshop breakout session.

Louiqa: I’d like to see what role the SEC could play in moving this forward.

Benjamin: maybe in connection with the W3C effort (around 2008-2010) with Dave Raggett and Walter Hamscher to expose SEC XBRL data as linked data via RDF and SPARQL.

Louiqa: I’ve been in touch with some other people there recently.

Shawn: maybe address regulations, including public filings and the comments associated with them.

Benjamin: FIBO work has been largely directed to regulations, e.g., the FIBO Rules pilot used linked data and the FIBO ontology to automate compliance with Federal Reserve Regulation W.

Louiqa: find flows and paths in the network.
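A minimal sketch of the path-finding half of this idea, using breadth-first search over a directed adjacency list. The entity names and edge meanings are hypothetical, and analyzing flows would additionally need edge weights or capacities, which this does not model:

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical directed edges (e.g., "owns" or "lends to"); names are illustrative.
FLOW_EDGES: Dict[str, List[str]] = {
    "Fund X": ["Bank A"],
    "Bank A": ["Corp B", "Corp C"],
    "Corp C": ["Corp D"],
}


def shortest_path(graph: Dict[str, List[str]], source: str, target: str) -> Optional[List[str]]:
    """Breadth-first search; returns the path with the fewest edges, or None."""
    previous: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through predecessors to rebuild the path.
            path: List[str] = []
            step: Optional[str] = node
            while step is not None:
                path.append(step)
                step = previous[step]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


print(shortest_path(FLOW_EDGES, "Fund X", "Corp D"))
# ['Fund X', 'Bank A', 'Corp C', 'Corp D']
```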

Louiqa’s request to all: think of questions to ask the SEC folks.




What are the questions that are being asked to the data? How is the answer currently found (possibly painstaking)? Which datasets are consulted to find the answer?

  • Who owns a given entity? Who is its parent?
  • Given observations on regulatory filings (intent to modify or create rules, rule proposals, etc.), speeches, and Twitter: is regulatory capture in place? (Right now this is answered by manually going through these documents to understand the positioning of industry and government, or by tracing the history of relationships of political appointees.)
  • Who are the closest competitors to a company? (One way to answer this is to read sections of 10-Ks and compare their similarity across companies.) Then benchmark companies against their competitive set on things like investment in IT, automation, etc. (again, by scouring filings).
  • On using financial data to understand non-financial things: there was alleged insider trading at Equifax weeks before the hack became public. Could this story be generalized, i.e., could potential bad events be identified by monitoring executive trading and filings?
  • Can Mike Cafarella's research on measuring economic activity with social media be expanded to measure the housing market, inflation, the amount of money banks lend to companies and individuals, etc.? (Currently government agencies often rely on very exhaustive surveys and analysis.)
  • What data is publicly available on the operations of credit rating agencies?
  • What other forms (besides the 10-K) are available and tagged? E.g., companies have to disclose executive appointments.
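For the competitor question, the "compare similarity of 10-K sections" approach can be sketched as cosine similarity over word counts. This is a simplified stand-in, assuming short hypothetical business descriptions in place of real 10-K text; a practical version would at least remove stop words and use TF-IDF weighting:

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def term_vector(text: str) -> Counter:
    """Lowercased word counts (no stop-word removal or weighting)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical stand-ins for business-description sections of 10-Ks.
DESCRIPTIONS: Dict[str, str] = {
    "AirCo": "passenger airline flights routes fuel aircraft",
    "JetCo": "airline flights cargo aircraft fuel",
    "SoftCo": "cloud software subscriptions enterprise",
}


def closest_competitors(company: str, docs: Dict[str, str], k: int = 2) -> List[Tuple[str, float]]:
    """Rank the other companies by text similarity to `company`."""
    target = term_vector(docs[company])
    scores = [(other, cosine(target, term_vector(text)))
              for other, text in docs.items() if other != company]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


print(closest_competitors("AirCo", DESCRIPTIONS))
```

The top-ranked peers could then serve as the "competitive set" for the benchmarking step.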

TREC Complex Answer Retrieval Track

  • Given all the XBRL reports for a company, e.g., Daimler AG, can we create a Wikipedia-like page? https://en.wikipedia.org/wiki/Daimler_AG
  • Can we customize such a page to be responsive to queries from the financial sector?
  • Can we create a knowledge network of all entities and their relationships to Daimler?
  • Given all XBRL reports of companies in a specific industry (NAICS sector), can we create a Wikipedia-like page, e.g., https://en.wikipedia.org/wiki/Automotive_industry ?



Questions that can be answered:

  • Corporate registry (per state): query by name and by charter ID.
  • What is the variation in stock returns? (based on an inferred network)
  • A model of unemployment estimated from utterances on Twitter such as “I lost my job”.
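The unemployment item could start from a raw signal like the one below: weekly counts of posts matching the phrase “I lost my job”. This is only a sketch with made-up posts; turning the counts into an actual model would mean calibrating them against official statistics, which is not shown:

```python
import re
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple

JOB_LOSS = re.compile(r"\bi lost my job\b", re.IGNORECASE)

# Hypothetical posts; a real study would draw on a large social-media sample.
POSTS = [
    (date(2017, 9, 4), "Ugh, I lost my job today"),
    (date(2017, 9, 5), "Great game last night"),
    (date(2017, 9, 12), "i LOST my JOB at the plant"),
    (date(2017, 9, 13), "Just got laid off. I lost my job"),
]


def weekly_mentions(posts: Iterable[Tuple[date, str]]) -> Dict[Tuple[int, int], int]:
    """Count posts matching the job-loss phrase per ISO (year, week)."""
    counts: Counter = Counter()
    for day, text in posts:
        if JOB_LOSS.search(text):
            iso = day.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return dict(counts)


print(weekly_mentions(POSTS))
```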

Home ownership risk: an interesting discussion of dams, land set aside for a reservoir but never acquired by the Army Corps of Engineers, developers wanting to build houses on that land, a county government recording a flood-risk warning on a land plat, etc.: http://swamplot.com/how-it-came-to-pass-that-hundreds-of-families-purchased-homes-inside-houstons-reservoirs/2017-09-20/ Such issues may come up frequently as we deal with future natural disasters due to climate change. Can someone identify potential datasets for building a financial knowledge network relevant to this problem?


FEIII 2018 Scored Task

Assume we have a dataset (a financial entity knowledge graph) reported to the SEC (Exhibit 21) and a corresponding dataset reported to GLEIF (Level 2 data). The FEIII task would be to align the two graphs. We can define metrics for the type of matching, the overlap between the two datasets, mismatches, gaps, etc. The immediate outcome is an aligned open knowledge graph of companies and subsidiaries.

Comments from Mike Willis: The GLEIF Level 2 data request is for the ultimate parent company under accounting consolidation rules. This relationship may or may not be included in the Exhibit 21 ‘significant subsidiary’ listing: the ultimate parent may sit ‘above’ the filer in the consolidation accounting ‘tree’, and/or the reporting LEI entity may not be a ‘significant subsidiary’ in the filer’s accounting consolidation. The primary risk consideration in derivative counterparty funding is ensuring financial stability upon the ultimate termination or completion of the financial instrument. The accounting consolidation rules presume a ‘going concern’, and the entities included in the consolidation may or may not be the same group of entities under the filer’s ‘liquidation’ accounting.
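One way to compute the proposed overlap, mismatch, and gap metrics is to normalize entity names, represent each dataset as a set of (parent, child) edges, and compare the sets. The sketch below assumes name-based matching with a crude suffix-stripping rule and uses Jaccard similarity as the overlap score; the data are hypothetical, and a real alignment would match on identifiers such as LEIs wherever available:

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (parent, child)
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation"}


def normalize(name: str) -> str:
    """Crude normalization: lowercase, drop punctuation and common legal suffixes."""
    words = name.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def align(sec_edges: Iterable[Edge], gleif_edges: Iterable[Edge]) -> Dict[str, object]:
    """Compare two edge sets after name normalization."""
    sec = {(normalize(p), normalize(c)) for p, c in sec_edges}
    gleif = {(normalize(p), normalize(c)) for p, c in gleif_edges}
    matched = sec & gleif
    union = sec | gleif
    return {
        "matched": matched,
        "only_sec": sec - gleif,      # present in Exhibit 21 only
        "only_gleif": gleif - sec,    # present in Level 2 only
        "jaccard": len(matched) / len(union) if union else 1.0,
    }


# Hypothetical edges from an Exhibit 21 listing and from Level 2 records.
SEC_EX21 = [("Acme Corp.", "Acme Finance LLC"), ("Acme Corp.", "Acme Europe Ltd")]
GLEIF_L2 = [("ACME CORP", "Acme Finance, LLC"), ("Acme Holdings Inc", "Acme Corp")]

print(align(SEC_EX21, GLEIF_L2))
```

In this toy example the Level-2-only edge places a holding company above the filer, the situation Mike Willis describes, so it lands in `only_gleif` rather than counting as a match.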

Friday September 29 2 p.m. https://meetings.sec.gov/orion/joinmeeting.do?MTID=069f809ec5300f1f874

  • Introductions - 5 minutes
  • Overview of FEIII Challenges [Louiqa Raschid] - 5 minutes
  • TREC Complex Answer Retrieval Track [Laura Dietz] - 5 minutes
  • Demo leveraging SEC financial data / IBM Financial API
       [Doug Burdick and Marina Danilevsky, IBM]   - 10 minutes
  • Overview of SEC [Austin Gerig and Mike Willis] - 5 minutes
  • Discussion - 30 minutes


https://karsha.umiacs.umd.edu/Fin_net_demo1/


SEC Call notes



Agenda:

  • Introductions - 5 minutes
   Louiqa -- large government initiative to create open knowledge networks/graphs based on financial datasets
  	 goal of the call is to figure out what kind of things/questions are of interest to the SEC
  	 
   Austin Gerig -- Assistant Director, Division of Economic and Risk Analysis, SEC
   
   Mark Flood -- OFR research principal
   
   Shawn Mankad -- Cornell
   
   Laura Dietz -- UNH
   
   Doug Burdick -- IBM Research -- uses SEC EDGAR data.
   Marina Danilevsky -- IBM Research; will present a demo
   
   Ted Senator -- IARPA
   
   Michael Cafarella -- Umich & Apple
   
   Aparna Gupta -- RPI business school faculty, currently at the SEC on sabbatical


  • Overview of FEIII Challenges [Louiqa Raschid] - 5 minutes
   came out of the "Data Science for Macro-Modeling" workshop, where the idea of a competition for creating datasets was formed
   
   2016: entity linkage problem across RSSD IDs and LEIs; matching across addresses and entities
   
   2017: SEC 10-K filings for a small number of companies; extract the entities mentioned and classify their role and activity as they relate to the filing company
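The 2016 entity-linkage task can be illustrated with fuzzy matching on name and address. This sketch uses Python's difflib.SequenceMatcher and an equal-weight average with a 0.8 threshold; the weights, threshold, records, and placeholder identifiers are all assumptions for illustration, not the FEIII method or data:

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

Record = Tuple[str, str]  # (name, address)

# Hypothetical records; the identifiers are placeholders, not real RSSD IDs or LEIs.
RSSD_RECORDS: Dict[str, Record] = {
    "RSSD-001": ("First Example Bank", "100 Main St, Springfield"),
}
LEI_RECORDS: Dict[str, Record] = {
    "LEI-AAA": ("First Example Bank, N.A.", "100 Main Street, Springfield"),
    "LEI-BBB": ("Second Sample Trust", "9 Elm Ave, Shelbyville"),
}


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(left: Dict[str, Record], right: Dict[str, Record],
         threshold: float = 0.8) -> Dict[str, Tuple[str, float]]:
    """Map each left ID to its best-scoring right ID if the score clears `threshold`."""
    links: Dict[str, Tuple[str, float]] = {}
    for left_id, (l_name, l_addr) in left.items():
        best_id: Optional[str] = None
        best_score = 0.0
        for right_id, (r_name, r_addr) in right.items():
            # Equal weighting of name and address similarity (an arbitrary choice).
            score = 0.5 * similarity(l_name, r_name) + 0.5 * similarity(l_addr, r_addr)
            if score > best_score:
                best_id, best_score = right_id, score
        if best_id is not None and best_score >= threshold:
            links[left_id] = (best_id, best_score)
    return links


print(link(RSSD_RECORDS, LEI_RECORDS))
```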

Louiqa presents an “ego network” of financial companies, e.g., Morgan Stanley connected to a number of other entities, including its subsidiaries. It is an interactive figure with text summaries on the edges.
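An ego network is the subgraph induced by a center entity and its neighbors within a given number of hops. A minimal sketch over labeled edges (the entities other than the center and all summaries are hypothetical; edges are treated as undirected when collecting neighbors):

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, target, text summary)

# Hypothetical labeled edges; names other than the center are made up.
RELATION_EDGES: List[Edge] = [
    ("Morgan Stanley", "MS Subsidiary 1", "subsidiary"),
    ("Morgan Stanley", "Counterparty X", "derivatives counterparty"),
    ("Counterparty X", "Fund Y", "lender"),
    ("Bank Z", "Fund Y", "custodian"),
]


def ego_network(center: str, edges: List[Edge], radius: int = 1) -> List[Edge]:
    """Edges whose endpoints both lie within `radius` hops of `center`."""
    neighbors: Dict[str, Set[str]] = {}
    for source, target, _ in edges:
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)
    reached = {center}
    frontier = {center}
    for _ in range(radius):
        frontier = {n for node in frontier for n in neighbors.get(node, set())} - reached
        reached |= frontier
    return [e for e in edges if e[0] in reached and e[1] in reached]


for source, target, summary in ego_network("Morgan Stanley", RELATION_EDGES):
    print(f"{source} -> {target}: {summary}")
```

If a graph library is acceptable, networkx provides `ego_graph` for the same purpose.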


  • TREC Complex Answer Retrieval Track [Laura Dietz] - 5 minutes

Laura discusses an information retrieval contest based on answering question-style queries, e.g., what are the subsidiaries of Bank of America? Laura studies techniques for extracting short answers to this type of question in general domains.

The end goal is to automatically generate a short report or Wikipedia-style article specific to the query.

  • Demo leveraging SEC financial data / IBM Financial API
    	[Doug Burdick and Marina Danilevsky, IBM]   - 10 minutes

Marina presenting slide deck from IBM Research.

An illustrative example is ingesting raw financial filings to answer questions such as: What is metric XX for company YY? What positions has Tim Cook held at Nike? (Board member)

Underlying this are NLP and text-mining algorithms and data integration across different data sources. The system offers REST API functionality and a natural language interface, though the latter is preliminary.

SEC filings have structured data (known schema, e.g., filing date and other metadata), semi-structured data (XBRL information; note this is not fully structured because terminology and the information presented vary a lot from company to company: "revenue" or "cloud" can mean a gazillion different things), and unstructured data (the HTML that an analyst would actually read). A lot of useful information is in the HTML, such as metrics specific to the industry or company (Delta: miles per gallon).

  • Overview of SEC [Austin Gerig and Mike Willis] - 5 minutes

Mike Cafarella (Umich/Apple): creating many economic datasets based on social media, e.g., using “I lost my job” to approximate aspects of the labor market.

Mike Willis (leads the SEC Office of Structured Disclosure):

Financial Statement and Notes data sets: all the structured data that the previous presenters were pulling from EDGAR filings. These contain information that is not available from other data aggregators and are published by the Office of Structured Disclosure on the SEC website.
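A minimal sketch of pulling one tagged value per filing out of such a data set. It assumes a tab-delimited file with a header row containing at least adsh (accession number), tag, uom, and value columns, as in the num.txt file of the SEC's financial statement data sets; the inline sample rows are made up, and the published field documentation should be checked before relying on these column names:

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO

# Made-up rows in the assumed tab-delimited layout.
SAMPLE_NUM = (
    "adsh\ttag\tuom\tvalue\n"
    "0000000000-17-000001\tRevenues\tUSD\t1000\n"
    "0000000000-17-000001\tNetIncomeLoss\tUSD\t120\n"
    "0000000000-17-000002\tRevenues\tUSD\t2500\n"
)


def facts_by_filing(handle: TextIO, tag: str) -> Dict[str, List[float]]:
    """Collect all numeric values reported for `tag`, grouped by accession number."""
    values: Dict[str, List[float]] = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["tag"] == tag and row["value"]:
            values[row["adsh"]].append(float(row["value"]))
    return dict(values)


print(facts_by_filing(io.StringIO(SAMPLE_NUM), "Revenues"))
```

Values are kept as lists because a filing can report the same tag for several periods.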

New 10-K format: an interactive document that combines the HTML and XBRL tags (called Inline XBRL), so you can do real-time topic searching (e.g., for stock compensation), navigate more efficiently, and so on. The tags are applied by the filer. Some of the definitional problems mentioned earlier involve actual accounting definitions, and a link to each definition is provided.
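Inline XBRL facts are embedded in the HTML as elements such as ix:nonFraction, with attributes naming the concept (name), the context (contextRef), and display scaling (scale). A minimal sketch using only Python's standard-library HTMLParser, with a made-up snippet; it ignores other attributes such as sign and format, as well as nested or continued facts:

```python
from html.parser import HTMLParser


class InlineXbrlFacts(HTMLParser):
    """Collect ix:nonFraction facts: concept name, context, scale, displayed text."""

    def __init__(self) -> None:
        super().__init__()
        self.facts = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, hence "contextref".
        if tag == "ix:nonfraction":
            a = dict(attrs)
            self._current = {"name": a.get("name"), "context": a.get("contextref"),
                             "scale": a.get("scale"), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "ix:nonfraction" and self._current is not None:
            self.facts.append(self._current)
            self._current = None


def numeric_value(fact):
    """Apply the scale attribute to the displayed number (sign and format ignored)."""
    return float(fact["text"].replace(",", "")) * 10 ** int(fact["scale"] or 0)


SAMPLE_HTML = ('<p>Revenue was $<ix:nonFraction name="us-gaap:Revenues" '
               'contextRef="FY2017" unitRef="usd" decimals="-6" scale="6">'
               '1,234</ix:nonFraction> million.</p>')

parser = InlineXbrlFacts()
parser.feed(SAMPLE_HTML)
print(parser.facts, numeric_value(parser.facts[0]))
```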

Austin Gerig (leads the Office of Research and Data Services):

Data ingestion and loading into different databases; HPC environments to support the SEC's economists and researchers; academic research.

  • Discussion - 30 minutes

Q for IBM from SEC: What is the state of the demo as a product? IBM: TBA; still in preliminary stages.

Louiqa: What are the top 5 questions that the SEC wants to answer?

SEC: interesting research challenges from Mike Willis:

  • Roach assessment: data quality (extensions, negative values, inappropriate element selection, etc.) versus earnings quality.
  • Hey, I’m Special: communication implications of (filing?) extension rates.
  • What did you say?: comparative sentiment analysis.
  • Definitions matter (‘boot’?): appropriateness of extensions.
  • Navigating disclosures: disclosure modeling variances across comparable companies.
  • Judge a book by its cover: presentation options and variances.
  • What not to wear: presentation choices and options; best and worst practices.
  • ‘Joe Friday’ vs. Picasso: facts versus storytelling; what do investors want?
  • Does fashion matter: trends in disclosure structures.
  • DIY Hacks: what else could we use this for?
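The extension-rate challenges can be approached by measuring what share of the elements used in a filing come from outside the standard taxonomies. A minimal sketch, assuming that any element whose namespace prefix is not in a small allow-list (us-gaap, dei, srt) counts as a filer extension; the element names are illustrative:

```python
from typing import Iterable

# Assumed allow-list of standard taxonomy prefixes; anything else is treated
# as a filer-specific extension. Adjust to the taxonomies actually in use.
STANDARD_PREFIXES = {"us-gaap", "dei", "srt"}


def extension_rate(element_names: Iterable[str]) -> float:
    """Share of distinct elements whose namespace prefix is not standard."""
    distinct = set(element_names)
    if not distinct:
        return 0.0
    custom = {n for n in distinct if n.split(":", 1)[0] not in STANDARD_PREFIXES}
    return len(custom) / len(distinct)


# Hypothetical elements from one filing; "acme" stands in for a filer prefix.
FILING = ["us-gaap:Revenues", "us-gaap:NetIncomeLoss",
          "dei:EntityRegistrantName", "acme:CloudBookings"]
print(extension_rate(FILING))  # 0.25
```

Counting distinct elements rather than individual facts is a choice; a fact-weighted rate would emphasize extensions that are used repeatedly.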

Austin Gerig: There is a disconnect between facts and storytelling; are filers being strategic in their narratives? Monitoring market trends based on information extracted from public forms (internally, the SEC merges this with confidential data), e.g., how did filers and filings respond to events? Look for outliers using supervised learning, with labels from publicly known enforcement actions against companies: train on the forms and filings to identify other companies that should be monitored more closely. This can also be done unsupervised. Open to new modeling techniques and approaches.
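The unsupervised variant could begin as simply as flagging filers whose value on some metric sits far from their peers. This sketch uses a z-score with a threshold of 2 on hypothetical per-company values; it is a simple stand-in, not the SEC's approach, and a supervised version would instead train a classifier on filings labeled with known enforcement actions:

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Hypothetical per-company metric values (e.g., an extension rate).
PEER_METRIC: Dict[str, float] = {
    "A": 0.10, "B": 0.12, "C": 0.11, "D": 0.09, "E": 0.10, "F": 0.60,
}


def flag_outliers(values: Dict[str, float], threshold: float = 2.0) -> List[Tuple[str, float]]:
    """Return (company, z-score) pairs beyond `threshold`, most extreme first."""
    data = list(values.values())
    mu = mean(data)
    sigma = pstdev(data)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    flagged = [(name, (v - mu) / sigma) for name, v in values.items()
               if abs(v - mu) / sigma > threshold]
    return sorted(flagged, key=lambda pair: abs(pair[1]), reverse=True)


print(flag_outliers(PEER_METRIC))
```

Sorting by the magnitude of the score gives a simple ranking that could feed the case-prioritization question raised below.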

Louiqa: Can the SEC help create training datasets? Austin: maybe help with forming a competition problem / sample or synthetic data.

Laura: Is there a case or detailed example that can give us a better understanding of the kinds of problems, techniques, and analysis the SEC is actually interested in? Austin: These are probably not public. Enforcement: detect fraud and instances where entities are breaking rules. The compliance division is resource-constrained, and more public information is available. How do they prioritize which cases to investigate? How can they use staff more efficiently? This goes back to the supervised and unsupervised learning models mentioned earlier.

Louiqa: Are you interested specifically in the finance industry? Austin: No. Anyone who issues stock (equities) for funding is of interest to (and within the legal purview of) the SEC.


FINANCE community of practice - SEC call
Friday September 22, 3 p.m.
Friday September 29, 2 p.m.
https://meetings.sec.gov/orion/joinmeeting.do?MTID=069f809ec5300f1f87491461c77c33af
Meeting Number: 992 336 966
No password
Audio Connection: 202-551-7000 (US/Canada), 888-732-8001 (US/Canada Toll-free)