MANUFACTURING community of practice

From NCOR Wiki
[http://ichs.ucsf.edu/open-knowledge-network/ Open Knowledge Networks] (OKN)


:What is the state-of-the-art for open knowledge networks in MANUFACTURING?


:What do we share with other domains? How can we benefit from this synergy?


'''MANUFACTURING breakout participants'''


[http://ontology.buffalo.edu/smith Barry Smith], [http://ncor.us National Center for Ontological Research]


[https://www.nist.gov/people/ram-d-sriram Ram Sriram], NIST
[https://www.linkedin.com/in/bill-regli-083552/ William Regli], DARPA


[https://www.linkedin.com/in/jmilinovich/ John Milinovich], Pinterest
'''Existing ontologies or taxonomies from manufacturing and related domains'''






'''Existing (mostly public) Datasets'''


'''Use cases'''


Use cases span the following broad areas:  


:Manufacturing Capabilities (of companies, of manufacturing equipment, of sensors, ...)
::Use case: classification of suppliers, screening to select suitable suppliers (risk mitigation in supply-chain management, for example when an accepted bidder drops out)
::In progress: scraping information on the webpages of manufacturing companies and mapping terms identified to ontologies to enable reasoning (Farhad Ameri, Collaborative agreement between NIST and Texas State)
:Can we create Wikipedia-like pages for each company from this activity?
::Manufacturing Readiness Levels (MRL) [http://www.dodmrl.com/ of interest also to DOD]
:Manufacturing Processes
:Manufactured Products  
::Existing work consists primarily of NLP-based attempts to identify emerging trends in customer needs or markets, for example from the study of Amazon product reviews
::[https://www.nist.gov/sites/default/files/documents/el/msid/16_aBarnardFeeney.pdf  Standard for the Exchange of Product Model Data (STEP)]
:::[http://ws680.nist.gov/publication/get_pdf.cfm?pub_id=902775 OntoSTEP]
::Can we convert this activity into an ontology-based OKN?
:Materials  
::Ongoing AFRL work (Clare Paul, Wright-Patterson AFB) to create MatOnto, a large materials science ontology growing out of the Materials Genome Initiative
:Workforce development
::(from OKN Finance CoI) Creating many economic datasets based on social media, e.g., “I lost my job”, to approximate something about the labor market.
:Patents
::Use case: enhanced patent search that resolves terminological inconsistencies
::[http://eil.stanford.edu/publications/sid/icegov_2012.pdf Focus on the patent system]
::[http://ieeexplore.ieee.org/document/6061369/ Retrieval of patent information]
::[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3632996/ Comparison of International Patent Classification (IPC) with MeSH]
:Robots
::Probably not enough data in the public domain to enable a useful OKN for robot use in manufacturing at this stage
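The supplier-screening use case and the work on mapping scraped company terms to ontologies can be illustrated with a minimal sketch: scraped capability terms are mapped to ontology classes, and a query for a general process then also matches suppliers that only advertise a more specific one. The taxonomy, term map, and supplier names below are hypothetical and are not taken from any existing manufacturing ontology.

```python
# Hypothetical is_a hierarchy: child process class -> parent process class.
IS_A = {
    "cnc_milling": "milling",
    "milling": "machining",
    "turning": "machining",
    "machining": "subtractive_manufacturing",
    "laser_sintering": "additive_manufacturing",
}

# Hypothetical mapping from raw terms scraped from supplier web pages to ontology classes.
TERM_MAP = {
    "cnc milling": "cnc_milling",
    "5-axis milling": "cnc_milling",
    "lathe work": "turning",
    "sls printing": "laser_sintering",
}


def ancestors(cls: str) -> set[str]:
    """Return cls together with all of its superclasses under IS_A."""
    seen = {cls}
    while cls in IS_A:
        cls = IS_A[cls]
        seen.add(cls)
    return seen


def screen_suppliers(scraped: dict[str, list[str]], required: str) -> list[str]:
    """Return suppliers with at least one mapped capability subsumed by `required`."""
    matches = []
    for supplier, terms in scraped.items():
        # Unmapped terms (e.g. "Anodizing" below) are simply ignored in this sketch.
        classes = {TERM_MAP[t.lower()] for t in terms if t.lower() in TERM_MAP}
        if any(required in ancestors(c) for c in classes):
            matches.append(supplier)
    return sorted(matches)


if __name__ == "__main__":
    scraped = {
        "Acme Precision": ["5-axis milling", "Anodizing"],
        "Beta Turning Co": ["Lathe work"],
        "Gamma Additive": ["SLS printing"],
    }
    print(screen_suppliers(scraped, "machining"))  # ['Acme Precision', 'Beta Turning Co']
```

Neither supplier above says "machining" on its pages; the match comes from the subsumption reasoning, which is the benefit of mapping terms to an ontology rather than matching strings.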


'''Examples of questions the OKN methods might be able to answer'''
Risk management, including forecasting/prediction (suggested by Benjamin, seconded by several others), e.g., counterparty risk and bankruptcy. Often combined with other aspects, e.g., supplier information in procurement or claims in health insurance/payment.
:One goal is to automatically generate short reports or Wikipedia-like articles specific to the question being queried
 
Who owns a given legal entity (organization, legal person, possibly an account)? This is a specialized Know Your Customer (KYC) task. It has applications in regulatory compliance, accounting, procurement, conflict-of-interest checks, anti-money laundering, legal settlements, etc., as well as in managed funds and ownership.
Relevant sources: LEI Level 2 data; SEC Exhibit 21; Federal Reserve Regulation W, which had a FIBO Rules pilot.
 
Federation will be important overall, almost immediately.
Mapping between ontologies will be central to federation.
 
Louiqa: Summary so far:
Risk analytics, often combining financial with other kinds of data.
Ownership.  There’s a temporal factor involved.
Mapping between ontologies or terminologies. 
 
Mark: We’ve worked on an internal ownership hierarchy. Shawn mentioned that the time-series aspect is important.
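The temporal factor in ownership can be made concrete with a minimal sketch: each ownership edge carries a validity interval, and the ultimate parent is found by following only the edges valid on a given date. The `OwnershipEdge` record, the entity names, and the one-direct-parent-at-a-time assumption are hypothetical simplifications; real sources such as GLEIF Level 2 data distinguish direct and ultimate parents and need more careful handling.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class OwnershipEdge:
    """Hypothetical record: `parent` directly controls `child` from `start` until `end`."""
    child: str
    parent: str
    start: date
    end: Optional[date] = None  # None means the relationship is still in effect


def direct_parent(edges: list[OwnershipEdge], entity: str, as_of: date) -> Optional[str]:
    """Return the direct parent of `entity` valid on `as_of` (assumes at most one)."""
    for e in edges:
        if e.child == entity and e.start <= as_of and (e.end is None or as_of < e.end):
            return e.parent
    return None


def ultimate_parent(edges: list[OwnershipEdge], entity: str, as_of: date) -> str:
    """Follow parent links valid on `as_of` until an entity with no parent is reached."""
    visited = {entity}
    current = entity
    while (parent := direct_parent(edges, current, as_of)) is not None:
        if parent in visited:  # guard against cycles in messy source data
            raise ValueError(f"ownership cycle detected at {parent!r}")
        visited.add(parent)
        current = parent
    return current


if __name__ == "__main__":
    edges = [
        OwnershipEdge("SubCo", "MidCo", date(2010, 1, 1)),
        OwnershipEdge("MidCo", "HoldCo A", date(2010, 1, 1), date(2015, 6, 30)),
        OwnershipEdge("MidCo", "HoldCo B", date(2015, 6, 30)),
    ]
    print(ultimate_parent(edges, "SubCo", date(2012, 1, 1)))  # HoldCo A
    print(ultimate_parent(edges, "SubCo", date(2016, 1, 1)))  # HoldCo B
```

The same subsidiary has different ultimate parents depending on the date asked about, which is why a snapshot-only ownership graph can give wrong answers for historical questions.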
 
Louiqa:  Beyond financial issues, could financial networks be used to understand other aspects? 
 
Mike (Cafarella): Yes, including economic events. Look at Twitter, as I did in some of my research on predicting unemployment, gun sales, and other things. We could predict (“recover”) such a quantity and test it against public data to evaluate its accuracy, or look at commercial vehicle activity. We could construct and recover various kinds of data this way. Maybe ownership would be a good area.
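The "predict, then test against public data" idea can be sketched as follows, assuming tweets are already available as (date, text) pairs: count weekly tweets containing signal phrases, then correlate that series with an official weekly series over the same weeks (for example, initial unemployment claims). The phrase list and function names are hypothetical, and a plain Pearson correlation stands in for the far more careful modeling such research would require.

```python
import math
from collections import Counter
from datetime import date

# Hypothetical signal phrases; real research would curate and validate these carefully.
PHRASES = ("i lost my job", "i got laid off")


def weekly_signal(tweets: list[tuple[date, str]]) -> dict[tuple[int, int], int]:
    """Count tweets per ISO (year, week) that contain any signal phrase."""
    counts: Counter = Counter()
    for day, text in tweets:
        if any(p in text.lower() for p in PHRASES):
            iso = day.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return dict(counts)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation; needs at least two points and non-constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def evaluate_against_official(
    signal: dict[tuple[int, int], int], official: dict[tuple[int, int], float]
) -> float:
    """Correlate the tweet signal with an official weekly series over their common weeks."""
    weeks = sorted(signal.keys() & official.keys())
    return pearson([signal[w] for w in weeks], [official[w] for w in weeks])
```

A high correlation on held-out weeks would be the minimum evidence that the recovered series tracks the real quantity; weeks with no matching tweets are absent from `signal` and are therefore skipped, which a real study would need to handle explicitly.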
 
Mark: OpenCorporates. I think they make their dataset available free to researchers but sell it for profit.
 
Louiqa:  Maybe smart and connected cities.  That could be a topic at the workshop breakout session. 
 
Louiqa: I’d like to see what role the SEC could play in moving this forward.
 
Benjamin: Maybe in connection with the W3C effort (around 2008-2010) with Dave Raggett and Walter Hamscher to expose SEC XBRL data as linked data in RDF and SPARQL.
 
Louiqa:  I’ve been in touch with some other people there recently.
 
Shawn: Maybe address regulations, including public filings and the comments associated with them.
 
Benjamin:  FIBO work has been largely directed to regulations, e.g., the FIBO Rules pilot used linked data and the FIBO ontology to automate compliance with Federal Reserve Regulation W. 
 
Louiqa:  find flows and paths in the network.
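In its simplest form, finding paths in the network is graph search. The sketch below assumes the network is given as directed (from, to) edges, such as ownership or lending relationships, and returns one shortest path by breadth-first search; the entity names are hypothetical. Flows would additionally need edge weights (e.g., exposure amounts) and a max-flow algorithm, which this sketch does not attempt.

```python
from collections import deque
from typing import Optional


def shortest_path(edges: list[tuple[str, str]], source: str, target: str) -> Optional[list[str]]:
    """Return one shortest directed path from source to target, or None if unreachable."""
    graph: dict[str, list[str]] = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    prev: dict[str, Optional[str]] = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk predecessor links back to the source, then reverse.
            path = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    edges = [("Bank A", "Fund X"), ("Fund X", "Corp Y"), ("Bank A", "Corp Z"), ("Corp Z", "Corp Y")]
    print(shortest_path(edges, "Bank A", "Corp Y"))  # ['Bank A', 'Fund X', 'Corp Y']
```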
 
Louiqa request to all:  think of questions to ask the SEC folks.
 
 
 
 
 
 
What are the questions that are being asked to the data? How is the answer currently found (possibly painstaking)? Which datasets are consulted to find the answer?
 
Who owns a given entity? Who is its parent?
Given observations on regulatory filings (intent to modify/create rules, rule proposals, etc.), speeches, and Twitter: is regulatory capture taking place? (Right now this is answered by manually going through these various documents to understand the positions of industry and government, or by tracing the history of relationships of political appointees.)
Who are the closest competitors to a company? (One way to answer this is to read sections of 10-Ks and compare their similarity across companies.) Then benchmark companies against their competitive set on things like investments in IT, automation, etc. (again, by scouring filings).
On the topic of using financial data to understand non-financial things:  There was alleged insider trading at Equifax months before the hack became public.  Could this story be generalized, i.e., could you identify potential bad events by monitoring executive trading / filings?
Can Mike Cafarella's idea/research on measuring economic activity with social media be expanded to measure the housing market, inflation, the amount of money being lent by banks to companies/individuals, etc.? (Currently, government agencies often rely on very exhaustive surveys and analysis.)
What data is available publicly on the operations of credit rating agencies?
What other forms (other than 10-K) are available and tagged? E.g., companies have to disclose executive appointments, etc.
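The "compare 10-K sections" approach to finding close competitors can be sketched with plain bag-of-words cosine similarity, a simple stand-in for TF-IDF weighting, stopword removal, or embedding-based similarity. The section texts and company names are hypothetical placeholders; in practice one would first extract a comparable section (e.g., the Item 1 business description) from each 10-K.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; no stemming or stopword removal in this sketch."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def closest_competitors(sections: dict[str, str], company: str, k: int = 2) -> list[str]:
    """Rank other companies by similarity of their section text to `company`'s."""
    target = bag_of_words(sections[company])
    scores = [
        (cosine(target, bag_of_words(text)), name)
        for name, text in sections.items()
        if name != company
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first, ties by name
    return [name for _, name in scores[:k]]


if __name__ == "__main__":
    sections = {
        "Company A": "we manufacture electric vehicles and batteries",
        "Company B": "electric vehicles and battery charging networks",
        "Company C": "we operate restaurants and coffee shops",
    }
    print(closest_competitors(sections, "Company A", k=1))  # ['Company B']
```

Because common words such as "we" and "and" also contribute to the score here, a real analysis would down-weight them; the ranking logic stays the same.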
 
TREC Complex Answer Retrieval Track
Given all the XBRL reports for a company, e.g., Daimler AG, can we create a Wikipedia-like page?
https://en.wikipedia.org/wiki/Daimler_AG
Can we customize such a page to be responsive to queries from the financial sector?
Can we create a knowledge network of all entities and their relationships to Daimler?
Given all XBRL reports of companies in a specific industry (NAICS sector), can we create a Wikipedia-like page, e.g.,
https://en.wikipedia.org/wiki/Automotive_industry  ?
 
 
 
 
Questions that can be answered:
Corporate registry (per state) - Query by name and by charter ID
What is the variation in stock returns? (based on an inferred network)
Model of unemployment estimated from Twitter utterances such as “I lost my job”
 
Home ownership risk:
This is an interesting discussion about dams, land set aside for a reservoir that was not acquired by the Army Corps of Engineers, developers wanting to build houses on that land, a county government encoding a warning about flood risk on a land plat, etc.
http://swamplot.com/how-it-came-to-pass-that-hundreds-of-families-purchased-homes-inside-houstons-reservoirs/2017-09-20/
Such issues may come up frequently as we deal with future natural disasters due to climate change. Can someone identify potential datasets to create a financial knowledge network that is relevant to this problem?
 
 
FEIII 2018 Scored Task
Assume that we have a dataset (a financial entity knowledge graph) that has been reported to the SEC (Exhibit 21) and a corresponding dataset that has been reported to GLEIF (Level 2 data). The FEIII task would be to align the two graphs.
We can define metrics corresponding to the type of matching, overlap between the two datasets, mismatches, gaps, etc.
The immediate outcome is an aligned open knowledge graph of companies / subsidiaries.
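As a rough sketch of the alignment task and the kinds of metrics mentioned above, the code below treats each dataset as a set of (parent, subsidiary) name pairs, normalizes names crudely, and reports matches, pairs found on only one side (gaps or mismatches), and a Jaccard overlap. The suffix list, normalization rules, and choice of Jaccard as the overlap metric are assumptions for illustration; real alignment needs proper entity resolution, and, as the comments from Mike Willis point out, the two sources do not describe exactly the same relationship.

```python
import re

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "plc", "corp", "corporation", "co", "lp"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def align(sec_edges: list[tuple[str, str]], gleif_edges: list[tuple[str, str]]) -> dict:
    """Compare two sets of (parent, subsidiary) name pairs after normalization."""
    sec = {(normalize(p), normalize(c)) for p, c in sec_edges}
    gleif = {(normalize(p), normalize(c)) for p, c in gleif_edges}
    matched = sec & gleif
    union = sec | gleif
    return {
        "matched": sorted(matched),
        "sec_only": sorted(sec - gleif),      # pairs found only in the Exhibit 21 data
        "gleif_only": sorted(gleif - sec),    # pairs found only in the GLEIF Level 2 data
        "jaccard": len(matched) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    sec_edges = [("Parent Corp", "Alpha Holdings LLC"), ("Parent Corp", "Beta Ltd")]
    gleif_edges = [("PARENT CORP.", "Alpha Holdings, LLC"), ("Parent Corp", "Gamma Inc")]
    print(align(sec_edges, gleif_edges))
```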
Comments from Mike Willis
The Level 2 GLEIF data request is for the ultimate parent company under accounting consolidation rules. This relationship, determined by the accounting consolidation rules, may or may not be included in the Exhibit 21 ‘significant subsidiary’ listing. The ultimate parent may be ‘above’ the filer in the consolidation accounting ‘tree’, and/or the reporting LEI entity may not be a ‘significant subsidiary’ in the filer’s accounting consolidation.
The primary concern in derivative counterparty funding is ensuring financial stability upon the ultimate termination or completion of the financial instrument. The accounting consolidation rules presume a ‘going concern’, and the entities included within the consolidation may or may not be the same group of entities as under the filer’s ‘liquidation’ accounting.
 
Friday September 29 2 p.m.
https://meetings.sec.gov/orion/joinmeeting.do?MTID=069f809ec5300f1f874
* Introductions - 5 minutes
* Overview of FEIII Challenges [Louiqa Raschid] - 5 minutes
* TREC Complex Answer Retrieval Track [Laura Dietz] - 5 minutes
* Demo leveraging SEC financial data / IBM Financial API
        [Doug Burdick and Marina Danilevsky, IBM]  - 10 minutes
* Overview of SEC [Austin Gerig and Mike Willis]    - 5 minutes
* Discussion                                        - 30 minutes
 
 
https://karsha.umiacs.umd.edu/Fin_net_demo1/
---------------
SEC Call notes
---------------
 
 
 
Agenda:
 
* Introductions - 5 minutes
    Louiqa -- large government initiative to create open knowledge networks/graphs based on financial datasets
  goal of the call is to figure out what kind of things/questions are of interest to the SEC
 
    Austin Gerig -- Assistant director of economics and risk analysis for the SEC
   
    Mark Flood -- OFR research principal
   
    Shawn Mankad -- Cornell
   
    Laura Dietz -- UNH
   
    Doug Burdick -- IBM research  -- uses SEC EDGAR data. 
    Marina -- IBM research will present a demo
   
    Ted Senator -- IARPA
   
    Michael Cafarella -- Umich & Apple
   
    Aparna Gupta -- RPI b-school faculty and currently at SEC on sabbatical
 
 
* Overview of FEIII Challenges [Louiqa Raschid] - 5 minutes
 
    Came out of the "data science for macro modeling" workshop, where the idea of a competition to create datasets was formed
   
    2016: entity linkage problem between RSSD IDs and LEIs; matching across addresses and entities
   
    2017: SEC 10-K filings for a small number of companies; extract the entities mentioned and classify their role and activity in relation to the filing company
 
Louiqa presents an “ego network” of financial companies, e.g., Morgan Stanley connected to a number of other entities, including its subsidiaries. Interactive figure with text summaries on edges.
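An ego network like the one in the demo can be extracted from a larger edge list with a few lines of code. The sketch below assumes edges are (source, target, text summary) triples; it collects every node within `radius` hops of the ego, ignoring direction, and keeps the edges among those nodes along with their summaries. Apart from Morgan Stanley, the entity names and summaries are hypothetical.

```python
def ego_network(edges: list[tuple[str, str, str]], ego: str, radius: int = 1) -> list[tuple[str, str, str]]:
    """Return the (source, target, summary) edges among nodes within `radius` hops of `ego`.

    Edge direction is ignored when measuring distance.
    """
    neighbors: dict[str, set[str]] = {}
    for u, v, _ in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = {ego}
    frontier = {ego}
    for _ in range(radius):
        # Expand one hop outward, excluding nodes already collected.
        frontier = {n for f in frontier for n in neighbors.get(f, ())} - nodes
        nodes |= frontier
    return [(u, v, s) for u, v, s in edges if u in nodes and v in nodes]


if __name__ == "__main__":
    edges = [
        ("Morgan Stanley", "Subsidiary A", "subsidiary"),
        ("Morgan Stanley", "Bank B", "swap counterparty"),
        ("Bank B", "Bank C", "lender"),
        ("Firm X", "Firm Y", "unrelated"),
    ]
    for edge in ego_network(edges, "Morgan Stanley"):
        print(edge)
```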
 
 
* TREC Complex Answer Retrieval Track [Laura Dietz] - 5 minutes
 
Laura describes an information retrieval contest based on answering question queries, e.g., “What are the subsidiaries of Bank of America?” She studies techniques to extract short answers to this type of question in general domains.
 
The end goal is to automatically generate a short report or Wikipedia-like article specific to the question being queried.
 
* Demo leveraging SEC financial data / IBM Financial API
    [Doug Burdick and Marina Danilevsky, IBM]  - 10 minutes
 
Marina presenting slide deck from IBM Research.
 
The illustrative example ingests raw financial filings to answer questions such as “What is the XX metric for company YY?” or “What positions has Tim Cook held at Nike?” (Board member.)
 
Underlying this are NLP and text-mining algorithms and data integration across different data sources, along with REST API functionality and a natural-language interface, though the latter is preliminary.
 
SEC filings have structured content (known schema, e.g., filing date and other metadata), semi-structured content (XBRL information; note this is not fully structured because there is a lot of variability from company to company in terminology, since “revenue” or “cloud” can mean many different things, and in the information presented), and unstructured content (the HTML that an analyst would actually read). A lot of useful information is in the HTML, such as metrics that are specific to the industry or company (Delta: miles per gallon).
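One way to cope with this terminological variability, in line with the earlier point about mapping between ontologies or terminologies, is an explicit, curated mapping from filer-chosen element names to shared concepts, with unmapped elements surfaced for review. In the sketch below, `us-gaap:Revenues` and `us-gaap:SalesRevenueNet` are US GAAP taxonomy element names, while the `compb:` extension element, the company names, and the mapping itself are hypothetical; deciding whether two elements really mean the same thing is exactly the judgment such a mapping has to encode.

```python
# Hypothetical curated mapping from filer-chosen element names to shared concepts.
CONCEPT_MAP = {
    "us-gaap:Revenues": "Revenue",
    "us-gaap:SalesRevenueNet": "Revenue",
}


def map_facts(
    facts: list[tuple[str, str, float]],
) -> tuple[dict[str, dict[str, float]], list[tuple[str, str]]]:
    """Group (company, element, value) facts by shared concept; return unmapped elements too."""
    by_concept: dict[str, dict[str, float]] = {}
    unmapped: list[tuple[str, str]] = []
    for company, element, value in facts:
        concept = CONCEPT_MAP.get(element)
        if concept is None:
            unmapped.append((company, element))  # e.g. company-specific extension elements
        else:
            by_concept.setdefault(concept, {})[company] = value
    return by_concept, unmapped


if __name__ == "__main__":
    facts = [
        ("Company A", "us-gaap:Revenues", 100.0),
        ("Company B", "us-gaap:SalesRevenueNet", 80.0),
        ("Company B", "compb:FuelEfficiencyMetric", 75.0),
    ]
    comparable, review = map_facts(facts)
    print(comparable)  # {'Revenue': {'Company A': 100.0, 'Company B': 80.0}}
    print(review)      # [('Company B', 'compb:FuelEfficiencyMetric')]
```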
 
* Overview of SEC [Austin Gerig and Mike Willis] - 5 minutes
 
Mike Cafarella (Umich/Apple):
creating many economic datasets based on social media, e.g., “I lost my job”, to approximate something about the labor market.
 
Mike Willis (leads the SEC Office of Structured Disclosure):

Financial Statement and Notes data sets: all the structured data that others have previously been pulling and creating from EDGAR filings. These contain information that is not available from other data aggregators and are published by the Office of Structured Disclosure on the SEC website.

A new 10-K format works like an interactive document that combines the HTML and XBRL tags (called Inline XBRL), so you can do real-time topic searches (e.g., for stock compensation), navigate more efficiently, and so on.
The underlying tags are applied by the filer.
Some of the definitional problems mentioned previously are actual accounting definitions, and a link to the definition is provided.
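To illustrate topic search over filer-applied tags, the sketch below parses a tiny, hand-written Inline XBRL fragment with Python's standard library and returns the tagged numeric facts whose element name contains a keyword. The fragment is hypothetical and heavily trimmed (a real Inline XBRL filing also carries an `ix:header` with contexts and units); the `ix` namespace URI and the `ix:nonFraction` element with `name` and `contextRef` attributes follow the Inline XBRL 1.1 specification.

```python
import xml.etree.ElementTree as ET

IX_NS = "http://www.xbrl.org/2013/inlineXBRL"

# Hypothetical, trimmed Inline XBRL fragment for illustration only.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL">
  <body>
    <p>Stock-based compensation expense was
      $<ix:nonFraction name="us-gaap:ShareBasedCompensation" contextRef="FY2017"
         unitRef="usd" decimals="-6" scale="6">1,234</ix:nonFraction> million.</p>
    <p>Revenue was
      $<ix:nonFraction name="us-gaap:Revenues" contextRef="FY2017"
         unitRef="usd" decimals="-6" scale="6">9,876</ix:nonFraction> million.</p>
  </body>
</html>"""


def search_tagged_facts(xhtml: str, keyword: str) -> list[tuple[str, str, str]]:
    """Return (element name, context, displayed value) for tagged facts matching keyword."""
    root = ET.fromstring(xhtml)
    hits = []
    for el in root.iter(f"{{{IX_NS}}}nonFraction"):
        name = el.get("name", "")
        if keyword.lower() in name.lower():
            hits.append((name, el.get("contextRef", ""), (el.text or "").strip()))
    return hits


if __name__ == "__main__":
    print(search_tagged_facts(SAMPLE, "compensation"))
    # [('us-gaap:ShareBasedCompensation', 'FY2017', '1,234')]
```

The displayed value is returned as text; turning it into a number would also require applying the `scale` and `format` attributes, which this sketch leaves out.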
 
Austin Gerig (leads the Office of Research and Data Services):
 
Data ingestion and loading into different databases
HPC environments to support their economists / researchers
Academic research
 
* Discussion                                    - 30 minutes
 
Q for IBM from SEC:  What is the state of the demo as far as a product?
IBM A:  TBA; still preliminary stages.
 
Louiqa: What are the top 5 questions that the SEC wants to answer?
SEC:
Interesting research challenges from Mike Willis
Roach assessment: Data Quality (extensions, negative values, inappropriate element selection, etc.) versus earnings quality.
Hey I’m Special: Communication implications of (filing?) extension rates.
What did you say?: Comparative sentiment analysis.
Definitions matter - ‘boot’?: Appropriateness of extensions.
Navigating disclosures:  Disclosure modeling variances across comparable companies.
Judge a book by its cover: Presentation options and variances.
What not to wear:  Presentation choices and options – best and worst practices.
‘Joe Friday’ vs Picasso: Facts versus storytelling – what do investors want?
Does fashion matter:  Trends in disclosure structures.
DIY Hacks: What else could we use this for?
 
Austin Gerig
Disconnect between facts and storytelling: are filers being strategic in their narratives?
Monitoring trends in markets based on information extracted from public forms (internally, the SEC merges this with confidential data), e.g., how did filers/filings respond to events?
Look for outliers using supervised learning, with publicly known enforcement or other actions against a company as labels: train on the forms and filings to ultimately identify other companies that should be monitored more closely. This can also be done unsupervised. Open to new modeling techniques and approaches.
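The unsupervised variant can be sketched very simply: compute one numeric feature per filer and flag filers whose z-score exceeds a threshold. Here the feature is assumed to be a hypothetical rate of custom XBRL extension elements, echoing Mike Willis's "appropriateness of extensions" topic; the feature, threshold, and filer names are all illustrative assumptions. A supervised version would instead train a classifier on filings labeled by publicly known enforcement actions.

```python
import statistics


def zscore_outliers(feature_by_filer: dict[str, float], threshold: float = 2.0) -> list[str]:
    """Flag filers whose feature lies more than `threshold` population SDs from the mean."""
    values = list(feature_by_filer.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:  # all filers identical: nothing stands out
        return []
    return sorted(f for f, v in feature_by_filer.items() if abs(v - mean) / sd > threshold)


if __name__ == "__main__":
    # Hypothetical extension-element rates per filer.
    rates = {
        "filer_a": 0.10, "filer_b": 0.11, "filer_c": 0.09, "filer_d": 0.10, "filer_e": 0.12,
        "filer_f": 0.08, "filer_g": 0.10, "filer_h": 0.11, "filer_i": 0.09, "filer_j": 0.60,
    }
    print(zscore_outliers(rates))  # ['filer_j']
```

A single extreme value inflates the standard deviation it is measured against, so with very few filers nothing may be flagged; robust alternatives (median and median absolute deviation) are a common refinement.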
 
Louiqa:  Can the SEC help create training datasets? 
Austin:  maybe help with forming a competition problem / sample or synthetic data.
 
Laura: Is there a case or detailed example that can give us a better understanding of the kinds of problems, techniques, and type of analysis that the SEC is actually interested in?
Austin: These are probably not public. Enforcement: detect fraud, detect instances where entities are breaking rules.
The compliance division is resource-constrained, while more public information is available. How do they prioritize which cases to investigate? How can they use staff more efficiently? This goes back to the supervised and unsupervised learning models brought up previously.
 
Louiqa: Are you interested specifically in the finance industry?
Austin: No. Anyone who issues stock (equities) for funding is of interest to (and within the legal purview of) the SEC.


'''Methodology: Federation with mappings or coordinated development?'''
:Role of Industry Ontology Foundry


FINANCE community of practice - SEC call
'''What open data already exist?'''
Friday September 22 3 p.m.
:What are the questions that are being asked to the data? How is the answer currently discovered? Which datasets are consulted to find the answer?
Friday September 29 2 p.m.
https://meetings.sec.gov/orion/joinmeeting.do?MTID=069f809ec5300f1f87491461c77c33af
Meeting Number: 992 336 966 No password
Audio Connection 202-551-7000 (US/Canada) 888-732-8001 (US/Canada Toll-free)

Revision as of 16:10, 30 September 2017

Open Knowledge Networks (OKN)

What is the state-of-the-art for open knowledge networks in MANUFACTURING?
What are some driving research questions that will benefit from MANUFACTURING OKN?
What are some driving commercial or consumer questions that will benefit from MANUFACTURING OKN?
What are the gaps, why do they exist, and how do we address them?
How is MANUFACTURING different from other practices (biomedical, health, GEO, finance, self-driving vehicles, etc.)?
What do we share with other domains? How can we benefit from this synergy?

MANUFACTURING breakout participants

Barry Smith, National Center for Ontological Research

Ram Sriram, NIST

Ruchari Sudarsan, DOE

William Regli, DARPA

John Milinovich, Pinterest


Existing ontologies or taxonomies from manufacturing and related domains

Existing (mostly public) Datasets

Use cases

Use cases span the following broad areas:

Manufacturing Capabilities (of companies, of manufacturing equipment, of sensors, ...)
Use case: classification of suppliers, screening to select suitable suppliers (risk mitigation in supply-chain management, for example when an accepted bidder drops out)
In progress: scraping information on the webpages of manufacturing companies and mapping terms identified to ontologies to enable reasoning (Farhad Ameri, Collaborative agreement between NIST and Texas State)
Can we create Wikipedia-like pages for each company from this activity?
Manufacturing Readiness Levels (MRL) of interest also to DOD
Manufacturing Processes
Manufactured Products
Existing work consists primarily of NLP-based attempts to identify emerging trends in customer needs or markets, for example from the study of Amazon product reviews
Standard for the Exchange of Product Model Data (STEP)
OntoSTEP
Can we convert this activity into an ontology-based OKN?
Materials
Ongoing AFRL work (Clare Paul, Wright-Patterson AFB) to create MatOnto, a large materials science ontology growing out of the Materials Genome Initiative
Workforce development
(from OKN Finance CoI) Creating many economic datasets based on social media, e.g., “I lost my job”, to approximate something about the labor market.
Patents
Use case: enhanced patent search that resolves terminological inconsistencies
Focus on the patent system
Retrieval of patent information
Comparison of International Patent Classification (IPC) with MeSH
Robots
Probably not enough data in the public domain to enable a useful OKN for robot use in manufacturing at this stage

Examples of questions the OKN methods might be able to answer

One goal is to automatically generate short reports or Wikipedia-like articles specific to the question being queried

Methodology: Federation with mappings or coordinated development?

Role of Industry Ontology Foundry

What open data already exist?

What are the questions that are being asked to the data? How is the answer currently discovered? Which datasets are consulted to find the answer?