MonsterDB – Fuzzy Commands

Fuzzy Commands

Fuzzy commands configure the fuzzy capability of the installation and of a collection. Examples include defining match and fuzzy indexes, and influencing the controlled vocabulary system (nicknames, synonyms, antonyms, etc.).

saveTable

usage from API:
  cursor = aCollection.saveTable(document)
usage from CLI: 
  db.aCollection.saveTable(document) 

The document represents a table (a record type structure). Refer to the Table definitions on the schema documentation page: https://entitystream.com/entitystream-schema-instructions/

db.test.saveTable({tableName: "testT", columns: [{colName: "ID", display: "Identifier", primaryKey: true},{colName: "Desc", display: "Description", labelPos:1}]})

saveConceptGroup

usage from API:
  cursor = aCollection.saveConceptGroup(document)
usage from CLI: 
  db.aCollection.saveConceptGroup(document) 

The document represents a purpose (Conceptual Group). Refer to the Concept Groups definitions on the schema documentation page: https://entitystream.com/entitystream-schema-instructions/

db.test.saveConceptGroup({purposeName: "CoName", purposeType: "None" })

saveConcept

usage from API:
  cursor = aCollection.saveConcept(document)
usage from CLI: 
  db.aCollection.saveConcept(document) 

The document represents a concept within a purpose (Concept). Refer to the Concept definitions on the schema documentation page: https://entitystream.com/entitystream-schema-instructions/

db.test.saveConcept({ purposeName: "CoName", column:  "CoName", matchClass:  "MatchCompanyName", mandatory: true, minWidth:1, maxWidth:3})

saveConceptMapping

usage from API:
  cursor = aCollection.saveConceptMapping(document)
usage from CLI: 
  db.aCollection.saveConceptMapping(document) 

The document represents a purpose-to-table mapping (Concept Mapping). Refer to the Concept Mapping definitions on the schema documentation page: https://entitystream.com/entitystream-schema-instructions/

db.test.saveConceptMapping({purposeName: "CoName", purposeColumn:  "CoName",  tableName:  "testT", tableColumn:  "Desc", columnOrder: 0})

saveMatchRule

usage from API:
  cursor = aCollection.saveMatchRule(document)
usage from CLI: 
  db.aCollection.saveMatchRule(document) 

The document represents a matching rule (there can be many) that scores two records and decides the possible outcome. Refer to the Match Rule definitions on the schema documentation page: https://entitystream.com/entitystream-schema-instructions/

db.test.saveMatchRule({order: 0, rulePurpose: [ { purposeName: "CoName", mandatory: true, acceptWeight: 1.0, rejectWeight: 1.2 } ], matchSameSystem: true, highScore: 95.0, lowScore: 60.0, active: true})
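As a rough illustration of how a rule's highScore and lowScore could turn a pair score into an outcome, here is a minimal Python sketch. It is not MonsterDB's implementation: the three outcome labels, the inclusive boundaries and the classify function are assumptions made for this example, and the page does not state how the weights feed into the score.

```python
def classify(score, low_score=60.0, high_score=95.0):
    """Map a pair score to an outcome label.

    The labels and the inclusive thresholds are hypothetical; only the
    lowScore/highScore values come from the example rule on this page.
    """
    if score >= high_score:
        return "match"
    if score >= low_score:
        return "possible match"
    return "no match"


# Three made-up scores, one per band.
outcomes = [classify(s) for s in (97.5, 72.0, 41.0)]
print(outcomes)
```

Under this reading, pairs scoring between the two thresholds are the ones left undecided by the rule.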

saveFuzzyIndex

usage from API:
  cursor = aCollection.saveFuzzyIndex(document)
usage from CLI: 
  db.aCollection.saveFuzzyIndex(document) 

The document represents a bucketing index (there can be more than one, on different fields) that identifies potential candidate records to be compared. Refer to the Index definitions on the schema documentation page: https://entitystream.com/entitystream-schema-instructions/

db.test.saveFuzzyIndex({indexName: "CoIndex", purposeName: "CoName", match: true, search: true, exactIndex: false, keyThreshold: 1000})
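To show what bucketing means in general, the following Python sketch groups records by a key and only pairs up records that share a bucket. The key function (first three alphanumeric characters) and the sample records are invented for this example; how MonsterDB derives its own keys, and what keyThreshold controls, is not described on this page.

```python
from collections import defaultdict
from itertools import combinations


def bucket_key(name):
    # Hypothetical key: first three alphanumeric characters, lowercased.
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum())
    return cleaned[:3]


def candidate_pairs(records):
    """Return the id pairs that fall into the same bucket."""
    buckets = defaultdict(list)
    for rec_id, name in records:
        buckets[bucket_key(name)].append(rec_id)
    pairs = []
    for ids in buckets.values():
        # Only records inside one bucket are ever compared with each other.
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


# Made-up sample data.
records = [(1, "Acme Ltd"), (2, "ACME Limited"), (3, "Bolt Corp"), (4, "Acme Inc")]
pairs = candidate_pairs(records)
print(pairs)
```

Without bucketing, four records give six pairs to score; here only the three pairs among the "acm" records are candidates.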

deleteFuzzyIndex

usage from API:
  cursor = aCollection.deleteFuzzyIndex(String)
usage from CLI: 
  db.aCollection.deleteFuzzyIndex(string) 

The string is the indexName parameter provided in the index definition.

db.test.deleteFuzzyIndex("CoIndex")

deleteTable

usage from API:
  cursor = aCollection.deleteTable(String)
usage from CLI: 
  db.aCollection.deleteTable(string) 

The string is the tableName parameter provided in the table definition.

db.test.deleteTable("testT")

deleteConceptGroup

usage from API:
  cursor = aCollection.deleteConceptGroup(String)
usage from CLI: 
  db.aCollection.deleteConceptGroup(string) 

The string is the purposeName parameter provided in the concept group definition.

db.test.deleteConceptGroup("CoName")

deleteConcept

usage from API:
  cursor = aCollection.deleteConcept(String)
usage from CLI: 
  db.aCollection.deleteConcept(string) 

The string is the purposeColumn parameter provided in the concept definition.

db.test.deleteConcept("CoName")

deleteConceptMapping

usage from API:
  cursor = aCollection.deleteConceptMapping(document)
usage from CLI: 
  db.aCollection.deleteConceptMapping(document) 

The document represents an abbreviated purpose-to-table mapping (Concept Mapping). Only the fields shown below are required to locate and delete the mapping. Refer to the Concept Mapping definitions on the schema documentation page: https://entitystream.com/entitystream-schema-instructions/

db.test.deleteConceptMapping({purposeName: "CoName", purposeColumn:  "CoName",  tableName:  "testT", tableColumn:  "Desc"})

deleteMatchRule

usage from API:
  cursor = aCollection.deleteMatchRule(long)
usage from CLI: 
  db.aCollection.deleteMatchRule(long) 

The long is the order parameter provided in the match rule definition.

db.test.deleteMatchRule(0)

removeFuzzy

usage from API:
  cursor = aCollection.removeFuzzy()
usage from CLI: 
  db.aCollection.removeFuzzy() 

This removes the fuzzy matching element of the collection and reverts it to a standard collection.

db.test.removeFuzzy()

setAutoMatch

usage from API:
  cursor = aCollection.setAutoMatch(true|false)
usage from CLI: 
  db.aCollection.setAutoMatch(true|false)

This enables or disables auto matching on saved documents (insert, save, replace, etc.) in the collection. Disabling it is useful if you don't want to auto match a merge for a while.

The default is false.

db.test.setAutoMatch(true)

peekQueue

usage from API:
  cursor = aCollection.peekQueue()
usage from CLI: 
  db.aCollection.peekQueue()

This shows the internal (persisted) work queue for auto matching. When you change a document (i.e. save, replace, insert or update) and fuzzy definitions exist on the collection, the document is immediately submitted to this queue. The queue is then run down asynchronously over time, matching the changed documents against all other documents in the collection.

db.test.peekQueue()
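The queue behaviour described above can be modelled in a few lines of Python. This is only a toy model to make the sequence concrete: the MatchQueueModel class and its method names are hypothetical, and it says nothing about how MonsterDB persists or schedules the real queue.

```python
from collections import deque


class MatchQueueModel:
    """Toy model: saves enqueue work, peeking never consumes it."""

    def __init__(self, has_fuzzy_definitions):
        self.has_fuzzy_definitions = has_fuzzy_definitions
        self._queue = deque()
        self.matched = []

    def save(self, doc_id):
        # A changed document is queued only if fuzzy definitions exist.
        if self.has_fuzzy_definitions:
            self._queue.append(doc_id)

    def peek_queue(self):
        # Return a copy so that looking at the queue does not change it.
        return list(self._queue)

    def run_down(self, max_items):
        # Stand-in for the asynchronous worker: process items in order.
        for _ in range(min(max_items, len(self._queue))):
            self.matched.append(self._queue.popleft())


model = MatchQueueModel(has_fuzzy_definitions=True)
model.save("doc-a")
model.save("doc-b")
before = model.peek_queue()
model.run_down(max_items=1)
after = model.peek_queue()
print(before, after, model.matched)
```

In this model, peeking twice in a row returns the same contents; only the run-down step shortens the queue.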

 

Putting it Together

The following script can be run from the CLI (command line) using the grr script command:

java -jar monsterDB.jar -p 27019  -d dbname -x test.grr

where test.grr contains:

db.test.saveTable({tableName: "testT", columns: [{colName: "ID", display: "Identifier", primaryKey: true},{colName: "Desc", display: "Description", labelPos:1}]})
db.test.saveConceptGroup({purposeName: "CoName", purposeType:  "None" })
db.test.saveConcept({ purposeName: "CoName", column:  "CoName", matchClass:  "MatchCompanyName", mandatory: true, minWidth:1, maxWidth:3})
db.test.saveConceptMapping({purposeName: "CoName", purposeColumn:  "CoName",  tableName:  "testT", tableColumn:  "Desc", columnOrder: 0})
db.test.saveMatchRule({order: 0, rulePurpose: [ { purposeName: "CoName", mandatory: true, acceptWeight: 1.0, rejectWeight: 1.2 } ], matchSameSystem: true, highScore: 95.0, lowScore: 60.0, active: true})
db.test.saveFuzzyIndex({indexName: "CoIndex", purposeName: "CoName", match: true, search: true, exactIndex: false, keyThreshold: 1000})
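A matching teardown script can be assembled from the delete commands documented on this page. The Python sketch below only builds the command text; it does not talk to MonsterDB. The teardown_commands and literal helpers are hypothetical, and deleting in the reverse of the creation order is an assumption, since the page does not say whether order matters.

```python
import json

COLLECTION = "test"


def literal(doc):
    """Render a flat dict in the unquoted-key style used on this page."""
    return "{" + ", ".join(f"{key}: {json.dumps(value)}" for key, value in doc.items()) + "}"


def teardown_commands(index_name, rule_order, mapping, concept_column, group_name, table_name):
    prefix = f"db.{COLLECTION}."
    # Only these four fields are needed to locate a concept mapping.
    mapping_key = {
        key: mapping[key]
        for key in ("purposeName", "purposeColumn", "tableName", "tableColumn")
    }
    # Assumed order: reverse of the creation order in the setup script.
    return [
        f"{prefix}deleteFuzzyIndex({json.dumps(index_name)})",
        f"{prefix}deleteMatchRule({rule_order})",
        f"{prefix}deleteConceptMapping({literal(mapping_key)})",
        # The page calls this argument purposeColumn.
        f"{prefix}deleteConcept({json.dumps(concept_column)})",
        f"{prefix}deleteConceptGroup({json.dumps(group_name)})",
        f"{prefix}deleteTable({json.dumps(table_name)})",
    ]


commands = teardown_commands(
    index_name="CoIndex",
    rule_order=0,
    mapping={"purposeName": "CoName", "purposeColumn": "CoName",
             "tableName": "testT", "tableColumn": "Desc", "columnOrder": 0},
    concept_column="CoName",
    group_name="CoName",
    table_name="testT",
)
print("\n".join(commands))
```

The printed lines could be saved to a second .grr file and run with the same java command shown above.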