Roy Tang

roytang.net

Programmer, engineer, scientist, critic, gamer, dreamer, and kid-at-heart.

2017 November

  • I've been studying the examples here: https://docs.marklogic.com/guide/semantics/tde#id_25531

    I have a set of documents that are structured with a parent name and an array of children nodes with their own names. I want to create a template that generates triples of the form "name1 is-a-parent-of name2". Here's a test I tried, with a sample of the document structure:

    declareUpdate();
    
    xdmp.documentInsert(
           '/test/tde.json',
           {
             content: {
               name:'Joe Parent',
               children: [
                 {
                   name: 'Bob Child'
                 },
                 {
                   name: 'Sue Child'
                 }
               ]
             }
           },
           {permissions : xdmp.defaultPermissions(),
            collections : ['test']})
    
    cts.doc('/test/tde.json')
    
    var tde = require("/MarkLogic/tde.xqy");
    // Define a template that extracts is-parent-of triples from documents in the test collection
    var template = xdmp.toJSON(
    {
      "template":{
        "context":"content",
        "collections": [
          "test"
        ],
          "triples":[
          {
            "subject": {
              "val": "xs:string(name)"
            },
            "predicate": {
              "val": "sem:iri('is-parent-of')"
            },
            "object": {
              "val": "xs:string(children/name)"
            }     
          }
        ]   
      }
    }
    );
    //tde.validate([template]),
    tde.templateInsert("/templates/test.tde", template);
    tde.nodeDataExtract( 
      [cts.doc( '/test/tde.json' )]
    )
    

    However, the above throws an exception:

    [javascript] TDE-EVALFAILED: tde.nodeDataExtract([cts.doc("/test/tde.json")]) -- Eval for Object='xs:string(children/name)' returns TDE-BADVALEXPRESSION: Invalid val expression: XDMP-CAST: (err:FORG0001) Invalid cast: (fn:doc("/test/tde.json")/content/array-node("children")/object-node()[1]/text("name"), fn:doc("/test/tde.json")/content/array-node("children")/object-node()[2]/text("name")) cast as xs:string?

    What is the proper syntax for extracting array nodes into a triple?

    A second, somewhat related question: say I also wanted triples of the form "child1 is-sibling-of child2". For the example above, that would be "Bob Child is-sibling-of Sue Child". What would be the proper syntax for this? I'm not even sure where to begin with this one.

    Is TDE even the way to go here? Or is it better to do this programmatically? i.e. on document ingestion, generate those triples inside the document directly?

    (If it's relevant, the ML version being used is 9.)
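
    As a rough illustration of the programmatic alternative mentioned above (not a TDE template), here is a minimal sketch in plain JavaScript that derives both kinds of triples from the sample document shape. The function names and the plain-object triple format are assumptions made for illustration; inside MarkLogic the results would still need to be turned into real triples and stored.

```javascript
// Sketch only: plain JavaScript, no MarkLogic APIs.
// parentTriples and siblingTriples are hypothetical helper names.

// One "parent is-parent-of child" triple per entry in content.children.
function parentTriples(content) {
  const children = content.children || [];
  return children.map((child) => ({
    subject: content.name,
    predicate: 'is-parent-of',
    object: child.name,
  }));
}

// One "child1 is-sibling-of child2" triple per unordered pair of children.
// The reverse direction (child2 is-sibling-of child1) is not emitted.
function siblingTriples(content) {
  const names = (content.children || []).map((child) => child.name);
  const triples = [];
  for (let i = 0; i < names.length; i++) {
    for (let j = i + 1; j < names.length; j++) {
      triples.push({
        subject: names[i],
        predicate: 'is-sibling-of',
        object: names[j],
      });
    }
  }
  return triples;
}

// Same structure as the sample document inserted above.
const doc = {
  content: {
    name: 'Joe Parent',
    children: [{ name: 'Bob Child' }, { name: 'Sue Child' }],
  },
};

const triples = parentTriples(doc.content).concat(siblingTriples(doc.content));
console.log(JSON.stringify(triples, null, 2));
```

    The sibling loop grows quadratically with the number of children, which is fine for small arrays but worth keeping in mind for large ones.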

  • MarkLogic Optic API

    I've been testing migrating one of our systems to MarkLogic 9 and using the Optic API.

    One of our functions involves grouping claims by member_id, member_name and getting the sums and counts, so I did something like this:

    var results = op.fromView('test', 'claims')
      .groupBy(['member_id', 'member_name'], [
             op.count('num_claims', 'claim_no'),
             op.sum('total_amount', 'claim_amount')
             ])
      .orderBy(op.desc('total_amount'))
      .limit(200)
      .result()
      .toArray();
    

    The above works fine. The results are of the form:

    [
      { 
        member_id: 1, 
        member_name: 'Bob', 
        num_claims: 10, 
        total_amount: 500
      }, 
      ...
    ]
    

    However, we also have a field "company", where each claim is filed under a different company. Basically, the relevant view columns are claim_no, member_id, member_name, company, and claim_amount.

    I would like to be able to show a column that lists the different companies for which the member_id/member_name has filed claims, and how many claims there are for each company.

    i.e. I want my results to be something like:

    [
      { 
        member_id: 1, 
        member_name: 'Bob', 
        num_claims: 10, 
        total_amount: 500,
        companies: [
          {
            company: 'Ajax Co',
            num_claims: 8
          },
          {
            company: 'Side Gig',
            num_claims: 2
          }
        ]
      }, 
      ...
    ]
    

    I tried something like this:

    results = results.map((member, index, array) => {
      var companies = op.fromView('test', 'claims')
        .where(op.eq(op.col('member_id'), member.member_id))
        .groupBy('company', [
          op.count('num_claims', 'claim_no')      
        ])
        .result()
        .toArray();
      member.companies = companies;
      return member;
    });
    

    The output seems correct, but it also executes quite slowly: almost a minute (the total number of claim documents is around 120k).

    In our previous ML8 implementation, we were pre-generating summary documents for each member, so retrieval was reasonably fast, with the downside that whenever we got a batch of new data, all of the summary documents had to be regenerated. I was hoping that ML9's Optic API would make it easier to do the retrieval, grouping, and aggregation on the fly so we wouldn't have to do that.

    In theory, I could just add company to the groupBy fields, then merge the rows in the result as needed. But the problem with that approach is that I can't guarantee I'll get the top 200 by total amount (as in my original query).

    So, the question is: Is there a better way of doing this with a reasonable execution time? Or should I just stick to pre-generating the summary documents?
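
    For reference, the "add company to the groupBy fields, then merge" idea could be sketched in plain JavaScript as below. This assumes the grouped query is run without a limit, so that the sort and the top-N cut happen after merging; whether that is fast enough on 120k claims is untested. The helper name mergeByMember, the per-company amounts, and the second member are made up for illustration.

```javascript
// Sketch only: merges rows grouped by (member_id, member_name, company)
// into one row per member, then sorts and limits in JavaScript.
// mergeByMember is a hypothetical helper, not part of the Optic API.
function mergeByMember(rows, limit) {
  const byMember = new Map();
  for (const row of rows) {
    // Assumes member_name is fully determined by member_id.
    let member = byMember.get(row.member_id);
    if (!member) {
      member = {
        member_id: row.member_id,
        member_name: row.member_name,
        num_claims: 0,
        total_amount: 0,
        companies: [],
      };
      byMember.set(row.member_id, member);
    }
    member.num_claims += row.num_claims;
    member.total_amount += row.total_amount;
    member.companies.push({ company: row.company, num_claims: row.num_claims });
  }
  // Sort by total amount descending and keep only the first `limit` members.
  return Array.from(byMember.values())
    .sort((a, b) => b.total_amount - a.total_amount)
    .slice(0, limit);
}

// Made-up sample rows, shaped like the output of a groupBy on
// member_id, member_name and company.
const rows = [
  { member_id: 1, member_name: 'Bob', company: 'Ajax Co', num_claims: 8, total_amount: 400 },
  { member_id: 1, member_name: 'Bob', company: 'Side Gig', num_claims: 2, total_amount: 100 },
  { member_id: 2, member_name: 'Ann', company: 'Ajax Co', num_claims: 3, total_amount: 900 },
];

const merged = mergeByMember(rows, 200);
console.log(JSON.stringify(merged, null, 2));
```

    The trade-off is that every (member, company) row has to be pulled into memory before the limit is applied, which is exactly the cost the original limit(200) was avoiding.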

2017 October

  • I'm trying out the Roxy deployer. The Roxy app was created using the default app-type. I set up a new ML 9 database and ran "ml local bootstrap" using the default ports (8040 and 8041).

    Then I set up a Node.js application. I tried the following (sample code from https://docs.marklogic.com/jsdoc/index.html):

    var marklogic = require('marklogic');
    var conn = {
        host: '192.168.33.10',  
        port: 8040,
        user: 'admin',
        password: 'admin',
        authType: 'DIGEST'
    }
    
    var db = marklogic.createDatabaseClient(conn);
    
    db.createCollection(
      '/books',
      {author: 'Beryl Markham'},
      {author: 'WG Sebald'}
      )
    .result(function(response) {
        console.log(JSON.stringify(response, null, 2));
      }, function (error) {
        console.log(JSON.stringify(error, null, 2));
      });
    

    Running the script gave me an error like:

    $ node test.js
    {
      "message": "write document list: cannot process response with 500 status",
      "statusCode": 500,
      "body": "<error:error xsi:schemaLocation=\"http://marklogic.com/xdmp/error error.xsd\" xmlns:error=\"http://marklogic.com/xdmp/error\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n  <error:code>XDMP-IMPMODNS</error:code>\n  <error:name>err:XQST0059</error:name>\n  <error:xquery-version>1.0-ml</error:xquery-version>\n  <error:message>Import module namespace mismatch</error:message>\n  <error:format-string>XDMP-IMPMODNS: (err:XQST0059) Import module namespace http://marklogic.com/rest-api/endpoints/config does not match target namespace http://marklogic.com/rest-api/endpoints/config_DELETE_IF_UNUSED of imported module /MarkLogic/rest-api/endpoints/config.xqy</error:format-string>\n  <error:retryable>false</error:retryable>\n  <error:expr/>\n  <error:data>\n    <error:datum>http://marklogic.com/rest-api/endpoints/config</error:datum>\n    <error:datum>http://marklogic.com/rest-api/endpoints/config_DELETE_IF_UNUSED</error:datum>\n    <error:datum>/MarkLogic/rest-api/endpoints/config.xqy</error:datum>\n  </error:data>\n  <error:stack>\n    <error:frame>\n      <error:uri>/roxy/lib/rewriter-lib.xqy</error:uri>\n      <error:line>5</error:line>\n      <error:column>0</error:column>\n      <error:xquery-version>1.0-ml</error:xquery-version>\n    </error:frame>\n  </error:stack>\n</error:error>\n"
    }
    

    If I change the port to 8000 (the default app server, which inserts into Documents), the Node.js script executes correctly as expected. I'm not sure if I need to configure anything else on the Roxy-created app server so that it works with the Node.js application.

    I'm not sure where the "DELETE_IF_UNUSED" part in the error message is coming from either. There doesn't seem to be any such text in the configuration files generated by Roxy.

    Edit: When accessing 192.168.33.10:8040 via the browser, I get an XML response with a similar error:

    <error:error xsi:schemaLocation="http://marklogic.com/xdmp/error error.xsd" xmlns:error="http://marklogic.com/xdmp/error" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <error:code>XDMP-IMPMODNS</error:code>
      <error:name>err:XQST0059</error:name>
      <error:xquery-version>1.0-ml</error:xquery-version>
      <error:message>Import module namespace mismatch</error:message>
      <error:format-string>XDMP-IMPMODNS: (err:XQST0059) Import module namespace http://marklogic.com/rest-api/endpoints/config does not match target namespace http://marklogic.com/rest-api/endpoints/config_DELETE_IF_UNUSED of imported module /MarkLogic/rest-api/endpoints/config.xqy</error:format-string>
      <error:retryable>false</error:retryable>
      <error:expr/>
      <error:data>
        <error:datum>http://marklogic.com/rest-api/endpoints/config</error:datum>
        <error:datum>http://marklogic.com/rest-api/endpoints/config_DELETE_IF_UNUSED</error:datum>
        <error:datum>/MarkLogic/rest-api/endpoints/config.xqy</error:datum>
      </error:data>
      <error:stack>
        <error:frame>
          <error:uri>/roxy/lib/rewriter-lib.xqy</error:uri>
          <error:line>5</error:line>
          <error:column>0</error:column>
          <error:xquery-version>1.0-ml</error:xquery-version>
        </error:frame>
      </error:stack>
    </error:error>
    

    If it matters, MarkLogic version is 9.0-3.1. It's a fresh install too.

    Any advice?

  • I'm trying to migrate one of my dev environments from ML8 to ML9. I have an import script that works on the ML8 version, but there's an error when I try running it against the ML9 database. The ML9 version is 9.0.3.1. The MLCP version is 9.0.3.

    My MLCP options file is as follows:

    import
    -host 
    192.168.33.10
    -port 
    8041
    -username 
    admin
    -password
    admin
    -input_file_path 
    d:\maroon\data\mbastest.csv 
    -mode 
    local 
    -input_file_type 
    delimited_text 
    -uri_id 
    ClientId 
    -output_uri_prefix
    /test/records/
    -output_uri_suffix 
    .json 
    -document_type 
    json 
    -transform_module 
    /ingestion/transform.js
    -transform_function 
    testTransform
    -transform_param
    test
    -content_encoding 
    windows-1252 
    -thread_count
    1
    

    Here's the output of a test run with only 2 records in the test CSV file:

    17/10/30 14:07:33 INFO contentpump.LocalJobRunner: Content type: JSON
    17/10/30 14:07:33 INFO contentpump.ContentPump: Job name: local_455168344_1
    17/10/30 14:07:33 INFO contentpump.FileAndDirectoryInputFormat: Total input paths to process : 1
    17/10/30 14:07:38 WARN contentpump.TransformWriter: Failed document /test/records/31.json
    17/10/30 14:07:38 WARN contentpump.TransformWriter: <error:format-string xmlns:error="http://marklogic.com/xdmp/error" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">XDMP-UNEXPECTED: (err:XPST0003) Unexpected token syntax error, unexpected QName_, expecting $end or SemiColon_</error:format-string>
    17/10/30 14:07:38 WARN contentpump.TransformWriter: Failed document /test/records/32.json
    17/10/30 14:07:38 WARN contentpump.TransformWriter: <error:format-string xmlns:error="http://marklogic.com/xdmp/error" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">XDMP-UNEXPECTED: (err:XPST0003) Unexpected token syntax error, unexpected QName_, expecting $end or SemiColon_</error:format-string>
    17/10/30 14:07:38 INFO contentpump.LocalJobRunner:  completed 100%
    17/10/30 14:07:38 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.MarkLogicCounter:
    17/10/30 14:07:38 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 2
    17/10/30 14:07:38 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 2
    17/10/30 14:07:38 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 0
    17/10/30 14:07:38 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 2
    17/10/30 14:07:38 INFO contentpump.LocalJobRunner: Total execution time: 5 sec
    

    If I remove the transform params, the import works fine.

    I thought it might be a parsing issue with my transform module itself, so I tried replacing it with the following example from the documentation:

    // Add a property named "NEWPROP" to any JSON input document.
    // Otherwise, input passes through unchanged.
    
    function addProp(content, context)
    {
      const propVal = (context.transform_param == undefined)
                     ? "UNDEFINED" : context.transform_param;
    
      if (xdmp.nodeKind(content.value) == 'document' &&
          content.value.documentFormat == 'JSON') {
        // Convert input to mutable object and add new property
        const newDoc = content.value.toObject();
        newDoc.NEWPROP = propVal;
    
        // Convert result back into a document
        content.value = xdmp.unquote(xdmp.quote(newDoc));
      }
      return content;
    };
    
    exports.addProp = addProp;
    

    (Of course, I changed the parameters in the MLCP options file accordingly.)

    The issue still persists even with just this test function.

    Any advice?