Hi,
In the Object Store Schema documentation the field element_id in table CONTENT is described as
The unique identifier for this content element. The identifier is a UTF8 string consisting of the UUID of the owning document or annotation concatenated with an integer indicating the content element sequence number.
I find nothing of that sort, e.g. I created a document with 2 content elements (in database storage area) and get
{8060EB6B-0000-CD1C-AD63-93BF1448ED17} for the object_id in docversion and
E5467556-1BBB-431D-B8E9-02697263E0DD and 7F5D5467-4CAC-448A-8D28-EF5070E5D5F0 for the element ids.
What I find is a reference to the content elements element_id in the retrieval_names column of docversion, albeit somewhat byte-swapped.
I'm moderately successful in figuring out the reference for a document with a single content element, but fail on multiple content elements. Can anyone give a better explanation of the data model in the case of default database storage and multiple content elements ?
Why on earth would I need this? I'm chasing a case of lost documents where there is a docversion entry (db storage area) but the corresponding CONTENT entries are missing.
Thanks for the efforts,
/Gerold
Answer by NKatic (1) | Jul 24 at 07:47 AM
Hi Gerold,
I'm more used to work with FSA rather than DBSA, but I looked into your question.
Here is what I did:
Through ACCE, I created a Document with one content element. Then, I checked it out to add a second version with two content elements.
In the "CONTENT" table, I find three entries with following ELEMENT_ID:
6E3F2561-B0A2-45BE-A1A3-365E6BE1C135
EAB1A498-26EB-44AD-ADB1-7FED705CEC41
BDC5C98C-6E66-438D-888F-302AFCF516B4
In DOCVERSION, the "CONTENT_INFO" column contains:
00441D7F00000000902902000000000098A4B1EAEB26AD44ADB17FED705CEC4100441D7F010000009D0500000000000061253F6EA2B0BE45A1A3365E6BE1C135
00441D7F0000000090290200000000008CC9C5BD666E8D43888F302AFCF516B4
As you can see, you can quickly find
98A4B1EAEB26AD44ADB17FED705CEC41
61253F6EA2B0BE45A1A3365E6BE1C135
8CC9C5BD666E8D43888F302AFCF516B4
You just have to do use the classic permutation pattern between (in my case) oracle GUID and FileNet ID without brackets to find your content elements in CONTENT table. I didn't try to figure out what is "00441D7F000000009029020000000000" in my case, note sure you'll need it.
Hope this helps. (I did multiple edit, sorry about that, I had hard time with bold and italic so I switched to list instead)
Best regards,
Nathan
Answer by GKrommer (71) | Jul 29 at 06:20 PM
Hi @NKatic ,
thanks for the effort. Interestingly my system (V5.5.3 Oracle DB, but I do not think that matters) behaves completely different.
I added a document with a single content element and looked at the corresponding database row.
CONTENT_INFO and CONTENT_INFO_2 are null (as are COMPONENT_TYPES and COMPONENT_TYPES_2 columns if that matters)
In CONTENT an ELEMENT of ID E64870DA-1850-8C43-58AD5FC4CA54 was created
RETRIEVAL_NAMES was set to 4F4E430002000F6170706C69636174696F6E2F706466000E50617463685F345F41342E706466010001000006615E0ADA70//48E65018654A8C4358AD5FC4CA54//04
When convert that hex to string I get ONCapplication/pdfPatch_4_A4.pdfa^ �pH�PeJ�CX�_��T
So ONC is a kind of header more or less followed the mime type, by a number that is the count of content elements+1 followed by the retrieval name followed by the byte swapped (but other byte swap than the GUID stuff) set in //.
I'm stumped :-(
Answer by NKatic (1) | Jul 30 at 04:45 AM
Hi @GKrommer ,
I can see the element_id reference in your retrieval_names field, but as a matter of fact it is weirdly dipatched before and between "//":
DA70//48E65018*654A*8C4358AD5FC4CA54//
I don't know what the 654A means in your context.
Can you provide a sample with a two content elements document?
Best regards,
Nathan
Answer by GKrommer (71) | Jul 30 at 05:54 PM
Of course I can @NKatic ,
Created a document and checked it in with 2 content elements. Results
CONTENT_REFERRAL_BLOB, COMPONENT_TYPES, COMPONENT_TYPES_2, RETRIEVAL_NAMES_2 , CONTENT_INFO and CONTENT_INFO_2 are all null
RETRIEVAL_NAMES is much longer now and contains (* are delimiter inserted by me)
4F4E430003000F6170706C69636174696F6E2F706466000E50617463685F345F41342E70646600245061746368657320666F72205072696E74696E67206F6E2041342050617065722E706466020001000006615E*0A*2FC4F85554DBD143A8C826BD33406537*04000200010700025C4A*0A*2FD8E1595AB64F4FAD89E9EAC82B2352*04 3. Two rows were created with element ids 59E1D82F-B65A-4F4F-AD89-E9EAC82B2352 and 55F8C42F-DB54-43D1-A8C8-26BD33406537
RETREIVAL_NAMES is converted from hex to string
ONCapplication/pdfPatch_4_A4.pdf$Patches for Printing on A4 Paper.pdfa^ ^|&:mrG±Ñâ¤#i\J xy
Answer by GKrommer (71) | Jul 30 at 05:58 PM
This is a sketch of what I tried to find out document with missing content in database storage area. It works if the document has only one content element....
select object_ID,create_date,U32_DOCUMENTTITLE from DOCVERSION where storage_area_id ='5C885A0ADC74464989BB54393868E097' and substr (retrieval_names, -28, 2) || substr (retrieval_names, -30, 2) || substr (retrieval_names, -32, 2) || substr (retrieval_names, -34, 2) || '-' || substr (retrieval_names, -24, 2) || substr (retrieval_names, -26, 2) || '-' || substr (retrieval_names, -20, 2) || substr (retrieval_names, -22, 2) || '-' || substr (retrieval_names, -18, 4) || '-' || substr (retrieval_names, -14, 12) NOT In (SELECT ELEMENT_ID FROM CONTENT)