Showing posts with label list. Show all posts

Monday, January 15, 2018

Fetching the last updated items from a large list

A person standing on top of a ladder in the clouds.

I have a list with over 5000 items, and at a regular interval I want to query it for the items modified since the last poll. Typically I poll every 15 minutes with a CAML query like this:

<View Scope='RecursiveAll'>
    <Query>
        <Where>
            <And>
                <Eq>
                    <FieldRef Name='ContentType'/>
                    <Value Type='Computed'>Item2</Value>
                </Eq>
                <Geq>
                    <FieldRef Name='Modified' />
                    <Value IncludeTimeValue='True' StorageTZ='TRUE' Type='DateTime'>2018-01-15T14:24:00Z</Value>
                </Geq>
            </And>
        </Where>
    </Query>
    <RowLimit>5000</RowLimit>
</View>

This query works fine until the list reaches 5000 items, at which point you get the infamous item threshold error. In my case there will never be more than 5000 updated items within one polling interval, so the fix is quite simple. Since the query filters on the modified date, add an index for the Modified field from the list settings page. Problem solved, the queries work again. And to be sure, create your indexes up front, which makes things a lot easier.
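As a minimal sketch of the polling side, the query above can be generated for each run with a fresh timestamp. The function name, the one minute overlap and the parameter defaults are my own assumptions for illustration, not part of the original script; the overlap simply accepts that an item may come back twice.

```python
from datetime import datetime, timedelta, timezone
from xml.sax.saxutils import escape

# Same CAML view as above, with the variable parts as placeholders.
CAML_TEMPLATE = """<View Scope='RecursiveAll'>
    <Query>
        <Where>
            <And>
                <Eq>
                    <FieldRef Name='ContentType'/>
                    <Value Type='Computed'>{content_type}</Value>
                </Eq>
                <Geq>
                    <FieldRef Name='Modified' />
                    <Value IncludeTimeValue='True' StorageTZ='TRUE' Type='DateTime'>{since}</Value>
                </Geq>
            </And>
        </Where>
    </Query>
    <RowLimit>{row_limit}</RowLimit>
</View>"""


def build_modified_since_query(now, window_minutes=15, overlap_minutes=1,
                               content_type="Item2", row_limit=5000):
    """Return the CAML view XML for items modified inside the polling window.

    The overlap (an assumption) widens the window slightly so that nothing
    is missed between two runs; duplicates are acceptable here.
    """
    window = timedelta(minutes=window_minutes + overlap_minutes)
    since = now.astimezone(timezone.utc) - window
    return CAML_TEMPLATE.format(
        content_type=escape(content_type),
        since=since.strftime("%Y-%m-%dT%H:%M:%SZ"),
        row_limit=row_limit,
    )


if __name__ == "__main__":
    print(build_modified_since_query(datetime.now(timezone.utc)))
```

The resulting string is what gets passed as the view XML of the list query; how it is sent depends on the client library in use.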


An alternative approach [update]

As pointed out by iOnline247 on Twitter, an alternative approach is to use the GetChanges API for a list. The GetChanges function can take a specific query, and you get a change token back which you can pass to subsequent calls to get the items changed since your last call. This entails storing the token somewhere, which is not a big deal, but it introduces another part to the solution and a little bit of complexity. My current solution is a scheduled PowerShell script and is stateless. It doesn’t matter if I get an item twice, for example.
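To show what the extra state amounts to, here is a rough sketch of one polling round that keeps the token in a small JSON file. Note that `fetch_changes` is a hypothetical stand-in for the actual GetChanges call, and the file format is an assumption; only the store-and-reuse pattern is the point.

```python
import json
from pathlib import Path


def poll_changes(fetch_changes, state_file):
    """Run one polling round and persist the change token for the next one.

    fetch_changes is a hypothetical callable standing in for the real
    GetChanges request: it takes the previous token (None on the first
    run) and returns a tuple of (changed_items, new_token).
    """
    state_path = Path(state_file)

    # Load the token saved by the previous run, if there was one.
    token = None
    if state_path.exists():
        token = json.loads(state_path.read_text())["token"]

    items, new_token = fetch_changes(token)

    # Save the token only after the fetch succeeded, so a failed run
    # is retried from the same point next time.
    state_path.write_text(json.dumps({"token": new_token}))
    return items
```

Compared to the timestamp query, the only new moving part is the state file, which has to live somewhere the scheduled job can read and write.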

A third option, which is what I’m moving to, is using Microsoft Flow to trigger on changes to the list.

Photo by Samuel Zeller on Unsplash

Thursday, November 12, 2009

Disk based data structures

Last year I created a project where I used memory mapped files as storage for a large Array. I’ve now polished the project a bit and included generic List and Dictionary implementations as well. The project can be found at Disk Based Data Structures - CodePlex.

I’ve also created a serializer project which benchmarks the available serializer methods and picks the fastest one for your type. This serializer is used to persist the data to disk. The classes are also thread safe.
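The benchmark-and-pick idea can be sketched in a few lines. This is not the CodePlex project's code (which is .NET); it is a Python illustration with two standard library serializers as assumed candidates, and the function name is made up for the example.

```python
import json
import pickle
import time

# Candidate serializers as (dumps, loads) pairs. Both are assumptions
# chosen for illustration; a real project would register its own.
CANDIDATES = {
    "pickle": (pickle.dumps, pickle.loads),
    "json": (lambda value: json.dumps(value).encode("utf-8"),
             lambda data: json.loads(data.decode("utf-8"))),
}


def pick_fastest_serializer(sample, rounds=200):
    """Time a dump/load round trip per candidate and return the fastest name."""
    timings = {}
    for name, (dumps, loads) in CANDIDATES.items():
        try:
            if loads(dumps(sample)) != sample:
                continue  # does not round-trip this type faithfully
        except Exception:
            continue  # cannot handle this type at all
        start = time.perf_counter()
        for _ in range(rounds):
            loads(dumps(sample))
        timings[name] = time.perf_counter() - start
    if not timings:
        raise ValueError("no candidate serializer can handle this sample")
    return min(timings, key=timings.get)
```

Calling `pick_fastest_serializer` once per type with a representative sample and caching the answer keeps the benchmark cost out of the hot path.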

Background for the project

A disk based version of an array would require a lot of caching logic to perform fast enough compared to a pure memory implementation. A couple of years ago I stumbled across memory mapped files, which have long existed in operating systems and are typically used by the OS for swap space.
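To make the idea concrete, here is a minimal sketch of an array backed by a memory mapped file, written in Python with the standard `mmap` module rather than the Win32 wrapper used by the project. The class name and the fixed 64-bit integer element type are assumptions for the example; the operating system handles the caching of pages.

```python
import mmap
import os
import struct


class DiskArray:
    """Fixed-length array of signed 64-bit integers in a memory mapped file."""

    ITEM = struct.Struct("<q")  # little-endian signed 64-bit integer

    def __init__(self, path, length):
        if length <= 0:
            raise ValueError("length must be positive")
        self.length = length
        size = length * self.ITEM.size
        # Create the backing file, or reopen an existing one, and resize it
        # so the mapping covers every slot. New slots read as zero.
        mode = "r+b" if os.path.exists(path) else "w+b"
        self._file = open(path, mode)
        self._file.truncate(size)
        self._map = mmap.mmap(self._file.fileno(), size)

    def _offset(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        return index * self.ITEM.size

    def __getitem__(self, index):
        return self.ITEM.unpack_from(self._map, self._offset(index))[0]

    def __setitem__(self, index, value):
        self.ITEM.pack_into(self._map, self._offset(index), value)

    def close(self):
        # Flush dirty pages to disk before releasing the mapping.
        self._map.flush()
        self._map.close()
        self._file.close()
```

Reads and writes go straight to the mapped memory, so there is no explicit caching code in the class at all, which is the appeal of the approach.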

The first time I worked with memory mapped files I used a library from MetalWrench, but this time around I got hold of Winterdom's much nicer implementation of the Win32 API. I've included the patch from Steve Simpson, but removed the dynamic paging since it slows things down and is not necessary on 64-bit systems. (If you want to use arrays which hold over 2 GB of data on 32-bit systems, I recommend reverting to Steve's original version and setting a view size of 200-500 MB.) Future releases will use .NET 4.0’s System.IO.MemoryMappedFiles namespace.

The beauty of 64-bit is that you have virtually unlimited address space, so each thread can get its own view of the mapped file without running out of address space. 32-bit Windows can only address 4 GB.

As for performance, my theory is that Microsoft has implemented a fairly good caching algorithm for its swap file, so it should prove good enough for me. A few tests show much better disk IO with the memory mapped API than with .NET's file IO library. I haven't tested the performance with the SEC_LARGE_PAGES flag added, but it might help some.

Hope this library is useful for someone out there :)